Bakes the chunker output for fixtures/markdown/long-section.md into the JSON snapshot baseline. The fixture has 3 H1s, a nested H2 under Alpha, a 50-line code block, a 3-column x 4-row table, and a multi-paragraph Gamma section.

The snapshot confirms the priority rules end-to-end:

- Heading boundaries hold across H1 → H2 → H1 transitions.
- The code block emits a single chunk of 427 tokens, even though that exceeds target=200.
- The table stays in a single chunk.
- Gamma's paragraph stream splits, with one block of overlap seeding the next chunk.

A second test runs the full parse → normalize → chunk pipeline 5 times and asserts that every pass yields identical chunk_ids.

Drops the unused `kb-config` and `serde` from the regular dependencies: both were declared, but no source path imports them. `serde` still arrives transitively through `kb-core` as a public API requirement, and although `ChunkingCfg` lives in `kb-config`, the chunker takes `ChunkPolicy` directly. The production dependencies are now exactly the allowed set that is actually used: anyhow, blake3, kb-core, serde_json_canonicalizer, tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
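A minimal Rust sketch of the priority rules listed above, under stated assumptions. `Block`, `Chunker`, and `chunk` are hypothetical stand-ins, not kb-core's `CanonicalDocument`, `Chunk`, or `ChunkPolicy`. Blocks are assumed to arrive with pre-counted tokens, and "one block of overlap" is read as: when a paragraph run would overflow `target`, the run is emitted and its last block is repeated at the start of the next chunk.

```rust
use std::mem;

/// Hypothetical block model with pre-counted tokens; NOT kb-core's real types.
#[derive(Debug, Clone, Copy)]
enum Block {
    Heading,
    Code(usize),
    Table(usize),
    Paragraph(usize),
}

impl Block {
    fn tokens(self) -> usize {
        match self {
            Block::Heading => 0,
            Block::Code(t) | Block::Table(t) | Block::Paragraph(t) => t,
        }
    }
}

#[derive(Default)]
struct Chunker {
    chunks: Vec<Vec<usize>>, // each chunk is a list of block indices
    pending: Vec<usize>,     // paragraph run being accumulated
    pending_tokens: usize,
    seed_len: usize, // 1 while `pending` starts with an overlap seed, else 0
}

impl Chunker {
    /// Emits the pending run if it holds anything beyond an overlap seed, then resets it.
    fn flush(&mut self) {
        if self.pending.len() > self.seed_len {
            self.chunks.push(mem::take(&mut self.pending));
        }
        self.pending.clear();
        self.pending_tokens = 0;
        self.seed_len = 0;
    }
}

/// Groups block indices into chunks (assumed reading of the rules): headings force
/// a boundary, code blocks and tables are never split, and paragraph runs split at
/// `target` with one block of overlap.
fn chunk(blocks: &[Block], target: usize) -> Vec<Vec<usize>> {
    let mut c = Chunker::default();
    for (i, &block) in blocks.iter().enumerate() {
        match block {
            // Headings close the current chunk; no overlap crosses a heading.
            Block::Heading => c.flush(),
            // Atomic blocks get a chunk of their own, even when larger than `target`.
            Block::Code(_) | Block::Table(_) => {
                c.flush();
                c.chunks.push(vec![i]);
            }
            Block::Paragraph(tokens) => {
                // Overflow: emit the run, then seed the next chunk with its last block.
                if c.pending.len() > c.seed_len && c.pending_tokens + tokens > target {
                    let seed = *c.pending.last().expect("run is non-empty");
                    c.chunks.push(mem::take(&mut c.pending));
                    c.pending.push(seed);
                    c.pending_tokens = blocks[seed].tokens();
                    c.seed_len = 1;
                }
                c.pending.push(i);
                c.pending_tokens += tokens;
            }
        }
    }
    c.flush();
    c.chunks
}
```

With a hypothetical sequence (heading, 50-token paragraph, heading, 427-token code block, heading, 80-token table, heading, four 90-token paragraphs at indices 7 to 10) and `target = 200`, the code block still lands in one chunk and the four paragraphs split as [7, 8], [8, 9], [9, 10]. Because `chunk` is a pure function of its input, repeated calls return identical output, which is the property the determinism test asserts at the pipeline level via chunk_ids.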
27 lines
1.1 KiB
TOML
[package]
name = "kb-chunk"
version = { workspace = true }
edition = { workspace = true }
rust-version = { workspace = true }
license = { workspace = true }
repository = { workspace = true }
description = "Chunkers that turn kb-core::CanonicalDocument into kb-core::Chunk batches (§3.5, §4.2, §7.2)"

[dependencies]
kb-core = { path = "../kb-core" }
serde_json_canonicalizer = "0.3"
blake3 = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }

[dev-dependencies]
# kb-parse-md and kb-normalize are dev-only: the snapshot integration test uses
# them to build a CanonicalDocument from a fixture Markdown file. Design §8
# forbids them as regular deps (the chunker consumes CanonicalDocument from
# kb-core only); `cargo tree -p kb-chunk --depth 1 -e normal` (normal edges
# only, so dev-deps are excluded) confirms this.
kb-parse-md = { path = "../kb-parse-md" }
kb-normalize = { path = "../kb-normalize" }
serde_json = { workspace = true }
time = { workspace = true }