Files
kebab/crates/kb-chunk/Cargo.toml
altair823 58f7b8573d p1-5: add long-section fixture + Vec<Chunk> snapshot test
Bakes the chunker output for fixtures/markdown/long-section.md (3 H1s,
nested H2 under Alpha, a 50-line code block, a 3-col x 4-row table,
and a multi-paragraph Gamma section) into the JSON snapshot baseline.
Confirms the priority rules end-to-end:

  - Heading boundaries hold across H1 → H2 → H1 transitions
  - The code block emits one chunk at 427 tokens > target=200
  - The table stays single-chunk
  - Gamma's paragraph stream splits with one block of overlap seed

A second test runs the full parse → normalize → chunk pipeline 5
times and asserts identical chunk_ids each pass.

Drops the unused `kb-config` and `serde` from regular dependencies —
they were declared but no source path imports them; `serde` flows in
transitively via `kb-core` as a public API requirement, and
`ChunkingCfg` lives in `kb-config` but the chunker takes
`ChunkPolicy` directly. Production deps are now exactly the allowed
set actually used: anyhow, blake3, kb-core, serde_json_canonicalizer,
tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:33:29 +00:00

27 lines
1.1 KiB
TOML

[package]
name = "kb-chunk"
version = { workspace = true }
edition = { workspace = true }
rust-version = { workspace = true }
license = { workspace = true }
repository = { workspace = true }
description = "Chunkers that turn kb-core::CanonicalDocument into kb-core::Chunk batches (§3.5, §4.2, §7.2)"
[dependencies]
kb-core = { path = "../kb-core" }
serde_json_canonicalizer = "0.3"
blake3 = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
[dev-dependencies]
# kb-parse-md / kb-normalize are dev-only — used by the snapshot integration
# test to build a CanonicalDocument from a fixture Markdown file. Forbidden as
# regular deps per design §8 (chunker consumes CanonicalDocument from kb-core
# only); `cargo tree -p kb-chunk --depth 1` (default scope, excludes dev-deps)
# confirms this.
kb-parse-md = { path = "../kb-parse-md" }
kb-normalize = { path = "../kb-normalize" }
serde_json = { workspace = true }
time = { workspace = true }