Commit Graph

3 Commits

Author SHA1 Message Date
53ec9b4dc5 test(chunk): regenerate AST + long-section snapshots for V009 chunk field
S3 의 Chunk struct 갱신 (kebab-core 의 tokenized_korean_text:
Option<String> field 추가) 가 모든 chunk snapshot JSON 의 serde
serialize 결과를 변경시킴. 10 snapshot fixture (9 AST chunker +
markdown long-section) 의 baseline 을 V009 형태로 regenerate.

각 snapshot 의 변경 = chunk JSON 마다 `"tokenized_korean_text":
null` field 추가 (대부분의 fixture 가 영어 코드라 lindera 의 None
fallback). 동작 변경 없음 — serde representation 의 cascade만.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up via S11 sanity)
2026-05-28 12:27:37 +00:00
207a0ff61e kb-chunk: regenerate long-section.chunks.snapshot.json baseline
The snapshot now includes the policy_hash field on every Chunk per the
preceding kb-core schema change. chunk_ids are unchanged because the
chunk_id recipe (§4.2) already incorporated policy_hash via the chunker —
the field is simply now visible in the wire form.

Regenerated via:
  UPDATE_SNAPSHOTS=1 cargo test -p kb-chunk long_section_chunks_snapshot
2026-04-30 17:02:53 +00:00
58f7b8573d p1-5: add long-section fixture + Vec<Chunk> snapshot test
Bakes the chunker output for fixtures/markdown/long-section.md (3 H1s,
nested H2 under Alpha, a 50-line code block, a 3-col x 4-row table,
and a multi-paragraph Gamma section) into the JSON snapshot baseline.
Confirms the priority rules end-to-end:

  - Heading boundaries hold across H1 → H2 → H1 transitions
  - The code block emits one chunk at 427 tokens > target=200
  - The table stays single-chunk
  - Gamma's paragraph stream splits with one block of overlap seed

A second test runs the full parse → normalize → chunk pipeline 5
times and asserts identical chunk_ids each pass.

Drops the unused `kb-config` and `serde` from regular dependencies —
they were declared but no source path imports them; `serde` flows in
transitively via `kb-core` as a public API requirement, and
`ChunkingCfg` lives in `kb-config` but the chunker takes
`ChunkPolicy` directly. Production deps are now exactly the allowed
set actually used: anyhow, blake3, kb-core, serde_json_canonicalizer,
tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:33:29 +00:00