kebab

altair823-org/kebab

Commit Graph

Author

SHA1

Message

Date

altair823

53ec9b4dc5

test(chunk): regenerate AST + long-section snapshots for V009 chunk field

The Chunk struct update in S3 (adding the tokenized_korean_text:
Option<String> field in kebab-core) changes the serde-serialized
output of every chunk snapshot JSON. Regenerate the baselines of the
10 snapshot fixtures (9 AST chunker + markdown long-section) in V009
form.

Per-snapshot change: every chunk JSON gains a
`"tokenized_korean_text": null` field (most fixtures are English code,
so lindera falls back to None). No behavior change; this is only a
cascade in the serde representation.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up via S11 sanity)
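
The cascade described in this message can be illustrated with a small std-only sketch. It assumes a trimmed-down, hypothetical `Chunk` with three fields and a hand-rolled `to_json` standing in for the serde derive. The real struct and serializer live in the repository and are not reproduced here. The point is only that a `None` value in an `Option<String>` field still appears in the wire form as `null`, which is why every snapshot baseline changes even though behavior does not.

```rust
// Hypothetical, trimmed-down stand-in for the real Chunk struct.
struct Chunk {
    chunk_id: String,
    text: String,
    // The newly added optional field; None for non-Korean input.
    tokenized_korean_text: Option<String>,
}

// Minimal JSON string escaping (quotes and backslashes only),
// which is enough for this sketch but not a full JSON encoder.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

// Hand-written serializer that mimics the default treatment of Option:
// Some(v) is written as the value, None is written as null.
fn to_json(chunk: &Chunk) -> String {
    let tokenized = match &chunk.tokenized_korean_text {
        Some(t) => json_string(t),
        None => "null".to_string(),
    };
    format!(
        "{{\"chunk_id\":{},\"text\":{},\"tokenized_korean_text\":{}}}",
        json_string(&chunk.chunk_id),
        json_string(&chunk.text),
        tokenized
    )
}
```

Because the field is emitted even when empty, a byte-for-byte snapshot comparison fails for every chunk until the baselines are regenerated.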

2026-05-28 12:27:37 +00:00

altair823

207a0ff61e

kb-chunk: regenerate long-section.chunks.snapshot.json baseline

The snapshot now includes the policy_hash field on every Chunk per the
preceding kb-core schema change. chunk_ids are unchanged because the
chunk_id recipe (§4.2) already incorporated policy_hash via the chunker —
the field is simply now visible in the wire form.

Regenerated via:
  UPDATE_SNAPSHOTS=1 cargo test -p kb-chunk long_section_chunks_snapshot
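
A regenerate switch like the one in the command above is commonly implemented as a helper that either rewrites the baseline or compares against it. The following is a minimal sketch of that pattern using only the standard library. The function names `check_snapshot` and `update_requested` are hypothetical, and treating exactly `"1"` as the enabling value is an assumption; the repository's actual test helper may differ.

```rust
use std::{env, fs, io, path::Path};

/// Compares `actual` against the baseline stored at `path`.
/// When `update` is true, the baseline is rewritten instead and the
/// check passes trivially.
fn check_snapshot(path: &Path, actual: &str, update: bool) -> io::Result<bool> {
    if update {
        fs::write(path, actual)?;
        return Ok(true);
    }
    // A missing baseline surfaces as an io::Error rather than a silent pass.
    let expected = fs::read_to_string(path)?;
    Ok(expected == actual)
}

/// Reads the regenerate switch from the environment.
/// Assumption: only the value "1" enables regeneration.
fn update_requested() -> bool {
    env::var("UPDATE_SNAPSHOTS").map(|v| v == "1").unwrap_or(false)
}
```

A snapshot test would then call `check_snapshot(path, &json, update_requested())` and assert that the result is `true`.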

2026-04-30 17:02:53 +00:00

altair823

58f7b8573d

p1-5: add long-section fixture + Vec<Chunk> snapshot test

Bakes the chunker output for fixtures/markdown/long-section.md (3 H1s,
nested H2 under Alpha, a 50-line code block, a 3-col x 4-row table,
and a multi-paragraph Gamma section) into the JSON snapshot baseline.
Confirms the priority rules end-to-end:

  - Heading boundaries hold across H1 → H2 → H1 transitions
  - The code block emits one chunk at 427 tokens > target=200
  - The table stays single-chunk
  - Gamma's paragraph stream splits with one block of overlap seed
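
The priority rules listed above can be sketched as a single pass over blocks. This is an illustrative reading of the message, not the repository's chunker: the `Block` enum, the `chunk_blocks` function, and the interpretation of "one block of overlap seed" as carrying the last paragraph into the next chunk are all assumptions. Headings act purely as boundaries here and are not stored in chunks.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Heading,
    Paragraph(usize), // token count
    Code(usize),      // token count
    Table(usize),     // token count
}

fn tokens(b: &Block) -> usize {
    match b {
        Block::Heading => 0,
        Block::Paragraph(n) | Block::Code(n) | Block::Table(n) => *n,
    }
}

/// Returns chunks as lists of indices into `blocks`.
fn chunk_blocks(blocks: &[Block], target: usize) -> Vec<Vec<usize>> {
    let mut chunks: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut fresh = 0usize; // blocks in `current` that are not overlap seed

    for (i, block) in blocks.iter().enumerate() {
        match block {
            Block::Heading => {
                // Heading boundary: close the open chunk, carry no overlap.
                if fresh > 0 {
                    chunks.push(std::mem::take(&mut current));
                }
                current.clear();
                fresh = 0;
            }
            Block::Code(_) | Block::Table(_) => {
                // Atomic blocks: always exactly one chunk, even above target.
                if fresh > 0 {
                    chunks.push(std::mem::take(&mut current));
                }
                current.clear();
                fresh = 0;
                chunks.push(vec![i]);
            }
            Block::Paragraph(n) => {
                let used: usize = current.iter().map(|&j| tokens(&blocks[j])).sum();
                if fresh > 0 && used + *n > target {
                    // Split, seeding the next chunk with the last block.
                    let seed = current[current.len() - 1];
                    chunks.push(std::mem::take(&mut current));
                    current.push(seed);
                    fresh = 0;
                }
                current.push(i);
                fresh += 1;
            }
        }
    }
    if fresh > 0 {
        chunks.push(current);
    }
    chunks
}
```

In this sketch a 427-token code block with `target = 200` still yields a single chunk, and a run of paragraphs that overflows the target produces consecutive chunks sharing one block.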

A second test runs the full parse → normalize → chunk pipeline 5
times and asserts identical chunk_ids each pass.
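
A repeated-run determinism check of this kind can be sketched as follows. Everything here is a stand-in: `chunk_ids` splits on blank lines instead of running the real parse, normalize, and chunk pipeline, and it uses the standard library's `DefaultHasher` in place of blake3. Folding a `policy_hash` string into each id is likewise an assumption made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in pipeline: one id per blank-line-separated block, derived from
/// the policy hash, the block position, and the block text.
fn chunk_ids(doc: &str, policy_hash: &str) -> Vec<String> {
    doc.split("\n\n")
        .filter(|s| !s.trim().is_empty())
        .enumerate()
        .map(|(i, text)| {
            // DefaultHasher::new() starts from fixed keys, so repeated
            // calls within one program produce the same value.
            let mut h = DefaultHasher::new();
            policy_hash.hash(&mut h);
            i.hash(&mut h);
            text.hash(&mut h);
            format!("{:016x}", h.finish())
        })
        .collect()
}

/// Runs the pipeline `passes` times and reports whether every pass
/// produced the same ids as the first.
fn is_deterministic(doc: &str, policy_hash: &str, passes: usize) -> bool {
    let baseline = chunk_ids(doc, policy_hash);
    (1..passes).all(|_| chunk_ids(doc, policy_hash) == baseline)
}
```

A test in this style would assert `is_deterministic(&input, policy, 5)`, so any hidden source of nondeterminism (iteration order, timestamps, random seeds) shows up as a failing pass.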

Drops the unused `kb-config` and `serde` from regular dependencies —
they were declared but no source path imports them; `serde` flows in
transitively via `kb-core` as a public API requirement, and
`ChunkingCfg` lives in `kb-config` but the chunker takes
`ChunkPolicy` directly. Production deps are now exactly the allowed
set actually used: anyhow, blake3, kb-core, serde_json_canonicalizer,
tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 16:33:29 +00:00

3 Commits