feat(p1-4): kb-normalize + kb-core Inline schema hotfix #9
Branch: `feat/p1-4-normalize`
Summary
Implements the P1-4 task spec. New crate `kb-normalize` turns `RawAsset` (P1-1) + `Metadata` (P1-2) + `Vec<ParsedBlock>` (P1-3) into a `kb_core::CanonicalDocument` and assigns deterministic IDs. The PR also carries the §9 hotfix to the `kb_core::Inline` serde schema, which the P1-3 spec reviewer deferred to P1-4.

References: tasks/p1/p1-4-normalize.md, docs/superpowers/specs/2026-04-27-kb-final-form-design.md (§3.4 / §3.7b / §4.2-3 / §3.6 / §8 / §9).
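Deterministic `block_id` assignment rests on the §4.3 ordinal rule: each block's ordinal is 0-based and scoped to its `(heading_path, kind)` pair, counted in document order. The following is a minimal std-only sketch of only that counting step, assuming a simplified block type. `SimpleBlock`, `BlockKind` and `assign_ordinals` are hypothetical names, and the real `id_for_block` then hashes these fields into an ID, which is not shown.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced set of block kinds; the real `ParsedBlock` kinds differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum BlockKind {
    Paragraph,
    Code,
    Table,
}

/// Hypothetical stand-in for a parsed block, keeping only what the ordinal rule needs.
struct SimpleBlock {
    heading_path: Vec<String>,
    kind: BlockKind,
}

/// Returns one 0-based ordinal per block, counted separately for every
/// `(heading_path, kind)` pair while walking the blocks in document order.
fn assign_ordinals(blocks: &[SimpleBlock]) -> Vec<u32> {
    let mut counters: HashMap<(Vec<String>, BlockKind), u32> = HashMap::new();
    blocks
        .iter()
        .map(|b| {
            let counter = counters.entry((b.heading_path.clone(), b.kind)).or_insert(0);
            let ordinal = *counter;
            *counter += 1;
            ordinal
        })
        .collect()
}

fn main() {
    let intro = vec!["Intro".to_string()];
    let usage = vec!["Intro".to_string(), "Usage".to_string()];
    let blocks = vec![
        SimpleBlock { heading_path: intro.clone(), kind: BlockKind::Paragraph },
        SimpleBlock { heading_path: intro.clone(), kind: BlockKind::Paragraph },
        SimpleBlock { heading_path: intro, kind: BlockKind::Code },
        SimpleBlock { heading_path: usage.clone(), kind: BlockKind::Paragraph },
        SimpleBlock { heading_path: usage, kind: BlockKind::Table },
    ];
    // Two paragraphs under "Intro" get 0 and 1; the code block has its own counter;
    // the deeper "Intro > Usage" path starts every counter again at 0.
    assert_eq!(assign_ordinals(&blocks), vec![0, 1, 0, 0, 0]);
}
```

With this scoping, inserting a block only shifts the ordinals of later blocks that share its path and kind. Blocks elsewhere keep their ordinals and, assuming the ID is derived from these fields, their IDs too.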
Implementation
- `kb-core::Inline` schema hotfix (§9 migration)
  - Converted the four newtype variants `Inline::Text(String)` / `Code(String)` / `Strong(Vec<…>)` / `Emph(Vec<…>)` to struct variants (`Text { text: String }`, etc.). Serde's internal tagging (`tag = "kind"`) does not support a newtype variant with a non-struct payload, so 4 of the 5 variants failed at runtime.
  - `inline_serde_round_trip` (kb-core unit test) verifies the round-trip for all 5 variants.
  - Mechanical sweep of every caller in `kb-parse-md/src/blocks.rs`.
  - `kb-parse-md/tests/blocks_snapshots.rs`: removed the `BlockView`/`PayloadView` projection workaround and switched to serializing `Vec<ParsedBlock>` directly. Regenerated the baseline `nested-headings.blocks.snapshot.json`.
  - `BlockTuple` (the `id_for_block` recipe) does not include `Inline`, so ID stability is unaffected (verified).
- `crates/kb-normalize`
  - `Cargo.toml`: workspace member; dependencies limited to exactly those in use (kb-core, kb-parse-types, anyhow, serde, serde_json, time, tracing, unicode-normalization). `kb-parse-md` appears only under `[dev-dependencies]` (keeps the §8 boundary).
  - `src/lib.rs`: the public surface is exactly `build_canonical_document` + `pub use kb_core::{id_for_doc, id_for_block}`.
  - `tests/normalize_snapshot.rs`: `code-and-table.md` fixture → deterministic `CanonicalDocument` JSON round-trip.
  - `fixtures/markdown/code-and-table.canonical.snapshot.json`: baseline.

Key behavioral contract
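A minimal std-only sketch of the title/lang lift policy in this contract, as pinned by the M7 tests: a string value is moved out of the user map as-is, even when empty, and a non-string value is dropped silently. `Value`, `DocHead`, `lift_string` and `lift_title_lang` are hypothetical stand-ins, not the kb-core types. Removing a non-string key from the map is an assumption, and fallbacks such as `BodyHints.first_h1` are not modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the value type stored in `metadata.user`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Number(i64),
}

/// Hypothetical subset of `CanonicalDocument`: only the lifted fields.
#[derive(Debug, Default)]
struct DocHead {
    title: Option<String>,
    lang: Option<String>,
}

/// Moves `key` out of the user map. A string is lifted as-is (an empty string stays
/// empty); any other value is dropped without a warning. Removing the key in both
/// cases is an assumption of this sketch.
fn lift_string(user: &mut BTreeMap<String, Value>, key: &str) -> Option<String> {
    match user.remove(key) {
        Some(Value::String(s)) => Some(s),
        Some(_) | None => None,
    }
}

fn lift_title_lang(user: &mut BTreeMap<String, Value>) -> DocHead {
    DocHead {
        title: lift_string(user, "title"),
        lang: lift_string(user, "lang"),
    }
}

fn main() {
    let mut user = BTreeMap::new();
    user.insert("title".to_string(), Value::String(String::new()));
    user.insert("lang".to_string(), Value::Number(42));
    user.insert("author".to_string(), Value::String("kim".to_string()));

    let head = lift_title_lang(&mut user);
    assert_eq!(head.title.as_deref(), Some("")); // empty string lifts as empty string
    assert_eq!(head.lang, None); // non-string value silently drops
    assert!(user.contains_key("author") && user.len() == 1); // other keys untouched
}
```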
- `doc_id = id_for_doc(workspace_path, asset_id, parser_version)` per §4.2
- `block_id` per the §4.3 ordinal rule: 0-based per `(heading_path, kind)`, in document order
- `heading_path` strings are NFC-normalized, and the NFC form is used for both the hash and the wire form (NFD and NFC inputs yield identical results)
- `metadata.user["title"]` / `["lang"]` are lifted to `CanonicalDocument.title` / `.lang` and removed from the user map
- `now_utc()` is called once and the value is shared
- (prevents an `AssetId("")` placeholder validity violation)
- `warning_agent` routing: upstream `ExtractFailed` → `kb-parse-md` (panic-recovery origin); lift-stage warnings → `kb-normalize` (separate `lift_warnings` vector)

Verification
- `cargo check` / `build` (`RUSTFLAGS=-D warnings`) / `clippy --all-targets -D warnings`: clean
- `cargo test -p kb-normalize` → 14 unit + 1 integration tests pass
- `cargo test --workspace` → 147 passing, all green
- `cargo tree -p kb-normalize --depth 1` → allowed deps only (no parser impl)

Review trail
e0df429, 557275c)

⚠️ Bisect caveat
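The workspace does not compile at the intermediate state between 606ce1c (kb-core hotfix) and cfccb36 (kb-parse-md caller update). This is inherent to a schema migration inside the workspace: it cannot be made atomic within a single crate. When running `git bisect`, skip both commits together or squash them.

Follow-ups (non-blocking, expected to resolve naturally)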
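Background for the NFC follow-up: canonically equivalent strings can differ code point by code point. `한` can be one precomposed code point (U+D55C) or three conjoining jamo (U+1112 U+1161 U+11AB), and hashing them raw yields different IDs. The crate normalizes through its `unicode-normalization` dependency. The std-only sketch below implements only the algorithmic Hangul composition step of NFC, not full NFC, to show that composing the NFD form recovers the NFC key. `compose_hangul` is a hypothetical helper.

```rust
// Constants of the Unicode Hangul syllable composition algorithm (Unicode Standard, §3.12).
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const S_COUNT: u32 = L_COUNT * V_COUNT * T_COUNT; // 11172 precomposed syllables

/// Composes conjoining jamo into precomposed syllables (L+V -> LV, LV+T -> LVT).
/// This is only the Hangul part of NFC; other characters pass through unchanged.
fn compose_hangul(input: &str) -> String {
    let mut out: Vec<char> = Vec::new();
    for ch in input.chars() {
        let c = ch as u32;
        if let Some(last) = out.last_mut() {
            let l = *last as u32;
            // Leading consonant + vowel -> LV syllable.
            if (L_BASE..L_BASE + L_COUNT).contains(&l) && (V_BASE..V_BASE + V_COUNT).contains(&c) {
                let lv = S_BASE + ((l - L_BASE) * V_COUNT + (c - V_BASE)) * T_COUNT;
                *last = char::from_u32(lv).expect("valid Hangul syllable");
                continue;
            }
            // LV syllable (no trailing consonant yet) + trailing consonant -> LVT syllable.
            let is_lv = (S_BASE..S_BASE + S_COUNT).contains(&l) && (l - S_BASE) % T_COUNT == 0;
            if is_lv && c > T_BASE && c < T_BASE + T_COUNT {
                *last = char::from_u32(l + (c - T_BASE)).expect("valid Hangul syllable");
                continue;
            }
        }
        out.push(ch);
    }
    out.into_iter().collect()
}

fn main() {
    let nfc = "\u{D55C}\u{AD6D}\u{C5B4}"; // "한국어" as precomposed syllables
    let nfd = "\u{1112}\u{1161}\u{11AB}\u{1100}\u{116E}\u{11A8}\u{110B}\u{1165}"; // same text as jamo
    assert_ne!(nfc, nfd); // different code points, so raw hashing would give different IDs
    assert_eq!(compose_hangul(nfd), nfc); // composing recovers the NFC form
    assert_eq!(compose_hangul(nfc), nfc); // already-composed text is unchanged
}
```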
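- `serde_json_canonicalizer` v0.3 does not NFC-normalize; NFC is currently guaranteed on the caller side. Adding an NFC step inside `id_from` later would reduce this dependency on callers.
- `block_id` sharing: per-item IDs become possible if the §4.2 recipe is extended (documented in rustdoc).
- `ExtractFailed` can also arise in the lift stage; the structural fix is adding a `WarningSource` field to `Warning`.
- The test name `title_empty_string_in_user_map_falls_back_to_default` is slightly misleading (cosmetic).

Next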
After merge, branch `feat/p1-5-chunk` and proceed with P1-5 (kb-chunk).

`#[serde(tag = "kind")]` rejects newtype variants whose payload is not a struct, so 4 of 5 `Inline` variants (`Text(String)`, `Code(String)`, `Strong(Vec<…>)`, `Emph(Vec<…>)`) failed to serialize at runtime; only `Link { text, href }` worked. Convert every variant to struct form so the internally-tagged shape is well-formed and round-trips through JSON. Add `inline_serde_round_trip` covering all five variants.

Per design §9, this is a wire-schema migration; no `docs/wire-schema/v1/*.json` change is required since `Inline` is not directly referenced there. Callers in kb-parse-md follow in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mechanical sweep over `Inline::Text(_)` / `Code(_)` / `Strong(_)` / `Emph(_)` construction and match sites under the new struct-variant shape introduced in the previous commit. `Inline::Link { text, href }` is unchanged.

The snapshot test in `tests/blocks_snapshots.rs` previously projected `ParsedBlock` into a `BlockView`/`PayloadView` shim because the old `Inline` could not serialize. With the schema fix in place we now serialize `ParsedBlock` directly through serde; the shim and its `flatten_inline` helper are removed. Inlines surface as structured objects (`{"kind":"text","text":"…"}` etc.).

Regenerated `nested-headings.blocks.snapshot.json` to reflect the new shape via the existing `--ignored` emitter; `code-and-table.blocks.snapshot.json` has no inlines and is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

I1: `warning_agent` maps `ExtractFailed` → "kb-parse-md" (the panic-recovery emitter in kb-parse-md/src/blocks.rs). Lift-stage warnings from `build_canonical_document` are tracked separately and attributed to "kb-normalize", so the I1 mapping change does not misattribute kb-normalize-originated drops.

I2: `ParsedPayload::AudioRef` no longer synthesizes `Block::AudioRef` with an invalid empty `AssetId` (which would violate the 32-hex invariant of `AssetId::from_str`). The block is dropped and a `Warning` mentioning its src surfaces in `Provenance`, attributed to kb-normalize (lift-stage decision). A TODO(P8) comment marks this as a placeholder until the audio extractor lands.

I3: NFC-normalize each `heading_path` string in `lift_block` before feeding it into `id_for_block` AND into `CommonBlock.heading_path`. pulldown-cmark does not NFC heading text and serde_json_canonicalizer v0.3 does not either, so canonically-equivalent NFD/NFC inputs would produce different block_ids without this normalization. Mirrors the existing doc_id NFC handling via `to_posix`.

Minors:
- M4: trim Cargo.toml by dropping kb-config, serde_json_canonicalizer and blake3 (unused); keep tracing (now wired) + unicode-normalization (now used by I3).
- M5: `determinism_1000_iterations_under_1s` now uses the same 5-block fixture as `block_ordinals_scoped_per_heading_and_kind` (extracted into the `fixture_blocks_five` helper), so the determinism property is exercised on a real `lift_block` path, not just an empty Vec. Still < 1s.
- M6: the snapshot integration test now passes `BodyHints { first_h1: Some("Code And Table"), .. }` and asserts `doc.title == "Code And Table"` end-to-end. Baseline JSON updated.
- M7: title/lang edge-case unit tests pin the policy: an empty string lifts to an empty string; non-stringy values silently drop. Rustdoc updated.
- M10: `provenance_contains_stage_events_in_order` asserts `events[1].at == events[2].at` to pin the shared-`now_utc` invariant.

New tests (unit, kb-normalize):
- provenance_with_extract_failed_warning_attributes_to_kb_parse_md (I1)
- audio_ref_block_skipped_with_warning (I2)
- nfc_nfd_korean_heading_path_same_block_id (I3)
- title_empty_string_in_user_map_falls_back_to_default (M7)
- title_non_string_in_user_map_silently_drops (M7)
- lang_invalid_shape_silently_drops (M7)

kb-normalize unit tests: 9 → 14. Integration snapshot: 1 (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Review: all items resolved; APPROVE recommended (posted as COMMENT per gate policy)
Internal review loop completed three times. Final reviewer verdict: APPROVED, with 0 must-fix-now items.
Review trail
e0df429, 557275c

Key verifications
- `kb_core::Inline` schema (§9 migration)
- `doc_id` / `block_id` per §4.2-3: 0-based, scoped by `(heading_path, kind)`, verified with a 5-block fixture
- `heading_path` NFC hotfix: `nfc_nfd_korean_heading_path_same_block_id`
- `AssetId("")` validity violation blocked, TODO(P8) noted

Gates passed
- `cargo check` / `build` (`RUSTFLAGS=-D warnings`) / `clippy --all-targets -D warnings`: clean
- `cargo test --workspace` → 147 passing (was 142, +5)
- `cargo tree -p kb-normalize --depth 1`: no parser impl

⚠️ Bisect caveat
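The workspace does not compile between 606ce1c → cfccb36 (cross-crate schema migration). When running `git bisect`, skip both commits together or squash them.

Follow-ups (non-blocking, expected to resolve naturally)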
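Background for the `WarningSource` follow-up: today attribution is split between a kind-based mapping for upstream warnings and a separate `lift_warnings` vector, so the same `ExtractFailed` kind can come from two crates. A minimal sketch contrasting that with a per-warning source field. `WarningKind`, `WarningSource`, `Warning`, `agents_today` and `agents_with_source` are hypothetical names, not the kb-core API.

```rust
/// Hypothetical warning kind; only the case discussed in the follow-up is modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WarningKind {
    ExtractFailed,
}

/// Hypothetical per-warning origin, as proposed by the `WarningSource` follow-up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WarningSource {
    KbParseMd,
    KbNormalize,
}

impl WarningSource {
    fn agent(self) -> &'static str {
        match self {
            WarningSource::KbParseMd => "kb-parse-md",
            WarningSource::KbNormalize => "kb-normalize",
        }
    }
}

/// Hypothetical warning record carrying its own source.
struct Warning {
    kind: WarningKind,
    source: WarningSource,
}

/// Current approach, sketched: upstream warnings are attributed by kind, while
/// lift-stage warnings live in a separate vector attributed to kb-normalize.
fn agents_today(upstream: &[WarningKind], lift_warnings: &[WarningKind]) -> Vec<&'static str> {
    let by_kind = |kind: &WarningKind| match kind {
        WarningKind::ExtractFailed => "kb-parse-md",
    };
    upstream
        .iter()
        .map(by_kind)
        .chain(lift_warnings.iter().map(|_| "kb-normalize"))
        .collect()
}

/// Follow-up approach, sketched: one vector, attribution read from each warning.
fn agents_with_source(warnings: &[Warning]) -> Vec<&'static str> {
    warnings.iter().map(|w| w.source.agent()).collect()
}

fn main() {
    // The same kind raised by two different stages.
    let today = agents_today(&[WarningKind::ExtractFailed], &[WarningKind::ExtractFailed]);
    assert_eq!(today, vec!["kb-parse-md", "kb-normalize"]);

    let warnings = vec![
        Warning { kind: WarningKind::ExtractFailed, source: WarningSource::KbParseMd },
        Warning { kind: WarningKind::ExtractFailed, source: WarningSource::KbNormalize },
    ];
    assert_eq!(agents_with_source(&warnings), today);
}
```

Both schemes attribute the same way here. The difference is that with a source field, correct attribution no longer depends on which vector a warning happened to be pushed into.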
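- `id_from` applying NFC itself (currently guaranteed by callers)
- a `WarningSource` field on `Warning` (structurally separates the same warning kind raised by different stages)

Conclusion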
Nothing papered over. Per the self-review gate policy this review is posted as COMMENT; the author should APPROVE manually and merge.