feat(p1-2): kb-parse-md frontmatter (YAML/TOML → Metadata) #7
Summary
Implements the P1-2 task spec: the frontmatter submodule of the new crate kb-parse-md. It parses YAML/TOML frontmatter from Markdown bytes and produces kb_core::Metadata + Vec<Warning> + Option<FrontmatterSpan>, applying the derive table from design §0 Q9 one-to-one. The P1-3 block parser will be added to the same crate as a sibling submodule.
Basis: tasks/p1/p1-2-parse-md-frontmatter.md, docs/superpowers/specs/2026-04-27-kb-final-form-design.md (§0 Q9 / §3.6 Metadata / §3.7b Warning / §10).
Implementation
crates/kb-parse-md:
- lib.rs: exposes only parse_frontmatter, BodyHints, FrontmatterSpan
- frontmatter.rs: delim detection / YAML (serde_yaml_ng) + TOML parsers / §0 Q9 derive chain / lingua lang detect (ko/en/ja/zh, OnceLock cache)
- tests/frontmatter_snapshots.rs: 3 snapshots + 1 regenerator
- fixtures/markdown/frontmatter-only.md + baseline JSON
- fixtures/markdown/mixed-lang.md + baseline JSON
Key behavior contract
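- Unknown keys: preserved in metadata.user, no warning
- Malformed frontmatter: MalformedFrontmatter warning + default + continue
- Non-UTC timestamps: original string preserved in metadata.user.original_timestamps[field]
- id: field → Metadata.user_id_alias, the single location (no effect on doc_id; pinned by a recipe sentinel test)
- Fields outside Metadata (direct CanonicalDocument fields, design §3.4) → held temporarily in metadata.user["lang"] / metadata.user["title"]; P1-4 normalize lifts them to CanonicalDocument
- CRLF / BOM / trim robustness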
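The partition rules in this contract can be pictured with a small std-only sketch. It is not the crate's code: MetadataSketch and partition are hypothetical names, values are plain strings instead of serde_json values, and the input is assumed to be already-parsed key/value pairs. It only illustrates that id lands in user_id_alias alone and that every other unrecognized key (including title and lang, which are held in the user map for now) is kept without a warning.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for kb_core::Metadata; the real type has more fields.
#[derive(Debug, Default, PartialEq)]
struct MetadataSketch {
    tags: Vec<String>,
    /// The only place the frontmatter `id:` value is stored.
    user_id_alias: Option<String>,
    /// Keys this sketch does not recognize, preserved verbatim.
    user: BTreeMap<String, String>,
}

/// Splits already-parsed frontmatter pairs into known fields and the `user` map.
/// Unknown keys are kept silently: nothing is reported for them.
fn partition(pairs: &[(&str, &str)]) -> MetadataSketch {
    let mut meta = MetadataSketch::default();
    for &(key, value) in pairs {
        match key {
            // Assumption for the sketch: tags arrive as a comma-separated string.
            "tags" => {
                meta.tags = value
                    .split(',')
                    .map(|t| t.trim().to_string())
                    .filter(|t| !t.is_empty())
                    .collect();
            }
            // `id` is routed to the alias and deliberately not copied into `user`.
            "id" => meta.user_id_alias = Some(value.to_string()),
            // Everything else, including `title` and `lang`, stays in `user`.
            _ => {
                meta.user.insert(key.to_string(), value.to_string());
            }
        }
    }
    meta
}
```

Calling partition(&[("id", "note-42"), ("mood", "curious")]) yields a value whose user_id_alias is Some("note-42") and whose user map contains only the mood key.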
Verification
- cargo check / build (RUSTFLAGS=-D warnings) / clippy --all-targets -D warnings: clean
- cargo test --workspace → kb-parse-md 26 unit + 3 snapshot, whole workspace green
- cargo tree -p kb-parse-md --depth 1 → allowed deps only (kb-core, kb-parse-types, anyhow, serde, serde_json, serde_yaml_ng, time, toml, lingua)
Review
- user_id_alias duplication: applied immediately (1fab6b0)
Next
After merge, branch feat/p1-3-parse-md-blocks and proceed with P1-3 (sibling submodule).
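For readers new to the §0 Q9 derive chain named under Implementation, the fallback order for two of its rows (title and lang, as listed in the commit message) can be sketched with Option chaining. This is an illustration under assumptions: BodyHintsSketch, its lang field, and both function names are hypothetical, the lingua detection result is passed in as a plain Option rather than computed, and the filename fallback is supplied by the caller as an argument.

```rust
/// Hypothetical stand-in for BodyHints; the field shapes are assumed.
struct BodyHintsSketch<'a> {
    first_h1: Option<&'a str>,
    lang: Option<&'a str>,
}

/// title → frontmatter | first H1 from body hints | caller-supplied filename.
fn derive_title(frontmatter: Option<&str>, hints: &BodyHintsSketch, filename: &str) -> String {
    frontmatter
        .or(hints.first_h1)
        .unwrap_or(filename)
        .to_string()
}

/// lang → frontmatter | autodetected value | hints | "und".
/// `detected` stands in for the language-detection result, which this sketch does not compute.
fn derive_lang(frontmatter: Option<&str>, detected: Option<&str>, hints: &BodyHintsSketch) -> String {
    frontmatter
        .or(detected)
        .or(hints.lang)
        .unwrap_or("und")
        .to_string()
}
```

Each step is consulted only when every earlier one is absent, so an explicit frontmatter value always wins and "und" appears only when nothing else is available.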
Implement the frontmatter submodule:
- detect_delimiters scans for a leading YAML (---) or TOML (+++) block at byte 0. Strict per §0 Q9: no leading whitespace / BOM, no chars on the delimiter line. Closing must be its own line. Unterminated → no FM.
- parse_raw deserializes into RawFrontmatter, a serde-flatten struct that catches unknown keys into a serde_json::Map for verbatim preservation in metadata.user.
- derive_metadata implements the §0 Q9 fallback chain:
    title → frontmatter | BodyHints.first_h1 | (filename: caller)
    aliases/tags → frontmatter | []
    lang → frontmatter | lingua autodetect on first 4 KB | hints | "und"
    created_at → frontmatter (RFC 3339, normalized to UTC) | fs_ctime
    updated_at → frontmatter | fs_mtime
    source_type → frontmatter | "markdown"
    trust_level → frontmatter | "primary"
    id → user_id_alias only, never a doc_id factor (§4.2)
- Non-UTC offsets are normalized to UTC; the original string is preserved in user.original_timestamps[field] per §0 Q9.
- Warnings are emitted for: malformed YAML/TOML, unknown enum values, malformed timestamps. Unknown keys are silent.
- lingua detector is cached in a OnceLock; first build is heavy.
- 15 unit tests cover every row of the derive table + delimiter edge cases + an explicit pin that `id:` does not feed id_for_doc.
Review: all items resolved; APPROVE recommended (registered as COMMENT per gate policy)
The internal review loop ran 3 rounds (spec → quality round 1 → quality round 2) and is complete. Final reviewer verdict: APPROVED, 0 must-fix-now items.
Review trail
- user_id_alias duplication must-fix: 1fab6b0
- 6a4db62, 5850bfc
Key verification points
- b"---\r\ntitle: Doc\r\n---\r\nbody\r\n" → title=Some("Doc"), warns=[], span exact. All 4 EOL combinations pass a hand trace
- serde(flatten) partition is consistent; the id_field_does_not_feed_doc_id sentinel test passes
- OnceLock lazy init, 4-language feature gate (default-features = false)
Gate passed
- cargo check / build (RUSTFLAGS=-D warnings) / clippy --all-targets -D warnings: clean
- cargo test -p kb-parse-md → 26 unit + 3 snapshot pass
- cargo test --workspace → all green
Follow-ups that resolve naturally later (non-blocking)
- metadata.user["lang"] / metadata.user["title"] lift to CanonicalDocument
- key:value pattern: further review
Conclusion
Nothing was papered over. Under the self-review gate policy this comment is registered as COMMENT; the author is asked to APPROVE manually and merge.
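As a reading aid for the CRLF verification point above, the following std-only sketch reproduces the span arithmetic on that exact input. It is not the crate's detect_delimiters: SpanSketch, eol_len, and detect_block are hypothetical names, the span layout (payload range plus body offset) is an assumption, and BOM handling is left out because the source does not spell it out.

```rust
use std::ops::Range;

/// Hypothetical span: byte range of the frontmatter payload and the offset where the body starts.
#[derive(Debug, PartialEq)]
struct SpanSketch {
    payload: Range<usize>,
    body_start: usize,
}

/// Length of the line terminator at the start of `bytes` (CRLF or LF), if there is one.
fn eol_len(bytes: &[u8]) -> Option<usize> {
    if bytes.starts_with(b"\r\n") {
        Some(2)
    } else if bytes.starts_with(b"\n") {
        Some(1)
    } else {
        None
    }
}

/// Detects a leading `---` or `+++` block that starts at byte 0.
/// The opening delimiter must be alone on its line and the closing one must be
/// its own line; an unterminated block is treated as no frontmatter.
fn detect_block(input: &[u8]) -> Option<SpanSketch> {
    let delim: &[u8] = if input.starts_with(b"---") {
        b"---"
    } else if input.starts_with(b"+++") {
        b"+++"
    } else {
        return None;
    };
    // Anything other than an EOL right after the delimiter rejects the block.
    let payload_start = delim.len() + eol_len(&input[delim.len()..])?;
    let mut line_start = payload_start;
    while line_start < input.len() {
        // A line runs up to and including its `\n`, or to the end of input.
        let line_end = input[line_start..]
            .iter()
            .position(|&b| b == b'\n')
            .map(|i| line_start + i + 1)
            .unwrap_or(input.len());
        let mut line = &input[line_start..line_end];
        if let Some(stripped) = line.strip_suffix(b"\n") {
            line = stripped;
        }
        if let Some(stripped) = line.strip_suffix(b"\r") {
            line = stripped;
        }
        if line == delim {
            return Some(SpanSketch {
                payload: payload_start..line_start,
                body_start: line_end,
            });
        }
        line_start = line_end;
    }
    None
}
```

On the CRLF input from the verification list, the payload slice is the title line including its CRLF and the body slice starts at "body"; an input with a leading space before the delimiter, or without a closing delimiter, yields None.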