kebab/tasks/p1/p1-2-parse-md-frontmatter.md
altair823 d1b99b2994 docs: mark P0–P4 done, add SMOKE recipe, refresh README
State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20
--config across subcommands, PR #24 --config in kb ask, PR #25 RRF
fusion_score normalization). README still framed the project as
"spec frozen, code 0 lines"; phase docs and task specs all carried
status: planned. Sweep:

- README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes
  applied"; command table marks each subcommand's owning phase and
  current status (kb ask = via P4-3, kb eval = P5);
  phase roadmap table grew a Status column (P0–P4 completed, P5
  next, P6–P9 pending); component count bumped 30 → 31 to reflect
  P3-5 (app-wiring, post-spec); core decisions table notes the
  RRF [0,1] normalization invariant; build+run section drops the
  "P0 not started" caveat; new pointers to HOTFIXES.md and SMOKE.md.
- docs/SMOKE.md (new): ~80-line recipe for running the full
  pipeline against an isolated /tmp/kb-smoke/ workspace via
  --config, without polluting ~/.config/kb/ or
  ~/.local/share/kb/. Covers fixture seeding, sample config.toml
  with the post-merge defaults, doctor → ingest → list →
  search × 3 modes → inspect → ask sequence, verification
  checklist, and known-behaviour notes (fastembed model
  download, RAG response time, --config hard-fail on missing
  path).
- tasks/phase-{0..4}-*.md: status frontmatter flipped planned →
  completed.
- tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same
  status flip across all 17 component task specs (1+6+2+5+3).
  P5–P9 stay planned.

cargo test --workspace: 319 passed; clippy clean (no source
changes in this commit, just docs + frontmatter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:32:28 +00:00


phase: P1
component: kb-parse-md (frontmatter submodule)
task_id: p1-2
title: Markdown frontmatter parsing → Metadata
status: completed
depends_on: p0-1
unblocks: p1-4
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections:
  • design §3.6 Metadata
  • design §3.7b kb-parse-types (Warning)
  • design §0 Q9 frontmatter
  • design §10 errors

p1-2 — Markdown frontmatter parsing

Goal

Parse YAML/TOML frontmatter from Markdown bytes into kb_core::Metadata, with auto-derive defaults and unknown-key preservation in metadata.user.
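
As a rough illustration of the first step, the sketch below locates the frontmatter fence and separates it from the body. It assumes the common convention of `---` fences for YAML and `+++` fences for TOML, which this spec does not state. `Flavor` and `split_frontmatter` are hypothetical names, CRLF line endings are not handled, and a real implementation would first decode the `&[u8]` input (for example with `std::str::from_utf8`).

```rust
/// Frontmatter flavour, inferred from the opening fence.
/// Assumption: `---` means YAML and `+++` means TOML.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Flavor {
    Yaml,
    Toml,
}

/// Splits `src` into (flavour, frontmatter text, body).
/// Returns `None` when there is no opening fence or the fence is never closed.
pub fn split_frontmatter(src: &str) -> Option<(Flavor, &str, &str)> {
    let (flavor, fence) = if src.starts_with("---\n") {
        (Flavor::Yaml, "---")
    } else if src.starts_with("+++\n") {
        (Flavor::Toml, "+++")
    } else {
        return None;
    };

    // Skip the opening fence and its newline (4 ASCII bytes).
    let rest = &src[4..];
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        // A line consisting solely of the fence closes the frontmatter.
        if line.trim_end_matches(|c| c == '\r' || c == '\n') == fence {
            let front = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return Some((flavor, front, body));
        }
        offset += line.len();
    }
    None
}
```

Returning `None` for an unclosed fence lets the caller treat the whole input as body; whether that case should also emit a warning is left to the contract below.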

Why now / why this size

Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of kb-parse-md simple and lets us reach 100% test coverage on the rules in design §0 Q9.

Allowed dependencies

  • kb-core
  • kb-parse-types (provides shared Warning + WarningKind per design §3.7b)
  • serde
  • serde_yaml (or yaml-rust2) for YAML
  • toml for TOML
  • time
  • lingua (lang auto-detect — accept feature-gate if heavy)
  • thiserror

Forbidden dependencies

  • kb-store-*, kb-llm*, kb-rag, kb-embed*, kb-search, kb-tui, kb-desktop, kb-source-fs, kb-chunk, kb-normalize, pulldown-cmark (block parser is a sibling task)

Inputs

| input | type | source |
| --- | --- | --- |
| Markdown bytes | &[u8] | extractor |
| body fallbacks | BodyHints { first_h1: Option<String>, fs_ctime: OffsetDateTime, fs_mtime: OffsetDateTime, fallback_lang: Option<String> } | caller |

Outputs

| output | type | downstream |
| --- | --- | --- |
| (Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>) | tuple | kb-normalize → CanonicalDocument |

Public surface (signatures only — no new types)

pub fn parse_frontmatter(
    bytes: &[u8],
    hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)>;

Warning / WarningKind come from kb-parse-types (shared with p1-3 blocks parser and downstream kb-normalize). FrontmatterSpan is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.

Behavior contract

  • All Metadata fields are optional in input. Missing fields populated per design §0 Q9 derive table:
    • title ← first H1 (from BodyHints.first_h1) → filename without extension if no H1.
    • lang ← lingua auto-detect on first 4 KB of body → fallback BodyHints.fallback_lang or "und".
    • created_at / updated_at ← BodyHints.fs_ctime / fs_mtime if missing.
    • source_type default markdown; trust_level default primary.
    • aliases, tags default empty.
  • Unknown keys → metadata.user (serde_json::Map), preserved verbatim, no warning.
  • Unknown enum value (e.g. trust_level: weird) → emit kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" } + ingest continues with default.
  • Malformed YAML → frontmatter discarded, body still parsed, Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" } emitted.
  • No frontmatter at all → defaults applied silently.
  • id: field captured into metadata.user_id_alias (alias only — does NOT influence doc_id per design §4.2).
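
The fallback chains and the unknown-key rule above can be sketched with the standard library alone. Everything here is a stand-in under stated assumptions: `TrustLevel::Secondary` is a hypothetical variant (only `primary` is named in this spec), `Warning` is reduced to its note text, the user map is a `BTreeMap<String, String>` rather than `serde_json::Map`, `KNOWN_KEYS` lists only the fields mentioned in this contract, and the `file_stem` and `detected` parameters are assumed to be supplied by the caller.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real trust-level enum. `Secondary` is hypothetical.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TrustLevel {
    Primary,
    Secondary,
}

/// Stand-in for kb_parse_types::Warning, reduced to its note text.
#[derive(Debug, PartialEq, Eq)]
pub struct Warning {
    pub note: String,
}

/// Keys this sketch treats as known (assumption: the fields named above).
const KNOWN_KEYS: &[&str] = &[
    "title", "lang", "created_at", "updated_at",
    "source_type", "trust_level", "aliases", "tags", "id",
];

/// title: explicit value, else first H1, else file name without extension.
pub fn resolve_title(explicit: Option<&str>, first_h1: Option<&str>, file_stem: &str) -> String {
    explicit.or(first_h1).unwrap_or(file_stem).to_string()
}

/// lang: explicit value, else detected language, else fallback, else "und".
pub fn resolve_lang(
    explicit: Option<&str>,
    detected: Option<&str>,
    fallback_lang: Option<&str>,
) -> String {
    explicit.or(detected).or(fallback_lang).unwrap_or("und").to_string()
}

/// Unknown enum values produce a warning and fall back to the default.
pub fn resolve_trust_level(raw: Option<&str>, warnings: &mut Vec<Warning>) -> TrustLevel {
    match raw {
        None | Some("primary") => TrustLevel::Primary,
        Some("secondary") => TrustLevel::Secondary,
        Some(other) => {
            warnings.push(Warning {
                note: format!("unknown trust_level={}, defaulted to primary", other),
            });
            TrustLevel::Primary
        }
    }
}

/// Splits raw key/value pairs into (known, user); user keys pass through
/// verbatim and produce no warning.
pub fn partition_keys(
    raw: BTreeMap<String, String>,
) -> (BTreeMap<String, String>, BTreeMap<String, String>) {
    raw.into_iter()
        .partition(|(key, _)| KNOWN_KEYS.iter().any(|known| *known == key.as_str()))
}
```

Passing the warning list in as `&mut Vec<Warning>` keeps each resolver a plain function while still letting ingest continue with the default value.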

Storage / wire effects

  • None. Pure function.

Test plan

| kind | description | fixture / data |
| --- | --- | --- |
| unit | YAML frontmatter happy path → Metadata fields | inline |
| unit | TOML frontmatter happy path | inline |
| unit | unknown keys preserved in metadata.user | inline |
| unit | unknown enum value → warning + default | inline |
| unit | malformed YAML → empty Metadata + warning | inline |
| unit | no frontmatter → derive from BodyHints | inline |
| unit | id: field becomes user_id_alias, not doc_id factor | inline + assert via §4.2 recipe stub |
| snapshot | fixtures/markdown/frontmatter-only.md produces stable JSON | fixture |
| snapshot | mixed-language body with no lang: detects ko or en | fixtures/markdown/mixed-lang.md |

All tests under cargo test -p kb-parse-md --lib frontmatter.

Definition of Done

  • cargo check -p kb-parse-md passes
  • cargo test -p kb-parse-md frontmatter passes
  • No pulldown-cmark import in this submodule
  • Snapshot tests stable across two consecutive runs
  • PR links design §0 Q9, §3.6

Out of scope

  • Block parsing (p1-3).
  • Building CanonicalDocument (p1-4).
  • Persisting metadata (p1-6).

Risks / notes

  • lingua model load is heavy on first call; tests should reuse a static instance.
  • Timezone normalization: parse created_at/updated_at to UTC; preserve the original offset in metadata.user.original_timestamps only if it is present and non-UTC.
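
One possible way to reuse a single detector across tests is a lazily initialised static. The sketch below uses `std::sync::OnceLock` with a hypothetical `Detector` stand-in rather than the actual lingua types, plus a `LOADS` counter that exists only to show the build happens once. `detection_window` is likewise a hypothetical helper illustrating how a "first 4 KB" slice can be taken without cutting a UTF-8 character in half.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an expensive-to-build language detector.
pub struct Detector {
    languages: Vec<&'static str>,
}

/// Counts how many times the detector was actually built (demo only).
pub static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Process-wide instance, built on first use.
static DETECTOR: OnceLock<Detector> = OnceLock::new();

impl Detector {
    fn load() -> Self {
        LOADS.fetch_add(1, Ordering::SeqCst);
        Detector { languages: vec!["en", "ko"] }
    }

    pub fn languages(&self) -> &[&'static str] {
        &self.languages
    }
}

/// Returns the shared detector, building it at most once.
pub fn shared_detector() -> &'static Detector {
    DETECTOR.get_or_init(Detector::load)
}

/// Returns at most `max_bytes` of `body`, backing up to the nearest
/// character boundary so the slice is always valid UTF-8.
pub fn detection_window(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```

A test would call `shared_detector()` and pass it `detection_window(body, 4096)`; repeated calls return the same reference, so the model load cost is paid once per test binary.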