kebab/tasks/p1/p1-2-parse-md-frontmatter.md
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit. Bulk update of every `kb` occurrence in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (so the same prefix is
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- All references unified across the design doc, initial report, SMOKE,
  HOTFIXES, phase epics, and task specs.
- The `git -c user.name=kb` in task-decomposition.md is author
  information recording past git history, so it is kept as is (the
  author in the actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` as
  well (left unupdated after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (every renamed file is updated to
  its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00


phase: P1
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: Markdown frontmatter parsing → Metadata
status: completed
depends_on: p0-1
unblocks: p1-4
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections:
  • design §3.6 Metadata
  • design §3.7b kebab-parse-types (Warning)
  • design §0 Q9 frontmatter
  • design §10 errors

p1-2 — Markdown frontmatter parsing

Goal

Parse YAML/TOML frontmatter from Markdown bytes into kebab_core::Metadata, with auto-derive defaults and unknown-key preservation in metadata.user.
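
As a rough illustration of the first step implied by this goal, the sketch below separates a leading frontmatter block from the body before any YAML or TOML decoding happens. It assumes the common convention of `---` fences for YAML and `+++` fences for TOML, which this spec does not state explicitly. `SplitFrontmatter`, `FrontmatterSyntax`, and `split_frontmatter` are hypothetical names used only here; they are not the crate-internal `FrontmatterSpan`. The input is taken as `&str`, so a real caller would first validate the `&[u8]` with `std::str::from_utf8`.

```rust
use std::ops::Range;

/// Which syntax the opening fence announced (assumed convention).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FrontmatterSyntax {
    Yaml, // fenced by `---`
    Toml, // fenced by `+++`
}

/// Hypothetical split result: raw frontmatter text, its byte span, and the body.
#[derive(Debug, PartialEq, Eq)]
pub struct SplitFrontmatter<'a> {
    pub syntax: FrontmatterSyntax,
    /// Text between the two fences, fences excluded.
    pub raw: &'a str,
    /// Byte range of the whole fenced block, both fences included.
    pub span: Range<usize>,
    /// Everything after the closing fence line.
    pub body: &'a str,
}

/// Returns `None` when the text has no leading fence or the fence is never closed.
pub fn split_frontmatter(text: &str) -> Option<SplitFrontmatter<'_>> {
    let (fence, syntax) = if text.starts_with("---") {
        ("---", FrontmatterSyntax::Yaml)
    } else if text.starts_with("+++") {
        ("+++", FrontmatterSyntax::Toml)
    } else {
        return None;
    };

    // The opening fence must be alone on the first line (trailing "\r" tolerated).
    let first_line_end = text.find('\n')?;
    if text[..first_line_end].trim_end() != fence {
        return None;
    }
    let content_start = first_line_end + 1;

    // Walk line by line until a line consisting only of the fence is found.
    let mut offset = content_start;
    for line in text[content_start..].split_inclusive('\n') {
        if line.trim_end() == fence {
            let end = offset + line.len();
            return Some(SplitFrontmatter {
                syntax,
                raw: &text[content_start..offset],
                span: 0..end,
                body: &text[end..],
            });
        }
        offset += line.len();
    }
    None // unterminated fence: treated as "no frontmatter" in this sketch
}
```

Keeping the split free of any YAML or TOML dependency means the fence rules can be unit-tested on their own, and the `raw` slice can then be handed to whichever decoder the syntax calls for.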

Why now / why this size

Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of kebab-parse-md simple and lets us reach 100% test coverage on the rules in design §0 Q9.

Allowed dependencies

  • kebab-core
  • kebab-parse-types (provides shared Warning + WarningKind per design §3.7b)
  • serde
  • serde_yaml (or yaml-rust2) for YAML
  • toml for TOML
  • time
  • lingua (lang auto-detect — accept feature-gate if heavy)
  • thiserror

Forbidden dependencies

  • kebab-store-*, kebab-llm*, kebab-rag, kebab-embed*, kebab-search, kebab-tui, kebab-desktop, kebab-source-fs, kebab-chunk, kebab-normalize, pulldown-cmark (block parser is a sibling task)

Inputs

| input | type | source |
|---|---|---|
| Markdown bytes | &[u8] | extractor |
| body fallbacks | BodyHints { first_h1: Option<String>, fs_ctime: OffsetDateTime, fs_mtime: OffsetDateTime, fallback_lang: Option<String> } | caller |

Outputs

| output | type | downstream |
|---|---|---|
| (Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>) | tuple | kebab-normalize → CanonicalDocument |

Public surface (signatures only — no new types)

pub fn parse_frontmatter(
    bytes: &[u8],
    hints: &BodyHints,
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)>;

Warning / WarningKind come from kebab-parse-types (shared with p1-3 blocks parser and downstream kebab-normalize). FrontmatterSpan is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.

Behavior contract

  • All Metadata fields are optional in input. Missing fields populated per design §0 Q9 derive table:
    • title ← first H1 (from BodyHints.first_h1) → filename without extension if no H1.
    • lang ← lingua auto-detect on first 4 KB of body → fallback BodyHints.fallback_lang or "und".
    • created_at / updated_at ← BodyHints.fs_ctime / fs_mtime if missing.
    • source_type default markdown; trust_level default primary.
    • aliases, tags default empty.
  • Unknown keys → metadata.user (serde_json::Map), preserved verbatim, no warning.
  • Unknown enum value (e.g. trust_level: weird) → emit kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" } + ingest continues with default.
  • Malformed YAML → frontmatter discarded, body still parsed, Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" } emitted.
  • No frontmatter at all → defaults applied silently.
  • id: field captured into metadata.user_id_alias (alias only — does NOT influence doc_id per design §4.2).
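
The rules above can be sketched over an already-decoded key/value map. This is a simplified illustration under several assumptions, not the crate's implementation: `Meta`, `Hints`, `TrustLevel`, `Warning`, and `resolve` are local stand-ins for the real `kebab_core` and `kebab_parse_types` types, `TrustLevel::Secondary` is a hypothetical variant (the spec only names `primary`), `file_stem` is an assumed source for the filename fallback, values are plain strings rather than YAML/TOML values, and lingua detection and timestamps are left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum WarningKind {
    MalformedFrontmatter,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Warning {
    pub kind: WarningKind,
    pub note: String,
}

/// Stand-in enum; `Secondary` is hypothetical and exists only for illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TrustLevel {
    Primary,
    Secondary,
}

/// Stand-in for the subset of Metadata this sketch covers.
#[derive(Debug, PartialEq, Eq)]
pub struct Meta {
    pub title: String,
    pub lang: String,
    pub trust_level: TrustLevel,
    pub user_id_alias: Option<String>,
    pub user: BTreeMap<String, String>,
}

/// Stand-in for BodyHints; `file_stem` is an assumption of this sketch.
pub struct Hints<'a> {
    pub first_h1: Option<&'a str>,
    pub file_stem: &'a str,
    pub fallback_lang: Option<&'a str>,
}

pub fn resolve(mut raw: BTreeMap<String, String>, hints: &Hints<'_>) -> (Meta, Vec<Warning>) {
    let mut warnings = Vec::new();

    // title: explicit value, else first H1, else filename without extension.
    let title = raw
        .remove("title")
        .or_else(|| hints.first_h1.map(str::to_owned))
        .unwrap_or_else(|| hints.file_stem.to_owned());

    // lang: explicit value, else caller fallback, else "und" (auto-detect omitted here).
    let lang = raw
        .remove("lang")
        .or_else(|| hints.fallback_lang.map(str::to_owned))
        .unwrap_or_else(|| "und".to_owned());

    // Unknown enum value: warn and continue with the default.
    let raw_trust = raw.remove("trust_level");
    let trust_level = match raw_trust.as_deref() {
        None | Some("primary") => TrustLevel::Primary,
        Some("secondary") => TrustLevel::Secondary,
        Some(other) => {
            warnings.push(Warning {
                kind: WarningKind::MalformedFrontmatter,
                note: format!("unknown trust_level={}, defaulted to primary", other),
            });
            TrustLevel::Primary
        }
    };

    // `id` is kept as an alias only; it plays no part in doc_id.
    let user_id_alias = raw.remove("id");

    // Whatever is left is an unknown key: preserved verbatim, no warning.
    let meta = Meta { title, lang, trust_level, user_id_alias, user: raw };
    (meta, warnings)
}
```

Removing each known key from the map as it is consumed leaves exactly the unknown keys behind, so `metadata.user` falls out of the same pass with no separate allow-list check.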

Storage / wire effects

  • None. Pure function.

Test plan

| kind | description | fixture / data |
|---|---|---|
| unit | YAML frontmatter happy path → Metadata fields | inline |
| unit | TOML frontmatter happy path | inline |
| unit | unknown keys preserved in metadata.user | inline |
| unit | unknown enum value → warning + default | inline |
| unit | malformed YAML → empty Metadata + warning | inline |
| unit | no frontmatter → derive from BodyHints | inline |
| unit | id: field becomes user_id_alias, not doc_id factor | inline + assert via §4.2 recipe stub |
| snapshot | fixtures/markdown/frontmatter-only.md produces stable JSON | fixture |
| snapshot | mixed-language body with no lang: detects ko or en | fixtures/markdown/mixed-lang.md |

All tests under cargo test -p kebab-parse-md --lib frontmatter.

Definition of Done

  • cargo check -p kebab-parse-md passes
  • cargo test -p kebab-parse-md frontmatter passes
  • No pulldown-cmark import in this submodule
  • Snapshot tests stable across two consecutive runs
  • PR links design §0 Q9, §3.6

Out of scope

  • Block parsing (p1-3).
  • Building CanonicalDocument (p1-4).
  • Persisting metadata (p1-6).

Risks / notes

  • lingua model load is heavy on first call; tests should reuse a static instance.
  • timezone normalization: parse created_at/updated_at to UTC; preserve original offset only in metadata.user.original_timestamps if present and non-UTC.
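
One way to address the first risk is to build the detector once per process and hand out a shared reference. The sketch below shows that pattern with `std::sync::OnceLock`, alongside a helper that cuts the body to the 4 KB detection window without splitting a UTF-8 character. `Detector` is a toy stand-in with a Hangul-only heuristic, not lingua's API, and `BUILD_COUNT`, `detector`, and `detection_window` are hypothetical names introduced for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the stand-in detector has been constructed.
pub static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Toy stand-in for an expensive-to-build language detector.
pub struct Detector;

impl Detector {
    fn build() -> Self {
        // A real detector would load its language models here.
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        Detector
    }

    /// Toy heuristic: any Hangul syllable means "ko", otherwise "en".
    pub fn detect(&self, text: &str) -> &'static str {
        if text.chars().any(|c| ('\u{AC00}'..='\u{D7A3}').contains(&c)) {
            "ko"
        } else {
            "en"
        }
    }
}

/// Shared instance: constructed on first use, reused by every later call.
pub fn detector() -> &'static Detector {
    static DETECTOR: OnceLock<Detector> = OnceLock::new();
    DETECTOR.get_or_init(Detector::build)
}

/// First 4 KB of the body, shortened if needed to end on a char boundary.
pub fn detection_window(body: &str) -> &str {
    let mut end = body.len().min(4096);
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```

Because `get_or_init` runs the initializer at most once even under concurrent callers, tests that run in parallel threads would share the same instance rather than each paying the load cost.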