State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20
--config across subcommands, PR #24 --config in kb ask, PR #25 RRF
fusion_score normalization). README still framed the project as
"spec frozen, code 0 lines"; phase docs and task specs all carried
status: planned. Sweep:
- README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes
applied"; command table marks each subcommand's owning phase and
current status (kb ask = ✅ via P4-3, kb eval = ⏳ P5);
phase roadmap table grew a Status column (P0–P4 completed, P5
next, P6–P9 pending); component count bumped 30 → 31 to reflect
P3-5 (app-wiring, post-spec); core decisions table notes the
RRF [0,1] normalization invariant; the build+실행 (build + run)
section drops the "P0 미시작" ("P0 not started") caveat; new pointers
to HOTFIXES.md and SMOKE.md.
- docs/SMOKE.md (new): ~80-line recipe for running the full
pipeline against an isolated /tmp/kb-smoke/ workspace via
--config, without polluting ~/.config/kb/ or
~/.local/share/kb/. Covers fixture seeding, sample config.toml
with the post-merge defaults, doctor → ingest → list →
search × 3 modes → inspect → ask sequence, verification
checklist, and known-behaviour notes (fastembed model
download, RAG response time, --config hard-fail on missing
path).
- tasks/phase-{0..4}-*.md: status frontmatter flipped planned →
completed.
- tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same
status flip across all 17 component task specs (1+6+2+5+3).
P5–P9 stay planned.
cargo test --workspace: 319 passed; clippy clean (no source
changes in this commit, just docs + frontmatter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parse Markdown body bytes into a flat Vec<kb_parse_types::ParsedBlock> with heading paths and line ranges preserved, ready for kb-normalize to lift into CanonicalDocument.
Why now / why this size
This is the heaviest part of the P1 parser. Separating it from frontmatter and from normalization keeps each piece tractable. Deterministic line ranges directly determine citation quality (design §0 Q3 / §3.4 SourceSpan::Line).
ParsedBlock is defined in kb-parse-types (design §3.7b). kb-parse-md does NOT define its own; it consumes the shared type. Lift to kb_core::Block (with BlockId assignment) is kb-normalize's job (p1-4).
Behavior contract
Source-map: each ParsedBlock carries SourceSpan::Line { start, end } relative to the original file (i.e., add body_offset_lines).
Heading tree: every block records its ancestor heading texts in order (e.g., ["아키텍처", "Chunking 정책"]).
Tables: GFM tables produce ParsedPayload::Table { headers, rows }; if a table cell is malformed, fall back to ParsedPayload::Paragraph + Warning::MalformedTable.
Image references: Markdown image syntax ![alt](src) produces ParsedPayload::ImageRef { src, alt }. AssetId resolution happens later in kb-normalize (when the image src can be matched to a workspace asset).
Lists: ordered/unordered preserved via ParsedPayload::List { ordered, items }; nested list items flattened so each items[i] is a Vec<kb_core::Inline> for one top-level item.
Inline elements: only Text, Code, Link, Strong, Emph are kept (kb_core::Inline, design §3.4). Other inlines are dropped silently.
Malformed input never panics. Worst case: empty Vec<ParsedBlock> + Warning::ExtractFailed.
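As a rough illustration of the source-map and heading-tree rules above, the sketch below tracks open headings in a stack and shifts body-relative line ranges into file-relative ones. HeadingStack, enter, path, and file_span are hypothetical names; SourceSpan here is a local stand-in rather than the real shared type; and treating body_offset_lines as the number of lines that precede the body in the original file is an assumption.

```rust
// Local stand-in for the shared span type; the real definition lives in the
// project's own crates and may carry more variants or different field types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SourceSpan {
    Line { start: u32, end: u32 },
}

/// Hypothetical helper: open headings as (level, text) pairs, outermost first.
#[derive(Debug, Default)]
pub struct HeadingStack {
    open: Vec<(u8, String)>,
}

impl HeadingStack {
    /// Entering a heading closes every open heading at the same or a deeper
    /// level, then opens the new one.
    pub fn enter(&mut self, level: u8, text: &str) {
        while self.open.last().is_some_and(|(l, _)| *l >= level) {
            self.open.pop();
        }
        self.open.push((level, text.to_string()));
    }

    /// Heading texts that enclose the current position, outermost first.
    pub fn path(&self) -> Vec<String> {
        self.open.iter().map(|(_, t)| t.clone()).collect()
    }
}

/// Shifts a 1-based, inclusive line range measured inside the body so that it
/// is relative to the original file (assumption: body_offset_lines counts the
/// lines that precede the body, e.g. a frontmatter block).
pub fn file_span(body_start: u32, body_end: u32, body_offset_lines: u32) -> SourceSpan {
    SourceSpan::Line {
        start: body_start + body_offset_lines,
        end: body_end + body_offset_lines,
    }
}
```

A block parsed while the stack holds a level-1 and a level-2 heading would record both texts as its heading path; entering a new level-2 heading replaces only the second entry.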
Storage / wire effects
None.
Test plan
| kind | description | fixture / data |
|---|---|---|
| unit | heading tree depth + heading_path correctness | inline |
| unit | code block lang tag preserved | inline |
| unit | GFM table parses; malformed table degrades to paragraph + warning | inline |
| unit | line range correct under various line-ending styles (LF / CRLF) | |
All tests under cargo test -p kb-parse-md --lib blocks.
Definition of Done
cargo check -p kb-parse-md passes
cargo test -p kb-parse-md blocks passes
Snapshot tests stable across two runs
No imports outside Allowed dependencies
PR links design §3.4
Out of scope
Frontmatter (p1-2).
Lifting kb_parse_types::ParsedBlock → kb_core::Block with BlockId (p1-4 normalize).
Chunking (p1-5).
Risks / notes
pulldown-cmark source-map may not include exact byte ranges for all event kinds; line ranges are the binding contract per design (line-range citation is the primary form for Markdown).
CRLF normalization: convert internally to LF for span math but report line numbers from the original byte stream.
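One way to keep reported line numbers tied to the original byte stream is to index line starts on the un-normalized input and look offsets up in that index. The sketch below is only an illustration of that idea, not the crate's actual code: line_starts and line_of are hypothetical names, and it assumes that only '\n' terminates a line, so a lone '\r' is not counted as a break.

```rust
/// Byte offsets at which each line starts in the original input.
/// Only b'\n' ends a line, so "\r\n" counts once (assumption: a lone '\r'
/// is not a line break in this sketch).
pub fn line_starts(src: &[u8]) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, b) in src.iter().enumerate() {
        if *b == b'\n' {
            starts.push(i + 1);
        }
    }
    starts
}

/// 1-based number of the line containing `offset`, found by binary search:
/// the answer is the count of line starts at or before the offset.
pub fn line_of(starts: &[usize], offset: usize) -> u32 {
    starts.partition_point(|&s| s <= offset) as u32
}
```

Because a CRLF pair contains exactly one '\n', the same content yields the same line numbers whether the file uses LF or CRLF, which is the property the LF / CRLF unit test is meant to pin down.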