feat(p1-3): kb-parse-md blocks (Markdown body → ParsedBlock tree) #8
Summary
Implements the P1-3 task spec.
The `blocks` submodule of the `kb-parse-md` crate (a sibling of the P1-2 frontmatter module) parses Markdown body bytes into `Vec<kb_parse_types::ParsedBlock>`. Heading paths, line spans, and inline filtering are deterministic. The output is the input that P1-4 normalize will lift into `CanonicalDocument`.
Reference: tasks/p1/p1-3-parse-md-blocks.md, docs/superpowers/specs/2026-04-27-kb-final-form-design.md (§3.4 / §3.7b / §0 Q3).
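Line spans depend on mapping byte offsets to line numbers and then shifting them by `body_offset_lines`. The following is a minimal sketch of how a `\n`-based index with an overflow-checked shift could look. It is not the PR's actual `LineIndex`: the method names `line_of` and `absolute_line`, and the choice of 1-based line numbers, are assumptions made for illustration.

```rust
/// Hypothetical stand-in for a `\n`-based byte-to-line index.
/// Line numbers are 1-based here, which is an assumption of this sketch.
pub struct LineIndex {
    /// Byte offset at which each line starts; the first entry is always 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    pub fn new(body: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in body.bytes().enumerate() {
            // Only `\n` ends a line, so the `\r` of a CRLF pair simply
            // stays on the line it terminates.
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// 1-based line that contains `byte_offset`.
    pub fn line_of(&self, byte_offset: usize) -> usize {
        // Number of line starts at or before the offset (binary search).
        self.line_starts.partition_point(|&start| start <= byte_offset)
    }

    /// Line number shifted by the lines that precede the body.
    /// Returns `None` on overflow so the caller can emit a warning
    /// instead of wrapping around silently.
    pub fn absolute_line(&self, byte_offset: usize, body_offset_lines: u32) -> Option<u32> {
        let local = u32::try_from(self.line_of(byte_offset)).ok()?;
        local.checked_add(body_offset_lines)
    }
}
```

With the body `"a\r\nb"`, byte 1 (the `\r`) maps to line 1 and byte 3 (`b`) maps to line 2, so CRLF input needs no special casing. A caller would turn a `None` from `absolute_line` into whatever failure signal it uses.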
Implementation
- `crates/kb-parse-md/src/blocks.rs`:
  - `parse_blocks(body, body_offset_lines) -> Result<(Vec<ParsedBlock>, Vec<Warning>)>`: walker built on `pulldown_cmark::Parser::new_ext` + `into_offset_iter`
  - `WalkState` + `Frame` enum: frame stack for paragraph/code/list/listitem/table/quote/heading
  - `LineIndex`: `\n`-based byte→line mapping (CRLF compatible)
  - `panic::catch_unwind` wrapper: adversarial input falls back to `(vec![], vec![Warning::ExtractFailed])`
  - Image source captured from the `Tag::Image { dest_url }` event (handles title leak / angle brackets)
  - `body_offset_lines` overflow: detected with `checked_add`, yields an `ExtractFailed` warning and empty output
- `crates/kb-parse-md/src/lib.rs`: adds the `parse_blocks` re-export
- `crates/kb-parse-md/Cargo.toml`: `pulldown-cmark v0.13` (`default-features = false`, GFM tables enabled at runtime)
- `crates/kb-parse-md/tests/blocks_snapshots.rs`: 3 snapshots + 1 regenerator
- `fixtures/markdown/{nested-headings,code-and-table}.blocks.snapshot.json`: baseline
Verification
- `cargo check` / `build` (`RUSTFLAGS=-D warnings`) / `clippy --all-targets -D warnings` clean
- `cargo test -p kb-parse-md` → 57 unit + 6 integration pass
- `cargo test --workspace` all green
- `cargo tree -p kb-parse-md --depth 1` → allowed deps only (kb-core, kb-parse-types, pulldown-cmark, anyhow, serde, serde_json, tracing + P1-2 shared)
Review trail
de91648, 0050cf3, d49dbc1, 73040ca, 23ff4d6, 2b6d9ab, 80123e9
Follow-ups, resolved naturally later (non-blocking)
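- `kb_core::Inline` serialization bug (P0-1 schema): the combination of an internally tagged enum with newtype variants makes serde fail on 4 of 5 variants. The snapshot tests work around it with a `BlockView` projection. P1-4 normalize plans to hotfix it through a §9 schema migration at the point where it serializes `CanonicalDocument`.
- `ParsedPayload::Quote` cannot preserve list/code/table children (schema limitation): a TODO comment and a pin test are in place.
Next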
After merge: branch `feat/p1-4-normalize` → proceed with P1-4 (kb-normalize). The `Inline` schema fix ships together with P1-4.

`emit_block` previously walked the frame stack looking only for a Quote container, falling back to top-level on miss. That caused any block emitted inside a list item (code blocks, images, tables, headings) to escape the list and appear at the top level of `blocks`, after the entire list and out of source order.

`ParsedPayload::List { items: Vec<Vec<Inline>> }` cannot represent a child block structurally, so the choice is between dropping content and flattening. Extend the reverse-walk to also recognize `Frame::ListItem` and route the block into a textual rendering appended to the item's inline buffer (`flatten_block_into_item`):

- Code → fenced text approximation, preserving lang hint + body
- Image → `` text
- Audio → `[audio](src)` text
- Heading → leading hashes + text
- Quote → `> text`
- Nested List → same rendering as `nested_in_item` flatten
- Table → pipe-table approximation

Document order is preserved because flattening happens inside the item's frame, before the item closes.

New tests:
- code_block_inside_list_item_flattens_into_parent
- image_inside_list_item_flattens_into_parent
- block_content_in_list_preserves_document_order

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`# ` (a heading with no following text) used to seed the heading stack with `Some("")`, which then propagated into every child block's `heading_path` as a `""` segment, visibly polluting the path that downstream consumers index by.

Filter empty entries from both `heading_path()` and the in-line ancestor collection at heading-end. We deliberately keep `Some("")` in the stack rather than skipping the assignment so the slot remains occupied and a subsequent deeper heading is still positioned correctly relative to its level; only the visible path drops the empty entry.
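The "keep the slot, hide it from the path" behaviour described above can be illustrated with a small sketch. This is not the PR's code: `HeadingStack`, `enter`, and the fixed six-slot layout are hypothetical names and choices made for illustration. Only the idea of storing `Some("")` and filtering it out of the visible path comes from the commit message.

```rust
/// Hypothetical heading tracker: one slot per heading level (h1..h6).
#[derive(Default)]
pub struct HeadingStack {
    slots: [Option<String>; 6],
}

impl HeadingStack {
    /// Record a heading at `level` (1..=6) and clear every deeper slot.
    /// An empty title is still stored, so it replaces any stale title
    /// that previously occupied the same level.
    pub fn enter(&mut self, level: usize, text: &str) {
        assert!((1..=6).contains(&level), "heading level out of range");
        self.slots[level - 1] = Some(text.trim().to_string());
        for slot in &mut self.slots[level..] {
            *slot = None;
        }
    }

    /// Visible path: occupied slots in level order, empty titles skipped.
    pub fn heading_path(&self) -> Vec<String> {
        self.slots
            .iter()
            .filter_map(|slot| slot.as_deref())
            .filter(|text| !text.is_empty())
            .map(str::to_string)
            .collect()
    }
}
```

In this sketch, after `# Intro`, an empty `# `, and then `## Usage`, the path is just `["Usage"]`. Had the empty heading skipped the assignment instead, the stale `Intro` would still sit in the h1 slot and leak into the path of everything under `Usage`.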
New tests:
- empty_heading_does_not_pollute_path
- empty_h1_then_h2_does_not_break_stack

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`ParsedPayload::Quote { text, inlines }` cannot represent block-level children (lists, code, tables, images), so the BlockQuote end handler silently drops them when assembling the Quote payload. This matches §3.4 for now but is non-obvious and easy to regress without an explicit pin.

Add a TODO(P1-future) comment near the Quote emission code and a regression test (`quote_with_list_inside_drops_list`) that fixes the current shape: a `> - item` blockquote produces a Quote with empty text and empty inlines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Review: all items resolved, APPROVE recommended (registered as COMMENT per gate policy)
Internal review loop completed 3 times. Final reviewer verdict: APPROVED, 0 must-fix-now items.
Review trail
de91648, 0050cf3, d49dbc1, 73040ca, 23ff4d6, 2b6d9ab, 80123e9
Key verification points
- `checked_add` + `Warning::ExtractFailed`
- `body_offset_lines_max_returns_extract_failed` + `u32::MAX-N` boundary verification
- `flatten_block_into_item` reverse-walk, document order preserved
- `filter_map` skips empty strings in the path; the stack is kept so deeper levels stay positioned correctly
- Replaced with `Tag::Image { dest_url }` event capture; title / angle-bracket / empty cases all fixed
- `push_text` + `push_link_text` updated together
Gates passed
- `cargo check` / `build` (`RUSTFLAGS=-D warnings`) / `clippy --all-targets -D warnings` clean
- `cargo test -p kb-parse-md` → 57 unit + 6 integration pass (was 41 / 6, +16 new)
Follow-ups, resolved naturally later (non-blocking)
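- `kb_core::Inline` serialization bug: the P0-1 schema combines `#[serde(tag="kind")]` with newtype variants, so 4 of 5 variants fail at runtime. The P1-3 snapshots work around it with a `BlockView` projection. Recommended: ship the §9 schema migration when P1-4 starts serializing `CanonicalDocument` to SQLite.
Conclusion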
Nothing papered over. Under the self-review gate policy this comment is registered as COMMENT; the author is asked to APPROVE manually and merge.
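For readers skimming the thread, the list-item flattening from the `emit_block` commit message can be sketched roughly as below. The types here (`Block`, `Frame`, `Walker`) are simplified, hypothetical stand-ins rather than the crate's real definitions, the inline buffer is reduced to a plain `String`, Quote routing is omitted, and the image rendering as Markdown image syntax is an assumption of this sketch.

```rust
/// Hypothetical, simplified block type (not the crate's `ParsedBlock`).
pub enum Block {
    Paragraph(String),
    Code { lang: Option<String>, body: String },
    Image { alt: String, src: String },
}

/// Hypothetical frame type; the real `Frame` enum has more variants.
pub enum Frame {
    Paragraph,
    ListItem { inlines: String },
}

pub struct Walker {
    pub frames: Vec<Frame>,
    pub blocks: Vec<Block>,
}

/// Text approximation of a block, used when it cannot be kept structurally.
fn render_as_text(block: &Block) -> String {
    match block {
        Block::Paragraph(text) => text.clone(),
        Block::Code { lang, body } => {
            // Fenced approximation that keeps the language hint and the body.
            let fence = "`".repeat(3);
            format!("{}{}\n{}\n{}", fence, lang.as_deref().unwrap_or(""), body, fence)
        }
        // Assumed rendering: standard Markdown image syntax.
        Block::Image { alt, src } => format!("![{}]({})", alt, src),
    }
}

impl Walker {
    /// Route a finished block to the nearest enclosing list item, or to the
    /// top level when no list item is open.
    pub fn emit_block(&mut self, block: Block) {
        // Reverse walk: innermost frame first.
        for frame in self.frames.iter_mut().rev() {
            if let Frame::ListItem { inlines } = frame {
                if !inlines.is_empty() {
                    inlines.push('\n');
                }
                // Appending while the item is still open keeps source order.
                inlines.push_str(&render_as_text(&block));
                return;
            }
        }
        self.blocks.push(block);
    }
}
```

Because the text is appended to the item that is still open on the frame stack, the flattened content stays where it appeared in the source instead of surfacing after the whole list.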