
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths, env vars, and binary paths (`target/release/kb` →
  `target/release/kebab`) synchronized.
- All references in the design doc, initial report, SMOKE, HOTFIXES,
  phase epics, and task specs unified.
- `git -c user.name=kb` in task-decomposition.md is left unchanged
  because it is author information recording past git history (the
  author in the actual git history cannot be changed).
- `status: planned` → `completed` in `tasks/phase-5-evaluation.md` as
  well (not yet reflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (references to every renamed file
  updated to the new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 04:01:55 +00:00


field	value
phase	
component	kebab-parse-md (blocks submodule)
task_id	p1-3
title	Markdown body → Block tree with line spans
status	completed
depends_on	p0-1
unblocks	p1-4
contract_source	../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections	§3.4 Block; §3.4 SourceSpan; §3.7b kebab-parse-types; §0 Q3 citation

p1-3 — Markdown body → Block tree

Goal

Parse Markdown body bytes into a flat Vec<kebab_parse_types::ParsedBlock> with heading paths and line ranges preserved, ready for kebab-normalize to lift into CanonicalDocument.

Why now / why this size

This is the heaviest part of the P1 parser. Separating it from frontmatter handling and from normalization keeps each piece tractable. Deterministic line ranges directly determine citation quality (design §0 Q3 / §3.4 SourceSpan::Line).

Allowed dependencies

kebab-core
kebab-parse-types (defines ParsedBlock, ParsedPayload, Warning)
pulldown-cmark (CommonMark with source-map via the offset iterator; GFM tables enabled via Options::ENABLE_TABLES)
serde
thiserror

Forbidden dependencies

kebab-store-*, kebab-llm*, kebab-rag, kebab-embed*, kebab-search, kebab-source-fs, kebab-chunk, kebab-normalize, kebab-tui, kebab-desktop, comrak (alternative parser; pick one)

Inputs

input	type	source
Markdown body bytes	`&[u8]`	extractor (after frontmatter stripped)
`body_offset_lines`	`u32`	extractor (so line ranges are reported relative to original file)

Outputs

output	type	downstream
`Vec<kebab_parse_types::ParsedBlock>`	shared type from `kebab-parse-types`	`kebab-normalize`
`Vec<kebab_parse_types::Warning>`	shared type	propagated into Provenance

Public surface (signatures only — no new types)

pub fn parse_blocks(
    body: &[u8],
    body_offset_lines: u32,
) -> anyhow::Result<(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)>;

ParsedBlock is defined in kebab-parse-types (design §3.7b). kebab-parse-md does NOT define its own; it consumes the shared type. Lift to kebab_core::Block (with BlockId assignment) is kebab-normalize's job (p1-4).
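
As a rough illustration of what a consumer of parse_blocks sees, the sketch below walks a single block and summarizes it. The types are local stand-ins, not the real kebab-parse-types definitions: only the variant names, the Code / Table / ImageRef / List field names, heading_path, and SourceSpan::Line { start, end } come from this spec. The span and payload field names, the use of String in place of kebab_core::Inline, and the describe helper are assumptions made for the example.

```rust
// Stand-ins for the shared types. Field layouts are assumptions; the real
// definitions live in kebab-parse-types / kebab-core and may differ.
#[derive(Debug, Clone, PartialEq)]
enum SourceSpan {
    Line { start: u32, end: u32 },
}

#[derive(Debug, Clone, PartialEq)]
enum ParsedPayload {
    Paragraph(String),
    Code { lang: Option<String>, code: String },
    Table { headers: Vec<String>, rows: Vec<Vec<String>> },
    ImageRef { src: String, alt: String },
    // One entry per top-level item; String stands in for Vec<Inline>.
    List { ordered: bool, items: Vec<String> },
}

#[derive(Debug, Clone, PartialEq)]
struct ParsedBlock {
    heading_path: Vec<String>,
    span: SourceSpan,
    payload: ParsedPayload,
}

/// Hypothetical helper: one-line summary of a block, its line range, and
/// the headings it sits under.
fn describe(block: &ParsedBlock) -> String {
    let SourceSpan::Line { start, end } = &block.span;
    let kind = match &block.payload {
        ParsedPayload::Paragraph(_) => "paragraph".to_string(),
        ParsedPayload::Code { lang, .. } => {
            format!("code[{}]", lang.as_deref().unwrap_or("plain"))
        }
        ParsedPayload::Table { headers, rows } => {
            format!("table[{}x{}]", headers.len(), rows.len())
        }
        ParsedPayload::ImageRef { src, .. } => format!("image[{}]", src),
        ParsedPayload::List { ordered, items } => {
            let tag = if *ordered { "ordered" } else { "unordered" };
            format!("{} list[{}]", tag, items.len())
        }
    };
    format!("{} @ L{}-L{} under {}", kind, start, end, block.heading_path.join(" > "))
}
```

A Code block with lang Some("rust") spanning lines 12 to 15 under the headings 아키텍처 and Chunking 정책 would be summarized as `code[rust] @ L12-L15 under 아키텍처 > Chunking 정책`.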

Behavior contract

Source-map: each ParsedBlock carries SourceSpan::Line { start, end } relative to the original file (i.e., add body_offset_lines).
Heading tree: every block records its ancestor heading texts in order (e.g., ["아키텍처", "Chunking 정책"]).
Code blocks: ParsedPayload::Code { lang: Some("rust"), code }. Fenced content is never split across blocks.
Tables: GFM tables produce ParsedPayload::Table { headers, rows }; if a table cell is malformed, fall back to ParsedPayload::Paragraph + Warning::MalformedTable.
Image references: ![alt](src) produces ParsedPayload::ImageRef { src, alt }. AssetId resolution happens later in kebab-normalize (when image src can be matched to a workspace asset).
Lists: ordered/unordered is preserved via ParsedPayload::List { ordered, items }; nested list items are flattened so that each items[i] is a Vec<kebab_core::Inline> for one top-level item.
Inline elements: only Text, Code, Link, Strong, Emph (as defined by kebab_core::Inline in design §3.4). Other inlines are dropped silently.
Malformed input never panics. Worst case: empty Vec<ParsedBlock> + Warning::ExtractFailed.
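
The heading-tree rule above can be sketched as a small stack keyed by heading level. HeadingStack, enter, and path are hypothetical names, and the sketch assumes that a new heading closes any open heading of the same or deeper level. Whether a heading block carries its own text in its path is not settled by this sketch.

```rust
/// Tracks the ancestor heading texts for the block currently being parsed.
#[derive(Default)]
struct HeadingStack {
    /// (level, text) pairs; levels strictly increase from bottom to top.
    entries: Vec<(u8, String)>,
}

impl HeadingStack {
    /// Registers a heading of `level` (1..=6), first dropping every open
    /// heading at the same or a deeper level.
    fn enter(&mut self, level: u8, text: &str) {
        while self.entries.last().map_or(false, |(l, _)| *l >= level) {
            self.entries.pop();
        }
        self.entries.push((level, text.to_string()));
    }

    /// Heading path to attach to blocks that follow the latest heading.
    fn path(&self) -> Vec<String> {
        self.entries.iter().map(|(_, t)| t.clone()).collect()
    }
}
```

Entering a level-1 heading 아키텍처 and then a level-2 heading Chunking 정책 yields the path from the example above; a later level-2 heading replaces only the last element, and a later level-1 heading resets the path to a single entry.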

Storage / wire effects

None.

Test plan

kind	description	fixture / data
unit	heading tree depth + heading_path correctness	inline
unit	code block lang tag preserved	inline
unit	GFM table parses; malformed table degrades to paragraph + warning	inline
unit	line range correct under various line-ending styles (LF / CRLF)	inline
unit	image ref captured with src/alt	inline
unit	nested list flattens correctly	inline
unit	malformed input does not panic	inline (random byte slices)
snapshot	`fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable	fixture
snapshot	`fixtures/markdown/code-and-table.md` → JSON stable	fixture

All tests run under cargo test -p kebab-parse-md --lib blocks.

Definition of Done

cargo check -p kebab-parse-md passes
cargo test -p kebab-parse-md blocks passes
Snapshot tests stable across two runs
No imports outside Allowed dependencies
PR links design §3.4

Out of scope

Frontmatter (p1-2).
Lifting kebab_parse_types::ParsedBlock → kebab_core::Block with BlockId (p1-4 normalize).
Chunking (p1-5).

Risks / notes

pulldown-cmark source-map may not include exact byte ranges for all event kinds; line ranges are the binding contract per design (line-range citation is the primary form for Markdown).
CRLF normalization: convert internally to LF for span math but report line numbers from the original byte stream.
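
One way to keep line numbers tied to the original byte stream is to index line starts by scanning for the LF byte, which terminates both LF and CRLF lines, so the line lookup itself needs no rewritten copy of the body. This is a sketch under assumptions rather than the crate's implementation: LineIndex and its methods are hypothetical names, line numbers are taken to be 1-based and inclusive, body_offset_lines is taken to be the number of lines that precede the body in the original file, and the byte ranges are assumed to be half-open offsets into the body, such as those an offset-reporting Markdown parser yields.

```rust
use std::ops::Range;

/// Maps byte offsets in the Markdown body to line numbers in the original file.
struct LineIndex {
    /// Byte offset at which each body line starts (always begins with 0).
    line_starts: Vec<usize>,
    /// Lines that precede the body in the original file (e.g. frontmatter).
    body_offset_lines: u32,
}

impl LineIndex {
    fn new(body: &[u8], body_offset_lines: u32) -> Self {
        let mut line_starts = vec![0];
        for (i, &b) in body.iter().enumerate() {
            // LF and CRLF terminators both end in b'\n', so counting that
            // byte gives identical line numbers for either style.
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts, body_offset_lines }
    }

    /// 1-based line number, in the original file, containing byte `offset`.
    fn line_of(&self, offset: usize) -> u32 {
        // Number of line starts at or before `offset` equals the body line.
        let body_line = self.line_starts.partition_point(|&s| s <= offset) as u32;
        body_line + self.body_offset_lines
    }

    /// Inclusive (start, end) lines covered by a half-open byte range.
    fn line_span(&self, range: Range<usize>) -> (u32, u32) {
        let last_byte = range.end.saturating_sub(1).max(range.start);
        (self.line_of(range.start), self.line_of(last_byte))
    }
}
```

With four frontmatter lines stripped, the paragraph in `# A\r\n\r\ntext\r\n` and the paragraph in `# A\n\ntext\n` both map to line 7 of the original file, which is the property the LF / CRLF unit test in the test plan is after. Lone CR line endings are not handled by this sketch.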
