docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the Rust
  module path notation `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used once P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. Covers both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synced.
- Unifies every reference in the design doc / initial report / SMOKE /
  HOTFIXES / phase epics / task specs.
- The `git -c user.name=kb` in task-decomposition.md is author information
  that records past git history, so it is kept as-is (the author in the
  actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` →
  `completed` as well (left unapplied after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references resolve (every renamed file is updated to its
  new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
parent f1a448d6dc
commit f9714aa5cb
56 changed files with 1324 additions and 1324 deletions


@@ -8,11 +8,11 @@
- Phase A — Author the canonical task spec template (`tasks/_template.md`).
- Phase B — Decompose P1 (Markdown ingestion) into 6 component task specs to validate the template.
- Phase C — After Phase B passes review, decompose P0 + P2..P9 into the remaining ~24 task specs in one pass.
- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kb-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
**Tech Stack:** Plain Markdown documents under `tasks/`. No code changes in this plan — produces specs that AI sub-agents will later implement against.
**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
**Phase task index (target file layout):**
@@ -83,7 +83,7 @@ Existing per-phase epic files (`tasks/phase-0-skeleton.md` … `phase-9-ui.md`)
- [ ] **Step 1: Verify the design doc path resolves**
```bash
test -f docs/superpowers/specs/2026-04-27-kb-final-form-design.md && echo OK
test -f docs/superpowers/specs/2026-04-27-kebab-final-form-design.md && echo OK
```
Expected: `OK`
@@ -101,7 +101,7 @@ title: "<Component title>"
status: planned
depends_on: [] # other task_ids
unblocks: [] # other task_ids
contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
---
@@ -117,7 +117,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
## Allowed dependencies
- `kb-core`
- `kebab-core`
- <other crates per design §8>
- <external crates with versions>
@@ -166,7 +166,7 @@ If any item here is needed during implementation, STOP and update the frozen des
| unit | ... | ... |
| snapshot | ... (JSON freeze) | `fixtures/...` |
| contract | trait round-trip | mock impls |
| integration | end-to-end via `kb-app` facade | tmp workspace |
| integration | end-to-end via `kebab-app` facade | tmp workspace |
All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.
@@ -247,7 +247,7 @@ git add tasks/p1 tasks/INDEX.md
git commit -m "tasks: prepare P1 component decomposition skeleton"
```
### Task B1: `p1-1-source-fs.md` (kb-source-fs)
### Task B1: `p1-1-source-fs.md` (kebab-source-fs)
**Files:**
- Create: `tasks/p1/p1-1-source-fs.md`
@@ -259,13 +259,13 @@ Write the following content to `tasks/p1/p1-1-source-fs.md`:
````markdown
---
phase: P1
component: kb-source-fs
component: kebab-source-fs
task_id: p1-1
title: "Local filesystem source connector"
status: planned
depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
---
@@ -281,8 +281,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `ignore` (gitignore semantics)
- `blake3`
- `walkdir`
@@ -293,21 +293,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Forbidden dependencies
- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS |
| `.kbignore` | text file | workspace root, optional |
| `.kebabignore` | text file | workspace root, optional |
## Outputs
| output | type | downstream consumer |
|--------|------|---------------------|
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
## Public surface (signatures only — no new types)
@@ -315,11 +315,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ }
impl FsSourceConnector {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
impl kebab_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```
@@ -329,7 +329,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` and `.kbignore` for filter (union; ordering does not matter).
- Combine `config.workspace.exclude` and `.kebabignore` for filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
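The size threshold and the idempotency rule above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `StorageChoice`, `choose_storage`, and `sort_for_determinism` are hypothetical names, the real `AssetStorage` type is defined in the frozen design doc, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the copy-vs-reference decision.
/// The real `AssetStorage` lives in the design doc and may differ.
#[derive(Debug, PartialEq)]
enum StorageChoice {
    Copy,
    Reference,
}

/// Files strictly larger than `copy_threshold_mb` MB are referenced, not copied.
/// Assumption: 1 MB = 1024 * 1024 bytes.
fn choose_storage(size_bytes: u64, copy_threshold_mb: u64) -> StorageChoice {
    let threshold_bytes = copy_threshold_mb * 1024 * 1024;
    if size_bytes > threshold_bytes {
        StorageChoice::Reference
    } else {
        StorageChoice::Copy
    }
}

/// Sort scan results by path so that repeated scans yield the same order,
/// regardless of the order in which the filesystem returned entries.
fn sort_for_determinism(mut paths: Vec<PathBuf>) -> Vec<PathBuf> {
    paths.sort();
    paths
}
```

A file of exactly the threshold size is copied under this sketch; only larger files become references. Whether the boundary is inclusive is a detail the spec leaves to the implementer.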
@@ -337,7 +337,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects
- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
## Test plan
@@ -346,25 +346,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` + config exclude works | tmp tree |
| unit | `.kebabignore` + config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
All tests run under `cargo test -p kb-source-fs` with no network and no model.
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
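The POSIX path normalization case in the test plan can be illustrated with a small sketch. `normalize_workspace_path` is a hypothetical helper name, and `..` segments are deliberately not handled here; the actual rules are whatever the design doc specifies.

```rust
/// Collapse `.` segments and repeated slashes in a relative POSIX-style path.
/// Hypothetical helper: `..` handling is intentionally out of scope for this sketch.
fn normalize_workspace_path(raw: &str) -> String {
    raw.split('/')
        // Empty segments come from `//`; `.` segments refer to the current directory.
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}
```

Under this sketch, the three inputs from the test plan (`./a/b.md`, `a//b.md`, `a/b.md`) all map to the same string, which is the property the unit test asserts.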
## Definition of Done
- [ ] `cargo check -p kb-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes
- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2
## Out of scope
- File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).
## Risks / notes
@@ -400,13 +400,13 @@ Write to `tasks/p1/p1-2-parse-md-frontmatter.md`:
````markdown
---
phase: P1
component: kb-parse-md (frontmatter submodule)
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
---
@@ -414,15 +414,15 @@ contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
## Goal
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
## Why now / why this size
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
## Allowed dependencies
- `kb-core`
- `kebab-core`
- `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML
@@ -432,7 +432,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
## Inputs
@@ -445,7 +445,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
| output | type | downstream |
|--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
## Public surface (signatures only — no new types)
@@ -453,7 +453,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter(
bytes: &[u8],
hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
```
`FrontmatterSpan` and `Warning` are crate-internal helpers; if any new public type is needed, STOP and update the frozen design doc first.
@@ -490,12 +490,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
All tests under `cargo test -p kb-parse-md --lib frontmatter`.
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6
@@ -534,13 +534,13 @@ Write to `tasks/p1/p1-3-parse-md-blocks.md`:
````markdown
---
phase: P1
component: kb-parse-md (blocks submodule)
component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
---
@@ -548,7 +548,7 @@ contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
## Goal
Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
## Why now / why this size
@@ -556,14 +556,14 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies
- `kb-core`
- `kebab-core`
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
## Inputs
@@ -576,7 +576,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
| output | type | downstream |
|--------|------|------------|
| `Vec<ParsedBlock>` (intermediate type, crate-private) | | `kb-normalize` |
| `Vec<ParsedBlock>` (intermediate type, crate-private) | | `kebab-normalize` |
| `Vec<Warning>` | | propagated into Provenance |
## Public surface (signatures only — no new types)
@@ -585,7 +585,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<ParsedBlock>, Vec<Warning>)>;
```
`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kb_core::Block` variants once `kb-normalize` assigns `BlockId`s.
`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kebab_core::Block` variants once `kebab-normalize` assigns `BlockId`s.
## Behavior contract
@@ -593,7 +593,7 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: language tag preserved (` ```rust ` → `Some("rust")`), fenced content not split.
- Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kb-normalize`.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kebab-normalize`.
- Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + warning.
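The heading-tree rule above (each block records its ancestor heading texts in order) could be tracked with a stack while walking parser events. The sketch below is an assumption about one way to do it; `push_heading` and `heading_path` are hypothetical names and are not part of the frozen contract.

```rust
/// Update the heading stack when a new heading of `level` (1..=6) is seen.
/// Hypothetical helper for illustration only.
fn push_heading(stack: &mut Vec<(u8, String)>, level: u8, text: &str) {
    // Headings at the same or a deeper level are no longer ancestors.
    while stack.last().map_or(false, |(l, _)| *l >= level) {
        stack.pop();
    }
    stack.push((level, text.to_string()));
}

/// Snapshot of the current ancestor heading texts, outermost first.
fn heading_path(stack: &[(u8, String)]) -> Vec<String> {
    stack.iter().map(|(_, text)| text.clone()).collect()
}
```

A block parsed after an `# 아키텍처` heading followed by a `## Chunking 정책` heading would get the path `["아키텍처", "Chunking 정책"]`; a later sibling `##` heading replaces the second entry instead of appending a third.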
@@ -616,12 +616,12 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
All tests under `cargo test -p kb-parse-md --lib blocks`.
All tests under `cargo test -p kebab-parse-md --lib blocks`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4
@@ -629,7 +629,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope
- Frontmatter (p1-2).
- Lifting `ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4).
- Lifting `ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4).
- Chunking (p1-5).
## Risks / notes
@@ -658,13 +658,13 @@ Write to `tasks/p1/p1-4-normalize.md`:
````markdown
---
phase: P1
component: kb-normalize
component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: planned
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance]
---
@@ -676,12 +676,12 @@ Combine `Metadata` (p1-2) + `Vec<ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a
## Why now / why this size
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3`
@@ -691,38 +691,38 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md` (consumed via plain types only — must not couple back), `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md` (consumed via plain types only — must not couple back), `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing `ParsedBlock` as a `kb-core` type, or (b) `kb-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kb-core` so this task does not import `kb-parse-md`.
Note: this crate accepts `ParsedBlock` from `kebab-parse-md` either by (a) exposing `ParsedBlock` as a `kebab-core` type, or (b) `kebab-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kebab-core` so this task does not import `kebab-parse-md`.
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | from p1-2 | parser caller |
| `Vec<ParsedBlock>` + warnings | from p1-3 | parser caller |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
## Public surface (signatures only — no new types)
```rust
pub fn build_canonical_document(
asset: &kb_core::RawAsset,
metadata: kb_core::Metadata,
blocks: Vec<kb_core::ParsedBlock>,
parser_version: &kb_core::ParserVersion,
asset: &kebab_core::RawAsset,
metadata: kebab_core::Metadata,
blocks: Vec<kebab_core::ParsedBlock>,
parser_version: &kebab_core::ParserVersion,
warnings: Vec<Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;
) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```
## Behavior contract
@@ -750,14 +750,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
All tests under `cargo test -p kb-normalize`.
All tests under `cargo test -p kebab-normalize`.
## Definition of Done
- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` import (consumed via `kb-core::ParsedBlock`)
- [ ] No `kebab-parse-md` import (consumed via `kebab-core::ParsedBlock`)
- [ ] PR links design §4.2, §4.3
## Out of scope
@@ -791,13 +791,13 @@ Write to `tasks/p1/p1-5-chunk.md`:
````markdown
---
phase: P1
component: kb-chunk
component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: planned
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---
@@ -813,8 +813,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -822,30 +822,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
## Public surface (signatures only — no new types)
```rust
pub struct MdHeadingV1Chunker;
impl kb_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion;
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion;
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -882,12 +882,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |
All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2
@@ -924,13 +924,13 @@ Write to `tasks/p1/p1-6-store-sqlite.md`:
````markdown
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: planned
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---
@@ -946,8 +946,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -958,7 +958,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Forbidden dependencies
- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -966,17 +966,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |
## Public surface (signatures only — no new types)
@@ -984,23 +984,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }
impl SqliteStore {
pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
pub fn run_migrations(&self) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}
impl kb_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}
impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```
## Behavior contract
@@ -1020,7 +1020,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
## Storage / wire effects
- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB.
## Test plan
@@ -1036,14 +1036,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture |
All tests under `cargo test -p kb-store-sqlite` with no network.
All tests under `cargo test -p kebab-store-sqlite` with no network.
## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5
@@ -1056,7 +1056,7 @@ All tests under `cargo test -p kb-store-sqlite` with no network.
## Risks / notes
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
````
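The shard-prefix note in the spec above can be illustrated with a small sketch. The helper name `asset_path` and the assumption that `asset_id` is a lowercase hex string are not from the spec; only the `data_dir/assets/<aa>/<asset_id>` layout and the two-character prefix are.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `data_dir/assets/<aa>/<asset_id>`.
/// Assumes `asset_id` is a hex string, so a two-character prefix
/// yields at most 256 shard directories (one character yields 16).
fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // Reject anything that is not plain hex; this also guarantees that
    // the two-byte slice below falls on a character boundary.
    if !asset_id.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // `get(..2)` returns None for ids shorter than two characters.
    let shard = asset_id.get(..2)?;
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

Returning `Option` rather than panicking keeps a malformed id from ever producing a path outside the sharded layout.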
@@ -1074,7 +1074,7 @@ git commit -m "tasks: add p1-6 store-sqlite component spec"
```bash
ls tasks/p1/ | sort
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kb-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kebab-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
echo done
```
@@ -1106,9 +1106,9 @@ For each component task below, the steps are: (1) write file, (2) verify, (3) co
**Files:** Create: `tasks/p0/p0-1-skeleton.md`. Also `mkdir -p tasks/p0`.
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kb-core` + `kb-config` + `kb-app` + `kb-cli` only.
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kebab-core` + `kebab-config` + `kebab-app` + `kebab-cli` only.
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kb-core`, `kb-config`, `kb-app`, `kb-cli`), workspace dependencies, `kb-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kb-config` loader (TOML + env + CLI override per §6.4), `kb-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kb-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kb --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kebab-core`, `kebab-config`, `kebab-app`, `kebab-cli`), workspace dependencies, `kebab-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kebab-config` loader (TOML + env + CLI override per §6.4), `kebab-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kebab-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kebab --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).
- [ ] **Step 1: Create directory and file**
@@ -1134,11 +1134,11 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: add p0-1 skeleton c
#### `p2-1-fts-schema.md`
`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.
`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.
#### `p2-2-lexical-retriever.md`
`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
- [ ] **Step 1: Create directory and both spec files** (template per Phase B; bodies as described above).
- [ ] **Step 2: Verify with `for f in tasks/p2/p2-*.md; do test -s "$f" || echo MISSING $f; done; echo done` (expect only `done`).**
@@ -1152,10 +1152,10 @@ git add tasks/p2 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p3` and create `p3-1-embedder-trait.md`, `p3-2-fastembed-adapter.md`, `p3-3-lancedb-store.md`, `p3-4-hybrid-fusion.md`.
- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kb-core`, `kb-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kb-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kb-core`, `kb-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kb-core`, `kb-config`, `lancedb`, `arrow`, `kb-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (lexical Retriever from p2-2), `kb-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic.
- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kebab-core`, `kebab-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kebab-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kebab-core`, `kebab-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kebab-core`, `kebab-config`, `lancedb`, `arrow`, `kebab-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (lexical Retriever from p2-2), `kebab-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic.
- [ ] **Step 1: Create directory and 4 files** (template per Phase B).
- [ ] **Step 2: Verify**
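The RRF fusion described for `p3-4-hybrid-fusion.md` can be sketched as follows. This is a minimal illustration, assuming each input list is already ranked best-first and contains each chunk id at most once; the `Fused` struct and `rrf_fuse` function are hypothetical, and only the field names `lexical_rank`, `vector_rank`, `fusion_score` and the default `k = 60` come from the spec text.

```rust
use std::collections::HashMap;

/// Hypothetical per-chunk fusion record; field names follow the spec text.
#[derive(Debug, Clone, PartialEq)]
struct Fused {
    chunk_id: String,
    lexical_rank: Option<usize>,
    vector_rank: Option<usize>,
    fusion_score: f64,
}

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
/// with 1-based ranks. Chunks missing from a list contribute nothing for it.
fn rrf_fuse(lexical: &[&str], vector: &[&str], k: f64) -> Vec<Fused> {
    let mut by_id: HashMap<String, Fused> = HashMap::new();
    let blank = |id: &str| Fused {
        chunk_id: id.to_string(),
        lexical_rank: None,
        vector_rank: None,
        fusion_score: 0.0,
    };
    for (i, id) in lexical.iter().enumerate() {
        let e = by_id.entry(id.to_string()).or_insert_with(|| blank(id));
        e.lexical_rank = Some(i + 1);
        e.fusion_score += 1.0 / (k + (i + 1) as f64);
    }
    for (i, id) in vector.iter().enumerate() {
        let e = by_id.entry(id.to_string()).or_insert_with(|| blank(id));
        e.vector_rank = Some(i + 1);
        e.fusion_score += 1.0 / (k + (i + 1) as f64);
    }
    let mut out: Vec<Fused> = by_id.into_values().collect();
    // Sort by score descending; break ties by chunk id so output is deterministic.
    out.sort_by(|a, b| {
        b.fusion_score
            .total_cmp(&a.fusion_score)
            .then_with(|| a.chunk_id.cmp(&b.chunk_id))
    });
    out
}
```

The explicit tie-break on `chunk_id` is one way to meet the "deterministic" test requirement, since `HashMap` iteration order is not stable across runs.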
@@ -1176,9 +1176,9 @@ git add tasks/p3 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p4` and create `p4-1-llm-trait.md`, `p4-2-ollama-adapter.md`, `p4-3-rag-pipeline.md`.
- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kb-core`, `kb-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kb-core`, `kb-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kb-core`, `kb-config`, `kb-search` (Retriever), `kb-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kebab-core`, `kebab-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kebab-core`, `kebab-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kebab-core`, `kebab-config`, `kebab-search` (Retriever), `kebab-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
- [ ] **Step 1, 2, 3** as in C3.
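The citation extraction and validation steps of the `p4-3-rag-pipeline.md` pipeline can be sketched without external crates. This is an assumption-laden illustration: the function names are hypothetical, the scanner is a hand-rolled stand-in for the regex `\[(\d+)\]` that accepts ASCII digits only, and how an answer with no citations at all should be judged is left open because the spec text does not say.

```rust
/// Collects every `[n]` marker (ASCII digits only) in order of appearance.
/// Hand-rolled equivalent of the regex `\[(\d+)\]`.
fn extract_citations(text: &str) -> Vec<usize> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            let start = i + 1;
            let mut j = start;
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            // Require at least one digit followed directly by ']'.
            if j > start && j < bytes.len() && bytes[j] == b']' {
                if let Ok(n) = text[start..j].parse::<usize>() {
                    out.push(n);
                }
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// Returns the first cited index that does not map to a packed chunk
/// (valid indices are 1..=packed_chunks), or None if all citations map.
/// A `Some(_)` result corresponds to the `grounded=false` /
/// `LlmSelfJudge` branch described in the spec.
fn first_invalid_citation(answer: &str, packed_chunks: usize) -> Option<usize> {
    extract_citations(answer)
        .into_iter()
        .find(|&n| n == 0 || n > packed_chunks)
}
```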
@@ -1193,8 +1193,8 @@ git add tasks/p4 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p5` and create `p5-1-golden-fixture-runner.md`, `p5-2-metrics-compare.md`.
- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kb-core`, `kb-config`, `kb-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kb eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kebab-core`, `kebab-config`, `kebab-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kebab eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
- [ ] **Step 1, 2, 3** as in C3.
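Two of the metrics listed for `p5-2-metrics-compare.md`, hit@k and MRR, can be sketched as below. The input shape is an assumption made for illustration: one entry per query holding the 1-based rank of the first expected hit, or `None` when no expected hit was returned. The function names are hypothetical.

```rust
/// Fraction of queries whose first expected hit appears within the top `k`.
/// Returns 0.0 for an empty query set.
fn hit_at_k(first_relevant_rank: &[Option<usize>], k: usize) -> f64 {
    if first_relevant_rank.is_empty() {
        return 0.0;
    }
    let hits = first_relevant_rank
        .iter()
        .flatten()
        .filter(|&&rank| rank <= k)
        .count();
    hits as f64 / first_relevant_rank.len() as f64
}

/// Mean reciprocal rank: average of 1/rank over all queries, where a query
/// with no expected hit contributes 0. Ranks are assumed to be 1-based.
fn mrr(first_relevant_rank: &[Option<usize>]) -> f64 {
    if first_relevant_rank.is_empty() {
        return 0.0;
    }
    let sum: f64 = first_relevant_rank
        .iter()
        .flatten()
        .map(|&rank| 1.0 / rank as f64)
        .sum();
    sum / first_relevant_rank.len() as f64
}
```

Keeping these as pure functions over already-loaded rows fits the "fixed input rows produce expected metric values" test in the spec.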
@@ -1209,9 +1209,9 @@ git add tasks/p5 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p6` and create:
- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kb-core`, `kb-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kb-core`, `kb-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kb-core`, `kb-config`, `kb-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.
- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kebab-core`, `kebab-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kebab-core`, `kebab-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kebab-core`, `kebab-config`, `kebab-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.
- [ ] **Step 1, 2, 3** as in C3.
@@ -1226,8 +1226,8 @@ git add tasks/p6 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p7` and create:
- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kb-core`, `kb-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kb-core`, `kb-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kebab-core`, `kebab-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kebab-core`, `kebab-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
- [ ] **Step 1, 2, 3** as in C3.
@@ -1242,7 +1242,7 @@ git add tasks/p7 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p8` and create:
- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kb-core`, `kb-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kebab-core`, `kebab-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-2-segment-chunker.md`: phase epic §9.3, §3.5. New `audio-segment-v1` chunker that groups segments up to `target_tokens` with priority on speaker turn boundaries (when present). `depends_on: [p8-1]`. Tests: chunk timestamp == first/last segment timestamp; speaker change forces split.
- [ ] **Step 1, 2, 3** as in C3.
@@ -1258,11 +1258,11 @@ git add tasks/p8 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p9` and create:
- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kb-core`, `kb-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kebab-core`, `kebab-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-2-tui-search.md`: phase epic §16.2, §1.5. Allowed: same as p9-1. `depends_on: [p2-2, p3-4]`. Search input + result list + preview pane; `Enter` triggers external editor jump (`$EDITOR +<line> <path>`). Tests: search results render; `g` keybinding constructs the correct editor command.
- `p9-3-tui-ask.md`: phase epic §16.2, §1.1, §1.2. Allowed: same. `depends_on: [p4-3]`. Ask pane shows streaming tokens; `--explain` toggle. Tests: streaming render, refusal render.
- `p9-4-tui-inspect.md`: §1.6 inspect, §3.5. Allowed: same. `depends_on: [p1-6, p3-3]`. Renders Document and Chunk inspection per wire schemas 2.5/2.6.
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kb-core`, `kb-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kb-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kebab-core`, `kebab-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kebab-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
- [ ] **Step 1, 2, 3** as in C3.
@@ -1351,5 +1351,5 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: update INDEX with f
- [ ] `tasks/p0/`, `tasks/p1/` … `tasks/p9/` exist with the component spec files listed above.
- [ ] Every component spec contains: frontmatter (with `contract_sections`), Allowed/Forbidden, Inputs, Outputs, Public surface, Behavior contract, Storage/wire effects, Test plan, Definition of Done, Out of scope, Risks/notes.
- [ ] `tasks/INDEX.md` lists every component task.
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md).
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md).
- [ ] All commits authored sequentially per task; rollback is per-task.


@@ -3,7 +3,7 @@ title: "KB v1 최종 결과물 형태 — Frozen Design"
date: 2026-04-27
status: frozen
purpose: Freeze a firm contract so that the spec does not change during small-unit decomposition work
source_report: ../../../kb_local_rust_report.md
source_report: ../../../kebab_local_rust_report.md
related_tasks: ../../../tasks/INDEX.md
---
@@ -11,7 +11,7 @@ related_tasks: ../../../tasks/INDEX.md
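This document freezes the **very concrete form of the final deliverable** that will satisfy the user. Each phase's task decomposition is carried out on top of this contract, and as long as this document does not change, the task interfaces do not change.
The prerequisite report is [`kb_local_rust_report.md`](../../../kb_local_rust_report.md). That report provides the *direction* and *rationale*; this document pins down the *form*.
The prerequisite report is [`kebab_local_rust_report.md`](../../../kebab_local_rust_report.md). That report provides the *direction* and *rationale*; this document pins down the *form*.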
---
@@ -20,7 +20,7 @@ related_tasks: ../../../tasks/INDEX.md
| # | Decision | Value | Rationale |
|---|------|-----|------|
| Q1 | scope priority | derive Data backward from UX | user satisfaction is the lever for spec stability |
| Q2 | headline UX | `kb ask` answer screen | exposes search, citation, RAG, refusal, and model metadata together |
| Q2 | headline UX | `kebab ask` answer screen | exposes search, citation, RAG, refusal, and model metadata together |
| | ask default format | inline numeric refs `[1]…[n]` + footer | everyday readability |
| | ask `--explain` | per-claim breakdown + verbose footer + retrieval trace | single debug flag |
| Q3 | citation string | URI fragment (`path#k=v…`, W3C Media Fragments) | standards-aligned + safe for Windows paths + browser auto-scroll |
@@ -34,7 +34,7 @@ related_tasks: ../../../tasks/INDEX.md
| Q10 | workspace | single root + XDG layout | adequate for a personal v1 |
| | asset retention | content-addressable copy; above `copy_threshold_mb=100`, reference + checksum | reproducibility + disk savings |
| | wire version | additive within `vN`, breaking → `vN+1` | prevents breaking external consumers |
| | ignore | gitignore syntax + `.kbignore` | familiarity |
| | ignore | gitignore syntax + `.kebabignore` | familiarity |
| | errors | thiserror per crate, anyhow at boundary | traceability + UX |
| | sync | watch=false default | v1 uses explicit ingest |
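The Q3 decision (citations as URI fragments) can be illustrated for the line-range case, which is the form the sample outputs in this document use (`path#L661-L672`, `path#L662`). The sketch below is hypothetical: the function names are invented, it handles only the `line` kind, and it performs no percent-encoding of the path and no check that `start <= end`.

```rust
/// Builds a line citation URI such as `notes/a.md#L12-L34`,
/// collapsing to `#L12` when the range is a single line.
fn line_citation_uri(path: &str, start: u32, end: u32) -> String {
    if start == end {
        format!("{path}#L{start}")
    } else {
        format!("{path}#L{start}-L{end}")
    }
}

/// Parses the line form back into (path, start, end).
/// Returns None for any other fragment kind or malformed input.
fn parse_line_citation(uri: &str) -> Option<(&str, u32, u32)> {
    let (path, frag) = uri.rsplit_once('#')?;
    let frag = frag.strip_prefix('L')?;
    match frag.split_once("-L") {
        Some((s, e)) => Some((path, s.parse::<u32>().ok()?, e.parse::<u32>().ok()?)),
        None => {
            let n = frag.parse::<u32>().ok()?;
            Some((path, n, n))
        }
    }
}
```

Having both directions makes a round-trip check straightforward, in the spirit of the citation round-trip test listed for the lexical retriever.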
@@ -42,43 +42,43 @@ related_tasks: ../../../tasks/INDEX.md
## 1. Headline UX scenes
### 1.1 `kb ask` (default)
### 1.1 `kebab ask` (default)
```text
$ kb ask "Markdown chunking 규칙은?"
$ kebab ask "Markdown chunking 규칙은?"
heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한
단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다
heading_path 와 source_span 보존 [1].
─────────────────────────────────────────────────────────
[1] notes/rust/kb-architecture.md#L661-L672
[1] notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
[2] notes/rust/kb-architecture.md#L665-L668
[2] notes/rust/kebab-architecture.md#L665-L668
§14 Chunking 정책
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
```
### 1.2 `kb ask --explain`
### 1.2 `kebab ask --explain`
```text
$ kb ask --explain "Markdown chunking 규칙은?"
$ kebab ask --explain "Markdown chunking 규칙은?"
▎ heading boundary 우선
└ notes/rust/kb-architecture.md#L662
└ notes/rust/kebab-architecture.md#L662
「heading boundary를 우선한다」
▎ code block 중간 분할 금지
└ notes/rust/kb-architecture.md#L663
└ notes/rust/kebab-architecture.md#L663
「code block은 중간에서 자르지 않는다」
▎ table 단일 chunk 유지
└ notes/rust/kb-architecture.md#L664
└ notes/rust/kebab-architecture.md#L664
「table은 가능한 한 하나의 chunk로」
▎ heading_path / source_span 보존
└ notes/rust/kb-architecture.md#L668-L670
└ notes/rust/kebab-architecture.md#L668-L670
retrieval trace
query "Markdown chunking 규칙은?"
@@ -87,8 +87,8 @@ retrieval trace
threshold (gate) 0.30 → top-1 0.82 pass
fusion rrf (k=60)
chunks (used) 3 / 8 returned
#1 0.82 notes/rust/kb-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kb-architecture.md#L692-L713 bm25=10.1 vec=0.74
#1 0.82 notes/rust/kebab-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kebab-architecture.md#L692-L713 bm25=10.1 vec=0.74
#3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
@@ -96,10 +96,10 @@ prompt 1184 tokens completion 312 tokens latency 1842 ms
embedding multilingual-e5-small index v1.0
```
### 1.3 `kb ask` (refusal — score gate)
### 1.3 `kebab ask` (refusal — score gate)
```text
$ kb ask "당신의 회사 매출은?"
$ kebab ask "당신의 회사 매출은?"
근거 부족. KB 에 해당 내용 없음.
가까운 후보 (모두 임계 0.30 미만):
@@ -108,10 +108,10 @@ $ kb ask "당신의 회사 매출은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used
```
### 1.4 `kb ask` (refusal — LLM self-judge)
### 1.4 `kebab ask` (refusal — LLM self-judge)
```text
$ kb ask "이 책의 23쪽 결론은?"
$ kebab ask "이 책의 23쪽 결론은?"
근거 부족. 제공된 chunk 중 결론 내용 없음.
검색은 됨, LLM 이 결론 부재 판단:
@@ -121,17 +121,17 @@ $ kb ask "이 책의 23쪽 결론은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded
```
### 1.5 `kb search` (dense)
### 1.5 `kebab search` (dense)
```text
$ kb search "Markdown chunking 규칙"
$ kebab search "Markdown chunking 규칙"
1. 0.82 notes/rust/kb-architecture.md#L661-L672
1. 0.82 notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
heading boundary 우선. code block 중간 분할 금지.
table 가능한 한 단일 chunk…
2. 0.71 notes/rust/kb-architecture.md#L692-L713
2. 0.71 notes/rust/kebab-architecture.md#L692-L713
§15 검색과 RAG 정책
검색은 처음부터 hybrid 로 설계하되 구현은 단계적…
@@ -142,7 +142,7 @@ $ kb search "Markdown chunking 규칙"
3 hits hybrid index v1.0 bm25+e5-small/RRF
```
### 1.6 `kb search --explain`
### 1.6 `kebab search --explain`
Added under each hit:
@@ -174,8 +174,8 @@ $ kb search "Markdown chunking 규칙"
{
"schema_version": "citation.v1",
"kind": "line|page|region|caption|time",
"path": "notes/rust/kb.md",
"uri": "notes/rust/kb.md#L12-L34",
"path": "notes/rust/kebab.md",
"uri": "notes/rust/kebab.md#L12-L34",
"line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" },
"page": { "page": 13, "section": "Experiment Setup" },
@@ -196,7 +196,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"score": 0.82,
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"section_label": "§14 Chunking 정책",
"snippet": "heading boundary 우선. code block 중간 분할 금지…",
@@ -261,7 +261,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
{
"kind": "new|updated|skipped|error",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"asset_id": "8c1e7d3f2a05",
"byte_len": 41822,
"block_count": 184,
@@ -277,13 +277,13 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
With `--summary-only`, `items` is `null`.
### 2.5 DocSummary (`kb list docs`)
### 2.5 DocSummary (`kebab list docs`)
```json
{
"schema_version": "doc_summary.v1",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"title": "Rust 로컬 Knowledge Base 설계",
"lang": "ko",
"tags": ["knowledge-base", "rust", "rag"],
@@ -305,7 +305,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "chunk_inspection.v1",
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"text": "heading boundary 우선…",
"source_spans": [{ "kind": "line", "start": 661, "end": 672 }],
@@ -325,9 +325,9 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "doctor.v1",
"ok": true,
"checks": [
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kb/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kb" },
{ "name": "sqlite_open", "ok": true, "detail": "kb.sqlite (schema v1)" },
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kebab/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
{ "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
{ "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
{ "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
@@ -346,7 +346,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
---
## 3. Domain model (kb-core)
## 3. Domain model (kebab-core)
### 3.1 Newtype IDs
@@ -581,7 +581,7 @@ pub struct RetrievalDetail {
### 3.7a Forward-declared types
The `Block::ImageRef` / `AudioRef` variants exist from v1 onward, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept in `kb-core` as stubs (slots for the final domain model):
The `Block::ImageRef` / `AudioRef` variants exist from v1 onward, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept in `kebab-core` as stubs (slots for the final domain model):
```rust
pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String }
@@ -596,41 +596,41 @@ pub enum ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) }
pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }
```
`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy` are defined in `kb-core` (detailed fields are decided when first used; this spec only guarantees the forward reference).
`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy` are defined in `kebab-core` (detailed fields are decided when first used; this spec only guarantees the forward reference).
`OffsetDateTime` is `time::OffsetDateTime`; `Result` is a crate-local alias.
### 3.7b Parser intermediate types — `kb-parse-types`
### 3.7b Parser intermediate types — `kebab-parse-types`
The parsers' *intermediate* representation (`ParsedBlock` and related types) lives in a separate thin crate, **`kb-parse-types`**, not in `kb-core`. Reason: `kb-normalize` is responsible for the medium-agnostic ID/Provenance lift and must not import any parser directly. The input types that normalize consumes still have to be defined somewhere, but placing them in `kb-core` would mean (a) the core namespace explodes as parser-specific ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) a semantic change in any parser becomes a core change that affects every dependent.
The parsers' *intermediate* representation (`ParsedBlock` and related types) lives in a separate thin crate, **`kebab-parse-types`**, not in `kebab-core`. Reason: `kebab-normalize` is responsible for the medium-agnostic ID/Provenance lift and must not import any parser directly. The input types that normalize consumes still have to be defined somewhere, but placing them in `kebab-core` would mean (a) the core namespace explodes as parser-specific ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) a semantic change in any parser becomes a core change that affects every dependent.
`kb-parse-types` is the **only layer** between the two. Dependency graph:
`kebab-parse-types` is the **only layer** between the two. Dependency graph:
```text
kb-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …)
kebab-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …)
kb-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
kebab-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
▲ ▲
│ │
kb-parse-md, kb-parse-pdf, kb-normalize
kb-parse-image, kb-parse-audio
kebab-parse-md, kebab-parse-pdf, kebab-normalize
kebab-parse-image, kebab-parse-audio
```
`kb-parse-types`:
- depends only on `kb-core` (uses domain types such as `Block`, `SourceSpan`, `Lang`).
- does not depend on any other `kb-*` crate.
`kebab-parse-types`:
- depends only on `kebab-core` (uses domain types such as `Block`, `SourceSpan`, `Lang`).
- does not depend on any other `kebab-*` crate.
- does not depend on any parser's concrete library (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`).
- has only light external dependencies, roughly serde + thiserror.
Types defined in P1:
```rust
// kb-parse-types — depends on kb-core only.
// kebab-parse-types — depends on kebab-core only.
pub struct ParsedBlock {
pub kind: ParsedBlockKind,
pub heading_path: Vec<String>,
pub source_span: kb_core::SourceSpan,
pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload,
}
@@ -638,11 +638,11 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe
pub enum ParsedPayload {
Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kb_core::Inline> },
List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kb_core::Inline> },
Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String },
AudioRef { src: String }, // duration_ms filled by extractor before chunking
}
@@ -651,7 +651,7 @@ pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
```
`Inline` is a domain type that lives in `kb-core` (§3.4). `kb-parse-types` only *references* it: the same meaning is not defined twice across two crates (otherwise normalize would have to perform a pointless identity conversion).
`Inline` is a domain type that lives in `kebab-core` (§3.4). `kebab-parse-types` only *references* it: the same meaning is not defined twice across two crates (otherwise normalize would have to perform a pointless identity conversion).
Types to be added in P6/P7/P8 (forward-ref):
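The layering described here can be sketched with modules standing in for crates. Everything below is a simplified, hypothetical stand-in: the `Inline` variants, the `SourceSpan` fields, the reduced `Block`, and `ParsedParagraph` are assumptions made for illustration and do not reproduce the real definitions in §3.4 or §3.7b.

```rust
// Modules play the role of crates; all shapes here are assumed, not the real ones.
mod kebab_core {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Inline {
        Text(String),
        Code(String),
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct SourceSpan {
        pub start: u32,
        pub end: u32,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct Block {
        pub heading_path: Vec<String>,
        pub span: SourceSpan,
        pub inlines: Vec<Inline>,
    }
}

mod kebab_parse_types {
    // The intermediate type reuses the core `Inline` instead of redefining it.
    use super::kebab_core::{Inline, SourceSpan};

    pub struct ParsedParagraph {
        pub heading_path: Vec<String>,
        pub source_span: SourceSpan,
        pub inlines: Vec<Inline>,
    }
}

mod kebab_normalize {
    // normalize imports core and parse-types only, never a parser.
    use super::kebab_core::Block;
    use super::kebab_parse_types::ParsedParagraph;

    pub fn lift(p: ParsedParagraph) -> Block {
        // `inlines` is moved through unchanged: no identity conversion needed.
        Block { heading_path: p.heading_path, span: p.source_span, inlines: p.inlines }
    }
}
```

Because both sides share one `Inline`, `lift` moves the vector as-is; with a duplicated type it would have to map every element into an equivalent copy.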
@@ -661,7 +661,7 @@ pub struct ParsedPdfPage { pub page: u32, pub text: String }
pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String }
```
→ When a new medium is added, the `kb-core::Block` variants do not change; only `kb-parse-types` is extended.
→ When a new medium is added, the `kebab-core::Block` variants do not change; only `kebab-parse-types` is extended.
### 3.8 Answer / RAG types
@@ -995,28 +995,28 @@ CREATE TABLE eval_query_results (
| Kind | Default location |
|------|-----------|
| workspace | `~/KnowledgeBase/` |
| config | `~/.config/kb/config.toml` |
| data | `~/.local/share/kb/` |
| cache | `~/.cache/kb/` |
| state (logs) | `~/.local/state/kb/` |
| config | `~/.config/kebab/config.toml` |
| data | `~/.local/share/kebab/` |
| cache | `~/.cache/kebab/` |
| state (logs) | `~/.local/state/kebab/` |
`~`, `$HOME`, and `${KB_*}` are expanded. Paths are normalized to absolute form before use.
`~`, `$HOME`, and `${KEBAB_*}` are expanded. Paths are normalized to absolute form before use.
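A minimal sketch of that expansion, assuming the home directory and the variable table are passed in explicitly. `expand_path` is a hypothetical helper, `KEBAB_DATA` in the usage below is a made-up variable name, and the sketch rejects a non-absolute result instead of resolving it, which is a simplification of "normalize to absolute".

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Expands a leading `~`, any `$HOME`, and `${NAME}` references.
/// Returns `None` for an unknown variable, an unclosed `${`, or a
/// result that is not an absolute path.
pub fn expand_path(raw: &str, home: &str, vars: &HashMap<String, String>) -> Option<PathBuf> {
    // A leading `~` counts only when it stands alone or is followed by '/'.
    let tilde_done = match raw.strip_prefix('~') {
        Some(tail) if tail.is_empty() || tail.starts_with('/') => format!("{home}{tail}"),
        _ => raw.to_string(),
    };
    let home_done = tilde_done.replace("$HOME", home);

    // Substitute each `${NAME}` from the table.
    let mut out = String::new();
    let mut rest = home_done.as_str();
    while let Some(open) = rest.find("${") {
        let close = open + rest[open..].find('}')?;
        out.push_str(&rest[..open]);
        out.push_str(vars.get(&rest[open + 2..close])?);
        rest = &rest[close + 1..];
    }
    out.push_str(rest);

    let path = PathBuf::from(out);
    path.is_absolute().then_some(path)
}
```

Shell-style defaults such as `${XDG_DATA_HOME:-~/.local/share}` are not handled here; in this sketch they fall through as an unknown variable and yield `None`.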
### 6.2 Workspace 구조
```
~/KnowledgeBase/
├── inbox/ notes/ papers/ photos/ recordings/
└── .kbignore
└── .kebabignore
```
The union of `.kbignore` and `config.workspace.exclude` applies.
The union of `.kebabignore` and `config.workspace.exclude` applies.
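As a rough illustration of the union, the sketch below merges the two pattern sources into one set. `effective_excludes` is a hypothetical helper, and treating blank lines and `#` lines in the ignore file as skippable is an assumption borrowed from gitignore-style files, not something this document specifies.

```rust
use std::collections::BTreeSet;

/// Merges ignore-file lines with the config's exclude list (set union).
pub fn effective_excludes(ignore_file: &str, config_exclude: &[&str]) -> BTreeSet<String> {
    let from_file = ignore_file
        .lines()
        .map(str::trim)
        // Assumption: blank lines and '#' comments carry no pattern.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from);
    from_file
        .chain(config_exclude.iter().map(|s| s.to_string()))
        .collect()
}
```

A pattern that appears in both sources ends up in the set once, so a path is excluded when either source matches it.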
### 6.3 Data dir 구조
```
~/.local/share/kb/
├── kb.sqlite (+ -wal, -shm)
~/.local/share/kebab/
├── kebab.sqlite (+ -wal, -shm)
├── lancedb/
│ └── chunk_embeddings_<model>_<dim>.lance/
├── assets/<aa>/<asset_id> # shard
@@ -1025,7 +1025,7 @@ CREATE TABLE eval_query_results (
└── runs/<run_id>/ # eval per_query.jsonl + report.md
```
### 6.4 Config (`~/.config/kb/config.toml`) — frozen schema
### 6.4 Config (`~/.config/kebab/config.toml`) — frozen schema
```toml
schema_version = 1
@@ -1036,8 +1036,8 @@ include = ["**/*.md"]
exclude = [".git/**", "node_modules/**", ".obsidian/**"]
[storage]
data_dir = "${XDG_DATA_HOME:-~/.local/share}/kb"
sqlite = "{data_dir}/kb.sqlite"
data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"
sqlite = "{data_dir}/kebab.sqlite"
vector_dir = "{data_dir}/lancedb"
asset_dir = "{data_dir}/assets"
artifact_dir = "{data_dir}/artifacts"
@@ -1086,15 +1086,15 @@ max_context_tokens = 8000
Config precedence: default → file → env (`KB_<SECTION>_<KEY>`) → CLI flag.
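The precedence chain can be sketched as a lookup in which each later layer overrides the earlier ones. `resolve` and `env_var_name` are hypothetical helpers; the prefix is taken as a parameter so the sketch does not hard-code one, and upper-casing the section and key is an assumption read off the `<SECTION>_<KEY>` pattern.

```rust
/// Picks the effective value: CLI flag, then env, then file, then default.
pub fn resolve<'a>(
    default: &'a str,
    file: Option<&'a str>,
    env: Option<&'a str>,
    cli: Option<&'a str>,
) -> &'a str {
    cli.or(env).or(file).unwrap_or(default)
}

/// Builds an environment variable name of the form `<PREFIX>_<SECTION>_<KEY>`.
pub fn env_var_name(prefix: &str, section: &str, key: &str) -> String {
    format!("{prefix}_{section}_{key}").to_uppercase()
}
```

For example, `resolve("a", Some("b"), None, None)` yields the file value because no higher layer set anything.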
### 6.5 `kb init` output
### 6.5 `kebab init` output
```text
$ kb init
created ~/.config/kb/config.toml
created ~/.local/share/kb/
$ kebab init
created ~/.config/kebab/config.toml
created ~/.local/share/kebab/
created ~/KnowledgeBase/
opened ~/.local/share/kb/kb.sqlite (schema v1)
hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`
opened ~/.local/share/kebab/kebab.sqlite (schema v1)
hint edit ~/.config/kebab/config.toml then `kebab ingest ~/KnowledgeBase`
```
Existing files are preserved; overwriting requires an explicit `--force`.
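One way to get that behavior is to create the file only when it does not exist yet. The sketch below uses `OpenOptions::create_new` from the standard library; `write_config` and its boolean return value are hypothetical, not the actual `init` implementation.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to `path`. Without `force`, an existing file is left
/// untouched. Returns `Ok(true)` if the file was written, `Ok(false)` if kept.
pub fn write_config(path: &Path, contents: &str, force: bool) -> io::Result<bool> {
    if force {
        fs::write(path, contents)?;
        return Ok(true);
    }
    // `create_new` fails with AlreadyExists instead of truncating the file.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(contents.as_bytes())?;
            Ok(true)
        }
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}
```

Using `create_new` avoids a separate existence check followed by a write, so a file that appears between the two steps cannot be overwritten by accident.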
@@ -1107,7 +1107,7 @@ hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`
---
## 7. Trait contracts (kb-core)
## 7. Trait contracts (kebab-core)
### 7.1 I/O helpers
@@ -1210,34 +1210,34 @@ pub trait JobRepo {
## 8. Module boundaries (Allowed / Forbidden)
```text
kb-cli, kb-tui, kb-desktop
└─> kb-app
├─> kb-source-fs
├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
│ └─> kb-parse-types (parser intermediate)
├─> kb-normalize
│ └─> kb-parse-types
├─> kb-chunk
├─> kb-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
├─> kb-store-vector (VectorStore)
├─> kb-embed-local
├─> kb-search (Retriever[hybrid])
├─> kb-llm-local
├─> kb-rag
├─> kb-eval
└─> kb-config
└─> kb-core (모두 의존)
kebab-cli, kebab-tui, kebab-desktop
└─> kebab-app
├─> kebab-source-fs
├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
│ └─> kebab-parse-types (parser intermediate)
├─> kebab-normalize
│ └─> kebab-parse-types
├─> kebab-chunk
├─> kebab-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
├─> kebab-store-vector (VectorStore)
├─> kebab-embed-local
├─> kebab-search (Retriever[hybrid])
├─> kebab-llm-local
├─> kebab-rag
├─> kebab-eval
└─> kebab-config
└─> kebab-core (모두 의존)
```
`kb-parse-types` is the thin layer between `kb-core` and parsers/normalize (see §3.7b). It gathers the parser-specific intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place so that (a) the `kb-core` namespace does not explode and (b) `kb-normalize` never imports a parser directly.
`kebab-parse-types` is the thin layer between `kebab-core` and parsers/normalize (see §3.7b). It gathers the parser-specific intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place so that (a) the `kebab-core` namespace does not explode and (b) `kebab-normalize` never imports a parser directly.
Key prohibitions:
- UI → store/llm/parse direct dependency ✗
- parse-* → store/llm/embed ✗
- parse-* → kb-normalize ✗ (one-way: parsers → kb-parse-types ← normalize)
- parse-* → kebab-normalize ✗ (one-way: parsers → kebab-parse-types ← normalize)
- chunk → llm/embed ✗
- normalize → store / parse-* ✗
- kb-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kb-core` only)
- kebab-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kebab-core` only)
- cross-write into another store ✗
Enforced by `cargo deny` + the workspace deny.toml + a CI check.
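To make the edge rules concrete, here is a small sketch that classifies crates into layers and flags a forbidden edge. It is not how `cargo deny` works and covers only a subset of the prohibitions (no UI, search, rag, or cross-write rules); `layer` and `is_forbidden` are hypothetical names, and the prefix-based classification is an assumption drawn from the crate names in the diagram.

```rust
/// Maps a crate name to a coarse layer label.
fn layer(krate: &str) -> &'static str {
    match krate {
        // Checked first: parse-types shares the `kebab-parse-` prefix but is not a parser.
        "kebab-parse-types" => "parse-types",
        "kebab-normalize" => "normalize",
        "kebab-chunk" => "chunk",
        k if k.starts_with("kebab-parse-") => "parser",
        k if k.starts_with("kebab-store-") => "store",
        k if k.starts_with("kebab-llm-") => "llm",
        k if k.starts_with("kebab-embed-") => "embed",
        _ => "other",
    }
}

/// Returns true when `from` depending on `to` breaks one of the listed rules.
pub fn is_forbidden(from: &str, to: &str) -> bool {
    matches!(
        (layer(from), layer(to)),
        ("parser", "store" | "llm" | "embed" | "normalize")
            | ("chunk", "llm" | "embed")
            | ("normalize", "store" | "parser")
            | ("parse-types", "parser" | "normalize" | "store" | "llm" | "embed")
    )
}
```

A check along these lines could run over the workspace's dependency edges in CI, though the exact source of those edges is left open here.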
@@ -1270,7 +1270,7 @@ CI:
## 10. Error model + exit codes
```rust
// kb-core
// kebab-core
pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed }
// crate-local examples
pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span }
@@ -1278,7 +1278,7 @@ pub enum StoreError { Sqlx(rusqlite::Error), Migration(String), Conflict(String)
pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) }
```
At the boundary (`kb-app`, `kb-cli`), errors are merged into `anyhow::Error`. Exit code mapping:
At the boundary (`kebab-app`, `kebab-cli`), errors are merged into `anyhow::Error`. Exit code mapping:
```rust
fn exit_code(err: &anyhow::Error) -> i32 {
@@ -1295,17 +1295,17 @@ fn exit_code(err: &anyhow::Error) -> i32 {
| `--verbose` | + anyhow chain |
| `--debug` 또는 `RUST_LOG=debug` | + tracing target/level/span |
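A refusal is not an error. A `kb ask` refusal is written to stdout as a normal result (Answer with grounded=false) and exits with code 1.
A refusal is not an error. A `kebab ask` refusal is written to stdout as a normal result (Answer with grounded=false) and exits with code 1.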
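The distinction between "not an error" and "exit 1" can be sketched as follows. The `Answer` struct here is a reduced, hypothetical stand-in with only the `grounded` flag and a text body, `finish_ask` is an invented name, and exit code 0 for a grounded answer is an assumption.

```rust
/// Hypothetical, reduced stand-in for the real answer type.
pub struct Answer {
    pub grounded: bool,
    pub text: String,
}

/// Returns the text to print on stdout and the process exit code.
pub fn finish_ask(answer: &Answer) -> (&str, i32) {
    // A refusal still goes to stdout; only the exit code differs.
    let code = if answer.grounded { 0 } else { 1 };
    (answer.text.as_str(), code)
}
```

Nothing is routed through the error path in either case, which keeps refusals out of the `anyhow` chain and the stderr output.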
Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kb/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`).
Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kebab/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`).
`kb doctor` output (human-readable):
`kebab doctor` output (human-readable):
```text
$ kb doctor
✓ config_loaded ~/.config/kb/config.toml
✓ data_dir_writable ~/.local/share/kb
✓ sqlite_open kb.sqlite (schema v1)
$ kebab doctor
✓ config_loaded ~/.config/kebab/config.toml
✓ data_dir_writable ~/.local/share/kebab
✓ sqlite_open kebab.sqlite (schema v1)
✓ lancedb_open lancedb/
✓ embedding_model multilingual-e5-small (384d)
✓ ollama_reachable http://127.0.0.1:11434
@@ -1322,7 +1322,7 @@ $ kb doctor
This document is frozen ↔ the next component-decomposition step is safe:
- all wire schemas (`docs/wire-schema/v1/*.schema.json`)
- all trait signatures (kb-core)
- all trait signatures (kebab-core)
- all ID recipes (4.2)
- SQLite DDL (§5)
- Filesystem + config schema (§6)
@@ -1334,7 +1334,7 @@ $ kb doctor
**Intentionally omitted (out of scope, P+)**:
- multi-workspace
- watch mode
- desktop app `kb://` protocol handler
- desktop app `kebab://` protocol handler
- LLM-as-judge eval
- visual embedding (CLIP)
- real-time collab