docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates the word `kb` in every .md file.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- All references unified across the design doc / initial report /
  SMOKE / HOTFIXES / phase epics / task specs.
- `git -c user.name=kb` in task-decomposition.md is author information
  that records past git history, so it is kept as-is (the author in the
  actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` as
  well (left unreflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (every renamed file is updated to
  its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
parent f1a448d6dc
commit f9714aa5cb
56 changed files with 1324 additions and 1324 deletions

@@ -14,30 +14,30 @@ historical contract that was implemented; this file accumulates the
deltas so phase 5+ readers can find the live behavior without diffing
git history.
## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands
## 2026-05-01 — `--config` flag silently ignored across all kebab-cli subcommands
**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`.
**Discovered**: post-P3-5 manual smoke at `/tmp/kebab-smoke/`.
**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config.
**Symptom**: `kebab --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kebab/config.toml` (XDG default). Users had to use `KEBAB_*` env vars to point at a non-default config.
**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them.
**Root cause**: `kebab-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kebab_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kebab-cli never used them.
**Fix** (PR #20, fix/cli-config-flag-and-search-output):
- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`.
- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
- `kebab-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kebab_app::*_with_config(cfg, ...)` instead of `kebab_app::*(...)`.
- `kebab_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
- `kebab-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
- Same PR also improved `kebab search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
**Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
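The bug class described in this entry can be sketched in a few lines. This is a minimal std-only illustration, not the project's code: `Config` here is a hypothetical one-field stand-in, and `search`, `search_with_config`, and `run_search` are illustrative names that only mirror the `*_with_config` pattern named above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real config type; it only remembers
/// which file it was loaded from.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub source: PathBuf,
}

impl Config {
    /// Same shape as `Config::load(Option<&Path>)`: an explicit path wins,
    /// otherwise the XDG default is used.
    pub fn load(path: Option<&Path>) -> Config {
        let source = match path {
            Some(p) => p.to_path_buf(),
            None => PathBuf::from("~/.config/kebab/config.toml"),
        };
        Config { source }
    }
}

/// Config-explicit entry point: uses exactly the config it is handed.
pub fn search_with_config(cfg: &Config, query: &str) -> String {
    format!("search {:?} using {}", query, cfg.source.display())
}

/// Bare entry point: re-loads the default config internally, so any
/// path the caller parsed from `--config` never reaches it.
pub fn search(query: &str) -> String {
    search_with_config(&Config::load(None), query)
}

/// CLI arm after the fix: load once from the flag, then thread it through.
pub fn run_search(cli_config: Option<&Path>, query: &str) -> String {
    let cfg = Config::load(cli_config);
    search_with_config(&cfg, query)
}
```

Calling `run_search(Some(Path::new("/tmp/custom.toml")), "q")` reports the custom path, while the bare `search("q")` always reports the XDG default regardless of what the CLI parsed.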
### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up)
### 2026-05-01 — `--config` regression in `kebab ask` (P4-3 follow-up)
**Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.
**Symptom**: `kb --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
**Symptom**: `kebab --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kebab-cli's `Cmd::Ask` arm still called bare `kebab_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
**Fix** (PR #24, fix/cli-ask-honor-config-flag):
- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`.
- `kebab-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kebab_app::ask_with_config(cfg, query, opts)`.
**Amends**: tasks/p4/p4-3-rag-pipeline.md.
@@ -48,15 +48,15 @@ git history.
**Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.
**Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- `crates/kebab-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
- The integration snapshot `crates/kebab-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
**Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.
**Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).
**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
**Verification**: post-fix smoke at `/tmp/kebab-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
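The normalisation described in this entry can be sketched as follows. This is a std-only illustration under stated assumptions, not the contents of `hybrid.rs`: the function name `rrf_normalized` and the input shape (one best-first list of chunk ids per retriever) are hypothetical, and the divisor is written as `num_retrievers / (k_rrf + 1)`, which equals the entry's `2 / (k_rrf + 1)` for the two-retriever case.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion with scores rescaled into [0, 1].
/// `rankings[m]` is retriever m's result list, best first (index 0 = rank 1).
pub fn rrf_normalized(rankings: &[Vec<&str>], k_rrf: f64) -> HashMap<String, f64> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            // Raw RRF contribution: 1 / (k_rrf + rank).
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k_rrf + rank);
        }
    }
    // Upper bound of the raw score: every retriever ranks the chunk first.
    let max_score = rankings.len() as f64 / (k_rrf + 1.0);
    if max_score > 0.0 {
        for s in scores.values_mut() {
            // Dividing every score by the same constant keeps the ordering.
            *s /= max_score;
        }
    }
    scores
}
```

With `k_rrf = 60`, a chunk ranked 1st by one retriever and 2nd by the other scores `(1/61 + 1/62) / (2/61) ≈ 0.9919`, the value quoted for the updated unit test, and a chunk seen by only one retriever stays at or below `0.5`.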
## How to add an entry

@@ -1,12 +1,12 @@
---
title: "KB Work-Unit Index"
source: kb_local_rust_report.md
source: kebab_local_rust_report.md
date: 2026-04-27
---
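# KB Work-Unit Index
Decomposes the Phase roadmap in [`kb_local_rust_report.md`](../kb_local_rust_report.md) into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
Decomposes the Phase roadmap in [`kebab_local_rust_report.md`](../kebab_local_rust_report.md) into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
## Dependency graph
@@ -25,16 +25,16 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.
| # | Code | Title | Key output crates | Prerequisite |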
|---|------|------|----------------|------|
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kb-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kb-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kb-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kb-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kb-tui, kb-desktop | P5 |
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli | |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kebab-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kebab-embed, kebab-embed-local, kebab-store-vector, kebab-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kebab-llm, kebab-llm-local, kebab-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kebab-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kebab-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kebab-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |
## Component task decomposition (per phase)

@@ -6,7 +6,7 @@ title: "<Component title>"
status: planned
depends_on: [] # other task_ids
unblocks: [] # other task_ids
contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
---
@@ -22,7 +22,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
## Allowed dependencies
- `kb-core`
- `kebab-core`
- <other crates per design §8>
- <external crates with versions>
@@ -71,7 +71,7 @@ If any item here is needed during implementation, STOP and update the frozen des
| unit | ... | ... |
| snapshot | ... (JSON freeze) | `fixtures/...` |
| contract | trait round-trip | mock impls |
| integration | end-to-end via `kb-app` facade | tmp workspace |
| integration | end-to-end via `kebab-app` facade | tmp workspace |
All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.

@@ -1,12 +1,12 @@
---
phase: P0
component: workspace + kb-core + kb-config + kb-app + kb-cli
component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli
task_id: p0-1
title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade"
status: completed
depends_on: []
unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version]
---
@@ -14,56 +14,56 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config +
## Goal
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
## Why now / why this size
Every other task imports `kb-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
Every other task imports `kebab-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
## Allowed dependencies
- workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"`
- per crate:
- `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
- `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
- `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
- `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender`
- `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }`
- `kebab-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
- `kebab-parse-types`: workspace deps + `kebab-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
- `kebab-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
- `kebab-app`: workspace deps + `kebab-core`, `kebab-config`, `tracing-subscriber`, `tracing-appender`
- `kebab-cli`: workspace deps + `kebab-core`, `kebab-config`, `kebab-app`, `clap = { version = "4", features = ["derive"] }`
## Forbidden dependencies
- `kb-core` MUST NOT depend on any other `kb-*` crate.
- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate.
- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kb-cli` MUST NOT call any non-`kb-app` crate directly.
- `kebab-core` MUST NOT depend on any other `kebab-*` crate.
- `kebab-parse-types` MUST depend ONLY on `kebab-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kebab-*` crate.
- `kebab-config` MUST NOT depend on `kebab-app`, `kebab-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kebab-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kebab-cli` MUST NOT call any non-`kebab-app` crate directly.
## Inputs
| input | type | source |
|-------|------|--------|
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` |
| user `kb` invocation | command-line args | end user |
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` |
| user `kebab` invocation | command-line args | end user |
## Outputs
| output | type | downstream consumer |
|--------|------|---------------------|
| compiling workspace | Rust crates | every later task |
| `kb-core` types/traits | Rust API | every other crate |
| `kb-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kb-config::Config` | Rust struct | every other crate |
| `kb-app` facade methods (stubs) | Rust API | `kb-cli`, future TUI/desktop |
| `kb` binary | executable | end user |
| `kebab-core` types/traits | Rust API | every other crate |
| `kebab-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kebab-config::Config` | Rust struct | every other crate |
| `kebab-app` facade methods (stubs) | Rust API | `kebab-cli`, future TUI/desktop |
| `kebab` binary | executable | end user |
| `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers |
| `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors |
## Public surface (signatures only — no new types)
All types/traits below are defined in `kb-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
All types/traits below are defined in `kebab-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
```rust
// ── kb-core ─────────────────────────────────────────────────────────────────
// ── kebab-core ─────────────────────────────────────────────────────────────────
// Newtype IDs (design §3.1) — Display + FromStr implemented.
pub struct AssetId(pub String);
@@ -114,7 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ }
pub enum Inline { /* per §3.4 */ }
pub enum SourceSpan { /* per §3.4 */ }
// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b — NOT in kb-core.)
// (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.)
// Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ }
@@ -232,14 +232,14 @@ pub fn nfc(input: &str) -> String; // §4.1
```
```rust
// ── kb-parse-types ──────────────────────────────────────────────────────────
// ── kebab-parse-types ──────────────────────────────────────────────────────────
// Per design §3.7b. Defines parser intermediate representations consumed by
// kb-normalize. Depends on kb-core only — never on parser libraries.
// kebab-normalize. Depends on kebab-core only — never on parser libraries.
pub struct ParsedBlock {
pub kind: ParsedBlockKind,
pub heading_path: Vec<String>,
pub source_span: kb_core::SourceSpan,
pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload,
}
@@ -247,16 +247,16 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe
pub enum ParsedPayload {
Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kb_core::Inline> },
List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kb_core::Inline> },
Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String },
AudioRef { src: String },
}
// `Inline` itself lives in kb-core (§3.4) — parse-types references it, never duplicates it.
// `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it.
pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
@@ -268,31 +268,31 @@ pub struct ParsedAudioSegment;
```
```rust
// ── kb-config ───────────────────────────────────────────────────────────────
// ── kebab-config ───────────────────────────────────────────────────────────────
pub struct Config { /* full schema per §6.4 */ }
impl Config {
pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>;
pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>;
pub fn defaults() -> Self;
pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self;
pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kb/config.toml
pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kb
pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kebab/config.toml
pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kebab
pub fn xdg_cache_dir() -> std::path::PathBuf;
pub fn xdg_state_dir() -> std::path::PathBuf;
}
```
```rust
// ── kb-app ──────────────────────────────────────────────────────────────────
// ── kebab-app ──────────────────────────────────────────────────────────────────
pub fn init_workspace(force: bool) -> anyhow::Result<()>;
pub fn ingest(scope: kb_core::SourceScope, summary_only: bool) -> anyhow::Result<kb_core::IngestReport>;
pub fn list_docs(filter: kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
pub fn inspect_doc(id: &kb_core::DocumentId) -> anyhow::Result<kb_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kb_core::ChunkId) -> anyhow::Result<kb_core::Chunk>;
pub fn search(query: kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result<kebab_core::IngestReport>;
pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result<kebab_core::Chunk>;
pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kb_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> }
pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> }
```
@@ -300,9 +300,9 @@ pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub
P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; later phases replace bodies but never change signatures.
```rust
// ── kb-cli ──────────────────────────────────────────────────────────────────
// ── kebab-cli ──────────────────────────────────────────────────────────────────
// clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder)
// Each maps 1:1 to a kb_app function. Exit code mapping per §10.
// Each maps 1:1 to a kebab_app function. Exit code mapping per §10.
```
## Behavior contract
@@ -312,18 +312,18 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
- `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
- `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L<a>-L<b>`, `#p=<n>`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`).
- `Citation::parse` is the strict inverse (round-trip property).
- `kb-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kb-cli` after `Config::load`).
- `kb-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kb-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- `kebab-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kebab-cli` after `Config::load`).
- `kebab-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kebab-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`).
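The layer order in the list above (defaults → file → env → CLI flag) can be sketched as a chain of optional overrides. This is a std-only illustration under assumptions: `RagConfig`, `Overrides`, `resolve`, the two fields, the default values, and the environment variable names are all hypothetical; the real `Config` follows design §6.4 and its env naming is whatever the spec defines.

```rust
use std::collections::HashMap;

/// Hypothetical two-field slice of a config, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct RagConfig {
    pub score_gate: f32,
    pub model_id: String,
}

/// Optional values contributed by one layer (file, env, or CLI).
#[derive(Debug, Default, Clone)]
pub struct Overrides {
    pub score_gate: Option<f32>,
    pub model_id: Option<String>,
}

impl RagConfig {
    /// Illustrative defaults; not taken from the real schema.
    pub fn defaults() -> Self {
        RagConfig { score_gate: 0.05, model_id: "default-model".to_string() }
    }

    /// A later layer replaces only the fields it actually sets.
    pub fn apply(mut self, o: &Overrides) -> Self {
        if let Some(g) = o.score_gate {
            self.score_gate = g;
        }
        if let Some(m) = &o.model_id {
            self.model_id = m.clone();
        }
        self
    }
}

/// Reads overrides from an env map; the variable names are assumptions.
pub fn overrides_from_env(env: &HashMap<String, String>) -> Overrides {
    Overrides {
        score_gate: env.get("KEBAB_RAG_SCORE_GATE").and_then(|v| v.parse().ok()),
        model_id: env.get("KEBAB_RAG_MODEL_ID").cloned(),
    }
}

/// defaults -> file -> env -> CLI, each layer overriding the previous one.
pub fn resolve(file: &Overrides, env: &HashMap<String, String>, cli: &Overrides) -> RagConfig {
    RagConfig::defaults()
        .apply(file)
        .apply(&overrides_from_env(env))
        .apply(cli)
}
```

Passing the environment in as a map, as `Config::apply_env` does in the public surface above, keeps the merge testable without touching the process environment.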
## Storage / wire effects
- Filesystem: creates `~/.config/kb/`, `~/.local/share/kb/`, `~/KnowledgeBase/` only when `kb init` runs; never on `Config::load`.
- Filesystem: creates `~/.config/kebab/`, `~/.local/share/kebab/`, `~/KnowledgeBase/` only when `kebab init` runs; never on `Config::load`.
- Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kb-core`/`kb-config`/`kb-app`/`kb-cli` scope).
- Logging: `tracing` initialized in `kb-cli`; daily-rolling file in `~/.local/state/kb/logs/`.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kebab-core`/`kebab-config`/`kebab-app`/`kebab-cli` scope).
- Logging: `tracing` initialized in `kebab-cli`; daily-rolling file in `~/.local/state/kebab/logs/`.
## Test plan
@@ -336,20 +336,20 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
| unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline |
| unit | `Config::defaults` + env override + CLI override produces expected merged config | inline |
| snapshot | `Config::defaults` JSON serde stable | inline (round-trip) |
| smoke | `kb --help`, `kb init`, `kb doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| smoke | `kebab --help`, `kebab init`, `kebab doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| build | `cargo check --workspace` and `cargo test --workspace` pass | repo |
All tests must run with no network, no Ollama, no models.
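The newtype row in the test plan (`Display`/`FromStr` rejecting invalid lengths and characters) might look like the sketch below. It is a std-only illustration: `ChunkId` matches the newtype shape shown in the public surface, but `IdParseError` and the lowercase-only hex rule are assumptions, not part of the frozen design.

```rust
use std::fmt;
use std::str::FromStr;

/// String-backed newtype ID, same shape as `pub struct AssetId(pub String)`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkId(pub String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum IdParseError {
    BadLength(usize),
    BadChar(char),
}

impl FromStr for ChunkId {
    type Err = IdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Length is checked in bytes; any non-ASCII input fails the char check below.
        if s.len() != 32 {
            return Err(IdParseError::BadLength(s.len()));
        }
        // Assumption: only lowercase hex digits are accepted.
        if let Some(c) = s.chars().find(|c| !matches!(c, '0'..='9' | 'a'..='f')) {
            return Err(IdParseError::BadChar(c));
        }
        Ok(ChunkId(s.to_string()))
    }
}

impl fmt::Display for ChunkId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

A valid 32-character id round-trips through `parse` and `to_string` unchanged, while `"abc"` is rejected for its length and an id containing `g` is rejected for the character.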
## Definition of Done
- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024
- [ ] `Cargo.toml` workspace lists `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` and resolver=3, edition 2024
- [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes
- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`)
- [ ] `kb --help` prints subcommands
- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `kebab-parse-types` `cargo tree` shows ONLY `kebab-core` + `serde`/`thiserror` style deps (no parser libs, no other `kebab-*`)
- [ ] `kebab --help` prints subcommands
- [ ] `kebab init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kebab doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
- [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
- [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands.
@@ -361,12 +361,12 @@ All tests must run with no network, no Ollama, no models.
- Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
- Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
- Wire schema deep validation (only required fields + `schema_version` checked here).
- Real `kb-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
- Real `kebab-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
## Risks / notes
- ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first.
- Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`.
- `kb-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `kebab-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
- XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly.
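
The 32-char hex constraint on `FromStr` noted in the risks above could be enforced roughly as follows. This is a minimal sketch: `DocId` and `IdParseError` are hypothetical names used only for illustration, and accepting lowercase hex only is an assumption; the real newtypes live in `kebab-core`.

```rust
use std::str::FromStr;

/// Hypothetical ID newtype; the real ones are defined in `kebab-core`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DocId(String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum IdParseError {
    WrongLength(usize),
    NonHex(char),
}

impl FromStr for DocId {
    type Err = IdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Count chars, not bytes, so multi-byte input is reported sensibly.
        let len = s.chars().count();
        if len != 32 {
            return Err(IdParseError::WrongLength(len));
        }
        // Assumption: IDs are lowercase hex only.
        if let Some(bad) = s.chars().find(|c| !matches!(c, '0'..='9' | 'a'..='f')) {
            return Err(IdParseError::NonHex(bad));
        }
        Ok(DocId(s.to_owned()))
    }
}
```

Keeping the inner `String` private means every `DocId` in this sketch has passed validation, which is what lets serde stay simple without weakening the constraint.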

@@ -1,12 +1,12 @@
---
phase: P1
component: kb-source-fs
component: kebab-source-fs
task_id: p1-1
title: "Local filesystem source connector"
status: completed
depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
---
@@ -22,8 +22,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `ignore` (gitignore semantics)
- `blake3`
- `walkdir`
@@ -34,21 +34,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Forbidden dependencies
- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS |
| `.kbignore` | text file | workspace root, optional |
| `.kebabignore` | text file | workspace root, optional |
## Outputs
| output | type | downstream consumer |
|--------|------|---------------------|
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
## Public surface (signatures only — no new types)
@@ -56,11 +56,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ }
impl FsSourceConnector {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
impl kebab_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```
@@ -70,7 +70,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` and `.kbignore` for the filter (union; ordering does not matter).
- Combine `config.workspace.exclude` and `.kebabignore` for the filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
@@ -78,7 +78,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects
- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
## Test plan
@@ -87,25 +87,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` + config exclude works | tmp tree |
| unit | `.kebabignore` + config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
All tests run under `cargo test -p kb-source-fs` with no network and no model.
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
## Definition of Done
- [ ] `cargo check -p kb-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes
- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2
## Out of scope
- File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).
## Risks / notes

@@ -1,29 +1,29 @@
---
phase: P1
component: kb-parse-md (frontmatter submodule)
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kb-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kebab-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
---
# p1-2 — Markdown frontmatter parsing
## Goal
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
## Why now / why this size
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
## Allowed dependencies
- `kb-core`
- `kb-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `kebab-core`
- `kebab-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML
@@ -33,7 +33,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
## Inputs
@@ -46,7 +46,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
| output | type | downstream |
|--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
| `(Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
## Public surface (signatures only — no new types)
@@ -54,10 +54,10 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter(
bytes: &[u8],
hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)>;
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)>;
```
`Warning` / `WarningKind` come from `kb-parse-types` (shared with `p1-3` blocks parser and downstream `kb-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
`Warning` / `WarningKind` come from `kebab-parse-types` (shared with `p1-3` blocks parser and downstream `kebab-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
## Behavior contract
@@ -68,7 +68,7 @@ pub fn parse_frontmatter(
- `source_type` default `markdown`; `trust_level` default `primary`.
- `aliases`, `tags` default empty.
- Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" }` emitted.
- No frontmatter at all → defaults applied silently.
- `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2).
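
The unknown-enum rule above (warn, then continue with the default) might look like the sketch below. The `Warning` and `WarningKind` shapes mirror the ones quoted in the contract, but they are redefined locally here so the example stands alone. `resolve_trust_level` and the `Secondary` variant are hypothetical, since only `primary` is named in this spec.

```rust
/// Hypothetical subset of trust levels; only `primary` appears in the spec text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLevel {
    Primary,
    Secondary, // assumed variant, for illustration only
}

/// Local stand-in for `kebab_parse_types::WarningKind`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WarningKind {
    MalformedFrontmatter,
}

/// Local stand-in for `kebab_parse_types::Warning`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Warning {
    pub kind: WarningKind,
    pub note: String,
}

/// Maps the raw frontmatter value to a trust level. A missing key defaults
/// silently; an unrecognised value defaults and records a warning.
pub fn resolve_trust_level(raw: Option<&str>, warnings: &mut Vec<Warning>) -> TrustLevel {
    match raw {
        None | Some("primary") => TrustLevel::Primary,
        Some("secondary") => TrustLevel::Secondary,
        Some(other) => {
            warnings.push(Warning {
                kind: WarningKind::MalformedFrontmatter,
                note: format!("unknown trust_level={other}, defaulted to primary"),
            });
            TrustLevel::Primary
        }
    }
}
```

Passing the warning list in by mutable reference keeps the function infallible, which matches the requirement that ingest continues rather than aborting.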
@@ -91,12 +91,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
All tests under `cargo test -p kb-parse-md --lib frontmatter`.
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6

@@ -1,20 +1,20 @@
---
phase: P1
component: kb-parse-md (blocks submodule)
component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kebab-parse-types, §0 Q3 citation]
---
# p1-3 — Markdown body → Block tree
## Goal
Parse Markdown body bytes into a flat `Vec<kb_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<kebab_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
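
One way to keep heading paths while emitting a flat block list is a small stack of open headings. The sketch below is illustrative only: `HeadingStack` is a hypothetical helper, not part of the frozen design. Whether a heading block carries itself in its own path is left open here.

```rust
/// Tracks the current chain of ancestor headings while scanning a document.
#[derive(Debug, Default)]
pub struct HeadingStack {
    // (level, text) pairs, outermost heading first.
    entries: Vec<(u8, String)>,
}

impl HeadingStack {
    /// Records a heading of the given level (1 = `#`, 2 = `##`, ...).
    pub fn enter(&mut self, level: u8, text: &str) {
        // A new heading closes every open heading at the same or deeper level.
        while self.entries.last().map_or(false, |(l, _)| *l >= level) {
            self.entries.pop();
        }
        self.entries.push((level, text.to_owned()));
    }

    /// Heading path to attach to the next non-heading block.
    pub fn path(&self) -> Vec<String> {
        self.entries.iter().map(|(_, t)| t.clone()).collect()
    }
}
```

A parser loop would call `enter` on each heading event and copy `path()` onto every paragraph, code block, table, or list it emits afterwards.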
## Why now / why this size
@@ -22,15 +22,15 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies
- `kb-core`
- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kebab-core`
- `kebab-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
## Inputs
@@ -43,8 +43,8 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
| output | type | downstream |
|--------|------|------------|
| `Vec<kb_parse_types::ParsedBlock>` | shared type from `kb-parse-types` | `kb-normalize` |
| `Vec<kb_parse_types::Warning>` | shared type | propagated into Provenance |
| `Vec<kebab_parse_types::ParsedBlock>` | shared type from `kebab-parse-types` | `kebab-normalize` |
| `Vec<kebab_parse_types::Warning>` | shared type | propagated into Provenance |
## Public surface (signatures only — no new types)
@@ -52,10 +52,10 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(
body: &[u8],
body_offset_lines: u32,
) -> anyhow::Result<(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)>;
) -> anyhow::Result<(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)>;
```
`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4).
`ParsedBlock` is defined in `kebab-parse-types` (design §3.7b). `kebab-parse-md` does NOT define its own; it consumes the shared type. Lift to `kebab_core::Block` (with `BlockId` assignment) is `kebab-normalize`'s job (p1-4).
## Behavior contract
@@ -63,9 +63,9 @@ pub fn parse_blocks(
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split.
- Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kb_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kb_core::Inline` per design §3.4). Drop other inlines silently.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kebab-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kebab_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kebab_core::Inline` per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + `Warning::ExtractFailed`.
## Storage / wire effects
@@ -86,12 +86,12 @@ pub fn parse_blocks(
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
All tests under `cargo test -p kb-parse-md --lib blocks`.
All tests under `cargo test -p kebab-parse-md --lib blocks`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4
@@ -99,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope
- Frontmatter (p1-2).
- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize).
- Lifting `kebab_parse_types::ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4 normalize).
- Chunking (p1-5).
## Risks / notes

@@ -1,30 +1,30 @@
---
phase: P1
component: kb-normalize
component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: completed
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §3.7b kebab-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
---
# p1-4 — Lift to CanonicalDocument
## Goal
Combine `Metadata` (p1-2) + `Vec<kb_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
Combine `Metadata` (p1-2) + `Vec<kebab_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kebab_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
## Why now / why this size
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
## Allowed dependencies
- `kb-core`
- `kb-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kb-config`
- `kebab-core`
- `kebab-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kebab-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3`
@@ -34,36 +34,36 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only — `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md` (consumed via shared `kebab-parse-types` only — `kebab-parse-md` must NOT appear in this crate's `cargo tree`), `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
## Public surface (signatures only — no new types)
```rust
pub fn build_canonical_document(
asset: &kb_core::RawAsset,
metadata: kb_core::Metadata,
blocks: Vec<kb_parse_types::ParsedBlock>,
parser_version: &kb_core::ParserVersion,
warnings: Vec<kb_parse_types::Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;
asset: &kebab_core::RawAsset,
metadata: kebab_core::Metadata,
blocks: Vec<kebab_parse_types::ParsedBlock>,
parser_version: &kebab_core::ParserVersion,
warnings: Vec<kebab_parse_types::Warning>,
) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```
## Behavior contract
@@ -91,14 +91,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
All tests under `cargo test -p kb-normalize`.
All tests under `cargo test -p kebab-normalize`.
## Definition of Done
- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize` — input types come from `kb-parse-types` only
- [ ] No `kebab-parse-md` (or any other parser crate) appears in `cargo tree -p kebab-normalize` — input types come from `kebab-parse-types` only
- [ ] PR links design §3.7b, §4.2, §4.3, §8
## Out of scope

@@ -1,12 +1,12 @@
---
phase: P1
component: kb-chunk
component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: completed
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---
@@ -22,8 +22,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
## Public surface (signatures only — no new types)
```rust
pub struct MdHeadingV1Chunker;
impl kb_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion;
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion;
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -91,12 +91,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → `Vec<Chunk>` JSON stable | fixture |
All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2

@@ -1,12 +1,12 @@
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: completed
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---
@@ -22,8 +22,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -34,7 +34,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Forbidden dependencies
- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -42,17 +42,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |
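
The `data_dir/assets/<aa>/<asset_id>` layout in the table above can be derived from the asset ID alone. A minimal sketch follows, assuming `<aa>` is the first two characters of a hex asset ID. `asset_blob_path` is a hypothetical helper name, and rejecting non-hex input is an added safety assumption rather than something the spec states.

```rust
use std::path::{Path, PathBuf};

/// Builds `<data_dir>/assets/<aa>/<asset_id>`, where `<aa>` is the first two
/// characters of the asset ID. Returns `None` if the ID is shorter than two
/// characters or contains anything other than ASCII hex digits.
pub fn asset_blob_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // Restricting to ASCII hex makes the byte slice below safe and keeps
    // path separators or `..` out of the resulting path.
    if asset_id.len() < 2 || !asset_id.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let shard = &asset_id[..2];
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

With two hex characters the shard level fans out to at most 256 directories, and validating the ID before joining is one way to keep writes from escaping `data_dir`.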
## Public surface (signatures only — no new types)
@@ -60,23 +60,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }
impl SqliteStore {
pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
pub fn run_migrations(&self) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}
impl kb_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}
impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```
## Behavior contract
@@ -96,7 +96,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
## Storage / wire effects
- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB.
## Test plan
@@ -112,14 +112,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture |
All tests under `cargo test -p kb-store-sqlite` with no network.
All tests under `cargo test -p kebab-store-sqlite` with no network.
## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5
@@ -132,5 +132,5 @@ All tests under `cargo test -p kb-store-sqlite` with no network.
## Risks / notes
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
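
The shard layout in the note above can be sketched as follows. This is a minimal illustration, not the crate's actual code: `asset_path` is a hypothetical helper name, and it assumes `asset_id` is an ASCII hex string (which the "at most 16 dirs" remark implies for a one-character prefix).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps an asset id to `data_dir/assets/<aa>/<asset_id>`,
/// where `<aa>` is the first two characters of the id.
fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // `get(..2)` returns None for ids shorter than two bytes instead of
    // panicking the way `&asset_id[..2]` would.
    let shard = asset_id.get(..2)?;
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

With hex ids a two-character prefix allows up to 256 shard directories, against 16 for a one-character prefix. Returning `Option` keeps a malformed id from panicking the writer; the real store may prefer an error type.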

@@ -1,12 +1,12 @@
---
phase: P2
component: kb-store-sqlite (FTS5 migration)
component: kebab-store-sqlite (FTS5 migration)
task_id: p2-1
title: "FTS5 virtual table + triggers (V002 migration)"
status: completed
depends_on: [p1-6]
unblocks: [p2-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.5 chunks_fts + triggers, §9 versioning]
---
@@ -18,19 +18,19 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
## Why now / why this size
`chunks_fts` is the lexical index for `kb-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
`chunks_fts` is the lexical index for `kebab-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (extends migrations)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (extends migrations)
- `rusqlite`
- `refinery`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search` (consumer is p2-2), `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search` (consumer is p2-2), `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -52,7 +52,7 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
```
(Used by `kb index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
(Used by `kebab index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
## Behavior contract
@@ -64,7 +64,7 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
## Storage / wire effects
- Writes: `chunks_fts` virtual table inside `kb.sqlite`.
- Writes: `chunks_fts` virtual table inside `kebab.sqlite`.
- Reads: existing `chunks` rows for backfill.
## Test plan
@@ -78,12 +78,12 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
| function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB |
| migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB |
All tests under `cargo test -p kb-store-sqlite fts`.
All tests under `cargo test -p kebab-store-sqlite fts`.
## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite fts` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite fts` passes
- [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.5

@@ -1,12 +1,12 @@
---
phase: P2
component: kb-search (lexical mode)
component: kebab-search (lexical mode)
task_id: p2-2
title: "Lexical Retriever via SQLite FTS5 + bm25 + citation"
status: completed
depends_on: [p2-1]
unblocks: [p3-4, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 search output, §2.2 wire schema, §6.4 search settings]
---
@@ -14,38 +14,38 @@ contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5
## Goal
Implement `kb_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
Implement `kebab_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
## Why now / why this size
First concrete `Retriever`. Lets `kb search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
First concrete `Retriever`. Lets `kebab search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `rusqlite`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `SearchQuery` (mode=Lexical) | `kb_core::SearchQuery` | `kb-app::search` |
| `kb-config::search` settings (`default_k`, `snippet_chars`) | `kb_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kb-store-sqlite` |
| `SearchQuery` (mode=Lexical) | `kebab_core::SearchQuery` | `kebab-app::search` |
| `kebab-config::search` settings (`default_k`, `snippet_chars`) | `kebab_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kebab-store-sqlite` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer (P4), hybrid (p3-4) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer (P4), hybrid (p3-4) |
## Public surface (signatures only — no new types)
@@ -53,12 +53,12 @@ First concrete `Retriever`. Lets `kb search --mode lexical` work without any emb
pub struct LexicalRetriever { /* internal: holds an Arc<rusqlite::Connection> + IndexVersion */ }
impl LexicalRetriever {
pub fn new(store: std::sync::Arc<kb_store_sqlite::SqliteStore>, index_version: kb_core::IndexVersion) -> Self;
pub fn new(store: std::sync::Arc<kebab_store_sqlite::SqliteStore>, index_version: kebab_core::IndexVersion) -> Self;
}
impl kb_core::Retriever for LexicalRetriever {
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
fn index_version(&self) -> kb_core::IndexVersion;
impl kebab_core::Retriever for LexicalRetriever {
fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
fn index_version(&self) -> kebab_core::IndexVersion;
}
```
@@ -98,8 +98,8 @@ impl kb_core::Retriever for LexicalRetriever {
## Storage / wire effects
- Reads only. Never mutates `kb.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kb-cli --json` is used.
- Reads only. Never mutates `kebab.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kebab-cli --json` is used.
## Test plan
@@ -114,13 +114,13 @@ impl kb_core::Retriever for LexicalRetriever {
| determinism | identical query twice produces identical hit order and scores | tmp DB |
| snapshot | `Vec<SearchHit>` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` |
All tests under `cargo test -p kb-search lexical`.
All tests under `cargo test -p kebab-search lexical`.
## Definition of Done
- [ ] `cargo check -p kb-search` passes
- [ ] `cargo test -p kb-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kb-search` audit)
- [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kebab-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kebab-search` audit)
- [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json`
- [ ] PR links design §3.7, §0 Q3, §2.2

@@ -1,12 +1,12 @@
---
phase: P3
component: kb-embed (trait crate)
component: kebab-embed (trait crate)
task_id: p3-1
title: "Embedder trait + EmbeddingInput/Kind validation"
status: completed
depends_on: [p0-1]
unblocks: [p3-2, p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split]
---
@@ -14,16 +14,16 @@ contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 Embeddi
## Goal
Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
Provide the `kebab-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
## Why now / why this size
Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models.
Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kebab-store-vector` and `kebab-search` testable without touching real models.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `thiserror`
- `tracing`
@@ -31,35 +31,35 @@ Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface.
## Forbidden dependencies
- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `fastembed`, `ort`, `tokenizers`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| `EmbeddingInput` | `kebab_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) |
| `Vec<Vec<f32>>` | row-aligned with input | `kebab-store-vector`, `kebab-search` (vector mode) |
## Public surface (signatures only — no new types)
```rust
pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
pub use kebab_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
/// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
pub struct MockEmbedder { /* internal: model_id, dims, seed */ }
#[cfg(feature = "mock")]
impl MockEmbedder {
pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self;
pub fn new(model_id: kebab_core::EmbeddingModelId, version: kebab_core::EmbeddingVersion, dimensions: usize) -> Self;
}
#[cfg(feature = "mock")]
impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
impl kebab_core::Embedder for MockEmbedder { /* per §7.2 */ }
```
## Behavior contract
@@ -84,21 +84,21 @@ impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
| unit | dimensions match construction-time value | inline |
| contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) |
All tests under `cargo test -p kb-embed`.
All tests under `cargo test -p kebab-embed`.
## Definition of Done
- [ ] `cargo check -p kb-embed` passes
- [ ] `cargo test -p kb-embed` passes
- [ ] `cargo check -p kebab-embed` passes
- [ ] `cargo test -p kebab-embed` passes
- [ ] No external embedding dep present
- [ ] PR links design §7.2 Embedder, §11
## Out of scope
- Real adapter (`kb-embed-local` is p3-2).
- Real adapter (`kebab-embed-local` is p3-2).
- Reranker traits (P+).
## Risks / notes
- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kb-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`.
- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kebab-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kebab-core` reorganizes; downstream crates should `use kebab_embed::Embedder` rather than `use kebab_core::Embedder`.
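
One way a deterministic mock vector could be produced is sketched below. This is an assumption-laden illustration, not the `MockEmbedder` body: `mock_vector` is a hypothetical free function, and the FNV-1a plus splitmix64 mixing is a swapped-in technique chosen only because it needs no external crates.

```rust
/// Hypothetical stand-in for the mock's internals:
/// the same (seed, text, dims) always yields the same vector.
fn mock_vector(seed: u64, text: &str, dims: usize) -> Vec<f32> {
    // FNV-1a over the input bytes, with the seed folded into the offset basis.
    let mut state: u64 = 0xcbf29ce484222325 ^ seed;
    for b in text.bytes() {
        state ^= b as u64;
        state = state.wrapping_mul(0x100000001b3);
    }
    (0..dims)
        .map(|_| {
            // One splitmix64 step per output component.
            state = state.wrapping_add(0x9E3779B97F4A7C15);
            let mut z = state;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
            z ^= z >> 31;
            // Keep the top 24 bits (exact in f32), scale to [0, 1), shift to [-1, 1).
            ((z >> 40) as f32 / (1u64 << 24) as f32) * 2.0 - 1.0
        })
        .collect()
}
```

Every component is finite and the length always equals `dims`, which is what the property test in the test plan checks for. The vectors carry no semantic meaning, so they are only suitable for wiring and determinism tests.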

@@ -1,12 +1,12 @@
---
phase: P3
component: kb-embed-local (fastembed adapter)
component: kebab-embed-local (fastembed adapter)
task_id: p3-2
title: "fastembed-rs Embedder for multilingual-e5-small"
status: completed
depends_on: [p3-1]
unblocks: [p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]
---
@@ -22,9 +22,9 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-embed`
- `kebab-core`
- `kebab-config`
- `kebab-embed`
- `fastembed = "4"` (or current stable)
- `tokenizers`
- `ort` (transitive via fastembed)
@@ -33,21 +33,21 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers |
| `kebab-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kebab_core::EmbeddingInput<'_>[]` | callers |
| model cache | `data_dir/models/fastembed/` | filesystem |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search |
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search |
| model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |
## Public surface (signatures only — no new types)
@@ -56,14 +56,14 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }
impl FastembedEmbedder {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::Embedder for FastembedEmbedder {
fn model_id(&self) -> kb_core::EmbeddingModelId;
fn model_version(&self) -> kb_core::EmbeddingVersion;
impl kebab_core::Embedder for FastembedEmbedder {
fn model_id(&self) -> kebab_core::EmbeddingModelId;
fn model_version(&self) -> kebab_core::EmbeddingVersion;
fn dimensions(&self) -> usize;
fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
}
```
@@ -96,12 +96,12 @@ impl kb_core::Embedder for FastembedEmbedder {
| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |
All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
All tests under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
## Definition of Done
- [ ] `cargo check -p kb-embed-local` passes
- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`)
- [ ] `cargo check -p kebab-embed-local` passes
- [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`)
- [ ] First-run model download works under `data_dir/models/fastembed/`
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.3, §6.4, §9

@@ -1,12 +1,12 @@
---
phase: P3
component: kb-store-vector (LanceDB)
component: kebab-store-vector (LanceDB)
task_id: p3-3
title: "LanceDB VectorStore + embedding_records writer"
status: completed
depends_on: [p3-2, p1-6]
unblocks: [p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning]
---
@@ -18,13 +18,13 @@ Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk
## Why now / why this size
Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
Closes the loop chunk → vector. Splits cleanly from `kebab-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `lancedb`
- `arrow` (and `arrow-array`, `arrow-schema`)
- `serde`, `serde_json`
@@ -33,16 +33,16 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) |
| filters | `kb_core::SearchFilters` | `SearchQuery` |
| `kb-config::Config.storage.vector_dir` | path | runtime |
| `VectorRecord[..]` | `kebab_core::VectorRecord` | `kebab-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kebab-embed-local` (`Embedder::embed` for query) |
| filters | `kebab_core::SearchFilters` | `SearchQuery` |
| `kebab-config::Config.storage.vector_dir` | path | runtime |
## Outputs
@@ -50,7 +50,7 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
|--------|------|------------|
| Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches |
| `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping |
| `Vec<VectorHit>` | `kb_core::VectorHit` | hybrid retriever (p3-4) |
| `Vec<VectorHit>` | `kebab_core::VectorHit` | hybrid retriever (p3-4) |
## Public surface (signatures only — no new types)
@@ -58,13 +58,13 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
pub struct LanceVectorStore { /* internal: connection + sqlite handle */ }
impl LanceVectorStore {
pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
}
impl kb_core::VectorStore for LanceVectorStore {
fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kb_core::IndexId>;
fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>;
fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result<Vec<kb_core::VectorHit>>;
impl kebab_core::VectorStore for LanceVectorStore {
fn ensure_table(&self, model: &kebab_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kebab_core::IndexId>;
fn upsert(&self, recs: &[kebab_core::VectorRecord]) -> anyhow::Result<()>;
fn search(&self, query_vec: &[f32], k: usize, filters: &kebab_core::SearchFilters) -> anyhow::Result<Vec<kebab_core::VectorHit>>;
}
```
@@ -116,12 +116,12 @@ impl kb_core::VectorStore for LanceVectorStore {
| determinism | same query vector + same data → same top-k order | tmp data_dir |
| snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` |
All tests under `cargo test -p kb-store-vector`.
All tests under `cargo test -p kebab-store-vector`.
## Definition of Done
- [ ] `cargo check -p kb-store-vector` passes
- [ ] `cargo test -p kb-store-vector` passes
- [ ] `cargo check -p kebab-store-vector` passes
- [ ] `cargo test -p kebab-store-vector` passes
- [ ] No imports outside Allowed dependencies
- [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch
- [ ] PR links design §5.6, §6.3, §7.2
@@ -130,7 +130,7 @@ All tests under `cargo test -p kb-store-vector`.
- IVF / PQ index tuning (P+).
- Image / multimodal vector tables (P6).
- `kb-app` orchestration of indexing jobs (`embed_index` facade method body).
- `kebab-app` orchestration of indexing jobs (`embed_index` facade method body).
## Risks / notes

@@ -1,12 +1,12 @@
---
phase: P3
component: kb-search (hybrid)
component: kebab-search (hybrid)
task_id: p3-4
title: "Hybrid Retriever (RRF) over lexical + vector"
status: completed
depends_on: [p2-2, p3-3]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings]
---
@@ -22,17 +22,17 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (for `LexicalRetriever`)
- `kb-store-vector` (for `LanceVectorStore`)
- `kb-embed` (trait only — for query embedding via `Embedder`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (for `LexicalRetriever`)
- `kebab-store-vector` (for `LanceVectorStore`)
- `kebab-embed` (trait only — for query embedding via `Embedder`)
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (`kebab-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
## Inputs
@@ -41,21 +41,21 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
| `LexicalRetriever` | trait object | constructed elsewhere |
| `LanceVectorStore` | trait object | constructed elsewhere |
| `Box<dyn Embedder>` | for query embedding | runtime-injected |
| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` |
| `kebab-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
| `SearchQuery` | `kebab_core::SearchQuery` | `kebab-app::search` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer |
| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer |
## Public surface (signatures only — no new types)
```rust
pub struct HybridRetriever {
lexical: std::sync::Arc<dyn kb_core::Retriever>,
vector: std::sync::Arc<dyn kb_core::Retriever>, // wrapper over LanceVectorStore + Embedder
lexical: std::sync::Arc<dyn kebab_core::Retriever>,
vector: std::sync::Arc<dyn kebab_core::Retriever>, // wrapper over LanceVectorStore + Embedder
fusion: FusionPolicy,
k: usize,
}
@@ -64,27 +64,27 @@ pub enum FusionPolicy { Rrf { k_rrf: u32 } }
impl HybridRetriever {
pub fn new(
config: &kb_config::Config,
lexical: std::sync::Arc<dyn kb_core::Retriever>,
vector: std::sync::Arc<dyn kb_core::Retriever>,
config: &kebab_config::Config,
lexical: std::sync::Arc<dyn kebab_core::Retriever>,
vector: std::sync::Arc<dyn kebab_core::Retriever>,
) -> Self;
}
impl kb_core::Retriever for HybridRetriever {
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
fn index_version(&self) -> kb_core::IndexVersion;
impl kebab_core::Retriever for HybridRetriever {
fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
fn index_version(&self) -> kebab_core::IndexVersion;
}
/// Wrapper that turns a VectorStore + Embedder into a Retriever.
pub struct VectorRetriever {
store: std::sync::Arc<dyn kb_core::VectorStore>,
embed: std::sync::Arc<dyn kb_core::Embedder>,
/* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */
store: std::sync::Arc<dyn kebab_core::VectorStore>,
embed: std::sync::Arc<dyn kebab_core::Embedder>,
/* heading_path/snippet enrichment hits SQLite via kebab-store-sqlite read accessor */
}
impl VectorRetriever {
pub fn new(store: std::sync::Arc<dyn kb_core::VectorStore>, embed: std::sync::Arc<dyn kb_core::Embedder>, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> Self;
pub fn new(store: std::sync::Arc<dyn kebab_core::VectorStore>, embed: std::sync::Arc<dyn kebab_core::Embedder>, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> Self;
}
impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
impl kebab_core::Retriever for VectorRetriever { /* per §7.2 */ }
```
## Behavior contract
@@ -124,12 +124,12 @@ impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
| determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB |
| snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` |
All tests under `cargo test -p kb-search hybrid`.
All tests under `cargo test -p kebab-search hybrid`.
## Definition of Done
- [ ] `cargo check -p kb-search` passes
- [ ] `cargo test -p kb-search hybrid` passes
- [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kebab-search hybrid` passes
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.7, §6.4 search, §0 Q3
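
For orientation, Reciprocal Rank Fusion over two ranked lists can be sketched as below. This is a minimal sketch under stated assumptions, not the `HybridRetriever` body: `rrf_fuse` is a hypothetical function, it works on bare chunk ids rather than `SearchHit`, ranks are assumed to start at 1, each id is assumed to appear at most once per list, and the id-ascending tie-break is an assumption added to satisfy the determinism requirement.

```rust
use std::collections::HashMap;

/// Fuses ranked id lists with Reciprocal Rank Fusion:
/// score(id) = sum over lists of 1 / (k_rrf + rank), with rank starting at 1.
fn rrf_fuse(lists: &[Vec<&str>], k_rrf: u32, k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            // `i` is zero-based, so the rank is `i + 1`.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k_rrf as f64 + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Score descending, then id ascending, so equal scores order deterministically
    // regardless of HashMap iteration order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(k);
    fused
}
```

An id that appears in both lists accumulates two reciprocal terms, so it tends to outrank an id that is first in only one list. The `k_rrf` parameter here plays the role of the `rrf_k` setting listed under Inputs.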

@@ -1,64 +1,64 @@
---
phase: P3
component: kb-app (facade wiring)
component: kebab-app (facade wiring)
task_id: p3-5
title: "Wire kb-app facade — ingest / search / list / inspect end-to-end"
title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end"
status: completed
depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
unblocks: [p4-3, p9-1, p9-2, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors]
---
# p3-5 — Wire kb-app facade
# p3-5 — Wire kebab-app facade
## Goal
Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it).
Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it).
## Why now / why this size
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
## Allowed dependencies
`kb-app` may depend on:
`kebab-app` may depend on:
- `kb-core`, `kb-config` (already)
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1)
- `kb-search` (P2-2, P3-4)
- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4)
- `kebab-core`, `kebab-config` (already)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1)
- `kebab-search` (P2-2, P3-4)
- `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4)
- `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing)
## Forbidden dependencies
- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed)
- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way)
- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed)
- `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed)
- `kebab-tui`, `kebab-desktop` (P9 — those *consume* `kebab-app`, not the other way)
- `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kb_core::SourceScope` | CLI |
| `SearchQuery` | `kb_core::SearchQuery` | CLI |
| `DocFilter` | `kb_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI |
| `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kebab_core::SourceScope` | CLI |
| `SearchQuery` | `kebab_core::SearchQuery` | CLI |
| `DocFilter` | `kebab_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) |
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) |
| `Vec<DocSummary>` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) |
| `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) |
| `Vec<DocSummary>` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) |
## Public surface (signatures only — no new types)
The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface:
The signatures already live on `kebab-app`. This task only swaps the bodies. Frozen surface:
```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
@@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHi
// `ask` and `init_workspace` / `doctor` stay as-is.
```
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kb-app`. Keep all newly-added types crate-private unless explicitly required by `kb-cli`.
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kebab-app`. Keep all newly-added types crate-private unless explicitly required by `kebab-cli`.
## Behavior contract
@@ -77,8 +77,8 @@ If a constructor helper is needed (e.g., a single `App::open(&Config)` that open
Pipeline per design §1.2:
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kb-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kb_parse_md::frontmatter::parse_frontmatter` + `kb_parse_md::blocks::parse_blocks` → `kb_normalize::build_canonical_document` → `kb_chunk::MdHeadingV1Chunker::chunk`.
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kebab-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kebab_parse_md::frontmatter::parse_frontmatter` + `kebab_parse_md::blocks::parse_blocks` → `kebab_normalize::build_canonical_document` → `kebab_chunk::MdHeadingV1Chunker::chunk`.
3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes``put_document``put_blocks``put_chunks`.
4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path).
5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`.
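
The batching in step 4 can be pictured with a small std-only sketch. It is not the `kebab-app` implementation: `embed_batch` is a hypothetical stand-in for the real embedder, and the vectors it returns are made up purely so the example runs.

```rust
/// Hypothetical stand-in for an embedder: one small vector per input text.
/// The real pipeline would call the configured embedder instead.
fn embed_batch(texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32, 0.0]).collect()
}

/// Embed all chunk texts in groups of at most `batch_size`.
/// Returns the vectors in input order plus the number of batches issued.
fn embed_in_batches(texts: &[&str], batch_size: usize) -> (Vec<Vec<f32>>, usize) {
    // `chunks(0)` panics, so a zero batch size is treated as 1 here (an assumption).
    let size = batch_size.max(1);
    let mut vectors = Vec::with_capacity(texts.len());
    let mut batches = 0;
    for batch in texts.chunks(size) {
        vectors.extend(embed_batch(batch));
        batches += 1;
    }
    (vectors, batches)
}

fn main() {
    let texts = ["alpha", "beta", "gamma", "delta", "epsilon"];
    let (vectors, batches) = embed_in_batches(&texts, 2);
    println!("{} vectors in {} batches", vectors.len(), batches);
}
```

Keeping the output in input order means each vector can be matched back to its chunk by index before the upsert.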
@@ -90,9 +90,9 @@ Errors: any per-file parse failure should be recorded as a `Warning` in `Provena
Per `query.mode`:
- `Lexical` → `kb_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kb_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kb_search::HybridRetriever::search` composing the above two.
- `Lexical` → `kebab_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kebab_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kebab_search::HybridRetriever::search` composing the above two.
Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, `Arc<dyn Embedder>` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation paths through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session.
@@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that
### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`
Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
### Lifecycle
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor.
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor.
`vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step.
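
A simplified sketch of the lexical-only behavior described above, assuming nothing about the real types: this `App` holds a plain `Option<String>` where the spec has `Option<Arc<dyn Embedder + Send + Sync>>`, and errors are `String` rather than `anyhow::Error` so the example stays std-only. The hint text is illustrative.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical, simplified `App`: only the optional embedder is modeled.
struct App {
    embedder: Option<String>,
}

impl App {
    /// `provider == "none"` means lexical-only mode, so no embedder is built.
    fn open(provider: &str) -> App {
        let embedder = if provider == "none" {
            None
        } else {
            Some(provider.to_string())
        };
        App { embedder }
    }

    /// Lexical search always works; Vector and Hybrid need the embedder.
    fn search(&self, mode: SearchMode, query: &str) -> Result<Vec<String>, String> {
        match (mode, &self.embedder) {
            (SearchMode::Lexical, _) => Ok(vec![format!("lexical hit for {query}")]),
            (_, Some(name)) => Ok(vec![format!("{mode:?} hit for {query} via {name}")]),
            (_, None) => Err(format!(
                "{mode:?} search needs embeddings; set models.embedding.provider to enable them"
            )),
        }
    }
}

fn main() {
    let app = App::open("none");
    println!("{:?}", app.search(SearchMode::Lexical, "rust"));
    println!("{:?}", app.search(SearchMode::Vector, "rust"));
    println!("{:?}", app.search(SearchMode::Hybrid, "rust"));
}
```

Matching on the pair `(mode, embedder)` keeps the "embeddings absent" case in one arm, so a new mode that needs vectors would get the same error without extra branching.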
### Versioning
- `parser_version` from `kb-parse-md` constant.
- `parser_version` from `kebab-parse-md` constant.
- `chunker_version` from `MdHeadingV1Chunker::chunker_version()`.
- `embedding_model` / `embedding_version` from the embedder.
- `index_version` from the retriever (or `Lexical`/`Vector` IndexId).
@@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output.
## Storage / wire effects
- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Reads on subsequent calls: same DB.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kb-cli` already has the wrappers.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kebab-cli` already has the wrappers.
## Test plan
| kind | description | fixture / data |
|------|-------------|----------------|
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config |
| integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` |
| integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` |
@@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output.
| integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config |
| integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` |
| integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` |
| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
| smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths.
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths.
All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
## Definition of Done
- [ ] `cargo check -p kb-app` passes.
- [ ] `cargo test -p kb-app` passes (default lane).
- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kb-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kb-cli -- list` returns the indexed documents.
- [ ] `cargo check -p kebab-app` passes.
- [ ] `cargo test -p kebab-app` passes (default lane).
- [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kebab-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kebab-cli -- list` returns the indexed documents.
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean.
- [ ] No imports outside Allowed dependencies.
- [ ] PR links design §1.2, §1.5, §1.6, §7.
## Out of scope
- `kb-app::ask` body — P4-3 (RAG pipeline).
- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kb index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
- `kebab-app::ask` body — P4-3 (RAG pipeline).
- `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kebab index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
- `--watch` mode — P+.
- TUI / desktop integration — P9 consumes the wired facade.
## Risks / notes
- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI.
- Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4).
- The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`.


@@ -1,12 +1,12 @@
---
phase: P4
component: kb-llm (trait crate)
component: kebab-llm (trait crate)
task_id: p4-1
title: "LanguageModel trait + GenerateRequest/TokenChunk"
status: completed
depends_on: [p0-1]
unblocks: [p4-2, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef]
---
@@ -14,16 +14,16 @@ contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q
## Goal
Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
Provide the `kebab-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
## Why now / why this size
`kb-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
`kebab-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `thiserror`
- `tracing`
@@ -31,13 +31,13 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
## Forbidden dependencies
- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`
- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| concrete adapter at runtime | `dyn LanguageModel` | p4-2+ |
## Outputs
@@ -45,12 +45,12 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | `Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send>` | RAG pipeline |
| `ModelRef` identity | `kb_core::ModelRef` | Answer.model |
| `ModelRef` identity | `kebab_core::ModelRef` | Answer.model |
## Public surface (signatures only — no new types)
```rust
pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
/// Test-only deterministic mock. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
@@ -59,12 +59,12 @@ pub struct MockLanguageModel {
pub provider: String,
pub context_tokens: usize,
pub canned_response: String, // emitted token-by-token
pub canned_finish: kb_core::FinishReason,
pub canned_usage: kb_core::TokenUsage,
pub canned_finish: kebab_core::FinishReason,
pub canned_usage: kebab_core::TokenUsage,
}
#[cfg(feature = "mock")]
impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
impl kebab_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
```
## Behavior contract
@@ -89,12 +89,12 @@ impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
| unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline |
| contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline |
All tests under `cargo test -p kb-llm`.
All tests under `cargo test -p kebab-llm`.
## Definition of Done
- [ ] `cargo check -p kb-llm` passes
- [ ] `cargo test -p kb-llm` passes
- [ ] `cargo check -p kebab-llm` passes
- [ ] `cargo test -p kebab-llm` passes
- [ ] No HTTP / async runtime deps present
- [ ] PR links design §7.2 LanguageModel, §0 Q5


@@ -1,12 +1,12 @@
---
phase: P4
component: kb-llm-local (Ollama adapter)
component: kebab-llm-local (Ollama adapter)
task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate"
status: completed
depends_on: [p4-1]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]
---
@@ -18,13 +18,13 @@ Implement `OllamaLanguageModel` against Ollama's local HTTP API (`POST /api/gene
## Why now / why this size
First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
First real LM. Required for `kebab ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-llm`
- `kebab-core`
- `kebab-config`
- `kebab-llm`
- `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }`
- `serde`, `serde_json`
- `tracing`
@@ -32,21 +32,21 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
## Forbidden dependencies
- `tokio`, `async-std`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.)
- `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `kebab-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process |
## Outputs
| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | per §7.2 | `kb-rag` |
| streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` |
| `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` |
## Public surface (signatures only — no new types)
@@ -55,14 +55,14 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }
impl OllamaLanguageModel {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::LanguageModel for OllamaLanguageModel {
fn model_ref(&self) -> kb_core::ModelRef;
impl kebab_core::LanguageModel for OllamaLanguageModel {
fn model_ref(&self) -> kebab_core::ModelRef;
fn context_tokens(&self) -> usize;
fn generate_stream(&self, req: kb_core::GenerateRequest)
-> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kb_core::TokenChunk>> + Send>>;
fn generate_stream(&self, req: kebab_core::GenerateRequest)
-> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
}
```
@@ -93,7 +93,7 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
- UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`.
- Determinism: with `temperature=0` and fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality.
- `model_ref().provider = "ollama"`, `dimensions = None`.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kb doctor` (separate task) to probe.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kebab doctor` (separate task) to probe.
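
The UTF-8 boundary rule above can be sketched with the standard library alone. `Utf8Buffer` is a hypothetical helper name, not something the spec defines, and the sketch covers only the incomplete-sequence case; bytes that are outright invalid would stay buffered here, so a real adapter would also need to check `Utf8Error::error_len`.

```rust
/// Accumulates raw bytes and releases only complete UTF-8 text.
#[derive(Default)]
struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Append `bytes` and return the longest valid UTF-8 prefix buffered so far.
    /// An incomplete trailing sequence stays in the buffer for the next call.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let valid_len = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        // Keep the unfinished tail, hand back the validated prefix.
        let rest = self.pending.split_off(valid_len);
        let ready = std::mem::replace(&mut self.pending, rest);
        String::from_utf8(ready).expect("prefix was validated above")
    }
}

fn main() {
    let mut buf = Utf8Buffer::default();
    // "é" is the two bytes 0xC3 0xA9; split it across two reads.
    let first = buf.push(&[b'h', 0xC3]);
    let second = buf.push(&[0xA9, b'!']);
    println!("{first:?} then {second:?}");
}
```

Emitting text only at validated boundaries means a token is never split in the middle of a multi-byte character, whatever the chunking of the underlying stream.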
## Storage / wire effects
@@ -112,12 +112,12 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
| determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
| `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in |
All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration runs via `cargo test -p kb-llm-local -- --ignored`.
All non-ignored tests under `cargo test -p kebab-llm-local`. Real-LM integration runs via `cargo test -p kebab-llm-local -- --ignored`.
## Definition of Done
- [ ] `cargo check -p kb-llm-local` passes
- [ ] `cargo test -p kb-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
- [ ] `cargo check -p kebab-llm-local` passes
- [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
- [ ] No async runtime present (uses `reqwest::blocking`)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.2, §0 Q5, §10
@@ -125,7 +125,7 @@ All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration ru
## Out of scope
- llama.cpp / candle adapters (P+).
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kb-embed-local` if requested later).
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later).
- Cancellation / abort tokens (P+).
- Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI).


@@ -1,12 +1,12 @@
---
phase: P4
component: kb-rag
component: kebab-rag
task_id: p4-3
title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate"
status: completed
depends_on: [p3-4, p4-2]
unblocks: [p5-1]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors]
---
@@ -22,11 +22,11 @@ This is the user-facing payoff. Splitting it further would couple too many inter
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-search` (Retriever trait object)
- `kb-llm` (LanguageModel trait object)
- `kb-store-sqlite` (read chunk full text/section + write `answers` row)
- `kebab-core`
- `kebab-config`
- `kebab-search` (Retriever trait object)
- `kebab-llm` (LanguageModel trait object)
- `kebab-store-sqlite` (read chunk full text/section + write `answers` row)
- `serde`, `serde_json`
- `regex` (for citation marker extraction)
- `time`
@@ -35,51 +35,51 @@ This is the user-facing payoff. Splitting it further would couple too many inter
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector` (only via Retriever trait), `kb-embed*` (only via Retriever), `kb-llm-local` (only via LanguageModel trait), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector` (only via Retriever trait), `kebab-embed*` (only via Retriever), `kebab-llm-local` (only via LanguageModel trait), `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `query: &str` | text | `kb-app::ask` |
| `query: &str` | text | `kebab-app::ask` |
| `AskOpts` | k, explain, mode, temperature, seed | CLI |
| `dyn Retriever` | hybrid retriever from p3-4 | runtime injection |
| `dyn LanguageModel` | from p4-2 (or mock) | runtime injection |
| `dyn DocumentStore` | for chunk full-text fetch | from p1-6 |
| `kb-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
| `kebab-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Answer` | `kb_core::Answer` | `kb-cli` printer, `answers` table |
| `Answer` | `kebab_core::Answer` | `kebab-cli` printer, `answers` table |
| `answers` table row | SQLite | history, eval |
## Public surface (signatures only — no new types)
```rust
pub struct RagPipeline {
retriever: std::sync::Arc<dyn kb_core::Retriever>,
llm: std::sync::Arc<dyn kb_core::LanguageModel>,
docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
config: kb_config::Config,
retriever: std::sync::Arc<dyn kebab_core::Retriever>,
llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
config: kebab_config::Config,
}
impl RagPipeline {
pub fn new(
config: kb_config::Config,
retriever: std::sync::Arc<dyn kb_core::Retriever>,
llm: std::sync::Arc<dyn kb_core::LanguageModel>,
docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
config: kebab_config::Config,
retriever: std::sync::Arc<dyn kebab_core::Retriever>,
llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
) -> Self;
pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
}
pub struct AskOpts {
pub k: usize,
pub explain: bool,
pub mode: kb_core::SearchMode,
pub mode: kebab_core::SearchMode,
pub temperature: Option<f32>,
pub seed: Option<u64>,
pub stream_sink: Option<std::sync::mpsc::Sender<String>>, // tty/UI token streaming
@@ -158,12 +158,12 @@ pub struct AskOpts {
| determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock |
| snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` |
All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
All tests under `cargo test -p kebab-rag` with no real Ollama (mock LM only).
## Definition of Done
- [ ] `cargo check -p kb-rag` passes
- [ ] `cargo test -p kb-rag` passes
- [ ] `cargo check -p kebab-rag` passes
- [ ] `cargo test -p kebab-rag` passes
- [ ] No imports outside Allowed dependencies
- [ ] All paths write an `answers` row
- [ ] Output JSON conforms to `answer.v1`
@@ -178,9 +178,9 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
## Risks / notes
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
- **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kb ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kebab eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
- **Post-merge fix (2026-05)**: kebab-cli's `Cmd::Ask` arm originally called bare `kebab_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kebab_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kebab ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
- `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink.
- `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match.
- Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped.
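
The `score_gate` note in the list above comes down to simple arithmetic, worked through here as a sketch. The bound `2/k_rrf` and the values `0.05` and `60` come from the note itself; dividing by the bound as the normalization, and `>=` as the gate comparison, are assumptions made for illustration, since the actual p3-4 formula is not shown here.

```rust
/// Upper bound of the raw hybrid fusion score, as stated in the note: 2 / k_rrf.
fn raw_upper_bound(k_rrf: f64) -> f64 {
    2.0 / k_rrf
}

/// One possible normalization to [0, 1]: divide by the upper bound.
/// This is an assumed formula, not necessarily the one p3-4 uses.
fn normalize(raw: f64, k_rrf: f64) -> f64 {
    (raw / raw_upper_bound(k_rrf)).clamp(0.0, 1.0)
}

/// Assumed gate semantics: a hit passes when its score reaches the gate.
fn passes_gate(score: f64, gate: f64) -> bool {
    score >= gate
}

fn main() {
    let k_rrf = 60.0;
    let gate = 0.05;
    let best_raw = raw_upper_bound(k_rrf);
    println!("best raw score = {best_raw:.4}");
    println!("raw passes gate: {}", passes_gate(best_raw, gate));
    println!(
        "normalized passes gate: {}",
        passes_gate(normalize(best_raw, k_rrf), gate)
    );
}
```

Even the best possible raw score stays below the gate, which is why every hybrid query was refused before the scores were normalized.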


@@ -1,12 +1,12 @@
---
phase: P5
component: kb-eval (runner)
component: kebab-eval (runner)
task_id: p5-1
title: "Golden query fixture loader + per-query runner"
status: completed
depends_on: [p4-3]
unblocks: [p5-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md]
---
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase ep
## Goal
Load `fixtures/golden_queries.yaml`, run each query through `kb-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
Load `fixtures/golden_queries.yaml`, run each query through `kebab-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
## Why now / why this size
@@ -22,10 +22,10 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app` (calls facade for search / ask)
- `kb-store-sqlite` (writes eval rows)
- `kebab-core`
- `kebab-config`
- `kebab-app` (calls facade for search / ask)
- `kebab-store-sqlite` (writes eval rows)
- `serde`, `serde_yaml`, `serde_json`
- `time`
- `tracing`
@@ -33,7 +33,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (all reached via `kb-app` facade only), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (all reached via `kebab-app` facade only), `kebab-tui`, `kebab-desktop`
## Inputs
@@ -41,7 +41,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|-------|------|--------|
| `fixtures/golden_queries.yaml` | YAML | repo-shipped |
| `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI |
| `kb-app` facade | search/ask | runtime |
| `kebab-app` facade | search/ask | runtime |
## Outputs
@@ -50,7 +50,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
| `eval_runs` row | SQLite | p5-2, history |
| `eval_query_results` rows | SQLite | p5-2 |
| `runs_dir/<run_id>/per_query.jsonl` | filesystem | external tools, audits |
| `EvalRun` struct | `kb_eval::EvalRun` | caller |
| `EvalRun` struct | `kebab_eval::EvalRun` | caller |
## Public surface (signatures only — no new types)
@@ -58,9 +58,9 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
pub struct GoldenQuery {
pub id: String,
pub query: String,
pub lang: kb_core::Lang,
pub expected_doc_ids: Vec<kb_core::DocumentId>,
pub expected_chunk_ids: Vec<kb_core::ChunkId>,
pub lang: kebab_core::Lang,
pub expected_doc_ids: Vec<kebab_core::DocumentId>,
pub expected_chunk_ids: Vec<kebab_core::ChunkId>,
pub must_contain: Vec<String>,
pub forbidden: Vec<String>,
pub difficulty: Option<String>,
@@ -68,7 +68,7 @@ pub struct GoldenQuery {
pub struct EvalRunOpts {
pub suite: String, // "golden" default
pub mode: kb_core::SearchMode,
pub mode: kebab_core::SearchMode,
pub with_rag: bool,
pub k: usize,
pub temperature: Option<f32>,
@@ -86,9 +86,9 @@ pub struct EvalRun {
pub struct QueryResult {
pub query_id: String,
pub query: String,
pub mode: kb_core::SearchMode,
pub hits_top_k: Vec<kb_core::SearchHit>,
pub answer: Option<kb_core::Answer>,
pub mode: kebab_core::SearchMode,
pub hits_top_k: Vec<kebab_core::SearchHit>,
pub answer: Option<kebab_core::Answer>,
pub elapsed_ms: u32,
pub error: Option<String>,
}
@@ -103,10 +103,10 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
- Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`).
- Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders.
- `run_eval`:
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KB_EVAL_GOLDEN`).
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KEBAB_EVAL_GOLDEN`).
- Generates `run_id = "run_" + ulid_lower()`.
- Captures `config_snapshot_json`: serialized `kb_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
- For each query: call `kb_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kb_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
- Captures `config_snapshot_json`: serialized `kebab_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
- For each query: call `kebab_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kebab_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
- Each `QueryResult` measured by elapsed wall-clock (ms).
- Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`.
- Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may differ in negligible token budget telemetry.
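
The per-query rules above (wall-clock timing, errors caught per query, empty hits on failure) can be sketched as follows. This is a simplified illustration under stated assumptions: `QueryOutcome`, `run_one` and `run_all` are hypothetical names, hits are reduced to plain strings, and the `search` closure stands in for the `kebab-app` facade call.

```rust
use std::time::Instant;

/// Simplified stand-in for the spec's `QueryResult` (hits reduced to id strings).
#[derive(Debug, PartialEq)]
struct QueryOutcome {
    query_id: String,
    hits_top_k: Vec<String>,
    elapsed_ms: u32,
    error: Option<String>,
}

/// Runs one query, measuring wall-clock time and converting a failure
/// into a recorded error instead of propagating it.
fn run_one<F>(query_id: &str, search: F) -> QueryOutcome
where
    F: FnOnce() -> Result<Vec<String>, String>,
{
    let started = Instant::now();
    let result = search();
    // Saturate rather than wrap if the duration exceeds u32::MAX milliseconds.
    let elapsed_ms = started.elapsed().as_millis().min(u32::MAX as u128) as u32;
    match result {
        Ok(hits) => QueryOutcome {
            query_id: query_id.to_string(),
            hits_top_k: hits,
            elapsed_ms,
            error: None,
        },
        // A failed query keeps the run alive: empty hits plus the message.
        Err(msg) => QueryOutcome {
            query_id: query_id.to_string(),
            hits_top_k: Vec::new(),
            elapsed_ms,
            error: Some(msg),
        },
    }
}

/// Runs every query in order; one failure never aborts the others.
fn run_all<F>(query_ids: &[&str], search: F) -> Vec<QueryOutcome>
where
    F: Fn(&str) -> Result<Vec<String>, String>,
{
    query_ids
        .iter()
        .map(|&id| run_one(id, || search(id)))
        .collect()
}
```

Because `elapsed_ms` is wall-clock time, a byte-identical `per_query.jsonl` would presumably require the timing field to be excluded or normalized before comparison; the spec text shown here does not say how that is handled.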
@@ -130,12 +130,12 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
| determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus |
| snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` |
All tests under `cargo test -p kb-eval runner`.
All tests under `cargo test -p kebab-eval runner`.
## Definition of Done
- [ ] `cargo check -p kb-eval` passes
- [ ] `cargo test -p kb-eval runner` passes
- [ ] `cargo check -p kebab-eval` passes
- [ ] `cargo test -p kebab-eval runner` passes
- [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.7

@@ -1,12 +1,12 @@
---
phase: P5
component: kb-eval (metrics + compare)
component: kebab-eval (metrics + compare)
task_id: p5-2
title: "Metrics computation + compare report"
status: completed
depends_on: [p5-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md]
---
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-eva
## Goal
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kb eval compare a b` that diffs two runs.
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kebab eval compare a b` that diffs two runs.
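
For orientation, two of the metrics named in the goal can be sketched with their conventional definitions. The spec's exact formulas are not part of this hunk, so the code below is an assumption: hit@k counts a query as a hit when any expected id appears in the top k results, and MRR averages the reciprocal rank of the first relevant result (0 when there is none). All function names are hypothetical.

```rust
/// 1-based rank of the first retrieved id that is in `expected`, if any.
fn first_relevant_rank(retrieved: &[&str], expected: &[&str]) -> Option<usize> {
    retrieved
        .iter()
        .position(|id| expected.contains(id))
        .map(|idx| idx + 1)
}

/// Fraction of queries with at least one expected id in the top `k`.
/// Each entry of `runs` is (retrieved ids in rank order, expected ids).
fn hit_at_k(runs: &[(Vec<&str>, Vec<&str>)], k: usize) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let hits = runs
        .iter()
        .filter(|(retrieved, expected)| {
            let top_k = &retrieved[..retrieved.len().min(k)];
            first_relevant_rank(top_k, expected).is_some()
        })
        .count();
    hits as f64 / runs.len() as f64
}

/// Mean reciprocal rank over all queries; a query with no relevant
/// result contributes 0.
fn mrr(runs: &[(Vec<&str>, Vec<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let sum: f64 = runs
        .iter()
        .map(|(retrieved, expected)| {
            first_relevant_rank(retrieved, expected).map_or(0.0, |rank| 1.0 / rank as f64)
        })
        .sum();
    sum / runs.len() as f64
}
```

Both functions are pure and depend only on stored rows, which matches the task's framing of metrics as computation that never calls the search facade.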
## Why now / why this size
@@ -22,16 +22,16 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (read eval rows, write `aggregate_json`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (read eval rows, write `aggregate_json`)
- `serde`, `serde_json`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-app`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-app`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -46,7 +46,7 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
| output | type | downstream |
|--------|------|------------|
| `eval_runs.aggregate_json` updated | SQLite | history, CI checks |
| `CompareReport` | `kb_eval::CompareReport` | `kb-cli` printer |
| `CompareReport` | `kebab_eval::CompareReport` | `kebab-cli` printer |
| optional `runs_dir/<run_id>/report.md` | filesystem | human-readable summary |
## Public surface (signatures only — no new types)
@@ -127,15 +127,15 @@ pub fn render_report_md(report: &CompareReport) -> String;
| determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline |
| snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` |
All tests under `cargo test -p kb-eval metrics`.
All tests under `cargo test -p kebab-eval metrics`.
## Definition of Done
- [ ] `cargo check -p kb-eval` passes
- [ ] `cargo test -p kb-eval metrics` passes
- [ ] `cargo check -p kebab-eval` passes
- [ ] `cargo test -p kebab-eval metrics` passes
- [ ] No imports outside Allowed dependencies
- [ ] `eval_runs.aggregate_json` always populated after `store_aggregate`
- [ ] `kb eval compare` CLI surface integrated via `kb-app` (call `compare_runs` + `render_report_md`)
- [ ] `kebab eval compare` CLI surface integrated via `kebab-app` (call `compare_runs` + `render_report_md`)
- [ ] PR links phase epic tasks/phase-5-evaluation.md
## Out of scope
@@ -172,11 +172,11 @@ same one this spec defines, the names / wiring just differ.
/ `compare_runs(a, b)` so integration tests can drive the pipeline
against a TempDir-backed `Config`. The no-arg forms wrap them with
`Config::load(None)`.
- **CLI surface lives on `kb-cli` directly, not via `kb-app`.** DoD
asks for `kb eval compare` to be reached "via kb-app", but `kb-app`
already depends on `kb-eval` (the P5-1 runner uses the App facade),
so routing the CLI through `kb-app` would form a cycle. `kb-cli`
`kb-eval` is wired directly; `kb-app` is unchanged.
- **CLI surface lives on `kebab-cli` directly, not via `kebab-app`.** DoD
asks for `kebab eval compare` to be reached "via kebab-app", but `kebab-app`
already depends on `kebab-eval` (the P5-1 runner uses the App facade),
so routing the CLI through `kebab-app` would form a cycle. `kebab-cli`
`kebab-eval` is wired directly; `kebab-app` is unchanged.
- **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines
only the field shape; we add `Deserialize` so the stored
`aggregate_json` can round-trip back into the type for follow-up
@@ -184,12 +184,12 @@ same one this spec defines, the names / wiring just differ.
- **`anyhow`** is used in `Result` returns since the rest of the
workspace already speaks anyhow; not in the spec's Allowed list but
matches every other crate.
- **`kb-eval` crate-level `kb-app` dep stays.** The crate already
depends on `kb-app` from P5-1 (the runner uses the `App` facade), so
- **`kebab-eval` crate-level `kebab-app` dep stays.** The crate already
depends on `kebab-app` from P5-1 (the runner uses the `App` facade), so
the Cargo.toml entry remains. The new modules (`metrics.rs`,
`compare.rs`) do not import `kb-app` themselves — they're behind the
`compare.rs`) do not import `kebab-app` themselves — they're behind the
same crate boundary as the runner, but the metric/compare *surface*
is `kb-app`-clean. Splitting the crate to avoid a transitive Cargo
is `kebab-app`-clean. Splitting the crate to avoid a transitive Cargo
edge would be churn for no behavior gain.
- **`citation_coverage` is intentionally weaker than the spec literal.**
Spec calls for "every citation resolves to a real chunk in the DB".

@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (image extractor + EXIF)
component: kebab-parse-image (image extractor + EXIF)
task_id: p6-1
title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p6-2, p6-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning]
---
@@ -22,8 +22,8 @@ Establishes the image-as-document contract and decouples extraction (asset → I
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `image = "0.25"` (decoding for size + format detect)
- `kamadak-exif` for EXIF
- `serde`, `serde_json`
@@ -33,31 +33,31 @@ Establishes the image-as-document contract and decouples extraction (asset → I
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libs, LLM libs
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libs, LLM libs
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | from `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | from `kebab-source-fs` |
| image bytes | `&[u8]` | filesystem |
| `parser_version` | `kb_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
| `parser_version` | `kebab_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (image-region chunker) → `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (image-region chunker) → `kebab-store-sqlite` |
## Public surface (signatures only — no new types)
```rust
pub struct ImageExtractor;
impl kb_core::Extractor for ImageExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Image(_)) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("image-meta-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for ImageExtractor {
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Image(_)) }
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("image-meta-v1".into()) }
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```
@@ -69,7 +69,7 @@ impl kb_core::Extractor for ImageExtractor {
- `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty.
- `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted.
- `metadata.user["dimensions"] = { "w": <u32>, "h": <u32>, "format": "<png|jpeg|...>" }`.
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kb-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kb_core::id_for_doc` and `kb_core::id_for_block` directly).
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kebab-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kebab_core::id_for_doc` and `kebab_core::id_for_block` directly).
- Failure modes:
- Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message.
- Unsupported format → `anyhow::Error` (caller skips).
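
The EXIF whitelist rule above ("missing tags omitted") can be sketched without any EXIF library. This is an illustration only: the tag spellings `GPSLatitude` / `GPSLongitude` are assumed for "GPS lat/lon", the input is modeled as already-decoded `(tag, value)` string pairs, and `whitelist_exif` is a hypothetical name rather than the crate's API.

```rust
use std::collections::BTreeMap;

/// Assumed spellings of the whitelisted tags from the behavior contract.
const EXIF_WHITELIST: [&str; 7] = [
    "DateTimeOriginal",
    "GPSLatitude",
    "GPSLongitude",
    "Make",
    "Model",
    "Orientation",
    "Software",
];

/// Keeps only whitelisted tags. Tags absent from `raw` are simply omitted.
/// A BTreeMap gives a stable key order, which helps snapshot tests.
fn whitelist_exif(raw: &[(&str, &str)]) -> BTreeMap<String, String> {
    raw.iter()
        .filter(|&&(tag, _)| EXIF_WHITELIST.contains(&tag))
        .map(|&(tag, value)| (tag.to_string(), value.to_string()))
        .collect()
}
```

An ordered map is a deliberate choice in this sketch: the test plan asks for a stable `CanonicalDocument` JSON snapshot, and a hash map's iteration order would not guarantee that.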
@@ -77,7 +77,7 @@ impl kb_core::Extractor for ImageExtractor {
## Storage / wire effects
- None directly (the caller persists via `kb-store-sqlite`).
- None directly (the caller persists via `kebab-store-sqlite`).
## Test plan
@@ -90,12 +90,12 @@ impl kb_core::Extractor for ImageExtractor {
| determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline |
| snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` |
All tests under `cargo test -p kb-parse-image`.
All tests under `cargo test -p kebab-parse-image`.
## Definition of Done
- [ ] `cargo check -p kb-parse-image` passes
- [ ] `cargo test -p kb-parse-image` passes
- [ ] `cargo check -p kebab-parse-image` passes
- [ ] `cargo test -p kebab-parse-image` passes
- [ ] No OCR/caption/embedding code present
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §9.1

@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (OCR adapter)
component: kebab-parse-image (OCR adapter)
task_id: p6-2
title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)"
status: planned
depends_on: [p6-1]
unblocks: [p6-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance]
---
@@ -22,9 +22,9 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-parse-image` (consumes its types)
- `kebab-core`
- `kebab-config`
- `kebab-parse-image` (consumes its types)
- `tesseract = "0.13"` (feature `tesseract`, default ON)
- For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep)
- `serde`, `serde_json`
@@ -34,21 +34,21 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| image bytes | `&[u8]` | from extractor |
| optional language hint | `kb_core::Lang` | metadata |
| `kb-config` OCR settings | engine name, languages | runtime |
| optional language hint | `kebab_core::Lang` | metadata |
| `kebab-config` OCR settings | engine name, languages | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `OcrText` | `kb_core::OcrText` | merged into `ImageRefBlock.ocr` |
| `OcrText` | `kebab_core::OcrText` | merged into `ImageRefBlock.ocr` |
## Public surface (signatures only — no new types)
@@ -56,11 +56,11 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
pub trait OcrEngine: Send + Sync {
fn engine_name(&self) -> &'static str;
fn engine_version(&self) -> String;
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::OcrText>;
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::OcrText>;
}
pub struct TesseractOcr { /* internal: lazy api handle */ }
impl TesseractOcr { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl OcrEngine for TesseractOcr { /* per trait */ }
#[cfg(feature = "apple-vision")]
@@ -71,8 +71,8 @@ impl OcrEngine for AppleVisionOcr { /* per trait */ }
pub fn apply_ocr(
engine: &dyn OcrEngine,
image_bytes: &[u8],
block: &mut kb_core::ImageRefBlock,
lang_hint: Option<&kb_core::Lang>,
block: &mut kebab_core::ImageRefBlock,
lang_hint: Option<&kebab_core::Lang>,
) -> anyhow::Result<()>;
```
@@ -85,7 +85,7 @@ pub fn apply_ocr(
- `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1).
- `engine = "tesseract"`, `engine_version = <tesseract version string>`. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR.
- Apple Vision sidecar (feature `apple-vision`):
- Spawn a small Swift binary `kb-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
- Spawn a small Swift binary `kebab-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
- Same threshold and `joined` rules as Tesseract. `engine = "apple-vision"`, `engine_version = sidecar's --version`.
- This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional).
- `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper).
@@ -109,12 +109,12 @@ pub fn apply_ocr(
| determinism | two runs of recognize on same input → identical OcrText | fixture |
| `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock |
All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required on CI host.
All tests under `cargo test -p kebab-parse-image ocr`. Tesseract install required on CI host.
## Definition of Done
- [ ] `cargo check -p kb-parse-image --features tesseract` passes
- [ ] `cargo test -p kb-parse-image ocr` passes
- [ ] `cargo check -p kebab-parse-image --features tesseract` passes
- [ ] `cargo test -p kebab-parse-image ocr` passes
- [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §3.7a, §9.1
@@ -129,5 +129,5 @@ All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required o
## Risks / notes
- Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kb-vision-ocr`.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kebab-vision-ocr`.
- Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune.

@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (caption adapter)
component: kebab-parse-image (caption adapter)
task_id: p6-3
title: "ModelCaption adapter (LanguageModel-driven, feature-gated)"
status: planned
depends_on: [p6-1, p4-2]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)]
---
@@ -22,10 +22,10 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-parse-image`
- `kb-llm` (LanguageModel trait)
- `kebab-core`
- `kebab-config`
- `kebab-parse-image`
- `kebab-llm` (LanguageModel trait)
- `base64`
- `serde`, `serde_json`
- `image` (resize for prompt cost control)
@@ -34,7 +34,7 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-llm-local` (only via trait), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-llm-local` (only via trait), `kebab-tui`, `kebab-desktop`
## Inputs
@@ -42,28 +42,28 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|-------|------|--------|
| image bytes | `&[u8]` | extractor |
| `dyn LanguageModel` (vision-capable) | runtime | injected |
| `kb-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
| `kebab-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `ModelCaption` | `kb_core::ModelCaption` | merged into `ImageRefBlock.caption` |
| `ModelCaption` | `kebab_core::ModelCaption` | merged into `ImageRefBlock.caption` |
## Public surface (signatures only — no new types)
```rust
pub fn caption_image(
llm: &dyn kb_core::LanguageModel,
llm: &dyn kebab_core::LanguageModel,
image_bytes: &[u8],
cfg: &kb_config::Config,
) -> anyhow::Result<kb_core::ModelCaption>;
cfg: &kebab_config::Config,
) -> anyhow::Result<kebab_core::ModelCaption>;
pub fn apply_caption(
llm: &dyn kb_core::LanguageModel,
llm: &dyn kebab_core::LanguageModel,
image_bytes: &[u8],
block: &mut kb_core::ImageRefBlock,
cfg: &kb_config::Config,
block: &mut kebab_core::ImageRefBlock,
cfg: &kebab_config::Config,
) -> anyhow::Result<()>;
```
@@ -86,7 +86,7 @@ pub fn apply_caption(
## Storage / wire effects
- None directly. Caller persists via `kb-store-sqlite`.
- None directly. Caller persists via `kebab-store-sqlite`.
## Test plan
@@ -99,12 +99,12 @@ pub fn apply_caption(
| unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image |
| determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline |
All tests under `cargo test -p kb-parse-image caption` with mock LM only.
All tests under `cargo test -p kebab-parse-image caption` with mock LM only.
## Definition of Done
- [ ] `cargo check -p kb-parse-image --features caption` passes
- [ ] `cargo test -p kb-parse-image caption` passes
- [ ] `cargo check -p kebab-parse-image --features caption` passes
- [ ] `cargo test -p kebab-parse-image caption` passes
- [ ] No imports outside Allowed dependencies
- [ ] Feature default OFF; only on when user opts in via config
- [ ] PR links design §3.4 ImageRefBlock.caption, §9.1

@@ -1,12 +1,12 @@
---
phase: P7
component: kb-parse-pdf (text extractor)
component: kebab-parse-pdf (text extractor)
task_id: p7-1
title: "Text PDF extractor → CanonicalDocument with page-level blocks"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p7-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning]
---
@@ -22,8 +22,8 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `pdf-extract = "0.7"` (or current stable)
- `lopdf = "0.32"` for page metadata (count, optional title from /Info)
- `serde`, `serde_json`
@@ -33,30 +33,30 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| PDF bytes | `&[u8]` | filesystem |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`pdf-page-v1` chunker in p7-2) |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`pdf-page-v1` chunker in p7-2) |
## Public surface (signatures only — no new types)
```rust
pub struct PdfTextExtractor;
impl kb_core::Extractor for PdfTextExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Pdf) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("pdf-text-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for PdfTextExtractor {
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Pdf) }
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("pdf-text-v1".into()) }
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```
@@ -100,12 +100,12 @@ impl kb_core::Extractor for PdfTextExtractor {
| determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline |
| snapshot | CanonicalDocument JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` |
All tests under `cargo test -p kb-parse-pdf`.
All tests under `cargo test -p kebab-parse-pdf`.
## Definition of Done
- [ ] `cargo check -p kb-parse-pdf` passes
- [ ] `cargo test -p kb-parse-pdf` passes
- [ ] `cargo check -p kebab-parse-pdf` passes
- [ ] `cargo test -p kebab-parse-pdf` passes
- [ ] No OCR / LLM code present
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4 SourceSpan::Page, §9.2

@@ -1,12 +1,12 @@
---
phase: P7
component: kb-chunk (pdf-page-v1)
component: kebab-chunk (pdf-page-v1)
task_id: p7-2
title: "PDF page-aware chunker (pdf-page-v1)"
status: planned
depends_on: [p7-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
---
@@ -22,8 +22,8 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`, `serde_json`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf` (consumes `CanonicalDocument` via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf` (consumes `CanonicalDocument` via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kb_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kebab_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, `kb-embed*` |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, `kebab-embed*` |
## Public surface (signatures only — no new types)
```rust
pub struct PdfPageV1Chunker;
impl kb_core::Chunker for PdfPageV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("pdf-page-v1".into()) }
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for PdfPageV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("pdf-page-v1".into()) }
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -62,7 +62,7 @@ impl kb_core::Chunker for PdfPageV1Chunker {
## Behavior contract
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kb-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kebab-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- For each page block (1 block per page after p7-1):
- If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page.
- Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix.
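
The page-splitting rule above can be sketched as follows. This is a minimal illustration, not the `kebab-chunk` implementation: `group_paragraphs` and `tail_on_char_boundary` are hypothetical names, only the `\n\n` split is shown (the sentence-punctuation split is omitted), and the 4-bytes-per-token proxy is taken from the contract.

```rust
/// Groups the paragraphs of one page into chunks under a byte budget.
/// Assumption: 4 bytes per token, as in the behavior contract.
pub fn group_paragraphs(text: &str, target_tokens: usize, overlap_tokens: usize) -> Vec<String> {
    let budget = target_tokens * 4;
    let overlap = overlap_tokens * 4;

    // Small page: emit one chunk for the entire page.
    if text.len() <= budget {
        return vec![text.to_string()];
    }

    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();

    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // Close the running chunk when this paragraph would push it past the budget,
        // and seed the next chunk with the trailing overlap bytes.
        if !current.is_empty() && current.len() + 2 + para.len() > budget {
            let tail = tail_on_char_boundary(&current, overlap).to_string();
            chunks.push(std::mem::take(&mut current));
            current = tail;
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Returns at most `max_bytes` from the end of `s`, never cutting a UTF-8 character.
fn tail_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    let mut start = s.len().saturating_sub(max_bytes);
    while !s.is_char_boundary(start) {
        start += 1;
    }
    &s[start..]
}
```

Because the overlap is measured in bytes, the tail is moved forward to the next character boundary; otherwise slicing Korean or other multi-byte text could panic.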
@@ -91,12 +91,12 @@ impl kb_core::Chunker for PdfPageV1Chunker {
| determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) |
All tests under `cargo test -p kb-chunk pdf`.
All tests under `cargo test -p kebab-chunk pdf`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kb-chunk pdf` passes
- [ ] `cargo check -p kebab-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kebab-chunk pdf` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §0 Q3, §9

@@ -1,12 +1,12 @@
---
phase: P8
component: kb-parse-audio (whisper adapter)
component: kebab-parse-audio (whisper adapter)
task_id: p8-1
title: "Audio Extractor + Transcriber trait + whisper.cpp adapter"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p8-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning]
---
@@ -22,8 +22,8 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `whisper-rs = "0.13"` (or current stable)
- `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job.
- `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler.
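
The "pre-mix to mono via simple averaging" step could look like the sketch below. `downmix_to_mono` is a hypothetical helper name; it assumes the decoder yields interleaved f32 samples, as the symphonia bullet describes, and it leaves resampling to rubato.

```rust
/// Averages each interleaved frame (one sample per channel) into a single mono sample.
/// A trailing partial frame, if any, is dropped by `chunks_exact`.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

The mono buffer produced here is what would then be fed to the resampler configured with one channel.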
@@ -34,21 +34,21 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| audio bytes | `&[u8]` | filesystem |
| `kb-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
| `kebab-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`audio-segment-v1` chunker in p8-2) |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`audio-segment-v1` chunker in p8-2) |
## Public surface (signatures only — no new types)
@@ -56,19 +56,19 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
pub trait Transcriber: Send + Sync {
fn engine(&self) -> &'static str;
fn engine_version(&self) -> String;
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::Transcript>;
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::Transcript>;
}
pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ }
impl WhisperCppTranscriber { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
impl WhisperCppTranscriber { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl Transcriber for WhisperCppTranscriber { /* per trait */ }
pub struct AudioExtractor { transcriber: std::sync::Arc<dyn Transcriber> }
impl AudioExtractor { pub fn new(transcriber: std::sync::Arc<dyn Transcriber>) -> Self; }
impl kb_core::Extractor for AudioExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Audio(_)) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("audio-whisper-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for AudioExtractor {
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Audio(_)) }
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("audio-whisper-v1".into()) }
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```
@@ -87,7 +87,7 @@ impl kb_core::Extractor for AudioExtractor {
- `provenance` events: `Discovered`, `Parsed`, `Transcribed`.
- `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`.
- `WhisperCppTranscriber`:
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kb/models/whisper/ggml-large-v3.bin`).
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kebab/models/whisper/ggml-large-v3.bin`).
- Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic.
- Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments.
- `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+).
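
A rough sketch of the chunked streaming and aggregation described above, using only the standard library. `Segment` is a local stand-in that mirrors the `start_ms` / `end_ms` / `text` / `confidence` fields named in the contract; `windows`, `shift`, and `mean_confidence` are hypothetical helpers, and the actual whisper call is left out.

```rust
/// whisper.cpp input rate, per the contract (16 kHz mono f32).
const SAMPLE_RATE_HZ: usize = 16_000;

/// Local stand-in for a transcript segment (not the kebab_core type).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
    pub confidence: Option<f32>,
}

/// Splits PCM into windows of `chunk_seconds`, each paired with its offset in milliseconds.
pub fn windows(pcm: &[f32], chunk_seconds: usize) -> Vec<(u64, &[f32])> {
    assert!(chunk_seconds > 0, "chunk_seconds must be non-zero");
    pcm.chunks(chunk_seconds * SAMPLE_RATE_HZ)
        .enumerate()
        .map(|(i, window)| ((i * chunk_seconds * 1000) as u64, window))
        .collect()
}

/// Shifts window-relative timestamps so they are relative to the whole recording.
pub fn shift(segments: Vec<Segment>, offset_ms: u64) -> Vec<Segment> {
    segments
        .into_iter()
        .map(|mut s| {
            s.start_ms += offset_ms;
            s.end_ms += offset_ms;
            s
        })
        .collect()
}

/// Averages per-token probabilities into one segment confidence; `None` when there are no tokens.
pub fn mean_confidence(token_probs: &[f32]) -> Option<f32> {
    if token_probs.is_empty() {
        None
    } else {
        Some(token_probs.iter().sum::<f32>() / token_probs.len() as f32)
    }
}
```

Each window would be transcribed on its own, its segments shifted by the window offset, and the results appended in order, so memory stays bounded by one window.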
@@ -115,12 +115,12 @@ impl kb_core::Extractor for AudioExtractor {
| `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model |
| snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` |
All tests under `cargo test -p kb-parse-audio`. Mark slow/large-model tests `#[ignore]`.
All tests under `cargo test -p kebab-parse-audio`. Mark slow/large-model tests `#[ignore]`.
## Definition of Done
- [ ] `cargo check -p kb-parse-audio` passes
- [ ] `cargo test -p kb-parse-audio` passes (excluding `#[ignore]`)
- [ ] `cargo check -p kebab-parse-audio` passes
- [ ] `cargo test -p kebab-parse-audio` passes (excluding `#[ignore]`)
- [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR)
- [ ] First-run model download path documented (NOT performed by code; user responsibility)
- [ ] PR links design §3.4, §3.7a, §9.3

@@ -1,12 +1,12 @@
---
phase: P8
component: kb-chunk (audio-segment-v1)
component: kebab-chunk (audio-segment-v1)
task_id: p8-2
title: "Audio segment chunker (audio-segment-v1)"
status: planned
depends_on: [p8-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
---
@@ -22,8 +22,8 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`, `serde_json`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (consumes via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (consumes via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kb_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kebab_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, embedders |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, embedders |
## Public surface (signatures only — no new types)
```rust
pub struct AudioSegmentV1Chunker;
impl kb_core::Chunker for AudioSegmentV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("audio-segment-v1".into()) }
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for AudioSegmentV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("audio-segment-v1".into()) }
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -92,12 +92,12 @@ impl kb_core::Chunker for AudioSegmentV1Chunker {
| determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) |
All tests under `cargo test -p kb-chunk audio`.
All tests under `cargo test -p kebab-chunk audio`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kb-chunk audio` passes
- [ ] `cargo check -p kebab-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kebab-chunk audio` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (library view)
component: kebab-tui (library view)
task_id: p9-1
title: "Ratatui library list view + tag filter"
status: planned
depends_on: [p1-6]
unblocks: [p9-2, p9-3, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings]
---
@@ -14,7 +14,7 @@ contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §
## Goal
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kb-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kebab-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
## Why now / why this size
@@ -22,9 +22,9 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app` (facade — the only crate this binary touches besides `kb-core`/`kb-config`)
- `kebab-core`
- `kebab-config`
- `kebab-app` (facade — the only crate this binary touches besides `kebab-core`/`kebab-config`)
- `ratatui = "0.28"`
- `crossterm`
- `tracing`
@@ -32,15 +32,15 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — this is the design §8 boundary)
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — this is the design §8 boundary)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::list_docs(filter)` | facade call | runtime |
| `kebab-app::list_docs(filter)` | facade call | runtime |
| keyboard events | `crossterm` | terminal |
| `kb-config::Config` | runtime | env / file |
| `kebab-config::Config` | runtime | env / file |
## Outputs
@@ -57,7 +57,7 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
// state in WITHOUT modifying the App struct definition. This avoids merge
// conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`.
pub struct App {
pub config: kb_config::Config,
pub config: kebab_config::Config,
pub focus: Pane,
pub library: LibraryState, // owned by p9-1
pub search: Option<SearchState>, // populated by p9-2 (None until that crate links in)
@@ -73,7 +73,7 @@ pub struct AskState; // body filled by p9-3
pub struct InspectState; // body filled by p9-4
impl App {
pub fn new(config: kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: kebab_config::Config) -> anyhow::Result<Self>;
pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit
}
@@ -102,12 +102,12 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
- `Enter` → switch to Inspect pane (p9-4) on selected doc
- `q` or `Esc` → quit
- All facade calls run on the main thread (no async). For long calls, render a "loading…" state and call from a worker thread; bridge via `mpsc::channel` (this task may keep things synchronous and accept brief UI hangs for v1).
- Logging: `tracing` initialized to a file under `~/.local/state/kb/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Logging: `tracing` initialized to a file under `~/.local/state/kebab/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from `anyhow::Error` chain; press any key to dismiss.
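
If the optional worker-thread route is taken for long facade calls, the `mpsc::channel` bridge might look roughly like this. `Load` is a hypothetical type, and errors are carried as `String` only to keep the sketch free of external crates; the real pane would presumably carry `anyhow::Error`.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

/// State of one background facade call, polled once per render frame.
pub enum Load<T> {
    Idle,
    Loading(Receiver<Result<T, String>>),
    Ready(T),
    Failed(String),
}

impl<T: Send + 'static> Load<T> {
    /// Runs `job` on a worker thread and returns immediately in the `Loading` state.
    pub fn start<F>(job: F) -> Self
    where
        F: FnOnce() -> Result<T, String> + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // The receiver may already be gone if the user quit; ignore that.
            let _ = tx.send(job());
        });
        Load::Loading(rx)
    }

    /// Non-blocking: moves to `Ready` or `Failed` once the worker has reported.
    pub fn poll(&mut self) {
        let next = match self {
            Load::Loading(rx) => match rx.try_recv() {
                Ok(Ok(value)) => Load::Ready(value),
                Ok(Err(msg)) => Load::Failed(msg),
                Err(TryRecvError::Empty) => return,
                Err(TryRecvError::Disconnected) => {
                    Load::Failed("worker exited without a result".to_string())
                }
            },
            _ => return,
        };
        *self = next;
    }
}
```

While the state is `Loading`, the pane would draw the "loading…" placeholder; `Failed` maps onto the error popup.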
## Storage / wire effects
- Reads: `kb-app::list_docs` only.
- Reads: `kebab-app::list_docs` only.
- Writes: none.
## Test plan
@@ -118,16 +118,16 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
| unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline |
| snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline |
| unit | error popup renders without panic on injected `anyhow::Error` | inline |
| integration | mocked `kb-app::list_docs` returning N docs renders all rows | inline |
| integration | mocked `kebab-app::list_docs` returning N docs renders all rows | inline |
All tests under `cargo test -p kb-tui library`.
All tests under `cargo test -p kebab-tui library`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui library` passes
- [ ] No imports outside `kb-core`, `kb-config`, `kb-app`
- [ ] `kb tui` (or `kb` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui library` passes
- [ ] No imports outside `kebab-core`, `kebab-config`, `kebab-app`
- [ ] `kebab tui` (or `kebab` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] PR links design §8 module boundary, report §16.2 (TUI epic)
## Out of scope

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (search pane)
component: kebab-tui (search pane)
task_id: p9-2
title: "TUI Search pane: input + result list + preview + editor jump"
status: planned
depends_on: [p2-2, p3-4, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
---
@@ -14,7 +14,7 @@ contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
## Goal
Add a Search pane to the TUI that drives `kb-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
Add a Search pane to the TUI that drives `kebab-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
## Why now / why this size
@@ -22,25 +22,25 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::search(query)` | facade | runtime |
| `kebab-app::search(query)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
| selected hit's citation | `kb_core::Citation` | App state |
| selected hit's citation | `kebab_core::Citation` | App state |
## Outputs
@@ -54,16 +54,16 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
```rust
pub fn render_search<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
pub fn jump_to_citation(citation: &kb_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
pub fn jump_to_citation(citation: &kebab_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
```
This task fills the body of `kb_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
This task fills the body of `kebab_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
```rust
pub struct SearchState {
pub input: String,
pub mode: kb_core::SearchMode,
pub hits: Vec<kb_core::SearchHit>,
pub mode: kebab_core::SearchMode,
pub hits: Vec<kebab_core::SearchHit>,
pub selected_hit: usize,
pub last_query_at: Option<time::OffsetDateTime>, // debounce timer
}
@@ -73,7 +73,7 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
## Behavior contract
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kb-app::inspect_chunk`).
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kebab-app::inspect_chunk`).
- Key bindings (Search pane):
- typing → updates `search_input`; debounced (200 ms) re-search
- `Tab` → cycles `search_mode` Lexical → Vector → Hybrid → Lexical
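
The `Tab` cycle and the 200 ms debounce can be sketched as below. `SearchMode` here is a local stand-in for `kebab_core::SearchMode`, `should_search` is a hypothetical helper, and `std::time::Instant` is used instead of the `time::OffsetDateTime` field in `SearchState` purely to keep the example dependency-free.

```rust
use std::time::{Duration, Instant};

/// Local stand-in for the core search mode enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

impl SearchMode {
    /// `Tab` order from the contract: Lexical → Vector → Hybrid → Lexical.
    pub fn next(self) -> Self {
        match self {
            SearchMode::Lexical => SearchMode::Vector,
            SearchMode::Vector => SearchMode::Hybrid,
            SearchMode::Hybrid => SearchMode::Lexical,
        }
    }
}

/// Quiet period after the last keystroke before a re-search fires.
pub const DEBOUNCE: Duration = Duration::from_millis(200);

/// True once at least `DEBOUNCE` has passed since the last keystroke.
pub fn should_search(last_keystroke_at: Option<Instant>, now: Instant) -> bool {
    match last_keystroke_at {
        Some(at) => now.duration_since(at) >= DEBOUNCE,
        None => false,
    }
}
```

Passing `now` in as a parameter keeps the debounce check testable without sleeping.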
@@ -104,14 +104,14 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
| unit | `j` / `k` move selection within bounds | inline |
| unit | `jump_to_citation` for `Line` builds `+<line> <path>` command (assert via mocked Command runner) | inline |
| snapshot | rendered Search pane with 3 hits + preview stable | TestBackend |
| integration | mocked `kb-app::search` returning fixture hits drives render | inline |
| integration | mocked `kebab-app::search` returning fixture hits drives render | inline |
All tests under `cargo test -p kb-tui search`.
All tests under `cargo test -p kebab-tui search`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui search` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui search` passes
- [ ] `g` keybinding launches `$EDITOR` with correct `+<line>` argument (manual smoke against vim)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §1.5/1.6, §3.7
@@ -125,5 +125,5 @@ All tests under `cargo test -p kb-tui search`.
## Risks / notes
- Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard).
- Different editors take different jump syntaxes. Provide an env override `KB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Different editors take different jump syntaxes. Provide an env override `KEBAB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template).
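
One way the jump-format override could translate into editor arguments is sketched here. `editor_args` is a hypothetical helper; only the `"vim"` value and the `+<line> <path>` shape come from this spec, while the `"vscode"` branch is an illustrative assumption (it relies on VS Code's `--goto <path>:<line>` flag).

```rust
/// Builds the argument list for the editor process from a jump format name.
pub fn editor_args(format: &str, path: &str, line: u32) -> Vec<String> {
    match format {
        // Assumed extra format, not part of the spec.
        "vscode" => vec!["--goto".to_string(), format!("{path}:{line}")],
        // "vim" and any unknown value fall back to the `+<line> <path>` form.
        _ => vec![format!("+{line}"), path.to_string()],
    }
}
```

The resulting vector would be handed to `std::process::Command::new(editor).args(...)` after raw mode has been suspended.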

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (ask pane)
component: kebab-tui (ask pane)
task_id: p9-3
title: "TUI Ask pane: streaming answer + citation links + --explain toggle"
status: planned
depends_on: [p4-3, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.11.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
---
@@ -14,7 +14,7 @@ contract_sections: [§1.11.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
## Goal
Add an Ask pane that calls `kb-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
Add an Ask pane that calls `kebab-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
## Why now / why this size
@@ -22,23 +22,23 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::ask(query, AskOpts)` | facade | runtime |
| `kebab-app::ask(query, AskOpts)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
## Outputs
@@ -46,7 +46,7 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
| output | type | downstream |
|--------|------|------------|
| Ratatui Ask pane render | terminal | user |
| `kb-app::ask` invocation with streaming closure | facade | RAG pipeline |
| `kebab-app::ask` invocation with streaming closure | facade | RAG pipeline |
## Public surface (signatures only — no new types)
@@ -55,7 +55,7 @@ pub fn render_ask<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ra
pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
```
This task fills the body of `kb_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
This task fills the body of `kebab_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
```rust
pub struct AskState {
@@ -63,8 +63,8 @@ pub struct AskState {
pub explain: bool,
pub streaming: bool,
pub partial: String,
pub answer: Option<kb_core::Answer>,
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kb_core::Answer>>>,
pub answer: Option<kebab_core::Answer>,
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kebab_core::Answer>>>,
pub rx: Option<std::sync::mpsc::Receiver<String>>,
}
```
@@ -74,7 +74,7 @@ pub struct AskState {
## Behavior contract
- Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`).
- Submission: `Enter` triggers a worker thread that calls `kb-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
- Submission: `Enter` triggers a worker thread that calls `kebab-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
- Streaming: while `ask_streaming = true`, the Answer area shows `ask_partial` and a small "▍" cursor. When the worker finishes, `ask_answer` is populated and the citations panel switches to the final list.
- Refusal rendering:
- `grounded = false` and `refusal_reason = ScoreGate` → render the answer (which is the human-friendly "근거 부족…" message), citations show "가까운 후보".
@@ -85,12 +85,12 @@ pub struct AskState {
- `e` → toggle `ask_explain`; resubmit on next `Enter`. While explain ON, citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary.
- `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped).
- `j` / `k` → scroll the answer area when oversized.
- All facade calls stay within `kb-app::ask` — never reach into `kb-rag` directly.
- All facade calls stay within `kebab-app::ask` — never reach into `kebab-rag` directly.
- Errors render as a popup overlay; do not crash the pane.
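
The streaming hand-off described in this contract (worker holds `tx`, the pane drains `rx.try_iter()` each frame) can be illustrated with the standard library alone. `fake_ask` is a stand-in for the `kebab-app::ask` facade call, and `AskStream` / `start_ask` are hypothetical names; the real state lives in `AskState`.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Pane-side view of an in-flight answer.
pub struct AskStream {
    pub partial: String,
    pub rx: Receiver<String>,
}

impl AskStream {
    /// Called once per render frame; appends whatever tokens have arrived, never blocks.
    pub fn drain(&mut self) {
        for token in self.rx.try_iter() {
            self.partial.push_str(&token);
        }
    }
}

/// Stand-in for the facade call: pushes tokens into the sink and returns the full answer.
pub fn fake_ask(tokens: Vec<String>, sink: Sender<String>) -> String {
    let mut full = String::new();
    for token in tokens {
        full.push_str(&token);
        // If the pane dropped the receiver (user pressed Esc), keep going quietly.
        let _ = sink.send(token);
    }
    full
}

/// Spawns the worker and returns the pane-side state plus the join handle.
pub fn start_ask(tokens: Vec<String>) -> (AskStream, JoinHandle<String>) {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || fake_ask(tokens, tx));
    (AskStream { partial: String::new(), rx }, worker)
}
```

Ignoring the send error matches the best-effort cancellation rule: the worker finishes, and its output is simply dropped.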
## Storage / wire effects
- Reads/writes via `kb-app::ask` which itself writes the `answers` row in `kb.sqlite`. The pane has no direct DB access.
- Reads/writes via `kebab-app::ask` which itself writes the `answers` row in `kebab.sqlite`. The pane has no direct DB access.
## Test plan
@@ -102,14 +102,14 @@ pub struct AskState {
| unit | refusal answer renders without citations panel index errors | inline |
| snapshot | rendered Ask pane mid-stream is stable | TestBackend |
| snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend |
| integration | mocked `kb-app::ask` returning a canned `Answer` populates final state correctly | inline |
| integration | mocked `kebab-app::ask` returning a canned `Answer` populates final state correctly | inline |
All tests under `cargo test -p kb-tui ask`.
All tests under `cargo test -p kebab-tui ask`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui ask` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui ask` passes
- [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`)
- [ ] PR links design §1.11.4, §2.3

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (inspect pane)
component: kebab-tui (inspect pane)
task_id: p9-4
title: "TUI Inspect pane: document & chunk detail render"
status: planned
depends_on: [p1-6, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection]
---
@@ -22,24 +22,24 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::inspect_doc(id)` | facade | runtime |
| `kb-app::inspect_chunk(id)` | facade | runtime |
| `kebab-app::inspect_doc(id)` | facade | runtime |
| `kebab-app::inspect_chunk(id)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
## Outputs
@@ -51,19 +51,19 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Public surface (signatures only — no new types)
```rust
pub enum InspectTarget { Doc(kb_core::DocumentId), Chunk(kb_core::ChunkId) }
pub enum InspectTarget { Doc(kebab_core::DocumentId), Chunk(kebab_core::ChunkId) }
pub fn render_inspect<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
```
This task fills the body of `kb_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
This task fills the body of `kebab_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
```rust
pub struct InspectState {
pub target: Option<InspectTarget>,
pub doc: Option<kb_core::CanonicalDocument>,
pub chunk: Option<kb_core::Chunk>,
pub doc: Option<kebab_core::CanonicalDocument>,
pub chunk: Option<kebab_core::Chunk>,
pub collapsed: std::collections::HashSet<&'static str>,
pub scroll: u16,
}
@@ -89,7 +89,7 @@ pub struct InspectState {
- `c` → collapse / expand currently focused section (focus is implicit by current scroll position; v1 may simplify by toggling all sections)
- `Esc` → return to previous pane (Library or Search)
- `Enter` → no-op (Inspect is terminal — no editor jump here; users use Search pane for jump)
- Loading: while `kb-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
- Loading: while `kebab-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
- Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`.
## Storage / wire effects
@@ -100,18 +100,18 @@ pub struct InspectState {
| kind | description | fixture / data |
|------|-------------|----------------|
| unit | switching to InspectTarget::Doc triggers `kb-app::inspect_doc` once | inline mock |
| unit | switching to InspectTarget::Doc triggers `kebab-app::inspect_doc` once | inline mock |
| unit | scroll bounded by content height | inline |
| unit | collapse toggle via `c` flips state | inline |
| snapshot | doc-view rendered for fixture stable | TestBackend + fixture |
| snapshot | chunk-view rendered for fixture stable | TestBackend + fixture |
All tests under `cargo test -p kb-tui inspect`.
All tests under `cargo test -p kebab-tui inspect`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui inspect` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui inspect` passes
- [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library
- [ ] PR links design §3.5, §2.5, §2.6

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-desktop (Tauri)
component: kebab-desktop (Tauri)
task_id: p9-5
title: "Tauri desktop app: backend commands wrapping kb-app + multimodal source viewer"
title: "Tauri desktop app: backend commands wrapping kebab-app + multimodal source viewer"
status: planned
depends_on: [p9-1, p9-2, p9-3, p9-4]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries]
---
@@ -14,23 +14,23 @@ contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), desig
## Goal
Stand up a Tauri 2.x app (`kb-desktop` crate as backend, `kb-desktop-frontend/` as web assets) whose Tauri commands wrap `kb-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
Stand up a Tauri 2.x app (`kebab-desktop` crate as backend, `kebab-desktop-frontend/` as web assets) whose Tauri commands wrap `kebab-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
## Why now / why this size
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kb-app`; no new business logic.
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kebab-app`; no new business logic.
## Allowed dependencies
- backend (`kb-desktop`):
- `kb-core`
- `kb-config`
- `kb-app`
- backend (`kebab-desktop`):
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `tauri = "2"` + `tauri-build`
- `serde`, `serde_json`
- `tracing`
- `thiserror`
- frontend (`kb-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
- frontend (`kebab-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
- PDF rendering: `pdfjs-dist`
- Markdown rendering: `marked` + `dompurify`
- Audio: HTML `<audio>` with custom segment overlay
@@ -38,7 +38,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — design §8).
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — design §8).
- **No native PDF render backend** (no `pdfium`, no `mupdf`, no `poppler`). PDF rendering lives entirely in the frontend (`pdfjs-dist`). Adding any of these would (a) bloat the bundle 100+ MB, (b) require frozen-design amendment, and (c) double the path-containment surface.
## Inputs
@@ -46,7 +46,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| input | type | source |
|-------|------|--------|
| Tauri commands | invoked from frontend | user clicks |
| `kb-config::Config` | runtime | env / file |
| `kebab-config::Config` | runtime | env / file |
| user file system (read-only) | for source viewers | OS |
## Outputs
@@ -59,7 +59,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Public surface (signatures only — no new types)
```rust
// Tauri command surface (one per kb-app facade method, plus source viewers)
// Tauri command surface (one per kebab-app facade method, plus source viewers)
#[tauri::command] fn cmd_init(force: bool) -> Result<()>;
#[tauri::command] fn cmd_ingest(scope_json: serde_json::Value, summary_only: bool) -> Result<serde_json::Value /* IngestReportWireV1 */>;
#[tauri::command] fn cmd_list_docs(filter_json: serde_json::Value) -> Result<Vec<serde_json::Value /* DocSummaryWireV1 */>>;
@@ -75,11 +75,11 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
#[tauri::command] fn cmd_read_file_bytes(path: String) -> Result<Vec<u8>>; // raw bytes for PDF / image / audio
```
(All commands convert internal `kb-core` types to wire-schema-v1 JSON before returning.)
(All commands convert internal `kebab-core` types to wire-schema-v1 JSON before returning.)
## Behavior contract
- Backend bootstraps `tracing` to a file under `~/.local/state/kb/logs/` and a Tauri plugin loads/saves window state.
- Backend bootstraps `tracing` to a file under `~/.local/state/kebab/logs/` and a Tauri plugin loads/saves window state.
- Every Tauri command performs **path containment** for source viewers: resolves `path` against `config.workspace.root`, rejects (`anyhow::Error`) any path outside.
- Layout (frontend): left = Library + Search + Ask tabs; right = Source viewer keyed by current citation.
- Citation routing in the frontend (clicks on `[#N]` markers or hit rows). All rendering is frontend-side; backend serves raw bytes only.
@@ -88,23 +88,23 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
- `Citation::Region { path, x, y, w, h }` → `cmd_read_file_bytes(path)` → blob URL → `<img>` + absolute-positioned overlay at `(x, y, w, h)`.
- `Citation::Caption { path, model }` → same as Region but no overlay; caption banner shows `model`.
- `Citation::Time { path, start_ms, end_ms }` → `cmd_read_file_bytes(path)` → blob URL → `<audio src=...>` seeked to `start_ms / 1000`, with a timeline marker spanning `[start_ms, end_ms]`.
- Streaming `kb ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kb://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
- Streaming `kebab ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kebab://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
- All backend errors mapped to a `String` message with structure `{ "error": msg, "hint": Option<msg> }`.
- Frontend respects light/dark per OS theme (Tauri supplies the API).
- No telemetry. No automatic update channel for v1 (manual download).
## Storage / wire effects
- Reads via `kb-app` (which reads/writes via SQLite + LanceDB).
- Reads via `kebab-app` (which reads/writes via SQLite + LanceDB).
- Reads workspace files directly for source viewers (path-contained).
- Writes nothing outside what `kb-app` writes.
- Writes nothing outside what `kebab-app` writes.
- Wire JSON between backend and frontend uses schema v1 strictly. The frontend MUST validate `schema_version` strings on every IPC return and warn (or upgrade-gate) when `v1 != current`.
## Test plan
| kind | description | fixture / data |
|------|-------------|----------------|
| unit (backend) | each command wraps the corresponding `kb-app` function and serializes via wire schema | inline mocks |
| unit (backend) | each command wraps the corresponding `kebab-app` function and serializes via wire schema | inline mocks |
| unit (backend) | `cmd_read_markdown` rejects paths outside workspace | tmp config |
| unit (backend) | `cmd_read_file_bytes` rejects paths outside workspace incl. `..`, absolute path, symlink-out | tmp config + traversal vectors |
| unit (backend) | `cmd_read_file_bytes` returns identical bytes to `std::fs::read` for an in-workspace file | tmp config |
@@ -112,13 +112,13 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| smoke (frontend, optional in this task) | Vitest test that mounts the Library tab, calls a mocked `cmd_list_docs`, renders 1 row | minimal |
| manual | full-stack smoke against a real ingested workspace (Markdown + 1 PDF + 1 image + 1 audio); each citation jumps correctly | manual checklist |
Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not gated by this task's DoD.
Backend tests under `cargo test -p kebab-desktop`. Frontend tests are bonus and not gated by this task's DoD.
## Definition of Done
- [ ] `cargo check -p kb-desktop` passes
- [ ] `cargo test -p kb-desktop` passes
- [ ] `pnpm --filter kb-desktop-frontend build` produces a static asset bundle Tauri can package
- [ ] `cargo check -p kebab-desktop` passes
- [ ] `cargo test -p kebab-desktop` passes
- [ ] `pnpm --filter kebab-desktop-frontend build` produces a static asset bundle Tauri can package
- [ ] `tauri build` produces an unsigned dmg on macOS in CI (signed/notarized are out of scope)
- [ ] Each Tauri command returns wire-schema-v1 JSON; frontend asserts `schema_version`
- [ ] No imports outside Allowed dependencies (backend)
@@ -139,4 +139,4 @@ Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not
- Path containment is the desktop's most security-sensitive surface; tests must include path traversal vectors (`..`, symlinks, absolute paths).
- PDF rendering via `pdfjs-dist` is heavy (~2 MB worker); lazy-load on first PDF citation. The trade-off vs a native render backend (e.g., `pdfium` ~150 MB binary, code-signing pain) is heavily one-sided; v1 stays on `pdfjs-dist`.
- Audio formats vary; rely on the browser engine's HTML audio decoder (WebKit on macOS supports `.m4a`, `.mp3`; mileage varies on `.flac`/`.ogg`).
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kb-rag` / `kb-search` / store crate appears in `kb-desktop`'s `cargo tree`.
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kebab-rag` / `kebab-search` / store crate appears in `kebab-desktop`'s `cargo tree`.
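
The path-containment rule in the behavior contract and the traversal vectors named in the risks above can be sketched roughly as follows. This is a minimal illustration, not the task's implementation: `resolve_in_workspace` is a hypothetical helper name, and it returns `std::io::Error` instead of the `anyhow::Error` the contract mentions so that the sketch needs only the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (name and error type are assumptions):
/// resolve `requested` against `workspace_root` and reject anything that
/// ends up outside the root once `..` segments and symlinks are resolved.
pub fn resolve_in_workspace(workspace_root: &Path, requested: &str) -> io::Result<PathBuf> {
    // Canonicalize the root first so the prefix check compares real paths.
    let root = fs::canonicalize(workspace_root)?;
    // `Path::join` discards the base when `requested` is absolute, so an
    // absolute path outside the root is caught by the prefix check below.
    // `canonicalize` also fails for files that do not exist.
    let candidate = fs::canonicalize(root.join(requested))?;
    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("path escapes workspace root: {requested}"),
        ))
    }
}
```

Checking the canonicalized path, rather than scanning the string for `..`, is what lets one check cover all three vectors (`..`, absolute paths, symlink-out). A real command would still need to decide how to report a missing file versus a rejected one.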

@@ -3,7 +3,7 @@ phase: P0
title: "Workspace 뼈대 + 도메인 계약"
status: completed
depends_on: []
source: kb_local_rust_report.md §3, §4, §6, §7, §13
source: kebab_local_rust_report.md §3, §4, §6, §7, §13
---
# P0 — Workspace 뼈대 + 도메인 계약
@@ -16,11 +16,11 @@ compile 되는 Rust 2024 workspace 와 domain spec 확정. 이후 모든 phase
| crate | 역할 |
|-------|------|
| `kb-core` | domain type, trait, error, ID 규칙 |
| `kb-parse-types` | parser 중간 표현 (`ParsedBlock` 등) — `kb-core` 와 parsers/normalize 사이 thin layer (design §3.7b) |
| `kb-config` | config 로딩, 기본값, 경로 확장 |
| `kb-app` | facade. CLI/TUI/desktop 공통 진입점 |
| `kb-cli` | `kb` 바이너리 skeleton (`--help`만 동작) |
| `kebab-core` | domain type, trait, error, ID 규칙 |
| `kebab-parse-types` | parser 중간 표현 (`ParsedBlock` 등) — `kebab-core` 와 parsers/normalize 사이 thin layer (design §3.7b) |
| `kebab-config` | config 로딩, 기본값, 경로 확장 |
| `kebab-app` | facade. CLI/TUI/desktop 공통 진입점 |
| `kebab-cli` | `kebab` 바이너리 skeleton (`--help`만 동작) |
## Workspace 설정
@@ -29,7 +29,7 @@ compile 되는 Rust 2024 workspace 와 domain spec 확정. 이후 모든 phase
```toml
[workspace]
resolver = "3"
members = ["crates/kb-core", "crates/kb-parse-types", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]
members = ["crates/kebab-core", "crates/kebab-parse-types", "crates/kebab-config", "crates/kebab-app", "crates/kebab-cli"]
[workspace.package]
edition = "2024"
@@ -48,11 +48,11 @@ tracing = "0.1"
추가 멤버 crate 는 후속 phase 에서 합류.
## kb-core 도메인 타입
## kebab-core 도메인 타입
`RawAsset`, `CanonicalDocument`, `Block`, `Chunk`, `SearchHit`, `Citation`, `SourceSpan`, `Provenance`, `Metadata`. report §6 정의 그대로.
## kb-core trait
## kebab-core trait
```rust
pub trait SourceConnector { fn scan(&self, scope: &SourceScope) -> anyhow::Result<Vec<RawAsset>>; }
@@ -78,7 +78,7 @@ index_id = collection name + index version + embedding model id
각 record 에 다음 version 필드 보존: `doc_version`, `schema_version`, `parser_version`, `chunker_version`, `embedding_model`, `embedding_version`, `index_version`, `prompt_template_version`.
## kb-app facade
## kebab-app facade
CLI/TUI/desktop 모두 facade 함수 호출. parser/DB/LLM adapter 직접 호출 금지. 초기 facade 메서드 stub:
@@ -91,9 +91,9 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
```
## kb-cli skeleton
## kebab-cli skeleton
`clap` derive. subcommand: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. 본체는 `kb-app` 호출만. P0 에선 `--help` 만 동작.
`clap` derive. subcommand: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. 본체는 `kebab-app` 호출만. P0 에선 `--help` 만 동작.
## spec 문서
@@ -112,15 +112,15 @@ pub fn doctor() -> anyhow::Result<DoctorReport>;
## 의존성 경계
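- `kb-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
- `kb-cli` depends only on `kb-app`. No direct dependency on parsers, DBs, or LLMs.
- `kb-app` works against traits only. Implementations are injected as `dyn` trait objects.
- `kebab-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
- `kebab-cli` depends only on `kebab-app`. No direct dependency on parsers, DBs, or LLMs.
- `kebab-app` works against traits only. Implementations are injected as `dyn` trait objects.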
## 완료 조건
- [ ] `cargo check --workspace` 통과
- [ ] `cargo test --workspace` 통과 (단위 테스트는 ID 생성/도메인 직렬화 round-trip)
- [ ] `kb --help` 출력
- [ ] `kebab --help` 출력
- [ ] `docs/spec/*` 7개 문서 존재
- [ ] `fixtures/markdown/*` 3개 존재
- [ ] domain type serde JSON snapshot test 1개 이상

@@ -3,47 +3,47 @@ phase: P1
title: "Markdown ingestion 파이프라인"
status: completed
depends_on: [P0]
source: kb_local_rust_report.md §8, §14, §17 Phase 1
source: kebab_local_rust_report.md §8, §14, §17 Phase 1
---
# P1 — Markdown ingestion 파이프라인
## 목표
`Markdown 파일 -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` 흐름 완성. LLM/embedding 없이도 `kb ingest` / `kb list docs` / `kb inspect doc <id>` 동작.
`Markdown 파일 -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` 흐름 완성. LLM/embedding 없이도 `kebab ingest` / `kebab list docs` / `kebab inspect doc <id>` 동작.
## 산출 crate
| crate | 역할 |
|-------|------|
| `kb-source-fs` | local folder scan, checksum, 변경 감지. `SourceConnector` 구현 |
| `kb-parse-md` | Markdown bytes → structured document. `Extractor` 구현 |
| `kb-normalize` | parser output → `CanonicalDocument` |
| `kb-chunk` | block-aware chunking. `Chunker` 구현 (`md-heading-v1`) |
| `kb-store-sqlite` | metadata, document, chunk, job table. FTS table 은 P2 에서 활성화 |
| `kebab-source-fs` | local folder scan, checksum, 변경 감지. `SourceConnector` 구현 |
| `kebab-parse-md` | Markdown bytes → structured document. `Extractor` 구현 |
| `kebab-normalize` | parser output → `CanonicalDocument` |
| `kebab-chunk` | block-aware chunking. `Chunker` 구현 (`md-heading-v1`) |
| `kebab-store-sqlite` | metadata, document, chunk, job table. FTS table 은 P2 에서 활성화 |
## kb-source-fs
## kebab-source-fs
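- Input: `SourceScope { root: PathBuf, include: Vec<Glob>, exclude: Vec<Glob> }`
- Behavior: recursive walk → `blake3` of each file → list of `RawAsset`.
- Change detection: compare new against old by `(source_uri, checksum)`. Files with an identical checksum are skipped.
- Watch mode is out of scope for P1 (only the config is defined; the implementation is deferred).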
## kb-parse-md
## kebab-parse-md
- parser 후보: `pulldown-cmark` 1차. GFM table/task list 필요해지면 `comrak` 검토 (§8).
- 보존 대상: YAML/TOML frontmatter, heading tree, paragraph, list, code block + lang tag, table, blockquote, link, image ref, **line range**.
- 출력: 중간 표현 (parser 고유). `kb-normalize` 가 canonical 로 변환.
- 출력: 중간 표현 (parser 고유). `kebab-normalize` 가 canonical 로 변환.
- malformed markdown: panic 금지. 가능한 부분만 보존하고 `Provenance` 에 warning 기록.
## kb-normalize
## kebab-normalize
- Responsibility: parser intermediate representation → `CanonicalDocument`.
- frontmatter → `Metadata` (id, title, aliases, tags, created_at, updated_at, source_type, trust_level, lang).
- Flatten the block tree and assign each block a `BlockId` (deterministic, based on heading path + sequence number).
- `SourceSpan` accepts both `LineRange { start, end }` and `ByteRange`. Markdown uses line ranges as the primary form.
## kb-chunk (`md-heading-v1`)
## kebab-chunk (`md-heading-v1`)
우선순위 (§14):
1. heading boundary 우선
@@ -58,7 +58,7 @@ policy 기본값: `target_tokens = 500`, `overlap_tokens = 80`, `respect_markdow
token 추정: tokenizer 미도입 단계라 byte / 문자 기반 근사 OK. 실제 tokenizer 는 P3 embedding 도입 시 교체.
## kb-store-sqlite
## kebab-store-sqlite
스키마 (1차):
@@ -117,7 +117,7 @@ CREATE TABLE jobs (
- transaction: ingest 1건 = 1 transaction. 부분 실패 시 rollback.
- idempotent: 동일 `doc_id` 재수집은 UPSERT, version bump.
## kb-app facade 확장
## kebab-app facade 확장
```rust
pub fn ingest(scope: SourceScope) -> anyhow::Result<IngestReport>;
@@ -131,10 +131,10 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
## CLI
```text
kb ingest <path> [--include <glob>] [--exclude <glob>]
kb list docs [--tag <t>]
kb inspect doc <doc_id>
kb inspect chunk <chunk_id>
kebab ingest <path> [--include <glob>] [--exclude <glob>]
kebab list docs [--tag <t>]
kebab inspect doc <doc_id>
kebab inspect chunk <chunk_id>
```
## 테스트
@@ -146,14 +146,14 @@ kb inspect chunk <chunk_id>
## 의존성 경계
`kb-parse-md` 금지: `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, embedding 호출. parser 는 순수 함수.
`kebab-parse-md` 금지: `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, embedding 호출. parser 는 순수 함수.
## 완료 조건
- [ ] `kb ingest <path>` 실행 후 SQLite 에 documents/blocks/chunks 채워짐
- [ ] `kb list docs` 정상 출력
- [ ] `kb inspect doc <id>` JSON 출력
- [ ] `kb inspect chunk <id>` JSON 출력 (heading path + source span 포함)
- [ ] `kebab ingest <path>` 실행 후 SQLite 에 documents/blocks/chunks 채워짐
- [ ] `kebab list docs` 정상 출력
- [ ] `kebab inspect doc <id>` JSON 출력
- [ ] `kebab inspect chunk <id>` JSON 출력 (heading path + source span 포함)
- [ ] 같은 폴더 재수집 시 중복 row 없음
- [ ] parser/chunker version 변경 시 재처리 대상 식별 가능
- [ ] fixture snapshot test 통과

@@ -3,19 +3,19 @@ phase: P2
title: "SQLite FTS5 lexical 검색 + citation"
status: completed
depends_on: [P1]
source: kb_local_rust_report.md §10, §15, §17 Phase 2
source: kebab_local_rust_report.md §10, §15, §17 Phase 2
---
# P2 — SQLite FTS5 lexical 검색 + citation
## 목표
embedding/LLM 없이 FTS5 만으로 동작하는 검색 + citation 출력. `kb search "..."` 가 chunk 와 source span 반환.
embedding/LLM 없이 FTS5 만으로 동작하는 검색 + citation 출력. `kebab search "..."` 가 chunk 와 source span 반환.
## 산출 crate
- `kb-search` (lexical 모드) — `Retriever` trait 구현 1번째.
- `kb-store-sqlite` 확장: FTS5 virtual table + trigger.
- `kebab-search` (lexical 모드) — `Retriever` trait 구현 1번째.
- `kebab-store-sqlite` 확장: FTS5 virtual table + trigger.
## FTS5 스키마
@@ -69,15 +69,15 @@ pub struct SearchHit {
}
```
`Citation` 형식: `notes/rust/kb.md:L12-L34`.
`Citation` 형식: `notes/rust/kebab.md:L12-L34`.
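
As an illustration of the citation format above, a line-range citation could be rendered with a `Display` impl. This is only a sketch: `LineCitation` and its field names are hypothetical and are not the `Citation` type defined in the core crate.

```rust
use std::fmt;

/// Hypothetical stand-in for a line-range citation; the type and field
/// names are assumptions, not the project's actual `Citation` definition.
pub struct LineCitation<'a> {
    pub path: &'a str,
    pub start_line: u32,
    pub end_line: u32,
}

impl fmt::Display for LineCitation<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Renders `<path>:L<start>-L<end>`, matching the documented format.
        write!(f, "{}:L{}-L{}", self.path, self.start_line, self.end_line)
    }
}
```

Keeping the string form behind `Display` means the CLI output and any wire serialization can share one formatter.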
## 인덱스 라이프사이클
- Kept in sync automatically by triggers at ingest time.
- The `kb index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
- The `kebab index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
- `index_version` is the combination `(schema_version, fts_config_hash)`.
## kb-app facade 확장
## kebab-app facade 확장
```rust
pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
@@ -86,16 +86,16 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
## CLI
```text
kb search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
kb index --rebuild-fts
kebab search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
kebab index --rebuild-fts
```
출력 예:
```text
1. [0.82] Rust workspace는 여러 package를 하나로 관리한다…
doc: notes/rust/kb.md
citation: notes/rust/kb.md:L12-L34
doc: notes/rust/kebab.md
citation: notes/rust/kebab.md:L12-L34
heading: 아키텍처 > Rust workspace
```
@@ -108,13 +108,13 @@ kb index --rebuild-fts
## 의존성 경계
- `kb-search` depends only on `kb-store-sqlite` and `kb-core`.
- `kebab-search` depends only on `kebab-store-sqlite` and `kebab-core`.
- No LLM/embedding calls (at the P2 stage).
- The CLI calls only through `kb-app`.
- The CLI calls only through `kebab-app`.
## 완료 조건
- [ ] `kb search "..."` top-k chunk 반환
- [ ] `kebab search "..."` top-k chunk 반환
- [ ] 모든 결과에 citation 포함
- [ ] citation line range 가 원본과 일치
- [ ] 한영 혼합 query 동작 (한국어 토큰화 한계는 노트로)

@@ -3,23 +3,23 @@ phase: P3
title: "Local embedding + LanceDB + hybrid search"
status: completed
depends_on: [P2]
source: kb_local_rust_report.md §10, §11, §15, §17 Phase 3
source: kebab_local_rust_report.md §10, §11, §15, §17 Phase 3
---
# P3 — Local embedding + LanceDB + hybrid search
## 목표
local embedding 으로 chunk vector 화 → LanceDB 저장 → vector 검색 + lexical 융합 (hybrid). `kb search --mode {lexical,vector,hybrid}` 동작.
local embedding 으로 chunk vector 화 → LanceDB 저장 → vector 검색 + lexical 융합 (hybrid). `kebab search --mode {lexical,vector,hybrid}` 동작.
## 산출 crate
| crate | 역할 |
|-------|------|
| `kb-embed` | `Embedder` trait + `EmbeddingInput`/output 타입 |
| `kb-embed-local` | `fastembed-rs` adapter (1차). later: Ollama embed endpoint, candle |
| `kb-store-vector` | LanceDB 연동. table 관리, upsert, vector search |
| `kb-search` | lexical + vector 병행 + score fusion |
| `kebab-embed` | `Embedder` trait + `EmbeddingInput`/output 타입 |
| `kebab-embed-local` | `fastembed-rs` adapter (1차). later: Ollama embed endpoint, candle |
| `kebab-store-vector` | LanceDB 연동. table 관리, upsert, vector search |
| `kebab-search` | lexical + vector 병행 + score fusion |
## Embedder
@@ -64,7 +64,7 @@ created_at : timestamp
## Indexing job
```text
kb index --embeddings [--model <id>] [--batch-size N] [--resume]
kebab index --embeddings [--model <id>] [--batch-size N] [--resume]
```
- chunk 중 `embedding_id = chunk_id + model_id + dim` 가 vector store 에 없는 것만 처리.
@@ -91,7 +91,7 @@ k_rrf 기본 60.
P3 범위에선 reranker 미도입 (P+ 단계 노트).
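
The lexical + vector fusion with `k_rrf` defaulting to 60 can be sketched as standard reciprocal rank fusion. The exact formula in the spec lies outside this hunk, so this assumes the usual form `score(d) = Σ 1 / (k_rrf + rank)`, with 1-based ranks; `rrf_fuse` is a hypothetical function name.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of reciprocal rank fusion (RRF).
/// Each inner Vec is one retriever's ranking, best hit first.
pub fn rrf_fuse(rankings: &[Vec<&str>], k_rrf: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k_rrf + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF uses ranks only, the BM25 score and the vector distance never need to be normalized onto a common scale, which is the usual reason to prefer it over weighted score sums.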
## kb-search 구조
## kebab-search 구조
```rust
pub struct HybridRetriever {
@@ -102,11 +102,11 @@ pub struct HybridRetriever {
```
- 각 sub retriever 는 `Retriever` trait 구현.
- `kb-app::search` 가 mode 따라 dispatch.
- `kebab-app::search` 가 mode 따라 dispatch.
## kb-app facade 확장
## kebab-app facade 확장
P3 동안 kb-app facade 의 `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` 의 stub 본체를 실제 라이브러리 호출로 대체. P0 부터 시그니처는 frozen 이므로 signature 변경 없이 body 만 swap.
P3 동안 kebab-app facade 의 `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` 의 stub 본체를 실제 라이브러리 호출로 대체. P0 부터 시그니처는 frozen 이므로 signature 변경 없이 body 만 swap.
```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>; // p3-5
@@ -117,14 +117,14 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>; // p4-3 (stub remains)
```
p3-5 wires every facade that does not involve the LLM (everything except `ask`) in one pass. After that, `cargo run -p kb-cli -- index` and `cargo run -p kb-cli -- search --mode {lexical,vector,hybrid}` actually work.
p3-5 wires every facade that does not involve the LLM (everything except `ask`) in one pass. After that, `cargo run -p kebab-cli -- index` and `cargo run -p kebab-cli -- search --mode {lexical,vector,hybrid}` actually work.
## CLI
```text
kb index --embeddings
kb search --mode vector "비슷한 설계 원칙"
kb search --mode hybrid "Markdown chunking 규칙"
kebab index --embeddings
kebab search --mode vector "비슷한 설계 원칙"
kebab search --mode hybrid "Markdown chunking 규칙"
```
## 테스트
@@ -137,19 +137,19 @@ kb search --mode hybrid "Markdown chunking 규칙"
## 의존성 경계
- Only `kb-embed-local` depends on ONNX/model bindings. Other crates use the trait only.
- `kb-store-vector` depends on `lancedb`. No cross-writes with SQLite (each store keeps its own responsibility).
- Only `kebab-embed-local` depends on ONNX/model bindings. Other crates use the trait only.
- `kebab-store-vector` depends on `lancedb`. No cross-writes with SQLite (each store keeps its own responsibility).
- Kept separate from the LLM crates (§11.1).
## 완료 조건
- [ ] `kb index` (= `kb-app::ingest`) 로 모든 chunk 가 SQLite + LanceDB 에 저장 (p3-5)
- [ ] `kb search --mode vector` 정상 hit
- [ ] `kb search --mode hybrid` 정상 hit, citation 포함
- [ ] `kebab index` (= `kebab-app::ingest`) 로 모든 chunk 가 SQLite + LanceDB 에 저장 (p3-5)
- [ ] `kebab search --mode vector` 정상 hit
- [ ] `kebab search --mode hybrid` 정상 hit, citation 포함
- [ ] 모델/차원 변경 시 별도 table 로 분리 저장
- [ ] resume 시 미완료 chunk 만 처리 (P+ 로 deferred)
- [ ] hit@k 측정 가능한 형태로 결과 구조화 (P5 준비)
- [ ] `kb-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` 가 실 동작 (`ask` 는 P4-3 까지 stub) — p3-5
- [ ] `kebab-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` 가 실 동작 (`ask` 는 P4-3 까지 stub) — p3-5
## 리스크 / 주의

@@ -3,22 +3,22 @@ phase: P4
title: "Local LLM + RAG + grounded answer"
status: completed
depends_on: [P3]
source: kb_local_rust_report.md §11, §15.2, §17 Phase 4
source: kebab_local_rust_report.md §11, §15.2, §17 Phase 4
---
# P4 — Local LLM + RAG + grounded answer
## 목표
local LLM 으로 citation 포함 답변 생성. 근거 부족 시 거절. `kb ask "..."` 동작.
local LLM 으로 citation 포함 답변 생성. 근거 부족 시 거절. `kebab ask "..."` 동작.
## 산출 crate
| crate | 역할 |
|-------|------|
| `kb-llm` | `LanguageModel` trait + request/response 타입 |
| `kb-llm-local` | Ollama adapter 1차. later: llama.cpp, candle |
| `kb-rag` | retrieval → context packing → prompt → generate → citation 검증 |
| `kebab-llm` | `LanguageModel` trait + request/response 타입 |
| `kebab-llm-local` | Ollama adapter 1차. later: llama.cpp, candle |
| `kebab-rag` | retrieval → context packing → prompt → generate → citation 검증 |
## LanguageModel
@@ -50,9 +50,9 @@ pub struct GenerateResponse {
- HTTP localhost 호출 (`http://127.0.0.1:11434/api/generate`).
- 내부에서 async runtime 사용 가능. 외부 API 는 동기 wrapper 유지.
- model 기본값 config (`qwen2.5:14b-instruct` 등). 실제 선택은 P5 eval 후 결정.
- 서버 미기동 시 명확한 에러 메시지 + `kb doctor` 진단.
- 서버 미기동 시 명확한 에러 메시지 + `kebab doctor` 진단.
## kb-rag 파이프라인
## kebab-rag 파이프라인
```text
query
@@ -118,7 +118,7 @@ pub struct Answer {
`answers` table 에 저장 (재현/감사용). 사용한 chunk_id 목록 + retrieval params 도 함께.
## kb-app facade 확장
## kebab-app facade 확장
```rust
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
@@ -127,9 +127,9 @@ pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
## CLI
```text
kb ask "내 KB 설계에서 저장소 전략은?"
kb ask --k 8 --temperature 0 "..."
kb ask --explain "..." # retrieval trace + packed prompt 출력
kebab ask "내 KB 설계에서 저장소 전략은?"
kebab ask --k 8 --temperature 0 "..."
kebab ask --explain "..." # retrieval trace + packed prompt 출력
```
## 테스트
@@ -142,13 +142,13 @@ kb ask --explain "..." # retrieval trace + packed prompt 출력
## 의존성 경계
- Only `kb-llm-local` depends on Ollama HTTP.
- `kb-rag` uses only `kb-search` (Retriever trait) + `kb-llm` (LanguageModel trait). No direct SQLite/LanceDB calls.
- The CLI calls only `kb-app::ask`.
- Only `kebab-llm-local` depends on Ollama HTTP.
- `kebab-rag` uses only `kebab-search` (Retriever trait) + `kebab-llm` (LanguageModel trait). No direct SQLite/LanceDB calls.
- The CLI calls only `kebab-app::ask`.
## 완료 조건
- [ ] `kb ask "..."` 동작
- [ ] `kebab ask "..."` 동작
- [ ] 답변에 citation 포함
- [ ] 근거 없는 질문 거절
- [ ] `--explain` 으로 retrieval trace 확인
@@ -158,6 +158,6 @@ kb ask --explain "..." # retrieval trace + packed prompt 출력
## 리스크 / 주의
- 모델 선택은 P5 golden set 으로 평가 후 확정. P4 에선 default 만.
- Ollama 미기동 / 모델 미다운로드 → `kb doctor` 가 명확히 안내.
- Ollama 미기동 / 모델 미다운로드 → `kebab doctor` 가 명확히 안내.
- LLM 답변에 hallucinated citation 자주 나옴. 후처리 검증이 핵심.
- prompt template 변경은 `prompt_template_version` 반드시 bump.

@@ -3,7 +3,7 @@ phase: P5
title: "Golden query / regression eval"
status: completed
depends_on: [P4]
source: kb_local_rust_report.md §17 Phase 5, §18
source: kebab_local_rust_report.md §17 Phase 5, §18
---
# P5 — Golden query / regression eval
@@ -14,7 +14,7 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## 산출 crate
- `kb-eval` — golden query 실행기, 지표 계산, report 생성.
- `kebab-eval` — golden query 실행기, 지표 계산, report 생성.
## Golden set fixture
@@ -25,9 +25,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
query: "Markdown chunking 규칙"
lang: ko
expected_doc_ids:
- doc:notes/rust/kb-architecture.md
- doc:notes/rust/kebab-architecture.md
expected_chunk_ids:
- chunk:notes/rust/kb-architecture.md#chunking-policy
- chunk:notes/rust/kebab-architecture.md#chunking-policy
must_contain:
- "heading"
- "code block"
@@ -57,9 +57,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## 실행 모드
```text
kb eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kb eval compare <run_id_a> <run_id_b>
kb eval report <run_id> --format {json,md,html}
kebab eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kebab eval compare <run_id_a> <run_id_b>
kebab eval report <run_id> --format {json,md,html}
```
run record:
@@ -90,7 +90,7 @@ DB 저장 (`eval_runs`, `eval_query_results` table) 또는 JSON 파일. 재현
- 자동 hyperparameter 탐색 — 안 함.
- LLM judge ("LLM as a judge") — P5 범위 밖. groundedness 는 rule-based (`must_contain`) 만.
## kb-app facade 확장
## kebab-app facade 확장
```rust
pub fn eval_run(opts: EvalRunOpts) -> anyhow::Result<EvalRun>;
@@ -105,13 +105,13 @@ pub fn eval_compare(a: &str, b: &str) -> anyhow::Result<CompareReport>;
## 의존성 경계
- `kb-eval` calls only `kb-app` (search/ask go through the facade). No direct calls into internal stores or the LLM.
- `kebab-eval` calls only `kebab-app` (search/ask go through the facade). No direct calls into internal stores or the LLM.
## 완료 조건
- [ ] `fixtures/golden_queries.yaml` 30+ 개
- [ ] `kb eval run` 으로 hit@k, MRR, citation_coverage 산출
- [ ] `kb eval compare` 로 두 run 비교 가능
- [ ] `kebab eval run` 으로 hit@k, MRR, citation_coverage 산출
- [ ] `kebab eval compare` 로 두 run 비교 가능
- [ ] config snapshot 이 run 에 저장됨 (chunker, embedding, llm, prompt 버전)
- [ ] CI 로 회귀 감지 가능 (예: hit@5 가 baseline 대비 -3% 이상 떨어지면 실패)
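
The retrieval metrics and the CI regression gate in the checklist above can be sketched as below. All function names are hypothetical, and the gate assumes the "-3%" threshold means an absolute drop of 0.03 in the metric (percentage points); a relative drop would need a different comparison.

```rust
/// True if any expected id appears within the top `k` results.
pub fn hit_at_k(results: &[&str], expected: &[&str], k: usize) -> bool {
    results.iter().take(k).any(|r| expected.contains(r))
}

/// 1 / rank of the first expected id (1-based), or 0.0 if none is found.
pub fn reciprocal_rank(results: &[&str], expected: &[&str]) -> f64 {
    results
        .iter()
        .position(|r| expected.contains(r))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean reciprocal rank over `(results, expected)` pairs, one per query.
pub fn mean_reciprocal_rank(runs: &[(Vec<&str>, Vec<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let total: f64 = runs.iter().map(|(res, exp)| reciprocal_rank(res, exp)).sum();
    total / runs.len() as f64
}

/// Assumed gate: fail when the metric drops by `max_drop` or more
/// (absolute difference) compared with the baseline run.
pub fn regressed(baseline: f64, current: f64, max_drop: f64) -> bool {
    baseline - current >= max_drop
}
```

A CI job could compute hit@5 for the current run, load the baseline value from a stored run record, and exit non-zero when `regressed(baseline, current, 0.03)` returns true.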

@@ -3,7 +3,7 @@ phase: P6
title: "이미지 ingestion (OCR + caption)"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.1, §17 Phase 6
source: kebab_local_rust_report.md §9.1, §17 Phase 6
---
# P6 — 이미지 ingestion
@@ -14,8 +14,8 @@ source: kb_local_rust_report.md §9.1, §17 Phase 6
## 산출 crate
- `kb-parse-image`: implements `Extractor`. Image → CanonicalDocument.
- (optional) `kb-ocr` / `kb-vlm` adapters (if external models are split out).
- `kebab-parse-image`: implements `Extractor`. Image → CanonicalDocument.
- (optional) `kebab-ocr` / `kebab-vlm` adapters (if external models are split out).
## 추출 정보 3종 (§9.1)
@@ -83,10 +83,10 @@ photos/diagram-2026.png#caption # caption chunk
## CLI
```text
kb ingest ./assets/diagram.png
kb ingest ./assets/ # 폴더 안 이미지 자동 인식
kb search "이미지 안의 OCR 텍스트"
kb inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
kebab ingest ./assets/diagram.png
kebab ingest ./assets/ # 폴더 안 이미지 자동 인식
kebab search "이미지 안의 OCR 텍스트"
kebab inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
```
## 테스트
@@ -99,13 +99,13 @@ kb inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
## 의존성 경계
- `kb-parse-image`: `kb-core` + image decoding (`image` crate) + OCR adapter only.
- `kebab-parse-image`: `kebab-core` + image decoding (`image` crate) + OCR adapter only.
- No LLM/embedding calls (captions go through a separate adapter).
- VLM captioning is a background job. It must not block ingest.
## 완료 조건
- [ ] `kb ingest <image>` 동작
- [ ] `kebab ingest <image>` 동작
- [ ] OCR text 검색 가능
- [ ] OCR region citation 출력
- [ ] caption 과 observed text provenance 분리

@@ -3,7 +3,7 @@ phase: P7
title: "PDF text extraction + page citation"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.2, §17 Phase 7
source: kebab_local_rust_report.md §9.2, §17 Phase 7
---
# P7 — PDF ingestion
@@ -14,7 +14,7 @@ text PDF extraction → page-aware chunking → citation `paper.pdf:p13`. scanned PD
## Output crates
- `kb-parse-pdf` implements `Extractor`.
- `kebab-parse-pdf` implements `Extractor`.
## Stage separation (§9.2)
@@ -63,10 +63,10 @@ paper.pdf:p13:span=0-1240 # char range within page
## CLI
```text
kb ingest ./paper.pdf
kb ingest ./papers/
kb search "a specific concept inside a PDF"
kb inspect doc <pdf_doc_id>
kebab ingest ./paper.pdf
kebab ingest ./papers/
kebab search "a specific concept inside a PDF"
kebab inspect doc <pdf_doc_id>
```
## Tests
@@ -79,12 +79,12 @@ kb inspect doc <pdf_doc_id>
## Dependency boundaries
- `kb-parse-pdf` depends only on `kb-core` + `pdf-extract` / `lopdf`.
- `kebab-parse-pdf` depends only on `kebab-core` + `pdf-extract` / `lopdf`.
- OCR calls go through a separate adapter (reusing the P6 OCR infrastructure).
## Completion criteria
- [ ] `kb ingest <pdf>` works
- [ ] `kebab ingest <pdf>` works
- [ ] page-level chunk + citation
- [ ] search results include `paper.pdf:p<n>`
- [ ] provenance warning for pages where extraction failed
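
The page citation strings referenced in this spec (`paper.pdf:p13`, `paper.pdf:p13:span=0-1240`) could be produced by a small formatter like the one below. This is only a sketch: the `PdfCitation` type and its field names are hypothetical and are not taken from `kebab-parse-pdf`; only the two output formats come from the spec.

```rust
use std::fmt;

/// Hypothetical citation value for a PDF chunk.
/// `span` is an optional character range within the page.
pub struct PdfCitation {
    pub file: String,
    pub page: u32,
    pub span: Option<(usize, usize)>,
}

impl fmt::Display for PdfCitation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Page-level form: "<file>:p<n>"
        write!(f, "{}:p{}", self.file, self.page)?;
        // Optional character range: ":span=<start>-<end>"
        if let Some((start, end)) = self.span {
            write!(f, ":span={}-{}", start, end)?;
        }
        Ok(())
    }
}
```

Keeping the span optional lets page-level chunks and finer-grained chunks share one type, which matches the two citation forms shown in the spec.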


@@ -3,7 +3,7 @@ phase: P8
title: "Audio transcription + timestamp citation"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.3, §17 Phase 8
source: kebab_local_rust_report.md §9.3, §17 Phase 8
---
# P8 — Audio ingestion
@@ -14,8 +14,8 @@ audio file → transcript (timestamped segment) → CanonicalDocument → 동
## Output crates
- `kb-parse-audio` implements `Extractor`.
- `kb-asr-whisper` (or an internal module of `kb-parse-audio`) — whisper.cpp adapter.
- `kebab-parse-audio` implements `Extractor`.
- `kebab-asr-whisper` (or an internal module of `kebab-parse-audio`) — whisper.cpp adapter.
## Pipeline (§9.3)
@@ -94,11 +94,11 @@ meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1
## CLI
```text
kb ingest ./meeting.m4a
kb ingest ./recordings/
kb search "decisions mentioned in the meeting"
kb inspect doc <audio_doc_id> # shows transcript + segment timestamps
kb play <chunk_id> # (optional) plays that segment; low priority
kebab ingest ./meeting.m4a
kebab ingest ./recordings/
kebab search "decisions mentioned in the meeting"
kebab inspect doc <audio_doc_id> # shows transcript + segment timestamps
kebab play <chunk_id> # (optional) plays that segment; low priority
```
## Tests
@@ -111,12 +111,12 @@ kb play <chunk_id> # (optional) plays that segment; low priority
## Dependency boundaries
- `kb-parse-audio` depends only on `kb-core` + a `Transcriber` adapter.
- `kebab-parse-audio` depends only on `kebab-core` + a `Transcriber` adapter.
- No LLM calls. The RAG stage runs the same pipeline, driven by the transcript text.
## Completion criteria
- [ ] `kb ingest <audio>` works
- [ ] `kebab ingest <audio>` works
- [ ] transcript is stored together with segment timestamps
- [ ] search results include a `00:hh:mm:ss-` citation
- [ ] re-ingesting the same audio is idempotent
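
The timestamp citation shown in this spec (`meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1`) can be built from segment offsets roughly as follows. This is a sketch under assumptions: the function names are hypothetical, offsets are taken as whole seconds, and the speaker label is treated as optional.

```rust
/// Formats a whole-second offset as zero-padded `hh:mm:ss`.
pub fn format_hms(total_seconds: u64) -> String {
    format!(
        "{:02}:{:02}:{:02}",
        total_seconds / 3600,
        (total_seconds % 3600) / 60,
        total_seconds % 60
    )
}

/// Builds "<file>:<start>-<end>[:speaker=<id>]" for a transcript segment.
/// Assumption: the speaker suffix is omitted when no speaker is known.
pub fn audio_citation(file: &str, start_s: u64, end_s: u64, speaker: Option<&str>) -> String {
    let mut out = format!("{}:{}-{}", file, format_hms(start_s), format_hms(end_s));
    if let Some(id) = speaker {
        out.push_str(":speaker=");
        out.push_str(id);
    }
    out
}
```

For example, a segment running from 822 s to 850 s with speaker `S1` yields the citation string quoted in the spec.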


@@ -3,7 +3,7 @@ phase: P9
title: "TUI + desktop app"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §16, §17 Phase 9
source: kebab_local_rust_report.md §16, §17 Phase 9
---
# P9 — TUI + desktop app
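@@ -15,18 +15,18 @@ Add a usability layer on top of the CLI. Once domain/search/RAG have stabilized, 마
## Order
```text
kb-tui (first) → kb-desktop (later)
kebab-tui (first) → kebab-desktop (later)
```
Rationale: the TUI can adapt quickly to domain changes, while the desktop app carries high packaging/distribution costs.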
---
## P9.A — kb-tui (Ratatui)
## P9.A — kebab-tui (Ratatui)
### Output crate
- `kb-tui` — terminal UI built on Ratatui + crossterm.
- `kebab-tui` — terminal UI built on Ratatui + crossterm.
### Screen layout
@@ -51,8 +51,8 @@ q : quit
### Dependency boundaries
- `kb-tui` depends only on `kb-app`. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through an async `kb-app` wrapper or thread + channel.
- `kebab-tui` depends only on `kebab-app`. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through an async `kebab-app` wrapper or thread + channel.
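
The "thread + channel" option above might look like the following sketch, which uses only the standard library. The `spawn_search` helper and the closure standing in for a `kebab-app` search call are hypothetical; the real facade signature is not shown in this spec.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Runs a blocking search on a worker thread and returns a receiver the UI
/// loop can poll, so drawing and key handling never wait on the search.
/// `search` is a stand-in for a `kebab-app` facade call (assumption).
pub fn spawn_search<F>(query: String, search: F) -> Receiver<Vec<String>>
where
    F: FnOnce(&str) -> Vec<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let hits = search(&query);
        // The UI may have dropped the receiver (e.g. on quit); ignore that error.
        let _ = tx.send(hits);
    });
    rx
}
```

Inside the event loop, the TUI would call `try_recv()` on the returned receiver once per tick and keep rendering until results arrive.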
### Completion criteria
@@ -63,7 +63,7 @@ q : quit
---
## P9.B — kb-desktop
## P9.B — kebab-desktop
### Candidate comparison
@@ -72,14 +72,14 @@ q : quit
| Tauri | Rust backend + web frontend. native webview, small binary | web frontend is a separate stack (TS/JS) |
| egui/eframe | pure Rust, immediate-mode | limited design freedom/accessibility |
Recommendation: Tauri first. Expose the existing `kb-app` facade unchanged as the backend. Keep the frontend light (svelte/solid/vanilla).
Recommendation: Tauri first. Expose the existing `kebab-app` facade unchanged as the backend. Keep the frontend light (svelte/solid/vanilla).
### Output crates / structure
- `kb-desktop` (Tauri app crate)
- `kb-desktop-frontend/` (web assets)
- `kebab-desktop` (Tauri app crate)
- `kebab-desktop-frontend/` (web assets)
Tauri commands wrap `kb-app` functions 1:1. No new business logic.
Tauri commands wrap `kebab-app` functions 1:1. No new business logic.
### Screen layout (first pass)
@@ -100,7 +100,7 @@ Tauri commands wrap `kb-app` functions 1:1. Adding new business logic
### Dependency boundaries
- The frontend calls only Tauri commands (= `kb-app` wrappers). No direct SQLite/LanceDB access.
- The frontend calls only Tauri commands (= `kebab-app` wrappers). No direct SQLite/LanceDB access.
- Model download/execution is the backend's responsibility.
### Completion criteria