docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the Rust module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`, `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`, `kb-smoke`, `kb-architecture`) → `kebab-*` (they use the same prefix once P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` / `kb doctor` / `kb inspect` / `kb list` / `kb eval` → `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths, env vars, and the binary path (`target/release/kb` → `target/release/kebab`) are synced.
- All references are unified across the design doc, the initial report, SMOKE, HOTFIXES, phase epics, and task specs.
- `git -c user.name=kb` in task-decomposition.md is kept as-is because it records author information from past git history (the author in the actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` is included as well (it had not been updated after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]" --include="*.md"` returns 0 hits (excluding the git author in task-decomposition.md).
- All file path references are live (every renamed file is updated to its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -14,30 +14,30 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

-## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands
+## 2026-05-01 — `--config` flag silently ignored across all kebab-cli subcommands

-**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`.
+**Discovered**: post-P3-5 manual smoke at `/tmp/kebab-smoke/`.

-**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config.
+**Symptom**: `kebab --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kebab/config.toml` (XDG default). Users had to use `KEBAB_*` env vars to point at a non-default config.

-**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them.
+**Root cause**: `kebab-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kebab_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kebab-cli never used them.

 **Fix** (PR #20, fix/cli-config-flag-and-search-output):
-- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`.
-- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
-- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
-- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
+- `kebab-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kebab_app::*_with_config(cfg, ...)` instead of `kebab_app::*(...)`.
+- `kebab_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
+- `kebab-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
+- Same PR also improved `kebab search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.

 **Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
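
The load-once-and-thread pattern described in this entry can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual `kebab-cli` code: the one-field `Config`, the `search_with_config` / `search` bodies, and `run_search` are hypothetical stand-ins, and the real `Config::load` returns `anyhow::Result<Self>` rather than a bare value.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real config type; it only remembers
/// which file it was resolved from.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub source: PathBuf,
}

impl Config {
    /// Same shape as `Config::load(Option<&Path>)` in the spec (minus the
    /// `Result`): an explicit path wins, `None` falls back to the XDG default.
    pub fn load(path: Option<&Path>) -> Config {
        let source = match path {
            Some(p) => p.to_path_buf(),
            None => PathBuf::from("~/.config/kebab/config.toml"),
        };
        Config { source }
    }
}

/// Config-explicit entry point: uses exactly the config it is handed.
pub fn search_with_config(cfg: &Config, query: &str) -> String {
    format!("search {query:?} using {}", cfg.source.display())
}

/// Bare entry point: reloads the default config internally, so a caller
/// that uses it can never honor `--config`. This is the regression class.
pub fn search(query: &str) -> String {
    search_with_config(&Config::load(None), query)
}

/// What a CLI arm does after the fix: load once, then thread the config.
pub fn run_search(cli_config: Option<PathBuf>, query: &str) -> String {
    let cfg = Config::load(cli_config.as_deref());
    search_with_config(&cfg, query)
}
```

Calling `run_search(Some("/tmp/kebab-smoke/config.toml".into()), "rrf")` reports the explicit path, while the bare `search("rrf")` always reports the XDG default, which is the symptom recorded above.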

-### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up)
+### 2026-05-01 — `--config` regression in `kebab ask` (P4-3 follow-up)

 **Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.

-**Symptom**: `kb --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
+**Symptom**: `kebab --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kebab-cli's `Cmd::Ask` arm still called bare `kebab_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.

 **Fix** (PR #24, fix/cli-ask-honor-config-flag):
-- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`.
+- `kebab-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kebab_app::ask_with_config(cfg, query, opts)`.

 **Amends**: tasks/p4/p4-3-rag-pipeline.md.

@@ -48,15 +48,15 @@ git history.
 **Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.
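
As a quick check of the bound stated in the root cause, assuming 1-based ranks and writing $M$ for `num_retrievers` (the symbol $M$ is introduced here only for illustration):

```latex
\mathrm{score}(c) \;=\; \sum_{m=1}^{M} \frac{1}{k_{\mathrm{rrf}} + \mathrm{rank}_m(c)}
\;\le\; \frac{M}{k_{\mathrm{rrf}} + 1},
\qquad
M = 2,\; k_{\mathrm{rrf}} = 60
\;\Longrightarrow\;
\frac{2}{61} \approx 0.0328 \;<\; 0.05 .
```

Each term is largest at rank 1, so even a chunk ranked first by both retrievers stays below the `0.05` gate, which is why every hybrid query was refused.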

 **Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
-- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
+- `crates/kebab-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
 - One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
-- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
+- The integration snapshot `crates/kebab-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
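
The normalisation described in the fix can be sketched like this. It is a standalone illustration, not the code in `hybrid.rs`: the function names and the `&[Option<u32>]` rank representation are assumptions, and the divisor is generalised to `ranks.len() / (k_rrf + 1)`, which equals the entry's `2 / (k_rrf + 1)` when two retrievers are fused.

```rust
/// Default RRF constant quoted in the entry above.
pub const K_RRF: f64 = 60.0;

/// Raw RRF score for one chunk. Each slot holds the chunk's 1-based rank in
/// one retriever, or `None` when that retriever did not return the chunk.
pub fn rrf_raw(ranks: &[Option<u32>], k_rrf: f64) -> f64 {
    ranks
        .iter()
        .flatten()
        .map(|&rank| 1.0 / (k_rrf + f64::from(rank)))
        .sum()
}

/// Normalised score: the raw score divided by its upper bound
/// `n / (k_rrf + 1)`, where `n` is the number of retriever slots.
/// Callers are expected to pass at least one slot (an empty slice
/// would divide zero by zero).
pub fn rrf_normalised(ranks: &[Option<u32>], k_rrf: f64) -> f64 {
    let upper_bound = ranks.len() as f64 / (k_rrf + 1.0);
    rrf_raw(ranks, k_rrf) / upper_bound
}
```

With two retrievers, ranks `(1, 2)` give `(1/61 + 1/62) / (2/61) = 123/124 ≈ 0.9919`, matching the updated unit test value, ranks `(1, 1)` give `1.0`, and a chunk seen by only one retriever at rank 1 gives `0.5`. Because every score is divided by the same constant, the ordering of hits is unchanged.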

 **Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.

 **Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).

-**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
+**Verification**: post-fix smoke at `/tmp/kebab-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.

 ## How to add an entry

@@ -1,12 +1,12 @@
 ---
 title: "KB Work Unit Index"
-source: kb_local_rust_report.md
+source: kebab_local_rust_report.md
 date: 2026-04-27
 ---

 # KB Work Unit Index

-Breaks the Phase roadmap in [`kb_local_rust_report.md`](../kb_local_rust_report.md) down into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
+Breaks the Phase roadmap in [`kebab_local_rust_report.md`](../kebab_local_rust_report.md) down into architecture-level work units. Each task document is a unit that can be started and reviewed independently.

 ## Dependency graph

@@ -25,16 +25,16 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.

 | # | Code | Title | Key output crates | Depends on |
 |---|------|------|----------------|------|
-| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | – |
-| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 |
-| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 |
-| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 |
-| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 |
-| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kb-eval | P4 |
-| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kb-parse-image | P5 |
-| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kb-parse-pdf | P5 |
-| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kb-parse-audio | P5 |
-| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kb-tui, kb-desktop | P5 |
+| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli | – |
+| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite | P0 |
+| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kebab-search (lexical) | P1 |
+| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kebab-embed, kebab-embed-local, kebab-store-vector, kebab-search | P2 |
+| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kebab-llm, kebab-llm-local, kebab-rag | P3 |
+| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kebab-eval | P4 |
+| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kebab-parse-image | P5 |
+| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
+| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kebab-parse-audio | P5 |
+| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |

 ## Component task decomposition (per phase)

@@ -6,7 +6,7 @@ title: "<Component title>"
 status: planned
 depends_on: [] # other task_ids
 unblocks: [] # other task_ids
-contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
 ---

@@ -22,7 +22,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]

 ## Allowed dependencies

-- `kb-core`
+- `kebab-core`
 - <other crates per design §8>
 - <external crates with versions>

@@ -71,7 +71,7 @@ If any item here is needed during implementation, STOP and update the frozen des
 | unit | ... | ... |
 | snapshot | ... (JSON freeze) | `fixtures/...` |
 | contract | trait round-trip | mock impls |
-| integration | end-to-end via `kb-app` facade | tmp workspace |
+| integration | end-to-end via `kebab-app` facade | tmp workspace |

 All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.

@@ -1,12 +1,12 @@
 ---
 phase: P0
-component: workspace + kb-core + kb-config + kb-app + kb-cli
+component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli
 task_id: p0-1
 title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade"
 status: completed
 depends_on: []
 unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version]
 ---

@@ -14,56 +14,56 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config +

 ## Goal

-Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
+Stand up the Cargo workspace (Rust 2024, resolver=3) with `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.

 ## Why now / why this size

-Every other task imports `kb-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
+Every other task imports `kebab-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.

 ## Allowed dependencies

 - workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"`
 - per crate:
-  - `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
-  - `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
-  - `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
-  - `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender`
-  - `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }`
+  - `kebab-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
+  - `kebab-parse-types`: workspace deps + `kebab-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
+  - `kebab-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
+  - `kebab-app`: workspace deps + `kebab-core`, `kebab-config`, `tracing-subscriber`, `tracing-appender`
+  - `kebab-cli`: workspace deps + `kebab-core`, `kebab-config`, `kebab-app`, `clap = { version = "4", features = ["derive"] }`

 ## Forbidden dependencies

-- `kb-core` MUST NOT depend on any other `kb-*` crate.
-- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate.
-- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
-- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
-- `kb-cli` MUST NOT call any non-`kb-app` crate directly.
+- `kebab-core` MUST NOT depend on any other `kebab-*` crate.
+- `kebab-parse-types` MUST depend ONLY on `kebab-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kebab-*` crate.
+- `kebab-config` MUST NOT depend on `kebab-app`, `kebab-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
+- `kebab-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
+- `kebab-cli` MUST NOT call any non-`kebab-app` crate directly.

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` |
-| user `kb` invocation | command-line args | end user |
+| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` |
+| user `kebab` invocation | command-line args | end user |

 ## Outputs

 | output | type | downstream consumer |
 |--------|------|---------------------|
 | compiling workspace | Rust crates | every later task |
-| `kb-core` types/traits | Rust API | every other crate |
-| `kb-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
-| `kb-config::Config` | Rust struct | every other crate |
-| `kb-app` facade methods (stubs) | Rust API | `kb-cli`, future TUI/desktop |
-| `kb` binary | executable | end user |
+| `kebab-core` types/traits | Rust API | every other crate |
+| `kebab-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
+| `kebab-config::Config` | Rust struct | every other crate |
+| `kebab-app` facade methods (stubs) | Rust API | `kebab-cli`, future TUI/desktop |
+| `kebab` binary | executable | end user |
 | `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers |
 | `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors |

 ## Public surface (signatures only — no new types)

-All types/traits below are defined in `kb-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
+All types/traits below are defined in `kebab-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.

 ```rust
-// ── kb-core ─────────────────────────────────────────────────────────────────
+// ── kebab-core ─────────────────────────────────────────────────────────────────

 // Newtype IDs (design §3.1) — Display + FromStr implemented.
 pub struct AssetId(pub String);
@@ -114,7 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ }
 pub enum Inline { /* per §3.4 */ }
 pub enum SourceSpan { /* per §3.4 */ }

-// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b — NOT in kb-core.)
+// (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.)

 // Chunk + Citation (§3.5)
 pub struct Chunk { /* per §3.5 */ }
@@ -232,14 +232,14 @@ pub fn nfc(input: &str) -> String; // §4.1
 ```

 ```rust
-// ── kb-parse-types ──────────────────────────────────────────────────────────
+// ── kebab-parse-types ──────────────────────────────────────────────────────────
 // Per design §3.7b. Defines parser intermediate representations consumed by
-// kb-normalize. Depends on kb-core only — never on parser libraries.
+// kebab-normalize. Depends on kebab-core only — never on parser libraries.

 pub struct ParsedBlock {
     pub kind: ParsedBlockKind,
     pub heading_path: Vec<String>,
-    pub source_span: kb_core::SourceSpan,
+    pub source_span: kebab_core::SourceSpan,
     pub payload: ParsedPayload,
 }

@@ -247,16 +247,16 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe

 pub enum ParsedPayload {
     Heading { level: u8, text: String },
-    Paragraph { text: String, inlines: Vec<kb_core::Inline> },
-    List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
+    Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
+    List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
     Code { lang: Option<String>, code: String },
     Table { headers: Vec<String>, rows: Vec<Vec<String>> },
-    Quote { text: String, inlines: Vec<kb_core::Inline> },
+    Quote { text: String, inlines: Vec<kebab_core::Inline> },
     ImageRef { src: String, alt: String },
     AudioRef { src: String },
 }

-// `Inline` itself lives in kb-core (§3.4) — parse-types references it, never duplicates it.
+// `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it.

 pub struct Warning { pub kind: WarningKind, pub note: String }
 pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
@@ -268,31 +268,31 @@ pub struct ParsedAudioSegment;
 ```

 ```rust
-// ── kb-config ───────────────────────────────────────────────────────────────
+// ── kebab-config ───────────────────────────────────────────────────────────────
 pub struct Config { /* full schema per §6.4 */ }
 impl Config {
     pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>;
     pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>;
     pub fn defaults() -> Self;
     pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self;
-    pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kb/config.toml
-    pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kb
+    pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kebab/config.toml
+    pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kebab
     pub fn xdg_cache_dir() -> std::path::PathBuf;
     pub fn xdg_state_dir() -> std::path::PathBuf;
 }
 ```
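
For readers unfamiliar with XDG resolution, the lookup behind `xdg_config_path()` can be sketched with the standard library alone. This is an illustrative stand-in, not the `kebab-config` implementation (the spec says the real crate delegates to `dirs`); `config_path_from` and its parameters are hypothetical, and the environment values are passed in explicitly so the function stays pure.

```rust
use std::path::PathBuf;

/// Resolve the config file path following the XDG Base Directory convention:
/// `$XDG_CONFIG_HOME/kebab/config.toml` when the variable is set and
/// non-empty, otherwise `$HOME/.config/kebab/config.toml`.
pub fn config_path_from(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    let base = match xdg_config_home {
        // An explicit, non-empty XDG_CONFIG_HOME takes precedence.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Unset or empty: fall back to the conventional ~/.config.
        _ => PathBuf::from(home).join(".config"),
    };
    base.join("kebab").join("config.toml")
}
```

A caller would typically feed it `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` and the value of `HOME`. Keeping the environment out of the function body also makes the tmp `XDG_*` smoke setup from the test plan easy to mirror in unit tests.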

 ```rust
-// ── kb-app ──────────────────────────────────────────────────────────────────
+// ── kebab-app ──────────────────────────────────────────────────────────────────
 pub fn init_workspace(force: bool) -> anyhow::Result<()>;
-pub fn ingest(scope: kb_core::SourceScope, summary_only: bool) -> anyhow::Result<kb_core::IngestReport>;
-pub fn list_docs(filter: kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
-pub fn inspect_doc(id: &kb_core::DocumentId) -> anyhow::Result<kb_core::CanonicalDocument>;
-pub fn inspect_chunk(id: &kb_core::ChunkId) -> anyhow::Result<kb_core::Chunk>;
-pub fn search(query: kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
-pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
+pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result<kebab_core::IngestReport>;
+pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
+pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result<kebab_core::CanonicalDocument>;
+pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result<kebab_core::Chunk>;
+pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
+pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
 pub fn doctor() -> anyhow::Result<DoctorReport>;
-pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kb_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
+pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
 pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> }
 pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> }
 ```
@@ -300,9 +300,9 @@ pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub
 P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; later phases replace bodies but never change signatures.

 ```rust
-// ── kb-cli ──────────────────────────────────────────────────────────────────
+// ── kebab-cli ──────────────────────────────────────────────────────────────────
 // clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder)
-// Each maps 1:1 to a kb_app function. Exit code mapping per §10.
+// Each maps 1:1 to a kebab_app function. Exit code mapping per §10.
 ```
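
The exit code mapping mentioned in the comment above could look roughly like this. The numeric values are the ones this task spec lists in its behavior contract (0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy); the `Outcome` enum and `exit_code` function are hypothetical names used only for this sketch, since design §10 is not reproduced here.

```rust
/// Hypothetical summary of how a subcommand finished.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// The command completed and produced a result.
    Success,
    /// Search found nothing, or `ask` refused to answer.
    NoHitOrRefusal,
    /// Any hard failure (bad config path, I/O error, and so on).
    Error,
    /// `doctor` ran but at least one check failed.
    DoctorUnhealthy,
}

/// Map an outcome to the process exit code listed in the behavior contract.
pub fn exit_code(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::NoHitOrRefusal => 1,
        Outcome::Error => 2,
        Outcome::DoctorUnhealthy => 3,
    }
}
```

A `main` returning `std::process::ExitCode` can finish with `ExitCode::from(exit_code(outcome))`, which keeps the mapping in one place instead of scattering literal codes across subcommand arms.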

 ## Behavior contract
@@ -312,18 +312,18 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
 - `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
 - `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L<a>-L<b>`, `#p=<n>`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`).
 - `Citation::parse` is the strict inverse (round-trip property).
-- `kb-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
-- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kb-cli` after `Config::load`).
-- `kb-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
-- `kb-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
+- `kebab-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
+- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kebab-cli` after `Config::load`).
+- `kebab-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
+- `kebab-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
 - All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`).
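
The config layer order in the list above (defaults → file → env → CLI flag) amounts to "later layers overwrite earlier ones". A minimal sketch for a single setting follows. Everything here is illustrative rather than the real `Config::load` / `apply_env` code: `resolve_score_gate` is a hypothetical helper, the env key `KEBAB_RAG_SCORE_GATE` is an assumed spelling of the `<PREFIX>_<SECTION>_<KEY>` pattern, and the `0.05` default is the `config.rag.score_gate` default quoted in the HOTFIXES entry.

```rust
use std::collections::HashMap;

/// Default taken from the HOTFIXES entry (`config.rag.score_gate = 0.05`).
pub const DEFAULT_SCORE_GATE: f32 = 0.05;

/// Assumed env key; the actual prefix and key spelling are not confirmed here.
pub const SCORE_GATE_ENV_KEY: &str = "KEBAB_RAG_SCORE_GATE";

/// Resolve one setting by applying the layers in order:
/// defaults, then file, then env, then CLI flag. Later layers win.
pub fn resolve_score_gate(
    file: Option<f32>,
    env: &HashMap<String, String>,
    cli: Option<f32>,
) -> f32 {
    let mut value = DEFAULT_SCORE_GATE;
    if let Some(from_file) = file {
        value = from_file;
    }
    // Unparseable env values are ignored in this sketch rather than reported.
    if let Some(from_env) = env.get(SCORE_GATE_ENV_KEY).and_then(|s| s.parse::<f32>().ok()) {
        value = from_env;
    }
    if let Some(from_cli) = cli {
        value = from_cli;
    }
    value
}
```

Passing the environment in as a `HashMap` mirrors the `apply_env(self, env: &HashMap<String, String>)` signature in the public surface, and lets the "defaults + env override + CLI override" unit test run without touching the real process environment.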

 ## Storage / wire effects

-- Filesystem: creates `~/.config/kb/`, `~/.local/share/kb/`, `~/KnowledgeBase/` only when `kb init` runs; never on `Config::load`.
+- Filesystem: creates `~/.config/kebab/`, `~/.local/share/kebab/`, `~/KnowledgeBase/` only when `kebab init` runs; never on `Config::load`.
 - Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later.
-- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kb-core`/`kb-config`/`kb-app`/`kb-cli` scope).
-- Logging: `tracing` initialized in `kb-cli`; daily-rolling file in `~/.local/state/kb/logs/`.
+- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kebab-core`/`kebab-config`/`kebab-app`/`kebab-cli` scope).
+- Logging: `tracing` initialized in `kebab-cli`; daily-rolling file in `~/.local/state/kebab/logs/`.

 ## Test plan

@@ -336,20 +336,20 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
 | unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline |
 | unit | `Config::defaults` + env override + CLI override produces expected merged config | inline |
 | snapshot | `Config::defaults` JSON serde stable | inline (round-trip) |
-| smoke | `kb --help`, `kb init`, `kb doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
+| smoke | `kebab --help`, `kebab init`, `kebab doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
 | build | `cargo check --workspace` and `cargo test --workspace` pass | repo |

 All tests must run with no network, no Ollama, no models.

 ## Definition of Done

-- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024
+- [ ] `Cargo.toml` workspace lists `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` and resolver=3, edition 2024
 - [ ] `cargo check --workspace` passes
 - [ ] `cargo test --workspace` passes
-- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`)
-- [ ] `kb --help` prints subcommands
-- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml`
-- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
+- [ ] `kebab-parse-types` `cargo tree` shows ONLY `kebab-core` + `serde`/`thiserror` style deps (no parser libs, no other `kebab-*`)
+- [ ] `kebab --help` prints subcommands
+- [ ] `kebab init` creates XDG dirs idempotently and writes `config.toml`
+- [ ] `kebab doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
 - [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
 - [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
 - [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands.
@@ -361,12 +361,12 @@ All tests must run with no network, no Ollama, no models.
|
||||
- Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
|
||||
- Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
|
||||
- Wire schema deep validation (only required fields + `schema_version` checked here).
|
||||
- Real `kb-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
|
||||
- Real `kebab-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
|
||||
|
||||
## Risks / notes
|
||||
|
||||
- ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first.
|
||||
- Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`.
|
||||
- `kb-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
|
||||
- `kebab-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
|
||||
- `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
|
||||
- XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly.
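
The note above about `String`-backed newtype IDs can be illustrated with a minimal sketch. `ExampleId` and `IdParseError` are hypothetical names for illustration, not the actual `kebab-core` types, and the lowercase-only hex rule is an assumption; the spec only states the 32-char hex constraint.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical ID newtype backed by `String` (not `[u8; 16]`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExampleId(String);

/// Hypothetical error type for rejected input.
#[derive(Debug, PartialEq, Eq)]
pub enum IdParseError {
    BadLength(usize),
    BadChar(char),
}

impl FromStr for ExampleId {
    type Err = IdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Count chars, not bytes, so multi-byte input reports a sensible length.
        let len = s.chars().count();
        if len != 32 {
            return Err(IdParseError::BadLength(len));
        }
        // Assumption: only lowercase hex digits are accepted.
        if let Some(c) = s.chars().find(|c| !matches!(c, '0'..='9' | 'a'..='f')) {
            return Err(IdParseError::BadChar(c));
        }
        Ok(ExampleId(s.to_owned()))
    }
}

impl fmt::Display for ExampleId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Keeping validation in `FromStr` means serde can stay a plain string round-trip while the parse path still enforces the constraint.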
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
phase: P1
-component: kb-source-fs
+component: kebab-source-fs
task_id: p1-1
title: "Local filesystem source connector"
status: completed
depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
---
|
||||
@@ -22,8 +22,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `ignore` (gitignore semantics)
|
||||
- `blake3`
|
||||
- `walkdir`
|
||||
@@ -34,21 +34,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
|
||||
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
|
||||
| filesystem | `&Path` | OS |
|
||||
| `.kbignore` | text file | workspace root, optional |
|
||||
| `.kebabignore` | text file | workspace root, optional |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream consumer |
|
||||
|--------|------|---------------------|
|
||||
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
|
||||
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -56,11 +56,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
|
||||
pub struct FsSourceConnector { /* internal */ }

impl FsSourceConnector {
-    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
+    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

-impl kb_core::SourceConnector for FsSourceConnector {
-    fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
+impl kebab_core::SourceConnector for FsSourceConnector {
+    fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```
|
||||
@@ -70,7 +70,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
|
||||
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
|
||||
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
|
||||
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
|
||||
- Combine `config.workspace.exclude` ∪ `.kbignore` for filter (union; ordering does not matter).
|
||||
- Combine `config.workspace.exclude` ∪ `.kebabignore` for filter (union; ordering does not matter).
|
||||
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
|
||||
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
|
||||
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
|
||||
@@ -78,7 +78,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
|
||||
## Storage / wire effects
|
||||
|
||||
- Reads: filesystem under `config.workspace.root`.
|
||||
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
|
||||
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
|
||||
|
||||
## Test plan
|
||||
|
||||
@@ -87,25 +87,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
|
||||
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
|
||||
| unit | blake3 of known bytes matches expected hex | inline |
|
||||
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
|
||||
| unit | `.kbignore` ∪ config exclude works | tmp tree |
|
||||
| unit | `.kebabignore` ∪ config exclude works | tmp tree |
|
||||
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
|
||||
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
|
||||
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
|
||||
|
||||
All tests run under `cargo test -p kb-source-fs` with no network and no model.
|
||||
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
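
The POSIX path normalization row in the test plan above could be exercised by a helper along these lines. This is a sketch under assumptions: `normalize_posix` is a hypothetical name, and the rule shown (drop empty and `.` segments, leave `..` untouched) is inferred only from the three inline cases listed in the table.

```rust
/// Normalize a workspace-relative path to a canonical POSIX-style form.
/// Assumption: empty segments (from `//`) and `.` segments are dropped;
/// `..` segments are left as-is because the spec excerpt does not cover them.
pub fn normalize_posix(path: &str) -> String {
    path.split('/')
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}
```

With this rule, `./a/b.md`, `a//b.md`, and `a/b.md` all map to the same string, which is what makes sorting by `workspace_path` stable across scans.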
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-source-fs` passes
|
||||
- [ ] `cargo test -p kb-source-fs` passes
|
||||
- [ ] `cargo check -p kebab-source-fs` passes
|
||||
- [ ] `cargo test -p kebab-source-fs` passes
|
||||
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
|
||||
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
|
||||
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
|
||||
- [ ] PR description links to design §3.3, §6.2, §7.2
|
||||
|
||||
## Out of scope
|
||||
|
||||
- File watching (P+).
|
||||
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
|
||||
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
|
||||
- Non-fs source connectors (HTTP, S3 — P+).
|
||||
|
||||
## Risks / notes
|
||||
|
||||
@@ -1,29 +1,29 @@
|
||||
---
phase: P1
-component: kb-parse-md (frontmatter submodule)
+component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
-contract_sections: [design §3.6 Metadata, design §3.7b kb-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+contract_sections: [design §3.6 Metadata, design §3.7b kebab-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
---
|
||||
# p1-2 — Markdown frontmatter parsing
|
||||
|
||||
## Goal
|
||||
|
||||
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
|
||||
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
|
||||
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
|
||||
- `kebab-core`
|
||||
- `kebab-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
|
||||
- `serde`
|
||||
- `serde_yaml` (or `yaml-rust2`) for YAML
|
||||
- `toml` for TOML
|
||||
@@ -33,7 +33,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
|
||||
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
|
||||
|
||||
## Inputs
|
||||
|
||||
@@ -46,7 +46,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `(Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
|
||||
| `(Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -54,10 +54,10 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
|
||||
pub fn parse_frontmatter(
    bytes: &[u8],
    hints: &BodyHints,
-) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)>;
+) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)>;
```
|
||||
`Warning` / `WarningKind` come from `kb-parse-types` (shared with `p1-3` blocks parser and downstream `kb-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
|
||||
`Warning` / `WarningKind` come from `kebab-parse-types` (shared with `p1-3` blocks parser and downstream `kebab-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
|
||||
|
||||
## Behavior contract
|
||||
|
||||
@@ -68,7 +68,7 @@ pub fn parse_frontmatter(
|
||||
- `source_type` default `markdown`; `trust_level` default `primary`.
|
||||
- `aliases`, `tags` default empty.
|
||||
- Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning.
|
||||
- Unknown enum value (e.g. `trust_level: weird`) → emit `kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
|
||||
- Unknown enum value (e.g. `trust_level: weird`) → emit `kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
|
||||
- Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" }` emitted.
|
||||
- No frontmatter at all → defaults applied silently.
|
||||
- `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2).
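
The unknown-enum rule in the bullets above (warn, then continue with the default) might look like the following sketch. All types here are local stand-ins rather than the real `kebab_core` / `kebab_parse_types` definitions, and the `Secondary` variant is hypothetical; the excerpt only names `primary`.

```rust
/// Stand-in for the real trust-level enum; `Secondary` is a hypothetical variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLevel {
    Primary,
    Secondary,
}

/// Stand-in for `WarningKind` from the shared parse-types crate.
#[derive(Debug, PartialEq, Eq)]
pub enum WarningKind {
    MalformedFrontmatter,
}

/// Stand-in for the shared `Warning` type.
#[derive(Debug, PartialEq, Eq)]
pub struct Warning {
    pub kind: WarningKind,
    pub note: String,
}

/// Resolve a raw `trust_level` value. A missing key silently defaults to
/// `Primary`; an unknown value also defaults, but records a warning so that
/// ingest can continue.
pub fn trust_level_or_default(raw: Option<&str>, warnings: &mut Vec<Warning>) -> TrustLevel {
    match raw {
        None | Some("primary") => TrustLevel::Primary,
        Some("secondary") => TrustLevel::Secondary,
        Some(other) => {
            warnings.push(Warning {
                kind: WarningKind::MalformedFrontmatter,
                note: format!("unknown trust_level={other}, defaulted to primary"),
            });
            TrustLevel::Primary
        }
    }
}
```

Returning the value and pushing the warning separately keeps the function infallible, which matches the "ingest continues with default" wording.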
|
||||
@@ -91,12 +91,12 @@ pub fn parse_frontmatter(
|
||||
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
|
||||
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
|
||||
|
||||
All tests under `cargo test -p kb-parse-md --lib frontmatter`.
|
||||
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-md` passes
|
||||
- [ ] `cargo test -p kb-parse-md frontmatter` passes
|
||||
- [ ] `cargo check -p kebab-parse-md` passes
|
||||
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
|
||||
- [ ] No `pulldown-cmark` import in this submodule
|
||||
- [ ] Snapshot tests stable across two consecutive runs
|
||||
- [ ] PR links design §0 Q9, §3.6
|
||||
|
||||
@@ -1,20 +1,20 @@
|
||||
---
phase: P1
-component: kb-parse-md (blocks submodule)
+component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
-contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kebab-parse-types, §0 Q3 citation]
---
|
||||
# p1-3 — Markdown body → Block tree
|
||||
|
||||
## Goal
|
||||
|
||||
Parse Markdown body bytes into a flat `Vec<kb_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
|
||||
Parse Markdown body bytes into a flat `Vec<kebab_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,15 +22,15 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
|
||||
- `kebab-core`
|
||||
- `kebab-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
|
||||
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
|
||||
- `serde`
|
||||
- `thiserror`
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
|
||||
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
|
||||
|
||||
## Inputs
|
||||
|
||||
@@ -43,8 +43,8 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `Vec<kb_parse_types::ParsedBlock>` | shared type from `kb-parse-types` | `kb-normalize` |
|
||||
| `Vec<kb_parse_types::Warning>` | shared type | propagated into Provenance |
|
||||
| `Vec<kebab_parse_types::ParsedBlock>` | shared type from `kebab-parse-types` | `kebab-normalize` |
|
||||
| `Vec<kebab_parse_types::Warning>` | shared type | propagated into Provenance |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -52,10 +52,10 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
|
||||
pub fn parse_blocks(
    body: &[u8],
    body_offset_lines: u32,
-) -> anyhow::Result<(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)>;
+) -> anyhow::Result<(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)>;
```
|
||||
`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4).
|
||||
`ParsedBlock` is defined in `kebab-parse-types` (design §3.7b). `kebab-parse-md` does NOT define its own; it consumes the shared type. Lift to `kebab_core::Block` (with `BlockId` assignment) is `kebab-normalize`'s job (p1-4).
|
||||
|
||||
## Behavior contract
|
||||
|
||||
@@ -63,9 +63,9 @@ pub fn parse_blocks(
|
||||
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
|
||||
- Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split.
|
||||
- Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`.
|
||||
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset).
|
||||
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kb_core::Inline>` for one top-level item.
|
||||
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kb_core::Inline` per design §3.4). Drop other inlines silently.
|
||||
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kebab-normalize` (when image src can be matched to a workspace asset).
|
||||
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kebab_core::Inline>` for one top-level item.
|
||||
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kebab_core::Inline` per design §3.4). Drop other inlines silently.
|
||||
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + `Warning::ExtractFailed`.
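
The heading-tree rule above (each block records its ancestor heading texts in order) can be sketched with a small stack. `HeadingTracker` is a hypothetical helper, not part of the stated public surface, and the sketch assumes a heading replaces any open heading at the same or a deeper level.

```rust
/// Hypothetical helper that tracks the current chain of ancestor headings.
pub struct HeadingTracker {
    /// (level, text) pairs, outermost heading first.
    stack: Vec<(u8, String)>,
}

impl HeadingTracker {
    pub fn new() -> Self {
        Self { stack: Vec::new() }
    }

    /// Record a heading of the given level (1 = `#`, 2 = `##`, ...).
    pub fn enter(&mut self, level: u8, text: &str) {
        // Close every open heading at the same or a deeper level first.
        while matches!(self.stack.last(), Some((l, _)) if *l >= level) {
            self.stack.pop();
        }
        self.stack.push((level, text.to_owned()));
    }

    /// Heading path to attach to blocks that follow the latest heading.
    pub fn path(&self) -> Vec<String> {
        self.stack.iter().map(|(_, t)| t.clone()).collect()
    }
}
```

A parser loop would call `enter` on each heading event and copy `path()` into every block it emits until the next heading.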
|
||||
|
||||
## Storage / wire effects
|
||||
@@ -86,12 +86,12 @@ pub fn parse_blocks(
|
||||
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
|
||||
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
|
||||
|
||||
All tests under `cargo test -p kb-parse-md --lib blocks`.
|
||||
All tests under `cargo test -p kebab-parse-md --lib blocks`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-md` passes
|
||||
- [ ] `cargo test -p kb-parse-md blocks` passes
|
||||
- [ ] `cargo check -p kebab-parse-md` passes
|
||||
- [ ] `cargo test -p kebab-parse-md blocks` passes
|
||||
- [ ] Snapshot tests stable across two runs
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §3.4
|
||||
@@ -99,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
|
||||
## Out of scope
|
||||
|
||||
- Frontmatter (p1-2).
|
||||
- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize).
|
||||
- Lifting `kebab_parse_types::ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4 normalize).
|
||||
- Chunking (p1-5).
|
||||
|
||||
## Risks / notes
|
||||
|
||||
@@ -1,30 +1,30 @@
|
||||
---
phase: P1
-component: kb-normalize
+component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: completed
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
-contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+contract_sections: [§3.4, §3.7b kebab-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
---
|
||||
# p1-4 — Lift to CanonicalDocument
|
||||
|
||||
## Goal
|
||||
|
||||
Combine `Metadata` (p1-2) + `Vec<kb_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
|
||||
Combine `Metadata` (p1-2) + `Vec<kebab_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kebab_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
|
||||
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
|
||||
- `kebab-config`
|
||||
- `serde`
|
||||
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
|
||||
- `blake3`
|
||||
@@ -34,36 +34,36 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only — `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md` (consumed via shared `kebab-parse-types` only — `kebab-parse-md` must NOT appear in this crate's `cargo tree`), `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
|
||||
| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | p1-2 |
|
||||
| `Vec<ParsedBlock>` + warnings | `(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)` | p1-3 |
|
||||
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
|
||||
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
|
||||
| `Metadata` + frontmatter span + warnings | `(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | p1-2 |
|
||||
| `Vec<ParsedBlock>` + warnings | `(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)` | p1-3 |
|
||||
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
|
||||
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
pub fn build_canonical_document(
-    asset: &kb_core::RawAsset,
-    metadata: kb_core::Metadata,
-    blocks: Vec<kb_parse_types::ParsedBlock>,
-    parser_version: &kb_core::ParserVersion,
-    warnings: Vec<kb_parse_types::Warning>,
-) -> anyhow::Result<kb_core::CanonicalDocument>;
+    asset: &kebab_core::RawAsset,
+    metadata: kebab_core::Metadata,
+    blocks: Vec<kebab_parse_types::ParsedBlock>,
+    parser_version: &kebab_core::ParserVersion,
+    warnings: Vec<kebab_parse_types::Warning>,
+) -> anyhow::Result<kebab_core::CanonicalDocument>;

-pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
-pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
+pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
+pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```
|
||||
## Behavior contract
|
||||
@@ -91,14 +91,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
|
||||
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
|
||||
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
|
||||
|
||||
All tests under `cargo test -p kb-normalize`.
|
||||
All tests under `cargo test -p kebab-normalize`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-normalize` passes
|
||||
- [ ] `cargo test -p kb-normalize` passes
|
||||
- [ ] `cargo check -p kebab-normalize` passes
|
||||
- [ ] `cargo test -p kebab-normalize` passes
|
||||
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
|
||||
- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize` — input types come from `kb-parse-types` only
|
||||
- [ ] No `kebab-parse-md` (or any other parser crate) appears in `cargo tree -p kebab-normalize` — input types come from `kebab-parse-types` only
|
||||
- [ ] PR links design §3.7b, §4.2, §4.3, §8
|
||||
|
||||
## Out of scope
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
phase: P1
-component: kb-chunk
+component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: completed
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---
|
||||
@@ -22,8 +22,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `serde`
|
||||
- `blake3` (policy_hash)
|
||||
- `serde-json-canonicalizer`
|
||||
@@ -31,30 +31,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
|
||||
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
|
||||
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
|
||||
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
|
||||
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
pub struct MdHeadingV1Chunker;

-impl kb_core::Chunker for MdHeadingV1Chunker {
-    fn chunker_version(&self) -> kb_core::ChunkerVersion;
-    fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
-    fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
+impl kebab_core::Chunker for MdHeadingV1Chunker {
+    fn chunker_version(&self) -> kebab_core::ChunkerVersion;
+    fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
+    fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
|
||||
@@ -91,12 +91,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |

All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.

## Definition of Done

- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2

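The determinism row in the test plan above (identical input + identical policy → identical chunk_ids) relies on `policy_hash` returning the same string on every run and every toolchain. The following is a minimal sketch of one way to get that property, not the crate's actual recipe. `SketchPolicy`, its two fields, `fnv1a64`, and `sketch_policy_hash` are hypothetical names introduced for illustration; the real `ChunkPolicy` fields and hash are defined by the design doc.

```rust
/// Hypothetical policy shape, for illustration only.
/// The real `ChunkPolicy` fields live in the core crate.
struct SketchPolicy {
    target_tokens: u32,
    overlap_tokens: u32,
}

/// 64-bit FNV-1a, written out by hand so the result does not depend on
/// the standard library's unspecified default hasher.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Serializes the policy in a fixed field order, then hashes it.
/// Equal policies always map to the same 16-character hex string.
fn sketch_policy_hash(policy: &SketchPolicy) -> String {
    let canonical = format!(
        "target_tokens={};overlap_tokens={}",
        policy.target_tokens, policy.overlap_tokens
    );
    format!("{:016x}", fnv1a64(canonical.as_bytes()))
}
```

The sketch avoids `std::collections::hash_map::DefaultHasher` because its algorithm is not guaranteed to stay the same across Rust releases, which would make stored hashes unstable.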
@@ -1,12 +1,12 @@
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: completed
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---

@@ -22,8 +22,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -34,7 +34,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT

## Forbidden dependencies

- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

@@ -42,17 +42,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | – | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |

## Public surface (signatures only — no new types)

@@ -60,23 +60,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }

impl SqliteStore {
    pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
    pub fn run_migrations(&self) -> anyhow::Result<()>;

    pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
    pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}

impl kb_core::DocumentStore for SqliteStore {
    fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
    fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
    fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
    fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
    fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
    fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
    fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
    fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
    fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
    fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
    fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
    fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
    fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
    fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}

impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```

## Behavior contract
@@ -96,7 +96,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }

## Storage / wire effects

- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB.

## Test plan
@@ -112,14 +112,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture |

All tests under `cargo test -p kb-store-sqlite` with no network.
All tests under `cargo test -p kebab-store-sqlite` with no network.

## Definition of Done

- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5

@@ -132,5 +132,5 @@ All tests under `cargo test -p kb-store-sqlite` with no network.

## Risks / notes

- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).

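The shard-prefix note above can be sketched as follows. This is an illustration under assumptions, not the store's code: it assumes `asset_id` is a hex string (implied by the "at most 16 dirs" figure for a one-character prefix), and `asset_path` and `shard_count` are hypothetical helper names.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `data_dir/assets/<aa>/<asset_id>`.
/// Returns `None` when the id is shorter than two characters or its
/// two-character prefix is not ASCII hex.
fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    let shard = asset_id.get(..2)?;
    if !shard.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(data_dir.join("assets").join(shard).join(asset_id))
}

/// Number of distinct shard directories for a hex prefix of `len` characters.
fn shard_count(len: u32) -> u64 {
    16u64.pow(len)
}
```

With a two-character prefix the layout fans out over up to 256 directories, against 16 for a one-character prefix.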
@@ -1,12 +1,12 @@
---
phase: P2
component: kb-store-sqlite (FTS5 migration)
component: kebab-store-sqlite (FTS5 migration)
task_id: p2-1
title: "FTS5 virtual table + triggers (V002 migration)"
status: completed
depends_on: [p1-6]
unblocks: [p2-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.5 chunks_fts + triggers, §9 versioning]
---

@@ -18,19 +18,19 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.

## Why now / why this size

`chunks_fts` is the lexical index for `kb-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
`chunks_fts` is the lexical index for `kebab-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-store-sqlite` (extends migrations)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (extends migrations)
- `rusqlite`
- `refinery`

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search` (consumer is p2-2), `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search` (consumer is p2-2), `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

@@ -52,7 +52,7 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
```

(Used by `kb index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
(Used by `kebab index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)

## Behavior contract

@@ -64,7 +64,7 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;

## Storage / wire effects

- Writes: `chunks_fts` virtual table inside `kb.sqlite`.
- Writes: `chunks_fts` virtual table inside `kebab.sqlite`.
- Reads: existing `chunks` rows for backfill.

## Test plan
@@ -78,12 +78,12 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
| function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB |
| migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB |

All tests under `cargo test -p kb-store-sqlite fts`.
All tests under `cargo test -p kebab-store-sqlite fts`.

## Definition of Done

- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite fts` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite fts` passes
- [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.5

@@ -1,12 +1,12 @@
---
phase: P2
component: kb-search (lexical mode)
component: kebab-search (lexical mode)
task_id: p2-2
title: "Lexical Retriever via SQLite FTS5 + bm25 + citation"
status: completed
depends_on: [p2-1]
unblocks: [p3-4, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 search output, §2.2 wire schema, §6.4 search settings]
---

@@ -14,38 +14,38 @@ contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5

## Goal

Implement `kb_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
Implement `kebab_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.

## Why now / why this size

First concrete `Retriever`. Lets `kb search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
First concrete `Retriever`. Lets `kebab search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `rusqlite`
- `tracing`
- `thiserror`

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `SearchQuery` (mode=Lexical) | `kb_core::SearchQuery` | `kb-app::search` |
| `kb-config::search` settings (`default_k`, `snippet_chars`) | `kb_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kb-store-sqlite` |
| `SearchQuery` (mode=Lexical) | `kebab_core::SearchQuery` | `kebab-app::search` |
| `kebab-config::search` settings (`default_k`, `snippet_chars`) | `kebab_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kebab-store-sqlite` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer (P4), hybrid (p3-4) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer (P4), hybrid (p3-4) |

## Public surface (signatures only — no new types)

@@ -53,12 +53,12 @@ First concrete `Retriever`. Lets `kb search --mode lexical` work without any emb
pub struct LexicalRetriever { /* internal: holds an Arc<rusqlite::Connection> + IndexVersion */ }

impl LexicalRetriever {
    pub fn new(store: std::sync::Arc<kb_store_sqlite::SqliteStore>, index_version: kb_core::IndexVersion) -> Self;
    pub fn new(store: std::sync::Arc<kebab_store_sqlite::SqliteStore>, index_version: kebab_core::IndexVersion) -> Self;
}

impl kb_core::Retriever for LexicalRetriever {
    fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
    fn index_version(&self) -> kb_core::IndexVersion;
impl kebab_core::Retriever for LexicalRetriever {
    fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
    fn index_version(&self) -> kebab_core::IndexVersion;
}
```

@@ -98,8 +98,8 @@ impl kb_core::Retriever for LexicalRetriever {

## Storage / wire effects

- Reads only. Never mutates `kb.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kb-cli --json` is used.
- Reads only. Never mutates `kebab.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kebab-cli --json` is used.

## Test plan

@@ -114,13 +114,13 @@ impl kb_core::Retriever for LexicalRetriever {
| determinism | identical query twice produces identical hit order and scores | tmp DB |
| snapshot | `Vec<SearchHit>` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` |

All tests under `cargo test -p kb-search lexical`.
All tests under `cargo test -p kebab-search lexical`.

## Definition of Done

- [ ] `cargo check -p kb-search` passes
- [ ] `cargo test -p kb-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kb-search` audit)
- [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kebab-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kebab-search` audit)
- [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json`
- [ ] PR links design §3.7, §0 Q3, §2.2


@@ -1,12 +1,12 @@
---
phase: P3
component: kb-embed (trait crate)
component: kebab-embed (trait crate)
task_id: p3-1
title: "Embedder trait + EmbeddingInput/Kind validation"
status: completed
depends_on: [p0-1]
unblocks: [p3-2, p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split]
---

@@ -14,16 +14,16 @@ contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 Embeddi

## Goal

Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
Provide the `kebab-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.

## Why now / why this size

Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models.
Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kebab-store-vector` and `kebab-search` testable without touching real models.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `thiserror`
- `tracing`
@@ -31,35 +31,35 @@ Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface.

## Forbidden dependencies

- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `fastembed`, `ort`, `tokenizers`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| `EmbeddingInput` | `kebab_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) |
| `Vec<Vec<f32>>` | row-aligned with input | `kebab-store-vector`, `kebab-search` (vector mode) |

## Public surface (signatures only — no new types)

```rust
pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
pub use kebab_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};

/// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
pub struct MockEmbedder { /* internal: model_id, dims, seed */ }
#[cfg(feature = "mock")]
impl MockEmbedder {
    pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self;
    pub fn new(model_id: kebab_core::EmbeddingModelId, version: kebab_core::EmbeddingVersion, dimensions: usize) -> Self;
}
#[cfg(feature = "mock")]
impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
impl kebab_core::Embedder for MockEmbedder { /* per §7.2 */ }
```

## Behavior contract
@@ -84,21 +84,21 @@ impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
| unit | dimensions match construction-time value | inline |
| contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) |

All tests under `cargo test -p kb-embed`.
All tests under `cargo test -p kebab-embed`.

## Definition of Done

- [ ] `cargo check -p kb-embed` passes
- [ ] `cargo test -p kb-embed` passes
- [ ] `cargo check -p kebab-embed` passes
- [ ] `cargo test -p kebab-embed` passes
- [ ] No external embedding dep present
- [ ] PR links design §7.2 Embedder, §11

## Out of scope

- Real adapter (`kb-embed-local` is p3-2).
- Real adapter (`kebab-embed-local` is p3-2).
- Reranker traits (P+).

## Risks / notes

- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kb-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`.
- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kebab-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kebab-core` reorganizes; downstream crates should `use kebab_embed::Embedder` rather than `use kebab_core::Embedder`.

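The contract rows above (deterministic vectors, length == dimensions, all finite floats) can be illustrated with a small standalone sketch. It is not the crate's `MockEmbedder`: `SketchMockEmbedder`, its fields, and the FNV-1a + xorshift scheme are assumptions chosen for illustration, and it takes plain `&str` inputs instead of `EmbeddingInput` to stay self-contained.

```rust
/// Hypothetical stand-in for a deterministic mock embedder.
/// Names and hashing scheme are illustrative assumptions.
pub struct SketchMockEmbedder {
    dimensions: usize,
    seed: u64,
}

impl SketchMockEmbedder {
    pub fn new(dimensions: usize, seed: u64) -> Self {
        Self { dimensions, seed }
    }

    /// 64-bit FNV-1a over the seed bytes followed by the text bytes.
    fn hash(&self, text: &str) -> u64 {
        let seed_bytes = self.seed.to_le_bytes();
        let mut h: u64 = 0xcbf29ce484222325;
        for b in seed_bytes.iter().chain(text.as_bytes()) {
            h ^= u64::from(*b);
            h = h.wrapping_mul(0x100000001b3);
        }
        h
    }

    /// Returns one vector per input, row-aligned, each of length
    /// `dimensions`, with every component in [-1.0, 1.0).
    pub fn embed(&self, inputs: &[&str]) -> Vec<Vec<f32>> {
        inputs
            .iter()
            .map(|text| {
                // xorshift64 needs a nonzero state, so force the low bit on.
                let mut state = self.hash(text) | 1;
                (0..self.dimensions)
                    .map(|_| {
                        state ^= state << 13;
                        state ^= state >> 7;
                        state ^= state << 17;
                        // Top 24 bits map exactly onto f32 in [0, 1).
                        let unit = (state >> 40) as f32 / (1u64 << 24) as f32;
                        unit * 2.0 - 1.0
                    })
                    .collect()
            })
            .collect()
    }
}
```

Because the output depends only on `(seed, text, dimensions)`, two calls with the same inputs return identical vectors, which is the property downstream snapshot tests would rely on.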
@@ -1,12 +1,12 @@
---
phase: P3
component: kb-embed-local (fastembed adapter)
component: kebab-embed-local (fastembed adapter)
task_id: p3-2
title: "fastembed-rs Embedder for multilingual-e5-small"
status: completed
depends_on: [p3-1]
unblocks: [p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]
---

@@ -22,9 +22,9 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-embed`
- `kebab-core`
- `kebab-config`
- `kebab-embed`
- `fastembed = "4"` (or current stable)
- `tokenizers`
- `ort` (transitive via fastembed)
@@ -33,21 +33,21 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility)

## Inputs

| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers |
| `kebab-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kebab_core::EmbeddingInput<'_>[]` | callers |
| model cache | `data_dir/models/fastembed/` | filesystem |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search |
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search |
| model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |

## Public surface (signatures only — no new types)
@@ -56,14 +56,14 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }

impl FastembedEmbedder {
    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kb_core::Embedder for FastembedEmbedder {
    fn model_id(&self) -> kb_core::EmbeddingModelId;
    fn model_version(&self) -> kb_core::EmbeddingVersion;
impl kebab_core::Embedder for FastembedEmbedder {
    fn model_id(&self) -> kebab_core::EmbeddingModelId;
    fn model_version(&self) -> kebab_core::EmbeddingVersion;
    fn dimensions(&self) -> usize;
    fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
    fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
}
```

@@ -96,12 +96,12 @@ impl kb_core::Embedder for FastembedEmbedder {
| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |

All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
All tests under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.

## Definition of Done

- [ ] `cargo check -p kb-embed-local` passes
- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`)
- [ ] `cargo check -p kebab-embed-local` passes
- [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`)
- [ ] First-run model download works under `data_dir/models/fastembed/`
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.3, §6.4, §9

@@ -1,12 +1,12 @@
---
phase: P3
component: kb-store-vector (LanceDB)
component: kebab-store-vector (LanceDB)
task_id: p3-3
title: "LanceDB VectorStore + embedding_records writer"
status: completed
depends_on: [p3-2, p1-6]
unblocks: [p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning]
---

@@ -18,13 +18,13 @@ Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk

## Why now / why this size

Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
Closes the loop chunk → vector. Splits cleanly from `kebab-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `lancedb`
- `arrow` (and `arrow-array`, `arrow-schema`)
- `serde`, `serde_json`
@@ -33,16 +33,16 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) |
| filters | `kb_core::SearchFilters` | `SearchQuery` |
| `kb-config::Config.storage.vector_dir` | path | runtime |
| `VectorRecord[..]` | `kebab_core::VectorRecord` | `kebab-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kebab-embed-local` (`Embedder::embed` for query) |
| filters | `kebab_core::SearchFilters` | `SearchQuery` |
| `kebab-config::Config.storage.vector_dir` | path | runtime |

## Outputs

@@ -50,7 +50,7 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
|--------|------|------------|
| Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches |
| `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping |
| `Vec<VectorHit>` | `kb_core::VectorHit` | hybrid retriever (p3-4) |
| `Vec<VectorHit>` | `kebab_core::VectorHit` | hybrid retriever (p3-4) |

## Public surface (signatures only — no new types)

@@ -58,13 +58,13 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
|
||||
pub struct LanceVectorStore { /* internal: connection + sqlite handle */ }
|
||||
|
||||
impl LanceVectorStore {
|
||||
pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
|
||||
pub fn new(config: &kebab_config::Config, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
|
||||
}
|
||||
|
||||
impl kb_core::VectorStore for LanceVectorStore {
|
||||
fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kb_core::IndexId>;
|
||||
fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>;
|
||||
fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result<Vec<kb_core::VectorHit>>;
|
||||
impl kebab_core::VectorStore for LanceVectorStore {
|
||||
fn ensure_table(&self, model: &kebab_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kebab_core::IndexId>;
|
||||
fn upsert(&self, recs: &[kebab_core::VectorRecord]) -> anyhow::Result<()>;
|
||||
fn search(&self, query_vec: &[f32], k: usize, filters: &kebab_core::SearchFilters) -> anyhow::Result<Vec<kebab_core::VectorHit>>;
|
||||
}
|
||||
```
|
||||
|
@@ -116,12 +116,12 @@ impl kb_core::VectorStore for LanceVectorStore {
 | determinism | same query vector + same data → same top-k order | tmp data_dir |
 | snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` |

-All tests under `cargo test -p kb-store-vector`.
+All tests under `cargo test -p kebab-store-vector`.

 ## Definition of Done

-- [ ] `cargo check -p kb-store-vector` passes
-- [ ] `cargo test -p kb-store-vector` passes
+- [ ] `cargo check -p kebab-store-vector` passes
+- [ ] `cargo test -p kebab-store-vector` passes
 - [ ] No imports outside Allowed dependencies
 - [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch
 - [ ] PR links design §5.6, §6.3, §7.2
@@ -130,7 +130,7 @@ All tests under `cargo test -p kb-store-vector`.

 - IVF / PQ index tuning (P+).
 - Image / multimodal vector tables (P6).
-- `kb-app` orchestration of indexing jobs (`embed_index` facade method body).
+- `kebab-app` orchestration of indexing jobs (`embed_index` facade method body).

 ## Risks / notes


@@ -1,12 +1,12 @@
 ---
 phase: P3
-component: kb-search (hybrid)
+component: kebab-search (hybrid)
 task_id: p3-4
 title: "Hybrid Retriever (RRF) over lexical + vector"
 status: completed
 depends_on: [p2-2, p3-3]
 unblocks: [p4-3]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings]
 ---

@@ -22,17 +22,17 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-store-sqlite` (for `LexicalRetriever`)
-- `kb-store-vector` (for `LanceVectorStore`)
-- `kb-embed` (trait only — for query embedding via `Embedder`)
+- `kebab-core`
+- `kebab-config`
+- `kebab-store-sqlite` (for `LexicalRetriever`)
+- `kebab-store-vector` (for `LanceVectorStore`)
+- `kebab-embed` (trait only — for query embedding via `Embedder`)
 - `tracing`
 - `thiserror`

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (`kebab-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)

 ## Inputs

@@ -41,21 +41,21 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
 | `LexicalRetriever` | trait object | constructed elsewhere |
 | `LanceVectorStore` | trait object | constructed elsewhere |
 | `Box<dyn Embedder>` | for query embedding | runtime-injected |
-| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
-| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` |
+| `kebab-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
+| `SearchQuery` | `kebab_core::SearchQuery` | `kebab-app::search` |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer |
+| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer |

 ## Public surface (signatures only — no new types)

 ```rust
 pub struct HybridRetriever {
-    lexical: std::sync::Arc<dyn kb_core::Retriever>,
-    vector: std::sync::Arc<dyn kb_core::Retriever>, // wrapper over LanceVectorStore + Embedder
+    lexical: std::sync::Arc<dyn kebab_core::Retriever>,
+    vector: std::sync::Arc<dyn kebab_core::Retriever>, // wrapper over LanceVectorStore + Embedder
     fusion: FusionPolicy,
     k: usize,
 }
@@ -64,27 +64,27 @@ pub enum FusionPolicy { Rrf { k_rrf: u32 } }

 impl HybridRetriever {
     pub fn new(
-        config: &kb_config::Config,
-        lexical: std::sync::Arc<dyn kb_core::Retriever>,
-        vector: std::sync::Arc<dyn kb_core::Retriever>,
+        config: &kebab_config::Config,
+        lexical: std::sync::Arc<dyn kebab_core::Retriever>,
+        vector: std::sync::Arc<dyn kebab_core::Retriever>,
     ) -> Self;
 }

-impl kb_core::Retriever for HybridRetriever {
-    fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
-    fn index_version(&self) -> kb_core::IndexVersion;
+impl kebab_core::Retriever for HybridRetriever {
+    fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
+    fn index_version(&self) -> kebab_core::IndexVersion;
 }

 /// Wrapper that turns a VectorStore + Embedder into a Retriever.
 pub struct VectorRetriever {
-    store: std::sync::Arc<dyn kb_core::VectorStore>,
-    embed: std::sync::Arc<dyn kb_core::Embedder>,
-    /* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */
+    store: std::sync::Arc<dyn kebab_core::VectorStore>,
+    embed: std::sync::Arc<dyn kebab_core::Embedder>,
+    /* heading_path/snippet enrichment hits SQLite via kebab-store-sqlite read accessor */
 }
 impl VectorRetriever {
-    pub fn new(store: std::sync::Arc<dyn kb_core::VectorStore>, embed: std::sync::Arc<dyn kb_core::Embedder>, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> Self;
+    pub fn new(store: std::sync::Arc<dyn kebab_core::VectorStore>, embed: std::sync::Arc<dyn kebab_core::Embedder>, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> Self;
 }
-impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
+impl kebab_core::Retriever for VectorRetriever { /* per §7.2 */ }
 ```
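
For orientation, here is a minimal sketch of the fusion step that `FusionPolicy::Rrf { k_rrf }` names. It assumes the textbook Reciprocal Rank Fusion formula (each list contributes `1 / (k_rrf + rank)` with 1-based ranks) and plain string ids; the function name `rrf_fuse`, its signature, and the tie-break rule are illustrative and not taken from the crate.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over ranked lists of chunk ids (hypothetical helper).
/// Each list is ordered best-first; `k_rrf` plays the role of `FusionPolicy::Rrf { k_rrf }`.
pub fn rrf_fuse(lists: &[Vec<String>], k_rrf: u32, k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            // Ranks are 1-based: the top hit of a list contributes 1 / (k_rrf + 1).
            let rank = (idx + 1) as f64;
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k_rrf as f64 + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Score descending, then id ascending, so equal scores still order deterministically.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(k);
    fused
}
```

Sorting on `(score, id)` rather than score alone is one way to meet the determinism row in the test plan, since `HashMap` iteration order is not stable between runs.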

 ## Behavior contract
@@ -124,12 +124,12 @@ impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
 | determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB |
 | snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` |

-All tests under `cargo test -p kb-search hybrid`.
+All tests under `cargo test -p kebab-search hybrid`.

 ## Definition of Done

-- [ ] `cargo check -p kb-search` passes
-- [ ] `cargo test -p kb-search hybrid` passes
+- [ ] `cargo check -p kebab-search` passes
+- [ ] `cargo test -p kebab-search hybrid` passes
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §3.7, §6.4 search, §0 Q3


@@ -1,64 +1,64 @@
 ---
 phase: P3
-component: kb-app (facade wiring)
+component: kebab-app (facade wiring)
 task_id: p3-5
-title: "Wire kb-app facade — ingest / search / list / inspect end-to-end"
+title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end"
 status: completed
 depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
 unblocks: [p4-3, p9-1, p9-2, p9-4]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors]
 ---

-# p3-5 — Wire kb-app facade
+# p3-5 — Wire kebab-app facade

 ## Goal

-Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it).
+Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it).

 ## Why now / why this size

-P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
+P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.

 ## Allowed dependencies

-`kb-app` may depend on:
+`kebab-app` may depend on:

-- `kb-core`, `kb-config` (already)
-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1)
-- `kb-search` (P2-2, P3-4)
-- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4)
+- `kebab-core`, `kebab-config` (already)
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1)
+- `kebab-search` (P2-2, P3-4)
+- `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4)
 - `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing)

 ## Forbidden dependencies

-- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed)
-- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way)
-- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8)
-- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed)
+- `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed)
+- `kebab-tui`, `kebab-desktop` (P9 — those *consume* `kebab-app`, not the other way)
+- `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8)
+- network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed)

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call |
-| `SourceScope` | `kb_core::SourceScope` | CLI |
-| `SearchQuery` | `kb_core::SearchQuery` | CLI |
-| `DocFilter` | `kb_core::DocFilter` | CLI |
-| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI |
+| `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call |
+| `SourceScope` | `kebab_core::SourceScope` | CLI |
+| `SearchQuery` | `kebab_core::SearchQuery` | CLI |
+| `DocFilter` | `kebab_core::DocFilter` | CLI |
+| `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) |
-| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) |
-| `Vec<DocSummary>` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) |
-| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) |
+| `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) |
+| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) |
+| `Vec<DocSummary>` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) |
+| `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) |

 ## Public surface (signatures only — no new types)

-The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface:
+The signatures already live on `kebab-app`. This task only swaps the bodies. Frozen surface:

 ```rust
 pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
@@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHi
 // `ask` and `init_workspace` / `doctor` stay as-is.
 ```

-If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kb-app`. Keep all newly-added types crate-private unless explicitly required by `kb-cli`.
+If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kebab-app`. Keep all newly-added types crate-private unless explicitly required by `kebab-cli`.

 ## Behavior contract

@@ -77,8 +77,8 @@ If a constructor helper is needed (e.g., a single `App::open(&Config)` that open

 Pipeline per design §1.2:

-1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kb-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
-2. For each markdown asset: `kb_parse_md::frontmatter::parse_frontmatter` + `kb_parse_md::blocks::parse_blocks` → `kb_normalize::build_canonical_document` → `kb_chunk::MdHeadingV1Chunker::chunk`.
+1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kebab-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
+2. For each markdown asset: `kebab_parse_md::frontmatter::parse_frontmatter` + `kebab_parse_md::blocks::parse_blocks` → `kebab_normalize::build_canonical_document` → `kebab_chunk::MdHeadingV1Chunker::chunk`.
 3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes` → `put_document` → `put_blocks` → `put_chunks`.
 4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path).
 5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`.
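
The opt-out check and batching in step 4 can be sketched as follows. This is a simplified illustration under stated assumptions: `EmbeddingConfig`, `embeddings_enabled`, and `plan_batches` are hypothetical names standing in for the `[models.embedding]` config section and whatever `kebab-app` actually does, and the sketch treats any provider other than `"none"` as enabled, whereas the step above names only `"fastembed"`.

```rust
/// Hypothetical stand-in for the `[models.embedding]` config section.
pub struct EmbeddingConfig {
    pub provider: String,
    pub dimensions: usize,
    pub batch_size: usize,
}

/// Vector indexing is skipped when the provider is "none" or dimensions is 0.
pub fn embeddings_enabled(cfg: &EmbeddingConfig) -> bool {
    cfg.provider != "none" && cfg.dimensions != 0
}

/// Split chunk texts into embedding batches; the opt-out path yields no batches.
pub fn plan_batches<'a>(cfg: &EmbeddingConfig, texts: &'a [String]) -> Vec<&'a [String]> {
    if !embeddings_enabled(cfg) {
        return Vec::new();
    }
    // `slice::chunks` panics on a chunk size of 0, so clamp to at least 1.
    texts.chunks(cfg.batch_size.max(1)).collect()
}
```

Each returned slice would then be handed to the embedder in turn, followed by `ensure_table` and `upsert` as the step describes.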
@@ -90,9 +90,9 @@ Errors: any per-file parse failure should be recorded as a `Warning` in `Provena

 Per `query.mode`:

-- `Lexical` → `kb_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
-- `Vector` → `kb_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
-- `Hybrid` → `kb_search::HybridRetriever::search` composing the above two.
+- `Lexical` → `kebab_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
+- `Vector` → `kebab_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
+- `Hybrid` → `kebab_search::HybridRetriever::search` composing the above two.

 Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, `Arc<dyn Embedder>` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation paths through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session.

@@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that

 ### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`

-Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
+Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.

 ### Lifecycle

-Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor.
+Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor.

 `vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step.
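
A rough sketch of how the optional fields could drive mode dispatch. This is a trimmed-down, hypothetical `App`: the `String` fields stand in for the real `Arc` handles, `plan_search` is an invented name, and the error is a plain `String` rather than `anyhow::Error` so the snippet stays dependency-free.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical, trimmed-down `App`; the optional parts are modelled as `Option`.
pub struct App {
    pub vector: Option<String>,   // stands in for Arc<LanceVectorStore>
    pub embedder: Option<String>, // stands in for Arc<dyn Embedder + Send + Sync>
}

impl App {
    /// Pick a retrieval path, failing with a hint when embeddings are disabled.
    pub fn plan_search(&self, mode: SearchMode) -> Result<&'static str, String> {
        match (mode, &self.vector, &self.embedder) {
            // Lexical search never touches the vector store or the embedder.
            (SearchMode::Lexical, _, _) => Ok("lexical"),
            (SearchMode::Vector, Some(_), Some(_)) => Ok("vector"),
            (SearchMode::Hybrid, Some(_), Some(_)) => Ok("hybrid"),
            // Vector or Hybrid requested, but at least one optional part is missing.
            (other, _, _) => Err(format!(
                "{other:?} search needs embeddings; set an embedding provider other than \"none\""
            )),
        }
    }
}
```

Matching on the mode and both options together keeps the "lexical always works" rule and the "vector needs both parts" rule in one place, which is easy to cover with the `provider="none"` integration tests.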

 ### Versioning

-- `parser_version` from `kb-parse-md` constant.
+- `parser_version` from `kebab-parse-md` constant.
 - `chunker_version` from `MdHeadingV1Chunker::chunker_version()`.
 - `embedding_model` / `embedding_version` from the embedder.
 - `index_version` from the retriever (or `Lexical`/`Vector` IndexId).
@@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output.

 ## Storage / wire effects

-- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
+- Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
 - Reads on subsequent calls: same DB.
-- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1` — `kb-cli` already has the wrappers.
+- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1` — `kebab-cli` already has the wrappers.

 ## Test plan

 | kind | description | fixture / data |
 |------|-------------|----------------|
-| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` |
+| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` |
 | integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config |
 | integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` |
 | integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` |
@@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output.
 | integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config |
 | integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` |
 | integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` |
-| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
+| smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |

-`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths.
+`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths.

-All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
+All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.

 ## Definition of Done

-- [ ] `cargo check -p kb-app` passes.
-- [ ] `cargo test -p kb-app` passes (default lane).
-- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware.
-- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace.
-- [ ] `cargo run -p kb-cli -- search --mode hybrid "<term>"` returns real hits + citations.
-- [ ] `cargo run -p kb-cli -- list` returns the indexed documents.
+- [ ] `cargo check -p kebab-app` passes.
+- [ ] `cargo test -p kebab-app` passes (default lane).
+- [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware.
+- [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace.
+- [ ] `cargo run -p kebab-cli -- search --mode hybrid "<term>"` returns real hits + citations.
+- [ ] `cargo run -p kebab-cli -- list` returns the indexed documents.
 - [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean.
 - [ ] No imports outside Allowed dependencies.
 - [ ] PR links design §1.2, §1.5, §1.6, §7.

 ## Out of scope

-- `kb-app::ask` body — P4-3 (RAG pipeline).
-- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
-- `kb index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
+- `kebab-app::ask` body — P4-3 (RAG pipeline).
+- `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
+- `kebab index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
 - `--watch` mode — P+.
 - TUI / desktop integration — P9 consumes the wired facade.

 ## Risks / notes

-- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
-- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
+- Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
+- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
 - The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI.
 - Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4).
 - The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`.
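
To make the post-merge fix concrete, here is a small sketch of the bare-function versus `*_with_config` split. Everything in it is illustrative: this `Config` is a stand-in for `kebab_config::Config`, `list` / `list_with_config` are hypothetical, the default path is a literal string (no real XDG resolution or tilde expansion), and no file is actually read.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `kebab_config::Config`; it only remembers its source path.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub source: PathBuf,
}

impl Config {
    /// `None` falls back to a default location; `Some(path)` honours `--config`.
    pub fn load(explicit: Option<&Path>) -> Config {
        let source = explicit
            .map(Path::to_path_buf)
            .unwrap_or_else(|| PathBuf::from("~/.config/kebab/config.toml"));
        Config { source }
    }
}

/// Config-explicit variant: the caller decides which config is in effect.
pub fn list_with_config(cfg: &Config) -> String {
    format!("listing documents using {}", cfg.source.display())
}

/// Convenience wrapper: always loads the default config, so a CLI that has
/// parsed `--config` has to call `list_with_config` instead.
pub fn list() -> String {
    list_with_config(&Config::load(None))
}
```

The bug described above corresponds to a CLI calling `list()` after parsing `--config`: the flag is accepted but never reaches the facade.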

@@ -1,12 +1,12 @@
 ---
 phase: P4
-component: kb-llm (trait crate)
+component: kebab-llm (trait crate)
 task_id: p4-1
 title: "LanguageModel trait + GenerateRequest/TokenChunk"
 status: completed
 depends_on: [p0-1]
 unblocks: [p4-2, p4-3]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef]
 ---

@@ -14,16 +14,16 @@ contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q

 ## Goal

-Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
+Provide the `kebab-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.

 ## Why now / why this size

-`kb-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
+`kebab-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
+- `kebab-core`
+- `kebab-config`
 - `serde`
 - `thiserror`
 - `tracing`
@@ -31,13 +31,13 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper

 ## Forbidden dependencies

-- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
+| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
 | concrete adapter at runtime | `dyn LanguageModel` | p4-2+ |

 ## Outputs
@@ -45,12 +45,12 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
 | output | type | downstream |
 |--------|------|------------|
 | streaming `TokenChunk` iterator | `Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send>` | RAG pipeline |
-| `ModelRef` identity | `kb_core::ModelRef` | Answer.model |
+| `ModelRef` identity | `kebab_core::ModelRef` | Answer.model |

 ## Public surface (signatures only — no new types)

 ```rust
-pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
+pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};

 /// Test-only deterministic mock. Compiled only when `mock` feature is on.
 #[cfg(feature = "mock")]
@@ -59,12 +59,12 @@ pub struct MockLanguageModel {
     pub provider: String,
     pub context_tokens: usize,
     pub canned_response: String, // emitted token-by-token
-    pub canned_finish: kb_core::FinishReason,
-    pub canned_usage: kb_core::TokenUsage,
+    pub canned_finish: kebab_core::FinishReason,
+    pub canned_usage: kebab_core::TokenUsage,
 }

 #[cfg(feature = "mock")]
-impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
+impl kebab_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
 ```
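
One way the mock's "emitted token-by-token" behaviour could look, as a sketch only. It assumes that a "token" is a whitespace-delimited piece that keeps its trailing space and that stop strings cut the response at their earliest occurrence; the real tokenization and stop handling are defined by the design doc, and `truncate_at_stop` / `mock_tokens` are hypothetical helpers rather than part of `kebab-llm`.

```rust
/// Cut `text` at the earliest occurrence of any non-empty stop string.
pub fn truncate_at_stop<'a>(text: &'a str, stops: &[&str]) -> &'a str {
    let cut = stops
        .iter()
        .filter(|s| !s.is_empty())
        .filter_map(|s| text.find(*s))
        .min()
        .unwrap_or(text.len());
    // `find` returns a byte offset on a char boundary, so this slice cannot panic.
    &text[..cut]
}

/// Yield the canned response piece by piece. Pieces keep their trailing space,
/// so concatenating them reproduces the truncated text exactly.
pub fn mock_tokens<'a>(canned: &'a str, stops: &[&str]) -> std::str::SplitInclusive<'a, char> {
    truncate_at_stop(canned, stops).split_inclusive(' ')
}
```

Because the pieces are slices of the canned string, the invariant that the concatenated tokens equal the (truncated) canned text holds by construction, which is the property a mock like this exists to guarantee.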

 ## Behavior contract
@@ -89,12 +89,12 @@ impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
 | unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline |
 | contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline |

-All tests under `cargo test -p kb-llm`.
+All tests under `cargo test -p kebab-llm`.

 ## Definition of Done

-- [ ] `cargo check -p kb-llm` passes
-- [ ] `cargo test -p kb-llm` passes
+- [ ] `cargo check -p kebab-llm` passes
+- [ ] `cargo test -p kebab-llm` passes
 - [ ] No HTTP / async runtime deps present
 - [ ] PR links design §7.2 LanguageModel, §0 Q5


||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P4
|
||||
component: kb-llm-local (Ollama adapter)
|
||||
component: kebab-llm-local (Ollama adapter)
|
||||
task_id: p4-2
|
||||
title: "OllamaLanguageModel — streaming /api/generate"
|
||||
status: completed
|
||||
depends_on: [p4-1]
|
||||
unblocks: [p4-3]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]
|
||||
---
|
||||
|
||||
@@ -18,13 +18,13 @@ Implement `OllamaLanguageModel` against Ollama's local HTTP API (`POST /api/gene
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
|
||||
First real LM. Required for `kebab ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-llm`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-llm`
|
||||
- `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }`
|
||||
- `serde`, `serde_json`
|
||||
- `tracing`
|
||||
@@ -32,21 +32,21 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `tokio`, `async-std`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`. (Streaming reads the `reqwest::blocking::Response` body, which implements `std::io::Read`, as line-delimited JSON; no async runtime needed.)
- `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (Streaming reads the `reqwest::blocking::Response` body, which implements `std::io::Read`, as line-delimited JSON; no async runtime needed.)
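
A blocking response body can be consumed incrementally through `std::io::Read`, which is all that line-delimited JSON needs. The sketch below is illustrative only: `ndjson_lines` is a hypothetical helper, not part of the spec, and a `Cursor` stands in for the HTTP body so the example runs without `reqwest` or a JSON crate.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

/// Hypothetical helper: yields one non-empty line per item from any blocking reader.
/// Each line is expected to hold one JSON object; parsing is left to the caller.
fn ndjson_lines<R: Read>(body: R) -> impl Iterator<Item = std::io::Result<String>> {
    BufReader::new(body)
        .lines()
        // Drop blank keep-alive lines but pass I/O errors through untouched.
        .filter(|line| !matches!(line, Ok(s) if s.trim().is_empty()))
}

fn main() {
    // Stand-in for a streamed response body (assumption: two JSON objects, one blank line).
    let body = Cursor::new("{\"response\":\"a\"}\n\n{\"done\":true}\n");
    for line in ndjson_lines(body) {
        println!("{}", line.expect("read failed"));
    }
}
```

Because the iterator pulls from the reader lazily, tokens can be forwarded as soon as each line arrives rather than after the whole body is buffered.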
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `kebab-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| streaming `TokenChunk` iterator | per §7.2 | `kb-rag` |
|
||||
| streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` |
|
||||
| `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
@@ -55,14 +55,14 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
|
||||
pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }

impl OllamaLanguageModel {
    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kb_core::LanguageModel for OllamaLanguageModel {
    fn model_ref(&self) -> kb_core::ModelRef;
impl kebab_core::LanguageModel for OllamaLanguageModel {
    fn model_ref(&self) -> kebab_core::ModelRef;
    fn context_tokens(&self) -> usize;
    fn generate_stream(&self, req: kb_core::GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kb_core::TokenChunk>> + Send>>;
    fn generate_stream(&self, req: kebab_core::GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
}
|
||||
```
|
||||
|
||||
@@ -93,7 +93,7 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
|
||||
- UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`.
|
||||
- Determinism: with `temperature=0` and fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality.
|
||||
- `model_ref().provider = "ollama"`, `dimensions = None`.
|
||||
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kb doctor` (separate task) to probe.
|
||||
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kebab doctor` (separate task) to probe.
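
The UTF-8 boundary rule in the list above can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the adapter's actual code: `Utf8Buffer` and `push` are hypothetical names, and the policy of flushing genuinely invalid bytes lossily is a choice made for this example.

```rust
/// Hypothetical buffer: accumulates raw bytes and releases only complete UTF-8 text.
#[derive(Default)]
struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Appends `bytes` and returns the text that is safe to emit now.
    /// A trailing, still-incomplete multi-byte sequence stays buffered.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let valid = match std::str::from_utf8(&self.pending) {
            Ok(_) => self.pending.len(),
            // `error_len() == None` means the input merely ended mid-character.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Genuinely invalid bytes: flush everything lossily instead of stalling.
            Err(_) => self.pending.len(),
        };
        let tail = self.pending.split_off(valid);
        let out = String::from_utf8_lossy(&self.pending).into_owned();
        self.pending = tail;
        out
    }
}

fn main() {
    let bytes = "a한".as_bytes(); // 'a' is 1 byte, '한' is 3 bytes
    let mut buf = Utf8Buffer::default();
    // The first chunk ends inside '한', so only "a" is emitted.
    println!("{:?}", buf.push(&bytes[..2]));
    // The remaining two bytes complete the character.
    println!("{:?}", buf.push(&bytes[2..]));
}
```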
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
@@ -112,12 +112,12 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
|
||||
| determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
|
||||
| `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in |
|
||||
|
||||
All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration runs via `cargo test -p kb-llm-local -- --ignored`.
|
||||
All non-ignored tests under `cargo test -p kebab-llm-local`. Real-LM integration runs via `cargo test -p kebab-llm-local -- --ignored`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-llm-local` passes
|
||||
- [ ] `cargo test -p kb-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
|
||||
- [ ] `cargo check -p kebab-llm-local` passes
|
||||
- [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
|
||||
- [ ] No async runtime present (uses `reqwest::blocking`)
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §11.2, §0 Q5, §10
|
||||
@@ -125,7 +125,7 @@ All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration ru
|
||||
## Out of scope
|
||||
|
||||
- llama.cpp / candle adapters (P+).
|
||||
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kb-embed-local` if requested later).
|
||||
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later).
|
||||
- Cancellation / abort tokens (P+).
|
||||
- Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI).
|
||||
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P4
|
||||
component: kb-rag
|
||||
component: kebab-rag
|
||||
task_id: p4-3
|
||||
title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate"
|
||||
status: completed
|
||||
depends_on: [p3-4, p4-2]
|
||||
unblocks: [p5-1]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors]
|
||||
---
|
||||
|
||||
@@ -22,11 +22,11 @@ This is the user-facing payoff. Splitting it further would couple too many inter
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-search` (Retriever trait object)
|
||||
- `kb-llm` (LanguageModel trait object)
|
||||
- `kb-store-sqlite` (read chunk full text/section + write `answers` row)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-search` (Retriever trait object)
|
||||
- `kebab-llm` (LanguageModel trait object)
|
||||
- `kebab-store-sqlite` (read chunk full text/section + write `answers` row)
|
||||
- `serde`, `serde_json`
|
||||
- `regex` (for citation marker extraction)
|
||||
- `time`
|
||||
@@ -35,51 +35,51 @@ This is the user-facing payoff. Splitting it further would couple too many inter
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector` (only via Retriever trait), `kb-embed*` (only via Retriever), `kb-llm-local` (only via LanguageModel trait), `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector` (only via Retriever trait), `kebab-embed*` (only via Retriever), `kebab-llm-local` (only via LanguageModel trait), `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|-------|------|--------|
| `query: &str` | text | `kb-app::ask` |
| `query: &str` | text | `kebab-app::ask` |
| `AskOpts` | k, explain, mode, temperature, seed | CLI |
| `dyn Retriever` | hybrid retriever from p3-4 | runtime injection |
| `dyn LanguageModel` | from p4-2 (or mock) | runtime injection |
| `dyn DocumentStore` | for chunk full-text fetch | from p1-6 |
| `kb-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
| `kebab-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `Answer` | `kb_core::Answer` | `kb-cli` printer, `answers` table |
|
||||
| `Answer` | `kebab_core::Answer` | `kebab-cli` printer, `answers` table |
|
||||
| `answers` table row | SQLite | history, eval |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub struct RagPipeline {
    retriever: std::sync::Arc<dyn kb_core::Retriever>,
    llm: std::sync::Arc<dyn kb_core::LanguageModel>,
    docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
    config: kb_config::Config,
    retriever: std::sync::Arc<dyn kebab_core::Retriever>,
    llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
    docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
    config: kebab_config::Config,
}

impl RagPipeline {
    pub fn new(
        config: kb_config::Config,
        retriever: std::sync::Arc<dyn kb_core::Retriever>,
        llm: std::sync::Arc<dyn kb_core::LanguageModel>,
        docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
        config: kebab_config::Config,
        retriever: std::sync::Arc<dyn kebab_core::Retriever>,
        llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
        docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
    ) -> Self;

    pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
    pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
}

pub struct AskOpts {
    pub k: usize,
    pub explain: bool,
    pub mode: kb_core::SearchMode,
    pub mode: kebab_core::SearchMode,
    pub temperature: Option<f32>,
    pub seed: Option<u64>,
    pub stream_sink: Option<std::sync::mpsc::Sender<String>>, // tty/UI token streaming
|
||||
@@ -158,12 +158,12 @@ pub struct AskOpts {
|
||||
| determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock |
|
||||
| snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` |
|
||||
|
||||
All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
|
||||
All tests under `cargo test -p kebab-rag` with no real Ollama (mock LM only).
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-rag` passes
|
||||
- [ ] `cargo test -p kb-rag` passes
|
||||
- [ ] `cargo check -p kebab-rag` passes
|
||||
- [ ] `cargo test -p kebab-rag` passes
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] All paths write an `answers` row
|
||||
- [ ] Output JSON conforms to `answer.v1`
|
||||
@@ -178,9 +178,9 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
|
||||
|
||||
## Risks / notes
|
||||
|
||||
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
|
||||
- **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
|
||||
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kb ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
|
||||
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kebab eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
|
||||
- **Post-merge fix (2026-05)**: kebab-cli's `Cmd::Ask` arm originally called bare `kebab_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kebab_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
|
||||
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kebab ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
|
||||
- `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink.
|
||||
- `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match.
|
||||
- Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped.
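
The strict citation grammar from the notes above can be illustrated without extra crates. The spec allows `regex` for this; the sketch below hand-rolls the same `[#` + 1 to 3 digits + `]` shape using only `std` so it runs standalone. `extract_citations` is a hypothetical name, and unlike the `regex` crate's default `\d` it accepts ASCII digits only.

```rust
/// Hypothetical helper: returns the numbers of all strictly formed `[#N]` markers.
/// Accepts 1 to 3 ASCII digits; anything else (`[1]`, `[ #1 ]`, `[#1234]`) is ignored.
fn extract_citations(text: &str) -> Vec<u32> {
    let b = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i + 1 < b.len() {
        if b[i] == b'[' && b[i + 1] == b'#' {
            let start = i + 2;
            let mut end = start;
            // Consume at most three digits.
            while end < b.len() && end - start < 3 && b[end].is_ascii_digit() {
                end += 1;
            }
            if end > start && end < b.len() && b[end] == b']' {
                // The slice holds 1 to 3 ASCII digits, so parsing cannot fail.
                out.push(text[start..end].parse().unwrap());
                i = end + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

fn main() {
    let answer = "Grounded [#1] and [#12]; not [1], [ #1 ], vec![1] or [#1234].";
    println!("{:?}", extract_citations(answer));
}
```

An empty result corresponds to the "no-marker" case that the pipeline treats as a refusal.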
|
||||
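
The score-gate mismatch described in the notes above is easy to check numerically. The sketch assumes the common RRF form with 1-based ranks, `score = sum(1 / (k_rrf + rank))`, and normalizes by the best score achievable across the fused lists. The actual p3-4 normalization may differ; `rrf_raw` and `rrf_normalized` are hypothetical names.

```rust
/// Raw reciprocal-rank-fusion score. `ranks[i]` is the 1-based rank of a chunk
/// in retriever list `i`, or `None` if that list did not return it.
fn rrf_raw(ranks: &[Option<usize>], k_rrf: f64) -> f64 {
    ranks.iter().flatten().map(|&r| 1.0 / (k_rrf + r as f64)).sum()
}

/// Raw score divided by the maximum possible score (rank 1 in every list),
/// which maps the result into [0, 1].
fn rrf_normalized(ranks: &[Option<usize>], k_rrf: f64) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    let max = ranks.len() as f64 / (k_rrf + 1.0);
    rrf_raw(ranks, k_rrf) / max
}

fn main() {
    let k_rrf = 60.0;
    let best = [Some(1), Some(1)]; // top hit in both the lexical and vector lists
    // Even the best possible raw score stays below a 0.05 gate.
    println!("raw best        = {:.4}", rrf_raw(&best, k_rrf));
    println!("normalized best = {:.4}", rrf_normalized(&best, k_rrf));
    println!("one-list hit    = {:.4}", rrf_normalized(&[Some(1), None], k_rrf));
}
```

Under these assumptions no raw fused score can clear a gate of `0.05`, which matches the "refusing every hybrid ask" symptom; after normalization the same gate becomes meaningful.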
|
||||
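
The dead-sink behavior of `stream_sink` noted above can be sketched with `std::sync::mpsc`. This is an illustration under assumptions, not the pipeline's code: `drain_tokens` is a hypothetical function and tokens are plain `String`s rather than `TokenChunk` values.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical helper: forwards each token to an optional sink while building
/// the full answer text. A dropped receiver never interrupts generation.
fn drain_tokens<I>(tokens: I, sink: Option<&Sender<String>>) -> String
where
    I: IntoIterator<Item = String>,
{
    let mut full = String::new();
    for tok in tokens {
        if let Some(tx) = sink {
            // `send` returns Err once the receiver is gone; ignore it on purpose
            // so the caller can still persist the complete answer.
            let _ = tx.send(tok.clone());
        }
        full.push_str(&tok);
    }
    full
}

fn main() {
    let (tx, rx) = channel::<String>();
    drop(rx); // simulate a caller that cancelled the UI stream
    let tokens = vec!["Hello".to_string(), ", ".to_string(), "world".to_string()];
    let answer = drain_tokens(tokens, Some(&tx));
    println!("{}", answer);
}
```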
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P5
|
||||
component: kb-eval (runner)
|
||||
component: kebab-eval (runner)
|
||||
task_id: p5-1
|
||||
title: "Golden query fixture loader + per-query runner"
|
||||
status: completed
|
||||
depends_on: [p4-3]
|
||||
unblocks: [p5-2]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md]
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase ep
|
||||
|
||||
## Goal
|
||||
|
||||
Load `fixtures/golden_queries.yaml`, run each query through `kb-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
|
||||
Load `fixtures/golden_queries.yaml`, run each query through `kebab-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,10 +22,10 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app` (calls facade for search / ask)
|
||||
- `kb-store-sqlite` (writes eval rows)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app` (calls facade for search / ask)
|
||||
- `kebab-store-sqlite` (writes eval rows)
|
||||
- `serde`, `serde_yaml`, `serde_json`
|
||||
- `time`
|
||||
- `tracing`
|
||||
@@ -33,7 +33,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (all reached via `kb-app` facade only), `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (all reached via `kebab-app` facade only), `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
@@ -41,7 +41,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|
||||
|-------|------|--------|
|
||||
| `fixtures/golden_queries.yaml` | YAML | repo-shipped |
|
||||
| `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI |
|
||||
| `kb-app` facade | search/ask | runtime |
|
||||
| `kebab-app` facade | search/ask | runtime |
|
||||
|
||||
## Outputs
|
||||
|
||||
@@ -50,7 +50,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|
||||
| `eval_runs` row | SQLite | p5-2, history |
|
||||
| `eval_query_results` rows | SQLite | p5-2 |
|
||||
| `runs_dir/<run_id>/per_query.jsonl` | filesystem | external tools, audits |
|
||||
| `EvalRun` struct | `kb_eval::EvalRun` | caller |
|
||||
| `EvalRun` struct | `kebab_eval::EvalRun` | caller |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -58,9 +58,9 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|
||||
pub struct GoldenQuery {
|
||||
pub id: String,
|
||||
pub query: String,
|
||||
pub lang: kb_core::Lang,
|
||||
pub expected_doc_ids: Vec<kb_core::DocumentId>,
|
||||
pub expected_chunk_ids: Vec<kb_core::ChunkId>,
|
||||
pub lang: kebab_core::Lang,
|
||||
pub expected_doc_ids: Vec<kebab_core::DocumentId>,
|
||||
pub expected_chunk_ids: Vec<kebab_core::ChunkId>,
|
||||
pub must_contain: Vec<String>,
|
||||
pub forbidden: Vec<String>,
|
||||
pub difficulty: Option<String>,
|
||||
@@ -68,7 +68,7 @@ pub struct GoldenQuery {
|
||||
|
||||
pub struct EvalRunOpts {
|
||||
pub suite: String, // "golden" default
|
||||
pub mode: kb_core::SearchMode,
|
||||
pub mode: kebab_core::SearchMode,
|
||||
pub with_rag: bool,
|
||||
pub k: usize,
|
||||
pub temperature: Option<f32>,
|
||||
@@ -86,9 +86,9 @@ pub struct EvalRun {
|
||||
pub struct QueryResult {
|
||||
pub query_id: String,
|
||||
pub query: String,
|
||||
pub mode: kb_core::SearchMode,
|
||||
pub hits_top_k: Vec<kb_core::SearchHit>,
|
||||
pub answer: Option<kb_core::Answer>,
|
||||
pub mode: kebab_core::SearchMode,
|
||||
pub hits_top_k: Vec<kebab_core::SearchHit>,
|
||||
pub answer: Option<kebab_core::Answer>,
|
||||
pub elapsed_ms: u32,
|
||||
pub error: Option<String>,
|
||||
}
|
||||
@@ -103,10 +103,10 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
|
||||
- Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`).
|
||||
- Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders.
|
||||
- `run_eval`:
|
||||
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KB_EVAL_GOLDEN`).
|
||||
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KEBAB_EVAL_GOLDEN`).
|
||||
- Generates `run_id = "run_" + ulid_lower()`.
|
||||
- Captures `config_snapshot_json`: serialized `kb_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
|
||||
- For each query: call `kb_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kb_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
|
||||
- Captures `config_snapshot_json`: serialized `kebab_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
|
||||
- For each query: call `kebab_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kebab_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
|
||||
- Each `QueryResult` measured by elapsed wall-clock (ms).
|
||||
- Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`.
|
||||
- Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may still differ slightly in token-budget telemetry.
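
The per-query error handling described in the list above might look roughly like this. It is a simplified sketch: `QueryOutcome` and `run_queries` are hypothetical stand-ins for `QueryResult` and `run_eval`, hits are plain strings, and the search backend is injected as a closure so the example runs without `kebab-app`.

```rust
use std::time::Instant;

/// Simplified stand-in for `QueryResult` (assumption: hits reduced to strings).
#[derive(Debug)]
struct QueryOutcome {
    query_id: String,
    hits: Vec<String>,
    error: Option<String>,
    elapsed_ms: u32,
}

/// Runs every query; a failing query is recorded and the run continues.
fn run_queries<F>(queries: &[(&str, &str)], mut search: F) -> Vec<QueryOutcome>
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    queries
        .iter()
        .map(|&(id, text)| {
            let started = Instant::now();
            let result = search(text);
            // Saturate instead of wrapping if a query somehow runs for weeks.
            let elapsed_ms = started.elapsed().as_millis().min(u32::MAX as u128) as u32;
            let (hits, error) = match result {
                Ok(hits) => (hits, None),
                Err(msg) => (Vec::new(), Some(msg)),
            };
            QueryOutcome { query_id: id.to_string(), hits, error, elapsed_ms }
        })
        .collect()
}

fn main() {
    let queries = [("q1", "rust traits"), ("q2", "boom"), ("q3", "sqlite fts")];
    let outcomes = run_queries(&queries, |q| {
        if q == "boom" {
            Err("backend unavailable".to_string())
        } else {
            Ok(vec![format!("chunk-for-{}", q)])
        }
    });
    for o in &outcomes {
        println!("{:?}", o);
    }
}
```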
|
||||
@@ -130,12 +130,12 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
|
||||
| determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus |
|
||||
| snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` |
|
||||
|
||||
All tests under `cargo test -p kb-eval runner`.
|
||||
All tests under `cargo test -p kebab-eval runner`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-eval` passes
|
||||
- [ ] `cargo test -p kb-eval runner` passes
|
||||
- [ ] `cargo check -p kebab-eval` passes
|
||||
- [ ] `cargo test -p kebab-eval runner` passes
|
||||
- [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries)
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §5.7
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P5
|
||||
component: kb-eval (metrics + compare)
|
||||
component: kebab-eval (metrics + compare)
|
||||
task_id: p5-2
|
||||
title: "Metrics computation + compare report"
|
||||
status: completed
|
||||
depends_on: [p5-1]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md]
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-eva
|
||||
|
||||
## Goal
|
||||
|
||||
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kb eval compare a b` that diffs two runs.
|
||||
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kebab eval compare a b` that diffs two runs.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,16 +22,16 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-store-sqlite` (read eval rows, write `aggregate_json`)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-store-sqlite` (read eval rows, write `aggregate_json`)
|
||||
- `serde`, `serde_json`
|
||||
- `tracing`
|
||||
- `thiserror`
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-app`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-app`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
@@ -46,7 +46,7 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `eval_runs.aggregate_json` updated | SQLite | history, CI checks |
|
||||
| `CompareReport` | `kb_eval::CompareReport` | `kb-cli` printer |
|
||||
| `CompareReport` | `kebab_eval::CompareReport` | `kebab-cli` printer |
|
||||
| optional `runs_dir/<run_id>/report.md` | filesystem | human-readable summary |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
@@ -127,15 +127,15 @@ pub fn render_report_md(report: &CompareReport) -> String;
|
||||
| determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline |
|
||||
| snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` |
|
||||
|
||||
All tests under `cargo test -p kb-eval metrics`.
|
||||
All tests under `cargo test -p kebab-eval metrics`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-eval` passes
|
||||
- [ ] `cargo test -p kb-eval metrics` passes
|
||||
- [ ] `cargo check -p kebab-eval` passes
|
||||
- [ ] `cargo test -p kebab-eval metrics` passes
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] `eval_runs.aggregate_json` always populated after `store_aggregate`
|
||||
- [ ] `kb eval compare` CLI surface integrated via `kb-app` (call `compare_runs` + `render_report_md`)
|
||||
- [ ] `kebab eval compare` CLI surface integrated via `kebab-app` (call `compare_runs` + `render_report_md`)
|
||||
- [ ] PR links phase epic tasks/phase-5-evaluation.md
|
||||
|
||||
## Out of scope
|
||||
@@ -172,11 +172,11 @@ same one this spec defines, the names / wiring just differ.
|
||||
/ `compare_runs(a, b)` so integration tests can drive the pipeline
|
||||
against a TempDir-backed `Config`. The no-arg forms wrap them with
|
||||
`Config::load(None)`.
|
||||
- **CLI surface lives on `kb-cli` directly, not via `kb-app`.** DoD
|
||||
asks for `kb eval compare` to be reached "via kb-app", but `kb-app`
|
||||
already depends on `kb-eval` (the P5-1 runner uses the App facade),
|
||||
so routing the CLI through `kb-app` would form a cycle. `kb-cli` →
|
||||
`kb-eval` is wired directly; `kb-app` is unchanged.
|
||||
- **CLI surface lives on `kebab-cli` directly, not via `kebab-app`.** DoD
|
||||
asks for `kebab eval compare` to be reached "via kebab-app", but `kebab-app`
|
||||
already depends on `kebab-eval` (the P5-1 runner uses the App facade),
|
||||
so routing the CLI through `kebab-app` would form a cycle. `kebab-cli` →
|
||||
`kebab-eval` is wired directly; `kebab-app` is unchanged.
|
||||
- **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines
|
||||
only the field shape; we add `Deserialize` so the stored
|
||||
`aggregate_json` can round-trip back into the type for follow-up
|
||||
@@ -184,12 +184,12 @@ same one this spec defines, the names / wiring just differ.
|
||||
- **`anyhow`** is used in `Result` returns since the rest of the
|
||||
workspace already speaks anyhow; not in the spec's Allowed list but
|
||||
matches every other crate.
|
||||
- **`kb-eval` crate-level `kb-app` dep stays.** The crate already
|
||||
depends on `kb-app` from P5-1 (the runner uses the `App` facade), so
|
||||
- **`kebab-eval` crate-level `kebab-app` dep stays.** The crate already
|
||||
depends on `kebab-app` from P5-1 (the runner uses the `App` facade), so
|
||||
the Cargo.toml entry remains. The new modules (`metrics.rs`,
|
||||
`compare.rs`) do not import `kb-app` themselves — they're behind the
|
||||
`compare.rs`) do not import `kebab-app` themselves — they're behind the
|
||||
same crate boundary as the runner, but the metric/compare *surface*
|
||||
is `kb-app`-clean. Splitting the crate to avoid a transitive Cargo
|
||||
is `kebab-app`-clean. Splitting the crate to avoid a transitive Cargo
|
||||
edge would be churn for no behavior gain.
|
||||
- **`citation_coverage` is intentionally weaker than the spec literal.**
|
||||
Spec calls for "every citation resolves to a real chunk in the DB".
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P6
|
||||
component: kb-parse-image (image extractor + EXIF)
|
||||
component: kebab-parse-image (image extractor + EXIF)
|
||||
task_id: p6-1
|
||||
title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata"
|
||||
status: planned
|
||||
depends_on: [p0-1, p1-6]
|
||||
unblocks: [p6-2, p6-3]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning]
|
||||
---
|
||||
|
||||
@@ -22,8 +22,8 @@ Establishes the image-as-document contract and decouples extraction (asset → I
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `image = "0.25"` (decoding for size + format detect)
|
||||
- `kamadak-exif` for EXIF
|
||||
- `serde`, `serde_json`
|
||||
@@ -33,31 +33,31 @@ Establishes the image-as-document contract and decouples extraction (asset → I
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libs, LLM libs
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libs, LLM libs
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `RawAsset` | `kb_core::RawAsset` | from `kb-source-fs` |
|
||||
| `RawAsset` | `kebab_core::RawAsset` | from `kebab-source-fs` |
|
||||
| image bytes | `&[u8]` | filesystem |
|
||||
| `parser_version` | `kb_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
|
||||
| `parser_version` | `kebab_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (image-region chunker) → `kb-store-sqlite` |
|
||||
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (image-region chunker) → `kebab-store-sqlite` |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub struct ImageExtractor;

impl kb_core::Extractor for ImageExtractor {
    fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Image(_)) }
    fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("image-meta-v1".into()) }
    fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for ImageExtractor {
    fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Image(_)) }
    fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("image-meta-v1".into()) }
    fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
|
||||
```
|
||||
|
||||
@@ -69,7 +69,7 @@ impl kb_core::Extractor for ImageExtractor {
|
||||
- `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty.
|
||||
- `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted.
|
||||
- `metadata.user["dimensions"] = { "w": <u32>, "h": <u32>, "format": "<png|jpeg|...>" }`.
|
||||
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kb-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kb_core::id_for_doc` and `kb_core::id_for_block` directly).
|
||||
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kebab-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kebab_core::id_for_doc` and `kebab_core::id_for_block` directly).
|
||||
- Failure modes:
|
||||
- Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message.
|
||||
- Unsupported format → `anyhow::Error` (caller skips).
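
The truncated-image failure mode above can be illustrated with a std-only sketch. It is not the crate's decoder: `png_dimensions` is a hypothetical helper that reads only the PNG `IHDR` header, and a `None` result stands in for the `dimensions = null` case.

```rust
/// The fixed 8-byte signature that opens every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Hypothetical helper: read width and height from a PNG header.
/// Returns `None` for truncated or non-PNG input, which a caller could
/// map to `dimensions = null` plus a provenance warning.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // Signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4) = 24 bytes.
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    // Width and height are stored as big-endian u32 values.
    let w = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let h = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((w, h))
}
```

Returning `Option` rather than an error keeps the "still emit a document" path separate from the "unsupported format" path, which is the one that surfaces as `anyhow::Error`.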
|
||||
@@ -77,7 +77,7 @@ impl kb_core::Extractor for ImageExtractor {
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
- None directly (the caller persists via `kb-store-sqlite`).
|
||||
- None directly (the caller persists via `kebab-store-sqlite`).
|
||||
|
||||
## Test plan
|
||||
|
||||
@@ -90,12 +90,12 @@ impl kb_core::Extractor for ImageExtractor {
|
||||
| determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline |
|
||||
| snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` |
|
||||
|
||||
All tests under `cargo test -p kb-parse-image`.
|
||||
All tests under `cargo test -p kebab-parse-image`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-image` passes
|
||||
- [ ] `cargo test -p kb-parse-image` passes
|
||||
- [ ] `cargo check -p kebab-parse-image` passes
|
||||
- [ ] `cargo test -p kebab-parse-image` passes
|
||||
- [ ] No OCR/caption/embedding code present
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §3.4, §9.1
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P6
|
||||
component: kb-parse-image (OCR adapter)
|
||||
component: kebab-parse-image (OCR adapter)
|
||||
task_id: p6-2
|
||||
title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)"
|
||||
status: planned
|
||||
depends_on: [p6-1]
|
||||
unblocks: [p6-3]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance]
|
||||
---
|
||||
|
||||
@@ -22,9 +22,9 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-parse-image` (consumes its types)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-parse-image` (consumes its types)
|
||||
- `tesseract = "0.13"` (feature `tesseract`, default ON)
|
||||
- For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep)
|
||||
- `serde`, `serde_json`
|
||||
@@ -34,21 +34,21 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| image bytes | `&[u8]` | from extractor |
|
||||
| optional language hint | `kb_core::Lang` | metadata |
|
||||
| `kb-config` OCR settings | engine name, languages | runtime |
|
||||
| optional language hint | `kebab_core::Lang` | metadata |
|
||||
| `kebab-config` OCR settings | engine name, languages | runtime |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `OcrText` | `kb_core::OcrText` | merged into `ImageRefBlock.ocr` |
|
||||
| `OcrText` | `kebab_core::OcrText` | merged into `ImageRefBlock.ocr` |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -56,11 +56,11 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
|
||||
pub trait OcrEngine: Send + Sync {
|
||||
fn engine_name(&self) -> &'static str;
|
||||
fn engine_version(&self) -> String;
|
||||
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::OcrText>;
|
||||
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::OcrText>;
|
||||
}
|
||||
|
||||
pub struct TesseractOcr { /* internal: lazy api handle */ }
|
||||
impl TesseractOcr { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
|
||||
impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
|
||||
impl OcrEngine for TesseractOcr { /* per trait */ }
|
||||
|
||||
#[cfg(feature = "apple-vision")]
|
||||
@@ -71,8 +71,8 @@ impl OcrEngine for AppleVisionOcr { /* per trait */ }
|
||||
pub fn apply_ocr(
|
||||
engine: &dyn OcrEngine,
|
||||
image_bytes: &[u8],
|
||||
block: &mut kb_core::ImageRefBlock,
|
||||
lang_hint: Option<&kb_core::Lang>,
|
||||
block: &mut kebab_core::ImageRefBlock,
|
||||
lang_hint: Option<&kebab_core::Lang>,
|
||||
) -> anyhow::Result<()>;
|
||||
```
|
||||
|
||||
@@ -85,7 +85,7 @@ pub fn apply_ocr(
|
||||
- `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1).
|
||||
- `engine = "tesseract"`, `engine_version = <tesseract version string>`. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR.
|
||||
- Apple Vision sidecar (feature `apple-vision`):
|
||||
- Spawn a small Swift binary `kb-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
|
||||
- Spawn a small Swift binary `kebab-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
|
||||
- Same threshold and `joined` rules as Tesseract. `engine = "apple-vision"`, `engine_version = sidecar's --version`.
|
||||
- This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional).
|
||||
- `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper).
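
The `joined` rule and option (b) for `engine_version` can be sketched as below. This is a minimal illustration under stated assumptions: `Region`, `join_regions`, and `parse_tesseract_version` are hypothetical names, dropping regions below `min_confidence` is an assumed reading of the threshold rule, and the first-line format `tesseract 5.3.4` is an assumption about the CLI output. Note that `Iterator::join` comes from `itertools`; with std only, the texts are collected into a `Vec` first.

```rust
/// Hypothetical stand-in for an OCR region; the real type lives in `kebab_core`.
pub struct Region {
    pub text: String,
    pub confidence: f32,
}

/// Join region texts with single spaces, skipping regions below `min_confidence`.
/// (Dropping low-confidence regions is an assumption, not confirmed by the spec.)
pub fn join_regions(regions: &[Region], min_confidence: f32) -> String {
    regions
        .iter()
        .filter(|r| r.confidence >= min_confidence)
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Extract the version from the first line of `tesseract --version` output,
/// assuming that line looks like `tesseract 5.3.4`.
pub fn parse_tesseract_version(output: &str) -> Option<String> {
    let first = output.lines().next()?;
    let mut parts = first.split_whitespace();
    if parts.next()? != "tesseract" {
        return None;
    }
    let version = parts.next()?;
    // Some builds prefix the number with `v`; strip it if present.
    Some(version.trim_start_matches('v').to_string())
}
```

An adapter would call the parser once at construction and cache the result, so `engine_version()` stays cheap and stable for a fixed install.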
|
||||
@@ -109,12 +109,12 @@ pub fn apply_ocr(
|
||||
| determinism | two runs of recognize on same input → identical OcrText | fixture |
|
||||
| `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock |
|
||||
|
||||
All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required on CI host.
|
||||
All tests under `cargo test -p kebab-parse-image ocr`. Tesseract install required on CI host.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-image --features tesseract` passes
|
||||
- [ ] `cargo test -p kb-parse-image ocr` passes
|
||||
- [ ] `cargo check -p kebab-parse-image --features tesseract` passes
|
||||
- [ ] `cargo test -p kebab-parse-image ocr` passes
|
||||
- [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §3.4, §3.7a, §9.1
|
||||
@@ -129,5 +129,5 @@ All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required o
|
||||
## Risks / notes
|
||||
|
||||
- Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode.
|
||||
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kb-vision-ocr`.
|
||||
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kebab-vision-ocr`.
|
||||
- Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune.
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P6
|
||||
component: kb-parse-image (caption adapter)
|
||||
component: kebab-parse-image (caption adapter)
|
||||
task_id: p6-3
|
||||
title: "ModelCaption adapter (LanguageModel-driven, feature-gated)"
|
||||
status: planned
|
||||
depends_on: [p6-1, p4-2]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)]
|
||||
---
|
||||
|
||||
@@ -22,10 +22,10 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-parse-image`
|
||||
- `kb-llm` (LanguageModel trait)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-parse-image`
|
||||
- `kebab-llm` (LanguageModel trait)
|
||||
- `base64`
|
||||
- `serde`, `serde_json`
|
||||
- `image` (resize for prompt cost control)
|
||||
@@ -34,7 +34,7 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-llm-local` (only via trait), `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-llm-local` (only via trait), `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
@@ -42,28 +42,28 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|
||||
|-------|------|--------|
|
||||
| image bytes | `&[u8]` | extractor |
|
||||
| `dyn LanguageModel` (vision-capable) | runtime | injected |
|
||||
| `kb-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
|
||||
| `kebab-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `ModelCaption` | `kb_core::ModelCaption` | merged into `ImageRefBlock.caption` |
|
||||
| `ModelCaption` | `kebab_core::ModelCaption` | merged into `ImageRefBlock.caption` |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub fn caption_image(
|
||||
llm: &dyn kb_core::LanguageModel,
|
||||
llm: &dyn kebab_core::LanguageModel,
|
||||
image_bytes: &[u8],
|
||||
cfg: &kb_config::Config,
|
||||
) -> anyhow::Result<kb_core::ModelCaption>;
|
||||
cfg: &kebab_config::Config,
|
||||
) -> anyhow::Result<kebab_core::ModelCaption>;
|
||||
|
||||
pub fn apply_caption(
|
||||
llm: &dyn kb_core::LanguageModel,
|
||||
llm: &dyn kebab_core::LanguageModel,
|
||||
image_bytes: &[u8],
|
||||
block: &mut kb_core::ImageRefBlock,
|
||||
cfg: &kb_config::Config,
|
||||
block: &mut kebab_core::ImageRefBlock,
|
||||
cfg: &kebab_config::Config,
|
||||
) -> anyhow::Result<()>;
|
||||
```
|
||||
|
||||
@@ -86,7 +86,7 @@ pub fn apply_caption(
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
- None directly. Caller persists via `kb-store-sqlite`.
|
||||
- None directly. Caller persists via `kebab-store-sqlite`.
|
||||
|
||||
## Test plan
|
||||
|
||||
@@ -99,12 +99,12 @@ pub fn apply_caption(
|
||||
| unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image |
|
||||
| determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline |
|
||||
|
||||
All tests under `cargo test -p kb-parse-image caption` with mock LM only.
|
||||
All tests under `cargo test -p kebab-parse-image caption` with mock LM only.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-image --features caption` passes
|
||||
- [ ] `cargo test -p kb-parse-image caption` passes
|
||||
- [ ] `cargo check -p kebab-parse-image --features caption` passes
|
||||
- [ ] `cargo test -p kebab-parse-image caption` passes
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] Feature default OFF; only on when user opts in via config
|
||||
- [ ] PR links design §3.4 ImageRefBlock.caption, §9.1
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P7
|
||||
component: kb-parse-pdf (text extractor)
|
||||
component: kebab-parse-pdf (text extractor)
|
||||
task_id: p7-1
|
||||
title: "Text PDF extractor → CanonicalDocument with page-level blocks"
|
||||
status: planned
|
||||
depends_on: [p0-1, p1-6]
|
||||
unblocks: [p7-2]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning]
|
||||
---
|
||||
|
||||
@@ -22,8 +22,8 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `pdf-extract = "0.7"` (or current stable)
|
||||
- `lopdf = "0.32"` for page metadata (count, optional title from /Info)
|
||||
- `serde`, `serde_json`
|
||||
@@ -33,30 +33,30 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
|
||||
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
|
||||
| PDF bytes | `&[u8]` | filesystem |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`pdf-page-v1` chunker in p7-2) |
|
||||
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`pdf-page-v1` chunker in p7-2) |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub struct PdfTextExtractor;
|
||||
|
||||
impl kb_core::Extractor for PdfTextExtractor {
|
||||
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Pdf) }
|
||||
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("pdf-text-v1".into()) }
|
||||
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
|
||||
impl kebab_core::Extractor for PdfTextExtractor {
|
||||
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Pdf) }
|
||||
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("pdf-text-v1".into()) }
|
||||
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
|
||||
}
|
||||
```
|
||||
|
||||
@@ -100,12 +100,12 @@ impl kb_core::Extractor for PdfTextExtractor {
|
||||
| determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline |
|
||||
| snapshot | CanonicalDocument JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` |
|
||||
|
||||
All tests under `cargo test -p kb-parse-pdf`.
|
||||
All tests under `cargo test -p kebab-parse-pdf`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-pdf` passes
|
||||
- [ ] `cargo test -p kb-parse-pdf` passes
|
||||
- [ ] `cargo check -p kebab-parse-pdf` passes
|
||||
- [ ] `cargo test -p kebab-parse-pdf` passes
|
||||
- [ ] No OCR / LLM code present
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §3.4 SourceSpan::Page, §9.2
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P7
|
||||
component: kb-chunk (pdf-page-v1)
|
||||
component: kebab-chunk (pdf-page-v1)
|
||||
task_id: p7-2
|
||||
title: "PDF page-aware chunker (pdf-page-v1)"
|
||||
status: planned
|
||||
depends_on: [p7-1]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
|
||||
---
|
||||
|
||||
@@ -22,8 +22,8 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `serde`, `serde_json`
|
||||
- `blake3` (policy_hash)
|
||||
- `serde-json-canonicalizer`
|
||||
@@ -31,30 +31,30 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf` (consumes `CanonicalDocument` via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf` (consumes `CanonicalDocument` via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kb_core::CanonicalDocument` | p7-1 |
|
||||
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
|
||||
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kebab_core::CanonicalDocument` | p7-1 |
|
||||
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, `kb-embed*` |
|
||||
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, `kebab-embed*` |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub struct PdfPageV1Chunker;
|
||||
|
||||
impl kb_core::Chunker for PdfPageV1Chunker {
|
||||
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("pdf-page-v1".into()) }
|
||||
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
|
||||
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
|
||||
impl kebab_core::Chunker for PdfPageV1Chunker {
|
||||
fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("pdf-page-v1".into()) }
|
||||
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
|
||||
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
|
||||
}
|
||||
```
|
||||
|
||||
@@ -62,7 +62,7 @@ impl kb_core::Chunker for PdfPageV1Chunker {
|
||||
|
||||
## Behavior contract
|
||||
|
||||
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kb-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
|
||||
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kebab-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
|
||||
- For each page block (1 block per page after p7-1):
|
||||
- If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page.
|
||||
- Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix.
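
The byte-budget rule above can be sketched as follows. This is a simplified illustration, not the chunker itself: `chunk_page` and `tail_bytes` are hypothetical names, the split uses only `\n\n` (the sentence-punctuation fallback is omitted), and the output is plain strings rather than `kebab_core::Chunk`.

```rust
/// Take roughly the last `n` bytes of `s`, moved forward to a UTF-8 char boundary.
fn tail_bytes(s: &str, n: usize) -> &str {
    let mut start = s.len().saturating_sub(n);
    while !s.is_char_boundary(start) {
        start += 1;
    }
    &s[start..]
}

/// Hypothetical sketch: one chunk per page when it fits the budget,
/// otherwise greedy paragraph grouping with trailing overlap.
pub fn chunk_page(text: &str, target_tokens: usize, overlap_tokens: usize) -> Vec<String> {
    let budget = target_tokens * 4; // bytes, as a proxy for tokens
    let overlap = overlap_tokens * 4;
    if text.len() <= budget {
        return vec![text.to_string()];
    }
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // Close the running chunk when adding this paragraph would exceed the budget.
        if !current.is_empty() && current.len() + 2 + para.len() > budget {
            // Seed the next chunk with the tail of the one being closed.
            let carry = tail_bytes(&current, overlap).to_string();
            chunks.push(std::mem::replace(&mut current, carry));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

A single paragraph larger than the budget still becomes one oversized chunk in this sketch; the real chunker would need the sentence-level split for that case.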
|
||||
@@ -91,12 +91,12 @@ impl kb_core::Chunker for PdfPageV1Chunker {
|
||||
| determinism | same input → same chunk_ids twice | inline |
|
||||
| snapshot | `Vec<Chunk>` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) |
|
||||
|
||||
All tests under `cargo test -p kb-chunk pdf`.
|
||||
All tests under `cargo test -p kebab-chunk pdf`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-chunk` passes (existing `md-heading-v1` continues to pass)
|
||||
- [ ] `cargo test -p kb-chunk pdf` passes
|
||||
- [ ] `cargo check -p kebab-chunk` passes (existing `md-heading-v1` continues to pass)
|
||||
- [ ] `cargo test -p kebab-chunk pdf` passes
|
||||
- [ ] Snapshot stable across two runs
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §3.5, §0 Q3, §9
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P8
|
||||
component: kb-parse-audio (whisper adapter)
|
||||
component: kebab-parse-audio (whisper adapter)
|
||||
task_id: p8-1
|
||||
title: "Audio Extractor + Transcriber trait + whisper.cpp adapter"
|
||||
status: planned
|
||||
depends_on: [p0-1, p1-6]
|
||||
unblocks: [p8-2]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning]
|
||||
---
|
||||
|
||||
@@ -22,8 +22,8 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `whisper-rs = "0.13"` (or current stable)
|
||||
- `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job.
|
||||
- `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler.
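
The "pre-mix multi-channel to mono via simple averaging" step can be sketched with std only. `downmix_to_mono` is a hypothetical helper name; it assumes interleaved frames as produced by the decoder and leaves the actual resampling to rubato.

```rust
/// Hypothetical helper: average interleaved multi-channel samples into mono,
/// one output sample per frame. A trailing partial frame is ignored.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Downmixing before resampling means the resampler only processes one channel, which matches the `1 /* channels after downmix */` argument in the constructor call above.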
|
||||
@@ -34,21 +34,21 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
|
||||
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
|
||||
| audio bytes | `&[u8]` | filesystem |
|
||||
| `kb-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
|
||||
| `kebab-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`audio-segment-v1` chunker in p8-2) |
|
||||
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`audio-segment-v1` chunker in p8-2) |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -56,19 +56,19 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
|
||||
pub trait Transcriber: Send + Sync {
|
||||
fn engine(&self) -> &'static str;
|
||||
fn engine_version(&self) -> String;
|
||||
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::Transcript>;
|
||||
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::Transcript>;
|
||||
}
|
||||
|
||||
pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ }
|
||||
impl WhisperCppTranscriber { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
|
||||
impl WhisperCppTranscriber { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
|
||||
impl Transcriber for WhisperCppTranscriber { /* per trait */ }
|
||||
|
||||
pub struct AudioExtractor { transcriber: std::sync::Arc<dyn Transcriber> }
|
||||
impl AudioExtractor { pub fn new(transcriber: std::sync::Arc<dyn Transcriber>) -> Self; }
|
||||
impl kb_core::Extractor for AudioExtractor {
|
||||
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Audio(_)) }
|
||||
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("audio-whisper-v1".into()) }
|
||||
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
|
||||
impl kebab_core::Extractor for AudioExtractor {
|
||||
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Audio(_)) }
|
||||
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("audio-whisper-v1".into()) }
|
||||
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
|
||||
}
|
||||
```
|
||||
|
||||
@@ -87,7 +87,7 @@ impl kb_core::Extractor for AudioExtractor {
|
||||
- `provenance` events: `Discovered`, `Parsed`, `Transcribed`.
|
||||
- `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`.
|
||||
- `WhisperCppTranscriber`:
|
||||
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kb/models/whisper/ggml-large-v3.bin`).
|
||||
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kebab/models/whisper/ggml-large-v3.bin`).
|
||||
- Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic.
|
||||
- Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments.
|
||||
- `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+).
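
The windowing and confidence rules above can be sketched as follows. Both functions are hypothetical helpers, not whisper-rs calls: `chunk_windows` shows how a `chunk_seconds` window maps to a millisecond offset that segment timestamps would be shifted by, and `mean_confidence` shows the per-token averaging.

```rust
/// Input sample rate whisper.cpp expects (16 kHz mono).
pub const SAMPLE_RATE: usize = 16_000;

/// Hypothetical helper: split 16 kHz mono PCM into windows of `chunk_seconds`,
/// pairing each window with its start offset in milliseconds.
pub fn chunk_windows(pcm: &[f32], chunk_seconds: usize) -> Vec<(u64, &[f32])> {
    // Guard against a zero window, which `chunks` would reject.
    let window = (chunk_seconds * SAMPLE_RATE).max(1);
    pcm.chunks(window)
        .enumerate()
        .map(|(i, w)| ((i * window * 1000 / SAMPLE_RATE) as u64, w))
        .collect()
}

/// Hypothetical helper: average per-token probabilities into one segment
/// confidence; `None` when there are no tokens.
pub fn mean_confidence(token_probs: &[f32]) -> Option<f32> {
    if token_probs.is_empty() {
        return None;
    }
    Some(token_probs.iter().sum::<f32>() / token_probs.len() as f32)
}
```

Each window's offset would be added to the `start_ms` and `end_ms` of the segments transcribed from it, so the aggregated transcript keeps timestamps relative to the whole file.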
|
||||
@@ -115,12 +115,12 @@ impl kb_core::Extractor for AudioExtractor {
|
||||
| `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model |
|
||||
| snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` |
|
||||
|
||||
All tests under `cargo test -p kb-parse-audio`. Mark slow/large-model tests `#[ignore]`.
|
||||
All tests under `cargo test -p kebab-parse-audio`. Mark slow/large-model tests `#[ignore]`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-parse-audio` passes
|
||||
- [ ] `cargo test -p kb-parse-audio` passes (excluding `#[ignore]`)
|
||||
- [ ] `cargo check -p kebab-parse-audio` passes
|
||||
- [ ] `cargo test -p kebab-parse-audio` passes (excluding `#[ignore]`)
|
||||
- [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR)
|
||||
- [ ] First-run model download path documented (NOT performed by code; user responsibility)
|
||||
- [ ] PR links design §3.4, §3.7a, §9.3
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P8
|
||||
component: kb-chunk (audio-segment-v1)
|
||||
component: kebab-chunk (audio-segment-v1)
|
||||
task_id: p8-2
|
||||
title: "Audio segment chunker (audio-segment-v1)"
|
||||
status: planned
|
||||
depends_on: [p8-1]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
|
||||
---
|
||||
|
||||
@@ -22,8 +22,8 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `serde`, `serde_json`
|
||||
- `blake3` (policy_hash)
|
||||
- `serde-json-canonicalizer`
|
||||
@@ -31,30 +31,30 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (consumes via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (consumes via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kb_core::CanonicalDocument` | p8-1 |
|
||||
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
|
||||
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kebab_core::CanonicalDocument` | p8-1 |
|
||||
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
|
||||
|
||||
## Outputs
|
||||
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, embedders |
|
||||
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, embedders |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub struct AudioSegmentV1Chunker;
|
||||
|
||||
impl kb_core::Chunker for AudioSegmentV1Chunker {
|
||||
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("audio-segment-v1".into()) }
|
||||
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
|
||||
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
|
||||
impl kebab_core::Chunker for AudioSegmentV1Chunker {
|
||||
fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("audio-segment-v1".into()) }
|
||||
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
|
||||
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
|
||||
}
|
||||
```
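The determinism requirement (same input, same `chunk_id`s) could be approached roughly as follows. This is a minimal sketch under stated assumptions, not the crate's implementation: `Segment`, `SketchChunk`, `derive_chunk_id`, and the one-chunk-per-segment rule are hypothetical, and `std`'s `DefaultHasher` stands in for `blake3` so the block has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical transcript segment; the real type lives in `kebab_core`.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Hypothetical chunk shape that carries a time span.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchChunk {
    pub chunk_id: String,
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

pub const CHUNKER_VERSION: &str = "audio-segment-v1";

/// Derive an ID from every input that should invalidate it.
/// ASSUMPTION: `DefaultHasher` is only a stand-in for `blake3`.
pub fn derive_chunk_id(doc_id: &str, policy_hash: &str, seg: &Segment) -> String {
    let mut h = DefaultHasher::new();
    doc_id.hash(&mut h);
    CHUNKER_VERSION.hash(&mut h);
    policy_hash.hash(&mut h);
    seg.start_ms.hash(&mut h);
    seg.end_ms.hash(&mut h);
    seg.text.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// ASSUMPTION: one chunk per transcript segment, in input order.
pub fn chunk_segments(doc_id: &str, policy_hash: &str, segments: &[Segment]) -> Vec<SketchChunk> {
    segments
        .iter()
        .map(|seg| SketchChunk {
            chunk_id: derive_chunk_id(doc_id, policy_hash, seg),
            start_ms: seg.start_ms,
            end_ms: seg.end_ms,
            text: seg.text.clone(),
        })
        .collect()
}
```

Because the chunker version and policy hash feed the ID, bumping either one changes every `chunk_id`, which is the property the versioning note relies on.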
|
||||
|
||||
@@ -92,12 +92,12 @@ impl kb_core::Chunker for AudioSegmentV1Chunker {
|
||||
| determinism | same input → same chunk_ids twice | inline |
|
||||
| snapshot | `Vec<Chunk>` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) |
|
||||
|
||||
All tests under `cargo test -p kb-chunk audio`.
|
||||
All tests under `cargo test -p kebab-chunk audio`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
|
||||
- [ ] `cargo test -p kb-chunk audio` passes
|
||||
- [ ] `cargo check -p kebab-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
|
||||
- [ ] `cargo test -p kebab-chunk audio` passes
|
||||
- [ ] Snapshot stable across two runs
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kb-tui (library view)
|
||||
component: kebab-tui (library view)
|
||||
task_id: p9-1
|
||||
title: "Ratatui library list view + tag filter"
|
||||
status: planned
|
||||
depends_on: [p1-6]
|
||||
unblocks: [p9-2, p9-3, p9-4]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings]
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §
|
||||
|
||||
## Goal
|
||||
|
||||
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kb-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
|
||||
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kebab-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,9 +22,9 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app` (facade — the only crate this binary touches besides `kb-core`/`kb-config`)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app` (facade — the only crate this binary touches besides `kebab-core`/`kebab-config`)
|
||||
- `ratatui = "0.28"`
|
||||
- `crossterm`
|
||||
- `tracing`
|
||||
@@ -32,15 +32,15 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — this is the design §8 boundary)
|
||||
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — this is the design §8 boundary)
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `kb-app::list_docs(filter)` | facade call | runtime |
|
||||
| `kebab-app::list_docs(filter)` | facade call | runtime |
|
||||
| keyboard events | `crossterm` | terminal |
|
||||
| `kb-config::Config` | runtime | env / file |
|
||||
| `kebab-config::Config` | runtime | env / file |
|
||||
|
||||
## Outputs
|
||||
|
||||
@@ -57,7 +57,7 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
|
||||
// state in WITHOUT modifying the App struct definition. This avoids merge
|
||||
// conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`.
|
||||
pub struct App {
|
||||
pub config: kb_config::Config,
|
||||
pub config: kebab_config::Config,
|
||||
pub focus: Pane,
|
||||
pub library: LibraryState, // owned by p9-1
|
||||
pub search: Option<SearchState>, // populated by p9-2 (None until that crate links in)
|
||||
@@ -73,7 +73,7 @@ pub struct AskState; // body filled by p9-3
|
||||
pub struct InspectState; // body filled by p9-4
|
||||
|
||||
impl App {
|
||||
pub fn new(config: kb_config::Config) -> anyhow::Result<Self>;
|
||||
pub fn new(config: kebab_config::Config) -> anyhow::Result<Self>;
|
||||
pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit
|
||||
}
|
||||
|
||||
@@ -102,12 +102,12 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
|
||||
- `Enter` → switch to Inspect pane (p9-4) on selected doc
|
||||
- `q` or `Esc` → quit
|
||||
- Facade calls are synchronous (no async runtime) and run on the main thread by default. For long calls, render a "loading…" state, run the call on a worker thread, and bridge the result back via `mpsc::channel`; this task may keep everything synchronous and accept brief UI hangs for v1.
|
||||
- Logging: `tracing` initialized to a file under `~/.local/state/kb/logs/`; never to stdout/stderr (so the TUI is not corrupted).
|
||||
- Logging: `tracing` initialized to a file under `~/.local/state/kebab/logs/`; never to stdout/stderr (so the TUI is not corrupted).
|
||||
- Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from `anyhow::Error` chain; press any key to dismiss.
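The key bindings and error popup listed above might be dispatched along these lines. This is a sketch only: the `Key` enum is a hypothetical stand-in for `crossterm::event::KeyEvent`, the `Pane` variants are assumed, `format_error_popup` is an illustrative helper, and the behavior when no hint is available is a guess.

```rust
/// ASSUMPTION: pane names; the spec only shows the `Pane` type name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pane {
    Library,
    Search,
    Ask,
    Inspect,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyOutcome {
    Continue,
    Quit,
    SwitchPane(Pane),
    Refresh,
}

/// Hypothetical stand-in for a terminal key event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Char(char),
    Enter,
    Esc,
}

/// Library-pane dispatch for the bindings named in the contract.
pub fn handle_key_library(key: Key) -> KeyOutcome {
    match key {
        Key::Enter => KeyOutcome::SwitchPane(Pane::Inspect),
        Key::Char('q') | Key::Esc => KeyOutcome::Quit,
        // Other bindings (navigation, filter) are omitted from this sketch.
        _ => KeyOutcome::Continue,
    }
}

/// Build the popup text; the hint line is dropped when no hint exists.
pub fn format_error_popup(msg: &str, hint: Option<&str>) -> String {
    match hint {
        Some(h) => format!("error: {msg}\nhint: {h}"),
        None => format!("error: {msg}"),
    }
}
```

Returning a `KeyOutcome` instead of mutating global state keeps each pane's handler testable without a terminal.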
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
- Reads: `kb-app::list_docs` only.
|
||||
- Reads: `kebab-app::list_docs` only.
|
||||
- Writes: none.
|
||||
|
||||
## Test plan
|
||||
@@ -118,16 +118,16 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
|
||||
| unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline |
|
||||
| snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline |
|
||||
| unit | error popup renders without panic on injected `anyhow::Error` | inline |
|
||||
| integration | mocked `kb-app::list_docs` returning N docs renders all rows | inline |
|
||||
| integration | mocked `kebab-app::list_docs` returning N docs renders all rows | inline |
|
||||
|
||||
All tests under `cargo test -p kb-tui library`.
|
||||
All tests under `cargo test -p kebab-tui library`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-tui` passes
|
||||
- [ ] `cargo test -p kb-tui library` passes
|
||||
- [ ] No imports outside `kb-core`, `kb-config`, `kb-app`
|
||||
- [ ] `kb tui` (or `kb` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
|
||||
- [ ] `cargo check -p kebab-tui` passes
|
||||
- [ ] `cargo test -p kebab-tui library` passes
|
||||
- [ ] No imports outside `kebab-core`, `kebab-config`, `kebab-app`
|
||||
- [ ] `kebab tui` (or `kebab` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
|
||||
- [ ] PR links design §8 module boundary, report §16.2 (TUI epic)
|
||||
|
||||
## Out of scope
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kb-tui (search pane)
|
||||
component: kebab-tui (search pane)
|
||||
task_id: p9-2
|
||||
title: "TUI Search pane: input + result list + preview + editor jump"
|
||||
status: planned
|
||||
depends_on: [p2-2, p3-4, p9-1]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
|
||||
|
||||
## Goal
|
||||
|
||||
Add a Search pane to the TUI that drives `kb-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
|
||||
Add a Search pane to the TUI that drives `kebab-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,25 +22,25 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app`
|
||||
- `kb-tui` (extends p9-1)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app`
|
||||
- `kebab-tui` (extends p9-1)
|
||||
- `ratatui`, `crossterm`
|
||||
- `tracing`
|
||||
- `thiserror`
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `kb-app::search(query)` | facade | runtime |
|
||||
| `kebab-app::search(query)` | facade | runtime |
|
||||
| keyboard events | `crossterm` | terminal |
|
||||
| selected hit's citation | `kb_core::Citation` | App state |
|
||||
| selected hit's citation | `kebab_core::Citation` | App state |
|
||||
|
||||
## Outputs
|
||||
|
||||
@@ -54,16 +54,16 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
|
||||
```rust
|
||||
pub fn render_search<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
|
||||
pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
|
||||
pub fn jump_to_citation(citation: &kb_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
|
||||
pub fn jump_to_citation(citation: &kebab_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
|
||||
```
|
||||
|
||||
This task fills the body of `kb_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
|
||||
This task fills the body of `kebab_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
|
||||
|
||||
```rust
|
||||
pub struct SearchState {
|
||||
pub input: String,
|
||||
pub mode: kb_core::SearchMode,
|
||||
pub hits: Vec<kb_core::SearchHit>,
|
||||
pub mode: kebab_core::SearchMode,
|
||||
pub hits: Vec<kebab_core::SearchHit>,
|
||||
pub selected_hit: usize,
|
||||
pub last_query_at: Option<time::OffsetDateTime>, // debounce timer
|
||||
}
|
||||
@@ -73,7 +73,7 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
|
||||
|
||||
## Behavior contract
|
||||
|
||||
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kb-app::inspect_chunk`).
|
||||
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kebab-app::inspect_chunk`).
|
||||
- Key bindings (Search pane):
|
||||
- typing → updates `SearchState.input`; debounced (200 ms) re-search
|
||||
- `Tab` → cycles `SearchState.mode` Lexical → Vector → Hybrid → Lexical
|
||||
@@ -104,14 +104,14 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
|
||||
| unit | `j` / `k` move selection within bounds | inline |
|
||||
| unit | `jump_to_citation` for `Line` builds `+<line> <path>` command (assert via mocked Command runner) | inline |
|
||||
| snapshot | rendered Search pane with 3 hits + preview stable | TestBackend |
|
||||
| integration | mocked `kb-app::search` returning fixture hits drives render | inline |
|
||||
| integration | mocked `kebab-app::search` returning fixture hits drives render | inline |
|
||||
|
||||
All tests under `cargo test -p kb-tui search`.
|
||||
All tests under `cargo test -p kebab-tui search`.
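The `+<line> <path>` test case in the plan suggests separating command construction from spawning, so the arguments can be asserted without launching an editor. A minimal sketch, assuming a hypothetical helper `build_jump_command` (not part of the spec's public surface) and a vim-style jump syntax:

```rust
use std::path::Path;
use std::process::Command;

/// Build, but do not spawn, the editor invocation for a line citation.
/// ASSUMPTION: `editor` is a bare program name; an `$EDITOR` value that
/// carries its own arguments would need to be split before this call.
pub fn build_jump_command(editor: &str, path: &Path, line: u32) -> Command {
    let mut cmd = Command::new(editor);
    // vim-style jump: `+<line>` first, then the file path.
    cmd.arg(format!("+{line}")).arg(path);
    cmd
}

/// Collect the arguments as strings so a test can compare them.
pub fn command_args(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```

A test can then inspect `command_args(&cmd)` directly, which plays the role of the "mocked Command runner" without any process being started.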
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-tui` passes
|
||||
- [ ] `cargo test -p kb-tui search` passes
|
||||
- [ ] `cargo check -p kebab-tui` passes
|
||||
- [ ] `cargo test -p kebab-tui search` passes
|
||||
- [ ] `g` keybinding launches `$EDITOR` with correct `+<line>` argument (manual smoke against vim)
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] PR links design §1.5/1.6, §3.7
|
||||
@@ -125,5 +125,5 @@ All tests under `cargo test -p kb-tui search`.
|
||||
## Risks / notes
|
||||
|
||||
- Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard).
|
||||
- Different editors take different jump syntaxes. Provide an env override `KB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
|
||||
- Different editors take different jump syntaxes. Provide an env override `KEBAB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
|
||||
- Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template).
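The RAII guard mentioned in the first risk could look roughly like this. It is a sketch with assumptions: `RestoreGuard` and `with_raw_mode_suspended` are hypothetical names, and a shared `Cell<bool>` stands in for the terminal's raw-mode state so the block runs without `crossterm`; in the real pane the two closures would presumably wrap `crossterm`'s raw-mode disable/enable calls.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Runs `restore` when dropped, including on early return or unwinding.
pub struct RestoreGuard<F: FnMut()> {
    restore: F,
}

impl<F: FnMut()> RestoreGuard<F> {
    /// Run `suspend` immediately and remember `restore` for drop time.
    pub fn new(suspend: impl FnOnce(), restore: F) -> Self {
        suspend();
        RestoreGuard { restore }
    }
}

impl<F: FnMut()> Drop for RestoreGuard<F> {
    fn drop(&mut self) {
        (self.restore)();
    }
}

/// Run `action` (e.g. the editor spawn) with the raw-mode flag lowered.
/// ASSUMPTION: `raw_mode` is a stand-in flag, not a real terminal handle.
pub fn with_raw_mode_suspended<T>(raw_mode: &Rc<Cell<bool>>, action: impl FnOnce() -> T) -> T {
    let on_suspend = Rc::clone(raw_mode);
    let on_restore = Rc::clone(raw_mode);
    let _guard = RestoreGuard::new(
        move || on_suspend.set(false),
        move || on_restore.set(true),
    );
    action()
}
```

Tying the restore step to `Drop` means the terminal state is put back even if the editor spawn fails partway through.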
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kb-tui (ask pane)
|
||||
component: kebab-tui (ask pane)
|
||||
task_id: p9-3
|
||||
title: "TUI Ask pane: streaming answer + citation links + --explain toggle"
|
||||
status: planned
|
||||
depends_on: [p4-3, p9-1]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
|
||||
|
||||
## Goal
|
||||
|
||||
Add an Ask pane that calls `kb-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
|
||||
Add an Ask pane that calls `kebab-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,23 +22,23 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app`
|
||||
- `kb-tui` (extends p9-1)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app`
|
||||
- `kebab-tui` (extends p9-1)
|
||||
- `ratatui`, `crossterm`
|
||||
- `tracing`
|
||||
- `thiserror`
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `kb-app::ask(query, AskOpts)` | facade | runtime |
|
||||
| `kebab-app::ask(query, AskOpts)` | facade | runtime |
|
||||
| keyboard events | `crossterm` | terminal |
|
||||
|
||||
## Outputs
|
||||
@@ -46,7 +46,7 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
|
||||
| output | type | downstream |
|
||||
|--------|------|------------|
|
||||
| Ratatui Ask pane render | terminal | user |
|
||||
| `kb-app::ask` invocation with streaming closure | facade | RAG pipeline |
|
||||
| `kebab-app::ask` invocation with streaming closure | facade | RAG pipeline |
|
||||
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
@@ -55,7 +55,7 @@ pub fn render_ask<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ra
|
||||
pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
|
||||
```
|
||||
|
||||
This task fills the body of `kb_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
|
||||
This task fills the body of `kebab_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
|
||||
|
||||
```rust
|
||||
pub struct AskState {
|
||||
@@ -63,8 +63,8 @@ pub struct AskState {
|
||||
pub explain: bool,
|
||||
pub streaming: bool,
|
||||
pub partial: String,
|
||||
pub answer: Option<kb_core::Answer>,
|
||||
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kb_core::Answer>>>,
|
||||
pub answer: Option<kebab_core::Answer>,
|
||||
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kebab_core::Answer>>>,
|
||||
pub rx: Option<std::sync::mpsc::Receiver<String>>,
|
||||
}
|
||||
```
|
||||
@@ -74,7 +74,7 @@ pub struct AskState {
|
||||
## Behavior contract
|
||||
|
||||
- Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`).
|
||||
- Submission: `Enter` triggers a worker thread that calls `kb-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
|
||||
- Submission: `Enter` triggers a worker thread that calls `kebab-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
|
||||
- Streaming: while `AskState.streaming` is `true`, the answer area shows `AskState.partial` and a small "▍" cursor. When the worker finishes, `AskState.answer` is populated and the citations panel switches to the final list.
|
||||
- Refusal rendering:
|
||||
- `grounded = false` and `refusal_reason = ScoreGate` → render the answer, which is the human-friendly "근거 부족…" ("insufficient evidence…") message; citations are labeled "가까운 후보" ("closest candidates").
|
||||
@@ -85,12 +85,12 @@ pub struct AskState {
|
||||
- `e` → toggle `AskState.explain`; resubmit on next `Enter`. While explain is ON, the citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary.
|
||||
- `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped).
|
||||
- `j` / `k` → scroll the answer area when oversized.
|
||||
- All facade calls stay within `kb-app::ask` — never reach into `kb-rag` directly.
|
||||
- All facade calls stay within `kebab-app::ask` — never reach into `kebab-rag` directly.
|
||||
- Errors render as a popup overlay; do not crash the pane.
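The worker-thread and `rx.try_iter()` handoff described in the contract can be sketched with `std` alone. Assumptions are loud here: `AskStream`, `start_stream`, and `drain_frame` are hypothetical names, and a fixed token list stands in for the facade's `ask` call.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the streaming fields of `AskState`.
pub struct AskStream {
    pub partial: String,
    pub rx: Receiver<String>,
    pub worker: Option<JoinHandle<String>>,
}

/// Spawn a worker that sends tokens one by one, then returns the full answer.
/// ASSUMPTION: `tokens` replaces the real model output.
pub fn start_stream(tokens: Vec<String>) -> AskStream {
    let (tx, rx) = mpsc::channel::<String>();
    let worker = thread::spawn(move || {
        let mut full = String::new();
        for t in tokens {
            full.push_str(&t);
            // A send error only means the pane dropped `rx`; keep going.
            let _ = tx.send(t);
        }
        full
    });
    AskStream {
        partial: String::new(),
        rx,
        worker: Some(worker),
    }
}

/// Called once per render frame: append whatever has arrived, never block.
pub fn drain_frame(state: &mut AskStream) {
    for tok in state.rx.try_iter() {
        state.partial.push_str(&tok);
    }
}
```

Ignoring the send error matches the best-effort cancellation rule: after `Esc` the worker keeps running, and its output is simply discarded.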
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
- Reads/writes via `kb-app::ask` which itself writes the `answers` row in `kb.sqlite`. The pane has no direct DB access.
|
||||
- Reads/writes via `kebab-app::ask` which itself writes the `answers` row in `kebab.sqlite`. The pane has no direct DB access.
|
||||
|
||||
## Test plan
|
||||
|
||||
@@ -102,14 +102,14 @@ pub struct AskState {
|
||||
| unit | refusal answer renders without citations panel index errors | inline |
|
||||
| snapshot | rendered Ask pane mid-stream is stable | TestBackend |
|
||||
| snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend |
|
||||
| integration | mocked `kb-app::ask` returning a canned `Answer` populates final state correctly | inline |
|
||||
| integration | mocked `kebab-app::ask` returning a canned `Answer` populates final state correctly | inline |
|
||||
|
||||
All tests under `cargo test -p kb-tui ask`.
|
||||
All tests under `cargo test -p kebab-tui ask`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-tui` passes
|
||||
- [ ] `cargo test -p kb-tui ask` passes
|
||||
- [ ] `cargo check -p kebab-tui` passes
|
||||
- [ ] `cargo test -p kebab-tui ask` passes
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`)
|
||||
- [ ] PR links design §1.1–1.4, §2.3
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kb-tui (inspect pane)
|
||||
component: kebab-tui (inspect pane)
|
||||
task_id: p9-4
|
||||
title: "TUI Inspect pane: document & chunk detail render"
|
||||
status: planned
|
||||
depends_on: [p1-6, p9-1]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection]
|
||||
---
|
||||
|
||||
@@ -22,24 +22,24 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app`
|
||||
- `kb-tui` (extends p9-1)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app`
|
||||
- `kebab-tui` (extends p9-1)
|
||||
- `ratatui`, `crossterm`
|
||||
- `tracing`
|
||||
- `thiserror`
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
|
||||
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `kb-app::inspect_doc(id)` | facade | runtime |
|
||||
| `kb-app::inspect_chunk(id)` | facade | runtime |
|
||||
| `kebab-app::inspect_doc(id)` | facade | runtime |
|
||||
| `kebab-app::inspect_chunk(id)` | facade | runtime |
|
||||
| keyboard events | `crossterm` | terminal |
|
||||
|
||||
## Outputs
|
||||
@@ -51,19 +51,19 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
pub enum InspectTarget { Doc(kb_core::DocumentId), Chunk(kb_core::ChunkId) }
|
||||
pub enum InspectTarget { Doc(kebab_core::DocumentId), Chunk(kebab_core::ChunkId) }
|
||||
|
||||
pub fn render_inspect<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
|
||||
pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
|
||||
```
|
||||
|
||||
This task fills the body of `kb_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
|
||||
This task fills the body of `kebab_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
|
||||
|
||||
```rust
|
||||
pub struct InspectState {
|
||||
pub target: Option<InspectTarget>,
|
||||
pub doc: Option<kb_core::CanonicalDocument>,
|
||||
pub chunk: Option<kb_core::Chunk>,
|
||||
pub doc: Option<kebab_core::CanonicalDocument>,
|
||||
pub chunk: Option<kebab_core::Chunk>,
|
||||
pub collapsed: std::collections::HashSet<&'static str>,
|
||||
pub scroll: u16,
|
||||
}
|
||||
@@ -89,7 +89,7 @@ pub struct InspectState {
|
||||
- `c` → collapse / expand currently focused section (focus is implicit by current scroll position; v1 may simplify by toggling all sections)
|
||||
- `Esc` → return to previous pane (Library or Search)
|
||||
- `Enter` → no-op (Inspect is terminal — no editor jump here; users use Search pane for jump)
|
||||
- Loading: while `kb-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
|
||||
- Loading: while `kebab-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
|
||||
- Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`.
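The bounded scroll and collapse toggle covered by the test plan might be implemented along these lines. This is a sketch: `InspectView`, `scroll_by`, and `toggle_collapsed` are hypothetical, and the section name used in any call is illustrative only.

```rust
use std::collections::HashSet;

/// Hypothetical subset of `InspectState` needed for scrolling and collapsing.
pub struct InspectView {
    pub collapsed: HashSet<&'static str>,
    pub scroll: u16,
}

/// Move the scroll offset by `delta` rows, clamped to the scrollable range.
pub fn scroll_by(view: &mut InspectView, delta: i32, content_height: u16, viewport_height: u16) {
    // When the content fits in the viewport, the only valid offset is 0.
    let max_scroll = content_height.saturating_sub(viewport_height) as i32;
    let next = (view.scroll as i32 + delta).clamp(0, max_scroll);
    view.scroll = next as u16;
}

/// Flip a section between collapsed and expanded.
pub fn toggle_collapsed(view: &mut InspectView, section: &'static str) {
    // `remove` returns false when the section was not collapsed yet.
    if !view.collapsed.remove(section) {
        view.collapsed.insert(section);
    }
}
```

Doing the arithmetic in `i32` avoids `u16` underflow when scrolling up from the top.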
|
||||
|
||||
## Storage / wire effects
|
||||
@@ -100,18 +100,18 @@ pub struct InspectState {
|
||||
|
||||
| kind | description | fixture / data |
|
||||
|------|-------------|----------------|
|
||||
| unit | switching to InspectTarget::Doc triggers `kb-app::inspect_doc` once | inline mock |
|
||||
| unit | switching to InspectTarget::Doc triggers `kebab-app::inspect_doc` once | inline mock |
|
||||
| unit | scroll bounded by content height | inline |
|
||||
| unit | collapse toggle via `c` flips state | inline |
|
||||
| snapshot | doc-view rendered for fixture stable | TestBackend + fixture |
|
||||
| snapshot | chunk-view rendered for fixture stable | TestBackend + fixture |
|
||||
|
||||
All tests under `cargo test -p kb-tui inspect`.
|
||||
All tests under `cargo test -p kebab-tui inspect`.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-tui` passes
|
||||
- [ ] `cargo test -p kb-tui inspect` passes
|
||||
- [ ] `cargo check -p kebab-tui` passes
|
||||
- [ ] `cargo test -p kebab-tui inspect` passes
|
||||
- [ ] No imports outside Allowed dependencies
|
||||
- [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library
|
||||
- [ ] PR links design §3.5, §2.5, §2.6
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kb-desktop (Tauri)
|
||||
component: kebab-desktop (Tauri)
|
||||
task_id: p9-5
|
||||
title: "Tauri desktop app: backend commands wrapping kb-app + multimodal source viewer"
|
||||
title: "Tauri desktop app: backend commands wrapping kebab-app + multimodal source viewer"
|
||||
status: planned
|
||||
depends_on: [p9-1, p9-2, p9-3, p9-4]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries]
|
||||
---
|
||||
|
||||
@@ -14,23 +14,23 @@ contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), desig
|
||||
|
||||
## Goal
|
||||
|
||||
Stand up a Tauri 2.x app (`kb-desktop` crate as backend, `kb-desktop-frontend/` as web assets) whose Tauri commands wrap `kb-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
|
||||
Stand up a Tauri 2.x app (`kebab-desktop` crate as backend, `kebab-desktop-frontend/` as web assets) whose Tauri commands wrap `kebab-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kb-app`; no new business logic.
|
||||
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kebab-app`; no new business logic.
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- backend (`kb-desktop`):
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app`
|
||||
- backend (`kebab-desktop`):
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app`
|
||||
- `tauri = "2"` + `tauri-build`
|
||||
- `serde`, `serde_json`
|
||||
- `tracing`
|
||||
- `thiserror`
|
||||
- frontend (`kb-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
|
||||
- frontend (`kebab-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
|
||||
- PDF rendering: `pdfjs-dist`
|
||||
- Markdown rendering: `marked` + `dompurify`
|
||||
- Audio: HTML `<audio>` with custom segment overlay
|
||||
@@ -38,7 +38,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — design §8).
|
||||
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — design §8).
|
||||
- **No native PDF render backend** (no `pdfium`, no `mupdf`, no `poppler`). PDF rendering lives entirely in the frontend (`pdfjs-dist`). Adding any of these would (a) bloat the bundle by 100+ MB, (b) require a frozen-design amendment, and (c) double the path-containment surface.
|
||||
|
||||
## Inputs
|
||||
@@ -46,7 +46,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| Tauri commands | invoked from frontend | user clicks |
|
||||
| `kb-config::Config` | runtime | env / file |
|
||||
| `kebab-config::Config` | runtime | env / file |
|
||||
| user file system (read-only) | for source viewers | OS |
|
||||
|
||||
## Outputs
|
||||
@@ -59,7 +59,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
|
||||
## Public surface (signatures only — no new types)
|
||||
|
||||
```rust
|
||||
// Tauri command surface (one per kb-app facade method, plus source viewers)
|
||||
// Tauri command surface (one per kebab-app facade method, plus source viewers)
|
||||
#[tauri::command] fn cmd_init(force: bool) -> Result<()>;
|
||||
#[tauri::command] fn cmd_ingest(scope_json: serde_json::Value, summary_only: bool) -> Result<serde_json::Value /* IngestReportWireV1 */>;
|
||||
#[tauri::command] fn cmd_list_docs(filter_json: serde_json::Value) -> Result<Vec<serde_json::Value /* DocSummaryWireV1 */>>;
|
||||
@@ -75,11 +75,11 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
|
||||
#[tauri::command] fn cmd_read_file_bytes(path: String) -> Result<Vec<u8>>; // raw bytes for PDF / image / audio
|
||||
```
|
||||
|
||||
(All commands convert internal `kb-core` types to wire-schema-v1 JSON before returning.)
|
||||
(All commands convert internal `kebab-core` types to wire-schema-v1 JSON before returning.)
|
||||
|
||||
## Behavior contract
|
||||
|
||||
- Backend bootstraps `tracing` to a file under `~/.local/state/kb/logs/` and a Tauri plugin loads/saves window state.
|
||||
- Backend bootstraps `tracing` to a file under `~/.local/state/kebab/logs/` and a Tauri plugin loads/saves window state.
|
||||
- Every Tauri command that serves source viewers performs **path containment**: it resolves `path` against `config.workspace.root` and rejects any path outside that root with an `anyhow::Error`.
|
||||
- Layout (frontend): left = Library + Search + Ask tabs; right = Source viewer keyed by current citation.
|
||||
- Citation routing in the frontend (clicks on `[#N]` markers or hit rows). All rendering is frontend-side; backend serves raw bytes only.
|
||||
@@ -88,23 +88,23 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
|
||||
- `Citation::Region { path, x, y, w, h }` → `cmd_read_file_bytes(path)` → blob URL → `<img>` + absolute-positioned overlay at `(x, y, w, h)`.
|
||||
- `Citation::Caption { path, model }` → same as Region but no overlay; caption banner shows `model`.
|
||||
- `Citation::Time { path, start_ms, end_ms }` → `cmd_read_file_bytes(path)` → blob URL → `<audio src=...>` seeked to `start_ms / 1000`, with a timeline marker spanning `[start_ms, end_ms]`.
|
||||
- Streaming `kb ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kb://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
|
||||
- Streaming `kebab ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kebab://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
|
||||
- All backend errors are mapped to a `String` message with structure `{ "error": msg, "hint": Option<msg> }`.
|
||||
- Frontend respects light/dark per OS theme (Tauri supplies the API).
|
||||
- No telemetry. No automatic update channel for v1 (manual download).
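
The path-containment rule in the behavior contract could be sketched roughly as follows. This is a minimal, std-only illustration and not the project's implementation: the function name `contain_path` is hypothetical, and it returns `std::io::Error` rather than the `anyhow::Error` the contract names, purely to avoid external crates. Both paths are canonicalized, so `..` segments, absolute paths, and symlinks that point outside the root should all be rejected.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `requested` against `root` and return the canonical path
/// only if it still lies inside `root`. (Hypothetical helper.)
fn contain_path(root: &Path, requested: &Path) -> io::Result<PathBuf> {
    // Canonicalizing resolves symlinks and `..` segments on both sides.
    let root = fs::canonicalize(root)?;
    // `join` replaces the base entirely when `requested` is absolute,
    // so absolute paths are caught by the prefix check below.
    let candidate = fs::canonicalize(root.join(requested))?;
    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes workspace root",
        ))
    }
}
```

One consequence of canonicalizing first is that a path which does not exist is also rejected, which seems acceptable for a read-only source viewer.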
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
- Reads via `kb-app` (which reads/writes via SQLite + LanceDB).
|
||||
- Reads via `kebab-app` (which reads/writes via SQLite + LanceDB).
|
||||
- Reads workspace files directly for source viewers (path-contained).
|
||||
- Writes nothing outside what `kb-app` writes.
|
||||
- Writes nothing outside what `kebab-app` writes.
|
||||
- Wire JSON between backend and frontend uses schema v1 strictly. The frontend MUST validate `schema_version` strings on every IPC return and warn (or upgrade-gate) when `v1 != current`.
|
||||
|
||||
## Test plan
|
||||
|
||||
| kind | description | fixture / data |
|
||||
|------|-------------|----------------|
|
||||
| unit (backend) | each command wraps the corresponding `kb-app` function and serializes via wire schema | inline mocks |
|
||||
| unit (backend) | each command wraps the corresponding `kebab-app` function and serializes via wire schema | inline mocks |
|
||||
| unit (backend) | `cmd_read_markdown` rejects paths outside workspace | tmp config |
|
||||
| unit (backend) | `cmd_read_file_bytes` rejects paths outside workspace incl. `..`, absolute path, symlink-out | tmp config + traversal vectors |
|
||||
| unit (backend) | `cmd_read_file_bytes` returns identical bytes to `std::fs::read` for an in-workspace file | tmp config |
|
||||
@@ -112,13 +112,13 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
|
||||
| smoke (frontend, optional in this task) | Vitest test that mounts the Library tab, calls a mocked `cmd_list_docs`, renders 1 row | minimal |
|
||||
| manual | full-stack smoke against a real ingested workspace (Markdown + 1 PDF + 1 image + 1 audio); each citation jumps correctly | manual checklist |
|
||||
|
||||
Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not gated by this task's DoD.
|
||||
Backend tests under `cargo test -p kebab-desktop`. Frontend tests are bonus and not gated by this task's DoD.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `cargo check -p kb-desktop` passes
|
||||
- [ ] `cargo test -p kb-desktop` passes
|
||||
- [ ] `pnpm --filter kb-desktop-frontend build` produces a static asset bundle Tauri can package
|
||||
- [ ] `cargo check -p kebab-desktop` passes
|
||||
- [ ] `cargo test -p kebab-desktop` passes
|
||||
- [ ] `pnpm --filter kebab-desktop-frontend build` produces a static asset bundle Tauri can package
|
||||
- [ ] `tauri build` produces an unsigned dmg on macOS in CI (signed/notarized are out of scope)
|
||||
- [ ] Each Tauri command returns wire-schema-v1 JSON; frontend asserts `schema_version`
|
||||
- [ ] No imports outside Allowed dependencies (backend)
|
||||
@@ -139,4 +139,4 @@ Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not
|
||||
- Path containment is the desktop's most security-sensitive surface; tests must include path traversal vectors (`..`, symlinks, absolute paths).
|
||||
- PDF rendering via `pdfjs-dist` is heavy (~2 MB worker); lazy-load on first PDF citation. The trade-off vs a native render backend (e.g., `pdfium` ~150 MB binary, code-signing pain) is heavily one-sided; v1 stays on `pdfjs-dist`.
|
||||
- Audio formats vary; rely on the browser engine's HTML audio decoder (WebKit on macOS supports `.m4a`, `.mp3`; mileage varies on `.flac`/`.ogg`).
|
||||
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kb-rag` / `kb-search` / store crate appears in `kb-desktop`'s `cargo tree`.
|
||||
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kebab-rag` / `kebab-search` / store crate appears in `kebab-desktop`'s `cargo tree`.
|
||||
|
||||
@@ -3,7 +3,7 @@ phase: P0
|
||||
title: "Workspace 뼈대 + 도메인 계약"
|
||||
status: completed
|
||||
depends_on: []
|
||||
source: kb_local_rust_report.md §3, §4, §6, §7, §13
|
||||
source: kebab_local_rust_report.md §3, §4, §6, §7, §13
|
||||
---
|
||||
|
||||
# P0 — Workspace 뼈대 + 도메인 계약
|
||||
@@ -16,11 +16,11 @@ compile 되는 Rust 2024 workspace 와 domain spec 확정. 이후 모든 phase
|
||||
|
||||
| crate | 역할 |
|
||||
|-------|------|
|
||||
| `kb-core` | domain type, trait, error, ID 규칙 |
|
||||
| `kb-parse-types` | parser 중간 표현 (`ParsedBlock` 등) — `kb-core` 와 parsers/normalize 사이 thin layer (design §3.7b) |
|
||||
| `kb-config` | config 로딩, 기본값, 경로 확장 |
|
||||
| `kb-app` | facade. CLI/TUI/desktop 공통 진입점 |
|
||||
| `kb-cli` | `kb` 바이너리 skeleton (`--help`만 동작) |
|
||||
| `kebab-core` | domain type, trait, error, ID 규칙 |
|
||||
| `kebab-parse-types` | parser 중간 표현 (`ParsedBlock` 등) — `kebab-core` 와 parsers/normalize 사이 thin layer (design §3.7b) |
|
||||
| `kebab-config` | config 로딩, 기본값, 경로 확장 |
|
||||
| `kebab-app` | facade. CLI/TUI/desktop 공통 진입점 |
|
||||
| `kebab-cli` | `kebab` 바이너리 skeleton (`--help`만 동작) |
|
||||
|
||||
## Workspace 설정
|
||||
|
||||
@@ -29,7 +29,7 @@ compile 되는 Rust 2024 workspace 와 domain spec 확정. 이후 모든 phase
|
||||
```toml
|
||||
[workspace]
|
||||
resolver = "3"
|
||||
members = ["crates/kb-core", "crates/kb-parse-types", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]
|
||||
members = ["crates/kebab-core", "crates/kebab-parse-types", "crates/kebab-config", "crates/kebab-app", "crates/kebab-cli"]
|
||||
|
||||
[workspace.package]
|
||||
edition = "2024"
|
||||
@@ -48,11 +48,11 @@ tracing = "0.1"
|
||||
|
||||
Additional member crates join in later phases.
|
||||
|
||||
## kb-core 도메인 타입
|
||||
## kebab-core 도메인 타입
|
||||
|
||||
`RawAsset`, `CanonicalDocument`, `Block`, `Chunk`, `SearchHit`, `Citation`, `SourceSpan`, `Provenance`, `Metadata`. Defined exactly as in report §6.
|
||||
|
||||
## kb-core trait
|
||||
## kebab-core trait
|
||||
|
||||
```rust
|
||||
pub trait SourceConnector { fn scan(&self, scope: &SourceScope) -> anyhow::Result<Vec<RawAsset>>; }
|
||||
@@ -78,7 +78,7 @@ index_id = collection name + index version + embedding model id
|
||||
|
||||
Each record keeps the following version fields: `doc_version`, `schema_version`, `parser_version`, `chunker_version`, `embedding_model`, `embedding_version`, `index_version`, `prompt_template_version`.
|
||||
|
||||
## kb-app facade
|
||||
## kebab-app facade
|
||||
|
||||
The CLI, TUI, and desktop all call facade functions. Calling parser/DB/LLM adapters directly is forbidden. Initial facade method stubs:
|
||||
|
||||
@@ -91,9 +91,9 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
|
||||
pub fn doctor() -> anyhow::Result<DoctorReport>;
|
||||
```
|
||||
|
||||
## kb-cli skeleton
|
||||
## kebab-cli skeleton
|
||||
|
||||
`clap` derive. subcommand: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. 본체는 `kb-app` 호출만. P0 에선 `--help` 만 동작.
|
||||
`clap` derive. subcommand: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. 본체는 `kebab-app` 호출만. P0 에선 `--help` 만 동작.
|
||||
|
||||
## spec 문서
|
||||
|
||||
@@ -112,15 +112,15 @@ pub fn doctor() -> anyhow::Result<DoctorReport>;
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-core`: 외부 의존 최소 (serde, time, uuid, blake3, thiserror, tracing).
|
||||
- `kb-cli` → `kb-app` 만 의존. parser/DB/LLM 직접 의존 금지.
|
||||
- `kb-app` 은 trait 만 보고 동작. 구현체는 dyn injection.
|
||||
- `kebab-core`: 외부 의존 최소 (serde, time, uuid, blake3, thiserror, tracing).
|
||||
- `kebab-cli` → `kebab-app` 만 의존. parser/DB/LLM 직접 의존 금지.
|
||||
- `kebab-app` 은 trait 만 보고 동작. 구현체는 dyn injection.
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `cargo check --workspace` 통과
|
||||
- [ ] `cargo test --workspace` 통과 (단위 테스트는 ID 생성/도메인 직렬화 round-trip)
|
||||
- [ ] `kb --help` 출력
|
||||
- [ ] `kebab --help` 출력
|
||||
- [ ] `docs/spec/*` 7개 문서 존재
|
||||
- [ ] `fixtures/markdown/*` 3개 존재
|
||||
- [ ] domain type serde JSON snapshot test 1개 이상
|
||||
|
||||
@@ -3,47 +3,47 @@ phase: P1
|
||||
title: "Markdown ingestion 파이프라인"
|
||||
status: completed
|
||||
depends_on: [P0]
|
||||
source: kb_local_rust_report.md §8, §14, §17 Phase 1
|
||||
source: kebab_local_rust_report.md §8, §14, §17 Phase 1
|
||||
---
|
||||
|
||||
# P1 — Markdown ingestion 파이프라인
|
||||
|
||||
## 목표
|
||||
|
||||
Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kb ingest` / `kb list docs` / `kb inspect doc <id>` work even without an LLM or embeddings.
|
||||
Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kebab ingest` / `kebab list docs` / `kebab inspect doc <id>` work even without an LLM or embeddings.
|
||||
|
||||
## 산출 crate
|
||||
|
||||
| crate | 역할 |
|
||||
|-------|------|
|
||||
| `kb-source-fs` | local folder scan, checksum, 변경 감지. `SourceConnector` 구현 |
|
||||
| `kb-parse-md` | Markdown bytes → structured document. `Extractor` 구현 |
|
||||
| `kb-normalize` | parser output → `CanonicalDocument` |
|
||||
| `kb-chunk` | block-aware chunking. `Chunker` 구현 (`md-heading-v1`) |
|
||||
| `kb-store-sqlite` | metadata, document, chunk, job table. FTS table 은 P2 에서 활성화 |
|
||||
| `kebab-source-fs` | local folder scan, checksum, 변경 감지. `SourceConnector` 구현 |
|
||||
| `kebab-parse-md` | Markdown bytes → structured document. `Extractor` 구현 |
|
||||
| `kebab-normalize` | parser output → `CanonicalDocument` |
|
||||
| `kebab-chunk` | block-aware chunking. `Chunker` 구현 (`md-heading-v1`) |
|
||||
| `kebab-store-sqlite` | metadata, document, chunk, job table. FTS table 은 P2 에서 활성화 |
|
||||
|
||||
## kb-source-fs
|
||||
## kebab-source-fs
|
||||
|
||||
- Input: `SourceScope { root: PathBuf, include: Vec<Glob>, exclude: Vec<Glob> }`
- Behavior: recursive walk → `blake3` of each file → list of `RawAsset`.
- Change detection: compare old and new state by `(source_uri, checksum)`. An identical checksum is skipped.
- Watch mode is out of scope for P1 (only the config is defined; implementation comes later).
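
The `(source_uri, checksum)` comparison could look roughly like the sketch below. It is an assumption-laden illustration: `Change`, `classify`, and `pending` are hypothetical names, checksums are passed in as plain strings (the real scanner would compute them with `blake3`), and the previous state is modeled as an in-memory map instead of a SQLite lookup.

```rust
use std::collections::HashMap;

/// Outcome of comparing a scanned file against the previously stored state.
#[derive(Debug, PartialEq, Eq)]
enum Change {
    New,
    Modified,
    Unchanged,
}

/// Classify one scanned asset by `(source_uri, checksum)`.
/// `known` maps source_uri -> checksum recorded by the previous ingest.
fn classify(known: &HashMap<String, String>, source_uri: &str, checksum: &str) -> Change {
    match known.get(source_uri) {
        None => Change::New,
        Some(old) if old.as_str() == checksum => Change::Unchanged,
        Some(_) => Change::Modified,
    }
}

/// Keep only the assets that need (re)processing; unchanged ones are skipped.
fn pending<'a>(
    known: &HashMap<String, String>,
    scanned: &'a [(String, String)],
) -> Vec<&'a (String, String)> {
    scanned
        .iter()
        .filter(|(uri, sum)| classify(known, uri, sum) != Change::Unchanged)
        .collect()
}
```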
|
||||
|
||||
## kb-parse-md
|
||||
## kebab-parse-md
|
||||
|
||||
- parser 후보: `pulldown-cmark` 1차. GFM table/task list 필요해지면 `comrak` 검토 (§8).
|
||||
- 보존 대상: YAML/TOML frontmatter, heading tree, paragraph, list, code block + lang tag, table, blockquote, link, image ref, **line range**.
|
||||
- 출력: 중간 표현 (parser 고유). `kb-normalize` 가 canonical 로 변환.
|
||||
- 출력: 중간 표현 (parser 고유). `kebab-normalize` 가 canonical 로 변환.
|
||||
- malformed markdown: panic 금지. 가능한 부분만 보존하고 `Provenance` 에 warning 기록.
|
||||
|
||||
## kb-normalize
|
||||
## kebab-normalize
|
||||
|
||||
- Responsibility: parser intermediate representation → `CanonicalDocument`.
- frontmatter → `Metadata` (id, title, aliases, tags, created_at, updated_at, source_type, trust_level, lang).
- Flatten the block tree and assign `BlockId`s (deterministic, based on heading path + ordinal).
- `SourceSpan` allows either `LineRange { start, end }` or `ByteRange`. For Markdown, line ranges come first.
|
||||
|
||||
## kb-chunk (`md-heading-v1`)
|
||||
## kebab-chunk (`md-heading-v1`)
|
||||
|
||||
우선순위 (§14):
|
||||
1. heading boundary 우선
|
||||
@@ -58,7 +58,7 @@ policy 기본값: `target_tokens = 500`, `overlap_tokens = 80`, `respect_markdow
|
||||
|
||||
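Token estimation: no tokenizer has been introduced at this stage, so a byte- or character-based approximation is acceptable. It is replaced by a real tokenizer when embeddings arrive in P3.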
|
||||
|
||||
## kb-store-sqlite
|
||||
## kebab-store-sqlite
|
||||
|
||||
스키마 (1차):
|
||||
|
||||
@@ -117,7 +117,7 @@ CREATE TABLE jobs (
|
||||
- transaction: one ingest = one transaction. Roll back on partial failure.
- idempotent: re-ingesting the same `doc_id` is an UPSERT with a version bump.
|
||||
|
||||
## kb-app facade 확장
|
||||
## kebab-app facade 확장
|
||||
|
||||
```rust
|
||||
pub fn ingest(scope: SourceScope) -> anyhow::Result<IngestReport>;
|
||||
@@ -131,10 +131,10 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
|
||||
## CLI
|
||||
|
||||
```text
|
||||
kb ingest <path> [--include <glob>] [--exclude <glob>]
|
||||
kb list docs [--tag <t>]
|
||||
kb inspect doc <doc_id>
|
||||
kb inspect chunk <chunk_id>
|
||||
kebab ingest <path> [--include <glob>] [--exclude <glob>]
|
||||
kebab list docs [--tag <t>]
|
||||
kebab inspect doc <doc_id>
|
||||
kebab inspect chunk <chunk_id>
|
||||
```
|
||||
|
||||
## 테스트
|
||||
@@ -146,14 +146,14 @@ kb inspect chunk <chunk_id>
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
`kb-parse-md` 금지: `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, embedding 호출. parser 는 순수 함수.
|
||||
`kebab-parse-md` 금지: `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, embedding 호출. parser 는 순수 함수.
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `kb ingest <path>` 실행 후 SQLite 에 documents/blocks/chunks 채워짐
|
||||
- [ ] `kb list docs` 정상 출력
|
||||
- [ ] `kb inspect doc <id>` JSON 출력
|
||||
- [ ] `kb inspect chunk <id>` JSON 출력 (heading path + source span 포함)
|
||||
- [ ] `kebab ingest <path>` 실행 후 SQLite 에 documents/blocks/chunks 채워짐
|
||||
- [ ] `kebab list docs` 정상 출력
|
||||
- [ ] `kebab inspect doc <id>` JSON 출력
|
||||
- [ ] `kebab inspect chunk <id>` JSON 출력 (heading path + source span 포함)
|
||||
- [ ] 같은 폴더 재수집 시 중복 row 없음
|
||||
- [ ] parser/chunker version 변경 시 재처리 대상 식별 가능
|
||||
- [ ] fixture snapshot test 통과
|
||||
|
||||
@@ -3,19 +3,19 @@ phase: P2
|
||||
title: "SQLite FTS5 lexical 검색 + citation"
|
||||
status: completed
|
||||
depends_on: [P1]
|
||||
source: kb_local_rust_report.md §10, §15, §17 Phase 2
|
||||
source: kebab_local_rust_report.md §10, §15, §17 Phase 2
|
||||
---
|
||||
|
||||
# P2 — SQLite FTS5 lexical 검색 + citation
|
||||
|
||||
## 목표
|
||||
|
||||
Search and citation output that work on FTS5 alone, without embeddings or an LLM. `kb search "..."` returns chunks and source spans.
|
||||
Search and citation output that work on FTS5 alone, without embeddings or an LLM. `kebab search "..."` returns chunks and source spans.
|
||||
|
||||
## 산출 crate
|
||||
|
||||
- `kb-search` (lexical 모드) — `Retriever` trait 구현 1번째.
|
||||
- `kb-store-sqlite` 확장: FTS5 virtual table + trigger.
|
||||
- `kebab-search` (lexical 모드) — `Retriever` trait 구현 1번째.
|
||||
- `kebab-store-sqlite` 확장: FTS5 virtual table + trigger.
|
||||
|
||||
## FTS5 스키마
|
||||
|
||||
@@ -69,15 +69,15 @@ pub struct SearchHit {
|
||||
}
|
||||
```
|
||||
|
||||
`Citation` format: `notes/rust/kb.md:L12-L34`.
|
||||
`Citation` format: `notes/rust/kebab.md:L12-L34`.
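
A rough sketch of how such citation strings might be rendered is shown here. The enum below is a reduced, hypothetical stand-in for the real `Citation` type: the `Line` variant and its field names are assumptions, while the `Time` fields follow the `Citation::Time { path, start_ms, end_ms }` shape used elsewhere in these specs. The `hh:mm:ss` rendering mirrors the audio citation example and drops the optional speaker suffix.

```rust
/// Hypothetical citation shapes, reduced to the two this sketch formats.
enum Citation {
    Line { path: String, start: u32, end: u32 },
    Time { path: String, start_ms: u64, end_ms: u64 },
}

/// Render milliseconds as `hh:mm:ss` (sub-second precision is dropped).
fn hms(ms: u64) -> String {
    let s = ms / 1000;
    format!("{:02}:{:02}:{:02}", s / 3600, (s % 3600) / 60, s % 60)
}

/// Render a citation in the `path:locator` style used by the specs.
fn render(c: &Citation) -> String {
    match c {
        Citation::Line { path, start, end } => format!("{path}:L{start}-L{end}"),
        Citation::Time { path, start_ms, end_ms } => {
            format!("{path}:{}-{}", hms(*start_ms), hms(*end_ms))
        }
    }
}
```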
|
||||
|
||||
## 인덱스 라이프사이클
|
||||
|
||||
- Synchronized automatically by triggers at ingest time.
|
||||
- The `kb index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
|
||||
- The `kebab index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
|
||||
- `index_version` is the combination `(schema_version, fts_config_hash)`.
|
||||
|
||||
## kb-app facade 확장
|
||||
## kebab-app facade 확장
|
||||
|
||||
```rust
|
||||
pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
|
||||
@@ -86,16 +86,16 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
|
||||
## CLI
|
||||
|
||||
```text
|
||||
kb search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
|
||||
kb index --rebuild-fts
|
||||
kebab search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
|
||||
kebab index --rebuild-fts
|
||||
```
|
||||
|
||||
출력 예:
|
||||
|
||||
```text
|
||||
1. [0.82] Rust workspace는 여러 package를 하나로 관리한다…
|
||||
doc: notes/rust/kb.md
|
||||
citation: notes/rust/kb.md:L12-L34
|
||||
doc: notes/rust/kebab.md
|
||||
citation: notes/rust/kebab.md:L12-L34
|
||||
heading: 아키텍처 > Rust workspace
|
||||
```
|
||||
|
||||
@@ -108,13 +108,13 @@ kb index --rebuild-fts
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-search` 는 `kb-store-sqlite` 와 `kb-core` 만 의존.
|
||||
- `kebab-search` 는 `kebab-store-sqlite` 와 `kebab-core` 만 의존.
|
||||
- LLM/embedding 호출 금지 (P2 단계).
|
||||
- CLI 는 `kb-app` 통해서만 호출.
|
||||
- CLI 는 `kebab-app` 통해서만 호출.
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `kb search "..."` top-k chunk 반환
|
||||
- [ ] `kebab search "..."` top-k chunk 반환
|
||||
- [ ] 모든 결과에 citation 포함
|
||||
- [ ] citation line range 가 원본과 일치
|
||||
- [ ] 한영 혼합 query 동작 (한국어 토큰화 한계는 노트로)
|
||||
|
||||
@@ -3,23 +3,23 @@ phase: P3
|
||||
title: "Local embedding + LanceDB + hybrid search"
|
||||
status: completed
|
||||
depends_on: [P2]
|
||||
source: kb_local_rust_report.md §10, §11, §15, §17 Phase 3
|
||||
source: kebab_local_rust_report.md §10, §11, §15, §17 Phase 3
|
||||
---
|
||||
|
||||
# P3 — Local embedding + LanceDB + hybrid search
|
||||
|
||||
## 목표
|
||||
|
||||
Vectorize chunks with a local embedding model → store in LanceDB → vector search fused with lexical search (hybrid). `kb search --mode {lexical,vector,hybrid}` works.
|
||||
Vectorize chunks with a local embedding model → store in LanceDB → vector search fused with lexical search (hybrid). `kebab search --mode {lexical,vector,hybrid}` works.
|
||||
|
||||
## 산출 crate
|
||||
|
||||
| crate | 역할 |
|
||||
|-------|------|
|
||||
| `kb-embed` | `Embedder` trait + `EmbeddingInput`/output 타입 |
|
||||
| `kb-embed-local` | `fastembed-rs` adapter (1차). later: Ollama embed endpoint, candle |
|
||||
| `kb-store-vector` | LanceDB 연동. table 관리, upsert, vector search |
|
||||
| `kb-search` | lexical + vector 병행 + score fusion |
|
||||
| `kebab-embed` | `Embedder` trait + `EmbeddingInput`/output 타입 |
|
||||
| `kebab-embed-local` | `fastembed-rs` adapter (1차). later: Ollama embed endpoint, candle |
|
||||
| `kebab-store-vector` | LanceDB 연동. table 관리, upsert, vector search |
|
||||
| `kebab-search` | lexical + vector 병행 + score fusion |
|
||||
|
||||
## Embedder
|
||||
|
||||
@@ -64,7 +64,7 @@ created_at : timestamp
|
||||
## Indexing job
|
||||
|
||||
```text
|
||||
kb index --embeddings [--model <id>] [--batch-size N] [--resume]
|
||||
kebab index --embeddings [--model <id>] [--batch-size N] [--resume]
|
||||
```
|
||||
|
||||
- chunk 중 `embedding_id = chunk_id + model_id + dim` 가 vector store 에 없는 것만 처리.
|
||||
@@ -91,7 +91,7 @@ k_rrf 기본 60.
|
||||
|
||||
No reranker is introduced within P3 (noted for the P+ stage).
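
Since the fusion step refers to a `k_rrf` default of 60, it presumably follows Reciprocal Rank Fusion. The sketch below shows the textbook form of RRF under that assumption; the function name `rrf_fuse` and the use of plain string ids are hypothetical, and the spec's own formula (not visible in this hunk) may differ in detail.

```rust
use std::collections::HashMap;

/// Fuse ranked id lists with Reciprocal Rank Fusion:
/// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks enter the score, the lexical and vector scores never need to be normalized against each other, which is the usual reason for choosing RRF over weighted score sums.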
|
||||
|
||||
## kb-search 구조
|
||||
## kebab-search 구조
|
||||
|
||||
```rust
|
||||
pub struct HybridRetriever {
|
||||
@@ -102,11 +102,11 @@ pub struct HybridRetriever {
|
||||
```
|
||||
|
||||
- 각 sub retriever 는 `Retriever` trait 구현.
|
||||
- `kb-app::search` 가 mode 따라 dispatch.
|
||||
- `kebab-app::search` 가 mode 따라 dispatch.
|
||||
|
||||
## kb-app facade 확장
|
||||
## kebab-app facade 확장
|
||||
|
||||
During P3, the stub bodies of `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` in the kb-app facade are replaced with real library calls. The signatures have been frozen since P0, so only the bodies are swapped, with no signature changes.
|
||||
During P3, the stub bodies of `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` in the kebab-app facade are replaced with real library calls. The signatures have been frozen since P0, so only the bodies are swapped, with no signature changes.
|
||||
|
||||
```rust
|
||||
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>; // p3-5
|
||||
@@ -117,14 +117,14 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
|
||||
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>; // p4-3 (stub remains)
|
||||
```
|
||||
|
||||
p3-5 wires up every facade function that does not involve the LLM (all except `ask`) in one go. After that, `cargo run -p kb-cli -- index` and `cargo run -p kb-cli -- search --mode {lexical,vector,hybrid}` actually work.
|
||||
p3-5 wires up every facade function that does not involve the LLM (all except `ask`) in one go. After that, `cargo run -p kebab-cli -- index` and `cargo run -p kebab-cli -- search --mode {lexical,vector,hybrid}` actually work.
|
||||
|
||||
## CLI
|
||||
|
||||
```text
|
||||
kb index --embeddings
|
||||
kb search --mode vector "비슷한 설계 원칙"
|
||||
kb search --mode hybrid "Markdown chunking 규칙"
|
||||
kebab index --embeddings
|
||||
kebab search --mode vector "비슷한 설계 원칙"
|
||||
kebab search --mode hybrid "Markdown chunking 규칙"
|
||||
```
|
||||
|
||||
## 테스트
|
||||
@@ -137,19 +137,19 @@ kb search --mode hybrid "Markdown chunking 규칙"
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-embed-local` 만 ONNX/모델 binding 의존. 다른 crate 는 trait 만 사용.
|
||||
- `kb-store-vector` 는 `lancedb` 의존. SQLite 와 cross-write 금지 (각 store 책임 분리).
|
||||
- `kebab-embed-local` 만 ONNX/모델 binding 의존. 다른 crate 는 trait 만 사용.
|
||||
- `kebab-store-vector` 는 `lancedb` 의존. SQLite 와 cross-write 금지 (각 store 책임 분리).
|
||||
- LLM crate 와 분리 (§11.1).
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `kb index` (= `kb-app::ingest`) 로 모든 chunk 가 SQLite + LanceDB 에 저장 (p3-5)
|
||||
- [ ] `kb search --mode vector` 정상 hit
|
||||
- [ ] `kb search --mode hybrid` 정상 hit, citation 포함
|
||||
- [ ] `kebab index` (= `kebab-app::ingest`) 로 모든 chunk 가 SQLite + LanceDB 에 저장 (p3-5)
|
||||
- [ ] `kebab search --mode vector` 정상 hit
|
||||
- [ ] `kebab search --mode hybrid` 정상 hit, citation 포함
|
||||
- [ ] 모델/차원 변경 시 별도 table 로 분리 저장
|
||||
- [ ] resume 시 미완료 chunk 만 처리 (P+ 로 deferred)
|
||||
- [ ] hit@k 측정 가능한 형태로 결과 구조화 (P5 준비)
|
||||
- [ ] `kb-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` 가 실 동작 (`ask` 는 P4-3 까지 stub) — p3-5
|
||||
- [ ] `kebab-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` 가 실 동작 (`ask` 는 P4-3 까지 stub) — p3-5
|
||||
|
||||
## 리스크 / 주의
|
||||
|
||||
|
||||
@@ -3,22 +3,22 @@ phase: P4
|
||||
title: "Local LLM + RAG + grounded answer"
|
||||
status: completed
|
||||
depends_on: [P3]
|
||||
source: kb_local_rust_report.md §11, §15.2, §17 Phase 4
|
||||
source: kebab_local_rust_report.md §11, §15.2, §17 Phase 4
|
||||
---
|
||||
|
||||
# P4 — Local LLM + RAG + grounded answer
|
||||
|
||||
## 목표
|
||||
|
||||
Generate answers with citations using a local LLM. Refuse when the evidence is insufficient. `kb ask "..."` works.
|
||||
Generate answers with citations using a local LLM. Refuse when the evidence is insufficient. `kebab ask "..."` works.
|
||||
|
||||
## 산출 crate
|
||||
|
||||
| crate | 역할 |
|
||||
|-------|------|
|
||||
| `kb-llm` | `LanguageModel` trait + request/response 타입 |
|
||||
| `kb-llm-local` | Ollama adapter 1차. later: llama.cpp, candle |
|
||||
| `kb-rag` | retrieval → context packing → prompt → generate → citation 검증 |
|
||||
| `kebab-llm` | `LanguageModel` trait + request/response 타입 |
|
||||
| `kebab-llm-local` | Ollama adapter 1차. later: llama.cpp, candle |
|
||||
| `kebab-rag` | retrieval → context packing → prompt → generate → citation 검증 |
|
||||
|
||||
## LanguageModel
|
||||
|
||||
@@ -50,9 +50,9 @@ pub struct GenerateResponse {
|
||||
- HTTP call to localhost (`http://127.0.0.1:11434/api/generate`).
- An async runtime may be used internally. The external API stays a synchronous wrapper.
- The default model comes from config (e.g. `qwen2.5:14b-instruct`). The actual choice is made after the P5 eval.
|
||||
- When the server is not running: a clear error message plus `kb doctor` diagnostics.
|
||||
- When the server is not running: a clear error message plus `kebab doctor` diagnostics.
|
||||
|
||||
## kb-rag 파이프라인
|
||||
## kebab-rag 파이프라인
|
||||
|
||||
```text
|
||||
query
|
||||
@@ -118,7 +118,7 @@ pub struct Answer {
|
||||
|
||||
Stored in the `answers` table (for reproducibility and auditing), together with the list of chunk_ids used and the retrieval params.
|
||||
|
||||
## kb-app facade 확장
|
||||
## kebab-app facade 확장
|
||||
|
||||
```rust
|
||||
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
|
||||
@@ -127,9 +127,9 @@ pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
|
||||
## CLI
|
||||
|
||||
```text
|
||||
kb ask "내 KB 설계에서 저장소 전략은?"
|
||||
kb ask --k 8 --temperature 0 "..."
|
||||
kb ask --explain "..." # retrieval trace + packed prompt 출력
|
||||
kebab ask "내 KB 설계에서 저장소 전략은?"
|
||||
kebab ask --k 8 --temperature 0 "..."
|
||||
kebab ask --explain "..." # retrieval trace + packed prompt 출력
|
||||
```
|
||||
|
||||
## 테스트
|
||||
@@ -142,13 +142,13 @@ kb ask --explain "..." # retrieval trace + packed prompt 출력
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-llm-local` 만 Ollama HTTP 의존.
|
||||
- `kb-rag` 는 `kb-search` (Retriever trait) + `kb-llm` (LanguageModel trait) 만 사용. SQLite/LanceDB 직접 호출 금지.
|
||||
- CLI 는 `kb-app::ask` 만 호출.
|
||||
- `kebab-llm-local` 만 Ollama HTTP 의존.
|
||||
- `kebab-rag` 는 `kebab-search` (Retriever trait) + `kebab-llm` (LanguageModel trait) 만 사용. SQLite/LanceDB 직접 호출 금지.
|
||||
- CLI 는 `kebab-app::ask` 만 호출.
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `kb ask "..."` 동작
|
||||
- [ ] `kebab ask "..."` 동작
|
||||
- [ ] 답변에 citation 포함
|
||||
- [ ] 근거 없는 질문 거절
|
||||
- [ ] `--explain` 으로 retrieval trace 확인
|
||||
@@ -158,6 +158,6 @@ kb ask --explain "..." # retrieval trace + packed prompt 출력
|
||||
## 리스크 / 주의
|
||||
|
||||
- The model choice is finalized after evaluation on the P5 golden set. P4 ships only a default.
|
||||
- Ollama not running / model not downloaded → `kb doctor` gives clear guidance.
|
||||
- Ollama not running / model not downloaded → `kebab doctor` gives clear guidance.
|
||||
- LLM answers often contain hallucinated citations. Post-processing verification is key.
- Any prompt template change must bump `prompt_template_version`.
|
||||
|
||||
@@ -3,7 +3,7 @@ phase: P5
|
||||
title: "Golden query / regression eval"
|
||||
status: completed
|
||||
depends_on: [P4]
|
||||
source: kb_local_rust_report.md §17 Phase 5, §18
|
||||
source: kebab_local_rust_report.md §17 Phase 5, §18
|
||||
---
|
||||
|
||||
# P5 — Golden query / regression eval
|
||||
@@ -14,7 +14,7 @@ source: kb_local_rust_report.md §17 Phase 5, §18
|
||||
|
||||
## 산출 crate
|
||||
|
||||
- `kb-eval` — golden query 실행기, 지표 계산, report 생성.
|
||||
- `kebab-eval` — golden query 실행기, 지표 계산, report 생성.
|
||||
|
||||
## Golden set fixture
|
||||
|
||||
@@ -25,9 +25,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
|
||||
query: "Markdown chunking 규칙"
|
||||
lang: ko
|
||||
expected_doc_ids:
|
||||
- doc:notes/rust/kb-architecture.md
|
||||
- doc:notes/rust/kebab-architecture.md
|
||||
expected_chunk_ids:
|
||||
- chunk:notes/rust/kb-architecture.md#chunking-policy
|
||||
- chunk:notes/rust/kebab-architecture.md#chunking-policy
|
||||
must_contain:
|
||||
- "heading"
|
||||
- "code block"
|
||||
@@ -57,9 +57,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
|
||||
## 실행 모드
|
||||
|
||||
```text
|
||||
kb eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
|
||||
kb eval compare <run_id_a> <run_id_b>
|
||||
kb eval report <run_id> --format {json,md,html}
|
||||
kebab eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
|
||||
kebab eval compare <run_id_a> <run_id_b>
|
||||
kebab eval report <run_id> --format {json,md,html}
|
||||
```
|
||||
|
||||
run record:
|
||||
@@ -90,7 +90,7 @@ DB 저장 (`eval_runs`, `eval_query_results` table) 또는 JSON 파일. 재현
|
||||
- Automatic hyperparameter search: not done.
- LLM judge ("LLM as a judge"): out of scope for P5. Groundedness is rule-based (`must_contain`) only.
|
||||
|
||||
## kb-app facade 확장
|
||||
## kebab-app facade 확장
|
||||
|
||||
```rust
|
||||
pub fn eval_run(opts: EvalRunOpts) -> anyhow::Result<EvalRun>;
|
||||
@@ -105,13 +105,13 @@ pub fn eval_compare(a: &str, b: &str) -> anyhow::Result<CompareReport>;
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-eval` calls only `kb-app` (search/ask go through the facade). Calling the internal store or LLM directly is forbidden.
|
||||
- `kebab-eval` calls only `kebab-app` (search/ask go through the facade). Calling the internal store or LLM directly is forbidden.
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `fixtures/golden_queries.yaml` has 30+ entries
|
||||
- [ ] `kb eval run` produces hit@k, MRR, and citation_coverage
|
||||
- [ ] `kb eval compare` can compare two runs
|
||||
- [ ] `kebab eval run` produces hit@k, MRR, and citation_coverage
|
||||
- [ ] `kebab eval compare` can compare two runs
|
||||
- [ ] The config snapshot is stored with the run (chunker, embedding, llm, and prompt versions)
- [ ] CI can detect regressions (e.g. fail when hit@5 drops 3% or more against the baseline)
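
The retrieval metrics and the CI gate named in this checklist could be computed along these lines. This is only a sketch with hypothetical function names, operating on plain id strings. It also assumes the "3%" threshold is a relative drop against the baseline; the spec does not say whether a relative drop or an absolute drop in percentage points is meant.

```rust
/// 1.0 if any expected id appears in the top `k` results, else 0.0.
fn hit_at_k(results: &[&str], expected: &[&str], k: usize) -> f64 {
    let hit = results.iter().take(k).any(|r| expected.contains(r));
    if hit { 1.0 } else { 0.0 }
}

/// Reciprocal rank of the first expected id (0.0 when none is found).
fn reciprocal_rank(results: &[&str], expected: &[&str]) -> f64 {
    results
        .iter()
        .position(|r| expected.contains(r))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean of per-query values (gives hit@k or MRR over a suite); 0.0 if empty.
fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() {
        0.0
    } else {
        xs.iter().sum::<f64>() / xs.len() as f64
    }
}

/// True when `current` fell by at least `max_drop` relative to `baseline`
/// (assumption: relative drop, e.g. `max_drop = 0.03` for 3%).
fn regressed(baseline: f64, current: f64, max_drop: f64) -> bool {
    baseline > 0.0 && (baseline - current) / baseline >= max_drop
}
```

A CI job would then average `hit_at_k` over the golden queries for the baseline run and the current run, and fail the build when `regressed` returns true.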
|
||||
|
||||
|
||||
@@ -3,7 +3,7 @@ phase: P6
|
||||
title: "이미지 ingestion (OCR + caption)"
|
||||
status: planned
|
||||
depends_on: [P5]
|
||||
source: kb_local_rust_report.md §9.1, §17 Phase 6
|
||||
source: kebab_local_rust_report.md §9.1, §17 Phase 6
|
||||
---
|
||||
|
||||
# P6 — 이미지 ingestion
|
||||
@@ -14,8 +14,8 @@ source: kb_local_rust_report.md §9.1, §17 Phase 6
|
||||
|
||||
## 산출 crate
|
||||
|
||||
- `kb-parse-image` — `Extractor` 구현. 이미지 → CanonicalDocument.
|
||||
- (선택) `kb-ocr` / `kb-vlm` 어댑터 (외부 모델 분리 시).
|
||||
- `kebab-parse-image` — `Extractor` 구현. 이미지 → CanonicalDocument.
|
||||
- (선택) `kebab-ocr` / `kebab-vlm` 어댑터 (외부 모델 분리 시).
|
||||
|
||||
## 추출 정보 3종 (§9.1)
|
||||
|
||||
@@ -83,10 +83,10 @@ photos/diagram-2026.png#caption # caption chunk
|
||||
## CLI
|
||||
|
||||
```text
|
||||
kb ingest ./assets/diagram.png
|
||||
kb ingest ./assets/ # 폴더 안 이미지 자동 인식
|
||||
kb search "이미지 안의 OCR 텍스트"
|
||||
kb inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
|
||||
kebab ingest ./assets/diagram.png
|
||||
kebab ingest ./assets/ # 폴더 안 이미지 자동 인식
|
||||
kebab search "이미지 안의 OCR 텍스트"
|
||||
kebab inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
|
||||
```
|
||||
|
||||
## 테스트
|
||||
@@ -99,13 +99,13 @@ kb inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-parse-image` 는 `kb-core` + 이미지 디코딩 (`image` crate) + OCR adapter 만.
|
||||
- `kebab-parse-image` 는 `kebab-core` + 이미지 디코딩 (`image` crate) + OCR adapter 만.
|
||||
- LLM/embedding 호출 금지 (caption 은 별도 adapter 통해).
|
||||
- VLM caption 은 background job. ingest blocking 금지.
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `kb ingest <image>` 동작
|
||||
- [ ] `kebab ingest <image>` 동작
|
||||
- [ ] OCR text 검색 가능
|
||||
- [ ] OCR region citation 출력
|
||||
- [ ] caption 과 observed text provenance 분리
|
||||
|
||||
@@ -3,7 +3,7 @@ phase: P7
|
||||
title: "PDF text extraction + page citation"
|
||||
status: planned
|
||||
depends_on: [P5]
|
||||
source: kb_local_rust_report.md §9.2, §17 Phase 7
|
||||
source: kebab_local_rust_report.md §9.2, §17 Phase 7
|
||||
---
|
||||
|
||||
# P7 — PDF ingestion
|
||||
@@ -14,7 +14,7 @@ text PDF 추출 → page-aware chunking → citation `paper.pdf:p13`. scanned PD
|
||||
|
||||
## 산출 crate
|
||||
|
||||
- `kb-parse-pdf` — `Extractor` 구현.
|
||||
- `kebab-parse-pdf` — `Extractor` 구현.
|
||||
|
||||
## 단계 분리 (§9.2)
|
||||
|
||||
@@ -63,10 +63,10 @@ paper.pdf:p13:span=0-1240 # char range within page
|
||||
## CLI
|
||||
|
||||
```text
|
||||
kb ingest ./paper.pdf
|
||||
kb ingest ./papers/
|
||||
kb search "PDF 안의 특정 개념"
|
||||
kb inspect doc <pdf_doc_id>
|
||||
kebab ingest ./paper.pdf
|
||||
kebab ingest ./papers/
|
||||
kebab search "PDF 안의 특정 개념"
|
||||
kebab inspect doc <pdf_doc_id>
|
||||
```
|
||||
|
||||
## 테스트
|
||||
@@ -79,12 +79,12 @@ kb inspect doc <pdf_doc_id>
|
||||
|
||||
## 의존성 경계
|
||||
|
||||
- `kb-parse-pdf` 는 `kb-core` + `pdf-extract` / `lopdf` 만.
|
||||
- `kebab-parse-pdf` 는 `kebab-core` + `pdf-extract` / `lopdf` 만.
|
||||
- OCR 호출은 별도 adapter 통해 (P6 OCR 인프라 재사용).
|
||||
|
||||
## 완료 조건
|
||||
|
||||
- [ ] `kb ingest <pdf>` 동작
|
||||
- [ ] `kebab ingest <pdf>` 동작
|
||||
- [ ] page-level chunk + citation
|
||||
- [ ] 검색 결과에 `paper.pdf:p<n>` 포함
|
||||
- [ ] 추출 실패 페이지에 대한 provenance warning
|
||||
|
||||
@@ -3,7 +3,7 @@ phase: P8
|
||||
title: "음성 transcription + timestamp citation"
|
||||
status: planned
|
||||
depends_on: [P5]
|
||||
source: kb_local_rust_report.md §9.3, §17 Phase 8
|
||||
source: kebab_local_rust_report.md §9.3, §17 Phase 8
|
||||
---
|
||||
|
||||
# P8 — 음성 ingestion
|
||||
@@ -14,8 +14,8 @@ audio file → transcript (timestamped segment) → CanonicalDocument → 동

## Output crates

- `kb-parse-audio`: `Extractor` implementation.
- `kb-asr-whisper` (or a module inside `kb-parse-audio`): whisper.cpp adapter.
- `kebab-parse-audio`: `Extractor` implementation.
- `kebab-asr-whisper` (or a module inside `kebab-parse-audio`): whisper.cpp adapter.

## Pipeline (§9.3)

@@ -94,11 +94,11 @@ meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1
## CLI

```text
kb ingest ./meeting.m4a
kb ingest ./recordings/
kb search "decisions mentioned in the meeting"
kb inspect doc <audio_doc_id> # shows transcript + segment timestamps
kb play <chunk_id> # (optional) play that segment; lower priority
kebab ingest ./meeting.m4a
kebab ingest ./recordings/
kebab search "decisions mentioned in the meeting"
kebab inspect doc <audio_doc_id> # shows transcript + segment timestamps
kebab play <chunk_id> # (optional) play that segment; lower priority
```

## Tests
@@ -111,12 +111,12 @@ kb play <chunk_id> # (optional) play that segment; lower priority

## Dependency boundaries

- `kb-parse-audio` depends only on `kb-core` + the `Transcriber` adapter.
- `kebab-parse-audio` depends only on `kebab-core` + the `Transcriber` adapter.
- No LLM calls. The RAG stage runs the same pipeline on the transcript text.

## Completion criteria

- [ ] `kb ingest <audio>` works
- [ ] `kebab ingest <audio>` works
- [ ] transcript is stored together with segment timestamps
- [ ] search results include a `00:hh:mm:ss-` citation
- [ ] re-ingesting the same audio is idempotent
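
The timestamp citation shown in this spec (`meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1`) can be built from a transcript segment along these lines. This is a sketch under assumptions: `Segment`, its fields, and `audio_citation` are hypothetical names, and whole-second resolution is assumed because the example citation shows no fractional seconds.

```rust
/// Formats whole seconds as `HH:MM:SS`.
fn hms(total_secs: u64) -> String {
    format!(
        "{:02}:{:02}:{:02}",
        total_secs / 3600,
        (total_secs % 3600) / 60,
        total_secs % 60
    )
}

/// Hypothetical timestamped transcript segment.
pub struct Segment {
    pub start_secs: u64,
    pub end_secs: u64,
    /// Speaker label, when one is available.
    pub speaker: Option<String>,
}

/// Builds `<file>:<start>-<end>[:speaker=<label>]` for one segment.
pub fn audio_citation(file: &str, seg: &Segment) -> String {
    let mut out = format!("{}:{}-{}", file, hms(seg.start_secs), hms(seg.end_secs));
    if let Some(speaker) = &seg.speaker {
        out.push_str(":speaker=");
        out.push_str(speaker);
    }
    out
}
```

Because the citation is derived only from the file name and segment bounds, re-ingesting the same audio with the same segmentation would yield the same citation strings, which fits the idempotency criterion above.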
@@ -3,7 +3,7 @@ phase: P9
title: "TUI + desktop app"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §16, §17 Phase 9
source: kebab_local_rust_report.md §16, §17 Phase 9
---

# P9: TUI + desktop app
@@ -15,18 +15,18 @@ Add a usability layer on top of the CLI. Once domain/search/RAG have stabilized, 마
## Order

```text
kb-tui (first) → kb-desktop (later)
kebab-tui (first) → kebab-desktop (later)
```

Rationale: the TUI can adapt quickly to domain changes. The desktop app carries high packaging/distribution costs.

---

## P9.A: kb-tui (Ratatui)
## P9.A: kebab-tui (Ratatui)

### Output crates

- `kb-tui`: Ratatui + crossterm based terminal UI.
- `kebab-tui`: Ratatui + crossterm based terminal UI.

### Screen layout

@@ -51,8 +51,8 @@ q : quit

### Dependency boundaries

- `kb-tui` → `kb-app` only. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through a `kb-app` async wrapper or thread + channel.
- `kebab-tui` → `kebab-app` only. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through a `kebab-app` async wrapper or thread + channel.
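
The "thread + channel" option mentioned above can be sketched with the standard library alone. In this illustration `blocking_search` is a hypothetical stand-in for whatever blocking call `kebab-app` exposes; its name and signature are assumptions, not the real facade.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Stand-in for a blocking `kebab-app` search call (hypothetical signature).
fn blocking_search(query: &str) -> Vec<String> {
    vec![format!("hit for {}", query)]
}

/// Spawns a worker thread. The UI sends queries on the returned `Sender`
/// and reads results from the returned `Receiver`, so the draw loop never
/// blocks on the search itself.
pub fn spawn_search_worker() -> (Sender<String>, Receiver<Vec<String>>) {
    let (query_tx, query_rx) = mpsc::channel::<String>();
    let (result_tx, result_rx) = mpsc::channel::<Vec<String>>();
    thread::spawn(move || {
        // Iteration ends once the UI drops its `Sender`.
        for query in query_rx {
            if result_tx.send(blocking_search(&query)).is_err() {
                break; // the UI side hung up
            }
        }
    });
    (query_tx, result_rx)
}
```

A TUI event loop would typically call `try_recv()` on the result receiver once per frame and redraw only when a result has arrived.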

### Completion criteria

@@ -63,7 +63,7 @@ q : quit

---

## P9.B: kb-desktop
## P9.B: kebab-desktop

### Candidate comparison

@@ -72,14 +72,14 @@ q : quit
| Tauri | Rust backend + web frontend. Native webview, small binary | Web frontend needs a separate stack (TS/JS) |
| egui/eframe | Pure Rust, immediate-mode | Limited design freedom/accessibility |

Recommendation: Tauri first. Expose the existing `kb-app` facade unchanged as the backend. Keep the frontend lightweight (svelte/solid/vanilla).
Recommendation: Tauri first. Expose the existing `kebab-app` facade unchanged as the backend. Keep the frontend lightweight (svelte/solid/vanilla).

### Output crates / structure

- `kb-desktop` (Tauri app crate)
- `kb-desktop-frontend/` (web assets)
- `kebab-desktop` (Tauri app crate)
- `kebab-desktop-frontend/` (web assets)

Tauri commands wrap `kb-app` functions 1:1. Do not add new business logic.
Tauri commands wrap `kebab-app` functions 1:1. Do not add new business logic.

### Screen layout (first pass)

@@ -100,7 +100,7 @@ Tauri commands wrap `kb-app` functions 1:1. Adding new business logic

### Dependency boundaries

- The frontend calls only Tauri commands (= `kb-app` wrappers). No direct SQLite/LanceDB access.
- The frontend calls only Tauri commands (= `kebab-app` wrappers). No direct SQLite/LanceDB access.
- Model download/execution is the backend's responsibility.

### Completion criteria
