docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (so that P6+ starts with
  the same prefix).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths, env vars, and the binary path (`target/release/kb` →
  `target/release/kebab`) updated in sync.
- All references in the design doc, initial report, SMOKE, HOTFIXES,
  phase epics, and task specs unified.
- The `git -c user.name=kb` in task-decomposition.md is author info
  recorded from past git history, so it stays as-is (the author in the
  actual git history cannot be changed).
- `status: planned` → `completed` in `tasks/phase-5-evaluation.md` is
  included too (left unapplied after the P5-1 + P5-2 PRs merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references resolve (every renamed file is updated to
  its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
parent f1a448d6dc
commit f9714aa5cb
56 changed files with 1324 additions and 1324 deletions


@@ -1,64 +1,64 @@
---
phase: P3
component: kb-app (facade wiring)
component: kebab-app (facade wiring)
task_id: p3-5
title: "Wire kb-app facade — ingest / search / list / inspect end-to-end"
title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end"
status: completed
depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
unblocks: [p4-3, p9-1, p9-2, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors]
---
# p3-5 — Wire kb-app facade
# p3-5 — Wire kebab-app facade
## Goal
Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it).
Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it).
## Why now / why this size
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
## Allowed dependencies
`kb-app` may depend on:
`kebab-app` may depend on:
- `kb-core`, `kb-config` (already)
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1)
- `kb-search` (P2-2, P3-4)
- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4)
- `kebab-core`, `kebab-config` (already)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1)
- `kebab-search` (P2-2, P3-4)
- `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4)
- `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing)
## Forbidden dependencies
- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed)
- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way)
- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed)
- `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed)
- `kebab-tui`, `kebab-desktop` (P9 — those *consume* `kebab-app`, not the other way)
- `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kb_core::SourceScope` | CLI |
| `SearchQuery` | `kb_core::SearchQuery` | CLI |
| `DocFilter` | `kb_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI |
| `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kebab_core::SourceScope` | CLI |
| `SearchQuery` | `kebab_core::SearchQuery` | CLI |
| `DocFilter` | `kebab_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) |
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) |
| `Vec<DocSummary>` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) |
| `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) |
| `Vec<DocSummary>` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) |
## Public surface (signatures only — no new types)
The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface:
The signatures already live on `kebab-app`. This task only swaps the bodies. Frozen surface:
```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
@@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHi
// `ask` and `init_workspace` / `doctor` stay as-is.
```
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kb-app`. Keep all newly-added types crate-private unless explicitly required by `kb-cli`.
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kebab-app`. Keep all newly-added types crate-private unless explicitly required by `kebab-cli`.
## Behavior contract
@@ -77,8 +77,8 @@ If a constructor helper is needed (e.g., a single `App::open(&Config)` that open
Pipeline per design §1.2:
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kb-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kb_parse_md::frontmatter::parse_frontmatter` + `kb_parse_md::blocks::parse_blocks` → `kb_normalize::build_canonical_document` → `kb_chunk::MdHeadingV1Chunker::chunk`.
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kebab-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kebab_parse_md::frontmatter::parse_frontmatter` + `kebab_parse_md::blocks::parse_blocks` → `kebab_normalize::build_canonical_document` → `kebab_chunk::MdHeadingV1Chunker::chunk`.
3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes` → `put_document` → `put_blocks` → `put_chunks`.
4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path).
5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`.
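The opt-out rule and batching in step 4 can be sketched as follows. This is a minimal illustration, not the shipped `kebab-app` code: `EmbeddingConfig`, `vector_indexing_enabled`, and `embedding_batches` are hypothetical names, and only the three config fields named in step 4 are modelled.

```rust
/// Hypothetical mirror of the `config.models.embedding` fields named in step 4.
#[derive(Debug, Clone)]
pub struct EmbeddingConfig {
    pub provider: String,
    pub dimensions: usize,
    pub batch_size: usize,
}

/// Vector indexing runs only for the "fastembed" provider with non-zero
/// dimensions; provider "none" or `dimensions == 0` is the opt-out path.
pub fn vector_indexing_enabled(cfg: &EmbeddingConfig) -> bool {
    cfg.provider == "fastembed" && cfg.dimensions > 0
}

/// Splits chunk texts into batches of at most `batch_size` items.
/// Returns no batches at all when vector indexing is opted out.
pub fn embedding_batches<'a>(cfg: &EmbeddingConfig, texts: &'a [String]) -> Vec<&'a [String]> {
    if !vector_indexing_enabled(cfg) {
        return Vec::new();
    }
    // `slice::chunks` panics on a size of 0, so clamp to at least 1.
    let size = cfg.batch_size.max(1);
    texts.chunks(size).collect()
}
```

Each returned batch would then be handed to the embedder once, so the embedder is built a single time at the top of the run and reused across batches.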
@@ -90,9 +90,9 @@ Errors: any per-file parse failure should be recorded as a `Warning` in `Provena
Per `query.mode`:
- `Lexical` → `kb_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kb_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kb_search::HybridRetriever::search` composing the above two.
- `Lexical` → `kebab_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kebab_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kebab_search::HybridRetriever::search` composing the above two.
Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, `Arc<dyn Embedder>` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation passes through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session.
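The "open once, share via `Arc`" lifetime model can be sketched like this. It is an assumption-heavy illustration: `Resource` is a hypothetical stand-in for `SqliteStore`, `LanceVectorStore`, and the embedder, and `resources_for` only models which handles each mode borrows, not the real `kebab_search` retriever constructors.

```rust
use std::sync::Arc;

/// Mirrors the three `query.mode` values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical stand-in for an expensive, shareable handle.
#[derive(Debug)]
pub struct Resource {
    pub name: &'static str,
}

/// Hypothetical per-process context: every handle is opened exactly once.
pub struct App {
    pub sqlite: Arc<Resource>,
    pub vector: Arc<Resource>,
    pub embedder: Arc<Resource>,
}

impl App {
    pub fn open() -> Self {
        App {
            sqlite: Arc::new(Resource { name: "sqlite" }),
            vector: Arc::new(Resource { name: "lancedb" }),
            embedder: Arc::new(Resource { name: "embedder" }),
        }
    }

    /// Hands out cheap `Arc` clones of the handles a mode needs;
    /// nothing is re-opened per call.
    pub fn resources_for(&self, mode: SearchMode) -> Vec<Arc<Resource>> {
        match mode {
            SearchMode::Lexical => vec![Arc::clone(&self.sqlite)],
            SearchMode::Vector => vec![Arc::clone(&self.embedder), Arc::clone(&self.vector)],
            SearchMode::Hybrid => vec![
                Arc::clone(&self.sqlite),
                Arc::clone(&self.embedder),
                Arc::clone(&self.vector),
            ],
        }
    }
}
```

Cloning an `Arc` only bumps a reference count, so building retrievers per call stays cheap while the underlying handles live as long as the `App`.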
@@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that
### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`
Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
### Lifecycle
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor.
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor.
`vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step.
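A minimal sketch of the `Option`-based gating described above, under stated assumptions: the spec says the real facade returns `anyhow::Error`, whereas this example uses a hypothetical `EmbeddingsDisabled` error so it compiles with the standard library alone, and the embedder is reduced to a `String` placeholder.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical error type standing in for the `anyhow::Error` the spec names.
#[derive(Debug, PartialEq, Eq)]
pub struct EmbeddingsDisabled {
    pub mode: SearchMode,
}

impl fmt::Display for EmbeddingsDisabled {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:?} search needs embeddings; set an embedding provider other than \"none\"",
            self.mode
        )
    }
}

impl std::error::Error for EmbeddingsDisabled {}

/// Stand-in for the optional embedding half of `App`.
pub struct App {
    pub embedder: Option<String>,
}

impl App {
    /// `provider == "none"` yields a lexical-only `App`.
    pub fn open(provider: &str) -> Self {
        let embedder = if provider == "none" {
            None
        } else {
            Some(provider.to_string())
        };
        App { embedder }
    }

    /// Lexical search always works; Vector and Hybrid need an embedder.
    pub fn check_mode(&self, mode: SearchMode) -> Result<(), EmbeddingsDisabled> {
        match (mode, &self.embedder) {
            (SearchMode::Lexical, _) | (_, Some(_)) => Ok(()),
            (mode, None) => Err(EmbeddingsDisabled { mode }),
        }
    }
}
```

Checking the mode up front keeps the failure message actionable, instead of surfacing as a missing-table error from deeper in the retrieval stack.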
### Versioning
- `parser_version` from `kb-parse-md` constant.
- `parser_version` from `kebab-parse-md` constant.
- `chunker_version` from `MdHeadingV1Chunker::chunker_version()`.
- `embedding_model` / `embedding_version` from the embedder.
- `index_version` from the retriever (or `Lexical`/`Vector` IndexId).
@@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output.
## Storage / wire effects
- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Reads on subsequent calls: same DB.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kb-cli` already has the wrappers.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kebab-cli` already has the wrappers.
## Test plan
| kind | description | fixture / data |
|------|-------------|----------------|
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config |
| integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` |
| integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` |
@@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output.
| integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config |
| integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` |
| integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` |
| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
| smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths.
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths.
All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
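The AVX gate mentioned above might look roughly like this. The real `require_avx_or_panic()` from P3-3/P3-4 is not shown in this spec, so the body below is an assumed re-creation built on the standard library's `is_x86_feature_detected!` macro; the test name is hypothetical.

```rust
/// Reports whether the CPU supports AVX (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx_available() -> bool {
    std::is_x86_feature_detected!("avx")
}

/// On other architectures this sketch conservatively reports no AVX.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx_available() -> bool {
    false
}

/// Assumed shape of the helper: fail loudly instead of crashing later
/// with an illegal-instruction fault inside a native dependency.
pub fn require_avx_or_panic() {
    if !avx_available() {
        panic!("this test needs AVX hardware; skip the --ignored lane on this machine");
    }
}

/// Ignored by default, so plain `cargo test` never reaches the gate.
#[test]
#[ignore = "touches LanceDB; requires AVX"]
fn lancedb_roundtrip_sketch() {
    require_avx_or_panic();
    // A LanceDB-touching test body would follow here.
}
```

With this layout the default lane runs everywhere, and `cargo test -p kebab-app -- --ignored` opts into the gated tests.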
## Definition of Done
- [ ] `cargo check -p kb-app` passes.
- [ ] `cargo test -p kb-app` passes (default lane).
- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kb-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kb-cli -- list` returns the indexed documents.
- [ ] `cargo check -p kebab-app` passes.
- [ ] `cargo test -p kebab-app` passes (default lane).
- [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kebab-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kebab-cli -- list` returns the indexed documents.
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean.
- [ ] No imports outside Allowed dependencies.
- [ ] PR links design §1.2, §1.5, §1.6, §7.
## Out of scope
- `kb-app::ask` body — P4-3 (RAG pipeline).
- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kb index --resume` checkpointing — design §1.2 mentions it, but the spec leaves it to a P+ refinement; document but defer.
- `kebab-app::ask` body — P4-3 (RAG pipeline).
- `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kebab index --resume` checkpointing — design §1.2 mentions it, but the spec leaves it to a P+ refinement; document but defer.
- `--watch` mode — P+.
- TUI / desktop integration — P9 consumes the wired facade.
## Risks / notes
- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI.
- Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4).
- The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`.
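The post-merge fix above comes down to a config-explicit function plus a convenience wrapper. The sketch below illustrates that pattern only: `Config` is a trimmed-down hypothetical, `Config::load` here takes a plain string path rather than whatever the real `kebab_config` signature is, and the default path is a placeholder.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down config; the real one has many more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Config {
    pub data_dir: PathBuf,
}

impl Config {
    /// Stand-in loader: `None` falls back to a placeholder default location.
    pub fn load(path: Option<&str>) -> Config {
        Config {
            data_dir: PathBuf::from(path.unwrap_or("/default/xdg/kebab")),
        }
    }
}

/// Config-explicit variant: what a CLI should call after parsing `--config`.
pub fn list_docs_with_config(cfg: &Config) -> String {
    format!("listing docs from {}", cfg.data_dir.display())
}

/// Convenience wrapper that always uses the default config. Calling this
/// from a CLI that accepts `--config` silently ignores the user's flag.
pub fn list_docs() -> String {
    list_docs_with_config(&Config::load(None))
}
```

Routing every subcommand through the `*_with_config` form means the config is loaded once at process start and the user's choice cannot be dropped along the way.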