| phase | component | task_id | title | status | depends_on | unblocks | contract_source | contract_sections |
|---|---|---|---|---|---|---|---|---|
| P3 | kebab-app (facade wiring) | p3-5 | Wire kebab-app facade: ingest / search / list / inspect end-to-end | completed | | | ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md | |
p3-5 — Wire kebab-app facade
Goal
Replace the bail!("not yet wired") stubs in kebab-app (currently ingest, search, list_docs, inspect_doc, inspect_chunk) with real bodies that compose the libraries shipped in P1–P3. After this task, kebab index actually walks a workspace and persists chunks, and kebab search --mode {lexical,vector,hybrid} returns real SearchHits. kebab-app::ask stays stubbed (P4-3 owns it).
Why now / why this size
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where kebab-app::ask will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; ask wiring lives in P4-3.
Allowed dependencies
kebab-app may depend on:
- kebab-core, kebab-config (already)
- kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite (P1)
- kebab-search (P2-2, P3-4)
- kebab-store-vector, kebab-embed, kebab-embed-local (P3-2, P3-3, P3-4)
- tracing, anyhow, serde, serde_json, time, dirs, toml (existing)
Forbidden dependencies
- kebab-llm*, kebab-rag (P4 ownership: ask stays stubbed)
- kebab-tui, kebab-desktop (P9: those consume kebab-app, not the other way)
- kebab-parse-pdf, kebab-parse-image, kebab-parse-audio (P6/P7/P8)
- network HTTP libs (model download is kebab-embed-local's responsibility via fastembed)
Inputs
| input | type | source |
|---|---|---|
| kebab-config::Config | kebab_config::Config | loaded once at process start; threaded through every facade call |
| SourceScope | kebab_core::SourceScope | CLI |
| SearchQuery | kebab_core::SearchQuery | CLI |
| DocFilter | kebab_core::DocFilter | CLI |
| DocumentId / ChunkId | kebab_core::ids::* | CLI |
Outputs
| output | type | downstream |
|---|---|---|
| IngestReport | kebab_core::IngestReport | kebab-cli (ingest_report.v1), kebab-eval (P5) |
| Vec<SearchHit> | kebab_core::SearchHit | kebab-cli (search_hit.v1), kebab-rag (P4-3) |
| Vec<DocSummary> | kebab_core::DocSummary | kebab-cli (doc_summary.v1) |
| CanonicalDocument / Chunk | kebab_core::* | kebab-cli (chunk_inspection.v1) |
Public surface (signatures only — no new types)
The signatures already live on kebab-app. This task only swaps the bodies. Frozen surface:
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
pub fn list_docs(filter: DocFilter) -> anyhow::Result<Vec<DocSummary>>;
pub fn inspect_doc(id: &DocumentId) -> anyhow::Result<CanonicalDocument>;
pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
// `ask` and `init_workspace` / `doctor` stay as-is.
If a constructor helper is needed (e.g., a single App::open(&Config) that opens SqliteStore, LanceVectorStore, embedder, retrievers once), it MAY be added pub-or-pub(crate) to kebab-app. Keep all newly-added types crate-private unless explicitly required by kebab-cli.
Behavior contract
ingest(scope, summary_only)
Pipeline per design §1.2:
- Resolve scope → set of files under config.workspace.root filtered by config.workspace.include / exclude. Use kebab-source-fs::FsSourceConnector and pass each RawAsset + bytes through.
- For each markdown asset: kebab_parse_md::frontmatter::parse_frontmatter + kebab_parse_md::blocks::parse_blocks → kebab_normalize::build_canonical_document → kebab_chunk::MdHeadingV1Chunker::chunk.
- SQLite writes per design §5.8 (one ingest = one transaction): put_asset_with_bytes → put_document → put_blocks → put_chunks.
- If config.models.embedding.provider == "fastembed": build FastembedEmbedder once at the top of the run, embed every chunk's text as EmbeddingKind::Document, batch by config.models.embedding.batch_size, then LanceVectorStore::ensure_table + upsert. Skip vector indexing if provider is "none" or embedding.dimensions == 0 (config opt-out path).
- Record an ingest_runs row via JobRepo with the aggregate counts. summary_only=true writes items_json=NULL.
- Return IngestReport per wire-schema/v1/ingest_report.schema.json.
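The batching part of the embedding step can be sketched roughly as follows. This is a minimal illustration, not the kebab-embed API: the `Embedder` trait, `embed_in_batches`, and `LengthEmbedder` are hypothetical stand-ins, and errors are plain `String`s instead of `anyhow::Error` so the snippet needs only the standard library.

```rust
/// Hypothetical stand-in for the real embedder trait in kebab-embed.
trait Embedder {
    fn embed_documents(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Embed chunk texts in groups of `batch_size`, preserving input order.
fn embed_in_batches(
    embedder: &dyn Embedder,
    texts: &[&str],
    batch_size: usize,
) -> Result<Vec<Vec<f32>>, String> {
    // `slice::chunks` panics on a zero size, so reject it up front.
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_string());
    }
    let mut vectors = Vec::with_capacity(texts.len());
    for batch in texts.chunks(batch_size) {
        let embedded = embedder.embed_documents(batch)?;
        // Guard against an embedder that drops or adds rows.
        if embedded.len() != batch.len() {
            return Err(format!(
                "embedder returned {} vectors for {} texts",
                embedded.len(),
                batch.len()
            ));
        }
        vectors.extend(embedded);
    }
    Ok(vectors)
}

/// Toy embedder used only to exercise the batching logic:
/// each text maps to a one-dimensional vector holding its byte length.
struct LengthEmbedder;

impl Embedder for LengthEmbedder {
    fn embed_documents(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        Ok(texts.iter().map(|t| vec![t.len() as f32]).collect())
    }
}
```

With five texts and `batch_size = 2` the embedder is called three times (2 + 2 + 1), and the returned vectors line up index-for-index with the input chunks, which is what a later upsert keyed by chunk id would rely on.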
Errors: any per-file parse failure should be recorded as a Warning in Provenance and skipped, not propagated. Only structural failures (DB unreachable, FS permission denied) abort the whole run.
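The warn-and-skip versus abort split described above could look something like this. It is a sketch under assumptions: `FileError`, `RunSummary`, and `ingest_all` are hypothetical names, and the real implementation records warnings in Provenance rather than in a plain `Vec<String>`.

```rust
/// Hypothetical error split: per-file problems vs. run-level failures.
#[derive(Debug)]
enum FileError {
    /// Malformed input: record a warning and move on to the next file.
    Parse(String),
    /// DB unreachable, FS permission denied, and similar: abort the run.
    Structural(String),
}

/// Aggregate result of one run (a simplified stand-in for IngestReport).
#[derive(Debug, Default)]
struct RunSummary {
    ingested: usize,
    warnings: Vec<String>,
}

/// Run `ingest_one` over every path, skipping parse failures and
/// stopping at the first structural failure.
fn ingest_all<F>(paths: &[&str], mut ingest_one: F) -> Result<RunSummary, String>
where
    F: FnMut(&str) -> Result<(), FileError>,
{
    let mut summary = RunSummary::default();
    for &path in paths {
        match ingest_one(path) {
            Ok(()) => summary.ingested += 1,
            Err(FileError::Parse(msg)) => {
                summary.warnings.push(format!("{path}: {msg}"));
            }
            Err(FileError::Structural(msg)) => {
                return Err(format!("{path}: {msg}"));
            }
        }
    }
    Ok(summary)
}
```

Keeping the two failure classes as distinct enum variants makes the policy visible at the call site: a new error kind has to be assigned to one side or the other before the match compiles.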
search(query)
Per query.mode:
- Lexical → kebab_search::LexicalRetriever::search. Takes a read-only borrow of the SQLite connection.
- Vector → kebab_search::VectorRetriever::search. Requires Embedder (built from config) + VectorStore (LanceDB).
- Hybrid → kebab_search::HybridRetriever::search composing the above two.
Each call constructs (or reuses) the retrievers from a single App context that holds Arc<SqliteStore>, Arc<LanceVectorStore>, Arc<dyn Embedder> so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation passes through App::open(&Config) once, runs the requested op, then drops everything on exit. The TUI (P9) holds the App for the session.
SearchHit.embedding_model set to Some(embedder.model_id()) for Vector / Hybrid modes; None for Lexical-only. SearchHit.index_version follows the retriever's reported index_version().
list_docs(filter) / inspect_doc(id) / inspect_chunk(id)
Direct delegation to kebab_core::DocumentStore trait methods on SqliteStore. No vector / embedding involvement.
Lifecycle
Define an internal pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ } with a pub(crate) fn open(config: &Config) -> Result<Self>. Each public kebab-app::* function builds an App (or accepts one in tests) and uses it. The free functions stay the public API so kebab-cli and kebab-tui don't need to refactor.
vector and embedder are Option because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes, i.e. provider == "none"). When they are absent, search with Vector / Hybrid modes returns an anyhow::Error with a hint to enable embeddings; ingest skips the vector step.
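A minimal sketch of how the optional embedder could drive both the "clear error" path and the `SearchHit.embedding_model` rule is shown below. Everything here is simplified and hypothetical: the `Embedder` trait has a single method, `App` models only the embedder field, errors are `String`s rather than `anyhow::Error`, and the hint wording is illustrative.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical stand-in for the embedder trait.
trait Embedder {
    fn model_id(&self) -> String;
}

/// Simplified `App`: only the optional embedder is modelled.
struct App {
    embedder: Option<Arc<dyn Embedder + Send + Sync>>,
}

impl App {
    /// Value for `SearchHit.embedding_model` under the given mode,
    /// or an error when the mode needs embeddings that are disabled.
    fn embedding_model(&self, mode: SearchMode) -> Result<Option<String>, String> {
        match (mode, &self.embedder) {
            // Lexical search never touches the embedder.
            (SearchMode::Lexical, _) => Ok(None),
            // Vector and Hybrid report the live embedder's model id.
            (_, Some(embedder)) => Ok(Some(embedder.model_id())),
            // Embeddings are off: fail with a hint instead of panicking.
            (mode, None) => Err(format!(
                "{mode:?} search needs embeddings; set models.embedding.provider to \"fastembed\""
            )),
        }
    }
}

/// Toy embedder for exercising the dispatch.
struct FakeEmbedder;

impl Embedder for FakeEmbedder {
    fn model_id(&self) -> String {
        "fake-model".to_string()
    }
}
```

Matching on the `(mode, embedder)` pair keeps the lexical-only configuration a first-class case rather than an afterthought, so `provider == "none"` setups still get working lexical search.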
Versioning
- parser_version from kebab-parse-md constant.
- chunker_version from MdHeadingV1Chunker::chunker_version().
- embedding_model / embedding_version from the embedder.
- index_version from the retriever (or Lexical/VectorIndexId).
All wired through to the persisted records and the wire output.
Storage / wire effects
- Writes: data_dir/kebab.sqlite (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), data_dir/assets/<aa>/<id> (when copied), data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/ (when embeddings on), data_dir/models/fastembed/ (model cache, first run only).
- Reads on subsequent calls: same DB.
- Wire JSON conforms to ingest_report.v1, search_hit.v1, doc_summary.v1, chunk_inspection.v1; kebab-cli already has the wrappers.
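For orientation, the on-disk layout listed above could be derived from `data_dir` along these lines. The helper names are hypothetical, and the sketch assumes the `<model>` segment is already a filesystem-safe slug; how the real code normalizes model names is not specified here. The `assets/<aa>/<id>` path is left out because the meaning of `<aa>` is not defined in this section.

```rust
use std::path::{Path, PathBuf};

/// Location of the SQLite database under the data directory.
fn sqlite_path(data_dir: &Path) -> PathBuf {
    data_dir.join("kebab.sqlite")
}

/// Directory of the Lance table for one (model, dimension) pair.
/// Assumes `model_slug` is already safe to use as a path segment.
fn lance_table_dir(data_dir: &Path, model_slug: &str, dim: usize) -> PathBuf {
    data_dir
        .join("lancedb")
        .join(format!("chunk_embeddings_{model_slug}_{dim}.lance"))
}

/// Cache directory for downloaded fastembed models.
fn fastembed_cache_dir(data_dir: &Path) -> PathBuf {
    data_dir.join("models").join("fastembed")
}
```

Encoding both model and dimension in the table name means switching embedding models creates a fresh table next to the old one instead of mixing vectors of different shapes.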
Test plan
| kind | description | fixture / data |
|---|---|---|
| integration | ingest over a 3-file fixture workspace produces non-empty IngestReport and writes the expected SQLite rows + Lance rows | crates/kebab-app/tests/fixtures/workspace/, tmp data_dir |
| integration | ingest with provider="none" skips Lance and produces an IngestReport with embeddings_indexed=0 | tmp config |
| integration | ingest is idempotent: re-running on the same workspace updates documents.updated_at and bumps doc_version without duplicating rows | tmp data_dir |
| integration | search with mode=Lexical returns hits whose embedding_model is None | tmp data_dir |
| integration | search with mode=Vector returns hits whose embedding_model matches the configured model | tmp data_dir (AVX #[ignore]) |
| integration | search with mode=Hybrid returns hits whose RetrievalDetail.method == Hybrid | tmp data_dir (AVX #[ignore]) |
| integration | search with mode=Vector and provider="none" returns a clear error | tmp config |
| integration | list_docs with tags_any=["rust"] filters correctly | tmp data_dir |
| integration | inspect_doc round-trips a document; inspect_chunk round-trips a chunk | tmp data_dir |
| smoke | end-to-end CLI smoke: kebab index + kebab search --mode hybrid "..." against the fixture workspace using cargo run (manual / assert_cmd) | fixture |
#[ignore] AVX-gated tests follow the P3-3/P3-4 pattern: require_avx_or_panic() at the top of each LanceDB-touching body. Default cargo test -p kebab-app runs the lexical-only and SQLite-only paths.
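The gating pattern might look roughly like this. The body of `require_avx_or_panic` is an assumption (the P3-3/P3-4 helper is not reproduced in this spec), `has_avx` and the test name are hypothetical, and only `std::is_x86_feature_detected!` is a real standard-library macro.

```rust
/// Reports whether the current CPU supports AVX (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_avx() -> bool {
    std::is_x86_feature_detected!("avx")
}

/// On other architectures the detection macro does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn has_avx() -> bool {
    false
}

/// Assumed shape of the helper: fail fast with a readable message
/// instead of crashing somewhere inside the vector store.
fn require_avx_or_panic() {
    assert!(
        has_avx(),
        "this test needs AVX; run it on AVX-capable hardware"
    );
}

/// Skipped by default; opt in with `cargo test -p kebab-app -- --ignored`.
#[test]
#[ignore = "requires AVX hardware"]
fn vector_search_smoke() {
    require_avx_or_panic();
    // LanceDB-touching assertions would go here.
}
```

Combining `#[ignore]` with an explicit runtime check covers both directions: the default lane never runs the test, and someone who passes `--ignored` on unsupported hardware gets an immediate, explicit failure.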
All tests run under cargo test -p kebab-app. The CLI smoke test is optional via assert_cmd, provided it doesn't add a heavyweight dep tree.
Definition of Done
- cargo check -p kebab-app passes.
- cargo test -p kebab-app passes (default lane).
- cargo test -p kebab-app -- --ignored passes on AVX hardware.
- cargo run -p kebab-cli -- index succeeds against a non-empty fixture workspace.
- cargo run -p kebab-cli -- search --mode hybrid "<term>" returns real hits + citations.
- cargo run -p kebab-cli -- list returns the indexed documents.
- cargo clippy --workspace --all-targets -- -D warnings clean.
- No imports outside Allowed dependencies.
- PR links design §1.2, §1.5, §1.6, §7.
Out of scope
- kebab-app::ask body: P4-3 (RAG pipeline).
- kebab index --rebuild-fts CLI option: wire kebab_store_sqlite::rebuild_chunks_fts only when needed by a downstream task.
- kebab index --resume checkpointing: design §1.2 mentions it but the spec leaves it to a P+ refinement; document but defer.
- --watch mode: P+.
- TUI / desktop integration: P9 consumes the wired facade.
Risks / notes
- Cold-start cost: first kebab index run downloads the fastembed model (~470 MB) and warms the ONNX session. Surface via tracing::info! (already wired in P3-2).
- Post-merge fix (2026-05): original P3-5 implementation left every kebab-cli subcommand calling the bare kebab_app::* free functions, which silently re-loaded Config::load(None) (XDG default) and ignored the user's --config <path> flag. CLI now goes through kebab_app::*_with_config(cfg, ...) everywhere. The #[doc(hidden)] pub fn *_with_config companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See HOTFIXES.md for details.
- The App lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it pub(crate) so future refactors don't break the CLI.
- Mismatched index_version across stored records and the live retriever should fail loud at App::open (not at first search). Reuse the tracing::warn! from HybridRetriever::new (P3-4).
- The fastembed adapter holds a tokio::runtime (P3-3); App must be constructed from a synchronous context. Document on App::open.
- Performance: ingest over a large workspace (10k files) needs to keep SQLite WAL in healthy shape. One transaction per document is the spec; verify checkpoint behavior under load (manual benchmark, not a unit test).
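The config-threading issue from the post-merge fix note can be illustrated with a small sketch. The types and functions below are hypothetical reductions: the real `Config` lives in kebab-config, and the `"<xdg-default>"` string is only a placeholder for whatever the default lookup resolves to.

```rust
/// Hypothetical, trimmed-down config that just remembers its origin.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    loaded_from: String,
}

impl Config {
    /// Stand-in for Config::load: `None` falls back to the default location.
    fn load(path: Option<&str>) -> Config {
        Config {
            loaded_from: path.unwrap_or("<xdg-default>").to_string(),
        }
    }
}

/// Config-explicit entry point: uses exactly the config it is given.
fn list_docs_with_config(cfg: &Config) -> Vec<String> {
    vec![format!("docs from {}", cfg.loaded_from)]
}

/// Bare free function: always reloads the default config, so any
/// `--config <path>` the user passed is silently ignored.
fn list_docs() -> Vec<String> {
    list_docs_with_config(&Config::load(None))
}

/// CLI handler after the fix: load once from the flag, pass it down.
fn cli_list(config_flag: Option<&str>) -> Vec<String> {
    let cfg = Config::load(config_flag);
    list_docs_with_config(&cfg)
}
```

Routing the handler through the `_with_config` variant means the flag value reaches the facade, while the bare function remains available for callers that genuinely want the default.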