altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the Rust
  module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (P6+ will use the same
  prefix when it starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. Both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- All references unified across the design doc / initial report / SMOKE /
  HOTFIXES / phase epics / task specs.
- `git -c user.name=kb` in task-decomposition.md is left as-is because it
  records author information from past git history (the author in the
  actual git history cannot be changed).
- `status: planned` → `completed` in `tasks/phase-5-evaluation.md` is
  included as well (left unreflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references still resolve (every renamed file is updated
  to its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 04:01:55 +00:00

12 KiB

field	value
phase	
component	kebab-app (facade wiring)
task_id	p3-5
title	Wire kebab-app facade — ingest / search / list / inspect end-to-end
status	completed
depends_on	p1-6, p2-2, p3-2, p3-3, p3-4
unblocks	p4-3, p9-1, p9-2, p9-4
contract_source	../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections	§7 components; §7.2 traits; §1 UX scenes (ingest, list, inspect); §6.4 config; §10 errors

p3-5 — Wire kebab-app facade

Goal

Replace the bail!("not yet wired") stubs in kebab-app (currently ingest, search, list_docs, inspect_doc, inspect_chunk) with real bodies that compose the libraries shipped in P1–P3. After this task, kebab index actually walks a workspace and persists chunks, and kebab search --mode {lexical,vector,hybrid} returns real SearchHits. kebab-app::ask stays stubbed (P4-3 owns it).

Why now / why this size

P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where kebab-app::ask will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; ask wiring lives in P4-3.

Allowed dependencies

kebab-app may depend on:

kebab-core, kebab-config (already)
kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite (P1)
kebab-search (P2-2, P3-4)
kebab-store-vector, kebab-embed, kebab-embed-local (P3-2, P3-3, P3-4)
tracing, anyhow, serde, serde_json, time, dirs, toml (existing)

Forbidden dependencies

kebab-llm*, kebab-rag (P4 ownership — ask stays stubbed)
kebab-tui, kebab-desktop (P9 — those consume kebab-app, not the other way)
kebab-parse-pdf, kebab-parse-image, kebab-parse-audio (P6/P7/P8)
network HTTP libs (model download is kebab-embed-local's responsibility via fastembed)

Inputs

input	type	source
`kebab-config::Config`	`kebab_config::Config`	loaded once at process start; threaded through every facade call
`SourceScope`	`kebab_core::SourceScope`	CLI
`SearchQuery`	`kebab_core::SearchQuery`	CLI
`DocFilter`	`kebab_core::DocFilter`	CLI
`DocumentId` / `ChunkId`	`kebab_core::ids::*`	CLI

Outputs

output	type	downstream
`IngestReport`	`kebab_core::IngestReport`	`kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5)
`Vec<SearchHit>`	`kebab_core::SearchHit`	`kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3)
`Vec<DocSummary>`	`kebab_core::DocSummary`	`kebab-cli` (`doc_summary.v1`)
`CanonicalDocument` / `Chunk`	`kebab_core::*`	`kebab-cli` (`chunk_inspection.v1`)

Public surface (signatures only — no new types)

The signatures already live on kebab-app. This task only swaps the bodies. Frozen surface:

pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
pub fn list_docs(filter: DocFilter)               -> anyhow::Result<Vec<DocSummary>>;
pub fn inspect_doc(id: &DocumentId)               -> anyhow::Result<CanonicalDocument>;
pub fn inspect_chunk(id: &ChunkId)                -> anyhow::Result<Chunk>;
pub fn search(query: SearchQuery)                 -> anyhow::Result<Vec<SearchHit>>;
// `ask` and `init_workspace` / `doctor` stay as-is.

If a constructor helper is needed (e.g., a single App::open(&Config) that opens SqliteStore, LanceVectorStore, embedder, retrievers once), it MAY be added pub-or-pub(crate) to kebab-app. Keep all newly-added types crate-private unless explicitly required by kebab-cli.

Behavior contract

`ingest(scope, summary_only)`

Pipeline per design §1.2:

Resolve scope → set of files under config.workspace.root filtered by config.workspace.include/exclude. Use kebab-source-fs::FsSourceConnector and pass each RawAsset + bytes through.
For each markdown asset: kebab_parse_md::frontmatter::parse_frontmatter + kebab_parse_md::blocks::parse_blocks → kebab_normalize::build_canonical_document → kebab_chunk::MdHeadingV1Chunker::chunk.
SQLite writes per design §5.8 (one ingest = one transaction): put_asset_with_bytes → put_document → put_blocks → put_chunks.
If config.models.embedding.provider == "fastembed": build FastembedEmbedder once at the top of the run, embed every chunk's text as EmbeddingKind::Document, batch by config.models.embedding.batch_size, then LanceVectorStore::ensure_table + upsert. Skip vector indexing if provider is "none" or embedding.dimensions == 0 (config opt-out path).
Record an ingest_runs row via JobRepo with the aggregate counts. summary_only=true writes items_json=NULL.
Return IngestReport per wire-schema/v1/ingest_report.schema.json.
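The embedding step of the pipeline above (batch by `config.models.embedding.batch_size`, skip when the provider is "none" or the dimension count is 0) can be sketched with std-only stand-ins. `BatchEmbedder`, `embed_batch`, `embed_in_batches`, and `LenEmbedder` are hypothetical names for illustration; they are not the `kebab-embed` API.

```rust
/// Hypothetical stand-in for the real embedder trait.
pub trait BatchEmbedder {
    /// Returns one vector per input text, in input order.
    fn embed_batch(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

/// Embeds `texts` in groups of at most `batch_size`, preserving input order.
/// Returns `None` on the opt-out path (no embedder, or zero dimensions).
pub fn embed_in_batches(
    embedder: Option<&dyn BatchEmbedder>,
    dimensions: usize,
    batch_size: usize,
    texts: &[&str],
) -> Option<Vec<Vec<f32>>> {
    let embedder = embedder?;
    if dimensions == 0 {
        return None;
    }
    // `slice::chunks` panics on a chunk size of 0, so clamp to at least 1.
    let batch_size = batch_size.max(1);
    let mut vectors = Vec::with_capacity(texts.len());
    for batch in texts.chunks(batch_size) {
        vectors.extend(embedder.embed_batch(batch));
    }
    Some(vectors)
}

/// Toy embedder (assumption, for the sketch only): one dimension, text length.
pub struct LenEmbedder;

impl BatchEmbedder for LenEmbedder {
    fn embed_batch(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        texts.iter().map(|t| vec![t.len() as f32]).collect()
    }
}
```

With three texts and a batch size of 2, the toy embedder is called twice and the result still holds three vectors in input order; passing `None` or `dimensions == 0` returns `None` so the caller can skip the vector upsert.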

Errors: any per-file parse failure should be recorded as a Warning in Provenance and skipped, not propagated. Only structural failures (DB unreachable, FS permission denied) abort the whole run.
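A minimal sketch of that error policy, assuming a two-variant error type: per-file parse failures become warnings and the run continues, while a structural failure aborts it. `FileError`, `RunSummary`, and `ingest_files` are illustrative names, and the real facade returns `anyhow::Result<IngestReport>` rather than the `String` error used here.

```rust
#[derive(Debug, PartialEq)]
pub enum FileError {
    /// Recoverable: this one file could not be parsed.
    Parse(String),
    /// Fatal: e.g. DB unreachable or FS permission denied.
    Structural(String),
}

#[derive(Debug, Default, PartialEq)]
pub struct RunSummary {
    pub ingested: Vec<String>,
    pub warnings: Vec<String>,
}

/// Runs `process` over every path. Parse errors are recorded and skipped;
/// the first structural error stops the run and is returned to the caller.
pub fn ingest_files<F>(paths: &[&str], mut process: F) -> Result<RunSummary, String>
where
    F: FnMut(&str) -> Result<(), FileError>,
{
    let mut summary = RunSummary::default();
    for &path in paths {
        match process(path) {
            Ok(()) => summary.ingested.push(path.to_string()),
            Err(FileError::Parse(msg)) => summary.warnings.push(format!("{path}: {msg}")),
            Err(FileError::Structural(msg)) => return Err(msg),
        }
    }
    Ok(summary)
}
```

Keeping the two cases as separate variants lets the loop decide between skip and abort with a single `match`, instead of inspecting error messages.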

`search(query)`

Per query.mode:

Lexical → kebab_search::LexicalRetriever::search. Takes a read-only borrow of the SQLite connection.
Vector → kebab_search::VectorRetriever::search. Requires Embedder (built from config) + VectorStore (LanceDB).
Hybrid → kebab_search::HybridRetriever::search composing the above two.

Each call constructs (or reuses) the retrievers from a single App context that holds Arc<SqliteStore>, Arc<LanceVectorStore>, Arc<dyn Embedder> so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation passes through App::open(&Config) once, runs the requested op, then drops everything on exit. The TUI (P9) holds the App for the session.

SearchHit.embedding_model is set to Some(embedder.model_id()) for Vector / Hybrid modes and to None for Lexical-only. SearchHit.index_version follows the retriever's reported index_version().
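The `embedding_model` rule can be illustrated with a small std-only sketch. `SearchMode`, `Hit`, and `tag_hits` are simplified stand-ins rather than the `kebab_core` types, and the model id is passed in as a plain string instead of coming from `embedder.model_id()`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Simplified stand-in for `SearchHit`; only the fields the rule touches.
#[derive(Debug, PartialEq)]
pub struct Hit {
    pub chunk_id: String,
    pub embedding_model: Option<String>,
}

/// Builds hits for `chunk_ids`, setting `embedding_model` only for the
/// modes that actually involve the embedder (Vector and Hybrid).
pub fn tag_hits(mode: SearchMode, model_id: &str, chunk_ids: &[&str]) -> Vec<Hit> {
    let embedding_model = match mode {
        SearchMode::Lexical => None,
        SearchMode::Vector | SearchMode::Hybrid => Some(model_id.to_string()),
    };
    chunk_ids
        .iter()
        .map(|id| Hit {
            chunk_id: id.to_string(),
            embedding_model: embedding_model.clone(),
        })
        .collect()
}
```

Deciding the field once per query, rather than per hit, keeps every hit in a result set consistent with the mode that produced it.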

`list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`

Direct delegation to kebab_core::DocumentStore trait methods on SqliteStore. No vector / embedding involvement.

Lifecycle

Define an internal pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ } with a pub(crate) fn open(config: &Config) -> Result<Self>. Each public kebab-app::* function builds an App (or accepts one in tests) and uses it. The free functions stay the public API so kebab-cli and kebab-tui don't need to refactor.

vector and embedder are Option because the spec allows running kebab without embeddings (lexical-only mode for resource-constrained boxes, i.e. provider == "none"). When they are absent, search with Vector / Hybrid modes returns an anyhow::Error with a hint to enable embeddings, and ingest skips the vector step.
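The lifecycle described above can be sketched as follows, under loud assumptions: `Config`, `SqliteStore`, `LanceVectorStore`, `Embedder`, and `FakeEmbedder` are empty stand-ins for the real crates, `embedder_for_vector_search` is a hypothetical helper name, and a `String` error replaces `anyhow::Error` so the example needs only std.

```rust
use std::sync::Arc;

// Stand-ins for the real types (assumptions, not the kebab-* API).
pub struct Config {
    pub embedding_provider: String,
    pub embedding_dimensions: usize,
}
pub struct SqliteStore;
pub struct LanceVectorStore;
pub trait Embedder {
    fn model_id(&self) -> String;
}
struct FakeEmbedder;
impl Embedder for FakeEmbedder {
    fn model_id(&self) -> String {
        "fake-model".to_string()
    }
}

pub struct App {
    pub sqlite: Arc<SqliteStore>,
    pub vector: Option<Arc<LanceVectorStore>>,
    pub embedder: Option<Arc<dyn Embedder + Send + Sync>>,
}

impl App {
    /// Opens every store once; the vector side stays `None` on the opt-out path.
    pub fn open(config: &Config) -> Result<Self, String> {
        let enabled = config.embedding_provider != "none" && config.embedding_dimensions > 0;
        let (vector, embedder) = if enabled {
            let e: Arc<dyn Embedder + Send + Sync> = Arc::new(FakeEmbedder);
            (Some(Arc::new(LanceVectorStore)), Some(e))
        } else {
            (None, None)
        };
        Ok(App { sqlite: Arc::new(SqliteStore), vector, embedder })
    }

    /// Vector / Hybrid search needs the embedder; fail with a hint when it is absent.
    pub fn embedder_for_vector_search(&self) -> Result<Arc<dyn Embedder + Send + Sync>, String> {
        self.embedder.clone().ok_or_else(|| {
            "embeddings are disabled; set an embedding provider to use vector or hybrid search"
                .to_string()
        })
    }
}
```

Because the optional parts are decided once in `open`, callers never re-read the config to find out whether vector search is possible; they check the `Option` or call the helper.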

Versioning

parser_version from kebab-parse-md constant.
chunker_version from MdHeadingV1Chunker::chunker_version().
embedding_model / embedding_version from the embedder.
index_version from the retriever (or Lexical/Vector IndexId).

All wired through to the persisted records and the wire output.

Storage / wire effects

Writes: data_dir/kebab.sqlite (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), data_dir/assets/<aa>/<id> (when copied), data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/ (when embeddings on), data_dir/models/fastembed/ (model cache, first run only).
Reads on subsequent calls: same DB.
Wire JSON conforms to ingest_report.v1, search_hit.v1, doc_summary.v1, chunk_inspection.v1 — kebab-cli already has the wrappers.
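The on-disk layout listed above can be expressed as a few path helpers. This is a sketch only: the function names are hypothetical, and it assumes that `<aa>` is the first two characters of the asset id, which the spec does not state. It also does not sanitize model names that contain path separators.

```rust
use std::path::{Path, PathBuf};

/// `data_dir/kebab.sqlite`
pub fn sqlite_path(data_dir: &Path) -> PathBuf {
    data_dir.join("kebab.sqlite")
}

/// `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance`
pub fn lance_table_dir(data_dir: &Path, model: &str, dim: usize) -> PathBuf {
    data_dir
        .join("lancedb")
        .join(format!("chunk_embeddings_{model}_{dim}.lance"))
}

/// `data_dir/assets/<aa>/<id>`.
/// Assumption: `<aa>` is the first two characters of `id`.
/// Returns `None` when the id is too short to shard.
pub fn asset_path(data_dir: &Path, id: &str) -> Option<PathBuf> {
    let shard = id.get(..2)?;
    Some(data_dir.join("assets").join(shard).join(id))
}
```

Encoding the model and dimension in the table directory name means a change of embedding model lands in a fresh table instead of mixing vectors of different shapes.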

Test plan

kind	description	fixture / data
integration	`ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows	`crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir`
integration	`ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0`	tmp config
integration	`ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows	tmp `data_dir`
integration	`search` with `mode=Lexical` returns hits whose `embedding_model` is `None`	tmp `data_dir`
integration	`search` with `mode=Vector` returns hits whose `embedding_model` matches the configured model	tmp `data_dir` (AVX `#[ignore]`)
integration	`search` with `mode=Hybrid` returns hits whose `RetrievalDetail.method == Hybrid`	tmp `data_dir` (AVX `#[ignore]`)
integration	`search` with `mode=Vector` and `provider="none"` returns a clear error	tmp config
integration	`list_docs` with `tags_any=["rust"]` filters correctly	tmp `data_dir`
integration	`inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk	tmp `data_dir`
smoke	end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`)	fixture

#[ignore] AVX-gated tests follow the P3-3/P3-4 pattern: require_avx_or_panic() at the top of each LanceDB-touching body. Default cargo test -p kebab-app runs the lexical-only and SQLite-only paths.
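One possible shape for such a gate is sketched below. The body of `require_avx_or_panic` is an assumption (the real helper lives in the P3-3/P3-4 test code and may differ); `avx_available` and `assert_feature` are hypothetical helper names. The sketch relies on the standard `is_x86_feature_detected!` macro, which exists only on x86 targets, hence the `cfg` split.

```rust
/// Runtime AVX detection on x86 / x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx_available() -> bool {
    std::is_x86_feature_detected!("avx")
}

/// Other architectures never report AVX.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx_available() -> bool {
    false
}

/// Panics with a readable message when a required CPU feature is missing.
pub fn assert_feature(available: bool, feature: &str) {
    assert!(
        available,
        "CPU feature `{feature}` is required for this test; run it on supporting hardware"
    );
}

/// Intended as the first line of each LanceDB-touching `#[ignore]` test body.
pub fn require_avx_or_panic() {
    assert_feature(avx_available(), "avx");
}
```

Splitting detection from the assertion keeps the panic path testable on any machine, since `assert_feature(false, ...)` does not depend on the host CPU.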

All tests under cargo test -p kebab-app. CLI smoke optional via assert_cmd if it doesn't add a heavyweight dep tree.

Definition of Done

cargo check -p kebab-app passes.
cargo test -p kebab-app passes (default lane).
cargo test -p kebab-app -- --ignored passes on AVX hardware.
cargo run -p kebab-cli -- index succeeds against a non-empty fixture workspace.
cargo run -p kebab-cli -- search --mode hybrid "<term>" returns real hits + citations.
cargo run -p kebab-cli -- list returns the indexed documents.
cargo clippy --workspace --all-targets -- -D warnings clean.
No imports outside Allowed dependencies.
PR links design §1.2, §1.5, §1.6, §7.

Out of scope

kebab-app::ask body — P4-3 (RAG pipeline).
kebab index --rebuild-fts CLI option — wire kebab_store_sqlite::rebuild_chunks_fts only when needed by a downstream task.
kebab index --resume checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
--watch mode — P+.
TUI / desktop integration — P9 consumes the wired facade.

Risks / notes

Cold-start cost: first kebab index run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via tracing::info! (already wired in P3-2).
Post-merge fix (2026-05): original P3-5 implementation left every kebab-cli subcommand calling the bare kebab_app::* free functions, which silently re-loaded Config::load(None) (XDG default) and ignored the user's --config <path> flag. CLI now goes through kebab_app::*_with_config(cfg, ...) everywhere. The #[doc(hidden)] pub fn *_with_config companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See HOTFIXES.md for details.
The App lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it pub(crate) so future refactors don't break the CLI.
Mismatched index_version across stored records and the live retriever should fail loud at App::open (not at first search). Reuse the tracing::warn! from HybridRetriever::new (P3-4).
The fastembed adapter holds a tokio::runtime (P3-3); App must be constructed from a synchronous context. Document on App::open.
Performance: ingest over a large workspace (10k files) needs to keep SQLite WAL in healthy shape. One transaction per document is the spec; verify checkpoint behavior under load (manual benchmark, not a unit test).
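The post-merge fix noted above comes down to a simple pattern: the config-explicit function does the work, and the bare function is only a convenience that loads the default config. The sketch below uses toy stand-ins (`Config`, its `source` field, `cli_list`, and the string return values are all hypothetical) to show why a CLI that honors `--config` must call the `*_with_config` variant.

```rust
/// Toy config that only remembers where it was loaded from.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub source: String,
}

impl Config {
    /// Stand-in loader: an explicit path wins, `None` means the default location.
    pub fn load(path: Option<&str>) -> Config {
        Config {
            source: path.unwrap_or("<xdg-default>").to_string(),
        }
    }
}

/// Config-explicit variant: uses exactly the config it is given.
pub fn list_docs_with_config(cfg: &Config) -> String {
    format!("listing docs using config from {}", cfg.source)
}

/// Bare variant: always reloads the default config, ignoring any CLI flag.
pub fn list_docs() -> String {
    list_docs_with_config(&Config::load(None))
}

/// What a CLI handler should do: load once from the flag, then pass it down.
pub fn cli_list(config_flag: Option<&str>) -> String {
    let cfg = Config::load(config_flag);
    list_docs_with_config(&cfg)
}
```

Calling `list_docs()` from a handler that received `--config /tmp/kebab.toml` would silently report the default source, which is the failure mode the fix removed.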
