kebab/crates/kebab-store-vector/src/lib.rs
th-kim0823 39b4433549 feat(kebab-app): schema_with_config facade (fb-27)
New `SchemaV1` struct + `schema_with_config(&Config)` builder. Surfaces
wire schemas list, capabilities (current + future placeholders), model
versions (parser/chunker/embedding/prompt_template/index/corpus_revision),
and stats (doc/chunk/asset counts + last ingest). kebab-store-sqlite
gains `count_summary()` to back the stats block.

Deviations from plan:
- `cfg.models.embedding.id` → `cfg.models.embedding.model` (actual field name)
- No `Config::expand_path` method → free fn `kebab_config::expand_path(&cfg.storage.data_dir, "")`
- `PARSER_VERSION` added to `kebab-parse-md/src/lib.rs` (was absent; synced with `KEBAB_PARSE_MD_VERSION` literal in kebab-app)
- `INDEX_VERSION_STR` added to `kebab-store-vector/src/store.rs` + re-exported from `lib.rs` (was a private `const`)
- `corpus_revision()` returns `u64` directly (not `Result<u64>`) — no `?` in collect_models
- `SchemaV1` carries `schema_version: "schema.v1"` field (wire schema convention)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:46:37 +09:00

//! `kebab-store-vector`: LanceDB-backed [`kebab_core::VectorStore`] for kebab.
//!
//! Stores per-model Lance tables under `config.storage.vector_dir/`
//! (`chunk_embeddings_<model>_<dim>.lance/`). `upsert` runs the
//! SQLite-first / Lance-second write described in design §5.6 in
//! three phases: phase 1 stages `embedding_records` rows at
//! `status='pending'`, phase 2 issues a Lance `MergeInsert` keyed on
//! `chunk_id`, and phase 3 flips the rows to `status='committed'`.
//! `search` joins against
//! `embedding_records WHERE status='committed'` so partial-write Lance
//! rows never surface to callers; if the process crashes between phase
//! 2 and phase 3 (or phase 2 itself fails), the next `upsert` call
//! retries the still-pending rows idempotently because Lance MergeInsert
//! dedupes on `chunk_id`.
//!
//! Sync / async bridge: `VectorStore` is a sync trait (§7.2) and
//! LanceDB's Rust API is async-only. We own a private current-thread
//! `tokio::runtime::Runtime` and `block_on` per trait method. The
//! tradeoff is documented inline: a multi-thread runtime would let two
//! upserts run concurrently, but kebab-app's job scheduler already
//! serializes vector ops, and current-thread avoids the worker threads
//! a multi-thread runtime spawns by default (one per CPU core).
//!
//! See `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`
//! §5.6 (embedding_records DDL), §6.3 (lancedb table naming),
//! §7.2 (VectorStore), §9 (versioning).
mod arrow_batch;
mod paths;
mod store;
pub use store::{INDEX_VERSION_STR, LanceVectorStore};
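
The staged write described in the crate docs can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the crate's implementation: `Records`, `VectorTable`, `upsert`, `search_visible`, and the `crash_before_commit` flag are hypothetical stand-ins for the SQLite `embedding_records` table, a Lance table, and the real `LanceVectorStore` methods. It uses only the standard library so that the pending/committed bookkeeping is visible without SQLite or LanceDB.

```rust
use std::collections::HashMap;

/// Mirrors the `status` column described for `embedding_records`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Pending,
    Committed,
}

/// Hypothetical stand-in for the SQLite `embedding_records` table.
#[derive(Default)]
struct Records {
    rows: HashMap<String, Status>,
}

/// Hypothetical stand-in for one Lance table, keyed on `chunk_id`.
#[derive(Default)]
struct VectorTable {
    rows: HashMap<String, Vec<f32>>,
}

impl VectorTable {
    /// Imitates a merge-insert keyed on `chunk_id`: re-sending a row
    /// overwrites the existing one instead of duplicating it.
    fn merge_insert(&mut self, batch: &[(String, Vec<f32>)]) {
        for (chunk_id, vector) in batch {
            self.rows.insert(chunk_id.clone(), vector.clone());
        }
    }
}

/// Runs the three phases in order. `crash_before_commit` is an
/// illustrative switch that simulates a crash between phase 2 and phase 3.
fn upsert(
    records: &mut Records,
    table: &mut VectorTable,
    batch: &[(String, Vec<f32>)],
    crash_before_commit: bool,
) -> Result<(), String> {
    // Phase 1: stage bookkeeping rows as pending.
    for (chunk_id, _) in batch {
        records.rows.insert(chunk_id.clone(), Status::Pending);
    }
    // Phase 2: write the vectors, deduplicated on chunk_id.
    table.merge_insert(batch);
    if crash_before_commit {
        return Err("simulated crash before phase 3".to_string());
    }
    // Phase 3: flip the staged rows to committed.
    for (chunk_id, _) in batch {
        records.rows.insert(chunk_id.clone(), Status::Committed);
    }
    Ok(())
}

/// Returns only chunk ids whose bookkeeping row is committed, so vectors
/// left behind by an interrupted write stay hidden from callers.
fn search_visible(records: &Records, table: &VectorTable) -> Vec<String> {
    let mut ids: Vec<String> = table
        .rows
        .keys()
        .filter(|id| records.rows.get(*id) == Some(&Status::Committed))
        .cloned()
        .collect();
    ids.sort();
    ids
}
```

In this model, an interrupted `upsert` leaves a vector in the table but no visible search result. Repeating the same call completes the commit without creating a second vector row, which is the idempotence property the crate docs attribute to keying the merge on `chunk_id`.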