Implement VectorStore over LanceDB (embedded). It stores one table per embedding model (chunk_embeddings_<model>_<dim>.lance), upserts vectors together with a matching row in embedding_records (SQLite) as a best-effort two-step write, and serves search for the vector retrieval mode.
Why now / why this size
Closes the loop from chunk to vector. It splits cleanly from kb-search, so hybrid retrieval (p3-4) can compose the lexical and vector retrievers without leaking storage details.
Allowed dependencies
kb-core
kb-config
kb-store-sqlite (only for writing/reading rows in embedding_records)
lancedb
arrow (and arrow-array, arrow-schema)
serde, serde_json
tracing
thiserror
Forbidden dependencies
kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-embed* (this crate receives Vec<f32> as input only; no embedding logic here), kb-search, kb-llm*, kb-rag, kb-tui, kb-desktop
Inputs
| input | type | source |
| --- | --- | --- |
| VectorRecord[..] | kb_core::VectorRecord | kb-app::embed_index (P3 facade) |
| query vector | &[f32] | kb-embed-local (Embedder::embed for query) |
| filters | kb_core::SearchFilters | SearchQuery |
| kb-config::Config.storage.vector_dir | path | runtime |
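The inputs above suggest a small trait surface: upsert takes a batch of records, search takes a query vector, filters and k. The sketch below is std-only and every type in it (VectorRecord fields, a field-less SearchFilters, SearchHit, StoreError, CannedStore, top_chunk_ids) is a hypothetical stand-in, not the real kb_core definitions. It shows how a hybrid retriever could depend on `&dyn VectorStore` without seeing Lance or SQLite.

```rust
// Hypothetical stand-ins for kb_core::VectorRecord, kb_core::SearchFilters and the
// store's result/error types; the real definitions may differ.
#[derive(Debug, Clone)]
pub struct VectorRecord {
    pub chunk_id: String,
    pub model_id: String,
    pub model_version: String,
    pub vector: Vec<f32>, // dimensions = vector.len()
}

/// Filter fields are elided in this sketch.
#[derive(Debug, Clone, Default)]
pub struct SearchFilters;

#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub chunk_id: String,
    pub score: f32,
}

#[derive(Debug)]
pub enum StoreError {
    DimensionMismatch { expected: usize, got: usize },
}

/// The surface a hybrid retriever would depend on; Lance and SQLite stay behind it.
pub trait VectorStore {
    fn upsert(&mut self, records: &[VectorRecord]) -> Result<(), StoreError>;
    fn search(
        &self,
        query: &[f32],
        filters: &SearchFilters,
        k: usize,
    ) -> Result<Vec<SearchHit>, StoreError>;
}

/// A caller that only sees `&dyn VectorStore`, as hybrid retrieval might.
pub fn top_chunk_ids(
    store: &dyn VectorStore,
    query: &[f32],
    k: usize,
) -> Result<Vec<String>, StoreError> {
    let hits = store.search(query, &SearchFilters::default(), k)?;
    Ok(hits.into_iter().map(|h| h.chunk_id).collect())
}

/// Test double standing in for the Lance-backed store: returns canned hits.
pub struct CannedStore {
    pub hits: Vec<SearchHit>,
}

impl VectorStore for CannedStore {
    fn upsert(&mut self, _records: &[VectorRecord]) -> Result<(), StoreError> {
        Ok(())
    }

    fn search(
        &self,
        _query: &[f32],
        _filters: &SearchFilters,
        k: usize,
    ) -> Result<Vec<SearchHit>, StoreError> {
        Ok(self.hits.iter().take(k).cloned().collect())
    }
}
```

Keeping the trait free of Lance and Arrow types is what lets p3-4 swap in a different backend or a test double like CannedStore.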
Outputs
| output | type | downstream |
| --- | --- | --- |
| Lance tables under vector_dir/chunk_embeddings_<model>_<dim>.lance/ | | |
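One way to derive the per-model table location from Config.storage.vector_dir is sketched below. The function name `table_path` is hypothetical, and so is the sanitization rule: it assumes characters outside `[A-Za-z0-9_]` in the model name are replaced with `_` so a name like "bge-small" stays a single filesystem-safe path segment.

```rust
use std::path::{Path, PathBuf};

/// Builds `<vector_dir>/chunk_embeddings_<model>_<dim>.lance`.
/// Assumption: non-alphanumeric characters (other than '_') in the model name
/// are mapped to '_' before formatting.
pub fn table_path(vector_dir: &Path, model: &str, dim: usize) -> PathBuf {
    let safe: String = model
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    vector_dir.join(format!("chunk_embeddings_{safe}_{dim}.lance"))
}
```

Putting the dimension in the name means a model re-released at a different dimension gets a fresh table instead of failing the dimension check on every upsert.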
For corpora under 100k rows, build no IVF index and search with a flat (exhaustive) cosine scan. Above that threshold, the next migration task (P+) introduces IVF; this task does not.
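Without an index, search has to score every row. The std-only sketch below shows what a flat cosine scan computes; in the crate this work would be done by the Lance query rather than by hand, and the names `cosine` and `flat_top_k` are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na.sqrt() * nb.sqrt())
    }
}

/// Exhaustive (flat) scan: score every row, sort by descending similarity, keep k.
pub fn flat_top_k(rows: &[(String, Vec<f32>)], query: &[f32], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = rows
        .iter()
        .map(|(id, v)| (id.clone(), cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The cost is O(rows × dim) per query, which is why the spec defers an approximate IVF index until the corpus passes the 100k-row threshold.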
upsert is a best-effort two-step write: first the Lance commit, then INSERT OR REPLACE INTO embedding_records in SQLite. If the SQLite write fails after the Lance commit has landed, log a warning; the next upsert of the same record reconciles the row through the UNIQUE(chunk_id, model_id, model_version, dimensions) constraint.
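The failure path above can be modelled with two in-memory maps. In this hypothetical sketch (`TwoStepStore`, `RecordKey` and the `fail_sqlite` hook are illustrative names), a HashMap stands in for the Lance table, a HashSet keyed like the UNIQUE constraint stands in for embedding_records, and `eprintln!` stands in for the `tracing` warning.

```rust
use std::collections::{HashMap, HashSet};

/// Mirrors UNIQUE(chunk_id, model_id, model_version, dimensions) on embedding_records.
pub type RecordKey = (String, String, String, usize);

/// Hypothetical in-memory stand-ins for the two stores.
#[derive(Default)]
pub struct TwoStepStore {
    /// Stands in for the Lance table: one vector per key, overwritten on re-upsert.
    pub lance: HashMap<RecordKey, Vec<f32>>,
    /// Stands in for embedding_records: re-inserting an existing key leaves one row,
    /// like INSERT OR REPLACE on the UNIQUE key.
    pub sqlite: HashSet<RecordKey>,
    /// Test hook that makes the SQLite step fail.
    pub fail_sqlite: bool,
}

impl TwoStepStore {
    /// Step 1 commits the vector; step 2 writes the bookkeeping row.
    /// A step-2 failure is only logged, so the caller is not told to retry;
    /// the next upsert of the same key fills the missing row.
    pub fn upsert(&mut self, key: RecordKey, vector: Vec<f32>) {
        self.lance.insert(key.clone(), vector);
        if self.fail_sqlite {
            eprintln!("warn: embedding_records write failed for {key:?}; next upsert reconciles");
            return;
        }
        self.sqlite.insert(key);
    }
}
```

The trade-off is a window in which Lance holds a vector that embedding_records does not list; a true two-store transaction is not available across LanceDB and SQLite, which is presumably why the spec settles for idempotent replay.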
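A dimension mismatch (record dim ≠ table dim) makes upsert return an error and write nothing to either store. The error should be a thiserror-derived type rather than anyhow::Error, since anyhow is not among the allowed dependencies.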
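"Writes nothing" implies validating the whole batch before the first write. The sketch below assumes that; `UpsertError`, `check_dims` and `upsert_batch` are hypothetical names, a Vec stands in for the Lance table, and Display/Error are written by hand only so the example needs no crates (in the crate, `#[derive(thiserror::Error)]` would generate them).

```rust
use std::fmt;

// In the crate this enum would derive thiserror::Error; written by hand here.
#[derive(Debug, PartialEq)]
pub enum UpsertError {
    DimensionMismatch { chunk_id: String, expected: usize, got: usize },
}

impl fmt::Display for UpsertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpsertError::DimensionMismatch { chunk_id, expected, got } => write!(
                f,
                "chunk {chunk_id}: vector has {got} dims, table expects {expected}"
            ),
        }
    }
}

impl std::error::Error for UpsertError {}

/// Checks every record up front so one bad record means nothing is written.
pub fn check_dims(table_dim: usize, batch: &[(String, Vec<f32>)]) -> Result<(), UpsertError> {
    for (chunk_id, v) in batch {
        if v.len() != table_dim {
            return Err(UpsertError::DimensionMismatch {
                chunk_id: chunk_id.clone(),
                expected: table_dim,
                got: v.len(),
            });
        }
    }
    Ok(())
}

/// Appends to an in-memory Vec standing in for the Lance table, only after validation.
pub fn upsert_batch(
    table: &mut Vec<(String, Vec<f32>)>,
    table_dim: usize,
    batch: &[(String, Vec<f32>)],
) -> Result<(), UpsertError> {
    check_dims(table_dim, batch)?;
    table.extend_from_slice(batch);
    Ok(())
}
```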
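search ranks rows by cosine similarity and applies SearchFilters after the fetch. Because filtering happens after the nearest-neighbor query, it over-fetches internally: fetch 2 * k candidates, filter them, then trim to k. If the filters reject more than k of those candidates, fewer than k hits come back.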
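The over-fetch, filter, trim sequence can be sketched as follows. Everything here is hypothetical: `nearest` is a closure standing in for the Lance nearest-neighbor query (returning candidates best-first), and `PrefixFilter` is an illustrative filter in place of the real kb_core::SearchFilters.

```rust
/// Hypothetical filter: keep hits whose chunk_id starts with a prefix.
pub struct PrefixFilter(pub String);

/// Post-filtering with over-fetch: request 2 * k candidates from the index,
/// drop the ones the filter rejects, then keep at most k (possibly fewer).
pub fn search_post_filtered<F>(nearest: F, filter: &PrefixFilter, k: usize) -> Vec<(String, f32)>
where
    F: Fn(usize) -> Vec<(String, f32)>,
{
    nearest(k.saturating_mul(2))
        .into_iter()
        .filter(|(id, _)| id.starts_with(filter.0.as_str()))
        .take(k)
        .collect()
}
```

A 2x factor is a heuristic: it keeps the query cheap but cannot guarantee k results for selective filters, so callers should treat a short result list as normal rather than as an error.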