tasks: add P3 component specs (embedder, fastembed, lancedb, hybrid)

2026-04-27 11:59:46 +00:00
parent c044b97a34
commit 5b813ce39e
4 changed files with 496 additions and 0 deletions
--- a/tasks/p3/p3-1-embedder-trait.md
+++ b/tasks/p3/p3-1-embedder-trait.md
@@ -0,0 +1,100 @@
+---
+phase: P3
+component: kb-embed (trait crate)
+task_id: p3-1
+title: "Embedder trait + EmbeddingInput/Kind validation"
+status: planned
+depends_on: [p0-1]
+unblocks: [p3-2, p3-3, p3-4]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_sections: [§3.7 SearchHit.embedding_model, §7.1 EmbeddingInput/Kind, §7.2 Embedder, §11 LLM/embedding split]
+---
+
+# p3-1 — Embedder trait crate
+
+## Goal
+
+Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
+
+## Why now / why this size
+
+Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models.
+
+## Allowed dependencies
+
+- `kb-core`
+- `kb-config`
+- `serde`
+- `thiserror`
+- `tracing`
+
+## Forbidden dependencies
+
+- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+
+## Inputs
+
+| input | type | source |
+|-------|------|--------|
+| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
+| model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction |
+
+## Outputs
+
+| output | type | downstream |
+|--------|------|------------|
+| `Vec<Vec<f32>>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) |
+
+## Public surface (signatures only — no new types)
+
+```rust
+pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
+
+/// Test-only mock that produces deterministic vectors.
+pub struct MockEmbedder { /* internal: model_id, dims, seed */ }
+impl MockEmbedder {
+    pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self;
+}
+impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
+```
+
+## Behavior contract
+
+- `MockEmbedder::embed` produces vectors deterministically from `(text, kind)`: e.g., `vector[i] = hash_to_unit_float(text, kind, i, seed)` so two identical inputs produce identical vectors and different inputs produce nearly-orthogonal vectors. Used by downstream tests.
+- `MockEmbedder` must respect `EmbeddingKind::Document` vs `Query` — different prefix mixed into the hash so query embeddings differ from document embeddings of the same text (mirrors real e5 behavior).
+- `dimensions()` returns the value passed at construction; callers must trust it.
+- Real adapters (p3-2) MUST NOT implement `Embedder` here.
+- The crate may expose a tiny helper `pub fn assert_vector_shape(vecs: &[Vec<f32>], expected_dims: usize)` for downstream tests.
+
+## Storage / wire effects
+
+- None.
+
+## Test plan
+
+| kind | description | fixture / data |
+|------|-------------|----------------|
+| unit | trait dyn dispatch via `Box<dyn Embedder>` works | inline |
+| unit | `MockEmbedder` produces identical vector for identical input | inline |
+| unit | `EmbeddingKind::Document` vs `Query` for same text yield different vectors | inline |
+| unit | dimensions match construction-time value | inline |
+| contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) |
+
+All tests under `cargo test -p kb-embed`.
+
+## Definition of Done
+
+- [ ] `cargo check -p kb-embed` passes
+- [ ] `cargo test -p kb-embed` passes
+- [ ] No external embedding dep present
+- [ ] PR links design §7.2 Embedder, §11
+
+## Out of scope
+
+- Real adapter (`kb-embed-local` is p3-2).
+- Reranker traits (P+).
+
+## Risks / notes
+
+- `MockEmbedder` is for tests; do not let it leak into release builds via default features. Gate behind `cfg(test)` or a `mock` feature flag.
+- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`.
--- a/tasks/p3/p3-2-fastembed-adapter.md
+++ b/tasks/p3/p3-2-fastembed-adapter.md
@@ -0,0 +1,119 @@
+---
+phase: P3
+component: kb-embed-local (fastembed adapter)
+task_id: p3-2
+title: "fastembed-rs Embedder for multilingual-e5-small"
+status: planned
+depends_on: [p3-1]
+unblocks: [p3-3, p3-4]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_sections: [§7.2 Embedder, §11.3 local embedding, §6.4 [models.embedding], §9 versioning]
+---
+
+# p3-2 — fastembed adapter
+
+## Goal
+
+Provide `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default) using `fastembed-rs` (ONNX runtime). Apply Document/Query prefix per §11.3. Honor `batch_size` from config.
+
+## Why now / why this size
+
+First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + dims) downstream. Isolated from store/search so model swaps remain config-only.
+
+## Allowed dependencies
+
+- `kb-core`
+- `kb-config`
+- `kb-embed`
+- `fastembed = "4"` (or current stable)
+- `tokenizers`
+- `ort` (transitive via fastembed)
+- `tracing`
+- `thiserror`
+
+## Forbidden dependencies
+
+- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility)
+
+## Inputs
+
+| input | type | source |
+|-------|------|--------|
+| `kb-config::Config.models.embedding` | settings | runtime |
+| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers |
+| model cache | `data_dir/models/fastembed/` | filesystem |
+
+## Outputs
+
+| output | type | downstream |
+|--------|------|------------|
+| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search |
+| model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |
+
+## Public surface (signatures only — no new types)
+
+```rust
+pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }
+
+impl FastembedEmbedder {
+    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
+}
+
+impl kb_core::Embedder for FastembedEmbedder {
+    fn model_id(&self) -> kb_core::EmbeddingModelId;
+    fn model_version(&self) -> kb_core::EmbeddingVersion;
+    fn dimensions(&self) -> usize;
+    fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
+}
+```
+
+## Behavior contract
+
+- Default model `multilingual-e5-small` (384 dims). `model_id()` returns `EmbeddingModelId("multilingual-e5-small")`.
+- `model_version()` returns `EmbeddingVersion("v1")` initially. Bump per §9 if fastembed upgrades the bundled weights.
+- Apply e5 prefix per §11.3: input prefixed with `"passage: "` for `EmbeddingKind::Document`, `"query: "` for `EmbeddingKind::Query` BEFORE tokenization.
+- Batch processing respects `config.models.embedding.batch_size`. Inputs longer than the batch are split into multiple inference calls and concatenated.
+- L2-normalize each vector before returning (e5 convention).
+- Dimensions must equal `config.models.embedding.dimensions` AND the model's actual dim. Mismatch returns `anyhow::Error` at construction (not at first `embed`).
+- Model files cached under `config.storage.model_dir/fastembed/` (downloaded on first use).
+- Determinism: identical input + identical model version → identical vectors (tolerance < 1e-6 on aggregate hash for snapshot tests).
+- No async runtime: the trait is synchronous. fastembed is sync internally.
+
+## Storage / wire effects
+
+- Reads/writes `data_dir/models/fastembed/` (model cache).
+- Otherwise no DB or wire effects.
+
+## Test plan
+
+| kind | description | fixture / data |
+|------|-------------|----------------|
+| unit | construction with default config returns dims=384 | tmp config |
+| unit | construction with mismatched dims returns error | tmp config |
+| unit | `EmbeddingKind::Query` vs `Document` for same text yield different vectors (cosine < 1.0) | inline |
+| unit | output vectors are L2-normalized (norm ~= 1.0 ± 1e-3) | inline |
+| determinism | identical input twice → identical output (hash-of-floats compare) | inline |
+| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
+| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |
+
+All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
+
+## Definition of Done
+
+- [ ] `cargo check -p kb-embed-local` passes
+- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`)
+- [ ] First-run model download works under `data_dir/models/fastembed/`
+- [ ] No imports outside Allowed dependencies
+- [ ] PR links design §11.3, §6.4, §9
+
+## Out of scope
+
+- Reranker (P+).
+- Other model providers (Ollama embedding endpoint, candle) — separate adapter crates.
+- Visual / image embeddings (P6).
+
+## Risks / notes
+
+- ONNX runtime first-load latency on M-series Macs (Metal) can be 1-2 s; tests share a `OnceCell<FastembedEmbedder>`.
+- Forgetting the e5 prefix silently degrades retrieval quality. Tests must assert query/document yield distinct vectors.
+- Bumping `EmbeddingVersion` invalidates every `embedding_id`. Treat as a versioning event per §9 — provides justification in PR body.
--- a/tasks/p3/p3-3-lancedb-store.md
+++ b/tasks/p3/p3-3-lancedb-store.md
@@ -0,0 +1,132 @@
+---
+phase: P3
+component: kb-store-vector (LanceDB)
+task_id: p3-3
+title: "LanceDB VectorStore + embedding_records writer"
+status: planned
+depends_on: [p3-2, p1-6]
+unblocks: [p3-4]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning]
+---
+
+# p3-3 — LanceDB VectorStore
+
+## Goal
+
+Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk_embeddings_<model>_<dim>.lance`), upserts vectors transactionally with a row in `embedding_records` (SQLite), and serves `search` for the vector retrieval mode.
+
+## Why now / why this size
+
+Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
+
+## Allowed dependencies
+
+- `kb-core`
+- `kb-config`
+- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`)
+- `lancedb`
+- `arrow` (and `arrow-array`, `arrow-schema`)
+- `serde`, `serde_json`
+- `tracing`
+- `thiserror`
+
+## Forbidden dependencies
+
+- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+
+## Inputs
+
+| input | type | source |
+|-------|------|--------|
+| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) |
+| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) |
+| filters | `kb_core::SearchFilters` | `SearchQuery` |
+| `kb-config::Config.storage.vector_dir` | path | runtime |
+
+## Outputs
+
+| output | type | downstream |
+|--------|------|------------|
+| Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches |
+| `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping |
+| `Vec<VectorHit>` | `kb_core::VectorHit` | hybrid retriever (p3-4) |
+
+## Public surface (signatures only — no new types)
+
+```rust
+pub struct LanceVectorStore { /* internal: connection + sqlite handle */ }
+
+impl LanceVectorStore {
+    pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
+}
+
+impl kb_core::VectorStore for LanceVectorStore {
+    fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kb_core::IndexId>;
+    fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>;
+    fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result<Vec<kb_core::VectorHit>>;
+}
+```
+
+## Behavior contract
+
+- Table naming: `chunk_embeddings_<model_id>_<dim>.lance`. Model IDs must be sanitized (replace non `[A-Za-z0-9-]` with `_`) to avoid filesystem issues.
+- `ensure_table` is idempotent: opens existing or creates with explicit Arrow schema:
+  ```
+  chunk_id : Utf8 (primary)
+  doc_id   : Utf8
+  embedding: FixedSizeList<Float32, dim>
+  model_id : Utf8
+  embedding_version : Utf8
+  text     : Utf8
+  heading_path : Utf8
+  created_at : Timestamp(Microsecond, UTC)
+  ```
+- For corpora < 100k rows, no IVF index — flat cosine. Above that threshold, the next migration task (P+) introduces IVF; this task does not.
+- `upsert` is best-effort 2-step (Lance commit, then SQLite `INSERT OR REPLACE INTO embedding_records`). On SQLite failure after Lance commit, log a warning; the next `upsert` reconciles via the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint.
+- Dimension mismatch (record dim ≠ table dim) returns `anyhow::Error` from `upsert` and writes nothing.
+- `search` performs cosine similarity, applies `SearchFilters` post-fetch (filter-then-limit may over-fetch internally — fetch `2 * k` then trim).
+- `VectorHit { chunk_id, score, doc_id, text, heading_path }`; score in [0, 1] (cosine similarity, clamped).
+- `search` returns empty `Vec` (not error) when table absent.
+- `index_id` for `ensure_table` per design §4.2 with `collection = "chunk_embeddings"`, `index_kind = "flat"`, `params_hash = blake3(serde_json(table_schema))`.
+
+## Storage / wire effects
+
+- Writes Lance tables under `data_dir/lancedb/`.
+- Writes/reads `embedding_records` rows.
+- Reads chunks/documents not from this crate (the caller pre-fetches text + heading via `VectorRecord`).
+
+## Test plan
+
+| kind | description | fixture / data |
+|------|-------------|----------------|
+| unit | `ensure_table` creates dir; second call returns same `IndexId` | tmp data_dir |
+| unit | `upsert` of 10 records makes them retrievable via `search` (k=5) | tmp data_dir |
+| unit | dimension mismatch → error, no Lance row written | tmp data_dir |
+| unit | filter `tags_any` removes non-matching docs | tmp data_dir + seeded sqlite tags |
+| unit | model isolation: two models live in two directories with same `chunk_id` | tmp data_dir |
+| unit | search before any upsert returns empty Vec | tmp data_dir |
+| determinism | same query vector + same data → same top-k order | tmp data_dir |
+| snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` |
+
+All tests under `cargo test -p kb-store-vector`.
+
+## Definition of Done
+
+- [ ] `cargo check -p kb-store-vector` passes
+- [ ] `cargo test -p kb-store-vector` passes
+- [ ] No imports outside Allowed dependencies
+- [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch
+- [ ] PR links design §5.6, §6.3, §7.2
+
+## Out of scope
+
+- IVF / PQ index tuning (P+).
+- Image / multimodal vector tables (P6).
+- `kb-app` orchestration of indexing jobs (`embed_index` facade method body).
+
+## Risks / notes
+
+- LanceDB's Rust API requires Arrow batches; constructing them per upsert is allocation-heavy — batch by configurable chunk size to avoid memory spikes.
+- Filter-then-limit can starve `k` results; over-fetch by `2 * k` initially and double on retry up to a cap.
+- WAL stability: ensure Lance commits before SQLite `INSERT INTO embedding_records` to avoid orphan SQLite rows.
--- a/tasks/p3/p3-4-hybrid-fusion.md
+++ b/tasks/p3/p3-4-hybrid-fusion.md
@@ -0,0 +1,145 @@
+---
+phase: P3
+component: kb-search (hybrid)
+task_id: p3-4
+title: "Hybrid Retriever (RRF) over lexical + vector"
+status: planned
+depends_on: [p2-2, p3-3]
+unblocks: [p4-3]
+contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings]
+---
+
+# p3-4 — Hybrid Retriever (RRF)
+
+## Goal
+
+Compose `LexicalRetriever` (p2-2) and a vector retriever wrapper around `LanceVectorStore` (p3-3) into a single `Retriever` that dispatches by `SearchMode`. For `Hybrid`, fuse via Reciprocal Rank Fusion (RRF) and populate full `RetrievalDetail` per `SearchHit`.
+
+## Why now / why this size
+
+Single mediator. Keeps the lexical and vector retrievers focused; only this task knows how to fuse. RAG (p4-3) consumes hybrid output without caring about the underlying retrievers.
+
+## Allowed dependencies
+
+- `kb-core`
+- `kb-config`
+- `kb-store-sqlite` (for `LexicalRetriever`)
+- `kb-store-vector` (for `LanceVectorStore`)
+- `kb-embed` (trait only — for query embedding via `Embedder`)
+- `tracing`
+- `thiserror`
+
+## Forbidden dependencies
+
+- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
+
+## Inputs
+
+| input | type | source |
+|-------|------|--------|
+| `LexicalRetriever` | trait object | constructed elsewhere |
+| `LanceVectorStore` | trait object | constructed elsewhere |
+| `Box<dyn Embedder>` | for query embedding | runtime-injected |
+| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
+| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` |
+
+## Outputs
+
+| output | type | downstream |
+|--------|------|------------|
+| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer |
+
+## Public surface (signatures only — no new types)
+
+```rust
+pub struct HybridRetriever {
+    lexical: std::sync::Arc<dyn kb_core::Retriever>,
+    vector:  std::sync::Arc<dyn kb_core::Retriever>,   // wrapper over LanceVectorStore + Embedder
+    fusion:  FusionPolicy,
+    k:       usize,
+}
+
+pub enum FusionPolicy { Rrf { k_rrf: u32 } }
+
+impl HybridRetriever {
+    pub fn new(
+        config: &kb_config::Config,
+        lexical: std::sync::Arc<dyn kb_core::Retriever>,
+        vector:  std::sync::Arc<dyn kb_core::Retriever>,
+    ) -> Self;
+}
+
+impl kb_core::Retriever for HybridRetriever {
+    fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
+    fn index_version(&self) -> kb_core::IndexVersion;
+}
+
+/// Wrapper that turns a VectorStore + Embedder into a Retriever.
+pub struct VectorRetriever {
+    store:   std::sync::Arc<dyn kb_core::VectorStore>,
+    embed:   std::sync::Arc<dyn kb_core::Embedder>,
+    /* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */
+}
+impl VectorRetriever {
+    pub fn new(store: std::sync::Arc<dyn kb_core::VectorStore>, embed: std::sync::Arc<dyn kb_core::Embedder>, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> Self;
+}
+impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
+```
+
+## Behavior contract
+
+- `SearchMode::Lexical` dispatches solely to `lexical`. `RetrievalDetail.method = Lexical`, `vector_*` fields are `None`.
+- `SearchMode::Vector` dispatches solely to `vector`. `RetrievalDetail.method = Vector`, `lexical_*` fields are `None`.
+- `SearchMode::Hybrid`:
+  - run `lexical.search(query)` and `vector.search(query)` in sequence (fan-out is fine; not required).
+  - fuse with RRF: `score(c) = Σ_{m ∈ {lex, vec}} 1 / (k_rrf + rank_m(c))` where `k_rrf` from config (default 60). `rank_m` is 1-based; chunks not appearing in retriever `m` contribute 0.
+  - sort by fused score DESC, take top `query.k`.
+  - populate every `SearchHit.retrieval`: `method = Hybrid`, `lexical_score` / `lexical_rank` / `vector_score` / `vector_rank` from each retriever's hit (or `None` if absent), `fusion_score` = computed fused score.
+  - if a chunk appears in only one retriever, its `RetrievalDetail` still gets populated with `Some(...)` from that side and `None` for the other.
+  - tie-break by `lexical_rank` ascending, then `chunk_id` ascending (deterministic).
+- `VectorRetriever`:
+  - embeds the query via `embed.embed(&[EmbeddingInput { text: query.text, kind: Query }])`.
+  - calls `VectorStore::search(query_vec, query.k * 2, query.filters)` (over-fetch for filter losses), trims to `k`.
+  - hydrates `doc_path` / `heading_path` / `section_label` / `chunker_version` / `embedding_model` from SQLite by joining on `chunk_id`.
+  - builds `Citation` from chunk's first source span (same logic as p2-2).
+- `index_version()` returns the lexical index version when in pure lexical mode, else the vector index version, else "hybrid:<lex_iv>+<vec_iv>".
+
+## Storage / wire effects
+
+- Reads only. No mutations.
+- Output JSON conforms to `search_hit.v1`.
+
+## Test plan
+
+| kind | description | fixture / data |
+|------|-------------|----------------|
+| unit | pure lexical mode delegates 1:1 to `lexical.search` | mock retrievers |
+| unit | pure vector mode delegates 1:1 to `vector.search` | mock retrievers |
+| unit | hybrid: chunk only in lexical receives `vector_*: None`, but still has a fused score | mock retrievers |
+| unit | RRF formula matches expected with `k_rrf=60` | inline math test |
+| unit | tie-break deterministic (same fused score → stable order) | inline |
+| unit | hybrid recall ≥ max(lexical recall, vector recall) on a tiny corpus where each mode finds disjoint hits | tmp DB + Lance + MockEmbedder |
+| determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB |
+| snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` |
+
+All tests under `cargo test -p kb-search hybrid`.
+
+## Definition of Done
+
+- [ ] `cargo check -p kb-search` passes
+- [ ] `cargo test -p kb-search hybrid` passes
+- [ ] No imports outside Allowed dependencies
+- [ ] PR links design §3.7, §6.4 search, §0 Q3
+
+## Out of scope
+
+- Reranker (P+).
+- Multimodal retrieval (image/audio) — P6+.
+- Score calibration across modes (RRF makes scores rank-comparable; absolute calibration is P+).
+
+## Risks / notes
+
+- Mismatched `index_version` between lexical and vector should be flagged at construction so users notice stale indexes.
+- Over-fetching at the vector retriever (`2 * k`) is conservative; if filters reject everything, the hybrid `k` may shrink. Document this in CLI `--explain`.
+- RRF is rank-based, so absolute lexical bm25 normalization (p2-2) doesn't affect fused order; still keep normalization for `--explain` readability.