kebab/tasks/p3/p3-2-fastembed-adapter.md
altair823 d1b99b2994 docs: mark P0–P4 done, add SMOKE recipe, refresh README
State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20
--config across subcommands, PR #24 --config in kb ask, PR #25 RRF
fusion_score normalization). README still framed the project as
"spec frozen, code 0 lines"; phase docs and task specs all carried
status: planned. Sweep:

- README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes
  applied"; command table marks each subcommand's owning phase and
  current status (kb ask =  via P4-3, kb eval =  P5);
  phase roadmap table grew a Status column (P0–P4 completed, P5
  next, P6–P9 pending); component count bumped 30 → 31 to reflect
  P3-5 (app-wiring, post-spec); core decisions table notes the
  RRF [0,1] normalization invariant; build+run section drops the
  "P0 not started" caveat; new pointers to HOTFIXES.md and SMOKE.md.
- docs/SMOKE.md (new): ~80-line recipe for running the full
  pipeline against an isolated /tmp/kb-smoke/ workspace via
  --config, without polluting ~/.config/kb/ or
  ~/.local/share/kb/. Covers fixture seeding, sample config.toml
  with the post-merge defaults, doctor → ingest → list →
  search × 3 modes → inspect → ask sequence, verification
  checklist, and known-behaviour notes (fastembed model
  download, RAG response time, --config hard-fail on missing
  path).
- tasks/phase-{0..4}-*.md: status frontmatter flipped planned →
  completed.
- tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same
  status flip across all 17 component task specs (1+6+2+5+3).
  P5–P9 stay planned.

cargo test --workspace: 319 passed; clippy clean (no source
changes in this commit, just docs + frontmatter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:32:28 +00:00


phase: P3
component: kb-embed-local (fastembed adapter)
task_id: p3-2
title: "fastembed-rs Embedder for multilingual-e5-small"
status: completed
depends_on: [p3-1]
unblocks: [p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]

p3-2 — fastembed adapter

Goal

Provide FastembedEmbedder implementing Embedder for multilingual-e5-small (default) using fastembed-rs (ONNX runtime). Apply Document/Query prefix per §11.3. Honor batch_size from config.

Why now / why this size

First real Embedder. Drives EmbeddingId recipe (model_id + model_version + dims) downstream. Isolated from store/search so model swaps remain config-only.

Allowed dependencies

  • kb-core
  • kb-config
  • kb-embed
  • fastembed = "4" (or current stable)
  • tokenizers
  • ort (transitive via fastembed)
  • tracing
  • thiserror

Forbidden dependencies

  • kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-*, kb-search, kb-llm*, kb-rag, kb-tui, kb-desktop, network HTTP libs (model download is fastembed's responsibility)

Inputs

| input | type | source |
|---|---|---|
| kb-config::Config.models.embedding | settings | runtime |
| EmbeddingInput[..] | kb_core::EmbeddingInput<'_>[] | callers |
| model cache | data_dir/models/fastembed/ | filesystem |

Outputs

| output | type | downstream |
|---|---|---|
| Vec<Vec<f32>> | row-aligned, dimensions = 384 | kb-store-vector, query vectors for hybrid search |
| model identity | (EmbeddingModelId, EmbeddingVersion, usize) | record fields, embedding_id recipe |

Public surface (signatures only — no new types)

pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }

impl FastembedEmbedder {
    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
}

impl kb_core::Embedder for FastembedEmbedder {
    fn model_id(&self) -> kb_core::EmbeddingModelId;
    fn model_version(&self) -> kb_core::EmbeddingVersion;
    fn dimensions(&self) -> usize;
    fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
}
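
The constructor returns a Result so that configuration problems, such as the dimension mismatch described in the behavior contract, can surface before any model is loaded. A minimal sketch of that check follows, under stated assumptions: `EmbeddingSettings`, `native_dimensions`, and `validate_dimensions` are hypothetical names, and the error type is a plain `String` instead of `anyhow::Error` to keep the example dependency-free.

```rust
/// Hypothetical slice of the `[models.embedding]` settings; field names are assumed.
pub struct EmbeddingSettings {
    pub model: String,
    pub dimensions: usize,
}

/// Output width of the models this sketch knows about.
fn native_dimensions(model: &str) -> Option<usize> {
    match model {
        "multilingual-e5-small" => Some(384),
        _ => None,
    }
}

/// Check the configured dimensions against the model's own width.
/// Intended to run inside the constructor, so a mismatch fails at
/// construction and never at the first `embed` call.
pub fn validate_dimensions(settings: &EmbeddingSettings) -> Result<usize, String> {
    let actual = native_dimensions(&settings.model)
        .ok_or_else(|| format!("unsupported embedding model: {}", settings.model))?;
    if settings.dimensions != actual {
        return Err(format!(
            "configured dimensions {} do not match {} ({})",
            settings.dimensions, settings.model, actual
        ));
    }
    Ok(actual)
}
```

In the real adapter the same check would presumably be repeated against the width reported by the loaded model, since the table above only encodes what this sketch assumes.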

Behavior contract

  • Default model multilingual-e5-small (384 dims). model_id() returns EmbeddingModelId("multilingual-e5-small").
  • model_version() returns EmbeddingVersion("v1") initially. Bump per §9 if fastembed upgrades the bundled weights.
  • Apply e5 prefix per §11.3: input prefixed with "passage: " for EmbeddingKind::Document, "query: " for EmbeddingKind::Query BEFORE tokenization.
  • Batch processing respects config.models.embedding.batch_size. An input slice longer than batch_size is split into multiple inference calls, and the results are concatenated in input order so the output stays row-aligned.
  • L2-normalize each vector before returning (e5 convention).
  • Dimensions must equal config.models.embedding.dimensions AND the model's actual dim. Mismatch returns anyhow::Error at construction (not at first embed).
  • Model files cached under config.storage.model_dir/fastembed/ (downloaded on first use).
  • Determinism: identical input + identical model version → identical vectors (tolerance < 1e-6 on aggregate hash for snapshot tests).
  • No async runtime: the trait is synchronous. fastembed is sync internally.
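
The prefix, batching, and normalization bullets above can be sketched together as follows. This is a minimal sketch under stated assumptions, not the crate's implementation: `EmbeddingKind` and `EmbeddingInput` are local stand-ins for the kb-core types (the field names `text` and `kind` are assumed), and `infer` is a hypothetical closure standing in for the fastembed call.

```rust
/// Stand-in for `kb_core::EmbeddingKind` (assumed shape).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EmbeddingKind {
    Document,
    Query,
}

/// Stand-in for `kb_core::EmbeddingInput<'_>`; field names are hypothetical.
pub struct EmbeddingInput<'a> {
    pub text: &'a str,
    pub kind: EmbeddingKind,
}

/// Prepend the e5 prefix before the text reaches the tokenizer.
pub fn with_e5_prefix(input: &EmbeddingInput<'_>) -> String {
    let prefix = match input.kind {
        EmbeddingKind::Document => "passage: ",
        EmbeddingKind::Query => "query: ",
    };
    format!("{}{}", prefix, input.text)
}

/// Scale a vector to unit L2 norm in place; an all-zero vector is left as is.
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Run `infer` over chunks of at most `batch_size` inputs and concatenate
/// the normalized rows in input order. A `batch_size` of 0 is treated as 1.
pub fn embed_batched<F>(
    inputs: &[EmbeddingInput<'_>],
    batch_size: usize,
    mut infer: F,
) -> Vec<Vec<f32>>
where
    F: FnMut(&[String]) -> Vec<Vec<f32>>,
{
    let mut out = Vec::with_capacity(inputs.len());
    for chunk in inputs.chunks(batch_size.max(1)) {
        let texts: Vec<String> = chunk.iter().map(|i| with_e5_prefix(i)).collect();
        let mut rows = infer(&texts);
        for row in rows.iter_mut() {
            l2_normalize(row);
        }
        out.extend(rows);
    }
    out
}
```

Passing the model call in as a closure keeps the splitting and normalization logic testable without downloading weights; the real adapter would call the model directly and propagate its errors.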

Storage / wire effects

  • Reads/writes data_dir/models/fastembed/ (model cache).
  • Otherwise no DB or wire effects.

Test plan

| kind | description | fixture / data |
|---|---|---|
| unit | construction with default config returns dims=384 | tmp config |
| unit | construction with mismatched dims returns error | tmp config |
| unit | EmbeddingKind::Query vs Document for same text yield different vectors (cosine < 1.0) | inline |
| unit | output vectors are L2-normalized (norm ~= 1.0 ± 1e-3) | inline |
| determinism | identical input twice → identical output (hash-of-floats compare) | inline |
| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via #[ignore]) |
| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | fixtures/embed/known-sentences.json |

All tests run under cargo test -p kb-embed-local. Mark slow tests #[ignore] and run them via cargo test -- --ignored in a dedicated CI lane.
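
The norm, cosine, and hash-of-floats checks in the test plan need a few small helpers. The following is one possible sketch; the function names are hypothetical, and FNV-1a is chosen here only because it is easy to write without dependencies and its output does not change between Rust releases, which matters for a value stored in a fixture.

```rust
/// FNV-1a (64-bit) over the little-endian bit patterns of every float,
/// row by row. Any single-bit change in any value changes the result.
pub fn hash_of_floats(vectors: &[Vec<f32>]) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET;
    for row in vectors {
        for value in row {
            for byte in value.to_bits().to_le_bytes() {
                hash ^= u64::from(byte);
                hash = hash.wrapping_mul(PRIME);
            }
        }
    }
    hash
}

/// Euclidean length of a vector.
pub fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Cosine similarity; callers are expected to pass non-zero vectors
/// of equal length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    dot / (l2_norm(a) * l2_norm(b))
}
```

A bit-exact hash is stricter than a numeric tolerance: if vectors can differ in the last few bits across hosts, the snapshot test may need to round values before hashing.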

Definition of Done

  • cargo check -p kb-embed-local passes
  • cargo test -p kb-embed-local passes (excluding #[ignore])
  • First-run model download works under data_dir/models/fastembed/
  • No imports outside Allowed dependencies
  • PR links design §11.3, §6.4, §9

Out of scope

  • Reranker (P+).
  • Other model providers (Ollama embedding endpoint, candle) — separate adapter crates.
  • Visual / image embeddings (P6).

Risks / notes

  • ONNX runtime first-load latency on M-series Macs (Metal) can be 1-2 s; tests share a OnceCell<FastembedEmbedder>.
  • Forgetting the e5 prefix silently degrades retrieval quality. Tests must assert query/document yield distinct vectors.
  • Bumping EmbeddingVersion invalidates every embedding_id. Treat it as a versioning event per §9 and provide justification in the PR body.
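
The shared-instance note in the first risk above can be sketched with the standard library. This is an assumption-laden illustration: it uses `std::sync::OnceLock` (the thread-safe std counterpart of a once-cell, needed because tests run on multiple threads), and `SlowEmbedder`, `shared_embedder`, and `load_count` are hypothetical stand-ins for the real embedder and its test helper.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the expensive-to-build `FastembedEmbedder`.
pub struct SlowEmbedder {
    pub dimensions: usize,
}

/// Counts how many times the embedder was actually constructed.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// One embedder for the whole test binary, built on first use.
static SHARED: OnceLock<SlowEmbedder> = OnceLock::new();

/// Return the shared instance, constructing it only on the first call.
pub fn shared_embedder() -> &'static SlowEmbedder {
    SHARED.get_or_init(|| {
        LOADS.fetch_add(1, Ordering::SeqCst);
        SlowEmbedder { dimensions: 384 }
    })
}

/// Number of constructions so far; stays at 1 after the first call.
pub fn load_count() -> usize {
    LOADS.load(Ordering::SeqCst)
}
```

With a fallible constructor such as `FastembedEmbedder::new`, the initializer closure would have to unwrap or `expect` the result, which is usually acceptable inside tests.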