kebab/tasks/p3/p3-2-fastembed-adapter.md
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path notation `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. Both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- design doc / initial report / SMOKE / HOTFIXES / phase epic / task
  spec: all references unified.
- The `git -c user.name=kb` in task-decomposition.md is author
  information recorded from past git history, so it is kept as-is (the
  author in the actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` as
  well (not yet reflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (every renamed file is updated to
  its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00

---
phase: P3
component: kebab-embed-local (fastembed adapter)
task_id: p3-2
title: "fastembed-rs Embedder for multilingual-e5-small"
status: completed
depends_on: [p3-1]
unblocks: [p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]
---
# p3-2 — fastembed adapter
## Goal
Provide `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default) using `fastembed-rs` (ONNX runtime). Apply Document/Query prefix per §11.3. Honor `batch_size` from config.
## Why now / why this size
First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + dims) downstream. Isolated from store/search so model swaps remain config-only.
## Allowed dependencies
- `kebab-core`
- `kebab-config`
- `kebab-embed`
- `fastembed = "4"` (or current stable)
- `tokenizers`
- `ort` (transitive via fastembed)
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility)
## Inputs
| input | type | source |
|-------|------|--------|
| `kebab_config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `&[kebab_core::EmbeddingInput<'_>]` | callers |
| model cache | `data_dir/models/fastembed/` | filesystem |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search |
| model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |
## Public surface (signatures only — no new types)
```rust
pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }

impl FastembedEmbedder {
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kebab_core::Embedder for FastembedEmbedder {
    fn model_id(&self) -> kebab_core::EmbeddingModelId;
    fn model_version(&self) -> kebab_core::EmbeddingVersion;
    fn dimensions(&self) -> usize;
    fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
}
```
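
Because `new` returns a `Result`, configuration problems can surface at construction time. The following is a minimal sketch of that idea using only std types. `EmbeddingSettings`, its `model` field, `CheckedEmbedder`, and `actual_dimensions` are hypothetical stand-ins, not the real `kebab_config` or `fastembed` API, and `String` replaces `anyhow::Error` to keep the block dependency-free.

```rust
/// Hypothetical stand-in for the `[models.embedding]` settings.
pub struct EmbeddingSettings {
    pub model: String,     // assumed field name
    pub dimensions: usize, // mirrors `config.models.embedding.dimensions`
}

/// Hypothetical stand-in for the adapter; holds only model metadata.
pub struct CheckedEmbedder {
    model_id: String,
    dimensions: usize,
}

/// Placeholder lookup. A real adapter would ask the loaded model instead.
fn actual_dimensions(model: &str) -> Option<usize> {
    match model {
        "multilingual-e5-small" => Some(384),
        _ => None,
    }
}

impl CheckedEmbedder {
    /// Fails at construction when configured and actual dimensions disagree.
    pub fn new(settings: &EmbeddingSettings) -> Result<Self, String> {
        let actual = actual_dimensions(&settings.model)
            .ok_or_else(|| format!("unknown model: {}", settings.model))?;
        if settings.dimensions != actual {
            return Err(format!(
                "configured dimensions {} != model dimensions {}",
                settings.dimensions, actual
            ));
        }
        Ok(Self {
            model_id: settings.model.clone(),
            dimensions: actual,
        })
    }

    pub fn model_id(&self) -> &str {
        &self.model_id
    }

    pub fn dimensions(&self) -> usize {
        self.dimensions
    }
}
```

With this shape, a config that asks for 768 dimensions from a 384-dimension model is rejected before any `embed` call is made.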
## Behavior contract
- Default model `multilingual-e5-small` (384 dims). `model_id()` returns `EmbeddingModelId("multilingual-e5-small")`.
- `model_version()` returns `EmbeddingVersion("v1")` initially. Bump per §9 if fastembed upgrades the bundled weights.
- Apply e5 prefix per §11.3: input prefixed with `"passage: "` for `EmbeddingKind::Document`, `"query: "` for `EmbeddingKind::Query` BEFORE tokenization.
- Batch processing respects `config.models.embedding.batch_size`. An input slice longer than `batch_size` is split into multiple inference calls, and the results are concatenated in input order so the output stays row-aligned.
- L2-normalize each vector before returning (e5 convention).
- `dimensions()` must equal both `config.models.embedding.dimensions` and the model's actual output dimension. A mismatch returns `anyhow::Error` at construction, not at the first `embed` call.
- Model files cached under `config.storage.model_dir/fastembed/` (downloaded on first use).
- Determinism: identical input + identical model version → identical vectors (tolerance < 1e-6 on aggregate hash for snapshot tests).
- No async runtime: the trait is synchronous. fastembed is sync internally.
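
The prefix, batching, and normalization rules above can be sketched with std-only code. This is an illustration under stated assumptions, not the adapter itself: `EmbeddingKind` is a local copy of the `kebab_core` enum, the `infer` closure stands in for the real model call, and the function names `apply_e5_prefix`, `l2_normalize`, and `embed_in_batches` are hypothetical.

```rust
/// Local stand-in for `kebab_core::EmbeddingKind` (assumed shape).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EmbeddingKind {
    Document,
    Query,
}

/// Prepend the e5 prefix before the text reaches the tokenizer.
pub fn apply_e5_prefix(kind: EmbeddingKind, text: &str) -> String {
    let prefix = match kind {
        EmbeddingKind::Document => "passage: ",
        EmbeddingKind::Query => "query: ",
    };
    format!("{prefix}{text}")
}

/// Scale a vector to unit L2 norm in place; all-zero vectors are left untouched.
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Run `infer` over chunks of at most `batch_size` inputs and concatenate
/// the rows in input order. `infer` is a placeholder for the model call.
pub fn embed_in_batches<F>(
    inputs: &[(EmbeddingKind, &str)],
    batch_size: usize,
    mut infer: F,
) -> Result<Vec<Vec<f32>>, String>
where
    F: FnMut(&[String]) -> Result<Vec<Vec<f32>>, String>,
{
    if batch_size == 0 {
        return Err("batch_size must be > 0".to_string());
    }
    let mut out = Vec::with_capacity(inputs.len());
    for chunk in inputs.chunks(batch_size) {
        let texts: Vec<String> = chunk
            .iter()
            .map(|(kind, text)| apply_e5_prefix(*kind, text))
            .collect();
        let mut rows = infer(&texts)?;
        if rows.len() != texts.len() {
            return Err(format!("expected {} rows, got {}", texts.len(), rows.len()));
        }
        for row in rows.iter_mut() {
            l2_normalize(row);
        }
        out.extend(rows);
    }
    Ok(out)
}
```

Keeping the model call behind a closure means the prefix and batching logic can be unit-tested without downloading model weights.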
## Storage / wire effects
- Reads/writes `data_dir/models/fastembed/` (model cache).
- Otherwise no DB or wire effects.
## Test plan
| kind | description | fixture / data |
|------|-------------|----------------|
| unit | construction with default config returns dims=384 | tmp config |
| unit | construction with mismatched dims returns error | tmp config |
| unit | `EmbeddingKind::Query` vs `Document` for same text yield different vectors (cosine < 1.0) | inline |
| unit | output vectors are L2-normalized (norm = 1.0 ± 1e-3) | inline |
| determinism | identical input twice → identical output (hash-of-floats compare) | inline |
| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |
All tests run under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run them via `cargo test -- --ignored` in a dedicated CI lane.
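
The hash-of-floats and norm checks in the table could be backed by small helpers like the ones below. This is a sketch under assumptions: the helper names are hypothetical, and FNV-1a over exact `f32` bit patterns is one possible choice of aggregate hash, picked because its output does not depend on the Rust toolchain. It is stricter than a numeric tolerance, since any bit-level difference changes the hash.

```rust
/// FNV-1a (64-bit) over each row's length and the little-endian
/// bit pattern of every f32 in that row.
pub fn hash_vectors(rows: &[Vec<f32>]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn feed(mut hash: u64, bytes: &[u8]) -> u64 {
        for byte in bytes {
            hash ^= u64::from(*byte);
            hash = hash.wrapping_mul(PRIME);
        }
        hash
    }

    let mut hash = OFFSET_BASIS;
    for row in rows {
        // Mixing in the length keeps row boundaries part of the hash.
        hash = feed(hash, &(row.len() as u64).to_le_bytes());
        for value in row {
            hash = feed(hash, &value.to_bits().to_le_bytes());
        }
    }
    hash
}

/// Euclidean length of a vector.
pub fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// True when the vector's norm is within `tol` of 1.0.
pub fn is_unit_norm(v: &[f32], tol: f32) -> bool {
    (l2_norm(v) - 1.0).abs() <= tol
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let denom = l2_norm(a) * l2_norm(b);
    if denom > 0.0 { dot / denom } else { 0.0 }
}
```

A determinism test would embed the same inputs twice and compare `hash_vectors` of both results; the query/document test would assert `cosine` of the two vectors is below 1.0.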
## Definition of Done
- [ ] `cargo check -p kebab-embed-local` passes
- [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`)
- [ ] First-run model download works under `data_dir/models/fastembed/`
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.3, §6.4, §9
## Out of scope
- Reranker (P+).
- Other model providers (Ollama embedding endpoint, candle) — separate adapter crates.
- Visual / image embeddings (P6).
## Risks / notes
- ONNX runtime first-load latency on M-series Macs (Metal) can be 1-2 s, so tests share one instance through a `OnceCell<FastembedEmbedder>`.
- Forgetting the e5 prefix silently degrades retrieval quality. Tests must assert query/document yield distinct vectors.
- Bumping `EmbeddingVersion` invalidates every `embedding_id`. Treat it as a versioning event per §9 and provide justification in the PR body.
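
Sharing one embedder across tests, as the first note suggests, might look like the sketch below. It uses `std::sync::OnceLock` (the standard library's thread-safe once-cell) rather than a specific `OnceCell` crate. `SlowEmbedder`, `CONSTRUCTIONS`, and `shared_embedder` are hypothetical stand-ins, and the pattern assumes the real embedder is `Send + Sync`, which this spec does not confirm.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts constructions so the sharing is observable in a test.
static CONSTRUCTIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an embedder that is expensive to build.
pub struct SlowEmbedder {
    dims: usize,
}

impl SlowEmbedder {
    fn new() -> Self {
        // A real adapter would load the ONNX model here.
        CONSTRUCTIONS.fetch_add(1, Ordering::SeqCst);
        SlowEmbedder { dims: 384 }
    }

    pub fn dimensions(&self) -> usize {
        self.dims
    }
}

/// Returns the process-wide instance, building it on first use only.
pub fn shared_embedder() -> &'static SlowEmbedder {
    static SHARED: OnceLock<SlowEmbedder> = OnceLock::new();
    SHARED.get_or_init(SlowEmbedder::new)
}
```

Every test that calls `shared_embedder()` pays the load cost at most once per test binary, since `get_or_init` runs the constructor only for the first caller.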