Compare commits
26 Commits
spec/fb-42
...
v0.6.0
| Author | SHA1 | Date | |
|---|---|---|---|
| 72798bd3ff | |||
|
|
c3177561b9 | ||
| a465b71f99 | |||
|
|
787007172a | ||
|
|
b954e9ce66 | ||
|
|
c62a8ff503 | ||
|
|
69c94b6692 | ||
|
|
d5321701ea | ||
|
|
2c3461c465 | ||
| 240120ee80 | |||
|
|
5870a1de15 | ||
|
|
f00fb376fe | ||
|
|
bb0ec0469f | ||
|
|
f303c76f52 | ||
|
|
cd5b1e3bfc | ||
| 3a9a52326d | |||
|
|
b53376e96e | ||
|
|
441f1192ee | ||
|
|
e8da415624 | ||
|
|
d8e5f35601 | ||
|
|
6ab0d782ef | ||
|
|
2bbe94eb05 | ||
|
|
9ac13fa256 | ||
|
|
67f2c16cc2 | ||
|
|
1ebbd6b711 | ||
|
|
892175d009 |
44
Cargo.lock
generated
44
Cargo.lock
generated
@@ -3525,7 +3525,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-app"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"base64 0.22.1",
|
||||
@@ -3569,7 +3569,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-chunk"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3584,7 +3584,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-cli"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"clap",
|
||||
@@ -3605,7 +3605,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-config"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"dirs 5.0.1",
|
||||
@@ -3620,7 +3620,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-core"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3634,7 +3634,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-embed"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3648,7 +3648,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-embed-local"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"fastembed",
|
||||
@@ -3661,7 +3661,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-eval"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"kebab-app",
|
||||
@@ -3680,7 +3680,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-llm"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"kebab-core",
|
||||
@@ -3689,7 +3689,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-llm-local"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"kebab-config",
|
||||
@@ -3706,7 +3706,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-mcp"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"kebab-app",
|
||||
@@ -3724,7 +3724,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-normalize"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"kebab-core",
|
||||
@@ -3739,7 +3739,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-parse-image"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"ab_glyph",
|
||||
"anyhow",
|
||||
@@ -3763,7 +3763,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-parse-md"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"kebab-core",
|
||||
@@ -3780,7 +3780,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-parse-pdf"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3793,7 +3793,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-parse-types"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"kebab-core",
|
||||
"serde",
|
||||
@@ -3801,7 +3801,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-rag"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3822,7 +3822,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-search"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"globset",
|
||||
@@ -3841,7 +3841,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-source-fs"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3858,7 +3858,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-store-sqlite"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"blake3",
|
||||
@@ -3879,7 +3879,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-store-vector"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"arrow",
|
||||
@@ -3903,7 +3903,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "kebab-tui"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"crossterm",
|
||||
|
||||
@@ -30,7 +30,7 @@ edition = "2024"
|
||||
rust-version = "1.85"
|
||||
license = "MIT OR Apache-2.0"
|
||||
repository = "https://github.com/altair823/kebab"
|
||||
version = "0.5.0"
|
||||
version = "0.6.0"
|
||||
|
||||
[workspace.dependencies]
|
||||
anyhow = "1"
|
||||
|
||||
17
README.md
17
README.md
@@ -7,7 +7,7 @@
|
||||
- **Rust toolchain** ≥ 1.85 (workspace 가 edition 2024 + resolver 3 사용). [rustup](https://rustup.rs) 권장.
|
||||
- **Ollama** — `kebab ask` 와 이미지 OCR/caption 가 사용. `https://ollama.com/download` 에서 설치 후 `ollama serve` 실행. 기본 LLM 은 gemma4 계열 (`ollama pull gemma4:e4b`) — OCR / caption 도 같은 family 라 모델 하나만 pull 하면 됨. 더 큰 variant 원하면 `gemma4:26b` 등으로 config override. config 의 `[models.llm].endpoint` 에 host:port 명시.
|
||||
- **빌드 디스크** — 첫 빌드 시 `target/` 가 6–10 GB (Lance + DataFusion + fastembed). 여유 확인.
|
||||
- **fastembed 모델** — 첫 `kebab ingest` 시 `multilingual-e5-small` (~470 MB) 자동 다운로드.
|
||||
- **fastembed 모델** — 첫 `kebab ingest` 시 `multilingual-e5-large` (~1.3 GB, fb-39b) 자동 다운로드. `config.toml` 에서 `model = "multilingual-e5-small"` 로 명시하면 이전 모델 사용.
|
||||
|
||||
## 설치
|
||||
|
||||
@@ -71,7 +71,7 @@ kebab doctor
|
||||
|------|------|
|
||||
| `kebab init` | XDG 경로에 데이터 디렉토리 + config.toml 생성 |
|
||||
| `kebab ingest [<path>]` | Markdown / 이미지 / PDF 색인 (idempotent). TTY 에서는 stderr 진행 바, non-TTY (CI / pipe) 는 stderr 한 줄씩, `--json` 은 stdout 에 `ingest_progress.v1` 라인 streaming 후 마지막에 `ingest_report.v1`. Ctrl-C 한 번이면 현재 asset 마무리 후 abort (부분 commit 보존, idempotent re-run), 두 번째 Ctrl-C 는 hard exit. Markdown title 이 frontmatter 에 없어도 첫 H1 → H2 → 첫 paragraph 80 자 → 파일명 순으로 자동 채움 (parser_version `md-frontmatter-v2`) — 기존 색인된 doc 도 다음 ingest 에서 새 title 로 갱신. **Incremental** (p9-fb-23): 두 번째 이후의 ingest 는 변하지 않은 doc (blake3 + parser/chunker/embedder version 모두 동일) 의 parse/chunk/embed/vector upsert 를 자동 스킵. final summary 에 `N unchanged` 카운트 표시. `--force-reingest` 로 skip 무시 강제 재처리. **지원 형식** (extractor 자동 결정 — config 에 명시 불가): Markdown (`.md`), 이미지 (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`). 다른 확장자는 자동 skip — `IngestItem.warnings` 에 사유 (`"unsupported media type: .docx"` 등), `IngestReport.skipped_by_extension` 에 카운트 분류, CLI / TUI summary 에 breakdown 표시. |
|
||||
| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace]` | 검색. hybrid는 RRF fusion, citation 포함. 같은 process 안에서 동일 query (NFKC + trim + lowercase 정규화) 반복 시 in-process LRU 캐시 hit (capacity = `[search] cache_capacity`, default 256). `--no-cache` 로 강제 bypass — 디버깅용. ingest commit 발생 시 `kv['corpus_revision']` bump 으로 모든 entry 자동 stale. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)** — agent budget controls. `--json` 출력은 `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) — pre-fb-34 의 bare array 와 호환 안 됨. mismatched cursor → `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` 는 반복 가능 flag (`--tag rust --tag async`) 로 OR 매칭, `--media` 는 `,` 구분 다중 값 OR 매칭, 나머지 flags 간은 AND 조합. `--trust-min` 은 `primary\|secondary\|generated` 중 하나 (해당 level 이상 포함). `--ingested-after` 는 RFC3339 UTC — 파싱 실패 시 `error.v1.code = config_invalid` (exit 2). `--media md` 는 `markdown` alias 로 정규화. 알 수 없는 `--media` 값은 무조건 empty hits (오류 아님). **`--trace` (p9-fb-37)** — `search_response.v1.trace` 에 lexical / vector pre-fusion 후보 + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) 노출. trace 요청은 캐시 우회 (`--no-cache` 없이도 항상 cold). |
|
||||
| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk]` | 검색. hybrid는 RRF fusion, citation 포함. 같은 process 안에서 동일 query (NFKC + trim + lowercase 정규화) 반복 시 in-process LRU 캐시 hit (capacity = `[search] cache_capacity`, default 256). `--no-cache` 로 강제 bypass — 디버깅용. ingest commit 발생 시 `kv['corpus_revision']` bump 으로 모든 entry 자동 stale. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)** — agent budget controls. `--json` 출력은 `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) — pre-fb-34 의 bare array 와 호환 안 됨. mismatched cursor → `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` 는 반복 가능 flag (`--tag rust --tag async`) 로 OR 매칭, `--media` 는 `,` 구분 다중 값 OR 매칭, 나머지 flags 간은 AND 조합. `--trust-min` 은 `primary\|secondary\|generated` 중 하나 (해당 level 이상 포함). `--ingested-after` 는 RFC3339 UTC — 파싱 실패 시 `error.v1.code = config_invalid` (exit 2). `--media md` 는 `markdown` alias 로 정규화. 알 수 없는 `--media` 값은 무조건 empty hits (오류 아님). **`--trace` (p9-fb-37)** — `search_response.v1.trace` 에 lexical / vector pre-fusion 후보 + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) 노출. trace 요청은 캐시 우회 (`--no-cache` 없이도 항상 cold). **`--bulk` (p9-fb-42)** — stdin ndjson 으로 N query 한 번에 실행. `--json` 면 stdout per-query ndjson (`bulk_search_item.v1`) + stderr summary (`bulk_summary: total=N succeeded=S failed=F`). Cap 100. agent 가 query decomposition 후 sub-query 일괄 실행 시 single round-trip — App instance 재사용으로 캐시 / embedder cold-start 비용 한 번만. Per-query failure 는 item 의 `error` (error.v1) 에 격리, 다른 query 계속 진행. |
|
||||
| `kebab list docs` | 색인된 문서 목록 |
|
||||
| `kebab inspect doc <id>` / `kebab inspect chunk <id>` | raw record 보기 |
|
||||
| `kebab fetch chunk <id> [--context N]` / `kebab fetch doc <id> [--max-tokens N]` / `kebab fetch span <doc_id> <ls> <le> [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. |
|
||||
@@ -83,7 +83,7 @@ kebab doctor
|
||||
| `kebab schema [--json]` | introspection — wire schemas / capabilities / models / stats 한 번에. `--json` 은 `schema.v1` wire; 사람 모드는 서식 출력. **stats 에 (p9-fb-37) `media_breakdown` (5 keys: markdown / pdf / image / audio / other) + `lang_breakdown` (BCP-47 코드, NULL 은 literal `"null"`) + `index_bytes` (sqlite + lancedb on-disk 합계) + `stale_doc_count` (`config.search.stale_threshold_days` 초과 doc 수) 추가.** |
|
||||
| `kebab ingest-file <path>` | 단일 파일 ingest (workspace 외부 가능). 바이트는 `<workspace.root>/_external/<hash12>.<ext>` 로 copy. `.kebabignore` 매치 시 stderr warn 후 진행 (explicit ingest 가 bypass intent). |
|
||||
| `kebab ingest-stdin --title <T> [--source-uri <URI>]` | stdin 의 markdown 본문 ingest. frontmatter (title + source_uri) 자동 prepend. v1 markdown only. |
|
||||
| `kebab mcp` | MCP (Model Context Protocol) stdio server. agent host (Claude Code / Cursor / OpenAI Agents) 가 spawn 하여 tool 호출 (`search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`). `--config` honor. |
|
||||
| `kebab mcp` | MCP (Model Context Protocol) stdio server. agent host (Claude Code / Cursor / OpenAI Agents) 가 spawn 하여 tool 호출 (`search` / `bulk_search` / `ask` / `fetch` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`). `--config` honor. |
|
||||
|
||||
모든 명령에 `--json` 플래그. 출력은 frozen wire schema v1 (`schema_version` 항상 포함, 예: `ingest_report.v1`, `ingest_progress.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, `reset_report.v1`, `schema.v1`). `--json` 모드에서 fatal error 는 stderr 에 `error.v1` ndjson 으로 emit (exit code 0/1/2/3 unchanged).
|
||||
|
||||
@@ -133,7 +133,7 @@ flowchart TB
|
||||
subgraph Pipeline["도메인 + 파이프라인"]
|
||||
parse["parse-md / parse-pdf / parse-image"]
|
||||
chunker["chunker (md-heading-v1, pdf-page-v1)"]
|
||||
embedder["embedder (fastembed multilingual-e5-small)"]
|
||||
embedder["embedder (fastembed multilingual-e5-large)"]
|
||||
retriever["retriever (lexical / vector / hybrid RRF)"]
|
||||
rag["RAG pipeline"]
|
||||
end
|
||||
@@ -178,7 +178,12 @@ flowchart TB
|
||||
|
||||
## Configuration
|
||||
|
||||
- `~/.config/kebab/config.toml` — `kebab init` 가 XDG 경로에 생성. `[workspace]` (root, exclude — include 필드는 제거됨, 지원 형식은 자동 결정), `[storage]`, `[chunking]`, `[models.embedding]`, `[models.llm]`, `[image.ocr]`, `[image.caption]`, `[search]`, `[rag]`, `[ui]` 절. `[ui] theme = "dark" | "light"` 로 TUI 팔레트 선택 (default `"dark"`, 알 수 없는 값은 dark fallback). `[search] stale_threshold_days = 30` (p9-fb-32) — search hit / RAG citation 의 `stale` 플래그 기준 (default 30 일, `0` 으로 비활성화). 옛 config 의 `workspace.include = [...]` 은 silently 무시 + 단발 deprecation warning (p9-fb-25).
|
||||
- `~/.config/kebab/config.toml` — `kebab init` 가 XDG 경로에 생성. `[workspace]` (root, exclude — include 필드는 제거됨, 지원 형식은 자동 결정), `[storage]`, `[chunking]`, `[models.embedding]`, `[models.llm]`, `[image.ocr]`, `[image.caption]`, `[search]`, `[rag]`, `[ui]` 절.
|
||||
- `[models.embedding]` —
|
||||
- `model` (default `"multilingual-e5-large"`, fb-39b) — 다국어 sentence embedding 모델. 1024-dim. ONNX (~1.3 GB) 첫 실행 시 fastembed cache (`config.storage.model_dir/fastembed/`) 에 자동 다운로드. `"multilingual-e5-small"` (384 dim) 는 backwards-compat 으로 사용 가능 — TOML 에 명시.
|
||||
- `dimensions` (default `1024`) — 모델의 embedding 차원. config 와 LanceDB stored dim 불일치 시 검색 결과 0 건 (orphan table). 모델 변경 시 `kebab reset --vector-only && kebab ingest` 로 vector index 재구축 권장.
|
||||
- `[ui] theme = "dark" | "light"` 로 TUI 팔레트 선택 (default `"dark"`, 알 수 없는 값은 dark fallback).
|
||||
- `[search] stale_threshold_days = 30` (p9-fb-32) — search hit / RAG citation 의 `stale` 플래그 기준 (default 30 일, `0` 으로 비활성화). 옛 config 의 `workspace.include = [...]` 은 silently 무시 + 단발 deprecation warning (p9-fb-25).
|
||||
- `[rag] prompt_template_version` (default `"rag-v2"`) — RAG system prompt version. `"rag-v1"` 은 legacy backwards-compat (사용자 명시 시 유지). v2 강화 규칙: (1) fact 인용 시 [#번호] 앞에 chunk 속 원문 큰따옴표 표기, (2) 학습 지식 동원 금지, (3) 근거 모호 시 "확실하지 않다" 명시.
|
||||
- `--config <path>` flag — 임시 워크스페이스 / 격리 테스트 시 사용. CLI / TUI 모두 honor.
|
||||
- `KEBAB_*` env — 일부 키 override (`KEBAB_RAG_SCORE_GATE`, `KEBAB_EVAL_GOLDEN`, `KEBAB_COMMIT_HASH` 등).
|
||||
@@ -198,7 +203,7 @@ config 예시는 [docs/SMOKE.md](docs/SMOKE.md) 의 `/tmp/kebab-smoke/config.tom
|
||||
|
||||
## MCP 사용
|
||||
|
||||
`kebab mcp` 가 stdio MCP server. 6 tool: `search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`.
|
||||
`kebab mcp` 가 stdio MCP server. 8 tool: `search` / `bulk_search` (p9-fb-42 — N query 한 번에) / `ask` / `fetch` (p9-fb-35) / `schema` / `doctor` / `ingest_file` / `ingest_stdin`.
|
||||
|
||||
Claude Code 빠른 등록 (`~/.claude/mcp.json` 또는 host 동등 위치):
|
||||
|
||||
|
||||
296
crates/kebab-app/src/bulk.rs
Normal file
296
crates/kebab-app/src/bulk.rs
Normal file
@@ -0,0 +1,296 @@
|
||||
//! p9-fb-42: bulk multi-query facade. Sequential for-loop reusing
|
||||
//! one App instance so embedder cold-start + LRU cache amortize
|
||||
//! across the N queries.
|
||||
|
||||
use anyhow::Context;
|
||||
use kebab_core::{
|
||||
BulkSearchItem, BulkSearchSummary, DocumentId, Lang, SearchFilters, SearchHit, SearchMode,
|
||||
SearchOpts, SearchQuery, TrustLevel,
|
||||
};
|
||||
use serde_json::Value;
|
||||
|
||||
use crate::{App, SearchResponse};
|
||||
|
||||
/// Hard cap on items per bulk call. Documented in spec — agents that
|
||||
/// hit this should batch-split.
|
||||
pub const BULK_QUERIES_MAX: usize = 100;
|
||||
|
||||
/// p9-fb-42: bulk search facade. Returns `(items, summary)` always
|
||||
/// — per-query failures embed `error.v1` JSON in the item rather
|
||||
/// than aborting the bulk call. Returns `Err` only for input
|
||||
/// validation failures (e.g. >100 queries).
|
||||
#[doc(hidden)]
|
||||
pub fn bulk_search_with_config(
|
||||
config: kebab_config::Config,
|
||||
raw_items: Vec<Value>,
|
||||
) -> anyhow::Result<(Vec<BulkSearchItem>, BulkSearchSummary)> {
|
||||
if raw_items.len() > BULK_QUERIES_MAX {
|
||||
anyhow::bail!(
|
||||
"queries: max {} items, got {}",
|
||||
BULK_QUERIES_MAX,
|
||||
raw_items.len()
|
||||
);
|
||||
}
|
||||
|
||||
let app = App::open_with_config(config).context("kebab-app: open for bulk_search")?;
|
||||
|
||||
let mut results: Vec<BulkSearchItem> = Vec::with_capacity(raw_items.len());
|
||||
let mut succeeded: u32 = 0;
|
||||
let mut failed: u32 = 0;
|
||||
|
||||
for raw in raw_items {
|
||||
let item = run_one(&app, raw);
|
||||
if item.error.is_some() {
|
||||
failed += 1;
|
||||
} else {
|
||||
succeeded += 1;
|
||||
}
|
||||
results.push(item);
|
||||
}
|
||||
|
||||
let summary = BulkSearchSummary {
|
||||
total: succeeded + failed,
|
||||
succeeded,
|
||||
failed,
|
||||
};
|
||||
Ok((results, summary))
|
||||
}
|
||||
|
||||
fn run_one(app: &App, raw: Value) -> BulkSearchItem {
|
||||
let echo = raw.clone();
|
||||
match parse_one(&raw) {
|
||||
Ok((query, opts)) => match app.search_with_opts(query, opts) {
|
||||
Ok(resp) => BulkSearchItem {
|
||||
query: echo,
|
||||
response: Some(serialize_search_response(&resp)),
|
||||
error: None,
|
||||
},
|
||||
Err(e) => BulkSearchItem {
|
||||
query: echo,
|
||||
response: None,
|
||||
error: Some(error_v1_json("retrieval_error", &format!("{e:#}"), None)),
|
||||
},
|
||||
},
|
||||
Err(msg) => BulkSearchItem {
|
||||
query: echo,
|
||||
response: None,
|
||||
error: Some(error_v1_json("invalid_input", &msg, None)),
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Mirror of `kebab-cli::wire::wire_search_response` — `SearchResponse`
|
||||
/// itself is not `Serialize`, so we build the `search_response.v1`-shaped
|
||||
/// JSON manually. Each hit also gets `score` promoted from
|
||||
/// `retrieval.fusion_score` per §2.2, matching the CLI wire layer.
|
||||
fn serialize_search_response(r: &SearchResponse) -> Value {
|
||||
let mut v = serde_json::json!({
|
||||
"schema_version": "search_response.v1",
|
||||
"hits": r.hits.iter().map(serialize_search_hit).collect::<Vec<_>>(),
|
||||
"next_cursor": r.next_cursor,
|
||||
"truncated": r.truncated,
|
||||
});
|
||||
if let Value::Object(ref mut map) = v {
|
||||
let trace_v = match &r.trace {
|
||||
Some(t) => serde_json::to_value(t).unwrap_or(Value::Null),
|
||||
None => Value::Null,
|
||||
};
|
||||
map.insert("trace".to_string(), trace_v);
|
||||
}
|
||||
v
|
||||
}
|
||||
|
||||
fn serialize_search_hit(h: &SearchHit) -> Value {
|
||||
let mut v = serde_json::to_value(h).unwrap_or(Value::Null);
|
||||
if let Value::Object(ref mut map) = v {
|
||||
if let Some(Value::Object(retrieval)) = map.get("retrieval") {
|
||||
if let Some(score) = retrieval.get("fusion_score").cloned() {
|
||||
map.insert("score".to_string(), score);
|
||||
}
|
||||
}
|
||||
map.insert(
|
||||
"schema_version".to_string(),
|
||||
Value::String("search_hit.v1".to_string()),
|
||||
);
|
||||
}
|
||||
v
|
||||
}
|
||||
|
||||
fn parse_one(raw: &Value) -> Result<(SearchQuery, SearchOpts), String> {
|
||||
let obj = raw.as_object().ok_or("expected JSON object")?;
|
||||
let text = obj
|
||||
.get("query")
|
||||
.and_then(|v| v.as_str())
|
||||
.ok_or("missing required field: query")?
|
||||
.to_string();
|
||||
|
||||
let mode = match obj.get("mode").and_then(|v| v.as_str()) {
|
||||
None => SearchMode::Hybrid,
|
||||
Some("hybrid") => SearchMode::Hybrid,
|
||||
Some("lexical") => SearchMode::Lexical,
|
||||
Some("vector") => SearchMode::Vector,
|
||||
Some(other) => return Err(format!("invalid mode: {other:?}")),
|
||||
};
|
||||
|
||||
let k = obj
|
||||
.get("k")
|
||||
.and_then(|v| v.as_u64())
|
||||
.map(|n| n as usize)
|
||||
.unwrap_or(0); // 0 → use config default in app
|
||||
|
||||
let trust_min = match obj.get("trust_min").and_then(|v| v.as_str()) {
|
||||
None => None,
|
||||
Some("primary") => Some(TrustLevel::Primary),
|
||||
Some("secondary") => Some(TrustLevel::Secondary),
|
||||
Some("generated") => Some(TrustLevel::Generated),
|
||||
Some(other) => return Err(format!("invalid trust_min: {other:?}")),
|
||||
};
|
||||
|
||||
let ingested_after = match obj.get("ingested_after").and_then(|v| v.as_str()) {
|
||||
None => None,
|
||||
Some(s) => Some(
|
||||
time::OffsetDateTime::parse(s, &time::format_description::well_known::Rfc3339)
|
||||
.map_err(|e| format!("invalid ingested_after RFC3339 {s:?}: {e}"))?,
|
||||
),
|
||||
};
|
||||
|
||||
let media: Vec<String> = obj
|
||||
.get("media")
|
||||
.and_then(|v| v.as_array())
|
||||
.map(|arr| {
|
||||
arr.iter()
|
||||
.filter_map(|x| x.as_str().map(normalize_media_alias))
|
||||
.collect()
|
||||
})
|
||||
.unwrap_or_default();
|
||||
|
||||
let tags_any: Vec<String> = obj
|
||||
.get("tag")
|
||||
.and_then(|v| v.as_array())
|
||||
.map(|arr| {
|
||||
arr.iter()
|
||||
.filter_map(|x| x.as_str().map(String::from))
|
||||
.collect()
|
||||
})
|
||||
.unwrap_or_default();
|
||||
|
||||
let lang = obj
|
||||
.get("lang")
|
||||
.and_then(|v| v.as_str())
|
||||
.map(|s| Lang(s.to_string()));
|
||||
|
||||
let path_glob = obj
|
||||
.get("path_glob")
|
||||
.and_then(|v| v.as_str())
|
||||
.map(String::from);
|
||||
|
||||
let doc_id = obj
|
||||
.get("doc_id")
|
||||
.and_then(|v| v.as_str())
|
||||
.map(|s| DocumentId(s.to_string()));
|
||||
|
||||
let filters = SearchFilters {
|
||||
tags_any,
|
||||
lang,
|
||||
path_glob,
|
||||
trust_min,
|
||||
media,
|
||||
ingested_after,
|
||||
doc_id,
|
||||
};
|
||||
|
||||
let opts = SearchOpts {
|
||||
max_tokens: obj
|
||||
.get("max_tokens")
|
||||
.and_then(|v| v.as_u64())
|
||||
.map(|n| n as usize),
|
||||
snippet_chars: obj
|
||||
.get("snippet_chars")
|
||||
.and_then(|v| v.as_u64())
|
||||
.map(|n| n as usize),
|
||||
cursor: obj.get("cursor").and_then(|v| v.as_str()).map(String::from),
|
||||
trace: obj.get("trace").and_then(|v| v.as_bool()).unwrap_or(false),
|
||||
};
|
||||
|
||||
Ok((
|
||||
SearchQuery {
|
||||
text,
|
||||
mode,
|
||||
k,
|
||||
filters,
|
||||
},
|
||||
opts,
|
||||
))
|
||||
}
|
||||
|
||||
fn normalize_media_alias(s: &str) -> String {
|
||||
match s.to_ascii_lowercase().as_str() {
|
||||
"md" => "markdown".to_string(),
|
||||
other => other.to_string(),
|
||||
}
|
||||
}
|
||||
|
||||
fn error_v1_json(code: &str, message: &str, hint: Option<&str>) -> Value {
|
||||
serde_json::json!({
|
||||
"schema_version": "error.v1",
|
||||
"code": code,
|
||||
"message": message,
|
||||
"hint": hint,
|
||||
})
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn open_temp() -> kebab_config::Config {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let mut cfg = kebab_config::Config::defaults();
|
||||
cfg.storage.data_dir = dir.path().to_string_lossy().into_owned();
|
||||
// Bring up migrations so SqliteStore::open_existing succeeds inside App::open.
|
||||
let store = kebab_store_sqlite::SqliteStore::open(&cfg).unwrap();
|
||||
store.run_migrations().unwrap();
|
||||
drop(store);
|
||||
// Leak the tempdir into a static — tests are short-lived; not worth threading.
|
||||
std::mem::forget(dir);
|
||||
cfg
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn empty_input_returns_empty_summary() {
|
||||
let cfg = open_temp();
|
||||
let (items, summary) = bulk_search_with_config(cfg, vec![]).unwrap();
|
||||
assert!(items.is_empty());
|
||||
assert_eq!(summary.total, 0);
|
||||
assert_eq!(summary.succeeded, 0);
|
||||
assert_eq!(summary.failed, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn over_cap_returns_err() {
|
||||
let cfg = open_temp();
|
||||
let raw: Vec<Value> = (0..101)
|
||||
.map(|_| serde_json::json!({"query": "x"}))
|
||||
.collect();
|
||||
let err = bulk_search_with_config(cfg, raw).unwrap_err();
|
||||
let msg = format!("{err:#}");
|
||||
assert!(msg.contains("max 100"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn invalid_item_emits_error_keeps_total_count() {
|
||||
let cfg = open_temp();
|
||||
let raw = vec![
|
||||
serde_json::json!({"query": "ok", "mode": "lexical"}),
|
||||
serde_json::json!({"mode": "lexical"}), // missing required `query`
|
||||
];
|
||||
let (items, summary) = bulk_search_with_config(cfg, raw).unwrap();
|
||||
assert_eq!(items.len(), 2);
|
||||
assert_eq!(summary.total, 2);
|
||||
// First item: lexical mode against empty corpus succeeds with empty hits.
|
||||
assert!(items[0].error.is_none());
|
||||
// Second item: missing required field.
|
||||
assert!(items[1].error.is_some());
|
||||
assert_eq!(items[1].error.as_ref().unwrap()["code"], "invalid_input");
|
||||
}
|
||||
}
|
||||
@@ -55,6 +55,7 @@ use kebab_parse_md::{BodyHints, parse_blocks, parse_frontmatter};
|
||||
use kebab_source_fs::FsSourceConnector;
|
||||
|
||||
mod app;
|
||||
mod bulk;
|
||||
pub mod cursor;
|
||||
pub mod doctor_signal;
|
||||
pub mod error_signal;
|
||||
@@ -72,6 +73,8 @@ pub use ingest_progress::{AggregateCounts, IngestEvent, render_skipped_breakdown
|
||||
pub use reset::{ResetReport, ResetScope};
|
||||
pub use error_wire::{ERROR_V1_ID, ErrorV1, StructuredError, classify};
|
||||
pub use fetch::fetch_with_config;
|
||||
#[doc(hidden)]
|
||||
pub use bulk::{BULK_QUERIES_MAX, bulk_search_with_config};
|
||||
pub use schema::{Capabilities, Models, SCHEMA_V1_ID, SchemaV1, Stats, WireBlock, schema_with_config};
|
||||
pub use staleness::{compute_stale, mark_stale_in_place};
|
||||
|
||||
|
||||
@@ -32,6 +32,7 @@ pub struct Capabilities {
|
||||
pub http_daemon: bool,
|
||||
pub mcp_server: bool,
|
||||
pub single_file_ingest: bool,
|
||||
pub bulk_search: bool,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
@@ -85,6 +86,8 @@ const WIRE_SCHEMAS: &[&str] = &[
|
||||
"citation.v1",
|
||||
"schema.v1",
|
||||
"error.v1",
|
||||
"bulk_search_item.v1",
|
||||
"bulk_search_response.v1",
|
||||
];
|
||||
|
||||
/// Build a [`SchemaV1`] introspection report for the given config.
|
||||
@@ -123,6 +126,7 @@ fn capabilities_snapshot() -> Capabilities {
|
||||
http_daemon: false,
|
||||
mcp_server: true,
|
||||
single_file_ingest: false,
|
||||
bulk_search: true,
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -94,7 +94,8 @@ enum Cmd {
|
||||
|
||||
/// Lexical / vector / hybrid search over chunks.
|
||||
Search {
|
||||
query: String,
|
||||
/// Query text. Not required when `--bulk` is set (queries from stdin).
|
||||
query: Option<String>,
|
||||
|
||||
#[arg(long, default_value_t = 10)]
|
||||
k: usize,
|
||||
@@ -171,6 +172,16 @@ enum Cmd {
|
||||
/// without embeddings via a no-op vector stub.
|
||||
#[arg(long)]
|
||||
trace: bool,
|
||||
|
||||
/// p9-fb-42: bulk multi-query mode. Reads ndjson from stdin —
|
||||
/// one JSON object per line, each item shape mirrors the
|
||||
/// single-query input. Output is per-query ndjson on stdout
|
||||
/// (one `bulk_search_item.v1` per line) plus a summary line on
|
||||
/// stderr. Single-query flags (`--mode`, `--k`, `--tag`, etc.)
|
||||
/// are ignored when `--bulk` is set; pass them per-item in the
|
||||
/// stdin JSON instead. Caps at 100 queries per call.
|
||||
#[arg(long)]
|
||||
bulk: bool,
|
||||
},
|
||||
|
||||
/// Retrieval-augmented question answering.
|
||||
@@ -678,9 +689,97 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
|
||||
ingested_after,
|
||||
doc_id,
|
||||
trace,
|
||||
bulk,
|
||||
} => {
|
||||
// p9-fb-42: bulk mode — stdin ndjson → bulk_search_with_config
|
||||
// → stdout ndjson per query + stderr summary. Single-query
|
||||
// flags are ignored (each item supplies its own).
|
||||
if *bulk {
|
||||
use std::io::{BufRead, Write};
|
||||
|
||||
let cfg = kebab_config::Config::load(cli.config.as_deref())?;
|
||||
|
||||
let stdin = std::io::stdin();
|
||||
let stdin_locked = stdin.lock();
|
||||
let mut raw_items: Vec<serde_json::Value> = Vec::new();
|
||||
for (lineno, line) in stdin_locked.lines().enumerate() {
|
||||
let line = line?;
|
||||
if line.trim().is_empty() {
|
||||
continue;
|
||||
}
|
||||
let v: serde_json::Value =
|
||||
serde_json::from_str(&line).map_err(|e| {
|
||||
anyhow::Error::new(kebab_app::StructuredError(
|
||||
kebab_app::ErrorV1 {
|
||||
schema_version: kebab_app::ERROR_V1_ID
|
||||
.to_string(),
|
||||
code: "config_invalid".to_string(),
|
||||
message: format!(
|
||||
"stdin ndjson line {} parse error: {e}",
|
||||
lineno + 1
|
||||
),
|
||||
details: serde_json::Value::Null,
|
||||
hint: Some(
|
||||
"each line must be a JSON object with at least `query`"
|
||||
.to_string(),
|
||||
),
|
||||
},
|
||||
))
|
||||
})?;
|
||||
raw_items.push(v);
|
||||
}
|
||||
|
||||
let (items, summary) =
|
||||
kebab_app::bulk_search_with_config(cfg, raw_items)?;
|
||||
|
||||
if cli.json {
|
||||
let mut stdout = std::io::stdout().lock();
|
||||
for item in &items {
|
||||
let v = wire::wire_bulk_search_item(item);
|
||||
writeln!(stdout, "{}", serde_json::to_string(&v)?)?;
|
||||
}
|
||||
eprintln!(
|
||||
"bulk_summary: total={} succeeded={} failed={}",
|
||||
summary.total, summary.succeeded, summary.failed,
|
||||
);
|
||||
} else {
|
||||
let mut stdout = std::io::stdout().lock();
|
||||
for (idx, item) in items.iter().enumerate() {
|
||||
writeln!(
|
||||
stdout,
|
||||
"# Query {}: {}",
|
||||
idx + 1,
|
||||
serde_json::to_string(&item.query)?,
|
||||
)?;
|
||||
if let Some(err) = &item.error {
|
||||
writeln!(stdout, "error: {}", err)?;
|
||||
} else if let Some(resp) = &item.response {
|
||||
writeln!(
|
||||
stdout,
|
||||
"{}",
|
||||
serde_json::to_string_pretty(resp)?
|
||||
)?;
|
||||
}
|
||||
writeln!(stdout)?;
|
||||
}
|
||||
eprintln!(
|
||||
"bulk_summary: total={} succeeded={} failed={}",
|
||||
summary.total, summary.succeeded, summary.failed,
|
||||
);
|
||||
}
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let cfg = kebab_config::Config::load(cli.config.as_deref())?;
|
||||
|
||||
// p9-fb-42: bulk mode requires no query; single-query mode requires query.
|
||||
let query_text = match query.as_ref() {
|
||||
Some(q) => q.clone(),
|
||||
None => {
|
||||
return Err(anyhow::anyhow!("query is required unless --bulk is set"));
|
||||
}
|
||||
};
|
||||
|
||||
// p9-fb-36: normalize --media aliases (md → markdown).
|
||||
fn normalize_media_alias(s: &str) -> String {
|
||||
match s.to_ascii_lowercase().as_str() {
|
||||
@@ -732,7 +831,7 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
|
||||
};
|
||||
|
||||
let q = kebab_core::SearchQuery {
|
||||
text: query.clone(),
|
||||
text: query_text,
|
||||
mode: (*mode).into(),
|
||||
k: *k,
|
||||
filters,
|
||||
@@ -1266,6 +1365,7 @@ fn print_schema_text(s: &kebab_app::SchemaV1) {
|
||||
("http_daemon", s.capabilities.http_daemon),
|
||||
("mcp_server", s.capabilities.mcp_server),
|
||||
("single_file_ingest", s.capabilities.single_file_ingest),
|
||||
("bulk_search", s.capabilities.bulk_search),
|
||||
];
|
||||
for (name, on) in caps {
|
||||
let mark = if on { "✓" } else { "✗" };
|
||||
|
||||
@@ -201,6 +201,20 @@ pub fn wire_fetch_result(r: &kebab_core::FetchResult) -> Value {
|
||||
tag_object(v, "fetch_result.v1")
|
||||
}
|
||||
|
||||
/// p9-fb-42: tag a `BulkSearchItem` (already serialized as a Value)
|
||||
/// as `bulk_search_item.v1`. The inner `query` / `response` / `error`
|
||||
/// fields stay verbatim — only the envelope gets the schema_version stamp.
|
||||
pub fn wire_bulk_search_item(item: &kebab_core::BulkSearchItem) -> Value {
|
||||
let mut v = serde_json::to_value(item).expect("BulkSearchItem serializes");
|
||||
if let Value::Object(ref mut map) = v {
|
||||
map.insert(
|
||||
"schema_version".to_string(),
|
||||
Value::String("bulk_search_item.v1".to_string()),
|
||||
);
|
||||
}
|
||||
v
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
@@ -297,7 +311,7 @@ mod tests {
|
||||
json_mode: true, ingest_progress: true, ingest_cancellation: true,
|
||||
rag_multi_turn: true, search_cache: true, incremental_ingest: true,
|
||||
streaming_ask: false, http_daemon: false, mcp_server: false,
|
||||
single_file_ingest: false,
|
||||
single_file_ingest: false, bulk_search: true,
|
||||
},
|
||||
models: Models {
|
||||
parser_version: "x".to_string(),
|
||||
|
||||
@@ -66,8 +66,8 @@ fn cli_mcp_initialize_then_tools_list() {
|
||||
.expect("tools/list result.tools must be an array");
|
||||
assert_eq!(
|
||||
tools.len(),
|
||||
7,
|
||||
"expected 7 tools (schema, doctor, search, ask, fetch, ingest_file, ingest_stdin), got {}: {list}",
|
||||
8,
|
||||
"expected 8 tools (schema, doctor, search, bulk_search, ask, fetch, ingest_file, ingest_stdin), got {}: {list}",
|
||||
tools.len()
|
||||
);
|
||||
|
||||
|
||||
174
crates/kebab-cli/tests/wire_bulk_search.rs
Normal file
174
crates/kebab-cli/tests/wire_bulk_search.rs
Normal file
@@ -0,0 +1,174 @@
|
||||
//! p9-fb-42: integration tests for `kebab search --bulk`.
|
||||
//!
|
||||
//! Lexical-only — no fastembed / no Ollama. Each test builds its own
|
||||
//! TempDir KB via `common::write_config` + `common::ingest` and drives
|
||||
//! `kebab search --bulk` through stdin. Verifies:
|
||||
//!
|
||||
//! - Two queries over stdin emit per-query ndjson `bulk_search_item.v1` lines.
|
||||
//! - Empty stdin returns empty results with zero summary.
|
||||
//! - Malformed ndjson exits with code 2 (config_invalid).
|
||||
//! - Input over the 100-item cap fails with "max 100" error message.
|
||||
//! - Invalid item field (e.g. bad `mode`) emits per-item error and continues.
|
||||
|
||||
mod common;
|
||||
|
||||
use serde_json::Value;
|
||||
use std::fs;
|
||||
use std::io::Write;
|
||||
use std::process::{Command, Stdio};
|
||||
|
||||
fn cargo_bin() -> &'static str {
|
||||
env!("CARGO_BIN_EXE_kebab")
|
||||
}
|
||||
|
||||
fn run_bulk_with_stdin(cfg: &std::path::Path, stdin_body: &str, json: bool) -> std::process::Output {
|
||||
let mut cmd = Command::new(cargo_bin());
|
||||
cmd.arg("--config").arg(cfg).arg("search").arg("--bulk");
|
||||
if json {
|
||||
cmd.arg("--json");
|
||||
}
|
||||
cmd.stdin(Stdio::piped())
|
||||
.stdout(Stdio::piped())
|
||||
.stderr(Stdio::piped());
|
||||
let mut child = cmd.spawn().expect("spawn kebab");
|
||||
{
|
||||
let mut sin = child.stdin.take().expect("stdin");
|
||||
sin.write_all(stdin_body.as_bytes()).expect("write stdin");
|
||||
}
|
||||
child.wait_with_output().expect("wait")
|
||||
}
|
||||
|
||||
fn seed_workspace(workspace: &std::path::Path) {
|
||||
fs::write(workspace.join("a.md"), "# Alpha\n\nrust async hello").unwrap();
|
||||
fs::write(workspace.join("b.md"), "# Bravo\n\nbread and kebab").unwrap();
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 1: Two queries over stdin emit per-query ndjson
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn two_query_bulk_emits_per_query_ndjson() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
|
||||
seed_workspace(&workspace);
|
||||
common::ingest(&cfg, &workspace);
|
||||
|
||||
let out = run_bulk_with_stdin(
|
||||
&cfg,
|
||||
"{\"query\":\"rust\",\"mode\":\"lexical\"}\n{\"query\":\"kebab\",\"mode\":\"lexical\"}\n",
|
||||
true,
|
||||
);
|
||||
assert!(
|
||||
out.status.success(),
|
||||
"stderr: {}",
|
||||
String::from_utf8_lossy(&out.stderr)
|
||||
);
|
||||
let stdout = String::from_utf8_lossy(&out.stdout);
|
||||
let lines: Vec<&str> = stdout.lines().filter(|l| !l.trim().is_empty()).collect();
|
||||
assert_eq!(lines.len(), 2, "expected 2 ndjson lines, got {lines:?}");
|
||||
for line in &lines {
|
||||
let v: Value = serde_json::from_str(line).expect("valid JSON line");
|
||||
assert_eq!(v["schema_version"], "bulk_search_item.v1");
|
||||
assert!(v["response"].is_object());
|
||||
assert!(v["error"].is_null());
|
||||
}
|
||||
let stderr = String::from_utf8_lossy(&out.stderr);
|
||||
assert!(
|
||||
stderr.contains("bulk_summary: total=2 succeeded=2 failed=0"),
|
||||
"stderr summary missing: {stderr}"
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 2: Empty stdin returns empty results with zero summary
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn empty_stdin_returns_empty_results_with_zero_summary() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
|
||||
seed_workspace(&workspace);
|
||||
common::ingest(&cfg, &workspace);
|
||||
|
||||
let out = run_bulk_with_stdin(&cfg, "", true);
|
||||
assert!(out.status.success());
|
||||
let stdout = String::from_utf8_lossy(&out.stdout);
|
||||
assert!(stdout.trim().is_empty(), "expected empty stdout, got: {stdout}");
|
||||
let stderr = String::from_utf8_lossy(&out.stderr);
|
||||
assert!(stderr.contains("bulk_summary: total=0 succeeded=0 failed=0"));
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 3: Malformed ndjson line emits config_invalid exit 2
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn malformed_ndjson_line_emits_config_invalid_exit_2() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
|
||||
seed_workspace(&workspace);
|
||||
common::ingest(&cfg, &workspace);
|
||||
|
||||
let out = run_bulk_with_stdin(&cfg, "not json\n", true);
|
||||
assert_eq!(out.status.code(), Some(2), "expected exit 2");
|
||||
let stderr = String::from_utf8_lossy(&out.stderr);
|
||||
assert!(
|
||||
stderr.contains("config_invalid") || stderr.contains("parse error"),
|
||||
"expected config_invalid or parse error in stderr: {stderr}"
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 4: Over cap input (>100) emits error
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn over_cap_input_emits_error() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
|
||||
seed_workspace(&workspace);
|
||||
common::ingest(&cfg, &workspace);
|
||||
|
||||
let body: String = (0..101)
|
||||
.map(|_| "{\"query\":\"x\",\"mode\":\"lexical\"}\n")
|
||||
.collect();
|
||||
let out = run_bulk_with_stdin(&cfg, &body, true);
|
||||
// bulk_search_with_config returns Err — surfaces as exit 1 (anyhow chain)
|
||||
// or 2 if classified by error_wire. Accept either, but message must mention `max 100`.
|
||||
assert!(out.status.code().is_some());
|
||||
let stderr = String::from_utf8_lossy(&out.stderr);
|
||||
assert!(
|
||||
stderr.contains("max 100"),
|
||||
"expected 'max 100' in stderr: {stderr}"
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 5: Invalid item field (bad mode) emits per-item error and continues
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn invalid_item_field_emits_per_item_error_continues() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
|
||||
seed_workspace(&workspace);
|
||||
common::ingest(&cfg, &workspace);
|
||||
|
||||
let out = run_bulk_with_stdin(
|
||||
&cfg,
|
||||
"{\"query\":\"rust\",\"mode\":\"lexical\"}\n{\"query\":\"x\",\"mode\":\"bogus\"}\n",
|
||||
true,
|
||||
);
|
||||
assert!(out.status.success());
|
||||
let stdout = String::from_utf8_lossy(&out.stdout);
|
||||
let lines: Vec<&str> = stdout.lines().filter(|l| !l.trim().is_empty()).collect();
|
||||
assert_eq!(lines.len(), 2);
|
||||
let v0: Value = serde_json::from_str(lines[0]).unwrap();
|
||||
let v1: Value = serde_json::from_str(lines[1]).unwrap();
|
||||
assert!(v0["error"].is_null());
|
||||
assert!(v1["error"].is_object());
|
||||
assert_eq!(v1["error"]["code"], "invalid_input");
|
||||
let stderr = String::from_utf8_lossy(&out.stderr);
|
||||
assert!(stderr.contains("succeeded=1 failed=1"));
|
||||
}
|
||||
@@ -302,9 +302,9 @@ impl Config {
|
||||
models: ModelsCfg {
|
||||
embedding: EmbeddingModelCfg {
|
||||
provider: "fastembed".to_string(),
|
||||
model: "multilingual-e5-small".to_string(),
|
||||
model: "multilingual-e5-large".to_string(),
|
||||
version: "v1".to_string(),
|
||||
dimensions: 384,
|
||||
dimensions: 1024,
|
||||
batch_size: 64,
|
||||
},
|
||||
llm: LlmCfg {
|
||||
@@ -764,7 +764,8 @@ mod tests {
|
||||
let c = Config::defaults();
|
||||
assert_eq!(c.rag.score_gate, 0.30);
|
||||
assert_eq!(c.chunking.target_tokens, 500);
|
||||
assert_eq!(c.models.embedding.dimensions, 384);
|
||||
assert_eq!(c.models.embedding.model, "multilingual-e5-large");
|
||||
assert_eq!(c.models.embedding.dimensions, 1024);
|
||||
assert_eq!(c.search.rrf_k, 60);
|
||||
}
|
||||
|
||||
@@ -947,9 +948,9 @@ chunker_version = "md-heading-v1"
|
||||
|
||||
[models.embedding]
|
||||
provider = "fastembed"
|
||||
model = "multilingual-e5-small"
|
||||
model = "multilingual-e5-large"
|
||||
version = "v1"
|
||||
dimensions = 384
|
||||
dimensions = 1024
|
||||
batch_size = 64
|
||||
|
||||
[models.llm]
|
||||
|
||||
@@ -51,9 +51,9 @@ pub use metadata::{
|
||||
TrustLevel,
|
||||
};
|
||||
pub use search::{
|
||||
DocFilter, DocSummary, IndexBytes, MEDIA_KINDS, RetrievalDetail, ScoreKind, SearchFilters, SearchHit,
|
||||
SearchMode, SearchOpts, SearchQuery, SearchTrace, TraceCandidate, TraceFusionInput,
|
||||
TraceTiming,
|
||||
BulkSearchItem, BulkSearchResponse, BulkSearchSummary, DocFilter, DocSummary, IndexBytes, MEDIA_KINDS,
|
||||
RetrievalDetail, ScoreKind, SearchFilters, SearchHit, SearchMode, SearchOpts, SearchQuery, SearchTrace,
|
||||
TraceCandidate, TraceFusionInput, TraceTiming,
|
||||
};
|
||||
pub use answer::{
|
||||
Answer, AnswerCitation, AnswerRetrievalSummary, ModelRef, RefusalReason, TokenUsage,
|
||||
|
||||
@@ -196,6 +196,32 @@ pub struct IndexBytes {
|
||||
pub lancedb: u64,
|
||||
}
|
||||
|
||||
/// p9-fb-42: per-query result in bulk search. `response` XOR `error` —
|
||||
/// exactly one is `Some`. `query` is the input echo (raw JSON value)
|
||||
/// so consumers can correlate input to output without index tracking.
|
||||
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
||||
pub struct BulkSearchItem {
|
||||
pub query: serde_json::Value,
|
||||
pub response: Option<serde_json::Value>,
|
||||
pub error: Option<serde_json::Value>,
|
||||
}
|
||||
|
||||
/// p9-fb-42: bulk summary counts. Invariant: total == succeeded + failed.
|
||||
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct BulkSearchSummary {
|
||||
pub total: u32,
|
||||
pub succeeded: u32,
|
||||
pub failed: u32,
|
||||
}
|
||||
|
||||
/// p9-fb-42: MCP-only envelope. CLI emits raw ndjson without envelope.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct BulkSearchResponse {
|
||||
pub schema_version: String,
|
||||
pub results: Vec<BulkSearchItem>,
|
||||
pub summary: BulkSearchSummary,
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
@@ -356,4 +382,51 @@ mod tests {
|
||||
let hit: SearchHit = serde_json::from_value(json).unwrap();
|
||||
assert_eq!(hit.score_kind, ScoreKind::Rrf);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bulk_search_summary_serde_roundtrip() {
|
||||
let s = BulkSearchSummary {
|
||||
total: 5,
|
||||
succeeded: 4,
|
||||
failed: 1,
|
||||
};
|
||||
let v = serde_json::to_value(s).unwrap();
|
||||
assert_eq!(v["total"], 5);
|
||||
assert_eq!(v["succeeded"], 4);
|
||||
assert_eq!(v["failed"], 1);
|
||||
let back: BulkSearchSummary = serde_json::from_value(v).unwrap();
|
||||
assert_eq!(back, s);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bulk_search_summary_default_is_zeros() {
|
||||
let s = BulkSearchSummary::default();
|
||||
assert_eq!(s.total, 0);
|
||||
assert_eq!(s.succeeded, 0);
|
||||
assert_eq!(s.failed, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bulk_search_item_serde_response_variant() {
|
||||
let item = BulkSearchItem {
|
||||
query: serde_json::json!({"query": "rust"}),
|
||||
response: Some(serde_json::json!({"hits": []})),
|
||||
error: None,
|
||||
};
|
||||
let v = serde_json::to_value(&item).unwrap();
|
||||
assert!(v["response"].is_object());
|
||||
assert!(v["error"].is_null());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bulk_search_item_serde_error_variant() {
|
||||
let item = BulkSearchItem {
|
||||
query: serde_json::json!({"query": "rust"}),
|
||||
response: None,
|
||||
error: Some(serde_json::json!({"code": "config_invalid", "message": "bad"})),
|
||||
};
|
||||
let v = serde_json::to_value(&item).unwrap();
|
||||
assert!(v["response"].is_null());
|
||||
assert_eq!(v["error"]["code"], "config_invalid");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -5,14 +5,14 @@ edition = { workspace = true }
|
||||
rust-version = { workspace = true }
|
||||
license = { workspace = true }
|
||||
repository = { workspace = true }
|
||||
description = "Local fastembed-rs adapter implementing kb_core::Embedder (multilingual-e5-small default)"
|
||||
description = "Local fastembed-rs adapter implementing kb_core::Embedder (multilingual-e5-large default, e5-small backwards-compat)"
|
||||
|
||||
[dependencies]
|
||||
kebab-config = { path = "../kebab-config" }
|
||||
kebab-embed = { path = "../kebab-embed" }
|
||||
# Default features bring `ort-download-binaries` (bundled ONNX runtime)
|
||||
# and `hf-hub-native-tls` (first-run model download). No extra features
|
||||
# needed for the multilingual-e5-small path.
|
||||
# needed for the multilingual-e5-{small,large} paths.
|
||||
fastembed = { workspace = true }
|
||||
tracing = { workspace = true }
|
||||
anyhow = { workspace = true }
|
||||
|
||||
@@ -1,8 +1,9 @@
|
||||
//! `kb-embed-local` — `FastembedEmbedder`, a local ONNX-backed
|
||||
//! [`Embedder`](kebab_embed::Embedder) implementation.
|
||||
//!
|
||||
//! Wraps [`fastembed::TextEmbedding`] for the default `multilingual-e5-small`
|
||||
//! (384-dim) model. Honors `config.models.embedding.batch_size` and applies
|
||||
//! Wraps [`fastembed::TextEmbedding`]. Default is `multilingual-e5-large`
|
||||
//! (1024-dim, p9-fb-39b); `multilingual-e5-small` (384-dim) is also supported
|
||||
//! for backwards-compat. Honors `config.models.embedding.batch_size` and applies
|
||||
//! the e5 prefix convention (§11.3 of the design report):
|
||||
//!
|
||||
//! * `EmbeddingKind::Document` → `"passage: "` prefix
|
||||
@@ -69,9 +70,9 @@ impl FastembedEmbedder {
|
||||
.with_context(|| format!("create fastembed cache dir {}", cache_dir.display()))?;
|
||||
|
||||
// 2. Resolve the fastembed enum variant from
|
||||
// `config.models.embedding.model`. Currently only the default
|
||||
// `multilingual-e5-small` is wired; other model names error
|
||||
// out with a clear message rather than silently misconfiguring.
|
||||
// `config.models.embedding.model`. Currently `multilingual-e5-large`
|
||||
// (default) and `multilingual-e5-small` are wired; other model names
|
||||
// error out with a clear message rather than silently misconfiguring.
|
||||
let model_name = resolve_model(&config.models.embedding.model)?;
|
||||
|
||||
// 3. Verify dim match BEFORE loading the model — if the config
|
||||
@@ -100,7 +101,7 @@ impl FastembedEmbedder {
|
||||
target: "kebab-embed-local",
|
||||
model = %config.models.embedding.model,
|
||||
cache_dir = %cache_dir.display(),
|
||||
"loading embedding model (first run will download ~470MB)"
|
||||
"loading embedding model (first run downloads model weights — ~470MB for e5-small, ~1.3GB for e5-large)"
|
||||
);
|
||||
let inner = TextEmbedding::try_new(opts)
|
||||
.context("fastembed: TextEmbedding::try_new")?;
|
||||
@@ -193,17 +194,18 @@ fn prefix_input(input: &EmbeddingInput<'_>) -> String {
|
||||
}
|
||||
|
||||
/// Resolve a `config.models.embedding.model` string to a fastembed
|
||||
/// `EmbeddingModel` enum variant. Only `multilingual-e5-small` is wired
|
||||
/// for p3-2; additional model names should be added (and their dims
|
||||
/// pinned in tests) as needed.
|
||||
/// `EmbeddingModel` enum variant. Currently supports `multilingual-e5-small`
|
||||
/// (384-dim) and `multilingual-e5-large` (1024-dim); additional model names
|
||||
/// should be added (and their dims pinned in tests) as needed.
|
||||
fn resolve_model(name: &str) -> Result<EmbeddingModel> {
|
||||
match name {
|
||||
"multilingual-e5-small" => Ok(EmbeddingModel::MultilingualE5Small),
|
||||
"multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),
|
||||
other => anyhow::bail!(
|
||||
"kb-embed-local: unsupported embedding model {other:?}; \
|
||||
this adapter currently only ships `multilingual-e5-small`. \
|
||||
Add a new arm to `resolve_model` (and a fastembed feature \
|
||||
flag if needed) to support more."
|
||||
this adapter currently ships `multilingual-e5-small` and \
|
||||
`multilingual-e5-large`. Add a new arm to `resolve_model` \
|
||||
(and a fastembed feature flag if needed) to support more."
|
||||
),
|
||||
}
|
||||
}
|
||||
@@ -294,6 +296,12 @@ mod tests {
|
||||
resolve_model("multilingual-e5-small").expect("default model resolves");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn resolve_model_supports_e5_large() {
|
||||
let m = resolve_model("multilingual-e5-large").expect("e5-large should resolve");
|
||||
let _ = m;
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn resolve_unknown_model_errors() {
|
||||
let err = resolve_model("not-a-real-model").expect_err("unknown model errors");
|
||||
@@ -301,6 +309,21 @@ mod tests {
|
||||
assert!(msg.contains("unsupported embedding model"), "msg={msg}");
|
||||
}
|
||||
|
||||
// ── check_dim ────────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn check_dim_passes_for_1024() {
|
||||
check_dim(1024, 1024).expect("matching dims must pass");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn check_dim_rejects_384_vs_1024() {
|
||||
let err = check_dim(384, 1024).expect_err("dim mismatch must error");
|
||||
let msg = format!("{err}");
|
||||
assert!(msg.contains("384") && msg.contains("1024"),
|
||||
"error must mention both dims, got: {msg}");
|
||||
}
|
||||
|
||||
// expand_path tests live in `kb-config::paths`. The adapter imports
|
||||
// it and trusts the upstream coverage rather than duplicating it.
|
||||
}
|
||||
|
||||
@@ -3,10 +3,11 @@
|
||||
//!
|
||||
//! ## Why every test in this file is `#[ignore]`
|
||||
//!
|
||||
//! The first call to `FastembedEmbedder::new` downloads ~470 MB of
|
||||
//! weights from Hugging Face into `data_dir/models/fastembed/`. Doing
|
||||
//! that on every `cargo test` invocation is wasteful, so the bare
|
||||
//! invocation skips this file entirely.
|
||||
//! The first call to `FastembedEmbedder::new` downloads ~1.3 GB of
|
||||
//! weights (multilingual-e5-large per p9-fb-39b default) from Hugging
|
||||
//! Face into `data_dir/models/fastembed/`. Doing that on every
|
||||
//! `cargo test` invocation is wasteful, so the bare invocation skips
|
||||
//! this file entirely.
|
||||
//!
|
||||
//! Run the full suite with:
|
||||
//! ```text
|
||||
@@ -58,19 +59,20 @@ fn shared_embedder() -> &'static FastembedEmbedder {
|
||||
// ─── construction ─────────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
#[ignore = "downloads ~470MB ONNX model on first run; CI-only"]
|
||||
fn default_config_constructs_with_dims_384() {
|
||||
#[ignore = "downloads ~1.3GB ONNX model on first run; CI-only"]
|
||||
fn default_config_constructs_with_dims_1024() {
|
||||
// p9-fb-39b: default flipped to multilingual-e5-large (1024 dim).
|
||||
let emb = shared_embedder();
|
||||
assert_eq!(emb.dimensions(), 384);
|
||||
assert_eq!(emb.model_id().0, "multilingual-e5-small");
|
||||
assert_eq!(emb.dimensions(), 1024);
|
||||
assert_eq!(emb.model_id().0, "multilingual-e5-large");
|
||||
assert_eq!(emb.model_version().0, "v1");
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[ignore = "downloads ~470MB ONNX model on first run; CI-only"]
|
||||
#[ignore = "downloads ~1.3GB ONNX model on first run; CI-only"]
|
||||
fn mismatched_dims_in_config_errors_at_construction() {
|
||||
let (mut cfg, _tmp) = test_config();
|
||||
cfg.models.embedding.dimensions = 512; // model is 384
|
||||
cfg.models.embedding.dimensions = 512; // model is 1024 (e5-large default)
|
||||
// `FastembedEmbedder` deliberately does not implement `Debug`
|
||||
// (its inner ONNX session has no useful debug shape), so we
|
||||
// can't use `expect_err`; match the Result manually.
|
||||
@@ -80,7 +82,7 @@ fn mismatched_dims_in_config_errors_at_construction() {
|
||||
};
|
||||
let msg = format!("{err}");
|
||||
assert!(msg.contains("dimension mismatch"), "msg={msg}");
|
||||
assert!(msg.contains("384"), "msg={msg}");
|
||||
assert!(msg.contains("1024"), "msg={msg}");
|
||||
assert!(msg.contains("512"), "msg={msg}");
|
||||
}
|
||||
|
||||
@@ -104,8 +106,8 @@ fn document_and_query_yield_different_vectors() {
|
||||
])
|
||||
.expect("embed two inputs");
|
||||
assert_eq!(out.len(), 2);
|
||||
assert_eq!(out[0].len(), 384);
|
||||
assert_eq!(out[1].len(), 384);
|
||||
assert_eq!(out[0].len(), 1024);
|
||||
assert_eq!(out[1].len(), 1024);
|
||||
|
||||
// Both vectors are L2-normalized → cosine similarity == dot product.
|
||||
let cos: f32 = out[0]
|
||||
@@ -142,11 +144,11 @@ fn output_vectors_are_l2_normalized() {
|
||||
];
|
||||
let out = emb.embed(&inputs).expect("embed");
|
||||
// Per `kebab_embed::assert_unit_norm` docs: `5e-4` is the safe bound at
|
||||
// 384 dims (f32::EPSILON × √384 ≈ 2.3e-6, but ONNX kernels add
|
||||
// 1024 dims (f32::EPSILON × √1024 ≈ 2.3e-6, but ONNX kernels add
|
||||
// their own per-component noise; 1e-3 is very generous and matches
|
||||
// the spec's `± 1e-3`).
|
||||
kebab_embed::assert_unit_norm(&out, 1e-3);
|
||||
kebab_embed::assert_vector_shape(&out, 384);
|
||||
kebab_embed::assert_vector_shape(&out, 1024);
|
||||
}
|
||||
|
||||
// ─── determinism ──────────────────────────────────────────────────────
|
||||
@@ -254,7 +256,7 @@ fn snapshot_aggregate_hash_is_stable() {
|
||||
// Round every component to 4 decimal places, hash deterministically.
|
||||
let mut hasher = DefaultHasher::new();
|
||||
for (i, v) in out.iter().enumerate() {
|
||||
assert_eq!(v.len(), 384, "row {i} dim mismatch");
|
||||
assert_eq!(v.len(), 1024, "row {i} dim mismatch");
|
||||
for x in v {
|
||||
let rounded: i32 = (*x * 1.0e4).round() as i32;
|
||||
rounded.hash(&mut hasher);
|
||||
|
||||
@@ -184,6 +184,18 @@ pub fn render_report_md(report: &CompareReport) -> String {
|
||||
),
|
||||
);
|
||||
}
|
||||
for k in crate::metrics::TOP_K_VARIANTS {
|
||||
let _ = writeln!(
|
||||
out,
|
||||
"| precision@{k}_chunk | {} | {} | {} |",
|
||||
fmt(a.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN)),
|
||||
fmt(b.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN)),
|
||||
fmt_delta(
|
||||
a.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
|
||||
b.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
|
||||
),
|
||||
);
|
||||
}
|
||||
let _ = writeln!(
|
||||
out,
|
||||
"| citation_coverage | {} | {} | {} |",
|
||||
@@ -419,6 +431,7 @@ fn build_deltas(
|
||||
}
|
||||
let mut hit = serde_json::Map::new();
|
||||
let mut recall = serde_json::Map::new();
|
||||
let mut precision = serde_json::Map::new();
|
||||
for k in crate::metrics::TOP_K_VARIANTS {
|
||||
hit.insert(
|
||||
k.to_string(),
|
||||
@@ -434,11 +447,19 @@ fn build_deltas(
|
||||
b.recall_at_k_doc.get(k).copied().unwrap_or(f32::NAN),
|
||||
),
|
||||
);
|
||||
precision.insert(
|
||||
k.to_string(),
|
||||
d(
|
||||
a.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
|
||||
b.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
|
||||
),
|
||||
);
|
||||
}
|
||||
serde_json::json!({
|
||||
"hit_at_k": hit,
|
||||
"mrr": d(a.mrr, b.mrr),
|
||||
"recall_at_k_doc": recall,
|
||||
"precision_at_k_chunk": precision,
|
||||
"citation_coverage": d(a.citation_coverage, b.citation_coverage),
|
||||
"groundedness": d(a.groundedness, b.groundedness),
|
||||
"empty_result_rate": d(a.empty_result_rate, b.empty_result_rate),
|
||||
@@ -484,6 +505,7 @@ mod tests {
|
||||
hit_at_k: Default::default(),
|
||||
mrr: 0.5,
|
||||
recall_at_k_doc: Default::default(),
|
||||
precision_at_k_chunk: Default::default(),
|
||||
citation_coverage: f32::NAN,
|
||||
groundedness: 0.0,
|
||||
empty_result_rate: 0.0,
|
||||
|
||||
@@ -58,6 +58,14 @@ pub struct AggregateMetrics {
|
||||
pub hit_at_k: BTreeMap<u32, f32>,
|
||||
pub mrr: f32,
|
||||
pub recall_at_k_doc: BTreeMap<u32, f32>,
|
||||
/// p9-fb-39: chunk-level precision at k. Binary relevance via
|
||||
/// `expected_chunk_ids` (a hit is "relevant" if its chunk_id is
|
||||
/// in the golden's `expected_chunk_ids`). Denominator is k (fixed)
|
||||
/// — `hits.len() < k` still divides by k, treating shortfall as
|
||||
/// precision loss (mirrors `hit_at_k`). Queries with empty
|
||||
/// `expected_chunk_ids` are skipped (mirrors `hit_at_k_chunk`).
|
||||
#[serde(default)]
|
||||
pub precision_at_k_chunk: BTreeMap<u32, f32>,
|
||||
#[serde(
|
||||
serialize_with = "serialize_f32_nan_as_null",
|
||||
deserialize_with = "deserialize_f32_or_nan"
|
||||
@@ -187,6 +195,8 @@ pub(crate) fn aggregate_from_rows(
|
||||
TOP_K_VARIANTS.iter().map(|k| (*k, (0_u32, 0_u32))).collect();
|
||||
let mut recall_at_k_doc: BTreeMap<u32, (f64, u32)> =
|
||||
TOP_K_VARIANTS.iter().map(|k| (*k, (0.0_f64, 0_u32))).collect();
|
||||
let mut precision_at_k_chunk: BTreeMap<u32, (f64, u32)> =
|
||||
TOP_K_VARIANTS.iter().map(|k| (*k, (0.0_f64, 0_u32))).collect();
|
||||
|
||||
let mut mrr_sum: f64 = 0.0;
|
||||
let mut mrr_denom: u32 = 0;
|
||||
@@ -243,6 +253,18 @@ pub(crate) fn aggregate_from_rows(
|
||||
{
|
||||
mrr_sum += 1.0 / f64::from(rank);
|
||||
}
|
||||
// p9-fb-39: precision@k_chunk — count of top-k hits whose
|
||||
// chunk_id is in `expected`, divided by k (fixed denominator).
|
||||
for k in TOP_K_VARIANTS {
|
||||
let hits_in_topk_relevant = qr
|
||||
.hits_top_k
|
||||
.iter()
|
||||
.filter(|h| h.rank <= *k && expected.contains(&h.chunk_id))
|
||||
.count();
|
||||
let entry = precision_at_k_chunk.get_mut(k).expect("init");
|
||||
entry.0 += hits_in_topk_relevant as f64 / f64::from(*k);
|
||||
entry.1 += 1;
|
||||
}
|
||||
}
|
||||
|
||||
// recall@k_doc (doc-level, requires non-empty expected_doc_ids
|
||||
@@ -333,6 +355,7 @@ pub(crate) fn aggregate_from_rows(
|
||||
mrr_sum / f64::from(mrr_denom)
|
||||
}),
|
||||
recall_at_k_doc: round_recall_map(&recall_at_k_doc),
|
||||
precision_at_k_chunk: round_recall_map(&precision_at_k_chunk),
|
||||
citation_coverage: ratio_or_nan(citation_num, citation_denom),
|
||||
groundedness: ratio_or_zero(groundedness_num, groundedness_denom),
|
||||
empty_result_rate: ratio_or_zero(empty_result_count, total_queries),
|
||||
@@ -674,4 +697,114 @@ mod tests {
|
||||
assert_eq!(agg.failed_queries, 1);
|
||||
assert_eq!(agg.total_queries, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_field_default_empty_on_old_json() {
|
||||
// Old eval_runs.metrics_json predates fb-39 — no precision_at_k_chunk field.
|
||||
// serde(default) yields empty BTreeMap.
|
||||
let old = serde_json::json!({
|
||||
"hit_at_k": {"1": 0.5, "3": 0.5, "5": 0.5, "10": 0.5},
|
||||
"mrr": 0.5,
|
||||
"recall_at_k_doc": {"1": 0.0, "3": 0.0, "5": 0.0, "10": 0.0},
|
||||
"citation_coverage": null,
|
||||
"groundedness": 0.0,
|
||||
"empty_result_rate": 0.0,
|
||||
"refusal_correctness": null,
|
||||
"total_queries": 1,
|
||||
"failed_queries": 0
|
||||
});
|
||||
let parsed: AggregateMetrics =
|
||||
serde_json::from_value(old).expect("backwards-compat deserialize");
|
||||
assert!(parsed.precision_at_k_chunk.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_exact_match() {
|
||||
// expected = [c1, c2, c3]. Top-5 hits: [c1@1, c2@2, c3@3, x@4, y@5].
|
||||
// P@5 = 3/5 = 0.6. P@10 = 3/10 = 0.3.
|
||||
let queries = vec![gq("q1", &["c1", "c2", "c3"], &["d1"])];
|
||||
let rows = vec![record(
|
||||
"q1",
|
||||
vec![
|
||||
hit(1, "c1", "d1"),
|
||||
hit(2, "c2", "d1"),
|
||||
hit(3, "c3", "d1"),
|
||||
hit(4, "x", "d1"),
|
||||
hit(5, "y", "d1"),
|
||||
],
|
||||
None,
|
||||
None,
|
||||
)];
|
||||
let agg = aggregate_from_rows(&queries, &rows).unwrap();
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.6);
|
||||
assert_eq!(agg.precision_at_k_chunk[&10], 0.3);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_partial_topk_divides_by_k() {
|
||||
// expected = [c1, c2]. Hits: only [c1@1, c2@2, x@3] (3 results).
|
||||
// P@5 = 2/5 = 0.4 (denominator is k, not hits.len()).
|
||||
let queries = vec![gq("q1", &["c1", "c2"], &["d1"])];
|
||||
let rows = vec![record(
|
||||
"q1",
|
||||
vec![hit(1, "c1", "d1"), hit(2, "c2", "d1"), hit(3, "x", "d1")],
|
||||
None,
|
||||
None,
|
||||
)];
|
||||
let agg = aggregate_from_rows(&queries, &rows).unwrap();
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.4);
|
||||
assert_eq!(agg.precision_at_k_chunk[&10], 0.2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_zero_relevant_in_topk() {
|
||||
// expected = [c1]. Hits: [x@1, y@2, z@3] (none relevant).
|
||||
// P@5 = 0/5 = 0.0.
|
||||
let queries = vec![gq("q1", &["c1"], &["d1"])];
|
||||
let rows = vec![record(
|
||||
"q1",
|
||||
vec![hit(1, "x", "d1"), hit(2, "y", "d1"), hit(3, "z", "d1")],
|
||||
None,
|
||||
None,
|
||||
)];
|
||||
let agg = aggregate_from_rows(&queries, &rows).unwrap();
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_empty_expected_skipped() {
|
||||
// expected_chunk_ids = []. Skipped → final BTreeMap entry value = 0.0
|
||||
// (zero-denom path in round_recall_map). Mirrors recall_at_k_doc behavior.
|
||||
let queries = vec![gq("q1", &[], &["d1"])];
|
||||
let rows = vec![record("q1", vec![hit(1, "c1", "d1")], None, None)];
|
||||
let agg = aggregate_from_rows(&queries, &rows).unwrap();
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_two_queries_averaged() {
|
||||
// q1: expected=[c1], hits=[c1@1, x@2, y@3] → P@5 = 1/5 = 0.2
|
||||
// q2: expected=[c1, c2], hits=[c1@1, c2@2] → P@5 = 2/5 = 0.4
|
||||
// Avg P@5 = 0.3.
|
||||
let queries = vec![
|
||||
gq("q1", &["c1"], &["d1"]),
|
||||
gq("q2", &["c1", "c2"], &["d2"]),
|
||||
];
|
||||
let rows = vec![
|
||||
record(
|
||||
"q1",
|
||||
vec![hit(1, "c1", "d1"), hit(2, "x", "d1"), hit(3, "y", "d1")],
|
||||
None,
|
||||
None,
|
||||
),
|
||||
record(
|
||||
"q2",
|
||||
vec![hit(1, "c1", "d2"), hit(2, "c2", "d2")],
|
||||
None,
|
||||
None,
|
||||
),
|
||||
];
|
||||
let agg = aggregate_from_rows(&queries, &rows).unwrap();
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.3);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -11,6 +11,12 @@
|
||||
"5": 0.666700005531311
|
||||
},
|
||||
"mrr": 0.41670000553131104,
|
||||
"precision_at_k_chunk": {
|
||||
"1": 0.33329999446868896,
|
||||
"10": 0.06669999659061432,
|
||||
"3": 0.11110000312328339,
|
||||
"5": 0.13330000638961792
|
||||
},
|
||||
"recall_at_k_doc": {
|
||||
"1": 0.33329999446868896,
|
||||
"10": 0.666700005531311,
|
||||
@@ -32,6 +38,12 @@
|
||||
"5": 1.0
|
||||
},
|
||||
"mrr": 0.833299994468689,
|
||||
"precision_at_k_chunk": {
|
||||
"1": 0.666700005531311,
|
||||
"10": 0.10000000149011612,
|
||||
"3": 0.33329999446868896,
|
||||
"5": 0.20000000298023224
|
||||
},
|
||||
"recall_at_k_doc": {
|
||||
"1": 0.666700005531311,
|
||||
"10": 1.0,
|
||||
@@ -53,6 +65,12 @@
|
||||
"5": 0.33329999446868896
|
||||
},
|
||||
"mrr": 0.41659998893737793,
|
||||
"precision_at_k_chunk": {
|
||||
"1": 0.33340001106262207,
|
||||
"10": 0.0333000048995018,
|
||||
"3": 0.22219999134540558,
|
||||
"5": 0.06669999659061432
|
||||
},
|
||||
"recall_at_k_doc": {
|
||||
"1": 0.33340001106262207,
|
||||
"10": 0.33329999446868896,
|
||||
|
||||
@@ -203,6 +203,7 @@ fn store_aggregate_rejects_missing_run() {
|
||||
hit_at_k: Default::default(),
|
||||
mrr: 0.0,
|
||||
recall_at_k_doc: Default::default(),
|
||||
precision_at_k_chunk: Default::default(),
|
||||
citation_coverage: f32::NAN,
|
||||
groundedness: 0.0,
|
||||
empty_result_rate: 0.0,
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
//! MCP (Model Context Protocol) server over stdio. Exposes 7 tools
|
||||
//! MCP (Model Context Protocol) server over stdio. Exposes 8 tools
|
||||
//! (`search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`
|
||||
//! / `fetch`) backed by `kebab-app` facade methods. Used by `kebab-cli`'s
|
||||
//! `Cmd::Mcp` arm.
|
||||
//! / `fetch` / `bulk_search`) backed by `kebab-app` facade methods. Used by
|
||||
//! `kebab-cli`'s `Cmd::Mcp` arm.
|
||||
//!
|
||||
//! See spec `docs/superpowers/specs/2026-05-07-p9-fb-30-mcp-server-design.md`.
|
||||
|
||||
@@ -67,6 +67,11 @@ pub fn build_tools_vec() -> Vec<Tool> {
|
||||
"Verbatim fetch — chunk / doc / span modes. Returns fetch_result.v1 with the indexed text (no LLM rewrite).",
|
||||
schema_for_type::<tools::fetch::FetchInput>(),
|
||||
),
|
||||
Tool::new(
|
||||
"bulk_search",
|
||||
"Bulk multi-query search — N queries per call (cap 100). Each query mirrors the `search` input shape; returns `bulk_search_response.v1` with per-query results + summary. Sequential execution reuses one App instance so cache / embedder cold-start cost amortizes.",
|
||||
schema_for_type::<tools::bulk_search::BulkSearchInput>(),
|
||||
),
|
||||
]
|
||||
}
|
||||
|
||||
@@ -170,6 +175,13 @@ impl ServerHandler for KebabHandler {
|
||||
})
|
||||
.await
|
||||
}
|
||||
"bulk_search" => {
|
||||
let args = request.arguments.unwrap_or_default();
|
||||
self.spawn_tool(args, |state, input| {
|
||||
tools::bulk_search::handle(&state, input)
|
||||
})
|
||||
.await
|
||||
}
|
||||
_other => Err(ErrorData::method_not_found::<
|
||||
rmcp::model::CallToolRequestMethod,
|
||||
>()),
|
||||
|
||||
55
crates/kebab-mcp/src/tools/bulk_search.rs
Normal file
55
crates/kebab-mcp/src/tools/bulk_search.rs
Normal file
@@ -0,0 +1,55 @@
|
||||
//! `bulk_search` tool — wraps `kebab_app::bulk_search_with_config`.
|
||||
//! Input: `{ queries: [<SearchInput shape>, ...] }`.
|
||||
//! Output: `bulk_search_response.v1` envelope (results + summary).
|
||||
|
||||
use rmcp::model::CallToolResult;
|
||||
use schemars::JsonSchema;
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
use crate::error::{to_tool_error, to_tool_success};
|
||||
use crate::state::KebabAppState;
|
||||
|
||||
#[derive(Debug, Deserialize, Serialize, JsonSchema)]
|
||||
pub struct BulkSearchInput {
|
||||
/// Per-query inputs. Each item mirrors the single-query `search`
|
||||
/// tool's input shape — `query` is required, all other fields are
|
||||
/// optional and default to single-search defaults. Capped at 100
|
||||
/// items; exceeding returns an `invalid_input` tool error without
|
||||
/// running any query.
|
||||
pub queries: Vec<serde_json::Value>,
|
||||
}
|
||||
|
||||
pub fn handle(state: &KebabAppState, input: BulkSearchInput) -> CallToolResult {
|
||||
let cfg_clone = (*state.config).clone();
|
||||
match kebab_app::bulk_search_with_config(cfg_clone, input.queries) {
|
||||
Ok((items, summary)) => {
|
||||
let tagged_items: Vec<serde_json::Value> = items
|
||||
.iter()
|
||||
.map(|it| {
|
||||
let mut v = serde_json::to_value(it).unwrap_or(serde_json::Value::Null);
|
||||
if let serde_json::Value::Object(ref mut map) = v {
|
||||
map.insert(
|
||||
"schema_version".to_string(),
|
||||
serde_json::Value::String("bulk_search_item.v1".to_string()),
|
||||
);
|
||||
}
|
||||
v
|
||||
})
|
||||
.collect();
|
||||
let envelope = serde_json::json!({
|
||||
"schema_version": "bulk_search_response.v1",
|
||||
"results": tagged_items,
|
||||
"summary": {
|
||||
"total": summary.total,
|
||||
"succeeded": summary.succeeded,
|
||||
"failed": summary.failed,
|
||||
},
|
||||
});
|
||||
match serde_json::to_string(&envelope) {
|
||||
Ok(json) => to_tool_success(json),
|
||||
Err(e) => to_tool_error(&anyhow::anyhow!(e)),
|
||||
}
|
||||
}
|
||||
Err(e) => to_tool_error(&e),
|
||||
}
|
||||
}
|
||||
@@ -7,3 +7,4 @@ pub mod ask;
|
||||
pub mod ingest_file;
|
||||
pub mod ingest_stdin;
|
||||
pub mod fetch;
|
||||
pub mod bulk_search;
|
||||
|
||||
121
crates/kebab-mcp/tests/tools_call_bulk_search.rs
Normal file
121
crates/kebab-mcp/tests/tools_call_bulk_search.rs
Normal file
@@ -0,0 +1,121 @@
|
||||
//! p9-fb-42: integration tests for `mcp__kebab__bulk_search`.
|
||||
|
||||
use std::fs;
|
||||
|
||||
use kebab_config::Config;
|
||||
use kebab_core::SourceScope;
|
||||
use kebab_mcp::{KebabAppState, KebabHandler};
|
||||
use rmcp::model::RawContent;
|
||||
use serde_json::json;
|
||||
|
||||
fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path) -> Config {
|
||||
let mut cfg = Config::defaults();
|
||||
cfg.storage.data_dir = data_dir.to_string_lossy().into_owned();
|
||||
cfg.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
|
||||
cfg.workspace.root = workspace_root.to_string_lossy().into_owned();
|
||||
cfg.workspace.exclude.clear();
|
||||
cfg.models.embedding.provider = "none".to_string();
|
||||
cfg.models.embedding.dimensions = 0;
|
||||
cfg
|
||||
}
|
||||
|
||||
fn setup() -> (tempfile::TempDir, KebabHandler) {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let data_dir = dir.path().join("data");
|
||||
let workspace_root = dir.path().join("notes");
|
||||
fs::create_dir_all(&data_dir).unwrap();
|
||||
fs::create_dir_all(&workspace_root).unwrap();
|
||||
let config = minimal_config(&data_dir, &workspace_root);
|
||||
fs::write(
|
||||
workspace_root.join("a.md"),
|
||||
"# Alpha\n\nThis document mentions kebab and bread.",
|
||||
)
|
||||
.unwrap();
|
||||
let scope = SourceScope { root: workspace_root.clone(), include: vec![], exclude: vec![] };
|
||||
let _ = kebab_app::ingest_with_config(config.clone(), scope, false).unwrap();
|
||||
let state = KebabAppState::new(config, None);
|
||||
let handler = KebabHandler::new(state);
|
||||
(dir, handler)
|
||||
}
|
||||
|
||||
fn extract_json(result: &rmcp::model::CallToolResult) -> serde_json::Value {
|
||||
assert!(!result.is_error.unwrap_or(false), "expected isError=false, got {result:?}");
|
||||
let content = result.content.first().expect("at least one content item");
|
||||
let text = match &content.raw {
|
||||
RawContent::Text(t) => &t.text,
|
||||
other => panic!("expected Text content, got {other:?}"),
|
||||
};
|
||||
serde_json::from_str(text).expect("valid JSON")
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn bulk_search_two_queries_returns_envelope() {
|
||||
let (_dir, handler) = setup();
|
||||
let input = kebab_mcp::tools::bulk_search::BulkSearchInput {
|
||||
queries: vec![
|
||||
json!({"query": "kebab", "mode": "lexical", "k": 5}),
|
||||
json!({"query": "bread", "mode": "lexical", "k": 5}),
|
||||
],
|
||||
};
|
||||
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
|
||||
let v = extract_json(&result);
|
||||
assert_eq!(v["schema_version"], "bulk_search_response.v1");
|
||||
let results = v["results"].as_array().expect("results array");
|
||||
assert_eq!(results.len(), 2);
|
||||
for r in results {
|
||||
assert_eq!(r["schema_version"], "bulk_search_item.v1");
|
||||
assert!(r["response"].is_object());
|
||||
assert!(r["error"].is_null());
|
||||
}
|
||||
assert_eq!(v["summary"]["total"], 2);
|
||||
assert_eq!(v["summary"]["succeeded"], 2);
|
||||
assert_eq!(v["summary"]["failed"], 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn bulk_search_empty_queries_returns_empty_envelope() {
|
||||
let (_dir, handler) = setup();
|
||||
let input = kebab_mcp::tools::bulk_search::BulkSearchInput { queries: vec![] };
|
||||
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
|
||||
let v = extract_json(&result);
|
||||
assert_eq!(v["schema_version"], "bulk_search_response.v1");
|
||||
assert_eq!(v["results"].as_array().unwrap().len(), 0);
|
||||
assert_eq!(v["summary"]["total"], 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn bulk_search_invalid_item_field_continues_with_per_item_error() {
|
||||
let (_dir, handler) = setup();
|
||||
let input = kebab_mcp::tools::bulk_search::BulkSearchInput {
|
||||
queries: vec![
|
||||
json!({"query": "kebab", "mode": "lexical"}),
|
||||
json!({"query": "bread", "mode": "bogus"}), // invalid mode
|
||||
],
|
||||
};
|
||||
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
|
||||
let v = extract_json(&result);
|
||||
let results = v["results"].as_array().unwrap();
|
||||
assert_eq!(results.len(), 2);
|
||||
assert!(results[0]["error"].is_null());
|
||||
assert!(results[1]["error"].is_object());
|
||||
assert_eq!(results[1]["error"]["code"], "invalid_input");
|
||||
assert_eq!(v["summary"]["succeeded"], 1);
|
||||
assert_eq!(v["summary"]["failed"], 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn bulk_search_over_cap_returns_tool_error() {
|
||||
let (_dir, handler) = setup();
|
||||
let queries: Vec<serde_json::Value> = (0..101)
|
||||
.map(|_| json!({"query": "x", "mode": "lexical"}))
|
||||
.collect();
|
||||
let input = kebab_mcp::tools::bulk_search::BulkSearchInput { queries };
|
||||
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
|
||||
assert!(result.is_error.unwrap_or(false), "expected isError=true");
|
||||
let content = result.content.first().expect("error content");
|
||||
let text = match &content.raw {
|
||||
RawContent::Text(t) => &t.text,
|
||||
other => panic!("expected Text content, got {other:?}"),
|
||||
};
|
||||
assert!(text.contains("max 100"), "expected 'max 100' in error: {text}");
|
||||
}
|
||||
@@ -1,13 +1,13 @@
|
||||
//! Integration: `build_tools_vec` returns 7 tools with correct names and
|
||||
//! Integration: `build_tools_vec` returns 8 tools with correct names and
|
||||
//! inputSchema. Uses the extracted `pub fn build_tools_vec()` helper — no
|
||||
//! transport or RequestContext needed.
|
||||
|
||||
use kebab_mcp::build_tools_vec;
|
||||
|
||||
#[test]
|
||||
fn tools_list_returns_seven_tools() {
|
||||
fn tools_list_returns_eight_tools() {
|
||||
let tools = build_tools_vec();
|
||||
assert_eq!(tools.len(), 7, "expected exactly 7 tools, got {}", tools.len());
|
||||
assert_eq!(tools.len(), 8, "expected exactly 8 tools, got {}", tools.len());
|
||||
|
||||
let names: Vec<&str> = tools.iter().map(|t| t.name.as_ref()).collect();
|
||||
assert!(names.contains(&"schema"), "missing 'schema' tool");
|
||||
@@ -17,6 +17,7 @@ fn tools_list_returns_seven_tools() {
|
||||
assert!(names.contains(&"ingest_file"), "missing 'ingest_file' tool");
|
||||
assert!(names.contains(&"ingest_stdin"), "missing 'ingest_stdin' tool");
|
||||
assert!(names.contains(&"fetch"), "missing 'fetch' tool");
|
||||
assert!(names.contains(&"bulk_search"), "missing 'bulk_search' tool");
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
@@ -222,6 +222,23 @@ kebab --config /tmp/kebab-smoke/config.toml schema --json | jq .stats
|
||||
|
||||
Look for: `media_breakdown` (5 keys), `lang_breakdown`, `index_bytes`, `stale_doc_count`.
|
||||
|
||||
### Bulk multi-query (fb-42)
|
||||
|
||||
Stdin ndjson으로 N query 한 번에:
|
||||
|
||||
```bash
|
||||
printf '{"query":"rust","mode":"lexical"}\n{"query":"async","mode":"lexical"}\n' \
|
||||
| kebab --config /tmp/kebab-smoke/config.toml search --bulk --json
|
||||
```
|
||||
|
||||
stdout: per-query ndjson (`bulk_search_item.v1`). stderr: `bulk_summary: total=2 succeeded=2 failed=0`.
|
||||
|
||||
MCP tool 동등:
|
||||
|
||||
```json
|
||||
{"name":"kebab__bulk_search","arguments":{"queries":[{"query":"rust"},{"query":"async"}]}}
|
||||
```
|
||||
|
||||
## P6-4 이미지 ingestion 옵션
|
||||
|
||||
`config.toml` 에 다음 절을 추가하면 `kebab ingest` 가 `**/*.png` / `**/*.jpg` 등 이미지 자산도 함께 색인합니다 (텍스트만 색인하려면 생략):
|
||||
@@ -312,4 +329,24 @@ rm -rf /tmp/kebab-smoke # 통째로 정리
|
||||
- (P7-3) 한 PDF 가 N 페이지면 `kebab ingest` 가 N 개 (또는 그 이상의, 페이지 길면 multi-chunk) 의 chunk 를 한 transaction 안에서 commit. 500 페이지 책 → 500+ chunk 한 번에 → embedding throughput 가 bottleneck. 임베딩 활성 워크스페이스에서 큰 PDF 를 처음 ingest 하면 분-단위 시간 + WAL 크기 증가 가능 — P+ 스케일 hardening task 까지 정상 동작이지만 비용은 측정 가능.
|
||||
- (P7-3 + follow-up) 동일 path 에 byte 가 다른 PDF 를 두 번째 ingest 하면 `purge_vector_orphans_for_workspace_path` 가 옛 chunk_id 를 LanceDB 에서 먼저 삭제, 이어서 `purge_orphan_at_workspace_path` 가 옛 doc / chunks / embedding_records 를 SQLite 에서 sweep. 새 byte 가 새 `doc_id` 로 색인됨. `IngestReport` 에 그 자산만 `new+=1` (다른 자산은 `updated`). 두 store 모두 정합 — 옛 본문 검색 시 옛 chunks 가 더 이상 surface 되지 않음.
|
||||
|
||||
### Embedding upgrade (fb-39b)
|
||||
|
||||
`multilingual-e5-small` 에서 `multilingual-e5-large` 로 업그레이드 시퀀스:
|
||||
|
||||
```bash
|
||||
# 기존 vector index 정리 (orphan table 회피)
|
||||
kebab --config /tmp/kebab-smoke/config.toml reset --vector-only
|
||||
|
||||
# config.toml 의 [models.embedding] 갱신:
|
||||
# model = "multilingual-e5-large"
|
||||
# dimensions = 1024
|
||||
|
||||
# 재-ingest — fastembed 가 첫 실행 시 e5-large ONNX (~1.3 GB) 자동 다운로드.
|
||||
# 다운로드 시간 + 모든 chunk re-embed 시간 (e5-small 대비 ~3-4×).
|
||||
kebab --config /tmp/kebab-smoke/config.toml ingest
|
||||
|
||||
# fb-39 의 P@k metric 으로 small vs large 비교:
|
||||
kebab --config /tmp/kebab-smoke/config.toml eval run
|
||||
```
|
||||
|
||||
자세한 history 와 발견된 버그는 [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) 참조.
|
||||
|
||||
418
docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md
Normal file
418
docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md
Normal file
@@ -0,0 +1,418 @@
|
||||
# fb-39 Eval Foundation (P@k Metric) Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add chunk-level `precision_at_k_chunk` metric (P@5, P@10) to kebab-eval `AggregateMetrics`, plus golden-set ground-truth documentation strengthening — so a future fb-39b can measure whether a lever (chunk policy / RRF / cross-encoder / embedding upgrade) actually moves the rank-5+ noise needle.
|
||||
|
||||
**Architecture:** Single new field on `AggregateMetrics`, computed inside the existing `compute_aggregate_with_config` loop using the same accumulator pattern as `recall_at_k_doc` (sum-of-per-query-ratios / denominator), serialized via the existing `round_recall_map` helper. Denominator is k (fixed), matching the `hit_at_k` convention. Skip queries with empty `expected_chunk_ids`. Golden set schema unchanged — `expected_chunk_ids` is the ground truth (curator fills per-workspace).
|
||||
|
||||
**Tech Stack:** Rust 2024, serde, serde_yaml. No new deps.
|
||||
|
||||
**Spec:** `docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md`
|
||||
|
||||
---
|
||||
|
||||
## File map
|
||||
|
||||
**Modify:**
|
||||
- `crates/kebab-eval/src/metrics.rs` — add `precision_at_k_chunk` field on `AggregateMetrics`, init/accumulate/finalize inside `compute_aggregate_with_config`, plus unit tests.
|
||||
- `fixtures/golden_queries.yaml` — strengthen header comment about `expected_chunk_ids` being P@k ground truth.
|
||||
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` — add `precision_at_k_chunk` to §11 eval metric table.
|
||||
- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` — flip status, link design + plan, "lever 적용 deferred to fb-39b" banner.
|
||||
- `tasks/INDEX.md` — flip fb-39 row to ✅ (eval foundation only).
|
||||
|
||||
**Create:** none.
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Add precision_at_k_chunk field + serde backwards-compat
|
||||
|
||||
**Files:**
|
||||
- Modify: `crates/kebab-eval/src/metrics.rs`
|
||||
|
||||
- [ ] **Step 1: Append failing test to `mod tests`**
|
||||
|
||||
```rust
|
||||
#[test]
|
||||
fn precision_at_k_chunk_field_default_empty_on_old_json() {
|
||||
// Old eval_runs.metrics_json predates fb-39 — no precision_at_k_chunk field.
|
||||
// serde(default) should yield empty BTreeMap.
|
||||
let old = serde_json::json!({
|
||||
"hit_at_k": {"1": 0.5, "3": 0.5, "5": 0.5, "10": 0.5},
|
||||
"mrr": 0.5,
|
||||
"recall_at_k_doc": {"1": 0.0, "3": 0.0, "5": 0.0, "10": 0.0},
|
||||
"citation_coverage": null,
|
||||
"groundedness": 0.0,
|
||||
"empty_result_rate": 0.0,
|
||||
"refusal_correctness": null,
|
||||
"total_queries": 1,
|
||||
"failed_queries": 0
|
||||
});
|
||||
let parsed: AggregateMetrics = serde_json::from_value(old).expect("backwards-compat deserialize");
|
||||
assert!(parsed.precision_at_k_chunk.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_field_serializes_when_populated() {
|
||||
let mut p = BTreeMap::new();
|
||||
p.insert(5, 0.6_f32);
|
||||
p.insert(10, 0.3_f32);
|
||||
let agg = AggregateMetrics {
|
||||
hit_at_k: BTreeMap::new(),
|
||||
mrr: 0.0,
|
||||
recall_at_k_doc: BTreeMap::new(),
|
||||
precision_at_k_chunk: p,
|
||||
citation_coverage: 0.0,
|
||||
groundedness: 0.0,
|
||||
empty_result_rate: 0.0,
|
||||
refusal_correctness: 0.0,
|
||||
total_queries: 0,
|
||||
failed_queries: 0,
|
||||
};
|
||||
let v = serde_json::to_value(&agg).unwrap();
|
||||
assert_eq!(v["precision_at_k_chunk"]["5"], 0.6);
|
||||
assert_eq!(v["precision_at_k_chunk"]["10"], 0.3);
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run tests — expect compile errors (field undefined)**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-eval --lib precision_at_k_chunk
|
||||
```
|
||||
Expected: errors — `precision_at_k_chunk` field missing on `AggregateMetrics`.
|
||||
|
||||
- [ ] **Step 3: Add field to `AggregateMetrics`**
|
||||
|
||||
In `crates/kebab-eval/src/metrics.rs`, find `pub struct AggregateMetrics { ... }` (~line 57). Add field after `recall_at_k_doc`:
|
||||
|
||||
```rust
|
||||
/// p9-fb-39: chunk-level precision at k. Binary relevance via
|
||||
/// `expected_chunk_ids` (a hit is "relevant" if its chunk_id is
|
||||
/// in the golden's `expected_chunk_ids`). Denominator is k (fixed)
|
||||
/// — `hits.len() < k` still divides by k, treating shortfall as
|
||||
/// precision loss (mirrors `hit_at_k`). Queries with empty
|
||||
/// `expected_chunk_ids` are skipped (mirrors `hit_at_k_chunk`).
|
||||
#[serde(default)]
|
||||
pub precision_at_k_chunk: BTreeMap<u32, f32>,
|
||||
```
|
||||
|
||||
The other tests in the file (e.g. `hit_at_k_handles_ranks_1_4_miss`, `recall_at_k_doc_partial`) construct `AggregateMetrics` via the public `compute_aggregate_with_config` path, not via struct literal, so the new `#[serde(default)]` field does NOT break them. Only direct struct-literal constructions need updates — search the file to confirm:
|
||||
|
||||
```bash
|
||||
grep -n "AggregateMetrics {" crates/kebab-eval/src/metrics.rs
|
||||
```
|
||||
|
||||
For each direct struct-literal site, add `precision_at_k_chunk: BTreeMap::new(),` to the literal.
|
||||
|
||||
- [ ] **Step 4: Run tests — expect both new tests pass**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-eval --lib precision_at_k_chunk
|
||||
```
|
||||
Expected: both pass.
|
||||
|
||||
- [ ] **Step 5: Run clippy**
|
||||
|
||||
```bash
|
||||
cargo clippy -p kebab-eval --all-targets -- -D warnings
|
||||
```
|
||||
Expected: clean.
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add crates/kebab-eval/src/metrics.rs
|
||||
git commit -m "feat(eval): AggregateMetrics.precision_at_k_chunk field (fb-39)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Compute precision_at_k_chunk in aggregate loop
|
||||
|
||||
**Files:**
|
||||
- Modify: `crates/kebab-eval/src/metrics.rs`
|
||||
|
||||
- [ ] **Step 1: Append failing tests to `mod tests`**
|
||||
|
||||
(Use the existing `make_query_result` / fixture helpers — read the top of the test module for available helpers, e.g. `mk_qr_with_chunks(query_id, chunk_ids_with_ranks)`.)
|
||||
|
||||
```rust
|
||||
#[test]
|
||||
fn precision_at_k_chunk_exact_match() {
|
||||
// 1 query, expected = [c1, c2, c3]. Top-5 hits: [c1@1, c2@2, c3@3, x@4, y@5].
|
||||
// P@5 = 3/5 = 0.6. P@10 = 3/10 = 0.3.
|
||||
let queries = vec![mk_golden(
|
||||
"g1",
|
||||
&[], // expected_doc_ids
|
||||
&["c1", "c2", "c3"], // expected_chunk_ids
|
||||
&[], // must_contain
|
||||
&[], // forbidden
|
||||
None, // expected_refusal
|
||||
)];
|
||||
let rows = vec![mk_query_row(
|
||||
"g1",
|
||||
&[("c1", 1), ("c2", 2), ("c3", 3), ("x", 4), ("y", 5)],
|
||||
)];
|
||||
let agg = compute_from_inputs(&queries, &rows);
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.6);
|
||||
assert_eq!(agg.precision_at_k_chunk[&10], 0.3);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_partial_topk_divides_by_k() {
|
||||
// 1 query, expected = [c1, c2]. Top hits: only [c1@1, c2@2] (3 results total).
|
||||
// P@5 = 2/5 = 0.4 (denominator k, not hits.len).
|
||||
let queries = vec![mk_golden("g1", &[], &["c1", "c2"], &[], &[], None)];
|
||||
let rows = vec![mk_query_row("g1", &[("c1", 1), ("c2", 2), ("x", 3)])];
|
||||
let agg = compute_from_inputs(&queries, &rows);
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.4);
|
||||
assert_eq!(agg.precision_at_k_chunk[&10], 0.2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_zero_relevant_in_topk() {
|
||||
// 1 query, expected = [c1]. Top hits all unrelated.
|
||||
// P@5 = 0/5 = 0.0.
|
||||
let queries = vec![mk_golden("g1", &[], &["c1"], &[], &[], None)];
|
||||
let rows = vec![mk_query_row("g1", &[("x", 1), ("y", 2), ("z", 3)])];
|
||||
let agg = compute_from_inputs(&queries, &rows);
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
|
||||
assert_eq!(agg.precision_at_k_chunk[&10], 0.0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_empty_expected_skipped() {
|
||||
// 1 query, expected_chunk_ids = []. Should be skipped — denom 0 → entry value 0.0
|
||||
// (matches `recall_at_k_doc` behavior in `round_recall_map` for zero-denom).
|
||||
let queries = vec![mk_golden("g1", &[], &[], &[], &[], None)];
|
||||
let rows = vec![mk_query_row("g1", &[("c1", 1)])];
|
||||
let agg = compute_from_inputs(&queries, &rows);
|
||||
// Mirrors recall_at_k_doc: zero-denom → 0.0 in map (not absent).
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
|
||||
assert_eq!(agg.precision_at_k_chunk[&10], 0.0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn precision_at_k_chunk_two_queries_averaged() {
|
||||
// q1: expected=[c1], hits=[c1@1, x@2, y@3] → P@5 = 1/5 = 0.2
|
||||
// q2: expected=[c1, c2], hits=[c1@1, c2@2] → P@5 = 2/5 = 0.4
|
||||
// Avg P@5 = (0.2 + 0.4) / 2 = 0.3.
|
||||
let queries = vec![
|
||||
mk_golden("g1", &[], &["c1"], &[], &[], None),
|
||||
mk_golden("g2", &[], &["c1", "c2"], &[], &[], None),
|
||||
];
|
||||
let rows = vec![
|
||||
mk_query_row("g1", &[("c1", 1), ("x", 2), ("y", 3)]),
|
||||
mk_query_row("g2", &[("c1", 1), ("c2", 2)]),
|
||||
];
|
||||
let agg = compute_from_inputs(&queries, &rows);
|
||||
assert_eq!(agg.precision_at_k_chunk[&5], 0.3);
|
||||
}
|
||||
```
|
||||
|
||||
The `mk_golden` / `mk_query_row` / `compute_from_inputs` helpers are existing test helpers in this file. Read the top of `mod tests` (~line 380-510) to confirm the actual helper names and signatures. If your helpers have different shapes (e.g. `mk_qr_with_chunks(id, &[(chunk, rank)])`), adapt the test calls accordingly.
|
||||
|
||||
If those helpers don't exist, look for the pattern in the existing `hit_at_k_handles_ranks_1_4_miss` test (~line 513) and mirror it.
|
||||
|
||||
- [ ] **Step 2: Run tests — expect failures**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-eval --lib precision_at_k_chunk
|
||||
```
|
||||
Expected: 5 failures — `precision_at_k_chunk` map empty (only `#[serde(default)]` populates it from JSON; the compute path doesn't yet).
|
||||
|
||||
- [ ] **Step 3: Implement aggregation in `compute_aggregate_with_config`**
|
||||
|
||||
In `crates/kebab-eval/src/metrics.rs`, find `compute_aggregate_with_config` body. After the `recall_at_k_doc` accumulator init (~line 188-189), add:
|
||||
|
||||
```rust
|
||||
let mut precision_at_k_chunk: BTreeMap<u32, (f64, u32)> =
|
||||
TOP_K_VARIANTS.iter().map(|k| (*k, (0.0_f64, 0_u32))).collect();
|
||||
```
|
||||
|
||||
Inside the loop, after the existing `hit@k + MRR` block (~line 222-247) which already gates on `!gq.expected_chunk_ids.is_empty()`, add a sibling `for k in TOP_K_VARIANTS { ... }` that updates `precision_at_k_chunk`. Place it INSIDE the same `if !gq.expected_chunk_ids.is_empty() { ... }` block so the skip-empty policy is shared:
|
||||
|
||||
```rust
|
||||
// hit@k + MRR (chunk-level, requires non-empty expected_chunk_ids)
|
||||
if !gq.expected_chunk_ids.is_empty() {
|
||||
let expected: HashSet<&ChunkId> = gq.expected_chunk_ids.iter().collect();
|
||||
// ... existing hit@k + MRR computation ...
|
||||
|
||||
// p9-fb-39: precision@k_chunk — count of top-k hits whose
|
||||
// chunk_id is in `expected`, divided by k (fixed denominator).
|
||||
for k in TOP_K_VARIANTS {
|
||||
let hits_in_topk_relevant = qr
|
||||
.hits_top_k
|
||||
.iter()
|
||||
.filter(|h| h.rank <= *k && expected.contains(&h.chunk_id))
|
||||
.count();
|
||||
let entry = precision_at_k_chunk.get_mut(k).expect("init");
|
||||
entry.0 += hits_in_topk_relevant as f64 / f64::from(*k);
|
||||
entry.1 += 1;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Then at the final `Ok(AggregateMetrics { ... })` return (~line 325-345), add:
|
||||
|
||||
```rust
|
||||
precision_at_k_chunk: round_recall_map(&precision_at_k_chunk),
|
||||
```
|
||||
|
||||
(`round_recall_map` is the existing helper at line ~366; it accepts `BTreeMap<u32, (f64, u32)>` and divides sum by denom, returning `BTreeMap<u32, f32>`. Same shape used by `recall_at_k_doc`.)
|
||||
|
||||
- [ ] **Step 4: Run tests — expect all 5 pass**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-eval --lib precision_at_k_chunk
|
||||
```
|
||||
Expected: 5 passes.
|
||||
|
||||
- [ ] **Step 5: Run full kebab-eval suite**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-eval
|
||||
cargo clippy -p kebab-eval --all-targets -- -D warnings
|
||||
```
|
||||
Expected: no regressions; clippy clean.
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add crates/kebab-eval/src/metrics.rs
|
||||
git commit -m "feat(eval): compute precision_at_k_chunk in aggregate loop (fb-39)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Strengthen golden YAML header documentation
|
||||
|
||||
**Files:**
|
||||
- Modify: `fixtures/golden_queries.yaml`
|
||||
|
||||
- [ ] **Step 1: Read existing header**
|
||||
|
||||
```bash
|
||||
head -20 fixtures/golden_queries.yaml
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Replace header comment**
|
||||
|
||||
Find the existing header (the comment block above the first `- id: g001` entry). Replace with:
|
||||
|
||||
```yaml
|
||||
# Golden query suite for `kebab eval run` (P5-1 / P5-2 / fb-39).
|
||||
#
|
||||
# Top-level: list of queries. Required fields: `id`, `query`. All
|
||||
# others are optional and default to empty / null.
|
||||
#
|
||||
# Curators: `expected_doc_ids` and `expected_chunk_ids` MUST refer to
|
||||
# real rows in the active workspace's SQLite store at run time. Stale
|
||||
# references make the runner bail at start. The shipped template
|
||||
# leaves them empty so the file is loadable on any fresh workspace —
|
||||
# fill them in after a `kebab ingest` to enable the metrics that
|
||||
# require ground truth (P5-2 + fb-39):
|
||||
#
|
||||
# - `expected_chunk_ids` → hit_at_k, MRR, precision_at_k_chunk (fb-39)
|
||||
# - `expected_doc_ids` → recall_at_k_doc
|
||||
#
|
||||
# `precision_at_k_chunk` (fb-39): of the top-k retrieved hits, what
|
||||
# fraction's `chunk_id` is in `expected_chunk_ids`. Denominator is k
|
||||
# (fixed) — `top-k` shortfall is treated as precision loss. Queries
|
||||
# with empty `expected_chunk_ids` are skipped from this metric.
|
||||
#
|
||||
# `must_contain` / `forbidden` drive the rule-based groundedness
|
||||
# metric (P5-2).
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Verify YAML still parses**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-eval --test golden_loader 2>/dev/null || cargo test -p kebab-eval load_golden
|
||||
```
|
||||
|
||||
If a loader test exists, it should still pass. If not, run a quick parse check:
|
||||
|
||||
```bash
|
||||
cargo run --bin kebab -- eval --help 2>/dev/null || true
|
||||
```
|
||||
|
||||
(The shipped `golden_queries.yaml` is just a fixture — the workspace test loader will read it during integration tests and fail loudly if YAML is malformed.)
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add fixtures/golden_queries.yaml
|
||||
git commit -m "docs(eval): document expected_chunk_ids as P@k ground truth (fb-39)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Update design doc + spec status flip + INDEX
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`
|
||||
- Modify: `tasks/p9/p9-fb-39-retrieval-precision-tuning.md`
|
||||
- Modify: `tasks/INDEX.md`
|
||||
|
||||
- [ ] **Step 1: Update design §11 eval metric list**
|
||||
|
||||
```bash
|
||||
grep -n "^## §11\|^## 11\|hit_at_k\|recall_at_k_doc\|precision" docs/superpowers/specs/2026-04-27-kebab-final-form-design.md | head -10
|
||||
```
|
||||
|
||||
Find the §11 eval section (or wherever metrics are listed). Add a `precision_at_k_chunk` line next to `hit_at_k` / `recall_at_k_doc`:
|
||||
|
||||
```markdown
|
||||
- `precision_at_k_chunk` (fb-39): top-k 안 chunk_id 가 `expected_chunk_ids` 에 포함된 비율. 분모 = k (fixed). `expected_chunk_ids` 빈 query 는 skip.
|
||||
```
|
||||
|
||||
If the design doc doesn't currently list metrics inline, add a short subsection or bullet under §11 introducing it.
|
||||
|
||||
- [ ] **Step 2: Flip task spec status**
|
||||
|
||||
```bash
|
||||
sed -i.bak 's/^status: open$/status: completed/' tasks/p9/p9-fb-39-retrieval-precision-tuning.md
|
||||
rm tasks/p9/p9-fb-39-retrieval-precision-tuning.md.bak
|
||||
```
|
||||
|
||||
Replace the existing `> ⏳ **백로그 only — 미구현.**` skeleton banner with:
|
||||
|
||||
```markdown
|
||||
> ✅ **Eval foundation 부분 구현 완료.** P@k metric (P@5, P@10) 추가. 본 spec 의 lever 적용 (chunk policy / RRF / cross-encoder / embedding 업그레이드) 은 별도 task 로 분리 (fb-39b 이후).
|
||||
>
|
||||
> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md)
|
||||
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md)
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Flip INDEX row**
|
||||
|
||||
In `tasks/INDEX.md`, find the fb-39 row. Replace its status with `✅ 머지 (2026-05-10) — eval foundation only, lever 적용 deferred` (mirror the fb-42 row format from the previous PR for consistency).
|
||||
|
||||
- [ ] **Step 4: Workspace test + clippy gate**
|
||||
|
||||
```bash
|
||||
cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -10
|
||||
cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -5
|
||||
```
|
||||
|
||||
`-j 1` REQUIRED.
|
||||
|
||||
Expected: all green.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/superpowers/specs/2026-04-27-kebab-final-form-design.md tasks/p9/p9-fb-39-retrieval-precision-tuning.md tasks/INDEX.md
|
||||
git commit -m "docs(fb-39): design §11 + spec status + INDEX (eval foundation)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Final verification checklist
|
||||
|
||||
- [ ] `cargo test --workspace --no-fail-fast -j 1` green
|
||||
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean
|
||||
- [ ] `kebab eval run` (against any workspace with non-empty `expected_chunk_ids` in golden) emits `precision_at_k_chunk: {5: ..., 10: ...}` in the run's `metrics_json`
|
||||
- [ ] design §11 + INDEX + task spec status flipped
|
||||
405
docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md
Normal file
405
docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md
Normal file
@@ -0,0 +1,405 @@
|
||||
# fb-39b Embedding Model Upgrade Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Upgrade default embedding model from `multilingual-e5-small` (384 dim) to `multilingual-e5-large` (1024 dim) so retrieval precision can improve on Korean dogfooding corpus. Existing user TOMLs pinning `multilingual-e5-small` keep working unchanged.
|
||||
|
||||
**Architecture:** Three-line code surface: a new arm in `kebab-embed-local::resolve_model`, defaults flipped in `kebab-config::Config::defaults` (and the TOML template), and the existing test asserting the 384 default updated. LanceDB tables are already namespaced by `(model, dim)` so an upgraded model writes to a fresh table; fb-23 incremental ingest detects the `embedding_version` mismatch and auto-re-embeds on next ingest. No migration tooling — orphan old-model tables cleaned via `kebab reset --vector-only`.
|
||||
|
||||
**Tech Stack:** Rust 2024, fastembed 4.9.1 (`MultilingualE5Large` enum already shipped), LanceDB.
|
||||
|
||||
**Spec:** `docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md`
|
||||
|
||||
---
|
||||
|
||||
## File map
|
||||
|
||||
**Modify:**
|
||||
- `crates/kebab-embed-local/src/lib.rs` — add `multilingual-e5-large` arm in `resolve_model`. Update or add `check_dim` test for 1024.
|
||||
- `crates/kebab-config/src/lib.rs` — flip `Config::defaults().models.embedding.{model, dimensions}` and the TOML template at line ~952. Update default test at line 767.
|
||||
- `README.md` — `[models.embedding]` section: mention new default + small opt-out + dim mismatch hint.
|
||||
- `docs/SMOKE.md` — append "Embedding upgrade (fb-39b)" walkthrough showing the `kebab reset --vector-only && kebab ingest` sequence + first-run ONNX download warning.
|
||||
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §5 storage / §9 versioning — update default model + dim references.
|
||||
- `tasks/HOTFIXES.md` — entry for embedding upgrade UX (orphan tables on model swap, reset --vector-only flow).
|
||||
- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` banner — append note "fb-39b lever 적용 (embedding upgrade) ✅".
|
||||
- `tasks/INDEX.md` — fb-39b row ✅ (new row alongside fb-39).
|
||||
|
||||
**Create:**
|
||||
- `tasks/p9/p9-fb-39b-embedding-upgrade.md` — new task spec mirroring fb-39 frontmatter (status: completed, design + plan links).
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Add multilingual-e5-large to kebab-embed-local
|
||||
|
||||
**Files:**
|
||||
- Modify: `crates/kebab-embed-local/src/lib.rs`
|
||||
|
||||
- [ ] **Step 1: Append failing tests**
|
||||
|
||||
Find the existing `mod tests` (~line 230). Append:
|
||||
|
||||
```rust
|
||||
#[test]
|
||||
fn resolve_model_supports_e5_large() {
|
||||
let m = resolve_model("multilingual-e5-large").expect("e5-large should resolve");
|
||||
// The fastembed enum is non-comparable in some versions; we only need
|
||||
// to confirm Ok and that the underlying TextEmbedding could be built.
|
||||
// Avoid actually constructing the model in tests (1.3 GB ONNX download).
|
||||
let _ = m;
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn check_dim_passes_for_1024() {
|
||||
check_dim(1024, 1024).expect("matching dims must pass");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn check_dim_rejects_384_vs_1024() {
|
||||
let err = check_dim(384, 1024).expect_err("dim mismatch must error");
|
||||
let msg = format!("{err}");
|
||||
assert!(msg.contains("384") && msg.contains("1024"),
|
||||
"error must mention both dims, got: {msg}");
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run tests to confirm failures**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-embed-local resolve_model_supports_e5_large
|
||||
cargo test -p kebab-embed-local check_dim_passes_for_1024
|
||||
```
|
||||
Expected: `resolve_model_supports_e5_large` fails (no arm); `check_dim_*` passes already (helper is generic).
|
||||
|
||||
- [ ] **Step 3: Add arm to resolve_model**
|
||||
|
||||
In `crates/kebab-embed-local/src/lib.rs`, find `fn resolve_model` (~line 199). Replace the match body:
|
||||
|
||||
```rust
|
||||
fn resolve_model(name: &str) -> Result<EmbeddingModel> {
|
||||
match name {
|
||||
"multilingual-e5-small" => Ok(EmbeddingModel::MultilingualE5Small),
|
||||
"multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),
|
||||
other => anyhow::bail!(
|
||||
"kb-embed-local: unsupported embedding model {other:?}; \
|
||||
this adapter currently ships `multilingual-e5-small` and \
|
||||
`multilingual-e5-large`. Add a new arm to `resolve_model` \
|
||||
(and a fastembed feature flag if needed) to support more."
|
||||
),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run tests — all pass**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-embed-local
|
||||
cargo clippy -p kebab-embed-local --all-targets -- -D warnings
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add crates/kebab-embed-local/src/lib.rs
|
||||
git commit -m "feat(embed): add multilingual-e5-large arm to resolve_model (fb-39b)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Flip kebab-config default to e5-large + 1024 dim
|
||||
|
||||
**Files:**
|
||||
- Modify: `crates/kebab-config/src/lib.rs`
|
||||
|
||||
- [ ] **Step 1: Read existing default test + value sites**
|
||||
|
||||
```bash
|
||||
grep -n "multilingual-e5-small\|dimensions: 384\|dimensions = 384\|default.*embedding" crates/kebab-config/src/lib.rs
|
||||
```
|
||||
|
||||
Three sites to update:
|
||||
- `Config::defaults()` body (~line 307): `dimensions: 384` and `model: "multilingual-e5-small"`.
|
||||
- Default-assert test (~line 767): `assert_eq!(c.models.embedding.dimensions, 384)` and likely a sibling assertion on model.
|
||||
- TOML template at ~line 952: `dimensions = 384` (and likely `model = "multilingual-e5-small"`).
|
||||
|
||||
- [ ] **Step 2: Add failing assertion to existing default test**
|
||||
|
||||
Find the test at ~line 763-768 (likely `defaults_match_design_64_score_gate` or similar). Read it:
|
||||
|
||||
```bash
|
||||
sed -n '760,780p' crates/kebab-config/src/lib.rs
|
||||
```
|
||||
|
||||
If the test asserts `dimensions == 384`, change to `1024`. If it doesn't assert model name, add:
|
||||
|
||||
```rust
|
||||
assert_eq!(c.models.embedding.model, "multilingual-e5-large");
|
||||
assert_eq!(c.models.embedding.dimensions, 1024);
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run tests — expect failure**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-config defaults_match
|
||||
```
|
||||
Expected: assertion failure on dimensions == 1024 (still 384) and/or model name.
|
||||
|
||||
- [ ] **Step 4: Flip the defaults**
|
||||
|
||||
In `crates/kebab-config/src/lib.rs:307` (the `EmbeddingCfg` defaults block):
|
||||
|
||||
```rust
|
||||
EmbeddingCfg {
|
||||
provider: "fastembed".to_string(),
|
||||
model: "multilingual-e5-large".to_string(),
|
||||
version: "v1".to_string(),
|
||||
dimensions: 1024,
|
||||
// ... preserve other fields (batch_size etc.) ...
|
||||
}
|
||||
```
|
||||
|
||||
(Read the surrounding lines first to confirm field names — if `version` field doesn't exist or has a different shape, only update `model` + `dimensions`.)
|
||||
|
||||
- [ ] **Step 5: Flip the TOML template**
|
||||
|
||||
In `crates/kebab-config/src/lib.rs` near line 952, the multi-line raw string contains the example TOML config. Find:
|
||||
|
||||
```toml
|
||||
[models.embedding]
|
||||
provider = "fastembed"
|
||||
model = "multilingual-e5-small"
|
||||
...
|
||||
dimensions = 384
|
||||
```
|
||||
|
||||
Replace with `model = "multilingual-e5-large"` and `dimensions = 1024`.
|
||||
|
||||
- [ ] **Step 6: Run tests — pass**
|
||||
|
||||
```bash
|
||||
cargo test -p kebab-config
|
||||
cargo clippy -p kebab-config --all-targets -- -D warnings
|
||||
```
|
||||
|
||||
- [ ] **Step 7: Commit**
|
||||
|
||||
```bash
|
||||
git add crates/kebab-config/src/lib.rs
|
||||
git commit -m "feat(config): default embedding model multilingual-e5-large + 1024 dim (fb-39b)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Cross-crate test fixture sweep
|
||||
|
||||
**Files:**
|
||||
- Modify: any test fixture broken by Task 2's default flip.
|
||||
|
||||
- [ ] **Step 1: Find broken sites**
|
||||
|
||||
```bash
|
||||
cargo build --workspace 2>&1 | tail -10
|
||||
cargo test --workspace --no-run 2>&1 | grep -E "error\[|FAILED" | head -20
|
||||
```
|
||||
|
||||
Likely candidates:
|
||||
- `crates/kebab-app/tests/` — anywhere a test asserted `embedding.dimensions == 384`.
|
||||
- `crates/kebab-cli/tests/cli_schema.rs` — a capability/model assertion may include the embedding model name.
|
||||
|
||||
For each failure, decide:
|
||||
- **Pin to small intentionally** (test exercises small-specific behavior): set `cfg.models.embedding.model = "multilingual-e5-small"; cfg.models.embedding.dimensions = 384;` explicitly.
|
||||
- **Inherit new default** (test just snapshots defaults): update assertion to `multilingual-e5-large` / `1024`.
|
||||
|
||||
The vast majority of integration tests use `provider = "none"` (no embeddings) — those are unaffected.
|
||||
|
||||
- [ ] **Step 2: Verify workspace builds**
|
||||
|
||||
```bash
|
||||
cargo build --workspace 2>&1 | tail -5
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run workspace tests**
|
||||
|
||||
```bash
|
||||
cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -10
|
||||
cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -5
|
||||
```
|
||||
|
||||
`-j 1` REQUIRED.
|
||||
|
||||
Expected: all green.
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add crates/
|
||||
git commit -m "fix(fb-39b): update test fixtures for embedding default flip"
|
||||
```
|
||||
|
||||
(Skip this commit if `cargo build --workspace` is already clean after Task 2 — meaning no fixture broke.)
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Wire schema docs (design + HOTFIXES + new task spec)
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`
|
||||
- Modify: `tasks/HOTFIXES.md`
|
||||
- Create: `tasks/p9/p9-fb-39b-embedding-upgrade.md`
|
||||
- Modify: `tasks/p9/p9-fb-39-retrieval-precision-tuning.md`
|
||||
- Modify: `tasks/INDEX.md`
|
||||
|
||||
- [ ] **Step 1: Update design §5 storage and §9 versioning**
|
||||
|
||||
```bash
|
||||
grep -n "multilingual-e5-small\|^## §5\|^### §5\|^## §9\|384" docs/superpowers/specs/2026-04-27-kebab-final-form-design.md | head -10
|
||||
```
|
||||
|
||||
Update any reference to `multilingual-e5-small` or `dim 384` in the design doc to read `multilingual-e5-large` and `dim 1024`. Keep historical version mentions intact (e.g. "0.6.0 shipped with multilingual-e5-small") if any — but the "current default" line must reflect the new model.
|
||||
|
||||
- [ ] **Step 2: Add HOTFIXES entry**
|
||||
|
||||
Append to `tasks/HOTFIXES.md` (under the dated log; place at top of the dated entries with today's date `2026-05-10`):
|
||||
|
||||
```markdown
|
||||
- **2026-05-10 fb-39b — embedding upgrade UX**: default embedding flipped from `multilingual-e5-small` (384 dim) to `multilingual-e5-large` (1024 dim). LanceDB tables are namespaced by `(model, dim)` so the new model writes to a fresh table and the old `chunk_embeddings_multilingual-e5-small_384` table becomes orphan. fb-23 incremental ingest auto-re-embeds chunks (embedding_version mismatch) into the new table on next `kebab ingest`. To free disk before re-ingest, run `kebab reset --vector-only` first — this wipes both LanceDB and the SQLite `embedding_records` table. Search/ask against the new model returns empty hits until `kebab ingest` populates the new table.
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Create `tasks/p9/p9-fb-39b-embedding-upgrade.md`**
|
||||
|
||||
Mirror the fb-39 frontmatter shape:
|
||||
|
||||
```markdown
|
||||
---
|
||||
phase: P9
|
||||
component: kebab-embed-local + kebab-config + kebab-store-vector + docs
|
||||
task_id: p9-fb-39b
|
||||
title: "Embedding model upgrade (multilingual-e5-large)"
|
||||
status: completed
|
||||
target_version: 0.7.0
|
||||
depends_on: [p9-fb-39]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§4 search, §5 storage, §9 versioning cascade]
|
||||
source_feedback: 사용자 도그푸딩 2026-05-06 — Claude Code 가 kebab CLI 사용 후 "rank 5+ 노이즈 섞임" 지적 (fb-39 의 lever 적용 측면).
|
||||
---
|
||||
|
||||
# p9-fb-39b — Embedding model upgrade
|
||||
|
||||
> ✅ **구현 완료.** fb-39 의 lever 후보 4개 중 embedding model 업그레이드 lever 적용. P@k metric (fb-39) 으로 small vs large 비교 가능.
|
||||
>
|
||||
> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md)
|
||||
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md)
|
||||
|
||||
## 요약
|
||||
|
||||
- `multilingual-e5-small` (384 dim) → `multilingual-e5-large` (1024 dim) default flip.
|
||||
- 기존 user TOML 이 small 명시 시 그대로 (backwards-compat).
|
||||
- fb-23 incremental ingest 가 embedding_version mismatch 감지 → 자동 re-embed.
|
||||
- 0.6 → 0.7 minor bump 트리거 (design §9 cascade rule).
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Append fb-39b note to fb-39 task spec banner**
|
||||
|
||||
In `tasks/p9/p9-fb-39-retrieval-precision-tuning.md`, find the existing `> ✅ **Eval foundation 부분 구현 완료.**` banner. Append a line:
|
||||
|
||||
```markdown
|
||||
> - fb-39b (lever 적용 — embedding upgrade): [`tasks/p9/p9-fb-39b-embedding-upgrade.md`](./p9-fb-39b-embedding-upgrade.md) ✅
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Add fb-39b row to INDEX**
|
||||
|
||||
In `tasks/INDEX.md`, find the fb-39 row. Add a sibling row immediately below:
|
||||
|
||||
```markdown
|
||||
- [p9-fb-39b embedding upgrade](p9/p9-fb-39b-embedding-upgrade.md) — ✅ 머지 (2026-05-10) — multilingual-e5-large default
|
||||
```
|
||||
|
||||
(Adapt format to match neighbor rows.)
|
||||
|
||||
- [ ] **Step 6: Workspace test + clippy gate**
|
||||
|
||||
```bash
|
||||
cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -10
|
||||
cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -5
|
||||
```
|
||||
|
||||
`-j 1` REQUIRED.
|
||||
|
||||
- [ ] **Step 7: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/ tasks/
|
||||
git commit -m "docs(fb-39b): design + HOTFIXES + new task spec + INDEX"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: README + SMOKE walkthrough
|
||||
|
||||
**Files:**
|
||||
- Modify: `README.md`
|
||||
- Modify: `docs/SMOKE.md`
|
||||
|
||||
- [ ] **Step 1: Update README `[models.embedding]` section**
|
||||
|
||||
```bash
|
||||
grep -n "models.embedding\|multilingual-e5-small\|fastembed" README.md | head -5
|
||||
```
|
||||
|
||||
Locate the `[models.embedding]` config block in README. Update default values mentioned + add new bullet:
|
||||
|
||||
```markdown
|
||||
- `model` (default `"multilingual-e5-large"`, fb-39b) — 다국어 sentence embedding 모델. 1024-dim. ONNX (~1.3 GB) 첫 실행 시 fastembed cache (`config.storage.model_dir/fastembed/`) 에 자동 다운로드. `"multilingual-e5-small"` (384 dim) 는 backwards-compat 으로 사용 가능 — TOML 에 명시.
|
||||
- `dimensions` (default `1024`) — 모델의 embedding 차원. config 와 LanceDB stored dim 불일치 시 검색 결과 0 건 (orphan table). 모델 변경 시 `kebab reset --vector-only && kebab ingest` 로 vector index 재구축 권장.
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Append SMOKE walkthrough**
|
||||
|
||||
Append to `docs/SMOKE.md` after fb-39 section (or at end if absent):
|
||||
|
||||
````markdown
|
||||
### Embedding upgrade (fb-39b)
|
||||
|
||||
`multilingual-e5-small` 에서 `multilingual-e5-large` 로 업그레이드 시퀀스:
|
||||
|
||||
```bash
|
||||
# 기존 vector index 정리 (orphan table 회피)
|
||||
kebab --config /tmp/kebab-smoke/config.toml reset --vector-only
|
||||
|
||||
# config.toml 의 [models.embedding] 갱신:
|
||||
# model = "multilingual-e5-large"
|
||||
# dimensions = 1024
|
||||
|
||||
# 재-ingest — fastembed 가 첫 실행 시 e5-large ONNX (~1.3 GB) 자동 다운로드.
|
||||
# 다운로드 시간 + 모든 chunk re-embed 시간 (e5-small 대비 ~3-4×).
|
||||
kebab --config /tmp/kebab-smoke/config.toml ingest
|
||||
|
||||
# fb-39 의 P@k metric 으로 small vs large 비교:
|
||||
kebab --config /tmp/kebab-smoke/config.toml eval run
|
||||
```
|
||||
````
|
||||
|
||||
- [ ] **Step 3: Workspace test + clippy gate (sanity)**
|
||||
|
||||
```bash
|
||||
cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -5
|
||||
cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -3
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add README.md docs/SMOKE.md
|
||||
git commit -m "docs(fb-39b): README + SMOKE — embedding upgrade walkthrough"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Final verification checklist
|
||||
|
||||
- [ ] `cargo test --workspace --no-fail-fast -j 1` green
|
||||
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean
|
||||
- [ ] `kebab schema --json | jq .models.embedding_version` reflects new model name (after a fresh ingest with new defaults)
|
||||
- [ ] Manual smoke: `kebab reset --vector-only && kebab ingest` against `/tmp/kebab-smoke` triggers ONNX download (first run) then completes ingest into the new `chunk_embeddings_multilingual-e5-large_1024` table
|
||||
- [ ] README + SMOKE + design + HOTFIXES + fb-39b spec + INDEX all updated
|
||||
- [ ] **Post-merge**: cut version bump 0.6 → 0.7 + tag (CLAUDE.md `Versioning cascade` release rule — embedding_version cascade triggers minor bump)
|
||||
@@ -93,7 +93,7 @@ retrieval trace
|
||||
|
||||
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
|
||||
prompt 1184 tokens completion 312 tokens latency 1842 ms
|
||||
embedding multilingual-e5-small index v1.0
|
||||
embedding multilingual-e5-large index v1.0
|
||||
```
|
||||
|
||||
### 1.3 `kebab ask` (refusal — score gate)
|
||||
@@ -212,7 +212,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
|
||||
"vector_rank": 2
|
||||
},
|
||||
"index_version": "v1.0",
|
||||
"embedding_model": "multilingual-e5-small",
|
||||
"embedding_model": "multilingual-e5-large",
|
||||
"chunker_version": "md-heading-v1"
|
||||
}
|
||||
```
|
||||
@@ -245,6 +245,12 @@ normalize: rrf_score = raw_rrf / (2 / (k_rrf + 1))
|
||||
|
||||
`score_kind` 는 wire schema v1 에 **optional** 필드로 추가 (additive, backwards-compat). 누락 시 historical default `rrf` 로 해석.
|
||||
|
||||
#### Bulk multi-query (fb-42)
|
||||
|
||||
`kebab search --bulk` (stdin ndjson) + `mcp__kebab__bulk_search` tool 신규. agent 가 N sub-query 한 번에 실행 — query decomposition 시 단일 round-trip. Cap 100 per call. Sequential for-loop, App instance 재사용 → 캐시 / embedder cold-start 비용 한 번만.
|
||||
|
||||
Per-query failure 는 `bulk_search_item.v1.error` (error.v1) 에 격리, 다른 query 계속 진행. wire shape additive minor (`bulk_search_item.v1` + `bulk_search_response.v1` 신규).
|
||||
|
||||
### 2.3 Answer
|
||||
|
||||
```json
|
||||
@@ -258,7 +264,7 @@ normalize: rrf_score = raw_rrf / (2 / (k_rrf + 1))
|
||||
"grounded": true,
|
||||
"refusal_reason": null,
|
||||
"model": { "id": "qwen2.5:14b-instruct", "provider": "ollama" },
|
||||
"embedding": { "id": "multilingual-e5-small", "provider": "fastembed", "dimensions": 384 },
|
||||
"embedding": { "id": "multilingual-e5-large", "provider": "fastembed", "dimensions": 1024 },
|
||||
"prompt_template_version": "rag-v1",
|
||||
"retrieval": {
|
||||
"trace_id": "ret_4a8b2c1e",
|
||||
@@ -368,7 +374,7 @@ normalize: rrf_score = raw_rrf / (2 / (k_rrf + 1))
|
||||
"token_estimate": 480,
|
||||
"chunker_version": "md-heading-v1",
|
||||
"embeddings": [
|
||||
{ "model": "multilingual-e5-small", "dimensions": 384, "embedding_id": "e_2f1a" }
|
||||
{ "model": "multilingual-e5-large", "dimensions": 1024, "embedding_id": "e_2f1a" }
|
||||
]
|
||||
}
|
||||
```
|
||||
@@ -384,7 +390,7 @@ normalize: rrf_score = raw_rrf / (2 / (k_rrf + 1))
|
||||
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
|
||||
{ "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
|
||||
{ "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
|
||||
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
|
||||
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-large (1024d)" },
|
||||
{ "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
|
||||
{ "name": "ollama_model_pulled", "ok": false, "detail": "qwen2.5:14b-instruct missing", "hint": "ollama pull qwen2.5:14b-instruct" }
|
||||
]
|
||||
@@ -1203,9 +1209,9 @@ chunker_version = "md-heading-v1"
|
||||
|
||||
[models.embedding]
|
||||
provider = "fastembed"
|
||||
model = "multilingual-e5-small"
|
||||
model = "multilingual-e5-large"
|
||||
version = "v1"
|
||||
dimensions = 384
|
||||
dimensions = 1024
|
||||
batch_size = 64
|
||||
|
||||
[models.llm]
|
||||
@@ -1468,7 +1474,7 @@ $ kebab doctor
|
||||
✓ data_dir_writable ~/.local/share/kebab
|
||||
✓ sqlite_open kebab.sqlite (schema v1)
|
||||
✓ lancedb_open lancedb/
|
||||
✓ embedding_model multilingual-e5-small (384d)
|
||||
✓ embedding_model multilingual-e5-large (1024d)
|
||||
✓ ollama_reachable http://127.0.0.1:11434
|
||||
✗ ollama_model_pulled qwen2.5:14b-instruct missing
|
||||
hint: ollama pull qwen2.5:14b-instruct
|
||||
@@ -1504,6 +1510,26 @@ agent 가 분기). HTTP-SSE transport 는 fb-29 deferral 따라 P+. classify
|
||||
모듈은 `kebab-app::error_wire` 에 single source — kebab-cli + kebab-mcp
|
||||
공유.
|
||||
|
||||
### 10.3 Eval metrics (fb-39)
|
||||
|
||||
#### Retrieval metrics (ground-truth curated)
|
||||
|
||||
`kebab eval run` 이 golden query suite (`fixtures/golden_queries.yaml`) 대해 메트릭 계산. Curator 가 `expected_chunk_ids` 및 `expected_doc_ids` 설정 시에만 측정 가능 (shipped template 은 empty — workspace 별 자체 채움).
|
||||
|
||||
| 메트릭 | 정의 | 조건 |
|
||||
|--------|------|------|
|
||||
| `hit_at_k` | top-k 안 expected chunk 존재 여부 (binary). P(hit@k=true) 평균 | `expected_chunk_ids` 채움 |
|
||||
| `MRR` | Mean Reciprocal Rank (첫 관련 chunk rank 역수 평균) | `expected_chunk_ids` 채움 |
|
||||
| `recall_at_k_doc` | top-k 안 expected doc 비율 (`|top-k_docs ∩ expected_doc_ids| / |expected_doc_ids|`) | `expected_doc_ids` 채움 |
|
||||
| `precision_at_k_chunk` (fb-39) | top-k 안 chunk_id 가 `expected_chunk_ids` 에 포함된 비율. 분모 = k (fixed) — `top-k` 부족도 precision 손실로 간주. 빈 `expected_chunk_ids` query 는 skip. | `expected_chunk_ids` 채움 |
|
||||
|
||||
#### Groundedness metrics (rule-based)
|
||||
|
||||
| 메트릭 | 정의 |
|
||||
|--------|------|
|
||||
| `must_contain` pass | answer 문자열 이 `golden.must_contain` 의 모든 substring 포함 |
|
||||
| `forbidden` pass | answer 문자열 이 `golden.forbidden` 의 substring 미포함 |
|
||||
|
||||
---
|
||||
|
||||
## 11. 동결 범위 / 변경 정책
|
||||
|
||||
@@ -0,0 +1,133 @@
|
||||
---
|
||||
title: "p9-fb-39 — Eval foundation design (P@k metric)"
|
||||
phase: P9
|
||||
component: kebab-eval + docs
|
||||
task_id: p9-fb-39
|
||||
status: design
|
||||
target_version: 0.7.0
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3 chunking, §4 search, §7 RAG, §11 eval]
|
||||
date: 2026-05-10
|
||||
---
|
||||
|
||||
# p9-fb-39 — Eval foundation (P@k metric)
|
||||
|
||||
## Goal
|
||||
|
||||
도그푸딩 피드백 — agent / 사용자가 "rank 5+ 부터 노이즈 섞임" 지적 (precision-at-k 저하). lever (chunk policy / RRF / score_gate / cross-encoder / embedding) 선택 전, **measurement infrastructure 먼저** 정비. 본 PR scope:
|
||||
|
||||
- `AggregateMetrics` 에 `precision_at_k_chunk: BTreeMap<u32, f32>` 추가 (P@5, P@10).
|
||||
- chunk-level binary relevance 기반 — `expected_chunk_ids` 안 chunk 가 top-k 안 등장한 비율.
|
||||
- Golden set schema 무변경 — `expected_chunk_ids` 가 ground truth (curator 책임).
|
||||
- 문서화 강화 — `fixtures/golden_queries.yaml` 헤더 주석.
|
||||
|
||||
Lever 적용 (chunk policy / RRF tune / cross-encoder / embedding upgrade) 은 **본 spec 범위 외** — fb-39b 이후 별도 task 로 분리. 측정 도구가 먼저 있어야 lever 효과 비교 가능.
|
||||
|
||||
## Behavior contract
|
||||
|
||||
### Metric definition
|
||||
|
||||
```
|
||||
P@k_chunk(query) = |top-k hits ∩ expected_chunk_ids| / k
|
||||
```
|
||||
|
||||
**Denominator = k 고정**. `hits.len() < k` 인 경우에도 분모는 k — top-k 부족도 precision 손실로 간주 (`hit_at_k` 와 동일 컨벤션).
|
||||
|
||||
`expected_chunk_ids` 빈 query 는 metric 계산에서 skip (`hit_at_k_chunk` 와 동일 정책).
|
||||
|
||||
**Aggregation**: 모든 valid query (expected_chunk_ids 비어있지 않음) 의 P@k_chunk 평균. valid query 0 건이면 NaN → JSON null.
|
||||
|
||||
### Wire shape
|
||||
|
||||
`AggregateMetrics` 신규 field:
|
||||
|
||||
```rust
|
||||
pub struct AggregateMetrics {
|
||||
pub hit_at_k: BTreeMap<u32, f32>,
|
||||
pub mrr: f32,
|
||||
pub recall_at_k_doc: BTreeMap<u32, f32>,
|
||||
/// p9-fb-39: chunk-level precision at k. Binary relevance via
|
||||
/// `expected_chunk_ids`. Denominator = k (fixed). Skip queries
|
||||
/// with empty `expected_chunk_ids`.
|
||||
#[serde(default)]
|
||||
pub precision_at_k_chunk: BTreeMap<u32, f32>,
|
||||
// ... 기존 필드 ...
|
||||
}
|
||||
```
|
||||
|
||||
`#[serde(default)]` — 기존 eval_runs.metrics_json (옛 binary 가 기록한) 에 field 부재 시 empty BTreeMap 로 deserialize. backwards-compat 보장.
|
||||
|
||||
### k values
|
||||
|
||||
`compute_aggregate_metrics` 가 5, 10 두 값에 대해 계산. (기존 `hit_at_k` / `recall_at_k_doc` 가 이미 동일 k 사용 — 재사용.)
|
||||
|
||||
## Allowed / forbidden dependencies
|
||||
|
||||
- `kebab-eval`: 신규 dep 없음. metrics 모듈 확장만.
|
||||
- 다른 crate 무수정.
|
||||
|
||||
`kebab-eval` 의 `metrics` / `compare` 모듈은 retrieval / embedding / LLM crate 직접 import 금지 룰 그대로 (P5 inheritance).
|
||||
|
||||
## Public surface delta
|
||||
|
||||
### kebab-eval::metrics
|
||||
|
||||
```rust
|
||||
pub struct AggregateMetrics {
|
||||
// ... 기존 ...
|
||||
#[serde(default)]
|
||||
pub precision_at_k_chunk: BTreeMap<u32, f32>,
|
||||
}
|
||||
```
|
||||
|
||||
`compute_aggregate_metrics` body 안 새 누적 BTreeMap + 평균 계산 추가. NaN handling 은 기존 `serialize_f32_nan_as_null` 패턴 재사용 — 단, BTreeMap<u32, f32> 의 NaN 처리 패턴이 hit_at_k 와 동일하게 round_recall_map 같은 helper 통해.
|
||||
|
||||
## Test plan
|
||||
|
||||
| kind | description |
|
||||
|------|-------------|
|
||||
| unit (metrics) | `precision_at_k_chunk` empty expected → query skip → metric BTreeMap 안 entry 부재 또는 NaN |
|
||||
| unit (metrics) | exact match: 5 hits, top-3 in expected → P@5 = 3/5 = 0.6 |
|
||||
| unit (metrics) | partial top-k: hits.len() = 3 < k=5, all 3 in expected → P@5 = 3/5 = 0.6 (분모 k 고정) |
|
||||
| unit (metrics) | top-k 안 expected 0건 → P@5 = 0.0 |
|
||||
| unit (metrics) | 모든 query expected 비어있음 → P@k entry 부재 또는 NaN → JSON null |
|
||||
| unit (metrics) | `AggregateMetrics` serde roundtrip — precision_at_k_chunk 신규 field 보존 |
|
||||
| unit (metrics) | 옛 JSON (precision_at_k_chunk 부재) deserialize → empty BTreeMap default |
|
||||
| 통합 (eval runner) | runner end-to-end → eval_runs.metrics_json 안 precision_at_k_chunk 채워짐 |
|
||||
|
||||
snapshot tests (기존 metrics 출력 fixture 가 있다면 갱신 — `cargo test -p kebab-eval` 수행 후 fixture diff 확인).
|
||||
|
||||
## Implementation steps (high-level)
|
||||
|
||||
1. `kebab-eval::metrics`: `AggregateMetrics.precision_at_k_chunk` field 추가 + 계산 로직 + 단위 테스트.
|
||||
2. snapshot tests 갱신 (있다면).
|
||||
3. `fixtures/golden_queries.yaml` 헤더 주석 강화 — `expected_chunk_ids` 채우기 가이드.
|
||||
4. README `kebab eval` 섹션 또는 design §11 eval 에 P@k 정의 한 줄 추가.
|
||||
5. tasks/INDEX.md / spec status flip.
|
||||
|
||||
3-5 step PR. 단일 세션 내 완료 가능.
|
||||
|
||||
## Risks / notes
|
||||
|
||||
- **분모 = k 고정 정책**: `hits.len() < k` 인 query 가 많으면 P@k 가 항상 < 1.0. 사용자 직관과 다를 수 있음 — README/design 에 명시.
|
||||
- **frozen design vs new metric**: design §11 eval 의 metric 표 갱신 필요. frozen contract 변경 트리거 — `target_version: 0.7.0` bump 명시.
|
||||
- **lever deferral**: 본 spec contract_sections 는 §3 chunking + §4 search + §7 RAG + §11 eval 인데, 실제 본 PR 은 §11 만 건드림. lever 적용 (chunk policy / RRF / cross-encoder / embedding) 은 fb-39b 이후 별도. spec status banner 에 명시.
|
||||
- **expected_chunk_ids 비어있는 shipped golden**: 현재 `fixtures/golden_queries.yaml` 의 g001-g005 모두 expected_chunk_ids 비어있음. P@k 계산 시 모두 skip — out-of-the-box 측정값 0건. curator 가 자기 KB 로 채워야 metric 의미 가짐. 의도 — golden set 은 workspace 의존이라 shipped fixtures 는 template, 실제 측정은 user 가 채워서 한다.
|
||||
- **fb-23 incremental ingest 와 충돌 없음**: 본 PR 은 metric 만 추가. chunker_version / embedding_version 무변경.
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Lever 적용 (chunk policy retune / RRF k tune / score_gate default ON / cross-encoder PoC / embedding model 업그레이드).
|
||||
- NDCG / MAP / 기타 ranking metric.
|
||||
- precision_at_k_doc (doc-level — recall_at_k_doc 가 이미 있음, 본 spec 은 chunk-level 만).
|
||||
- Golden set 콘텐츠 확장 (g006+ 추가) — curator 책임.
|
||||
- Synthetic golden generator (`kebab eval golden-from-corpus` 등).
|
||||
- Per-query relevance score (binary 0/1 만 — graded relevance 는 NDCG 도입 시 검토).
|
||||
|
||||
## Documentation updates (implementation PR 동시)
|
||||
|
||||
- `fixtures/golden_queries.yaml` — 헤더 주석에 `expected_chunk_ids` ground truth 의미 + P@k 측정 위해 채우기 권장 안내.
|
||||
- `README.md` — `kebab eval` 섹션 (있다면) 에 P@k metric 한 줄. 없으면 skip.
|
||||
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §11 eval — metric 표에 `precision_at_k_chunk` 한 줄 추가.
|
||||
- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` — `status: open → completed`, 단 banner 에 "eval foundation only, lever 적용 deferred to fb-39b" 명시 + design/plan 링크.
|
||||
- `tasks/INDEX.md` — fb-39 행 ✅ (eval foundation only).
|
||||
@@ -0,0 +1,198 @@
|
||||
---
|
||||
title: "p9-fb-39b — Embedding model upgrade design (multilingual-e5-large)"
|
||||
phase: P9
|
||||
component: kebab-embed-local + kebab-store-vector + kebab-config + kebab-app
|
||||
task_id: p9-fb-39b
|
||||
status: design
|
||||
target_version: 0.7.0
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§4 search, §5 storage, §9 versioning cascade]
|
||||
date: 2026-05-10
|
||||
---
|
||||
|
||||
# p9-fb-39b — Embedding model upgrade
|
||||
|
||||
## Goal
|
||||
|
||||
fb-39 의 lever 적용 — embedding model 을 `multilingual-e5-small` (384 dim) 에서 `multilingual-e5-large` (1024 dim) 로 업그레이드. 도그푸딩 한국어 corpus 의 retrieval precision 개선.
|
||||
|
||||
fb-39 가 측정 도구 (P@5 / P@10) 를 추가했으므로, 본 PR 머지 후 small vs large 비교 가능.
|
||||
|
||||
`bge-m3` 검토했으나 fastembed 4.9.1 의 `EmbeddingModel` enum 에 미포함 — `UserDefinedEmbeddingModel` ONNX 직접 로드 path 는 별도 작업 (fb-39c 후보). 본 PR scope = e5-large 만.
|
||||
|
||||
## Behavior contract
|
||||
|
||||
### Embedding model
|
||||
|
||||
- 신규 default: `multilingual-e5-large` (1024 dim).
|
||||
- `kebab-embed-local::resolve_model` 에 신규 arm:
|
||||
|
||||
```rust
|
||||
"multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),
|
||||
```
|
||||
|
||||
기존 `multilingual-e5-small` arm 그대로 (backwards-compat opt-out).
|
||||
|
||||
### Config defaults
|
||||
|
||||
- `Config::defaults().models.embedding.model`: `"multilingual-e5-small"` → `"multilingual-e5-large"`.
|
||||
- `Config::defaults().models.embedding.dimensions`: `384` → `1024`.
|
||||
- `kebab init` 가 생성하는 config.toml 템플릿 동일 갱신.
|
||||
|
||||
기존 user TOML 이 `model = "multilingual-e5-small"` 또는 `dimensions = 384` 명시한 경우 그대로 유지 — `serde` 가 user value 우선. opt-out 가능.
|
||||
|
||||
### Cascade
|
||||
|
||||
- `embedding_version`: 자동 변경 (config.models.embedding.model 값 그대로 wire 에 emit). `multilingual-e5-small` → `multilingual-e5-large`.
|
||||
- fb-23 incremental ingest: 4-input match (blake3 + parser_version + chunker_version + embedding_version) 에서 embedding_version 깨짐 → 모든 chunk 재-embed. text/parse/chunk 비용 회피, embed 비용만 발생.
|
||||
- `eval_runs.config_snapshot_json`: 새 version 자동 기록. 비교 시 동일 version 끼리.
|
||||
- design §9 cascade rule 의 5 키 중 `embedding_version` 변경 — binary release 트리거 (CLAUDE.md `Versioning cascade` 룰).
|
||||
|
||||
### Migration policy
|
||||
|
||||
LanceDB stored vectors 의 dim 과 `config.models.embedding.dimensions` 가 mismatch 면:
|
||||
|
||||
- `LanceVectorStore::open` (또는 첫 호출) 가 비교 → mismatch 시 신규 `ErrorV1`:
|
||||
- `code = "embedding_dim_mismatch"`
|
||||
- `message`: `"vector index dim 384 vs config dim 1024"`
|
||||
- `hint`: `"기존 vector index 가 4-dim, config 는 N-dim. 'kebab reset --vector-only && kebab ingest' 로 재구축."`
|
||||
- CLI: exit 1 + error.v1 stderr (또는 비-`--json` 모드 plain stderr).
|
||||
- silent migration / auto-wipe 안 함 — 사용자 명시 동의 필요.
|
||||
|
||||
remediation flow:
|
||||
|
||||
```
|
||||
$ kebab search "..."
|
||||
error: vector index dim 384 vs config dim 1024
|
||||
|
||||
Hint: 기존 vector index 가 384-dim, config 는 1024-dim.
|
||||
'kebab reset --vector-only && kebab ingest' 로 재구축.
|
||||
|
||||
$ kebab reset --vector-only
|
||||
[wipe LanceDB + SQLite embedding_records]
|
||||
|
||||
$ kebab ingest
|
||||
[full re-embed with new model — fastembed downloads e5-large ONNX (~1.3 GB) on first run]
|
||||
```
|
||||
|
||||
### Wire shape
|
||||
|
||||
신규 wire field 없음. `error.v1.code` 의 valid value namespace 에 `"embedding_dim_mismatch"` 추가 (string, enum 아님 — additive).
|
||||
|
||||
## Allowed / forbidden dependencies
|
||||
|
||||
- `kebab-embed-local`: 신규 dep 없음. fastembed enum variant 추가만.
|
||||
- `kebab-store-vector`: 신규 dep 없음. LanceDB schema reader 사용.
|
||||
- `kebab-config`: 신규 dep 없음. defaults 값 변경.
|
||||
- `kebab-app`: 신규 dep 없음. error propagation.
|
||||
|
||||
`kebab-core` 의 다른 `kebab-*` 의존 금지 룰 그대로.
|
||||
|
||||
## Public surface delta
|
||||
|
||||
### kebab-embed-local (`lib.rs`)
|
||||
|
||||
```rust
|
||||
fn resolve_model(name: &str) -> Result<EmbeddingModel> {
|
||||
match name {
|
||||
"multilingual-e5-small" => Ok(EmbeddingModel::MultilingualE5Small),
|
||||
"multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large), // 신규
|
||||
other => anyhow::bail!(/* ... */),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### kebab-config (defaults + TOML 템플릿)
|
||||
|
||||
```rust
|
||||
EmbeddingCfg {
|
||||
provider: "fastembed".to_string(),
|
||||
model: "multilingual-e5-large".to_string(),
|
||||
dimensions: 1024,
|
||||
// ... 기타 ...
|
||||
}
|
||||
```
|
||||
|
||||
generated config.toml 템플릿 도 같이 갱신.
|
||||
|
||||
### kebab-store-vector (`lib.rs` 또는 신규 helper)
|
||||
|
||||
```rust
|
||||
impl LanceVectorStore {
|
||||
pub fn open(...) -> Result<Self> {
|
||||
// 기존 open 로직 ...
|
||||
let stored_dim = read_schema_vector_dim(&table)?;
|
||||
if stored_dim != config_dim {
|
||||
anyhow::bail!(StructuredError(ErrorV1 {
|
||||
code: "embedding_dim_mismatch".to_string(),
|
||||
message: format!("vector index dim {stored_dim} vs config dim {config_dim}"),
|
||||
hint: Some(format!(
|
||||
"기존 vector index 가 {stored_dim}-dim, config 는 {config_dim}-dim. \
|
||||
'kebab reset --vector-only && kebab ingest' 로 재구축."
|
||||
)),
|
||||
// ...
|
||||
}));
|
||||
}
|
||||
Ok(...)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
(정확한 LanceDB schema reading API 는 구현 시 확인 — `Table::schema()` 또는 `arrow_schema::Schema` 직접 inspect.)
|
||||
|
||||
## Test plan
|
||||
|
||||
| kind | description |
|
||||
|------|-------------|
|
||||
| unit (kebab-embed-local) | `resolve_model("multilingual-e5-large")` returns Ok |
|
||||
| unit (kebab-embed-local) | `check_dim(1024, 1024)` ok |
|
||||
| unit (kebab-embed-local) | `check_dim(384, 1024)` Err — message mentions both dims |
|
||||
| unit (kebab-config) | `Config::defaults().models.embedding.model == "multilingual-e5-large"` |
|
||||
| unit (kebab-config) | `Config::defaults().models.embedding.dimensions == 1024` |
|
||||
| unit (kebab-config) | TOML `model = "multilingual-e5-small"` deserialize 정상 (backwards-compat) |
|
||||
| unit (kebab-config) | 생성된 config.toml 템플릿 안 `model = "multilingual-e5-large"`, `dimensions = 1024` |
|
||||
| unit (kebab-store-vector) | mismatch fixture (384-dim stored + 1024 cfg) → `embedding_dim_mismatch` ErrorV1 |
|
||||
| 통합 (kebab-cli) | mismatch scenario — pre-existing 384-dim DB + new config → exit 1 + error.v1 stderr (`code = embedding_dim_mismatch`) + hint mentions reset --vector-only |
|
||||
| 통합 (kebab-cli) | small config 로 fresh ingest + search → 정상 (backwards-compat path 검증) |
|
||||
|
||||
`multilingual-e5-large` 모델 다운로드 회피 위해 unit/integration 테스트는 fixture 또는 mock — 실 모델 호출 안 함. 첫 도그푸딩 시 사용자가 fastembed cache 다운로드.
|
||||
|
||||
## Implementation steps (high-level)
|
||||
|
||||
1. `kebab-embed-local::resolve_model` arm + check_dim 단위 테스트.
|
||||
2. `kebab-store-vector` dim mismatch detection + ErrorV1 + 단위 테스트.
|
||||
3. `kebab-config` defaults flip + TOML 템플릿 + 단위 테스트.
|
||||
4. `kebab-cli` integration: mismatch error.v1 wire + backwards-compat path 통합 테스트.
|
||||
5. README + SMOKE + design + HOTFIXES + status flip.
|
||||
|
||||
5 task. 단일 PR, single 세션 가능.
|
||||
|
||||
## Risks / notes
|
||||
|
||||
- **첫 실행 모델 다운로드**: e5-large ONNX ~1.3 GB. fastembed cache (`config.storage.model_dir/fastembed/`) 에 자동 다운로드 (첫 호출 시). progress 표시 없음 — 사용자 침묵 latency. `kebab doctor` 또는 README 에 경고 안내.
|
||||
- **Search/ingest latency**: e5-large 가 e5-small 대비 ~3-4× embedding 시간. ingest 비용 증가 (one-time + 신규 docs). search 시 query embed per-call 증가.
|
||||
- **Disk usage**: vector dim 2.6× → LanceDB 약 2.7× 증가.
|
||||
- **HOTFIXES entry**: dim mismatch UX (error.v1 + reset --vector-only flow) 가 frozen design 안 명시 안 된 신규 동작 — HOTFIXES 한 항목 추가.
|
||||
- **eval comparison**: fb-39 P@k 가 측정 도구. 도그푸딩 corpus + golden 의 expected_chunk_ids 채워서 small vs large 정량 비교 별도 (PR 안 의무 아님).
|
||||
- **fb-23 incremental ingest 와의 상호작용**: embedding_version 변경 → 모든 doc 재-embed. fb-23 의 unchanged path 는 한 번도 hit 안 함 (예상 동작).
|
||||
- **release trigger**: design §9 cascade rule 의 `embedding_version` 변경 → CLAUDE.md `Versioning cascade` 룰에 따라 binary 0.6 → 0.7 minor bump 필요.
|
||||
|
||||
## Out of scope
|
||||
|
||||
- bge-m3 또는 user-defined ONNX path (fb-39c 후보).
|
||||
- Other lever (RRF / cross-encoder / chunk policy).
|
||||
- Auto-migration / background re-vector.
|
||||
- LanceDB schema migration tooling (별도 wipe + re-ingest).
|
||||
- multi-model coexistence (한 KB 안 small + large 동시).
|
||||
- precision 정량 비교 의무 (별도 도그푸딩).
|
||||
|
||||
## Documentation updates (implementation PR 동시)
|
||||
|
||||
- `README.md` `[models.embedding]` config 섹션 — default 변경 + small opt-out 안내 + dim mismatch 시 reset 명령 안내.
|
||||
- `docs/SMOKE.md` — upgrade walkthrough (`kebab reset --vector-only && kebab ingest` 시퀀스 + 첫 ONNX 다운로드 latency 경고).
|
||||
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §5 storage / §9 versioning 적절 절 — 새 default + dim 1024 명시.
|
||||
- `tasks/HOTFIXES.md` — dim mismatch UX entry.
|
||||
- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` banner — fb-39b lever 적용 (embedding upgrade) ✅ 추가 (단 spec status 는 fb-39 frozen).
|
||||
- `tasks/p9/p9-fb-39b-embedding-upgrade.md` 신규 task spec (만들거나, fb-39 sub-task 로 frontmatter 처리).
|
||||
- `tasks/INDEX.md` — fb-39b 행 추가 ✅.
|
||||
- 본 PR 머지 후 `chore: bump version 0.6 → 0.7` + tag (CLAUDE.md release 절차).
|
||||
20
docs/wire-schema/v1/bulk_search_item.schema.json
Normal file
20
docs/wire-schema/v1/bulk_search_item.schema.json
Normal file
@@ -0,0 +1,20 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"$id": "https://kb.local/wire/v1/bulk_search_item.schema.json",
|
||||
"title": "BulkSearchItem v1",
|
||||
"description": "p9-fb-42: per-query result inside a bulk_search response. `response` XOR `error` — exactly one is non-null. `query` is the input echo so consumers can correlate without index tracking.",
|
||||
"type": "object",
|
||||
"required": ["schema_version", "query", "response", "error"],
|
||||
"properties": {
|
||||
"schema_version": { "const": "bulk_search_item.v1" },
|
||||
"query": { "type": "object", "description": "Input echo (verbatim JSON object)." },
|
||||
"response":{
|
||||
"type": ["object", "null"],
|
||||
"description": "search_response.v1 payload on success; null when error is non-null."
|
||||
},
|
||||
"error": {
|
||||
"type": ["object", "null"],
|
||||
"description": "error.v1 payload when this query failed; null on success."
|
||||
}
|
||||
}
|
||||
}
|
||||
24
docs/wire-schema/v1/bulk_search_response.schema.json
Normal file
24
docs/wire-schema/v1/bulk_search_response.schema.json
Normal file
@@ -0,0 +1,24 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"$id": "https://kb.local/wire/v1/bulk_search_response.schema.json",
|
||||
"title": "BulkSearchResponse v1",
|
||||
"description": "p9-fb-42: MCP envelope for bulk_search. CLI emits raw `bulk_search_item.v1` ndjson without this envelope (summary on stderr).",
|
||||
"type": "object",
|
||||
"required": ["schema_version", "results", "summary"],
|
||||
"properties": {
|
||||
"schema_version": { "const": "bulk_search_response.v1" },
|
||||
"results": {
|
||||
"type": "array",
|
||||
"items": { "type": "object", "description": "bulk_search_item.v1" }
|
||||
},
|
||||
"summary": {
|
||||
"type": "object",
|
||||
"required": ["total", "succeeded", "failed"],
|
||||
"properties": {
|
||||
"total": { "type": "integer", "minimum": 0 },
|
||||
"succeeded": { "type": "integer", "minimum": 0 },
|
||||
"failed": { "type": "integer", "minimum": 0 }
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -24,7 +24,7 @@
|
||||
"required": [
|
||||
"json_mode", "ingest_progress", "ingest_cancellation",
|
||||
"rag_multi_turn", "search_cache", "incremental_ingest",
|
||||
"streaming_ask", "http_daemon", "mcp_server", "single_file_ingest"
|
||||
"streaming_ask", "http_daemon", "mcp_server", "single_file_ingest", "bulk_search"
|
||||
]
|
||||
},
|
||||
"models": {
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Golden query suite for `kb eval run` (P5-1 / P5-2).
|
||||
# Golden query suite for `kebab eval run` (P5-1 / P5-2 / fb-39).
|
||||
#
|
||||
# Top-level: list of queries. Required fields: `id`, `query`. All
|
||||
# others are optional and default to empty / null.
|
||||
@@ -7,8 +7,16 @@
|
||||
# real rows in the active workspace's SQLite store at run time. Stale
|
||||
# references make the runner bail at start. The shipped template
|
||||
# leaves them empty so the file is loadable on any fresh workspace —
|
||||
# fill them in after a `kb ingest` to enable hit@k / MRR metrics
|
||||
# (P5-2).
|
||||
# fill them in after a `kebab ingest` to enable the metrics that
|
||||
# require ground truth (P5-2 + fb-39):
|
||||
#
|
||||
# - `expected_chunk_ids` → hit_at_k, MRR, precision_at_k_chunk (fb-39)
|
||||
# - `expected_doc_ids` → recall_at_k_doc
|
||||
#
|
||||
# `precision_at_k_chunk` (fb-39): of the top-k retrieved hits, what
|
||||
# fraction's `chunk_id` is in `expected_chunk_ids`. Denominator is k
|
||||
# (fixed) — `top-k` shortfall is treated as precision loss. Queries
|
||||
# with empty `expected_chunk_ids` are skipped from this metric.
|
||||
#
|
||||
# `must_contain` / `forbidden` drive the rule-based groundedness
|
||||
# metric (P5-2).
|
||||
|
||||
@@ -28,11 +28,12 @@ User-specific trigger keywords (team names, system names, internal acronyms) bel
|
||||
|
||||
## MCP tools (preferred)
|
||||
|
||||
When `kebab` is registered as an MCP server (see `~/.claude/mcp.json` example below), seven tools are exposed as `mcp__kebab__<name>`:
|
||||
When `kebab` is registered as an MCP server (see `~/.claude/mcp.json` example below), eight tools are exposed as `mcp__kebab__<name>`:
|
||||
|
||||
| tool | purpose | mutation |
|
||||
|------|---------|----------|
|
||||
| `mcp__kebab__search` | corpus search → `search_response.v1` (`{hits, next_cursor, truncated}`) | no |
|
||||
| `mcp__kebab__bulk_search` | N queries in one call → `bulk_search_response.v1` (`{results, summary}`) | no |
|
||||
| `mcp__kebab__ask` | RAG answer → `answer.v1` | no |
|
||||
| `mcp__kebab__fetch` | verbatim text → `fetch_result.v1` (chunk / doc / span) | no |
|
||||
| `mcp__kebab__schema` | capability discovery → `schema.v1` | no |
|
||||
@@ -60,6 +61,17 @@ Input:
|
||||
- When `truncated: true`, the budget loop modified the page (snippet shortening or k reduction). `next_cursor` is **independent** — non-null whenever more hits may be reachable. Caller may widen `max_tokens` (re-issue same query for fuller snippets / more hits per page) or follow `next_cursor` (advance through more hits) or both. Mismatched cursor (corpus_revision changed) returns `error.v1.code = stale_cursor` — re-issue the search to obtain a fresh one.
|
||||
- **`trace: true` (p9-fb-37)** — debug aid. Response carries an extra `trace` block: `lexical[]` + `vector[]` (pre-fusion candidates), `rrf_inputs[]` (RRF union before final cut), and `timing` (`lexical_ms`, `vector_ms`, `fusion_ms`, `total_ms`). Trace bypasses the search cache (always cold). Use sparingly — it bloats the wire response and is for diagnosing "why did this hit / not hit", not normal retrieval.
|
||||
|
||||
### `mcp__kebab__bulk_search`
|
||||
|
||||
N개 query 한 번에 — agent loop 효율 개선. 각 query 는 `mcp__kebab__search` 와 동일 input shape (query 필수, 나머지 optional). Cap 100.
|
||||
|
||||
Input:
|
||||
```json
|
||||
{"queries": [{"query": "..."}, {"query": "...", "mode": "lexical"}, ...]}
|
||||
```
|
||||
|
||||
Output: `bulk_search_response.v1` envelope — `results: [bulk_search_item.v1]` (각 item = `{query, response | null, error | null}`) + `summary: {total, succeeded, failed}`. Per-query 실패는 item 의 error 에 격리, 다른 query 계속 진행.
|
||||
|
||||
### `mcp__kebab__ask` — when you need the answer
|
||||
|
||||
Use when the user wants a synthesized answer, not a list of links.
|
||||
|
||||
@@ -14,6 +14,22 @@ historical contract that was implemented; this file accumulates the
|
||||
deltas so phase 5+ readers can find the live behavior without diffing
|
||||
git history.
|
||||
|
||||
## 2026-05-10 — p9-fb-39b: embedding upgrade UX
|
||||
|
||||
**무엇이 바뀌었나**: default embedding 이 `multilingual-e5-small` (384 dim) 에서 `multilingual-e5-large` (1024 dim) 로 변경. LanceDB 테이블은 `(model, dim)` 으로 네임스페이스되어 새 모델은 fresh 테이블에 쓰고, 옛 `chunk_embeddings_multilingual-e5-small_384` 테이블은 orphan 상태 됨.
|
||||
|
||||
**user TOML 에 small 명시한 경우**: backwards-compat 유지. 사용자가 `[models.embedding] model = "multilingual-e5-small"` 로 명시했으면 그대로 small 사용 (새 default 무시).
|
||||
|
||||
**idempotent re-embed**: fb-23 incremental ingest 가 embedding_version mismatch 감지하면 자동으로 이전 chunk 를 새 모델로 re-embed. 다음 `kebab ingest` 호출 시 기존 chunk 의 embedding 을 새 테이블에 재작성.
|
||||
|
||||
**disk 절약**: 이전 모델의 orphan 테이블을 먼저 정리하려면 `kebab reset --vector-only` 실행 (LanceDB + SQLite `embedding_records` 모두 wipe). 이후 `kebab ingest` 가 모든 chunk 를 새 모델로 re-embed 해 새 테이블 채움.
|
||||
|
||||
**search/ask 결과**: re-embed 전까지는 empty hit (새 모델에 데이터가 없음). `kebab ingest` 후 정상 검색 가능.
|
||||
|
||||
**Spec contract 와의 관계**: design §5 (storage) + §9 (versioning cascade) 의 embedding_model.id / dimensions 변경. wire 의 `embedding_version` 필드 (kebab-app schema.v1.models.embedding_version 가 config.models.embedding.model 값을 그대로 emit) 변경 — CLAUDE.md cascade rule 의 release 트리거. 본 PR 머지 후 `chore: bump version 0.5 → 0.6` + tag 필요.
|
||||
|
||||
**Spec deviation**: design `2026-05-10-p9-fb-39b-embedding-upgrade-design.md` 의 §Migration policy + §Public surface delta 가 `LanceVectorStore::open` 안 신규 `error.v1.code = "embedding_dim_mismatch"` 명시했으나 구현 제외. 이유: LanceDB tables 가 `(model, dim)` namespaced — silent orphan + empty-hit 으로 surface (hard error 아님). 명시 error 필요 시 별도 startup health check 작업 필요 (fb-39c 후보 또는 doctor 확장).
|
||||
|
||||
## 2026-05-09 — p9-fb-34: search wire wrapped in search_response.v1
|
||||
|
||||
**무엇이 바뀌었나**: `kebab search --json` stdout 이 기존 `search_hit.v1[]` 배열에서 신규 `search_response.v1` object 로 교체. wrapper 가 `hits`, `next_cursor`, `truncated` 세 필드를 가짐.
|
||||
|
||||
@@ -129,12 +129,13 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.
|
||||
|
||||
### 🎯 0.5.0 — RAG quality (cascade 동반: V00X + reindex)
|
||||
- [p9-fb-38 score semantics](p9/p9-fb-38-score-semantics.md) — ✅ 머지 (2026-05-10)
|
||||
- [p9-fb-39 retrieval precision 튜닝](p9/p9-fb-39-retrieval-precision-tuning.md) — ⏳ 미구현, brainstorm 필요 (embedding_version cascade)
|
||||
- [p9-fb-39 retrieval precision 튜닝](p9/p9-fb-39-retrieval-precision-tuning.md) — ✅ 머지 (2026-05-10) — eval foundation only, lever 적용 deferred
|
||||
- [p9-fb-39b embedding upgrade](p9/p9-fb-39b-embedding-upgrade.md) — ✅ 머지 (2026-05-10) — multilingual-e5-large default
|
||||
- [p9-fb-40 fact-grounded answer](p9/p9-fb-40-fact-grounded-answer.md) — ✅ 머지 (2026-05-10)
|
||||
|
||||
### 🎯 0.6.0 또는 P+ — reasoning
|
||||
- [p9-fb-41 multi-hop reasoning](p9/p9-fb-41-multi-hop-reasoning.md) — ⏳ 미구현, brainstorm 필요 (XL, eval 인프라 선행)
|
||||
- [p9-fb-42 bulk multi-query + re-rank hint](p9/p9-fb-42-bulk-multi-query-rerank.md) — ⏳ 미구현, brainstorm 필요 (Nice)
|
||||
- [p9-fb-42 bulk multi-query + re-rank hint](p9/p9-fb-42-bulk-multi-query-rerank.md) — ✅ 머지 (2026-05-10) — bulk only, rerank hint deferred
|
||||
|
||||
## Post-merge 핫픽스
|
||||
|
||||
|
||||
@@ -1,20 +1,24 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kebab-search + kebab-rag + kebab-chunk
|
||||
component: kebab-eval + docs
|
||||
task_id: p9-fb-39
|
||||
title: "Retrieval precision 튜닝 (rank 5+ 노이즈 완화)"
|
||||
status: open
|
||||
target_version: 0.5.0
|
||||
status: completed
|
||||
target_version: 0.7.0
|
||||
depends_on: []
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3 chunking, §4 search, §7 RAG]
|
||||
contract_sections: [§3 chunking, §4 search, §7 RAG, §10.3 eval metrics]
|
||||
source_feedback: 사용자 도그푸딩 2026-05-06 — Claude Code 가 kebab CLI 사용 후 "rank 5+ 부터 노이즈 섞임" 지적. precision-at-k 가 k=5 이후 떨어짐.
|
||||
---
|
||||
|
||||
# p9-fb-39 — Retrieval precision 튜닝
|
||||
|
||||
> ⏳ **백로그 only — 미구현.** 본 spec 은 도그푸딩 피드백 skeleton. 구현 착수 전 [superpowers:brainstorming](../../docs/superpowers/) 으로 설계 단계 선행 필요. 어느 lever (chunk policy / RRF k / score gate / cross-encoder / embedding 업그레이드) 부터 손볼지, eval golden set 선행 여부 brainstorm 후 결정.
|
||||
> ✅ **Eval foundation 부분 구현 완료.** P@k metric (P@5, P@10) 추가. 본 spec 의 lever 적용 (chunk policy / RRF / cross-encoder / embedding 업그레이드) 은 별도 task 로 분리 (fb-39b 이후).
|
||||
>
|
||||
> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md)
|
||||
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md)
|
||||
> - fb-39b (lever 적용 — embedding upgrade): [`tasks/p9/p9-fb-39b-embedding-upgrade.md`](./p9-fb-39b-embedding-upgrade.md) ✅
|
||||
|
||||
## 증상 / 동기
|
||||
|
||||
|
||||
76
tasks/p9/p9-fb-39b-embedding-upgrade.md
Normal file
76
tasks/p9/p9-fb-39b-embedding-upgrade.md
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kebab-embed-local + kebab-config + kebab-store-vector + docs
|
||||
task_id: p9-fb-39b
|
||||
title: "Embedding model upgrade (multilingual-e5-large)"
|
||||
status: completed
|
||||
target_version: 0.6.0
|
||||
depends_on: [p9-fb-39]
|
||||
unblocks: []
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§4 search, §5 storage, §9 versioning cascade]
|
||||
source_feedback: 사용자 도그푸딩 2026-05-06 — Claude Code 가 kebab CLI 사용 후 "rank 5+ 노이즈 섞임" 지적 (fb-39 의 lever 적용 측면).
|
||||
---
|
||||
|
||||
# p9-fb-39b — Embedding model upgrade
|
||||
|
||||
> ✅ **구현 완료.** fb-39 의 lever 후보 4개 중 embedding model 업그레이드 lever 적용. P@k metric (fb-39) 으로 small vs large 비교 가능.
|
||||
>
|
||||
> - Design update: [`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`](../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md) §5 / §9
|
||||
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md)
|
||||
|
||||
## 요약
|
||||
|
||||
- `multilingual-e5-small` (384 dim) → `multilingual-e5-large` (1024 dim) default flip.
|
||||
- 기존 user TOML 이 small 명시 시 그대로 유지 (backwards-compat).
|
||||
- fb-23 incremental ingest 가 embedding_version mismatch 감지 → 자동 re-embed.
|
||||
- 0.5 → 0.6 minor bump 트리거 (design §9 cascade rule, current Cargo.toml = 0.5.0).
|
||||
|
||||
## 구현 항목
|
||||
|
||||
1. **config defaults flip** — `[models.embedding] model = "multilingual-e5-large"`, `dimensions = 1024`.
|
||||
2. **fastembed e5-large resolution** — `kebab-embed-local` 의 `resolve_model()` 에 e5-large arm 추가.
|
||||
3. **fixture sweep** — 모든 unit/integration 테스트의 default embedding 모델 확인. Config 에서 명시하지 않으면 새 default 따름 (`provider = "none"` 테스트 제외).
|
||||
4. **design contract update** — design §5 (storage example) + §9 (versioning table) 의 embedding_model.id + dimensions 갱신.
|
||||
5. **HOTFIXES entry** — 사용자 재 ingest 절차 + backwards-compat 동작 명시.
|
||||
6. **README update** — `[models.embedding]` 섹션의 기본값 + `dimensions` 필드 설명 갱신.
|
||||
7. **SMOKE.md append** — 스모크 테스트 중 embedding 업그레이드 검증 절차 (reset → config 갱신 → ingest → eval).
|
||||
8. **tasks/INDEX.md append** — p9-fb-39b row 추가 (p9-fb-39 sibling).
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kebab-embed-local` — fastembed crate + `kebab-core`
|
||||
- `kebab-config` — toml crate
|
||||
- `kebab-store-vector` — lancedb crate (table naming 로직만 영향)
|
||||
- `kebab-app` — 와이어링만 (API 변경 없음)
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- parse-* crate (parser 무관)
|
||||
- llm-* crate (embedding 과 무관)
|
||||
- search crate (검색 로직은 adapter pattern 으로 이미 generic)
|
||||
|
||||
## Test
|
||||
|
||||
- `cargo test -p kebab-embed-local -- e5_large` (새 arm 테스트)
|
||||
- `cargo test -p kebab-config -- embedding_defaults` (config defaults)
|
||||
- `cargo test --workspace --no-fail-fast -j 1` (full regression)
|
||||
- Smoke: `kebab --config /tmp/smoke.toml doctor | grep embedding` → `multilingual-e5-large (1024d)`
|
||||
- Smoke: `kebab --config /tmp/smoke.toml ingest` → embedding 진행 표시 + dimension check
|
||||
- P@k eval: `kebab eval run` (fb-39 의 golden set) small vs large 비교
|
||||
|
||||
## Backward compat notes
|
||||
|
||||
- Pre-fb-39b user 가 config 에서 명시하지 않은 embedding → new default (large) 자동 적용. TOML 에 `model = "multilingual-e5-small"` 명시하면 유지.
|
||||
- `kebab-config` 의 `EmbeddingCfg.model` 은 String 필드 — TOML 에 명시한 값이 default 를 override (serde 기본 동작).
|
||||
- Orphan LanceDB table (`chunk_embeddings_multilingual-e5-small_384`) 은 다음 `kebab ingest` 실행 후 stale 취급 — 사용자가 수동 `kebab reset --vector-only` 로 정리 가능.
|
||||
|
||||
## Binary version bump
|
||||
|
||||
- 0.5.0 → 0.6.0 (current Cargo.toml = 0.5.0; embedding_version cascade triggers minor bump per design §9).
|
||||
- Release notes: `embedding default: multilingual-e5-small (384d) → multilingual-e5-large (1024d), P@k metric ↑`.
|
||||
|
||||
## Post-merge deviation
|
||||
|
||||
- **`embedding_dim_mismatch` ErrorV1 dropped**: design spec §Migration policy 가 `LanceVectorStore::open` 안 dim mismatch 감지 + 신규 `error.v1.code = "embedding_dim_mismatch"` 를 명시했으나 구현에서 제외. 이유: LanceDB tables 가 `(model, dim)` namespaced (`crates/kebab-store-vector/src/paths.rs:21`) — 신규 model 변경 시 새 table 자동 생성, 옛 table orphan. dim mismatch 가 hard error 되지 않고 검색 결과 0건 (silent precision loss) 으로 surface. HOTFIXES 항목이 documentation source. 명시 error 가 의미 있으려면 별도 startup health check 필요 — fb-39c 후보 또는 v0.7.0 의 doctor 확장.
|
||||
- 영향: 사용자가 model 변경 후 `kebab ingest` 안 하면 검색 결과 0건. README + SMOKE walkthrough 가 reset --vector-only && ingest 시퀀스 안내.
|
||||
@@ -3,7 +3,7 @@ phase: P9
|
||||
component: kebab-cli + kebab-search
|
||||
task_id: p9-fb-42
|
||||
title: "Bulk multi-query + re-rank hint — agent loop 효율"
|
||||
status: open
|
||||
status: completed
|
||||
target_version: 0.6.0+
|
||||
depends_on: []
|
||||
unblocks: []
|
||||
@@ -14,7 +14,10 @@ source_feedback: 사용자 도그푸딩 2026-05-06 — agent 가 N 개 query 동
|
||||
|
||||
# p9-fb-42 — Bulk multi-query + re-rank hint
|
||||
|
||||
> ⏳ **백로그 only — 미구현 (Nice-to-have).** 본 spec 은 도그푸딩 피드백 skeleton. 구현 착수 전 [superpowers:brainstorming](../../docs/superpowers/) 으로 설계 단계 선행 필요. multi-query input 형식 / 결과 합성 정책 / re-rank hint 의 LLM 호출 비용 brainstorm 후 확정.
|
||||
> ✅ **Bulk multi-query 부분 구현 완료.** 본 spec 의 rerank hint lever 는 별도 task 로 분리 (fb-39 cross-encoder 설계 후).
|
||||
>
|
||||
> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-42-bulk-multi-query-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-42-bulk-multi-query-design.md)
|
||||
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-42-bulk-multi-query.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-42-bulk-multi-query.md)
|
||||
|
||||
## 증상 / 동기
|
||||
|
||||
|
||||
Reference in New Issue
Block a user