Compare commits

...

13 Commits

Author SHA1 Message Date
th-kim0823
f303c76f52 plan(fb-39): eval foundation implementation plan
4 tasks: AggregateMetrics.precision_at_k_chunk field + serde
backwards-compat, compute aggregation in loop with 5 unit tests,
golden YAML header doc strengthening, design §11 + INDEX + status
flip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:19:44 +09:00
th-kim0823
cd5b1e3bfc spec(fb-39): eval foundation design (P@k metric)
- AggregateMetrics 에 precision_at_k_chunk: BTreeMap<u32, f32>
  (P@5, P@10) 추가, binary relevance via expected_chunk_ids
- Denominator = k 고정 (hits.len() < k 도 precision 손실 간주)
- Empty expected_chunk_ids query 는 skip (hit_at_k 동일 정책)
- Lever 적용 (chunk policy / RRF / cross-encoder / embedding) 은
  본 spec 범위 외 — fb-39b 이후 별도 task
- Golden set schema 무변경, shipped fixtures 헤더 주석만 강화

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:05:09 +09:00
3a9a52326d Merge pull request 'feat(fb-42): bulk multi-query — kebab search --bulk + mcp__kebab__bulk_search' (#134) from feat/fb-42-bulk-multi-query into main
Reviewed-on: #134
2026-05-10 12:27:11 +00:00
th-kim0823
b53376e96e fix(fb-42): address PR #134 round 1 review
- print_schema_text plain mode: include bulk_search capability row
- README: tool count 7 → 8, fetch added to MCP tool name lists

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:19:20 +09:00
th-kim0823
441f1192ee docs(fb-42): wire schema + README + SMOKE + design + SKILL + INDEX
- Add bulk_search_item.v1 + bulk_search_response.v1 wire schemas
- Register both in WIRE_SCHEMAS const
- README: --bulk flag mention + MCP tool list 7→8 (bulk_search)
- SMOKE: bulk multi-query walkthrough (CLI + MCP equivalent)
- Design §2.2: Bulk multi-query (fb-42) subsection (additive minor)
- SKILL: mcp__kebab__bulk_search section + tool table row
- Task spec status open→completed, banner replaced
- INDEX: fb-42 row 머지 (rerank hint deferred)
- Fix: missed Capabilities {bulk_search} in cli wire.rs test (Task 7 leftover)
- Fix: missed tools.len() 7→8 in cli_mcp_smoke (Task 5 leftover)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:07:36 +09:00
th-kim0823
e8da415624 feat(schema): bulk_search capability flag (fb-42)
- Capabilities.bulk_search: true (snapshot)
- schema.v1 wire required list updated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:49:09 +09:00
th-kim0823
d8e5f35601 test(mcp): integration tests for bulk_search tool (fb-42) 2026-05-10 20:33:32 +09:00
th-kim0823
6ab0d782ef feat(mcp): kebab__bulk_search tool (fb-42)
Exposes bulk multi-query search via MCP `bulk_search` tool:
- Input: { queries: [SearchInput shapes...] }, capped at 100
- Output: bulk_search_response.v1 with per-query results + summary
- Sequential execution reuses App instance for cache amortization
- Per-query errors embed error.v1 JSON; never aborts bulk call

Updates tool count from 7 to 8 in lib.rs comment + tools_list test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:31:20 +09:00
th-kim0823
2bbe94eb05 test(cli): integration tests for kebab search --bulk (fb-42) 2026-05-10 20:26:07 +09:00
th-kim0823
9ac13fa256 fix(cli): make query optional when --bulk is set (fb-42) 2026-05-10 20:26:03 +09:00
th-kim0823
67f2c16cc2 feat(cli): kebab search --bulk flag + stdin ndjson + output stream (fb-42) 2026-05-10 20:22:45 +09:00
th-kim0823
1ebbd6b711 feat(app): bulk_search_with_config facade (fb-42) 2026-05-10 20:18:49 +09:00
th-kim0823
892175d009 feat(core): BulkSearchItem / Summary / Response types (fb-42) 2026-05-10 20:12:31 +09:00
25 changed files with 1509 additions and 22 deletions

View File

@@ -71,7 +71,7 @@ kebab doctor
|------|------|
| `kebab init` | XDG 경로에 데이터 디렉토리 + config.toml 생성 |
| `kebab ingest [<path>]` | Markdown / 이미지 / PDF 색인 (idempotent). TTY 에서는 stderr 진행 바, non-TTY (CI / pipe) 는 stderr 한 줄씩, `--json` 은 stdout 에 `ingest_progress.v1` 라인 streaming 후 마지막에 `ingest_report.v1`. Ctrl-C 한 번이면 현재 asset 마무리 후 abort (부분 commit 보존, idempotent re-run), 두 번째 Ctrl-C 는 hard exit. Markdown title 이 frontmatter 에 없어도 첫 H1 → H2 → 첫 paragraph 80 자 → 파일명 순으로 자동 채움 (parser_version `md-frontmatter-v2`) — 기존 색인된 doc 도 다음 ingest 에서 새 title 로 갱신. **Incremental** (p9-fb-23): 두 번째 이후의 ingest 는 변하지 않은 doc (blake3 + parser/chunker/embedder version 모두 동일) 의 parse/chunk/embed/vector upsert 를 자동 스킵. final summary 에 `N unchanged` 카운트 표시. `--force-reingest` 로 skip 무시 강제 재처리. **지원 형식** (extractor 자동 결정 — config 에 명시 불가): Markdown (`.md`), 이미지 (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`). 다른 확장자는 자동 skip — `IngestItem.warnings` 에 사유 (`"unsupported media type: .docx"` 등), `IngestReport.skipped_by_extension` 에 카운트 분류, CLI / TUI summary 에 breakdown 표시. |
| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace]` | 검색. hybrid는 RRF fusion, citation 포함. 같은 process 안에서 동일 query (NFKC + trim + lowercase 정규화) 반복 시 in-process LRU 캐시 hit (capacity = `[search] cache_capacity`, default 256). `--no-cache` 로 강제 bypass — 디버깅용. ingest commit 발생 시 `kv['corpus_revision']` bump 으로 모든 entry 자동 stale. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)** — agent budget controls. `--json` 출력은 `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) — pre-fb-34 의 bare array 와 호환 안 됨. mismatched cursor → `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` 는 반복 가능 flag (`--tag rust --tag async`) 로 OR 매칭, `--media``,` 구분 다중 값 OR 매칭, 나머지 flags 간은 AND 조합. `--trust-min``primary\|secondary\|generated` 중 하나 (해당 level 이상 포함). `--ingested-after` 는 RFC3339 UTC — 파싱 실패 시 `error.v1.code = config_invalid` (exit 2). `--media md``markdown` alias 로 정규화. 알 수 없는 `--media` 값은 무조건 empty hits (오류 아님). **`--trace` (p9-fb-37)** — `search_response.v1.trace` 에 lexical / vector pre-fusion 후보 + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) 노출. trace 요청은 캐시 우회 (`--no-cache` 없이도 항상 cold). |
| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk]` | 검색. hybrid는 RRF fusion, citation 포함. 같은 process 안에서 동일 query (NFKC + trim + lowercase 정규화) 반복 시 in-process LRU 캐시 hit (capacity = `[search] cache_capacity`, default 256). `--no-cache` 로 강제 bypass — 디버깅용. ingest commit 발생 시 `kv['corpus_revision']` bump 으로 모든 entry 자동 stale. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)** — agent budget controls. `--json` 출력은 `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) — pre-fb-34 의 bare array 와 호환 안 됨. mismatched cursor → `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` 는 반복 가능 flag (`--tag rust --tag async`) 로 OR 매칭, `--media``,` 구분 다중 값 OR 매칭, 나머지 flags 간은 AND 조합. `--trust-min``primary\|secondary\|generated` 중 하나 (해당 level 이상 포함). `--ingested-after` 는 RFC3339 UTC — 파싱 실패 시 `error.v1.code = config_invalid` (exit 2). `--media md``markdown` alias 로 정규화. 알 수 없는 `--media` 값은 무조건 empty hits (오류 아님). **`--trace` (p9-fb-37)** — `search_response.v1.trace` 에 lexical / vector pre-fusion 후보 + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) 노출. trace 요청은 캐시 우회 (`--no-cache` 없이도 항상 cold). **`--bulk` (p9-fb-42)** — stdin ndjson 으로 N query 한 번에 실행. `--json` 면 stdout per-query ndjson (`bulk_search_item.v1`) + stderr summary (`bulk_summary: total=N succeeded=S failed=F`). Cap 100. agent 가 query decomposition 후 sub-query 일괄 실행 시 single round-trip — App instance 재사용으로 캐시 / embedder cold-start 비용 한 번만. Per-query failure 는 item 의 `error` (error.v1) 에 격리, 다른 query 계속 진행. |
| `kebab list docs` | 색인된 문서 목록 |
| `kebab inspect doc <id>` / `kebab inspect chunk <id>` | raw record 보기 |
| `kebab fetch chunk <id> [--context N]` / `kebab fetch doc <id> [--max-tokens N]` / `kebab fetch span <doc_id> <ls> <le> [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. |
@@ -83,7 +83,7 @@ kebab doctor
| `kebab schema [--json]` | introspection — wire schemas / capabilities / models / stats 한 번에. `--json``schema.v1` wire; 사람 모드는 서식 출력. **stats 에 (p9-fb-37) `media_breakdown` (5 keys: markdown / pdf / image / audio / other) + `lang_breakdown` (BCP-47 코드, NULL 은 literal `"null"`) + `index_bytes` (sqlite + lancedb on-disk 합계) + `stale_doc_count` (`config.search.stale_threshold_days` 초과 doc 수) 추가.** |
| `kebab ingest-file <path>` | 단일 파일 ingest (workspace 외부 가능). 바이트는 `<workspace.root>/_external/<hash12>.<ext>` 로 copy. `.kebabignore` 매치 시 stderr warn 후 진행 (explicit ingest 가 bypass intent). |
| `kebab ingest-stdin --title <T> [--source-uri <URI>]` | stdin 의 markdown 본문 ingest. frontmatter (title + source_uri) 자동 prepend. v1 markdown only. |
| `kebab mcp` | MCP (Model Context Protocol) stdio server. agent host (Claude Code / Cursor / OpenAI Agents) 가 spawn 하여 tool 호출 (`search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`). `--config` honor. |
| `kebab mcp` | MCP (Model Context Protocol) stdio server. agent host (Claude Code / Cursor / OpenAI Agents) 가 spawn 하여 tool 호출 (`search` / `bulk_search` / `ask` / `fetch` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`). `--config` honor. |
모든 명령에 `--json` 플래그. 출력은 frozen wire schema v1 (`schema_version` 항상 포함, 예: `ingest_report.v1`, `ingest_progress.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, `reset_report.v1`, `schema.v1`). `--json` 모드에서 fatal error 는 stderr 에 `error.v1` ndjson 으로 emit (exit code 0/1/2/3 unchanged).
@@ -198,7 +198,7 @@ config 예시는 [docs/SMOKE.md](docs/SMOKE.md) 의 `/tmp/kebab-smoke/config.tom
## MCP 사용
`kebab mcp` 가 stdio MCP server. 6 tool: `search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`.
`kebab mcp` 가 stdio MCP server. 8 tool: `search` / `bulk_search` (p9-fb-42 — N query 한 번에) / `ask` / `fetch` (p9-fb-35) / `schema` / `doctor` / `ingest_file` / `ingest_stdin`.
Claude Code 빠른 등록 (`~/.claude/mcp.json` 또는 host 동등 위치):

View File

@@ -0,0 +1,296 @@
//! p9-fb-42: bulk multi-query facade. Sequential for-loop reusing
//! one App instance so embedder cold-start + LRU cache amortize
//! across the N queries.
use anyhow::Context;
use kebab_core::{
BulkSearchItem, BulkSearchSummary, DocumentId, Lang, SearchFilters, SearchHit, SearchMode,
SearchOpts, SearchQuery, TrustLevel,
};
use serde_json::Value;
use crate::{App, SearchResponse};
/// Hard cap on items per bulk call. Documented in spec — agents that
/// hit this should batch-split.
pub const BULK_QUERIES_MAX: usize = 100;
/// p9-fb-42: bulk search facade. Returns `(items, summary)` always
/// — per-query failures embed `error.v1` JSON in the item rather
/// than aborting the bulk call. Returns `Err` only for input
/// validation failures (e.g. >100 queries).
#[doc(hidden)]
pub fn bulk_search_with_config(
config: kebab_config::Config,
raw_items: Vec<Value>,
) -> anyhow::Result<(Vec<BulkSearchItem>, BulkSearchSummary)> {
if raw_items.len() > BULK_QUERIES_MAX {
anyhow::bail!(
"queries: max {} items, got {}",
BULK_QUERIES_MAX,
raw_items.len()
);
}
let app = App::open_with_config(config).context("kebab-app: open for bulk_search")?;
let mut results: Vec<BulkSearchItem> = Vec::with_capacity(raw_items.len());
let mut succeeded: u32 = 0;
let mut failed: u32 = 0;
for raw in raw_items {
let item = run_one(&app, raw);
if item.error.is_some() {
failed += 1;
} else {
succeeded += 1;
}
results.push(item);
}
let summary = BulkSearchSummary {
total: succeeded + failed,
succeeded,
failed,
};
Ok((results, summary))
}
fn run_one(app: &App, raw: Value) -> BulkSearchItem {
let echo = raw.clone();
match parse_one(&raw) {
Ok((query, opts)) => match app.search_with_opts(query, opts) {
Ok(resp) => BulkSearchItem {
query: echo,
response: Some(serialize_search_response(&resp)),
error: None,
},
Err(e) => BulkSearchItem {
query: echo,
response: None,
error: Some(error_v1_json("retrieval_error", &format!("{e:#}"), None)),
},
},
Err(msg) => BulkSearchItem {
query: echo,
response: None,
error: Some(error_v1_json("invalid_input", &msg, None)),
},
}
}
/// Mirror of `kebab-cli::wire::wire_search_response` — `SearchResponse`
/// itself is not `Serialize`, so we build the `search_response.v1`-shaped
/// JSON manually. Each hit also gets `score` promoted from
/// `retrieval.fusion_score` per §2.2, matching the CLI wire layer.
fn serialize_search_response(r: &SearchResponse) -> Value {
let mut v = serde_json::json!({
"schema_version": "search_response.v1",
"hits": r.hits.iter().map(serialize_search_hit).collect::<Vec<_>>(),
"next_cursor": r.next_cursor,
"truncated": r.truncated,
});
if let Value::Object(ref mut map) = v {
let trace_v = match &r.trace {
Some(t) => serde_json::to_value(t).unwrap_or(Value::Null),
None => Value::Null,
};
map.insert("trace".to_string(), trace_v);
}
v
}
fn serialize_search_hit(h: &SearchHit) -> Value {
let mut v = serde_json::to_value(h).unwrap_or(Value::Null);
if let Value::Object(ref mut map) = v {
if let Some(Value::Object(retrieval)) = map.get("retrieval") {
if let Some(score) = retrieval.get("fusion_score").cloned() {
map.insert("score".to_string(), score);
}
}
map.insert(
"schema_version".to_string(),
Value::String("search_hit.v1".to_string()),
);
}
v
}
fn parse_one(raw: &Value) -> Result<(SearchQuery, SearchOpts), String> {
let obj = raw.as_object().ok_or("expected JSON object")?;
let text = obj
.get("query")
.and_then(|v| v.as_str())
.ok_or("missing required field: query")?
.to_string();
let mode = match obj.get("mode").and_then(|v| v.as_str()) {
None => SearchMode::Hybrid,
Some("hybrid") => SearchMode::Hybrid,
Some("lexical") => SearchMode::Lexical,
Some("vector") => SearchMode::Vector,
Some(other) => return Err(format!("invalid mode: {other:?}")),
};
let k = obj
.get("k")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.unwrap_or(0); // 0 → use config default in app
let trust_min = match obj.get("trust_min").and_then(|v| v.as_str()) {
None => None,
Some("primary") => Some(TrustLevel::Primary),
Some("secondary") => Some(TrustLevel::Secondary),
Some("generated") => Some(TrustLevel::Generated),
Some(other) => return Err(format!("invalid trust_min: {other:?}")),
};
let ingested_after = match obj.get("ingested_after").and_then(|v| v.as_str()) {
None => None,
Some(s) => Some(
time::OffsetDateTime::parse(s, &time::format_description::well_known::Rfc3339)
.map_err(|e| format!("invalid ingested_after RFC3339 {s:?}: {e}"))?,
),
};
let media: Vec<String> = obj
.get("media")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|x| x.as_str().map(normalize_media_alias))
.collect()
})
.unwrap_or_default();
let tags_any: Vec<String> = obj
.get("tag")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|x| x.as_str().map(String::from))
.collect()
})
.unwrap_or_default();
let lang = obj
.get("lang")
.and_then(|v| v.as_str())
.map(|s| Lang(s.to_string()));
let path_glob = obj
.get("path_glob")
.and_then(|v| v.as_str())
.map(String::from);
let doc_id = obj
.get("doc_id")
.and_then(|v| v.as_str())
.map(|s| DocumentId(s.to_string()));
let filters = SearchFilters {
tags_any,
lang,
path_glob,
trust_min,
media,
ingested_after,
doc_id,
};
let opts = SearchOpts {
max_tokens: obj
.get("max_tokens")
.and_then(|v| v.as_u64())
.map(|n| n as usize),
snippet_chars: obj
.get("snippet_chars")
.and_then(|v| v.as_u64())
.map(|n| n as usize),
cursor: obj.get("cursor").and_then(|v| v.as_str()).map(String::from),
trace: obj.get("trace").and_then(|v| v.as_bool()).unwrap_or(false),
};
Ok((
SearchQuery {
text,
mode,
k,
filters,
},
opts,
))
}
fn normalize_media_alias(s: &str) -> String {
match s.to_ascii_lowercase().as_str() {
"md" => "markdown".to_string(),
other => other.to_string(),
}
}
fn error_v1_json(code: &str, message: &str, hint: Option<&str>) -> Value {
serde_json::json!({
"schema_version": "error.v1",
"code": code,
"message": message,
"hint": hint,
})
}
#[cfg(test)]
mod tests {
use super::*;
fn open_temp() -> kebab_config::Config {
let dir = tempfile::tempdir().unwrap();
let mut cfg = kebab_config::Config::defaults();
cfg.storage.data_dir = dir.path().to_string_lossy().into_owned();
// Bring up migrations so SqliteStore::open_existing succeeds inside App::open.
let store = kebab_store_sqlite::SqliteStore::open(&cfg).unwrap();
store.run_migrations().unwrap();
drop(store);
// Leak the tempdir into a static — tests are short-lived; not worth threading.
std::mem::forget(dir);
cfg
}
#[test]
fn empty_input_returns_empty_summary() {
let cfg = open_temp();
let (items, summary) = bulk_search_with_config(cfg, vec![]).unwrap();
assert!(items.is_empty());
assert_eq!(summary.total, 0);
assert_eq!(summary.succeeded, 0);
assert_eq!(summary.failed, 0);
}
#[test]
fn over_cap_returns_err() {
let cfg = open_temp();
let raw: Vec<Value> = (0..101)
.map(|_| serde_json::json!({"query": "x"}))
.collect();
let err = bulk_search_with_config(cfg, raw).unwrap_err();
let msg = format!("{err:#}");
assert!(msg.contains("max 100"));
}
#[test]
fn invalid_item_emits_error_keeps_total_count() {
let cfg = open_temp();
let raw = vec![
serde_json::json!({"query": "ok", "mode": "lexical"}),
serde_json::json!({"mode": "lexical"}), // missing required `query`
];
let (items, summary) = bulk_search_with_config(cfg, raw).unwrap();
assert_eq!(items.len(), 2);
assert_eq!(summary.total, 2);
// First item: lexical mode against empty corpus succeeds with empty hits.
assert!(items[0].error.is_none());
// Second item: missing required field.
assert!(items[1].error.is_some());
assert_eq!(items[1].error.as_ref().unwrap()["code"], "invalid_input");
}
}

View File

@@ -55,6 +55,7 @@ use kebab_parse_md::{BodyHints, parse_blocks, parse_frontmatter};
use kebab_source_fs::FsSourceConnector;
mod app;
mod bulk;
pub mod cursor;
pub mod doctor_signal;
pub mod error_signal;
@@ -72,6 +73,8 @@ pub use ingest_progress::{AggregateCounts, IngestEvent, render_skipped_breakdown
pub use reset::{ResetReport, ResetScope};
pub use error_wire::{ERROR_V1_ID, ErrorV1, StructuredError, classify};
pub use fetch::fetch_with_config;
#[doc(hidden)]
pub use bulk::{BULK_QUERIES_MAX, bulk_search_with_config};
pub use schema::{Capabilities, Models, SCHEMA_V1_ID, SchemaV1, Stats, WireBlock, schema_with_config};
pub use staleness::{compute_stale, mark_stale_in_place};

View File

@@ -32,6 +32,7 @@ pub struct Capabilities {
pub http_daemon: bool,
pub mcp_server: bool,
pub single_file_ingest: bool,
pub bulk_search: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -85,6 +86,8 @@ const WIRE_SCHEMAS: &[&str] = &[
"citation.v1",
"schema.v1",
"error.v1",
"bulk_search_item.v1",
"bulk_search_response.v1",
];
/// Build a [`SchemaV1`] introspection report for the given config.
@@ -123,6 +126,7 @@ fn capabilities_snapshot() -> Capabilities {
http_daemon: false,
mcp_server: true,
single_file_ingest: false,
bulk_search: true,
}
}

View File

@@ -94,7 +94,8 @@ enum Cmd {
/// Lexical / vector / hybrid search over chunks.
Search {
query: String,
/// Query text. Not required when `--bulk` is set (queries from stdin).
query: Option<String>,
#[arg(long, default_value_t = 10)]
k: usize,
@@ -171,6 +172,16 @@ enum Cmd {
/// without embeddings via a no-op vector stub.
#[arg(long)]
trace: bool,
/// p9-fb-42: bulk multi-query mode. Reads ndjson from stdin —
/// one JSON object per line, each item shape mirrors the
/// single-query input. Output is per-query ndjson on stdout
/// (one `bulk_search_item.v1` per line) plus a summary line on
/// stderr. Single-query flags (`--mode`, `--k`, `--tag`, etc.)
/// are ignored when `--bulk` is set; pass them per-item in the
/// stdin JSON instead. Caps at 100 queries per call.
#[arg(long)]
bulk: bool,
},
/// Retrieval-augmented question answering.
@@ -678,9 +689,97 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
ingested_after,
doc_id,
trace,
bulk,
} => {
// p9-fb-42: bulk mode — stdin ndjson → bulk_search_with_config
// → stdout ndjson per query + stderr summary. Single-query
// flags are ignored (each item supplies its own).
if *bulk {
use std::io::{BufRead, Write};
let cfg = kebab_config::Config::load(cli.config.as_deref())?;
let stdin = std::io::stdin();
let stdin_locked = stdin.lock();
let mut raw_items: Vec<serde_json::Value> = Vec::new();
for (lineno, line) in stdin_locked.lines().enumerate() {
let line = line?;
if line.trim().is_empty() {
continue;
}
let v: serde_json::Value =
serde_json::from_str(&line).map_err(|e| {
anyhow::Error::new(kebab_app::StructuredError(
kebab_app::ErrorV1 {
schema_version: kebab_app::ERROR_V1_ID
.to_string(),
code: "config_invalid".to_string(),
message: format!(
"stdin ndjson line {} parse error: {e}",
lineno + 1
),
details: serde_json::Value::Null,
hint: Some(
"each line must be a JSON object with at least `query`"
.to_string(),
),
},
))
})?;
raw_items.push(v);
}
let (items, summary) =
kebab_app::bulk_search_with_config(cfg, raw_items)?;
if cli.json {
let mut stdout = std::io::stdout().lock();
for item in &items {
let v = wire::wire_bulk_search_item(item);
writeln!(stdout, "{}", serde_json::to_string(&v)?)?;
}
eprintln!(
"bulk_summary: total={} succeeded={} failed={}",
summary.total, summary.succeeded, summary.failed,
);
} else {
let mut stdout = std::io::stdout().lock();
for (idx, item) in items.iter().enumerate() {
writeln!(
stdout,
"# Query {}: {}",
idx + 1,
serde_json::to_string(&item.query)?,
)?;
if let Some(err) = &item.error {
writeln!(stdout, "error: {}", err)?;
} else if let Some(resp) = &item.response {
writeln!(
stdout,
"{}",
serde_json::to_string_pretty(resp)?
)?;
}
writeln!(stdout)?;
}
eprintln!(
"bulk_summary: total={} succeeded={} failed={}",
summary.total, summary.succeeded, summary.failed,
);
}
return Ok(());
}
let cfg = kebab_config::Config::load(cli.config.as_deref())?;
// p9-fb-42: bulk mode requires no query; single-query mode requires query.
let query_text = match query.as_ref() {
Some(q) => q.clone(),
None => {
return Err(anyhow::anyhow!("query is required unless --bulk is set"));
}
};
// p9-fb-36: normalize --media aliases (md → markdown).
fn normalize_media_alias(s: &str) -> String {
match s.to_ascii_lowercase().as_str() {
@@ -732,7 +831,7 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
};
let q = kebab_core::SearchQuery {
text: query.clone(),
text: query_text,
mode: (*mode).into(),
k: *k,
filters,
@@ -1266,6 +1365,7 @@ fn print_schema_text(s: &kebab_app::SchemaV1) {
("http_daemon", s.capabilities.http_daemon),
("mcp_server", s.capabilities.mcp_server),
("single_file_ingest", s.capabilities.single_file_ingest),
("bulk_search", s.capabilities.bulk_search),
];
for (name, on) in caps {
let mark = if on { "" } else { "" };

View File

@@ -201,6 +201,20 @@ pub fn wire_fetch_result(r: &kebab_core::FetchResult) -> Value {
tag_object(v, "fetch_result.v1")
}
/// p9-fb-42: tag a `BulkSearchItem` (already serialized as a Value)
/// as `bulk_search_item.v1`. The inner `query` / `response` / `error`
/// fields stay verbatim — only the envelope gets the schema_version stamp.
pub fn wire_bulk_search_item(item: &kebab_core::BulkSearchItem) -> Value {
let mut v = serde_json::to_value(item).expect("BulkSearchItem serializes");
if let Value::Object(ref mut map) = v {
map.insert(
"schema_version".to_string(),
Value::String("bulk_search_item.v1".to_string()),
);
}
v
}
#[cfg(test)]
mod tests {
use super::*;
@@ -297,7 +311,7 @@ mod tests {
json_mode: true, ingest_progress: true, ingest_cancellation: true,
rag_multi_turn: true, search_cache: true, incremental_ingest: true,
streaming_ask: false, http_daemon: false, mcp_server: false,
single_file_ingest: false,
single_file_ingest: false, bulk_search: true,
},
models: Models {
parser_version: "x".to_string(),

View File

@@ -66,8 +66,8 @@ fn cli_mcp_initialize_then_tools_list() {
.expect("tools/list result.tools must be an array");
assert_eq!(
tools.len(),
7,
"expected 7 tools (schema, doctor, search, ask, fetch, ingest_file, ingest_stdin), got {}: {list}",
8,
"expected 8 tools (schema, doctor, search, bulk_search, ask, fetch, ingest_file, ingest_stdin), got {}: {list}",
tools.len()
);

View File

@@ -0,0 +1,174 @@
//! p9-fb-42: integration tests for `kebab search --bulk`.
//!
//! Lexical-only — no fastembed / no Ollama. Each test builds its own
//! TempDir KB via `common::write_config` + `common::ingest` and drives
//! `kebab search --bulk` through stdin. Verifies:
//!
//! - Two queries over stdin emit per-query ndjson `bulk_search_item.v1` lines.
//! - Empty stdin returns empty results with zero summary.
//! - Malformed ndjson exits with code 2 (config_invalid).
//! - Input over the 100-item cap fails with "max 100" error message.
//! - Invalid item field (e.g. bad `mode`) emits per-item error and continues.
mod common;
use serde_json::Value;
use std::fs;
use std::io::Write;
use std::process::{Command, Stdio};
fn cargo_bin() -> &'static str {
env!("CARGO_BIN_EXE_kebab")
}
fn run_bulk_with_stdin(cfg: &std::path::Path, stdin_body: &str, json: bool) -> std::process::Output {
let mut cmd = Command::new(cargo_bin());
cmd.arg("--config").arg(cfg).arg("search").arg("--bulk");
if json {
cmd.arg("--json");
}
cmd.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped());
let mut child = cmd.spawn().expect("spawn kebab");
{
let mut sin = child.stdin.take().expect("stdin");
sin.write_all(stdin_body.as_bytes()).expect("write stdin");
}
child.wait_with_output().expect("wait")
}
fn seed_workspace(workspace: &std::path::Path) {
fs::write(workspace.join("a.md"), "# Alpha\n\nrust async hello").unwrap();
fs::write(workspace.join("b.md"), "# Bravo\n\nbread and kebab").unwrap();
}
// ---------------------------------------------------------------------------
// Test 1: Two queries over stdin emit per-query ndjson
// ---------------------------------------------------------------------------
#[test]
fn two_query_bulk_emits_per_query_ndjson() {
let dir = tempfile::tempdir().unwrap();
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
seed_workspace(&workspace);
common::ingest(&cfg, &workspace);
let out = run_bulk_with_stdin(
&cfg,
"{\"query\":\"rust\",\"mode\":\"lexical\"}\n{\"query\":\"kebab\",\"mode\":\"lexical\"}\n",
true,
);
assert!(
out.status.success(),
"stderr: {}",
String::from_utf8_lossy(&out.stderr)
);
let stdout = String::from_utf8_lossy(&out.stdout);
let lines: Vec<&str> = stdout.lines().filter(|l| !l.trim().is_empty()).collect();
assert_eq!(lines.len(), 2, "expected 2 ndjson lines, got {lines:?}");
for line in &lines {
let v: Value = serde_json::from_str(line).expect("valid JSON line");
assert_eq!(v["schema_version"], "bulk_search_item.v1");
assert!(v["response"].is_object());
assert!(v["error"].is_null());
}
let stderr = String::from_utf8_lossy(&out.stderr);
assert!(
stderr.contains("bulk_summary: total=2 succeeded=2 failed=0"),
"stderr summary missing: {stderr}"
);
}
// ---------------------------------------------------------------------------
// Test 2: Empty stdin returns empty results with zero summary
// ---------------------------------------------------------------------------
#[test]
fn empty_stdin_returns_empty_results_with_zero_summary() {
let dir = tempfile::tempdir().unwrap();
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
seed_workspace(&workspace);
common::ingest(&cfg, &workspace);
let out = run_bulk_with_stdin(&cfg, "", true);
assert!(out.status.success());
let stdout = String::from_utf8_lossy(&out.stdout);
assert!(stdout.trim().is_empty(), "expected empty stdout, got: {stdout}");
let stderr = String::from_utf8_lossy(&out.stderr);
assert!(stderr.contains("bulk_summary: total=0 succeeded=0 failed=0"));
}
// ---------------------------------------------------------------------------
// Test 3: Malformed ndjson line emits config_invalid exit 2
// ---------------------------------------------------------------------------
#[test]
fn malformed_ndjson_line_emits_config_invalid_exit_2() {
let dir = tempfile::tempdir().unwrap();
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
seed_workspace(&workspace);
common::ingest(&cfg, &workspace);
let out = run_bulk_with_stdin(&cfg, "not json\n", true);
assert_eq!(out.status.code(), Some(2), "expected exit 2");
let stderr = String::from_utf8_lossy(&out.stderr);
assert!(
stderr.contains("config_invalid") || stderr.contains("parse error"),
"expected config_invalid or parse error in stderr: {stderr}"
);
}
// ---------------------------------------------------------------------------
// Test 4: Over cap input (>100) emits error
// ---------------------------------------------------------------------------
#[test]
fn over_cap_input_emits_error() {
let dir = tempfile::tempdir().unwrap();
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
seed_workspace(&workspace);
common::ingest(&cfg, &workspace);
let body: String = (0..101)
.map(|_| "{\"query\":\"x\",\"mode\":\"lexical\"}\n")
.collect();
let out = run_bulk_with_stdin(&cfg, &body, true);
// bulk_search_with_config returns Err — surfaces as exit 1 (anyhow chain)
// or 2 if classified by error_wire. Accept either, but message must mention `max 100`.
assert!(out.status.code().is_some());
let stderr = String::from_utf8_lossy(&out.stderr);
assert!(
stderr.contains("max 100"),
"expected 'max 100' in stderr: {stderr}"
);
}
// ---------------------------------------------------------------------------
// Test 5: Invalid item field (bad mode) emits per-item error and continues
// ---------------------------------------------------------------------------
#[test]
fn invalid_item_field_emits_per_item_error_continues() {
let dir = tempfile::tempdir().unwrap();
let (cfg, workspace, _data) = common::write_config(dir.path(), 0);
seed_workspace(&workspace);
common::ingest(&cfg, &workspace);
let out = run_bulk_with_stdin(
&cfg,
"{\"query\":\"rust\",\"mode\":\"lexical\"}\n{\"query\":\"x\",\"mode\":\"bogus\"}\n",
true,
);
assert!(out.status.success());
let stdout = String::from_utf8_lossy(&out.stdout);
let lines: Vec<&str> = stdout.lines().filter(|l| !l.trim().is_empty()).collect();
assert_eq!(lines.len(), 2);
let v0: Value = serde_json::from_str(lines[0]).unwrap();
let v1: Value = serde_json::from_str(lines[1]).unwrap();
assert!(v0["error"].is_null());
assert!(v1["error"].is_object());
assert_eq!(v1["error"]["code"], "invalid_input");
let stderr = String::from_utf8_lossy(&out.stderr);
assert!(stderr.contains("succeeded=1 failed=1"));
}

View File

@@ -51,9 +51,9 @@ pub use metadata::{
TrustLevel,
};
pub use search::{
DocFilter, DocSummary, IndexBytes, MEDIA_KINDS, RetrievalDetail, ScoreKind, SearchFilters, SearchHit,
SearchMode, SearchOpts, SearchQuery, SearchTrace, TraceCandidate, TraceFusionInput,
TraceTiming,
BulkSearchItem, BulkSearchResponse, BulkSearchSummary, DocFilter, DocSummary, IndexBytes, MEDIA_KINDS,
RetrievalDetail, ScoreKind, SearchFilters, SearchHit, SearchMode, SearchOpts, SearchQuery, SearchTrace,
TraceCandidate, TraceFusionInput, TraceTiming,
};
pub use answer::{
Answer, AnswerCitation, AnswerRetrievalSummary, ModelRef, RefusalReason, TokenUsage,

View File

@@ -196,6 +196,32 @@ pub struct IndexBytes {
pub lancedb: u64,
}
/// p9-fb-42: per-query result in bulk search. `response` XOR `error` —
/// exactly one is `Some`. `query` is the input echo (raw JSON value)
/// so consumers can correlate input to output without index tracking.
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct BulkSearchItem {
pub query: serde_json::Value,
pub response: Option<serde_json::Value>,
pub error: Option<serde_json::Value>,
}
/// p9-fb-42: bulk summary counts. Invariant: total == succeeded + failed.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
pub struct BulkSearchSummary {
pub total: u32,
pub succeeded: u32,
pub failed: u32,
}
/// p9-fb-42: MCP-only envelope. CLI emits raw ndjson without envelope.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct BulkSearchResponse {
pub schema_version: String,
pub results: Vec<BulkSearchItem>,
pub summary: BulkSearchSummary,
}
#[cfg(test)]
mod tests {
use super::*;
@@ -356,4 +382,51 @@ mod tests {
let hit: SearchHit = serde_json::from_value(json).unwrap();
assert_eq!(hit.score_kind, ScoreKind::Rrf);
}
#[test]
fn bulk_search_summary_serde_roundtrip() {
let s = BulkSearchSummary {
total: 5,
succeeded: 4,
failed: 1,
};
let v = serde_json::to_value(s).unwrap();
assert_eq!(v["total"], 5);
assert_eq!(v["succeeded"], 4);
assert_eq!(v["failed"], 1);
let back: BulkSearchSummary = serde_json::from_value(v).unwrap();
assert_eq!(back, s);
}
#[test]
fn bulk_search_summary_default_is_zeros() {
let s = BulkSearchSummary::default();
assert_eq!(s.total, 0);
assert_eq!(s.succeeded, 0);
assert_eq!(s.failed, 0);
}
#[test]
fn bulk_search_item_serde_response_variant() {
let item = BulkSearchItem {
query: serde_json::json!({"query": "rust"}),
response: Some(serde_json::json!({"hits": []})),
error: None,
};
let v = serde_json::to_value(&item).unwrap();
assert!(v["response"].is_object());
assert!(v["error"].is_null());
}
#[test]
fn bulk_search_item_serde_error_variant() {
let item = BulkSearchItem {
query: serde_json::json!({"query": "rust"}),
response: None,
error: Some(serde_json::json!({"code": "config_invalid", "message": "bad"})),
};
let v = serde_json::to_value(&item).unwrap();
assert!(v["response"].is_null());
assert_eq!(v["error"]["code"], "config_invalid");
}
}

View File

@@ -1,7 +1,7 @@
//! MCP (Model Context Protocol) server over stdio. Exposes 7 tools
//! MCP (Model Context Protocol) server over stdio. Exposes 8 tools
//! (`search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`
//! / `fetch`) backed by `kebab-app` facade methods. Used by `kebab-cli`'s
//! `Cmd::Mcp` arm.
//! / `fetch` / `bulk_search`) backed by `kebab-app` facade methods. Used by
//! `kebab-cli`'s `Cmd::Mcp` arm.
//!
//! See spec `docs/superpowers/specs/2026-05-07-p9-fb-30-mcp-server-design.md`.
@@ -67,6 +67,11 @@ pub fn build_tools_vec() -> Vec<Tool> {
"Verbatim fetch — chunk / doc / span modes. Returns fetch_result.v1 with the indexed text (no LLM rewrite).",
schema_for_type::<tools::fetch::FetchInput>(),
),
Tool::new(
"bulk_search",
"Bulk multi-query search — N queries per call (cap 100). Each query mirrors the `search` input shape; returns `bulk_search_response.v1` with per-query results + summary. Sequential execution reuses one App instance so cache / embedder cold-start cost amortizes.",
schema_for_type::<tools::bulk_search::BulkSearchInput>(),
),
]
}
@@ -170,6 +175,13 @@ impl ServerHandler for KebabHandler {
})
.await
}
"bulk_search" => {
let args = request.arguments.unwrap_or_default();
self.spawn_tool(args, |state, input| {
tools::bulk_search::handle(&state, input)
})
.await
}
_other => Err(ErrorData::method_not_found::<
rmcp::model::CallToolRequestMethod,
>()),

View File

@@ -0,0 +1,55 @@
//! `bulk_search` tool — wraps `kebab_app::bulk_search_with_config`.
//! Input: `{ queries: [<SearchInput shape>, ...] }`.
//! Output: `bulk_search_response.v1` envelope (results + summary).
use rmcp::model::CallToolResult;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::error::{to_tool_error, to_tool_success};
use crate::state::KebabAppState;
#[derive(Debug, Deserialize, Serialize, JsonSchema)]
pub struct BulkSearchInput {
/// Per-query inputs. Each item mirrors the single-query `search`
/// tool's input shape — `query` is required, all other fields are
/// optional and default to single-search defaults. Capped at 100
/// items; exceeding returns an `invalid_input` tool error without
/// running any query.
pub queries: Vec<serde_json::Value>,
}
pub fn handle(state: &KebabAppState, input: BulkSearchInput) -> CallToolResult {
let cfg_clone = (*state.config).clone();
match kebab_app::bulk_search_with_config(cfg_clone, input.queries) {
Ok((items, summary)) => {
let tagged_items: Vec<serde_json::Value> = items
.iter()
.map(|it| {
let mut v = serde_json::to_value(it).unwrap_or(serde_json::Value::Null);
if let serde_json::Value::Object(ref mut map) = v {
map.insert(
"schema_version".to_string(),
serde_json::Value::String("bulk_search_item.v1".to_string()),
);
}
v
})
.collect();
let envelope = serde_json::json!({
"schema_version": "bulk_search_response.v1",
"results": tagged_items,
"summary": {
"total": summary.total,
"succeeded": summary.succeeded,
"failed": summary.failed,
},
});
match serde_json::to_string(&envelope) {
Ok(json) => to_tool_success(json),
Err(e) => to_tool_error(&anyhow::anyhow!(e)),
}
}
Err(e) => to_tool_error(&e),
}
}

View File

@@ -7,3 +7,4 @@ pub mod ask;
pub mod ingest_file;
pub mod ingest_stdin;
pub mod fetch;
pub mod bulk_search;

View File

@@ -0,0 +1,121 @@
//! p9-fb-42: integration tests for `mcp__kebab__bulk_search`.
use std::fs;
use kebab_config::Config;
use kebab_core::SourceScope;
use kebab_mcp::{KebabAppState, KebabHandler};
use rmcp::model::RawContent;
use serde_json::json;
fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path) -> Config {
let mut cfg = Config::defaults();
cfg.storage.data_dir = data_dir.to_string_lossy().into_owned();
cfg.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
cfg.workspace.root = workspace_root.to_string_lossy().into_owned();
cfg.workspace.exclude.clear();
cfg.models.embedding.provider = "none".to_string();
cfg.models.embedding.dimensions = 0;
cfg
}
fn setup() -> (tempfile::TempDir, KebabHandler) {
let dir = tempfile::tempdir().unwrap();
let data_dir = dir.path().join("data");
let workspace_root = dir.path().join("notes");
fs::create_dir_all(&data_dir).unwrap();
fs::create_dir_all(&workspace_root).unwrap();
let config = minimal_config(&data_dir, &workspace_root);
fs::write(
workspace_root.join("a.md"),
"# Alpha\n\nThis document mentions kebab and bread.",
)
.unwrap();
let scope = SourceScope { root: workspace_root.clone(), include: vec![], exclude: vec![] };
let _ = kebab_app::ingest_with_config(config.clone(), scope, false).unwrap();
let state = KebabAppState::new(config, None);
let handler = KebabHandler::new(state);
(dir, handler)
}
fn extract_json(result: &rmcp::model::CallToolResult) -> serde_json::Value {
assert!(!result.is_error.unwrap_or(false), "expected isError=false, got {result:?}");
let content = result.content.first().expect("at least one content item");
let text = match &content.raw {
RawContent::Text(t) => &t.text,
other => panic!("expected Text content, got {other:?}"),
};
serde_json::from_str(text).expect("valid JSON")
}
#[tokio::test]
async fn bulk_search_two_queries_returns_envelope() {
let (_dir, handler) = setup();
let input = kebab_mcp::tools::bulk_search::BulkSearchInput {
queries: vec![
json!({"query": "kebab", "mode": "lexical", "k": 5}),
json!({"query": "bread", "mode": "lexical", "k": 5}),
],
};
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
let v = extract_json(&result);
assert_eq!(v["schema_version"], "bulk_search_response.v1");
let results = v["results"].as_array().expect("results array");
assert_eq!(results.len(), 2);
for r in results {
assert_eq!(r["schema_version"], "bulk_search_item.v1");
assert!(r["response"].is_object());
assert!(r["error"].is_null());
}
assert_eq!(v["summary"]["total"], 2);
assert_eq!(v["summary"]["succeeded"], 2);
assert_eq!(v["summary"]["failed"], 0);
}
#[tokio::test]
async fn bulk_search_empty_queries_returns_empty_envelope() {
let (_dir, handler) = setup();
let input = kebab_mcp::tools::bulk_search::BulkSearchInput { queries: vec![] };
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
let v = extract_json(&result);
assert_eq!(v["schema_version"], "bulk_search_response.v1");
assert_eq!(v["results"].as_array().unwrap().len(), 0);
assert_eq!(v["summary"]["total"], 0);
}
#[tokio::test]
async fn bulk_search_invalid_item_field_continues_with_per_item_error() {
let (_dir, handler) = setup();
let input = kebab_mcp::tools::bulk_search::BulkSearchInput {
queries: vec![
json!({"query": "kebab", "mode": "lexical"}),
json!({"query": "bread", "mode": "bogus"}), // invalid mode
],
};
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
let v = extract_json(&result);
let results = v["results"].as_array().unwrap();
assert_eq!(results.len(), 2);
assert!(results[0]["error"].is_null());
assert!(results[1]["error"].is_object());
assert_eq!(results[1]["error"]["code"], "invalid_input");
assert_eq!(v["summary"]["succeeded"], 1);
assert_eq!(v["summary"]["failed"], 1);
}
#[tokio::test]
async fn bulk_search_over_cap_returns_tool_error() {
let (_dir, handler) = setup();
let queries: Vec<serde_json::Value> = (0..101)
.map(|_| json!({"query": "x", "mode": "lexical"}))
.collect();
let input = kebab_mcp::tools::bulk_search::BulkSearchInput { queries };
let result = kebab_mcp::tools::bulk_search::handle(handler.state(), input);
assert!(result.is_error.unwrap_or(false), "expected isError=true");
let content = result.content.first().expect("error content");
let text = match &content.raw {
RawContent::Text(t) => &t.text,
other => panic!("expected Text content, got {other:?}"),
};
assert!(text.contains("max 100"), "expected 'max 100' in error: {text}");
}

View File

@@ -1,13 +1,13 @@
//! Integration: `build_tools_vec` returns 7 tools with correct names and
//! Integration: `build_tools_vec` returns 8 tools with correct names and
//! inputSchema. Uses the extracted `pub fn build_tools_vec()` helper — no
//! transport or RequestContext needed.
use kebab_mcp::build_tools_vec;
#[test]
fn tools_list_returns_seven_tools() {
fn tools_list_returns_eight_tools() {
let tools = build_tools_vec();
assert_eq!(tools.len(), 7, "expected exactly 7 tools, got {}", tools.len());
assert_eq!(tools.len(), 8, "expected exactly 8 tools, got {}", tools.len());
let names: Vec<&str> = tools.iter().map(|t| t.name.as_ref()).collect();
assert!(names.contains(&"schema"), "missing 'schema' tool");
@@ -17,6 +17,7 @@ fn tools_list_returns_seven_tools() {
assert!(names.contains(&"ingest_file"), "missing 'ingest_file' tool");
assert!(names.contains(&"ingest_stdin"), "missing 'ingest_stdin' tool");
assert!(names.contains(&"fetch"), "missing 'fetch' tool");
assert!(names.contains(&"bulk_search"), "missing 'bulk_search' tool");
}
#[test]

View File

@@ -222,6 +222,23 @@ kebab --config /tmp/kebab-smoke/config.toml schema --json | jq .stats
Look for: `media_breakdown` (5 keys), `lang_breakdown`, `index_bytes`, `stale_doc_count`.
### Bulk multi-query (fb-42)
Stdin ndjson으로 N query 한 번에:
```bash
printf '{"query":"rust","mode":"lexical"}\n{"query":"async","mode":"lexical"}\n' \
| kebab --config /tmp/kebab-smoke/config.toml search --bulk --json
```
stdout: per-query ndjson (`bulk_search_item.v1`). stderr: `bulk_summary: total=2 succeeded=2 failed=0`.
MCP tool 동등:
```json
{"name":"kebab__bulk_search","arguments":{"queries":[{"query":"rust"},{"query":"async"}]}}
```
## P6-4 이미지 ingestion 옵션
`config.toml` 에 다음 절을 추가하면 `kebab ingest` 가 `**/*.png` / `**/*.jpg` 등 이미지 자산도 함께 색인합니다 (텍스트만 색인하려면 생략):

View File

@@ -0,0 +1,418 @@
# fb-39 Eval Foundation (P@k Metric) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add chunk-level `precision_at_k_chunk` metric (P@5, P@10) to kebab-eval `AggregateMetrics`, plus golden-set ground-truth documentation strengthening — so a future fb-39b can measure whether a lever (chunk policy / RRF / cross-encoder / embedding upgrade) actually moves the rank-5+ noise needle.
**Architecture:** Single new field on `AggregateMetrics`, computed inside the existing `compute_aggregate_with_config` loop using the same accumulator pattern as `recall_at_k_doc` (sum-of-per-query-ratios / denominator), serialized via the existing `round_recall_map` helper. Denominator is k (fixed), matching the `hit_at_k` convention. Skip queries with empty `expected_chunk_ids`. Golden set schema unchanged — `expected_chunk_ids` is the ground truth (curator fills per-workspace).
**Tech Stack:** Rust 2024, serde, serde_yaml. No new deps.
**Spec:** `docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md`
---
## File map
**Modify:**
- `crates/kebab-eval/src/metrics.rs` — add `precision_at_k_chunk` field on `AggregateMetrics`, init/accumulate/finalize inside `compute_aggregate_with_config`, plus unit tests.
- `fixtures/golden_queries.yaml` — strengthen header comment about `expected_chunk_ids` being P@k ground truth.
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` — add `precision_at_k_chunk` to §11 eval metric table.
- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` — flip status, link design + plan, "lever 적용 deferred to fb-39b" banner.
- `tasks/INDEX.md` — flip fb-39 row to ✅ (eval foundation only).
**Create:** none.
---
## Task 1: Add precision_at_k_chunk field + serde backwards-compat
**Files:**
- Modify: `crates/kebab-eval/src/metrics.rs`
- [ ] **Step 1: Append failing test to `mod tests`**
```rust
#[test]
fn precision_at_k_chunk_field_default_empty_on_old_json() {
// Old eval_runs.metrics_json predates fb-39 — no precision_at_k_chunk field.
// serde(default) should yield empty BTreeMap.
let old = serde_json::json!({
"hit_at_k": {"1": 0.5, "3": 0.5, "5": 0.5, "10": 0.5},
"mrr": 0.5,
"recall_at_k_doc": {"1": 0.0, "3": 0.0, "5": 0.0, "10": 0.0},
"citation_coverage": null,
"groundedness": 0.0,
"empty_result_rate": 0.0,
"refusal_correctness": null,
"total_queries": 1,
"failed_queries": 0
});
let parsed: AggregateMetrics = serde_json::from_value(old).expect("backwards-compat deserialize");
assert!(parsed.precision_at_k_chunk.is_empty());
}
#[test]
fn precision_at_k_chunk_field_serializes_when_populated() {
let mut p = BTreeMap::new();
p.insert(5, 0.6_f32);
p.insert(10, 0.3_f32);
let agg = AggregateMetrics {
hit_at_k: BTreeMap::new(),
mrr: 0.0,
recall_at_k_doc: BTreeMap::new(),
precision_at_k_chunk: p,
citation_coverage: 0.0,
groundedness: 0.0,
empty_result_rate: 0.0,
refusal_correctness: 0.0,
total_queries: 0,
failed_queries: 0,
};
let v = serde_json::to_value(&agg).unwrap();
assert_eq!(v["precision_at_k_chunk"]["5"], 0.6);
assert_eq!(v["precision_at_k_chunk"]["10"], 0.3);
}
```
- [ ] **Step 2: Run tests — expect compile errors (field undefined)**
```bash
cargo test -p kebab-eval --lib precision_at_k_chunk
```
Expected: errors — `precision_at_k_chunk` field missing on `AggregateMetrics`.
- [ ] **Step 3: Add field to `AggregateMetrics`**
In `crates/kebab-eval/src/metrics.rs`, find `pub struct AggregateMetrics { ... }` (~line 57). Add field after `recall_at_k_doc`:
```rust
/// p9-fb-39: chunk-level precision at k. Binary relevance via
/// `expected_chunk_ids` (a hit is "relevant" if its chunk_id is
/// in the golden's `expected_chunk_ids`). Denominator is k (fixed)
/// — `hits.len() < k` still divides by k, treating shortfall as
/// precision loss (mirrors `hit_at_k`). Queries with empty
/// `expected_chunk_ids` are skipped (mirrors `hit_at_k_chunk`).
#[serde(default)]
pub precision_at_k_chunk: BTreeMap<u32, f32>,
```
The other tests in the file (e.g. `hit_at_k_handles_ranks_1_4_miss`, `recall_at_k_doc_partial`) construct `AggregateMetrics` via the public `compute_aggregate_with_config` path, not via struct literal, so the new `#[serde(default)]` field does NOT break them. Only direct struct-literal constructions need updates — search the file to confirm:
```bash
grep -n "AggregateMetrics {" crates/kebab-eval/src/metrics.rs
```
For each direct struct-literal site, add `precision_at_k_chunk: BTreeMap::new(),` to the literal.
- [ ] **Step 4: Run tests — expect both new tests pass**
```bash
cargo test -p kebab-eval --lib precision_at_k_chunk
```
Expected: both pass.
- [ ] **Step 5: Run clippy**
```bash
cargo clippy -p kebab-eval --all-targets -- -D warnings
```
Expected: clean.
- [ ] **Step 6: Commit**
```bash
git add crates/kebab-eval/src/metrics.rs
git commit -m "feat(eval): AggregateMetrics.precision_at_k_chunk field (fb-39)"
```
---
## Task 2: Compute precision_at_k_chunk in aggregate loop
**Files:**
- Modify: `crates/kebab-eval/src/metrics.rs`
- [ ] **Step 1: Append failing tests to `mod tests`**
(Use the existing `make_query_result` / fixture helpers — read the top of the test module for available helpers, e.g. `mk_qr_with_chunks(query_id, chunk_ids_with_ranks)`.)
```rust
#[test]
fn precision_at_k_chunk_exact_match() {
// 1 query, expected = [c1, c2, c3]. Top-5 hits: [c1@1, c2@2, c3@3, x@4, y@5].
// P@5 = 3/5 = 0.6. P@10 = 3/10 = 0.3.
let queries = vec![mk_golden(
"g1",
&[], // expected_doc_ids
&["c1", "c2", "c3"], // expected_chunk_ids
&[], // must_contain
&[], // forbidden
None, // expected_refusal
)];
let rows = vec![mk_query_row(
"g1",
&[("c1", 1), ("c2", 2), ("c3", 3), ("x", 4), ("y", 5)],
)];
let agg = compute_from_inputs(&queries, &rows);
assert_eq!(agg.precision_at_k_chunk[&5], 0.6);
assert_eq!(agg.precision_at_k_chunk[&10], 0.3);
}
#[test]
fn precision_at_k_chunk_partial_topk_divides_by_k() {
// 1 query, expected = [c1, c2]. Top hits: only [c1@1, c2@2] (3 results total).
// P@5 = 2/5 = 0.4 (denominator k, not hits.len).
let queries = vec![mk_golden("g1", &[], &["c1", "c2"], &[], &[], None)];
let rows = vec![mk_query_row("g1", &[("c1", 1), ("c2", 2), ("x", 3)])];
let agg = compute_from_inputs(&queries, &rows);
assert_eq!(agg.precision_at_k_chunk[&5], 0.4);
assert_eq!(agg.precision_at_k_chunk[&10], 0.2);
}
#[test]
fn precision_at_k_chunk_zero_relevant_in_topk() {
// 1 query, expected = [c1]. Top hits all unrelated.
// P@5 = 0/5 = 0.0.
let queries = vec![mk_golden("g1", &[], &["c1"], &[], &[], None)];
let rows = vec![mk_query_row("g1", &[("x", 1), ("y", 2), ("z", 3)])];
let agg = compute_from_inputs(&queries, &rows);
assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
assert_eq!(agg.precision_at_k_chunk[&10], 0.0);
}
#[test]
fn precision_at_k_chunk_empty_expected_skipped() {
// 1 query, expected_chunk_ids = []. Should be skipped — denom 0 → entry value 0.0
// (matches `recall_at_k_doc` behavior in `round_recall_map` for zero-denom).
let queries = vec![mk_golden("g1", &[], &[], &[], &[], None)];
let rows = vec![mk_query_row("g1", &[("c1", 1)])];
let agg = compute_from_inputs(&queries, &rows);
// Mirrors recall_at_k_doc: zero-denom → 0.0 in map (not absent).
assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
assert_eq!(agg.precision_at_k_chunk[&10], 0.0);
}
#[test]
fn precision_at_k_chunk_two_queries_averaged() {
// q1: expected=[c1], hits=[c1@1, x@2, y@3] → P@5 = 1/5 = 0.2
// q2: expected=[c1, c2], hits=[c1@1, c2@2] → P@5 = 2/5 = 0.4
// Avg P@5 = (0.2 + 0.4) / 2 = 0.3.
let queries = vec![
mk_golden("g1", &[], &["c1"], &[], &[], None),
mk_golden("g2", &[], &["c1", "c2"], &[], &[], None),
];
let rows = vec![
mk_query_row("g1", &[("c1", 1), ("x", 2), ("y", 3)]),
mk_query_row("g2", &[("c1", 1), ("c2", 2)]),
];
let agg = compute_from_inputs(&queries, &rows);
assert_eq!(agg.precision_at_k_chunk[&5], 0.3);
}
```
The `mk_golden` / `mk_query_row` / `compute_from_inputs` helpers are existing test helpers in this file. Read the top of `mod tests` (~line 380-510) to confirm the actual helper names and signatures. If your helpers have different shapes (e.g. `mk_qr_with_chunks(id, &[(chunk, rank)])`), adapt the test calls accordingly.
If those helpers don't exist, look for the pattern in the existing `hit_at_k_handles_ranks_1_4_miss` test (~line 513) and mirror it.
- [ ] **Step 2: Run tests — expect failures**
```bash
cargo test -p kebab-eval --lib precision_at_k_chunk
```
Expected: 5 failures — `precision_at_k_chunk` map empty (only `#[serde(default)]` populates it from JSON; the compute path doesn't yet).
- [ ] **Step 3: Implement aggregation in `compute_aggregate_with_config`**
In `crates/kebab-eval/src/metrics.rs`, find `compute_aggregate_with_config` body. After the `recall_at_k_doc` accumulator init (~line 188-189), add:
```rust
let mut precision_at_k_chunk: BTreeMap<u32, (f64, u32)> =
TOP_K_VARIANTS.iter().map(|k| (*k, (0.0_f64, 0_u32))).collect();
```
Inside the loop, after the existing `hit@k + MRR` block (~line 222-247) which already gates on `!gq.expected_chunk_ids.is_empty()`, add a sibling `for k in TOP_K_VARIANTS { ... }` that updates `precision_at_k_chunk`. Place it INSIDE the same `if !gq.expected_chunk_ids.is_empty() { ... }` block so the skip-empty policy is shared:
```rust
// hit@k + MRR (chunk-level, requires non-empty expected_chunk_ids)
if !gq.expected_chunk_ids.is_empty() {
let expected: HashSet<&ChunkId> = gq.expected_chunk_ids.iter().collect();
// ... existing hit@k + MRR computation ...
// p9-fb-39: precision@k_chunk — count of top-k hits whose
// chunk_id is in `expected`, divided by k (fixed denominator).
for k in TOP_K_VARIANTS {
let hits_in_topk_relevant = qr
.hits_top_k
.iter()
.filter(|h| h.rank <= *k && expected.contains(&h.chunk_id))
.count();
let entry = precision_at_k_chunk.get_mut(k).expect("init");
entry.0 += hits_in_topk_relevant as f64 / f64::from(*k);
entry.1 += 1;
}
}
```
Then at the final `Ok(AggregateMetrics { ... })` return (~line 325-345), add:
```rust
precision_at_k_chunk: round_recall_map(&precision_at_k_chunk),
```
(`round_recall_map` is the existing helper at line ~366; it accepts `BTreeMap<u32, (f64, u32)>` and divides sum by denom, returning `BTreeMap<u32, f32>`. Same shape used by `recall_at_k_doc`.)
- [ ] **Step 4: Run tests — expect all 5 pass**
```bash
cargo test -p kebab-eval --lib precision_at_k_chunk
```
Expected: 5 passes.
- [ ] **Step 5: Run full kebab-eval suite**
```bash
cargo test -p kebab-eval
cargo clippy -p kebab-eval --all-targets -- -D warnings
```
Expected: no regressions; clippy clean.
- [ ] **Step 6: Commit**
```bash
git add crates/kebab-eval/src/metrics.rs
git commit -m "feat(eval): compute precision_at_k_chunk in aggregate loop (fb-39)"
```
---
## Task 3: Strengthen golden YAML header documentation
**Files:**
- Modify: `fixtures/golden_queries.yaml`
- [ ] **Step 1: Read existing header**
```bash
head -20 fixtures/golden_queries.yaml
```
- [ ] **Step 2: Replace header comment**
Find the existing header (the comment block above the first `- id: g001` entry). Replace with:
```yaml
# Golden query suite for `kebab eval run` (P5-1 / P5-2 / fb-39).
#
# Top-level: list of queries. Required fields: `id`, `query`. All
# others are optional and default to empty / null.
#
# Curators: `expected_doc_ids` and `expected_chunk_ids` MUST refer to
# real rows in the active workspace's SQLite store at run time. Stale
# references make the runner bail at start. The shipped template
# leaves them empty so the file is loadable on any fresh workspace —
# fill them in after a `kebab ingest` to enable the metrics that
# require ground truth (P5-2 + fb-39):
#
# - `expected_chunk_ids` → hit_at_k, MRR, precision_at_k_chunk (fb-39)
# - `expected_doc_ids` → recall_at_k_doc
#
# `precision_at_k_chunk` (fb-39): of the top-k retrieved hits, what
# fraction's `chunk_id` is in `expected_chunk_ids`. Denominator is k
# (fixed) — `top-k` shortfall is treated as precision loss. Queries
# with empty `expected_chunk_ids` are skipped from this metric.
#
# `must_contain` / `forbidden` drive the rule-based groundedness
# metric (P5-2).
```
- [ ] **Step 3: Verify YAML still parses**
```bash
cargo test -p kebab-eval --test golden_loader 2>/dev/null || cargo test -p kebab-eval load_golden
```
If a loader test exists, it should still pass. If not, run a quick parse check:
```bash
cargo run --bin kebab -- eval --help 2>/dev/null || true
```
(The shipped `golden_queries.yaml` is just a fixture — the workspace test loader will read it during integration tests and fail loudly if YAML is malformed.)
- [ ] **Step 4: Commit**
```bash
git add fixtures/golden_queries.yaml
git commit -m "docs(eval): document expected_chunk_ids as P@k ground truth (fb-39)"
```
---
## Task 4: Update design doc + spec status flip + INDEX
**Files:**
- Modify: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`
- Modify: `tasks/p9/p9-fb-39-retrieval-precision-tuning.md`
- Modify: `tasks/INDEX.md`
- [ ] **Step 1: Update design §11 eval metric list**
```bash
grep -n "^## §11\|^## 11\|hit_at_k\|recall_at_k_doc\|precision" docs/superpowers/specs/2026-04-27-kebab-final-form-design.md | head -10
```
Find the §11 eval section (or wherever metrics are listed). Add a `precision_at_k_chunk` line next to `hit_at_k` / `recall_at_k_doc`:
```markdown
- `precision_at_k_chunk` (fb-39): top-k 안 chunk_id 가 `expected_chunk_ids` 에 포함된 비율. 분모 = k (fixed). `expected_chunk_ids` 빈 query 는 skip.
```
If the design doc doesn't currently list metrics inline, add a short subsection or bullet under §11 introducing it.
- [ ] **Step 2: Flip task spec status**
```bash
sed -i.bak 's/^status: open$/status: completed/' tasks/p9/p9-fb-39-retrieval-precision-tuning.md
rm tasks/p9/p9-fb-39-retrieval-precision-tuning.md.bak
```
Replace the existing `> ⏳ **백로그 only — 미구현.**` skeleton banner with:
```markdown
> ✅ **Eval foundation 부분 구현 완료.** P@k metric (P@5, P@10) 추가. 본 spec 의 lever 적용 (chunk policy / RRF / cross-encoder / embedding 업그레이드) 은 별도 task 로 분리 (fb-39b 이후).
>
> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md)
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md)
```
- [ ] **Step 3: Flip INDEX row**
In `tasks/INDEX.md`, find the fb-39 row. Replace its status with `✅ 머지 (2026-05-10) — eval foundation only, lever 적용 deferred` (mirror the fb-42 row format from the previous PR for consistency).
- [ ] **Step 4: Workspace test + clippy gate**
```bash
cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -10
cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -5
```
`-j 1` REQUIRED.
Expected: all green.
- [ ] **Step 5: Commit**
```bash
git add docs/superpowers/specs/2026-04-27-kebab-final-form-design.md tasks/p9/p9-fb-39-retrieval-precision-tuning.md tasks/INDEX.md
git commit -m "docs(fb-39): design §11 + spec status + INDEX (eval foundation)"
```
---
## Final verification checklist
- [ ] `cargo test --workspace --no-fail-fast -j 1` green
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean
- [ ] `kebab eval run` (against any workspace with non-empty `expected_chunk_ids` in golden) emits `precision_at_k_chunk: {5: ..., 10: ...}` in the run's `metrics_json`
- [ ] design §11 + INDEX + task spec status flipped

View File

@@ -245,6 +245,12 @@ normalize: rrf_score = raw_rrf / (2 / (k_rrf + 1))
`score_kind` 는 wire schema v1 에 **optional** 필드로 추가 (additive, backwards-compat). 누락 시 historical default `rrf` 로 해석.
#### Bulk multi-query (fb-42)
`kebab search --bulk` (stdin ndjson) + `mcp__kebab__bulk_search` tool 신규. agent 가 N sub-query 한 번에 실행 — query decomposition 시 단일 round-trip. Cap 100 per call. Sequential for-loop, App instance 재사용 → 캐시 / embedder cold-start 비용 한 번만.
Per-query failure 는 `bulk_search_item.v1.error` (error.v1) 에 격리, 다른 query 계속 진행. wire shape additive minor (`bulk_search_item.v1` + `bulk_search_response.v1` 신규).
### 2.3 Answer
```json

View File

@@ -0,0 +1,133 @@
---
title: "p9-fb-39 — Eval foundation design (P@k metric)"
phase: P9
component: kebab-eval + docs
task_id: p9-fb-39
status: design
target_version: 0.7.0
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3 chunking, §4 search, §7 RAG, §11 eval]
date: 2026-05-10
---
# p9-fb-39 — Eval foundation (P@k metric)
## Goal
도그푸딩 피드백 — agent / 사용자가 "rank 5+ 부터 노이즈 섞임" 지적 (precision-at-k 저하). lever (chunk policy / RRF / score_gate / cross-encoder / embedding) 선택 전, **measurement infrastructure 먼저** 정비. 본 PR scope:
- `AggregateMetrics``precision_at_k_chunk: BTreeMap<u32, f32>` 추가 (P@5, P@10).
- chunk-level binary relevance 기반 — `expected_chunk_ids` 안 chunk 가 top-k 안 등장한 비율.
- Golden set schema 무변경 — `expected_chunk_ids` 가 ground truth (curator 책임).
- 문서화 강화 — `fixtures/golden_queries.yaml` 헤더 주석.
Lever 적용 (chunk policy / RRF tune / cross-encoder / embedding upgrade) 은 **본 spec 범위 외** — fb-39b 이후 별도 task 로 분리. 측정 도구가 먼저 있어야 lever 효과 비교 가능.
## Behavior contract
### Metric definition
```
P@k_chunk(query) = |top-k hits ∩ expected_chunk_ids| / k
```
**Denominator = k 고정**. `hits.len() < k` 인 경우에도 분모는 k — top-k 부족도 precision 손실로 간주 (`hit_at_k` 와 동일 컨벤션).
`expected_chunk_ids` 빈 query 는 metric 계산에서 skip (`hit_at_k_chunk` 와 동일 정책).
**Aggregation**: 모든 valid query (expected_chunk_ids 비어있지 않음) 의 P@k_chunk 평균. valid query 0 건이면 NaN → JSON null.
### Wire shape
`AggregateMetrics` 신규 field:
```rust
pub struct AggregateMetrics {
pub hit_at_k: BTreeMap<u32, f32>,
pub mrr: f32,
pub recall_at_k_doc: BTreeMap<u32, f32>,
/// p9-fb-39: chunk-level precision at k. Binary relevance via
/// `expected_chunk_ids`. Denominator = k (fixed). Skip queries
/// with empty `expected_chunk_ids`.
#[serde(default)]
pub precision_at_k_chunk: BTreeMap<u32, f32>,
// ... 기존 필드 ...
}
```
`#[serde(default)]` — 기존 eval_runs.metrics_json (옛 binary 가 기록한) 에 field 부재 시 empty BTreeMap 로 deserialize. backwards-compat 보장.
### k values
`compute_aggregate_metrics` 가 5, 10 두 값에 대해 계산. (기존 `hit_at_k` / `recall_at_k_doc` 가 이미 동일 k 사용 — 재사용.)
## Allowed / forbidden dependencies
- `kebab-eval`: 신규 dep 없음. metrics 모듈 확장만.
- 다른 crate 무수정.
`kebab-eval``metrics` / `compare` 모듈은 retrieval / embedding / LLM crate 직접 import 금지 룰 그대로 (P5 inheritance).
## Public surface delta
### kebab-eval::metrics
```rust
pub struct AggregateMetrics {
// ... 기존 ...
#[serde(default)]
pub precision_at_k_chunk: BTreeMap<u32, f32>,
}
```
`compute_aggregate_metrics` body 안 새 누적 BTreeMap + 평균 계산 추가. NaN handling 은 기존 `serialize_f32_nan_as_null` 패턴 재사용 — 단, BTreeMap<u32, f32> 의 NaN 처리 패턴이 hit_at_k 와 동일하게 round_recall_map 같은 helper 통해.
## Test plan
| kind | description |
|------|-------------|
| unit (metrics) | `precision_at_k_chunk` empty expected → query skip → metric BTreeMap 안 entry 부재 또는 NaN |
| unit (metrics) | exact match: 5 hits, top-3 in expected → P@5 = 3/5 = 0.6 |
| unit (metrics) | partial top-k: hits.len() = 3 < k=5, all 3 in expected → P@5 = 3/5 = 0.6 (분모 k 고정) |
| unit (metrics) | top-k 안 expected 0건 → P@5 = 0.0 |
| unit (metrics) | 모든 query expected 비어있음 → P@k entry 부재 또는 NaN → JSON null |
| unit (metrics) | `AggregateMetrics` serde roundtrip — precision_at_k_chunk 신규 field 보존 |
| unit (metrics) | 옛 JSON (precision_at_k_chunk 부재) deserialize → empty BTreeMap default |
| 통합 (eval runner) | runner end-to-end → eval_runs.metrics_json 안 precision_at_k_chunk 채워짐 |
snapshot tests (기존 metrics 출력 fixture 가 있다면 갱신 — `cargo test -p kebab-eval` 수행 후 fixture diff 확인).
## Implementation steps (high-level)
1. `kebab-eval::metrics`: `AggregateMetrics.precision_at_k_chunk` field 추가 + 계산 로직 + 단위 테스트.
2. snapshot tests 갱신 (있다면).
3. `fixtures/golden_queries.yaml` 헤더 주석 강화 — `expected_chunk_ids` 채우기 가이드.
4. README `kebab eval` 섹션 또는 design §11 eval 에 P@k 정의 한 줄 추가.
5. tasks/INDEX.md / spec status flip.
3-5 step PR. 단일 세션 내 완료 가능.
## Risks / notes
- **분모 = k 고정 정책**: `hits.len() < k` 인 query 가 많으면 P@k 가 항상 < 1.0. 사용자 직관과 다를 수 있음 — README/design 에 명시.
- **frozen design vs new metric**: design §11 eval 의 metric 표 갱신 필요. frozen contract 변경 트리거 — `target_version: 0.7.0` bump 명시.
- **lever deferral**: 본 spec contract_sections 는 §3 chunking + §4 search + §7 RAG + §11 eval 인데, 실제 본 PR 은 §11 만 건드림. lever 적용 (chunk policy / RRF / cross-encoder / embedding) 은 fb-39b 이후 별도. spec status banner 에 명시.
- **expected_chunk_ids 비어있는 shipped golden**: 현재 `fixtures/golden_queries.yaml` 의 g001-g005 모두 expected_chunk_ids 비어있음. P@k 계산 시 모두 skip — out-of-the-box 측정값 0건. curator 가 자기 KB 로 채워야 metric 의미 가짐. 의도 — golden set 은 workspace 의존이라 shipped fixtures 는 template, 실제 측정은 user 가 채워서 한다.
- **fb-23 incremental ingest 와 충돌 없음**: 본 PR 은 metric 만 추가. chunker_version / embedding_version 무변경.
## Out of scope
- Lever 적용 (chunk policy retune / RRF k tune / score_gate default ON / cross-encoder PoC / embedding model 업그레이드).
- NDCG / MAP / 기타 ranking metric.
- precision_at_k_doc (doc-level — recall_at_k_doc 가 이미 있음, 본 spec 은 chunk-level 만).
- Golden set 콘텐츠 확장 (g006+ 추가) — curator 책임.
- Synthetic golden generator (`kebab eval golden-from-corpus` 등).
- Per-query relevance score (binary 0/1 만 — graded relevance 는 NDCG 도입 시 검토).
## Documentation updates (implementation PR 동시)
- `fixtures/golden_queries.yaml` — 헤더 주석에 `expected_chunk_ids` ground truth 의미 + P@k 측정 위해 채우기 권장 안내.
- `README.md``kebab eval` 섹션 (있다면) 에 P@k metric 한 줄. 없으면 skip.
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §11 eval — metric 표에 `precision_at_k_chunk` 한 줄 추가.
- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md``status: open → completed`, 단 banner 에 "eval foundation only, lever 적용 deferred to fb-39b" 명시 + design/plan 링크.
- `tasks/INDEX.md` — fb-39 행 ✅ (eval foundation only).

View File

@@ -0,0 +1,20 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://kb.local/wire/v1/bulk_search_item.schema.json",
"title": "BulkSearchItem v1",
"description": "p9-fb-42: per-query result inside a bulk_search response. `response` XOR `error` — exactly one is non-null. `query` is the input echo so consumers can correlate without index tracking.",
"type": "object",
"required": ["schema_version", "query", "response", "error"],
"properties": {
"schema_version": { "const": "bulk_search_item.v1" },
"query": { "type": "object", "description": "Input echo (verbatim JSON object)." },
"response":{
"type": ["object", "null"],
"description": "search_response.v1 payload on success; null when error is non-null."
},
"error": {
"type": ["object", "null"],
"description": "error.v1 payload when this query failed; null on success."
}
}
}

View File

@@ -0,0 +1,24 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://kb.local/wire/v1/bulk_search_response.schema.json",
"title": "BulkSearchResponse v1",
"description": "p9-fb-42: MCP envelope for bulk_search. CLI emits raw `bulk_search_item.v1` ndjson without this envelope (summary on stderr).",
"type": "object",
"required": ["schema_version", "results", "summary"],
"properties": {
"schema_version": { "const": "bulk_search_response.v1" },
"results": {
"type": "array",
"items": { "type": "object", "description": "bulk_search_item.v1" }
},
"summary": {
"type": "object",
"required": ["total", "succeeded", "failed"],
"properties": {
"total": { "type": "integer", "minimum": 0 },
"succeeded": { "type": "integer", "minimum": 0 },
"failed": { "type": "integer", "minimum": 0 }
}
}
}
}

View File

@@ -24,7 +24,7 @@
"required": [
"json_mode", "ingest_progress", "ingest_cancellation",
"rag_multi_turn", "search_cache", "incremental_ingest",
"streaming_ask", "http_daemon", "mcp_server", "single_file_ingest"
"streaming_ask", "http_daemon", "mcp_server", "single_file_ingest", "bulk_search"
]
},
"models": {

View File

@@ -28,11 +28,12 @@ User-specific trigger keywords (team names, system names, internal acronyms) bel
## MCP tools (preferred)
When `kebab` is registered as an MCP server (see `~/.claude/mcp.json` example below), seven tools are exposed as `mcp__kebab__<name>`:
When `kebab` is registered as an MCP server (see `~/.claude/mcp.json` example below), eight tools are exposed as `mcp__kebab__<name>`:
| tool | purpose | mutation |
|------|---------|----------|
| `mcp__kebab__search` | corpus search → `search_response.v1` (`{hits, next_cursor, truncated}`) | no |
| `mcp__kebab__bulk_search` | N queries in one call → `bulk_search_response.v1` (`{results, summary}`) | no |
| `mcp__kebab__ask` | RAG answer → `answer.v1` | no |
| `mcp__kebab__fetch` | verbatim text → `fetch_result.v1` (chunk / doc / span) | no |
| `mcp__kebab__schema` | capability discovery → `schema.v1` | no |
@@ -60,6 +61,17 @@ Input:
- When `truncated: true`, the budget loop modified the page (snippet shortening or k reduction). `next_cursor` is **independent** — non-null whenever more hits may be reachable. Caller may widen `max_tokens` (re-issue same query for fuller snippets / more hits per page) or follow `next_cursor` (advance through more hits) or both. Mismatched cursor (corpus_revision changed) returns `error.v1.code = stale_cursor` — re-issue the search to obtain a fresh one.
- **`trace: true` (p9-fb-37)** — debug aid. Response carries an extra `trace` block: `lexical[]` + `vector[]` (pre-fusion candidates), `rrf_inputs[]` (RRF union before final cut), and `timing` (`lexical_ms`, `vector_ms`, `fusion_ms`, `total_ms`). Trace bypasses the search cache (always cold). Use sparingly — it bloats the wire response and is for diagnosing "why did this hit / not hit", not normal retrieval.
### `mcp__kebab__bulk_search`
N개 query 한 번에 — agent loop 효율 개선. 각 query 는 `mcp__kebab__search` 와 동일 input shape (query 필수, 나머지 optional). Cap 100.
Input:
```json
{"queries": [{"query": "..."}, {"query": "...", "mode": "lexical"}, ...]}
```
Output: `bulk_search_response.v1` envelope — `results: [bulk_search_item.v1]` (각 item = `{query, response | null, error | null}`) + `summary: {total, succeeded, failed}`. Per-query 실패는 item 의 error 에 격리, 다른 query 계속 진행.
### `mcp__kebab__ask` — when you need the answer
Use when the user wants a synthesized answer, not a list of links.

View File

@@ -134,7 +134,7 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.
### 🎯 0.6.0 또는 P+ — reasoning
- [p9-fb-41 multi-hop reasoning](p9/p9-fb-41-multi-hop-reasoning.md) — ⏳ 미구현, brainstorm 필요 (XL, eval 인프라 선행)
- [p9-fb-42 bulk multi-query + re-rank hint](p9/p9-fb-42-bulk-multi-query-rerank.md) — ⏳ 미구현, brainstorm 필요 (Nice)
- [p9-fb-42 bulk multi-query + re-rank hint](p9/p9-fb-42-bulk-multi-query-rerank.md) — ✅ 머지 (2026-05-10) — bulk only, rerank hint deferred
## Post-merge 핫픽스

View File

@@ -3,7 +3,7 @@ phase: P9
component: kebab-cli + kebab-search
task_id: p9-fb-42
title: "Bulk multi-query + re-rank hint — agent loop 효율"
status: open
status: completed
target_version: 0.6.0+
depends_on: []
unblocks: []
@@ -14,7 +14,10 @@ source_feedback: 사용자 도그푸딩 2026-05-06 — agent 가 N 개 query 동
# p9-fb-42 — Bulk multi-query + re-rank hint
> **백로그 only — 미구현 (Nice-to-have).** 본 spec 은 도그푸딩 피드백 skeleton. 구현 착수 전 [superpowers:brainstorming](../../docs/superpowers/) 으로 설계 단계 선행 필요. multi-query input 형식 / 결과 합성 정책 / re-rank hint 의 LLM 호출 비용 brainstorm 후 확정.
> **Bulk multi-query 부분 구현 완료.** 본 spec 의 rerank hint lever 는 별도 task 로 분리 (fb-39 cross-encoder 설계 후).
>
> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-42-bulk-multi-query-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-42-bulk-multi-query-design.md)
> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-42-bulk-multi-query.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-42-bulk-multi-query.md)
## 증상 / 동기