# Multi-hop golden query suite for `kebab eval run` (fb-41 baseline + post-merge Δ).
#
# Sister to `fixtures/golden_queries.yaml` (single-pass). Same `GoldenQuery`
# shape (kebab_eval::types::GoldenQuery), so the existing runner can ingest
# both fixtures without code changes. The multi-hop pipeline (fb-41 PR-2+)
# will dispatch on AskOpts.multi_hop, NOT on the fixture file.
#
# Curators: `expected_chunk_ids` / `expected_doc_ids` MUST refer to real
# rows in the active workspace's SQLite store at run time. Leave them empty
# until you have ingested the corpus (`kebab ingest` over the kebab repo
# itself). The runner skips metrics that need these ground-truth refs.
#
# fb-41 measurement protocol:
#   1) Pre-PR-2 (current binary, single-pass): baseline run → capture
#      P@5, P@10, must_contain pass rate, citation_coverage.
#   2) Post-PR-3 (multi-hop enabled): same fixture, AskOpts.multi_hop=true
#      → re-run → Δ vs baseline.
#   3) Cross-doc and intra-doc questions are the ones we expect to improve.
#      Single-fact negatives detect multi-hop regression (the added LLM hops
#      should not make simple lookups worse).
#
# Question buckets (5 / 5 / 5):
#   - `mh-c-*`: cross-doc multi-hop (README + HANDOFF + design doc; two
#               or more docs needed)
#   - `mh-i-*`: intra-doc multi-hop (same doc, two sections joined)
#   - `mh-s-*`: single-fact negative (multi-hop should not regress)

# ── Cross-doc multi-hop ──────────────────────────────────────────────

- id: mh-c-001
  # en: All media types kebab supports (markdown / image / pdf / code), and each type's chunker_version?
  query: "kebab 가 지원하는 모든 미디어 타입 (markdown / image / pdf / code) 과 각 타입의 chunker_version 은?"
  lang: ko
  must_contain: ["markdown", "image", "pdf", "chunker_version"]
  difficulty: multi-hop

- id: mh-c-002
  # en: How did the v0.17.0 trigram tokenizer migration (V007) affect Korean lexical search, and which follow-up release fixed the heading_path_json noise problem it caused?
  query: "v0.17.0 의 trigram tokenizer migration (V007) 가 한국어 lexical 검색에 미친 영향과, 그로 인해 발생한 heading_path_json 노이즈 문제는 어떤 후속 release 에서 해결됐나?"
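  lang: ko
  must_contain: ["trigram", "heading_path", "v0.17.2", "column filter"]
  difficulty: multi-hop

- id: mh-c-003
  # en: In kebab's RAG pipeline, the LLM endpoint timeout knob (`request_timeout_secs`) exists separately for the LLM and for OCR. Why are the two split, and what is each default?
  query: "kebab 의 RAG pipeline 에서 LLM endpoint timeout 노브 (`request_timeout_secs`) 가 LLM 과 OCR 두 곳에 별도로 존재하는데, 둘이 분리된 이유와 각각의 default 값은?"
  lang: ko
  must_contain: ["request_timeout_secs", "models.llm", "image.ocr", "300"]
  difficulty: multi-hop

- id: mh-c-004
  # en: How many tools does the kebab MCP server (fb-30) expose, and which kebab-app facade fn does each tool call? How did the read-only policy change after the mutation tools (fb-31) were introduced?
  query: "kebab MCP server (fb-30) 가 노출하는 tool 의 개수와 각 tool 이 호출하는 kebab-app facade fn 은? mutation tool (fb-31) 도입 후 read-only 정책은 어떻게 변경됐나?"
  lang: ko
  must_contain: ["search", "ask", "schema", "doctor", "ingest_file", "ingest_stdin"]
  difficulty: multi-hop

- id: mh-c-005
  query: "Which wire schemas in kebab's v1 contract carry the `indexed_at` / `stale` staleness fields added by fb-32? List every schema id under wire schema v1."
  lang: en
  must_contain: ["schema_version", "indexed_at", "stale", "search_hit", "citation"]
  difficulty: multi-hop

# ── Intra-doc multi-hop ──────────────────────────────────────────────

- id: mh-i-001
  query: "How do the boundary rules in design §3 chunking and the chunk_id recipe in §5 storage cascade together? What pieces of the chunk_id come from each section?"
  lang: en
  must_contain: ["chunker_version", "policy_hash", "chunk_id"]
  difficulty: multi-hop

- id: mh-i-002
  # en: The current status of P10 in HANDOFF.md's phase roadmap table, and the unimplemented fb-* items listed in the same doc's next-task candidates section?
  query: "HANDOFF.md 의 phase 로드맵 표에서 P10 의 현재 status 와, 같은 doc 의 다음 task 후보 절에 등장하는 미구현 fb-* 항목들?"
  lang: ko
  must_contain: ["P10", "fb-41"]
  difficulty: multi-hop

- id: mh-i-003
  # en: How do the file extensions that the README says `kebab ingest` supports relate to the `workspace.exclude` default patterns in the same README's Configuration section?
  query: "README 의 kebab ingest 명령이 지원한다고 명시한 모든 확장자와, 같은 README 의 Configuration 절에 등장하는 `workspace.exclude` default pattern 의 관계?"
  lang: ko
  must_contain: [".md", ".pdf", ".png", "workspace.exclude"]
  difficulty: multi-hop

- id: mh-i-004
  # en: CLAUDE.md's facade rule (UI binaries may import only kebab-app) and the kebab-core dependency constraints in the same doc's Allowed / forbidden deps section: how do the two rules work together consistently?
  query: "CLAUDE.md 의 facade rule (kebab-app 만 UI binary 가 import 가능) 과, 같은 doc 의 Allowed / forbidden deps 절에서 kebab-core 의 의존성 제약 — 두 규약이 어떻게 일관성 있게 작동하는가?"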
  lang: ko
  must_contain: ["kebab-app", "kebab-core", "facade"]
  difficulty: multi-hop

- id: mh-i-005
  # en: What is the difference between p10's tier 1 AST chunker and its tier 3 paragraph fallback, and in what order are they dispatched when both apply to the same file?
  query: "p10 의 tier 1 AST chunker 와 tier 3 paragraph fallback 의 차이, 그리고 둘이 같은 file 에 적용되는 dispatch 순서?"
  lang: ko
  must_contain: ["AST", "paragraph", "fallback"]
  difficulty: multi-hop

# ── Single-fact negative (regression detection) ──────────────────────

- id: mh-s-001
  # en: What is kebab's default embedding model?
  query: "kebab 의 default embedding model 은?"
  lang: ko
  must_contain: ["multilingual-e5-large"]
  difficulty: easy

- id: mh-s-002
  query: "What license does kebab ship under?"
  lang: en
  must_contain: ["MIT", "Apache"]
  difficulty: easy

- id: mh-s-003
  # en: What is the default location of the kebab.sqlite file?
  query: "kebab.sqlite 파일의 default 위치는?"
  lang: ko
  must_contain: ["~/.local/share/kebab", ".local/share"]
  difficulty: easy

- id: mh-s-004
  # en: In the kebab tui mode machine, which key toggles NORMAL → INSERT? ("i 입력모드" = "i input mode")
  query: "kebab tui 의 mode machine 에서 NORMAL → INSERT 토글 키는?"
  lang: ko
  must_contain: ["INSERT", "i 입력모드"]
  difficulty: easy

- id: mh-s-005
  # en: What is the default value of kebab's RRF k parameter?
  query: "kebab 의 RRF k 파라미터 default 값은?"
  lang: ko
  must_contain: ["60"]
  difficulty: easy
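
# ── Curator template (commented out, not part of the suite) ──────────
#
# A sketch of an entry with ground truth filled in, following the curator
# note in the header. Assumptions: both `expected_*` fields are YAML lists
# of ID strings, and `mh-x-000` and every ID value below are hypothetical
# placeholders, not real rows. Replace them with IDs from your own store
# after `kebab ingest`.
#
# - id: mh-x-000                         # hypothetical id
#   query: "..."
#   lang: en
#   expected_doc_ids: ["<doc-id>"]       # placeholder: a real doc row id
#   expected_chunk_ids: ["<chunk-id>"]   # placeholder: a real chunk row id
#   must_contain: ["..."]
#   difficulty: multi-hop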