fix(rag): fb-41 PR-7: multi-hop pre-decompose score-gate (S7 hallucination regression pin)

Fixes a **safety regression** found while dogfooding fb-41 multi-hop RAG before the v0.18 cut. For detailed dogfood results, see the 2026-05-25 fb-41 pre-v0.18 entry in `tasks/HOTFIXES.md` and `/build/cache/dogfood-v018/results/SUMMARY.md`.

## Problem (S7)

Query: `What is the chemical formula of caffeine?` (a fact not in the KB).

- Single-pass `kebab ask`: the retrieve top score is below the default `rag.score_gate = 0.30` → `refuse_score_gate` → safe refusal.
- Multi-hop `kebab ask --multi-hop`: **`grounded = true`**, answer body `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` ("The chemical formula of caffeine is C₉H₁₅N₃O", a hallucination; the actual formula is C₈H₁₀N₄O₂), and `[#6]` cites the `g_t = ∂L/∂θ_i` body of the Adam optimizer chunk (triggered by a visual match on short structured tokens).

Cause: the score-gate check in `ask_multi_hop` looked only at the *pool's top_score*. The multi-hop pool is the union of 5 sub-queries: if one sub-query's top score is above the gate, the gate passes even when the other chunks are unrelated to the original query, synthesis runs, and the LLM hallucinates.

## Fix

Add a **pre-decompose probe** at the `ask_multi_hop` entry:

1. Retrieve once with the *original query* (0 LLM calls, ~ms).
2. probe empty → `refuse_no_chunks(None)` (no decompose, hops=None).
3. probe top_score < gate → `refuse_score_gate(None)` (no decompose).
4. probe pass → the existing decompose / decide / synthesize flow, unchanged.

Multi-hop's safety floor now matches single-pass exactly: multi-hop adds cross-doc reasoning only when the *original query is already within KB scope*.

Cost: one retrieve (a few ms), no LLM call. Negligible relative to multi-hop's LLM-dominated latency.

## Tests

3 new regression pins (`crates/kebab-rag/tests/multi_hop.rs`):

- `multi_hop_below_probe_gate_refuses_before_any_llm_call`: the **direct S7 regression pin**. low-score chunk + empty LM script → score_gate refusal, 0 LM calls, hops=None. Panics immediately if the fix is reverted.
- `multi_hop_empty_probe_pool_refuses_before_any_llm_call`: NoChunks refusal on an empty retrieve, 0 LM calls.
- `multi_hop_above_probe_gate_proceeds_to_decompose`: when the probe passes, the full multi-hop flow runs normally (decompose + decide + synth).

Prepended a *probe-pass entry* to the `ScriptedRetriever` of the 7 existing multi-hop tests and raised the `retriever_handle.calls()` expectation by 1. Tests that already had two entries, such as test 2 / test 4, also get the prepend (3 entries).

Narrowed the meaning of `multi_hop_refuse_no_chunks_preserves_hops_trace` / `multi_hop_refuse_score_gate_preserves_hops_trace`: they now verify only the *decompose-driven* refusal (after the probe passes, the sub-query retrieve is empty or below-gate). The *probe-driven* refusal returns hops=None (decompose never runs); the new tests pin that path.

## Verification

- `cargo test -p kebab-rag -j 1`: 10 multi-hop (7 updated + 3 new) + 19 pipeline + 31 unit + 3 prompt_template + 3 streaming all pass. No regressions.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
- Single-crate serial build (16 GB RAM constraint).

## Unchanged

- Wire schema: `Answer.hops` shape unchanged, `refusal_reason` enum unchanged.
- Other dogfood findings (synthesize citation consistency, latency, binary path confusion) are left to v0.18.1 or a separate PR. They are listed in the other-dogfood-findings section of the HOTFIXES entry.

## Next

After PR-7 merges:

1. Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor bump).
2. Update HANDOFF.md / INDEX.md + frozen design §3.8 multi-hop sub-section.
3. `gitea-release v0.18.0 --auto-notes`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
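The gate logic described in the message above can be sketched in isolation. This is a minimal, self-contained illustration and not the crate's code: `Hit`, `Gate`, `gate`, `pool_only_gate`, and `probe_then_pool_gate` are hypothetical names, and the scores are made up to mirror the S7 shape (the original query scores below the gate while one sub-query scores above it).

```rust
/// Hypothetical stand-in for a retrieval hit; only the fused score matters here.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    #[allow(dead_code)] // kept only to make the example data readable
    chunk_id: &'static str,
    fusion_score: f32,
}

/// Outcome of a gate check (hypothetical; not the crate's `RefusalReason`).
#[derive(Debug, PartialEq)]
enum Gate {
    Proceed,
    RefuseNoChunks,
    RefuseScoreGate,
}

/// Gate on a list of hits that is already sorted best-first.
fn gate(hits: &[Hit], score_gate: f32) -> Gate {
    match hits.first() {
        None => Gate::RefuseNoChunks,
        Some(top) if top.fusion_score < score_gate => Gate::RefuseScoreGate,
        Some(_) => Gate::Proceed,
    }
}

/// Pre-fix shape: the gate only inspects the top score of the union pool.
fn pool_only_gate(sub_query_hits: &[Vec<Hit>], score_gate: f32) -> Gate {
    let mut pool: Vec<Hit> = sub_query_hits.iter().flatten().cloned().collect();
    pool.sort_by(|a, b| b.fusion_score.total_cmp(&a.fusion_score));
    gate(&pool, score_gate)
}

/// Post-fix shape: probe the original query first, then fall through to the pool gate.
fn probe_then_pool_gate(
    probe_hits: &[Hit],
    sub_query_hits: &[Vec<Hit>],
    score_gate: f32,
) -> Gate {
    match gate(probe_hits, score_gate) {
        Gate::Proceed => pool_only_gate(sub_query_hits, score_gate),
        refusal => refusal,
    }
}

fn main() {
    let score_gate = 0.30; // default `rag.score_gate` quoted in the commit message
    // Illustrative scores only: the original query matches nothing well...
    let probe = vec![Hit { chunk_id: "adam-optimizer", fusion_score: 0.12 }];
    // ...but one decomposed sub-query happens to score above the gate.
    let sub = vec![
        vec![Hit { chunk_id: "adam-optimizer", fusion_score: 0.41 }],
        vec![Hit { chunk_id: "unrelated", fusion_score: 0.08 }],
    ];
    println!("pool-only:   {:?}", pool_only_gate(&sub, score_gate));
    println!("probe-first: {:?}", probe_then_pool_gate(&probe, &sub, score_gate));
}
```

Under these assumed scores the pool-only check proceeds while the probe-first check refuses with the score-gate outcome, which is the behavior the three new tests pin.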
2026-05-25 12:02:11 +00:00
parent 5bfea3c28b
commit da25ce330b
3 changed files with 295 additions and 27 deletions
--- a/crates/kebab-rag/src/pipeline.rs
+++ b/crates/kebab-rag/src/pipeline.rs
@@ -619,6 +619,60 @@ impl RagPipeline {
    /// eval `compare` can isolate multi-hop runs from single-pass.
    pub fn ask_multi_hop(&self, query: &str, opts: AskOpts) -> Result<Answer> {
        let started = std::time::Instant::now();
+        let k_effective = opts.k.max(self.config.search.default_k);
+
+        // ── 0. Pre-decompose score-gate probe (v0.18 dogfood fix) ──────────
+        //
+        // p9-fb-41 v0.18 pre-cut dogfood (`/build/cache/dogfood-v018/
+        // results/SUMMARY.md`) found that an out-of-corpus query
+        // ("What is the chemical formula of caffeine?") on the multi-
+        // hop path returned `grounded=true` with hallucinated content +
+        // a misattributed citation marker. Cause: multi-hop's 5-sub-
+        // query union pool fills with chunks each loosely matching one
+        // sub-query, then the post-pool score gate (which inspects
+        // `pool[0].fusion_score`) sees a sub-query's hit and passes —
+        // even though the *original* query never matched anything
+        // above the gate.
+        //
+        // Fix: probe the original query exactly the way single-pass
+        // `ask` would, before any decompose / decide LLM call. If
+        // top_score < gate (or hits empty), refuse with the same
+        // envelope single-pass would emit. Multi-hop's safety floor
+        // is now identical to single-pass's — multi-hop only *adds*
+        // the cross-doc reasoning when the original query is already
+        // in scope.
+        //
+        // Cost: one extra retrieve call (~ms, no LLM). Negligible vs.
+        // the LLM-dominated multi-hop latency.
+        let probe_query = SearchQuery {
+            text: query.to_string(),
+            mode: opts.mode,
+            k: k_effective,
+            filters: SearchFilters::default(),
+        };
+        let mut probe_hits = self
+            .retriever
+            .search(&probe_query)
+            .context("kb-rag: multi-hop probe retriever.search")?;
+        let probe_now = OffsetDateTime::now_utc();
+        let probe_threshold = self.config.search.stale_threshold_days;
+        for h in &mut probe_hits {
+            h.stale = compute_stale(h.indexed_at, probe_now, probe_threshold);
+        }
+        if probe_hits.is_empty() {
+            return self.refuse_no_chunks(query, &opts, k_effective, started, None);
+        }
+        if probe_hits[0].retrieval.fusion_score < self.config.rag.score_gate {
+            return self.refuse_score_gate(
+                query,
+                &opts,
+                &probe_hits,
+                k_effective,
+                started,
+                None,
+            );
+        }
+
        let mut hops: Vec<HopRecord> = Vec::new();

        // ── 1. Decompose (iter 0) ──────────────────────────────────────────
@@ -647,7 +701,7 @@ impl RagPipeline {
        // is needed. The LLM emits new sub-queries (continue) or `[]`
        // (stop); the loop also breaks when `max_depth` or
        // `max_pool_chunks` cap fires (`forced_stop = true`).
-        let k_effective = opts.k.max(self.config.search.default_k);
+        // `k_effective` already computed at the probe step above.
        let max_depth = self.config.rag.multi_hop_max_depth;
        let max_pool = self.config.rag.multi_hop_max_pool_chunks as usize;
        let mut pool: Vec<SearchHit> = Vec::new();
--- a/crates/kebab-rag/tests/multi_hop.rs
+++ b/crates/kebab-rag/tests/multi_hop.rs
@@ -58,7 +58,9 @@ fn multi_hop_decide_stop_triggers_synthesize() {
    let did = id32("d1");
    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
-    let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
+    // PR-7: ScriptedRetriever entry 0 = probe retrieve (pre-decompose
+    // score-gate), entry 1 = decompose-driven retrieve for "q1".
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
    let retriever_handle = retriever.clone();
    let retriever_dyn: Arc<dyn Retriever> = retriever;

@@ -84,8 +86,8 @@ fn multi_hop_decide_stop_triggers_synthesize() {
    );
    assert_eq!(
        retriever_handle.calls(),
-        1,
-        "single sub-query → single retrieval"
+        2,
+        "probe retrieve + 1 sub-query retrieve = 2"
    );

    let hops = answer.hops.expect("multi-hop happy path stamps Some(hops)");
@@ -115,9 +117,10 @@ fn multi_hop_decide_continue_adds_more_chunks() {
    let did2 = id32("d2");
    env.seed_chunk(&cid1, &did1, "notes/a.md", "Chunk one.", &["A"]);
    env.seed_chunk(&cid2, &did2, "notes/b.md", "Chunk two.", &["B"]);
-    // iter 1 retrieves chunk 1; iter 2 retrieves chunk 2 (different
-    // chunk_id → pool grows).
+    // PR-7: entry 0 = probe (above gate), entry 1 = iter 1 retrieves
+    // chunk 1, entry 2 = iter 2 retrieves chunk 2.
    let retriever = Arc::new(ScriptedRetriever::new(vec![
+        vec![mk_hit(1, &cid1, &did1, "notes/a.md", 0.85, &["A"])],
        vec![mk_hit(1, &cid1, &did1, "notes/a.md", 0.85, &["A"])],
        vec![mk_hit(1, &cid2, &did2, "notes/b.md", 0.80, &["B"])],
    ]));
@@ -146,8 +149,8 @@ fn multi_hop_decide_continue_adds_more_chunks() {
    );
    assert_eq!(
        retriever_handle.calls(),
-        2,
-        "iter 1 retrieves q1, iter 2 retrieves q2"
+        3,
+        "probe + iter 1 retrieves q1 + iter 2 retrieves q2"
    );
    assert_eq!(
        answer.retrieval.chunks_returned, 2,
@@ -187,7 +190,8 @@ fn multi_hop_max_depth_force_stops() {
    cfg.rag.multi_hop_max_depth = 1;

    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
-    let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
+    // PR-7: entry 0 = probe, entry 1 = decompose-driven retrieve.
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
    let retriever_handle = retriever.clone();
    let retriever_dyn: Arc<dyn Retriever> = retriever;

@@ -210,7 +214,7 @@ fn multi_hop_max_depth_force_stops() {
        2,
        "depth-cap skips decide → only decompose + synthesize"
    );
-    assert_eq!(retriever_handle.calls(), 1);
+    assert_eq!(retriever_handle.calls(), 2, "probe + 1 decompose retrieve");

    let hops = answer.hops.expect("happy path stamps hops");
    assert_eq!(hops.len(), 3, "[Decompose, Decide(forced_stop), Synthesize]");
@@ -238,9 +242,10 @@ fn multi_hop_pool_chunks_dedup_by_chunk_id() {
    let did = id32("d1");
    env.seed_chunk(&cid, &did, "notes/a.md", "Shared chunk text.", &["X"]);
    // Both sub-queries retrieve the same chunk_id — dedup must
-    // keep exactly one pool entry.
+    // keep exactly one pool entry. PR-7: entry 0 = probe.
    let shared_hit = mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["X"]);
    let retriever = Arc::new(ScriptedRetriever::new(vec![
+        vec![shared_hit.clone()],
        vec![shared_hit.clone()],
        vec![shared_hit],
    ]));
@@ -262,8 +267,8 @@ fn multi_hop_pool_chunks_dedup_by_chunk_id() {
    assert!(answer.grounded);
    assert_eq!(
        retriever_handle.calls(),
-        2,
-        "two sub-queries → two retrieval calls"
+        3,
+        "probe + two sub-query retrieves"
    );
    assert_eq!(
        answer.retrieval.chunks_returned, 1,
@@ -295,7 +300,8 @@ fn multi_hop_decide_parse_failure_falls_through_to_synthesize() {
    let did = id32("d1");
    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
-    let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
+    // PR-7: entry 0 = probe, entry 1 = decompose-driven retrieve.
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
    let retriever_dyn: Arc<dyn Retriever> = retriever;

    // Decide LLM emits non-JSON garbage. Spec §9: this is NOT a
@@ -347,15 +353,25 @@ fn multi_hop_decide_parse_failure_falls_through_to_synthesize() {
 //
 // PR-3b-ii widens `refuse_no_chunks` to accept `hops:
 // Option<Vec<HopRecord>>` and wires `ask_multi_hop` to forward the
-// partial trace. This test pins that contract — a refusal Answer
-// still carries the decompose + decide hops the loop accumulated
-// before pool came up empty.
+// partial trace. PR-7 added a pre-decompose probe — so this test
+// now exercises the *decompose-driven* empty-pool path: probe
+// passes (KB has at least one relevant chunk), decompose emits
+// sub-queries, but the sub-query retrieve hits nothing → pool stays
+// empty → refuse_no_chunks with the partial hop trace preserved.
+// (For the *probe-driven* refusal, see
+// `multi_hop_empty_probe_pool_refuses_before_any_llm_call` —
+// that path returns hops=None because decompose never ran.)

 #[test]
 fn multi_hop_refuse_no_chunks_preserves_hops_trace() {
    let env = RagEnv::new();
-    // Retriever returns 0 hits — pool stays empty → refuse_no_chunks.
-    let retriever = Arc::new(ScriptedRetriever::new(vec![vec![]]));
+    let cid = id32("c1");
+    let did = id32("d1");
+    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
+    let probe_hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
+    // PR-7: entry 0 = probe (passes gate), entry 1 = decompose-driven
+    // retrieve (empty — sub-query returned nothing).
+    let retriever = Arc::new(ScriptedRetriever::new(vec![probe_hits, vec![]]));
    let retriever_handle = retriever.clone();
    let retriever_dyn: Arc<dyn Retriever> = retriever;

@@ -373,7 +389,11 @@ fn multi_hop_refuse_no_chunks_preserves_hops_trace() {

    assert!(!answer.grounded);
    assert_eq!(answer.refusal_reason, Some(RefusalReason::NoChunks));
-    assert_eq!(retriever_handle.calls(), 1, "single sub-query → single retrieve");
+    assert_eq!(
+        retriever_handle.calls(),
+        2,
+        "probe (passes) + 1 decompose-driven retrieve (empty)"
+    );
    assert_eq!(lm_handle.calls(), 1, "decompose only — decide skipped (empty pool), no synthesize");

    let hops = answer
@@ -398,14 +418,25 @@ fn multi_hop_refuse_no_chunks_preserves_hops_trace() {

 #[test]
 fn multi_hop_refuse_score_gate_preserves_hops_trace() {
+    // PR-7 narrowed this path: with the pre-decompose probe gate,
+    // the *probe* must pass (high-score chunk) for decompose to
+    // run at all. The *decompose-driven* retrieve can then return
+    // a below-gate hit that triggers the post-pool gate refusal —
+    // which is the surface that preserves hops.
+    //
+    // For the *probe-driven* gate refusal (single-pass-equivalent
+    // safety floor), see
+    // `multi_hop_below_probe_gate_refuses_before_any_llm_call` —
+    // that returns hops=None because decompose never ran.
    let env = RagEnv::new();
-    let (cid, did) = seed_low_score_chunk(&env);
-    // Top score 0.10 is well below the default gate (0.30) — the
-    // score-gate refusal fires after the pool has been built, so
-    // the decide LLM call did run and the hop trace contains both
-    // Decompose and Decide entries.
-    let hits = vec![mk_hit(1, &cid, &did, "notes/low.md", 0.10, &["Low"])];
-    let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
+    let (low_cid, low_did) = seed_low_score_chunk(&env);
+    let high_cid = id32("c_high");
+    let high_did = id32("d_high");
+    env.seed_chunk(&high_cid, &high_did, "notes/high.md", "high score body", &["High"]);
+
+    let probe_hits = vec![mk_hit(1, &high_cid, &high_did, "notes/high.md", 0.85, &["High"])];
+    let decompose_hits = vec![mk_hit(1, &low_cid, &low_did, "notes/low.md", 0.10, &["Low"])];
+    let retriever = Arc::new(ScriptedRetriever::new(vec![probe_hits, decompose_hits]));
    let retriever_dyn: Arc<dyn Retriever> = retriever;

    // decompose + decide (pool not empty so decide fires) — synthesize
@@ -456,3 +487,134 @@ fn seed_low_score_chunk(env: &RagEnv) -> (String, String) {
    env.seed_chunk(&cid, &did, "notes/low.md", "low score text", &["Low"]);
    (cid, did)
 }
+
+// ── p9-fb-41 v0.18 dogfood fix: pre-decompose score-gate probe ────────────
+//
+// Out-of-corpus query that single-pass would have refused via
+// score-gate must also refuse on the multi-hop path — *before* any
+// decompose / decide / synthesize LLM call. Otherwise the decompose
+// can emit sub-queries that pull in chunks loosely matching each
+// sub-query, fill the pool past the gate, and let the synthesize
+// hallucinate over chunks that were never relevant to the *original*
+// query. Dogfood S7 (`/build/cache/dogfood-v018/results/SUMMARY.md`)
+// is the symptom; these tests pin the fix.
+
+#[test]
+fn multi_hop_below_probe_gate_refuses_before_any_llm_call() {
+    let env = RagEnv::new();
+    let cid = id32("c_low");
+    let did = id32("d_low");
+    env.seed_chunk(&cid, &did, "notes/low.md", "low score body", &["Low"]);
+    // Single hit far below the default 0.30 gate.
+    let hits = vec![mk_hit(1, &cid, &did, "notes/low.md", 0.05, &["Low"])];
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
+    let retriever_handle = retriever.clone();
+    let retriever_dyn: Arc<dyn Retriever> = retriever;
+
+    // Empty LM script — ANY LLM call panics on exhaustion. The fix
+    // must short-circuit before decompose.
+    let lm = Arc::new(ScriptedLm::new(vec![]));
+    let lm_handle = lm.clone();
+    let lm_dyn: Arc<dyn LanguageModel> = lm;
+    let pipeline =
+        RagPipeline::new(env.config.clone(), retriever_dyn, lm_dyn, env.sqlite.clone());
+
+    let answer = pipeline.ask("out-of-corpus query", multi_hop_opts()).unwrap();
+
+    assert!(!answer.grounded);
+    assert_eq!(answer.refusal_reason, Some(RefusalReason::ScoreGate));
+    assert_eq!(
+        lm_handle.calls(),
+        0,
+        "below-gate must short-circuit BEFORE any LLM call (no decompose, decide, or synthesize)"
+    );
+    assert_eq!(
+        retriever_handle.calls(),
+        1,
+        "only the probe retrieve happened — no decompose-driven retrieves"
+    );
+    // S7 dogfood: in the pre-fix world the multi-hop path would have
+    // returned grounded=true with hallucinated content. This test
+    // pins the safe envelope.
+    assert!(
+        answer.hops.is_none(),
+        "pre-decompose refusal carries no hop trace (decompose never ran)"
+    );
+}
+
+#[test]
+fn multi_hop_empty_probe_pool_refuses_before_any_llm_call() {
+    let env = RagEnv::new();
+    // Retriever returns 0 hits — probe is empty.
+    let retriever = Arc::new(ScriptedRetriever::new(vec![vec![]]));
+    let retriever_handle = retriever.clone();
+    let retriever_dyn: Arc<dyn Retriever> = retriever;
+
+    let lm = Arc::new(ScriptedLm::new(vec![]));
+    let lm_handle = lm.clone();
+    let lm_dyn: Arc<dyn LanguageModel> = lm;
+    let pipeline =
+        RagPipeline::new(env.config.clone(), retriever_dyn, lm_dyn, env.sqlite.clone());
+
+    let answer = pipeline.ask("q", multi_hop_opts()).unwrap();
+
+    assert!(!answer.grounded);
+    assert_eq!(answer.refusal_reason, Some(RefusalReason::NoChunks));
+    assert_eq!(
+        lm_handle.calls(),
+        0,
+        "empty probe must short-circuit BEFORE any LLM call"
+    );
+    assert_eq!(
+        retriever_handle.calls(),
+        1,
+        "only the probe retrieve happened — no decompose retrieves"
+    );
+    assert!(answer.hops.is_none());
+}
+
+#[test]
+fn multi_hop_above_probe_gate_proceeds_to_decompose() {
+    // Sanity counterpart: a query that PASSES the probe gate still
+    // exercises the full multi-hop flow (decompose → decide → synth).
+    // Guards against the fix accidentally short-circuiting valid
+    // multi-hop calls.
+    let env = RagEnv::new();
+    let cid = id32("c1");
+    let did = id32("d1");
+    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
+    // Probe retrieve returns a high-score hit (above gate),
+    // decompose-driven retrieve returns the same chunk again.
+    let probe_hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
+    let decompose_hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
+    let retriever = Arc::new(ScriptedRetriever::new(vec![probe_hits, decompose_hits]));
+    let retriever_handle = retriever.clone();
+    let retriever_dyn: Arc<dyn Retriever> = retriever;
+
+    let lm = Arc::new(ScriptedLm::new(vec![
+        r#"["q1"]"#,
+        r#"[]"#,
+        "answer [#1]",
+    ]));
+    let lm_handle = lm.clone();
+    let lm_dyn: Arc<dyn LanguageModel> = lm;
+    let pipeline =
+        RagPipeline::new(env.config.clone(), retriever_dyn, lm_dyn, env.sqlite.clone());
+
+    let answer = pipeline.ask("valid query", multi_hop_opts()).unwrap();
+
+    assert!(answer.grounded);
+    assert_eq!(answer.refusal_reason, None);
+    assert_eq!(
+        lm_handle.calls(),
+        3,
+        "decompose + decide + synthesize all ran"
+    );
+    assert_eq!(
+        retriever_handle.calls(),
+        2,
+        "probe retrieve + decompose-driven retrieve"
+    );
+    let hops = answer.hops.expect("happy path stamps hops");
+    assert_eq!(hops.len(), 3);
+}
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,58 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

+## 2026-05-25 - fb-41 pre-v0.18 dogfood: multi-hop score-gate bypass (S7 hallucination regression pin)
+
+A **score_gate bypass + hallucination case** found while dogfooding fb-41 multi-hop RAG before the v0.18.0 cut (`/build/cache/dogfood-v018/`, a 33-asset / 205-chunk corpus: 16 new markdown files in 5 clusters plus the v017 carryover, gemma3:4b, CPU only / 16 GB RAM).
+
+### Symptom (S7)
+
+Query: `"What is the chemical formula of caffeine?"` (a fact not in the KB).
+- **Single-pass** `kebab ask`: the retrieve top score is below the default `rag.score_gate = 0.30` → `refuse_score_gate` → safe refusal.
+- **Multi-hop** `kebab ask --multi-hop`: `grounded = true`, answer body `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` ("The chemical formula of caffeine is C₉H₁₅N₃O", a **hallucination**; the actual formula is C₈H₁₀N₄O₂), and `[#6]` cites the `g_t = ∂L/∂θ_i` body of the *Adam optimizer chunk* (the match was triggered by short structured tokens that visually resemble a chemical formula).
+
+### Root cause
+
+The score-gate check in `ask_multi_hop` (`crates/kebab-rag/src/pipeline.rs`) looked only at the *pool's top_score*. The multi-hop pool is the *union of 5 sub-queries*: if one sub-query's top score is above the gate, the gate passes even when the other chunks in the pool are unrelated to the original query, the synthesize step runs, and the LLM hallucinates over those chunks.
+
+For the same query, single-pass checks the *retrieve top score of the original query* and rejects. Only multi-hop had the hole.
+
+### Fix (PR-7)
+
+Add a **pre-decompose probe** at the `ask_multi_hop` entry:
+1. Retrieve once with the *original query* (0 LLM calls, ~ms).
+2. probe empty → `refuse_no_chunks(None)` (no decompose, and no hop trace either).
+3. probe top_score < gate → `refuse_score_gate(None)` (no decompose).
+4. probe pass → the existing decompose / decide / synthesize flow, unchanged.
+
+Multi-hop's safety floor now matches single-pass exactly: multi-hop adds cross-doc reasoning only when the *original query is already within KB scope*.
+
+### Test updates
+
+- 3 new regression pins (`crates/kebab-rag/tests/multi_hop.rs`):
+  - `multi_hop_below_probe_gate_refuses_before_any_llm_call`: direct pin for the S7 regression. low-score chunk + empty LM script → score_gate refusal, 0 LM calls, hops=None.
+  - `multi_hop_empty_probe_pool_refuses_before_any_llm_call`: NoChunks refusal on an empty retrieve, 0 LM calls.
+  - `multi_hop_above_probe_gate_proceeds_to_decompose`: when the probe passes, the full multi-hop flow (decompose + decide + synth) runs normally.
+- Prepended a probe entry to the `ScriptedRetriever` of the 7 existing multi-hop tests and raised the `retriever_handle.calls()` expectation by 1.
+- Narrowed the meaning of `multi_hop_refuse_no_chunks_preserves_hops_trace` / `multi_hop_refuse_score_gate_preserves_hops_trace`: they now verify only the *decompose-driven* refusal (after the probe passes, the sub-query retrieve is empty or below-gate). The *probe-driven* refusal returns hops=None (decompose never runs); the new tests pin that path.
+
+### Other dogfood findings (separate fixes deferred)
+
+Other items found in the same dogfood run. They are outside the scope of this PR-7 fix and are targeted at v0.18.1 or a follow-up PR:
+
+- **Inconsistent synthesize citation markers** (S1/S2/S3, P1): with the large prompt produced by a 30-chunk pool, gemma3:4b loses the `[#N]` citation rule → the answer body is fine but is surfaced as `grounded=false (LlmSelfJudge)`. **Recommended mitigation**: strengthen the citation rule in `MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT`, or lower the `multi_hop_max_pool_chunks` default from 30 to 15.
+- **20-25× latency cost**: single-pass 30s vs multi-hop 590-685s (the synthesize step dominates the cost). This is larger than the spec's "2-5× LLM cost". Shrinking the pool size is the mitigation.
+- **Release binary path confusion**: `/home/altair823/kebab/target/release/kebab` (v0.17.1, stale) vs `/build/out/cargo-target/release/kebab` (v0.17.2, latest). Caused by the CARGO_TARGET_DIR env. A one-line docs note is recommended.
+
+### User impact
+
+After PR-7 merges (just before the v0.18.0 cut):
+- Multi-hop safety is the same as single-pass. Out-of-KB queries no longer receive hallucinated answers.
+- Normal multi-hop use cases (compound / cross-doc reasoning) are unaffected: once the probe passes, the existing flow runs unchanged.
+- No wire changes (`Answer.hops` exposure unchanged, refusal_reason values unchanged).
+
+Cross-link: `/build/cache/dogfood-v018/results/SUMMARY.md` (full dogfood report), spec `docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md`.
+
 ## 2026-05-25 - v0.17.0 post-dogfood: `[models.llm] request_timeout_secs` knob + recommended model guide

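 Found in follow-up dogfooding after v0.17.0: when a user tried the default `gemma4:e4b` (8B Q4, 9.6 GB) in a CPU-only / 16 GB RAM environment, the first RAG answer always exceeded the 5-minute limit (hard-coded 300 s) and failed with `error: kb-rag: llm.generate_stream`. Memory was also under pressure, with ollama RSS at 10.7 GB and only 2 GB free. For the results of the follow-up dogfooding (32 minutes / 199 mem-monitor samples), see this entry in `tasks/HOTFIXES.md` and the dogfood report in the conversation.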