fix(rag): fb-41 PR-7: multi-hop pre-decompose score-gate (S7 hallucination regression pin)

Fixes a **safety regression** found while dogfooding fb-41 multi-hop RAG
before the v0.18 cut. For the full dogfooding results, see the 2026-05-25
fb-41 pre-v0.18 entry in `tasks/HOTFIXES.md` and
`/build/cache/dogfood-v018/results/SUMMARY.md`.

## Problem (S7)

Query: `What is the chemical formula of caffeine?` (a fact that is not in the KB).

- Single-pass `kebab ask`: the top retrieve score is below the default
  `rag.score_gate = 0.30` → `refuse_score_gate` → safe refusal.
- Multi-hop `kebab ask --multi-hop`: **`grounded = true`**, with the body
  `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` ("The chemical formula of
  caffeine is C₉H₁₅N₃O [#6]"). This is a hallucination (the real formula
  is C₈H₁₀N₄O₂), and `[#6]` cites the `g_t = ∂L/∂θ_i` body of an Adam
  optimizer chunk (triggered by a match on a visually similar short
  structured token).

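Cause: the score-gate check in `ask_multi_hop` looked only at the
*pool's top_score*. The multi-hop pool is the union of 5 sub-queries'
results, so if any one sub-query's top score clears the gate, the pool
passes and goes to synthesis even when the other chunks are unrelated to
the original query, and the LLM hallucinates.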
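To make the failure mode concrete, here is a minimal sketch of the pre-fix gate. It assumes hypothetical `Hit`, `union_pool`, and `pool_gate_passes` stand-ins (not kebab-rag's real `SearchHit` or pipeline code) and made-up scores: the original query's own best hit is below the gate, but one sub-query pulls in an unrelated chunk above it, so a gate that inspects only `pool[0]` passes.

```rust
// Hypothetical stand-in for a retrieved chunk; not kebab-rag's real `SearchHit`.
#[derive(Clone, Debug)]
struct Hit {
    chunk_id: &'static str,
    fusion_score: f32,
}

// Default `rag.score_gate` from the text above.
const SCORE_GATE: f32 = 0.30;

/// Union of per-sub-query hit lists: dedup by `chunk_id` (first occurrence
/// wins), then sort best score first.
fn union_pool(per_sub_query: &[Vec<Hit>]) -> Vec<Hit> {
    let mut pool: Vec<Hit> = Vec::new();
    for hits in per_sub_query {
        for h in hits {
            if !pool.iter().any(|p| p.chunk_id == h.chunk_id) {
                pool.push(h.clone());
            }
        }
    }
    pool.sort_by(|a, b| b.fusion_score.total_cmp(&a.fusion_score));
    pool
}

/// Pre-fix behavior: the gate inspects only the pool's top hit.
fn pool_gate_passes(pool: &[Hit]) -> bool {
    pool.first().map_or(false, |h| h.fusion_score >= SCORE_GATE)
}

fn main() {
    // Made-up score: the original query's own best hit is below the gate...
    let original_top: f32 = 0.12;
    // ...but one sub-query pulls in an unrelated chunk that clears it.
    let per_sub_query = vec![
        vec![Hit { chunk_id: "adam-optimizer", fusion_score: 0.41 }],
        vec![Hit { chunk_id: "unrelated-note", fusion_score: 0.08 }],
    ];
    let pool = union_pool(&per_sub_query);
    println!("single-pass gate passes: {}", original_top >= SCORE_GATE); // false
    println!("pre-fix pool gate passes: {}", pool_gate_passes(&pool)); // true
}
```

The mismatch is that the two gates look at different score distributions: single-pass scores the original query, while the pre-fix multi-hop gate scores whatever the best sub-query happened to retrieve.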

## Fix

Add a **pre-decompose probe** at the entry of `ask_multi_hop`:

1. Retrieve once with the *original query* (0 LLM calls, ~ms).
2. Probe empty → `refuse_no_chunks(None)` (no decompose, hops=None).
3. Probe top_score < gate → `refuse_score_gate(None)` (no decompose).
4. Probe passes → the existing decompose / decide / synthesize flow runs
   unchanged.
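The four probe steps above reduce to a small decision function. This is a sketch under stated assumptions: `ProbeOutcome`, `probe_gate`, and the local `RefusalReason` are hypothetical stand-ins, the retriever is assumed to return hits sorted best-first, and a top score exactly equal to the gate passes (mirroring the `fusion_score < score_gate` comparison in `ask_multi_hop`).

```rust
// Hypothetical stand-ins; kebab-rag's real `RefusalReason` / `Answer` differ.
#[derive(Debug, PartialEq)]
enum RefusalReason {
    NoChunks,
    ScoreGate,
}

#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    /// Refuse before any LLM call; the Answer would carry hops = None.
    Refuse(RefusalReason),
    /// Original query is in scope: continue to decompose / decide / synthesize.
    Proceed,
}

/// `probe_scores` are the original query's fusion scores, best first
/// (assumes the retriever returns hits sorted descending).
fn probe_gate(probe_scores: &[f32], score_gate: f32) -> ProbeOutcome {
    match probe_scores.first() {
        // Step 2: empty probe.
        None => ProbeOutcome::Refuse(RefusalReason::NoChunks),
        // Step 3: top score below the gate.
        Some(&top) if top < score_gate => ProbeOutcome::Refuse(RefusalReason::ScoreGate),
        // Step 4: probe passes.
        Some(_) => ProbeOutcome::Proceed,
    }
}

fn main() {
    let gate = 0.30;
    println!("{:?}", probe_gate(&[], gate)); // Refuse(NoChunks)
    println!("{:?}", probe_gate(&[0.05], gate)); // Refuse(ScoreGate)
    println!("{:?}", probe_gate(&[0.85, 0.10], gate)); // Proceed
}
```

Keeping this check identical to the single-pass one is the point of the design: any query that single-pass refuses, multi-hop refuses too, before a single LLM call.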

Multi-hop's safety floor now matches single-pass exactly: multi-hop adds
cross-doc reasoning only when the *original query is already within KB
scope*.

Cost: one extra retrieve (a few ms), no LLM call. Negligible next to
multi-hop's LLM-dominated latency.

## Tests

3 new regression pins (`crates/kebab-rag/tests/multi_hop.rs`):

- `multi_hop_below_probe_gate_refuses_before_any_llm_call`: the **direct
  S7 regression pin**. Low-score chunk + empty LM script → score_gate
  refusal, 0 LM calls, hops=None. If the fix is reverted, the test panics
  immediately (the first LLM call exhausts the empty script).
- `multi_hop_empty_probe_pool_refuses_before_any_llm_call`: an empty
  retrieve yields a NoChunks refusal with 0 LM calls.
- `multi_hop_above_probe_gate_proceeds_to_decompose`: when the probe
  passes, the full multi-hop flow runs normally (decompose + decide +
  synth).

In the existing 7 multi-hop tests, a *probe-pass entry* is prepended to
the `ScriptedRetriever` script and the `retriever_handle.calls()`
expectation goes up by 1. Tests that already had two entries, such as
test 2 and test 4, get the prepend too (3 entries).

`multi_hop_refuse_no_chunks_preserves_hops_trace` and
`multi_hop_refuse_score_gate_preserves_hops_trace` are narrowed: they now
verify only *decompose-driven* refusal (the probe passes, then a
sub-query retrieve is empty or below the gate). *Probe-driven* refusal
returns hops=None (decompose never runs), and the new tests pin that
path.
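The new tests rely on scripted test doubles: the test comments say an empty `ScriptedLm` script panics on the first call, and every test asserts `calls()` counters. A minimal sketch of that pattern follows; `Scripted` is a hypothetical generic stand-in, not the real `ScriptedRetriever` / `ScriptedLm`, and it uses `RefCell` / `Cell` for a single-threaded example where the real doubles are shared through `Arc`.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;

/// Hypothetical scripted test double: hands out pre-recorded responses in
/// order and counts how many times it was called.
struct Scripted<T> {
    script: RefCell<VecDeque<T>>,
    calls: Cell<usize>,
}

impl<T> Scripted<T> {
    fn new(entries: Vec<T>) -> Self {
        Self { script: RefCell::new(entries.into()), calls: Cell::new(0) }
    }

    /// Returns the next scripted entry. Panics once the script is exhausted,
    /// so an unexpected extra call (e.g. an LLM call the fix should skip)
    /// fails the test loudly.
    fn next(&self) -> T {
        self.calls.set(self.calls.get() + 1);
        self.script
            .borrow_mut()
            .pop_front()
            .expect("script exhausted: unexpected extra call")
    }

    fn calls(&self) -> usize {
        self.calls.get()
    }
}

fn main() {
    // PR-7 pattern: entry 0 = probe, entry 1 = decompose-driven retrieve.
    let retriever: Scripted<Vec<f32>> = Scripted::new(vec![vec![0.85], vec![0.85]]);
    let _probe = retriever.next();
    let _sub_query = retriever.next();
    println!("retriever calls = {}", retriever.calls()); // 2

    // Empty LM script: the first LLM call would panic.
    let lm: Scripted<&str> = Scripted::new(vec![]);
    println!("lm calls = {}", lm.calls()); // 0
}
```

This is why prepending one probe entry is enough to keep the older tests green: the probe consumes entry 0, and the decompose-driven retrieves see the same entries they did before.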

## Verification

- `cargo test -p kebab-rag -j 1`: 10 multi-hop (7 updated + 3 new) + 19
  pipeline + 31 unit + 3 prompt_template + 3 streaming tests all pass. No
  regressions.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` is clean.
- Builds run serially, one crate at a time (16 GB RAM constraint).

## Unchanged

- Wire schema: the `Answer.hops` shape and the `refusal_reason` enum are
  the same.
- Other dogfooding findings (synthesize citation consistency, latency,
  binary path confusion) belong to v0.18.1 or a separate PR. They are
  listed in the "Other dogfooding findings" section of HOTFIXES.

## Next

After PR-7 merges:
1. Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor bump).
2. Update HANDOFF.md / INDEX.md and add a multi-hop sub-section to frozen design §3.8.
3. `gitea-release v0.18.0 --auto-notes`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:02:11 +00:00
parent 5bfea3c28b
commit da25ce330b
3 changed files with 295 additions and 27 deletions


@@ -619,6 +619,60 @@ impl RagPipeline {
/// eval `compare` can isolate multi-hop runs from single-pass.
pub fn ask_multi_hop(&self, query: &str, opts: AskOpts) -> Result<Answer> {
let started = std::time::Instant::now();
let k_effective = opts.k.max(self.config.search.default_k);
// ── 0. Pre-decompose score-gate probe (v0.18 dogfood fix) ──────────
//
// p9-fb-41 v0.18 pre-cut dogfood (`/build/cache/dogfood-v018/
// results/SUMMARY.md`) found that an out-of-corpus query
// ("What is the chemical formula of caffeine?") on the multi-
// hop path returned `grounded=true` with hallucinated content +
// a misattributed citation marker. Cause: multi-hop's 5-sub-
// query union pool fills with chunks each loosely matching one
// sub-query, then the post-pool score gate (which inspects
// `pool[0].fusion_score`) sees a sub-query's hit and passes —
// even though the *original* query never matched anything
// above the gate.
//
// Fix: probe the original query exactly the way single-pass
// `ask` would, before any decompose / decide LLM call. If
// top_score < gate (or hits empty), refuse with the same
// envelope single-pass would emit. Multi-hop's safety floor
// is now identical to single-pass's — multi-hop only *adds*
// the cross-doc reasoning when the original query is already
// in scope.
//
// Cost: one extra retrieve call (~ms, no LLM). Negligible vs.
// the LLM-dominated multi-hop latency.
let probe_query = SearchQuery {
text: query.to_string(),
mode: opts.mode,
k: k_effective,
filters: SearchFilters::default(),
};
let mut probe_hits = self
.retriever
.search(&probe_query)
.context("kb-rag: multi-hop probe retriever.search")?;
let probe_now = OffsetDateTime::now_utc();
let probe_threshold = self.config.search.stale_threshold_days;
for h in &mut probe_hits {
h.stale = compute_stale(h.indexed_at, probe_now, probe_threshold);
}
if probe_hits.is_empty() {
return self.refuse_no_chunks(query, &opts, k_effective, started, None);
}
if probe_hits[0].retrieval.fusion_score < self.config.rag.score_gate {
return self.refuse_score_gate(
query,
&opts,
&probe_hits,
k_effective,
started,
None,
);
}
let mut hops: Vec<HopRecord> = Vec::new();
// ── 1. Decompose (iter 0) ──────────────────────────────────────────
@@ -647,7 +701,7 @@ impl RagPipeline {
// is needed. The LLM emits new sub-queries (continue) or `[]`
// (stop); the loop also breaks when `max_depth` or
// `max_pool_chunks` cap fires (`forced_stop = true`).
let k_effective = opts.k.max(self.config.search.default_k);
// `k_effective` already computed at the probe step above.
let max_depth = self.config.rag.multi_hop_max_depth;
let max_pool = self.config.rag.multi_hop_max_pool_chunks as usize;
let mut pool: Vec<SearchHit> = Vec::new();


@@ -58,7 +58,9 @@ fn multi_hop_decide_stop_triggers_synthesize() {
let did = id32("d1");
env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
// PR-7: ScriptedRetriever entry 0 = probe retrieve (pre-decompose
// score-gate), entry 1 = decompose-driven retrieve for "q1".
let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
let retriever_handle = retriever.clone();
let retriever_dyn: Arc<dyn Retriever> = retriever;
@@ -84,8 +86,8 @@ fn multi_hop_decide_stop_triggers_synthesize() {
);
assert_eq!(
retriever_handle.calls(),
1,
"single sub-query → single retrieval"
2,
"probe retrieve + 1 sub-query retrieve = 2"
);
let hops = answer.hops.expect("multi-hop happy path stamps Some(hops)");
@@ -115,9 +117,10 @@ fn multi_hop_decide_continue_adds_more_chunks() {
let did2 = id32("d2");
env.seed_chunk(&cid1, &did1, "notes/a.md", "Chunk one.", &["A"]);
env.seed_chunk(&cid2, &did2, "notes/b.md", "Chunk two.", &["B"]);
// iter 1 retrieves chunk 1; iter 2 retrieves chunk 2 (different
// chunk_id → pool grows).
// PR-7: entry 0 = probe (above gate), entry 1 = iter 1 retrieves
// chunk 1, entry 2 = iter 2 retrieves chunk 2.
let retriever = Arc::new(ScriptedRetriever::new(vec![
vec![mk_hit(1, &cid1, &did1, "notes/a.md", 0.85, &["A"])],
vec![mk_hit(1, &cid1, &did1, "notes/a.md", 0.85, &["A"])],
vec![mk_hit(1, &cid2, &did2, "notes/b.md", 0.80, &["B"])],
]));
@@ -146,8 +149,8 @@ fn multi_hop_decide_continue_adds_more_chunks() {
);
assert_eq!(
retriever_handle.calls(),
2,
"iter 1 retrieves q1, iter 2 retrieves q2"
3,
"probe + iter 1 retrieves q1 + iter 2 retrieves q2"
);
assert_eq!(
answer.retrieval.chunks_returned, 2,
@@ -187,7 +190,8 @@ fn multi_hop_max_depth_force_stops() {
cfg.rag.multi_hop_max_depth = 1;
let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
// PR-7: entry 0 = probe, entry 1 = decompose-driven retrieve.
let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
let retriever_handle = retriever.clone();
let retriever_dyn: Arc<dyn Retriever> = retriever;
@@ -210,7 +214,7 @@ fn multi_hop_max_depth_force_stops() {
2,
"depth-cap skips decide → only decompose + synthesize"
);
assert_eq!(retriever_handle.calls(), 1);
assert_eq!(retriever_handle.calls(), 2, "probe + 1 decompose retrieve");
let hops = answer.hops.expect("happy path stamps hops");
assert_eq!(hops.len(), 3, "[Decompose, Decide(forced_stop), Synthesize]");
@@ -238,9 +242,10 @@ fn multi_hop_pool_chunks_dedup_by_chunk_id() {
let did = id32("d1");
env.seed_chunk(&cid, &did, "notes/a.md", "Shared chunk text.", &["X"]);
// Both sub-queries retrieve the same chunk_id — dedup must
// keep exactly one pool entry.
// keep exactly one pool entry. PR-7: entry 0 = probe.
let shared_hit = mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["X"]);
let retriever = Arc::new(ScriptedRetriever::new(vec![
vec![shared_hit.clone()],
vec![shared_hit.clone()],
vec![shared_hit],
]));
@@ -262,8 +267,8 @@ fn multi_hop_pool_chunks_dedup_by_chunk_id() {
assert!(answer.grounded);
assert_eq!(
retriever_handle.calls(),
2,
"two sub-queries → two retrieval calls"
3,
"probe + two sub-query retrieves"
);
assert_eq!(
answer.retrieval.chunks_returned, 1,
@@ -295,7 +300,8 @@ fn multi_hop_decide_parse_failure_falls_through_to_synthesize() {
let did = id32("d1");
env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
// PR-7: entry 0 = probe, entry 1 = decompose-driven retrieve.
let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
let retriever_dyn: Arc<dyn Retriever> = retriever;
// Decide LLM emits non-JSON garbage. Spec §9: this is NOT a
@@ -347,15 +353,25 @@ fn multi_hop_decide_parse_failure_falls_through_to_synthesize() {
//
// PR-3b-ii widens `refuse_no_chunks` to accept `hops:
// Option<Vec<HopRecord>>` and wires `ask_multi_hop` to forward the
// partial trace. This test pins that contract — a refusal Answer
// still carries the decompose + decide hops the loop accumulated
// before pool came up empty.
// partial trace. PR-7 added a pre-decompose probe — so this test
// now exercises the *decompose-driven* empty-pool path: probe
// passes (KB has at least one relevant chunk), decompose emits
// sub-queries, but the sub-query retrieve hits nothing → pool stays
// empty → refuse_no_chunks with the partial hop trace preserved.
// (For the *probe-driven* refusal, see
// `multi_hop_empty_probe_pool_refuses_before_any_llm_call` —
// that path returns hops=None because decompose never ran.)
#[test]
fn multi_hop_refuse_no_chunks_preserves_hops_trace() {
let env = RagEnv::new();
// Retriever returns 0 hits — pool stays empty → refuse_no_chunks.
let retriever = Arc::new(ScriptedRetriever::new(vec![vec![]]));
let cid = id32("c1");
let did = id32("d1");
env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
let probe_hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
// PR-7: entry 0 = probe (passes gate), entry 1 = decompose-driven
// retrieve (empty — sub-query returned nothing).
let retriever = Arc::new(ScriptedRetriever::new(vec![probe_hits, vec![]]));
let retriever_handle = retriever.clone();
let retriever_dyn: Arc<dyn Retriever> = retriever;
@@ -373,7 +389,11 @@ fn multi_hop_refuse_no_chunks_preserves_hops_trace() {
assert!(!answer.grounded);
assert_eq!(answer.refusal_reason, Some(RefusalReason::NoChunks));
assert_eq!(retriever_handle.calls(), 1, "single sub-query → single retrieve");
assert_eq!(
retriever_handle.calls(),
2,
"probe (passes) + 1 decompose-driven retrieve (empty)"
);
assert_eq!(lm_handle.calls(), 1, "decompose only — decide skipped (empty pool), no synthesize");
let hops = answer
@@ -398,14 +418,25 @@ fn multi_hop_refuse_no_chunks_preserves_hops_trace() {
#[test]
fn multi_hop_refuse_score_gate_preserves_hops_trace() {
// PR-7 narrowed this path: with the pre-decompose probe gate,
// the *probe* must pass (high-score chunk) for decompose to
// run at all. The *decompose-driven* retrieve can then return
// a below-gate hit that triggers the post-pool gate refusal —
// which is the surface that preserves hops.
//
// For the *probe-driven* gate refusal (single-pass-equivalent
// safety floor), see
// `multi_hop_below_probe_gate_refuses_before_any_llm_call` —
// that returns hops=None because decompose never ran.
let env = RagEnv::new();
let (cid, did) = seed_low_score_chunk(&env);
// Top score 0.10 is well below the default gate (0.30) — the
// score-gate refusal fires after the pool has been built, so
// the decide LLM call did run and the hop trace contains both
// Decompose and Decide entries.
let hits = vec![mk_hit(1, &cid, &did, "notes/low.md", 0.10, &["Low"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
let (low_cid, low_did) = seed_low_score_chunk(&env);
let high_cid = id32("c_high");
let high_did = id32("d_high");
env.seed_chunk(&high_cid, &high_did, "notes/high.md", "high score body", &["High"]);
let probe_hits = vec![mk_hit(1, &high_cid, &high_did, "notes/high.md", 0.85, &["High"])];
let decompose_hits = vec![mk_hit(1, &low_cid, &low_did, "notes/low.md", 0.10, &["Low"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![probe_hits, decompose_hits]));
let retriever_dyn: Arc<dyn Retriever> = retriever;
// decompose + decide (pool not empty so decide fires) — synthesize
@@ -456,3 +487,134 @@ fn seed_low_score_chunk(env: &RagEnv) -> (String, String) {
env.seed_chunk(&cid, &did, "notes/low.md", "low score text", &["Low"]);
(cid, did)
}
// ── p9-fb-41 v0.18 dogfood fix: pre-decompose score-gate probe ────────────
//
// Out-of-corpus query that single-pass would have refused via
// score-gate must also refuse on the multi-hop path — *before* any
// decompose / decide / synthesize LLM call. Otherwise the decompose
// can emit sub-queries that pull in chunks loosely matching each
// sub-query, fill the pool past the gate, and let the synthesize
// hallucinate over chunks that were never relevant to the *original*
// query. Dogfood S7 (`/build/cache/dogfood-v018/results/SUMMARY.md`)
// is the symptom; these tests pin the fix.
#[test]
fn multi_hop_below_probe_gate_refuses_before_any_llm_call() {
let env = RagEnv::new();
let cid = id32("c_low");
let did = id32("d_low");
env.seed_chunk(&cid, &did, "notes/low.md", "low score body", &["Low"]);
// Single hit far below the default 0.30 gate.
let hits = vec![mk_hit(1, &cid, &did, "notes/low.md", 0.05, &["Low"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![hits]));
let retriever_handle = retriever.clone();
let retriever_dyn: Arc<dyn Retriever> = retriever;
// Empty LM script — ANY LLM call panics on exhaustion. The fix
// must short-circuit before decompose.
let lm = Arc::new(ScriptedLm::new(vec![]));
let lm_handle = lm.clone();
let lm_dyn: Arc<dyn LanguageModel> = lm;
let pipeline =
RagPipeline::new(env.config.clone(), retriever_dyn, lm_dyn, env.sqlite.clone());
let answer = pipeline.ask("out-of-corpus query", multi_hop_opts()).unwrap();
assert!(!answer.grounded);
assert_eq!(answer.refusal_reason, Some(RefusalReason::ScoreGate));
assert_eq!(
lm_handle.calls(),
0,
"below-gate must short-circuit BEFORE any LLM call (no decompose, decide, or synthesize)"
);
assert_eq!(
retriever_handle.calls(),
1,
"only the probe retrieve happened — no decompose-driven retrieves"
);
// S7 dogfood: in the pre-fix world the multi-hop path would have
// returned grounded=true with hallucinated content. This test
// pins the safe envelope.
assert!(
answer.hops.is_none(),
"pre-decompose refusal carries no hop trace (decompose never ran)"
);
}
#[test]
fn multi_hop_empty_probe_pool_refuses_before_any_llm_call() {
let env = RagEnv::new();
// Retriever returns 0 hits — probe is empty.
let retriever = Arc::new(ScriptedRetriever::new(vec![vec![]]));
let retriever_handle = retriever.clone();
let retriever_dyn: Arc<dyn Retriever> = retriever;
let lm = Arc::new(ScriptedLm::new(vec![]));
let lm_handle = lm.clone();
let lm_dyn: Arc<dyn LanguageModel> = lm;
let pipeline =
RagPipeline::new(env.config.clone(), retriever_dyn, lm_dyn, env.sqlite.clone());
let answer = pipeline.ask("q", multi_hop_opts()).unwrap();
assert!(!answer.grounded);
assert_eq!(answer.refusal_reason, Some(RefusalReason::NoChunks));
assert_eq!(
lm_handle.calls(),
0,
"empty probe must short-circuit BEFORE any LLM call"
);
assert_eq!(
retriever_handle.calls(),
1,
"only the probe retrieve happened — no decompose retrieves"
);
assert!(answer.hops.is_none());
}
#[test]
fn multi_hop_above_probe_gate_proceeds_to_decompose() {
// Sanity counterpart: a query that PASSES the probe gate still
// exercises the full multi-hop flow (decompose → decide → synth).
// Guards against the fix accidentally short-circuiting valid
// multi-hop calls.
let env = RagEnv::new();
let cid = id32("c1");
let did = id32("d1");
env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
// Probe retrieve returns a high-score hit (above gate),
// decompose-driven retrieve returns the same chunk again.
let probe_hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
let decompose_hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
let retriever = Arc::new(ScriptedRetriever::new(vec![probe_hits, decompose_hits]));
let retriever_handle = retriever.clone();
let retriever_dyn: Arc<dyn Retriever> = retriever;
let lm = Arc::new(ScriptedLm::new(vec![
r#"["q1"]"#,
r#"[]"#,
"answer [#1]",
]));
let lm_handle = lm.clone();
let lm_dyn: Arc<dyn LanguageModel> = lm;
let pipeline =
RagPipeline::new(env.config.clone(), retriever_dyn, lm_dyn, env.sqlite.clone());
let answer = pipeline.ask("valid query", multi_hop_opts()).unwrap();
assert!(answer.grounded);
assert_eq!(answer.refusal_reason, None);
assert_eq!(
lm_handle.calls(),
3,
"decompose + decide + synthesize all ran"
);
assert_eq!(
retriever_handle.calls(),
2,
"probe retrieve + decompose-driven retrieve"
);
let hops = answer.hops.expect("happy path stamps hops");
assert_eq!(hops.len(), 3);
}


@@ -14,6 +14,58 @@ historical contract that was implemented; this file accumulates the
deltas so phase 5+ readers can find the live behavior without diffing
git history.
## 2026-05-25 — fb-41 pre-v0.18 dogfood: multi-hop score-gate bypass (S7 hallucination regression pin)
A **score_gate bypass + hallucination case** found in fb-41 multi-hop RAG dogfooding before the v0.18.0 cut (`/build/cache/dogfood-v018/`, a 33-asset / 205-chunk corpus: 16 new markdown files in 5 clusters + v017 carryover; gemma3:4b, CPU only / 16 GB RAM).
### Symptom (S7)
Query: `"What is the chemical formula of caffeine?"` (a fact that is not in the KB).
- **Single-pass** `kebab ask`: the retrieve's top score is below the default `rag.score_gate = 0.30` → `refuse_score_gate` → safe refusal.
- **Multi-hop** `kebab ask --multi-hop`: `grounded = true`, with the body `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` ("The chemical formula of caffeine is C₉H₁₅N₃O [#6]"). This is a **hallucination** (the real formula is C₈H₁₀N₄O₂), and `[#6]` cites the `g_t = ∂L/∂θ_i` body of an *Adam optimizer chunk* (triggered by a match on a short structured token that visually resembles a chemical formula).
### Root cause
`ask_multi_hop` (`crates/kebab-rag/src/pipeline.rs`) checked the score gate against the *pool's top_score* only. The multi-hop pool is the *union of 5 sub-queries*: if one sub-query's top score is above the gate, the pool passes even when its other chunks are unrelated to the original query, the pipeline enters the synthesize stage, and the LLM hallucinates over those chunks.
For the same query, single-pass checks the *original query's top retrieve score* and rejects it. Only multi-hop had the hole.
### Fix (PR-7)
Add a **pre-decompose probe** at the entry of `ask_multi_hop`:
1. Retrieve once with the *original query* (0 LLM calls, ~ms).
2. Probe empty → `refuse_no_chunks(None)` (no decompose, and no hop trace).
3. Probe top_score < gate → `refuse_score_gate(None)` (no decompose).
4. Probe passes → the existing decompose / decide / synthesize flow runs unchanged.
Multi-hop's safety floor now matches single-pass exactly: multi-hop adds cross-doc reasoning only when the *original query is already within KB scope*.
### Test updates
- 3 new regression pins (`crates/kebab-rag/tests/multi_hop.rs`):
  - `multi_hop_below_probe_gate_refuses_before_any_llm_call`: the direct S7 regression pin. Low-score chunk + empty LM script → score_gate refusal, 0 LM calls, hops=None.
  - `multi_hop_empty_probe_pool_refuses_before_any_llm_call`: an empty retrieve yields a NoChunks refusal with 0 LM calls.
  - `multi_hop_above_probe_gate_proceeds_to_decompose`: when the probe passes, the full multi-hop flow (decompose + decide + synth) runs normally.
- In the existing 7 multi-hop tests, a probe entry is prepended to `ScriptedRetriever` and the `retriever_handle.calls()` expectation goes up by 1.
- `multi_hop_refuse_no_chunks_preserves_hops_trace` / `multi_hop_refuse_score_gate_preserves_hops_trace` are narrowed: they now verify only *decompose-driven* refusal (the probe passes, then a sub-query retrieve is empty or below the gate). *Probe-driven* refusal returns hops=None (decompose never runs), and the new tests pin that path.
### Other dogfooding findings (separate fixes deferred)
Other items found in the same dogfooding are outside the scope of PR-7 and are targeted at v0.18.1 or a follow-up PR:
- **Inconsistent synthesize citation markers** (S1/S2/S3, P1): with the large prompt of a 30-chunk pool, gemma3:4b loses the `[#N]` citation rule, so answers with a correct body surface as `grounded=false (LlmSelfJudge)`. **Recommended mitigation**: strengthen the citation rule in `MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT`, or lower the `multi_hop_max_pool_chunks` default from 30 to 15.
- **20-25× latency cost**: single-pass 30 s vs. multi-hop 590-685 s (the synthesize stage dominates the cost). This exceeds the spec's "2-5× LLM cost". Shrinking the pool size is the mitigation.
- **Release binary path confusion**: `/home/altair823/kebab/target/release/kebab` (stale v0.17.1) vs. `/build/out/cargo-target/release/kebab` (latest v0.17.2), caused by the `CARGO_TARGET_DIR` env var. A one-line docs note is recommended.
### User impact
After PR-7 merges (right before the v0.18.0 cut):
- Multi-hop safety matches single-pass. Out-of-KB queries no longer receive hallucinated answers.
- Normal multi-hop use cases (compound / cross-doc reasoning) are unaffected: when the probe passes, the existing flow runs unchanged.
- No wire changes (`Answer.hops` is exposed the same way, and the `refusal_reason` values are the same).
Cross-link: `/build/cache/dogfood-v018/results/SUMMARY.md` (full dogfood report), spec `docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md`.
## 2026-05-25 — v0.17.0 post-dogfood: `[models.llm] request_timeout_secs` knob + recommended model guide
Found in v0.17.0 follow-up dogfooding: when a user tried the default `gemma4:e4b` (8B Q4, 9.6 GB) in a CPU-only / 16 GB RAM environment, the first RAG answer always exceeded the 5-minute (hard-coded 300 s) limit and failed with `error: kb-rag: llm.generate_stream`. Memory was also under pressure, with ollama RSS at 10.7 GB and only 2 GB free. For the follow-up dogfooding results (32 minutes / 199 mem-monitor samples), see this entry in `tasks/HOTFIXES.md` and the dogfooding report in the conversation.