feat(rag): fb-41 PR-3b-ii — ScriptedLm + 5 multi-hop tests + refusal hop trace + carry-over

Second PR of the PR-3b split, built on PR-3b-i's dynamic decide loop:

1. **ScriptedLm + ScriptedRetriever helpers** (kebab-rag tests/common/mod.rs)
   Each returns a different response per call. This lets mocks alone exercise multi-stage multi-hop scenarios that tell apart every LLM call (decompose / decide×N / synthesize). They take a `Vec<&str>` / `Vec<Vec<SearchHit>>` and emit entries in call-sequence order. Both are Send + Sync.
2. **5 multi-hop integration tests** (kebab-rag tests/multi_hop.rs, new)
   - decide_stop_triggers_synthesize: decide [] → synthesize immediately
   - decide_continue_adds_more_chunks: decide ["q2"] → iter 2 retrieve + pool grows
   - max_depth_force_stops: depth cap → forced_stop, decide LLM call skipped
   - pool_chunks_dedup_by_chunk_id: the same chunk_id hit by two sub-queries is pooled once
   - decide_parse_failure_falls_through_to_synthesize: a parse failure means a graceful synthesize (not a refusal, spec §9)
3. **`refuse_*` helpers preserve the hop trace** (carry-over from round 1)
   Added a `hops: Option<Vec<HopRecord>>` argument to the refuse_no_chunks / refuse_score_gate signatures. On a score-gate or no-chunks refusal, ask_multi_hop keeps the accumulated hops in Answer.hops as-is. Single-pass ask passes None, so the wire format is unchanged (skip_serializing_if).
4. **HopRecord doc additions** (carry-over from round 1)
   Documents the per-kind meaning of sub_queries (Decompose = initial queries / Decide = next iteration, or empty = stop / Synthesize = always empty). Also documents the ambiguity of llm_call_ms=0 (no call vs. a 0 ms call).
5. **Rename MULTI_HOP_MAX_SUB_QUERIES_DEFAULT → _HARD_CAP** (carry-over from round 1)
   Makes the const's intent explicit by separating the config knob `multi_hop_max_sub_queries_per_iter` (5, prompt-side soft hint) from the const (10, parse-side hard ceiling). The docs describing each layer's responsibility are synced, and the test is renamed as well.
6. **Simplified decide guard + preview budget doc** (carry-over from round 1)
   Documents the post-condition of parse_decompose_response (Some is guaranteed non-empty) and simplifies the defensive `Some(qs) if !qs.is_empty()` to `decide_result.unwrap_or_default()`. Also documents the intent behind the decide preview's snippet-only path (full chunk text is not fetched).

Verification
- `cargo test -p kebab-rag -j 1`: 31 unit + 19 pipeline + 5 multi_hop + 3 prompt_template + 3 streaming tests all pass.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` is clean.
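The control flow that the five new tests pin down can be sketched in std-only Rust. This is a minimal sketch under stated assumptions, not `RagPipeline::ask_multi_hop`. `run_multi_hop`, `Hop`, `Answer`, `parse_sub_queries` and the closure-based LM/retriever are hypothetical. The JSON parsing is deliberately naive because it splits on commas. Falling back to the original question when decompose output fails to parse is also an assumption. The sketch shows the parse-side cap (10, the value this commit gives the hard-cap const), `unwrap_or_default()` mapping both `[]` and a parse failure to "stop and synthesize", the depth cap skipping the decide call, chunk_id dedup across sub-queries, and hops surviving a no-chunks refusal. The score gate is not modelled.

```rust
use std::collections::HashSet;

// Hypothetical mirror of the parse-side hard ceiling (10). The prompt-side
// soft hint (5) only shapes prompt text, so it does not appear here.
const HARD_CAP: usize = 10;

#[derive(Debug, Clone)]
enum HopKind { Decompose, Decide, Synthesize }

#[derive(Debug, Clone)]
struct Hop {
    kind: HopKind,
    // Decompose = initial queries, Decide = next iter (empty = stop), Synthesize = empty.
    sub_queries: Vec<String>,
}

#[derive(Debug)]
struct Answer {
    text: Option<String>, // None = refusal (no chunks pooled)
    hops: Option<Vec<Hop>>,
    forced_stop: bool,
}

/// Naive parser for `["q1","q2"]` (quoted commas are NOT supported).
/// Post-condition: `Some` is always non-empty, so `[]` and malformed input
/// both return `None`. Output is clamped to HARD_CAP.
fn parse_sub_queries(raw: &str) -> Option<Vec<String>> {
    let inner = raw.trim().strip_prefix('[')?.strip_suffix(']')?;
    let mut out = Vec::new();
    for item in inner.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        out.push(item.strip_prefix('"')?.strip_suffix('"')?.to_string());
    }
    out.truncate(HARD_CAP);
    if out.is_empty() { None } else { Some(out) }
}

/// `lm` / `retrieve` stand in for the LM and retriever traits; hits are
/// `(chunk_id, text)` pairs. Assumes `max_depth >= 1`.
fn run_multi_hop(
    question: &str,
    max_depth: usize,
    mut lm: impl FnMut(&str) -> String,
    mut retrieve: impl FnMut(&str) -> Vec<(String, String)>,
) -> Answer {
    let (mut hops, mut seen, mut pool) = (Vec::new(), HashSet::new(), Vec::new());
    let mut forced_stop = false;
    // Assumption: a decompose parse failure falls back to the question itself.
    let mut queries =
        parse_sub_queries(&lm(question)).unwrap_or_else(|| vec![question.to_string()]);
    hops.push(Hop { kind: HopKind::Decompose, sub_queries: queries.clone() });
    for depth in 1..=max_depth {
        for q in &queries {
            for (chunk_id, text) in retrieve(q.as_str()) {
                // Pool dedup: a chunk hit by several sub-queries is kept once.
                if seen.insert(chunk_id.clone()) {
                    pool.push((chunk_id, text));
                }
            }
        }
        if depth == max_depth {
            forced_stop = true; // depth cap reached: the decide LLM call is skipped
            break;
        }
        // `[]` and a parse failure both become an empty list, i.e. "stop".
        queries = parse_sub_queries(&lm("decide")).unwrap_or_default();
        hops.push(Hop { kind: HopKind::Decide, sub_queries: queries.clone() });
        if queries.is_empty() {
            break;
        }
    }
    if pool.is_empty() {
        // The refusal still carries the hops accumulated so far.
        return Answer { text: None, hops: Some(hops), forced_stop };
    }
    let text = lm("synthesize");
    hops.push(Hop { kind: HopKind::Synthesize, sub_queries: Vec::new() });
    Answer { text: Some(text), hops: Some(hops), forced_stop }
}

fn main() {
    // Shaped like decide_continue_adds_more_chunks:
    // decompose -> ["q1"], decide -> ["q2"], decide -> [] (stop), synthesize.
    let script = [r#"["q1"]"#, r#"["q2"]"#, "[]", "final answer [#1]"];
    let mut lm_calls = 0;
    let lm = |_: &str| { lm_calls += 1; script[lm_calls - 1].to_string() };
    // Iter 1 hits chunk A; iter 2 hits A again plus B.
    let (a, b) = (("A".to_string(), "a".to_string()), ("B".to_string(), "b".to_string()));
    let hits = vec![vec![a.clone()], vec![a, b]];
    let mut r_calls = 0;
    let retrieve = |_: &str| { r_calls += 1; hits.get(r_calls - 1).cloned().unwrap_or_default() };
    let answer = run_multi_hop("question", 3, lm, retrieve);
    println!("{answer:?} lm_calls={lm_calls} retrieve_calls={r_calls}");
}
```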
Spec / plan
- design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
- plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-3b section)

Next step: PR-4 (CLI --multi-hop + wire schema + error_wire).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 08:17:37 +00:00
parent 94e6146013
commit 6188a50c1c
5 changed files with 648 additions and 41 deletions
--- a/crates/kebab-rag/tests/common/mod.rs
+++ b/crates/kebab-rag/tests/common/mod.rs
@@ -206,6 +206,47 @@ impl Retriever for MockRetriever {
    }
 }

+/// p9-fb-41 PR-3b-ii: scripted retriever. Returns a different
+/// `Vec<SearchHit>` per `search` call from a pre-supplied sequence,
+/// so a multi-hop test can simulate "iter 1 returns chunk A, iter 2
+/// returns chunks A+B" (pool dedup) or "different sub-queries hit
+/// different docs". Exhaustion returns an empty `Vec` (no panic) —
+/// the pipeline already handles "no hits this round" gracefully via
+/// the dedup loop, and a panic would conflate "test forgot a row"
+/// with "pipeline made an unexpected extra retrieval call".
+///
+/// Use [`ScriptedRetriever::calls`] to assert the expected number
+/// of retrievals occurred.
+pub struct ScriptedRetriever {
+    hits_per_call: Vec<Vec<SearchHit>>,
+    next: std::sync::atomic::AtomicUsize,
+}
+
+impl ScriptedRetriever {
+    pub fn new(hits_per_call: Vec<Vec<SearchHit>>) -> Self {
+        Self {
+            hits_per_call,
+            next: std::sync::atomic::AtomicUsize::new(0),
+        }
+    }
+
+    pub fn calls(&self) -> usize {
+        self.next.load(std::sync::atomic::Ordering::SeqCst)
+    }
+}
+
+impl Retriever for ScriptedRetriever {
+    fn search(&self, _q: &SearchQuery) -> anyhow::Result<Vec<SearchHit>> {
+        let idx = self
+            .next
+            .fetch_add(1, std::sync::atomic::Ordering::SeqCst);
+        Ok(self.hits_per_call.get(idx).cloned().unwrap_or_default())
+    }
+    fn index_version(&self) -> IndexVersion {
+        IndexVersion("test-iv".to_string())
+    }
+}
+
 /// Pad a short prefix to the 32-hex shape `kebab_core` newtypes expect.
 pub fn id32(prefix: &str) -> String {
    let mut s = prefix.to_string();
@@ -215,3 +256,128 @@ pub fn id32(prefix: &str) -> String {
    s.truncate(32);
    s
 }
+
+/// p9-fb-41 PR-3b-ii: scripted language model. Returns a different
+/// canned response per `generate_stream` call from a pre-supplied
+/// `Vec<String>`. Mirrors `MockLanguageModel`'s streaming contract
+/// (one `TokenChunk::Token` per Unicode scalar, terminal `Done` with
+/// `canned_usage`, stop-string truncation honoured) but lets a test
+/// distinguish the decompose / per-iter decide / synthesize LLM calls
+/// of `RagPipeline::ask_multi_hop` — each can return a different
+/// payload (`["q1","q2"]`, `[]`, `"final answer [#1]"`, etc.).
+///
+/// Internally a `Vec<String>` plus an `AtomicUsize` cursor, so the type
+/// is `Send + Sync` and can be wrapped in `Arc<dyn LanguageModel>`.
+/// Tests can read `calls()` for an assertion on the expected LLM
+/// call count.
+///
+/// Exhaustion (more calls than scripted responses) panics — tests
+/// that need an "infinite" final response can supply a longer
+/// script; the panic message names the call index so the test
+/// failure points at the missing entry.
+pub struct ScriptedLm {
+    model_id: String,
+    provider: String,
+    context_tokens: usize,
+    /// Canned responses in call order. Index `i` is returned on the
+    /// `i`-th `generate_stream` call (0-based).
+    responses: Vec<String>,
+    /// 0-based index of the next response to return on `generate_stream`.
+    next: std::sync::atomic::AtomicUsize,
+    canned_finish: kebab_core::FinishReason,
+    canned_usage: kebab_core::TokenUsage,
+}
+
+impl ScriptedLm {
+    /// Build a scripted LM with the default model_id/provider used by
+    /// the rest of the test suite (`mock-lm` / `mock`) and the
+    /// MockLanguageModel-equivalent canned usage. There are no `with_*`
+    /// override builders yet; add one if a test needs other defaults.
+    pub fn new(responses: Vec<&str>) -> Self {
+        Self {
+            model_id: "mock-lm".to_string(),
+            provider: "mock".to_string(),
+            context_tokens: 32_768,
+            responses: responses.into_iter().map(str::to_string).collect(),
+            next: std::sync::atomic::AtomicUsize::new(0),
+            canned_finish: kebab_core::FinishReason::Stop,
+            canned_usage: kebab_core::TokenUsage {
+                prompt_tokens: 10,
+                completion_tokens: 5,
+                latency_ms: 7,
+            },
+        }
+    }
+
+    /// Total `generate_stream` invocations so far. Tests use this to
+    /// assert "exactly N LLM calls happened" without scanning the
+    /// HopRecord trace (the trace is the user-visible signal; this
+    /// is the lower-level call counter).
+    pub fn calls(&self) -> usize {
+        self.next.load(std::sync::atomic::Ordering::SeqCst)
+    }
+
+    /// Earliest byte position of any non-empty stop string in
+    /// `canned`. Same precedence rule as `MockLanguageModel`: the
+    /// earliest match wins, and stop strings that match at the same
+    /// position yield the same cut, so declaration order is moot.
+    /// `str::find` returns a UTF-8 char boundary by contract, so the
+    /// resulting prefix slice is sound.
+    fn apply_stop<'a>(canned: &'a str, stop: &[String]) -> (&'a str, bool) {
+        let earliest = stop
+            .iter()
+            .filter(|s| !s.is_empty())
+            .filter_map(|s| canned.find(s.as_str()))
+            .min();
+        match earliest {
+            Some(idx) => (&canned[..idx], true),
+            None => (canned, false),
+        }
+    }
+}
+
+impl kebab_core::LanguageModel for ScriptedLm {
+    fn model_ref(&self) -> kebab_core::ModelRef {
+        kebab_core::ModelRef {
+            id: self.model_id.clone(),
+            provider: self.provider.clone(),
+            dimensions: None,
+        }
+    }
+
+    fn context_tokens(&self) -> usize {
+        self.context_tokens
+    }
+
+    fn generate_stream(
+        &self,
+        req: kebab_core::GenerateRequest,
+    ) -> anyhow::Result<
+        Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>,
+    > {
+        let idx = self
+            .next
+            .fetch_add(1, std::sync::atomic::Ordering::SeqCst);
+        let canned = self.responses.get(idx).unwrap_or_else(|| {
+            panic!(
+                "ScriptedLm exhausted: call #{idx} requested but only {} responses scripted",
+                self.responses.len()
+            )
+        });
+        let (truncated, stop_hit) = Self::apply_stop(canned, &req.stop);
+        let mut chunks: Vec<kebab_core::TokenChunk> = truncated
+            .chars()
+            .map(|c| kebab_core::TokenChunk::Token(c.to_string()))
+            .collect();
+        let finish_reason = if stop_hit {
+            kebab_core::FinishReason::Stop
+        } else {
+            self.canned_finish.clone()
+        };
+        chunks.push(kebab_core::TokenChunk::Done {
+            finish_reason,
+            usage: self.canned_usage.clone(),
+        });
+        Ok(Box::new(chunks.into_iter().map(Ok)))
+    }
+}
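Both helpers rest on the same pattern. The trait methods in the diff take `&self`, so the per-call cursor needs interior mutability, and an `AtomicUsize` provides it while keeping the mock `Send + Sync`. A `Cell<usize>` would not be `Sync`. The std-only sketch below isolates that pattern together with the earliest-match stop truncation. `Generate`, `Scripted` and the `(String, bool)` return shape are hypothetical simplifications, not `kebab_core` APIs. Streaming `TokenChunk`s and usage accounting are left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `kebab_core::LanguageModel`: what matters here
/// is the `&self` receiver plus the `Send + Sync` supertraits.
trait Generate: Send + Sync {
    fn generate(&self, stop: &[&str]) -> (String, bool);
}

/// Minimal scripted mock: response `i` is returned on the `i`-th call.
struct Scripted {
    responses: Vec<String>,
    // `&self` methods need interior mutability; an atomic keeps the type `Sync`.
    next: AtomicUsize,
}

impl Scripted {
    fn new(responses: &[&str]) -> Self {
        Self {
            responses: responses.iter().map(|s| s.to_string()).collect(),
            next: AtomicUsize::new(0),
        }
    }

    fn calls(&self) -> usize {
        self.next.load(Ordering::SeqCst)
    }
}

/// Truncate at the earliest match of any non-empty stop string.
fn apply_stop<'a>(canned: &'a str, stop: &[&str]) -> (&'a str, bool) {
    let earliest = stop
        .iter()
        .filter(|s| !s.is_empty())
        .filter_map(|s| canned.find(*s))
        .min();
    match earliest {
        Some(idx) => (&canned[..idx], true),
        None => (canned, false),
    }
}

impl Generate for Scripted {
    fn generate(&self, stop: &[&str]) -> (String, bool) {
        // fetch_add returns the previous value, i.e. this call's 0-based index.
        let idx = self.next.fetch_add(1, Ordering::SeqCst);
        let canned = self.responses.get(idx).unwrap_or_else(|| {
            panic!("Scripted exhausted: call #{idx}, {} responses scripted", self.responses.len())
        });
        let (text, stop_hit) = apply_stop(canned, stop);
        (text.to_string(), stop_hit)
    }
}

fn main() {
    // The Send + Sync bound lets the mock cross a thread boundary behind Arc<dyn _>.
    let lm: Arc<dyn Generate> =
        Arc::new(Scripted::new(&[r#"["q1","q2"]"#, "[]", "final answer [#1]"]));
    let worker = Arc::clone(&lm);
    let decompose = thread::spawn(move || worker.generate(&[])).join().unwrap();
    let decide = lm.generate(&[]);
    let synthesize = lm.generate(&["[#"]);
    println!("{decompose:?} {decide:?} {synthesize:?}");
}
```

Exhaustion deliberately differs between the two helpers in the diff. `ScriptedRetriever` returns an empty `Vec`, while `ScriptedLm` panics and names the missing call index. The sketch follows the `ScriptedLm` behaviour.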