feat(rag): fb-41 PR-3b-ii — ScriptedLm + 5 multi-hop tests + refusal hop trace + carry-over
Second PR in the PR-3b split. Builds on the dynamic decide loop from PR-3b-i:
1. **ScriptedLm + ScriptedRetriever helper** (kebab-rag tests/common/mod.rs)
Returns a different response on each call, so multi-step multi-hop
scenarios that need to tell the decompose / decide×N / synthesize LLM
calls apart can be exercised mock-only. Takes `Vec<&str>` /
`Vec<Vec<SearchHit>>` and emits the entries in call-sequence order.
Send + Sync.
2. **5 multi-hop integration tests** (kebab-rag tests/multi_hop.rs, new)
- decide_stop_triggers_synthesize: decide [] → synthesize immediately
- decide_continue_adds_more_chunks: decide ["q2"] → iter 2 retrieve + pool grows
- max_depth_force_stops: depth cap → forced_stop + decide LLM call skipped
- pool_chunks_dedup_by_chunk_id: a chunk_id returned by two sub-queries is pooled once
- decide_parse_failure_falls_through_to_synthesize: parse fail = graceful
synthesize (not a refusal, spec §9)
3. **refuse_* helpers preserve the hops trace** (round 1 carry-over)
Adds a `hops: Option<Vec<HopRecord>>` parameter to the
refuse_no_chunks / refuse_score_gate signatures. On a score-gate /
no-chunks refusal in ask_multi_hop, the hops accumulated so far are
kept as-is in Answer.hops. Single-pass ask passes None, so the wire
format is unchanged (skip_serializing_if).
4. **HopRecord doc additions** (round 1 carry-over)
Documents the per-kind meaning of sub_queries (Decompose=initial /
Decide=next-iter or empty=stop / Synthesize=always empty). Documents
the ambiguity of llm_call_ms=0 (no call vs 0ms call).
5. **MULTI_HOP_MAX_SUB_QUERIES_DEFAULT → _HARD_CAP rename** (round 1 carry-over)
Clarifies the const's intent by separating the config knob
`multi_hop_max_sub_queries_per_iter` (5, prompt-side soft hint) from
the const (10, parse-side hard ceiling). The docs for the two layers'
responsibilities are synced. Tests are renamed as well.
6. **decide guard simplification + preview budget doc** (round 1 carry-over)
Documents the post-condition of parse_decompose_response (Some is
guaranteed non-empty). Simplifies the defensive
`Some(qs) if !qs.is_empty()` to `decide_result.unwrap_or_default()`.
Documents the intent of the decide preview's snippet-only path (the
full chunk text is not fetched).
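
The guard simplification in item 6 can be sketched as below. This is a minimal illustration, not the pipeline code: the two function names are hypothetical, and the guarded form is assumed to fall back to an empty `Vec` in its catch-all arm. Under that assumption the two shapes agree on every input, including `Some(vec![])`, so the documented post-condition makes the guard redundant rather than load-bearing.

```rust
/// Hypothetical reconstruction of the old, defensive shape: re-check
/// non-emptiness even though the parser already guarantees it.
fn next_queries_guarded(decide_result: Option<Vec<String>>) -> Vec<String> {
    match decide_result {
        Some(qs) if !qs.is_empty() => qs,
        // Assumed fallback: "no next sub-queries" is an empty Vec.
        _ => Vec::new(),
    }
}

/// The simplified shape named in the commit message.
fn next_queries_simplified(decide_result: Option<Vec<String>>) -> Vec<String> {
    decide_result.unwrap_or_default()
}

fn main() {
    let cases: Vec<Option<Vec<String>>> = vec![
        None,                          // stop, or parse failure
        Some(vec![]),                  // excluded by the post-condition
        Some(vec!["q2".to_string()]),  // continue with one sub-query
    ];
    for case in cases {
        assert_eq!(
            next_queries_guarded(case.clone()),
            next_queries_simplified(case)
        );
    }
}
```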
Verification
- `cargo test -p kebab-rag -j 1`: 31 unit + 19 pipeline + 5 multi_hop
+ 3 prompt_template + 3 streaming, all passing.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
Spec / plan
- design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
- plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-3b section)
Next step = PR-4 (CLI --multi-hop + wire schema + error_wire).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -206,6 +206,47 @@ impl Retriever for MockRetriever {
    }
}

/// p9-fb-41 PR-3b-ii: scripted retriever. Returns a different
/// `Vec<SearchHit>` per `search` call from a pre-supplied sequence,
/// so a multi-hop test can simulate "iter 1 returns chunk A, iter 2
/// returns chunks A+B" (pool dedup) or "different sub-queries hit
/// different docs". Exhaustion returns an empty `Vec` (no panic) —
/// the pipeline already handles "no hits this round" gracefully via
/// the dedup loop, and a panic would conflate "test forgot a row"
/// with "pipeline made an unexpected extra retrieval call".
///
/// Use [`ScriptedRetriever::calls`] to assert the expected number
/// of retrievals occurred.
pub struct ScriptedRetriever {
    hits_per_call: Vec<Vec<SearchHit>>,
    next: std::sync::atomic::AtomicUsize,
}

impl ScriptedRetriever {
    pub fn new(hits_per_call: Vec<Vec<SearchHit>>) -> Self {
        Self {
            hits_per_call,
            next: std::sync::atomic::AtomicUsize::new(0),
        }
    }

    pub fn calls(&self) -> usize {
        self.next.load(std::sync::atomic::Ordering::SeqCst)
    }
}

impl Retriever for ScriptedRetriever {
    fn search(&self, _q: &SearchQuery) -> anyhow::Result<Vec<SearchHit>> {
        let idx = self
            .next
            .fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        Ok(self.hits_per_call.get(idx).cloned().unwrap_or_default())
    }
    fn index_version(&self) -> IndexVersion {
        IndexVersion("test-iv".to_string())
    }
}

/// Pad a short prefix to the 32-hex shape `kebab_core` newtypes expect.
pub fn id32(prefix: &str) -> String {
    let mut s = prefix.to_string();
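
The call-sequence pattern behind `ScriptedRetriever` in the hunk above can be tried without the kebab crates. The sketch below is an illustration under assumptions: `Scripted<T>` and `next_or_default` are hypothetical names that do not exist in this commit, and plain `Vec<&str>` rows stand in for `Vec<SearchHit>`. It shows the lenient exhaustion policy (an empty default instead of a panic) and why the counter still catches an unexpected extra call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical generic stand-in: hands out `rows[i]` on the i-th call
/// and counts every call, including calls past the end of the script.
struct Scripted<T> {
    rows: Vec<T>,
    next: AtomicUsize,
}

impl<T: Clone + Default> Scripted<T> {
    fn new(rows: Vec<T>) -> Self {
        Self { rows, next: AtomicUsize::new(0) }
    }

    /// Lenient policy: once the script is exhausted, return `T::default()`.
    fn next_or_default(&self) -> T {
        let idx = self.next.fetch_add(1, Ordering::SeqCst);
        self.rows.get(idx).cloned().unwrap_or_default()
    }

    /// Number of calls made so far.
    fn calls(&self) -> usize {
        self.next.load(Ordering::SeqCst)
    }
}

fn main() {
    // Iter 1 returns chunk A, iter 2 returns chunks A+B.
    let r = Scripted::new(vec![vec!["chunk-a"], vec!["chunk-a", "chunk-b"]]);
    assert_eq!(r.next_or_default(), vec!["chunk-a"]);
    assert_eq!(r.next_or_default(), vec!["chunk-a", "chunk-b"]);
    // A third, unscripted call yields an empty Vec rather than panicking...
    assert!(r.next_or_default().is_empty());
    // ...but it is still counted, so a test can assert on the total.
    assert_eq!(r.calls(), 3);
}
```

Because `fetch_add` runs before the lookup, a test that scripted two rows and then sees `calls() == 3` can tell that the pipeline issued one retrieval more than expected.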
@@ -215,3 +256,128 @@ pub fn id32(prefix: &str) -> String {
    s.truncate(32);
    s
}

/// p9-fb-41 PR-3b-ii: scripted language model. Returns a different
/// canned response per `generate_stream` call from a pre-supplied
/// `Vec<String>`. Mirrors `MockLanguageModel`'s streaming contract
/// (one `TokenChunk::Token` per Unicode scalar, terminal `Done` with
/// `canned_usage`, stop-string truncation honoured) but lets a test
/// distinguish the decompose / per-iter decide / synthesize LLM calls
/// of `RagPipeline::ask_multi_hop` — each can return a different
/// payload (`["q1","q2"]`, `[]`, `"final answer [#1]"`, etc.).
///
/// Internally `Arc<Vec<String>> + AtomicUsize` so the type is
/// `Send + Sync` and can be wrapped in `Arc<dyn LanguageModel>`.
/// Tests can read `calls()` for an assertion on the expected LLM
/// call count.
///
/// Exhaustion (more calls than scripted responses) panics — tests
/// that need an "infinite" final response can supply a longer
/// script; the panic message names the call index so the test
/// failure points at the missing entry.
pub struct ScriptedLm {
    model_id: String,
    provider: String,
    context_tokens: usize,
    /// Canned responses in call order. Index `i` is returned on the
    /// `i`-th `generate_stream` call (0-based).
    responses: Vec<String>,
    /// 0-based index of the next response to return on `generate_stream`.
    next: std::sync::atomic::AtomicUsize,
    canned_finish: kebab_core::FinishReason,
    canned_usage: kebab_core::TokenUsage,
}

impl ScriptedLm {
    /// Build a scripted LM with the default model_id/provider used by
    /// the rest of the test suite (`mock-lm` / `mock`) and the
    /// MockLanguageModel-equivalent canned usage. Call `with_*`
    /// builders if a test needs to override the defaults.
    pub fn new(responses: Vec<&str>) -> Self {
        Self {
            model_id: "mock-lm".to_string(),
            provider: "mock".to_string(),
            context_tokens: 32_768,
            responses: responses.into_iter().map(str::to_string).collect(),
            next: std::sync::atomic::AtomicUsize::new(0),
            canned_finish: kebab_core::FinishReason::Stop,
            canned_usage: kebab_core::TokenUsage {
                prompt_tokens: 10,
                completion_tokens: 5,
                latency_ms: 7,
            },
        }
    }

    /// Total `generate_stream` invocations so far. Tests use this to
    /// assert "exactly N LLM calls happened" without scanning the
    /// HopRecord trace (the trace is the user-visible signal; this
    /// is the lower-level call counter).
    pub fn calls(&self) -> usize {
        self.next.load(std::sync::atomic::Ordering::SeqCst)
    }

    /// Earliest byte position of any non-empty stop string in
    /// `canned`. Same precedence rule as `MockLanguageModel`:
    /// `Iterator::min` returns the first equal element, so ties
    /// break by `stop` declaration order. `str::find` returns a
    /// UTF-8 char boundary by contract, so the resulting prefix
    /// slice is sound.
    fn apply_stop<'a>(canned: &'a str, stop: &[String]) -> (&'a str, bool) {
        let earliest = stop
            .iter()
            .filter(|s| !s.is_empty())
            .filter_map(|s| canned.find(s.as_str()))
            .min();
        match earliest {
            Some(idx) => (&canned[..idx], true),
            None => (canned, false),
        }
    }
}

impl kebab_core::LanguageModel for ScriptedLm {
    fn model_ref(&self) -> kebab_core::ModelRef {
        kebab_core::ModelRef {
            id: self.model_id.clone(),
            provider: self.provider.clone(),
            dimensions: None,
        }
    }

    fn context_tokens(&self) -> usize {
        self.context_tokens
    }

    fn generate_stream(
        &self,
        req: kebab_core::GenerateRequest,
    ) -> anyhow::Result<
        Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>,
    > {
        let idx = self
            .next
            .fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        let canned = self.responses.get(idx).unwrap_or_else(|| {
            panic!(
                "ScriptedLm exhausted: call #{idx} requested but only {} responses scripted",
                self.responses.len()
            )
        });
        let (truncated, stop_hit) = Self::apply_stop(canned, &req.stop);
        let mut chunks: Vec<kebab_core::TokenChunk> = truncated
            .chars()
            .map(|c| kebab_core::TokenChunk::Token(c.to_string()))
            .collect();
        let finish_reason = if stop_hit {
            kebab_core::FinishReason::Stop
        } else {
            self.canned_finish.clone()
        };
        chunks.push(kebab_core::TokenChunk::Done {
            finish_reason,
            usage: self.canned_usage.clone(),
        });
        Ok(Box::new(chunks.into_iter().map(Ok)))
    }
}
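
The stop-string truncation used by `ScriptedLm::generate_stream` can be checked in isolation. The sketch below copies the `apply_stop` logic from the diff into a free function so it runs without `kebab_core`; the sample responses and stop strings are made up for illustration. It shows three behaviours: the earliest match wins regardless of declaration order, empty stop strings are ignored (otherwise `find("")` would match at byte 0 and truncate everything), and the cut falls on a char boundary for multi-byte text.

```rust
/// Free-function copy of the `apply_stop` logic shown in the diff:
/// truncate `canned` at the earliest occurrence of any non-empty stop
/// string and report whether a stop string was hit.
fn apply_stop<'a>(canned: &'a str, stop: &[String]) -> (&'a str, bool) {
    let earliest = stop
        .iter()
        .filter(|s| !s.is_empty())               // "" would always match at 0
        .filter_map(|s| canned.find(s.as_str())) // byte offset of first match
        .min();
    match earliest {
        Some(idx) => (&canned[..idx], true),
        None => (canned, false),
    }
}

fn main() {
    // Illustrative stop list: "\n\n" is declared first, "[#" second.
    let stop = vec!["\n\n".to_string(), "[#".to_string(), String::new()];

    // "[#" occurs at byte 13, "\n\n" at byte 17, so "[#" wins.
    let (text, hit) = apply_stop("final answer [#1]\n\nextra", &stop);
    assert_eq!(text, "final answer ");
    assert!(hit);

    // No stop string present: the response passes through untouched.
    assert_eq!(apply_stop("plain", &stop), ("plain", false));

    // Multi-byte text: the byte offset from `find` is a valid slice point.
    assert_eq!(apply_stop("답변 [#1]", &stop), ("답변 ", true));
}
```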