PR-2 of fb-41 multi-hop RAG. When `opts.multi_hop=true`, dispatch to
a 3-stage decompose + retrieve + synthesize pipeline. The dynamic
decide loop comes in PR-3.
- Add the `AskOpts.multi_hop: bool` field and introduce `impl Default
for AskOpts` (resolves the known limitation noted in HOTFIXES
2026-05-07). All 9 explicit init sites gain `multi_hop: false`. With
`Default` in place, they can migrate to `..Default::default()`
incrementally.
- Add a one-line dispatcher at the entry of `RagPipeline::ask`
(`if opts.multi_hop { return self.ask_multi_hop(...) }`).
- New method `RagPipeline::ask_multi_hop`:
1) decompose LLM call, then parse a JSON array of strings;
2) retrieve for each sub-query into a pool deduplicated by chunk_id;
3) score-gate / no-chunks guards;
4) pack_context (helper shared with the single-pass path);
5) synthesize LLM call with MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT;
6) citation extraction + Answer build.
Stamps `prompt_template_version` = "rag-multi-hop-v1" so eval
`compare` reports single-pass and multi-hop runs separately.
- New prompt consts: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT +
MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT
+ PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT.
- New variant `kebab_core::RefusalReason::MultiHopDecomposeFailed`.
Cascade: the exhaustive matches in kebab-store-sqlite
`refusal_reason_label` and the kebab-tui ask refusal render are
updated.
- `parse_decompose_response` + `strip_markdown_json_fence` helpers.
They strip a markdown code fence (```json / ```), parse a JSON array
of strings, trim each entry, drop empty ones, and cap the result at
MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. On a None return, the caller
refuses with `MultiHopDecomposeFailed`.
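The `AskOpts` bullet relies on Rust's struct update syntax. The sketch below uses a trimmed-down, hypothetical `AskOpts`: it has only a few of the real fields, and its defaults are placeholders (the crate's actual default values are not stated here). It shows the two init styles side by side:

- an explicit literal, which stops compiling when a field is added;
- a migrated literal, which names only its overrides.

```rust
/// Hypothetical, trimmed-down stand-in for `kebab_rag::AskOpts`.
/// The field set and default values are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AskOpts {
    pub k: usize,
    pub temperature: Option<f32>,
    pub seed: Option<u64>,
    pub multi_hop: bool,
}

impl Default for AskOpts {
    fn default() -> Self {
        Self {
            k: 5, // placeholder, not the real crate's default
            temperature: None,
            seed: None,
            multi_hop: false, // opt-in: single-pass stays the default path
        }
    }
}

/// Explicit init site: every field is spelled out, so adding a field
/// is a compile error here until the new field is filled in.
pub fn explicit_opts() -> AskOpts {
    AskOpts { k: 3, temperature: Some(0.0), seed: Some(0), multi_hop: false }
}

/// Migrated init site: only the overrides are named; the rest comes from Default.
pub fn migrated_opts() -> AskOpts {
    AskOpts { k: 3, temperature: Some(0.0), seed: Some(0), ..Default::default() }
}
```

The explicit form is the one the test fixtures keep on purpose, so a new field surfaces as a compile error at each fixture.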
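Steps 2 and 3 of `ask_multi_hop` (per-sub-query retrieval into a chunk_id-deduplicated pool, then the no-chunks and score-gate guards) can be sketched as follows. This is not the crate's code. `Hit`, `PoolOutcome` and `build_pool` are hypothetical names, and the retriever is injected as a closure. Two behaviours are assumptions the description does not pin down:

- on a duplicate chunk_id, the higher score wins;
- the gate is compared against the best pooled score.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a retrieval hit (the real type is not shown here).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub chunk_id: String,
    pub score: f32,
}

/// Pooling outcome; the two refusal arms mirror the no-chunks / score-gate guards.
#[derive(Debug, PartialEq)]
pub enum PoolOutcome {
    Ready(Vec<Hit>),
    NoChunks,
    BelowScoreGate,
}

/// Retrieve per sub-query, dedup by chunk_id, then apply the guards.
pub fn build_pool<F>(sub_queries: &[String], score_gate: f32, mut retrieve: F) -> PoolOutcome
where
    F: FnMut(&str) -> Vec<Hit>,
{
    // chunk_id -> index into `pool`; the Vec keeps first-seen order.
    let mut seen: HashMap<String, usize> = HashMap::new();
    let mut pool: Vec<Hit> = Vec::new();
    for q in sub_queries {
        for hit in retrieve(q.as_str()) {
            match seen.get(&hit.chunk_id).copied() {
                // Same chunk from another sub-query: keep the higher score (assumption).
                Some(i) => {
                    if hit.score > pool[i].score {
                        pool[i].score = hit.score;
                    }
                }
                None => {
                    seen.insert(hit.chunk_id.clone(), pool.len());
                    pool.push(hit);
                }
            }
        }
    }
    if pool.is_empty() {
        return PoolOutcome::NoChunks;
    }
    // Assumed gate semantics: refuse when even the best pooled score is below it.
    let best = pool.iter().map(|h| h.score).fold(f32::MIN, f32::max);
    if best < score_gate {
        return PoolOutcome::BelowScoreGate;
    }
    PoolOutcome::Ready(pool)
}
```

Deduplicating before packing means a chunk hit by several sub-queries occupies the context budget once.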
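The cascade in the `RefusalReason` bullet follows from exhaustive matching: a `match` without a wildcard arm stops compiling once a variant is added. The sketch uses a hypothetical subset of the enum. Only `MultiHopDecomposeFailed` comes from this PR; the other variants and all label strings are invented for illustration.

```rust
/// Hypothetical subset of `kebab_core::RefusalReason`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefusalReason {
    NoChunks,
    BelowScoreGate,
    MultiHopDecomposeFailed, // the variant added in this PR
}

/// Label function in the style of kebab-store-sqlite's `refusal_reason_label`.
/// With no `_` arm, the compiler forces every new variant to get a label here.
pub fn refusal_reason_label(r: RefusalReason) -> &'static str {
    match r {
        RefusalReason::NoChunks => "no_chunks",
        RefusalReason::BelowScoreGate => "below_score_gate",
        RefusalReason::MultiHopDecomposeFailed => "multi_hop_decompose_failed",
    }
}
```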
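The helper bullet describes a four-step pipeline: fence strip, parse, trim / drop empty, cap. The sketch below is a dependency-free version of it. It is not the crate's code, and several details are assumptions:

- Parsing uses a hand-rolled parser for the array-of-strings subset of JSON, supporting only a few escapes. The real helper presumably uses a full JSON library.
- The cap is passed as a parameter because the value of MULTI_HOP_MAX_SUB_QUERIES_DEFAULT is not given here.
- An array with no usable entries is treated as a failure.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Strip an optional ```json / ``` fence around the model output.
pub fn strip_markdown_json_fence(raw: &str) -> &str {
    let s = raw.trim();
    match s.strip_prefix("```") {
        Some(rest) => {
            // Drop the optional "json" info string, then the closing fence.
            let rest = rest.strip_prefix("json").unwrap_or(rest).trim_end();
            rest.strip_suffix("```").unwrap_or(rest).trim()
        }
        None => s,
    }
}

fn skip_ws(chars: &mut Peekable<Chars<'_>>) {
    while chars.peek().map_or(false, |c| c.is_whitespace()) {
        chars.next();
    }
}

/// Minimal parser for a JSON array whose elements are all strings.
/// Only the \" \\ \/ \n \r \t escapes are supported; anything else yields None.
fn parse_string_array(s: &str) -> Option<Vec<String>> {
    let mut chars = s.chars().peekable();
    let mut out = Vec::new();
    skip_ws(&mut chars);
    if chars.next()? != '[' {
        return None;
    }
    skip_ws(&mut chars);
    if chars.peek() == Some(&']') {
        chars.next();
    } else {
        loop {
            skip_ws(&mut chars);
            if chars.next()? != '"' {
                return None;
            }
            let mut item = String::new();
            loop {
                match chars.next()? {
                    '"' => break,
                    '\\' => item.push(match chars.next()? {
                        '"' => '"',
                        '\\' => '\\',
                        '/' => '/',
                        'n' => '\n',
                        'r' => '\r',
                        't' => '\t',
                        _ => return None,
                    }),
                    c => item.push(c),
                }
            }
            out.push(item);
            skip_ws(&mut chars);
            match chars.next()? {
                ',' => continue,
                ']' => break,
                _ => return None,
            }
        }
    }
    skip_ws(&mut chars);
    // Reject trailing text after the closing bracket.
    if chars.next().is_some() { None } else { Some(out) }
}

/// Fence strip + parse + trim + drop empty + cap.
pub fn parse_decompose_response(raw: &str, cap: usize) -> Option<Vec<String>> {
    let items = parse_string_array(strip_markdown_json_fence(raw))?;
    let subs: Vec<String> = items
        .into_iter()
        .map(|q| q.trim().to_string())
        .filter(|q| !q.is_empty())
        .take(cap)
        .collect();
    // Assumption: an array with no usable entry also counts as a failure.
    if subs.is_empty() { None } else { Some(subs) }
}
```

A None from this function is the signal on which the caller would emit the `MultiHopDecomposeFailed` refusal.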
Tests (55 passing total, 8 new):
- 6 unit tests (regression pins for parse_decompose_response: bare
array / fence variants / garbage / cap / trim).
- 2 integration tests:
`ask_multi_hop_dispatches_and_decompose_garbage_refuses` (decompose
garbage → MultiHopDecomposeFailed with exactly 1 LLM call) and
`ask_with_multi_hop_false_keeps_single_pass_path` (regression pin:
existing callers stay backwards-compatible without changes).
The integration test for the happy-path multi-hop flow (decompose
succeeds → synthesize) will be added together with the ScriptedLm
helper when PR-3 introduces the decide loop. The current
`MockLanguageModel` returns a single canned response, so it cannot
pin a 2-LLM-call sequence.
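The gap described above comes down to a mock that replies once versus one that replies in sequence. The sketch below shows a scripted mock that pops queued replies in call order and records each call's system prompt. `ScriptedLm` is only named in this PR as a planned helper, so this shape is an assumption. The `Generate` trait is a hypothetical simplification of `kebab_core::LanguageModel`, which streams `TokenChunk`s instead of returning a whole string.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for `kebab_core::LanguageModel`.
pub trait Generate {
    fn generate(&self, system: &str, user: &str) -> Result<String, String>;
}

/// Each call pops the next canned reply, so a test can pin a
/// decompose -> synthesize sequence of two LLM calls.
pub struct ScriptedLm {
    replies: Mutex<VecDeque<String>>,
    systems_seen: Mutex<Vec<String>>,
}

impl ScriptedLm {
    pub fn new(replies: &[&str]) -> Self {
        Self {
            replies: Mutex::new(replies.iter().map(|r| r.to_string()).collect()),
            systems_seen: Mutex::new(Vec::new()),
        }
    }

    /// Number of calls made so far, including calls past the end of the script.
    pub fn calls(&self) -> usize {
        self.systems_seen.lock().unwrap().len()
    }

    /// System prompts in call order, for asserting which stage ran.
    pub fn systems(&self) -> Vec<String> {
        self.systems_seen.lock().unwrap().clone()
    }
}

impl Generate for ScriptedLm {
    fn generate(&self, system: &str, _user: &str) -> Result<String, String> {
        self.systems_seen.lock().unwrap().push(system.to_string());
        // Running past the script is a test bug: fail loudly instead of repeating.
        self.replies
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| "ScriptedLm: script exhausted".to_string())
    }
}
```

Erroring on an exhausted script, rather than repeating the last reply, keeps an unexpected extra LLM call from passing silently.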
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
165 lines
5.6 KiB
Rust
```rust
//! p9-fb-40: integration tests for rag-v1 / rag-v2 / unknown-version dispatch.
//!
//! Wraps `MockLanguageModel` in a `CapturingLm` that snapshots
//! `GenerateRequest::system` on every `generate_stream` call so the
//! tests can assert which template constant the pipeline rendered.

mod common;

use std::sync::{Arc, Mutex};

use common::{MockRetriever, RagEnv, id32, mk_hit};
use kebab_core::{FinishReason, LanguageModel, Retriever, SearchMode, TokenChunk, TokenUsage};
use kebab_llm::MockLanguageModel;
use kebab_rag::{AskOpts, RagPipeline};

const TEST_LM_ID: &str = "mock-lm";

/// LM wrapper that captures the system prompt of the most-recent
/// `generate_stream` call, so tests can assert which template was
/// rendered. Mirrors the `CountingLm` pattern from
/// `tests/streaming_events.rs` but stores `req.system` instead of a
/// call counter.
struct CapturingLm {
    inner: MockLanguageModel,
    captured_system: Arc<Mutex<Option<String>>>,
}

impl CapturingLm {
    fn new(captured: Arc<Mutex<Option<String>>>) -> Self {
        Self {
            inner: MockLanguageModel {
                model_id: TEST_LM_ID.to_string(),
                provider: "mock".to_string(),
                context_tokens: 32_768,
                canned_response: "근거가 충분합니다 [#1]".to_string(),
                canned_finish: FinishReason::Stop,
                canned_usage: TokenUsage {
                    prompt_tokens: 10,
                    completion_tokens: 5,
                    latency_ms: 7,
                },
            },
            captured_system: captured,
        }
    }
}

impl LanguageModel for CapturingLm {
    fn model_ref(&self) -> kebab_core::ModelRef {
        self.inner.model_ref()
    }
    fn context_tokens(&self) -> usize {
        self.inner.context_tokens()
    }
    fn generate_stream(
        &self,
        req: kebab_core::GenerateRequest,
    ) -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<TokenChunk>> + Send>> {
        *self.captured_system.lock().unwrap() = Some(req.system.clone());
        self.inner.generate_stream(req)
    }
}

/// Mirror of `streaming_events::opts_with_sink` minus the sink. p9-fb-41
/// added `impl Default for AskOpts` — these explicit fixtures stay
/// for now so a future field addition fails compilation here too,
/// surfacing intent. New callers should prefer `..Default::default()`.
fn lexical_opts() -> AskOpts {
    AskOpts {
        k: 3,
        explain: false,
        mode: SearchMode::Lexical,
        temperature: Some(0.0),
        seed: Some(0),
        stream_sink: None,
        history: Vec::new(),
        conversation_id: None,
        turn_index: None,
        multi_hop: false,
    }
}

/// Build a `RagPipeline` with the given `prompt_template_version`.
/// Returns the pipeline, the captured-system handle, and the env (kept
/// alive for the test body — drops the SqliteStore + tempdir together).
fn build_pipeline_with_template(
    version: &str,
) -> (RagPipeline, Arc<Mutex<Option<String>>>, RagEnv) {
    let mut env = RagEnv::new();
    env.config.rag.prompt_template_version = version.to_string();
    // Drop score gate so the seeded hit (fusion_score = 0.9) always
    // makes it through — the dispatch we want to exercise lives past
    // the gate.
    env.config.rag.score_gate = 0.0;
    let captured = Arc::new(Mutex::new(None));
    let lm: Arc<dyn LanguageModel> = Arc::new(CapturingLm::new(captured.clone()));
    // Seed one chunk so the [근거] block has content and the LM is
    // actually invoked on the success path.
    let chunk_id = id32("c");
    let doc_id = id32("d");
    env.seed_chunk(&chunk_id, &doc_id, "a.md", "hello world", &["H"]);
    let hit = mk_hit(1, &chunk_id, &doc_id, "a.md", 0.9, &["H"]);
    let retriever: Arc<dyn Retriever> = Arc::new(MockRetriever::new(vec![hit]));
    let pipeline = RagPipeline::new(env.config.clone(), retriever, lm, env.sqlite.clone());
    (pipeline, captured, env)
}

#[test]
fn ask_with_rag_v1_uses_v1_system_prompt() {
    let (pipeline, captured, _env) = build_pipeline_with_template("rag-v1");
    let _ = pipeline.ask("hello", lexical_opts());
    let s = captured
        .lock()
        .unwrap()
        .clone()
        .expect("system prompt captured");
    assert!(
        s.contains("로컬 KB 위에서 동작"),
        "shared V1/V2 prefix expected, got: {s}"
    );
    assert!(
        !s.contains("학습 지식"),
        "V1 must NOT contain V2-only 학습 지식 rule, got: {s}"
    );
    assert!(
        !s.contains("확실하지 않다"),
        "V1 must NOT contain V2-only 확실하지 않다 rule, got: {s}"
    );
}

#[test]
fn ask_with_rag_v2_uses_v2_system_prompt() {
    let (pipeline, captured, _env) = build_pipeline_with_template("rag-v2");
    let _ = pipeline.ask("hello", lexical_opts());
    let s = captured
        .lock()
        .unwrap()
        .clone()
        .expect("system prompt captured");
    assert!(
        s.contains("학습 지식"),
        "V2 must contain 학습 지식 rule, got: {s}"
    );
    assert!(
        s.contains("확실하지 않다"),
        "V2 must contain 확실하지 않다 rule, got: {s}"
    );
    assert!(
        s.contains("큰따옴표"),
        "V2 must contain 큰따옴표 rule, got: {s}"
    );
}

#[test]
fn ask_with_unknown_template_returns_early_error() {
    let (pipeline, _captured, _env) = build_pipeline_with_template("rag-v99");
    let result = pipeline.ask("hello", lexical_opts());
    assert!(result.is_err(), "expected error on unknown version");
    let msg = format!("{:#}", result.unwrap_err());
    assert!(
        msg.contains("rag-v99") && msg.contains("expected"),
        "expected error to mention version + expected list, got: {msg}"
    );
}
```
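The three tests above pin a dispatch from `prompt_template_version` to a system-prompt constant, with unknown versions rejected early. The sketch below shows that shape, assuming a plain `match` and a `String` error; the pipeline's real error type is not shown here. The constant names, placeholder prompt bodies and message wording are hypothetical. The wording is chosen only so the message satisfies the `contains("rag-v99") && contains("expected")` check.

```rust
/// Placeholder prompt bodies; the real constants are Korean system prompts
/// whose text is not reproduced here.
pub const SYSTEM_PROMPT_V1: &str = "v1 system prompt (placeholder)";
pub const SYSTEM_PROMPT_V2: &str = "v2 system prompt (placeholder)";

const KNOWN_VERSIONS: &[&str] = &["rag-v1", "rag-v2"];

/// Map `prompt_template_version` to a system prompt. An unknown version
/// fails before any LM call, naming the bad value and the expected list.
pub fn select_system_prompt(version: &str) -> Result<&'static str, String> {
    match version {
        "rag-v1" => Ok(SYSTEM_PROMPT_V1),
        "rag-v2" => Ok(SYSTEM_PROMPT_V2),
        other => Err(format!(
            "unknown prompt_template_version {other:?}; expected one of: {}",
            KNOWN_VERSIONS.join(", ")
        )),
    }
}
```

Listing the known versions in the error makes a typo in the config diagnosable from the message alone.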