kebab/crates/kebab-rag/tests/multi_hop_nli_stream.rs
altair823 7c27633df2 chore(rag): post-PR9 refactor — H1/H2/H3/D/E + test coverage + post-refactor dogfood retest
Architectural cleanup by the OMC team `post-pr9-refactor`. After the architect priorities analysis, the executor and test-engineer made the file edits, and the system-architect's component-level review concluded *pre-cut nothing; defer everything to v0.18.1+*.

## Executor work (H1/H2/H3/D/E)

- **H1** (kebab-nli/src/onnx.rs): Activate the `[models.nli]` config wiring. Remove the `DEFAULT_MODEL_ID` const (NliCfg::defaults in kebab-config is the single source). OnnxNliVerifier::new reads config.models.nli.model and calls anyhow::bail unless config.models.nli.provider is "onnx". Remove 3 stale "PR-9c-1 will wire this" comments. Add 2 unit tests (`new_uses_config_model_id`, `new_rejects_unsupported_provider`).
- **H2** (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. Remove the v0.18.1 placeholder doc. Update 4 callsites (tests/multi_hop.rs) and rename the test `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (to match the contract).
- **H3** (kebab-rag/src/pipeline.rs:1041): Surface `was_truncated` via tracing::debug! (adds observability; signature preserved, per the caller logging contract).
- **D** (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (stability on slow CI); remove the `mh_code` discriminator. Dispatch contract = `mh.is_error.unwrap_or(false)` (the existing assertion is sufficient).
- **E** (tasks/HOTFIXES.md + pipeline.rs:1633-1638): Add a "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" subsection as a sibling of the fb-41 PR-9 closure entry. In the pipeline, replace "cleanup deferred to a follow-up" with a "// See tasks/HOTFIXES.md ... for follow-up" cross-link.
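
A minimal sketch of the char-budget contract that H2 and H3 describe, assuming the function hands back the truncated premise together with a `was_truncated` flag. `MAX_PREMISE_CHARS`, `truncate_chars`, and the tuple return type are hypothetical: this commit message does not show the real budget or signature in pipeline.rs, only that the flag is surfaced through tracing::debug!.

```rust
/// Hypothetical budget; the real limit in pipeline.rs is not stated here.
const MAX_PREMISE_CHARS: usize = 2000;

/// Keep at most `max_chars` Unicode scalar values of `premise`.
/// Returns the (possibly shortened) slice and whether anything was cut.
fn truncate_chars(premise: &str, max_chars: usize) -> (&str, bool) {
    // `nth(max_chars)` yields the first char past the budget, if any;
    // its byte offset is a valid char boundary to slice at.
    match premise.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => (&premise[..byte_idx], true),
        None => (premise, false),
    }
}

/// Single-argument form, mirroring the post-refactor shape
/// `truncate_for_nli(premise: &str)` (return type assumed).
fn truncate_for_nli(premise: &str) -> (&str, bool) {
    truncate_chars(premise, MAX_PREMISE_CHARS)
}
```

Counting chars rather than bytes keeps the cut on a valid UTF-8 boundary, which is presumably why the renamed test speaks of a char budget. A caller would log the boolean and pass only the slice on to the verifier.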

## Test-engineer work (T1/T2/T3/T4, 9 new tests)

- **T1** (kebab-nli/src/onnx.rs::tests): 3 unit tests for sanitize_model_id (replaces_slash / idempotent / leaves_other_chars).
- **T2** (kebab-rag/tests/multi_hop_nli_panic.rs, new): 2 panic-path tests: a #[should_panic] test for the facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) plus a threshold=0 companion.
- **T3** (kebab-rag/tests/multi_hop_nli_stream.rs, new): 2 StreamEvent::Final tests that pin the wire shape of the stream_sink Final branch in refuse_nli_verification and refuse_nli_model_unavailable.
- **T4** (kebab-app/tests/open_with_config_nli.rs, new): 2 NLI failure paths: when model_dir is unwritable, App::open_with_config returns Err from Result<App> (with "OnnxNliVerifier" in the chain); when threshold=0, the verifier is skipped gracefully.
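
The three properties named in T1 can be illustrated with a small sketch. The body below is an assumption: the commit only names the function and its test cases, so the choice of `__` as the replacement for `/` is hypothetical and the real implementation in onnx.rs may differ.

```rust
/// Make a model id safe to use as a single path component.
/// Assumption: every `/` becomes `__`; the real replacement is not shown
/// in this commit.
fn sanitize_model_id(model_id: &str) -> String {
    model_id.replace('/', "__")
}
```

Because the output never contains `/`, applying the function twice gives the same result as applying it once (idempotent), and any id without a slash passes through unchanged.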

## System-architect conclusion

Analysis through 3 lenses (absorption / duplication / under-engineered interface) concluded *pre-cut nothing*. All top-3 items are deferred to v0.18.1+:
- Lens 1: kebab-normalize + kebab-parse-types can be absorbed (only parse-md uses them; the 5 parsers bypass them) → v0.18.1+.
- Lens 3: dead polymorphism in the Extractor + Chunker traits (every callsite is hardcoded) → v0.18.1+.
- Lens 1 bundled: kebab-source-fs drags in kebab-parse-code's 9 tree-sitter grammars → low-risk dep-graph win, bundled into v0.18.1+.
- Defer-with-intent: LanguageModel async refactor (when a cloud LLM is added), NliVerifier::score_batch + typed NliError (when a 2nd impl arrives), compute_stale → kebab-core::stale.

Reports: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (both outside the repo; kept to preserve the analysis).

## Verification

- cargo test -p kebab-nli -j 1 → 11/11 pass.
- cargo test -p kebab-rag -j 1 → 75/75 pass (including 5 NLI multi-hop + 4 new T2/T3 tests).
- cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (including T4's 2).
- cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, unrelated to executor fix D; the base assertion at line 86 fails because the fixture is missing).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
- cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + the same 1 pre-existing flaky.
- **Post-refactor dogfood retest byte-identical** (all 3 runs: PR-9d / post-cleanup / post-refactor): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable.

Added a "Post-architectural-refactor retest" section to docs/dogfood/v0.18.0/SUMMARY.md.

Wire impact: none.
Behavior impact: none (H1's config wiring resolves to the same model as the default → byte-identical).
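
Why H1 leaves behavior unchanged can be sketched as follows: once the constant is gone, the model id comes from the config, and an untouched config still yields the default. Everything here is a hypothetical stand-in: the `NliCfg` struct shape, the use of `Default`, the placeholder model id, and the `String` error (the commit says the real code uses anyhow::bail) are assumptions for illustration only.

```rust
/// Hypothetical stand-in for kebab-config's NliCfg; field names follow the
/// `config.models.nli.{model,provider}` paths named in the commit message.
#[derive(Debug, Clone, PartialEq)]
struct NliCfg {
    provider: String,
    model: String,
}

impl Default for NliCfg {
    fn default() -> Self {
        Self {
            provider: "onnx".to_string(),
            // Placeholder id; the real default is not stated in this commit.
            model: "example-org/example-nli".to_string(),
        }
    }
}

/// Resolve the model id from config, rejecting any provider other than
/// "onnx". A plain `String` error keeps the sketch dependency-free.
fn resolve_model_id(cfg: &NliCfg) -> Result<&str, String> {
    if cfg.provider != "onnx" {
        return Err(format!("unsupported NLI provider: {}", cfg.provider));
    }
    Ok(&cfg.model)
}
```

With a default config the resolved id equals the default, so scores computed before and after the change come from the same model; only a config that names a different provider or model changes the outcome.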

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 04:42:37 +00:00


//! Tests that the NLI refusal paths emit a `StreamEvent::Final` event
//! into `opts.stream_sink`.
//!
//! Coverage:
//!   1. `nli_verification_fail_emits_final_stream_event_with_refusal`:
//!      `MockNliVerifier::fail()` → `RefusalReason::NliVerificationFailed`
//!      arrives as the payload of the terminal `StreamEvent::Final`.
//!   2. `nli_model_unavailable_emits_final_stream_event_with_refusal`:
//!      `MockNliVerifier::err()` → `RefusalReason::NliModelUnavailable`
//!      arrives as the payload of the terminal `StreamEvent::Final`.
//!
//! Note: `ask_multi_hop` does NOT have a separate `_streaming` entrypoint.
//! Streaming is handled by the `stream_sink: Option<Sender<StreamEvent>>`
//! field on `AskOpts`. Both refusal helpers (`refuse_nli_verification` and
//! `refuse_nli_model_unavailable`) fire `sink.send(StreamEvent::Final { … })`
//! before returning; these tests pin that wire shape.

mod common;

use std::sync::Arc;
use std::sync::mpsc;

use common::{MockNliVerifier, RagEnv, ScriptedLm, ScriptedRetriever, id32, mk_hit};
use kebab_core::{LanguageModel, RefusalReason, Retriever, SearchMode};
use kebab_nli::NliVerifier;
use kebab_rag::{AskOpts, RagPipeline, StreamEvent};

// ── shared helpers ─────────────────────────────────────────────────────────

/// Build the minimal happy-path scenario that reaches step 8.5 (NLI gate):
/// probe passes, decompose → one sub-query, decide → stop, synthesize →
/// non-empty answer. Returns the env, scripted retriever, and scripted LM.
fn happy_env_for_stream() -> (RagEnv, Arc<ScriptedRetriever>, Arc<ScriptedLm>) {
    let env = RagEnv::new();
    let cid = id32("c1");
    let did = id32("d1");
    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
    // Entry 0 = probe, entry 1 = decompose-driven retrieve.
    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
    let lm = Arc::new(ScriptedLm::new(vec![
        r#"["q1"]"#,        // decompose
        r"[]",              // decide: stop
        "answer body [#1]", // synthesize: non-empty so NLI gate runs
    ]));
    (env, retriever, lm)
}

/// Multi-hop `AskOpts` with a `stream_sink` wired in so every pipeline
/// stage emits `StreamEvent`s into `tx`.
fn multi_hop_opts_with_sink(tx: mpsc::Sender<StreamEvent>) -> AskOpts {
    AskOpts {
        k: 5,
        explain: false,
        mode: SearchMode::Lexical,
        temperature: Some(0.0),
        seed: Some(0),
        stream_sink: Some(tx),
        history: Vec::new(),
        conversation_id: None,
        turn_index: None,
        multi_hop: true,
    }
}

/// Drain `rx` and return the first `StreamEvent::Final` found, panicking
/// with a clear message if none is present.
///
/// `try_iter` never blocks, so call this only after `ask` has returned and
/// its events are already queued.
fn expect_final_event(rx: mpsc::Receiver<StreamEvent>) -> StreamEvent {
    let events: Vec<StreamEvent> = rx.try_iter().collect();
    events
        .into_iter()
        .find(|e| matches!(e, StreamEvent::Final { .. }))
        .expect("pipeline must emit at least one StreamEvent::Final")
}

// ── 1. NliVerificationFailed ───────────────────────────────────────────────

#[test]
fn nli_verification_fail_emits_final_stream_event_with_refusal() {
    let (env, retriever, lm) = happy_env_for_stream();
    let mut cfg = env.config.clone();
    cfg.rag.nli_threshold = 0.5; // entailment 0.1 < 0.5 → refusal

    let retriever_dyn: Arc<dyn Retriever> = retriever;
    let lm_dyn: Arc<dyn LanguageModel> = lm;
    let verifier = MockNliVerifier::fail(); // entailment score = 0.1
    let verifier_dyn: Arc<dyn NliVerifier> = verifier;

    let (tx, rx) = mpsc::channel::<StreamEvent>();
    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
        .with_verifier(verifier_dyn);
    let answer = pipeline
        .ask("compound", multi_hop_opts_with_sink(tx))
        .expect("pipeline returns Ok even on NLI refusal");

    // Synchronous return value.
    assert_eq!(
        answer.refusal_reason,
        Some(RefusalReason::NliVerificationFailed),
        "return value must carry NliVerificationFailed"
    );
    assert!(!answer.grounded, "NLI refusal must not be grounded");

    // Stream wire shape: terminal Final event must carry matching refusal.
    let final_event = expect_final_event(rx);
    match final_event {
        StreamEvent::Final { answer: streamed } => {
            assert_eq!(
                streamed.refusal_reason,
                Some(RefusalReason::NliVerificationFailed),
                "Final event's answer must carry NliVerificationFailed"
            );
            assert!(!streamed.grounded);
            // verification summary is stamped even on the refusal path.
            let v = streamed
                .verification
                .expect("NliVerificationFailed carries a VerificationSummary");
            assert!(!v.nli_passed);
            assert!((v.nli_score - 0.1).abs() < 1e-5, "score: {}", v.nli_score);
        }
        other => panic!("expected StreamEvent::Final, got {other:?}"),
    }
}

// ── 2. NliModelUnavailable ─────────────────────────────────────────────────

#[test]
fn nli_model_unavailable_emits_final_stream_event_with_refusal() {
    let (env, retriever, lm) = happy_env_for_stream();
    let mut cfg = env.config.clone();
    cfg.rag.nli_threshold = 0.5; // gate enabled; verifier will error

    let retriever_dyn: Arc<dyn Retriever> = retriever;
    let lm_dyn: Arc<dyn LanguageModel> = lm;
    let verifier = MockNliVerifier::err(); // returns anyhow::Error
    let verifier_dyn: Arc<dyn NliVerifier> = verifier;

    let (tx, rx) = mpsc::channel::<StreamEvent>();
    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
        .with_verifier(verifier_dyn);
    let answer = pipeline
        .ask("compound", multi_hop_opts_with_sink(tx))
        .expect("pipeline returns Ok even when NLI model is unavailable");

    // Synchronous return value.
    assert_eq!(
        answer.refusal_reason,
        Some(RefusalReason::NliModelUnavailable),
        "return value must carry NliModelUnavailable"
    );
    assert!(!answer.grounded);
    // verification is None: we can't summarize what didn't happen.
    assert!(
        answer.verification.is_none(),
        "NliModelUnavailable must leave Answer.verification = None"
    );

    // Stream wire shape: terminal Final event must carry matching refusal.
    let final_event = expect_final_event(rx);
    match final_event {
        StreamEvent::Final { answer: streamed } => {
            assert_eq!(
                streamed.refusal_reason,
                Some(RefusalReason::NliModelUnavailable),
                "Final event's answer must carry NliModelUnavailable"
            );
            assert!(!streamed.grounded);
            assert!(
                streamed.verification.is_none(),
                "NliModelUnavailable: verification must be None in the streamed Final event"
            );
        }
        other => panic!("expected StreamEvent::Final, got {other:?}"),
    }
}