feat(rag): fb-41 PR-9c-2: pipeline integration + mock test + SKILL.md (★ NLI enabled for real)
Enables behavior on top of the wire surface from PR-9c-1: the step 8.5 hook in `ask_multi_hop` now actually runs NLI verification when `[rag] nli_threshold > 0`. This is the **first user-visible behavior change** in PR-9.

- crates/kebab-rag/src/pipeline.rs:
  - ask_multi_hop step 8.5 NLI hook (empty-answer guard + truncate_for_nli + verifier.score + verification field + refusal branch).
  - refuse_nli_verification helper (verification: Some(...) + RefusalReason::NliVerificationFailed).
  - refuse_nli_model_unavailable helper (verification: None + RefusalReason::NliModelUnavailable).
  - truncate_for_nli helper (module-level pub fn; chars-based budget of MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600 chars; _hypothesis is an unused placeholder and a candidate for the v0.18.1 token-budget update).
  - Removed the two #[allow(dead_code)] attributes left by PR-9c-1 (verifier field + with_verifier builder; the transitional sentence in the docs is cleaned up as well). Closes the N1 carry-forward from the round-1 PR-9c-1 review.
- crates/kebab-app/src/app.rs:
  - NliVerifier construction in App::open_with_config: when config.rag.nli_threshold > 0, call OnnxNliVerifier::new, wrap the result in Arc::new, and pass it to with_verifier during the subsequent RagPipeline initialization. Failures propagate with ? (the signature stays Result<Self>, so no caller needs to change).
  - Added the kebab-nli path dependency to kebab-app/Cargo.toml.
- crates/kebab-rag/tests/multi_hop.rs + tests/common/mod.rs:
  - MockNliVerifier (pass / fail / err constructors; score is instrumented with call_count).
  - multi_hop_nli_pass_keeps_grounded: entailment 0.9 → grounded=true, verification.nli_passed=true.
  - multi_hop_nli_fail_refuses: entailment 0.1 → refusal=NliVerificationFailed.
  - multi_hop_nli_disabled_skip_verify: threshold 0.0 → verification skipped, verification=None.
  - multi_hop_nli_model_unavailable_refuses: verifier Err → refusal=NliModelUnavailable.
  - multi_hop_truncate_for_nli_preserves_hypothesis: long premise is truncated, hypothesis is preserved.
- integrations/claude-code/kebab/SKILL.md: one paragraph of NLI guidance in the mcp__kebab__ask section (meaning of verification.nli_passed + threshold tuning + handling of the nli_verification_failed / nli_model_unavailable refusals).
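
The chars-based budget of `truncate_for_nli` listed above can be sketched as follows. This is a minimal illustration, not the code in pipeline.rs: the signature and the borrowed return type are assumptions, and only the helper name, the unused `_hypothesis` parameter, and `MAX_NLI_PREMISE_CHARS = 4 * 400` come from this commit message.

```rust
/// 4 * 400 = 1600 characters. The two factors are copied from the commit
/// message, which does not say what each of them stands for.
pub const MAX_NLI_PREMISE_CHARS: usize = 4 * 400;

/// Hypothetical sketch: cap the premise at `MAX_NLI_PREMISE_CHARS`
/// characters. `_hypothesis` is accepted but unused, mirroring the
/// placeholder described in the commit message.
pub fn truncate_for_nli<'a>(premise: &'a str, _hypothesis: &str) -> &'a str {
    // `char_indices().nth(n)` yields the byte offset of the (n+1)-th
    // character, so the slice always ends on a UTF-8 boundary.
    match premise.char_indices().nth(MAX_NLI_PREMISE_CHARS) {
        Some((byte_idx, _)) => &premise[..byte_idx],
        // Fewer than MAX_NLI_PREMISE_CHARS + 1 characters: nothing to cut.
        None => premise,
    }
}
```

Counting characters rather than bytes keeps multi-byte text (Korean, for example) from being sliced in the middle of a code point, which a plain `&premise[..1600]` would panic on.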
Verification:
- cargo test --workspace -j 1: the 5 new multi-hop tests pass, 0 regressions (the pre-existing kebab-mcp::tools_call_ask_multi_hop is flaky exactly as before).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings: clean.

Wire impact: *behavior wiring* for the schema change from PR-9c-1; the answer.v1.verification field is now filled in on both the multi-hop happy path and the refuse path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
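
The four outcomes pinned by the new tests (pass, fail, disabled, model unavailable) suggest a decision flow along the lines below. This is a hedged sketch only: `NliOutcome`, `nli_step`, and `FixedVerifier` are hypothetical names, the trait returns `Result<_, String>` instead of `anyhow::Result` to stay dependency-free, and both the `entailment >= threshold` comparison and the skip on an empty answer are assumptions that this commit does not confirm.

```rust
/// Stand-in for `kebab_nli::NliScores` (field names taken from the diff).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NliScores {
    pub entailment: f32,
    pub neutral: f32,
    pub contradiction: f32,
}

/// Stand-in for the verifier trait; the real one returns `anyhow::Result`.
pub trait NliVerifier {
    fn score(&self, premise: &str, hypothesis: &str) -> Result<NliScores, String>;
}

/// Hypothetical outcome type. The real pipeline records this through the
/// `verification` field and `RefusalReason` instead.
#[derive(Debug, PartialEq)]
pub enum NliOutcome {
    Skipped,
    Passed(NliScores),
    VerificationFailed(NliScores),
    ModelUnavailable,
}

/// Assumed shape of the step 8.5 decision.
pub fn nli_step(
    verifier: Option<&dyn NliVerifier>,
    threshold: f32,
    premise: &str,
    answer: &str,
) -> NliOutcome {
    // Skip when NLI is disabled, no verifier is attached, or there is
    // nothing to verify (assumed meaning of the empty-answer guard).
    let verifier = match verifier {
        Some(v) if threshold > 0.0 && !answer.trim().is_empty() => v,
        _ => return NliOutcome::Skipped,
    };
    match verifier.score(premise, answer) {
        Ok(s) if s.entailment >= threshold => NliOutcome::Passed(s),
        Ok(s) => NliOutcome::VerificationFailed(s),
        Err(_) => NliOutcome::ModelUnavailable,
    }
}

/// Hypothetical verifier that always returns the same result.
pub struct FixedVerifier(pub Result<NliScores, String>);

impl NliVerifier for FixedVerifier {
    fn score(&self, _premise: &str, _hypothesis: &str) -> Result<NliScores, String> {
        self.0.clone()
    }
}
```

Keeping the scores inside the failed variant mirrors the commit's distinction between `refuse_nli_verification` (verification: Some(...)) and `refuse_nli_model_unavailable` (verification: None).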
@@ -12,13 +12,14 @@

 #![allow(dead_code)]

-use std::sync::Arc;
+use std::sync::{Arc, Mutex};

 use kebab_config::Config;
 use kebab_core::{
     ChunkerVersion, ChunkId, Citation, DocumentId, IndexVersion, RetrievalDetail,
     Retriever, SearchHit, SearchMode, SearchQuery, WorkspacePath,
 };
+use kebab_nli::{NliScores, NliVerifier};
 use kebab_store_sqlite::SqliteStore;
 use rusqlite::params;
 use tempfile::TempDir;
@@ -384,3 +385,73 @@ impl kebab_core::LanguageModel for ScriptedLm {
         Ok(Box::new(chunks.into_iter().map(Ok)))
     }
 }
+
+/// p9-fb-41 PR-9c-2: mock NLI verifier for multi-hop step 8.5 tests.
+/// Three constructors mirror the test scenarios:
+/// - [`MockNliVerifier::pass`]: high entailment score (0.9), `nli_passed`
+///   is true at the production default threshold (0.5).
+/// - [`MockNliVerifier::fail`]: low entailment (0.1), refuses at any
+///   threshold > 0.1.
+/// - [`MockNliVerifier::err`]: returns an `anyhow::Error` so the pipeline
+///   surfaces `RefusalReason::NliModelUnavailable`.
+///
+/// `call_count` instrumented (Mutex-wrapped usize) so a test can assert
+/// the verifier ran the expected number of times, useful for pinning
+/// the "threshold = 0 skips verify" invariant when the verifier is
+/// nonetheless attached to the pipeline.
+pub struct MockNliVerifier {
+    pub mode: MockMode,
+    pub call_count: Mutex<usize>,
+}
+
+pub enum MockMode {
+    /// Return these scores. Used by pass / fail variants.
+    Scores(NliScores),
+    /// Return an `anyhow::Error`. Used by the err variant.
+    Err(String),
+}
+
+impl MockNliVerifier {
+    pub fn pass() -> Arc<Self> {
+        Arc::new(Self {
+            mode: MockMode::Scores(NliScores {
+                entailment: 0.9,
+                neutral: 0.07,
+                contradiction: 0.03,
+            }),
+            call_count: Mutex::new(0),
+        })
+    }
+
+    pub fn fail() -> Arc<Self> {
+        Arc::new(Self {
+            mode: MockMode::Scores(NliScores {
+                entailment: 0.1,
+                neutral: 0.4,
+                contradiction: 0.5,
+            }),
+            call_count: Mutex::new(0),
+        })
+    }
+
+    pub fn err() -> Arc<Self> {
+        Arc::new(Self {
+            mode: MockMode::Err("mock NLI unavailable".into()),
+            call_count: Mutex::new(0),
+        })
+    }
+
+    pub fn calls(&self) -> usize {
+        *self.call_count.lock().unwrap()
+    }
+}
+
+impl NliVerifier for MockNliVerifier {
+    fn score(&self, _premise: &str, _hypothesis: &str) -> anyhow::Result<NliScores> {
+        *self.call_count.lock().unwrap() += 1;
+        match &self.mode {
+            MockMode::Scores(s) => Ok(*s),
+            MockMode::Err(e) => anyhow::bail!("{e}"),
+        }
+    }
+}
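
The mock constructors return `Arc<Self>`, which lets a test keep one handle for `calls()` while handing a clone to the pipeline. The sketch below shows how that could pin the "threshold = 0 skips verify" invariant. Everything in it is a self-contained stand-in: `Verifier`, `CountingMock`, and `SketchPipeline` are hypothetical names rather than the kebab types, and the skip logic is an assumption about the pipeline, not a copy of it.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the verifier trait, reduced to one score.
pub trait Verifier {
    fn score(&self, premise: &str, hypothesis: &str) -> Result<f32, String>;
}

/// Mock that returns a fixed entailment score and counts its calls.
pub struct CountingMock {
    entailment: f32,
    call_count: Mutex<usize>,
}

impl CountingMock {
    pub fn new(entailment: f32) -> Arc<Self> {
        Arc::new(Self { entailment, call_count: Mutex::new(0) })
    }

    pub fn calls(&self) -> usize {
        *self.call_count.lock().unwrap()
    }
}

impl Verifier for CountingMock {
    fn score(&self, _premise: &str, _hypothesis: &str) -> Result<f32, String> {
        *self.call_count.lock().unwrap() += 1;
        Ok(self.entailment)
    }
}

/// Hypothetical pipeline stand-in with a builder-style `with_verifier`.
pub struct SketchPipeline {
    threshold: f32,
    verifier: Option<Arc<dyn Verifier>>,
}

impl SketchPipeline {
    pub fn new(threshold: f32) -> Self {
        Self { threshold, verifier: None }
    }

    pub fn with_verifier(mut self, verifier: Arc<dyn Verifier>) -> Self {
        self.verifier = Some(verifier);
        self
    }

    /// Returns `None` when verification is skipped or the verifier errors,
    /// otherwise whether the score cleared the threshold.
    pub fn check(&self, premise: &str, answer: &str) -> Option<bool> {
        if self.threshold <= 0.0 {
            return None; // disabled: the verifier must not be called
        }
        let verifier = self.verifier.as_ref()?;
        verifier.score(premise, answer).ok().map(|e| e >= self.threshold)
    }
}
```

With `let mock = CountingMock::new(0.9);`, a pipeline built as `SketchPipeline::new(0.0).with_verifier(mock.clone())` leaves `mock.calls()` at 0 after `check`, whereas one built with a threshold of 0.5 raises it to 1.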