fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry

Root cause of the consistent `nli_model_unavailable` failure on the S3 dogfood query: the mDeBERTa-v3 tokenizer's `OnlyFirst` strategy combined with a 949-token hypothesis. The earlier char-budget-only fix did not cover extreme Korean (KR) token density, so this commit adds a token-count fallback retry and aligns trait dispatch with the RC1-residual finding.

Key changes:
- kebab-nli::NliVerifier: add trait method `hypothesis_token_count(&str) -> Result<usize>` (default `Ok(0)` for backward compatibility). `OnnxNliVerifier` overrides it with real mDeBERTa tokenization inside the *trait impl block*, which guarantees vtable registration (closes round-3 critic RC1-residual).
- kebab-rag::pipeline: consts `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` and `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150`, pure fn `pub(crate) fn truncate_chars`, and retry helper `pub fn truncate_hypothesis_for_nli_with_budget` (retries with a halved char budget; returns a graceful unavailable once the min floor is reached). The step 8.5 hook callsite uses an explicit `match` + `return self.refuse_nli_model_unavailable` pattern (`?` is forbidden; closes round-2 plan critic CRITICAL #1).
- New SpyNliVerifier helper (closures score_fn + hypothesis_token_count_fn, 2-arg constructor).
- Tests: 2 ignored tests from §5.1 (EN-long err + vtable dispatch RC1-residual pin), 4 boundary tests from §5.2 (truncate_chars), 3 mock multi-hop tests from §5.3 (long_en_grounded / long_kr_retries / unrelenting_fallback). +7 new tests (the 2 ignored ones are skipped by default).
- tasks/HOTFIXES.md: new dated entry `## 2026-05-26 — S3 NLI unavailable ...` with four blocks: Symptom / Root cause / Action / Amends.
- spec + plan (`docs/superpowers/{specs,plans}/2026-05-26-s3-nli-model-unavailable-diagnose-*.md`): artifacts of the 4-round spec and 3-round plan OMC reviewer ACCEPT.

Verification:
- cargo test -p kebab-nli -j 1 → 11/11 pass + 7 ignored (skipped by default).
- cargo test -p kebab-rag -j 1 → 19+3+3+... all pass, including 3 new mock + 4 new boundary tests.
- cargo test --workspace --no-fail-fast -j 1 → **1313 pass (+7 new)**, 0 failed. 0 regressions (HOTFIX #15 already fixed, no remaining flaky tests).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean (type_complexity allow on Arc<dyn Fn> type aliases).

KR safe (token-count retry path) + graceful fallback (at the min floor the existing unavailable wire is kept, 0 regressions). No wire impact (additive trait method). No Cargo version bump needed.

Refs:
- spec: docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md (4 round APPROVE: analyst → critic + verifier × 4 rounds)
- plan: docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md (3 round ACCEPT: planner → critic-plan + verifier-plan × 3 rounds)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
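
The halving retry described for `truncate_hypothesis_for_nli_with_budget` can be sketched roughly as follows. This is a minimal illustration and not the code in this commit: the two constants come from the commit message, while `ASSUMED_MAX_HYPOTHESIS_TOKENS`, the name `truncate_with_budget`, the `Option<String>` return type, and the closure-based token counter are assumptions made so the sketch stands alone.

```rust
// Values taken from the commit message.
const MAX_NLI_HYPOTHESIS_CHARS_INITIAL: usize = 1200;
const MAX_NLI_HYPOTHESIS_CHARS_MIN: usize = 150;
// Assumption: the real token limit is not stated in the commit message.
const ASSUMED_MAX_HYPOTHESIS_TOKENS: usize = 256;

/// Keep at most `max_chars` Unicode scalar values; never splits a character.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // Byte offset where character number `max_chars` starts.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string already fits within the budget.
        None => s,
    }
}

/// Hypothetical helper: halve the char budget until the token count fits.
/// `None` stands in for the graceful "NLI unavailable" outcome.
fn truncate_with_budget<F>(hypothesis: &str, token_count: F) -> Option<String>
where
    F: Fn(&str) -> Result<usize, String>,
{
    let mut budget = MAX_NLI_HYPOTHESIS_CHARS_INITIAL;
    loop {
        let candidate = truncate_chars(hypothesis, budget);
        match token_count(candidate) {
            Ok(n) if n <= ASSUMED_MAX_HYPOTHESIS_TOKENS => {
                return Some(candidate.to_string());
            }
            // Too many tokens: fall through and shrink the budget.
            Ok(_) => {}
            // Tokenizer failure: give up without propagating the error.
            Err(_) => return None,
        }
        if budget <= MAX_NLI_HYPOTHESIS_CHARS_MIN {
            // Already tried the floor; report unavailable.
            return None;
        }
        budget = (budget / 2).max(MAX_NLI_HYPOTHESIS_CHARS_MIN);
    }
}
```

Under these assumptions the budgets tried are 1200, 600, 300 and 150 characters. A dense script where each character costs about one token only fits at the 150-character floor, which is the case a fixed char budget alone would miss.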
2026-05-26 09:12:21 +00:00
parent 1a224bf983
commit 336962715a
10 changed files with 1654 additions and 4 deletions
--- a/crates/kebab-rag/tests/common/mod.rs
+++ b/crates/kebab-rag/tests/common/mod.rs
@@ -455,3 +455,52 @@ impl NliVerifier for MockNliVerifier {
        }
    }
 }
+
+/// S3 follow-up (2026-05-26): closure type aliases for `SpyNliVerifier`
+/// fields, to avoid clippy `type_complexity`.
+#[allow(clippy::type_complexity)]
+pub type ScoreFn = Arc<dyn Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync>;
+#[allow(clippy::type_complexity)]
+pub type HypothesisTokenCountFn = Arc<dyn Fn(&str) -> anyhow::Result<usize> + Send + Sync>;
+
+/// S3 follow-up (2026-05-26): closure-based NLI verifier. The caller supplies
+/// `(premise, hypothesis) -> Result<NliScores>` and `(hypothesis) ->
+/// Result<usize>` closures, and the spy captures the inputs. Sibling of the
+/// existing fixed-mode `MockNliVerifier`. Lets one verifier inject both the
+/// token-count simulation for the `truncate_hypothesis_for_nli_with_budget`
+/// retry loop and the final score behavior.
+pub struct SpyNliVerifier {
+    pub score_fn: ScoreFn,
+    pub hypothesis_token_count_fn: HypothesisTokenCountFn,
+    pub received_premises: Mutex<Vec<String>>,
+    pub received_hypotheses: Mutex<Vec<String>>,
+}
+
+impl SpyNliVerifier {
+    /// 2-arg constructor: both the score and hypothesis_token_count closures
+    /// are passed in at once, because fields behind an `Arc<T>` cannot be mutated later.
+    pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
+    where
+        F: Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync + 'static,
+        G: Fn(&str) -> anyhow::Result<usize> + Send + Sync + 'static,
+    {
+        Arc::new(Self {
+            score_fn: Arc::new(score_fn),
+            hypothesis_token_count_fn: Arc::new(token_count_fn),
+            received_premises: Mutex::new(Vec::new()),
+            received_hypotheses: Mutex::new(Vec::new()),
+        })
+    }
+}
+
+impl NliVerifier for SpyNliVerifier {
+    fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores> {
+        self.received_premises.lock().unwrap().push(premise.to_string());
+        self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
+        (self.score_fn)(premise, hypothesis)
+    }
+
+    fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> {
+        (self.hypothesis_token_count_fn)(hypothesis)
+    }
+}
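
A test might drive a spy like the one above through a trait object, roughly as in the sketch below. It is a standalone approximation, not the repository's test code: the `NliScores` fields, the `BoxErr` alias (standing in for `anyhow::Error`), the boxed closure fields and `spy_demo` are all assumptions introduced here so the block compiles on its own.

```rust
use std::sync::{Arc, Mutex};

// Assumption: stand-in for the real `NliScores`; field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct NliScores {
    pub entailment: f32,
    pub neutral: f32,
    pub contradiction: f32,
}

// Assumption: std-only error type used here in place of `anyhow::Error`.
type BoxErr = Box<dyn std::error::Error + Send + Sync>;

pub trait NliVerifier: Send + Sync {
    fn score(&self, premise: &str, hypothesis: &str) -> Result<NliScores, BoxErr>;
    // Default of `Ok(0)`, as described in the commit message.
    fn hypothesis_token_count(&self, _hypothesis: &str) -> Result<usize, BoxErr> {
        Ok(0)
    }
}

pub struct SpyNliVerifier {
    score_fn: Box<dyn Fn(&str, &str) -> Result<NliScores, BoxErr> + Send + Sync>,
    token_count_fn: Box<dyn Fn(&str) -> Result<usize, BoxErr> + Send + Sync>,
    pub received_hypotheses: Mutex<Vec<String>>,
}

impl SpyNliVerifier {
    pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
    where
        F: Fn(&str, &str) -> Result<NliScores, BoxErr> + Send + Sync + 'static,
        G: Fn(&str) -> Result<usize, BoxErr> + Send + Sync + 'static,
    {
        Arc::new(Self {
            score_fn: Box::new(score_fn),
            token_count_fn: Box::new(token_count_fn),
            received_hypotheses: Mutex::new(Vec::new()),
        })
    }
}

impl NliVerifier for SpyNliVerifier {
    fn score(&self, premise: &str, hypothesis: &str) -> Result<NliScores, BoxErr> {
        // Record the hypothesis before delegating to the injected closure.
        self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
        (self.score_fn)(premise, hypothesis)
    }

    // Overridden inside the trait impl, so calls through `dyn NliVerifier`
    // reach the injected closure instead of the `Ok(0)` default.
    fn hypothesis_token_count(&self, hypothesis: &str) -> Result<usize, BoxErr> {
        (self.token_count_fn)(hypothesis)
    }
}

/// Hypothetical driver: returns the token count seen through the trait
/// object and the hypotheses the spy captured.
pub fn spy_demo() -> (usize, Vec<String>) {
    let spy = SpyNliVerifier::new(
        |_premise, _hypothesis| {
            Ok(NliScores { entailment: 0.9, neutral: 0.05, contradiction: 0.05 })
        },
        // Simulate a tokenizer that charges one token per character.
        |hypothesis| Ok(hypothesis.chars().count()),
    );
    let verifier: Arc<dyn NliVerifier> = spy.clone();
    let tokens = verifier.hypothesis_token_count("abc").unwrap();
    verifier.score("premise", "abc").unwrap();
    let seen = spy.received_hypotheses.lock().unwrap().clone();
    (tokens, seen)
}
```

Calling `hypothesis_token_count` through `Arc<dyn NliVerifier>` is the point of the sketch: if the override were defined in an inherent `impl` block instead of the trait impl, the dynamic call would return the default `Ok(0)`.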