fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry

Root cause of the consistent `nli_model_unavailable` failure on the S3 dogfood query: the mDeBERTa-v3 tokenizer's `OnlyFirst` truncation strategy combined with a 949-token hypothesis. The earlier char-budget-only fix did not cover extreme Korean (KR) token density, so this commit adds a token-count fallback retry and aligns the trait dispatch flagged as RC1-residual.
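
The failure mode described above can be modeled with a little arithmetic. The sketch below is a simplified model, not the tokenizer's actual code: it assumes the premise is the first sequence and the hypothesis the second, and it assumes `OnlyFirst` may shorten only the first sequence and reports an error when that sequence is too short to absorb the overflow. `only_first_outcome`, `TruncationOutcome`, the 512-token limit and the special-token count are all hypothetical values chosen for illustration.

```rust
/// Simplified model of an "only first" pair-truncation strategy.
/// Assumption: only the first sequence (the premise) may be shortened.
#[derive(Debug, PartialEq)]
pub enum TruncationOutcome {
    /// The pair already fits within `max_length`.
    Fits,
    /// The premise was shortened to this many tokens.
    PremiseTruncatedTo(usize),
    /// The premise is too short to absorb the overflow, so truncation fails.
    SequenceTooShort,
}

pub fn only_first_outcome(
    premise_tokens: usize,
    hypothesis_tokens: usize,
    special_tokens: usize, // hypothetical count of separator/class tokens
    max_length: usize,     // hypothetical model limit
) -> TruncationOutcome {
    let total = premise_tokens + hypothesis_tokens + special_tokens;
    if total <= max_length {
        return TruncationOutcome::Fits;
    }
    let to_remove = total - max_length;
    if premise_tokens > to_remove {
        TruncationOutcome::PremiseTruncatedTo(premise_tokens - to_remove)
    } else {
        // The hypothesis alone overflows the limit; shrinking the premise cannot help.
        TruncationOutcome::SequenceTooShort
    }
}
```

Under these assumptions a 949-token hypothesis can never fit, whatever the premise length, which is consistent with the failure being deterministic rather than flaky.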

Key changes:
- kebab-nli::NliVerifier: add trait method `hypothesis_token_count(&str) -> Result<usize>` (default `Ok(0)` for backward compatibility). `OnnxNliVerifier` overrides it with real mDeBERTa tokenization inside the *trait impl block*, which guarantees vtable registration (closes round-3 critic RC1-residual).
- kebab-rag::pipeline: add consts `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` and `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150`, the pure function `pub(crate) fn truncate_chars`, and the retry helper `pub fn truncate_hypothesis_for_nli_with_budget` (halves the char budget on each retry; at the min floor it falls back gracefully to unavailable). The step 8.5 hook callsite uses an explicit `match` + `return self.refuse_nli_model_unavailable` pattern (`?` is forbidden; closes round-2 plan critic CRITICAL #1).
- SpyNliVerifier: new helper (closure score_fn + hypothesis_token_count_fn, 2-arg constructor).
- Tests: 2 ignored tests from §5.1 (EN-long err + vtable dispatch RC1-residual pin) + 4 boundary tests from §5.2 (truncate_chars) + 3 mock multi-hop tests from §5.3 (long_en_grounded / long_kr_retries / unrelenting_fallback). +7 new tests (2 ignored, skipped by default).
- tasks/HOTFIXES.md: new dated entry `## 2026-05-26 — S3 NLI unavailable ...` with four blocks: Symptom / Root cause / Action / Amends.
- spec + plan (`docs/superpowers/{specs,plans}/2026-05-26-s3-nli-model-unavailable-diagnose-*.md`): the artifacts of 4 spec rounds + 3 plan rounds, each ending in OMC reviewer ACCEPT.
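
The halving retry described in the kebab-rag::pipeline item can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `TokenCounter`, `DefaultCounter`, `DenseCounter`, `truncate_with_budget_sketch` and `ASSUMED_MAX_HYPOTHESIS_TOKENS` are hypothetical names, the 256-token limit is an assumed value, and the real `truncate_chars` and `truncate_hypothesis_for_nli_with_budget` signatures may differ. Only the two `MAX_NLI_HYPOTHESIS_CHARS_*` values come from the commit message.

```rust
/// Hypothetical stand-in for the one `NliVerifier` method this sketch needs.
pub trait TokenCounter {
    /// The default mirrors the backward-compatible `Ok(0)` from the commit message.
    fn hypothesis_token_count(&self, _hypothesis: &str) -> Result<usize, String> {
        Ok(0)
    }
}

pub const MAX_NLI_HYPOTHESIS_CHARS_INITIAL: usize = 1200;
pub const MAX_NLI_HYPOTHESIS_CHARS_MIN: usize = 150;
/// Assumed token limit; the real threshold is not stated in the commit message.
pub const ASSUMED_MAX_HYPOTHESIS_TOKENS: usize = 256;

/// Keep at most `max_chars` Unicode scalar values without splitting a code point.
pub fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}

/// Halve the char budget until the token count fits.
/// `Ok(None)` means the min floor was reached and the caller should report "unavailable".
pub fn truncate_with_budget_sketch<'a>(
    counter: &dyn TokenCounter,
    hypothesis: &'a str,
) -> Result<Option<&'a str>, String> {
    let mut budget = MAX_NLI_HYPOTHESIS_CHARS_INITIAL;
    loop {
        let candidate = truncate_chars(hypothesis, budget);
        if counter.hypothesis_token_count(candidate)? <= ASSUMED_MAX_HYPOTHESIS_TOKENS {
            return Ok(Some(candidate));
        }
        if budget <= MAX_NLI_HYPOTHESIS_CHARS_MIN {
            return Ok(None);
        }
        budget = (budget / 2).max(MAX_NLI_HYPOTHESIS_CHARS_MIN);
    }
}

/// Counter that keeps the trait default (always reports 0 tokens).
pub struct DefaultCounter;
impl TokenCounter for DefaultCounter {}

/// Hypothetical dense tokenizer: a fixed number of tokens per char.
pub struct DenseCounter(pub usize);
impl TokenCounter for DenseCounter {
    fn hypothesis_token_count(&self, hypothesis: &str) -> Result<usize, String> {
        Ok(hypothesis.chars().count() * self.0)
    }
}
```

In this sketch the budgets tried are 1200, 600, 300 and 150 chars. A verifier that keeps the default `Ok(0)` always passes on the first attempt, so it behaves like the earlier char-budget-only path. At the callsite, the commit message asks for an explicit `match` with an early `return` rather than `?`, so that a counting error maps to the refusal path instead of propagating.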

Verification:
- cargo test -p kebab-nli -j 1 → 11/11 pass, 7 ignored (skipped by default).
- cargo test -p kebab-rag -j 1 → 19+3+3+... all pass, including 3 new mock + 4 new boundary tests.
- cargo test --workspace --no-fail-fast -j 1 → **1313 pass (+7 new)**, 0 failed. 0 regressions (HOTFIX #15 already fixed, no remaining flaky tests).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean (type_complexity allow on Arc<dyn Fn> type aliases).

KR safe (token-count retry path) + graceful fallback (at the min floor the existing unavailable wire behavior is kept, 0 regressions). No wire impact (additive trait method). No Cargo version bump needed.
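
Why an additive trait method is backward compatible, and why the override has to live in the trait impl block, can be shown with a small sketch. This is one plausible reading of the RC1-residual concern, not a description of the actual review finding. `Verifier`, `LegacyVerifier`, `InherentOnlyVerifier`, `OverridingVerifier` and `count_via_dyn` are hypothetical names, and whitespace splitting stands in for real tokenization.

```rust
/// Hypothetical trait with an additive, defaulted method.
pub trait Verifier {
    fn hypothesis_token_count(&self, _hypothesis: &str) -> Result<usize, String> {
        Ok(0)
    }
}

/// Existing implementor: compiles unchanged and inherits the default.
pub struct LegacyVerifier;
impl Verifier for LegacyVerifier {}

/// Same method name, but defined as an inherent method only.
/// Calls through `dyn Verifier` never reach it.
pub struct InherentOnlyVerifier;
impl InherentOnlyVerifier {
    pub fn hypothesis_token_count(&self, hypothesis: &str) -> usize {
        hypothesis.split_whitespace().count()
    }
}
impl Verifier for InherentOnlyVerifier {}

/// Override placed in the trait impl block, so it is part of the vtable.
pub struct OverridingVerifier;
impl Verifier for OverridingVerifier {
    fn hypothesis_token_count(&self, hypothesis: &str) -> Result<usize, String> {
        Ok(hypothesis.split_whitespace().count())
    }
}

/// Mimics a pipeline that only holds a trait object.
pub fn count_via_dyn(verifier: &dyn Verifier, hypothesis: &str) -> usize {
    verifier.hypothesis_token_count(hypothesis).unwrap_or(0)
}
```

Through `dyn Verifier`, the inherent-only variant silently reports 0 while the overriding variant reports the real count, which is the kind of mismatch a vtable dispatch pin test would catch.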

Refs:
- spec: docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md (4 round APPROVE — analyst → critic + verifier × 4 rounds)
- plan: docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md (3 round ACCEPT — planner → critic-plan + verifier-plan × 3 rounds)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 09:12:21 +00:00
parent 1a224bf983
commit 336962715a
10 changed files with 1654 additions and 4 deletions

@@ -455,3 +455,52 @@ impl NliVerifier for MockNliVerifier {
}
}
}
/// S3 follow-up (2026-05-26): closure type aliases for the `SpyNliVerifier`
/// fields, to work around clippy `type_complexity`.
#[allow(clippy::type_complexity)]
pub type ScoreFn = Arc<dyn Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync>;
#[allow(clippy::type_complexity)]
pub type HypothesisTokenCountFn = Arc<dyn Fn(&str) -> anyhow::Result<usize> + Send + Sync>;
/// S3 follow-up (2026-05-26): closure-based NLI verifier. The caller can
/// define two closures, `(premise, hypothesis) -> Result<NliScores>` and
/// `(hypothesis) -> Result<usize>`, and the spy captures the inputs. A sibling
/// of the existing `MockNliVerifier` (fixed mode). It exists so that a single
/// verifier can inject both the token-count simulation for the
/// `truncate_hypothesis_for_nli_with_budget` retry loop and the final score
/// behavior.
pub struct SpyNliVerifier {
    pub score_fn: ScoreFn,
    pub hypothesis_token_count_fn: HypothesisTokenCountFn,
    pub received_premises: Mutex<Vec<String>>,
    pub received_hypotheses: Mutex<Vec<String>>,
}
impl SpyNliVerifier {
    /// 2-arg constructor: both score and hypothesis_token_count are injected
    /// as closures. They are passed together up front because fields behind
    /// an `Arc<T>` cannot be mutated afterwards.
    pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
    where
        F: Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync + 'static,
        G: Fn(&str) -> anyhow::Result<usize> + Send + Sync + 'static,
    {
        Arc::new(Self {
            score_fn: Arc::new(score_fn),
            hypothesis_token_count_fn: Arc::new(token_count_fn),
            received_premises: Mutex::new(Vec::new()),
            received_hypotheses: Mutex::new(Vec::new()),
        })
    }
}
impl NliVerifier for SpyNliVerifier {
    fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores> {
        self.received_premises.lock().unwrap().push(premise.to_string());
        self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
        (self.score_fn)(premise, hypothesis)
    }
    fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> {
        (self.hypothesis_token_count_fn)(hypothesis)
    }
}
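
A usage sketch of the closure-based spy, for orientation only. To stay dependency-free it replaces `anyhow::Result` with `Result<_, String>` and redefines a reduced version of the types, so it mirrors the shape of the code above rather than reproducing it. The `NliScores` fields, `SketchResult`, `dense_kr_spy`, `captured_hypotheses` and the 3-tokens-per-char density are assumptions made for illustration.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `anyhow::Result`.
pub type SketchResult<T> = Result<T, String>;

/// Assumed shape; the real `NliScores` fields are not shown in this hunk.
#[derive(Debug, Clone, PartialEq)]
pub struct NliScores {
    pub entailment: f32,
    pub neutral: f32,
    pub contradiction: f32,
}

pub trait NliVerifier {
    fn score(&self, premise: &str, hypothesis: &str) -> SketchResult<NliScores>;
    fn hypothesis_token_count(&self, _hypothesis: &str) -> SketchResult<usize> {
        Ok(0)
    }
}

pub struct SpyNliVerifier {
    score_fn: Arc<dyn Fn(&str, &str) -> SketchResult<NliScores> + Send + Sync>,
    token_count_fn: Arc<dyn Fn(&str) -> SketchResult<usize> + Send + Sync>,
    received_hypotheses: Mutex<Vec<String>>,
}

impl SpyNliVerifier {
    pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
    where
        F: Fn(&str, &str) -> SketchResult<NliScores> + Send + Sync + 'static,
        G: Fn(&str) -> SketchResult<usize> + Send + Sync + 'static,
    {
        Arc::new(Self {
            score_fn: Arc::new(score_fn),
            token_count_fn: Arc::new(token_count_fn),
            received_hypotheses: Mutex::new(Vec::new()),
        })
    }

    /// Snapshot of every hypothesis passed to `score` so far.
    pub fn captured_hypotheses(&self) -> Vec<String> {
        self.received_hypotheses.lock().unwrap().clone()
    }
}

impl NliVerifier for SpyNliVerifier {
    fn score(&self, premise: &str, hypothesis: &str) -> SketchResult<NliScores> {
        self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
        (self.score_fn)(premise, hypothesis)
    }
    fn hypothesis_token_count(&self, hypothesis: &str) -> SketchResult<usize> {
        (self.token_count_fn)(hypothesis)
    }
}

/// Spy simulating a dense tokenizer (assumed 3 tokens per char) with a fixed score.
pub fn dense_kr_spy() -> Arc<SpyNliVerifier> {
    SpyNliVerifier::new(
        |_premise, _hypothesis| {
            Ok(NliScores { entailment: 0.9, neutral: 0.05, contradiction: 0.05 })
        },
        |hypothesis| Ok(hypothesis.chars().count() * 3),
    )
}
```

A test can then drive the retry loop with the token-count closure and afterwards assert on `captured_hypotheses()` to check which truncated hypothesis finally reached `score`.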