From 336962715a452f0ae6fcb7f09d3c75f3c8731cbd Mon Sep 17 00:00:00 2001
From: altair823 <dlsrks0734@gmail.com>
Date: Tue, 26 May 2026 09:12:21 +0000
Subject: [PATCH] =?UTF-8?q?fix(rag):=20S3=20NLI=20unavailable=20=E2=80=94?=
 =?UTF-8?q?=20hypothesis=20char=20budget=20+=20token-count=20fallback=20re?=
 =?UTF-8?q?try?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The S3 dogfood query failed consistently with `nli_model_unavailable`. Root cause: the mDeBERTa-v3 tokenizer's `OnlyFirst` truncation strategy combined with a 949-token hypothesis. The earlier char-budget-only fix did not cover extreme-density KR text, so this patch adds a token-count fallback retry and aligns trait dispatch with the RC1-residual finding.

Key changes:
- kebab-nli::NliVerifier: add the trait method `hypothesis_token_count(&str) -> Result<usize>` (default `Ok(0)` for backward compatibility). `OnnxNliVerifier` overrides it *inside the trait impl block* with a real mDeBERTa tokenize call, which guarantees vtable registration (round-3 critic RC1-residual closure).
- kebab-rag::pipeline: add the consts `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` and `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150`, the pure fn `pub(crate) fn truncate_chars`, and the retry helper `pub fn truncate_hypothesis_for_nli_with_budget` (halves the char budget on each retry; once the budget falls below the min floor it returns an error so the caller can refuse gracefully as unavailable). The step 8.5 hook callsite uses an explicit `match` + `return self.refuse_nli_model_unavailable` (`?` is forbidden; round-2 plan critic CRITICAL #1 closure).
- New `SpyNliVerifier` test helper (closures `score_fn` + `hypothesis_token_count_fn`, 2-arg constructor).
- Tests: 2 ignored tests from §5.1 (EN-long err + vtable dispatch RC1-residual pin), 4 boundary tests from §5.2 (`truncate_chars`), and 3 mock multi-hop tests from §5.3 (long_en_grounded / long_kr_retries / unrelenting_fallback). That is 9 new tests in total: 7 run by default and 2 are `#[ignore]`.
- tasks/HOTFIXES.md: new dated entry `## 2026-05-26 — S3 NLI unavailable ...` with the 4 blocks Symptom / Root cause / Action / Amends.
- spec + plan (`docs/superpowers/{specs,plans}/2026-05-26-s3-nli-model-unavailable-diagnose-*.md`): outputs of the 4-round spec review and the 3-round plan review, accepted by the OMC reviewers.
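
The RC1-residual placement rule above can be illustrated with a minimal, self-contained sketch. The `Verifier` trait, the two structs, and the whitespace-based counter are hypothetical stand-ins (not the kebab-nli types); the point is only that a method defined in an inherent impl is invisible to `&dyn` dispatch, which then falls back to the trait default.

```rust
/// Hypothetical stand-in for a verifier trait with a backward-compatible default.
trait Verifier {
    fn hypothesis_token_count(&self, _hypothesis: &str) -> Result<usize, String> {
        Ok(0) // default: "no tokens", mirrors the patch's `Ok(0)`
    }
}

/// Counter placed ONLY in the inherent impl (the buggy placement).
struct InherentOnly;

impl InherentOnly {
    fn hypothesis_token_count(&self, hypothesis: &str) -> Result<usize, String> {
        // Assumption for illustration: whitespace split instead of a real tokenizer.
        Ok(hypothesis.split_whitespace().count())
    }
}

impl Verifier for InherentOnly {} // the vtable still holds the default

/// Counter placed inside the trait impl block (the correct placement).
struct Overriding;

impl Verifier for Overriding {
    fn hypothesis_token_count(&self, hypothesis: &str) -> Result<usize, String> {
        Ok(hypothesis.split_whitespace().count())
    }
}

/// Calls through a trait object, as a pipeline holding `dyn Verifier` would.
fn count_via_dyn(v: &dyn Verifier, text: &str) -> usize {
    v.hypothesis_token_count(text).unwrap()
}

fn main() {
    let text = "short english test sentence";
    // A call on the concrete type resolves to the inherent method.
    assert_eq!(InherentOnly.hypothesis_token_count(text).unwrap(), 4);
    // Through `&dyn`, the inherent method is never reached.
    assert_eq!(count_via_dyn(&InherentOnly, text), 0);
    // The trait-impl override is reached through `&dyn`.
    assert_eq!(count_via_dyn(&Overriding, text), 4);
}
```

This is why the regression test in this patch calls through `&dyn NliVerifier` rather than on the concrete type.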

Verification:
- cargo test -p kebab-nli -j 1 → 11/11 pass, 7 ignored (skipped by default).
- cargo test -p kebab-rag -j 1 → 19+3+3+... all pass, including the 3 new mock tests and the 4 new boundary tests.
- cargo test --workspace --no-fail-fast -j 1 → **1313 pass (+7 new)**, 0 failed. No regressions (HOTFIX #15 already fixed, no remaining flaky tests).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean (type_complexity allow on Arc<dyn Fn> type aliases).

KR-safe (token-count retry path) with a graceful fallback (below the min floor the existing unavailable wire shape is kept, so no regressions). No wire impact (additive trait method). No Cargo version bump needed.
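
As a rough illustration of the halving schedule (1200 → 600 → 300 → 150, then give up), here is a standalone sketch. The constants mirror the values in this patch, but `fit_hypothesis` and the closure-based token counters are hypothetical simplifications: the real helper probes the verifier's tokenizer and returns an `anyhow` error instead of `None`.

```rust
const INITIAL_CHARS: usize = 1200; // mirrors MAX_NLI_HYPOTHESIS_CHARS_INITIAL
const MIN_CHARS: usize = 150; // mirrors MAX_NLI_HYPOTHESIS_CHARS_MIN
const TOKEN_BUDGET: usize = 256; // mirrors HYPOTHESIS_TOKEN_BUDGET

/// Codepoint-aware truncation: keeps the first `budget` chars.
fn truncate_chars(s: &str, budget: usize) -> String {
    s.chars().take(budget).collect()
}

/// Returns the first candidate that fits the token budget, or `None`
/// once halving would take the char budget below the floor.
fn fit_hypothesis(hypothesis: &str, count_tokens: impl Fn(&str) -> usize) -> Option<String> {
    let mut budget = INITIAL_CHARS;
    while budget >= MIN_CHARS {
        let candidate = truncate_chars(hypothesis, budget);
        if count_tokens(&candidate) <= TOKEN_BUDGET {
            return Some(candidate);
        }
        budget /= 2; // over budget: retry with half the chars
    }
    None
}

fn main() {
    let kr = "가".repeat(3000);
    // Assumed density of 1 token per char: only the 150-char candidate fits.
    let fitted = fit_hypothesis(&kr, |s| s.chars().count()).unwrap();
    assert_eq!(fitted.chars().count(), 150);
    // Assumed density of 3 tokens per char: even 150 chars is 450 tokens.
    assert!(fit_hypothesis(&kr, |s| s.chars().count() * 3).is_none());
    // Assumed density of 4 chars per token: 1200 chars is 300 tokens, 600 fits.
    let en = "a".repeat(3000);
    let fitted_en = fit_hypothesis(&en, |s| s.chars().count() / 4).unwrap();
    assert_eq!(fitted_en.chars().count(), 600);
}
```

Under these assumed densities, text above roughly 1.7 tokens per char cannot fit even at the 150-char floor, which is the case the graceful `nli_model_unavailable` fallback covers.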

Refs:
- spec: docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md (4 round APPROVE — analyst → critic + verifier × 4 rounds)
- plan: docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md (3 round ACCEPT — planner → critic-plan + verifier-plan × 3 rounds)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 crates/kebab-nli/src/lib.rs                   |  15 +
 crates/kebab-nli/src/onnx.rs                  |  25 +
 crates/kebab-nli/tests/inference.rs           |  56 ++
 crates/kebab-rag/src/lib.rs                   |   6 +-
 crates/kebab-rag/src/pipeline.rs              | 151 +++-
 crates/kebab-rag/tests/common/mod.rs          |  49 ++
 .../kebab-rag/tests/multi_hop_nli_truncate.rs | 245 +++++++
 ...-s3-nli-model-unavailable-diagnose-plan.md | 447 ++++++++++++
 ...-s3-nli-model-unavailable-diagnose-spec.md | 646 ++++++++++++++++++
 tasks/HOTFIXES.md                             |  18 +
 10 files changed, 1654 insertions(+), 4 deletions(-)
 create mode 100644 crates/kebab-rag/tests/multi_hop_nli_truncate.rs
 create mode 100644 docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md
 create mode 100644 docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md
diff --git a/crates/kebab-nli/src/lib.rs b/crates/kebab-nli/src/lib.rs
index 7a2b43a..cc87505 100644
--- a/crates/kebab-nli/src/lib.rs
+++ b/crates/kebab-nli/src/lib.rs
@@ -47,6 +47,21 @@ impl NliScores {
 /// entails hypothesis ⇒ answer is grounded in retrieved evidence).
 pub trait NliVerifier: Send + Sync {
     fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores>;
+
+    /// Probe-only tokenize for caller-side budget verification. S3
+    /// follow-up (2026-05-26): the pipeline's char-budget retry loop uses
+    /// this API to avoid the mDeBERTa-v3 `OnlyFirst` dead-end (a hypothesis
+    /// that alone exceeds the 512-token cap cannot be truncated to fit).
+    ///
+    /// **The default impl returns `Ok(0)`** so existing mock implementations
+    /// (`MockNliVerifier` etc.) stay backward-compatible after the trait
+    /// grows (no compile failure; the retry loop passes immediately).
+    /// `OnnxNliVerifier` must override it with the real tokenizer *inside
+    /// the trait impl block*: an inherent method is not in the vtable, so
+    /// trait dispatch would hit this default and production would silently no-op.
+    fn hypothesis_token_count(&self, _hypothesis: &str) -> anyhow::Result<usize> {
+        Ok(0)
+    }
 }
 
 /// Numerically stable 3-way softmax (subtract max for log-sum-exp safety).
diff --git a/crates/kebab-nli/src/onnx.rs b/crates/kebab-nli/src/onnx.rs
index 9b91db5..981b69e 100644
--- a/crates/kebab-nli/src/onnx.rs
+++ b/crates/kebab-nli/src/onnx.rs
@@ -66,6 +66,13 @@ pub struct OnnxNliVerifier {
 }
 
 impl OnnxNliVerifier {
+    /// Hypothesis-side token budget. The pipeline's
+    /// `truncate_hypothesis_for_nli_with_budget` retry loop uses this value
+    /// as the cap when it re-checks the token count after char truncation.
+    /// = `MAX_TOKENS` (512) - 3 reserved special tokens (CLS, SEP, SEP) - 253
+    /// tokens of premise room (caller decides). Safety margin (S3 follow-up 2026-05-26).
+    pub const HYPOTHESIS_TOKEN_BUDGET: usize = 256;
+
     /// Construct a verifier from the user's `Config`. Eagerly resolves
     /// `cache_dir = config.storage.model_dir/nli/<sanitized-model-id>/`
     /// and runs `create_dir_all` so the first `score` call can drop
@@ -278,6 +285,24 @@ impl NliVerifier for OnnxNliVerifier {
         let l = [logits[[0, 0]], logits[[0, 1]], logits[[0, 2]]];
         Ok(NliScores::from_xnli_logits(l))
     }
+
+    /// **Override** the trait default `Ok(0)` with a real mDeBERTa
+    /// tokenize. The pipeline's `truncate_hypothesis_for_nli_with_budget`
+    /// retry loop calls this method through the vtable, so the production
+    /// code path measures the real token count.
+    ///
+    /// **CRITICAL placement**: this method must live *inside the trait impl
+    /// block* to be registered in the vtable. Placed in the inherent
+    /// `impl OnnxNliVerifier {}`, dispatch would call the trait default
+    /// (`Ok(0)`), the retry loop would pass immediately, and production would
+    /// silently no-op (S3 follow-up 2026-05-26 RC1-residual closure).
+    fn hypothesis_token_count(&self, hypothesis: &str) -> Result<usize> {
+        let (_session, tokenizer) = self.ensure_loaded()?;
+        let enc = tokenizer
+            .encode(hypothesis, /*add_special_tokens=*/ false)
+            .map_err(|e| anyhow!("kebab-nli: tokenizer.encode (probe) failed: {e}"))?;
+        Ok(enc.get_ids().len())
+    }
 }
 
 /// Make a HuggingFace model id (`"owner/repo"`) into a single
diff --git a/crates/kebab-nli/tests/inference.rs b/crates/kebab-nli/tests/inference.rs
index 4115199..a702a72 100644
--- a/crates/kebab-nli/tests/inference.rs
+++ b/crates/kebab-nli/tests/inference.rs
@@ -139,3 +139,59 @@ fn empty_hypothesis_returns_err() {
         "expected 'empty hypothesis' in error, got: {msg}"
     );
 }
+
+/// Test 6 (S3 follow-up 2026-05-26): EN-long hypothesis alone exceeds
+/// max_length. Without pipeline-side truncation, `OnlyFirst` strategy
+/// dead-ends. Pin raw nli crate behavior so any future regression in
+/// the pipeline-side budget surfaces as a clear nli-level err.
+#[test]
+#[ignore]
+fn score_long_en_hypothesis_returns_err_without_pipeline_truncation() {
+    let cfg = Config::defaults();
+    let v = OnnxNliVerifier::new(&cfg).expect("verifier construction");
+    let premise = "short premise";
+    let hypothesis = "lorem ipsum ".repeat(500); // ~6 000 chars / >>512 tokens
+    let result = v.score(premise, &hypothesis);
+    assert!(result.is_err(), "long hypothesis should err under OnlyFirst");
+    let msg = result.err().unwrap().to_string();
+    assert!(
+        msg.contains("Truncation error") || msg.contains("too short to respect"),
+        "expected tokenizer truncation err, got: {msg}"
+    );
+}
+
+/// Test 7 (S3 follow-up 2026-05-26): `hypothesis_token_count` helper,
+/// a pure tokenizer probe. **Verifies vtable dispatch** (RC1-residual pin):
+/// a call on the concrete type prefers the inherent method and cannot catch
+/// the RC1-residual bug; only dispatch through `&dyn NliVerifier` checks
+/// vtable registration. Inherent-only placement returns the default `Ok(0)`
+/// and fails the `count > 0` assertion; trait impl block placement reaches the
+/// real tokenizer and passes. Pins the API the pipeline uses for its retry budget.
+#[test]
+#[ignore]
+fn hypothesis_token_count_dispatches_correctly_via_dyn_trait() {
+    let cfg = Config::defaults();
+    let v = OnnxNliVerifier::new(&cfg).expect("verifier construction");
+    // ★ vtable dispatch: call through &dyn NliVerifier. Inherent-only
+    // placement returns the default `Ok(0)` and fails the `> 0` check;
+    // trait impl block placement reaches the real tokenizer and passes.
+    // Code-level regression pin for RC1-residual.
+    let v_dyn: &dyn NliVerifier = &v;
+    // Short EN: estimated at 4 chars/token (27 chars / 4 = ~6 tokens)
+    let en_count = v_dyn
+        .hypothesis_token_count("short english test sentence")
+        .expect("EN dyn dispatch must reach real tokenizer (vtable check)");
+    assert!(
+        en_count > 0 && en_count < 20,
+        "EN ~6 tokens expected via vtable dispatch, got {en_count} \
+         (Ok(0) signals inherent-only placement bug — RC1-residual)"
+    );
+    // Short KR: 1-2 chars/token (16 chars / 1.5 = ~10 tokens)
+    let kr_count = v_dyn
+        .hypothesis_token_count("짧은 한국어 테스트 문장입니다")
+        .expect("KR dyn dispatch must reach real tokenizer");
+    assert!(
+        kr_count > 0 && kr_count < 30,
+        "KR ~10 tokens expected, got {kr_count}"
+    );
+}
diff --git a/crates/kebab-rag/src/lib.rs b/crates/kebab-rag/src/lib.rs
index 1cc5328..33122ca 100644
--- a/crates/kebab-rag/src/lib.rs
+++ b/crates/kebab-rag/src/lib.rs
@@ -22,4 +22,8 @@ pub use kebab_core::{Answer, AnswerCitation, AnswerRetrievalSummary, RefusalReas
 
 mod pipeline;
 
-pub use pipeline::{AskOpts, MAX_NLI_PREMISE_CHARS, RagPipeline, StreamEvent, truncate_for_nli};
+pub use pipeline::{
+    AskOpts, MAX_NLI_HYPOTHESIS_CHARS_INITIAL, MAX_NLI_HYPOTHESIS_CHARS_MIN,
+    MAX_NLI_PREMISE_CHARS, RagPipeline, StreamEvent, truncate_for_nli,
+    truncate_hypothesis_for_nli_with_budget,
+};
diff --git a/crates/kebab-rag/src/pipeline.rs b/crates/kebab-rag/src/pipeline.rs
index b07a8f4..cf9a028 100644
--- a/crates/kebab-rag/src/pipeline.rs
+++ b/crates/kebab-rag/src/pipeline.rs
@@ -1038,14 +1038,37 @@ impl RagPipeline {
                 "verifier must be Some when nli_threshold > 0.0 \
                  (kebab-app's open_with_config enforces this invariant)",
             );
-            let (truncated_premise, was_truncated) = truncate_for_nli(&packed_text);
-            if was_truncated {
+            let (truncated_premise, premise_was_truncated) = truncate_for_nli(&packed_text);
+            if premise_was_truncated {
                 tracing::debug!(
                     target: "kebab-rag",
                     "NLI premise truncated to MAX_NLI_PREMISE_CHARS for entailment check"
                 );
             }
-            match v.score(&truncated_premise, &acc) {
+            // S3 follow-up (2026-05-26): hypothesis-side budget + token-count
+            // fallback retry. Do not use `?` here: the wire shape `answer.v1 +
+            // NliModelUnavailable refusal` must be kept (graceful fallback, no
+            // regression). Explicit match + return refuse, *symmetric* with the v.score() Err arm.
+            let (truncated_hypothesis, hypothesis_was_truncated) =
+                match truncate_hypothesis_for_nli_with_budget(v.as_ref(), &acc) {
+                    Ok(x) => x,
+                    Err(e) => {
+                        tracing::warn!(
+                            target: "kebab-rag",
+                            error = %e,
+                            "NLI hypothesis budget retry exhausted; refusing with NliModelUnavailable"
+                        );
+                        return self.refuse_nli_model_unavailable(query, &opts, hops, started);
+                    }
+                };
+            if hypothesis_was_truncated {
+                tracing::debug!(
+                    target: "kebab-rag",
+                    original_chars = acc.chars().count(),
+                    "NLI hypothesis truncated to MAX_NLI_HYPOTHESIS_CHARS"
+                );
+            }
+            match v.score(&truncated_premise, &truncated_hypothesis) {
                 Ok(scores) => {
                     let passed = scores.entailment >= self.config.rag.nli_threshold;
                     Some(VerificationSummary {
@@ -1816,6 +1839,82 @@ pub fn truncate_for_nli(premise: &str) -> (String, bool) {
     }
 }
 
+/// S3 follow-up (2026-05-26): when the NLI hypothesis (= synthesized answer)
+/// alone exceeds mDeBERTa-v3's 512-token cap, `OnlyFirst` truncation cannot
+/// make the pair fit even by cutting the premise to 0, and the tokenizer
+/// returns a `SequenceTooShortToTruncate` err. We truncate by char budget,
+/// re-check the token count with the *real mDeBERTa tokenizer*, and retry
+/// with half the char budget on overflow. Safe for KR-heavy text (1-2 chars/token).
+pub const MAX_NLI_HYPOTHESIS_CHARS_INITIAL: usize = 1200;
+
+/// S3 follow-up (2026-05-26): minimum floor for the retry budget. Once
+/// halving takes the budget below this value, the pipeline falls back to a
+/// graceful `nli_model_unavailable` refusal (no regression). Guards the
+/// extreme-density case (Hanja/CJK text where 1 char = 2-3 tokens).
+pub const MAX_NLI_HYPOTHESIS_CHARS_MIN: usize = 150;
+
+/// S3 follow-up (2026-05-26): chars-only truncation arithmetic. Pure
+/// fn: input → output, no side effect. Codepoint-aware (chars().count()
+/// + chars().take()), so KR / emoji / multi-byte input is never cut mid-codepoint.
+///
+/// Used internally by each step of the retry loop in
+/// `truncate_hypothesis_for_nli_with_budget`. Pure-fn unit tests (in this
+/// file's `#[cfg(test)] mod tests`) pin the arithmetic against regressions.
+pub(crate) fn truncate_chars(s: &str, budget: usize) -> (String, bool) {
+    if s.chars().count() <= budget {
+        (s.to_string(), false)
+    } else {
+        let truncated: String = s.chars().take(budget).collect();
+        (truncated, true)
+    }
+}
+
+/// S3 follow-up (2026-05-26): hypothesis-side budget + token-count
+/// fallback retry. Char-truncate (`Right` direction = front preserved,
+/// since the core claim sits at the start of an LLM answer), re-check the
+/// token count with the real mDeBERTa tokenizer, and on overflow retry with
+/// half the char budget (1200 → 600 → 300 → 150). Below the min floor this
+/// returns `anyhow::Err`, and the caller (step 8.5 hook) falls back to a
+/// graceful `nli_model_unavailable` refusal (no regression).
+///
+/// Returns `(truncated_hypothesis, was_truncated)`. `was_truncated` is
+/// for logging only and has no wire impact.
+pub fn truncate_hypothesis_for_nli_with_budget(
+    verifier: &(dyn kebab_nli::NliVerifier + 'static),
+    hypothesis: &str,
+) -> anyhow::Result<(String, bool)> {
+    let original_chars = hypothesis.chars().count();
+    let mut budget = MAX_NLI_HYPOTHESIS_CHARS_INITIAL;
+    let mut was_truncated = false;
+
+    loop {
+        let (candidate, this_truncated) = truncate_chars(hypothesis, budget);
+        if this_truncated {
+            was_truncated = true;
+        }
+
+        // Re-check the token count with the verifier's internal tokenizer.
+        // Trait method (vtable dispatch): `OnnxNliVerifier` overrides it
+        // inside the trait impl block (RC1-residual).
+        let token_count = verifier
+            .hypothesis_token_count(&candidate)
+            .with_context(|| "kebab-rag: hypothesis token-count probe failed")?;
+        if token_count <= kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET {
+            return Ok((candidate, was_truncated));
+        }
+
+        // Over budget: retry with half the char budget.
+        budget /= 2;
+        if budget < MAX_NLI_HYPOTHESIS_CHARS_MIN {
+            anyhow::bail!(
+                "kebab-rag: hypothesis remains over token budget after retry (original {original_chars} chars, last budget {} chars, tokens {token_count} > {})",
+                budget * 2,
+                kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET,
+            );
+        }
+    }
+}
+
 const MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT: &str = "당신은 사용자의 질문을 다단계 검색에 필요한 sub-question 들로 분해하는 도구다.\n- multi-hop 정보가 필요한 경우 독립적으로 검색 가능한 sub-question 들로 분해한다.\n- 각 sub-question 은 자기 자신만으로 의미가 통해야 한다 (대명사 / \"위 답변\" 같은 reference 금지).\n- 원본이 이미 단순하면 원본 그대로 1 개만 반환한다.\n- 응답은 JSON array of strings 만 출력한다. 다른 prose / markdown fence / 설명 금지.";
 
 const MULTI_HOP_DECIDE_SYSTEM_PROMPT: &str = "당신은 multi-hop 검색의 매 iter 에서 \"추가 retrieval 이 필요한가?\" 를 판단하는 도구다.\n- 지금까지 모은 [근거] 가 [원본 질문] 의 모든 측면을 cover 하는지 평가한다.\n- 추가가 필요하면 새 sub-question 들 (이미 모은 정보로 답할 수 없는 부분만, 독립적으로 검색 가능한 형태로) 을 JSON array of strings 로 반환한다.\n- 충분하면 빈 array `[]` 를 반환한다.\n- 응답은 JSON array of strings 만 출력한다. 다른 prose / markdown fence / 설명 금지.\n- 각 sub-question 은 자기 자신만으로 의미가 통해야 한다 (대명사 / \"위 답변\" 같은 reference 금지).";
@@ -2097,6 +2196,52 @@ mod tests {
         assert_eq!(out, vec!["valid q"]);
     }
 
+    // ── S3 follow-up (2026-05-26): truncate_chars boundary tests ─────────
+    //
+    // Pure-fn arithmetic regression pins. `truncate_chars` is `pub(crate)`,
+    // so integration test files cannot reach it; call it directly from this
+    // crate's `#[cfg(test)] mod tests`.
+
+    #[test]
+    fn truncate_chars_identity_when_under_budget() {
+        let s = "short";
+        let (out, was_truncated) = truncate_chars(s, 100);
+        assert_eq!(out, s);
+        assert!(!was_truncated);
+    }
+
+    #[test]
+    fn truncate_chars_truncates_when_over_budget() {
+        let s = "abcdefghij"; // 10 chars
+        let (out, was_truncated) = truncate_chars(s, 3);
+        assert_eq!(out, "abc");
+        assert_eq!(out.chars().count(), 3);
+        assert!(was_truncated);
+    }
+
+    #[test]
+    fn truncate_chars_empty_input_is_identity() {
+        let (out, was_truncated) = truncate_chars("", 100);
+        assert_eq!(out, "");
+        assert!(!was_truncated);
+        // budget = 0 is also identity for empty input.
+        let (out2, was_truncated2) = truncate_chars("", 0);
+        assert_eq!(out2, "");
+        assert!(!was_truncated2);
+    }
+
+    #[test]
+    fn truncate_chars_counts_codepoints_not_bytes() {
+        // "가나다라마" = 5 chars at 3 bytes each (UTF-8) → 15 bytes.
+        // With budget = 3 chars, "가나다" (3 chars / 9 bytes) is the exact result.
+        let kr = "가나다라마";
+        let (out, was_truncated) = truncate_chars(kr, 3);
+        assert_eq!(out, "가나다");
+        assert_eq!(out.chars().count(), 3);
+        assert_eq!(out.len(), 9, "3 KR codepoints × 3 bytes/char = 9 bytes");
+        assert!(was_truncated);
+    }
+
     #[test]
     fn est_tokens_approx_quarters() {
         assert_eq!(est_tokens(""), 0);
diff --git a/crates/kebab-rag/tests/common/mod.rs b/crates/kebab-rag/tests/common/mod.rs
index 6d30b2c..ed7bee0 100644
--- a/crates/kebab-rag/tests/common/mod.rs
+++ b/crates/kebab-rag/tests/common/mod.rs
@@ -455,3 +455,52 @@ impl NliVerifier for MockNliVerifier {
         }
     }
 }
+
+/// S3 follow-up (2026-05-26): closure type aliases for `SpyNliVerifier`
+/// fields, to keep clippy `type_complexity` quiet.
+#[allow(clippy::type_complexity)]
+pub type ScoreFn = Arc<dyn Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync>;
+#[allow(clippy::type_complexity)]
+pub type HypothesisTokenCountFn = Arc<dyn Fn(&str) -> anyhow::Result<usize> + Send + Sync>;
+
+/// S3 follow-up (2026-05-26): closure-based NLI verifier. The caller
+/// supplies two closures, `(premise, hypothesis) -> Result<NliScores>` and
+/// `(hypothesis) -> Result<usize>`, and the spy captures the inputs. Sibling
+/// of the existing fixed-mode `MockNliVerifier`. Lets one verifier inject
+/// both the token-count simulation for the
+/// `truncate_hypothesis_for_nli_with_budget` retry loop and the final score.
+pub struct SpyNliVerifier {
+    pub score_fn: ScoreFn,
+    pub hypothesis_token_count_fn: HypothesisTokenCountFn,
+    pub received_premises: Mutex<Vec<String>>,
+    pub received_hypotheses: Mutex<Vec<String>>,
+}
+
+impl SpyNliVerifier {
+    /// 2-arg constructor: both score and hypothesis_token_count are injected
+    /// as closures, all at once, because fields behind `Arc<T>` cannot be mutated later.
+    pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
+    where
+        F: Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync + 'static,
+        G: Fn(&str) -> anyhow::Result<usize> + Send + Sync + 'static,
+    {
+        Arc::new(Self {
+            score_fn: Arc::new(score_fn),
+            hypothesis_token_count_fn: Arc::new(token_count_fn),
+            received_premises: Mutex::new(Vec::new()),
+            received_hypotheses: Mutex::new(Vec::new()),
+        })
+    }
+}
+
+impl NliVerifier for SpyNliVerifier {
+    fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores> {
+        self.received_premises.lock().unwrap().push(premise.to_string());
+        self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
+        (self.score_fn)(premise, hypothesis)
+    }
+
+    fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> {
+        (self.hypothesis_token_count_fn)(hypothesis)
+    }
+}
diff --git a/crates/kebab-rag/tests/multi_hop_nli_truncate.rs b/crates/kebab-rag/tests/multi_hop_nli_truncate.rs
new file mode 100644
index 0000000..aa86e38
--- /dev/null
+++ b/crates/kebab-rag/tests/multi_hop_nli_truncate.rs
@@ -0,0 +1,245 @@
+//! S3 follow-up (2026-05-26): hypothesis-side char budget + token-count
+//! fallback retry — multi-hop integration tests via `SpyNliVerifier`.
+//!
+//! Coverage:
+//!
+//! 1. `long_en_synth_answer_truncated_before_nli_call`: EN long answer
+//!    → the char budget alone suffices (token_count Ok(100)) → 0 retries →
+//!    hypothesis is truncated to exactly 1200 chars + Right direction pin
+//!    (front preserved).
+//! 2. `long_kr_synth_answer_retries_with_smaller_budget`: KR-sim long
+//!    answer → token_count fn simulates chars > 1000 → 900 / > 500 → 450 /
+//!    else → 220 → >= 3 token-count probes (2 retries) + final hypothesis
+//!    ≤ 300 chars + happy path. KR safety pin.
+//! 3. `unrelenting_token_overflow_falls_through_to_unavailable`:
+//!    token_count fn always returns Ok(9_999) → retries exhausted → graceful
+//!    `NliModelUnavailable` refusal (no regression).
+//!
+//! Pipeline construction pattern (Option B inline, plan §2 step 8):
+//! each test builds `RagEnv::new()` + `ScriptedRetriever::new(...)` +
+//! `ScriptedLm::new(vec![decompose, decide, synth])` +
+//! `RagPipeline::new(...).with_verifier(verifier)` inline. No
+//! `build_test_pipeline_with_long_answer` helper was written (single use, and
+//! the long answer's length/language differs per test).
+
+mod common;
+
+use std::sync::{Arc, Mutex};
+
+use common::{RagEnv, ScriptedLm, ScriptedRetriever, SpyNliVerifier, id32, mk_hit};
+use kebab_core::{LanguageModel, RefusalReason, Retriever, SearchMode};
+use kebab_nli::{NliScores, NliVerifier};
+use kebab_rag::{AskOpts, RagPipeline};
+
+/// Default `AskOpts` for multi-hop tests: deterministic seed, lexical
+/// mode (so the test crate doesn't need to wire up an embedder), and
+/// `multi_hop: true` to route through `ask_multi_hop`.
+fn multi_hop_opts() -> AskOpts {
+    AskOpts {
+        k: 5,
+        explain: false,
+        mode: SearchMode::Lexical,
+        temperature: Some(0.0),
+        seed: Some(0),
+        stream_sink: None,
+        history: Vec::new(),
+        conversation_id: None,
+        turn_index: None,
+        multi_hop: true,
+    }
+}
+
+/// EN long answer (~6 000 chars) → the char budget alone suffices → 0
+/// retries → grounded. Right direction pin: the hypothesis equals the first
+/// 1200 chars of the input (Right direction = front preserved).
+#[test]
+fn long_en_synth_answer_truncated_before_nli_call() {
+    let env = RagEnv::new();
+    let cid = id32("c1");
+    let did = id32("d1");
+    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
+    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
+
+    let mut cfg = env.config.clone();
+    cfg.rag.nli_threshold = 0.5;
+
+    // ScriptedRetriever: entry 0 = probe, entry 1 = q1 sub-query.
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
+    let retriever_dyn: Arc<dyn Retriever> = retriever;
+
+    // Synthesized answer of ~6000 chars (lorem ipsum, 12 chars × 500 reps),
+    // with a citation marker appended so the `grounded_unaware` check passes.
+    let long_answer = format!("{} [#1]", "lorem ipsum ".repeat(500));
+    let lm = Arc::new(ScriptedLm::new(vec![
+        r#"["q1"]"#,
+        r"[]",
+        &long_answer,
+    ]));
+    let lm_dyn: Arc<dyn LanguageModel> = lm;
+
+    let verifier = SpyNliVerifier::new(
+        |_premise, _hypothesis| {
+            Ok(NliScores {
+                entailment: 0.9,
+                neutral: 0.05,
+                contradiction: 0.05,
+            })
+        },
+        |_h| Ok(100), // within budget, 0 retries
+    );
+    let verifier_handle = verifier.clone();
+    let verifier_dyn: Arc<dyn NliVerifier> = verifier;
+
+    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
+        .with_verifier(verifier_dyn);
+
+    let answer = pipeline.ask("compound", multi_hop_opts()).unwrap();
+
+    let received = verifier_handle.received_hypotheses.lock().unwrap();
+    assert_eq!(received.len(), 1, "verifier called exactly once");
+
+    let hyp = &received[0];
+    assert_eq!(
+        hyp.chars().count(),
+        1200,
+        "hypothesis truncated to MAX_NLI_HYPOTHESIS_CHARS_INITIAL"
+    );
+
+    // Right direction pin: the hypothesis must equal the first 1200 chars
+    // of the input (Right direction = front preserved). A regression to
+    // Left/Middle direction fails this test immediately.
+    let input_first_1200: String = long_answer.chars().take(1200).collect();
+    assert_eq!(
+        hyp.as_str(),
+        input_first_1200.as_str(),
+        "Right direction = front preserved"
+    );
+
+    assert_eq!(
+        answer.refusal_reason, None,
+        "long answer must reach happy path"
+    );
+}
+
+/// KR long answer → token count > budget → retry with half the char budget
+/// → eventual fit. KR safety pin (simulates 1-2 chars/token density).
+#[test]
+fn long_kr_synth_answer_retries_with_smaller_budget() {
+    let env = RagEnv::new();
+    let cid = id32("c1");
+    let did = id32("d1");
+    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
+    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
+
+    let mut cfg = env.config.clone();
+    cfg.rag.nli_threshold = 0.5;
+
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
+    let retriever_dyn: Arc<dyn Retriever> = retriever;
+
+    // ~2900-char KR-sim answer ("한국어 본문 " is 7 chars × 416 reps = 2912 chars)
+    // + citation marker.
+    let kr_long_answer = format!("{} [#1]", "한국어 본문 ".repeat(416));
+    let lm = Arc::new(ScriptedLm::new(vec![
+        r#"["q1"]"#,
+        r"[]",
+        &kr_long_answer,
+    ]));
+    let lm_dyn: Arc<dyn LanguageModel> = lm;
+
+    let token_count_call_count = Arc::new(Mutex::new(0_usize));
+    let tcc = token_count_call_count.clone();
+    // Simulation: 1200 chars → 900 tokens (over cap), 600 chars → 450
+    // tokens (over cap), 300 chars → 220 tokens (within cap). 3 probes, 2 retries.
+    let verifier = SpyNliVerifier::new(
+        |_premise, _hypothesis| {
+            Ok(NliScores {
+                entailment: 0.85,
+                neutral: 0.10,
+                contradiction: 0.05,
+            })
+        },
+        move |h| {
+            *tcc.lock().unwrap() += 1;
+            let count = h.chars().count();
+            if count > 1000 {
+                Ok(900)
+            } else if count > 500 {
+                Ok(450)
+            } else {
+                Ok(220)
+            }
+        },
+    );
+    let verifier_handle = verifier.clone();
+    let verifier_dyn: Arc<dyn NliVerifier> = verifier;
+
+    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
+        .with_verifier(verifier_dyn);
+
+    let answer = pipeline.ask("compound", multi_hop_opts()).unwrap();
+
+    assert!(
+        *token_count_call_count.lock().unwrap() >= 3,
+        "retry loop must call token_count >= 3 (1200, 600, 300 candidates)"
+    );
+    let received = verifier_handle.received_hypotheses.lock().unwrap();
+    assert!(
+        received[0].chars().count() <= 300,
+        "final hypothesis <= 300 chars after retry, got {}",
+        received[0].chars().count()
+    );
+    assert_eq!(
+        answer.refusal_reason, None,
+        "KR long answer reaches happy path after retry"
+    );
+}
+
+/// When the retry budget is exhausted, refuse gracefully as unavailable: the
+/// fix's fallback path keeps the existing unavailable wire shape (no regression).
+#[test]
+fn unrelenting_token_overflow_falls_through_to_unavailable() {
+    let env = RagEnv::new();
+    let cid = id32("c1");
+    let did = id32("d1");
+    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
+    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
+
+    let mut cfg = env.config.clone();
+    cfg.rag.nli_threshold = 0.5;
+
+    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
+    let retriever_dyn: Arc<dyn Retriever> = retriever;
+
+    // ~3500-char answer ("단단한 압축 " is 7 chars × 500 reps) + marker; over budget regardless.
+    let unrelenting_answer = format!("{} [#1]", "단단한 압축 ".repeat(500));
+    let lm = Arc::new(ScriptedLm::new(vec![
+        r#"["q1"]"#,
+        r"[]",
+        &unrelenting_answer,
+    ]));
+    let lm_dyn: Arc<dyn LanguageModel> = lm;
+
+    let verifier = SpyNliVerifier::new(
+        |_premise, _hypothesis| {
+            unreachable!("score not reached when token-count check fails");
+        },
+        |_h| Ok(9_999), // token count over cap at every budget, retries exhausted
+    );
+    let verifier_dyn: Arc<dyn NliVerifier> = verifier;
+
+    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
+        .with_verifier(verifier_dyn);
+
+    let answer = pipeline.ask("compound", multi_hop_opts()).unwrap();
+
+    assert_eq!(
+        answer.refusal_reason,
+        Some(RefusalReason::NliModelUnavailable),
+        "graceful fallback to unavailable when retry exhausted"
+    );
+    assert!(
+        answer.verification.is_none(),
+        "NliModelUnavailable: verification stays None"
+    );
+}
diff --git a/docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md b/docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md
new file mode 100644
index 0000000..e062698
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md
@@ -0,0 +1,447 @@
+---
+title: "S3 NLI unavailable implementation plan v1 — hypothesis-side char budget + token-count fallback retry"
+date: 2026-05-26
+task_id: s3-nli-unavailable
+status: open
+target_version: 0.18.1
+design: ../specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md
+---
+
+# S3 NLI unavailable — implementation plan v1
+
+## §0. Overview
+
+The S3 dogfood query (`"Why does kebab combine multilingual-e5, LanceDB, and RRF together?"`) fails consistently with `nli_model_unavailable`. The root cause is the mDeBERTa-v3 tokenizer's `TruncationStrategy::OnlyFirst` combined with the LLM's long 949-token (4564-char) answer: *the hypothesis alone exceeds the max_length cap (512)*, so the pair cannot fit even with the premise cut to 0, and the tokenizer raises `SequenceTooShortToTruncate`. This plan implements **Option A** from spec §3 (KR safety + graceful fallback) as a single PR: add one probe-only API, `hypothesis_token_count`, to the `kebab-nli::NliVerifier` trait; have `OnnxNliVerifier` override it with the real tokenizer *inside the trait impl block* (RC1-residual: an inherent impl is not in the vtable, which would be a silent NO-OP); add a char-budget retry helper (`truncate_hypothesis_for_nli_with_budget`) to `kebab-rag::pipeline`; and update the step 8.5 hook callsite. In total 3 production files + 1 test scaffold + 2 test files + 1 doc entry (correction for round-1 plan critic MAJOR #2). No wire schema change and no `Cargo.toml` version bump needed: this is a test/diagnostic fix.
+
+---
+
+## §1. Single-PR changed file list
+
+| File | Kind | Change summary |
+|---|---|---|
+| `crates/kebab-nli/src/lib.rs` | production | Add a `hypothesis_token_count(&self, &str) -> anyhow::Result<usize>` method to the `NliVerifier` trait (default impl `Ok(0)` for backward-compat). |
+| `crates/kebab-nli/src/onnx.rs` | production | Inherent `OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET = 256` const + `hypothesis_token_count` override **inside the trait impl block** (real `tokenizer.encode` probe). |
+| `crates/kebab-rag/src/pipeline.rs` | production | `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` consts + `truncate_chars(s, budget)` pure-fn + `truncate_hypothesis_for_nli_with_budget(verifier, hypothesis)` retry helper + hypothesis-side hook at the step 8.5 callsite (near line 1041) + `tracing::debug` log. |
+| `crates/kebab-rag/tests/common/mod.rs` | test scaffold | `SpyNliVerifier` helper (closure-based 2-arg constructor: `score_fn` + `token_count_fn`, capturing into `Mutex<Vec<String>>`). Coexists with the existing `MockNliVerifier` as a sibling. |
+| `crates/kebab-rag/tests/multi_hop_nli_truncate.rs` | new test | The 3 mock multi-hop tests of spec §5.3 (EN happy / KR retry / unrelenting fallback). |
+| `crates/kebab-nli/tests/inference.rs` | existing test, extended | The 2 new `#[ignore]` tests of spec §5.1 (long EN err pin + bounded-range pin for token_count). |
+| `tasks/HOTFIXES.md` | doc | New dated entry `## 2026-05-26 — S3 NLI unavailable — hypothesis truncate + token-count fallback` (Symptom / Root cause / Action / Amends 4-block, no HOTFIX number assigned; follow-up to the sibling fb-41 layer). |
+
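+**7 files** in total (3 production + 1 test scaffold + 2 test + 1 doc). The 4 boundary tests for the `truncate_chars` pure-fn go into the `#[cfg(test)] mod tests` block inside `pipeline.rs` (the helper is `pub(crate)`, so an integration test file such as `multi_hop_nli_truncate.rs` cannot reach it); no new file is added for them.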
+
+---
+
+## §2. Implementation step list
+
+The subagent follows the code blocks of spec §3 + §5 *verbatim* across 12 steps. Each step also states its acceptance check.
+
+1. **`kebab-nli/src/lib.rs`: extend the trait.**
+   - Add the default impl `hypothesis_token_count(&self, _hypothesis: &str) -> anyhow::Result<usize> { Ok(0) }` to the `NliVerifier` trait.
+   - Doc comment: state, per spec §3, that the default impl returns 0 for backward-compat and that `OnnxNliVerifier` overrides it *inside the trait impl block*.
+   - Acceptance: `cargo check -p kebab-nli` passes. The existing `MockNliVerifier` (rag tests/common/mod.rs) inherits the default impl, so it compiles without an explicit override.
+
+2. **`kebab-nli/src/onnx.rs`: add the inherent const.**
+   - `impl OnnxNliVerifier { pub const HYPOTHESIS_TOKEN_BUDGET: usize = 256; }` (sibling position to the existing `MAX_TOKENS = 512`).
+   - Doc comment: "= MAX_TOKENS (512) - 3 special tokens reserved (CLS, SEP, SEP) - 253 premise room".
+   - Acceptance: `cargo check -p kebab-nli` passes.
+
+3. **`kebab-nli/src/onnx.rs`: override inside the trait impl (the core of RC1-residual).**
+   - `impl NliVerifier for OnnxNliVerifier { ... fn score(...) ... fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> { let (_session, tokenizer) = self.ensure_loaded()?; let enc = tokenizer.encode(hypothesis, /*add_special_tokens=*/ false).map_err(|e| anyhow!("kebab-nli: tokenizer.encode (probe) failed: {e}"))?; Ok(enc.get_ids().len()) } }`
+   - **It must sit inside the trait impl block. If it is placed in the inherent `impl OnnxNliVerifier {}`, it is not registered in the vtable, so trait dispatch calls the default `Ok(0)` and production becomes a silent NO-OP** (round-3 critic RC1-residual closure).
+   - Acceptance: `cargo check -p kebab-nli` passes. spec §5.1 Test 7 (`hypothesis_token_count_dispatches_correctly_via_dyn_trait`) verifies the call through the vtable.
+
+4. **`kebab-rag/src/pipeline.rs`: add the consts.**
+   - Right after the existing `MAX_NLI_PREMISE_CHARS` (near line 1803): `pub const MAX_NLI_HYPOTHESIS_CHARS_INITIAL: usize = 1200;` + `pub const MAX_NLI_HYPOTHESIS_CHARS_MIN: usize = 150;`.
+   - Doc comment: state the KR safety + retry rationale from spec §3, plus a cross-link to "round-1 critic H3 closure".
+   - Acceptance: `cargo check -p kebab-rag` passes.
+
+5. **`kebab-rag/src/pipeline.rs`: `truncate_chars(s, budget)` pure-fn sub-helper.**
+   - `pub(crate) fn truncate_chars(s: &str, budget: usize) -> (String, bool) { if s.chars().count() <= budget { (s.to_string(), false) } else { let truncated: String = s.chars().take(budget).collect(); (truncated, true) } }`
+   - Sibling position to the existing `truncate_for_nli` (near line 1810).
+   - Acceptance: `cargo check -p kebab-rag` passes. The 4 boundary tests of spec §5.2 call this helper.
+
+6. **`kebab-rag/src/pipeline.rs`: `truncate_hypothesis_for_nli_with_budget` retry helper.**
+   - Code block from spec §3, verbatim:
+     ```rust
+     pub fn truncate_hypothesis_for_nli_with_budget(
+         verifier: &(dyn kebab_nli::NliVerifier + 'static),
+         hypothesis: &str,
+     ) -> anyhow::Result<(String, bool)> {
+         let original_chars = hypothesis.chars().count();
+         let mut budget = MAX_NLI_HYPOTHESIS_CHARS_INITIAL;
+         let mut was_truncated = false;
+         loop {
+             let candidate: String = if original_chars <= budget {
+                 hypothesis.to_string()
+             } else {
+                 was_truncated = true;
+                 hypothesis.chars().take(budget).collect()
+             };
+             let token_count = verifier
+                 .hypothesis_token_count(&candidate)
+                 .with_context(|| "kebab-rag: hypothesis token-count probe failed")?;
+             if token_count <= kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET {
+                 return Ok((candidate, was_truncated));
+             }
+             budget /= 2; // compound assignment keeps `clippy -D warnings` (assign_op_pattern) clean
+             if budget < MAX_NLI_HYPOTHESIS_CHARS_MIN {
+                 anyhow::bail!(
+                     "kebab-rag: hypothesis remains over token budget after retry (original {original_chars} chars, last budget {} chars, tokens {token_count} > {})",
+                     budget * 2,
+                     kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET,
+                 );
+             }
+         }
+     }
+     ```
+   - Acceptance: `cargo check -p kebab-rag` passes.
+
+7. **`kebab-rag/src/pipeline.rs`: update the step 8.5 hook callsite** (round-1 plan critic CRITICAL #1 closure: propagating with `?` would make `ask_multi_hop` return `Err(anyhow::Error)`, which surfaces as wire `error.v1` and breaks the graceful-fallback promise. Use an explicit `match` + `return refuse_*` pattern, *symmetric* with the existing `v.score()` Err branch (`return self.refuse_nli_model_unavailable`)).
+
+   Update the existing callsite (right after the `if was_truncated { debug! }` block, right before `match v.score(...)`) as follows:
+
+   ```rust
+   let (truncated_premise, premise_was_truncated) = truncate_for_nli(&packed_text);
+   if premise_was_truncated {
+       tracing::debug!(target: "kebab-rag", "NLI premise truncated to MAX_NLI_PREMISE_CHARS");
+   }
+   let (truncated_hypothesis, hypothesis_was_truncated) =
+       match truncate_hypothesis_for_nli_with_budget(v.as_ref(), &acc) {
+           Ok(x) => x,
+           Err(e) => {
+               tracing::warn!(
+                   target: "kebab-rag",
+                   error = %e,
+                   "NLI hypothesis budget retry exhausted; refusing with NliModelUnavailable"
+               );
+               return self.refuse_nli_model_unavailable(query, &opts, hops, started);
+           }
+       };
+   if hypothesis_was_truncated {
+       tracing::debug!(
+           target: "kebab-rag",
+           original_chars = acc.chars().count(),
+           "NLI hypothesis truncated to MAX_NLI_HYPOTHESIS_CHARS"
+       );
+   }
+   match v.score(&truncated_premise, &truncated_hypothesis) {
+       // ... existing Ok/Err arms unchanged (Err likewise returns refuse_nli_model_unavailable)
+   }
+   ```
+
+   - `v.score(&truncated_premise, &acc)` → `v.score(&truncated_premise, &truncated_hypothesis)`.
+   - **Do not use `?`**: this keeps the wire response as `answer.v1` with a `NliModelUnavailable` refusal (graceful fallback).
+   - Acceptance: `cargo check -p kebab-rag` passes. Existing tests such as `multi_hop_nli_model_unavailable_refuses` still PASS. **The `.unwrap()` in §5.3 test #3 (`unrelenting_token_overflow_falls_through_to_unavailable`) does not panic** (it unwraps `Ok(Answer{refusal})`).
+
+8. **`kebab-rag/tests/common/mod.rs`: add the `SpyNliVerifier` helper.**
+   - Code block from spec §5.3, verbatim:
+     ```rust
+     use std::sync::{Arc, Mutex};
+     pub struct SpyNliVerifier {
+         pub score_fn: Arc<dyn Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync>,
+         pub hypothesis_token_count_fn:
+             Arc<dyn Fn(&str) -> anyhow::Result<usize> + Send + Sync>,
+         pub received_premises: Mutex<Vec<String>>,
+         pub received_hypotheses: Mutex<Vec<String>>,
+     }
+     impl SpyNliVerifier {
+         pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
+         where
+             F: Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync + 'static,
+             G: Fn(&str) -> anyhow::Result<usize> + Send + Sync + 'static,
+         {
+             Arc::new(Self {
+                 score_fn: Arc::new(score_fn),
+                 hypothesis_token_count_fn: Arc::new(token_count_fn),
+                 received_premises: Mutex::new(Vec::new()),
+                 received_hypotheses: Mutex::new(Vec::new()),
+             })
+         }
+     }
+     impl NliVerifier for SpyNliVerifier {
+         fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores> {
+             self.received_premises.lock().unwrap().push(premise.to_string());
+             self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
+             (self.score_fn)(premise, hypothesis)
+         }
+         fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> {
+             (self.hypothesis_token_count_fn)(hypothesis)
+         }
+     }
+     ```
+   - Sibling of the existing `MockNliVerifier`; keep both (they serve different test patterns).
+   - **Pipeline setup: Option B (recommended), inline pattern with no pipeline-builder helper** (round-1 plan critic MAJOR #3 closure: no `build_test_pipeline_*` builder exists today (confirmed by empirical grep), such a helper would be single-use, and each test needs a different long-answer length/language, so inline is clearer). Each of the 3 §5.3 tests directly builds `RagEnv::new()` + `ScriptedRetriever::new(...)` + `ScriptedLm::new(vec!["[\"q1\"]", "[]", &"lorem ".repeat(1000)])` (or `&"한국어 ".repeat(N)`) + `RagPipeline::new(...).with_verifier(verifier)`, following the inline pattern of the happy-path test in the existing `crates/kebab-rag/tests/multi_hop.rs`. ScriptedLm response sequence: decompose JSON → decide JSON → synthesize long-answer.
+   - Acceptance: `cargo check -p kebab-rag --tests` passes.
+
+9. **`kebab-nli/tests/inference.rs`: the 2 new `#[ignore]` tests of §5.1.**
+   - `score_long_en_hypothesis_returns_err_without_pipeline_truncation`: `"lorem ipsum ".repeat(500)` (~6000 chars) returns an err under OnlyFirst, with a contains assertion on the message for `"Truncation error"` or `"too short to respect"`.
+   - `hypothesis_token_count_dispatches_correctly_via_dyn_trait`: **vtable dispatch check** (`let v_dyn: &dyn NliVerifier = &v; v_dyn.hypothesis_token_count(...)`). Calling through dyn dispatch rather than the concrete type pins the RC1-residual silent NO-OP regression. With inherent-only placement the default `Ok(0)` is returned and `assert!(en_count > 0)` fails. With trait impl block placement both the EN bounded range (`0 < count < 20`) and the KR bounded range (`0 < count < 30`) pass.
+   - Both are `#[ignore]` (they require network access for the model download).
+   - Acceptance: `cargo test -p kebab-nli -j 1` passes (ignored tests are skipped by default, so GREEN). A manual run via `cargo test -p kebab-nli --test inference -- --ignored --test-threads=1` PASSES (user smoke).
+
+10. **`crates/kebab-rag/tests/multi_hop_nli_truncate.rs` (new): the 3 mock tests of §5.3.**
+    - `long_en_synth_answer_truncated_before_nli_call`: EN 5000-char synth answer → `SpyNliVerifier` returns token_count `Ok(100)` (within budget) → 0 retries → hypothesis truncated to exactly 1200 chars + Right direction pin (`assert_eq!(hyp.as_str(), input_first_1200)`).
+    - `long_kr_synth_answer_retries_with_smaller_budget`: a `token_count_call_count` spy (`Arc<Mutex<usize>>`); token_count_fn returns 900 tokens for > 1000 chars, 450 for > 500 chars, else 220. A 2500-char KR-sim answer walks the budget 1200 → 600 → 300 chars, so pin token_count call count >= 3 (i.e. 2 halvings) + final hypothesis <= 300 chars + `refusal_reason = None`.
+    - `unrelenting_token_overflow_falls_through_to_unavailable`: token_count_fn unconditionally returns `Ok(9_999)` → retries exhausted → pin `refusal_reason = Some(RefusalReason::NliModelUnavailable)` (regression 0 guarantee). score_fn is `unreachable!()` because it must never be called.
+    - Acceptance: `cargo test -p kebab-rag --test multi_hop_nli_truncate -j 1` GREEN.
+
+11. **The 4 pure-fn boundary tests of §5.2, in the `#[cfg(test)] mod tests` block of `crates/kebab-rag/src/pipeline.rs`** (round-1 plan critic MAJOR #4 closure: `truncate_chars` is `pub(crate)`, so integration test files cannot access it; the tests must live in the same crate's `#[cfg(test)] mod tests`).
+    - The 4 boundaries of `truncate_chars`:
+      - input <= budget → identity, `was_truncated=false`.
+      - input > budget → exactly budget chars, `was_truncated=true`.
+      - empty input → identity (regardless of budget).
+      - KR Hangul input (codepoints, not bytes) → `chars().count()` is exact. Example: `"가나다라마"` is 5 chars; with budget 3 → `"가나다"`, 3 chars (9 bytes).
+    - Written inline in that block, calling `truncate_chars` directly. No separate file is added.
+    - Acceptance: `cargo test -p kebab-rag --lib -j 1` or `cargo test -p kebab-rag -j 1` GREEN.
+
+12. **`tasks/HOTFIXES.md`: new dated entry.**
+    - Insert right before line 17 (the current HOTFIX #15 entry). **No HOTFIX number assigned**: this is a production behavior follow-up to the sibling fb-41 PR-9 closure layer (round-1 critic M6 closure).
+    - Format: `## 2026-05-26 — S3 NLI unavailable — hypothesis truncate + token-count fallback`
+      - `**Symptom**`: the S3 dogfood query consistently fails with `nli_model_unavailable` when NLI is enabled + 5 WARN tokenizer truncation errs in `kb.log.2026-05-26`.
+      - `**Root cause**`: the mDeBERTa-v3 tokenizer's `OnlyFirst` hits a truncate dead-end when the hypothesis alone exceeds the 512-token cap. Only the premise-side `truncate_for_nli` was applied; the hypothesis side was unguarded.
+      - `**Action**`: add the `kebab-nli::NliVerifier::hypothesis_token_count` trait method (default `Ok(0)`) + `OnnxNliVerifier` overrides it with real tokenize inside the trait impl block + `kebab-rag::pipeline::truncate_hypothesis_for_nli_with_budget` retry helper (halving retry 1200 → 600 → 300 → 150 chars, with graceful unavailable fallback at the min 150 floor) + truncation on both sides at the step 8.5 callsite. 7 files.
+      - `**Amends**`: cross-link to spec `docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md`. Supplements the NLI behavior in task spec `tasks/p9/p9-fb-41-multi-hop-finalize.md` (new hypothesis-side budget).
+    - Acceptance: use `git diff tasks/HOTFIXES.md` to confirm the entry was inserted right before line 17.
+
+---
+
+## §3. Verification
+
+The cargo commands of spec §5.5, verbatim. Single-user environment with a 16 GB RAM constraint, so `-j 1` is required.
+
+```bash
+# 1. kebab-nli regression: GREEN, with the 2 new ignored tests skipped by default
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test -p kebab-nli -j 1
+
+# 2. kebab-rag regression: all GREEN, including the new 3 mock + 4 boundary tests
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test -p kebab-rag -j 1
+
+# 3. Whole workspace: the baseline count of 1306 tests is maintained
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test --workspace --no-fail-fast -j 1
+
+# 4. lint: clean under -D warnings
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo clippy --workspace --all-targets -j 1 -- -D warnings
+
+# 5. Manual smoke: the 2 new ignored tests PASS against the real mDeBERTa
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test -p kebab-nli --test inference -- --ignored --test-threads=1
+```
+
+### Dogfood retest (spec §5.4 verbatim, with `2>&1` removed)
+
+```bash
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo build --release -p kebab-cli -j 1
+
+# round-1 plan verifier Gap 2 closure: confirm beforehand that nli_threshold > 0 in the config.
+# With the default 0.0, NLI is off and the retest is vacuous. grep only checks that the key exists, so read the printed value.
+grep -E "^\s*nli_threshold\s*=" /build/cache/dogfood-v018/config/config.toml \
+  || echo "WARNING: nli_threshold is not set; add [rag] nli_threshold = 0.3 to the config before continuing"
+
+# EN: the original S3 case
+RUST_LOG=info,kebab_rag=debug,kebab_nli=debug \
+  /build/out/cargo-target/target/release/kebab ask \
+    "Why does kebab combine multilingual-e5, LanceDB, and RRF together?" \
+    --multi-hop --config /build/cache/dogfood-v018/config/config.toml \
+    --json > /build/cache/dogfood-v018/results/post-s3-fix/s3-en-retest.json
+
+# KR: long-answer simulation
+RUST_LOG=info,kebab_rag=debug,kebab_nli=debug \
+  /build/out/cargo-target/target/release/kebab ask \
+    "kebab 의 multi-hop RAG + NLI verification 의 동작 원리는 무엇인가?" \
+    --multi-hop --config /build/cache/dogfood-v018/config/config.toml \
+    --json > /build/cache/dogfood-v018/results/post-s3-fix/s3-kr-retest.json
+
+# Verify the results
+jq -r '.refusal_reason, .verification.nli_score, .usage.completion_tokens' \
+  /build/cache/dogfood-v018/results/post-s3-fix/s3-en-retest.json
+jq -r '.refusal_reason, .verification.nli_score, .usage.completion_tokens' \
+  /build/cache/dogfood-v018/results/post-s3-fix/s3-kr-retest.json
+
+# Confirm in the tracing log that the truncate path was taken (file appender)
+grep "NLI hypothesis truncated\|hypothesis_token_count\|hypothesis budget retry" \
+  ~/.local/state/kebab/logs/kb.log.$(date -I)
+```
+
+### Success criteria
+
+| Check | Definition of GREEN |
+|---|---|
+| `cargo test -p kebab-nli -j 1` | All existing tests PASS; the 2 ignored tests are reported as skipped |
+| `cargo test -p kebab-rag -j 1` | All existing tests PASS + the new 3 mock multi-hop + 4 boundary tests PASS |
+| `cargo test --workspace -j 1` | The 1306 baseline is maintained (+ 9 new; exact count to be measured at PR time) |
+| `cargo clippy --workspace -- -D warnings` | 0 warnings |
+| Manual ignored smoke | The 2 new tests PASS (real mDeBERTa invoked) |
+| Dogfood S3 EN retest | `refusal_reason` is `null` or `"nli_verification_failed"` (NOT `"nli_model_unavailable"`) + `verification.nli_score` is a finite float |
+| Dogfood S3 KR retest | Same: not `nli_model_unavailable`, plus a retry trace in the log |
+
+---
+
+## §4. Time estimate
+
+| Phase | wall time |
+|---|---|
+| Implementation (12 steps) | ~1 hr |
+| Verification (cargo test + clippy + 2 dogfood retests) | ~30 min (`-j 1` workspace test ~12 min + 2 dogfood queries at ~2 min each + log/jq inspection) |
+| OMC reviewer dispatch + 0-1 review iteration amend | ~30 min - 1 hr |
+| **Total** | **wall time 2-3 h** |
+
+This is a single PR with a small scope, so an ACCEPT after the round-1 review alone is likely.
+
+---
+
+## §5. Risks / mitigations
+
+spec §8 cross-ref + pre-mortem (a)-(d). Only the key risks and their mitigations are restated here:
+
+| Risk | Mitigation |
+|---|---|
+| **RC1-residual: inherent vs trait impl**: if step 3's `hypothesis_token_count` sits inside the inherent `impl OnnxNliVerifier {}`, it is not registered in the vtable → the pipeline's `&dyn NliVerifier` dispatch calls the default `Ok(0)` → the retry loop passes immediately (token count 0 ≤ 256) → the real tokenizer is left unguarded → **production silent NO-OP** | Step 3 states explicitly that the code goes *inside the `impl NliVerifier for OnnxNliVerifier {}` block*. spec §5.1 Test 7 (`hypothesis_token_count_dispatches_correctly_via_dyn_trait`) verifies a `> 0` count through the vtable, so it fails on a silent NO-OP regression. |
+| **KR-extreme density (pre-mortem a)**: the token count is still over budget after char-truncate (Hanja/CJK can be 2-3 tokens per char) | Retry 1200→600→300→150, then graceful `nli_model_unavailable` at the min floor (regression 0). Pinned by the §5.3 mock test `unrelenting_token_overflow_falls_through_to_unavailable`. |
+| **Conclusion-bearing hypothesis (pre-mortem b)**: the conclusion in the latter part of the LLM answer is lost to truncation → false negative entailment | Fail-closed semantics are preserved (a normal reject). Users can temporarily disable the gate with `nli_threshold = 0`. README guidance is separate task §6 #7. |
+| **Mock tests lack `build_test_pipeline_with_long_answer`** | round-2 plan critic MAJOR #3 closure: adopt Option B (inline pattern), with no helper written. Each §5.3 test inlines the 5 components `RagEnv::new()` + `ScriptedRetriever::new()` + `ScriptedLm::new(vec![decompose, decide, &"lorem ".repeat(N)])` + `RagPipeline::new(...).with_verifier(verifier)` (plan step 8 verbatim). |
+| **The `hypothesis_token_count` probe itself fails (pre-mortem d)** | `with_context` wrap → anyhow err → **explicit `match` + `return self.refuse_nli_model_unavailable(...)`** in the step 8.5 hook (round-1 plan critic CRITICAL #1 closure: no `?` propagation, *symmetric* with the `v.score()` Err branch). An empty hypothesis is caught before entry by the `acc.trim().is_empty()` guard. |
+
+---
+
+## §6. Self-review (spec items → plan step mapping)
+
+| spec item | plan step |
+|---|---|
+| §3 Option A: extend the trait with `NliVerifier::hypothesis_token_count` (default `Ok(0)`) | step 1 |
+| §3 Option A: `OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET = 256` inherent const | step 2 |
+| §3 Option A: override inside the trait impl block (RC1-residual) | step 3 |
+| §3 Option A: `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MIN = 150` consts | step 4 |
+| §5.2 chars-only pure-fn `truncate_chars` | step 5 |
+| §3 Option A: `truncate_hypothesis_for_nli_with_budget` retry helper | step 6 |
+| §3 Option A: step 8.5 hook callsite + truncation of both premise and hypothesis + tracing debug | step 7 |
+| §5.3 verifier Blocker 1: `SpyNliVerifier` helper (closure 2-arg constructor) | step 8 |
+| §5.1 Test 6: `score_long_en_hypothesis_returns_err_without_pipeline_truncation` | step 9 |
+| §5.1 Test 7 (renumbered): `hypothesis_token_count_dispatches_correctly_via_dyn_trait` (vtable dispatch + KR/EN bounded range, RC1-residual pin) | step 9 |
+| §5.3 Test 1: `long_en_synth_answer_truncated_before_nli_call` (Right direction pin) | step 10 |
+| §5.3 Test 2: `long_kr_synth_answer_retries_with_smaller_budget` (token_count call count + final ≤ 300 chars) | step 10 |
+| §5.3 Test 3: `unrelenting_token_overflow_falls_through_to_unavailable` (graceful fallback pin) | step 10 |
+| §5.2 4 boundary tests (identity / truncate / empty / KR codepoint) | step 11 |
+| §7 agreed exit #6: HOTFIXES.md dated entry (Symptom / Root cause / Action / Amends 4-block, no HOTFIX number assigned) | step 12 |
+| §5.5 cargo test + clippy 4 commands + manual ignored smoke | §3 |
+| §5.4 dogfood EN + KR retest | §3 (dogfood retest block) |
+
+All 9 tests + 7 files + the RC1-residual mitigation from the spec map 1-to-1 onto plan steps. Nothing is missing.
+
+---
+
+## §7. Subagent dispatch task
+
+Single task description for `/superpowers:subagent-driven-development`. It *inlines the core code skeleton of spec §3 + §5* so that the executor (sonnet/opus) can follow it self-contained (the self-contained pattern from round-2 critic M2).
+
+### Task name
+
+`S3 NLI unavailable — hypothesis char budget + token-count fallback retry`
+
+### Task description (passed directly to the subagent)
+
+> The S3 dogfood query (`"Why does kebab combine multilingual-e5, LanceDB, and RRF together?"`) consistently fails in multi-hop with `nli_model_unavailable`. Root cause: the mDeBERTa-v3 tokenizer's `TruncationStrategy::OnlyFirst` + a 949-token LLM answer = the hypothesis alone exceeds the 512 cap → truncate dead-end. Fix: truncate by char budget, then re-check the token count with the real tokenizer → if still over, retry with the char budget halved, and fall back gracefully to unavailable at the min 150 chars floor. KR safe + regression 0.
+>
+> Spec: `docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md` (single source).
+> Plan: `docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md` (this document).
+>
+> 12 steps (plan §2):
+>
+> 1. `crates/kebab-nli/src/lib.rs`: add the default `hypothesis_token_count(&self, &str) -> anyhow::Result<usize> { Ok(0) }` to the trait.
+> 2. `crates/kebab-nli/src/onnx.rs`: `impl OnnxNliVerifier { pub const HYPOTHESIS_TOKEN_BUDGET: usize = 256; }`.
+> 3. `crates/kebab-nli/src/onnx.rs`: override `hypothesis_token_count` **inside** the `impl NliVerifier for OnnxNliVerifier` block (real `tokenizer.encode` probe). **Trait impl block, not inherent: this guarantees vtable registration**.
+> 4. `crates/kebab-rag/src/pipeline.rs`: `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` consts.
+> 5. `crates/kebab-rag/src/pipeline.rs`: `truncate_chars(s, budget) -> (String, bool)` pure-fn (codepoint-aware).
+> 6. `crates/kebab-rag/src/pipeline.rs`: `truncate_hypothesis_for_nli_with_budget(verifier: &dyn NliVerifier, hypothesis: &str) -> anyhow::Result<(String, bool)>` retry loop (spec §3 code verbatim).
+> 7. `crates/kebab-rag/src/pipeline.rs`: in the step 8.5 hook (near line 1041), insert the hypothesis-side truncate hook right before `v.score(&truncated_premise, &acc)`, handle its Err with an explicit `match` + `return self.refuse_nli_model_unavailable(...)` (**no `?`**), pass `&truncated_hypothesis` to `v.score`, and add a `tracing::debug` log.
+> 8. `crates/kebab-rag/tests/common/mod.rs`: add the `SpyNliVerifier` helper (closure score_fn + token_count_fn, 2-arg constructor, captures into `Mutex<Vec<String>>`).
+> 9. `crates/kebab-nli/tests/inference.rs`: 2 `#[ignore]` tests (long EN err pin + **vtable dispatch test for the RC1-residual pin**: call `hypothesis_token_count` through `let v_dyn: &dyn NliVerifier = &v;`; with inherent-only placement it receives `Ok(0)` and the assertion fails).
+> 10. `crates/kebab-rag/tests/multi_hop_nli_truncate.rs` (new): 3 mock tests (EN happy / KR retry / unrelenting fallback).
+> 11. `#[cfg(test)] mod tests` in `crates/kebab-rag/src/pipeline.rs`: 4 boundary tests for `truncate_chars`.
+> 12. `tasks/HOTFIXES.md`: new dated entry right before line 17 (Symptom / Root cause / Action / Amends 4-block, no HOTFIX number assigned).
+>
+> **Verification** (`-j 1` required, 16 GB RAM):
+>
+> ```bash
+> CARGO_TARGET_DIR=/build/out/cargo-target/target cargo test -p kebab-nli -j 1
+> CARGO_TARGET_DIR=/build/out/cargo-target/target cargo test -p kebab-rag -j 1
+> CARGO_TARGET_DIR=/build/out/cargo-target/target cargo test --workspace --no-fail-fast -j 1
+> CARGO_TARGET_DIR=/build/out/cargo-target/target cargo clippy --workspace --all-targets -j 1 -- -D warnings
+> CARGO_TARGET_DIR=/build/out/cargo-target/target cargo test -p kebab-nli --test inference -- --ignored --test-threads=1
+> ```
+>
+> Plus the dogfood S3 EN + KR retest (code block of plan §3 verbatim, with `2>&1` removed so that jq parsing is not broken).
+>
+> **Single commit**:
+>
+> ```
+> fix(rag,nli): S3 NLI unavailable — hypothesis char budget + token-count fallback retry
+>
+> The mDeBERTa-v3 tokenizer's OnlyFirst strategy hits a truncate dead-end
+> when the hypothesis alone exceeds the 512-token cap. Avoided with a
+> char-budget retry + real tokenizer token-count probe; KR safe
+> (1200→600→300→150 retry + graceful unavailable fallback at min 150 chars).
+>
+> - Add the hypothesis_token_count probe API to kebab-nli::NliVerifier
+>   (default Ok(0) backward-compat).
+> - OnnxNliVerifier overrides it with real tokenize inside the trait impl
+>   block (RC1-residual: inherent impl is not in the vtable → silent NO-OP).
+> - kebab-rag::pipeline::truncate_hypothesis_for_nli_with_budget retry
+>   helper + hypothesis-side hook in the step 8.5 hook.
+> - SpyNliVerifier closure-based test helper + 9 new tests (2 ignored
+>   inference + 3 mock multi-hop + 4 pure-fn boundary).
+> - HOTFIXES.md dated entry.
+>
+> Closes spec docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md
+>
+> 🤖 Generated with [Claude Code](https://claude.com/claude-code)
+>
+> Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
+> ```
+>
+> **PR title**: `fix(rag,nli): S3 NLI unavailable — hypothesis char budget + token-count fallback retry`
+>
+> **PR body skeleton**:
+>
+> ```
+> ## Summary
+> - Root cause of the S3 dogfood query's consistent `nli_model_unavailable` fail = hypothesis-side overflow in the mDeBERTa-v3 OnlyFirst tokenizer. Avoided with a char-budget retry + real tokenizer token-count probe; KR safe + graceful fallback (regression 0).
+> - 3 production files (lib.rs trait, onnx.rs inherent+trait override, pipeline.rs consts+helpers+callsite+4 boundary tests) + 3 test files (common/mod.rs SpyNliVerifier, inference.rs 2 ignored, multi_hop_nli_truncate.rs 3 mock) + HOTFIXES.md entry.
+> - No wire schema change, no Cargo.toml version bump needed (production behavior fix).
+>
+> ## Test plan
+> - [ ] `cargo test -p kebab-nli -j 1` GREEN (existing + 2 ignored skipped)
+> - [ ] `cargo test -p kebab-rag -j 1` GREEN (existing + 3 mock + 4 boundary)
+> - [ ] `cargo test --workspace --no-fail-fast -j 1` GREEN (baseline 1306+ maintained)
+> - [ ] `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean
+> - [ ] `cargo test -p kebab-nli --test inference -- --ignored --test-threads=1` GREEN (manual smoke)
+> - [ ] Dogfood S3 EN retest: `refusal_reason` is not `nli_model_unavailable`
+> - [ ] Dogfood S3 KR retest: same + retry trace in kb.log
+>
+> Spec: `docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md`
+> Plan: `docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md`
+> ```
+
+### RR-tier cleanup notes (during executor self-review, or as a separate cleanup commit)
+
+After completing steps 1-12, the executor checks the following RR1-5 cleanups during *self-review*:
+
+- **RR1 (style)**: the spec §X cross-link format in the doc comments of `truncate_chars` / `truncate_hypothesis_for_nli_with_budget` matches the doc comment format already used in pipeline.rs (e.g. sibling style to the `truncate_for_nli` doc).
+- **RR2 (naming)**: the prefix of consts `MAX_NLI_HYPOTHESIS_CHARS_INITIAL` / `_MIN` keeps sibling naming with the existing `MAX_NLI_PREMISE_CHARS`.
+- **RR3 (test classification)**: keep the layers distinct: the 4 §5.2 boundary tests live in *pipeline.rs `#[cfg(test)] mod tests`* (pure-fn unit), the mock tests in *`multi_hop_nli_truncate.rs`* (integration mock). `truncate_chars` is `pub(crate)`, so lib tests in the same crate can call it directly; moving the tests to an integration test file causes a visibility issue. Keeping them *in the pipeline.rs #[cfg(test)] mod* is recommended.
+- **RR4 (tracing target)**: the target in `tracing::debug!(target: "kebab-rag", ...)` matches the tracing target of the existing `truncate_for_nli`.
+- **RR5 (HOTFIXES position)**: exactly right before line 17 (above the current HOTFIX #15). *No* HOTFIX number (sibling fb-41 layer follow-up, round-1 critic M6 closure).
+
+Handle RR1-5 inline during the self-review of the implementation commit rather than splitting them into a separate cleanup commit (this is a small-scope hotfix).
+
+---
+
+## §8. plan v1 status
+
+- Written after the spec's 4 rounds of adversarial review completed (critic + verifier both ACCEPT/APPROVE).
+- Ready for round-1 OMC reviewer dispatch; the critique of this plan v1 is the input for amendments.
+- Subagent dispatch proceeds after the reviewer ACCEPTs plan v1.
diff --git a/docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md b/docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md
new file mode 100644
index 0000000..339cb65
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md
@@ -0,0 +1,646 @@
+---
+title: "S3 dogfood query consistently fails with nli_model_unavailable: hypothesis-side token overflow trips the NLI tokenizer's OnlyFirst too-short error"
+date: 2026-05-26
+task_id: s3-nli-unavailable
+status: open
+target_version: 0.18.1
+sibling_of: tasks/HOTFIXES.md "2026-05-25 — fb-41 pre-v0.18 dogfood: multi-hop score-gate 우회"
+fix_files:
+  - crates/kebab-nli/src/lib.rs  # hypothesis_token_count default impl on the NliVerifier trait
+  - crates/kebab-nli/src/onnx.rs # OnnxNliVerifier::hypothesis_token_count override + HYPOTHESIS_TOKEN_BUDGET
+  - crates/kebab-rag/src/pipeline.rs # truncate_hypothesis_for_nli_with_budget + MAX_NLI_HYPOTHESIS_CHARS_INITIAL/MIN + step 8.5 callsite
+  - crates/kebab-rag/tests/common/mod.rs # SpyNliVerifier helper
+fix_symbols:
+  - "kebab_nli::NliVerifier::hypothesis_token_count (trait method, default impl)"
+  - "kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET (const 256)"
+  - "kebab_rag::pipeline::MAX_NLI_HYPOTHESIS_CHARS_INITIAL (const 1200)"
+  - "kebab_rag::pipeline::MAX_NLI_HYPOTHESIS_CHARS_MIN (const 150)"
+  - "kebab_rag::pipeline::truncate_hypothesis_for_nli_with_budget"
+---
+
+# S3 NLI unavailable — Hypothesis-side overflow trips OnlyFirst-only tokenizer
+
+## §1. Diagnosis (root cause)
+
+`crates/kebab-nli/src/onnx.rs::OnnxNliVerifier::load_tokenizer` (line 216-223) configures the mDeBERTa-v3 tokenizer as follows:
+
+```rust
+tokenizer.with_truncation(Some(TruncationParams {
+    max_length: MAX_TOKENS,           // 512
+    strategy: TruncationStrategy::OnlyFirst,
+    stride: 0,
+    direction: TruncationDirection::Right,
+}))
+```
+
+With a *2-sequence input*, `OnlyFirst` truncates only the first sequence (= premise). If the second sequence (= hypothesis) *alone* exceeds `max_length - special_tokens (≈ 509)`, no amount of premise truncation, even down to 0, can make the pair fit, and the `tokenizers` crate raises `TruncationError::SequenceTooShort`:
+
+```
+Truncation error: Sequence to truncate too short to respect the provided max_length
+```
+
+`pipeline.rs::ask_multi_hop` (line 1041) truncates only the premise side with a char budget (`truncate_for_nli` / `MAX_NLI_PREMISE_CHARS = 1600`) and passes the hypothesis (`acc` = LLM-synth answer) raw, calling `v.score(&truncated_premise, &acc)`. When that returns `Err`, the `Err(e) => return self.refuse_nli_model_unavailable(...)` arm at line 1057-1064 collapses *every* tokenizer/inference err into `RefusalReason::NliModelUnavailable`, so the wire reports it as "model unavailable".
+
+### What makes S3 special
+
+- Query: `"Why does kebab combine multilingual-e5, LanceDB, and RRF together?"`
+- KB: kebab self-knowledge (README / spec / SMOKE / ARCHITECTURE / HANDOFF). Retrieval returns many chunks → the synthesize prompt is rich → gemma3:4b generates a verbose answer.
+
+**Evidence chain (round-1 critic H1 closure: the two invocations are stated separately)**:
+
+1. **LLM answer length evidence**: `/build/cache/dogfood-v018/results/s3-multihop.json` (2026-05-25 dogfood, `nli_threshold = 0`, i.e. NLI off) has `usage.completion_tokens = 949`, `answer chars = 4564`. *The refusal in this invocation is `llm_self_judge` (NLI off)*; the NLI tokenizer was never even called. It is evidence *only* for the *answer length*: a query like S3 targets KB self-knowledge, so the LLM produces a verbose answer.
+
+2. **NLI tokenizer error evidence**: 5 WARN lines in `~/.local/state/kebab/logs/kb.log.2026-05-26` (2026-05-26 retest, NLI threshold > 0 enabled): `tokenizer.encode failed: Truncation error: Sequence to truncate too short to respect the provided max_length`. The JSON result of this retest was not preserved (the `--json` redirect was omitted; separate follow-up), so the log is the only wire evidence.
+
+3. **Linking inference**: long answer (#1) + NLI enabled (#2) → the hypothesis alone exceeds the cap under the OnlyFirst-only tokenizer → tokenizer error → `nli_model_unavailable` on the wire. The mechanics of the two invocations converge on the same root cause.
+
+The other PASS cases (S1 = Korean compiler, S7 = caffeine, S10 = dinosaurs) have short answers because the KB contains *none* or only part of the relevant facts (S7 retest: `usage.completion_tokens = 114` in `/build/cache/dogfood-v018/results/post-pr9/s7-multihop.json`). The NLI tokenizer therefore works normally and `nli_score` is measured (it is low, so the answer is rejected as verification_failed).
+
+### Tracing surface
+
+진단 evidence (`kb.log.2026-05-26` 의 WARN 5 회) 발견 시 *우회 자체* 가 시간 비용 — 별 task `§6 #2 (KEBAB_LOG_STDERR opt-in)` + `§6 #7 (README/SMOKE 안내 한 줄)`. **본 spec scope 외** (round-1 critic M4 closure — root cause 와 무관, defer 로 정렬).
+
+### Hypothesis 별 검증 (analyst empirical)
+
+- **A (ort Session::run shape err)** — REJECTED. tokenizer 가 먼저 실패해 Session::run 까지 도달하지 않음. 에러 메시지가 ort 가 아니라 `tokenizer.encode failed`.
+- **B (tokenizer encode fail)** — **CONFIRMED**. error 메시지 텍스트 일치.
+- **C (hf-hub Cache::get probe race)** — REJECTED. 모델 cache 존재 + 매 invocation 결정적 같은 실패.
+- **D (OnceLock partial init race)** — REJECTED. sequential dogfood, single-thread `App` boot. 같은 input 으로 매번 같은 실패 → race 아님.
+
+### 313 s latency 의 정체
+
+NLI 가 hang 한 게 아님. dogfood S3 의 `latency_ms` 는 gemma3:4b 의 **synthesize hop LLM streaming 시간**. NLI tokenizer 는 ms 단위 fast-fail.
+
+---
+
+## §2. 영향
+
+| Surface | 영향 |
+|---|---|
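+| Production behavior | With the NLI gate enabled (`rag.nli_threshold > 0`), any LLM answer whose hypothesis alone exceeds roughly 509 tokens (`max_length` 512 minus special tokens) is always refused with `nli_model_unavailable`. This is frequent for KB-rich self-knowledge queries and multi-step multi-hop results. |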
+| Wire (`answer.v1`) | `refusal_reason = "nli_model_unavailable"`, `verification = null`. 사용자/agent 가 "NLI 모델 다운로드 / 로드 문제" 로 오인. |
+| Wire (`error.v1`) | 영향 없음 — refusal envelope 만 사용, error envelope 으로는 안 빠짐. |
+| 사용자 binary | `[rag] nli_threshold = 0` (현 default) 사용자: 미영향. README + frozen design 의 *권장 production 0.5* 활성 사용자: KB-rich self-knowledge query 또는 verbose LLM 답변에서 빈번 노출 — `nli_model_unavailable` refusal 받음 (round-1 critic M5 정정: "일반 사용자 미영향" 은 *현 default 기준* 만 정확, *권장 사용 기준* 으로는 misleading). |
+| NLI model 자체 | **정상**. Cache 정상, `score` 도 짧은 hypothesis 에는 작동. |
+
+---
+
+## §3. Fix design
+
+### Option A (권장) — Char-budget + token-count fallback retry (round-1 critic H3 + verifier Blocker 2 closure)
+
+Apply the premise-side pattern in `crates/kebab-rag/src/pipeline.rs` (`truncate_for_nli` + `MAX_NLI_PREMISE_CHARS = 1600`) symmetrically to the hypothesis. A char budget alone is not enough: **a KR-heavy hypothesis (1-2 chars/token) can exceed the 512-token cap on its own** (1200 KR chars ÷ 1 char/token = 1200 tokens >> 512). So after the char truncation, *re-check the token count with the real mDeBERTa tokenizer* and, if it is still over `HYPOTHESIS_TOKEN_BUDGET`, retry with half the char budget. In the worst case the fallback is `nli_model_unavailable` (regression 0).
+
+#### `kebab-nli` crate 의 API 작은 확장
+
+**Trait 확장 + impl placement** (round-3 critic RC1-residual closure — `hypothesis_token_count` 가 *inherent* 가 아닌 *trait impl block* 안에 위치해야 trait dispatch 가 vtable 통해 호출 — production silent NO-OP 회피):
+
+```rust
+// crates/kebab-nli/src/lib.rs — trait 확장
+pub trait NliVerifier: Send + Sync {
+    fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores>;
+
+    /// Probe-only tokenize for caller-side budget verification. round-1
+    /// critic H3 closure — pipeline 의 char-budget retry loop 가 이 API 로
+    /// `OnlyFirst` dead-end 회피.
+    ///
+    /// **Default impl 반환 0** — 기존 mock implementations
+    /// (`MockNliVerifier`) 가 trait 확장 후에도 backward-compat (compile
+    /// fail 회피, retry loop immediate 통과). `OnnxNliVerifier` 는 real
+    /// tokenizer 로 *trait impl 블록 안에서* override (round-3 critic
+    /// RC1-residual — inherent method 는 vtable 미등록 → trait dispatch
+    /// 시 default 호출 → silent NO-OP).
+    fn hypothesis_token_count(&self, _hypothesis: &str) -> anyhow::Result<usize> {
+        Ok(0)
+    }
+}
+```
+
+```rust
+// crates/kebab-nli/src/onnx.rs — inherent 와 trait impl 분리
+impl OnnxNliVerifier {
+    /// Hypothesis-side budget. Pipeline uses this to size its
+    /// char-truncation retry loop. = MAX_TOKENS (512) - 3 special tokens
+    /// reserved (CLS, SEP, SEP) - some premise room (caller decides).
+    pub const HYPOTHESIS_TOKEN_BUDGET: usize = 256; // 안전 마진: 512 - 3 - 253 premise room
+}
+
+impl NliVerifier for OnnxNliVerifier {
+    fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores> {
+        // ... 기존 score body 그대로 (현재 onnx.rs:229-280)
+    }
+
+    /// **Override** the trait default `Ok(0)` with a real mDeBERTa tokenize.
+    /// Pipeline 의 `truncate_hypothesis_for_nli_with_budget` retry loop 가
+    /// 이 method 를 vtable 통해 호출 — production code path 에서 실 token
+    /// count 측정 (round-3 critic RC1-residual closure: inherent impl 이
+    /// 아닌 *trait impl block 안* 에 위치해야 vtable 등록).
+    fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> {
+        let (_session, tokenizer) = self.ensure_loaded()?;
+        let enc = tokenizer
+            .encode(hypothesis, /*add_special_tokens=*/false)
+            .map_err(|e| anyhow!("kebab-nli: tokenizer.encode (probe) failed: {e}"))?;
+        Ok(enc.get_ids().len())
+    }
+}
+```
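The placement rule above can be reproduced with the standard library alone. The sketch below uses hypothetical stand-ins (`Probe`, `InherentOnly`, `Overrides`) rather than the real `NliVerifier` / `OnnxNliVerifier` types, and a whitespace split instead of a real tokenizer. It only illustrates how Rust resolves an inherent method versus a trait default.

```rust
/// Hypothetical stand-in for the trait: the default returns 0, like `Ok(0)` above.
trait Probe {
    fn token_count(&self, _hypothesis: &str) -> usize {
        0
    }
}

/// Bug shape: the real counter lives in an inherent impl only.
struct InherentOnly;

impl InherentOnly {
    fn token_count(&self, hypothesis: &str) -> usize {
        hypothesis.split_whitespace().count()
    }
}

// Empty trait impl: calls through `dyn Probe` fall back to the default.
impl Probe for InherentOnly {}

/// Fixed shape: the counter overrides the default inside the trait impl block.
struct Overrides;

impl Probe for Overrides {
    fn token_count(&self, hypothesis: &str) -> usize {
        hypothesis.split_whitespace().count()
    }
}
```

Calling `InherentOnly.token_count("a b c")` on the concrete type returns 3 because inherent methods take priority, which is why a concrete-type test cannot catch the bug. Through `&dyn Probe` the same value returns 0, while `Overrides` returns 3 on both paths.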
+
+#### `crates/kebab-rag/src/pipeline.rs` 의 hypothesis-side budget + retry
+
+```rust
+/// p9-fb-41 S3 follow-up: NLI hypothesis (= synthesized answer) 가
+/// 512-token cap 을 단독으로 초과하면 `OnlyFirst` truncation 이 premise 를
+/// 0 까지 잘라도 fit 시킬 수 없어 tokenizer 가 err. char-budget 으로
+/// 자른 후 *실 mDeBERTa tokenizer* 로 token count 재검증 — 초과 시 char
+/// budget 절반으로 retry. KR safe (round-1 critic H3 closure).
+pub const MAX_NLI_HYPOTHESIS_CHARS_INITIAL: usize = 1200;
+pub const MAX_NLI_HYPOTHESIS_CHARS_MIN: usize = 150;
+
+/// Truncate hypothesis with `Right` direction (앞부분 보존 — LLM 답변의
+/// 도입부에 핵심 claim 이 있음. 후반 손실은 결론 재요약 영향 적음).
+/// Empirically validates token-count via `verifier.hypothesis_token_count`
+/// 와 절반화 retry. 최종 cap = `HYPOTHESIS_TOKEN_BUDGET`. fallback 도달
+/// 시 anyhow::Err — caller (step 8.5 hook) 가 기존 unavailable path.
+pub fn truncate_hypothesis_for_nli_with_budget(
+    verifier: &(dyn kebab_nli::NliVerifier + 'static),
+    hypothesis: &str,
+) -> anyhow::Result<(String, bool)> {
+    let original_chars = hypothesis.chars().count();
+    let mut budget = MAX_NLI_HYPOTHESIS_CHARS_INITIAL;
+    let mut was_truncated = false;
+
+    loop {
+        // Reuse the chars-only helper (`truncate_chars`, defined in §5.2) for each step.
+        let (candidate, truncated_now) = truncate_chars(hypothesis, budget);
+        was_truncated |= truncated_now;
+
+        // Re-check the token count with the verifier's internal tokenizer.
+        // No downcast needed: this is a trait method, callable through `dyn`.
+        let token_count = verifier
+            .hypothesis_token_count(&candidate)
+            .with_context(|| "kebab-rag: hypothesis token-count probe failed")?;
+        if token_count <= kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET {
+            return Ok((candidate, was_truncated));
+        }
+
+        // Over budget: halve the char budget and retry.
+        budget /= 2;
+        if budget < MAX_NLI_HYPOTHESIS_CHARS_MIN {
+            anyhow::bail!(
+                "kebab-rag: hypothesis remains over token budget after retry (original {original_chars} chars, last budget {} chars, tokens {token_count} > {})",
+                budget * 2,
+                kebab_nli::OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET,
+            );
+        }
+    }
+}
+```
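To see which budgets the loop actually visits, here is a standalone sketch of the same halving logic. It is an illustration under stated assumptions, not the pipeline code: the token counter is a plain closure instead of a `NliVerifier`, errors are `String` instead of `anyhow`, and the extra `tried` vector exists only to expose the sequence of budgets.

```rust
const INITIAL_CHARS: usize = 1200; // mirrors MAX_NLI_HYPOTHESIS_CHARS_INITIAL
const MIN_CHARS: usize = 150; // mirrors MAX_NLI_HYPOTHESIS_CHARS_MIN
const TOKEN_BUDGET: usize = 256; // mirrors HYPOTHESIS_TOKEN_BUDGET

/// Halving retry over a char budget. `token_count` is a hypothetical stand-in
/// for the verifier probe. Returns the fitted text plus every budget tried.
fn truncate_with_budget<F>(hypothesis: &str, token_count: F) -> Result<(String, Vec<usize>), String>
where
    F: Fn(&str) -> usize,
{
    let mut budget = INITIAL_CHARS;
    let mut tried = Vec::new();
    loop {
        tried.push(budget);
        // Codepoint-based cut that keeps the front of the text.
        let candidate: String = hypothesis.chars().take(budget).collect();
        if token_count(&candidate) <= TOKEN_BUDGET {
            return Ok((candidate, tried));
        }
        // Over budget: halve and stop once the floor is crossed.
        budget /= 2;
        if budget < MIN_CHARS {
            return Err(format!("still over {TOKEN_BUDGET} tokens after budgets {tried:?}"));
        }
    }
}
```

With a counter that charges one token per char (the dense KR case), a 3000-char input visits 1200, 600, 300 and 150 and fits at 150 chars. A counter that always reports an overflow exhausts the same four budgets and returns `Err`.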
+
+#### Callsite (step 8.5 hook) — round-1 plan critic CRITICAL #1 closure
+
+`?` propagation 사용 시 `ask_multi_hop` 이 `Err(anyhow::Error)` 반환 → wire `error.v1` 로 빠짐 (graceful fallback 약속 위반). 기존 `v.score()` Err 분기 (`return self.refuse_nli_model_unavailable(...)`) 와 *대칭* 으로 explicit match + return refuse 사용:
+
+```rust
+let (truncated_premise, premise_was_truncated) = truncate_for_nli(&packed_text);
+if premise_was_truncated {
+    tracing::debug!(target: "kebab-rag", "NLI premise truncated to MAX_NLI_PREMISE_CHARS");
+}
+let (truncated_hypothesis, hypothesis_was_truncated) =
+    match truncate_hypothesis_for_nli_with_budget(v.as_ref(), &acc) {
+        Ok(x) => x,
+        Err(e) => {
+            tracing::warn!(
+                target: "kebab-rag",
+                error = %e,
+                "NLI hypothesis budget retry exhausted; refusing with NliModelUnavailable"
+            );
+            return self.refuse_nli_model_unavailable(query, &opts, hops, started);
+        }
+    };
+if hypothesis_was_truncated {
+    tracing::debug!(
+        target: "kebab-rag",
+        original_chars = acc.chars().count(),
+        "NLI hypothesis truncated to fit the char budget (<= MAX_NLI_HYPOTHESIS_CHARS_INITIAL)"
+    );
+}
+// ... 기존 v.score() 호출 path — Err 분기 (line 1057-1064) 와 대칭 패턴
+match v.score(&truncated_premise, &truncated_hypothesis) {
+    Ok(scores) => { /* 기존 path */ }
+    Err(e) => { /* 기존 refuse_nli_model_unavailable */ }
+}
+```
+
+**핵심**: `?` 대신 explicit `match` + `return self.refuse_nli_model_unavailable(...)` — wire `answer.v1 + refusal_reason=nli_model_unavailable` 유지 (graceful fallback 약속 보존, regression 0).
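The difference between the two callsite styles can be shown with a small standalone model. Everything here is hypothetical (`Answer`, `fit_hypothesis`, the 8-char limit). It only demonstrates that `?` turns the helper failure into a hard error, while the explicit `match` keeps the call successful and returns a refusal value.

```rust
#[derive(Debug, PartialEq)]
enum Answer {
    Scored(usize),
    Refused(&'static str),
}

/// Stand-in for the budget helper: fails when the hypothesis is too long.
fn fit_hypothesis(h: &str) -> Result<String, String> {
    if h.chars().count() > 8 {
        Err("over token budget".to_string())
    } else {
        Ok(h.to_string())
    }
}

/// `?` propagates the failure to the caller (this would surface as `error.v1`).
fn ask_with_question_mark(h: &str) -> Result<Answer, String> {
    let fitted = fit_hypothesis(h)?;
    Ok(Answer::Scored(fitted.chars().count()))
}

/// Explicit `match` converts the failure into a refusal inside a successful result.
fn ask_with_match(h: &str) -> Result<Answer, String> {
    let fitted = match fit_hypothesis(h) {
        Ok(x) => x,
        Err(_) => return Ok(Answer::Refused("nli_model_unavailable")),
    };
    Ok(Answer::Scored(fitted.chars().count()))
}
```

For an over-long input, `ask_with_question_mark` returns `Err(..)` whereas `ask_with_match` returns `Ok(Answer::Refused("nli_model_unavailable"))`. The second shape is the one the callsite above preserves.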
+
+**Pros**:
+- KR + EN *대부분 케이스* 안전 (token-count retry path + char budget 절반화 retry). 최악 case (KR-extreme density e.g. 압축 한자 텍스트) 시 graceful fallback 으로 기존 unavailable 유지 — regression 0 (round-2 critic RM3 closure).
+- premise-side `truncate_for_nli` 와 *대칭* (둘 다 pipeline-layer helper).
+- `kebab-nli::NliVerifier` trait 에 1 method 추가 — wire 영향 0, API 확장 acceptable (v0.18 의 NLI surface 가 아직 unstable internal).
+
+**Cons**:
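+- Adds a `hypothesis_token_count` method to the `NliVerifier` trait, so the production implementation (currently one: `OnnxNliVerifier`) must be updated. Mock implementations (`MockNliVerifier`) keep compiling thanks to the default `Ok(0)` and only need an override when a test wants to exercise the retry loop.
+- The tokenizer is called once per retry-loop step: at most 4 probes in the KR worst case (1200→600→300→150). Each call is ~1 ms (tokenizer only, no inference), so the overhead is small.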
+
+### Option B — Tokenizer 의 `OnlyFirst` → `LongestFirst` 로 변경
+
+`crates/kebab-nli/src/onnx.rs:219` 의 `TruncationStrategy::OnlyFirst` → `LongestFirst`.
+
+**Pros**: char-budget / retry 도입 불필요, 두 sequence 모두 자연스럽게 truncate.
+**Cons**: 짧은 premise + 긴 hypothesis 케이스에서 premise 는 살아남고 hypothesis 가 잘림 — premise 가 충분히 사실 정보를 담고 있어야 entailment 가 의미 있음 (보통 그렇지만 보장은 아님). 또한 task spec `2026-05-25-p9-fb-41-finalize-spec.md::§2.2.3` 에 `OnlyFirst` 가 명시되어 있어 task spec amend + HOTFIXES.md entry 필요 (round-1 critic H2 closure — frozen design contract `2026-04-27-kebab-final-form-design.md` 에는 §2.2.3 자체 부재; OnlyFirst 명시는 task spec 수준).
+
+### Option C — NLI crate 가 자기 cap 을 관리 (hypothesis-aware truncate in `score()`)
+
+`crates/kebab-nli/src/onnx.rs::score` 안에서 hypothesis tokenize → token-count 체크 → 초과면 char-budget 으로 truncate 후 retry.
+
+**Pros**: caller 가 cap 을 모르고도 안전. encapsulation 잘 됨.
+**Cons**: pipeline-side 의 기존 `truncate_for_nli` 와 layering 불일치 (premise 는 pipeline 이 자르는데 hypothesis 는 nli crate 가 자르는 비대칭).
+
+### 권장: **Option A** (KR best-effort + graceful fallback + 대칭 + frozen design contract 무관)
+
+추후 개선 (별 task): token-count 기반 *premise* budget — 현재 `MAX_NLI_PREMISE_CHARS = 1600` 도 KR-heavy chunks 에서 동일 risk. fix_file 의 기존 주석 *"v0.18.1 candidate: token-count-based budget"* — 별 task 후보 (§6 #1 참조).
+
+### 부수 fix (별 task 권장)
+
+**tracing surface — env var + stderr layer 안내**
+
+진단 시간 단축을 위해 다음 중 하나 (별 PR / 별 task):
+
+- (B-1) `README.md` / `docs/SMOKE.md` 에 *"진단 시 `RUST_LOG=debug` + `tail -f ~/.local/state/kebab/logs/kb.log.$(date -I)`"* 한 줄 추가.
+- (B-2) `crates/kebab-app/src/logging.rs` 에 `KEBAB_LOG_STDERR=1` env var 가 set 되어 있으면 stderr fmt::layer 도 추가. opt-in 이라 wire / 일반 사용자 영향 0.
+
+**본 S3 fix PR 의 scope 밖** — analyst 권장 따라 별 task 분리 (PR 의 review lens 분리).
+
+---
+
+## §4. Wire / behavior 영향
+
+- **Wire schema**: 변경 없음. `answer.v1` / `error.v1` 모두 영향 없음.
+- **`schema_version` bump**: 불필요.
+- **`Cargo.toml` version bump**: 불필요 (test/diagnostic fix, breaking schema 아님).
+- **Behavior 변화** (round-2 critic RM3 정합): 긴-답변 multi-hop 케이스 *대다수* 가 `nli_model_unavailable` 대신 *실제 NLI score* 를 받음. score 가 낮으면 `nli_verification_failed` (정상 정확한 refusal), 높으면 `nli_passed=true` (이전엔 도달 못함). KR-extreme density 최악 case 만 기존 `nli_model_unavailable` 유지 (graceful fallback, regression 0). 즉 *false-negative 대다수 줄어듦 + 최악 case 의 wire 결과 보존*.
+- **Default behavior**: 사용자 default config 는 `nli_threshold = 0` (NLI off) — 일반 사용자에는 무영향. NLI 활성 사용자에게는 *옳은 방향의 변경* (이전엔 false unavailable, 이후엔 truthful score).
+
+---
+
+## §5. 검증 plan
+
+### 5.1. Unit test — `crates/kebab-nli/tests/inference.rs` (round-1 critic M1 보강)
+
+2 new tests (they need network access, so they follow the existing `#[ignore]` pattern):
+
+```rust
+/// Test 6: EN-long hypothesis alone exceeds max_length. Without pipeline-side
+/// truncation, OnlyFirst strategy dead-ends. Pin raw nli crate behavior so
+/// any future regression in the pipeline-side budget surfaces as a clear
+/// nli-level err.
+#[test]
+#[ignore]
+fn score_long_en_hypothesis_returns_err_without_pipeline_truncation() {
+    let cfg = Config::defaults();
+    let v = OnnxNliVerifier::new(&cfg).expect("verifier construction");
+    let premise = "short premise";
+    let hypothesis = "lorem ipsum ".repeat(500); // ~6 000 chars / >>512 tokens
+    let result = v.score(premise, &hypothesis);
+    assert!(result.is_err(), "long hypothesis should err under OnlyFirst");
+    let msg = result.err().unwrap().to_string();
+    assert!(
+        msg.contains("Truncation error") || msg.contains("too short to respect"),
+        "expected tokenizer truncation err, got: {msg}"
+    );
+}
+
+// (round-2 critic RM4 closure: the earlier draft's "KR-heavy hypothesis" test
+// used a conditional assertion (`if result.is_err()`), which made it a
+// measurement test rather than a regression pin, so it was dropped. Test 7
+// below (bounded-range assertion on hypothesis_token_count) is a safer
+// deterministic pin for KR/EN behavior. The wire-level pin for KR safety is
+// the retry simulation in §5.3's
+// `long_kr_synth_answer_retries_with_smaller_budget`.)
+
+/// Test 7 (renumbered): hypothesis_token_count helper — pure tokenizer probe.
+/// **vtable dispatch 검증** (round-3 critic RC1-residual + round-1 plan verifier
+/// Gap 1 closure — concrete type 호출은 inherent method 우선이라 RC1-residual
+/// 버그 잡지 못함; `&dyn NliVerifier` 통해 dispatch 해야 vtable 등록 검증).
+/// inherent-only 배치 시 default `Ok(0)` 반환 → `assert!(count > 0)` 실패.
+/// trait impl block 배치 시 real tokenizer → PASS.
+/// Pipeline 이 retry budget 결정에 사용하는 API 의 정확성 pin.
+#[test]
+#[ignore]
+fn hypothesis_token_count_dispatches_correctly_via_dyn_trait() {
+    let cfg = Config::defaults();
+    let v = OnnxNliVerifier::new(&cfg).expect("verifier construction");
+    // ★ vtable dispatch — &dyn NliVerifier 통해 호출. inherent-only 배치 시 default
+    // `Ok(0)` 반환 → assert!(count > 0) 실패. trait impl block 배치 시 real
+    // tokenizer → PASS. round-3 critic RC1-residual + round-1 plan verifier
+    // Gap 1 closure 의 코드-수준 regression pin.
+    let v_dyn: &dyn NliVerifier = &v;
+    // Short EN: ~4 chars/token estimate (27 chars / 4 = ~7 tokens)
+    let en_count = v_dyn.hypothesis_token_count("short english test sentence")
+        .expect("EN dyn dispatch must reach real tokenizer (vtable check)");
+    assert!(
+        en_count > 0 && en_count < 20,
+        "EN ~7 tokens expected via vtable dispatch, got {en_count} \
+         (Ok(0) signals inherent-only placement bug, RC1-residual)"
+    );
+    // Short KR: 1-2 chars/token estimate (16 chars / 1.5 = ~11 tokens)
+    let kr_count = v_dyn.hypothesis_token_count("짧은 한국어 테스트 문장입니다")
+        .expect("KR dyn dispatch must reach real tokenizer");
+    assert!(kr_count > 0 && kr_count < 30, "KR ~11 tokens expected, got {kr_count}");
+}
+```
+
+### 5.2. Pure-fn test — chars-only sub-helper (round-2 critic RM2 closure)
+
+`truncate_hypothesis_for_nli_with_budget(verifier, hypothesis)` 는 *verifier 의존* 이라 pure-fn 아님. round-2 critic RM2 권장 (a) 채택 — chars-only sub-helper 분리:
+
+```rust
+// crates/kebab-rag/src/pipeline.rs
+/// Chars-only truncation arithmetic. Pure: input → output, no side effect.
+/// Used by `truncate_hypothesis_for_nli_with_budget` 의 retry loop 의 각 step.
+/// Pure-fn test (§5.2) 가 이 helper 의 산술 회귀 핀.
+pub(crate) fn truncate_chars(s: &str, budget: usize) -> (String, bool) {
+    if s.chars().count() <= budget {
+        (s.to_string(), false)
+    } else {
+        let truncated: String = s.chars().take(budget).collect();
+        (truncated, true)
+    }
+}
+```
+
+Test (`crates/kebab-rag/tests/multi_hop.rs` 또는 pipeline.rs `#[cfg(test)]` mod) — 4 boundary cases:
+
+- 입력 <= budget: identity, `was_truncated=false`.
+- 입력 > budget: budget chars, `was_truncated=true`.
+- 입력 = empty: identity.
+- 입력 = 한글 chars (codepoint, NOT byte): `chars().count()` 정확.
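The four boundary cases above could be pinned roughly as follows. This is a sketch: `truncate_chars` is copied from the snippet above so the block stands alone, and `check_truncate_chars_boundaries` is a hypothetical name for what would normally be four `#[test]` functions.

```rust
/// Copy of the chars-only helper from the spec, so the sketch is self-contained.
fn truncate_chars(s: &str, budget: usize) -> (String, bool) {
    if s.chars().count() <= budget {
        (s.to_string(), false)
    } else {
        (s.chars().take(budget).collect(), true)
    }
}

/// The four boundary cases as plain assertions.
fn check_truncate_chars_boundaries() {
    // input <= budget: identity, not truncated
    assert_eq!(truncate_chars("abc", 3), ("abc".to_string(), false));
    // input > budget: exactly `budget` chars, truncated
    assert_eq!(truncate_chars("abcdef", 3), ("abc".to_string(), true));
    // empty input: identity
    assert_eq!(truncate_chars("", 3), (String::new(), false));
    // Hangul: 5 codepoints but 15 UTF-8 bytes, and the cut must count codepoints
    let kr = "한글테스트";
    assert_eq!(kr.len(), 15);
    assert_eq!(truncate_chars(kr, 3), ("한글테".to_string(), true));
}
```

The Hangul case is the one that would fail if the helper ever switched to byte slicing, since `&kr[..3]` would yield a single syllable instead of three.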
+
+`truncate_hypothesis_for_nli_with_budget` 의 retry-loop integration 은 §5.3 mock test 가 검증 (verifier 의 token_count_fn 시뮬레이션). 두 layer 의 *분리된 검증* — pure-fn 산술 + integration retry path.
+
+### 5.3. Mock multi-hop test — long-answer happy path (verifier Blocker 1 closure)
+
+`crates/kebab-rag/tests/multi_hop_nli_stream.rs` (또는 신규 `multi_hop_nli_truncate.rs`).
+
+**Helper 신규 작성 필요** (verifier Blocker 1): 기존 `crates/kebab-rag/tests/common/mod.rs::MockNliVerifier` 는 *고정 mode* (`MockMode::Scores` / `MockMode::Err`) 만 지원 — closure spy 패턴 불가. 신규 `SpyNliVerifier` (또는 `FakeNliVerifier`) helper 추가:
+
+```rust
+// crates/kebab-rag/tests/common/mod.rs 에 추가
+use std::sync::{Arc, Mutex};
+use kebab_nli::{NliScores, NliVerifier};
+
+/// Closure-based NLI verifier — caller 가 `(premise, hypothesis) -> Result<NliScores>`
+/// + `(hypothesis) -> Result<usize>` 두 closures 정의 가능 + spy 로 입력 capture.
+/// 기존 `MockNliVerifier` (고정 mode) 와 sibling. round-1 verifier Blocker 1
+/// + round-2 verifier Caveat 1 closure (struct 필드 위치 + 2-arg constructor).
+pub struct SpyNliVerifier {
+    pub score_fn: Arc<dyn Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync>,
+    pub hypothesis_token_count_fn:
+        Arc<dyn Fn(&str) -> anyhow::Result<usize> + Send + Sync>,
+    pub received_premises: Mutex<Vec<String>>,
+    pub received_hypotheses: Mutex<Vec<String>>,
+}
+
+impl SpyNliVerifier {
+    /// 2-arg constructor — score + hypothesis_token_count 둘 다 closure 로
+    /// 주입. Caveat 1 의 "Arc<T> field mutation 불가" 회피.
+    pub fn new<F, G>(score_fn: F, token_count_fn: G) -> Arc<Self>
+    where
+        F: Fn(&str, &str) -> anyhow::Result<NliScores> + Send + Sync + 'static,
+        G: Fn(&str) -> anyhow::Result<usize> + Send + Sync + 'static,
+    {
+        Arc::new(Self {
+            score_fn: Arc::new(score_fn),
+            hypothesis_token_count_fn: Arc::new(token_count_fn),
+            received_premises: Mutex::new(Vec::new()),
+            received_hypotheses: Mutex::new(Vec::new()),
+        })
+    }
+}
+
+impl NliVerifier for SpyNliVerifier {
+    fn score(&self, premise: &str, hypothesis: &str) -> anyhow::Result<NliScores> {
+        self.received_premises.lock().unwrap().push(premise.to_string());
+        self.received_hypotheses.lock().unwrap().push(hypothesis.to_string());
+        (self.score_fn)(premise, hypothesis)
+    }
+
+    fn hypothesis_token_count(&self, hypothesis: &str) -> anyhow::Result<usize> {
+        (self.hypothesis_token_count_fn)(hypothesis)
+    }
+}
+```
+
+(NliVerifier trait 확장 + OnnxNliVerifier 의 trait impl override 는 §3 에서 이미 명시 — 중복 제거 (round-3 critic RC1-residual closure 의 일부).)
+
+**Pipeline construction pattern** (round-2 plan critic MAJOR #3 closure: Option B inline adopted, no new helper): build the pipeline inline inside each test. The `build_test_pipeline_with_long_answer(verifier, char_len)` calls in the 3 tests below are *placeholder semantics only*. The real implementation should follow the Option B 5-component recipe in `plan §2 step 8` (`RagEnv::new()` + `ScriptedRetriever::new(...)` + `ScriptedLm::new(vec!["[\"q1\"]", "[]", &"lorem ipsum ".repeat(char_len/12)])` + `RagPipeline::new(cfg, retriever, lm, sqlite).with_verifier(verifier)`), using the 12-char unit `"lorem ipsum "` so the scripted answer is about `char_len` chars long and matches the `input_first_1200` assertion below. Same pattern as the existing happy-path tests in `crates/kebab-rag/tests/multi_hop.rs`.
+
+**3 신규 test**:
+
+```rust
+/// EN long answer → char-budget 만으로 충분 → retry 0회 → grounded.
+#[test]
+fn long_en_synth_answer_truncated_before_nli_call() {
+    let verifier = SpyNliVerifier::new(
+        |_premise, _hypothesis| {
+            Ok(NliScores { entailment: 0.9, neutral: 0.05, contradiction: 0.05 })
+        },
+        |_h| Ok(100), // budget 안 — retry 0
+    );
+
+    let pipeline = build_test_pipeline_with_long_answer(verifier.clone(), 5_000); // EN-long
+    let answer = pipeline.ask_multi_hop("q", AskOpts::default()).unwrap();
+
+    let received = verifier.received_hypotheses.lock().unwrap();
+    assert_eq!(received.len(), 1, "verifier called exactly once");
+
+    let hyp = &received[0];
+    assert_eq!(
+        hyp.chars().count(),
+        1200,
+        "hypothesis truncated to MAX_NLI_HYPOTHESIS_CHARS_INITIAL"
+    );
+
+    // round-1 critic M2: direction Right pin — hypothesis 의 *첫 1200 chars* 가
+    // input 의 *첫 1200 chars* 와 일치 (= Right direction = 앞부분 보존). Left/Middle
+    // direction 으로 regress 시 본 test 가 즉시 fail.
+    let input_first_1200: String = "lorem ipsum ".repeat(2000).chars().take(1200).collect();
+    assert_eq!(hyp.as_str(), input_first_1200, "Right direction = front preserved");
+
+    assert_eq!(answer.refusal_reason, None, "long answer must reach happy path");
+}
+
+/// KR long answer → token count > budget → char budget 절반화 retry → eventual fit.
+/// round-1 critic H3 + verifier Blocker 2 의 KR safety pin.
+#[test]
+fn long_kr_synth_answer_retries_with_smaller_budget() {
+    let token_count_call_count = Arc::new(Mutex::new(0));
+    let tcc = token_count_call_count.clone();
+    // Simulation: 1200 chars → 900 tokens (over cap), 600 chars → 450 tokens (over cap),
+    // 300 chars → 220 tokens (within cap). 3 probe calls, i.e. 2 halving retries.
+    let verifier = SpyNliVerifier::new(
+        |_premise, _hypothesis| {
+            Ok(NliScores { entailment: 0.85, neutral: 0.10, contradiction: 0.05 })
+        },
+        move |h| {
+            *tcc.lock().unwrap() += 1;
+            let count = h.chars().count();
+            if count > 1000 { Ok(900) }
+            else if count > 500 { Ok(450) }
+            else { Ok(220) }
+        },
+    );
+
+    let pipeline = build_test_pipeline_with_long_answer(verifier.clone(), 2_500); // 2500 chars KR-sim
+    let answer = pipeline.ask_multi_hop("q", AskOpts::default()).unwrap();
+
+    assert!(
+        *token_count_call_count.lock().unwrap() >= 3,
+        "retry loop must call token_count >= 3 (1200, 600, 300 candidates)"
+    );
+    let received = verifier.received_hypotheses.lock().unwrap();
+    assert!(received[0].chars().count() <= 300, "final hypothesis <= 300 chars after retry");
+    assert_eq!(answer.refusal_reason, None, "KR long answer reaches happy path after retry");
+}
+
+/// Retry budget 소진 시 graceful unavailable — fix 의 fallback path 가
+/// 기존 unavailable wire shape 유지 (regression 0).
+#[test]
+fn unrelenting_token_overflow_falls_through_to_unavailable() {
+    let verifier = SpyNliVerifier::new(
+        |_premise, _hypothesis| {
+            unreachable!("score not reached when token-count check fails");
+        },
+        |_h| Ok(9_999), // 모든 budget 에서 token count 초과 — retry 소진
+    );
+
+    let pipeline = build_test_pipeline_with_long_answer(verifier.clone(), 3_000);
+    let answer = pipeline.ask_multi_hop("q", AskOpts::default()).unwrap();
+
+    assert_eq!(
+        answer.refusal_reason,
+        Some(RefusalReason::NliModelUnavailable),
+        "graceful fallback to unavailable when retry exhausted"
+    );
+}
+```
+
+### 5.4. Dogfood S3 + KR 케이스 retest (round-1 critic H4 + verifier non-blocker closure)
+
+Fix 적용 후 EN (S3) + KR (long-answer 시뮬레이션) 두 케이스 retest:
+
+```bash
+# round-1 plan verifier Gap 2 closure — config 의 nli_threshold > 0 사전 확인.
+# nli_threshold = 0 (default) 이면 NLI off 상태 → retest 가 vacuous (fix 의 효과 미발현).
+grep -E "^\s*nli_threshold\s*=" /build/cache/dogfood-v018/config/config.toml \
+  || echo "WARNING: nli_threshold 미설정 — dogfood retest 전에 config 에 [rag] nli_threshold = 0.3 (또는 원하는 값) 추가 필수"
+# nli_threshold > 0 확인 후 진행.
+
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo build --release -p kebab-cli -j 1
+
+# Non-blocker fix: stdout / stderr 분리 (2>&1 제거 — jq parsing 충돌 회피).
+# stderr 는 file appender (kb.log) 로만 가므로 stderr redirect 불요.
+
+# EN — S3 본래 케이스
+RUST_LOG=info,kebab_rag=debug,kebab_nli=debug \
+  /build/out/cargo-target/target/release/kebab ask \
+    "Why does kebab combine multilingual-e5, LanceDB, and RRF together?" \
+    --multi-hop --config /build/cache/dogfood-v018/config/config.toml \
+    --json > /build/cache/dogfood-v018/results/post-s3-fix/s3-en-retest.json
+
+# KR — long-answer 시뮬레이션 (KR self-knowledge query)
+# KR-heavy hypothesis 가 본 fix 의 token-count fallback retry path 를 trigger 검증
+RUST_LOG=info,kebab_rag=debug,kebab_nli=debug \
+  /build/out/cargo-target/target/release/kebab ask \
+    "kebab 의 multi-hop RAG + NLI verification 의 동작 원리는 무엇인가?" \
+    --multi-hop --config /build/cache/dogfood-v018/config/config.toml \
+    --json > /build/cache/dogfood-v018/results/post-s3-fix/s3-kr-retest.json
+
+# 결과 검증
+jq -r '.refusal_reason, .verification.nli_score, .usage.completion_tokens' \
+  /build/cache/dogfood-v018/results/post-s3-fix/s3-en-retest.json
+jq -r '.refusal_reason, .verification.nli_score, .usage.completion_tokens' \
+  /build/cache/dogfood-v018/results/post-s3-fix/s3-kr-retest.json
+
+# Tracing log 의 truncate path 활성 확인 (file appender 분리 grep)
+grep "NLI hypothesis truncated\|hypothesis_token_count" \
+  ~/.local/state/kebab/logs/kb.log.$(date -I)
+```
+
+**성공 기준**:
+- **EN (S3)**: `refusal_reason` 이 `null` (NLI passed) 또는 `nli_verification_failed` (정상 거부). **`nli_model_unavailable` 아님**.
+- **KR (long-answer)**: 동일 — `nli_model_unavailable` 아님. token-count retry path 가 작동했음을 log 의 *retry budget* line 으로 확인.
+- `verification.nli_score` 가 finite float (이전엔 `null`).
+- `~/.local/state/kebab/logs/kb.log.$(date -I)` 에 `truncate_hypothesis_for_nli_with_budget` 의 retry trace 또는 success debug line emit.
+
+### 5.5. 회귀 — 기존 NLI test 전부
+
+```bash
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test -p kebab-nli -j 1
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test -p kebab-rag -j 1
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo test --workspace --no-fail-fast -j 1
+CARGO_TARGET_DIR=/build/out/cargo-target/target \
+  cargo clippy --workspace --all-targets -j 1 -- -D warnings
+```
+
+기존 NLI test 전부 PASS 유지 (e.g. `score_empty_hypothesis_returns_err`, `multi_hop_nli_model_unavailable_refuses`, `nli_model_unavailable_emits_final_stream_event_with_refusal`).
+
+---
+
+## §6. 비범위 (별 task 후보)
+
+1. **Token-count-based NLI budget**: `MAX_NLI_PREMISE_CHARS` and `MAX_NLI_HYPOTHESIS_CHARS_INITIAL` / `MAX_NLI_HYPOTHESIS_CHARS_MIN` are all *char*-based. After measuring KR SentencePiece tokenization, switch to a dynamic token-count-based budget. The existing comment on `MAX_NLI_PREMISE_CHARS` already says *"v0.18.1 candidate"*.
+
+2. **`KEBAB_LOG_STDERR=1` opt-in stderr layer** — diagnostic 시 stderr 로 log 흘려보기. logging.rs 의 fmt::layer 한 줄 추가.
+
+3. **New `RefusalReason::NliInputTooLong` variant**: today every OnlyFirst error collapses into unavailable. Splitting length overflow into its own wire variant would let the user/agent pick a retry strategy (e.g. cap the answer length and re-ask). **Wire schema change**: extending the `answer.v1` enum is additive (minor), not breaking, so it is acceptable, but a separate cycle is recommended.
+
+4. **NLI multi-model adapter** — sentence-pair NLI 외 token-level entailment scorer / chunk-level entailment scorer 같은 알터너티브.
+
+5. **NLI threshold tuning** — S1 (0.058) / S7 (0.0035) / S10 (0.0028) 모두 *매우 낮은* entailment. default `nli_threshold` 결정 (현재 0 = off). 본 S3 fix 후 truthful score 분포 측정 필요.
+
+6. **gemma3:4b 의 4500-char 답변 자체 억제** — synthesize prompt 의 max-output 가이드 강화. NLI 와 무관한 별 문제.
+
+7. **README / SMOKE 의 tracing 진단 안내** — Option B-1 (한 줄 추가). 별 PR / 별 task.
+
+---
+
+## §7. 합의된 출구 조건
+
+이 spec 이 closed 되려면:
+
+1. Fix PR 머지 — round-2 critic RM1 closure: 정확한 신규 심볼 list (frontmatter `fix_symbols` verbatim):
+   - `kebab-nli/src/lib.rs::NliVerifier::hypothesis_token_count` (trait method, default impl `Ok(0)` for backward-compat).
+   - `kebab-nli/src/onnx.rs::OnnxNliVerifier::hypothesis_token_count` (override + `HYPOTHESIS_TOKEN_BUDGET = 256` const).
+   - `kebab-rag/src/pipeline.rs::MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` consts.
+   - `kebab-rag/src/pipeline.rs::truncate_hypothesis_for_nli_with_budget(verifier, hypothesis)` helper + step 8.5 hook 의 callsite 수정.
+   - `kebab-rag/tests/common/mod.rs::SpyNliVerifier` helper (closure spy + token_count_fn).
+2. Add the 3 unit-test groups (5.1 ignored / 5.2 pure-fn / 5.3 mock multi-hop) and get all of them GREEN.
+3. Dogfood S3 retest GREEN — `nli_model_unavailable` → `nli_verification_failed` 또는 happy path 로 전환 확인.
+4. `cargo test --workspace --no-fail-fast -j 1` GREEN — 기존 NLI test 전부 PASS, 신규 회귀 0.
+5. `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean.
+6. `tasks/HOTFIXES.md` 에 신규 dated entry **`## 2026-05-26 — S3 NLI unavailable — hypothesis truncate + token-count fallback`** 추가 (date-top convention). round-1 critic M6 closure: **HOTFIX 번호 부여 안 함** (HOTFIX #15 처럼 fixture-issue 가 아닌 *production behavior fix* — sibling fb-41 PR-9 closure entry 와 같은 layer 의 follow-up). line 17 (현재 HOTFIX #15) 직전, 그 직전이 2026-05-25 fb-41 entry. 형식: Symptom / Root cause / Action / Amends 4-block — HOTFIX #15 entry 와 동일 sibling pattern.
+
+---
+
+## §8. Risk
+
+- **Risk level: 낮음.** fix 가 NLI 입력 정규화 layer (pipeline-side helper + nli crate 의 token-count probe) 만 건드림. Wire 변경 없음, behavior 는 false-negative (정상 답변이 unavailable 로 거부) 줄어듦 + graceful fallback (worst case 시 기존 unavailable 유지 = regression 0).
+- **Char-budget 단독 부족 (round-1 critic H3 + verifier Blocker 2 closure)**: Option A 의 token-count fallback retry 가 KR safe 보장. budget 1200 → 600 → 300 → 150 chars 까지 retry → 최악 fallback 시 `nli_model_unavailable` (현재 동작 유지).
+- **Hypothesis truncate direction `Right`** (앞부분 보존, 뒤 손실): LLM 답변의 *도입부* 가 핵심 claim. *결론부* 는 보통 도입부 재요약 — Right direction 이 안전한 default. round-1 critic M2 의 test pin (`long_en_synth_answer_truncated_before_nli_call` 의 `assert_eq!(hyp.as_str(), input_first_1200)`) 으로 회귀 detection.
+
+### Pre-mortem (round-1 critic M3 closure)
+
+| 시나리오 | 영향 | Mitigation |
+|---|---|---|
+| **(a) KR-extreme hypothesis (e.g. 한자/CJK character density)** — char-truncate + retry 모두 token count 초과 | budget 소진 → graceful `nli_model_unavailable` fallback | spec §3 의 `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` floor + §6 #1 의 token-count budget 별 task. 현재는 graceful refuse (regression 0). |
+| **(b) Conclusion-bearing hypothesis** — LLM 의 핵심 claim 이 *후반부* (예: "Therefore, X" 단락 답변 끝) | Right truncation 으로 conclusion 손실 → entailment score 낮아짐 → false `nli_verification_failed` 가능 | NLI entailment 는 *front-loaded fact 위주* — 보통 도입부 claim 만으로 충분. Worst case 도 *false positive (잘못된 entailment passed)* 아닌 *false negative (정상이지만 reject)* — fail-closed semantic 보존. 사용자가 `nli_threshold = 0` 으로 임시 disable + retry 가능. |
+| **(c) Grapheme cluster split** (emoji ZWJ, KR 조합형 NFD 분해 등) — `chars().take(N)` 가 grapheme 중간 split | tokenizer 가 UNK 폭주 → score 왜곡 | mDeBERTa SentencePiece 가 codepoint-level subword — grapheme split 영향 미미 (worst case UNK 1-2 개, NLI signal 의미 보존). |
+| **(d) `hypothesis_token_count` probe itself fails** (e.g. tokenizer crate error on edge input) | `with_context` wraps it as an anyhow err → graceful fallback | An empty hypothesis is blocked by the `acc.trim().is_empty()` guard before step 8.5 is entered. Non-empty edge inputs are handled robustly by the mDeBERTa tokenizer (covered by the existing PR-9b unit tests for long-premise / empty-hypothesis). |
diff --git a/tasks/HOTFIXES.md b/tasks/HOTFIXES.md
index 5815edd..ec96cf5 100644
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,24 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.
 
+## 2026-05-26 — S3 NLI unavailable — hypothesis truncate + token-count fallback
+
+**Symptom**: S3 dogfood query (`"Why does kebab combine multilingual-e5, LanceDB, and RRF together?"`) 가 NLI 활성 (`rag.nli_threshold > 0`) 시 `nli_model_unavailable` 일관 fail. `~/.local/state/kebab/logs/kb.log.2026-05-26` 의 5 회 WARN 라인 `tokenizer.encode failed: Truncation error: Sequence to truncate too short to respect the provided max_length`.
+
+**Root cause**: `crates/kebab-nli/src/onnx.rs::load_tokenizer` sets the mDeBERTa-v3 tokenizer's truncation strategy to `OnlyFirst`, which, for a *2-sequence input*, truncates only the first sequence (the premise). The LLM's long 949-token (4564-char) answer, i.e. the hypothesis alone, exceeds the 512-token cap, so the input cannot fit even if the premise is cut to 0 tokens, and the tokenizer raises `SequenceTooShortToTruncate`. The step 8.5 hook in `pipeline.rs::ask_multi_hop` applied only the premise-side `truncate_for_nli`, leaving the hypothesis side unprotected.
+
+**Action**: 4 production files + 2 test files + 1 doc entry — Option A (KR-safe + graceful fallback + symmetric layering).
+
+- `crates/kebab-nli/src/lib.rs`: adds the trait method `NliVerifier::hypothesis_token_count(&self, &str) -> anyhow::Result<usize>` (default `Ok(0)` for backward compatibility).
+- `crates/kebab-nli/src/onnx.rs`: adds the inherent const `OnnxNliVerifier::HYPOTHESIS_TOKEN_BUDGET = 256` and overrides `hypothesis_token_count` **inside** the `impl NliVerifier for OnnxNliVerifier {}` block (a real `tokenizer.encode` probe). **CRITICAL**: the override must live inside the trait impl block. If it is placed in the inherent `impl OnnxNliVerifier {}` block, it is not registered in the vtable → trait dispatch returns the default `Ok(0)` → the retry loop passes immediately → silent NO-OP in production.
+- `crates/kebab-rag/src/pipeline.rs`: adds the consts `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` and `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150`, the codepoint-aware pure function `pub(crate) fn truncate_chars`, and the retry helper `pub fn truncate_hypothesis_for_nli_with_budget` (halves the char budget 1200 → 600 → 300 → 150 on each retry and calls `anyhow::bail!` once it would fall below the min floor). The step 8.5 hook callsite uses an explicit `match { Ok => x, Err => return refuse_nli_model_unavailable(...) }` (`?` propagation is forbidden so that the wire contract `answer.v1 + NliModelUnavailable refusal` is preserved). The call `v.score(&truncated_premise, &acc)` becomes `v.score(&truncated_premise, &truncated_hypothesis)`.
+- `crates/kebab-rag/tests/common/mod.rs`: adds the closure-based helper `SpyNliVerifier` (2-arg constructor: `score_fn` + `token_count_fn`, with `Mutex<Vec<String>>` capture). It is a sibling of the existing fixed-mode `MockNliVerifier`.
+- `crates/kebab-nli/tests/inference.rs`: 2 new `#[ignore]` tests: `score_long_en_hypothesis_returns_err_without_pipeline_truncation` (pins the `OnlyFirst` dead end in the raw nli crate) and `hypothesis_token_count_dispatches_correctly_via_dyn_trait` (vtable dispatch pin: called through `&dyn NliVerifier`, asserts bounded EN/KR ranges, and guards against the RC1-residual silent NO-OP regression).
+- `crates/kebab-rag/tests/multi_hop_nli_truncate.rs` (new): 3 mock multi-hop tests: `long_en_synth_answer_truncated_before_nli_call` (EN happy path + Right-direction pin), `long_kr_synth_answer_retries_with_smaller_budget` (KR retry path, final hypothesis ≤ 300 chars), and `unrelenting_token_overflow_falls_through_to_unavailable` (graceful unavailable fallback pin; `score_fn` is `unreachable!()`).
+- `crates/kebab-rag/src/pipeline.rs::#[cfg(test)] mod tests`: 4 `truncate_chars` boundary tests (identity / truncate / empty / KR codepoint).
+
+**Amends**: Cross-links spec `docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md` (4 rounds, APPROVE). Supplements the NLI behavior described in task spec `tasks/p9/p9-fb-41-multi-hop-finalize.md` with the new hypothesis-side budget; the frozen task spec itself is not changed, and this entry is the live source of truth. No HOTFIX number is assigned: this is a production-behavior follow-up to the sibling fb-41 PR-9 closure layer, a different layer from the fixture issue in HOTFIX #15.
+
 ## 2026-05-26 — HOTFIX #15 — MCP ask multi_hop dispatch-divergence assertion stale (fixture 보강)
 
 **Symptom**: PR-7 (multi-hop probe-first dogfood fix) 머지 후 `kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first` 가 모든 workspace test 에서 deterministic fail (no_chunks short-circuit 으로 `is_error=Some(false)`).