chore(rag): post-PR9 refactor — H1/H2/H3/D/E + test coverage + post-refactor dogfood retest

OMC team `post-pr9-refactor` 의 architectural cleanup. architect priorities 분석 후 executor + test-engineer 가 file edits, system-architect 가 component-level review 로 *pre-cut nothing — all v0.18.1+ defer* 결론. ## Executor 작업 (H1/H2/H3/D/E) - **H1** (kebab-nli/src/onnx.rs): `[models.nli]` config wire 활성화. `DEFAULT_MODEL_ID` const 제거 (kebab-config 의 NliCfg::defaults 가 single source). OnnxNliVerifier::new 가 config.models.nli.model 읽고 config.models.nli.provider 가 "onnx" 아니면 anyhow::bail. 3 stale "PR-9c-1 will wire this" 코멘트 제거. 2 unit test 추가 (`new_uses_config_model_id`, `new_rejects_unsupported_provider`). - **H2** (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. v0.18.1 placeholder doc 제거. 4 callsite (tests/multi_hop.rs) 갱신 + test rename `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (contract 정합). - **H3** (kebab-rag/src/pipeline.rs:1041): `was_truncated` 가 tracing::debug! 으로 surface (observability 추가, signature 보존 — caller logging contract). - **D** (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (slow CI 안정성), `mh_code` discriminator 제거. dispatch contract = `mh.is_error.unwrap_or(false)` (기존 assertion 으로 충분). - **E** (tasks/HOTFIXES.md + pipeline.rs:1633-1638): fb-41 PR-9 closure entry 의 sibling 으로 "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" subsection 추가. pipeline 의 "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up" cross-link. ## Test-engineer 작업 (T1/T2/T3/T4, 9 new tests) - **T1** (kebab-nli/src/onnx.rs::tests): sanitize_model_id 3 unit (replaces_slash / idempotent / leaves_other_chars). - **T2** (kebab-rag/tests/multi_hop_nli_panic.rs 신규): 2 panic-path tests — facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) 의 #[should_panic] + threshold=0 의 companion. - **T3** (kebab-rag/tests/multi_hop_nli_stream.rs 신규): 2 StreamEvent::Final tests — refuse_nli_verification + refuse_nli_model_unavailable 의 stream_sink Final 분기 wire shape pinning. - **T4** (kebab-app/tests/open_with_config_nli.rs 신규): 2 NLI failure path — model_dir 가 unwritable 일 때 App::open_with_config 의 Result<App> Err (with "OnnxNliVerifier" in chain) + threshold=0 일 때 graceful skip. ## System-architect 결론 3 lenses (absorption / duplication / under-engineered interface) 분석 결과 — *pre-cut nothing*. Top-3 items 모두 v0.18.1+ defer: - Lens 1: kebab-normalize + kebab-parse-types 흡수 가능 (parse-md 만 사용, 5 parsers 우회) → v0.18.1+. - Lens 3: Extractor + Chunker trait 의 dead polymorphism (모든 callsite 가 hardcoded) → v0.18.1+. - Lens 1 bundled: kebab-source-fs 가 kebab-parse-code 의 9 tree-sitter grammars drag → low-risk dep-graph win, v0.18.1+ bundled. - Defer-with-intent: LanguageModel async refactor (cloud-LLM 시), NliVerifier::score_batch + typed NliError (2nd impl 시), compute_stale → kebab-core::stale. 보고서: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (둘 다 repo 외 — analysis 보존). ## 검증 - cargo test -p kebab-nli -j 1 → 11/11 pass. - cargo test -p kebab-rag -j 1 → 75/75 pass (5 NLI multi-hop + 4 신규 T2/T3 포함). - cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (T4 의 2 포함). - cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, executor D fix 와 무관 — line 86 의 base assertion 이 fixture 없어서 fail). - cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. - cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + 1 pre-existing flaky 동일. - **Post-refactor dogfood retest byte-identical** (PR-9d / post-cleanup / post-refactor 3번 모두): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable. docs/dogfood/v0.18.0/SUMMARY.md 에 "Post-architectural-refactor retest" section 추가. Wire 영향: 없음. Behavior 영향: 없음 (H1 의 config wiring 가 default 와 같은 model → byte-identical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 04:42:37 +00:00
parent 3712d005cc
commit 7c27633df2
9 changed files with 508 additions and 52 deletions
--- a/crates/kebab-app/tests/open_with_config_nli.rs
+++ b/crates/kebab-app/tests/open_with_config_nli.rs
@@ -0,0 +1,81 @@
 //! Tests for `App::open_with_config`'s NLI verifier construction path.
 //!
 //! Coverage:
 //! 1. `open_with_config_nli_fails_when_model_dir_unwritable_and_threshold_positive` —
 //!    when `rag.nli_threshold > 0` and `storage.model_dir` is unwritable,
 //!    `open_with_config` returns `Err` with "OnnxNliVerifier" in the
 //!    error chain.
 //! 2. `open_with_config_nli_skipped_when_threshold_zero` —
 //!    same bad `model_dir`, but `rag.nli_threshold = 0.0` (gate disabled),
 //!    so `OnnxNliVerifier::new` is never called and `open_with_config`
 //!    succeeds.
 //!
 //! `/proc/1/root` is the init process's filesystem root; on Linux it is
 //! owned by root and not traversable by unprivileged users, making
 //! `create_dir_all` fail with `EACCES` — a reliable "unwritable path"
 //! that requires no test setup beyond the path literal.
 use kebab_config::Config;
 /// Return a `Config` whose `data_dir` lives in a fresh `TempDir`
 /// (so `SqliteStore::open` succeeds) and whose `model_dir` is set to
 /// `/proc/1/root` (unwritable by non-root processes on Linux).
 ///
 /// The `TempDir` is returned alongside the config so the caller keeps
 /// it alive until the test completes — dropping it early would delete
 /// the data directory before any assertions run.
 fn config_with_unwritable_model_dir() -> (tempfile::TempDir, Config) {
    let tmp = tempfile::tempdir().expect("tempdir");
    let mut cfg = Config::defaults();
    // Valid data_dir → SqliteStore::open + run_migrations succeed.
    cfg.storage.data_dir = tmp.path().to_string_lossy().into_owned();
    // /proc/1/root is only accessible to root; create_dir_all will
    // return EACCES for any unprivileged user, which is exactly the
    // failure mode we want to exercise.
    cfg.storage.model_dir = "/proc/1/root".to_string();
    (tmp, cfg)
 }
 // ── 1. Failure path: threshold > 0 + unwritable model_dir ─────────────────
 #[test]
 fn open_with_config_nli_fails_when_model_dir_unwritable_and_threshold_positive() {
    let (_tmp, mut cfg) = config_with_unwritable_model_dir();
    cfg.rag.nli_threshold = 0.5; // gate enabled → OnnxNliVerifier::new runs
    let result = kebab_app::App::open_with_config(cfg);
    let Err(err) = result else {
        panic!(
            "App::open_with_config must fail when model_dir is unwritable and nli_threshold > 0"
        );
    };
    // The error chain must identify the OnnxNliVerifier as the source so
    // an operator reading logs can trace the failure to the NLI config.
    let err_chain = format!("{err:?}");
    assert!(
        err_chain.contains("OnnxNliVerifier"),
        "error chain must mention OnnxNliVerifier; full chain: {err_chain}"
    );
 }
 // ── 2. Success path: threshold = 0.0 → NLI verifier never constructed ──────
 #[test]
 fn open_with_config_nli_skipped_when_threshold_zero() {
    let (_tmp, cfg) = config_with_unwritable_model_dir();
    // Default nli_threshold is 0.0 — gate disabled, verifier skipped.
    assert!(
        (cfg.rag.nli_threshold - 0.0).abs() < f32::EPSILON,
        "precondition: default nli_threshold must be 0.0 (gate disabled)"
    );
    // A bad model_dir must NOT cause a failure when the NLI gate is off.
    let result = kebab_app::App::open_with_config(cfg);
    assert!(
        result.is_ok(),
        "App::open_with_config must succeed when nli_threshold = 0.0 \
         (OnnxNliVerifier is never constructed); err: {:?}",
        result.err()
    );
 }
--- a/crates/kebab-mcp/tests/tools_call_ask_multi_hop.rs
+++ b/crates/kebab-mcp/tests/tools_call_ask_multi_hop.rs
@@ -32,11 +32,13 @@ fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path)
    cfg.models.embedding.dimensions = 0;
    // Force the LLM endpoint to a known-unreachable port so this test
    // is robust against whether a real Ollama happens to be running
-    // on 127.0.0.1:11434 (the developer's box; CI; etc.). Combined
+    // on 127.0.0.1:11434 (the developer's box; CI; etc.). The
-    // with a tight `request_timeout_secs`, the multi-hop dispatch
+    // `request_timeout_secs = 5` gives slow CI / Docker network stacks
-    // surfaces `model_unreachable` quickly and deterministically.
+    // enough headroom that *some* error fires deterministically — the
    // dispatch contract below only cares that `is_error` flipped, not
    // which specific error code surfaced.
    cfg.models.llm.endpoint = "http://127.0.0.1:1".to_string();
-    cfg.models.llm.request_timeout_secs = 2;
+    cfg.models.llm.request_timeout_secs = 5;
    cfg
 }
@@ -91,18 +93,12 @@ async fn ask_tool_routes_multi_hop_true_to_decompose_first() {
    };
    let mh_v: serde_json::Value = serde_json::from_str(&mh_text).unwrap();
    assert_eq!(mh_v["schema_version"], "error.v1");
-    // The dispatch contract is "multi-hop reached the LLM". The exact
+    // The dispatch contract is "multi-hop reached the LLM" — i.e.
-    // error code depends on how the host TCP stack reports an
+    // `is_error` fires because decompose tried to talk to the LLM and
-    // unreachable port — fast-path `ECONNREFUSED` classifies as
+    // failed. Which *specific* error code lands (`model_unreachable`
-    // `model_unreachable`, but environments that take the connect
+    // on fast ECONNREFUSED hosts, `timeout` on slow connect-timeout
-    // timeout path (some CI / Docker network stacks) surface
+    // stacks, etc.) is implementation detail of the host TCP/HTTP
-    // `timeout`. Accept either.
+    // path; pinning it here would just produce flakes on slow CI.
    let mh_code = mh_v["code"].as_str().unwrap_or("");
    assert!(
        matches!(mh_code, "model_unreachable" | "timeout"),
        "multi-hop dispatch must reach the LLM and surface model_unreachable/timeout; \
         got code={mh_code:?} from {mh_v}"
    );
    // Single-pass branch — empty KB short-circuits at retrieve, no LLM
    // call happens, refusal Answer comes back as isError=false.
--- a/crates/kebab-nli/src/onnx.rs
+++ b/crates/kebab-nli/src/onnx.rs
@@ -1,6 +1,6 @@
 //! ONNX-backed `NliVerifier` adapter (mDeBERTa-v3 XNLI).
 //!
-//! PR-9b: real implementation. `new` resolves the cache directory from
+//! `new` resolves the cache directory from
 //! `config.storage.model_dir/nli/<sanitized-model-id>/` (matching the
 //! fastembed adapter's pattern of `model_dir/fastembed/`) and stamps it
 //! on `self`. The (potentially network-bound) model + tokenizer download
@@ -10,9 +10,9 @@
 //! a model load on every CLI invocation.
 //!
 //! Per design §2.2.2 (Lazy init), §2.2.3 (truncation = `OnlyFirst`,
-//! premise truncates, hypothesis preserved). PR-9c-1 will wire the
+//! premise truncates, hypothesis preserved). The model id flows from
-//! `[models.nli]` config section; until then the model id is hard-coded
+//! `config.models.nli.model`; `config.models.nli.provider` selects the
-//! to the Xenova mDeBERTa-v3 XNLI multilingual checkpoint.
+//! verifier impl (only `"onnx"` is implemented in v0.18).
 use std::path::PathBuf;
 use std::sync::OnceLock;
@@ -26,14 +26,10 @@ use tokenizers::{
 use crate::{NliScores, NliVerifier};
-/// Default HuggingFace model id for the XNLI verifier. PR-9c-1 will
+/// Filename inside the HF repo (NOT a path on disk). The Xenova repo
-/// replace this constant with a `config.models.nli.model` lookup once
+/// packages the mDeBERTa-v3-base XNLI multilingual checkpoint (the
-/// the `NliCfg` section lands. The Xenova repo packages the
+/// default `config.models.nli.model` — see `kebab-config::NliCfg::defaults`)
-/// mDeBERTa-v3-base XNLI multilingual checkpoint as ONNX under the
+/// as ONNX under this path; the tokenizer ships at `tokenizer.json`.
 /// `onnx/model.onnx` path; the tokenizer ships at `tokenizer.json`.
 const DEFAULT_MODEL_ID: &str = "Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7";
 /// Filename inside the HF repo (NOT a path on disk).
 const HF_MODEL_FILE: &str = "onnx/model.onnx";
 /// Filename inside the HF repo (NOT a path on disk).
 const HF_TOKENIZER_FILE: &str = "tokenizer.json";
@@ -75,9 +71,19 @@ impl OnnxNliVerifier {
    /// and runs `create_dir_all` so the first `score` call can drop
    /// straight into download + load without re-deriving paths.
    ///
-    /// PR-9c-1 will swap `DEFAULT_MODEL_ID` for `config.models.nli.model`.
+    /// Reads `config.models.nli.model` for the HuggingFace model id
    /// and `config.models.nli.provider` to select the verifier impl —
    /// only `"onnx"` is implemented in v0.18. The defaults live in
    /// `kebab-config::NliCfg::defaults` so this path always receives
    /// a non-empty model id.
    pub fn new(config: &kebab_config::Config) -> Result<Self> {
-        let model_id = DEFAULT_MODEL_ID.to_string();
+        let provider = config.models.nli.provider.as_str();
        if provider != "onnx" {
            anyhow::bail!(
                "kebab-nli: unsupported provider {provider:?} (only 'onnx' is implemented in v0.18)"
            );
        }
        let model_id = config.models.nli.model.clone();
        // Match kebab-embed-local's two-step expansion: data_dir first,
        // then model_dir with `{data_dir}` substituted in.
@@ -329,4 +335,74 @@ mod tests {
            "unexpected error message: {err}"
        );
    }
    /// Pins that `config.models.nli.model` flows into `OnnxNliVerifier`
    /// instead of being silently overridden by a hardcoded constant.
    /// `model_id` is a private field, but this test lives in the same
    /// module so it can read it directly — the wiring contract is
    /// "whatever the user puts in TOML / KEBAB_MODELS_NLI_MODEL is the
    /// id the verifier uses".
    #[test]
    fn new_uses_config_model_id() {
        let (_tmp, mut cfg) = tempdir_config();
        cfg.models.nli.model = "custom-org/custom-nli-model".to_string();
        let v = OnnxNliVerifier::new(&cfg).expect("new should succeed with custom model id");
        assert_eq!(v.model_id, "custom-org/custom-nli-model");
        // The custom id also flows into the on-disk cache_dir layout
        // (sanitized so `/` doesn't escape the namespace).
        let s = v.cache_dir.to_string_lossy();
        assert!(
            s.contains("custom-org_custom-nli-model"),
            "cache_dir should embed sanitized custom model id: {s}"
        );
    }
    /// Pins that a non-`"onnx"` provider value errors out at `new` —
    /// the field is no longer silently ignored.
    #[test]
    fn new_rejects_unsupported_provider() {
        let (_tmp, mut cfg) = tempdir_config();
        cfg.models.nli.provider = "candle".to_string();
        let result = OnnxNliVerifier::new(&cfg);
        assert!(result.is_err(), "non-onnx provider must error");
        let msg = result.err().unwrap().to_string();
        assert!(
            msg.contains("unsupported provider") && msg.contains("candle"),
            "error should name the rejected provider: {msg}"
        );
    }
    // ── sanitize_model_id pure-fn coverage ────────────────────────────────
    //
    // Three tests pin the behavior of the private `sanitize_model_id`
    // helper. These are orthogonal to the H1 executor tests above
    // (which cover config-wiring); these cover the transformation
    // contract of the sanitizer itself.
    #[test]
    fn sanitize_model_id_replaces_slash_with_underscore() {
        let input = "Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7";
        let expected = "Xenova_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7";
        assert_eq!(sanitize_model_id(input), expected);
    }
    #[test]
    fn sanitize_model_id_is_idempotent_on_already_sanitized() {
        // Input with no '/' must come back byte-for-byte unchanged.
        let input = "Xenova_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7";
        assert_eq!(sanitize_model_id(input), input);
    }
    #[test]
    fn sanitize_model_id_leaves_other_chars_untouched() {
        // Hyphens, digits, dots, and underscores must all pass through
        // unchanged — only '/' is replaced with '_'.
        let input = "org_name/model-name_v2.3-alpha";
        let got = sanitize_model_id(input);
        assert_eq!(got, "org_name_model-name_v2.3-alpha");
        assert!(!got.contains('/'), "no slash must remain after sanitize");
        assert!(got.contains('-'), "hyphens must be preserved");
        assert!(got.contains('.'), "dots must be preserved");
        assert!(got.contains('_'), "underscores must be preserved");
    }
 }
--- a/crates/kebab-rag/src/pipeline.rs
+++ b/crates/kebab-rag/src/pipeline.rs
@@ -1038,7 +1038,13 @@ impl RagPipeline {
                "verifier must be Some when nli_threshold > 0.0 \
                 (kebab-app's open_with_config enforces this invariant)",
            );
-            let (truncated_premise, _was_truncated) = truncate_for_nli(&packed_text, &acc);
+            let (truncated_premise, was_truncated) = truncate_for_nli(&packed_text);
            if was_truncated {
                tracing::debug!(
                    target: "kebab-rag",
                    "NLI premise truncated to MAX_NLI_PREMISE_CHARS for entailment check"
                );
            }
            match v.score(&truncated_premise, &acc) {
                Ok(scores) => {
                    let passed = scores.entailment >= self.config.rag.nli_threshold;
@@ -1633,9 +1639,9 @@ impl RagPipeline {
            // PR-9c-2: NLI refusal still carries the hop trace built
            // up to step 8.5 — synthesize ran, so the trace is the
            // full decompose+decide chain (terminal Synthesize hop is
-            // NOT appended for the refusal path; cleanup deferred to
+            // NOT appended for the refusal path). See
-            // a follow-up if the user-visible trace shape needs the
+            // `tasks/HOTFIXES.md` "PR-9 NLI refusal: terminal Synthesize
-            // synthesize entry).
+            // hop omitted" for follow-up.
            hops: Some(hops),
            verification: Some(v),
        };
@@ -1797,15 +1803,11 @@ pub(crate) const MULTI_HOP_MAX_SUB_QUERIES_HARD_CAP: usize = 10;
 pub const MAX_NLI_PREMISE_CHARS: usize = 4 * 400;
 /// p9-fb-41 PR-9c-2: truncate `premise` to fit the NLI input budget
-/// while preserving `hypothesis` in full. Returns `(truncated_premise,
+/// (`MAX_NLI_PREMISE_CHARS`). Returns `(truncated_premise,
-/// was_truncated)`. `was_truncated` is informational for tracing —
+/// was_truncated)`; `was_truncated` is informational so the callsite
-/// the v0.18 wire doesn't surface it; a v0.19+ extension might.
+/// can log a truncation tracing event (the v0.18 wire doesn't surface
-///
+/// it).
-/// `_hypothesis` is currently unused — placeholder for the v0.18.1
+pub fn truncate_for_nli(premise: &str) -> (String, bool) {
 /// token-budget version that would carve the budget *around* the
 /// hypothesis. Kept on the signature to preserve the contract from
 /// spec §2.2.3 / spec §3 PR-9c-2.
 pub fn truncate_for_nli(premise: &str, _hypothesis: &str) -> (String, bool) {
    if premise.chars().count() <= MAX_NLI_PREMISE_CHARS {
        (premise.to_string(), false)
    } else {
--- a/crates/kebab-rag/tests/multi_hop.rs
+++ b/crates/kebab-rag/tests/multi_hop.rs
@@ -631,7 +631,7 @@ fn multi_hop_above_probe_gate_proceeds_to_decompose() {
 //    `Answer.verification` stays `None` (no verifier attached).
 // 4. `multi_hop_nli_model_unavailable_refuses` — verifier returns `Err` →
 //    refusal with `RefusalReason::NliModelUnavailable` + `verification = None`.
-// 5. `multi_hop_truncate_for_nli_preserves_hypothesis` — pure unit test on
+// 5. `multi_hop_truncate_for_nli_char_budget` — pure unit test on
 //    `truncate_for_nli`'s char-budget contract.
 /// Helper to build a "valid multi-hop happy-path" scenario where probe +
@@ -775,12 +775,11 @@ fn multi_hop_nli_model_unavailable_refuses() {
 }
 #[test]
-fn multi_hop_truncate_for_nli_preserves_hypothesis() {
+fn multi_hop_truncate_for_nli_char_budget() {
-    // Long premise (>1600 chars) gets truncated, short hypothesis is
+    // Long premise (>1600 chars) gets truncated.
-    // passed unchanged (signature placeholder for v0.18.1 token-budget
+    // MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600.
    // version). MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600.
    let long_premise: String = "a".repeat(2000);
-    let (truncated, was_truncated) = truncate_for_nli(&long_premise, "short hypothesis");
+    let (truncated, was_truncated) = truncate_for_nli(&long_premise);
    assert!(was_truncated);
    assert_eq!(
        truncated.chars().count(),
@@ -790,20 +789,20 @@ fn multi_hop_truncate_for_nli_preserves_hypothesis() {
    // Short premise (under budget): no truncation, `was_truncated = false`.
    let short_premise = "short premise text";
-    let (passthrough, was_truncated) = truncate_for_nli(short_premise, "anything");
+    let (passthrough, was_truncated) = truncate_for_nli(short_premise);
    assert!(!was_truncated);
    assert_eq!(passthrough, short_premise);
    // Multi-byte safety: 1600 Korean chars (3 bytes each in UTF-8) fits
    // within the char budget even though byte length exceeds 4800.
    let kr_short: String = "가".repeat(1600);
-    let (passthrough_kr, was_truncated_kr) = truncate_for_nli(&kr_short, "h");
+    let (passthrough_kr, was_truncated_kr) = truncate_for_nli(&kr_short);
    assert!(!was_truncated_kr, "1600 KR chars == budget, no truncation");
    assert_eq!(passthrough_kr.chars().count(), 1600);
    // Multi-byte over-budget: truncation must count chars, not bytes.
    let kr_long: String = "가".repeat(2000);
-    let (truncated_kr, was_truncated_kr) = truncate_for_nli(&kr_long, "h");
+    let (truncated_kr, was_truncated_kr) = truncate_for_nli(&kr_long);
    assert!(was_truncated_kr);
    assert_eq!(
        truncated_kr.chars().count(),
--- a/crates/kebab-rag/tests/multi_hop_nli_panic.rs
+++ b/crates/kebab-rag/tests/multi_hop_nli_panic.rs
@@ -0,0 +1,100 @@
 //! Pins the documented facade-invariant panic in `ask_multi_hop`.
 //!
 //! When `cfg.rag.nli_threshold > 0` but no verifier is attached via
 //! `.with_verifier()`, the `expect` at `pipeline.rs` step 8.5 fires
 //! with the message "verifier must be Some when nli_threshold > 0.0".
 //!
 //! This is a **contract test**: it documents the invariant so that a
 //! future refactor replacing the `expect` with `bail!` (or a different
 //! message) is caught by the test suite, prompting an explicit decision
 //! rather than a silent behavior change.
 //!
 //! The kebab-app facade (`App::open_with_config`) always pairs
 //! `nli_threshold > 0` with a constructed `OnnxNliVerifier`, so this
 //! panic is unreachable via the normal CLI / MCP / TUI paths — only
 //! a direct `RagPipeline::new(...)` caller without `.with_verifier()`
 //! can trigger it.
 mod common;
 use std::sync::Arc;
 use common::{RagEnv, ScriptedLm, ScriptedRetriever, id32, mk_hit};
 use kebab_core::{LanguageModel, Retriever, SearchMode};
 use kebab_rag::{AskOpts, RagPipeline};
 /// Minimal multi-hop `AskOpts` mirroring the pattern used in
 /// `multi_hop.rs` — lexical mode, deterministic seed, no streaming.
 fn multi_hop_opts() -> AskOpts {
    AskOpts {
        k: 5,
        explain: false,
        mode: SearchMode::Lexical,
        temperature: Some(0.0),
        seed: Some(0),
        stream_sink: None,
        history: Vec::new(),
        conversation_id: None,
        turn_index: None,
        multi_hop: true,
    }
 }
 /// Building the "happy-path" scenario inline: probe retrieve passes
 /// the score gate, decompose emits one sub-query, decide signals stop,
 /// and synthesize produces a non-empty cited answer. This is the minimal
 /// scenario that reaches step 8.5 (NLI gate) in `ask_multi_hop`.
 fn setup_happy_pipeline_no_verifier(nli_threshold: f32) -> (RagPipeline, RagEnv) {
    let env = RagEnv::new();
    let cid = id32("c1");
    let did = id32("d1");
    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
    // Entry 0 = probe retrieve (pre-decompose gate check).
    // Entry 1 = decompose-driven retrieve for "q1".
    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
    let retriever_dyn: Arc<dyn Retriever> = retriever;
    // Three LLM calls: decompose → decide (stop) → synthesize.
    // Synthesize emits a non-empty answer so step 8.5 is reached.
    let lm = Arc::new(ScriptedLm::new(vec![
        r#"["q1"]"#,   // decompose
        r"[]",         // decide: stop signal
        "answer body [#1]", // synthesize: non-empty → step 8.5 entered
    ]));
    let lm_dyn: Arc<dyn LanguageModel> = lm;
    let mut cfg = env.config.clone();
    cfg.rag.nli_threshold = nli_threshold;
    // Intentionally NO `.with_verifier()` — this is the condition under test.
    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone());
    (pipeline, env)
 }
 #[test]
 #[should_panic(expected = "verifier must be Some when nli_threshold > 0.0")]
 fn ask_multi_hop_panics_when_threshold_positive_but_verifier_none() {
    // nli_threshold = 0.5 (gate enabled) but the pipeline has no verifier
    // because `.with_verifier()` was never called. The `expect` at
    // pipeline.rs step 8.5 fires once synthesize produces a non-empty answer.
    let (pipeline, _env) = setup_happy_pipeline_no_verifier(0.5);
    // Unwrap is intentional: we're asserting the panic, not an Ok/Err return.
    let _ = pipeline.ask("compound", multi_hop_opts());
 }
 /// Companion: threshold = 0.0 (gate disabled) with no verifier must
 /// NOT panic — the `if nli_threshold > 0.0` guard short-circuits the
 /// entire step 8.5 block.
 #[test]
 fn ask_multi_hop_does_not_panic_when_threshold_zero_and_verifier_none() {
    let (pipeline, _env) = setup_happy_pipeline_no_verifier(0.0);
    let answer = pipeline
        .ask("compound", multi_hop_opts())
        .expect("threshold = 0.0 skips NLI gate; no panic expected");
    // Gate is disabled → verification summary stays None.
    assert!(
        answer.verification.is_none(),
        "nli_threshold = 0.0 must leave Answer.verification = None"
    );
 }
--- a/crates/kebab-rag/tests/multi_hop_nli_stream.rs
+++ b/crates/kebab-rag/tests/multi_hop_nli_stream.rs
@@ -0,0 +1,177 @@
 //! Tests that the NLI refusal paths emit a `StreamEvent::Final` event
 //! into `opts.stream_sink`.
 //!
 //! Coverage:
 //! 1. `nli_verification_fail_emits_final_stream_event_with_refusal` —
 //!    `MockNliVerifier::fail()` → `RefusalReason::NliVerificationFailed`
 //!    arrives as the payload of the terminal `StreamEvent::Final`.
 //! 2. `nli_model_unavailable_emits_final_stream_event_with_refusal` —
 //!    `MockNliVerifier::err()` → `RefusalReason::NliModelUnavailable`
 //!    arrives as the payload of the terminal `StreamEvent::Final`.
 //!
 //! Note: `ask_multi_hop` does NOT have a separate `_streaming` entrypoint.
 //! Streaming is handled by the `stream_sink: Option<Sender<StreamEvent>>`
 //! field on `AskOpts`. Both refusal helpers (`refuse_nli_verification` and
 //! `refuse_nli_model_unavailable`) fire `sink.send(StreamEvent::Final { … })`
 //! before returning — these tests pin that wire shape.
 mod common;
 use std::sync::Arc;
 use std::sync::mpsc;
 use common::{MockNliVerifier, RagEnv, ScriptedLm, ScriptedRetriever, id32, mk_hit};
 use kebab_core::{LanguageModel, RefusalReason, Retriever, SearchMode};
 use kebab_nli::NliVerifier;
 use kebab_rag::{AskOpts, RagPipeline, StreamEvent};
 // ── shared helpers ─────────────────────────────────────────────────────────
 /// Build the minimal happy-path scenario that reaches step 8.5 (NLI gate):
 /// probe passes, decompose → one sub-query, decide → stop, synthesize →
 /// non-empty answer.  Returns the env, scripted retriever, and scripted LM.
 fn happy_env_for_stream() -> (RagEnv, Arc<ScriptedRetriever>, Arc<ScriptedLm>) {
    let env = RagEnv::new();
    let cid = id32("c1");
    let did = id32("d1");
    env.seed_chunk(&cid, &did, "notes/a.md", "Body text.", &["Intro"]);
    let hits = vec![mk_hit(1, &cid, &did, "notes/a.md", 0.85, &["Intro"])];
    // Entry 0 = probe, entry 1 = decompose-driven retrieve.
    let retriever = Arc::new(ScriptedRetriever::new(vec![hits.clone(), hits]));
    let lm = Arc::new(ScriptedLm::new(vec![
        r#"["q1"]"#,        // decompose
        r"[]",              // decide: stop
        "answer body [#1]", // synthesize: non-empty so NLI gate runs
    ]));
    (env, retriever, lm)
 }
 /// Multi-hop `AskOpts` with a `stream_sink` wired in so every pipeline
 /// stage emits `StreamEvent`s into `tx`.
 fn multi_hop_opts_with_sink(tx: mpsc::Sender<StreamEvent>) -> AskOpts {
    AskOpts {
        k: 5,
        explain: false,
        mode: SearchMode::Lexical,
        temperature: Some(0.0),
        seed: Some(0),
        stream_sink: Some(tx),
        history: Vec::new(),
        conversation_id: None,
        turn_index: None,
        multi_hop: true,
    }
 }
 /// Drain `rx` and return the first `StreamEvent::Final` found, panicking
 /// with a clear message if none is present.
 fn expect_final_event(rx: mpsc::Receiver<StreamEvent>) -> StreamEvent {
    let events: Vec<StreamEvent> = rx.try_iter().collect();
    events
        .into_iter()
        .find(|e| matches!(e, StreamEvent::Final { .. }))
        .expect("pipeline must emit at least one StreamEvent::Final")
 }
 // ── 1. NliVerificationFailed ───────────────────────────────────────────────
 #[test]
 fn nli_verification_fail_emits_final_stream_event_with_refusal() {
    let (env, retriever, lm) = happy_env_for_stream();
    let mut cfg = env.config.clone();
    cfg.rag.nli_threshold = 0.5; // entailment 0.1 < 0.5 → refusal
    let retriever_dyn: Arc<dyn Retriever> = retriever;
    let lm_dyn: Arc<dyn LanguageModel> = lm;
    let verifier = MockNliVerifier::fail(); // entailment score = 0.1
    let verifier_dyn: Arc<dyn NliVerifier> = verifier;
    let (tx, rx) = mpsc::channel::<StreamEvent>();
    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
        .with_verifier(verifier_dyn);
    let answer = pipeline
        .ask("compound", multi_hop_opts_with_sink(tx))
        .expect("pipeline returns Ok even on NLI refusal");
    // Synchronous return value.
    assert_eq!(
        answer.refusal_reason,
        Some(RefusalReason::NliVerificationFailed),
        "return value must carry NliVerificationFailed"
    );
    assert!(!answer.grounded, "NLI refusal must not be grounded");
    // Stream wire shape: terminal Final event must carry matching refusal.
    let final_event = expect_final_event(rx);
    match final_event {
        StreamEvent::Final { answer: streamed } => {
            assert_eq!(
                streamed.refusal_reason,
                Some(RefusalReason::NliVerificationFailed),
                "Final event's answer must carry NliVerificationFailed"
            );
            assert!(!streamed.grounded);
            // verification summary is stamped even on the refusal path.
            let v = streamed
                .verification
                .expect("NliVerificationFailed carries a VerificationSummary");
            assert!(!v.nli_passed);
            assert!((v.nli_score - 0.1).abs() < 1e-5, "score: {}", v.nli_score);
        }
        other => panic!("expected StreamEvent::Final, got {other:?}"),
    }
 }
 // ── 2. NliModelUnavailable ─────────────────────────────────────────────────
 #[test]
 fn nli_model_unavailable_emits_final_stream_event_with_refusal() {
    let (env, retriever, lm) = happy_env_for_stream();
    let mut cfg = env.config.clone();
    cfg.rag.nli_threshold = 0.5; // gate enabled; verifier will error
    let retriever_dyn: Arc<dyn Retriever> = retriever;
    let lm_dyn: Arc<dyn LanguageModel> = lm;
    let verifier = MockNliVerifier::err(); // returns anyhow::Error
    let verifier_dyn: Arc<dyn NliVerifier> = verifier;
    let (tx, rx) = mpsc::channel::<StreamEvent>();
    let pipeline = RagPipeline::new(cfg, retriever_dyn, lm_dyn, env.sqlite.clone())
        .with_verifier(verifier_dyn);
    let answer = pipeline
        .ask("compound", multi_hop_opts_with_sink(tx))
        .expect("pipeline returns Ok even when NLI model is unavailable");
    // Synchronous return value.
    assert_eq!(
        answer.refusal_reason,
        Some(RefusalReason::NliModelUnavailable),
        "return value must carry NliModelUnavailable"
    );
    assert!(!answer.grounded);
    // verification is None — we can't summarize what didn't happen.
    assert!(
        answer.verification.is_none(),
        "NliModelUnavailable must leave Answer.verification = None"
    );
    // Stream wire shape: terminal Final event must carry matching refusal.
    let final_event = expect_final_event(rx);
    match final_event {
        StreamEvent::Final { answer: streamed } => {
            assert_eq!(
                streamed.refusal_reason,
                Some(RefusalReason::NliModelUnavailable),
                "Final event's answer must carry NliModelUnavailable"
            );
            assert!(!streamed.grounded);
            assert!(
                streamed.verification.is_none(),
                "NliModelUnavailable: verification must be None in the streamed Final event"
            );
        }
        other => panic!("expected StreamEvent::Final, got {other:?}"),
    }
 }
--- a/docs/dogfood/v0.18.0/SUMMARY.md
+++ b/docs/dogfood/v0.18.0/SUMMARY.md
@@ -79,3 +79,18 @@ workspace-wide cleanup (chore: clippy::pedantic baseline + auto-fix, 128 files /
 | S3 | `nli_model_unavailable` | `nli_model_unavailable` | ✓ identical (cleanup 무관 — root cause v0.18.1 follow-up) |
 cleanup 가 *mechanical refactor only* — behavior 회귀 0 + NLI score deterministic. cut PR v0.18.0 진행 가능 baseline.
 ## Post-architectural-refactor retest (2026-05-26)
 OMC team `post-pr9-refactor` 의 architect 가 priorities 분석 후 executor + test-engineer 가 추가 cleanup 진행 — H1 (`models.nli.model` config wiring, `DEFAULT_MODEL_ID` 제거), H2 (`truncate_for_nli` 의 `_hypothesis` stub param 제거), H3 (`was_truncated` tracing::debug! 로 surfacing), D (MCP test flake fix), E (carried TODO → HOTFIXES cross-link). test-engineer 가 T1/T2/T3/T4 (총 9 new tests) 추가. system-architect 가 component-level review 후 "pre-cut nothing — all architectural items v0.18.1+ defer" 결론.
 본 architectural refactor 후 동일 4 case 재실행. **PR-9d / post-cleanup / post-refactor 3번 모두 byte-identical**:
 | case | PR-9d | post-cleanup | **post-refactor** |
 |---|---|---|---|
 | S7 | 0.0035389824770390987 | 0.0035389824770390987 | **0.0035389824770390987** ✓ |
 | S1 | 0.058334656059741974 | 0.058334656059741974 | **0.058334656059741974** ✓ |
 | S10 | 0.0027875436935573816 | 0.0027875436935573816 | **0.0027875436935573816** ✓ |
 | S3 | nli_model_unavailable | nli_model_unavailable | nli_model_unavailable ✓ |
 H1 의 config wiring (DEFAULT_MODEL_ID 제거 후 `config.models.nli.model` 사용) 가 *behavior 변경 0* — default config 의 model 값이 hardcoded 와 같음. workspace test 1304 passed + 1 pre-existing flaky (kebab-mcp HOTFIX #15 동일). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -119,6 +119,16 @@ PR-9 의 핵심 목표 (S7 silent hallucination root cause 해결) ✅ **달성*
 **Amends**: 없음 (PR-9 의 known limitation, v0.18.1 candidate).
 ### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace
 **Symptom**: multi-hop `ask` 가 step 8.5 NLI gate 에서 refuse 시 (`RefusalReason::NliVerificationFailed` / `NliModelUnavailable`) `Answer.hops` 는 decompose+decide chain 까지만 담겨 있고, synthesize 가 실제로 LLM 호출 + 토큰 누적까지 끝났음에도 terminal `Synthesize` `HopRecord` 가 append 되지 않는다. happy path (refuse 안 함) 의 `hops` trace 는 `Synthesize` 항을 포함하므로 두 surface 사이 trace shape 가 비대칭.
 **Action**: 본 entry 가 tracker. `crates/kebab-rag/src/pipeline.rs` 의 `refuse_nli_verification` / `refuse_nli_model_unavailable` 진입 직전 (step 8.5) 에서 `Synthesize` `HopRecord` (with `forced_stop = false`, synth `usage` / `elapsed_ms` 동봉) 를 append 후 refuse 호출하도록 정리.
 **Next step**: v0.18.1 candidate — wire 비대칭이 explain / TUI hops 표시에 사용자 노출되는지 도그푸딩에서 확인 후 fix. 시급도 낮음 (`hops` 는 trace/debug 용, grounded/refusal 결정에는 무관).
 **Amends**: 없음 (v0.18.1 candidate). Cross-link: `crates/kebab-rag/src/pipeline.rs` (refuse_nli_* call sites 직전).
 ### 사용자 영향
 PR-7 + PR-8 머지 후 (v0.18.0 cut 직전):