feat(rag): fb-41 PR-9c-1 — core types + wire scaffolding (NLI verification)

Surface-only PR (no behavior wiring; that lands in PR-9c-2):

- kebab-core: RefusalReason::NliVerificationFailed + NliModelUnavailable (serde rename_all = "snake_case", so the wire strings are nli_verification_failed / nli_model_unavailable).
- kebab-core: Answer.verification: Option<VerificationSummary> field (additive minor wire change; no impact on pre-v0.18 readers).
- kebab-core: VerificationSummary { nli_score: f32, nli_threshold: f32, nli_passed: bool } struct, re-exported from lib.rs.
- kebab-config: NliCfg { model, provider } + ModelsCfg.nli (default model Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7).
- kebab-config: RagCfg.nli_threshold: f32 (default 0.0 = disabled; spec §2.6 single gate).
- kebab-config: env overrides KEBAB_MODELS_NLI_MODEL / KEBAB_MODELS_NLI_PROVIDER + KEBAB_RAG_NLI_THRESHOLD (on parse failure: tracing::warn and keep the default).
- kebab-rag: RagPipeline.verifier: Option<Arc<dyn NliVerifier>> field + with_verifier builder (both #[allow(dead_code)]; the attribute is removed once PR-9c-2's step 8.5 hook activates them). RagPipeline::new signature unchanged (round-2 NEW-M1 Option B).
- kebab-rag: add kebab-nli as a path dependency in Cargo.toml.
- kebab-store-sqlite + kebab-tui: exhaustive match arms for the two new RefusalReason variants (snake_case label / display text).
- verification: None added at every Answer construction site (6 in rag + 3 fixtures in cli/tui/eval).
- wire schemas: answer.schema.json gains the verification field, $defs.VerificationSummary, and 2 new refusal_reason.enum members. error.schema.json gains 2 code.enum members + matching details.description entries (reserved, forward-looking).
- docs/ARCHITECTURE.md: nli node in the Mermaid Adapters subgraph, plus rag→nli, app→nli (forward-looking), and nli→config edges. The nli→core edge is omitted (kebab-nli/Cargo.toml's only direct dep is config, and the ARCHITECTURE convention draws direct deps only). crates/kebab-nli/ added to the directory tree.

Tests: kebab-core 3 (serde rename + verification skip + struct shape) + kebab-config 6 (defaults + legacy + env + malformed env) + kebab-cli wire 5 (schema verification + enum validation).

Verification: cargo test --workspace -j 1 shows 0 regressions (the same single pre-existing flaky failure, kebab-mcp::tools_call_ask_multi_hop, which the spec lists as known-flaky). cargo clippy --workspace --all-targets -- -D warnings is clean.

Wire impact: additive minor. answer.v1 gains the optional verification field and an extended refusal_reason.enum; error.v1.code is extended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
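The config surface described above (threshold default 0.0 as the single gate, env overrides that warn and keep the default on a malformed value) can be sketched as follows. This is a minimal std-only sketch, not the kebab-config code. Field names come from the commit message. The field types, the `Option<String>` provider, and the `apply_env_overrides` / `verification_enabled` helpers are assumptions. A `HashMap` stands in for `std::env::var` so the sketch is testable, and `eprintln!` stands in for `tracing::warn!`.

```rust
use std::collections::HashMap;

/// Sketch of the NLI model config (names from the commit; types assumed).
#[derive(Debug, Clone, PartialEq)]
struct NliCfg {
    model: String,
    /// Assumption: the real default provider is not stated in the commit,
    /// so None is only a placeholder here.
    provider: Option<String>,
}

impl Default for NliCfg {
    fn default() -> Self {
        NliCfg {
            model: "Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7".to_string(),
            provider: None,
        }
    }
}

/// Sketch of the [rag] section; derive(Default) yields nli_threshold = 0.0.
#[derive(Debug, Clone, Default, PartialEq)]
struct RagCfg {
    nli_threshold: f32,
}

/// Hypothetical helper for the single gate: verification runs only when
/// the threshold is above zero.
fn verification_enabled(rag: &RagCfg) -> bool {
    rag.nli_threshold > 0.0
}

/// Hypothetical helper: applies env overrides read from `env`. A malformed
/// threshold is reported and the previous value is kept.
fn apply_env_overrides(nli: &mut NliCfg, rag: &mut RagCfg, env: &HashMap<&str, &str>) {
    if let Some(model) = env.get("KEBAB_MODELS_NLI_MODEL") {
        nli.model = model.to_string();
    }
    if let Some(provider) = env.get("KEBAB_MODELS_NLI_PROVIDER") {
        nli.provider = Some(provider.to_string());
    }
    if let Some(raw) = env.get("KEBAB_RAG_NLI_THRESHOLD") {
        match raw.trim().parse::<f32>() {
            Ok(value) => rag.nli_threshold = value,
            // Stand-in for tracing::warn!: log and keep the current value.
            Err(err) => eprintln!(
                "warn: ignoring KEBAB_RAG_NLI_THRESHOLD={raw:?} ({err}); keeping {}",
                rag.nli_threshold
            ),
        }
    }
}
```

Keeping the previous value on a parse error means a typo in the environment cannot silently enable verification. It simply leaves the configured gate in place.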
2026-05-25 23:27:36 +00:00
parent 79ad6e376f
commit 546c1564b0
15 changed files with 479 additions and 5 deletions
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -68,6 +68,7 @@ flowchart TB
        llmlocal["kebab-llm-local<br/>(Ollama)"]
        search["kebab-search"]
        rag["kebab-rag"]
+        nli["kebab-nli<br/>(NLI verifier, fb-41)"]
    end
    eval["kebab-eval"]
    config["kebab-config"]
@@ -106,6 +107,9 @@ flowchart TB
    rag --> search
    rag --> llm
    rag --> sqlite
+    rag --> nli
+    app --> nli
+    nli --> config
    search --> sqlite
    search --> vector
    search --> embed
@@ -181,6 +185,7 @@ kebab/
 │   ├── kebab-store-vector/                            # LanceDB VectorStore (P3-3, P7-3 follow-up)
 │   ├── kebab-llm/  kebab-llm-local/                      # LanguageModel trait + Ollama adapter (P4-1, P4-2)
 │   ├── kebab-rag/                                     # RAG pipeline (P4-3)
+│   ├── kebab-nli/                                     # NLI verifier (mDeBERTa-v3 XNLI, fb-41 PR-9a/9b/9c-1)
 │   ├── kebab-eval/                                    # golden query runner + metrics (P5-1, P5-2)
 │   ├── kebab-parse-image/                             # ImageExtractor + Ollama OCR + caption (P6)
 │   ├── kebab-parse-pdf/                               # lopdf per-page text extractor (P7-1)
--- a/docs/wire-schema/v1/answer.schema.json
+++ b/docs/wire-schema/v1/answer.schema.json
@@ -30,12 +30,14 @@
            "no_index",
            "no_chunks",
            "llm_stream_aborted",
-            "multi_hop_decompose_failed"
+            "multi_hop_decompose_failed",
+            "nli_verification_failed",
+            "nli_model_unavailable"
          ]
        },
        { "type": "null" }
      ],
-      "description": "p9-fb-41: `multi_hop_decompose_failed` added in PR-2 alongside the multi-hop pipeline skeleton (only emitted when AskOpts.multi_hop = true and the decompose LLM call fails to parse). Other variants are unchanged from earlier phases."
+      "description": "p9-fb-41: `multi_hop_decompose_failed` added in PR-2 alongside the multi-hop pipeline skeleton (only emitted when AskOpts.multi_hop = true and the decompose LLM call fails to parse). `nli_verification_failed` + `nli_model_unavailable` added in PR-9c-1 — both emitted only on the multi-hop path when `[rag].nli_threshold > 0`; surface declared in PR-9c-1, behavior wired in PR-9c-2."
    },
    "model":                   { "type": "object" },
    "embedding":               { "type": ["object", "null"] },
@@ -61,6 +63,13 @@
        { "type": "null" }
      ],
      "description": "p9-fb-41 multi-hop trace. Present (non-null array) only when the ask routed through the multi-hop pipeline (`AskOpts.multi_hop = true`); single-pass answers omit the field entirely (serde `skip_serializing_if = None`). Each entry records one LLM hop — decompose / decide / synthesize — with sub-queries, retrieval count, and per-hop latency. Wire-additive: pre-fb-41 readers tolerate the missing field; new readers branch on its presence to render the per-hop trace."
+    },
+    "verification": {
+      "anyOf": [
+        { "$ref": "#/$defs/VerificationSummary" },
+        { "type": "null" }
+      ],
+      "description": "p9-fb-41 PR-9c-1: NLI-based groundedness verification summary. Present only when `[rag].nli_threshold > 0` and a multi-hop ask reached step 8.5 (single-pass asks never verify). Surface declared in PR-9c-1; the actual stamp lands in PR-9c-2. Wire-additive: pre-v0.18 readers tolerate the missing field."
    }
  },
  "$defs": {
@@ -97,6 +106,24 @@
          "description": "Wall-clock latency of the LLM call for this hop. `0` is overloaded — means 'no LLM call happened' when (a) the Decide hop was skipped due to forced_stop or (b) the pool was empty before any decide could run. Treat 0 as absent or instantaneous."
        }
      }
+    },
+    "VerificationSummary": {
+      "type": "object",
+      "required": ["nli_score", "nli_threshold", "nli_passed"],
+      "properties": {
+        "nli_score": {
+          "type": "number",
+          "description": "p9-fb-41 PR-9c-1: NLI entailment channel score (faithfulness) — `NliScores::faithfulness()` of `(premise = packed chunks, hypothesis = answer)`."
+        },
+        "nli_threshold": {
+          "type": "number",
+          "description": "p9-fb-41 PR-9c-1: mirror of `[rag].nli_threshold` at the time the verification ran (audit field — same value the pipeline gates on)."
+        },
+        "nli_passed": {
+          "type": "boolean",
+          "description": "p9-fb-41 PR-9c-1: `nli_score >= nli_threshold`. When false, the matching wire emit also carries `refusal_reason = \"nli_verification_failed\"`."
+        }
+      }
    }
  }
 }
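The answer schema implies a simple relationship between the threshold, the `VerificationSummary` fields, and the refusal reason. When verification is disabled the field is omitted. Otherwise `nli_passed = nli_score >= nli_threshold`, and a failed check also carries `nli_verification_failed`. The sketch below shows that relationship using only std. It assumes a trimmed, hypothetical copy of `RefusalReason` with a hand-written `as_wire_str` in place of serde's `rename_all = "snake_case"`. It also assumes a hypothetical `stamp_verification` helper, since the real step 8.5 hook only lands in PR-9c-2. The faithfulness score is a plain `f32`, not the `NliScores` type. The single-pass vs multi-hop distinction is ignored, and the model-unavailable case producing no summary is an assumption.

```rust
/// Trimmed, hypothetical copy of RefusalReason with only the two PR-9c-1
/// variants; the real enum has more variants and derives serde.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RefusalReason {
    NliVerificationFailed,
    NliModelUnavailable,
}

impl RefusalReason {
    /// Hand-written equivalent of serde(rename_all = "snake_case").
    fn as_wire_str(self) -> &'static str {
        match self {
            RefusalReason::NliVerificationFailed => "nli_verification_failed",
            RefusalReason::NliModelUnavailable => "nli_model_unavailable",
        }
    }
}

/// Mirrors $defs.VerificationSummary: all three fields are required.
#[derive(Debug, Clone, Copy, PartialEq)]
struct VerificationSummary {
    nli_score: f32,
    nli_threshold: f32,
    nli_passed: bool,
}

impl VerificationSummary {
    /// Minimal JSON rendering (numbers and a bool only, so no escaping).
    fn to_json(&self) -> String {
        format!(
            "{{\"nli_score\":{},\"nli_threshold\":{},\"nli_passed\":{}}}",
            self.nli_score, self.nli_threshold, self.nli_passed
        )
    }
}

/// Hypothetical step-8.5 helper. `score` is the faithfulness score, or None
/// when the NLI model is unavailable (assumption: no summary is stamped then).
/// Returns (Answer.verification, Answer.refusal_reason).
fn stamp_verification(
    score: Option<f32>,
    threshold: f32,
) -> (Option<VerificationSummary>, Option<RefusalReason>) {
    // A threshold of 0.0 (the default) disables verification: field omitted.
    if threshold <= 0.0 {
        return (None, None);
    }
    match score {
        None => (None, Some(RefusalReason::NliModelUnavailable)),
        Some(s) => {
            let passed = s >= threshold;
            let summary = VerificationSummary {
                nli_score: s,
                nli_threshold: threshold,
                nli_passed: passed,
            };
            let refusal = if passed { None } else { Some(RefusalReason::NliVerificationFailed) };
            (Some(summary), refusal)
        }
    }
}
```

Echoing `nli_threshold` into the summary keeps each answer self-describing. An auditor can tell which gate produced `nli_passed` without looking up the config in effect at the time.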
--- a/docs/wire-schema/v1/error.schema.json
+++ b/docs/wire-schema/v1/error.schema.json
@@ -17,14 +17,16 @@
        "timeout",
        "io_error",
        "generic",
-        "multi_hop_decompose_failed"
+        "multi_hop_decompose_failed",
+        "nli_verification_failed",
+        "nli_model_unavailable"
      ]
    },
    "message": { "type": "string" },
    "details": {
      "type": "object",
      "additionalProperties": true,
-      "description": "Per-code free-form context. config_invalid: { path, cause }. not_indexed: { expected, found }. model_unreachable: { endpoint, source }. model_not_pulled: { model }. timeout: { source }. io_error: { kind }. generic: { chain (when --verbose) }. multi_hop_decompose_failed: {} (reserved — currently emitted as Answer.refusal_reason on stdout, NOT as error.v1 on stderr; the enum member is forward-looking for a future RefusalReason → error_wire promotion)."
+      "description": "Per-code free-form context. config_invalid: { path, cause }. not_indexed: { expected, found }. model_unreachable: { endpoint, source }. model_not_pulled: { model }. timeout: { source }. io_error: { kind }. generic: { chain (when --verbose) }. multi_hop_decompose_failed: {} (reserved — currently emitted as Answer.refusal_reason on stdout, NOT as error.v1 on stderr; the enum member is forward-looking for a future RefusalReason → error_wire promotion). nli_verification_failed: {} (p9-fb-41 PR-9c-1 reserved — currently emitted only as Answer.refusal_reason on stdout; forward-looking for future RefusalReason → error_wire promotion). nli_model_unavailable: {} (p9-fb-41 PR-9c-1 reserved — same pattern as nli_verification_failed)."
    },
    "hint": {
      "anyOf": [