feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps
- New crate kebab-nli (trait and impl live in the same crate; v0.18 scope is a single ONNX adapter).
- NliVerifier trait + NliScores struct (XNLI 3-channel: entailment/neutral/contradiction).
- Private softmax3 (numerically stable via log-sum-exp).
- OnnxNliVerifier placeholder (PR-9b adds ONNX inference and model download).
- Added to workspace.dependencies: ort 2.0.0-rc.9, tokenizers 0.21 (default-features=false, onig), hf-hub 0.4, ndarray 0.16.
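
For reviewers, a minimal sketch of what a log-sum-exp-safe softmax3 and the three-channel NliScores could look like. This is not the code in this commit: the field names, the from_logits helper, and the [entailment, neutral, contradiction] logit order are assumptions made for illustration.

```rust
/// Hypothetical mirror of the commit's `NliScores`: one probability per
/// XNLI channel. Field names are assumed, not taken from the crate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NliScores {
    pub entailment: f32,
    pub neutral: f32,
    pub contradiction: f32,
}

/// Softmax over exactly three logits, using the max-subtraction form of
/// the log-sum-exp trick so that large logits cannot overflow `exp`.
pub fn softmax3(logits: [f32; 3]) -> [f32; 3] {
    // Subtracting the largest logit leaves the result unchanged
    // mathematically but keeps every exponent <= 0.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps = logits.map(|x| (x - max).exp());
    let sum: f32 = exps.iter().sum();
    exps.map(|e| e / sum)
}

impl NliScores {
    /// ASSUMPTION: logits arrive as [entailment, neutral, contradiction].
    /// The real model's label order should be checked against its config.
    pub fn from_logits(logits: [f32; 3]) -> Self {
        let [entailment, neutral, contradiction] = softmax3(logits);
        Self { entailment, neutral, contradiction }
    }
}
```

Under these assumptions, NliScores::from_logits([3.0, 0.0, -3.0]) ranks entailment highest, and adding a constant to all three logits leaves the scores unchanged, which is the invariance the unit tests below refer to.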
Pre-flight (gate from the PR-9 design contract):
- HF Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 model.onnx + tokenizer.json → HTTP/2 302 (HF S3 routing; the files exist).
- Standalone repro with tokenizers --no-default-features -F onig: the SentencePiece mDeBERTa tokenizer.json loads OK (KR 9 tokens / EN 11 tokens encode correctly).
- Cargo features decision trace: locked tokenizers = { default-features = false, features = ["onig"] }.
Tests: 6 unit (softmax3 normalization + invariance + XNLI logits conversion + faithfulness + new + score stub), all passing.
Verification: cargo test -p kebab-nli -j 1 (6/6) + cargo clippy -p kebab-nli --all-targets -j 1 -- -D warnings clean.
Workspace: cargo test --workspace -j 1 has 1 pre-existing failure, kebab-mcp::tools_call_ask_multi_hop (same failure on the main baseline, unrelated to PR-9a; flaky, depends on the ingest fixture/Ollama).
Wire impact: none (crate introduction only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cargo.toml
@@ -24,6 +24,7 @@ members = [
 "crates/kebab-tui",
 "crates/kebab-mcp",
 "crates/kebab-parse-code",
+"crates/kebab-nli",
 ]

 [workspace.package]
@@ -102,6 +103,19 @@ tree-sitter-kotlin-ng = "1.1.0" # bare tree-sitter-kotlin requires ts <0.23;
 # C/C++ family grammars for code ingest (kebab-parse-code, p10-1D).
 tree-sitter-c = "0.24.2"
 tree-sitter-cpp = "0.23.4"
+# fb-41 PR-9 (kebab-nli): mDeBERTa-v3 XNLI verifier deps. Versions match
+# the fastembed 4.9 transitive set so the ONNX Runtime + tokenizer stack
+# stays single-versioned across the workspace. ort `default-features=false`
+# drops the bundled binary downloader (fastembed already provides one);
+# tokenizers `default-features=false, onig` swaps the default `esaxx` regex
+# backend for `onig` so the build doesn't need libstdc++ headers (verified
+# via PR-9a pre-flight: SentencePiece tokenizer.json loads + KR/EN encode).
+# hf-hub uses `ureq + rustls-tls` to stay aligned with kebab-embed-local's
+# pure-Rust TLS stack.
+ort = { version = "=2.0.0-rc.9", default-features = false, features = ["ndarray"] }
+tokenizers = { version = "0.21", default-features = false, features = ["onig"] }
+hf-hub = { version = "0.4", default-features = false, features = ["ureq", "rustls-tls"] }
+ndarray = "0.16"

 # Disk-footprint trim for dev / test builds. Codegen, opt-level, and
 # behavior are unchanged — only DWARF debug info is reduced (line