kebab

altair823-org/kebab

v0.18.0 08495eb425

v0.18.0 Stable

altair823 released this 2026-05-26 05:37:47 +00:00 | 245 commits to main since this release
v0.18.0 — fb-41 multi-hop RAG ship + NLI verification

Dogfooding after PR-8 traced the S7 caffeine hallucination to one root cause: the LLM-self-judge ceiling. This release closes it with a deterministic external verifier (mDeBERTa-v3 XNLI ONNX), consistent with established academic practice (Self-RAG, CRAG, Auto-GDA, MedTrust-RAG).

New surface
- CLI: kebab ask --multi-hop <query> (multi-hop reasoning: decompose → decide → synthesize loop).
- MCP: the multi_hop: true argument of the ask tool.
- TUI: F2 toggle in the Ask panel, plus a multi-hop badge and an inline hops summary.
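The decompose → decide → synthesize loop can be sketched roughly as follows. This is a minimal illustration, not kebab's actual pipeline: the names multi_hop_ask, Decision, MultiHopResult and max_hops are hypothetical, and the four closures stand in for the LLM and retrieval calls.

```rust
/// Hypothetical outcome of the "decide" step: keep searching or answer now.
pub enum Decision {
    NeedMore,
    Enough,
}

/// Hypothetical result type: the synthesized answer plus minimal bookkeeping.
pub struct MultiHopResult {
    pub answer: String,
    pub iterations: u32,
    pub forced_stop: bool,
}

/// Sketch of a multi-hop loop. All parameter names are assumptions.
pub fn multi_hop_ask(
    query: &str,
    max_hops: u32,
    decompose: impl Fn(&str, &[String]) -> Vec<String>,
    retrieve: impl Fn(&str) -> Vec<String>,
    decide: impl Fn(&str, &[String]) -> Decision,
    synthesize: impl Fn(&str, &[String]) -> String,
) -> MultiHopResult {
    let mut context: Vec<String> = Vec::new();
    let mut iterations: u32 = 0;
    let mut forced_stop = false;

    loop {
        iterations += 1;

        // decompose: derive sub-queries from the question and the context so far.
        let sub_queries = decompose(query, context.as_slice());
        for sub in &sub_queries {
            // Retrieve chunks for each sub-query, keeping only unseen ones.
            for chunk in retrieve(sub.as_str()) {
                if !context.contains(&chunk) {
                    context.push(chunk);
                }
            }
        }

        // decide: stop as soon as the context is judged sufficient.
        if let Decision::Enough = decide(query, context.as_slice()) {
            break;
        }

        // Hop budget exhausted: stop anyway and record that it was forced.
        if iterations >= max_hops {
            forced_stop = true;
            break;
        }
    }

    // synthesize: produce the final answer from the accumulated context.
    let answer = synthesize(query, context.as_slice());
    MultiHopResult { answer, iterations, forced_stop }
}
```

The hop budget is what makes a forced_stop flag meaningful: without it, a decide step that never reports Enough would loop indefinitely.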
New wire (additive minor; pre-v0.18 readers unaffected)
- answer.v1.hops: multi-hop per-iteration trace ({kind, iter, sub_queries, context_chunks_added, llm_call_ms, forced_stop} per element).
- answer.v1.verification: NLI groundedness summary ({nli_score, nli_threshold, nli_passed}; present only when cfg.rag.nli_threshold > 0).
- error.v1.code enum extended with: multi_hop_decompose_failed, nli_verification_failed, nli_model_unavailable.
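As an illustration of the answer.v1.verification shape, the sketch below builds the summary only when the threshold is positive and renders it as JSON by hand. The field names come from the list above; the Verification type, its f64 fields, the summarize and to_json helpers, and the "score >= threshold passes" rule (inferred from the refusal condition entailment < nli_threshold) are assumptions, not kebab's actual code.

```rust
/// Hypothetical in-memory form of answer.v1.verification.
/// Field names follow the wire; the numeric types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Verification {
    pub nli_score: f64,
    pub nli_threshold: f64,
    pub nli_passed: bool,
}

impl Verification {
    /// Returns None when NLI is disabled (threshold <= 0), mirroring
    /// "present only when cfg.rag.nli_threshold > 0".
    pub fn summarize(nli_score: f64, nli_threshold: f64) -> Option<Self> {
        if nli_threshold <= 0.0 {
            return None;
        }
        Some(Self {
            nli_score,
            nli_threshold,
            // Assumed rule: a score below the threshold fails.
            nli_passed: nli_score >= nli_threshold,
        })
    }

    /// Hand-rolled JSON rendering, to keep the sketch dependency-free.
    /// Assumes finite scores (NaN or infinity would not be valid JSON).
    pub fn to_json(&self) -> String {
        format!(
            "{{\"nli_score\":{},\"nli_threshold\":{},\"nli_passed\":{}}}",
            self.nli_score, self.nli_threshold, self.nli_passed
        )
    }
}
```

Modelling the field as an Option is one way to keep the change additive: a reader that does not know the field simply never sees it when NLI is off.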
New config
- [rag] nli_threshold (default 0.0 = disabled; 0.5 recommended for production).
- [models.nli] model (default Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7).
- [models.nli] provider (default onnx, the only implementation in v0.18; candle/remote are candidates for v0.19+).
- env override: KEBAB_RAG_NLI_THRESHOLD, KEBAB_MODELS_NLI_MODEL, KEBAB_MODELS_NLI_PROVIDER.
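One plausible way to resolve the threshold from these sources is sketched below, assuming the environment variable wins over the config file and the file wins over the 0.0 default. The function names and the choice to reject an unparsable value (rather than silently falling back) are assumptions; only the variable name KEBAB_RAG_NLI_THRESHOLD and the default come from the notes above.

```rust
use std::env;

/// Default from the release notes: 0.0 means NLI verification is disabled.
pub const DEFAULT_NLI_THRESHOLD: f64 = 0.0;

/// Hypothetical resolver. Assumed precedence: env value, then file value,
/// then the default. An unparsable env value is reported as an error.
pub fn resolve_nli_threshold(
    file_value: Option<f64>,
    env_value: Option<&str>,
) -> Result<f64, String> {
    match env_value {
        Some(raw) => raw
            .trim()
            .parse::<f64>()
            .map_err(|e| format!("invalid KEBAB_RAG_NLI_THRESHOLD {raw:?}: {e}")),
        None => Ok(file_value.unwrap_or(DEFAULT_NLI_THRESHOLD)),
    }
}

/// Same resolution, reading the override from the process environment.
pub fn nli_threshold_from_process_env(file_value: Option<f64>) -> Result<f64, String> {
    let raw = env::var("KEBAB_RAG_NLI_THRESHOLD").ok();
    resolve_nli_threshold(file_value, raw.as_deref())
}

/// NLI is active only for a strictly positive threshold.
pub fn nli_enabled(threshold: f64) -> bool {
    threshold > 0.0
}
```

Under these assumptions, setting KEBAB_RAG_NLI_THRESHOLD=0 turns verification off for a single run even when the config file sets 0.5.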
New RefusalReason
- multi_hop_decompose_failed (the LLM's decompose output failed JSON parsing).
- nli_verification_failed (entailment < nli_threshold).
- nli_model_unavailable (download or inference failure; fail-closed. User workaround: set [rag] nli_threshold = 0).
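The fail-closed behaviour described above can be sketched as a small gate function. This is an illustration under assumptions: the Rust variant names, VerifierError, and nli_gate are hypothetical, and only the wire codes and the refusal conditions come from the list above.

```rust
/// Refusal reasons added in this release. The wire codes are from the
/// release notes; the Rust variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefusalReason {
    MultiHopDecomposeFailed,
    NliVerificationFailed,
    NliModelUnavailable,
}

impl RefusalReason {
    /// Snake-case code as listed for error.v1.code.
    pub fn code(self) -> &'static str {
        match self {
            RefusalReason::MultiHopDecomposeFailed => "multi_hop_decompose_failed",
            RefusalReason::NliVerificationFailed => "nli_verification_failed",
            RefusalReason::NliModelUnavailable => "nli_model_unavailable",
        }
    }
}

/// Hypothetical error type for a failed download or inference.
#[derive(Debug)]
pub struct VerifierError(pub String);

/// Sketch of the gate: Ok(None) when NLI is disabled, Ok(Some(score)) when
/// the answer passes, Err(reason) when the answer must be refused.
pub fn nli_gate(
    threshold: f64,
    score: impl FnOnce() -> Result<f64, VerifierError>,
) -> Result<Option<f64>, RefusalReason> {
    // Disabled: the verifier is never invoked, so it cannot fail.
    if threshold <= 0.0 {
        return Ok(None);
    }
    match score() {
        // Fail-closed: a verifier error refuses the answer instead of
        // letting it through unverified.
        Err(_) => Err(RefusalReason::NliModelUnavailable),
        Ok(s) if s < threshold => Err(RefusalReason::NliVerificationFailed),
        Ok(s) => Ok(Some(s)),
    }
}
```

With the recommended threshold of 0.5, a score of 0.0035 is refused with nli_verification_failed, while a threshold of 0 skips the verifier entirely, which is why it works as the workaround for nli_model_unavailable.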
Recommended environment
- LLM: gemma3:4b (CPU only, 16 GB RAM recommended).
- With NLI enabled: ~280 MB first-run download to {data_dir}/models/nli/Xenova_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7/.
- Peak RAM (NLI + Ollama running together, with gemma3:4b): ~7-8 GB (measured during dogfooding; the ONNX session adds ~600 MB). Safe on a 16 GB machine.
- 8B+ Q4 models (gemma4:e4b 8B, gemma2:9b, etc.): estimated peak ~10 GB, which is borderline on 16 GB and carries OOM risk.
Known limitations (v0.18.1+ priority)
- Single-pass ask does not apply NLI (it keeps LlmSelfJudge). Only multi-hop has the step 8.5 gate.
- No atomic claim split: the entire answer goes through 1 NLI call, so paraphrased or multi-claim answers are scored by their average entailment.
- No GPU acceleration (CPU ONNX runtime).
- S3-type queries (an input-dependent nli_model_unavailable) fail consistently; this is a v0.18.1 follow-up (see the S3 subsection of the fb-41 PR-9 closure entry in HOTFIXES).
Dogfooding

docs/dogfood/v0.18.0/SUMMARY.md + 4 sample wire JSON files (multi-hop ask results for S7 / S1 / S3 / S10). Key result: the S7 caffeine hallucination root cause is confirmed resolved. PR-8 baseline: grounded=true with an Adam gradient hallucination (silent) → PR-9: refusal_reason=nli_verification_failed, nli_score=0.0035 (graceful).

Sub-PR sequence
- PR #176 PR-9a: kebab-nli crate skeleton + workspace deps (ort 2.0-rc.9 / tokenizers 0.21 onig / hf-hub 0.4 / ndarray 0.16).
- PR #177 PR-9b: OnnxNliVerifier ONNX inference + lazy hf-hub download.
- PR #178 PR-9c-1: core types + wire scaffolding (RefusalReason::Nli* + Answer.verification + RagPipeline.verifier + config knobs).
- PR #179 PR-9c-2: pipeline integration ★ NLI actually enabled (step 8.5 hook + App::open_with_config NLI verifier construction + 5 mock multi-hop tests + SKILL.md).
- PR #180 PR-9d: dogfood retest + HOTFIXES closure + docs/dogfood/v0.18.0/ preserved.
- PR #181 cleanup: workspace-wide clippy::pedantic baseline + post-PR9 refactor (H1 config wiring + 9 new tests).
- PR #182 cut: version bump + cascading docs.
Cascade rule

After the v0.18.0 release there are no breaking changes to the kebab.sqlite schema or the wire schema (additive minor only). No user action is required; existing ingests and configs all remain compatible.
