• v0.18.0 Stable

    altair823 released this 2026-05-26 05:37:47 +00:00 | 245 commits to main since this release

    v0.18.0 — fb-41 multi-hop RAG ship + NLI verification

    Dogfooding after PR-8 surfaced the S7 caffeine hallucination, whose root cause is the LLM-self-judge ceiling. This release closes it with a deterministic external verifier (mDeBERTa-v3 XNLI ONNX), consistent with the approach established in the literature (Self-RAG, CRAG, Auto-GDA, MedTrust-RAG).

    New surface

    • CLI: kebab ask --multi-hop <query> — multi-hop reasoning (decompose → decide → synthesize loop).
    • MCP: the multi_hop: true argument of the ask tool.
    • TUI: F2 toggle in the Ask panel + multi-hop badge + inline hops summary.

    New wire (additive minor; pre-v0.18 readers are unaffected)

    • answer.v1.hops — multi-hop per-iter trace ({kind, iter, sub_queries, context_chunks_added, llm_call_ms, forced_stop} per element).
    • answer.v1.verification — NLI groundedness summary ({nli_score, nli_threshold, nli_passed}, present only when cfg.rag.nli_threshold > 0).
    • error.v1.code enum extended with: multi_hop_decompose_failed, nli_verification_failed, nli_model_unavailable.
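
    As a rough illustration of how the decompose → decide → synthesize loop could produce the answer.v1.hops trace, here is a minimal Rust sketch. Only the six field names come from the wire description above. The Rust types, the kind values ("decompose", "decide", "synthesize"), the Decision enum, the max_iters cap, and the run_multi_hop function are assumptions made for this example, not the actual kebab implementation. The LLM and retrieval steps are injected as closures so that the sketch runs without a model.

```rust
/// One element of the per-iteration trace. Field names mirror
/// `answer.v1.hops`; the Rust types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Hop {
    pub kind: String,
    pub iter: u32,
    pub sub_queries: Vec<String>,
    pub context_chunks_added: u32,
    pub llm_call_ms: u64,
    pub forced_stop: bool,
}

/// Hypothetical outcome of the "decide" step.
pub enum Decision {
    /// Enough context has been gathered; go on to synthesis.
    Enough,
    /// More context is needed for these follow-up sub-queries.
    NeedMore(Vec<String>),
}

// Helper that builds a hop; `llm_call_ms` stays 0 because nothing is timed here.
fn hop(kind: &str, iter: u32, sub_queries: Vec<String>, added: u32, forced_stop: bool) -> Hop {
    Hop {
        kind: kind.to_string(),
        iter,
        sub_queries,
        context_chunks_added: added,
        llm_call_ms: 0,
        forced_stop,
    }
}

/// Runs the loop with injected steps and returns the trace.
/// `max_iters` is a hypothetical safety cap: if the decide step still asks
/// for more context on the last allowed iteration, that hop is marked
/// `forced_stop = true` and the loop proceeds to synthesis anyway.
pub fn run_multi_hop(
    query: &str,
    max_iters: u32,
    decompose: impl Fn(&str) -> Vec<String>,
    retrieve: impl Fn(&[String]) -> u32,
    decide: impl Fn(u32, u32) -> Decision,
) -> Vec<Hop> {
    let mut sub_queries = decompose(query);
    let mut hops = vec![hop("decompose", 0, sub_queries.clone(), 0, false)];
    let mut total_chunks = 0;
    let mut last_iter = 0;
    for iter in 1..=max_iters {
        last_iter = iter;
        let added = retrieve(sub_queries.as_slice());
        total_chunks += added;
        match decide(iter, total_chunks) {
            Decision::Enough => {
                hops.push(hop("decide", iter, sub_queries.clone(), added, false));
                break;
            }
            Decision::NeedMore(next) => {
                let forced = iter == max_iters;
                hops.push(hop("decide", iter, sub_queries.clone(), added, forced));
                sub_queries = next;
            }
        }
    }
    hops.push(hop("synthesize", last_iter, Vec::new(), 0, false));
    hops
}
```

    Because every field is recorded per hop, a reader that ignores the hops key entirely still sees a valid pre-v0.18 answer, which is what makes the change additive.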

    New config

    • [rag] nli_threshold (default 0.0 = disabled; 0.5 recommended for production).
    • [models.nli] model (default Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7).
    • [models.nli] provider (default onnx — only impl in v0.18; v0.19+ candle/remote candidate).
    • env override: KEBAB_RAG_NLI_THRESHOLD, KEBAB_MODELS_NLI_MODEL, KEBAB_MODELS_NLI_PROVIDER.
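
    The interaction between the config file, the environment override, and the default can be sketched as follows. This is a minimal Rust illustration under stated assumptions: the precedence order (env over file over default), the rejection of values outside [0.0, 1.0], and the function names resolve_nli_threshold, nli_threshold_from_process_env, and nli_enabled are all hypothetical. Only the key name, the variable name KEBAB_RAG_NLI_THRESHOLD, and the default of 0.0 come from the list above.

```rust
/// Documented default for `[rag] nli_threshold`: 0.0 means NLI is disabled.
pub const DEFAULT_NLI_THRESHOLD: f32 = 0.0;

/// Resolves the effective threshold from an optional config-file value and
/// an optional raw `KEBAB_RAG_NLI_THRESHOLD` string.
/// Assumed precedence: env > file > default. Treating an unparsable or
/// out-of-range env value as an error is also an assumption of this sketch.
pub fn resolve_nli_threshold(
    file_value: Option<f32>,
    env_value: Option<&str>,
) -> Result<f32, String> {
    if let Some(raw) = env_value {
        let parsed: f32 = raw
            .trim()
            .parse()
            .map_err(|_| format!("invalid KEBAB_RAG_NLI_THRESHOLD: {:?}", raw))?;
        // NaN fails this range check too, so it is rejected as well.
        if !(0.0..=1.0).contains(&parsed) {
            return Err(format!("KEBAB_RAG_NLI_THRESHOLD out of range: {}", parsed));
        }
        return Ok(parsed);
    }
    Ok(file_value.unwrap_or(DEFAULT_NLI_THRESHOLD))
}

/// Convenience wrapper that reads the override from the process environment.
pub fn nli_threshold_from_process_env(file_value: Option<f32>) -> Result<f32, String> {
    let env = std::env::var("KEBAB_RAG_NLI_THRESHOLD").ok();
    resolve_nli_threshold(file_value, env.as_deref())
}

/// NLI verification is active only for a strictly positive threshold.
pub fn nli_enabled(threshold: f32) -> bool {
    threshold > 0.0
}
```

    With this shape, a config file that sets the threshold to 0.5 combined with KEBAB_RAG_NLI_THRESHOLD=0 resolves to 0.0, which turns verification off for that run without touching the file.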

    New RefusalReason

    • multi_hop_decompose_failed (LLM decompose JSON parse fail).
    • nli_verification_failed (entailment < nli_threshold).
    • nli_model_unavailable (download / inference failure; fail-closed. User workaround: set [rag] nli_threshold = 0).
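
    The fail-closed behaviour described above can be sketched in Rust as a small gate function. The wire code strings and the verification field names (nli_score, nli_threshold, nli_passed) are taken from these notes. Everything else is an assumption for illustration: the variant names, the VerifierError and GateOutcome types, the nli_gate function, and the choice to keep the verification summary on a refusal so that the score stays visible.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefusalReason {
    MultiHopDecomposeFailed,
    NliVerificationFailed,
    NliModelUnavailable,
}

impl RefusalReason {
    /// Wire code strings as listed in the release notes.
    pub fn as_code(self) -> &'static str {
        match self {
            RefusalReason::MultiHopDecomposeFailed => "multi_hop_decompose_failed",
            RefusalReason::NliVerificationFailed => "nli_verification_failed",
            RefusalReason::NliModelUnavailable => "nli_model_unavailable",
        }
    }
}

/// Mirrors the fields of `answer.v1.verification`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Verification {
    pub nli_score: f32,
    pub nli_threshold: f32,
    pub nli_passed: bool,
}

/// Hypothetical verifier error (model download or inference failure).
#[derive(Debug)]
pub struct VerifierError(pub String);

/// Hypothetical result of the gate: an optional summary plus an optional refusal.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GateOutcome {
    pub verification: Option<Verification>,
    pub refusal: Option<RefusalReason>,
}

/// Applies the NLI gate. `verify` returns the entailment score for the answer.
pub fn nli_gate(
    threshold: f32,
    verify: impl FnOnce() -> Result<f32, VerifierError>,
) -> GateOutcome {
    // Threshold 0 disables the gate; the verifier is never invoked.
    if threshold <= 0.0 {
        return GateOutcome { verification: None, refusal: None };
    }
    match verify() {
        // Fail-closed: a verifier failure refuses instead of passing the answer.
        Err(_) => GateOutcome {
            verification: None,
            refusal: Some(RefusalReason::NliModelUnavailable),
        },
        Ok(score) => {
            // Refuse only when entailment is strictly below the threshold.
            let passed = score >= threshold;
            GateOutcome {
                verification: Some(Verification {
                    nli_score: score,
                    nli_threshold: threshold,
                    nli_passed: passed,
                }),
                refusal: if passed { None } else { Some(RefusalReason::NliVerificationFailed) },
            }
        }
    }
}
```

    Setting the threshold to 0 short-circuits before the verifier runs, which is why it works as the workaround when the model cannot be downloaded or loaded.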

    Recommended environment

    • LLM: gemma3:4b (CPU only, 16 GB RAM recommended).
    • With NLI enabled: ~280 MB first-run download to {data_dir}/models/nli/Xenova_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7/.
    • RAM peak (NLI + Ollama running together, with gemma3:4b): ~7-8 GB (measured during dogfooding; the ONNX session adds ~600 MB). Safe on a 16 GB machine.
    • 8B+ Q4 models (gemma4:e4b 8B / gemma2:9b, etc.): estimated peak ~10 GB; borderline on 16 GB, with OOM risk.

    Known limitations (v0.18.1+ priority)

    • Single-pass ask does not apply NLI (LlmSelfJudge is retained). Only multi-hop goes through the step 8.5 gate.
    • No atomic claim split: the entire answer goes through a single NLI call (paraphrased / multi-claim answers use the average entailment).
    • GPU acceleration is not supported (CPU ONNX runtime).
    • S3-type queries (input-dependent nli_model_unavailable) fail consistently; v0.18.1 follow-up (see the S3 subsection of the fb-41 PR-9 closure entry in HOTFIXES).

    Dogfooding

    docs/dogfood/v0.18.0/SUMMARY.md + 4 sample wire JSON files (multi-hop ask results for S7 / S1 / S3 / S10). Key result: the S7 caffeine hallucination root cause is confirmed resolved. The PR-8 baseline returned grounded=true with an Adam gradient hallucination (silent) → PR-9 returns refusal_reason=nli_verification_failed, nli_score=0.0035 (graceful).

    Sub-PR sequence

    • PR #176 PR-9a — kebab-nli crate skeleton + workspace deps (ort 2.0-rc.9 / tokenizers 0.21 onig / hf-hub 0.4 / ndarray 0.16).
    • PR #177 PR-9b — OnnxNliVerifier ONNX inference + lazy hf-hub download.
    • PR #178 PR-9c-1 — core types + wire scaffolding (RefusalReason::Nli* + Answer.verification + RagPipeline.verifier + config knobs).
    • PR #179 PR-9c-2 — pipeline integration ★ NLI actually enabled (step 8.5 hook + App::open_with_config NLI verifier construction + 5 mock multi-hop tests + SKILL.md).
    • PR #180 PR-9d — dogfood retest + HOTFIXES closure + docs/dogfood/v0.18.0/ preserved.
    • PR #181 cleanup — workspace-wide clippy::pedantic baseline + post-PR9 refactor (H1 config wiring + 9 new tests).
    • PR #182 cut — version bump + cascading docs.

    Cascade rule

    After the v0.18.0 release there are no breaking changes to the kebab.sqlite schema or the wire schema (additive minor only). No user action is required; existing ingests and configs all remain compatible.

    Downloads