Verify that PR-9 actually works: multi-hop retest of S7/S1/S3/S10 on the `/build/cache/dogfood-v018/` corpus after merging PR-1 through PR-9c-2.

Key result: **the root cause of the S7 hallucination is confirmed fixed**.

- PR-8 baseline: `grounded=true, refusal_reason=null`, **answer = the Adam gradient formula** (a silent hallucination unrelated to the caffeine question).
- PR-9 retest: `refusal_reason=nli_verification_failed, nli_score=0.0035` (graceful refusal; NLI detected 0.35% entailment).

Full comparison (4 cases):

- S7 ✅ hallucination FIXED.
- S1 ✅ rejected in both runs; the NLI path is more deterministic (0.058).
- S3 ⚠ fails consistently (`nli_model_unavailable`, 313 s), *v0.18.1 follow-up* (kebab-nli fails on specific inputs and emits no debug log, so it is hard to diagnose).
- S10 ✅ rejected in both runs; the NLI path is more deterministic (0.0028).

- docs/dogfood/v0.18.0/SUMMARY.md (sanitized report) + s{1,3,7,10}-multihop-post-pr9.json (sample wire output, kept in the repo).
- tasks/HOTFIXES.md, fb-41 PR-9 entry: "planned" → "done (2026-05-26)" + inline comparison table + S3 follow-up subsection (v0.18.1 candidate).

RAM: 5-6 GB → 7-8 GB (ONNX session ~600 MB); safe on 16 GB.
Disk: NLI model cache 1.1 GB (XDG default or storage.model_dir).
Wire impact: none (only the schema change from PR-9c-1, plus preserved samples of the measured values).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
{"answer":"근거 부족. NLI 검증 모델을 사용할 수 없음 — `[rag] nli_threshold = 0` 으로 비활성화 후 재시도 가능.","citations":[],"created_at":"2026-05-26T01:26:45.887132823Z","embedding":{"dimensions":384,"id":"multilingual-e5-small","provider":"fastembed"},"grounded":false,"hops":[{"context_chunks_added":0,"forced_stop":false,"iter":0,"kind":"decompose","llm_call_ms":37987,"sub_queries":["Kebab의 목적은 무엇인가?","Kebab은 multilingual-e5와 어떻게 통합되는가?","Kebab은 LanceDB와 어떻게 통합되는가?","Kebab은 RRF와 어떻게 통합되는가?","Kebab의 통합의 이유는 무엇인가?"]},{"context_chunks_added":15,"forced_stop":true,"iter":1,"kind":"decide","llm_call_ms":0,"sub_queries":[]}],"model":{"dimensions":null,"id":"gemma3:4b","provider":"ollama"},"prompt_template_version":"rag-multi-hop-v1","refusal_reason":"nli_model_unavailable","retrieval":{"chunks_returned":0,"chunks_used":0,"k":10,"mode":"hybrid","score_gate":0.30000001192092896,"top_score":0.0,"trace_id":"ret_d10dd875"},"schema_version":"answer.v1","usage":{"completion_tokens":0,"latency_ms":311799,"prompt_tokens":0}}
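The sample above can be triaged mechanically. Below is a minimal Python sketch, assuming only the fields visible in the sample (`grounded`, `refusal_reason`, `usage.latency_ms`) and the two refusal reasons named in the commit message. The function names and the returned labels are hypothetical and are not part of the `answer.v1` schema.

```python
import json


def classify_answer(record: dict) -> str:
    """Return a coarse, illustrative label for one wire-output record."""
    reason = record.get("refusal_reason")
    if reason is None:
        # No refusal: an answer was served. `grounded` only says the pipeline
        # claimed support, which is what the S7 baseline got wrong.
        return "answered" if record.get("grounded") else "answered_ungrounded"
    if reason == "nli_verification_failed":
        # NLI ran and rejected the answer (the S7 retest outcome).
        return "refused_by_nli"
    if reason == "nli_model_unavailable":
        # NLI could not run at all (the S3 outcome).
        return "nli_unavailable"
    return "refused_other"


def latency_seconds(record: dict) -> float:
    """Convert the reported latency from milliseconds to seconds."""
    return record["usage"]["latency_ms"] / 1000.0


# Trimmed copy of the sample record above; only the fields used here.
sample = json.loads(
    '{"grounded": false, "refusal_reason": "nli_model_unavailable",'
    ' "usage": {"latency_ms": 311799}}'
)

print(classify_answer(sample), round(latency_seconds(sample), 1))
```

Run over all four `s{1,3,7,10}-multihop-post-pr9.json` files, a helper like this would separate "NLI ran and rejected" from "NLI could not run", which is the distinction the S3 follow-up depends on.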