Confirming that PR-9 actually works: after merging PR-1 through PR-9c-2, the S7/S1/S3/S10 multi-hop cases were retested against the `/build/cache/dogfood-v018/` corpus.

Key result: **the S7 hallucination root cause is confirmed fixed**.
- PR-8 baseline: `grounded=true, refusal_reason=null`, **answer = the Adam gradient formula** (a silent hallucination unrelated to the caffeine question).
- PR-9 retest: `refusal_reason=nli_verification_failed, nli_score=0.0035` (graceful refusal; NLI measured an entailment of 0.35%).

Full comparison (4 cases):
- S7 ✅ hallucination FIXED.
- S1 ✅ both runs reject; the NLI path is more deterministic (0.058).
- S3 ⚠ consistent fail (`nli_model_unavailable`, 313s), *v0.18.1 follow-up* (kebab-nli fails on a specific input and emits no debug log, which makes diagnosis hard).
- S10 ✅ both runs reject; the NLI path is more deterministic (0.0028).

- docs/dogfood/v0.18.0/SUMMARY.md (sanitized report) + s{1,3,7,10}-multihop-post-pr9.json (sample wire output, kept in the repo).
- tasks/HOTFIXES.md, fb-41 PR-9 entry: "planned" → "done (2026-05-26)" + inline comparison table + S3 follow-up subsection (v0.18.1 candidate).

RAM: 5-6 GB → 7-8 GB (ONNX session ~600 MB); safe on 16 GB.
Disk: NLI model cache 1.1 GB (XDG default or storage.model_dir).
Wire impact: none (only the schema change from PR-9c-1, plus the preserved sample measurements).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
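
The refusal outcomes reported above can be read as a simple gate on the NLI score. The following is a minimal sketch, not the project's actual implementation: the function name `refusal_reason` is hypothetical, the 0.5 default is taken from the `nli_threshold` field in the sample wire output, and modelling an unavailable NLI model as `None` is an assumption made for illustration.

```python
from typing import Optional


def refusal_reason(nli_score: Optional[float], threshold: float = 0.5) -> Optional[str]:
    """Return a refusal reason, or None when the answer may be served.

    Hypothetical helper: the real gate may differ in both shape and
    comparison operator (>= vs >).
    """
    if nli_score is None:
        # Assumption: no score means the NLI model could not be run.
        return "nli_model_unavailable"
    if nli_score < threshold:
        # Entailment too weak: refuse instead of serving the answer.
        return "nli_verification_failed"
    return None


# Scores quoted in the retest; S3 produced no score (model unavailable).
retest_scores = {"S7": 0.0035, "S1": 0.058, "S3": None, "S10": 0.0028}
for case, score in retest_scores.items():
    print(case, refusal_reason(score))
```

Under these assumptions every retested case ends in a refusal, which matches the comparison: three refuse on a low entailment score and S3 refuses because no score could be computed.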
2 lines
1.2 KiB
JSON
{"answer":"근거 부족. 생성된 답변이 검색된 문서 내용에 충분히 entail 되지 않음.","citations":[],"created_at":"2026-05-26T01:35:26.098601876Z","embedding":{"dimensions":384,"id":"multilingual-e5-small","provider":"fastembed"},"grounded":false,"hops":[{"context_chunks_added":0,"forced_stop":false,"iter":0,"kind":"decompose","llm_call_ms":37810,"sub_queries":["What were the main causes of the dinosaurs' extinction?","When did the dinosaurs go extinct?","What types of dinosaurs went extinct?","What evidence supports the theory of the dinosaurs' extinction?","What role did the asteroid play in the dinosaurs' extinction?"]},{"context_chunks_added":15,"forced_stop":true,"iter":1,"kind":"decide","llm_call_ms":0,"sub_queries":[]}],"model":{"dimensions":null,"id":"gemma3:4b","provider":"ollama"},"prompt_template_version":"rag-multi-hop-v1","refusal_reason":"nli_verification_failed","retrieval":{"chunks_returned":0,"chunks_used":0,"k":10,"mode":"hybrid","score_gate":0.30000001192092896,"top_score":0.0,"trace_id":"ret_3152e943"},"schema_version":"answer.v1","usage":{"completion_tokens":0,"latency_ms":79182,"prompt_tokens":0},"verification":{"nli_passed":false,"nli_score":0.0027875436935573816,"nli_threshold":0.5}}
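
A sample like the one above can be sanity-checked mechanically. The sketch below is hedged: `check_answer` is a hypothetical helper, and the invariants it tests (a refused answer is not grounded and carries no citations; `nli_passed` agrees with `nli_score >= nli_threshold`) are inferred from this single sample, not taken from the `answer.v1` schema definition. The embedded document is a trimmed copy of the fields used, with values rounded.

```python
import json


def check_answer(doc: dict) -> list:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    verification = doc.get("verification") or {}
    score = verification.get("nli_score")
    threshold = verification.get("nli_threshold")
    if score is not None and threshold is not None:
        # Assumed rule: the pass flag mirrors score >= threshold.
        if verification.get("nli_passed") != (score >= threshold):
            problems.append("nli_passed disagrees with nli_score vs nli_threshold")
    if doc.get("refusal_reason") is not None:
        # Assumed rule: a refusal is never grounded and never cites chunks.
        if doc.get("grounded"):
            problems.append("refused answer is marked grounded")
        if doc.get("citations"):
            problems.append("refused answer carries citations")
    return problems


# Trimmed copy of the sample wire output (only the fields checked here).
sample = json.loads("""
{
  "citations": [],
  "grounded": false,
  "refusal_reason": "nli_verification_failed",
  "schema_version": "answer.v1",
  "verification": {"nli_passed": false, "nli_score": 0.0027875, "nli_threshold": 0.5}
}
""")

print(check_answer(sample))
```

The same check flags the PR-8 failure mode in reverse: a document with a refusal reason but `grounded` set to true would be reported as inconsistent.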