kebab

Author	SHA1	Message	Date
altair823	7c27633df2	chore(rag): post-PR9 refactor: H1/H2/H3/D/E + test coverage + post-refactor dogfood retest
Architectural cleanup by the OMC team `post-pr9-refactor`. After the architect's priorities analysis, the executor and test-engineer made the file edits, and the system-architect's component-level review concluded that nothing needs to land before the cut: everything is deferred to v0.18.1+.
## Executor work (H1/H2/H3/D/E)
- H1 (kebab-nli/src/onnx.rs): enable the `[models.nli]` config wiring. Remove the `DEFAULT_MODEL_ID` const (NliCfg::defaults in kebab-config is now the single source). OnnxNliVerifier::new reads config.models.nli.model and calls anyhow::bail if config.models.nli.provider is not "onnx". Remove 3 stale "PR-9c-1 will wire this" comments. Add 2 unit tests (`new_uses_config_model_id`, `new_rejects_unsupported_provider`).
- H2 (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. Remove the v0.18.1 placeholder doc. Update 4 call sites (tests/multi_hop.rs) and rename the test `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` to match the contract.
- H3 (kebab-rag/src/pipeline.rs:1041): `was_truncated` is now surfaced through tracing::debug! (adds observability; the signature is preserved, per the caller logging contract).
- D (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (stability on slow CI); remove the `mh_code` discriminator. The dispatch contract is `mh.is_error.unwrap_or(false)` (the existing assertion is sufficient).
- E (tasks/HOTFIXES.md + pipeline.rs:1633-1638): add the subsection "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" as a sibling of the fb-41 PR-9 closure entry. Replace the pipeline comment "cleanup deferred to a follow-up" with the cross-link "// See tasks/HOTFIXES.md ... for follow-up".
## Test-engineer work (T1/T2/T3/T4, 9 new tests)
- T1 (kebab-nli/src/onnx.rs::tests): 3 unit tests for sanitize_model_id (replaces_slash / idempotent / leaves_other_chars).
- T2 (kebab-rag/tests/multi_hop_nli_panic.rs, new): 2 panic-path tests: a #[should_panic] test for the facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) plus a companion test for threshold=0.
- T3 (kebab-rag/tests/multi_hop_nli_stream.rs, new): 2 StreamEvent::Final tests that pin the wire shape of the stream_sink Final branch for refuse_nli_verification and refuse_nli_model_unavailable.
- T4 (kebab-app/tests/open_with_config_nli.rs, new): 2 NLI failure-path tests: App::open_with_config returns Err from Result<App> (with "OnnxNliVerifier" in the chain) when model_dir is unwritable, and skips gracefully when threshold=0.
## System-architect conclusion
Analysis through 3 lenses (absorption / duplication / under-engineered interface) found nothing to do before the cut. All top-3 items are deferred to v0.18.1+:
- Lens 1: kebab-normalize and kebab-parse-types can be absorbed (used only by parse-md; 5 parsers bypass them) → v0.18.1+.
- Lens 3: dead polymorphism in the Extractor and Chunker traits (every call site is hardcoded) → v0.18.1+.
- Lens 1 bundled: kebab-source-fs drags in the 9 tree-sitter grammars of kebab-parse-code → low-risk dep-graph win, bundled into v0.18.1+.
- Defer-with-intent: LanguageModel async refactor (when a cloud LLM is added), NliVerifier::score_batch + typed NliError (when a 2nd impl appears), compute_stale → kebab-core::stale.
Reports: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (both outside the repo; kept to preserve the analysis).
## Verification
- cargo test -p kebab-nli -j 1 → 11/11 pass.
- cargo test -p kebab-rag -j 1 → 75/75 pass (including 5 NLI multi-hop tests and the 4 new T2/T3 tests).
- cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (including the 2 from T4).
- cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit; unrelated to the executor D fix: the base assertion at line 86 fails because the fixture is missing).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
- cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + the same 1 pre-existing flaky test.
- Post-refactor dogfood retest is byte-identical across all 3 runs (PR-9d / post-cleanup / post-refactor): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable. Added a "Post-architectural-refactor retest" section to docs/dogfood/v0.18.0/SUMMARY.md.
Wire impact: none.
Behavior impact: none (the H1 config wiring resolves to the same model as the default, so output is byte-identical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 04:42:37 +00:00
altair823	3712d005cc	chore(rag): post-cleanup dogfood retest: byte-identical, 0 regressions
Re-ran the same dogfood cases S7/S1/S10/S3 right after the workspace-wide cleanup commit. Confirmed that the NLI scores are byte-identical:
- S7: nli_verification_failed, 0.0035389824770390987 (same as PR-9d)
- S1: nli_verification_failed, 0.058334656059741974 (same as PR-9d)
- S10: nli_verification_failed, 0.0027875436935573816 (same as PR-9d)
- S3: nli_model_unavailable (same as PR-9d, unrelated to the cleanup; v0.18.1 follow-up)
The cleanup is a mechanical refactor only, with 0 behavior regressions. The v0.18.0 cut PR can proceed. Added a "Post-cleanup retest" section to docs/dogfood/v0.18.0/SUMMARY.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 03:19:15 +00:00
altair823	a8fd6994d2	chore(rag): correct the latency table in the PR-9d SUMMARY
The estimated S1/S10 latencies for the PR-8 baseline (~150s, ~80s) were inaccurate. `results/s1-multihop.json` and `results/s10-multihop.json` actually show 614s / 589s (measured with `jq '.usage.latency_ms'`), and they come from an earlier timeline, not from the PR-8 baseline. Only S7 has a retest preserved under `results/post-pr8/`, so only S7 supports a meaningful comparison (158s baseline → 241s on PR-9, including the first-run NLI download).
Corrected the latency table in SUMMARY.md: it now states that S1/S10 have no baseline from the same point in time, and adds the caveat that the single S7 comparison is the only meaningful one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 01:47:38 +00:00
altair823	505b3889fb	feat(rag): fb-41 PR-9d: dogfood retest + HOTFIXES PR-9 closure + preserve docs/dogfood/v0.18.0/
Confirms that PR-9 really works: a multi-hop retest of S7/S1/S3/S10 on the `/build/cache/dogfood-v018/` corpus after merging PR-1 through PR-9c-2.
Key result: the root cause of the S7 hallucination is confirmed fixed.
- PR-8 baseline: `grounded=true, refusal_reason=null`, answer = the Adam gradient formula (a silent hallucination unrelated to the caffeine question).
- PR-9 retest: `refusal_reason=nli_verification_failed, nli_score=0.0035` (graceful refusal; NLI measured 0.35% entailment).
Full comparison (4 cases):
- S7 ✅ hallucination FIXED.
- S1 ✅ both reject; NLI is more deterministic (0.058).
- S3 ⚠ fails consistently (`nli_model_unavailable`, 313s); v0.18.1 follow-up (an input-dependent failure in kebab-nli; no debug log is emitted, which makes diagnosis hard).
- S10 ✅ both reject; NLI is more deterministic (0.0028).
Artifacts:
- docs/dogfood/v0.18.0/SUMMARY.md (sanitized report) + s{1,3,7,10}-multihop-post-pr9.json (sample wire output, preserved in the repo).
- tasks/HOTFIXES.md, fb-41 PR-9 entry: "planned" → "done (2026-05-26)" + inline comparison table + S3 follow-up subsection (v0.18.1 candidate).
RAM: 5-6 GB → 7-8 GB (ONNX session ~600 MB); safe on 16 GB. Disk: NLI model cache 1.1 GB (XDG default or storage.model_dir).
Wire impact: none (only the PR-9c-1 schema change, plus preserved samples of the measured values).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 01:44:57 +00:00

4 Commits
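
The NLI gate that these commits test (threshold=0 skips verification, a positive threshold requires a verifier, a low score refuses with `nli_verification_failed`, a verifier error refuses with `nli_model_unavailable`) can be sketched as below. This is a minimal illustration, not the kebab-rag code: the `Verifier` trait, the `Refusal` enum, `nli_gate`, and the `Fixed` test double are hypothetical names, and the "score below threshold means refuse" comparison is an assumption. Only the two refusal strings and the `expect` message are taken from the commit messages.

```rust
/// Hypothetical refusal type; the wire strings mirror the `refusal_reason`
/// values quoted in the commit messages.
#[derive(Debug, PartialEq)]
enum Refusal {
    NliVerificationFailed { score: f64 },
    NliModelUnavailable,
}

impl Refusal {
    fn as_wire(&self) -> &'static str {
        match self {
            Refusal::NliVerificationFailed { .. } => "nli_verification_failed",
            Refusal::NliModelUnavailable => "nli_model_unavailable",
        }
    }
}

/// Hypothetical stand-in for an NLI verifier: returns an entailment score
/// in [0, 1], or an error when the model cannot be used.
trait Verifier {
    fn entailment(&self, premise: &str, hypothesis: &str) -> Result<f64, String>;
}

/// Assumed gate logic: Ok(()) lets the answer through, Err(..) refuses.
fn nli_gate(
    verifier: Option<&dyn Verifier>,
    threshold: f64,
    premise: &str,
    hypothesis: &str,
) -> Result<(), Refusal> {
    // threshold = 0 disables the check, so no verifier is needed.
    if threshold <= 0.0 {
        return Ok(());
    }
    // Invariant quoted in the T2 commit notes; violating it panics.
    let v = verifier.expect("verifier must be Some when nli_threshold > 0.0");
    match v.entailment(premise, hypothesis) {
        Ok(score) if score >= threshold => Ok(()),
        Ok(score) => Err(Refusal::NliVerificationFailed { score }),
        Err(_) => Err(Refusal::NliModelUnavailable),
    }
}

/// Test double that always returns the same result.
struct Fixed(Result<f64, String>);

impl Verifier for Fixed {
    fn entailment(&self, _premise: &str, _hypothesis: &str) -> Result<f64, String> {
        self.0.clone()
    }
}
```

With a hypothetical threshold of 0.5, the S7 score from the retest (0.0035389824770390987) would be refused, while a threshold of 0 would skip the check even when no verifier is configured.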