kebab

Author	SHA1	Message	Date
altair823	9d7faab650	docs+chore(plan-bootstrap): apply spec L-1 cosmetic fix + capture cargo tree baselines for v0.20 sub-item 1 verifier gates Step 1 (Group A) of v0.20.0 sub-item 1 (scanned PDF OCR) implementation plan. A1 — spec §4.2 line 740 prose pseudo-code fix: `app.pdf_ocr_engine.as_ref()` → local `pdf_ocr_engine: Option<OllamaVisionOcr>` built in `ingest_with_config_opts` (정합 with §4.4 eager init, App field 도입 0). A2 — Cargo.toml dep invariant verified (image crate 미도입 — H-3 DCTDecode-only v1 invariant 보존; kebab-parse-pdf + kebab-parse-image 가 kebab-app 의 기존 dep). description 갱신은 Step 3 (module 추가 후) 으로 이연. A3 — cargo tree baseline 캡처 — K5 row #9/#10 의 ground-truth (.omc/state/pdf-ocr-{app-parse,parse-pdf}-deps.baseline.txt). 본 sub-item 의 다른 step 의 dep graph 변경 0 invariant 의 verifier 의 baseline. Note: .omc/ 는 .gitignore 대상 — baseline files 는 로컬 파일로 존재. spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (round 1c ACCEPT) contract: §9 (additive minor wire bump — 후속 step) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 03:40:49 +00:00
altair823	bcd1e37dab	chore(repo): .omc/ ignore + AGENTS·GEMINI symlinks + release notes 작성 가이드 강화 - .gitignore: .omc/ (OMC state directory) 추가 — .claude/, .superpowers/ 와 동급 - AGENTS.md / GEMINI.md: CLAUDE.md 로의 symlink — Codex / Gemini CLI 도 동일 지침 따르도록 - CLAUDE.md release 절차: release notes 가 commit subject 단순 나열 대신 사용자 친절한 설명 + 도그푸딩/테스트 결과 포함하도록 가이드 강화 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 23:58:04 +00:00
altair823	e7a4330798	Merge pull request 'docs: v0.20 image+pdf handoff + sub-item 3 spec/plan backfill' (#188 ) from docs/v0-20-image-pdf-handoff into main Reviewed-on: #188	2026-05-26 23:36:49 +00:00
altair823	574e1b1ca1	docs: v0.20 image+pdf handoff + sub-item 3 spec/plan backfill v0.19.0 release 후 다음 session 인계용 handoff 문서 + 사후 backfill. - docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md (540 lines, 9 section) - sub-item 1/2/3 머지 결과 + 도그푸딩 baseline (1781 doc / 9050 chunks) + user memory + OMC workflow + 빌드 환경 - 현재 구현 상태 (v0.19.0, image+pdf) — 정확한 file:line + struct/fn signature + flow - 8 TODO 상세 (problem + scope + affected files + risk + trigger 조건) - 우선순위 + sequencing 권장 + 새 session 첫 단계 제안 - docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (sub-item 3 spec) - docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (sub-item 3 plan) PR #187 머지 시 source code 만 들어가고 spec/plan 누락 — 동일 PR 의 reference link 가 main 에서 404. 본 commit 으로 backfill. Assisted-by: Claude Code	2026-05-26 23:34:17 +00:00
altair823	c1e82cca92	Merge pull request 'refactor(app): extract dispatch polymorphism — App.extract_for(...) + 11 Extractor registry' (#187 ) from refactor/extractor-dispatch-unification into main Reviewed-on: #187 v0.19.0	2026-05-26 21:07:20 +00:00
altair823	2c05dbd0dd	refactor(app): extract dispatch polymorphism — App.extract_for(...) + 11 Extractor registry kebab-app 의 hardcoded extract dispatch (`ImageExtractor` + `PdfTextExtractor` + 9 AST `*Extractor` 의 `::new().extract(…)` callsite 11곳 + 9 AST arm match) 를 `App::extract_for(&MediaType, &ExtractContext, &[u8])` 단일 polymorphic call 로 통합. trait 변경 0, parser source 변경 0, wire schema 변경 0 (success path). 핵심 변경: - App struct 에 `pub(crate) extractors: Vec<Box<dyn Extractor + Send + Sync>>` field + `pub(crate) fn extract_for(...)` helper method. - App::open_with_config 의 registry init = 11 Extractor (image + pdf + 9 AST). - ImagePipeline struct 의 `extractor: &'a ImageExtractor` field 제거 + lib.rs:356 local + lib.rs:1235 alias 삭제 (atomic block). - 9 AST arm (lib.rs:2012-2047 의 12 arm = 11 explicit + 1 wildcard) → 4 arm (9 AST grouped + 7 manifest + 1 shell + 1 other-bail). - in-crate unit test (app.rs 의 `mod tests_extractor_dispatch`) 3 class: registry length 11 / mutually-exclusive supports() grid (16 sample MediaType) / extract_for error path (Audio). scope = AST 9-arm + image + pdf extract callsite only. MarkdownExtractor / Tier 2/3 / outer 4-arm / inner 4 match / Chunker dispatch 모두 future-defer (별 PR — spec §11). Wire schema (success path) 변경 0 — ingest_report.v1 / search_response.v1 / answer.v1 byte-identical (4-medium SMOKE 비교 검증). error.v1.message 의 internal context string wording 변경 (예: `kb-parse-image::ImageExtractor::extract` → `kb-app::extract_for (image)`) 은 spec §5.5 risk acceptance — `error.v1.code` + `error.v1.schema_version` 보존, user-visible surface 외. Cargo workspace.version bump 0. Refs: - docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (2 round APPROVE) - docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (3 round APPROVE) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 17:43:44 +00:00
altair823	96766406aa	Merge pull request 'refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b 재작성' (#186 ) from refactor/normalize-absorption into main Reviewed-on: #186	2026-05-26 15:37:21 +00:00
altair823	710945c4b0	refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b 재작성 design §3.7b 의 thin layer (ParsedBlock 류) 가 4 parser 중 1개 (markdown) 만 lift 를 경유하는 현실 — fan-in/fan-out 모두 1 → layer 의미 잃음. kebab-normalize (1097 LOC) + kebab-parse-types (98 LOC) 둘을 kebab-parse-md 로 흡수. 설계: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md 플랜: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md HOTFIXES: tasks/HOTFIXES.md 의 2026-05-26 entry (design deviation) - 5 사용 type + 3 forward-declared struct → kebab-parse-md::types module 의 pub explicit re-export. - build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module. - 4 hard-coded agent literal (lib.rs:122/128/134/153) + warning_agent body return + tracing target literal 모두 보존 — stage label 일관성. - kebab-app callsite (lib.rs:51 use + :1119 context string) + Cargo.toml 의 2 dep (regular + dead) 제거. - kebab-chunk + kebab-store-sqlite 의 [dev-dependencies] kebab-normalize → 제거 (kebab-parse-md 로 갈음). 통합 test source 의 use shift. - test file 이동 (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/). - workspace Cargo.toml: Hunk (a) members 2 entry 삭제 + Hunk (b) version 0.18.0 → 0.19.0 (frozen contract 변경). - design §3.7b 4-단락 재작성 (원래 intent 보존 + 현재 상태 + 보존된 surface + future re-extraction trigger). - design §8 graph 갱신 (3 edge 제거 + 2 forbidden bullet 의미 갱신 + commentary). - ARCHITECTURE.md crate graph + directory tree mechanical 갱신. - tasks/INDEX.md L169 closure mention + "Future work / deferred" 섹션 신설 (image/pdf normalize integration entry). - tasks/HOTFIXES.md 신규 entry (4-block — design deviation Symptom). - HANDOFF.md cross-link 한 줄. - 3 dead struct (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) 는 보존 — v0.20+ image/pdf normalize integration 의 future surface (spec §11). Wire / surface impact: 0건. CLI / TUI / MCP / --json 출력 / config / XDG path / parser_version 모두 unchanged. wire-invisible provenance.events[].agent + tracing target literal "kb-normalize" 도 보존 — old DB row 와 new DB row 의 audit log 일관성. Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks). cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warning (5m 46s). cargo metadata --no-deps --format-version 1 \| jq '.workspace_members \| length' = 22. cargo tree -p kebab-app --depth 2 \| grep -E "kebab_(parse_types\|normalize)" = 0 줄.	2026-05-26 15:00:59 +00:00
altair823	d4395a306b	Merge pull request 'refactor(source-fs): drop kebab-parse-code dep — 9 tree-sitter grammars drag 제거' (#185 ) from refactor/source-fs-dep-lightening into main Reviewed-on: #185	2026-05-26 12:31:29 +00:00
altair823	bd48baa19a	refactor(source-fs): drop kebab-parse-code dep — 9 tree-sitter grammars drag 제거 kebab-source-fs 가 kebab-parse-code 의 9 tree-sitter grammars 를 drag 했던 무거운 의존성 제거. 4 surface (code_lang_for_path / is_generated_file / is_oversized / BUILTIN_BLACKLIST) 만 사용하지만 dep 그래프에서 9 grammar 전체 link → kebab-source-fs::code_meta 로 이전 + kebab-parse-code 측 cleanup. 핵심 변경: - kebab-source-fs::code_meta 신설: 4 surface 이전 (BUILTIN_BLACKLIST `pub` for frozen contract + 3 helper fn `pub(crate)`). lib.rs 의 `pub use code_meta::BUILTIN_BLACKLIST` 1 줄 추가 (Option A — 다른 mod surface 무근거 확장 0). - callsite migration: media.rs (1) + walker.rs (2) + connector.rs (2) 모두 `kebab_source_fs::code_meta::` 로 갱신. - kebab-parse-code 측 cleanup: skip.rs 삭제 + lang.rs narrow edit (code_lang_for_path body + unit test 2 + Path import 삭제, module_path_for_ 보존) + lib.rs 헤더 doc rewrite (migration breadcrumb 포함). - tests/{lang,skip}.rs 13 test 이동 — 12 unit (`src/code_meta.rs::tests`) + 1 integration (`tests/code_meta.rs` for BUILTIN_BLACKLIST frozen contract). - design §8 graph: edge 제거 + p10-2 inline note. ARCHITECTURE.md 산문 1 줄 갱신. kebab-core::metadata.rs:36 stale dep reference 정정. G1+G5: cargo tree -p kebab-source-fs \| grep tree-sitter = 0 줄. G2+G3: workspace test 회귀 0 + 13 test 1:1 이동. G4: design §8 + ARCHITECTURE.md 갱신. Wire 영향: 없음 (internal Rust crate-API surface 만, user-facing 0). Cargo workspace.version bump 불필요. Refs: - docs/superpowers/specs/2026-05-26-source-fs-dep-lightening-spec.md (v3, 4-round APPROVE) - docs/superpowers/plans/2026-05-26-source-fs-dep-lightening-plan.md (v4, 4-round ACCEPT) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 12:19:32 +00:00
altair823	b02ac8200e	Merge pull request 'fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry' (#184 ) from fix/s3-nli-model-unavailable-diagnose into main Reviewed-on: #184	2026-05-26 09:17:12 +00:00
altair823	336962715a	fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry S3 dogfood query 의 `nli_model_unavailable` consistent fail root cause = mDeBERTa-v3 tokenizer 의 `OnlyFirst` strategy + 949-token hypothesis. 기존 char-budget 단독 fix 의 KR-extreme density 미해결 → token-count fallback retry + RC1-residual trait dispatch 정합. 핵심 변경: - kebab-nli::NliVerifier: `hypothesis_token_count(&str) -> Result<usize>` trait method 추가 (default `Ok(0)` backward-compat). `OnnxNliVerifier` 가 trait impl block 안에서 real mDeBERTa tokenize override — vtable 등록 보장 (round-3 critic RC1-residual closure). - kebab-rag::pipeline: `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` const + `pub(crate) fn truncate_chars` pure-fn + `pub fn truncate_hypothesis_for_nli_with_budget` retry helper (char budget 반감 retry, min floor 시 graceful unavailable). step 8.5 hook 의 callsite explicit `match` + `return self.refuse_nli_model_unavailable` 패턴 (`?` 금지 — round-2 plan critic CRITICAL #1 closure). - SpyNliVerifier 신규 helper (closure score_fn + hypothesis_token_count_fn, 2-arg constructor). - §5.1 의 2 ignored test (EN-long err + vtable dispatch RC1-residual pin) + §5.2 의 4 boundary test (truncate_chars) + §5.3 의 3 mock multi-hop test (long_en_grounded / long_kr_retries / unrelenting_fallback). +7 new tests (2 ignored default skip). - tasks/HOTFIXES.md 신규 dated entry `## 2026-05-26 — S3 NLI unavailable ...` — Symptom / Root cause / Action / Amends 4-block. - spec + plan (`docs/superpowers/{specs,plans}/2026-05-26-s3-nli-model-unavailable-diagnose-.md`) — 4 round spec + 3 round plan OMC reviewer ACCEPT 산출물. 검증: - cargo test -p kebab-nli -j 1 → 11/11 pass + 7 ignored default skip. - cargo test -p kebab-rag -j 1 → 19+3+3+... 전체 pass + 3 new mock + 4 new boundary. - cargo test --workspace --no-fail-fast -j 1 → 1313 pass (+7 new)*, 0 failed. 회귀 0 (HOTFIX #15 이미 fixed, no remaining flaky). - cargo clippy --workspace --all-targets -j 1 -- -D warnings clean (type_complexity allow on Arc<dyn Fn> type aliases). KR safe (token-count retry path) + graceful fallback (min floor 시 기존 unavailable wire 유지, regression 0). Wire 영향 없음 (additive trait method). Cargo bump 불필요. Refs: - spec: docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md (4 round APPROVE — analyst → critic + verifier × 4 rounds) - plan: docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md (3 round ACCEPT — planner → critic-plan + verifier-plan × 3 rounds) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:12:21 +00:00
altair823	1a224bf983	Merge pull request 'fix(mcp): HOTFIX #15 — MCP ask multi_hop dispatch-divergence assertion (fixture 보강)' (#183 ) from fix/hotfix-15-mcp-ask-multi-hop-flaky into main Reviewed-on: #183	2026-05-26 07:02:26 +00:00
altair823	a210bf5d52	docs(rag): HOTFIX #15 spec + plan (3 round OMC reviewer approve) OMC team `hotfix-15-mcp-flaky` 의 spec + plan 작성 + 리뷰 산출물. - spec: analyst 가 진단 (root cause = PR-7 probe-first 가 PR-5 test 의 stale empty-KB contract 와 mismatch) + Option A 권장 (test-only fix). 3 round review (critic + verifier): CRITICAL C1 (fixture/query FTS5 0 hits) + MAJOR M1/M2 + 등 closure. - plan: planner 가 7 steps + subagent dispatch task 작성. 3 round review (critic-plan + verifier-plan): empirical SQLite REPL 검증, level-1 dated entry placement, actual KebabHandler/KebabAppState pattern 정합. implementation = `429287f` commit (executor). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 06:52:04 +00:00
altair823	429287f6cb	fix(mcp,tests): HOTFIX #15 — MCP ask multi_hop dispatch-divergence assertion (fixture 보강) PR-7 (v0.18 dogfood probe-first) 머지 후 PR-5 의 test `ask_tool_routes_multi_hop_true_to_decompose_first` 가 stale empty-KB contract 로 deterministic fail. test-only fix — production code 0 touch. - `minimal_config`: `score_gate = 0.0` (probe 의 second gate `top_score < score_gate` 우회, test config isolation). - fixture `workspace_root/note.md`: "This note is about a compound containing X and Y in detail." — build_match_string 의 token_and branch (FTS5 implicit-AND) 가 `compound` + `about` + `and` 셋 다 매칭 필요. empirical SQLite REPL (V007 trigram DDL) 로 1 hit 확정. - 기존 assertion 보존, single-pass branch 도 query "anything" 으로 fixture 미매칭 → NoChunks refusal 유지. - 신규 `_multi_hop_short_circuits_when_probe_empty` test (REQUIRED — round-1 critic HIGH + verifier 격상): probe-empty short-circuit 의 MCP-layer wire shape pin (kebab-rag::multi_hop_empty_probe_pool_refuses_before_any_llm_call 은 RAG-layer 만 pin, MCP-layer 안전망 부재). - module doc 갱신: 두 test 가 각각 pin 하는 contract enumerate. inline 주석 (line 94-101) 도 새 contract 정합. - HOTFIXES.md 신규 dated entry \`## 2026-05-26 — HOTFIX #15 ...\` (date-top convention). 검증: cargo test --workspace -j 1 — 회귀 0 (known flaky 1 → 0). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. Wire / behavior / version cascade: 0. Refs: docs/superpowers/specs/2026-05-26-hotfix-15-mcp-ask-multi-hop-flaky-spec.md (review 3 rounds APPROVE) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 06:51:06 +00:00
altair823	08495eb425	Merge pull request 'chore(release): bump version 0.17.2 → 0.18.0 + cut fb-41 multi-hop' (#182 ) from chore/v0-18-0-cut into main Reviewed-on: #182 v0.18.0	2026-05-26 05:36:05 +00:00
altair823	98cf4e8a04	chore(release): bump version 0.17.2 → 0.18.0 + cut fb-41 multi-hop v0.18.0 cut PR. fb-41 multi-hop RAG + NLI verification 의 user-visible surface (PR #176-180) + post-PR9 cleanup/refactor (PR #181) ship 마무리. ## 변경 사항 ### Version - workspace `Cargo.toml`: 0.17.2 → 0.18.0. Cargo.lock 자동 cascade (24 kebab-* crate 모두 0.18.0). ### Frozen design contract - `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`: - §3.8 RAG types — RefusalReason 에 NliVerificationFailed + NliModelUnavailable + MultiHopDecomposeFailed 추가 + Multi-hop RAG + NLI verification 의 ask_multi_hop facade + step 8.5 NLI hook + HopRecord / VerificationSummary 명시. - §9 versioning rules 표 — nli_model_version row 신규 (선택 — v0.19+ second adapter 시 wire surface candidate). ### Status transitions - `docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md`: status approved-by-team → completed. - `docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md`: status approved-by-team → completed (spec_status 도). ### User-facing docs - `README.md`: 명령 표의 `kebab ask` row 에 `--multi-hop` flag + NLI 옵션 안내 한 단락 (mDeBERTa-v3 XNLI 280 MB 자동 다운로드 / RAM peak ~7-8 GB / threshold tuning 0.5 prod / 0.0 disable). - `docs/SMOKE.md`: `[rag] nli_threshold = 0.0` config 예시 + 활성화 절차 + first-run download + RAM 권장 inline 안내. ### Handoff + dashboard - `HANDOFF.md`: 한 줄 요약 의 현재 version 0.17.2 → 0.18.0. v0.18.0 cut entry 추가 (fb-41 multi-hop + NLI + cleanup ship). Component 카운트 단락에 fb-41 PR-9 의 kebab-nli + ask_multi_hop 추가 명시. 머지 후 결정 절 맨 위에 v0.18.0 fb-41 entry 신규. - `tasks/INDEX.md`: p9-fb-41 ⏳ → ✅ 머지 (v0.18.0). v0.18.0 subsection 신규 — PR #176-181 의 6 sub-PR + cleanup 각 한 줄 요약. ## 비범위 / 별 작업 - HOTFIXES.md 의 fb-41 entry 는 이미 PR #180 (PR-9d closure) 에서 작성 완료 — 본 cut PR 에서 추가 anchor 불필요. - SKILL.md 의 v0.18+ NLI 안내는 이미 PR-9c-2 에서 inline 추가 완료. ## 검증 - `cargo check --workspace -j 1` 통과 (모든 24 crate v0.18.0 확인). - frozen design 의 RefusalReason enum 확장이 kebab-core 의 production code 와 정합 (PR-9c-1 시점부터 동일 variants 있음). Wire 영향: 없음 (additive minor 는 PR-9c-1 에서 이미 ship, 본 commit 은 documentation cascade only). Behavior 영향: 없음. 머지 후 `gitea-release v0.18.0` 으로 tag + release notes 작성. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 05:18:08 +00:00
altair823	4030f04f37	Merge pull request 'chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix' (#181 ) from chore/workspace-wide-cleanup-pre-v0-18 into main Reviewed-on: #181	2026-05-26 04:48:50 +00:00
altair823	7c27633df2	chore(rag): post-PR9 refactor — H1/H2/H3/D/E + test coverage + post-refactor dogfood retest OMC team `post-pr9-refactor` 의 architectural cleanup. architect priorities 분석 후 executor + test-engineer 가 file edits, system-architect 가 component-level review 로 pre-cut nothing — all v0.18.1+ defer 결론. ## Executor 작업 (H1/H2/H3/D/E) - H1 (kebab-nli/src/onnx.rs): `[models.nli]` config wire 활성화. `DEFAULT_MODEL_ID` const 제거 (kebab-config 의 NliCfg::defaults 가 single source). OnnxNliVerifier::new 가 config.models.nli.model 읽고 config.models.nli.provider 가 "onnx" 아니면 anyhow::bail. 3 stale "PR-9c-1 will wire this" 코멘트 제거. 2 unit test 추가 (`new_uses_config_model_id`, `new_rejects_unsupported_provider`). - H2 (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. v0.18.1 placeholder doc 제거. 4 callsite (tests/multi_hop.rs) 갱신 + test rename `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (contract 정합). - H3 (kebab-rag/src/pipeline.rs:1041): `was_truncated` 가 tracing::debug! 으로 surface (observability 추가, signature 보존 — caller logging contract). - D (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (slow CI 안정성), `mh_code` discriminator 제거. dispatch contract = `mh.is_error.unwrap_or(false)` (기존 assertion 으로 충분). - E (tasks/HOTFIXES.md + pipeline.rs:1633-1638): fb-41 PR-9 closure entry 의 sibling 으로 "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" subsection 추가. pipeline 의 "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up" cross-link. ## Test-engineer 작업 (T1/T2/T3/T4, 9 new tests) - T1 (kebab-nli/src/onnx.rs::tests): sanitize_model_id 3 unit (replaces_slash / idempotent / leaves_other_chars). - T2 (kebab-rag/tests/multi_hop_nli_panic.rs 신규): 2 panic-path tests — facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) 의 #[should_panic] + threshold=0 의 companion. - T3 (kebab-rag/tests/multi_hop_nli_stream.rs 신규): 2 StreamEvent::Final tests — refuse_nli_verification + refuse_nli_model_unavailable 의 stream_sink Final 분기 wire shape pinning. - T4 (kebab-app/tests/open_with_config_nli.rs 신규): 2 NLI failure path — model_dir 가 unwritable 일 때 App::open_with_config 의 Result<App> Err (with "OnnxNliVerifier" in chain) + threshold=0 일 때 graceful skip. ## System-architect 결론 3 lenses (absorption / duplication / under-engineered interface) 분석 결과 — pre-cut nothing. Top-3 items 모두 v0.18.1+ defer: - Lens 1: kebab-normalize + kebab-parse-types 흡수 가능 (parse-md 만 사용, 5 parsers 우회) → v0.18.1+. - Lens 3: Extractor + Chunker trait 의 dead polymorphism (모든 callsite 가 hardcoded) → v0.18.1+. - Lens 1 bundled: kebab-source-fs 가 kebab-parse-code 의 9 tree-sitter grammars drag → low-risk dep-graph win, v0.18.1+ bundled. - Defer-with-intent: LanguageModel async refactor (cloud-LLM 시), NliVerifier::score_batch + typed NliError (2nd impl 시), compute_stale → kebab-core::stale. 보고서: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (둘 다 repo 외 — analysis 보존). ## 검증 - cargo test -p kebab-nli -j 1 → 11/11 pass. - cargo test -p kebab-rag -j 1 → 75/75 pass (5 NLI multi-hop + 4 신규 T2/T3 포함). - cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (T4 의 2 포함). - cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, executor D fix 와 무관 — line 86 의 base assertion 이 fixture 없어서 fail). - cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. - cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + 1 pre-existing flaky 동일. - Post-refactor dogfood retest byte-identical (PR-9d / post-cleanup / post-refactor 3번 모두): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable. docs/dogfood/v0.18.0/SUMMARY.md 에 "Post-architectural-refactor retest" section 추가. Wire 영향: 없음. Behavior 영향: 없음 (H1 의 config wiring 가 default 와 같은 model → byte-identical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 04:42:37 +00:00
altair823	3712d005cc	chore(rag): post-cleanup dogfood retest — byte-identical 회귀 0 workspace-wide cleanup commit 직후 동일 dogfood S7/S1/S10/S3 재실행. NLI score byte-identical 확인: - S7: nli_verification_failed, 0.0035389824770390987 (PR-9d 동일) - S1: nli_verification_failed, 0.058334656059741974 (PR-9d 동일) - S10: nli_verification_failed, 0.0027875436935573816 (PR-9d 동일) - S3: nli_model_unavailable (PR-9d 동일, cleanup 무관 — v0.18.1 follow-up) cleanup = mechanical refactor only. behavior 회귀 0. cut PR v0.18.0 진행 가능. docs/dogfood/v0.18.0/SUMMARY.md 에 "Post-cleanup retest" section 추가. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 03:19:15 +00:00
altair823	7c85de065a	chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix cut PR v0.18.0 전 마지막 정리. 사용자 요청: "전체 코드베이스를 깔끔하고 알아보기 쉽게". ## Workspace lints - `Cargo.toml` 의 `[workspace.lints.clippy]` 에 `pedantic = "warn"` (priority -1) + 의도적 allow-list 추가: - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss — ONNX i64 / hash modular reduction 등 의도적 truncation. - doc_markdown / missing_errors_doc / missing_panics_doc — cosmetic doc style. - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names — informational only. - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps — intentional patterns (exhaustive match, future Result variants 등). - default_trait_access — `Foo::default()` 가 idiomatic. - float_cmp — NLI / RRF score 의 explicit threshold 비교 의도. - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason — domain-specific 의도. - format_push_string / return_self_not_must_use / match_same_arms — builder / wire-label / hot-path 패턴 보존. - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option — remaining low-value style. - 각 24 crate `Cargo.toml` 에 `[lints] workspace = true` 추가. ## Auto-fix `cargo clippy --workspace --all-targets --fix` 적용 — 128 files changed, 552 insertions / 472 deletions. 주로: - uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`. - redundant_closure_for_method_calls (~33): `.map(\|x\| x.foo())` → `.map(T::foo)`. - 그 외 mechanical refactor. ## 검증 - `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + 모든 lint group). - `cargo test --workspace --no-fail-fast -j 1` — 1293 tests pass + 1 pre-existing flaky fail (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, cleanup 무관). 회귀 0. Wire 영향: 없음. Behavior 영향: 없음 (mechanical refactor only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 03:01:58 +00:00
altair823	a0ccc7b021	Merge pull request 'feat(rag): fb-41 PR-9d — dogfood retest + HOTFIXES closure + corpus 보존' (#180 ) from feat/fb-41-pr-9d-dogfood-retest into main Reviewed-on: #180	2026-05-26 02:01:56 +00:00
altair823	a8fd6994d2	chore(rag): PR-9d SUMMARY 의 latency 표 정정 PR-8 baseline 의 S1/S10 latency 추정값 (~150s, ~80s) 이 부정확. `results/s1-multihop.json` + `results/s10-multihop.json` 가 실제로 614s / 589s (`jq '.usage.latency_ms'` 측정) — PR-8 시점 baseline 이 아닌 더 이전 timeline. S7 만 `results/post-pr8/` 에 retest 보존되어 비교 의미 있음 (158s baseline → PR-9 241s with NLI first-run download). SUMMARY.md 의 latency 표를 정정 — S1/S10 의 동일 시점 baseline 부재 명시 + S7 의 단일 비교만 의미 있음 caveat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 01:47:38 +00:00
altair823	505b3889fb	feat(rag): fb-41 PR-9d — dogfood retest + HOTFIXES PR-9 closure + docs/dogfood/v0.18.0/ 보존 PR-9 의 진짜 작동 확인 — PR-1~PR-9c-2 머지 후 `/build/cache/dogfood-v018/` corpus 의 S7/S1/S3/S10 multi-hop retest. 핵심 결과: S7 hallucination root cause 해결 확정. - PR-8 baseline: `grounded=true, refusal_reason=null`, 답변=Adam gradient 공식 (caffeine 질문에 무관 hallucination, silent). - PR-9 retest: `refusal_reason=nli_verification_failed, nli_score=0.0035` (graceful refuse, NLI 가 entailment 0.35% 검출). 전체 비교 (4 case): - S7 ✅ hallucination FIXED. - S1 ✅ 둘 다 reject, NLI 가 더 deterministic (0.058). - S3 ⚠ consistent fail (`nli_model_unavailable`, 313s) — v0.18.1 follow-up (kebab-nli 의 특정 input 의존 fail, debug log emit 안 됨 → 진단 어려움). - S10 ✅ 둘 다 reject, NLI 가 더 deterministic (0.0028). - docs/dogfood/v0.18.0/SUMMARY.md (sanitized 보고서) + s{1,3,7,10}-multihop-post-pr9.json (sample wire output, repo 보존). - tasks/HOTFIXES.md 의 fb-41 PR-9 entry: "예정" → "완료 (2026-05-26)" + 비교 표 inline + S3 follow-up subsection (v0.18.1 candidate). RAM: 5-6 GB → 7-8 GB (ONNX session ~600 MB), 16 GB 안전. Disk: NLI model cache 1.1 GB (XDG default 또는 storage.model_dir). Wire 영향: 없음 (PR-9c-1 의 schema 변경만 + 측정값 sample 보존). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 01:44:57 +00:00
altair823	772575d8f0	Merge pull request 'feat(rag): fb-41 PR-9c-2 — pipeline integration + mock test + SKILL.md (★ NLI 실 활성화)' (#179 ) from feat/fb-41-pr-9c-2-pipeline-integration into main Reviewed-on: #179	2026-05-26 01:03:18 +00:00
altair823	00ffe9c792	feat(rag): fb-41 PR-9c-2 — pipeline integration + mock test + SKILL.md (★ NLI 실 활성화) PR-9c-1 의 wire surface 위에 behavior 활성화 — `ask_multi_hop` 의 step 8.5 hook 가 `[rag] nli_threshold > 0` 일 때 NLI 검증 실 수행. 첫 user-visible behavior change in PR-9. - crates/kebab-rag/src/pipeline.rs: - ask_multi_hop step 8.5 NLI hook (empty answer 가드 + truncate_for_nli + verifier.score + verification field + refusal 분기). - refuse_nli_verification helper (verification: Some(...) + RefusalReason::NliVerificationFailed). - refuse_nli_model_unavailable helper (verification: None + RefusalReason::NliModelUnavailable). - truncate_for_nli helper (module-level pub fn, MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600 chars 의 chars-based budget, _hypothesis 미사용 placeholder — v0.18.1 token-budget 갱신 candidate). - PR-9c-1 의 #[allow(dead_code)] 두 곳 제거 (verifier field + with_verifier builder; doc 의 transitional sentence 도 정리). round-1 PR-9c-1 review N1 carry-forward closure. - crates/kebab-app/src/app.rs: - App::open_with_config 의 NliVerifier construction — config.rag.nli_threshold > 0 → OnnxNliVerifier::new + Arc::new wrap + 후속 RagPipeline 초기화 시 with_verifier 호출. 실패 시 ? 전파 (시그니처 Result<Self> 그대로 — caller cascading 0). - kebab-app/Cargo.toml 에 kebab-nli path 의존 추가. - crates/kebab-rag/tests/multi_hop.rs + tests/common/mod.rs: - MockNliVerifier (pass / fail / err 생성자 + score call_count instrumented). - multi_hop_nli_pass_keeps_grounded — entailment 0.9 → grounded=true, verification.nli_passed=true. - multi_hop_nli_fail_refuses — entailment 0.1 → refusal=NliVerificationFailed. - multi_hop_nli_disabled_skip_verify — threshold 0.0 → verify skip, verification=None. - multi_hop_nli_model_unavailable_refuses — verifier Err → refusal=NliModelUnavailable. - multi_hop_truncate_for_nli_preserves_hypothesis — long premise truncation + hypothesis 보전. - integrations/claude-code/kebab/SKILL.md: mcp__kebab__ask 절에 NLI 안내 한 단락 (verification.nli_passed 의미 + threshold tuning + nli_verification_failed/nli_model_unavailable refusal handling). 검증: cargo test --workspace -j 1 — 5 신규 multi-hop pass + 회귀 0 (pre-existing kebab-mcp::tools_call_ask_multi_hop 동일 flaky). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. Wire 영향: PR-9c-1 의 schema 변경에 behavior wiring — answer.v1.verification field 가 multi-hop happy path + refuse path 양쪽에서 채움. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 00:55:02 +00:00
altair823	681c48b2a3	Merge pull request 'feat(rag): fb-41 PR-9c-1 — core types + wire scaffolding (NLI verification)' (#178 ) from feat/fb-41-pr-9c-1-core-types-wire into main Reviewed-on: #178	2026-05-26 00:09:03 +00:00
altair823	546c1564b0	feat(rag): fb-41 PR-9c-1 — core types + wire scaffolding (NLI verification) Surface-only PR (no behavior wiring — that's PR-9c-2): - kebab-core: RefusalReason::NliVerificationFailed + NliModelUnavailable (serde rename_all="snake_case", wire = identical strings). - kebab-core: Answer.verification: Option<VerificationSummary> field (additive minor wire — pre-v0.18 reader 무영향). - kebab-core: VerificationSummary { nli_score: f32, nli_threshold: f32, nli_passed: bool } struct + lib.rs 재-export. - kebab-config: NliCfg { model, provider } + ModelsCfg.nli (default Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7). - kebab-config: RagCfg.nli_threshold: f32 (default 0.0 = disabled, spec §2.6 single gate). - kebab-config: env override KEBAB_MODELS_NLI_MODEL/PROVIDER + KEBAB_RAG_NLI_THRESHOLD (parse 실패 시 tracing::warn + default 유지). - kebab-rag: RagPipeline.verifier: Option<Arc<dyn NliVerifier>> field + with_verifier builder (모두 #[allow(dead_code)] — PR-9c-2 의 step 8.5 hook 가 활성화 시 제거). RagPipeline::new signature 유지 (round-2 NEW-M1 Option B). - kebab-rag: Cargo.toml 에 kebab-nli path 의존 추가. - kebab-store-sqlite + kebab-tui: 두 신규 RefusalReason variant 에 대한 exhaustive match arm 추가 (snake_case label / 표시 문구). - 모든 Answer 구축 site (rag 6 + cli/tui/eval 3 fixture) 에 verification: None 추가. - wire schemas: answer.schema.json verification field + \$defs.VerificationSummary + refusal_reason.enum 2 추가. error.schema.json code.enum + details.description 2 추가 (forward-looking reserved). - docs/ARCHITECTURE.md: Mermaid Adapters subgraph 의 nli 노드 + rag→nli + app→nli (forward-looking) + nli→config edges. nli→core edge 는 skip (kebab-nli/Cargo.toml direct dep 가 config 만, ARCHITECTURE 컨벤션 = direct deps only). 디렉토리 트리에 crates/kebab-nli/ 추가. Tests: kebab-core 3 (serde rename + verification skip + struct shape) + kebab-config 6 (defaults + legacy + env + malformed env) + kebab-cli wire 5 (schema verification + enum 검증). 검증: cargo test --workspace -j 1 회귀 0 (pre-existing kebab-mcp::tools_call_ask_multi_hop flaky 1개 동일 — spec 에 명시된 known-flaky). cargo clippy --workspace --all-targets -D warnings clean. Wire 영향: additive minor — answer.v1 의 verification optional + refusal_reason.enum 확장 + error.v1.code 확장. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 23:27:36 +00:00
altair823	79ad6e376f	Merge pull request 'feat(nli): fb-41 PR-9b — OnnxNliVerifier ONNX inference + model download' (#177 ) from feat/fb-41-pr-9b-onnx-nli-inference into main Reviewed-on: #177	2026-05-25 22:24:01 +00:00
altair823	6ffbe0a5a3	chore(nli): PR #177 회차 1 리뷰 반영 (N1 cache-hit probe + N2 test pollution) - N1: fetch 의 cache-hit 검사 경로가 실제로는 download 트리거 (ApiRepo::get 가 cache miss 시 download 후 path 반환). log 의 "NLI artifact cache hit" 가 방금 download 한 직후 출력 — misleading. hf_hub::Cache::new(cache_dir).repo(repo).get(filename).is_some() 로 변경 — Cache::get 은 fs lookup only, 네트워크 안 탐. actual download 횟수는 변화 없음 (1번), log accuracy 만 개선. - N2: new_succeeds_on_default_config / score_empty_hypothesis_returns_err 가 XDG 실 디렉토리 (`~/.local/share/kebab/models/nli/...`) 를 create_dir_all → test pollution. tempdir_config() 헬퍼 추가 — TempDir 으로 storage.data_dir override, model_dir 는 `{data_dir}/models` 그대로 두어 expand_path 의 substitution 검증도 유지. cargo test -p kebab-nli -j 1 → 6 passed / 0 failed (unit) + 5 ignored (integration, manual). cargo clippy -p kebab-nli --all-targets -j 1 -- -D warnings clean. inference.rs 미수정 → manual --ignored smoke 결과 (5/5 PASS) 그대로 유효. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 22:22:30 +00:00
altair823	ab3408cb49	chore(nli): PR-9b inference test 2 의 expectation 정정 기존 expectation `entailment < 0.3` 가 너무 strict — mDeBERTa-v3 multilingual NLI 가 두 caffeine 사실 (premise: "Caffeine is a stimulant.", hypothesis: "The chemical formula of caffeine is C8H10N4O2.") 의 neutral 을 0.53 으로, entailment 를 0.43 으로 판단함 (서로 entail 안 하지만 모순도 아님 = 정확히 neutral). spec §3 PR-9b 의 "entailment 낮음 — neutral/contradiction 이 winning channel" 의 spirit 은 neutral 이 max 임. expectation 을 `s.neutral > s.entailment && s.neutral > s.contradiction` 로 변경. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 22:10:51 +00:00
altair823	b807fd5aa5	feat(nli): fb-41 PR-9b — OnnxNliVerifier 의 ONNX inference + model download - OnnxNliVerifier fields: model_id, cache_dir (XDG model_dir/nli/<sanitized>), session/tokenizer OnceLock. - new(): eager cache_dir stamp만 — actual model download + Session::commit_from_file 는 첫 score 호출 시 ensure_loaded() 가 lazy 수행. - score(): ensure_loaded → tokenizer.encode(pair, OnlyFirst truncation max_length=512) → ndarray Array2<i64> → ort::Session::run → logits[1,3] → NliScores::from_xnli_logits. - empty hypothesis edge: defense-in-depth bail (spec §2.3 의 caller-side skip 외 추가). - sanitize_model_id helper: "/" → "_". - 5 #[ignore] integration tests (EN self-entailment, EN unrelated, KR entailment, long premise truncation, empty hypothesis err) — manual smoke 가 PR description 첨부. Cargo.toml: `download-binaries` feature 를 kebab-nli 의 ort dep 에 활성화 (PR-9b prep commit 의 후속). 단독 `cargo test -p kebab-nli` 의 per-crate feature 유니온은 fastembed 없이 ort/download-binaries 가 OFF 되어 ort-sys link 가 실패 — kebab-nli 측에서 명시적으로 켜 줘야 standalone build 가 ONNX 런타임 link 됨. workspace 전체 빌드에서는 fastembed 의 동일 opt-in 과 union 되어 부작용 없음. Verification: - cargo test -p kebab-nli -j 1 — PR-9a 의 6 unit pass (`score_returns_err_in_skeleton` → `score_empty_hypothesis_returns_err` 로 stub→실 path 갱신, 갯수 유지). - cargo clippy -p kebab-nli --all-targets -- -D warnings clean. - cargo build --workspace -j 1 — 회귀 0. - Manual --ignored smoke 결과 PR body 첨부. Wire 영향: 없음 (crate-internal). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:56:22 +00:00
altair823	93436f9eca	feat(nli): fb-41 PR-9b prep — activate ort/tokenizers/hf-hub/ndarray/tracing deps in kebab-nli PR-9a 의 workspace.dependencies 만 declared 였던 5 crate 의존을 kebab-nli/Cargo.toml 에 활성화. PR-9b 의 OnnxNliVerifier 실 구현이 본 commit 위에서 빌드 가능. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:42:07 +00:00
altair823	11ce7847a1	Merge pull request 'feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps' (#176 ) from feat/fb-41-pr-9a-kebab-nli-crate into main Reviewed-on: #176	2026-05-25 21:34:49 +00:00
altair823	1d88dccf8a	chore(nli): PR #176 회차 1 리뷰 반영 - lib.rs::NliScores::faithfulness doc 의 `rag.nli_faithfulness_min` → `rag.nli_threshold` (spec §2.5/§2.6 의 실 config knob 이름 정합). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:25:44 +00:00
altair823	1eb0bbecb3	feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps - 신규 crate kebab-nli (trait + impl 동일 crate, v0.18 scope = ONNX adapter 1개). - NliVerifier trait + NliScores struct (XNLI 3-channel: entailment/neutral/contradiction). - private softmax3 (log-sum-exp 안전). - OnnxNliVerifier placeholder (PR-9b 가 ONNX inference + model download 추가). - workspace.dependencies 추가: ort 2.0-rc.9, tokenizers 0.21 (default-features=false, onig), hf-hub 0.4, ndarray 0.16. Pre-flight (PR-9 design contract 의 gate): - HF Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 model.onnx + tokenizer.json → HTTP/2 302 (HF S3 routing, file 존재). - tokenizers --no-default-features -F onig 의 standalone repro: SentencePiece mDeBERTa tokenizer.json 로드 OK (KR 9 tokens / EN 11 tokens 정상 encode). - Cargo features 결정 trace: tokenizers = { default-features = false, features = ["onig"] } lock. Tests: 6 unit (softmax3 정규화 + 불변성 + XNLI logits 변환 + faithfulness + new + score stub) — 통과. Verification: cargo test -p kebab-nli -j 1 (6/6) + cargo clippy -p kebab-nli --all-targets -j 1 -- -D warnings clean. Workspace: cargo test --workspace -j 1 — pre-existing kebab-mcp::tools_call_ask_multi_hop 1 fail (main baseline 동일 fail, PR-9a 무관 — ingest fixture/Ollama 의존 flaky). Wire 영향: 없음 (crate 도입만). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:22:38 +00:00
altair823	44fbffff26	docs(rag): fb-41 PR-9 spec + plan — NLI verification + v0.18.0 cut fb-41 multi-hop RAG 의 dogfood S7 hallucination root cause = LLM-self-judge ceiling. 대응 = NLI-based post-synthesis verification (mDeBERTa-v3 XNLI, 280 MB ONNX). 산출물: - docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md (review_round=5, 4 OMC reviewer APPROVE: 1 CRITICAL + 9 MAJOR + 3 MINOR → 1 NIT carry-forward). - docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md (plan_review_round=3, 4 OMC reviewer APPROVE: 15 issues → 0 actionable). 5 sub-PR (PR-9a~9d) + cut PR. 작업 21-31h / wall time 28-44h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:22:20 +00:00
altair823	63aece3ea1	Merge pull request 'fix(rag): fb-41 PR-8 multi-hop synthesize safety in depth (pool 15 + self-check rule)' (#175 ) from feat/fb-41-pr-8-multi-hop-synthesize-safety into main	2026-05-25 12:51:46 +00:00
altair823	28a8bbeace	chore(rag): PR #175 회차 1 리뷰 반영 HOTFIXES.md 의 fb-41 entry 에 post-PR-7 dogfood retest + PR-8 partial mitigation sub-section 추가 + PR-9 NLI plan anchor + 사용자 영향 절 갱신. config.rs 의 doc reference 가 정확한 entry sub-section 가리키도록 조정 — dangling reference 해소. 검증 - `cargo test -p kebab-config -j 1` — 모든 test 통과. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:51:15 +00:00
altair823	52a97303dc	fix(rag): fb-41 PR-8 — multi-hop synthesize safety in depth (pool 15 + self-check rule) v0.18 cut 전 fb-41 multi-hop RAG layered defense — PR-7 의 pre-decompose probe gate 위에 추가 safety. PR-7 의 fix 만으로는 hybrid mode 의 RRF top_score 가 gate 통과 시 (도그푸딩 S7 의 caffeine query) hallucination 여전히 발생 — synthesize 단계 자체의 safety 보강 필요. 중요: 본 PR 만으로는 S7 hallucination 완전 차단 안 됨 (gemma3:4b 의 prompt-following 한계 — 추가 dogfood S7 retest 에서 확인). 진짜 fix 는 PR-9 (NLI-based post-synthesis verification). PR-8 은 그 사이의 partial mitigation + safety in depth — latency 4× 개선 (614s → 158s) + future larger LLM 용 prompt rule. 설계: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md 계획: /build/cache/dogfood-v018/results/PR-9-DESIGN.md (사용자 결정 후 spec/plan 으로 promotion) ## 변경 - `crates/kebab-config/src/lib.rs`: - `RagCfg::multi_hop_max_pool_chunks` default 30 → 15. - rationale doc — gemma3:4b 가 30-chunk large prompt 에서 citation rule 잃는 측정 결과. - 2 unit test (`default_` rename + `legacy_` assert) 갱신. - `crates/kebab-rag/src/pipeline.rs`: - `MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT` 에 답하기 전 self-check rule 추가 — "[원본 질문] 의 핵심 entity (고유명사, 화학식, 수치 단위, 코드명, 약자) 가 [근거] 본문에 literal 으로 등장하지 않으면 다른 entity 의 정보로 답을 합성하지 말고 '근거가 부족하다' 답한다". example (caffeine + Adam optimizer chunk) 도 명시. ## 도그푸딩 결과 (retest with PR-7 + PR-8) \| query \| path \| grounded \| latency \| answer \| \|---\|---\|---\|---\|---\| \| caffeine formula \| single-pass \| false (LlmSelfJudge) \| 30s \| "근거가 부족하다" ✓ \| \| caffeine formula \| multi-hop pre-fix \| true ✗ \| 141s \| hallucination \| \| caffeine formula \| multi-hop PR-7 \| true ✗ \| 143s \| hallucination (probe gate top_score 0.5 > 0.30) \| \| caffeine formula \| multi-hop PR-8 \| true ✗ \| 158s \| hallucination (LLM 가 새 rule 무시) — latency 4× 개선 \| PR-8 의 부분 성과: - pool 30→15 로 synthesize prompt size ↓ → latency 614s → 158s. - prompt rule 은 future larger LLM (gemma2:9b, qwen2.5:7b 등) 에서 가치 ↑. PR-8 의 한계: - gemma3:4b 의 prompt-following 한계 — strong rule 도 무시하고 다른 entity chunk (Adam optimizer formula) 의 본문을 caffeine 화학식 출처로 인용. - LLM-self-judge 기반 safety 의 ceiling. ## 진짜 fix → PR-9 (별 PR) 학계 / industry 표준 검색 결과 (Self-RAG, CRAG, Auto-GDA, MedTrust-RAG): deterministic post-synthesis verification 이 정답 path. NLI-based groundedness check — mDeBERTa-v3-base-xnli (280 MB multilingual) ONNX model 이 (premise=packed_chunks, hypothesis=answer) entailment 검사. score < 0.5 면 refuse. PR-8 위에 layered defense. ## 검증 - `cargo test -p kebab-config -p kebab-rag -j 1` — 모든 test 통과 (config default test 2개 갱신, rag tests 영향 없음). - `cargo clippy -p kebab-config -p kebab-rag --all-targets -j 1 -- -D warnings` clean. - 단일 crate 직렬 build (16 GB RAM 제약). - S7 dogfood retest — hallucination 여전 (PR 본문에 정직 명시). ## 변경 없음 - Wire schema — additive (config knob default 만 변경). - PR-7 의 probe gate — 그대로 작동 (gate 통과 시 PR-8 의 추가 safety layer). - 다른 도그푸딩 P1 항목 (citation 일관성, binary path) — 별 PR. ## 다음 - PR-9a/b/c: NLI-based post-synthesis verification — 진짜 fix. - PR-9 머지 후 dogfood S7 재검증 (예상: refuse + nli_score < 0.5). - v0.18.0 cut. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:44:31 +00:00
altair823	71fb2cbcb3	Merge pull request 'fix(rag): fb-41 PR-7 multi-hop pre-decompose score-gate (S7 hallucination 회귀 핀)' (#174 ) from feat/fb-41-pr-7-multi-hop-score-gate-fix into main	2026-05-25 12:05:23 +00:00
altair823	85855ef596	chore(rag): PR #174 회차 1 리뷰 반영 `ask_multi_hop` 의 probe_hits 가 gate 검사 후 throw away 되는 의도 명시 — pool 초기값으로 재사용 안 하는 invariant clarity rationale 을 코드 안에 doc. 향후 retrieve cost 가 multi-hop bottleneck 이 될 경우 재검토 hint 도 함께. 검증 - `cargo test -p kebab-rag -j 1 --test multi_hop` 10 모두 통과. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:04:53 +00:00
altair823	da25ce330b	fix(rag): fb-41 PR-7 — multi-hop pre-decompose score-gate (S7 hallucination 회귀 핀) v0.18 cut 전 fb-41 multi-hop RAG 도그푸딩에서 발견된 safety regression fix. 자세한 도그푸딩 결과는 `tasks/HOTFIXES.md` 의 2026-05-25 fb-41 pre-v0.18 entry + `/build/cache/dogfood-v018/results/SUMMARY.md` 참조. ## 문제 (S7) Query: `What is the chemical formula of caffeine?` (KB 에 없는 fact). - Single-pass `kebab ask`: retrieve top score 가 default `rag.score_gate = 0.30` 미만 → `refuse_score_gate` → 안전한 refusal. - Multi-hop `kebab ask --multi-hop`: `grounded = true`, 본문 `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` (hallucination — 실제 C₈H₁₀N₄O₂) + `[#6]` 가 Adam optimizer chunk 의 `g_t = ∂L/∂θ_i` 본문을 인용 (시각적 short structured token 매칭 trigger). 원인: `ask_multi_hop` 의 score-gate 검사가 pool 의 top_score 만 봤다. multi-hop 의 pool 은 5 sub-queries 의 union — 한 sub-query 의 top score 가 gate 위면 다른 chunks 가 원본 query 와 무관해도 gate 통과 + synth → LLM hallucinate. ## Fix `ask_multi_hop` entry 에 pre-decompose probe 추가: 1. 원본 query 로 retrieve 한 번 (LLM call 0회, ~ms). 2. probe empty → `refuse_no_chunks(None)` (decompose 안 함, hops=None). 3. probe top_score < gate → `refuse_score_gate(None)` (decompose 안 함). 4. probe pass → 기존 decompose / decide / synthesize flow 그대로. Multi-hop 의 safety floor 가 single-pass 와 정확히 일치 — multi-hop 은 원본 query 가 이미 KB 범위 내 일 때만 cross-doc reasoning 추가. 비용: 한 번의 retrieve (수 ms), LLM call 없음. multi-hop 의 LLM-dominated latency 대비 무시 가능. ## Tests 신규 3 회귀 핀 (`crates/kebab-rag/tests/multi_hop.rs`): - `multi_hop_below_probe_gate_refuses_before_any_llm_call` — S7 직접 회귀 핀. low-score chunk + empty LM script → score_gate refusal, LM calls 0회, hops=None. fix revert 시 즉시 panic. - `multi_hop_empty_probe_pool_refuses_before_any_llm_call` — empty retrieve 시 NoChunks refusal, LM calls 0회. - `multi_hop_above_probe_gate_proceeds_to_decompose` — probe pass 시 full multi-hop flow 정상 (decompose + decide + synth). 기존 7 multi-hop test 의 `ScriptedRetriever` 에 probe-pass entry prepend + `retriever_handle.calls()` expectation +1. test 2 / test 4 처럼 entry 두 개였던 곳도 prepend (3 entries). `multi_hop_refuse_no_chunks_preserves_hops_trace` / `multi_hop_refuse_score_gate_preserves_hops_trace` 의 의미 좁힘 — 이제 decompose-driven refusal (probe pass 후 sub-query retrieve 가 empty 또는 below-gate) 만 검증. probe-driven refusal 은 hops=None (decompose 안 함) — 신규 test 가 그 path 핀. ## 검증 - `cargo test -p kebab-rag -j 1` — 10 multi-hop (7 갱신 + 3 신규) + 19 pipeline + 31 unit + 3 prompt_template + 3 streaming 모두 통과. 회귀 없음. - `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean. - 단일 crate 직렬 build (16 GB RAM 제약). ## 변경 없음 - Wire schema — `Answer.hops` shape 동일, `refusal_reason` enum 동일. - 다른 도그푸딩 발견 (synthesize citation 일관성, latency, binary path confusion) — v0.18.1 또는 별 PR 의 책임. HOTFIXES 의 "다른 도그푸딩 발견" 절에 명시. ## 다음 PR-7 머지 후: 1. Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor bump). 2. HANDOFF.md / INDEX.md 갱신 + frozen design §3.8 multi-hop sub-section. 3. `gitea-release v0.18.0 --auto-notes`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:02:11 +00:00
altair823	5bfea3c28b	Merge pull request 'feat(tui): fb-41 PR-6 TUI Ask multi-hop toggle + hop trace summary' (#173 ) from feat/fb-41-pr-6-tui-multi-hop-toggle into main	2026-05-25 09:30:06 +00:00
altair823	b6756f8ce3	chore(tui): PR #173 회차 1 리뷰 반영 test `spawn_snapshot_multi_hop_into_askopts` → `ask_state_multi_hop_field_default_false_and_round_trips` 로 rename. 이전 이름은 spawn 동작 검증을 약속했으나 본문은 단순 field default + setter round-trip 만 검증 — name 과 실제 의도의 mismatch. 새 이름이 실제 검증 (field shape pin) 과 정확히 일치. doc string 도 spawn 동작은 별 path (live dogfood) 로 검증된다고 명확히 표기 — test 의 책임 범위가 무엇인지 reader 가 즉시 파악. 검증 - `cargo test -p kebab-tui -j 1 --test ask` — 42 test (6 multi-hop 포함) 모두 통과. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:29:36 +00:00
altair823	016f380428	feat(tui): fb-41 PR-6 — TUI Ask multi-hop toggle + hop trace summary fb-41 multi-hop RAG 의 마지막 component PR (PR-5 머지 직후). TUI Ask 패널의 user-facing surface — F2 toggle, multi-hop badge, status panel 의 hop count summary, cheatsheet 안내. v0.18.0 cut 준비. 설계: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md 계획: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-6 단락) ## TUI surface - `crates/kebab-tui/src/app.rs`: - `AskState.multi_hop: bool` field + Default false. 사용자 토글 상태를 인-패널 보존, 대화 history 와 직교 — F2 flipping mid- conversation 도 turns 보존 (다음 turn 만 다른 pipeline 으로 route). - `crates/kebab-tui/src/ask.rs`: - `handle_key_ask` 에 `(KeyCode::F(2), _) → s.multi_hop = !s.multi_hop`. Mode-agnostic (physical function key — Normal/Insert 양쪽 작동, typing ambiguity 없음). Briefing 의 candidate (F2 vs Ctrl-T) 중 F2 채택 — Ctrl-M 은 Enter 와 collision 이미 명시, F2 가 cleanest. - `spawn_ask_worker` 의 `AskOpts.multi_hop` 가 spawn 시점에 토글값 snapshot. 이후 F2 flip 은 다음 Enter 부터 적용 (in-flight turn 무영향). - `render_input` 의 input pane title 에 `F2=multi-hop` binding 안내 추가 + prompt row 에 `multi-hop` badge (Success 녹색, toggled-on 일 때만). 사용자가 어떤 pipeline 으로 다음 query 를 보낼지 항상 가시. - `render_status` 의 status panel 에 `multi-hop: N hops` line 추가 (last_answer.hops 가 Some 일 때만). forced_stop 발생 시 `forced_stop=K` suffix — depth/pool cap tuning 단서. - `crates/kebab-tui/src/cheatsheet.rs`: - Ask section 에 `F2 toggle multi-hop pipeline` entry 추가. ## 변경 없음 (의도된 deferral) - `InspectTarget::Hop(turn_index)` variant — plan 의 PR-6 stretch goal. per-iter hop trace detail 을 Inspect 패널에 노출하는 기능은 별 PR (PR-6b 또는 v0.18 dogfood follow-up). PR-6 의 핵심 가치 (사용자가 multi-hop pipeline 을 토글하고 결과의 hop count 를 본다) 는 status panel 의 한 줄 summary 로 100% cover. Inspect 진입은 multi-hop 사용자가 드물게 필요한 surface — v0.18 cut 부담 회피. - prompt_template_version (`rag-multi-hop-v1`) — 그대로. - MCP / CLI surface — PR-4 / PR-5 의 책임. ## Tests (`tests/ask.rs` 신규 6 multi-hop pins) - `f2_toggles_multi_hop_flag_from_insert_mode`: Insert 에서 F2 toggle (fresh_app default mode). - `f2_toggles_multi_hop_flag_from_normal_mode`: Normal 에서도 동일 — mode-agnostic 회귀 핀. - `input_pane_shows_multi_hop_badge_when_toggled_on`: 토글 on 시 prompt row 에 `multi-hop` 등장 + title 의 `F2=multi-hop` binding hint 등장. - `input_pane_omits_multi_hop_badge_when_toggled_off`: 토글 off 시 prompt row 의 badge 부재 (title hint 는 유지 — 사용자 discoverability). - `status_panel_summarizes_hops_when_answer_has_trace`: 3-hop trace (Decompose + Decide + Synthesize) → `multi-hop: 3 hops` line. - `status_panel_omits_hops_summary_for_single_pass`: hops=None → 본문 에 summary line 부재 (title binding hint 만). - `spawn_snapshot_multi_hop_into_askopts`: AskState.multi_hop 의 field shape 회귀 핀 (default false / settable / round-trip). ## 검증 - `cargo test -p kebab-tui -j 1` — 신규 6 multi-hop + 기존 ask / search / library / mode / cheatsheet / inspect / status_bar 모두 통과 (42 ask test + 10 mode + 기타). 회귀 없음. - `cargo clippy -p kebab-tui --all-targets -j 1 -- -D warnings` clean. - 단일 crate 직렬 build (16 GB RAM 제약). ## v0.18.0 cut (다음 단계) - Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor — surface 확장 + new prompt_template_version `rag-multi-hop-v1`). - HANDOFF.md / HOTFIXES.md / INDEX.md 갱신 (fb-41 entry 정리). - `gitea-release v0.18.0 --auto-notes`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:26:29 +00:00
altair823	bf28a1e4d9	Merge pull request 'feat(mcp): fb-41 PR-5 MCP ask multi_hop arg + SKILL.md 안내' (#172 ) from feat/fb-41-pr-5-mcp-multi-hop-arg into main	2026-05-25 09:09:06 +00:00
altair823	24221826ed	chore(mcp): PR #172 회차 1 리뷰 반영 `ask_tool_routes_multi_hop_true_to_decompose_first` 의 error code 검증을 더 견고하게 — `model_unreachable \| timeout` 둘 다 accept. 환경 차이 (즉시 ECONNREFUSED vs connect timeout) 가 다른 wire code 로 분류돼도 dispatch divergence 자체 (schema_version=error.v1 + isError=true vs single-pass 의 answer.v1 grounded=false) 는 동일하게 검증. 검증 - `cargo test -p kebab-mcp -j 1 --test tools_call_ask_multi_hop` 2 통과. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:08:40 +00:00
altair823	8a2f7affa6	feat(mcp): fb-41 PR-5 — MCP ask multi_hop arg + SKILL.md 안내 fb-41 multi-hop RAG 의 PR-5 (PR-4 머지 직후). PR-4 의 CLI `--multi-hop` flag 와 sister surface — agent (Claude Code 등 MCP host) 가 `mcp__kebab__ask` 호출 시 `multi_hop: true` 옵션 사용 가능. 설계: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md 계획: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-5 단락) ## MCP surface - `crates/kebab-mcp/src/tools/ask.rs`: - `AskInput.multi_hop: Option<bool>` 추가. JsonSchema derive 가 tools/list 에 자동 반영 — agent capability discovery 가 새 필드 인식. - `handle()` 가 `AskOpts.multi_hop = input.multi_hop.unwrap_or(false)` — 기존 caller (필드 누락 / null) 는 single-pass 그대로. - `crates/kebab-mcp/src/lib.rs` (tools/list): - `ask` tool description 에 multi-hop 한 줄 (decompose → retrieve → synthesize, 2-5× LLM cost, per-hop trace on Answer.hops). ## SKILL.md 안내 - `integrations/claude-code/kebab/SKILL.md` 의 `mcp__kebab__ask` 절: - Input shape JSON 예제에 `multi_hop: false` 추가. - Returns 절에 `hops` (multi-hop only) 추가. - 신규 bullet (p9-fb-41) — opt-in 조건 / 비용 trade-off / 사용 케이스 (compound questions / prereq chains / cross-doc reasoning) / `Answer.hops` 의 per-hop trace shape / `multi_hop_decompose_failed` refusal 처리. ## Tests (`tests/tools_call_ask_multi_hop.rs` 신규, 2 Ollama-free pins) - `ask_tool_routes_multi_hop_true_to_decompose_first`: dispatch divergence 핀. invalid LLM endpoint (`http://127.0.0.1:1`, request_timeout_secs=2) 로 force unreachable. multi_hop=true 는 decompose 먼저 호출 → `error.v1` (code=model_unreachable) / isError=true. multi_hop=false (single-pass) 는 empty KB 에서 retrieve 먼저 → no LLM call → `answer.v1` grounded=false / isError=false. 두 shape 의 분기가 dispatch 가 실제로 다른 path 로 라우팅됨의 증거. - `ask_input_schema_advertises_multi_hop_field`: AskInput 의 JsonSchema 가 `multi_hop` property 노출 — MCP host capability discovery (tools/list 의 input schema) 회귀 핀. 기존 `tools_call_ask.rs` 의 AskInput literal 도 `multi_hop: None` 추가 (struct field 추가에 따른 minimal cascade). ## 변경 없음 - `prompt_template_version` (`rag-multi-hop-v1`) — 그대로. - TUI surface — PR-6 의 책임. - error.v1 매핑 — PR-4 의 enum reservation 그대로 (no error_wire promotion). ## 검증 - `cargo test -p kebab-mcp -j 1` — 신규 tools_call_ask_multi_hop 2 + 기존 ask / search / bulk_search / fetch / ingest / schema / doctor / tools_list / initialize 등 모두 통과 (회귀 없음). - `cargo clippy -p kebab-mcp --all-targets -j 1 -- -D warnings` clean. - 단일 crate 직렬 build (16 GB RAM 제약). ## 다음 PR - PR-6: TUI Ask 패널 multi-hop toggle (F2 / Ctrl-T) + hop trace render + cheatsheet 갱신. - v0.18.0 cut (PR-6 머지 후). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:06:28 +00:00
altair823	f28a422f79	Merge pull request 'feat(cli): fb-41 PR-4 CLI --multi-hop flag + answer.v1 / error.v1 wire' (#171 ) from feat/fb-41-pr-4-cli-multi-hop-flag into main	2026-05-25 08:48:08 +00:00

1 2 3 4 5 ...

895 Commits