Commit Graph

895 Commits

Author SHA1 Message Date
9d7faab650 docs+chore(plan-bootstrap): apply spec L-1 cosmetic fix + capture cargo tree baselines for v0.20 sub-item 1 verifier gates
Step 1 (Group A) of v0.20.0 sub-item 1 (scanned PDF OCR) implementation plan.

A1: spec §4.2 line 740 prose pseudo-code fix: `app.pdf_ocr_engine.as_ref()`
→ local `pdf_ocr_engine: Option<OllamaVisionOcr>` built in
`ingest_with_config_opts` (consistent with §4.4 eager init; no new App field introduced).

A2: Cargo.toml dep invariant verified (the image crate is not introduced, preserving the
H-3 DCTDecode-only v1 invariant; kebab-parse-pdf + kebab-parse-image are already existing
deps of kebab-app). The description update is deferred to Step 3 (after the module is added).

A3: cargo tree baseline captured as the ground truth for K5 rows #9/#10
(.omc/state/pdf-ocr-{app-parse,parse-pdf}-deps.baseline.txt). It is the verifier baseline
for the invariant that the other steps of this sub-item make zero dep-graph changes.
Note: .omc/ is covered by .gitignore, so the baseline files exist only as local files.

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (round 1c ACCEPT)
contract: §9 (additive minor wire bump, in a later step)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 03:40:49 +00:00
bcd1e37dab chore(repo): .omc/ ignore + AGENTS·GEMINI symlinks + stronger release notes writing guide
- .gitignore: add .omc/ (OMC state directory), on par with .claude/ and .superpowers/
- AGENTS.md / GEMINI.md: symlinks to CLAUDE.md, so that Codex / Gemini CLI follow the same instructions
- CLAUDE.md release procedure: strengthen the guide so that release notes contain user-friendly explanations plus dogfooding/test results instead of a plain list of commit subjects

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 23:58:04 +00:00
e7a4330798 Merge pull request 'docs: v0.20 image+pdf handoff + sub-item 3 spec/plan backfill' (#188) from docs/v0-20-image-pdf-handoff into main
Reviewed-on: #188
2026-05-26 23:36:49 +00:00
574e1b1ca1 docs: v0.20 image+pdf handoff + sub-item 3 spec/plan backfill
Handoff document for the next session after the v0.19.0 release, plus an after-the-fact backfill.

- docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md (540 lines, 9 section)
  - sub-item 1/2/3 merge results + dogfooding baseline (1781 doc / 9050 chunks) + user memory + OMC workflow + build environment
  - current implementation state (v0.19.0, image+pdf): exact file:line + struct/fn signature + flow
  - 8 TODOs in detail (problem + scope + affected files + risk + trigger conditions)
  - priorities + recommended sequencing + suggested first steps for a new session

- docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (sub-item 3 spec)
- docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (sub-item 3 plan)

When PR #187 was merged, only the source code went in and the spec/plan were left out, so the reference links in that PR return 404 on main. This commit backfills them.

Assisted-by: Claude Code
2026-05-26 23:34:17 +00:00
c1e82cca92 Merge pull request 'refactor(app): extract dispatch polymorphism — App.extract_for(...) + 11 Extractor registry' (#187) from refactor/extractor-dispatch-unification into main
Reviewed-on: #187
v0.19.0
2026-05-26 21:07:20 +00:00
2c05dbd0dd refactor(app): extract dispatch polymorphism — App.extract_for(...) + 11 Extractor registry
Consolidates kebab-app's hardcoded extract dispatch (11 `::new().extract(…)` callsites across `ImageExtractor` + `PdfTextExtractor` + 9 AST `*Extractor`s, plus a 9-AST-arm match) into a single polymorphic call, `App::extract_for(&MediaType, &ExtractContext, &[u8])`. Zero trait changes, zero parser source changes, zero wire schema changes (success path).

Key changes:
- App struct: add a `pub(crate) extractors: Vec<Box<dyn Extractor + Send + Sync>>` field + a `pub(crate) fn extract_for(...)` helper method.
- App::open_with_config initializes the registry with 11 Extractors (image + pdf + 9 AST).
- ImagePipeline struct: remove the `extractor: &'a ImageExtractor` field + delete the lib.rs:356 local + the lib.rs:1235 alias (atomic block).
- 9 AST arms (the 12 arms at lib.rs:2012-2047 = 11 explicit + 1 wildcard) → 4 arms (9 AST grouped + 7 manifest + 1 shell + 1 other-bail).
- in-crate unit tests (`mod tests_extractor_dispatch` in app.rs), 3 classes: registry length 11 / mutually-exclusive supports() grid (16 sample MediaTypes) / extract_for error path (Audio).
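
A minimal sketch of the registry-dispatch idea described above, not the crate's actual code. The reduced `MediaType` enum, the two-method `Extractor` trait, the `Unsupported` error, and the extractor bodies are all hypothetical; per the message above, the real `extract_for` also takes an `ExtractContext`, and the real registry holds 11 extractors.

```rust
/// Hypothetical, reduced stand-in for the crate's `MediaType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaType {
    Image,
    Pdf,
    Audio,
}

/// Hypothetical error: no registered extractor supports the media type.
#[derive(Debug, PartialEq, Eq)]
pub struct Unsupported(pub MediaType);

/// Hypothetical, simplified extractor trait (assumption: the real trait differs).
pub trait Extractor {
    fn supports(&self, media: MediaType) -> bool;
    fn extract(&self, bytes: &[u8]) -> String;
}

pub struct ImageExtractor;
pub struct PdfTextExtractor;

impl Extractor for ImageExtractor {
    fn supports(&self, media: MediaType) -> bool {
        media == MediaType::Image
    }
    fn extract(&self, bytes: &[u8]) -> String {
        format!("image: {} bytes", bytes.len())
    }
}

impl Extractor for PdfTextExtractor {
    fn supports(&self, media: MediaType) -> bool {
        media == MediaType::Pdf
    }
    fn extract(&self, bytes: &[u8]) -> String {
        format!("pdf: {} bytes", bytes.len())
    }
}

pub struct App {
    /// Registry of boxed trait objects, filled once at construction.
    extractors: Vec<Box<dyn Extractor + Send + Sync>>,
}

impl App {
    pub fn new() -> Self {
        Self {
            extractors: vec![Box::new(ImageExtractor), Box::new(PdfTextExtractor)],
        }
    }

    /// Single polymorphic entry point: pick the first extractor whose
    /// `supports()` accepts the media type, or report it as unsupported.
    pub fn extract_for(&self, media: MediaType, bytes: &[u8]) -> Result<String, Unsupported> {
        self.extractors
            .iter()
            .find(|e| e.supports(media))
            .map(|e| e.extract(bytes))
            .ok_or(Unsupported(media))
    }
}
```

Because lookup takes the first match, the scheme relies on `supports()` being mutually exclusive across extractors, which is presumably why the commit pins that property with a grid test.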

scope = AST 9-arm + image + pdf extract callsites only. MarkdownExtractor / Tier 2/3 / outer 4-arm / inner 4 match / Chunker dispatch are all deferred to the future (separate PR, spec §11).

Wire schema (success path): zero changes. ingest_report.v1 / search_response.v1 / answer.v1 are byte-identical (verified by a 4-medium SMOKE comparison). The wording change in the internal context string of error.v1.message (e.g. `kb-parse-image::ImageExtractor::extract` → `kb-app::extract_for (image)`) is accepted as a risk under spec §5.5: `error.v1.code` + `error.v1.schema_version` are preserved, and the string lies outside the user-visible surface. Cargo workspace.version bump: none.

Refs:
- docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (2 round APPROVE)
- docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (3 round APPROVE)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 17:43:44 +00:00
96766406aa Merge pull request 'refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b rewrite' (#186) from refactor/normalize-absorption into main
Reviewed-on: #186
2026-05-26 15:37:21 +00:00
710945c4b0 refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b rewrite
In reality, only 1 of the 4 parsers (markdown) goes through lift for the thin layer of
design §3.7b (the ParsedBlock family); with fan-in and fan-out both at 1, the layer has lost
its purpose. Absorb both kebab-normalize (1097 LOC) and kebab-parse-types (98 LOC) into kebab-parse-md.

Design: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md
Plan: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md
HOTFIXES: the 2026-05-26 entry in tasks/HOTFIXES.md (design deviation)

- 5 used types + 3 forward-declared structs → explicit pub re-exports from the kebab-parse-md::types module.
- build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module.
- 4 hard-coded agent literals (lib.rs:122/128/134/153) + the warning_agent body return + the tracing target literal are all preserved, for stage-label consistency.
- kebab-app: remove the callsites (lib.rs:51 use + :1119 context string) + the 2 deps in Cargo.toml (regular + dead).
- kebab-chunk + kebab-store-sqlite: remove kebab-normalize from [dev-dependencies] (replaced by kebab-parse-md). Shift the `use` paths in the integration test sources.
- test file moved (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/).
- workspace Cargo.toml: Hunk (a) delete 2 members entries + Hunk (b) version 0.18.0 → 0.19.0 (frozen contract change).
- design §3.7b rewritten as 4 paragraphs (original intent preserved + current state + preserved surface + future re-extraction trigger).
- design §8 graph updated (3 edges removed + meaning of 2 forbidden bullets updated + commentary).
- ARCHITECTURE.md crate graph + directory tree: mechanical update.
- tasks/INDEX.md L169 closure mention + new "Future work / deferred" section (image/pdf normalize integration entry).
- tasks/HOTFIXES.md: new entry (4-block, design deviation Symptom).
- HANDOFF.md: one cross-link line.
- 3 dead structs (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) are kept as the future surface for the v0.20+ image/pdf normalize integration (spec §11).

Wire / surface impact: none. CLI / TUI / MCP / --json output / config / XDG path /
parser_version are all unchanged. The wire-invisible provenance.events[].agent + tracing target
literal "kb-normalize" are preserved too, keeping the audit log consistent between old and new DB rows.

Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks).
cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warning (5m 46s).
cargo metadata --no-deps --format-version 1 | jq '.workspace_members | length' = 22.
cargo tree -p kebab-app --depth 2 | grep -E "kebab_(parse_types|normalize)" = 0 lines.
2026-05-26 15:00:59 +00:00
d4395a306b Merge pull request 'refactor(source-fs): drop kebab-parse-code dep — remove the 9 tree-sitter grammars drag' (#185) from refactor/source-fs-dep-lightening into main
Reviewed-on: #185
2026-05-26 12:31:29 +00:00
bd48baa19a refactor(source-fs): drop kebab-parse-code dep — remove the 9 tree-sitter grammars drag
Removes the heavy dependency through which kebab-source-fs dragged in kebab-parse-code's 9 tree-sitter grammars. Only 4 surfaces are used (code_lang_for_path / is_generated_file / is_oversized / BUILTIN_BLACKLIST), yet all 9 grammars were linked in the dep graph → moved to kebab-source-fs::code_meta + cleanup on the kebab-parse-code side.

Key changes:
- New kebab-source-fs::code_meta: the 4 surfaces moved here (BUILTIN_BLACKLIST `pub` for the frozen contract + 3 helper fns `pub(crate)`). One line, `pub use code_meta::BUILTIN_BLACKLIST`, added to lib.rs (Option A: no unjustified expansion of other mod surfaces).
- callsite migration: media.rs (1) + walker.rs (2) + connector.rs (2) all updated to `kebab_source_fs::code_meta::*`.
- kebab-parse-code cleanup: skip.rs deleted + narrow edit to lang.rs (code_lang_for_path body + 2 unit tests + Path import removed, module_path_for_* kept) + lib.rs header doc rewritten (including a migration breadcrumb).
- 13 tests from tests/{lang,skip}.rs moved: 12 unit (`src/code_meta.rs::tests`) + 1 integration (`tests/code_meta.rs` for the BUILTIN_BLACKLIST frozen contract).
- design §8 graph: edge removed + p10-2 inline note. One line of ARCHITECTURE.md prose updated. Stale dep reference at kebab-core::metadata.rs:36 corrected.

G1+G5: cargo tree -p kebab-source-fs | grep tree-sitter = 0 lines.
G2+G3: zero workspace test regressions + 13 tests moved 1:1.
G4: design §8 + ARCHITECTURE.md updated.

Wire impact: none (internal Rust crate-API surface only, nothing user-facing). No Cargo workspace.version bump needed.

Refs:
- docs/superpowers/specs/2026-05-26-source-fs-dep-lightening-spec.md (v3, 4-round APPROVE)
- docs/superpowers/plans/2026-05-26-source-fs-dep-lightening-plan.md (v4, 4-round ACCEPT)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 12:19:32 +00:00
b02ac8200e Merge pull request 'fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry' (#184) from fix/s3-nli-model-unavailable-diagnose into main
Reviewed-on: #184
2026-05-26 09:17:12 +00:00
336962715a fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry
Root cause of the consistent `nli_model_unavailable` failure on the S3 dogfood query = the mDeBERTa-v3 tokenizer's `OnlyFirst` strategy + a 949-token hypothesis. The earlier char-budget-only fix did not resolve KR-extreme density → token-count fallback retry + RC1-residual trait dispatch alignment.

Key changes:
- kebab-nli::NliVerifier: add a `hypothesis_token_count(&str) -> Result<usize>` trait method (default `Ok(0)` for backward compat). `OnnxNliVerifier` overrides it with real mDeBERTa tokenization *inside the trait impl block*, which guarantees vtable registration (round-3 critic RC1-residual closure).
- kebab-rag::pipeline: `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` consts + `pub(crate) fn truncate_chars` pure fn + `pub fn truncate_hypothesis_for_nli_with_budget` retry helper (retries with the char budget halved; graceful unavailable at the min floor). The step 8.5 hook callsite uses an explicit `match` + `return self.refuse_nli_model_unavailable` pattern (`?` is forbidden; round-2 plan critic CRITICAL #1 closure).
- New SpyNliVerifier helper (closure score_fn + hypothesis_token_count_fn, 2-arg constructor).
- 2 ignored tests from §5.1 (EN-long err + vtable dispatch RC1-residual pin) + 4 boundary tests from §5.2 (truncate_chars) + 3 mock multi-hop tests from §5.3 (long_en_grounded / long_kr_retries / unrelenting_fallback). +7 new tests (2 ignored, skipped by default).
- tasks/HOTFIXES.md: new dated entry `## 2026-05-26 — S3 NLI unavailable ...` (4-block: Symptom / Root cause / Action / Amends).
- spec + plan (`docs/superpowers/{specs,plans}/2026-05-26-s3-nli-model-unavailable-diagnose-*.md`): artifacts of the 4-round spec + 3-round plan OMC reviewer ACCEPT.
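
The halving retry described above could look roughly like the following sketch. Only the two constant names and values and the name `truncate_chars` come from the message; `ASSUMED_MAX_TOKENS`, the `truncate_with_budget` name and signature, and the closure standing in for `hypothesis_token_count` are assumptions made for illustration.

```rust
pub const MAX_NLI_HYPOTHESIS_CHARS_INITIAL: usize = 1200;
pub const MAX_NLI_HYPOTHESIS_CHARS_MIN: usize = 150;
/// Assumption: a hypothetical token ceiling; the real limit is not stated above.
pub const ASSUMED_MAX_TOKENS: usize = 512;

/// Keep at most `max_chars` Unicode scalar values, cutting on a char boundary.
pub fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text,
    }
}

/// Try the initial char budget; while the tokenizer still reports too many
/// tokens, halve the budget. Returns `None` once the minimum budget has been
/// tried and still does not fit (the caller would then refuse gracefully).
pub fn truncate_with_budget<'a>(
    hypothesis: &'a str,
    token_count: impl Fn(&str) -> usize,
) -> Option<&'a str> {
    let mut budget = MAX_NLI_HYPOTHESIS_CHARS_INITIAL;
    loop {
        let candidate = truncate_chars(hypothesis, budget);
        if token_count(candidate) <= ASSUMED_MAX_TOKENS {
            return Some(candidate);
        }
        if budget <= MAX_NLI_HYPOTHESIS_CHARS_MIN {
            return None;
        }
        budget = (budget / 2).max(MAX_NLI_HYPOTHESIS_CHARS_MIN);
    }
}
```

Under these assumptions the budgets tried are 1200, 600, 300, and 150 chars, so token-dense text (the KR case mentioned above) gets up to three extra attempts before the unavailable path is taken.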

Verification:
- cargo test -p kebab-nli -j 1 → 11/11 pass + 7 ignored default skip.
- cargo test -p kebab-rag -j 1 → 19+3+3+... all pass + 3 new mock + 4 new boundary.
- cargo test --workspace --no-fail-fast -j 1 → **1313 pass (+7 new)**, 0 failed. Zero regressions (HOTFIX #15 already fixed, no remaining flaky).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean (type_complexity allow on Arc<dyn Fn> type aliases).

KR safe (token-count retry path) + graceful fallback (at the min floor the existing unavailable wire is kept; zero regressions). No wire impact (additive trait method). No Cargo bump needed.

Refs:
- spec: docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md (4 round APPROVE — analyst → critic + verifier × 4 rounds)
- plan: docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md (3 round ACCEPT — planner → critic-plan + verifier-plan × 3 rounds)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 09:12:21 +00:00
1a224bf983 Merge pull request 'fix(mcp): HOTFIX #15 — MCP ask multi_hop dispatch-divergence assertion (fixture reinforcement)' (#183) from fix/hotfix-15-mcp-ask-multi-hop-flaky into main
Reviewed-on: #183
2026-05-26 07:02:26 +00:00
a210bf5d52 docs(rag): HOTFIX #15 spec + plan (3 round OMC reviewer approve)
Spec + plan authoring and review artifacts from the OMC team `hotfix-15-mcp-flaky`.

- spec: the analyst diagnosed the issue (root cause = PR-7 probe-first mismatches the stale empty-KB contract of the PR-5 test) and recommended Option A (test-only fix). 3 review rounds (critic + verifier): closure of CRITICAL C1 (fixture/query FTS5 0 hits), MAJOR M1/M2, and others.
- plan: the planner wrote 7 steps + subagent dispatch tasks. 3 review rounds (critic-plan + verifier-plan): empirical SQLite REPL verification, level-1 dated entry placement, alignment with the actual KebabHandler/KebabAppState pattern.

implementation = 429287f commit (executor).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 06:52:04 +00:00
429287f6cb fix(mcp,tests): HOTFIX #15 — MCP ask multi_hop dispatch-divergence assertion (fixture reinforcement)
After PR-7 (v0.18 dogfood probe-first) was merged, the PR-5 test `ask_tool_routes_multi_hop_true_to_decompose_first` failed deterministically on its stale empty-KB contract. Test-only fix: zero touches to production code.

- `minimal_config`: `score_gate = 0.0` (bypasses the probe's second gate `top_score < score_gate`; test config isolation).
- fixture `workspace_root/note.md`: "This note is about a compound containing X and Y in detail." The token_and branch of build_match_string (FTS5 implicit-AND) requires `compound` + `about` + `and` to all match. 1 hit confirmed with an empirical SQLite REPL (V007 trigram DDL).
- Existing assertions kept; on the single-pass branch the query "anything" does not match the fixture either → the NoChunks refusal is maintained.
- New `_multi_hop_short_circuits_when_probe_empty` test (REQUIRED: round-1 critic HIGH, escalated by the verifier): pins the MCP-layer wire shape of the probe-empty short-circuit (kebab-rag::multi_hop_empty_probe_pool_refuses_before_any_llm_call pins only the RAG layer, leaving no MCP-layer safety net).
- module doc updated: enumerates the contract each of the two tests pins. The inline comment (line 94-101) is aligned with the new contract as well.
- HOTFIXES.md: new dated entry `## 2026-05-26 — HOTFIX #15 ...` (date-top convention).

Verification: cargo test --workspace -j 1: zero regressions (known flaky 1 → 0). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.

Wire / behavior / version cascade: 0.

Refs: docs/superpowers/specs/2026-05-26-hotfix-15-mcp-ask-multi-hop-flaky-spec.md (review 3 rounds APPROVE)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 06:51:06 +00:00
08495eb425 Merge pull request 'chore(release): bump version 0.17.2 → 0.18.0 + cut fb-41 multi-hop' (#182) from chore/v0-18-0-cut into main
Reviewed-on: #182
v0.18.0
2026-05-26 05:36:05 +00:00
98cf4e8a04 chore(release): bump version 0.17.2 → 0.18.0 + cut fb-41 multi-hop
v0.18.0 cut PR. Wraps up shipping the user-visible surface of fb-41 multi-hop RAG + NLI verification (PR #176-180) + the post-PR9 cleanup/refactor (PR #181).

## Changes

### Version
- workspace `Cargo.toml`: 0.17.2 → 0.18.0. Cargo.lock cascades automatically (all 24 kebab-* crates at 0.18.0).

### Frozen design contract
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`:
  - §3.8 RAG types: add NliVerificationFailed + NliModelUnavailable + MultiHopDecomposeFailed to RefusalReason + document the ask_multi_hop facade, the step 8.5 NLI hook, and HopRecord / VerificationSummary for multi-hop RAG + NLI verification.
  - §9 versioning rules table: new nli_model_version row (optional; a wire surface candidate once a second adapter arrives in v0.19+).

### Status transitions
- `docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md`: status approved-by-team → completed.
- `docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md`: status approved-by-team → completed (spec_status as well).

### User-facing docs
- `README.md`: in the `kebab ask` row of the command table, add the `--multi-hop` flag + one paragraph on the NLI options (mDeBERTa-v3 XNLI 280 MB automatic download / RAM peak ~7-8 GB / threshold tuning 0.5 prod / 0.0 disable).
- `docs/SMOKE.md`: `[rag] nli_threshold = 0.0` config example + activation procedure + first-run download + inline RAM recommendation.

### Handoff + dashboard
- `HANDOFF.md`: current version in the one-line summary 0.17.2 → 0.18.0. Add a v0.18.0 cut entry (fb-41 multi-hop + NLI + cleanup ship). State the addition of fb-41 PR-9's kebab-nli + ask_multi_hop in the component count paragraph. New v0.18.0 fb-41 entry at the top of the post-merge decisions section.
- `tasks/INDEX.md`: p9-fb-41 merged (v0.18.0). New v0.18.0 subsection with a one-line summary for each of the 6 sub-PRs + cleanup in PR #176-181.

## Out of scope / separate work
- The fb-41 entry in HOTFIXES.md was already written in PR #180 (PR-9d closure), so this cut PR needs no additional anchor.
- The v0.18+ NLI guidance in SKILL.md was already added inline in PR-9c-2.

## Verification
- `cargo check --workspace -j 1` passes (all 24 crates confirmed at v0.18.0).
- The RefusalReason enum extension in the frozen design is consistent with the production code in kebab-core (the same variants have existed since PR-9c-1).

Wire impact: none (the additive minor already shipped in PR-9c-1; this commit is a documentation cascade only).
Behavior impact: none.

After merging, tag with `gitea-release v0.18.0` + write the release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 05:18:08 +00:00
4030f04f37 Merge pull request 'chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix' (#181) from chore/workspace-wide-cleanup-pre-v0-18 into main
Reviewed-on: #181
2026-05-26 04:48:50 +00:00
7c27633df2 chore(rag): post-PR9 refactor — H1/H2/H3/D/E + test coverage + post-refactor dogfood retest
Architectural cleanup by the OMC team `post-pr9-refactor`. After the architect's priorities analysis, the executor + test-engineer made the file edits, and the system-architect's component-level review concluded *pre-cut nothing, all v0.18.1+ defer*.

## Executor work (H1/H2/H3/D/E)

- **H1** (kebab-nli/src/onnx.rs): activate the `[models.nli]` config wiring. Remove the `DEFAULT_MODEL_ID` const (NliCfg::defaults in kebab-config is the single source). OnnxNliVerifier::new reads config.models.nli.model and calls anyhow::bail if config.models.nli.provider is not "onnx". Remove 3 stale "PR-9c-1 will wire this" comments. Add 2 unit tests (`new_uses_config_model_id`, `new_rejects_unsupported_provider`).
- **H2** (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. Remove the v0.18.1 placeholder doc. Update 4 callsites (tests/multi_hop.rs) + rename the test `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (contract alignment).
- **H3** (kebab-rag/src/pipeline.rs:1041): `was_truncated` is surfaced via tracing::debug! (added observability; signature preserved as the caller logging contract).
- **D** (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (stability on slow CI), remove the `mh_code` discriminator. dispatch contract = `mh.is_error.unwrap_or(false)` (the existing assertion is sufficient).
- **E** (tasks/HOTFIXES.md + pipeline.rs:1633-1638): add the subsection "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" as a sibling of the fb-41 PR-9 closure entry. In the pipeline, "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up" cross-link.

## Test-engineer work (T1/T2/T3/T4, 9 new tests)

- **T1** (kebab-nli/src/onnx.rs::tests): sanitize_model_id 3 unit (replaces_slash / idempotent / leaves_other_chars).
- **T2** (kebab-rag/tests/multi_hop_nli_panic.rs, new): 2 panic-path tests: a #[should_panic] test for the facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) + its threshold=0 companion.
- **T3** (kebab-rag/tests/multi_hop_nli_stream.rs, new): 2 StreamEvent::Final tests pinning the wire shape of the stream_sink Final branch for refuse_nli_verification + refuse_nli_model_unavailable.
- **T4** (kebab-app/tests/open_with_config_nli.rs, new): 2 NLI failure paths: App::open_with_config returns a Result<App> Err (with "OnnxNliVerifier" in chain) when model_dir is unwritable + graceful skip when threshold=0.

## System-architect conclusion

Result of the 3-lens analysis (absorption / duplication / under-engineered interface): *pre-cut nothing*. All top-3 items are deferred to v0.18.1+:
- Lens 1: kebab-normalize + kebab-parse-types can be absorbed (used only by parse-md; 5 parsers bypass them) → v0.18.1+.
- Lens 3: dead polymorphism in the Extractor + Chunker traits (all callsites are hardcoded) → v0.18.1+.
- Lens 1 bundled: kebab-source-fs drags in kebab-parse-code's 9 tree-sitter grammars → low-risk dep-graph win, v0.18.1+ bundled.
- Defer-with-intent: LanguageModel async refactor (when a cloud LLM arrives), NliVerifier::score_batch + typed NliError (when a 2nd impl arrives), compute_stale → kebab-core::stale.

Reports: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (both outside the repo, kept as analysis).

## Verification

- cargo test -p kebab-nli -j 1 → 11/11 pass.
- cargo test -p kebab-rag -j 1 → 75/75 pass (including 5 NLI multi-hop + 4 new T2/T3).
- cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (including the 2 from T4).
- cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, unrelated to the executor's D fix: the base assertion at line 86 fails because there is no fixture).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
- cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + the same 1 pre-existing flaky.
- **Post-refactor dogfood retest byte-identical** (PR-9d / post-cleanup / post-refactor, all 3 runs): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable.

Added a "Post-architectural-refactor retest" section to docs/dogfood/v0.18.0/SUMMARY.md.

Wire impact: none.
Behavior impact: none (H1's config wiring resolves to the same model as the default → byte-identical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 04:42:37 +00:00
3712d005cc chore(rag): post-cleanup dogfood retest — byte-identical, zero regressions
Re-ran the same dogfood S7/S1/S10/S3 right after the workspace-wide cleanup commit. NLI scores confirmed byte-identical:

- S7: nli_verification_failed, 0.0035389824770390987 (same as PR-9d)
- S1: nli_verification_failed, 0.058334656059741974 (same as PR-9d)
- S10: nli_verification_failed, 0.0027875436935573816 (same as PR-9d)
- S3: nli_model_unavailable (same as PR-9d, unrelated to the cleanup; v0.18.1 follow-up)

cleanup = mechanical refactor only. Zero behavior regressions. The v0.18.0 cut PR can proceed.

Added a "Post-cleanup retest" section to docs/dogfood/v0.18.0/SUMMARY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 03:19:15 +00:00
7c85de065a chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix
Final tidy-up before the v0.18.0 cut PR. User request: "make the whole codebase clean and easy to read".

## Workspace lints

- Add `pedantic = "warn"` (priority -1) + a deliberate allow-list to `[workspace.lints.clippy]` in `Cargo.toml`:
  - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss: intentional truncation, e.g. ONNX i64 / hash modular reduction.
  - doc_markdown / missing_errors_doc / missing_panics_doc: cosmetic doc style.
  - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names: informational only.
  - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps: intentional patterns (exhaustive match, future Result variants, etc.).
  - default_trait_access: `Foo::default()` is idiomatic.
  - float_cmp: explicit threshold comparisons of NLI / RRF scores are intentional.
  - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason: domain-specific intent.
  - format_push_string / return_self_not_must_use / match_same_arms: builder / wire-label / hot-path patterns preserved.
  - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option: remaining low-value style.
- Add `[lints] workspace = true` to the `Cargo.toml` of each of the 24 crates.

## Auto-fix

Applied `cargo clippy --workspace --all-targets --fix`: 128 files changed, 552 insertions / 472 deletions. Mainly:
- uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`.
- redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`.
- other mechanical refactors.
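
As an illustration of the two rewrite shapes listed above, here is a small before/after pair. The function names and the `String::len` example are hypothetical and not taken from the codebase; both versions are meant to behave identically.

```rust
/// Before: the shapes the two pedantic lints flag.
pub fn describe_before(names: &[String]) -> Vec<String> {
    names
        .iter()
        .map(|n| n.len()) // redundant_closure_for_method_calls
        .map(|len| format!("{} chars", len)) // uninlined_format_args
        .collect()
}

/// After: method path instead of a closure, and the argument inlined
/// into the format string.
pub fn describe_after(names: &[String]) -> Vec<String> {
    names
        .iter()
        .map(String::len)
        .map(|len| format!("{len} chars"))
        .collect()
}
```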

## Verification

- `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + all lint groups).
- `cargo test --workspace --no-fail-fast -j 1`: **1293 tests pass + 1 pre-existing flaky fail** (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, unrelated to the cleanup). Zero regressions.

Wire impact: none.
Behavior impact: none (mechanical refactor only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 03:01:58 +00:00
a0ccc7b021 Merge pull request 'feat(rag): fb-41 PR-9d — dogfood retest + HOTFIXES closure + corpus preservation' (#180) from feat/fb-41-pr-9d-dogfood-retest into main
Reviewed-on: #180
2026-05-26 02:01:56 +00:00
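a8fd6994d2 chore(rag): correct the latency table in the PR-9d SUMMARY
The estimated S1/S10 latencies in the PR-8 baseline (~150s, ~80s) were inaccurate. `results/s1-multihop.json` + `results/s10-multihop.json` actually show 614s / 589s (measured with `jq '.usage.latency_ms'`), and they come from *an earlier timeline, not the PR-8-era baseline*. Only S7 has a retest preserved under `results/post-pr8/`, so only that comparison is meaningful (158s baseline → PR-9 241s with NLI first-run download).

Corrected the latency table in SUMMARY.md: it now states the *absence of a same-time baseline* for S1/S10 and adds the caveat that only the single S7 comparison is meaningful.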

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 01:47:38 +00:00
505b3889fb feat(rag): fb-41 PR-9d — dogfood retest + HOTFIXES PR-9 closure + preserve docs/dogfood/v0.18.0/
Confirms that PR-9 really works: multi-hop retest of S7/S1/S3/S10 on the `/build/cache/dogfood-v018/` corpus after merging PR-1~PR-9c-2.

Key result: **S7 hallucination root cause confirmed resolved**.
- PR-8 baseline: `grounded=true, refusal_reason=null`, **answer = Adam gradient formula** (a silent hallucination unrelated to the caffeine question).
- PR-9 retest: `refusal_reason=nli_verification_failed, nli_score=0.0035` (graceful refuse; NLI detects 0.35% entailment).

Overall comparison (4 cases):
- S7: hallucination FIXED.
- S1: rejected in both runs; NLI is more deterministic (0.058).
- S3 ⚠ consistent fail (`nli_model_unavailable`, 313s): *v0.18.1 follow-up* (a kebab-nli failure that depends on specific input; no debug log is emitted, which makes diagnosis difficult).
- S10: rejected in both runs; NLI is more deterministic (0.0028).

- docs/dogfood/v0.18.0/SUMMARY.md (sanitized report) + s{1,3,7,10}-multihop-post-pr9.json (sample wire output, kept in the repo).
- fb-41 PR-9 entry in tasks/HOTFIXES.md: "planned" → "done (2026-05-26)" + inline comparison table + S3 follow-up subsection (v0.18.1 candidate).

RAM: 5-6 GB → 7-8 GB (ONNX session ~600 MB), safe on 16 GB.
Disk: NLI model cache 1.1 GB (XDG default or storage.model_dir).
Wire impact: none (only the PR-9c-1 schema change + preserved measurement samples).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 01:44:57 +00:00
772575d8f0 Merge pull request 'feat(rag): fb-41 PR-9c-2 — pipeline integration + mock test + SKILL.md (★ NLI actually enabled)' (#179) from feat/fb-41-pr-9c-2-pipeline-integration into main
Reviewed-on: #179
2026-05-26 01:03:18 +00:00
00ffe9c792 feat(rag): fb-41 PR-9c-2 — pipeline integration + mock test + SKILL.md (★ NLI actually enabled)
Activates behavior on top of PR-9c-1's wire surface: the step 8.5 hook of `ask_multi_hop` actually performs NLI verification when `[rag] nli_threshold > 0`. **First user-visible behavior change** in PR-9.

- crates/kebab-rag/src/pipeline.rs:
  - ask_multi_hop step 8.5 NLI hook (empty-answer guard + truncate_for_nli + verifier.score + verification field + refusal branch).
  - refuse_nli_verification helper (verification: Some(...) + RefusalReason::NliVerificationFailed).
  - refuse_nli_model_unavailable helper (verification: None + RefusalReason::NliModelUnavailable).
  - truncate_for_nli helper (module-level pub fn, chars-based budget of MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600 chars; _hypothesis is an unused placeholder, a candidate for the v0.18.1 token-budget update).
  - Remove the two #[allow(dead_code)] from PR-9c-1 (verifier field + with_verifier builder; the transitional sentence in the doc is cleaned up too). Closure of the round-1 PR-9c-1 review N1 carry-forward.
- crates/kebab-app/src/app.rs:
  - NliVerifier construction in App::open_with_config: config.rag.nli_threshold > 0 → OnnxNliVerifier::new + Arc::new wrap + with_verifier called during the subsequent RagPipeline initialization. Failures propagate via ? (the signature stays Result<Self>; zero caller cascading).
  - Add the kebab-nli path dependency to kebab-app/Cargo.toml.
- crates/kebab-rag/tests/multi_hop.rs + tests/common/mod.rs:
  - MockNliVerifier (pass / fail / err constructors + instrumented score call_count).
  - multi_hop_nli_pass_keeps_grounded: entailment 0.9 → grounded=true, verification.nli_passed=true.
  - multi_hop_nli_fail_refuses: entailment 0.1 → refusal=NliVerificationFailed.
  - multi_hop_nli_disabled_skip_verify: threshold 0.0 → verify skip, verification=None.
  - multi_hop_nli_model_unavailable_refuses: verifier Err → refusal=NliModelUnavailable.
  - multi_hop_truncate_for_nli_preserves_hypothesis: long premise truncation + hypothesis preserved.
- integrations/claude-code/kebab/SKILL.md: one paragraph of NLI guidance in the mcp__kebab__ask section (meaning of verification.nli_passed + threshold tuning + nli_verification_failed/nli_model_unavailable refusal handling).
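
The branching that the step 8.5 hook performs can be summarized with the sketch below. It is an illustration under stated assumptions: the `Outcome` enum, the `verify_step` function, and the closure-based scorer are hypothetical, the `>=` comparison against the threshold is a guess, and the empty-answer guard and truncation are left out. The variant and field names follow the message above.

```rust
#[derive(Debug, PartialEq)]
pub enum RefusalReason {
    NliVerificationFailed,
    NliModelUnavailable,
}

#[derive(Debug, PartialEq)]
pub struct VerificationSummary {
    pub nli_score: f32,
    pub nli_threshold: f32,
    pub nli_passed: bool,
}

/// Hypothetical result type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// threshold <= 0.0: verification disabled, no summary recorded.
    Skipped,
    Grounded(VerificationSummary),
    Refused {
        reason: RefusalReason,
        verification: Option<VerificationSummary>,
    },
}

/// `score` stands in for a verifier call; it is only invoked when enabled.
pub fn verify_step(threshold: f32, score: impl FnOnce() -> Result<f32, String>) -> Outcome {
    if threshold <= 0.0 {
        return Outcome::Skipped;
    }
    match score() {
        // Scorer failure: refuse without a summary.
        Err(_) => Outcome::Refused {
            reason: RefusalReason::NliModelUnavailable,
            verification: None,
        },
        Ok(nli_score) => {
            // Assumption: a score equal to the threshold counts as passing.
            let nli_passed = nli_score >= threshold;
            let summary = VerificationSummary {
                nli_score,
                nli_threshold: threshold,
                nli_passed,
            };
            if nli_passed {
                Outcome::Grounded(summary)
            } else {
                Outcome::Refused {
                    reason: RefusalReason::NliVerificationFailed,
                    verification: Some(summary),
                }
            }
        }
    }
}
```

The four mock tests listed above map onto the four paths of this function: pass, fail, disabled, and model unavailable.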

Verification: cargo test --workspace -j 1: 5 new multi-hop tests pass + zero regressions (same pre-existing kebab-mcp::tools_call_ask_multi_hop flaky). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
Wire impact: *behavior wiring* on top of PR-9c-1's schema change: the answer.v1.verification field is populated on both the multi-hop happy path and the refuse path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 00:55:02 +00:00
681c48b2a3 Merge pull request 'feat(rag): fb-41 PR-9c-1 — core types + wire scaffolding (NLI verification)' (#178) from feat/fb-41-pr-9c-1-core-types-wire into main
Reviewed-on: #178
2026-05-26 00:09:03 +00:00
546c1564b0 feat(rag): fb-41 PR-9c-1 — core types + wire scaffolding (NLI verification)
Surface-only PR (no behavior wiring — that's PR-9c-2):
- kebab-core: RefusalReason::NliVerificationFailed + NliModelUnavailable (serde rename_all="snake_case", wire = identical strings).
- kebab-core: Answer.verification: Option<VerificationSummary> field (additive minor wire; pre-v0.18 readers unaffected).
- kebab-core: VerificationSummary { nli_score: f32, nli_threshold: f32, nli_passed: bool } struct + lib.rs re-export.
- kebab-config: NliCfg { model, provider } + ModelsCfg.nli (default Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7).
- kebab-config: RagCfg.nli_threshold: f32 (default 0.0 = disabled, spec §2.6 single gate).
- kebab-config: env override KEBAB_MODELS_NLI_MODEL/PROVIDER + KEBAB_RAG_NLI_THRESHOLD (on parse failure: tracing::warn + keep the default).
- kebab-rag: RagPipeline.verifier: Option<Arc<dyn NliVerifier>> field + with_verifier builder (both #[allow(dead_code)], to be removed when PR-9c-2's step 8.5 hook is activated). RagPipeline::new signature kept (round-2 NEW-M1 Option B).
- kebab-rag: add the kebab-nli path dependency to Cargo.toml.
- kebab-store-sqlite + kebab-tui: add exhaustive match arms for the two new RefusalReason variants (snake_case label / display text).
- Add verification: None to every Answer construction site (rag 6 + cli/tui/eval 3 fixtures).
- wire schemas: answer.schema.json gains the verification field + $defs.VerificationSummary + 2 additions to refusal_reason.enum. error.schema.json gains 2 additions to code.enum + details.description (forward-looking reserved).
- docs/ARCHITECTURE.md: nli node in the Mermaid Adapters subgraph + rag→nli + app→nli (forward-looking) + nli→config edges. The nli→core edge is skipped (the direct deps in kebab-nli/Cargo.toml are config only; ARCHITECTURE convention = direct deps only). crates/kebab-nli/ added to the directory tree.
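
The "warn and keep the default on parse failure" behavior of the threshold override might be sketched as follows. The function name is hypothetical, the raw value is passed in rather than read from the environment to keep the example self-contained, and `eprintln!` stands in for the `tracing::warn` call mentioned above.

```rust
/// Default per the message above: 0.0 means NLI verification is disabled.
pub const DEFAULT_NLI_THRESHOLD: f32 = 0.0;

/// `raw` is the value of the override variable, if it was set at all.
pub fn threshold_from_env_value(raw: Option<&str>) -> f32 {
    match raw.map(|s| s.trim().parse::<f32>()) {
        Some(Ok(value)) => value,
        Some(Err(err)) => {
            // Stand-in for a tracing::warn call in the real crate.
            eprintln!("ignoring malformed KEBAB_RAG_NLI_THRESHOLD: {err}");
            DEFAULT_NLI_THRESHOLD
        }
        None => DEFAULT_NLI_THRESHOLD,
    }
}
```

A caller could feed it with `std::env::var("KEBAB_RAG_NLI_THRESHOLD").ok().as_deref()`.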

Tests: kebab-core 3 (serde rename + verification skip + struct shape) + kebab-config 6 (defaults + legacy + env + malformed env) + kebab-cli wire 5 (schema verification + enum validation).
Verification: cargo test --workspace -j 1: zero regressions (the same 1 pre-existing kebab-mcp::tools_call_ask_multi_hop flaky, listed as known-flaky in the spec). cargo clippy --workspace --all-targets -D warnings clean.
Wire impact: additive minor: verification is optional in answer.v1 + refusal_reason.enum extended + error.v1.code extended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 23:27:36 +00:00
79ad6e376f Merge pull request 'feat(nli): fb-41 PR-9b — OnnxNliVerifier ONNX inference + model download' (#177) from feat/fb-41-pr-9b-onnx-nli-inference into main
Reviewed-on: #177
2026-05-25 22:24:01 +00:00
6ffbe0a5a3 chore(nli): address PR #177 round-1 review (N1 cache-hit probe + N2 test pollution)
- N1: the cache-hit check in fetch actually triggered a download (ApiRepo::get downloads on a cache miss and then returns the path), so the "NLI artifact cache hit" log line was printed *right after a download*, which was misleading. Changed to hf_hub::Cache::new(cache_dir).repo(repo).get(filename).is_some(); Cache::get is a filesystem lookup only and never touches the network. The number of actual downloads is unchanged (1); only log accuracy improves.
- N2: new_succeeds_on_default_config / score_empty_hypothesis_returns_err called create_dir_all on the real XDG directory (`~/.local/share/kebab/models/nli/...`), which is test pollution. Added a tempdir_config() helper that overrides storage.data_dir with a TempDir and leaves model_dir as `{data_dir}/models`, so the expand_path substitution is still exercised.

cargo test -p kebab-nli -j 1 → 6 passed / 0 failed (unit) + 5 ignored (integration, manual).
cargo clippy -p kebab-nli --all-targets -j 1 -- -D warnings clean.
inference.rs is untouched, so the manual --ignored smoke result (5/5 PASS) remains valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:22:30 +00:00
ab3408cb49 chore(nli): correct the expectation in PR-9b inference test 2
The previous expectation `entailment < 0.3` was too strict. For two caffeine facts (premise: "Caffeine is a stimulant.", hypothesis: "The chemical formula of caffeine is C8H10N4O2."), mDeBERTa-v3 multilingual NLI scores *neutral* at 0.53 and entailment at 0.43 (neither entails the other, but they do not contradict either, which is exactly neutral).

The *spirit* of spec §3 PR-9b ("low entailment; neutral/contradiction is the winning channel") is that *neutral is the max*. Changed the expectation to `s.neutral > s.entailment && s.neutral > s.contradiction`.
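
To make the difference between the two expectations concrete, here is a small sketch. The `NliScores` field names follow the three XNLI channels named elsewhere in this log; the struct layout and the `neutral_is_winning_channel` helper are illustrative assumptions, and the contradiction value used in the usage note (0.04) is only the remainder implied by 0.53 + 0.43 if the scores sum to 1.

```rust
/// Three-channel NLI scores; field names follow the XNLI channels,
/// the struct layout itself is an assumption for this example.
#[derive(Debug, Clone, Copy)]
pub struct NliScores {
    pub entailment: f32,
    pub neutral: f32,
    pub contradiction: f32,
}

/// Hypothetical helper expressing the revised expectation:
/// neutral must strictly beat both other channels.
pub fn neutral_is_winning_channel(s: &NliScores) -> bool {
    s.neutral > s.entailment && s.neutral > s.contradiction
}

/// The previous, stricter expectation, kept here only for comparison.
pub fn old_expectation(s: &NliScores) -> bool {
    s.entailment < 0.3
}
```

For scores of entailment 0.43 / neutral 0.53 / contradiction 0.04, the old check fails while the revised one holds.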

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:10:51 +00:00
b807fd5aa5 feat(nli): fb-41 PR-9b - OnnxNliVerifier ONNX inference + model download
- OnnxNliVerifier fields: model_id, cache_dir (XDG model_dir/nli/<sanitized>), session/tokenizer OnceLock.
- new(): eagerly stamps cache_dir only. The actual model download + Session::commit_from_file are performed lazily by ensure_loaded() on the first score call.
- score(): ensure_loaded → tokenizer.encode(pair, OnlyFirst truncation max_length=512) → ndarray Array2<i64> → ort::Session::run → logits[1,3] → NliScores::from_xnli_logits.
- empty hypothesis edge: defense-in-depth bail (in addition to the caller-side skip in spec §2.3).
- sanitize_model_id helper: "/" → "_".
- 5 #[ignore] integration tests (EN self-entailment, EN unrelated, KR entailment, long premise truncation, empty hypothesis err); manual smoke results are attached to the PR description.
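
The cache path layout described above (`model_dir/nli/<sanitized>`) can be sketched as follows. `sanitize_model_id` and the "/" → "_" rule are named in the commit; the exact signature and the `nli_cache_dir` helper are assumptions for illustration.

```rust
use std::path::{Path, PathBuf};

/// Replace the repo separator so the model id is usable as a single
/// directory name ("/" -> "_", as described in the commit).
pub fn sanitize_model_id(model_id: &str) -> String {
    model_id.replace('/', "_")
}

/// Hypothetical helper: build `<model_dir>/nli/<sanitized model id>`.
/// It only computes the path; it does not create anything on disk.
pub fn nli_cache_dir(model_dir: &Path, model_id: &str) -> PathBuf {
    model_dir.join("nli").join(sanitize_model_id(model_id))
}
```

Keeping path computation free of filesystem side effects makes it straightforward to point tests at a temporary directory.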

Cargo.toml: enable the `download-binaries` feature on kebab-nli's ort dep (follow-up to the PR-9b prep commit). In a standalone `cargo test -p kebab-nli`, the per-crate feature union does not include fastembed, so ort/download-binaries is OFF and the ort-sys link fails; kebab-nli has to enable it explicitly for the standalone build to link the ONNX runtime. In a full workspace build it unions with fastembed's identical opt-in, so there are no side effects.

Verification:
- cargo test -p kebab-nli -j 1: the 6 PR-9a unit tests pass (`score_returns_err_in_skeleton` was renamed to `score_empty_hypothesis_returns_err`, moving from the stub to the real path; the count is unchanged).
- cargo clippy -p kebab-nli --all-targets -- -D warnings clean.
- cargo build --workspace -j 1: 0 regressions.
- Manual --ignored smoke results are attached to the PR body.

Wire impact: none (crate-internal).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 21:56:22 +00:00
93436f9eca feat(nli): fb-41 PR-9b prep — activate ort/tokenizers/hf-hub/ndarray/tracing deps in kebab-nli
The 5 crate dependencies that PR-9a only declared in workspace.dependencies are now activated in kebab-nli/Cargo.toml. The real OnnxNliVerifier implementation in PR-9b can build on top of this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 21:42:07 +00:00
11ce7847a1 Merge pull request 'feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps' (#176) from feat/fb-41-pr-9a-kebab-nli-crate into main
Reviewed-on: #176
2026-05-25 21:34:49 +00:00
1d88dccf8a chore(nli): address PR #176 round-1 review
- In the lib.rs::NliScores::faithfulness doc, `rag.nli_faithfulness_min` → `rag.nli_threshold` (matches the actual config knob name in spec §2.5/§2.6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 21:25:44 +00:00
1eb0bbecb3 feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps
- New crate kebab-nli (trait + impl in the same crate; v0.18 scope = one ONNX adapter).
- NliVerifier trait + NliScores struct (XNLI 3-channel: entailment/neutral/contradiction).
- private softmax3 (log-sum-exp safe).
- OnnxNliVerifier placeholder (PR-9b adds ONNX inference + model download).
- Added to workspace.dependencies: ort 2.0-rc.9, tokenizers 0.21 (default-features=false, onig), hf-hub 0.4, ndarray 0.16.
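
For readers unfamiliar with the "log-sum-exp safe" wording, a numerically stable three-way softmax can be sketched as below. This is a generic illustration of the max-subtraction technique, assuming an `[f32; 3]` input; it is not taken from the crate, whose private `softmax3` may differ in signature and detail.

```rust
/// Numerically stable softmax over exactly three logits.
/// Subtracting the maximum before exponentiating keeps `exp` from
/// overflowing and leaves the result mathematically unchanged.
pub fn softmax3(logits: [f32; 3]) -> [f32; 3] {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps = [
        (logits[0] - max).exp(),
        (logits[1] - max).exp(),
        (logits[2] - max).exp(),
    ];
    let sum: f32 = exps.iter().sum();
    [exps[0] / sum, exps[1] / sum, exps[2] / sum]
}
```

Because only differences between logits matter, adding the same constant to all three inputs produces the same probabilities, which is the invariance the unit tests below refer to.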

Pre-flight (the gate from the PR-9 design contract):
- HF Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 model.onnx + tokenizer.json → HTTP/2 302 (HF S3 routing; the files exist).
- Standalone repro with tokenizers --no-default-features -F onig: the SentencePiece mDeBERTa tokenizer.json loads OK (KR 9 tokens / EN 11 tokens encode correctly).
- Cargo features decision trace: tokenizers = { default-features = false, features = ["onig"] } locked.

Tests: 6 unit (softmax3 normalization + invariance + XNLI logits conversion + faithfulness + new + score stub), all passing.
Verification: cargo test -p kebab-nli -j 1 (6/6) + cargo clippy -p kebab-nli --all-targets -j 1 -- -D warnings clean.
Workspace: cargo test --workspace -j 1 has 1 pre-existing failure, kebab-mcp::tools_call_ask_multi_hop (it fails identically on the main baseline and is unrelated to PR-9a; it is flaky because it depends on the ingest fixture and Ollama).

Wire impact: none (crate introduction only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 21:22:38 +00:00
44fbffff26 docs(rag): fb-41 PR-9 spec + plan — NLI verification + v0.18.0 cut
The root cause of the dogfood S7 hallucination in fb-41 multi-hop RAG is the LLM-self-judge ceiling.
The response is NLI-based post-synthesis verification (mDeBERTa-v3 XNLI, 280 MB ONNX).

Deliverables:
- docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md (review_round=5,
  4 OMC reviewer APPROVE: 1 CRITICAL + 9 MAJOR + 3 MINOR → 1 NIT carry-forward).
- docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md (plan_review_round=3,
  4 OMC reviewer APPROVE: 15 issues → 0 actionable).

5 sub-PRs (PR-9a~9d) + cut PR. Effort 21-31h / wall time 28-44h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 21:22:20 +00:00
63aece3ea1 Merge pull request 'fix(rag): fb-41 PR-8 multi-hop synthesize safety in depth (pool 15 + self-check rule)' (#175) from feat/fb-41-pr-8-multi-hop-synthesize-safety into main 2026-05-25 12:51:46 +00:00
28a8bbeace chore(rag): address PR #175 round-1 review
Added a *post-PR-7 dogfood retest + PR-8 partial mitigation* sub-section
and a *PR-9 NLI plan* anchor to the fb-41 entry in HOTFIXES.md, and
updated the user-impact section. Adjusted the doc reference in config.rs
to point at the exact entry sub-section, resolving the dangling reference.

Verification
- `cargo test -p kebab-config -j 1`: all tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:51:15 +00:00
52a97303dc fix(rag): fb-41 PR-8 - multi-hop synthesize safety in depth (pool 15 + self-check rule)
**Layered defense** for fb-41 multi-hop RAG before the v0.18 cut: extra
safety on top of PR-7's pre-decompose probe gate. With PR-7's fix alone,
hallucination still occurs when the RRF top_score in hybrid mode passes
the gate (the caffeine query in dogfood S7), so the synthesize stage
itself needs hardening.

**Important**: this PR alone does not fully block the S7 hallucination (a
limit of gemma3:4b's prompt-following, confirmed in an additional dogfood
S7 retest). The real fix is PR-9 (NLI-based post-synthesis verification).
Until then, PR-8 is *partial mitigation + safety in depth*: a 4× latency
improvement (614s → 158s) plus a prompt rule aimed at future larger LLMs.

Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
Plan: /build/cache/dogfood-v018/results/PR-9-DESIGN.md (to be promoted to
spec/plan after the user's decision)

## Changes

- `crates/kebab-config/src/lib.rs`:
  - `RagCfg::multi_hop_max_pool_chunks` default **30 → 15**.
  - rationale doc: measurements show gemma3:4b loses the citation rule
    on large 30-chunk prompts.
  - Updated 2 unit tests (`default_*` renamed + `legacy_*` assert).
- `crates/kebab-rag/src/pipeline.rs`:
  - Added a **pre-answer self-check** rule to
    `MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT`. In English, the rule reads: "if
    a key entity of the [original question] (proper noun, chemical
    formula, numeric unit, code name, abbreviation) does not appear
    literally in the [evidence] body, do not synthesize an answer from
    another entity's information; answer '근거가 부족하다' (the evidence
    is insufficient)". An example (caffeine + Adam optimizer chunk) is
    spelled out as well.

## Dogfood results (retest with PR-7 + PR-8)

| query | path | grounded | latency | answer |
|---|---|---|---|---|
| caffeine formula | single-pass | false (LlmSelfJudge) | 30s | "근거가 부족하다" (insufficient evidence) ✓ |
| caffeine formula | multi-hop pre-fix | true ✗ | 141s | hallucination |
| caffeine formula | multi-hop PR-7 | true ✗ | 143s | hallucination (probe gate top_score 0.5 > 0.30) |
| caffeine formula | multi-hop PR-8 | true ✗ | **158s** | hallucination (the LLM ignores the new rule); **4× latency improvement** |

Partial wins of PR-8:
- pool 30→15 shrinks the synthesize prompt → latency 614s → 158s.
- The prompt rule gains value on future larger LLMs (gemma2:9b, qwen2.5:7b, etc.).

Limits of PR-8:
- gemma3:4b's prompt-following limit: it ignores even a strong rule and
  cites the body of another entity's chunk (Adam optimizer formula) as
  the source of caffeine's chemical formula.
- The ceiling of LLM-self-judge-based safety.

## The real fix → PR-9 (separate PR)

A survey of academic / industry practice (Self-RAG, CRAG, Auto-GDA,
MedTrust-RAG) points to deterministic post-synthesis verification as the
right path. **NLI-based groundedness check**: the mDeBERTa-v3-base-xnli
(280 MB multilingual) ONNX model checks entailment for
(premise=packed_chunks, hypothesis=answer). If the score is < 0.5, refuse.
This is layered defense on top of PR-8.

## Verification

- `cargo test -p kebab-config -p kebab-rag -j 1`: all tests pass
  (2 config default tests updated, rag tests unaffected).
- `cargo clippy -p kebab-config -p kebab-rag --all-targets -j 1 --
  -D warnings` clean.
- Single-crate serial build (16 GB RAM constraint).
- S7 dogfood retest: the hallucination persists (stated honestly in the PR body).

## Unchanged

- Wire schema: additive (only a config knob default changes).
- PR-7's probe gate: works as before (PR-8 is the extra safety layer
  once the gate passes).
- Other dogfood P1 items (citation consistency, binary path): separate PR.

## Next

- **PR-9a/b/c**: NLI-based post-synthesis verification, the real fix.
- After PR-9 merges, re-verify dogfood S7 (expected: refuse + nli_score < 0.5).
- v0.18.0 cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:44:31 +00:00
71fb2cbcb3 Merge pull request 'fix(rag): fb-41 PR-7 multi-hop pre-decompose score-gate (S7 hallucination regression pin)' (#174) from feat/fb-41-pr-7-multi-hop-score-gate-fix into main 2026-05-25 12:05:23 +00:00
85855ef596 chore(rag): address PR #174 round-1 review
Made explicit that `ask_multi_hop` intentionally throws away probe_hits
after the gate check. The *invariant clarity* rationale for not reusing
them as the initial pool is now documented in the code, together with a
hint to revisit this if retrieve cost ever becomes the multi-hop
bottleneck.

Verification
- `cargo test -p kebab-rag -j 1 --test multi_hop`: all 10 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:04:53 +00:00
da25ce330b fix(rag): fb-41 PR-7 - multi-hop pre-decompose score-gate (S7 hallucination regression pin)
Fix for a **safety regression** found while dogfooding fb-41 multi-hop RAG
before the v0.18 cut. For detailed dogfood results, see the 2026-05-25 fb-41
pre-v0.18 entry in `tasks/HOTFIXES.md` and `/build/cache/dogfood-v018/results/SUMMARY.md`.

## Problem (S7)

Query: `What is the chemical formula of caffeine?` (a fact not in the KB).

- Single-pass `kebab ask`: the retrieve top score is below the default
  `rag.score_gate = 0.30` → `refuse_score_gate` → safe refusal.
- Multi-hop `kebab ask --multi-hop`: **`grounded = true`**, with the body
  `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` ("The chemical formula of
  caffeine is C₉H₁₅N₃O [#6]"; a hallucination, the actual formula is
  C₈H₁₀N₄O₂), and `[#6]` cites the `g_t = ∂L/∂θ_i` body of the Adam
  optimizer chunk (triggered by a visually similar short structured token).

Cause: the score-gate check in `ask_multi_hop` looked only at the *pool's
top_score*. The multi-hop pool is the union of 5 sub-queries, so if a
single sub-query's top score is above the gate, the gate passes even when
the other chunks are unrelated to the original query; synth then runs and
the LLM hallucinates.

## Fix

Added a **pre-decompose probe** at the `ask_multi_hop` entry:

1. Retrieve once with the *original query* (0 LLM calls, ~ms).
2. probe empty → `refuse_no_chunks(None)` (no decompose, hops=None).
3. probe top_score < gate → `refuse_score_gate(None)` (no decompose).
4. probe pass → the existing decompose / decide / synthesize flow, unchanged.
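
The four steps above can be sketched as a pure decision function. The `ProbeOutcome` enum and `probe_gate` are hypothetical names for this example; the real pipeline returns refusal answers via `refuse_no_chunks` / `refuse_score_gate` and works with retrieved hits rather than bare scores.

```rust
/// Hypothetical outcome of the pre-decompose probe.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProbeOutcome {
    /// Probe retrieved nothing: refuse without decomposing.
    RefuseNoChunks,
    /// Best probe score is below the gate: refuse without decomposing.
    RefuseScoreGate,
    /// Probe passed: continue with decompose / decide / synthesize.
    Proceed,
}

/// Decide from the scores of the chunks retrieved for the original query.
pub fn probe_gate(probe_scores: &[f32], score_gate: f32) -> ProbeOutcome {
    // Highest score among the probe hits, or None when there are no hits.
    let top = probe_scores.iter().copied().reduce(f32::max);
    match top {
        None => ProbeOutcome::RefuseNoChunks,
        Some(t) if t < score_gate => ProbeOutcome::RefuseScoreGate,
        Some(_) => ProbeOutcome::Proceed,
    }
}
```

Since the decision depends only on the original query's own hits, a high-scoring chunk from an unrelated sub-query can no longer lift the request past the gate.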

The multi-hop safety floor now matches single-pass exactly: multi-hop adds
cross-doc reasoning only when the *original query is already within KB scope*.

Cost: one retrieve (a few ms), no LLM call. Negligible against multi-hop's
LLM-dominated latency.

## Tests

3 new regression pins (`crates/kebab-rag/tests/multi_hop.rs`):

- `multi_hop_below_probe_gate_refuses_before_any_llm_call`: the **direct
  S7 regression pin**. low-score chunk + empty LM script → score_gate
  refusal, 0 LM calls, hops=None. Panics immediately if the fix is reverted.
- `multi_hop_empty_probe_pool_refuses_before_any_llm_call`: on an empty
  retrieve, NoChunks refusal, 0 LM calls.
- `multi_hop_above_probe_gate_proceeds_to_decompose`: on probe pass, the
  full multi-hop flow runs normally (decompose + decide + synth).

The `ScriptedRetriever` in the 7 existing multi-hop tests gets a prepended
*probe-pass entry* and a +1 on the `retriever_handle.calls()` expectation.
Tests that already had two entries, such as test 2 / test 4, get the
prepend too (3 entries).

The meaning of `multi_hop_refuse_no_chunks_preserves_hops_trace` /
`multi_hop_refuse_score_gate_preserves_hops_trace` is narrowed: they now
verify only *decompose-driven* refusal (after the probe passes, a
sub-query retrieve is empty or below-gate). *Probe-driven* refusal has
hops=None (no decompose); the new tests pin that path.

## Verification

- `cargo test -p kebab-rag -j 1`: 10 multi-hop (7 updated + 3 new) + 19
  pipeline + 31 unit + 3 prompt_template + 3 streaming all pass. No
  regressions.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
- Single-crate serial build (16 GB RAM constraint).

## Unchanged

- Wire schema: `Answer.hops` shape unchanged, `refusal_reason` enum unchanged.
- Other dogfood findings (synthesize citation consistency, latency, binary
  path confusion): owned by v0.18.1 or a separate PR. They are listed in
  the "other dogfood findings" section of HOTFIXES.

## Next

After PR-7 merges:
1. Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor bump).
2. Update HANDOFF.md / INDEX.md + the frozen design §3.8 multi-hop sub-section.
3. `gitea-release v0.18.0 --auto-notes`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:02:11 +00:00
5bfea3c28b Merge pull request 'feat(tui): fb-41 PR-6 TUI Ask multi-hop toggle + hop trace summary' (#173) from feat/fb-41-pr-6-tui-multi-hop-toggle into main 2026-05-25 09:30:06 +00:00
b6756f8ce3 chore(tui): address PR #173 round-1 review
Renamed test `spawn_snapshot_multi_hop_into_askopts` →
`ask_state_multi_hop_field_default_false_and_round_trips`.
The old name promised to verify spawn behavior, but the body only checks
the field default + setter round-trip, so the name did not match the
actual intent. The new name matches what is actually verified (field
shape pin).

The doc string now also states clearly that spawn behavior is verified
through a separate path (live dogfood), so the reader can immediately see
the scope of the test.

Verification
- `cargo test -p kebab-tui -j 1 --test ask`: all 42 tests (including 6
  multi-hop) pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:29:36 +00:00
016f380428 feat(tui): fb-41 PR-6 — TUI Ask multi-hop toggle + hop trace summary
The **last component PR** of fb-41 multi-hop RAG (right after the PR-5
merge). It covers the user-facing surface of the TUI Ask panel: F2 toggle,
multi-hop badge, hop count summary in the status panel, and the cheatsheet
entry. Prepares the v0.18.0 cut.

Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-6 section)

## TUI surface

- `crates/kebab-tui/src/app.rs`:
  - `AskState.multi_hop: bool` field + Default false. The toggle state is
    kept in-panel and is orthogonal to conversation history: flipping F2
    mid-conversation preserves turns (only the next turn routes to a
    different pipeline).
- `crates/kebab-tui/src/ask.rs`:
  - `(KeyCode::F(2), _) → s.multi_hop = !s.multi_hop` in `handle_key_ask`.
    Mode-agnostic (a physical function key: works in both Normal and
    Insert, no typing ambiguity). Of the briefing's candidates (F2 vs
    Ctrl-T), F2 was chosen; Ctrl-M was already noted as colliding with
    Enter, and F2 is the cleanest.
  - `AskOpts.multi_hop` in `spawn_ask_worker` snapshots the toggle value
    at spawn time. A later F2 flip applies from the next Enter onward
    (the in-flight turn is unaffected).
  - `render_input` adds the `F2=multi-hop` binding hint to the input pane
    title and a `multi-hop` badge on the prompt row (Success green, only
    when toggled on). The user can always see which pipeline the next
    query will go to.
  - `render_status` adds a `multi-hop: N hops` line to the status panel
    (only when last_answer.hops is Some). When forced_stop occurs, a
    `forced_stop=K` suffix is appended as a clue for depth/pool cap tuning.
- `crates/kebab-tui/src/cheatsheet.rs`:
  - Added an `F2 toggle multi-hop pipeline` entry to the Ask section.
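
The toggle and spawn-time snapshot behavior described above can be sketched without a terminal library. `AskState.multi_hop` and `AskOpts.multi_hop` are named in this commit; the `Key` enum, the `handle_key` function, and returning the options on Enter are simplifications assumed for the example (the real handler matches on `KeyCode::F(2)` and spawns a worker).

```rust
/// Hypothetical stand-in for the terminal key type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Key {
    F(u8),
    Enter,
    Char(char),
}

/// Panel state; `multi_hop` defaults to false.
#[derive(Debug, Default)]
pub struct AskState {
    pub multi_hop: bool,
}

/// Options captured for one turn at submit time.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AskOpts {
    pub multi_hop: bool,
}

/// F2 flips the toggle; Enter snapshots the current toggle into the
/// options for the turn being submitted. Other keys are ignored here.
pub fn handle_key(state: &mut AskState, key: Key) -> Option<AskOpts> {
    match key {
        Key::F(2) => {
            state.multi_hop = !state.multi_hop;
            None
        }
        Key::Enter => Some(AskOpts { multi_hop: state.multi_hop }),
        _ => None,
    }
}
```

Because `AskOpts` is a copy taken at submit time, flipping F2 while a turn is in flight changes only what the next Enter will capture.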

## Unchanged (intentional deferral)

- `InspectTarget::Hop(turn_index)` variant: a PR-6 stretch goal in the
  plan. Exposing per-iter hop trace detail in the Inspect panel is left
  to a separate PR (PR-6b or a v0.18 dogfood follow-up). The core value
  of PR-6 (the user toggles the multi-hop pipeline and sees the hop count
  of the result) is fully covered by the one-line summary in the status
  panel. Inspect entry is a surface multi-hop users need only *rarely*,
  so deferring it keeps the v0.18 cut light.
- prompt_template_version (`rag-multi-hop-v1`): unchanged.
- MCP / CLI surface: owned by PR-4 / PR-5.

## Tests (6 new multi-hop pins in `tests/ask.rs`)

- `f2_toggles_multi_hop_flag_from_insert_mode`: F2 toggles in Insert
  (the fresh_app default mode).
- `f2_toggles_multi_hop_flag_from_normal_mode`: same in Normal; a
  mode-agnostic regression pin.
- `input_pane_shows_multi_hop_badge_when_toggled_on`: when toggled on,
  `multi-hop` appears on the prompt row and the `F2=multi-hop` binding
  hint appears in the title.
- `input_pane_omits_multi_hop_badge_when_toggled_off`: when toggled off,
  the badge is absent from the prompt row (the title hint stays, for
  discoverability).
- `status_panel_summarizes_hops_when_answer_has_trace`: 3-hop trace
  (Decompose + Decide + Synthesize) → `multi-hop: 3 hops` line.
- `status_panel_omits_hops_summary_for_single_pass`: hops=None → no
  summary line in the body (only the title binding hint).
- `spawn_snapshot_multi_hop_into_askopts`: regression pin for the field
  shape of AskState.multi_hop (default false / settable / round-trip).

## Verification

- `cargo test -p kebab-tui -j 1`: 6 new multi-hop + existing ask /
  search / library / mode / cheatsheet / inspect / status_bar all pass
  (42 ask tests + 10 mode + others). No regressions.
- `cargo clippy -p kebab-tui --all-targets -j 1 -- -D warnings` clean.
- Single-crate serial build (16 GB RAM constraint).

## v0.18.0 cut (next step)

- Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor: surface
  expansion + new prompt_template_version `rag-multi-hop-v1`).
- Update HANDOFF.md / HOTFIXES.md / INDEX.md (clean up the fb-41 entry).
- `gitea-release v0.18.0 --auto-notes`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:26:29 +00:00
bf28a1e4d9 Merge pull request 'feat(mcp): fb-41 PR-5 MCP ask multi_hop arg + SKILL.md guidance' (#172) from feat/fb-41-pr-5-mcp-multi-hop-arg into main 2026-05-25 09:09:06 +00:00
24221826ed chore(mcp): address PR #172 round-1 review
Made the error code check in
`ask_tool_routes_multi_hop_true_to_decompose_first` more robust: it now
accepts either of `model_unreachable | timeout`. Even when environment
differences (immediate ECONNREFUSED vs connect timeout) are classified
into different wire codes, the dispatch divergence itself
(schema_version=error.v1 + isError=true vs single-pass answer.v1
grounded=false) is verified the same way.

Verification
- `cargo test -p kebab-mcp -j 1 --test tools_call_ask_multi_hop`: 2 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:08:40 +00:00
8a2f7affa6 feat(mcp): fb-41 PR-5 - MCP ask multi_hop arg + SKILL.md guidance
**PR-5** of fb-41 multi-hop RAG (right after the PR-4 merge). A sister
surface to PR-4's CLI `--multi-hop` flag: an agent (an MCP host such as
Claude Code) can pass the `multi_hop: true` option when calling
`mcp__kebab__ask`.

Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-5 section)

## MCP surface

- `crates/kebab-mcp/src/tools/ask.rs`:
  - Added `AskInput.multi_hop: Option<bool>`. The JsonSchema derive
    reflects it in tools/list automatically, so agent capability
    discovery sees the new field.
  - `handle()` sets `AskOpts.multi_hop = input.multi_hop.unwrap_or(false)`;
    existing callers (field missing / null) stay on single-pass.
- `crates/kebab-mcp/src/lib.rs` (tools/list):
  - One line about multi-hop in the `ask` tool description (decompose →
    retrieve → synthesize, 2-5× LLM cost, per-hop trace on Answer.hops).
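
The backward-compatible default described above comes down to `Option::unwrap_or(false)`. A minimal sketch follows, with the serde / JsonSchema derives omitted so it stays dependency-free; the `query` field and the `to_opts` helper are assumptions for illustration.

```rust
/// Simplified input shape. In the real tool this would carry the
/// serde / JsonSchema derives; `query` is an assumed field name.
#[derive(Debug, Default)]
pub struct AskInput {
    pub query: String,
    /// None models both "field missing" and "null" from older callers.
    pub multi_hop: Option<bool>,
}

#[derive(Debug, PartialEq)]
pub struct AskOpts {
    pub multi_hop: bool,
}

/// Hypothetical helper mirroring the mapping done in `handle()`:
/// anything other than an explicit `true` stays on single-pass.
pub fn to_opts(input: &AskInput) -> AskOpts {
    AskOpts {
        multi_hop: input.multi_hop.unwrap_or(false),
    }
}
```

Modeling the field as `Option<bool>` rather than `bool` is what lets requests written before this change deserialize without modification.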

## SKILL.md guidance

- The `mcp__kebab__ask` section of `integrations/claude-code/kebab/SKILL.md`:
  - Added `multi_hop: false` to the input shape JSON example.
  - Added `hops` (multi-hop only) to the Returns section.
  - New bullet (p9-fb-41): opt-in conditions / cost trade-off / use cases
    (compound questions / prereq chains / cross-doc reasoning) /
    per-hop trace shape of `Answer.hops` / handling of the
    `multi_hop_decompose_failed` refusal.

## Tests (new `tests/tools_call_ask_multi_hop.rs`, 2 Ollama-free pins)

- `ask_tool_routes_multi_hop_true_to_decompose_first`: dispatch
  divergence pin. An invalid LLM endpoint (`http://127.0.0.1:1`,
  request_timeout_secs=2) forces unreachable. multi_hop=true calls
  decompose first → `error.v1` (code=model_unreachable) /
  isError=true. multi_hop=false (single-pass) retrieves first on an
  empty KB → no LLM call → `answer.v1` grounded=false / isError=false.
  The split between the two shapes is evidence that dispatch really
  routes to different paths.
- `ask_input_schema_advertises_multi_hop_field`: the JsonSchema of
  AskInput exposes the `multi_hop` property; a regression pin for MCP
  host capability discovery (the input schema in tools/list).

The AskInput literal in the existing `tools_call_ask.rs` also gains
`multi_hop: None` (a minimal cascade from the added struct field).

## Unchanged

- `prompt_template_version` (`rag-multi-hop-v1`): unchanged.
- TUI surface: owned by PR-6.
- error.v1 mapping: PR-4's enum reservation stays as is (no error_wire
  promotion).

## Verification

- `cargo test -p kebab-mcp -j 1`: 2 new tools_call_ask_multi_hop +
  existing ask / search / bulk_search / fetch / ingest / schema / doctor /
  tools_list / initialize etc. all pass (no regressions).
- `cargo clippy -p kebab-mcp --all-targets -j 1 -- -D warnings` clean.
- Single-crate serial build (16 GB RAM constraint).

## Next PR

- PR-6: multi-hop toggle in the TUI Ask panel (F2 / Ctrl-T) + hop trace
  render + cheatsheet update.
- v0.18.0 cut (after PR-6 merges).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:06:28 +00:00
f28a422f79 Merge pull request 'feat(cli): fb-41 PR-4 CLI --multi-hop flag + answer.v1 / error.v1 wire' (#171) from feat/fb-41-pr-4-cli-multi-hop-flag into main 2026-05-25 08:48:08 +00:00