kebab

Author	SHA1	Message	Date
altair823	11ce7847a1	Merge pull request 'feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps' (#176 ) from feat/fb-41-pr-9a-kebab-nli-crate into main Reviewed-on: #176	2026-05-25 21:34:49 +00:00
altair823	1d88dccf8a	chore(nli): address PR #176 review round 1 - In the lib.rs::NliScores::faithfulness doc, changed `rag.nli_faithfulness_min` → `rag.nli_threshold` (matches the actual config knob name in spec §2.5/§2.6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:25:44 +00:00
altair823	1eb0bbecb3	feat(nli): fb-41 PR-9a — kebab-nli crate skeleton + workspace deps - New crate kebab-nli (trait + impl in the same crate; v0.18 scope = one ONNX adapter). - NliVerifier trait + NliScores struct (XNLI 3-channel: entailment/neutral/contradiction). - Private softmax3 (log-sum-exp safe). - OnnxNliVerifier placeholder (PR-9b adds ONNX inference + model download). - Added to workspace.dependencies: ort 2.0-rc.9, tokenizers 0.21 (default-features=false, onig), hf-hub 0.4, ndarray 0.16. Pre-flight (gate from the PR-9 design contract): - HF Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 model.onnx + tokenizer.json → HTTP/2 302 (HF S3 routing; files exist). - Standalone repro of tokenizers --no-default-features -F onig: SentencePiece mDeBERTa tokenizer.json loads OK (KR 9 tokens / EN 11 tokens encode correctly). - Cargo features decision trace: locked tokenizers = { default-features = false, features = ["onig"] }. Tests: 6 unit (softmax3 normalization + invariance + XNLI logits conversion + faithfulness + new + score stub), all passing. Verification: cargo test -p kebab-nli -j 1 (6/6) + cargo clippy -p kebab-nli --all-targets -j 1 -- -D warnings clean. Workspace: cargo test --workspace -j 1 has 1 pre-existing failure, kebab-mcp::tools_call_ask_multi_hop (same failure on the main baseline, unrelated to PR-9a; flaky because it depends on the ingest fixture/Ollama). Wire impact: none (crate introduction only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:22:38 +00:00
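
The numerically safe three-way softmax named in commit 1eb0bbecb3 could look roughly like the sketch below. It assumes the logits arrive in the order entailment / neutral / contradiction and that `faithfulness` is simply the entailment probability; the log confirms neither, and `scores_from_logits` is a hypothetical helper name, so the real `kebab-nli` code may differ.

```rust
/// Probabilities for the three XNLI classes (field names follow the commit message).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NliScores {
    pub entailment: f32,
    pub neutral: f32,
    pub contradiction: f32,
}

impl NliScores {
    /// Assumption: faithfulness is the entailment probability.
    pub fn faithfulness(&self) -> f32 {
        self.entailment
    }
}

/// Softmax over three logits. Subtracting the maximum first keeps `exp`
/// from overflowing on large logits (the log-sum-exp trick).
fn softmax3(logits: [f32; 3]) -> [f32; 3] {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps = logits.map(|x| (x - max).exp());
    let sum: f32 = exps.iter().sum();
    exps.map(|e| e / sum)
}

/// Hypothetical helper: converts raw logits into `NliScores`.
/// Assumed label order: [entailment, neutral, contradiction].
fn scores_from_logits(logits: [f32; 3]) -> NliScores {
    let [entailment, neutral, contradiction] = softmax3(logits);
    NliScores { entailment, neutral, contradiction }
}

fn main() {
    // Extreme logits would overflow a naive exp(); the shifted version stays finite.
    let s = scores_from_logits([1000.0, 0.0, -1000.0]);
    println!("{s:?} faithfulness={}", s.faithfulness());
}
```

Shifting by the maximum leaves the result mathematically unchanged, which is presumably what the "invariance" unit test in the commit pins.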
altair823	44fbffff26	docs(rag): fb-41 PR-9 spec + plan — NLI verification + v0.18.0 cut The root cause of the dogfood S7 hallucination in fb-41 multi-hop RAG is the LLM-self-judge ceiling. The response is NLI-based post-synthesis verification (mDeBERTa-v3 XNLI, 280 MB ONNX). Deliverables: - docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md (review_round=5, 4 OMC reviewers APPROVE: 1 CRITICAL + 9 MAJOR + 3 MINOR → 1 NIT carry-forward). - docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md (plan_review_round=3, 4 OMC reviewers APPROVE: 15 issues → 0 actionable). 5 sub-PR (PR-9a~9d) + cut PR. Effort 21-31h / wall time 28-44h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:22:20 +00:00
altair823	63aece3ea1	Merge pull request 'fix(rag): fb-41 PR-8 multi-hop synthesize safety in depth (pool 15 + self-check rule)' (#175 ) from feat/fb-41-pr-8-multi-hop-synthesize-safety into main	2026-05-25 12:51:46 +00:00
altair823	28a8bbeace	chore(rag): address PR #175 review round 1 Added to the fb-41 entry in HOTFIXES.md: a post-PR-7 dogfood retest + PR-8 partial mitigation sub-section, a PR-9 NLI plan anchor, and an updated user-impact section. Adjusted the doc reference in config.rs to point at the exact entry sub-section, resolving a dangling reference. Verification - `cargo test -p kebab-config -j 1`: all tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:51:15 +00:00
altair823	52a97303dc	fix(rag): fb-41 PR-8 — multi-hop synthesize safety in depth (pool 15 + self-check rule) Layered defense for fb-41 multi-hop RAG before the v0.18 cut: additional safety on top of PR-7's pre-decompose probe gate. With the PR-7 fix alone, hallucination still occurs when the hybrid-mode RRF top_score passes the gate (the caffeine query from dogfooding S7), so the synthesize stage itself needs hardening. Important: this PR alone does not fully block the S7 hallucination (a prompt-following limit of gemma3:4b, confirmed by an additional dogfood S7 retest). The real fix is PR-9 (NLI-based post-synthesis verification). PR-8 is a partial mitigation + safety in depth in the meantime: a 4× latency improvement (614s → 158s) + a prompt rule for future larger LLMs. Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md Plan: /build/cache/dogfood-v018/results/PR-9-DESIGN.md (to be promoted to spec/plan after the user's decision) ## Changes - `crates/kebab-config/src/lib.rs`: - `RagCfg::multi_hop_max_pool_chunks` default 30 → 15. - rationale doc: measurements show gemma3:4b loses the citation rule on a 30-chunk large prompt. - Updated 2 unit tests (`default_` rename + `legacy_` assert). - `crates/kebab-rag/src/pipeline.rs`: - Added a pre-answer self-check rule to `MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT`: "If the key entity of [original question] (proper noun, chemical formula, numeric unit, code name, abbreviation) does not appear literally in the [evidence] text, do not synthesize an answer from another entity's information; answer '근거가 부족하다' (the evidence is insufficient)". An example (caffeine + Adam optimizer chunk) is also spelled out. ## Dogfooding results (retest with PR-7 + PR-8) \| query \| path \| grounded \| latency \| answer \| \|---\|---\|---\|---\|---\| \| caffeine formula \| single-pass \| false (LlmSelfJudge) \| 30s \| "근거가 부족하다" ✓ \| \| caffeine formula \| multi-hop pre-fix \| true ✗ \| 141s \| hallucination \| \| caffeine formula \| multi-hop PR-7 \| true ✗ \| 143s \| hallucination (probe gate top_score 0.5 > 0.30) \| \| caffeine formula \| multi-hop PR-8 \| true ✗ \| 158s \| hallucination (the LLM ignores the new rule); latency improved 4× \| PR-8 partial wins: - pool 30→15 shrinks the synthesize prompt → latency 614s → 158s. - The prompt rule gains value on future larger LLMs (gemma2:9b, qwen2.5:7b, etc.). 
PR-8 limitations: - Prompt-following limit of gemma3:4b: it ignores even a strong rule and cites the text of another entity's chunk (Adam optimizer formula) as the source of the caffeine formula. - The ceiling of LLM-self-judge-based safety. ## The real fix → PR-9 (separate PR) A survey of academic / industry standards (Self-RAG, CRAG, Auto-GDA, MedTrust-RAG) points to deterministic post-synthesis verification as the right path. NLI-based groundedness check: the mDeBERTa-v3-base-xnli (280 MB multilingual) ONNX model checks entailment for (premise=packed_chunks, hypothesis=answer). Refuse if score < 0.5. Layered defense on top of PR-8. ## Verification - `cargo test -p kebab-config -p kebab-rag -j 1`: all tests pass (2 config default tests updated, rag tests unaffected). - `cargo clippy -p kebab-config -p kebab-rag --all-targets -j 1 -- -D warnings` clean. - Single-crate serial build (16 GB RAM constraint). - S7 dogfood retest: hallucination persists (stated honestly in the PR body). ## Unchanged - Wire schema: additive (only the config knob default changed). - PR-7 probe gate: works as before (when the gate passes, PR-8 is the additional safety layer). - Other dogfooding P1 items (citation consistency, binary path): separate PR. ## Next - PR-9a/b/c: NLI-based post-synthesis verification, the real fix. - Re-verify dogfood S7 after PR-9 merges (expected: refuse + nli_score < 0.5). - v0.18.0 cut. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:44:31 +00:00
altair823	71fb2cbcb3	Merge pull request 'fix(rag): fb-41 PR-7 multi-hop pre-decompose score-gate (S7 hallucination regression pin)' (#174 ) from feat/fb-41-pr-7-multi-hop-score-gate-fix into main	2026-05-25 12:05:23 +00:00
altair823	85855ef596	chore(rag): address PR #174 review round 1 Documented in code that `ask_multi_hop` intentionally throws away probe_hits after the gate check. The doc gives the invariant-clarity rationale for not reusing them as the pool's initial value, plus a hint to revisit this if retrieve cost becomes the multi-hop bottleneck. Verification - `cargo test -p kebab-rag -j 1 --test multi_hop`: all 10 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:04:53 +00:00
altair823	da25ce330b	fix(rag): fb-41 PR-7 — multi-hop pre-decompose score-gate (S7 hallucination regression pin) Safety regression fix found while dogfooding fb-41 multi-hop RAG before the v0.18 cut. For detailed dogfooding results, see the 2026-05-25 fb-41 pre-v0.18 entry in `tasks/HOTFIXES.md` + `/build/cache/dogfood-v018/results/SUMMARY.md`. ## Problem (S7) Query: `What is the chemical formula of caffeine?` (a fact not in the KB). - Single-pass `kebab ask`: the retrieve top score is below the default `rag.score_gate = 0.30` → `refuse_score_gate` → safe refusal. - Multi-hop `kebab ask --multi-hop`: `grounded = true`, body `"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` ("The chemical formula of caffeine is C₉H₁₅N₃O", a hallucination; the actual formula is C₈H₁₀N₄O₂) + `[#6]` cites the Adam optimizer chunk's `g_t = ∂L/∂θ_i` text (triggered by visual matching of short structured tokens). Cause: the score-gate check in `ask_multi_hop` looked only at the pool's top_score. The multi-hop pool is the union of 5 sub-queries, so if one sub-query's top score is above the gate, the gate passes even when the other chunks are unrelated to the original query, synthesis runs, and the LLM hallucinates. ## Fix Added a pre-decompose probe at the `ask_multi_hop` entry: 1. Retrieve once with the original query (0 LLM calls, ~ms). 2. probe empty → `refuse_no_chunks(None)` (no decompose, hops=None). 3. probe top_score < gate → `refuse_score_gate(None)` (no decompose). 4. probe pass → existing decompose / decide / synthesize flow unchanged. The multi-hop safety floor now matches single-pass exactly: multi-hop adds cross-doc reasoning only when the original query is already within KB scope. Cost: one retrieve (a few ms), no LLM call; negligible next to multi-hop's LLM-dominated latency. ## Tests 3 new regression pins (`crates/kebab-rag/tests/multi_hop.rs`): - `multi_hop_below_probe_gate_refuses_before_any_llm_call`: direct S7 regression pin. low-score chunk + empty LM script → score_gate refusal, 0 LM calls, hops=None. Panics immediately if the fix is reverted. - `multi_hop_empty_probe_pool_refuses_before_any_llm_call`: NoChunks refusal on empty retrieve, 0 LM calls. - `multi_hop_above_probe_gate_proceeds_to_decompose`: the full multi-hop flow works on probe pass (decompose + decide + synth). For the 7 existing multi-hop tests, prepended a probe-pass entry to `ScriptedRetriever` + bumped the `retriever_handle.calls()` expectation by 1. 
Places that already had two entries, like test 2 / test 4, also get the prepend (3 entries). Narrowed the meaning of `multi_hop_refuse_no_chunks_preserves_hops_trace` / `multi_hop_refuse_score_gate_preserves_hops_trace`: they now verify only decompose-driven refusal (sub-query retrieve is empty or below-gate after probe pass). Probe-driven refusal has hops=None (no decompose); the new tests pin that path. ## Verification - `cargo test -p kebab-rag -j 1`: 10 multi-hop (7 updated + 3 new) + 19 pipeline + 31 unit + 3 prompt_template + 3 streaming all pass. No regressions. - `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean. - Single-crate serial build (16 GB RAM constraint). ## Unchanged - Wire schema: `Answer.hops` shape unchanged, `refusal_reason` enum unchanged. - Other dogfooding findings (synthesize citation consistency, latency, binary path confusion): the responsibility of v0.18.1 or a separate PR. Listed in the "other dogfooding findings" section of HOTFIXES. ## Next After PR-7 merges: 1. Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor bump). 2. Update HANDOFF.md / INDEX.md + frozen design §3.8 multi-hop sub-section. 3. `gitea-release v0.18.0 --auto-notes`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:02:11 +00:00
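
The four-step probe described in commit da25ce330b can be sketched as a pure decision function. This is a minimal illustration, not the actual `ask_multi_hop` code: `ProbeOutcome` and `probe_gate` are hypothetical names, and the real pipeline returns refusal answers rather than an enum.

```rust
/// What the pipeline does after probing with the ORIGINAL query.
/// Hypothetical type; variant names mirror the refusals in the commit message.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    /// Probe returned nothing: refuse without decomposing (hops = None).
    RefuseNoChunks,
    /// Best probe score is under the gate: refuse without decomposing.
    RefuseScoreGate,
    /// Probe passed: continue with decompose / decide / synthesize.
    Proceed,
}

/// `probe_scores` are the retrieval scores for the original query, in any order.
fn probe_gate(probe_scores: &[f32], score_gate: f32) -> ProbeOutcome {
    // The highest score decides; `reduce` yields None for an empty probe.
    let top = probe_scores.iter().copied().reduce(f32::max);
    match top {
        None => ProbeOutcome::RefuseNoChunks,
        Some(t) if t < score_gate => ProbeOutcome::RefuseScoreGate,
        Some(_) => ProbeOutcome::Proceed,
    }
}

fn main() {
    // 0.30 is the default `rag.score_gate` quoted in the commit.
    println!("{:?}", probe_gate(&[], 0.30));
    println!("{:?}", probe_gate(&[0.10, 0.25], 0.30));
    println!("{:?}", probe_gate(&[0.50, 0.12], 0.30));
}
```

Because the gate looks only at the original query's scores, a high score from a single sub-query can no longer carry an out-of-scope question past the check.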
altair823	5bfea3c28b	Merge pull request 'feat(tui): fb-41 PR-6 TUI Ask multi-hop toggle + hop trace summary' (#173 ) from feat/fb-41-pr-6-tui-multi-hop-toggle into main	2026-05-25 09:30:06 +00:00
altair823	b6756f8ce3	chore(tui): address PR #173 review round 1 Renamed test `spawn_snapshot_multi_hop_into_askopts` → `ask_state_multi_hop_field_default_false_and_round_trips`. The old name promised to verify spawn behavior, but the body only checks the field default + setter round-trip, a mismatch between name and intent. The new name matches what is actually verified (field shape pin). The doc string also now states that spawn behavior is verified by a separate path (live dogfood), so the reader sees the test's scope at once. Verification - `cargo test -p kebab-tui -j 1 --test ask`: all 42 tests (including 6 multi-hop) pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:29:36 +00:00
altair823	016f380428	feat(tui): fb-41 PR-6 — TUI Ask multi-hop toggle + hop trace summary The last component PR of fb-41 multi-hop RAG (right after PR-5 merged). The user-facing surface of the TUI Ask panel: F2 toggle, multi-hop badge, hop count summary in the status panel, and a cheatsheet entry. Prepares the v0.18.0 cut. Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-6 section) ## TUI surface - `crates/kebab-tui/src/app.rs`: - `AskState.multi_hop: bool` field + Default false. The user's toggle state is kept in the panel and is orthogonal to conversation history: flipping F2 mid-conversation preserves turns (only the next turn routes to a different pipeline). - `crates/kebab-tui/src/ask.rs`: - `(KeyCode::F(2), _) → s.multi_hop = !s.multi_hop` in `handle_key_ask`. Mode-agnostic (a physical function key works in both Normal and Insert, with no typing ambiguity). Of the briefing's candidates (F2 vs Ctrl-T), F2 was chosen; Ctrl-M was already noted to collide with Enter, and F2 is the cleanest. - `AskOpts.multi_hop` in `spawn_ask_worker` snapshots the toggle value at spawn time. A later F2 flip applies from the next Enter (in-flight turn unaffected). - `render_input` adds the `F2=multi-hop` binding hint to the input pane title + a `multi-hop` badge on the prompt row (Success green, only when toggled on). The user can always see which pipeline the next query will use. - `render_status` adds a `multi-hop: N hops` line to the status panel (only when last_answer.hops is Some). On forced_stop, a `forced_stop=K` suffix is appended as a clue for depth/pool cap tuning. - `crates/kebab-tui/src/cheatsheet.rs`: - Added an `F2 toggle multi-hop pipeline` entry to the Ask section. ## Unchanged (intentional deferral) - `InspectTarget::Hop(turn_index)` variant: the plan's PR-6 stretch goal. Exposing per-iter hop trace detail in the Inspect panel is a separate PR (PR-6b or a v0.18 dogfood follow-up). PR-6's core value (the user toggles the multi-hop pipeline and sees the result's hop count) is 100% covered by the one-line summary in the status panel. Inspect entry is a surface multi-hop users rarely need, so it is deferred to keep the v0.18 cut light. - prompt_template_version (`rag-multi-hop-v1`): unchanged. - MCP / CLI surface: the responsibility of PR-4 / PR-5. 
## Tests (6 new multi-hop pins in `tests/ask.rs`) - `f2_toggles_multi_hop_flag_from_insert_mode`: F2 toggle in Insert (fresh_app default mode). - `f2_toggles_multi_hop_flag_from_normal_mode`: same in Normal; mode-agnostic regression pin. - `input_pane_shows_multi_hop_badge_when_toggled_on`: when toggled on, `multi-hop` appears on the prompt row + the `F2=multi-hop` binding hint appears in the title. - `input_pane_omits_multi_hop_badge_when_toggled_off`: when toggled off, the prompt row has no badge (the title hint stays for discoverability). - `status_panel_summarizes_hops_when_answer_has_trace`: 3-hop trace (Decompose + Decide + Synthesize) → `multi-hop: 3 hops` line. - `status_panel_omits_hops_summary_for_single_pass`: hops=None → no summary line in the body (only the title binding hint). - `spawn_snapshot_multi_hop_into_askopts`: regression pin for the AskState.multi_hop field shape (default false / settable / round-trip). ## Verification - `cargo test -p kebab-tui -j 1`: 6 new multi-hop + existing ask / search / library / mode / cheatsheet / inspect / status_bar all pass (42 ask tests + 10 mode + others). No regressions. - `cargo clippy -p kebab-tui --all-targets -j 1 -- -D warnings` clean. - Single-crate serial build (16 GB RAM constraint). ## v0.18.0 cut (next step) - Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor: surface expansion + new prompt_template_version `rag-multi-hop-v1`). - Update HANDOFF.md / HOTFIXES.md / INDEX.md (tidy the fb-41 entry). - `gitea-release v0.18.0 --auto-notes`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:26:29 +00:00
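
The toggle-then-snapshot behavior described in commit 016f380428 can be illustrated with a small stand-alone model. The `Key` enum and `handle_key` function below are hypothetical stand-ins (the real handler is `handle_key_ask` and the worker is spawned by `spawn_ask_worker`); only the `AskState.multi_hop` and `AskOpts.multi_hop` field names come from the commit.

```rust
/// Panel state; `multi_hop` defaults to false as in the commit.
#[derive(Default)]
struct AskState {
    multi_hop: bool,
}

/// Options captured for one turn at spawn time.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AskOpts {
    multi_hop: bool,
}

/// Hypothetical, simplified key type standing in for the terminal key code.
enum Key {
    F(u8),
    Enter,
}

/// Returns `Some(opts)` when Enter spawns a turn, `None` otherwise.
fn handle_key(state: &mut AskState, key: Key) -> Option<AskOpts> {
    match key {
        // F2 only flips the flag; it never touches a turn already in flight.
        Key::F(2) => {
            state.multi_hop = !state.multi_hop;
            None
        }
        // Enter snapshots the current flag into the options for this turn.
        Key::Enter => Some(AskOpts { multi_hop: state.multi_hop }),
        Key::F(_) => None,
    }
}

fn main() {
    let mut state = AskState::default();
    handle_key(&mut state, Key::F(2));
    let in_flight = handle_key(&mut state, Key::Enter).unwrap();
    handle_key(&mut state, Key::F(2)); // flipped again while the turn runs
    println!("in-flight turn: {in_flight:?}, next turn multi_hop: {}", state.multi_hop);
}
```

Copying the flag by value at spawn time is what keeps a mid-turn F2 press from changing the pipeline of a query that is already running.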
altair823	bf28a1e4d9	Merge pull request 'feat(mcp): fb-41 PR-5 MCP ask multi_hop arg + SKILL.md guidance' (#172 ) from feat/fb-41-pr-5-mcp-multi-hop-arg into main	2026-05-25 09:09:06 +00:00
altair823	24221826ed	chore(mcp): address PR #172 review round 1 Made the error code check in `ask_tool_routes_multi_hop_true_to_decompose_first` more robust: it accepts both `model_unreachable \| timeout`. Even if environment differences (immediate ECONNREFUSED vs connect timeout) map to different wire codes, the dispatch divergence itself (schema_version=error.v1 + isError=true vs single-pass answer.v1 grounded=false) is verified the same way. Verification - `cargo test -p kebab-mcp -j 1 --test tools_call_ask_multi_hop`: 2 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:08:40 +00:00
altair823	8a2f7affa6	feat(mcp): fb-41 PR-5 — MCP ask multi_hop arg + SKILL.md guidance PR-5 of fb-41 multi-hop RAG (right after PR-4 merged). Sister surface to PR-4's CLI `--multi-hop` flag: an agent (an MCP host such as Claude Code) can pass `multi_hop: true` when calling `mcp__kebab__ask`. Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-5 section) ## MCP surface - `crates/kebab-mcp/src/tools/ask.rs`: - Added `AskInput.multi_hop: Option<bool>`. The JsonSchema derive reflects it in tools/list automatically, so agent capability discovery sees the new field. - `handle()` sets `AskOpts.multi_hop = input.multi_hop.unwrap_or(false)`; existing callers (field missing / null) stay single-pass. - `crates/kebab-mcp/src/lib.rs` (tools/list): - One multi-hop line in the `ask` tool description (decompose → retrieve → synthesize, 2-5× LLM cost, per-hop trace on Answer.hops). ## SKILL.md guidance - `mcp__kebab__ask` section of `integrations/claude-code/kebab/SKILL.md`: - Added `multi_hop: false` to the input shape JSON example. - Added `hops` (multi-hop only) to the Returns section. - New bullet (p9-fb-41): opt-in conditions / cost trade-off / use cases (compound questions / prereq chains / cross-doc reasoning) / per-hop trace shape of `Answer.hops` / handling of the `multi_hop_decompose_failed` refusal. ## Tests (new `tests/tools_call_ask_multi_hop.rs`, 2 Ollama-free pins) - `ask_tool_routes_multi_hop_true_to_decompose_first`: dispatch divergence pin. Forces unreachable with an invalid LLM endpoint (`http://127.0.0.1:1`, request_timeout_secs=2). multi_hop=true calls decompose first → `error.v1` (code=model_unreachable) / isError=true. multi_hop=false (single-pass) retrieves first on an empty KB → no LLM call → `answer.v1` grounded=false / isError=false. The divergence between the two shapes is evidence that dispatch really routes to different paths. - `ask_input_schema_advertises_multi_hop_field`: the AskInput JsonSchema exposes the `multi_hop` property; regression pin for MCP host capability discovery (the tools/list input schema). The AskInput literal in the existing `tools_call_ask.rs` also gains `multi_hop: None` (minimal cascade from the added struct field). 
## Unchanged - `prompt_template_version` (`rag-multi-hop-v1`): unchanged. - TUI surface: PR-6's responsibility. - error.v1 mapping: PR-4's enum reservation as-is (no error_wire promotion). ## Verification - `cargo test -p kebab-mcp -j 1`: the 2 new tools_call_ask_multi_hop tests + existing ask / search / bulk_search / fetch / ingest / schema / doctor / tools_list / initialize etc. all pass (no regressions). - `cargo clippy -p kebab-mcp --all-targets -j 1 -- -D warnings` clean. - Single-crate serial build (16 GB RAM constraint). ## Next PR - PR-6: TUI Ask panel multi-hop toggle (F2 / Ctrl-T) + hop trace render + cheatsheet update. - v0.18.0 cut (after PR-6 merges). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:06:28 +00:00
altair823	f28a422f79	Merge pull request 'feat(cli): fb-41 PR-4 CLI --multi-hop flag + answer.v1 / error.v1 wire' (#171 ) from feat/fb-41-pr-4-cli-multi-hop-flag into main	2026-05-25 08:48:08 +00:00
altair823	c56242d04f	chore(cli): address PR #171 review round 1 Corrected the PR number in the `refusal_reason` description of `answer.schema.json`: `multi_hop_decompose_failed` was introduced in PR-2 (#167, RefusalReason variant + ask_multi_hop decompose-failure branch). PR-3a (#168) only added the `Answer.hops` field + RagCfg knobs and is unrelated to the refusal variant. Verification - `cargo test -p kebab-cli -j 1 --test wire_ask_multi_hop`: all 4 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 08:47:35 +00:00
altair823	17c48a0ee6	feat(cli): fb-41 PR-4 — CLI --multi-hop flag + answer.v1 / error.v1 wire extension PR-4 of fb-41 multi-hop RAG (user-facing CLI surface + JSON Schema extension on top of PR-3b-ii's ScriptedLm + tests). Exposes the multi-hop pipeline from PR-3b-i / PR-3b-ii to users as `kebab ask --multi-hop`. Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-4 section) ## CLI surface - `kebab ask --multi-hop <query>`: new flag (default false). Passed through `AskOpts.multi_hop`; both the stream and non-stream call sites are updated. - Orthogonal to existing flags such as `--show-citations` / `--hide-citations` / `--stream` / `--session`. - In `--json` mode the `Answer.hops` array is exposed on both the multi-hop happy path and the refusal-with-partial-trace path (wiring from PR-3b-i + PR-3b-ii). ## Wire schema extension - `docs/wire-schema/v1/answer.schema.json`: - New `hops: array \| null` field (optional, additive). Added `HopRecord` to `$defs`: 6 fields `iter` / `kind` (decompose\|decide\|synthesize) / `sub_queries` / `context_chunks_added` / `forced_stop` / `llm_call_ms` + per-field doc. - The `refusal_reason` field is now declared as `anyOf [enum, null]` with 6 variants (`score_gate`, `llm_self_judge`, `no_index`, `no_chunks`, `llm_stream_aborted`, `multi_hop_decompose_failed`). The previous schema declared only `type: string\|null`; spelling out the enum strengthens strict validation for agents / consumers (additive: all existing producer values are inside the enum). - `$id` / `schema_version` unchanged: additive minor. - `docs/wire-schema/v1/error.schema.json`: - Added `multi_hop_decompose_failed` to the `code` enum. This is a forward-looking enum extension: today RefusalReason is exposed only through `Answer.refusal_reason` (stdout) and never goes through the `error.v1` (stderr) path. The enum value is reserved in advance so a future PR can trigger it once a fatal promotion policy is decided. - Added a `multi_hop_decompose_failed: {}` note to the per-code guidance in details.description, marking the reserved status. ## Tests - New `crates/kebab-cli/tests/wire_ask_multi_hop.rs` (4 Ollama-free pins): - `cli_ask_help_advertises_multi_hop_flag`: clap-level smoke; checks that `--multi-hop` appears in `kebab ask --help` output. 
- `answer_schema_declares_hops_property_with_hop_record_defs`: regression pin for the `hops` property + the 3 `kind` enum variants (decompose/decide/synthesize) of `$defs.HopRecord`. - `answer_schema_refusal_reason_enum_includes_multi_hop_decompose_failed`: all 6 variants are in the enum; the existing 5 are pinned too (regression guard). - `error_schema_code_enum_includes_multi_hop_decompose_failed`: pins the new code enum extension + preservation of existing codes (config_invalid / not_indexed / ...). Live Ollama verification of end-to-end multi-hop ask is left to a follow-up `#[ignore]` test (same pattern as `wire_ask_stale.rs`). PR-4's scope is clap + schema consistency only. ## Unchanged - `crates/kebab-app/src/error_wire.rs`: the plan's "error_wire mapping" item has no trigger because RefusalReason is currently exposed only via `Answer.refusal_reason` (it does not pass through the anyhow chain). The enum reservation is enough; the mapping code is omitted to avoid dead code. If a fatal-promotion policy (refusal → error.v1) is decided later, it will be split out as PR-4b. - `prompt_template_version`: `rag-multi-hop-v1` unchanged. - TUI / MCP surface: in PR-5 / PR-6. ## Verification - `cargo test -p kebab-cli -j 1`: all tests pass (4 new wire_ask_multi_hop + existing ask / search / schema / ingest / mcp / reset etc.). - `cargo clippy -p kebab-cli --all-targets -j 1 -- -D warnings` clean. - Single-crate serial build (16 GB RAM constraint). ## Next PR - PR-5: `multi_hop: bool` argument for the MCP `ask` tool + update the ask section of `integrations/claude-code/kebab/SKILL.md`. - PR-6: TUI Ask panel multi-hop toggle (F2 / Ctrl-T) + hop trace render. - v0.18.0 cut (after PR-6 merges): `Cargo.toml` 0.17.2 → 0.18.0 + HANDOFF / HOTFIXES / INDEX updates + gitea-release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 08:45:01 +00:00
altair823	64a009314c	Merge pull request 'feat(rag): fb-41 PR-3b-ii ScriptedLm + multi-hop tests + refusal hop trace' (#170 ) from feat/fb-41-pr-3b-ii-scripted-lm-tests into main	2026-05-25 08:25:42 +00:00
altair823	ddfe7ba099	chore(rag): address PR #170 review round 2 Renamed test 7's `i32_below_gate_chunk` helper → `seed_low_score_chunk` and widened its return shape to a `(chunk_id, doc_id)` tuple. This fixes the readability problem of the `i32` prefix clashing with the Rust integer type, and makes the id pair a single source of truth so callers do not recompute `id32("d_low")`. Verification - `cargo test -p kebab-rag -j 1 --test multi_hop`: all 7 pass. - `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 08:24:36 +00:00
altair823	104363a0db	chore(rag): address PR #170 review round 1 (A) ScriptedLm doc: replaced the `Arc<Vec<String>>` notation with the actual implementation (`Vec<String>` + `AtomicUsize`, wrapped externally as `Arc::new(ScriptedLm::new(...))`). (B) Removed the mention of a nonexistent `with_` builder from the ScriptedLm::new doc. (C) Added 2 regression pins for hops preservation on the refuse path (`tests/multi_hop.rs`): - `multi_hop_refuse_no_chunks_preserves_hops_trace`: empty pool → `refuse_no_chunks(Some(hops))` → Answer.hops = Some([Decompose, Decide]). - `multi_hop_refuse_score_gate_preserves_hops_trace`: top score 0.10 < 0.30 gate → `refuse_score_gate(Some(hops))` → same shape. If the refuse_ widening + ask_multi_hop forwarding wiring is reverted, both tests catch the regression. (D) Removed test 5's redundant `assert_ne!(.., Some(MultiHopDecomposeFailed))`, which `assert_eq!(.., None)` already implies; folded the intent into the message. Verification - `cargo test -p kebab-rag -j 1 --test multi_hop`: all 7 (5+2) pass. - `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 08:22:58 +00:00
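
The `Vec<String>` + `AtomicUsize` shape that commit 104363a0db describes for `ScriptedLm` might look like the sketch below. The field names, the `generate` method, and the `calls` counter are assumptions made for illustration; the real helper implements the project's language-model trait, which this log does not show.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Test double that returns a different canned response on each call.
struct ScriptedLm {
    responses: Vec<String>,
    /// Index of the next response; atomic so `&self` calls work across threads.
    next: AtomicUsize,
}

impl ScriptedLm {
    fn new(responses: Vec<&str>) -> Self {
        Self {
            responses: responses.into_iter().map(String::from).collect(),
            next: AtomicUsize::new(0),
        }
    }

    /// Hypothetical method name. Returns `None` once the script is exhausted.
    fn generate(&self, _prompt: &str) -> Option<String> {
        let i = self.next.fetch_add(1, Ordering::SeqCst);
        self.responses.get(i).cloned()
    }

    /// Number of calls made so far (including calls past the end of the script).
    fn calls(&self) -> usize {
        self.next.load(Ordering::SeqCst)
    }
}

fn main() {
    // One scripted response per LLM call: decompose, decide (stop), synthesize.
    let lm = Arc::new(ScriptedLm::new(vec![r#"["q1", "q2"]"#, "[]", "answer [#1]"]));
    for stage in ["decompose", "decide", "synthesize"] {
        println!("{stage}: {:?}", lm.generate(stage));
    }
    println!("calls = {}", lm.calls());
}
```

Keeping the counter inside the struct and wrapping the whole value in `Arc` from the outside lets the test keep one handle for assertions while the pipeline owns another.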
altair823	6188a50c1c	feat(rag): fb-41 PR-3b-ii — ScriptedLm + 5 multi-hop tests + refusal hop trace + carry-over The second PR of the PR-3b split, on top of PR-3b-i's dynamic decide loop: 1. ScriptedLm + ScriptedRetriever helpers (kebab-rag tests/common/mod.rs) return a different response per call. This lets multi-stage multi-hop scenarios, which distinguish each LLM call of decompose / decide×N / synthesize, be exercised mock-only. They take `Vec<&str>` / `Vec<Vec<SearchHit>>` and emit in call-sequence order. Send + Sync. 2. 5 multi-hop integration tests (new kebab-rag tests/multi_hop.rs) - decide_stop_triggers_synthesize: decide [] → synthesize immediately - decide_continue_adds_more_chunks: decide ["q2"] → iter 2 retrieve + pool grows - max_depth_force_stops: depth cap → forced_stop + decide LLM call skipped - pool_chunks_dedup_by_chunk_id: the same chunk_id from two sub-queries is counted once - decide_parse_failure_falls_through_to_synthesize: parse fail = graceful synthesize (not a refusal, spec §9) 3. refuse_ helpers preserve the hops trace (round 1 carry-over) Added a `hops: Option<Vec<HopRecord>>` argument to the refuse_no_chunks / refuse_score_gate signatures. On a score-gate / no-chunks refusal in ask_multi_hop, the accumulated hops are kept in Answer.hops. Single-pass ask passes None, so the wire is unchanged (skip_serializing_if). 4. HopRecord doc improvements (round 1 carry-over) Spelled out the per-kind meaning of sub_queries (Decompose=initial / Decide=next-iter or empty=stop / Synthesize=always empty). Documented the ambiguity of llm_call_ms=0 (no call vs 0ms call). 5. MULTI_HOP_MAX_SUB_QUERIES_DEFAULT → _HARD_CAP rename (round 1 carry-over) Clarifies the const's intent: the config knob `multi_hop_max_sub_queries_per_iter` (5, prompt-side soft hint) is separated from the const (10, parse-side hard ceiling). Docs for the two layers' responsibilities are synced. Tests renamed too. 6. Decide guard simplification + preview budget doc (round 1 carry-over) Documented the post-condition of parse_decompose_response (Some guarantees non-empty). Simplified the defensive `Some(qs) if !qs.is_empty()` → `decide_result.unwrap_or_default()`. Documented the intent of the decide preview's snippet-only path (full chunk text is not fetched). 
Verification - `cargo test -p kebab-rag -j 1`: 31 unit + 19 pipeline + 5 multi_hop + 3 prompt_template + 3 streaming all pass. - `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean. Spec / plan - design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md - plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-3b section) Next step = PR-4 (CLI --multi-hop + wire schema + error_wire). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 08:17:37 +00:00
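
The pool behavior pinned by `pool_chunks_dedup_by_chunk_id` (a chunk returned by two sub-queries enters the pool once, and the pool stops growing at `max_pool_chunks`) can be sketched as follows. `Hit` and `extend_pool` are hypothetical names for illustration; the actual pipeline works on `SearchHit` values and may order or cap differently.

```rust
use std::collections::HashSet;

/// Hypothetical, minimal stand-in for a retrieval hit.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    chunk_id: String,
    score: f32,
}

/// Adds unseen hits to `pool`, skipping chunk ids already in `seen`.
/// Returns `(added, pool_cap_hit)`, where `pool_cap_hit` is true once the
/// pool holds `max_pool_chunks` entries.
fn extend_pool(
    pool: &mut Vec<Hit>,
    seen: &mut HashSet<String>,
    hits: Vec<Hit>,
    max_pool_chunks: usize,
) -> (usize, bool) {
    let mut added = 0;
    for hit in hits {
        if pool.len() >= max_pool_chunks {
            return (added, true);
        }
        // `insert` returns false when the chunk id was already present.
        if seen.insert(hit.chunk_id.clone()) {
            pool.push(hit);
            added += 1;
        }
    }
    (added, pool.len() >= max_pool_chunks)
}

fn hit(id: &str, score: f32) -> Hit {
    Hit { chunk_id: id.to_string(), score }
}

fn main() {
    let (mut pool, mut seen) = (Vec::new(), HashSet::new());
    // Sub-query 1 returns a and b; sub-query 2 returns b again plus c.
    let first = extend_pool(&mut pool, &mut seen, vec![hit("a", 0.6), hit("b", 0.5)], 15);
    let second = extend_pool(&mut pool, &mut seen, vec![hit("b", 0.7), hit("c", 0.4)], 15);
    println!("first={first:?} second={second:?} pool_len={}", pool.len());
}
```

The `added` count corresponds to what the log calls `context_chunks_added`, and the boolean to `pool_cap_hit`, under the assumption that both are computed per iteration.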
altair823	94e6146013	Merge pull request 'feat(rag): fb-41 PR-3b-i dynamic decide loop + helpers' (#169 ) from feat/fb-41-pr-3b-decide-loop into main	2026-05-25 07:32:25 +00:00
altair823	12c7dc9efb	feat(rag): fb-41 PR-3b-i — dynamic decide loop + helpers + format! named arg The first PR of the PR-3b split. ask_multi_hop goes from fixed depth=2 → dynamic N-hop. The ScriptedLm helper + 5+ integration tests (happy-path integration checks) are split into PR-3b-ii. This PR's regression pin = the 2 existing PR-2 integration tests passing (decompose garbage refusal + multi_hop=false single-pass keep). - `RagPipeline::multi_hop_decompose` signature changed to `Result<(Option<Vec<String>>, u32)>` (parsed result + LLM call latency_ms). The caller (`ask_multi_hop`) stamps `llm_call_ms` on the hop trace. - New `RagPipeline::multi_hop_decide` helper. decide LLM call → `parse_decompose_response` returns `Option<Vec<String>>`. None or an empty array is the stop signal (graceful degrade, not a refusal). - New `MULTI_HOP_DECIDE_SYSTEM_PROMPT` const. - Removed the `MULTI_HOP_DECOMPOSE_USER_TEMPLATE` const + switched to `format!` named args (PR-2 round 1 carry-over fix). Substitution is checked at compile time, and a `{max_sub_queries}` literal inside the user query is no longer mis-replaced. - Rewrote the §1 (Decompose) + §2 (Retrieve) region of `ask_multi_hop` as a dynamic loop: - iter 0 = decompose, append HopRecord (kind=Decompose). - iter 1..=max_depth = retrieve current_sub_queries → pool dedup → decide LLM call (skipped on forced_stop / pool_cap_hit). Append HopRecord (kind=Decide, sub_queries=new_sub_queries, context_chunks_added, forced_stop, llm_call_ms). - When `max_pool_chunks` is reached, `pool_cap_hit = true` → that iter's HopRecord has `forced_stop = true` + the decide LLM call is skipped. - When depth is reached (`iter >= max_depth`), forced_stop likewise. - decide parse failure or empty array → loop break (early synthesize, NOT refusal; spec §9 graceful degrade). - At the start of §6 (Generate), a separate `synthesize_started: Instant::now()` stamp → just before §8 Build Answer, append `HopRecord { kind=Synthesize, llm_call_ms = synth_ms }`. The happy-path Answer literal is filled with `hops: Some(hops)` (changed from `hops: None` → `Some(...)`). - Doc comment updated: "PR-2 scope (fixed depth=2)" → "PR-3b-i scope (dynamic N-hop)". Notes the caveat that the hops trace is lost on the refusal path (to be resolved when the helper signature is widened in PR-3b-ii / a follow-up). 
Existing regression pins (the 2 integration tests from PR-2): - `ask_multi_hop_dispatches_and_decompose_garbage_refuses`: decompose garbage → RefusalReason::MultiHopDecomposeFailed + exactly 1 LLM call. Still passes after PR-3b-i's signature change. - `ask_with_multi_hop_false_keeps_single_pass_path`: unaffected. All 56 unit + integration tests pass (kebab-rag). Wire impact: `Answer.hops` is emitted on the multi-hop happy path. Because JSON Schema additionalProperties defaults to `true`, this is not wire breaking (confirmed in the PR-3a review). The explicit schema.json update is a separate PR (PR-3b-ii or PR-4's schema sweep). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 07:29:46 +00:00
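
The mis-replace corner case that commit 12c7dc9efb fixes by moving from a runtime template to `format!` named arguments can be reproduced in a few lines. The template wording and both function names below are hypothetical; only the `{max_sub_queries}` placeholder name comes from the commit.

```rust
/// Old style (assumed): runtime placeholder replacement on a template string.
/// The second `replace` also rewrites any `{max_sub_queries}` text that the
/// first `replace` just pasted in from the user's query.
fn render_with_replace(query: &str, max_sub_queries: u32) -> String {
    "Question: {query}\nReturn at most {max_sub_queries} sub-queries."
        .replace("{query}", query)
        .replace("{max_sub_queries}", &max_sub_queries.to_string())
}

/// New style: `format!` named arguments. Placeholders are resolved at compile
/// time against the literal, so argument values are inserted verbatim.
fn render_with_format(query: &str, max_sub_queries: u32) -> String {
    format!(
        "Question: {q}\nReturn at most {n} sub-queries.",
        q = query,
        n = max_sub_queries
    )
}

fn main() {
    let query = "what does {max_sub_queries} mean?";
    println!("{}\n", render_with_replace(query, 5)); // user text gets rewritten
    println!("{}", render_with_format(query, 5)); // user text stays intact
}
```

A second benefit of the `format!` form is that a misspelled placeholder name fails to compile instead of silently surviving into the prompt.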
altair823	cd1d4fb807	Merge pull request 'feat(rag): fb-41 PR-3a HopRecord wire + RagCfg multi-hop knobs' (#168 ) from feat/fb-41-pr-3-dynamic-decide-loop into main	2026-05-25 07:18:27 +00:00
altair823	7150c376bb	feat(rag): fb-41 PR-3a — HopRecord wire + RagCfg multi-hop knobs The first PR of the PR-3 split. Wire additive (HopRecord + HopKind + Answer.hops field) + 3 multi_hop_* knobs on RagCfg. RAG pipeline behavior is unchanged: every Answer literal has `hops = None`. PR-3b (follow-up) implements the dynamic decide loop on the ask_multi_hop happy path + fills the hops trace. Reason for the split: the original PR-3 bundled wire + cfg + decide loop + ScriptedLm + helper refactor + 5+ tests into a single PR, and a ~1500-line single patch raises review burden + regression risk. Ship the additive foundation first, then the decide loop as a separate PR (user decision, 2026-05-25). - `kebab_core::HopRecord` (iter, kind, sub_queries, context_chunks_added, forced_stop, llm_call_ms) + `HopKind` (Decompose / Decide / Synthesize): wire-additive shape. - `kebab_core::Answer.hops: Option<Vec<HopRecord>>` with `#[serde(default, skip_serializing_if = "Option::is_none")]`; None on single-pass / refusal paths, Some on PR-3b's multi-hop happy path. - 3 new knobs on `kebab_config::RagCfg`: - `multi_hop_max_depth: u32` (default 3) - `multi_hop_max_sub_queries_per_iter: u32` (default 5) - `multi_hop_max_pool_chunks: u32` (default 30) All 3 have `#[serde(default)]` + env override (`KEBAB_RAG_MULTI_HOP_MAX_*`) + a legacy parse pin (shares `LEGACY_PRE_TIMEOUT_TOML`). - Added explicit `hops: None` to 9 Answer literal sites (pipeline.rs ×6 + kebab-cli + kebab-tui tests + kebab-eval test). The exhaustive field check guards this automatically: a missed site fails to compile. - The plan's PR-3 section now states the PR-3a / PR-3b split + corrected scope. Tests (163 passing across kebab-config + kebab-core + kebab-rag): - 5 new multi-hop knob tests (default / env override / legacy parse). - The existing 50+57+31+19+3+3 tests all pass after adding hops:None. Wire impact: optional `hops` field on `answer.v1`; with `skip_serializing_if = None` it is not emitted on single-pass responses. Not wire breaking; the JSON Schema update comes in PR-3b or PR-4 (when it is actually emitted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 07:15:01 +00:00
altair823	6280abf2df	Merge pull request 'feat(rag): fb-41 PR-2 ask_multi_hop skeleton (fixed depth=2)' (#167 ) from feat/fb-41-pr-2-ask-multi-hop-skeleton into main	2026-05-25 06:50:02 +00:00
altair823	192da45dbf	chore(rag): address PR #167 review round 1 - New regression pin `parse_decompose_response_drops_partial_empty_keeps_valid`: `["", "valid q", " "]` → `["valid q"]` (pins the trim+filter chain behavior). - Added a doc comment next to `stop: Vec::new()` in `multi_hop_decompose` stating the intent (an instruction-following model is expected; if prose is added, a MultiHopDecomposeFailed refusal is the policy). This answers the round 1 question. - Added round 1 carry-overs to the plan's PR-3 implementation order: 1) extract a common helper from the mirrored §4-§9 of ask + ask_multi_hop, 2) replace the decompose template's substitution corner case with format! named args. The other round 1 suggestions (mirror refactor, substitution corner case, history block helper) are scheduled in the plan for PR-3 as the sensible timing; summarized in the round 2 reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:49:21 +00:00
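
The trim + filter + cap chain pinned by `parse_decompose_response_drops_partial_empty_keeps_valid` can be sketched on an already-parsed array. `normalize_sub_queries` is a hypothetical name, JSON parsing is left out to keep the example dependency-free, and returning `None` for an all-empty result is an assumption based on the "Some guarantees non-empty" post-condition mentioned elsewhere in this log.

```rust
/// Cleans up sub-queries after the JSON array has been parsed:
/// trims whitespace, drops empty entries, and keeps at most `cap` items.
/// Returns `None` when nothing usable remains (assumed behavior).
fn normalize_sub_queries(raw: Vec<String>, cap: usize) -> Option<Vec<String>> {
    let cleaned: Vec<String> = raw
        .into_iter()
        .map(|q| q.trim().to_string())
        .filter(|q| !q.is_empty())
        .take(cap)
        .collect();
    if cleaned.is_empty() {
        None
    } else {
        Some(cleaned)
    }
}

fn main() {
    // The exact input from the regression pin in the commit message.
    let raw = vec!["".to_string(), "valid q".to_string(), " ".to_string()];
    println!("{:?}", normalize_sub_queries(raw, 10));
}
```

Applying `take` after `filter` means the cap counts only usable sub-queries, so empty strings from the model do not use up the budget.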
altair823	cf35f36f88	feat(rag): fb-41 PR-2 — RagPipeline::ask_multi_hop skeleton (fixed depth=2) PR-2 of fb-41 multi-hop RAG. The 3-stage decompose + retrieve + synthesize pipeline is dispatched when `opts.multi_hop=true`. The dynamic decide loop is PR-3. - Added the `AskOpts.multi_hop: bool` field + introduced `impl Default for AskOpts` (resolves the known limitation from HOTFIXES 2026-05-07). Added `multi_hop: false` to all 9 explicit init sites; with Default in place they can migrate gradually to `..Default::default()`. - One dispatcher line at the entry of `RagPipeline::ask` (`if opts.multi_hop { return self.ask_multi_hop(...) }`). - New method `RagPipeline::ask_multi_hop`. 1) decompose LLM call → parse a JSON array of strings, 2) retrieve with each sub-query + chunk_id dedup pool, 3) score gate / no-chunks guards, 4) pack_context (helper shared with single-pass), 5) synthesize LLM call w/ MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT, 6) citation extract + Answer build. `prompt_template_version` is stamped as "rag-multi-hop-v1" so eval `compare` separates single-pass from multi-hop. - New prompt consts: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT + MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT + PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. - New `kebab_core::RefusalReason::MultiHopDecomposeFailed` variant. Cascade: updated the exhaustive matches in kebab-store-sqlite `refusal_reason_label` + kebab-tui `ask refusal render`. - `parse_decompose_response` + `strip_markdown_json_fence` helpers: strip markdown code fence (```json / ```) + parse JSON array of strings + trim + drop empty + cap at MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. On None the caller issues a `MultiHopDecomposeFailed` refusal. Tests (55 passing total, 8 new): - 6 unit (regression pins for parse_decompose_response: bare array / fence variants / garbage / cap / trim). - 2 integration: `ask_multi_hop_dispatches_and_decompose_garbage_refuses` (decompose garbage → MultiHopDecomposeFailed + exactly 1 LLM call) + `ask_with_multi_hop_false_keeps_single_pass_path` (regression pin; existing callers are automatically backwards-compatible). 
Happy-path multi-hop (decompose 성공 → synthesize) 의 integration test 는 ScriptedLm helper 가 PR-3 의 decide loop 와 함께 도입될 때 같이 추가. 현 `MockLanguageModel` 는 canned single response 라 2-LLM-call sequence 핀 불가. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:45:32 +00:00
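
The fence-strip and trim / drop-empty / cap steps described in the commit above can be sketched in std-only Rust. This is an illustrative sketch, not the project's code: `strip_fence`, `clean_sub_queries`, and the cap value `MAX_SUB_QUERIES = 4` are hypothetical stand-ins (the log does not state the value of MULTI_HOP_MAX_SUB_QUERIES_DEFAULT), and the JSON parse step between the two functions is left out because it would need an external crate.

```rust
/// Hypothetical cap; the real MULTI_HOP_MAX_SUB_QUERIES_DEFAULT value is not given in the log.
const MAX_SUB_QUERIES: usize = 4;

/// Strip a surrounding markdown fence (```json ... ``` or ``` ... ```), if present.
fn strip_fence(raw: &str) -> &str {
    let s = raw.trim();
    // No opening fence: return the trimmed input unchanged.
    let Some(rest) = s.strip_prefix("```") else {
        return s;
    };
    // Drop an optional `json` language tag right after the opening fence.
    let rest = rest.strip_prefix("json").unwrap_or(rest);
    // Drop the closing fence when present, then trim leftover whitespace.
    rest.trim().strip_suffix("```").unwrap_or(rest).trim()
}

/// Trim each sub-query, drop empty ones, and cap the list length.
fn clean_sub_queries(parsed: Vec<String>) -> Vec<String> {
    parsed
        .into_iter()
        .map(|q| q.trim().to_string())
        .filter(|q| !q.is_empty())
        .take(MAX_SUB_QUERIES)
        .collect()
}
```

With these definitions, `strip_fence` turns a fenced model reply into the bare JSON text, and `clean_sub_queries` applied to `["", "valid q", " "]` keeps only `"valid q"`, matching the regression pin named in the review follow-up commit.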
altair823	ed34f2e03f	Merge pull request 'feat(eval): fb-41 multi-hop golden set + spec/plan' (#166 ) from feat/fb-41-multi-hop-eval-golden into main	2026-05-25 06:27:06 +00:00
altair823	624b44c46b	chore(eval): address PR #166 review round 1 - Strengthened the single-character `must_contain: ["i"]` of `mh-s-004` to `["INSERT", "i 입력모드"]`. Removes the risk of trigram 0-hit and noise matches. - Changed 3 questions to English (`mh-c-005` / `mh-i-001` / `mh-s-002`) for a language mix in the fixture (12 ko + 3 en). Avoids a measurement gap when dogfooding in English. - The PR-1 paragraph of the plan was outdated (written before the kebab-eval crate was surveyed, so it deviated from the actual PR). Stated the actual changes and the deviation from the draft. The other 2 round 1 suggestions (the hard-coded `v0.17.2` in mh-c-002, the frozen size of the 15 question / 5-per-bucket regression pin) are intentional freezes that serve as a baseline anchor; stated in the round 2 reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:26:15 +00:00
altair823	caf690dc72	feat(eval): fb-41 multi-hop golden set + spec/plan PR-1 of fb-41 multi-hop RAG (spec: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md, plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md). First PR of an XL task: adds only the baseline measurement anchor. RAG pipeline unchanged; fixture file + parse regression pin. User decisions on 4 axes (2026-05-25): - approach: query decomposition (LLM sub-questions) - trigger: explicit `--multi-hop` flag - MVP scope: dynamic N-hop (the LLM decides the depth; hybrid of a decompose seed + ReAct-style decide loop) - eval: multi-hop golden set first (this PR) This PR: - New `fixtures/multi_hop_golden.yaml`. 15 questions (5 cross-doc + 5 intra-doc + 5 single-fact negative). Uses the existing `GoldenQuery` struct as-is; no separate loader / type change. `expected_chunk_ids` is left empty, as a template the curator can fill in after `kebab ingest`. `must_contain` allows baseline measurement (P5-2 metric). - New regression pin `crates/kebab-eval/tests/loader.rs::loads_multi_hop_golden_fixture`. Fixture parses OK + 15 questions + 5/5/5 bucket distribution + every question has at least 1 must_contain. Baseline measurement protocol (separate run; artifacts not included in the commit): 1. Run single-pass `kebab eval run --fixture multi_hop_golden.yaml` with the v0.17.2 binary 2. Capture P@5, P@10, must_contain pass rate, citation_coverage 3. After PR-3 (dynamic iter merged), rerun the same fixture with `multi_hop=true` → compare Δ PR split into 6 stages (see plan): PR-1 (this PR, fixture only), PR-2 (RagPipeline::ask_multi_hop fixed depth=2), PR-3 (dynamic iter), PR-4 (CLI flag + wire), PR-5 (MCP + SKILL.md), PR-6 (TUI toggle + trace render). v0.18.0 cut after the last PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:22:08 +00:00
altair823	1640ecf288	chore: bump version 0.17.1 → 0.17.2 v0.17.1 post-dogfood polish cut. Bundles two PRs into one release: - PR #164: separate `[image.ocr] request_timeout_secs` knob (closes the v0.17.1 open item). Applies the LLM pattern to the OCR adapter as a separate knob (tuned independently because the OCR and LLM cold start patterns differ). - PR #165: `heading_path` FTS5 column filter for text-only matching + raw-mode escape hatch (closes the JSON noise from the 2026-05-24 v0.17.0 trigram entry). lexical.rs wraps the non-raw branch result in `text : (<expr>)`; the index itself stays V007 verbatim. Opt-in is possible via raw mode `'heading_path : <token>'`. Both are additive (old configs stay compatible) and need no re-ingest; only the binary is swapped. Added a v0.17.2 entry to the HANDOFF one-line summary and the post-merge decisions section. The anchors of the two HOTFIXES entries changed from `post-v0.17.1 dogfood` to `v0.17.2`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.17.2	2026-05-25 05:55:50 +00:00
altair823	90e77631a8	Merge pull request 'feat(search): heading_path FTS5 text column filter' (#165 ) from feat/heading-text-column-filter into main	2026-05-25 05:48:22 +00:00
altair823	fa251db48f	chore(search): address PR #165 review round 2 The MCP / agent visibility paragraph of the HOTFIXES entry contradicted the round 1 decision to add to SKILL.md (it wrongly stated that no separate SKILL.md update was needed). Now states the update and notes that the new escape hatch is built on top of the v0.17.0 raw mode pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:45:41 +00:00
altair823	3114c31841	chore(search): address PR #165 review round 1 - Corrected the test count wording in HOTFIXES: the arithmetically ambiguous `9 new / updated unit tests` is now spelled out as `9 unit tests (8 updated + 1 new) + 2 new integration tests = 11 total`. - Added one bullet on column scoping + the heading_path raw-mode escape hatch to the search section of SKILL.md (Claude Code integration). Reflects the round 1 follow-up suggestion, so that an agent intending a heading search can discover the new escape hatch `'heading_path : <token>'`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:44:21 +00:00
altair823	271329efbd	feat(search): heading_path FTS5 text column filter (default text-only matching) Closes the heading_path_json JSON noise (HOTFIXES 2026-05-24) that the v0.17.0 trigram tokenizer entry left unfixed. The trigram tokenizer 3-gram-indexed the JSON notation stored in the chunks_fts.heading_path column (the V002/V007 triggers INSERT chunks.heading_path_json as-is), including the path segments inside it (app, src), so queries could hit false positives by chance. Adopted a column filter: the heading index is kept (V007 verbatim unchanged) and only the match target is restricted to the text column. - build_match_string wraps the combined expression in `text : (<expr>)` in the non-raw branch. FTS5 column filter syntax allows OR/AND sub-expressions. - Raw mode (`'...'`) is unchanged; the user can explicitly opt in with e.g. `'heading_path : agent'` (escape hatch). - Updated the expected strings of 8 existing build_match_string unit tests + new `build_match_string_raw_mode_preserves_heading_filter`. - New regression pin `lexical_heading_only_token_does_not_hit_default_mode` (a heading-only unique token gets 0 hits in default mode). - New `lexical_raw_mode_can_opt_into_heading_path_filter`: confirms that the same fixture hits in raw mode (pins the escape hatch behavior). User impact: higher body-text precision for lexical / hybrid search. No change in recall (token matching on the text body is identical). No re-ingest needed (only FTS query-time matching changes). The lexical_snapshot_run_1 + hybrid_snapshot fixtures need no regeneration either (their queries match the text body, so BM25 is identical). HOTFIXES: marked the `heading_path_json` noise item of the 2026-05-24 v0.17.0 entry as closed + added a new 2026-05-25 post-v0.17.1 dogfood entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:40:51 +00:00
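
The wrapping rule described in the commit above (default mode restricted to the `text` column, raw mode passed through) can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's `build_match_string`: the function name `wrap_match_expr` is hypothetical, the input is assumed to be an already-built FTS5 expression, and raw mode is assumed to mean "the whole query is wrapped in single quotes, which are stripped before use".

```rust
/// Hypothetical sketch of the column-filter wrapping rule.
fn wrap_match_expr(query: &str) -> String {
    let q = query.trim();
    // Assumed raw-mode convention: a query wrapped in single quotes is passed
    // through verbatim (quotes removed), so the caller may write its own
    // column filter such as `heading_path : agent`.
    if q.len() >= 2 && q.starts_with('\'') && q.ends_with('\'') {
        return q[1..q.len() - 1].to_string();
    }
    // Default mode: restrict matching to the `text` column. The parentheses
    // keep OR/AND sub-expressions inside the column filter.
    format!("text : ({q})")
}
```

Under these assumptions, `wrap_match_expr("foo OR bar")` yields `text : (foo OR bar)`, while `wrap_match_expr("'heading_path : agent'")` yields `heading_path : agent`, so the heading column is only searched on explicit opt-in.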
altair823	f2867540d2	Merge pull request 'feat(ocr): request_timeout_secs config knob' (#164 ) from feat/ocr-timeout-config into main	2026-05-25 05:14:27 +00:00
altair823	e118844256	chore(ocr): address PR #164 review round 1 - Changed the HOTFIXES header from `v0.17.2` (vaporware) to `post-v0.17.1 dogfood`, an accurate anchor regardless of the release tag decision. - Corrected the HOTFIXES caller count from `6 (5+3)` to `9 call site (6+3)`. - Strengthened the edge cases in the OcrCfg.request_timeout_secs doc with the same concrete examples as the LlmCfg sister doc (`u64::MAX`, `86400`) + a comment naming reqwest 0.12.x. - Extracted the legacy TOML fixture used by both LLM and OCR (78 nearly identical lines) into a module-level `LEGACY_PRE_TIMEOUT_TOML` const. Both tests share the same source, so if the old schema changes again only one place needs editing. The reqwest::Duration::ZERO fact-check (round 1 point 5) will be reported with verification results in the round 2 reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:13:09 +00:00
altair823	41c5edc517	feat(image.ocr): request_timeout_secs config knob + closure of v0.17.1 open items Applies to the OCR adapter the same pattern with which v0.17.1 (PR #162) moved the LLM-side hard-coded 300s into [models.llm] request_timeout_secs. By user decision it is a separate knob ([image.ocr] request_timeout_secs): OCR has a different cold start pattern than the LLM, so independent tuning is more convenient. - OcrCfg.request_timeout_secs: u64 (serde default 300) - KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS env override - Added a timeout argument to the OllamaVisionOcr::build / from_parts signatures - Removed the REQUEST_TIMEOUT constant - 3 new unit tests (default / env / legacy parse), following the LlmCfg pattern - Closed both open items of the HOTFIXES 2026-05-25 v0.17.1 entry (OCR timeout = this PR, --stream docs = already done in PR #163) No impact on existing configs / old KBs: the new field is filled with the default and the behavior is the same (300s). If the vision model has a long cold start, the value can be raised via env or config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:06:53 +00:00
altair823	d02149c010	docs(v0.17.1): HANDOFF + INDEX, v0.17.1 cut sync - HANDOFF one-line summary v0.17.0 → v0.17.1 + added the release URL (v0.17.1 cut: notes that PR #162 + #163 ship as one bundle). - Post-merge deviations section: added the 2026-05-25 v0.17.1 entry. - Added a 'v0.17.1 post-dogfood polish' subsection at the bottom of the INDEX P10 Dogfooding Feedback section (one line each for PR #162 and #163). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:35:58 +00:00
altair823	0c69b9621b	chore: bump version 0.17.0 → 0.17.1 v0.17.1 patch release, cut after merging the two v0.17.0 post-dogfood follow-up PRs. - PR #162: [models.llm] request_timeout_secs config + recommended model guide - PR #163: guide to installing ollama without sudo + kebab ask --stream UX recommendation Both are additive only (config field) + docs only: no wire-breaking change, no impact on existing users. patch bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.17.1	2026-05-25 03:34:12 +00:00
altair823	0d69d85757	Merge pull request 'docs: install ollama without sudo + recommend ask --stream (v0.17.0 post-dogfood)' (#163 ) from docs/ollama-install-and-stream into main Reviewed-on: #163	2026-05-25 03:26:24 +00:00
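altair823	a67300317b	docs(ollama): sudo-free install guide + recommend ask --stream (v0.17.0 post-dogfood) Moves two patterns used during extended dogfooding into README + SMOKE. (1) Install ollama in an isolated directory without sudo / systemd: download the tarball, unpack it into a user directory such as /opt/ollama/{bin,models,logs}, and separate the model location with the OLLAMA_MODELS env. Useful in environments with restricted root privileges such as containers / WSL2 / company machines. The same pattern was verified on the dogfooding machine with /build/cache/ollama. (2) For models with a long cold start (8B+ or the first call), `kebab ask --stream` is recommended: even with the same inference time, progressive tokens surface quickly within the 5-minute timeout limit. States the p9-fb-33 streaming path explicitly as a UX recommendation. No code changes, docs only. The same sub-bullet + bash snippet appears in both README and SMOKE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:23:35 +00:00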
altair823	abb05ebc23	Merge pull request 'feat: [models.llm] request_timeout_secs config + recommended model guide' (#162 ) from feat/llm-timeout-config into main Reviewed-on: #162	2026-05-25 03:21:19 +00:00
altair823	26fdc4f344	docs(llm-timeout): document the 0-as-disable pitfall + HOTFIXES typo + terminology cleanup Addresses the PR #162 worker review. - MEDIUM (W2) + LOW (W1): request_timeout_secs = 0 does not mean "disabled" in reqwest; it is an instant timeout (every request fails immediately). Stated in three places (LlmCfg field rustdoc + ollama.rs module-level comment + README) + recommends large finite values such as u64::MAX / 86400. - NIT (W1): typo '답변이 인 5분' in the HOTFIXES 2026-05-25 entry → '답변이 5분' (1 stray character deleted). - NIT (W1): the internal jargon '확장 도그푸딩' ("extended dogfooding") in README + HOTFIXES is unified to '후속 도그푸딩' ("follow-up dogfooding"). No change in code behavior, doc only. cargo test request_timeout 3 PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:14:41 +00:00
altair823	3f5e0e6e90	feat(llm): [models.llm] request_timeout_secs config + recommended model guide Bundles two findings from the v0.17.0 extended dogfooding (2026-05-25) into one PR. (1) Moved the hard-coded 300s timeout of llm.generate_stream into a config knob. 8B+ models (gemma4:e4b etc.) could not finish the first RAG answer within 5 minutes in CPU-only environments and failed with `error: kb-rag: llm.generate_stream`. - Additive field request_timeout_secs: u64 in kebab-config::LlmCfg (#[serde(default = "default_llm_request_timeout_secs")], default 300). Old configs that lack the key still parse and behave the same. - env override KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS. - Removed the REQUEST_TIMEOUT constant in kebab-llm-local::ollama.rs → OllamaLanguageModel::new builds the reqwest client with Duration::from_secs(llm.request_timeout_secs). The doc comment is updated accordingly. - 3 new unit tests: default 300 pin / env override / legacy config (field missing) backward-compat. (2) docs: one paragraph in the README prerequisites section + the ollama guidance in docs/SMOKE.md: CPU only / RAM ≤ 16 GB environments ⇒ ≤ 4B Q4 models recommended (gemma3:4b / qwen2.5:3b / phi3:mini). Warns in advance about the timeout pattern when trying 8B+ models, and explains how to use the request_timeout_secs knob. HOTFIXES 2026-05-25 entry: records the two changes above + the open items (the same hard-coded 300s in kebab-parse-image OCR is listed as an out-of-scope follow-up + a later emphasis on recommending ask --stream). workspace cargo test -j 1 + clippy pass. The code change is backwards-compat (additive serde field), so existing users are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:01:03 +00:00
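
The resolution order described in the commit above (default 300, config value, env override) can be sketched in std-only Rust. This is a hedged illustration, not the kebab-config implementation: `resolve_timeout` and `DEFAULT_TIMEOUT_SECS` are hypothetical names, the serde-based parsing is replaced by plain `Option` arguments, and falling back when the env value does not parse is an assumption.

```rust
use std::time::Duration;

/// Default taken from the commit message (300 seconds).
const DEFAULT_TIMEOUT_SECS: u64 = 300;

/// Hypothetical helper: env override wins over the config value, which wins
/// over the default. An unparseable env value is ignored (assumed behavior).
fn resolve_timeout(config_value: Option<u64>, env_value: Option<&str>) -> Duration {
    let secs = env_value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .or(config_value)
        .unwrap_or(DEFAULT_TIMEOUT_SECS);
    Duration::from_secs(secs)
}
```

A caller might pass `std::env::var("KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS").ok().as_deref()` as the second argument. According to the follow-up docs commit, a value of 0 is an instant timeout rather than "disabled", so a large finite value is the safer way to effectively lift the limit.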
altair823	578a60e3bb	docs(v0.17.0): HANDOFF, version + PR-A/B/C closure entries (R1) - One-line summary v0.16.1 → v0.17.0 + release notes URL + one-line summaries of PR-A/B/C. - Post-merge deviations section: added the 2026-05-24 closure entries for PR-B / PR-C in addition to PR-A. - P10 round 2 follow-up line under 'next task candidates': all three items marked as v0.17.0 closure. - The chunk_breakdown + C typedef items in the 'P10 dogfooding backlog' are also marked ✅ v0.17.0 closure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:55:47 +00:00
altair823	64f518e08e	docs(v0.17.0): HANDOFF + INDEX, v0.17.0 cut sync (R1) - HANDOFF one-line summary v0.16.1 → v0.17.0, release notes URL, one-line summaries of the three PRs A/B/C. Added PR-B / PR-C closure entries to the post-merge deviations section. Marked the three items under "next task candidates" + "P10 backlog" as ✅ v0.17.0 closure. - New "P10 Dogfooding Feedback (v0.17.0)" subsection at the bottom of the INDEX P10 section, listing the 3 PR-A/B/C items (format recommended in Gemini round 2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:54:39 +00:00


962 Commits