v0.18 cut 전 fb-41 multi-hop RAG 도그푸딩에서 발견된 **safety regression**
fix. 자세한 도그푸딩 결과는 `tasks/HOTFIXES.md` 의 2026-05-25 fb-41
pre-v0.18 entry + `/build/cache/dogfood-v018/results/SUMMARY.md` 참조.
## 문제 (S7)
Query: `What is the chemical formula of caffeine?` (KB 에 없는 fact).
- Single-pass `kebab ask`: retrieve top score 가 default `rag.score_gate
= 0.30` 미만 → `refuse_score_gate` → 안전한 refusal.
- Multi-hop `kebab ask --multi-hop`: **`grounded = true`**, 본문
`"카페인의 화학식은 C₉H₁₅N₃O 입니다 [#6]"` (hallucination — 실제
C₈H₁₀N₄O₂) + `[#6]` 가 Adam optimizer chunk 의 `g_t = ∂L/∂θ_i` 본문을
인용 (시각적 short structured token 매칭 trigger).
원인: `ask_multi_hop` 의 score-gate 검사가 *pool 의 top_score* 만 봤다.
multi-hop 의 pool 은 5 sub-queries 의 union — 한 sub-query 의 top score
가 gate 위면 다른 chunks 가 원본 query 와 무관해도 gate 통과 + synth →
LLM hallucinate.
## Fix
`ask_multi_hop` entry 에 **pre-decompose probe** 추가:
1. *원본 query* 로 retrieve 한 번 (LLM call 0회, ~ms).
2. probe empty → `refuse_no_chunks(None)` (decompose 안 함, hops=None).
3. probe top_score < gate → `refuse_score_gate(None)` (decompose 안 함).
4. probe pass → 기존 decompose / decide / synthesize flow 그대로.
Multi-hop 의 safety floor 가 single-pass 와 정확히 일치 — multi-hop 은
*원본 query 가 이미 KB 범위 내* 일 때만 cross-doc reasoning 추가.
비용: 한 번의 retrieve (수 ms), LLM call 없음. multi-hop 의 LLM-dominated
latency 대비 무시 가능.
## Tests
신규 3 회귀 핀 (`crates/kebab-rag/tests/multi_hop.rs`):
- `multi_hop_below_probe_gate_refuses_before_any_llm_call` — **S7 직접
회귀 핀**. low-score chunk + empty LM script → score_gate refusal, LM
calls 0회, hops=None. fix revert 시 즉시 panic.
- `multi_hop_empty_probe_pool_refuses_before_any_llm_call` — empty
retrieve 시 NoChunks refusal, LM calls 0회.
- `multi_hop_above_probe_gate_proceeds_to_decompose` — probe pass 시
full multi-hop flow 정상 (decompose + decide + synth).
기존 7 multi-hop test 의 `ScriptedRetriever` 에 *probe-pass entry*
prepend + `retriever_handle.calls()` expectation +1. test 2 / test 4
처럼 entry 두 개였던 곳도 prepend (3 entries).
`multi_hop_refuse_no_chunks_preserves_hops_trace` /
`multi_hop_refuse_score_gate_preserves_hops_trace` 의 의미 좁힘 — 이제
*decompose-driven* refusal (probe pass 후 sub-query retrieve 가 empty
또는 below-gate) 만 검증. *probe-driven* refusal 은 hops=None
(decompose 안 함) — 신규 test 가 그 path 핀.
## 검증
- `cargo test -p kebab-rag -j 1` — 10 multi-hop (7 갱신 + 3 신규) + 19
pipeline + 31 unit + 3 prompt_template + 3 streaming 모두 통과. 회귀
없음.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
- 단일 crate 직렬 build (16 GB RAM 제약).
## 변경 없음
- Wire schema — `Answer.hops` shape 동일, `refusal_reason` enum 동일.
- 다른 도그푸딩 발견 (synthesize citation 일관성, latency, binary path
confusion) — v0.18.1 또는 별 PR 의 책임. HOTFIXES 의 "다른 도그푸딩
발견" 절에 명시.
## 다음
PR-7 머지 후:
1. Workspace `Cargo.toml` version 0.17.2 → 0.18.0 (minor bump).
2. HANDOFF.md / INDEX.md 갱신 + frozen design §3.8 multi-hop sub-section.
3. `gitea-release v0.18.0 --auto-notes`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test 7 의 `i32_below_gate_chunk` helper rename → `seed_low_score_chunk` +
반환 shape 을 `(chunk_id, doc_id)` tuple 로 확장. `i32` prefix 가 Rust
integer 타입과 충돌하던 가독성 문제 해소 + 호출자가 `id32("d_low")` 를
재계산하지 않도록 id 페어를 single source of truth 로 통합.
검증
- `cargo test -p kebab-rag -j 1 --test multi_hop` — 7 모두 통과.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(A) ScriptedLm doc 의 `Arc<Vec<String>>` 표기 → 실제 구현 (`Vec<String>` +
`AtomicUsize`, 외부에서 `Arc::new(ScriptedLm::new(...))` 로 wrap)
반영.
(B) ScriptedLm::new doc 의 미존재 `with_*` builder 언급 제거.
(C) refuse path 의 hops 보존 회귀 핀 2 건 추가 (`tests/multi_hop.rs`):
- `multi_hop_refuse_no_chunks_preserves_hops_trace`: empty pool →
`refuse_no_chunks(Some(hops))` → Answer.hops = Some([Decompose,
Decide]).
- `multi_hop_refuse_score_gate_preserves_hops_trace`: top score 0.10
< 0.30 gate → `refuse_score_gate(Some(hops))` → 같은 shape.
refuse_* widening + ask_multi_hop 의 forwarding wiring 이 reverting
되면 두 test 가 회귀 잡음.
(D) test 5 의 redundant `assert_ne!(.., Some(MultiHopDecomposeFailed))`
제거 — `assert_eq!(.., None)` 이미 함의. 메시지에 의도 통합.
검증
- `cargo test -p kebab-rag -j 1 --test multi_hop` — 7 (5+2) 모두 통과.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-2 of fb-41 multi-hop RAG. Decompose + retrieve + synthesize 3-stage
pipeline가 `opts.multi_hop=true` 일 때 dispatch. Dynamic decide loop
는 PR-3.
- `AskOpts.multi_hop: bool` 필드 추가 + `impl Default for AskOpts`
도입 (HOTFIXES 2026-05-07 의 known limitation 해소). 9 explicit
init site 모두 `multi_hop: false` 추가 — Default 도입으로 향후
`..Default::default()` 점진 migrate 가능.
- `RagPipeline::ask` 의 entry 에 dispatcher 한 줄
(`if opts.multi_hop { return self.ask_multi_hop(...) }`).
- `RagPipeline::ask_multi_hop` 신규 method. 1) decompose LLM call
→ JSON array of strings parse, 2) 각 sub-query 로 retrieve +
chunk_id dedup pool, 3) score gate / no-chunks 가드, 4)
pack_context (single-pass 와 helper 공유), 5) synthesize LLM
call w/ MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT, 6) citation extract
+ Answer build. `prompt_template_version` = "rag-multi-hop-v1"
로 stamp — eval `compare` 가 single-pass vs multi-hop 분리.
- Prompt const 신규: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT +
MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT
+ PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT.
- `kebab_core::RefusalReason::MultiHopDecomposeFailed` variant 신규.
Cascade: kebab-store-sqlite `refusal_reason_label` + kebab-tui `ask
refusal render` exhaustive match 갱신.
- `parse_decompose_response` + `strip_markdown_json_fence` helper —
markdown code fence (```json / ```) strip + JSON array of strings
parse + trim + drop empty + cap at MULTI_HOP_MAX_SUB_QUERIES_DEFAULT.
None 반환 시 caller 가 `MultiHopDecomposeFailed` refusal.
Tests (55 passing total, 8 신규):
- 6 unit (parse_decompose_response 의 bare array / fence variants /
garbage / cap / trim 회귀 핀).
- 2 integration: `ask_multi_hop_dispatches_and_decompose_garbage_refuses`
(decompose garbage → MultiHopDecomposeFailed + 정확히 1 LLM call) +
`ask_with_multi_hop_false_keeps_single_pass_path` (회귀 핀, 기존
caller 자동 backwards-compat).
Happy-path multi-hop (decompose 성공 → synthesize) 의 integration
test 는 ScriptedLm helper 가 PR-3 의 decide loop 와 함께 도입될
때 같이 추가. 현 `MockLanguageModel` 는 canned single response 라
2-LLM-call sequence 핀 불가.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pipeline: refresh module docstring step 5 to reflect new cancel
semantics (RetrievalDone/Token/Final + LlmStreamAborted)
- wire schema: spell out refusal-path behavior in answer_event.v1
description (only retrieval_done emitted; no final)
- test: factual comment on relax_score_gate-using test corrected
- test: new Ollama-gated stream_score_gate_refusal_emits_only_retrieval_done
- test: new ask_emits_no_final_when_cancelled_mid_stream pinning
the no-Final invariant on cancel
- pipeline: large_enum_variant comment broadened to acknowledge
RetrievalDone.hits as the dominant per-emit cost
- HOTFIXES: log AskOpts.stream_sink internal API break per spec
contract policy
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RetrievalDone after retrieve+stale-stamp, Token per LM chunk
(SendError → break, FinishReason::Cancelled, RefusalReason::
LlmStreamAborted), Final on success. answers row still persists
on cancel for audit. Adds FinishReason::Cancelled, re-exports
StreamEvent from kebab_rag, migrates two pre-fb-33 sink tests
in tests/pipeline.rs to the new StreamEvent type (the
"dropped receiver does not abort" test inverts to record cancel).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pack_context widened to carry indexed_at + stale alongside marker
and Citation. LLM-citation construction site now plumbs real values
from upstream SearchHit instead of the Task 6 UNIX_EPOCH placeholder.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test helper missed the SearchHit field expansion from fb-32 Task 1.
UNIX_EPOCH + false placeholders consistent with the cross-crate
synthetic-mock pattern (hybrid.rs, vector.rs build_hit Task 4 stub).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec PR #59 의 §3.8 multi-turn behaviour 구현. RAG facade 가 prior
turns 받아 prompt 에 prepend, retrieval query expansion 적용,
Answer 에 conversation_id / turn_index 채움.
신규 (kebab-core):
- Answer 에 conversation_id (Option<String>) / turn_index (Option<u32>)
field 추가. serde skip_serializing_if 로 single-shot 의 wire
output 변경 0 (기존 외부 wrapper 영향 없음).
- Turn struct (question + answer + citations + created_at).
- RefusalReason::LlmStreamAborted variant.
신규 (kebab-rag):
- AskOpts 에 history (Vec<Turn>) / conversation_id / turn_index 3 field.
- AskOpts::single_shot(mode) helper.
- RagPipeline::ask_with_history(query, history, conversation_id,
turn_index, opts) — combined opts 로 ask 호출.
- expand_query_with_history: history.last() 의 answer 첫 200 자
concat 해 SearchQuery.text 확장 (spec §3.8 의 \"cheap concat\";
LLM-based standalone-question rewriting 은 P+).
- serialize_history + remaining_history_budget_chars: spec 의 priority
enforcement — system+question 필수, retrieved chunks 가 차지한
뒤 남은 char budget 안에서 newest 우선, oldest drop.
- ask 본문: history 가 비어있지 않으면 [이전 대화] 블록을 user
prompt 위에 prepend. Answer 생성 site 3 곳 (정상 / NoChunks /
ScoreGate refuse) 모두 conversation_id / turn_index 채움.
신규 (kebab-store-sqlite):
- refusal_reason_label 가 LlmStreamAborted → 'llm_stream_aborted'.
기존 caller 변경 (single-shot 동작 동일):
- kebab-cli main.rs Cmd::Ask: AskOpts 에 history=Vec::new(),
conversation_id=None, turn_index=None 명시 (CLI multi-turn 은
p9-fb-18 의 --session/--repl 가 채움).
- kebab-tui src/ask.rs spawn site 동일 (multi-turn UI 는 p9-fb-16).
- kebab-eval runner.rs golden eval 동일 (single-shot per query).
- kebab-app tests/ask_smoke.rs / kebab-tui tests/ask.rs / kebab-rag
tests/pipeline.rs / kebab-eval metrics.rs Answer literal 갱신.
Test:
- 9 신규 lib unit (expand_query 4 / serialize_history 3 / remaining_budget 2).
- 기존 12 PASS 회귀 0.
Plan 갱신:
- p9-fb-15 status planned → in_progress. 머지 후 한 줄 commit
으로 completed flip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>