feat(rag): fb-41 PR-3b-ii — ScriptedLm + 5 multi-hop tests + refusal hop trace + carry-over

Second PR of the PR-3b split. Builds on the dynamic decide loop from PR-3b-i:

1. **ScriptedLm + ScriptedRetriever helper** (kebab-rag tests/common/mod.rs)
   Returns a different response on each call, so multi-step multi-hop
   scenarios that distinguish each LLM call (decompose / decide×N /
   synthesize) can be exercised with mocks only. Takes `Vec<&str>` /
   `Vec<Vec<SearchHit>>` and emits them in call-sequence order. Send + Sync.

2. **5 multi-hop integration tests** (kebab-rag tests/multi_hop.rs, new)
   - decide_stop_triggers_synthesize: decide [] → synthesize immediately
   - decide_continue_adds_more_chunks: decide ["q2"] → iter 2 retrieve +
     pool expansion
   - max_depth_force_stops: depth cap → forced_stop + decide LLM call skipped
   - pool_chunks_dedup_by_chunk_id: the same chunk_id returned by two
     sub-queries appears once
   - decide_parse_failure_falls_through_to_synthesize: parse fail =
     graceful synthesize (not a refusal, spec §9)

3. **refuse_* helpers preserve the hops trace** (round 1 carry-over)
   Adds a `hops: Option<Vec<HopRecord>>` argument to the refuse_no_chunks /
   refuse_score_gate signatures. On a score-gate / no-chunks refusal in
   ask_multi_hop, the accumulated hops are preserved as-is in Answer.hops.
   Single-pass ask passes None, so the wire format is unchanged
   (skip_serializing_if).

4. **HopRecord doc improvements** (round 1 carry-over)
   Spells out the per-kind meaning of sub_queries (Decompose=initial /
   Decide=next-iter or empty=stop / Synthesize=always empty). Documents the
   ambiguity of llm_call_ms=0 (no call vs 0ms call).

5. **MULTI_HOP_MAX_SUB_QUERIES_DEFAULT → _HARD_CAP rename** (round 1 carry-over)
   Clarifies the intent of the const by separating the config knob
   `multi_hop_max_sub_queries_per_iter` (5, prompt-side soft hint) from the
   const (10, parse-side hard ceiling). Syncs the docs on the
   responsibilities of the two layers. The test is renamed as well.

6. **decide guard simplification + preview budget doc** (round 1 carry-over)
   Documents the post-condition of parse_decompose_response (Some guarantees
   non-empty). Simplifies the defensive `Some(qs) if !qs.is_empty()` to
   `decide_result.unwrap_or_default()`. Documents the intent of the decide
   preview's snippet-only path (the full chunk text is not fetched).

Verification
- `cargo test -p kebab-rag -j 1`: 31 unit + 19 pipeline + 5 multi_hop +
  3 prompt_template + 3 streaming, all passing.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.

Spec / plan
- design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
- plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-3b section)

Next step = PR-4 (CLI --multi-hop + wire schema + error_wire).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
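
For readers who have not opened tests/common/mod.rs, the per-call scripted mock from item 1 might look roughly like the sketch below. This is an illustration only, not the code in this commit: the `Lm` trait, its `complete` method, and the `remaining` helper are hypothetical stand-ins, since the real kebab-rag trait and its signature are not shown here. Only the idea (take a `Vec<&str>`, emit one response per call in order, stay Send + Sync) comes from the commit message.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-in for the language-model trait used by the pipeline.
/// The real trait name and signature are assumptions.
trait Lm: Send + Sync {
    fn complete(&self, prompt: &str) -> String;
}

/// Mock LM that returns one scripted response per call, in script order.
/// The Mutex keeps the type Send + Sync while allowing `&self` calls.
struct ScriptedLm {
    responses: Mutex<VecDeque<String>>,
}

impl ScriptedLm {
    /// Build the mock from the responses it should emit, first call first.
    fn new(script: Vec<&str>) -> Self {
        Self {
            responses: Mutex::new(script.into_iter().map(String::from).collect()),
        }
    }

    /// Number of scripted responses not yet consumed (hypothetical helper,
    /// handy for asserting that a skipped LLM call really was skipped).
    fn remaining(&self) -> usize {
        self.responses.lock().unwrap().len()
    }
}

impl Lm for ScriptedLm {
    fn complete(&self, _prompt: &str) -> String {
        // Pop the next scripted response; running out means the test
        // script and the pipeline's call sequence disagree.
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .expect("ScriptedLm: script exhausted")
    }
}
```

Under these assumptions, a script such as `vec!["[\"q1\"]", "[]", "final answer"]` would feed the decompose call, one decide call that stops, and the synthesize call, in that order.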
2026-05-25 08:17:37 +00:00
parent 94e6146013
commit 6188a50c1c
5 changed files with 648 additions and 41 deletions
--- a/crates/kebab-config/src/lib.rs
+++ b/crates/kebab-config/src/lib.rs
@@ -189,9 +189,13 @@ pub struct RagCfg {
    #[serde(default = "default_multi_hop_max_depth")]
    pub multi_hop_max_depth: u32,
    /// p9-fb-41: cap on how many sub-queries the LLM may emit in a
-    /// single decompose / decide call. Mirrors
-    /// [`MULTI_HOP_MAX_SUB_QUERIES_DEFAULT`] in kebab-rag — the
-    /// const is the hard floor while this is the runtime knob.
+    /// single decompose / decide call. This is the *prompt-side
+    /// soft hint* — the value the pipeline injects into the
+    /// decompose / decide prompts so the LLM knows what to aim for.
+    /// kebab-rag enforces a separate compile-time hard ceiling
+    /// (`MULTI_HOP_MAX_SUB_QUERIES_HARD_CAP`, currently 10) as a
+    /// safety net against misbehaving models — if you raise this
+    /// knob above the hard cap, bump the const in the same PR.
    /// Default `5`.
    #[serde(default = "default_multi_hop_max_sub_queries_per_iter")]
    pub multi_hop_max_sub_queries_per_iter: u32,