PR-3b 의 분할 두 번째 PR — PR-3b-i 의 dynamic decide loop 위에서:
1. **ScriptedLm + ScriptedRetriever helper** (kebab-rag tests/common/mod.rs)
per-call 다른 response 반환. decompose / decide×N / synthesize 의 각
LLM call 을 구분하는 다단계 multi-hop 시나리오를 mock-only 로 exercise
가능. `Vec<&str>` / `Vec<Vec<SearchHit>>` 받아 call sequence 순서대로
emit. Send + Sync.
2. **5 multi-hop integration tests** (kebab-rag tests/multi_hop.rs 신규)
- decide_stop_triggers_synthesize: decide [] → 즉시 synthesize
- decide_continue_adds_more_chunks: decide ["q2"] → iter 2 retrieve + pool 확장
- max_depth_force_stops: depth cap → forced_stop + decide LLM call skip
- pool_chunks_dedup_by_chunk_id: 같은 chunk_id 두 sub-query 에서 1 회
- decide_parse_failure_falls_through_to_synthesize: parse fail = graceful
synthesize (refusal 아님, spec §9)
3. **refuse_* helper hops trace 보존** (회차 1 carry-over)
refuse_no_chunks / refuse_score_gate 시그니처에 `hops:
Option<Vec<HopRecord>>` 인자 추가. ask_multi_hop 의 score-gate /
no-chunks refusal 시 누적된 hops 그대로 Answer.hops 에 보존.
single-pass ask 는 None 전달 — wire 변동 없음 (skip_serializing_if).
4. **HopRecord doc 보강** (회차 1 carry-over)
sub_queries 의 per-kind 의미 명시 (Decompose=initial / Decide=next-iter
or empty=stop / Synthesize=always empty). llm_call_ms=0 의 ambiguity
(no call vs 0ms call) doc 명시.
5. **MULTI_HOP_MAX_SUB_QUERIES_DEFAULT → _HARD_CAP rename** (회차 1 carry-over)
const 의 의도 명확화 — config knob `multi_hop_max_sub_queries_per_iter`
(5, prompt-side soft hint) 와 const (10, parse-side hard ceiling)
분리. 두 layer 의 책임 doc 동기화. test 도 rename.
6. **decide guard 단순화 + preview budget doc** (회차 1 carry-over)
parse_decompose_response 의 post-condition (Some=non-empty 보장)
doc 명시. defensive `Some(qs) if !qs.is_empty()` →
`decide_result.unwrap_or_default()` 단순화. decide preview 의
snippet-only path (full chunk text 안 fetch) 의도 doc.
검증
- `cargo test -p kebab-rag -j 1` — 31 unit + 19 pipeline + 5 multi_hop
+ 3 prompt_template + 3 streaming 모두 통과.
- `cargo clippy -p kebab-rag --all-targets -j 1 -- -D warnings` clean.
Spec / plan
- design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
- plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-3b 단락)
다음 단계 = PR-4 (CLI --multi-hop + wire schema + error_wire).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
211 lines
8.2 KiB
Rust
211 lines
8.2 KiB
Rust
//! Answer + RAG types (§3.8).
|
|
|
|
use serde::{Deserialize, Serialize};
|
|
use time::OffsetDateTime;
|
|
|
|
use crate::citation::Citation;
|
|
use crate::search::SearchMode;
|
|
use crate::versions::PromptTemplateVersion;
|
|
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct Answer {
|
|
pub answer: String,
|
|
pub citations: Vec<AnswerCitation>,
|
|
pub grounded: bool,
|
|
pub refusal_reason: Option<RefusalReason>,
|
|
pub model: ModelRef,
|
|
pub embedding: Option<ModelRef>,
|
|
pub prompt_template_version: PromptTemplateVersion,
|
|
pub retrieval: AnswerRetrievalSummary,
|
|
pub usage: TokenUsage,
|
|
#[serde(with = "time::serde::rfc3339")]
|
|
pub created_at: OffsetDateTime,
|
|
/// p9-fb-15: same conversation 의 turn 들이 공유. CLI single-shot
|
|
/// (history 없음) / TUI 첫 turn 은 None. blake3 해시 또는 사용자
|
|
/// 명시 (`kebab ask --session <id>`, p9-fb-18).
|
|
#[serde(default, skip_serializing_if = "Option::is_none")]
|
|
pub conversation_id: Option<String>,
|
|
/// p9-fb-15: 같은 conversation 안 0-based 순서. 첫 turn = 0. None
|
|
/// 이면 single-shot.
|
|
#[serde(default, skip_serializing_if = "Option::is_none")]
|
|
pub turn_index: Option<u32>,
|
|
/// p9-fb-41: multi-hop hop trace. `None` for single-pass asks.
|
|
/// Each entry records one hop (`decompose` / `decide` / `synthesize`)
|
|
/// — the LLM call category, the sub-queries emitted, retrieval
|
|
/// counts, and a `forced_stop` flag for cap-driven termination.
|
|
/// Wire-additive: `answer.v1` schema_version unchanged; consumers
|
|
/// reading older single-pass answers see `hops: None` (or absent).
|
|
#[serde(default, skip_serializing_if = "Option::is_none")]
|
|
pub hops: Option<Vec<HopRecord>>,
|
|
}
|
|
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct AnswerCitation {
|
|
pub marker: Option<String>,
|
|
pub citation: Citation,
|
|
/// p9-fb-32: cited doc's `documents.updated_at`.
|
|
#[serde(with = "time::serde::rfc3339")]
|
|
pub indexed_at: OffsetDateTime,
|
|
/// p9-fb-32: server-computed staleness flag per config threshold.
|
|
pub stale: bool,
|
|
}
|
|
|
|
/// p9-fb-15: history 가 prompt 에 들어갈 때의 한 turn. RAG facade 가
|
|
/// `Vec<Turn>` 받아 system + history + retrieval + new question 으로
|
|
/// prompt 빌드. token budget 안에 fit 안 되면 oldest turn 부터 drop
|
|
/// (newest 우선 보존).
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct Turn {
|
|
pub question: String,
|
|
pub answer: String,
|
|
pub citations: Vec<AnswerCitation>,
|
|
#[serde(with = "time::serde::rfc3339")]
|
|
pub created_at: OffsetDateTime,
|
|
}
|
|
|
|
/// p9-fb-41: one entry in [`Answer::hops`] — the per-iteration trace
|
|
/// of a multi-hop ask. The pipeline appends a `HopRecord` per LLM
|
|
/// call (decompose / decide / synthesize) so a `--multi-hop` user
|
|
/// can see what sub-queries the LLM emitted, how many chunks each
|
|
/// hop contributed, whether the iter stopped on the model's own
|
|
/// signal or hit a cap, and the per-hop LLM latency.
|
|
///
|
|
/// Wire-additive — every field uses `#[serde(default)]` where it
|
|
/// could plausibly be omitted by a future schema reader.
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct HopRecord {
|
|
/// 0-based hop index within this ask. `iter=0` is always the
|
|
/// initial decompose call; subsequent iters are decide calls;
|
|
/// the final iter is the synthesize call.
|
|
pub iter: u32,
|
|
pub kind: HopKind,
|
|
/// Sub-queries associated with this hop. The meaning depends on
|
|
/// `kind`:
|
|
///
|
|
/// - [`HopKind::Decompose`]: the initial sub-queries the LLM
|
|
/// broke the original user query into. These drive the
|
|
/// `iter=1` retrieval round.
|
|
/// - [`HopKind::Decide`]: the *new* sub-queries the LLM
|
|
/// emitted to drive the next retrieval round. Empty when the
|
|
/// LLM signalled stop OR when `forced_stop = true` (cap hit
|
|
/// or parse-degraded).
|
|
/// - [`HopKind::Synthesize`]: always empty — the final hop
|
|
/// produces the user-visible answer, not more sub-queries.
|
|
#[serde(default)]
|
|
pub sub_queries: Vec<String>,
|
|
/// Number of *new* chunks the retrieval round contributed to the
|
|
/// pool (dedup'd by `chunk_id` — repeated hits from a previous
|
|
/// iter do not count). `0` for the decompose hop (no retrieval
|
|
/// yet) and the synthesize hop.
|
|
pub context_chunks_added: u32,
|
|
/// `true` when the pipeline cut the iter loop short because a
|
|
/// safety cap fired (`max_depth` / `max_total_sub_queries` /
|
|
/// `max_pool_chunks`) rather than because the LLM signalled
|
|
/// stop. The user-visible answer still reflects all chunks
|
|
/// accumulated up to that point — `forced_stop` is a tracing
|
|
/// signal, not a refusal.
|
|
pub forced_stop: bool,
|
|
/// Wall-clock latency of the LLM call for this hop, in
|
|
/// milliseconds. Useful for cost / latency analysis when a
|
|
/// `kebab eval` run records `Answer.hops`.
|
|
///
|
|
/// `0` is overloaded: it means "no LLM call happened at this
|
|
/// hop" when (a) the hop was a Decide skipped due to
|
|
/// `forced_stop` (depth-cap or pool-cap fired before the LLM
|
|
/// was asked) or (b) the pool was empty before any decide
|
|
/// could run. Treat `0` as "absent or instantaneous" rather
|
|
/// than as a genuine measurement.
|
|
pub llm_call_ms: u32,
|
|
}
|
|
|
|
/// p9-fb-41: which stage of the multi-hop pipeline a [`HopRecord`]
|
|
/// describes. The serde tag matches the wire shape so agents /
|
|
/// CLIs can branch on the snake_case string without referencing
|
|
/// the Rust enum.
|
|
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]
|
|
#[serde(rename_all = "snake_case")]
|
|
pub enum HopKind {
|
|
/// First hop — LLM decomposed the user query into sub-queries.
|
|
Decompose,
|
|
/// Subsequent hop — LLM was asked whether more retrieval is
|
|
/// needed and either emitted new sub-queries (`continue`) or
|
|
/// returned an empty array (`stop`).
|
|
Decide,
|
|
/// Terminal hop — LLM produced the final user-visible answer
|
|
/// over the accumulated chunk pool.
|
|
Synthesize,
|
|
}
|
|
|
|
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]
|
|
#[serde(rename_all = "snake_case")]
|
|
pub enum RefusalReason {
|
|
ScoreGate,
|
|
LlmSelfJudge,
|
|
NoIndex,
|
|
NoChunks,
|
|
/// p9-fb-15: ask 가 LLM 토큰 stream 도중 cancel 됨. partial answer
|
|
/// 가 채워져 있을 수 있음 (사용자가 본 부분까지). RAG retrieval
|
|
/// 자체는 정상 — 모델 generation 단계에서만 중단.
|
|
LlmStreamAborted,
|
|
/// p9-fb-41: multi-hop pipeline 의 decompose LLM call 이 JSON
|
|
/// parse 가능한 sub-question array 를 반환하지 못함 (parse
|
|
/// error, 빈 응답, 또는 잘못된 형식). retrieval / synthesize
|
|
/// 단계 진입 못 함. CLI / MCP / TUI 가 받는 wire error code
|
|
/// = `"multi_hop_decompose_failed"` (PR-4 의 error_wire 매핑).
|
|
MultiHopDecomposeFailed,
|
|
}
|
|
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct ModelRef {
|
|
pub id: String,
|
|
pub provider: String,
|
|
pub dimensions: Option<usize>,
|
|
}
|
|
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct AnswerRetrievalSummary {
|
|
pub trace_id: TraceId,
|
|
pub mode: SearchMode,
|
|
pub k: usize,
|
|
pub score_gate: f32,
|
|
pub top_score: f32,
|
|
pub chunks_returned: u32,
|
|
pub chunks_used: u32,
|
|
}
|
|
|
|
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
|
pub struct TokenUsage {
|
|
pub prompt_tokens: u32,
|
|
pub completion_tokens: u32,
|
|
pub latency_ms: u32,
|
|
}
|
|
|
|
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]
|
|
pub struct TraceId(pub String);
|
|
|
|
#[cfg(test)]
|
|
mod tests {
|
|
use super::*;
|
|
use crate::asset::WorkspacePath;
|
|
use crate::citation::Citation;
|
|
use time::macros::datetime;
|
|
|
|
#[test]
|
|
fn answer_citation_serializes_indexed_at_and_stale() {
|
|
let ac = AnswerCitation {
|
|
marker: Some("[1]".to_string()),
|
|
citation: Citation::Line {
|
|
path: WorkspacePath::new("a.md".to_string()).unwrap(),
|
|
start: 1,
|
|
end: 1,
|
|
section: None,
|
|
},
|
|
indexed_at: datetime!(2026-05-09 12:00:00 UTC),
|
|
stale: false,
|
|
};
|
|
let v = serde_json::to_value(&ac).unwrap();
|
|
assert_eq!(v["indexed_at"], "2026-05-09T12:00:00Z");
|
|
assert_eq!(v["stale"], false);
|
|
}
|
|
}
|