Files
kebab/crates/kebab-core/src/answer.rs
altair823 2c058ab175 feat(rag): multi-turn ask — Turn struct + ask_with_history + token budget (p9-fb-15)
Spec PR #59 의 §3.8 multi-turn behaviour 구현. RAG facade 가 prior
turns 받아 prompt 에 prepend, retrieval query expansion 적용,
Answer 에 conversation_id / turn_index 채움.

신규 (kebab-core):
- Answer 에 conversation_id (Option<String>) / turn_index (Option<u32>)
  field 추가. serde skip_serializing_if 로 single-shot 의 wire
  output 변경 0 (기존 외부 wrapper 영향 없음).
- Turn struct (question + answer + citations + created_at).
- RefusalReason::LlmStreamAborted variant.

신규 (kebab-rag):
- AskOpts 에 history (Vec<Turn>) / conversation_id / turn_index 3 field.
- AskOpts::single_shot(mode) helper.
- RagPipeline::ask_with_history(query, history, conversation_id,
  turn_index, opts) — combined opts 로 ask 호출.
- expand_query_with_history: history.last() 의 answer 첫 200 자
  concat 해 SearchQuery.text 확장 (spec §3.8 의 \"cheap concat\";
  LLM-based standalone-question rewriting 은 P+).
- serialize_history + remaining_history_budget_chars: spec 의 priority
  enforcement — system+question 필수, retrieved chunks 가 차지한
  뒤 남은 char budget 안에서 newest 우선, oldest drop.
- ask 본문: history 가 비어있지 않으면 [이전 대화] 블록을 user
  prompt 위에 prepend. Answer 생성 site 3 곳 (정상 / NoChunks /
  ScoreGate refuse) 모두 conversation_id / turn_index 채움.

신규 (kebab-store-sqlite):
- refusal_reason_label 가 LlmStreamAborted → 'llm_stream_aborted'.

기존 caller 변경 (single-shot 동작 동일):
- kebab-cli main.rs Cmd::Ask: AskOpts 에 history=Vec::new(),
  conversation_id=None, turn_index=None 명시 (CLI multi-turn 은
  p9-fb-18 의 --session/--repl 가 채움).
- kebab-tui src/ask.rs spawn site 동일 (multi-turn UI 는 p9-fb-16).
- kebab-eval runner.rs golden eval 동일 (single-shot per query).
- kebab-app tests/ask_smoke.rs / kebab-tui tests/ask.rs / kebab-rag
  tests/pipeline.rs / kebab-eval metrics.rs Answer literal 갱신.

Test:
- 9 신규 lib unit (expand_query 4 / serialize_history 3 / remaining_budget 2).
- 기존 12 PASS 회귀 0.

Plan 갱신:
- p9-fb-15 status planned → in_progress. 머지 후 한 줄 commit
  으로 completed flip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:09:46 +00:00

93 lines
3.0 KiB
Rust

//! Answer + RAG types (§3.8).
use serde::{Deserialize, Serialize};
use time::OffsetDateTime;
use crate::citation::Citation;
use crate::search::SearchMode;
use crate::versions::PromptTemplateVersion;
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct Answer {
pub answer: String,
pub citations: Vec<AnswerCitation>,
pub grounded: bool,
pub refusal_reason: Option<RefusalReason>,
pub model: ModelRef,
pub embedding: Option<ModelRef>,
pub prompt_template_version: PromptTemplateVersion,
pub retrieval: AnswerRetrievalSummary,
pub usage: TokenUsage,
#[serde(with = "time::serde::rfc3339")]
pub created_at: OffsetDateTime,
/// p9-fb-15: same conversation 의 turn 들이 공유. CLI single-shot
/// (history 없음) / TUI 첫 turn 은 None. blake3 해시 또는 사용자
/// 명시 (`kebab ask --session <id>`, p9-fb-18).
#[serde(default, skip_serializing_if = "Option::is_none")]
pub conversation_id: Option<String>,
/// p9-fb-15: 같은 conversation 안 0-based 순서. 첫 turn = 0. None
/// 이면 single-shot.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub turn_index: Option<u32>,
}
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct AnswerCitation {
pub marker: Option<String>,
pub citation: Citation,
}
/// p9-fb-15: history 가 prompt 에 들어갈 때의 한 turn. RAG facade 가
/// `Vec<Turn>` 받아 system + history + retrieval + new question 으로
/// prompt 빌드. token budget 안에 fit 안 되면 oldest turn 부터 drop
/// (newest 우선 보존).
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct Turn {
pub question: String,
pub answer: String,
pub citations: Vec<AnswerCitation>,
#[serde(with = "time::serde::rfc3339")]
pub created_at: OffsetDateTime,
}
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum RefusalReason {
ScoreGate,
LlmSelfJudge,
NoIndex,
NoChunks,
/// p9-fb-15: ask 가 LLM 토큰 stream 도중 cancel 됨. partial answer
/// 가 채워져 있을 수 있음 (사용자가 본 부분까지). RAG retrieval
/// 자체는 정상 — 모델 generation 단계에서만 중단.
LlmStreamAborted,
}
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct ModelRef {
pub id: String,
pub provider: String,
pub dimensions: Option<usize>,
}
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct AnswerRetrievalSummary {
pub trace_id: TraceId,
pub mode: SearchMode,
pub k: usize,
pub score_gate: f32,
pub top_score: f32,
pub chunks_returned: u32,
pub chunks_used: u32,
}
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct TokenUsage {
pub prompt_tokens: u32,
pub completion_tokens: u32,
pub latency_ms: u32,
}
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]
pub struct TraceId(pub String);