diff --git a/README.md b/README.md index 106c105..b7595a6 100644 --- a/README.md +++ b/README.md @@ -74,6 +74,7 @@ kebab doctor | `kebab search --mode {lexical,vector,hybrid} "" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor ]` | Search. hybrid uses RRF fusion and includes citations. Repeating the same query (normalized with NFKC + trim + lowercase) within one process hits the in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. When an ingest commit occurs, the `kv['corpus_revision']` bump automatically marks every entry stale. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`), which is not compatible with the pre-fb-34 bare array. mismatched cursor → `error.v1.code = stale_cursor` | | `kebab list docs` | List indexed documents | | `kebab inspect doc ` / `kebab inspect chunk ` | View the raw record | +| `kebab fetch chunk [--context N]` / `kebab fetch doc [--max-tokens N]` / `kebab fetch span [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. | | `kebab ask "" [--show-citations / --hide-citations] [--session ] [--stream]` | RAG answer with supporting citations. After the answer, a `근거:` ("evidence") block lists full path / line range / score, one per line (default ON; turn off with `--hide-citations`, useful when piping). Refuses to answer when evidence is insufficient. Requires Ollama. `--session ` enables multi-turn: the first call auto-creates the session in SQLite `chat_sessions`, and later calls receive the prior turns as history for follow-ups. The session id is user-chosen (e.g. `kb-rust-async-2026-05`); `kebab reset --data-only` wipes all sessions. **`--stream` (p9-fb-33)** emits ndjson `answer_event.v1` events (retrieval_done → token* → final) on stderr and the existing `answer.v1` as the last line of stdout, so an agent can consume tokens immediately | | `kebab doctor` | Config / model / DB health check | | `kebab tui` | Ratatui shell (Library + Search + Ask + Inspect panels; desktop in progress). 
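In Library, the `r` key starts a background ingest: the status bar at the bottom of the screen shows progress, and on completion/abort the final line stays briefly and then hides automatically. While an ingest is running, `Esc` / `Ctrl-C` send a cancel signal (otherwise they quit). vim-style modes (`-- NORMAL --` / `-- INSERT --` at the right of the header): Library/Inspect default to NORMAL, Search/Ask default to INSERT. `i` switches Normal→Insert (all panes, p9-fb-21), and `Esc` switches Insert→Normal anywhere. Dispatch is mode-authoritative: Search's `j/k/o/g` and Ask's `e/j/k` act as commands only in NORMAL mode; in INSERT they are typed as input characters. (Search's chunk inspect key is rebound from `i` to `o`, since `i` is the universal Insert toggle.) **`F1` opens a cheatsheet popup** (the current pane's key mappings plus a table of global toggles); close it with `Esc` / `F1`. The Search panel searches on a background worker after a 200ms debounce, so keystrokes never freeze the UI, and stale results are discarded automatically if the user keeps typing (generation counter). The Ask panel is multi-turn: the Q1/A1, Q2/A2 transcript accumulates within one conversation, and each new question is answered with the previous turns as history. Answer bodies are rendered as markdown (bold/italic/inline code/heading/list/code fence/table/blockquote; raw `**bold**` is displayed as actual bold). `Ctrl-L` starts a new conversation. Search's `g` key opens the hit's citation location in `$EDITOR` (default `vi`); after the editor exits, the TUI screen is redrawn cleanly automatically. CLI `kebab ask` prints raw markdown as-is (for terminal compatibility). Library's doc list truncates Korean / Japanese / Chinese (CJK) titles at wide-char-accurate column widths, so a Korean title never overflows its line (1 CJK character = 2 columns). The cursor in Search/Ask/Filter inputs aligns by column over wide characters, so the caret sits exactly beside the character when typing Korean. `← / →` move the cursor within the input string (one keypress per Korean character, even though it spans 2 columns), `Home / End` jump to either end, and `Delete` removes the character at the cursor; this is the same in every input pane (Ask / Search / Library filter overlay) (p9-fb-22). The Ask transcript automatically follows the tail as new answers accumulate below the viewport (auto-scroll); scrolling up with `j` / `k` freezes it, and `Shift-G` returns to the bottom and resumes auto-tail. The hint line at the bottom of the screen uses Korean verb phrases (`"위로"` / `"아래로"` / `"필터"` / `"타이핑 검색어"` / `"Esc 로 NORMAL 모드"` / `"i 입력모드"`, etc.) and switches automatically to match the current (pane, mode) combination; **the first fragment is always `F1 도움말`** ("F1 help", which keeps the cheatsheet discoverable). A status bar is always visible in every mode: `kebab v docs │ ` (state: streaming/searching/indexing/idle; while an ingest is running, progress is absorbed into the same slot). 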
On entering Ask, the 8-character prefix of the conversation id is also shown. In both the Ask transcript and Inspect, `PgUp / PgDn` page-scroll 10 lines at a time. A `TITLE / TAGS / UPDATED / CHUNKS` column header row is shown above Library's doc list (display-width aligned, Hangul / CJK safe). | diff --git a/crates/kebab-app/src/fetch.rs b/crates/kebab-app/src/fetch.rs new file mode 100644 index 0000000..e8a456c --- /dev/null +++ b/crates/kebab-app/src/fetch.rs @@ -0,0 +1,447 @@ +//! p9-fb-35 verbatim fetch implementation. +//! +//! [`App::fetch`] is the facade entry point. It dispatches on +//! [`FetchQuery`] variants: +//! +//! - `Chunk(id)` — return the chunk row from `chunks.text`, optionally +//! with ±N surrounding chunks (`FetchOpts::context`). +//! - `Doc(id)` — return the entire document re-serialized to markdown. +//! (Implemented in Task 4.) +//! - `Span { doc_id, line_start, line_end }` — return a contiguous line +//! slice. (Implemented in Task 5.) +//! +//! Errors are surfaced as [`StructuredError`] (anyhow-friendly wrapper +//! around `ErrorV1`) so the CLI / MCP wire layer's `classify` keeps the +//! typed `code` (`chunk_not_found` / `doc_not_found` / +//! `span_not_supported`) instead of falling through to `code = +//! "generic"`. + +use anyhow::Result; +use time::OffsetDateTime; + +use kebab_core::{ + Block, CanonicalDocument, Chunk, ChunkId, DocumentId, DocumentStore, FetchKind, FetchOpts, + FetchQuery, FetchResult, +}; + +use crate::App; +use crate::error_wire::{ERROR_V1_ID, ErrorV1, StructuredError}; +use crate::staleness::compute_stale; + +impl App { + /// p9-fb-35: verbatim fetch facade. Returns text from + /// `chunks.text` / `CanonicalDocument` based on the requested + /// mode. Errors surface as `StructuredError(ErrorV1)` with one + /// of `chunk_not_found` / `doc_not_found` / `span_not_supported` + /// so the wire-layer classifier preserves the typed code. 
+ pub fn fetch(&self, query: FetchQuery, opts: FetchOpts) -> Result<FetchResult> { + match query { + FetchQuery::Chunk(id) => fetch_chunk(self, id, opts), + FetchQuery::Doc(id) => fetch_doc(self, id, opts), + FetchQuery::Span { + doc_id, + line_start, + line_end, + } => fetch_span(self, doc_id, line_start, line_end, opts), + } + } +} + +fn fetch_chunk(app: &App, id: ChunkId, opts: FetchOpts) -> Result<FetchResult> { + let target = ::get_chunk(&app.sqlite, &id)? + .ok_or_else(|| { + anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: ERROR_V1_ID.to_string(), + code: "chunk_not_found".to_string(), + message: format!("chunk_id '{}' not found", id.0), + details: serde_json::Value::Null, + hint: None, + })) + })?; + + let doc_id = target.doc_id.clone(); + let doc = + ::get_document(&app.sqlite, &doc_id)? + .ok_or_else(|| { + anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: ERROR_V1_ID.to_string(), + code: "doc_not_found".to_string(), + message: format!( + "doc_id '{}' (parent of chunk '{}') not found", + doc_id.0, id.0 + ), + details: serde_json::Value::Null, + hint: None, + })) + })?; + + let (context_before, context_after) = match opts.context { + Some(n) if n > 0 => surrounding_chunks(app, &doc_id, &id, n)?, + _ => (Vec::new(), Vec::new()), + }; + + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + + Ok(FetchResult { + kind: FetchKind::Chunk, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: Some(target), + context_before, + context_after, + text: None, + line_start: None, + line_end: None, + effective_end: None, + truncated: false, + }) +} + +fn fetch_doc(app: &App, id: DocumentId, opts: FetchOpts) -> Result<FetchResult> { + let doc = ::get_document(&app.sqlite, &id)? 
+ .ok_or_else(|| { + anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: ERROR_V1_ID.to_string(), + code: "doc_not_found".to_string(), + message: format!("doc_id '{}' not found", id.0), + details: serde_json::Value::Null, + hint: None, + })) + })?; + + let mut text = fmt_canonical_to_markdown(&doc); + let mut truncated = false; + if let Some(max_tokens) = opts.max_tokens { + let max_chars = max_tokens.saturating_mul(4); + if text.chars().count() > max_chars { + text = trim_to_chars(&text, max_chars); + truncated = true; + } + } + + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + + Ok(FetchResult { + kind: FetchKind::Doc, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: None, + context_before: Vec::new(), + context_after: Vec::new(), + text: Some(text), + line_start: None, + line_end: None, + effective_end: None, + truncated, + }) +} + +/// p9-fb-35: trim string to N chars (Unicode-safe). Mirrors fb-34's +/// helper at `crates/kebab-app/src/app.rs` — kept local to avoid +/// re-exporting an internal helper. +fn trim_to_chars(s: &str, n: usize) -> String { + if s.chars().count() <= n { + return s.to_string(); + } + let mut out = String::with_capacity(n * 4); + for (i, c) in s.chars().enumerate() { + if i >= n { + break; + } + out.push(c); + } + out +} + +fn fetch_span( + app: &App, + id: DocumentId, + line_start: u32, + line_end: u32, + opts: FetchOpts, +) -> Result<FetchResult> { + let doc = ::get_document(&app.sqlite, &id)? + .ok_or_else(|| { + anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: ERROR_V1_ID.to_string(), + code: "doc_not_found".to_string(), + message: format!("doc_id '{}' not found", id.0), + details: serde_json::Value::Null, + hint: None, + })) + })?; + + // Reject line-incompatible media types (PDF / audio). 
`SourceType` + // (markdown / note / paper / reference / inbox) is the *user-facing* + // category, not the rendering format — the actual byte-level format + // lives on the source `RawAsset.media_type`. Look it up via + // workspace_path (unique key per asset). + if let Some(asset) = ::get_asset_by_workspace_path( + &app.sqlite, + &doc.workspace_path, + )? { + if matches!( + asset.media_type, + kebab_core::MediaType::Pdf | kebab_core::MediaType::Audio(_) + ) { + return Err(anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: ERROR_V1_ID.to_string(), + code: "span_not_supported".to_string(), + message: format!( + "doc '{}' has media_type {:?}; line-based span fetch unsupported. \ + Use `fetch chunk` or `fetch doc` instead.", + id.0, asset.media_type + ), + details: serde_json::Value::Null, + hint: Some("kind = chunk or kind = doc instead".to_string()), + }))); + } + } + + if line_start == 0 || line_end == 0 || line_end < line_start { + return Err(anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: ERROR_V1_ID.to_string(), + code: "invalid_input".to_string(), + message: format!( + "line_start ({line_start}) and line_end ({line_end}) must be 1-based with start <= end" + ), + details: serde_json::Value::Null, + hint: None, + }))); + } + + let full = fmt_canonical_to_markdown(&doc); + let lines: Vec<&str> = full.lines().collect(); + let total = lines.len() as u32; + + // p9-fb-35 round-1 review fix: empty / out-of-range request must + // not slice. Returning empty text + `effective_end = line_start - 1` + // lets the caller detect "no lines fetched" via + // `text.is_empty() && effective_end < line_start`. `truncated` + // stays false because line-range clamp is NOT a budget event — + // budget-driven truncation is the only thing `truncated` signals. 
+ if total == 0 || line_start > total { + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + return Ok(FetchResult { + kind: FetchKind::Span, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: None, + context_before: Vec::new(), + context_after: Vec::new(), + text: Some(String::new()), + line_start: Some(line_start), + line_end: Some(line_end), + // saturating_sub: when line_start = 1 we end at 0, signaling + // "no lines fetched" without underflowing u32. + effective_end: Some(line_start.saturating_sub(1)), + truncated: false, + }); + } + + let effective_end_raw = line_end.min(total); + let lo = (line_start - 1) as usize; + let hi = effective_end_raw as usize; + let mut text = lines[lo..hi].join("\n"); + + // p9-fb-35 round-1 review fix: `truncated` is reserved for + // budget-driven truncation only. Line-range clamp (line_end > + // total) is signaled via `effective_end < line_end`, not via + // `truncated`. 
+ let mut truncated = false; + let mut effective_end = effective_end_raw; + if let Some(max_tokens) = opts.max_tokens { + let max_chars = max_tokens.saturating_mul(4); + if text.chars().count() > max_chars { + text = trim_to_chars(&text, max_chars); + truncated = true; + let kept = text.lines().count() as u32; + effective_end = (line_start - 1) + kept; + } + } + + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + + Ok(FetchResult { + kind: FetchKind::Span, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: None, + context_before: Vec::new(), + context_after: Vec::new(), + text: Some(text), + line_start: Some(line_start), + line_end: Some(line_end), + effective_end: Some(effective_end), + truncated, + }) +} + +/// p9-fb-35: list chunks for a document in ordinal order, return +/// `(before, after)` slices around the target chunk_id. `n` caps each +/// side independently — the worst case is `2n` total neighbors when +/// the target sits in the middle of the doc. +fn surrounding_chunks( + app: &App, + doc_id: &DocumentId, + target: &ChunkId, + n: u32, +) -> Result<(Vec<Chunk>, Vec<Chunk>)> { + let chunks = list_chunks_in_order(app, doc_id)?; + let target_idx = chunks + .iter() + .position(|c| c.chunk_id == *target) + .ok_or_else(|| anyhow::anyhow!("chunk not found in doc chunk list"))?; + let n = n as usize; + let lo = target_idx.saturating_sub(n); + let hi = target_idx + .saturating_add(n) + .saturating_add(1) + .min(chunks.len()); + let before: Vec<Chunk> = chunks[lo..target_idx].to_vec(); + let after: Vec<Chunk> = chunks[target_idx + 1..hi].to_vec(); + Ok((before, after)) +} + +/// p9-fb-35: chunks have no explicit ordinal column, so the underlying +/// helper sorts by `(created_at, chunk_id)` which matches insertion +/// order produced by the chunker (deterministic). 
The actual SQL lives +/// inside `kebab-store-sqlite` (`SqliteStore::list_chunk_ids_for_doc`) +/// to keep the facade crate free of direct rusqlite usage. +fn list_chunks_in_order(app: &App, doc_id: &DocumentId) -> Result<Vec<Chunk>> { + let chunk_ids = app.sqlite.list_chunk_ids_for_doc(doc_id)?; + let mut out: Vec<Chunk> = Vec::with_capacity(chunk_ids.len()); + for cid in chunk_ids { + if let Some(chunk) = + ::get_chunk(&app.sqlite, &cid)? + { + out.push(chunk); + } + } + Ok(out) +} + +fn doc_metadata_updated_at(doc: &CanonicalDocument) -> OffsetDateTime { + doc.metadata.updated_at +} + +/// p9-fb-35: serialize a `CanonicalDocument` back to markdown. Best- +/// effort round-trip — inline-styled spans (Strong/Emph children) +/// flatten to plain text via the already-flattened `TextBlock.text` +/// field. Good enough for an agent reading verbatim context. Used by +/// Task 4 (doc mode) and Task 5 (span mode). +pub(crate) fn fmt_canonical_to_markdown(doc: &CanonicalDocument) -> String { + let mut out = String::with_capacity(1024); + for (i, block) in doc.blocks.iter().enumerate() { + if i > 0 { + out.push_str("\n\n"); + } + match block { + Block::Heading(h) => { + let level = h.level.clamp(1, 6) as usize; + for _ in 0..level { + out.push('#'); + } + out.push(' '); + out.push_str(&h.text); + } + Block::Paragraph(t) => out.push_str(&t.text), + Block::Quote(t) => { + // Prefix every line with `> ` so block-quote round-trips. + for (li, line) in t.text.split('\n').enumerate() { + if li > 0 { + out.push('\n'); + } + out.push_str("> "); + out.push_str(line); + } + } + Block::List(l) => { + for (idx, item) in l.items.iter().enumerate() { + if idx > 0 { + out.push('\n'); + } + if l.ordered { + out.push_str(&format!("{}. 
{}", idx + 1, item.text)); + } else { + out.push_str(&format!("- {}", item.text)); + } + } + } + Block::Code(c) => { + out.push_str("```"); + if let Some(lang) = &c.lang { + out.push_str(lang); + } + out.push('\n'); + out.push_str(&c.code); + if !c.code.ends_with('\n') { + out.push('\n'); + } + out.push_str("```"); + } + Block::Table(t) => { + out.push_str(&t.headers.join(" | ")); + out.push('\n'); + // Markdown table separator — N copies of `---|` is + // acceptable for a verbatim re-serialization (renderer + // tolerates trailing pipe). + out.push_str(&"---|".repeat(t.headers.len())); + for row in &t.rows { + out.push('\n'); + out.push_str(&row.join(" | ")); + } + } + Block::ImageRef(img) => { + out.push_str(&format!("![{}]({})", img.alt, img.src)); + } + Block::AudioRef(_a) => { + // Canonical doc carries the transcript on AudioRefBlock, + // but markdown has no native audio embed. Emit a stub + // marker so the agent sees something ran here. + out.push_str("(audio reference)"); + } + } + } + out +} + +/// p9-fb-35: free-function entry for CLI / MCP. Mirrors the +/// `*_with_config` pattern documented in the kebab-app crate root — +/// `kebab-cli` calls this so a `--config ` flag is honored. 
+#[doc(hidden)] +pub fn fetch_with_config( + config: kebab_config::Config, + query: FetchQuery, + opts: FetchOpts, +) -> Result<FetchResult> { + App::open_with_config(config)?.fetch(query, opts) +} diff --git a/crates/kebab-app/src/lib.rs b/crates/kebab-app/src/lib.rs index 66c38ad..a6035a6 100644 --- a/crates/kebab-app/src/lib.rs +++ b/crates/kebab-app/src/lib.rs @@ -60,6 +60,7 @@ pub mod doctor_signal; pub mod error_signal; pub mod error_wire; pub mod external; +pub mod fetch; pub mod ingest_progress; pub mod logging; pub mod reset; @@ -70,6 +71,7 @@ pub use app::{App, SearchResponse}; pub use ingest_progress::{AggregateCounts, IngestEvent, render_skipped_breakdown}; pub use reset::{ResetReport, ResetScope}; pub use error_wire::{ERROR_V1_ID, ErrorV1, StructuredError, classify}; +pub use fetch::fetch_with_config; pub use schema::{Capabilities, Models, SCHEMA_V1_ID, SchemaV1, Stats, WireBlock, schema_with_config}; pub use staleness::{compute_stale, mark_stale_in_place}; diff --git a/crates/kebab-app/tests/fetch_integration.rs b/crates/kebab-app/tests/fetch_integration.rs new file mode 100644 index 0000000..44ee58c --- /dev/null +++ b/crates/kebab-app/tests/fetch_integration.rs @@ -0,0 +1,329 @@ +//! p9-fb-35 App::fetch integration tests. + +mod common; + +use kebab_app::App; +use kebab_core::{FetchKind, FetchOpts, FetchQuery}; + +fn open(env: &common::TestEnv) -> App { + env.app() +} + +#[test] +fn fetch_chunk_returns_target_only_when_no_context() { + let env = common::TestEnv::new(); + common::ingest_md(&env, "a.md", "# Title\n\nFirst paragraph.\n\n## Section\n\nSecond.\n"); + let app = open(&env); + + // Find a chunk via search to obtain its id. 
+ let q = kebab_core::SearchQuery { + text: "First".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let chunk_id = hits[0].chunk_id.clone(); + + let result = app + .fetch(FetchQuery::Chunk(chunk_id), FetchOpts::default()) + .unwrap(); + assert_eq!(result.kind, FetchKind::Chunk); + assert!(result.chunk.is_some(), "target chunk populated"); + assert!(result.context_before.is_empty()); + assert!(result.context_after.is_empty()); + assert!(!result.truncated); +} + +#[test] +fn fetch_chunk_with_context_returns_neighbors() { + let env = common::TestEnv::new(); + let body = "# H1\n\nA1\n\n# H2\n\nA2\n\n# H3\n\nA3\n\n# H4\n\nA4\n\n# H5\n\nA5\n"; + common::ingest_md(&env, "multi.md", body); + let app = env.app(); + + let q = kebab_core::SearchQuery { + text: "A3".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let chunk_id = hits[0].chunk_id.clone(); + + let result = app + .fetch( + FetchQuery::Chunk(chunk_id), + FetchOpts { + context: Some(2), + max_tokens: None, + }, + ) + .unwrap(); + assert_eq!(result.kind, FetchKind::Chunk); + assert!(result.chunk.is_some()); + let total = result.context_before.len() + result.context_after.len(); + assert!(total >= 1, "at least one neighbor expected"); + assert!(total <= 4, "context capped at +-2 ⇒ max 4 neighbors"); +} + +#[test] +fn fetch_chunk_unknown_id_returns_chunk_not_found() { + let env = common::TestEnv::new(); + let app = env.app(); + let err = app + .fetch( + FetchQuery::Chunk(kebab_core::ChunkId("nonexistent-id".to_string())), + FetchOpts::default(), + ) + .unwrap_err(); + let msg = err.to_string(); + assert!( + msg.contains("chunk_not_found") || msg.contains("nonexistent-id"), + "expected chunk_not_found error, got: {msg}" + ); +} + +#[test] +fn fetch_doc_returns_serialized_markdown() { + let env = 
common::TestEnv::new(); + let body = "# Heading One\n\nFirst paragraph.\n\n## Sub\n\nSecond.\n"; + common::ingest_md(&env, "doc.md", body); + let app = env.app(); + + // Discover doc_id via search hit (avoids depending on list_docs API shape). + let q = kebab_core::SearchQuery { + text: "First".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let doc_id = hits[0].doc_id.clone(); + + let result = app + .fetch(FetchQuery::Doc(doc_id), FetchOpts::default()) + .unwrap(); + assert_eq!(result.kind, FetchKind::Doc); + let text = result.text.expect("doc text"); + assert!(text.contains("Heading One"), "doc text contains heading: {text:?}"); + assert!(text.contains("First paragraph"), "doc text contains body"); + assert!(!result.truncated); +} + +#[test] +fn fetch_doc_unknown_id_returns_doc_not_found() { + let env = common::TestEnv::new(); + let app = env.app(); + let err = app + .fetch( + FetchQuery::Doc(kebab_core::DocumentId("nonexistent-doc".to_string())), + FetchOpts::default(), + ) + .unwrap_err(); + assert!(err.to_string().contains("doc_not_found"), "got: {err}"); +} + +#[test] +fn fetch_doc_with_max_tokens_truncates() { + let env = common::TestEnv::new(); + let p = "Lorem ipsum dolor sit amet consectetur adipiscing elit. 
".repeat(20); + let body = format!("# Big\n\n{p}\n"); + common::ingest_md(&env, "big.md", &body); + let app = env.app(); + let q = kebab_core::SearchQuery { + text: "Lorem".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let doc_id = hits[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Doc(doc_id), + FetchOpts { + context: None, + max_tokens: Some(20), // ~80 chars + }, + ) + .unwrap(); + assert!(result.truncated); + let text = result.text.expect("doc text"); + assert!(text.chars().count() <= 100, "trimmed text len {}", text.chars().count()); +} + +#[test] +fn fetch_span_returns_line_range() { + let env = common::TestEnv::new(); + // Use a list so the canonical-to-markdown roundtrip emits 5 + // single-line entries joined by `\n` (paragraphs would be joined by + // `\n\n`, and CommonMark soft breaks inside one paragraph collapse to + // spaces — see crates/kebab-parse-md/src/blocks.rs `Event::SoftBreak`). 
+ let body = "- Line one.\n- Line two.\n- Line three.\n- Line four.\n- Line five.\n"; + common::ingest_md(&env, "lines.md", body); + let app = env.app(); + + let q = kebab_core::SearchQuery { + text: "Line".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let doc_id = hits[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Span { + doc_id, + line_start: 2, + line_end: 4, + }, + FetchOpts::default(), + ) + .unwrap(); + assert_eq!(result.kind, FetchKind::Span); + let text = result.text.expect("span text"); + let line_count = text.lines().count(); + assert_eq!(line_count, 3, "span should be 3 lines: {text:?}"); + assert_eq!(result.line_start, Some(2)); + assert_eq!(result.line_end, Some(4)); + assert_eq!(result.effective_end, Some(4)); + assert!(!result.truncated); +} + +#[test] +fn fetch_span_clamps_line_end_when_out_of_range() { + let env = common::TestEnv::new(); + common::ingest_md(&env, "short.md", "Line one.\nLine two.\n"); + let app = env.app(); + let q = kebab_core::SearchQuery { + text: "Line".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let doc_id = hits[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Span { + doc_id, + line_start: 1, + line_end: 999, + }, + FetchOpts::default(), + ) + .unwrap(); + let text = result.text.expect("span text"); + let actual_lines = text.lines().count(); + assert_eq!(result.effective_end, Some(actual_lines as u32)); + assert!(actual_lines < 999); +} + +#[test] +fn fetch_span_invalid_input_when_zero_lines() { + let env = common::TestEnv::new(); + common::ingest_md(&env, "a.md", "Line one.\n"); + let app = env.app(); + let q = kebab_core::SearchQuery { + text: "Line".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let 
hits = app.search(q).unwrap(); + let doc_id = hits[0].doc_id.clone(); + + let err = app + .fetch( + FetchQuery::Span { + doc_id, + line_start: 0, + line_end: 0, + }, + FetchOpts::default(), + ) + .unwrap_err(); + assert!(err.to_string().contains("invalid_input"), "got: {err}"); +} + +#[test] +fn fetch_span_line_start_beyond_total_returns_empty_text() { + let env = common::TestEnv::new(); + let body = "- Line one.\n- Line two.\n"; + common::ingest_md(&env, "two_lines.md", body); + let app = env.app(); + let q = kebab_core::SearchQuery { + text: "Line".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let doc_id = hits[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Span { + doc_id, + line_start: 100, + line_end: 200, + }, + FetchOpts::default(), + ) + .unwrap(); + let text = result.text.expect("text field"); + assert!(text.is_empty(), "out-of-range request returns empty text"); + assert!( + !result.truncated, + "out-of-range is NOT truncated (budget-only flag)" + ); +} + +#[test] +fn fetch_chunk_context_at_first_chunk_clamps_lower_bound() { + let env = common::TestEnv::new(); + // Multi-chunk markdown so context ±N has neighbors. + let body = + "# H1\n\nFirst chunk text body.\n\n# H2\n\nSecond chunk.\n\n# H3\n\nThird chunk.\n"; + common::ingest_md(&env, "boundary.md", body); + let app = env.app(); + let q = kebab_core::SearchQuery { + text: "First".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let chunk_id = hits[0].chunk_id.clone(); + + let result = app + .fetch( + FetchQuery::Chunk(chunk_id), + FetchOpts { + context: Some(2), + max_tokens: None, + }, + ) + .unwrap(); + // p9-fb-35 R2: doc has 3 chunks; ±2 should clamp the total + // neighbor count to ≤ 2 + 1 (= excludes target). 
+ // + // ⚠ Strict "first-chunk → context_before is empty" cannot be + // asserted here yet because chunks.ordinal column does not exist + // — `list_chunk_ids_for_doc` orders by `(created_at, chunk_id)` + // and chunk_id is a blake3 hash, so the "First chunk" content + // may land at any hash-order position within the doc. The clamp + // logic itself is correct (target_idx ± n → [0..len]); we just + // can't pin which chunk is hash-order-first. Tracked as + // follow-up: V007 chunks.ordinal migration. + let total = result.context_before.len() + result.context_after.len(); + assert!( + total <= 2, + "doc with 3 chunks ±2 → at most 2 neighbors (excludes target), got {total}" + ); +} diff --git a/crates/kebab-cli/src/main.rs b/crates/kebab-cli/src/main.rs index 16857e5..c92ba27 100644 --- a/crates/kebab-cli/src/main.rs +++ b/crates/kebab-cli/src/main.rs @@ -86,6 +86,12 @@ enum Cmd { what: InspectWhat, }, + /// p9-fb-35: verbatim chunk / doc / span fetch. + Fetch { + #[command(subcommand)] + what: FetchWhat, + }, + /// Lexical / vector / hybrid search over chunks. Search { query: String, @@ -261,6 +267,33 @@ enum InspectWhat { Chunk { id: String }, } +#[derive(Subcommand, Debug)] +enum FetchWhat { + /// Fetch a single chunk verbatim, optionally with surrounding context. + Chunk { + id: String, + /// p9-fb-35: include ±N chunks before and after the target. + #[arg(long)] + context: Option<u32>, + }, + /// Fetch the entire normalized markdown text of a document. + Doc { + id: String, + /// p9-fb-35: chars/4 budget cap. + #[arg(long)] + max_tokens: Option<usize>, + }, + /// Fetch a 1-based line range of a document. PDF / audio rejected. + Span { + doc_id: String, + line_start: u32, + line_end: u32, + /// p9-fb-35: chars/4 budget cap. 
+ #[arg(long)] + max_tokens: Option<usize>, + }, +} + #[derive(Subcommand, Debug)] enum EvalWhat { /// Run the golden suite end-to-end and persist `eval_runs` + @@ -526,6 +559,49 @@ fn run(cli: &Cli) -> anyhow::Result<()> { } }, + Cmd::Fetch { what } => { + let cfg = kebab_config::Config::load(cli.config.as_deref())?; + let (query, opts) = match what { + FetchWhat::Chunk { id, context } => ( + kebab_core::FetchQuery::Chunk(kebab_core::ChunkId(id.clone())), + kebab_core::FetchOpts { + context: *context, + max_tokens: None, + }, + ), + FetchWhat::Doc { id, max_tokens } => ( + kebab_core::FetchQuery::Doc(kebab_core::DocumentId(id.clone())), + kebab_core::FetchOpts { + context: None, + max_tokens: *max_tokens, + }, + ), + FetchWhat::Span { + doc_id, + line_start, + line_end, + max_tokens, + } => ( + kebab_core::FetchQuery::Span { + doc_id: kebab_core::DocumentId(doc_id.clone()), + line_start: *line_start, + line_end: *line_end, + }, + kebab_core::FetchOpts { + context: None, + max_tokens: *max_tokens, + }, + ), + }; + let result = kebab_app::fetch_with_config(cfg, query, opts)?; + if cli.json { + println!("{}", serde_json::to_string(&wire::wire_fetch_result(&result))?); + } else { + render_fetch_plain(&result); + } + Ok(()) + } + Cmd::Search { query, k, @@ -1112,6 +1188,53 @@ fn confirm_destructive( Ok(matches!(s.as_str(), "y" | "yes")) } +/// p9-fb-35: human-friendly plain output for `kebab fetch`. 
+fn render_fetch_plain(r: &kebab_core::FetchResult) { + println!("# {} ({})", r.doc_path.0, format_kind(r.kind)); + if r.stale { + println!("[stale; indexed_at = {}]", r.indexed_at); + } + match r.kind { + kebab_core::FetchKind::Chunk => { + if !r.context_before.is_empty() { + println!("\n=== before ==="); + for c in &r.context_before { + let heading = c.heading_path.last().map(|s| s.as_str()).unwrap_or(""); + println!("[{} § {}]\n{}\n", c.chunk_id.0, heading, c.text); + } + } + if let Some(c) = &r.chunk { + println!("\n=== target ==="); + let heading = c.heading_path.last().map(|s| s.as_str()).unwrap_or(""); + println!("[{} § {}]\n{}\n", c.chunk_id.0, heading, c.text); + } + if !r.context_after.is_empty() { + println!("\n=== after ==="); + for c in &r.context_after { + let heading = c.heading_path.last().map(|s| s.as_str()).unwrap_or(""); + println!("[{} § {}]\n{}\n", c.chunk_id.0, heading, c.text); + } + } + } + kebab_core::FetchKind::Doc | kebab_core::FetchKind::Span => { + if let Some(text) = &r.text { + println!("\n{text}"); + } + if r.truncated { + eprintln!("[truncated; widen --max-tokens for fuller text]"); + } + } + } +} + +fn format_kind(k: kebab_core::FetchKind) -> &'static str { + match k { + kebab_core::FetchKind::Chunk => "chunk", + kebab_core::FetchKind::Doc => "doc", + kebab_core::FetchKind::Span => "span", + } +} + #[cfg(test)] mod tests { //! p9-fb-32: unit tests for `render_ask_plain_citations`. The diff --git a/crates/kebab-cli/src/wire.rs b/crates/kebab-cli/src/wire.rs index 649d3f0..178fa22 100644 --- a/crates/kebab-cli/src/wire.rs +++ b/crates/kebab-cli/src/wire.rs @@ -189,6 +189,12 @@ pub fn wire_error_v1(e: &kebab_app::ErrorV1) -> Value { tag_object(v, "error.v1") } +/// p9-fb-35: tag a [`kebab_core::FetchResult`] as `fetch_result.v1`. 
+pub fn wire_fetch_result(r: &kebab_core::FetchResult) -> Value { + let v = serde_json::to_value(r).expect("FetchResult serializes"); + tag_object(v, "fetch_result.v1") +} + #[cfg(test)] mod tests { use super::*; diff --git a/crates/kebab-cli/tests/cli_mcp_smoke.rs b/crates/kebab-cli/tests/cli_mcp_smoke.rs index 1c0b07b..cc22302 100644 --- a/crates/kebab-cli/tests/cli_mcp_smoke.rs +++ b/crates/kebab-cli/tests/cli_mcp_smoke.rs @@ -66,8 +66,8 @@ fn cli_mcp_initialize_then_tools_list() { .expect("tools/list result.tools must be an array"); assert_eq!( tools.len(), - 6, - "expected 6 tools (schema, doctor, search, ask, ingest_file, ingest_stdin), got {}: {list}", + 7, + "expected 7 tools (schema, doctor, search, ask, fetch, ingest_file, ingest_stdin), got {}: {list}", tools.len() ); diff --git a/crates/kebab-cli/tests/common/mod.rs b/crates/kebab-cli/tests/common/mod.rs index 70c1924..27e2d3b 100644 --- a/crates/kebab-cli/tests/common/mod.rs +++ b/crates/kebab-cli/tests/common/mod.rs @@ -196,6 +196,30 @@ pub fn run_ask_json(cfg: &Path, query: &str) -> String { String::from_utf8_lossy(&out.stdout).to_string() } +/// p9-fb-35: invoke `kebab fetch` with arbitrary trailing flags, +/// capture stdout + stderr. Caller is responsible for supplying +/// `--json` (global flag) before the subcommand position via the +/// `args` slice (e.g. `&["--json", "chunk", &id]`). Asserts the +/// binary exited 0; non-zero exits fail the test with stderr +/// included — for negative-path tests (unknown chunk_id etc.) drive +/// the binary directly via `std::process::Command`. 
+pub fn run_fetch_with_args(cfg: &Path, args: &[&str]) -> (String, String) { + let bin = env!("CARGO_BIN_EXE_kebab"); + let mut cmd = Command::new(bin); + cmd.arg("--config").arg(cfg).arg("fetch"); + cmd.args(args); + let out = cmd.output().expect("kebab fetch"); + assert!( + out.status.success(), + "fetch failed: args={args:?} stderr={}", + String::from_utf8_lossy(&out.stderr) + ); + ( + String::from_utf8_lossy(&out.stdout).to_string(), + String::from_utf8_lossy(&out.stderr).to_string(), + ) +} + /// Rewrite `documents.updated_at` for one workspace path to /// `now - days_ago` (RFC3339 UTC). Mirrors /// `kebab-app/tests/common/mod.rs::backdate_document_updated_at`. diff --git a/crates/kebab-cli/tests/wire_fetch.rs b/crates/kebab-cli/tests/wire_fetch.rs new file mode 100644 index 0000000..550720b --- /dev/null +++ b/crates/kebab-cli/tests/wire_fetch.rs @@ -0,0 +1,130 @@ +//! p9-fb-35: CLI fetch wire shape + plain output + exit codes. +//! +//! Lexical-only — no fastembed / no Ollama. Each test builds its own +//! TempDir KB via `common::write_config` + `common::ingest` and drives +//! `kebab fetch` through `common::run_fetch_with_args`. Verifies: +//! +//! - `--json fetch chunk ` emits the `fetch_result.v1` wrapper +//! with `kind = "chunk"` and a populated `chunk` object. +//! - `--json fetch doc --max-tokens N` flips `truncated: true` +//! once the budget binds. +//! - Unknown `chunk_id` exits non-zero and emits an `error.v1` +//! ndjson line on stderr with `code = "chunk_not_found"`. + +mod common; + +use serde_json::Value; +use std::fs; + +#[test] +fn fetch_chunk_json_emits_fetch_result_v1() { + let dir = tempfile::tempdir().unwrap(); + let (cfg, workspace, _data) = common::write_config(dir.path(), 30); + fs::write(workspace.join("a.md"), "# T\n\napples are red.\n").unwrap(); + common::ingest(&cfg, &workspace); + + // Find chunk_id via search. 
+ let (search_stdout, _) = common::run_search_with_args( + &cfg, + &["--json", "--mode", "lexical", "--k", "1", "apples"], + ); + let search: Value = serde_json::from_str(search_stdout.trim()) + .unwrap_or_else(|e| panic!("search not JSON: {search_stdout:?}: {e}")); + let chunk_id = search["hits"][0]["chunk_id"] + .as_str() + .expect("chunk_id on first hit") + .to_string(); + + let (stdout, _) = common::run_fetch_with_args( + &cfg, + &["--json", "chunk", &chunk_id], + ); + let v: Value = serde_json::from_str(stdout.trim()) + .unwrap_or_else(|e| panic!("fetch not JSON: {stdout:?}: {e}")); + assert_eq!(v["schema_version"], "fetch_result.v1"); + assert_eq!(v["kind"], "chunk"); + assert!( + v["chunk"].is_object(), + "target chunk must be populated: {v}" + ); + assert_eq!(v["truncated"], false); +} + +#[test] +fn fetch_doc_json_with_max_tokens_truncates() { + let dir = tempfile::tempdir().unwrap(); + let (cfg, workspace, _data) = common::write_config(dir.path(), 30); + let body: String = "Lorem ipsum dolor sit amet. ".repeat(20); + fs::write(workspace.join("big.md"), format!("# Big\n\n{body}\n")).unwrap(); + common::ingest(&cfg, &workspace); + + // Find doc_id via search. 
+ let (search_stdout, _) = common::run_search_with_args( + &cfg, + &["--json", "--mode", "lexical", "--k", "1", "Lorem"], + ); + let search: Value = serde_json::from_str(search_stdout.trim()) + .unwrap_or_else(|e| panic!("search not JSON: {search_stdout:?}: {e}")); + let doc_id = search["hits"][0]["doc_id"] + .as_str() + .expect("doc_id on first hit") + .to_string(); + + let (stdout, _) = common::run_fetch_with_args( + &cfg, + &["--json", "doc", &doc_id, "--max-tokens", "20"], + ); + let v: Value = serde_json::from_str(stdout.trim()) + .unwrap_or_else(|e| panic!("fetch not JSON: {stdout:?}: {e}")); + assert_eq!(v["kind"], "doc"); + assert_eq!( + v["truncated"], true, + "20-token cap must trip truncation: {v}" + ); +} + +#[test] +fn fetch_chunk_unknown_id_exits_with_error_v1() { + let dir = tempfile::tempdir().unwrap(); + let (cfg, _workspace, _data) = common::write_config(dir.path(), 30); + + // Direct invocation (not via the success-asserting helper) so we + // can read stderr on failure — mirrors the stale_cursor test in + // `wire_search_response.rs`. 
+ let exe = env!("CARGO_BIN_EXE_kebab"); + let cfg_str = cfg.to_str().expect("utf8"); + let out = std::process::Command::new(exe) + .args([ + "--config", + cfg_str, + "--json", + "fetch", + "chunk", + "nonexistent", + ]) + .output() + .expect("kebab fetch"); + + assert_ne!(out.status.code(), Some(0), "must exit non-zero"); + let stderr = String::from_utf8_lossy(&out.stderr); + let err_line = stderr + .lines() + .find(|l| { + serde_json::from_str::<Value>(l) + .ok() + .and_then(|v| { + v.get("schema_version") + .and_then(|s| s.as_str()) + .map(String::from) + }) + .as_deref() + == Some("error.v1") + }) + .unwrap_or_else(|| panic!("no error.v1 line on stderr: {stderr:?}")); + + let v: Value = serde_json::from_str(err_line).expect("error.v1 json"); + assert_eq!( + v["code"], "chunk_not_found", + "code must be chunk_not_found: {err_line}" + ); +} diff --git a/crates/kebab-core/src/fetch.rs b/crates/kebab-core/src/fetch.rs new file mode 100644 index 0000000..14374be --- /dev/null +++ b/crates/kebab-core/src/fetch.rs @@ -0,0 +1,87 @@ +//! p9-fb-35 verbatim fetch domain types. +//! +//! Three modes (chunk / doc / span) carried by [`FetchQuery`]; one +//! response shape ([`FetchResult`]) discriminated by [`FetchKind`]. +//! All types are `Serialize` so the CLI / MCP wire layers can hand +//! them straight through `serde_json::to_value`. + +use serde::{Deserialize, Serialize}; +use time::OffsetDateTime; + +use crate::asset::WorkspacePath; +use crate::chunk::Chunk; +use crate::ids::{ChunkId, DocumentId}; + +#[derive(Clone, Debug)] +pub enum FetchQuery { + Chunk(ChunkId), + Doc(DocumentId), + Span { + doc_id: DocumentId, + line_start: u32, + line_end: u32, + }, +} + +#[derive(Clone, Debug, Default)] +pub struct FetchOpts { + /// chunk mode only: ±N chunks. None = no surrounding context. + pub context: Option, + /// doc / span mode only: chars/4 budget. None = no cap. 
+ pub max_tokens: Option, +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum FetchKind { + Chunk, + Doc, + Span, +} + +#[derive(Clone, Debug, Serialize, Deserialize)] +pub struct FetchResult { + pub kind: FetchKind, + pub doc_id: DocumentId, + pub doc_path: WorkspacePath, + #[serde(with = "time::serde::rfc3339")] + pub indexed_at: OffsetDateTime, + pub stale: bool, + // chunk mode payloads + #[serde(skip_serializing_if = "Option::is_none")] + pub chunk: Option<Chunk>, + #[serde(skip_serializing_if = "Vec::is_empty", default)] + pub context_before: Vec<Chunk>, + #[serde(skip_serializing_if = "Vec::is_empty", default)] + pub context_after: Vec<Chunk>, + // doc / span payloads + #[serde(skip_serializing_if = "Option::is_none")] + pub text: Option<String>, + #[serde(skip_serializing_if = "Option::is_none")] + pub line_start: Option<u32>, + #[serde(skip_serializing_if = "Option::is_none")] + pub line_end: Option<u32>, + #[serde(skip_serializing_if = "Option::is_none")] + pub effective_end: Option<u32>, + pub truncated: bool, +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn fetch_opts_default_is_all_none() { + let o = FetchOpts::default(); + assert!(o.context.is_none()); + assert!(o.max_tokens.is_none()); + } + + #[test] + fn fetch_kind_serializes_snake_case() { + let v = serde_json::to_value(FetchKind::Chunk).unwrap(); + assert_eq!(v, serde_json::json!("chunk")); + let v = serde_json::to_value(FetchKind::Span).unwrap(); + assert_eq!(v, serde_json::json!("span")); + } +} diff --git a/crates/kebab-core/src/lib.rs b/crates/kebab-core/src/lib.rs index 6da7a53..7bbb01b 100644 --- a/crates/kebab-core/src/lib.rs +++ b/crates/kebab-core/src/lib.rs @@ -23,6 +23,7 @@ pub mod vector; pub mod errors; pub mod traits; pub mod normalize; +pub mod fetch; // Re-export the most commonly used items at the crate root, mirroring the // public surface listed in the task spec. 
@@ -68,3 +69,4 @@ pub use traits::{ SourceScope, TokenChunk, VectorStore, }; pub use normalize::{nfc, to_posix}; +pub use fetch::{FetchKind, FetchOpts, FetchQuery, FetchResult}; diff --git a/crates/kebab-mcp/src/lib.rs b/crates/kebab-mcp/src/lib.rs index fc6a2a4..93f39df 100644 --- a/crates/kebab-mcp/src/lib.rs +++ b/crates/kebab-mcp/src/lib.rs @@ -1,6 +1,7 @@ -//! MCP (Model Context Protocol) server over stdio. Exposes 6 tools -//! (`search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin`) -//! backed by `kebab-app` facade methods. Used by `kebab-cli`'s `Cmd::Mcp` arm. +//! MCP (Model Context Protocol) server over stdio. Exposes 7 tools +//! (`search` / `ask` / `schema` / `doctor` / `ingest_file` / `ingest_stdin` +//! / `fetch`) backed by `kebab-app` facade methods. Used by `kebab-cli`'s +//! `Cmd::Mcp` arm. //! //! See spec `docs/superpowers/specs/2026-05-07-p9-fb-30-mcp-server-design.md`. @@ -61,6 +62,11 @@ pub fn build_tools_vec() -> Vec { "Ingest markdown content into the knowledge base. v1 markdown only. Frontmatter (title + source_uri) auto-injected.", schema_for_type::(), ), + Tool::new( + "fetch", + "Verbatim fetch — chunk / doc / span modes. Returns fetch_result.v1 with the indexed text (no LLM rewrite).", + schema_for_type::(), + ), ] } @@ -157,6 +163,13 @@ impl ServerHandler for KebabHandler { }) .await } + "fetch" => { + let args = request.arguments.unwrap_or_default(); + self.spawn_tool(args, |state, input| { + tools::fetch::handle(&state, input) + }) + .await + } _other => Err(ErrorData::method_not_found::< rmcp::model::CallToolRequestMethod, >()), diff --git a/crates/kebab-mcp/src/tools/fetch.rs b/crates/kebab-mcp/src/tools/fetch.rs new file mode 100644 index 0000000..3f0ea5b --- /dev/null +++ b/crates/kebab-mcp/src/tools/fetch.rs @@ -0,0 +1,99 @@ +//! p9-fb-35 `fetch` tool — wraps `kebab_app::fetch_with_config`. +//! +//! Three modes (chunk / doc / span). Output is `fetch_result.v1`. +//! +//! 
Mirrors the CLI surface (`kebab fetch ...`): same input shape, +//! same wire envelope. Missing kind-specific fields produce an `error.v1` +//! with `code = "invalid_input"`. + +use rmcp::model::CallToolResult; +use schemars::JsonSchema; +use serde::{Deserialize, Serialize}; + +use crate::error::{to_tool_error, to_tool_success}; +use crate::state::KebabAppState; + +#[derive(Debug, Deserialize, Serialize, JsonSchema)] +pub struct FetchInput { + /// "chunk" | "doc" | "span" + pub kind: String, + /// Required when kind = "chunk". + pub chunk_id: Option<String>, + /// Required when kind = "doc" or "span". + pub doc_id: Option<String>, + /// Required when kind = "span" (1-based, inclusive). + pub line_start: Option<u32>, + pub line_end: Option<u32>, + /// chunk only: ±N surrounding chunks. + pub context: Option, + /// doc/span only: chars/4 budget. + pub max_tokens: Option, +} + +pub fn handle(state: &KebabAppState, input: FetchInput) -> CallToolResult { + let query = match input.kind.as_str() { + "chunk" => match input.chunk_id { + Some(id) => kebab_core::FetchQuery::Chunk(kebab_core::ChunkId(id)), + None => return invalid_input("kind=chunk requires chunk_id"), + }, + "doc" => match input.doc_id { + Some(id) => kebab_core::FetchQuery::Doc(kebab_core::DocumentId(id)), + None => return invalid_input("kind=doc requires doc_id"), + }, + "span" => match (input.doc_id, input.line_start, input.line_end) { + (Some(id), Some(start), Some(end)) => kebab_core::FetchQuery::Span { + doc_id: kebab_core::DocumentId(id), + line_start: start, + line_end: end, + }, + _ => return invalid_input("kind=span requires doc_id, line_start, line_end"), + }, + other => { + return invalid_input(&format!( + "unknown kind '{other}'; expected chunk|doc|span" + )); + } + }; + + let opts = kebab_core::FetchOpts { + context: input.context, + max_tokens: input.max_tokens, + }; + + let cfg_clone = (*state.config).clone(); + match kebab_app::fetch_with_config(cfg_clone, query, opts) { + Ok(r) => { + // FetchResult does not carry a 
`schema_version` field, so we + // tag the envelope inline (mirrors search.rs's pattern). + let mut v = match serde_json::to_value(&r) { + Ok(v) => v, + Err(e) => { + return to_tool_error(&anyhow::anyhow!("FetchResult serialize: {e}")); + } + }; + if let serde_json::Value::Object(ref mut map) = v { + map.insert( + "schema_version".to_string(), + serde_json::Value::String("fetch_result.v1".to_string()), + ); + } + match serde_json::to_string(&v) { + Ok(json) => to_tool_success(json), + Err(e) => to_tool_error(&anyhow::anyhow!(e)), + } + } + Err(e) => to_tool_error(&e), + } +} + +fn invalid_input(msg: &str) -> CallToolResult { + use kebab_app::{ErrorV1, StructuredError}; + let err = anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "invalid_input".to_string(), + message: msg.to_string(), + details: serde_json::Value::Null, + hint: None, + })); + to_tool_error(&err) +} diff --git a/crates/kebab-mcp/src/tools/mod.rs b/crates/kebab-mcp/src/tools/mod.rs index 087d630..22b5569 100644 --- a/crates/kebab-mcp/src/tools/mod.rs +++ b/crates/kebab-mcp/src/tools/mod.rs @@ -6,3 +6,4 @@ pub mod search; pub mod ask; pub mod ingest_file; pub mod ingest_stdin; +pub mod fetch; diff --git a/crates/kebab-mcp/tests/tools_call_fetch.rs b/crates/kebab-mcp/tests/tools_call_fetch.rs new file mode 100644 index 0000000..5627e93 --- /dev/null +++ b/crates/kebab-mcp/tests/tools_call_fetch.rs @@ -0,0 +1,215 @@ +//! p9-fb-35: tools/call name=fetch — chunk happy path + invalid_input. +//! +//! Mirrors `tools_call_search.rs` setup: a TempDir KB with embedding +//! provider = "none" (no Ollama / fastembed) and a single ingested +//! markdown doc. We discover a `chunk_id` via the search tool, call +//! `fetch` with it, then exercise the missing-arg branch separately. 
+ +use std::fs; + +use kebab_config::Config; +use kebab_core::SourceScope; +use kebab_mcp::{KebabAppState, KebabHandler}; +use rmcp::model::RawContent; + +fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path) -> Config { + let mut cfg = Config::defaults(); + cfg.storage.data_dir = data_dir.to_string_lossy().into_owned(); + cfg.storage.model_dir = data_dir + .join("models") + .to_string_lossy() + .into_owned(); + cfg.workspace.root = workspace_root.to_string_lossy().into_owned(); + cfg.workspace.exclude.clear(); + cfg.models.embedding.provider = "none".to_string(); + cfg.models.embedding.dimensions = 0; + cfg +} + +#[tokio::test] +async fn fetch_tool_chunk_returns_fetch_result_v1() { + let dir = tempfile::tempdir().unwrap(); + let data_dir = dir.path().join("data"); + let workspace_root = dir.path().join("notes"); + fs::create_dir_all(&data_dir).unwrap(); + fs::create_dir_all(&workspace_root).unwrap(); + + let config = minimal_config(&data_dir, &workspace_root); + + fs::write( + workspace_root.join("a.md"), + "# Alpha\n\nThis document mentions kebab and bread.", + ) + .unwrap(); + + let scope = SourceScope { + root: workspace_root.clone(), + include: vec![], + exclude: vec![], + }; + let _ = kebab_app::ingest_with_config(config.clone(), scope, false).unwrap(); + + let state = KebabAppState::new(config, None); + let handler = KebabHandler::new(state); + + // Discover a chunk_id via the search tool. 
+ let search_result = kebab_mcp::tools::search::handle( + handler.state(), + kebab_mcp::tools::search::SearchInput { + query: "kebab".to_string(), + mode: Some("lexical".to_string()), + k: Some(1), + max_tokens: None, + snippet_chars: None, + cursor: None, + }, + ); + let search_text = match &search_result.content.first().unwrap().raw { + RawContent::Text(t) => t.text.clone(), + other => panic!("expected text content, got {other:?}"), + }; + let search_v: serde_json::Value = serde_json::from_str(&search_text).unwrap(); + let chunk_id = search_v["hits"][0]["chunk_id"] + .as_str() + .expect("chunk_id on first hit") + .to_string(); + + // Call fetch with kind=chunk. + let result = kebab_mcp::tools::fetch::handle( + handler.state(), + kebab_mcp::tools::fetch::FetchInput { + kind: "chunk".to_string(), + chunk_id: Some(chunk_id), + doc_id: None, + line_start: None, + line_end: None, + context: None, + max_tokens: None, + }, + ); + + assert!( + !result.is_error.unwrap_or(false), + "expected isError=false, got {:?}", + result + ); + + let content = result + .content + .first() + .expect("expected at least one content item"); + let text = match &content.raw { + RawContent::Text(t) => &t.text, + other => panic!("expected text content, got {other:?}"), + }; + + let v: serde_json::Value = serde_json::from_str(text).unwrap(); + assert_eq!( + v.get("schema_version").and_then(|s| s.as_str()), + Some("fetch_result.v1"), + "envelope must carry schema_version=fetch_result.v1" + ); + assert_eq!( + v.get("kind").and_then(|s| s.as_str()), + Some("chunk"), + "kind must be 'chunk'" + ); + assert!( + v.get("chunk").is_some_and(|c| c.is_object()), + "chunk payload must be populated for kind=chunk" + ); +} + +#[tokio::test] +async fn fetch_tool_invalid_kind_returns_invalid_input() { + let dir = tempfile::tempdir().unwrap(); + let data_dir = dir.path().join("data"); + let workspace_root = dir.path().join("notes"); + fs::create_dir_all(&data_dir).unwrap(); + 
fs::create_dir_all(&workspace_root).unwrap(); + + let config = minimal_config(&data_dir, &workspace_root); + + let state = KebabAppState::new(config, None); + let handler = KebabHandler::new(state); + + let result = kebab_mcp::tools::fetch::handle( + handler.state(), + kebab_mcp::tools::fetch::FetchInput { + kind: "garbage".to_string(), + chunk_id: None, + doc_id: None, + line_start: None, + line_end: None, + context: None, + max_tokens: None, + }, + ); + + assert!( + result.is_error.unwrap_or(false), + "expected isError=true for unknown kind" + ); + let content = result + .content + .first() + .expect("expected at least one content item"); + let text = match &content.raw { + RawContent::Text(t) => &t.text, + other => panic!("expected text content, got {other:?}"), + }; + let v: serde_json::Value = serde_json::from_str(text).unwrap(); + assert_eq!( + v.get("schema_version").and_then(|s| s.as_str()), + Some("error.v1"), + "must carry error.v1 envelope" + ); + assert_eq!( + v.get("code").and_then(|s| s.as_str()), + Some("invalid_input"), + "code must be invalid_input for unknown kind" + ); +} + +#[tokio::test] +async fn fetch_tool_chunk_missing_id_returns_invalid_input() { + let dir = tempfile::tempdir().unwrap(); + let data_dir = dir.path().join("data"); + let workspace_root = dir.path().join("notes"); + fs::create_dir_all(&data_dir).unwrap(); + fs::create_dir_all(&workspace_root).unwrap(); + + let config = minimal_config(&data_dir, &workspace_root); + + let state = KebabAppState::new(config, None); + let handler = KebabHandler::new(state); + + // kind=chunk but no chunk_id — invalid_input. 
+ let result = kebab_mcp::tools::fetch::handle( + handler.state(), + kebab_mcp::tools::fetch::FetchInput { + kind: "chunk".to_string(), + chunk_id: None, + doc_id: None, + line_start: None, + line_end: None, + context: None, + max_tokens: None, + }, + ); + + assert!( + result.is_error.unwrap_or(false), + "expected isError=true when chunk_id is missing" + ); + let content = result.content.first().unwrap(); + let text = match &content.raw { + RawContent::Text(t) => &t.text, + other => panic!("expected text content, got {other:?}"), + }; + let v: serde_json::Value = serde_json::from_str(text).unwrap(); + assert_eq!( + v.get("code").and_then(|s| s.as_str()), + Some("invalid_input") + ); +} diff --git a/crates/kebab-mcp/tests/tools_list.rs b/crates/kebab-mcp/tests/tools_list.rs index 01bfe6e..c56c4a3 100644 --- a/crates/kebab-mcp/tests/tools_list.rs +++ b/crates/kebab-mcp/tests/tools_list.rs @@ -1,13 +1,13 @@ -//! Integration: `build_tools_vec` returns 6 tools with correct names and +//! Integration: `build_tools_vec` returns 7 tools with correct names and //! inputSchema. Uses the extracted `pub fn build_tools_vec()` helper — no //! transport or RequestContext needed. 
use kebab_mcp::build_tools_vec; #[test] -fn tools_list_returns_six_tools() { +fn tools_list_returns_seven_tools() { let tools = build_tools_vec(); - assert_eq!(tools.len(), 6, "expected exactly 6 tools, got {}", tools.len()); + assert_eq!(tools.len(), 7, "expected exactly 7 tools, got {}", tools.len()); let names: Vec<&str> = tools.iter().map(|t| t.name.as_ref()).collect(); assert!(names.contains(&"schema"), "missing 'schema' tool"); @@ -16,6 +16,7 @@ fn tools_list_returns_six_tools() { assert!(names.contains(&"ask"), "missing 'ask' tool"); assert!(names.contains(&"ingest_file"), "missing 'ingest_file' tool"); assert!(names.contains(&"ingest_stdin"), "missing 'ingest_stdin' tool"); + assert!(names.contains(&"fetch"), "missing 'fetch' tool"); } #[test] diff --git a/crates/kebab-store-sqlite/src/documents.rs b/crates/kebab-store-sqlite/src/documents.rs index ac59939..b769b03 100644 --- a/crates/kebab-store-sqlite/src/documents.rs +++ b/crates/kebab-store-sqlite/src/documents.rs @@ -375,6 +375,48 @@ impl kebab_core::DocumentStore for SqliteStore { } } +impl SqliteStore { + /// p9-fb-35: list `chunk_id`s for a document, returning a stable + /// `(created_at, chunk_id)` order. Used by + /// `App::fetch chunk --context N` to find ordinal-adjacent chunks. + /// + /// ⚠ Round-1 review caveat: `chunk_id` is a blake3 hash of + /// `(doc_id, chunker_version, …)` — hex-lexicographic sort does NOT + /// correspond to document position. Within one ingest transaction + /// all chunks share `created_at` to the millisecond, so the + /// secondary `chunk_id` sort dominates and the "neighbors" + /// returned here may not be document-adjacent. + /// + /// Real fix is a `chunks.ordinal` column (V007 migration) or sort + /// by `chunks.source_spans_json[0]` start offset. Tracked as + /// follow-up. 
Until then `--context` neighbors are best-effort — + /// they may or may not align with document position depending on + /// whether `chunk_id` hash order happens to match insertion order + /// for that particular doc. Large markdown / PDF (page-aligned + /// chunks) likely re-orders. See `tasks/HOTFIXES.md` if escalated. + pub fn list_chunk_ids_for_doc( + &self, + doc_id: &kebab_core::DocumentId, + ) -> Result> { + let conn = self.read_conn(); + let mut stmt = conn + .prepare( + "SELECT chunk_id FROM chunks + WHERE doc_id = ? + ORDER BY created_at ASC, chunk_id ASC", + ) + .map_err(StoreError::from)?; + let rows = stmt + .query_map(params![doc_id.0], |r| r.get::<_, String>(0)) + .map_err(StoreError::from)?; + let ids: Vec = rows + .map(|r| r.map(kebab_core::ChunkId)) + .collect::>>() + .map_err(StoreError::from)?; + Ok(ids) + } +} + // ── Internal row + (de)serialization helpers ───────────────────────────── struct DocumentRow { diff --git a/docs/SMOKE.md b/docs/SMOKE.md index bbb152e..272c1f7 100644 --- a/docs/SMOKE.md +++ b/docs/SMOKE.md @@ -171,6 +171,25 @@ kebab search "rust" --json --max-tokens 200 | jq '.truncated, (.hits | length)' `--json` 출력은 `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) — pre-fb-34 의 bare `search_hit.v1[]` 배열과 호환 안 됨. +### Verbatim fetch (fb-35) + +```bash +# Search to get a chunk_id. +CHUNK_ID=$(kebab search "rust ownership" --json --k 1 | jq -r '.hits[0].chunk_id') + +# Fetch verbatim with surrounding context. +kebab fetch chunk "$CHUNK_ID" --context 2 --json | jq . + +# Fetch the full doc as markdown. +DOC_ID=$(kebab search "rust ownership" --json --k 1 | jq -r '.hits[0].doc_id') +kebab fetch doc "$DOC_ID" --max-tokens 1000 --json | jq '{kind, truncated, len: (.text | length)}' + +# Fetch a line range (markdown / text only). 
+kebab fetch span "$DOC_ID" 1 5 --json | jq '{line_start, line_end, effective_end, text}' +``` + +PDF / audio docs reject `fetch span` with `error.v1.code = span_not_supported` — use `fetch chunk` (PDF chunks are page-aligned) or `fetch doc` instead. + ## P6-4 image ingestion options Adding the following section to `config.toml` makes `kebab ingest` also index image assets such as `**/*.png` / `**/*.jpg` (omit it to index text only): diff --git a/docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md b/docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md new file mode 100644 index 0000000..b458218 --- /dev/null +++ b/docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md @@ -0,0 +1,1929 @@ +# p9-fb-35 — Verbatim Fetch Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add `kebab fetch chunk|doc|span` subcommand + `mcp__kebab__fetch` MCP tool. Returns verbatim text from `chunks.text` / `CanonicalDocument` (normalized markdown only — raw bytes not exposed). Wire = `fetch_result.v1` (kind discriminator). Reuses fb-32 staleness + fb-34 budget patterns. + +**Architecture:** New `App::fetch(query: FetchQuery, opts: FetchOpts) -> Result<FetchResult>` facade. Three modes share one entry: `Chunk(ChunkId)` (with optional `--context N` ordinal-based ±N chunks), `Doc(DocumentId)` (markdown serialization of CanonicalDocument), `Span { doc_id, line_start, line_end }` (line-range slice; PDF/audio rejected as `span_not_supported`). fb-34 `StructuredError` wrapper preserves typed `error.v1.code` through anyhow. Single discriminated wire shape. + +**Tech Stack:** Rust 2024, serde, JSON Schema (fetch_result.v1), no new deps. 
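The span mode and the chars/4 budget described in the Architecture paragraph can be illustrated with a small standalone sketch. It is not taken from the kebab codebase: `SpanSlice` and `slice_span` are hypothetical names, and the choice to truncate on whole-line boundaries is an assumption made here so that `effective_end` always names a fully emitted line.

```rust
// Hypothetical sketch: none of these names come from the kebab codebase.

/// Outcome of slicing a line range under an optional token budget.
#[derive(Debug, PartialEq)]
struct SpanSlice {
    text: String,
    /// Last line actually emitted; `None` if no line fit or existed.
    effective_end: Option<u32>,
    truncated: bool,
}

/// Returns lines `line_start..=line_end` (1-based, inclusive) of `doc`.
/// Assumptions: one token is approximated as 4 chars, and truncation
/// happens on whole-line boundaries.
fn slice_span(
    doc: &str,
    line_start: u32,
    line_end: u32,
    max_tokens: Option<u32>,
) -> Option<SpanSlice> {
    // Reject ranges that are not 1-based or are inverted.
    if line_start == 0 || line_end < line_start {
        return None;
    }
    let budget_chars = max_tokens.map(|t| (t as usize).saturating_mul(4));
    let mut out = SpanSlice {
        text: String::new(),
        effective_end: None,
        truncated: false,
    };
    let mut used = 0usize;

    for (idx, line) in doc.lines().enumerate().skip(line_start as usize - 1) {
        let line_no = (idx as u32) + 1;
        if line_no > line_end {
            break;
        }
        // Count the newline that is re-appended below.
        let cost = line.chars().count() + 1;
        if let Some(budget) = budget_chars {
            if used + cost > budget {
                out.truncated = true;
                break;
            }
        }
        out.text.push_str(line);
        out.text.push('\n');
        used += cost;
        out.effective_end = Some(line_no);
    }
    Some(out)
}
```

Under these assumptions, a caller that sees `truncated == true` can resume from `effective_end + 1` with a fresh budget. Cutting mid-line would make better use of the budget but would leave `effective_end` pointing at a partially emitted line.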
+ +**Spec:** `docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md` + +--- + +## File Structure + +| File | Responsibility | Action | +|------|----------------|--------| +| `crates/kebab-core/src/fetch.rs` | NEW — `FetchQuery`, `FetchOpts`, `FetchResult`, `FetchKind` | create | +| `crates/kebab-core/src/lib.rs` | `mod fetch; pub use fetch::*;` | modify | +| `crates/kebab-app/src/fetch.rs` | NEW — `App::fetch` impl, mode dispatch, markdown serialization helper, error construction | create | +| `crates/kebab-app/src/app.rs` | Add `pub fn fetch(...)` method on `App` (delegates to `crate::fetch::run`) | modify | +| `crates/kebab-app/src/lib.rs` | `mod fetch;` + `pub use fetch::fetch_with_config;` | modify | +| `crates/kebab-app/src/error_wire.rs` | Add `chunk_not_found` / `doc_not_found` / `span_not_supported` / `invalid_input` to known codes (documentation comment only — codes already pass through `StructuredError`) | modify | +| `crates/kebab-cli/src/main.rs` | New `Cmd::Fetch` enum variant + `FetchWhat` subcommand + dispatch + plain renderer | modify | +| `crates/kebab-cli/src/wire.rs` | New `wire_fetch_result(&FetchResult) -> Value` | modify | +| `crates/kebab-mcp/src/tools/fetch.rs` | NEW — `FetchInput` + `handle()` | create | +| `crates/kebab-mcp/src/lib.rs` | Register `kebab__fetch` tool | modify | +| `docs/wire-schema/v1/fetch_result.schema.json` | NEW | create | +| `crates/kebab-app/tests/fetch_integration.rs` | NEW — chunk/doc/span unit + error paths | create | +| `crates/kebab-app/tests/common/mod.rs` | Possibly extend — ingest + lookup helpers | modify | +| `crates/kebab-cli/tests/wire_fetch.rs` | NEW — wire shape + plain output + exit codes | create | +| `crates/kebab-cli/tests/common/mod.rs` | Possibly extend — `run_fetch_with_args` helper | modify | +| `crates/kebab-mcp/tests/tools_call_fetch.rs` | NEW — 3 modes + invalid_input | create | +| `README.md` | `kebab fetch chunk|doc|span` row in 명령 table | modify | +| `docs/SMOKE.md` | Fetch 
walkthrough paragraph | modify | +| `tasks/p9/p9-fb-35-verbatim-fetch.md` | Status flip + design/plan links | modify | +| `tasks/INDEX.md` | fb-35 row → ✅ | modify | +| `integrations/claude-code/kebab/SKILL.md` | New `mcp__kebab__fetch` row + recipe | modify | + +--- + +## Pre-flight + +- [ ] **Step 0.1: Branch off main** + +```bash +git checkout main +git pull +git checkout -b feat/fb-35-verbatim-fetch +``` + +- [ ] **Step 0.2: Confirm spec branch reachable** + +```bash +git log --oneline spec/fb-35-verbatim-fetch -1 +``` + +Expected: `4eda9c3 spec(fb-35): verbatim fetch — design`. If spec PR not yet merged into main, `git merge spec/fb-35-verbatim-fetch` so the spec doc lives on this branch. + +--- + +## Task 1: Domain types in kebab-core + +**Files:** +- Create: `crates/kebab-core/src/fetch.rs` +- Modify: `crates/kebab-core/src/lib.rs` + +- [ ] **Step 1.1: Failing test** + +Append to `crates/kebab-core/src/fetch.rs` (will create the file): + +```rust +//! p9-fb-35 verbatim fetch domain types. +//! +//! Three modes (chunk / doc / span) carried by [`FetchQuery`]; one +//! response shape ([`FetchResult`]) discriminated by [`FetchKind`]. +//! All types are `Serialize` so the CLI / MCP wire layers can hand +//! them straight through `serde_json::to_value`. + +use serde::{Deserialize, Serialize}; +use time::OffsetDateTime; + +use crate::asset::WorkspacePath; +use crate::ids::{ChunkId, DocumentId}; +use crate::traits::Chunk; + +#[derive(Clone, Debug)] +pub enum FetchQuery { + Chunk(ChunkId), + Doc(DocumentId), + Span { + doc_id: DocumentId, + line_start: u32, + line_end: u32, + }, +} + +#[derive(Clone, Debug, Default)] +pub struct FetchOpts { + /// chunk mode only: ±N chunks. None = no surrounding context. + pub context: Option, + /// doc / span mode only: chars/4 budget. None = no cap. 
+ pub max_tokens: Option, +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum FetchKind { + Chunk, + Doc, + Span, +} + +#[derive(Clone, Debug, Serialize, Deserialize)] +pub struct FetchResult { + pub kind: FetchKind, + pub doc_id: DocumentId, + pub doc_path: WorkspacePath, + #[serde(with = "time::serde::rfc3339")] + pub indexed_at: OffsetDateTime, + pub stale: bool, + // chunk mode payloads + #[serde(skip_serializing_if = "Option::is_none")] + pub chunk: Option<Chunk>, + #[serde(skip_serializing_if = "Vec::is_empty", default)] + pub context_before: Vec<Chunk>, + #[serde(skip_serializing_if = "Vec::is_empty", default)] + pub context_after: Vec<Chunk>, + // doc / span payloads + #[serde(skip_serializing_if = "Option::is_none")] + pub text: Option<String>, + #[serde(skip_serializing_if = "Option::is_none")] + pub line_start: Option<u32>, + #[serde(skip_serializing_if = "Option::is_none")] + pub line_end: Option<u32>, + #[serde(skip_serializing_if = "Option::is_none")] + pub effective_end: Option<u32>, + pub truncated: bool, +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn fetch_opts_default_is_all_none() { + let o = FetchOpts::default(); + assert!(o.context.is_none()); + assert!(o.max_tokens.is_none()); + } + + #[test] + fn fetch_kind_serializes_snake_case() { + let v = serde_json::to_value(FetchKind::Chunk).unwrap(); + assert_eq!(v, serde_json::json!("chunk")); + let v = serde_json::to_value(FetchKind::Span).unwrap(); + assert_eq!(v, serde_json::json!("span")); + } +} +``` + +- [ ] **Step 1.2: Wire module** + +Edit `crates/kebab-core/src/lib.rs`. 
Find the existing `mod` declarations (search for `pub mod search;` or similar): + +```bash +grep -n "^pub mod\|^mod " crates/kebab-core/src/lib.rs | head -10 +``` + +Add at the same level: + +```rust +pub mod fetch; +``` + +And re-export the public types — append to the `pub use` block near the top: + +```rust +pub use fetch::{FetchKind, FetchOpts, FetchQuery, FetchResult}; +``` + +- [ ] **Step 1.3: Run tests — verify pass** + +```bash +cargo test -p kebab-core --lib fetch::tests +``` + +Expected: 2 PASS. + +- [ ] **Step 1.4: Commit** + +```bash +git add crates/kebab-core/src/fetch.rs crates/kebab-core/src/lib.rs +git commit -m "$(cat <<'EOF' +feat(core): FetchQuery / FetchOpts / FetchResult / FetchKind (fb-35) + +Domain types for `kebab fetch` 3 modes (chunk / doc / span). All +types Serialize so wire layers hand them through serde_json +directly. FetchKind is snake_case-renamed to match the wire +discriminator literal in fetch_result.v1. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 2: Wire schema — `fetch_result.v1` + +**Files:** +- Create: `docs/wire-schema/v1/fetch_result.schema.json` + +- [ ] **Step 2.1: Write the schema** + +Create `docs/wire-schema/v1/fetch_result.schema.json`: + +```json +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://kb.local/wire/v1/fetch_result.schema.json", + "title": "FetchResult v1", + "description": "Verbatim text fetch from the indexed corpus. Discriminated by `kind`. All text is normalized markdown sourced from `CanonicalDocument` / `chunks.text` — original raw bytes are not exposed. 
PDF / audio span fetch returns `error.v1.code = span_not_supported`.", + "type": "object", + "required": ["schema_version", "kind", "doc_id", "doc_path", "indexed_at", "stale", "truncated"], + "properties": { + "schema_version": { "const": "fetch_result.v1" }, + "kind": { "enum": ["chunk", "doc", "span"] }, + "doc_id": { "type": "string" }, + "doc_path": { "type": "string" }, + "indexed_at": { "type": "string", "format": "date-time", "description": "fb-32 documents.updated_at" }, + "stale": { "type": "boolean", "description": "fb-32 staleness flag against config.search.stale_threshold_days" }, + "chunk": { "type": "object", "description": "kind=chunk: target chunk_inspection.v1 payload" }, + "context_before": { "type": "array", "description": "kind=chunk: --context N preceding chunks (ordinal-sorted)" }, + "context_after": { "type": "array", "description": "kind=chunk: --context N following chunks (ordinal-sorted)" }, + "text": { "type": "string", "description": "kind=doc/span: markdown text (truncated if budget tripped)" }, + "line_start": { "type": ["integer", "null"], "minimum": 1, "description": "kind=span: requested start line (1-based)" }, + "line_end": { "type": ["integer", "null"], "minimum": 1, "description": "kind=span: requested end line (1-based, inclusive)" }, + "effective_end": { "type": ["integer", "null"], "minimum": 1, "description": "kind=span: actual emitted end line after budget truncation" }, + "truncated": { "type": "boolean", "description": "kind=doc/span: budget forced text truncation. Always false for chunk." } + } +} +``` + +- [ ] **Step 2.2: Validate JSON** + +```bash +python3 -c "import json; json.load(open('docs/wire-schema/v1/fetch_result.schema.json'))" +``` + +Expected: silent success. + +- [ ] **Step 2.3: Commit** + +```bash +git add docs/wire-schema/v1/fetch_result.schema.json +git commit -m "$(cat <<'EOF' +feat(wire): fetch_result.v1 schema (fb-35) + +Discriminated by kind (chunk / doc / span). 
Per-kind required +fields enforced by description prose at v1 stub stage; future +phase may add JSON Schema conditional validation. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 3: `App::fetch` chunk mode + markdown serializer + +**Files:** +- Create: `crates/kebab-app/src/fetch.rs` +- Modify: `crates/kebab-app/src/app.rs` +- Modify: `crates/kebab-app/src/lib.rs` +- Create: `crates/kebab-app/tests/fetch_integration.rs` + +- [ ] **Step 3.1: Write failing chunk-mode test** + +Create `crates/kebab-app/tests/fetch_integration.rs`: + +```rust +//! p9-fb-35 App::fetch integration tests. + +mod common; + +use kebab_app::App; +use kebab_core::{FetchKind, FetchOpts, FetchQuery}; + +fn open(env: &common::TestEnv) -> App { + env.app() +} + +#[test] +fn fetch_chunk_returns_target_only_when_no_context() { + let env = common::TestEnv::new(); + common::ingest_md(&env, "a.md", "# Title\n\nFirst paragraph.\n\n## Section\n\nSecond.\n"); + let app = open(&env); + + // Find a chunk via search to obtain its id. + let q = kebab_core::SearchQuery { + text: "First".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let chunk_id = hits[0].chunk_id.clone(); + + let result = app + .fetch(FetchQuery::Chunk(chunk_id), FetchOpts::default()) + .unwrap(); + assert_eq!(result.kind, FetchKind::Chunk); + assert!(result.chunk.is_some(), "target chunk populated"); + assert!(result.context_before.is_empty()); + assert!(result.context_after.is_empty()); + assert!(!result.truncated); +} +``` + +- [ ] **Step 3.2: Run test (verify failure)** + +```bash +cargo test -p kebab-app --test fetch_integration fetch_chunk_returns_target_only_when_no_context +``` + +Expected: FAIL — `App::fetch` does not exist. + +- [ ] **Step 3.3: Implement minimal `App::fetch` (chunk mode only)** + +Create `crates/kebab-app/src/fetch.rs`: + +```rust +//! p9-fb-35 verbatim fetch implementation. 
+ +use anyhow::Result; +use time::OffsetDateTime; + +use kebab_core::{ + Chunk, ChunkId, DocumentId, DocumentStore, FetchKind, FetchOpts, FetchQuery, FetchResult, + Block, CanonicalDocument, +}; + +use crate::App; +use crate::error_wire::{ErrorV1, StructuredError}; +use crate::staleness::compute_stale; + +impl App { + /// p9-fb-35: verbatim fetch facade. Returns text from + /// `chunks.text` / `CanonicalDocument` based on the requested + /// mode. Errors surface as `StructuredError(ErrorV1)` with one + /// of `chunk_not_found` / `doc_not_found` / `span_not_supported`. + pub fn fetch(&self, query: FetchQuery, opts: FetchOpts) -> Result { + match query { + FetchQuery::Chunk(id) => fetch_chunk(self, id, opts), + FetchQuery::Doc(id) => fetch_doc(self, id, opts), + FetchQuery::Span { doc_id, line_start, line_end } => { + fetch_span(self, doc_id, line_start, line_end, opts) + } + } + } +} + +fn fetch_chunk(app: &App, id: ChunkId, opts: FetchOpts) -> Result { + let target = ::get_chunk(&app.sqlite, &id)? + .ok_or_else(|| { + StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "chunk_not_found".to_string(), + message: format!("chunk_id '{}' not found", id.0), + details: serde_json::Value::Null, + hint: None, + }) + })?; + + let doc_id = target.doc_id.clone(); + let doc = ::get_document(&app.sqlite, &doc_id)? 
+ .ok_or_else(|| { + StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "doc_not_found".to_string(), + message: format!( + "doc_id '{}' (parent of chunk '{}') not found", + doc_id.0, id.0 + ), + details: serde_json::Value::Null, + hint: None, + }) + })?; + + let (context_before, context_after) = match opts.context { + Some(n) if n > 0 => surrounding_chunks(app, &doc_id, &id, n)?, + _ => (Vec::new(), Vec::new()), + }; + + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + + Ok(FetchResult { + kind: FetchKind::Chunk, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: Some(target), + context_before, + context_after, + text: None, + line_start: None, + line_end: None, + effective_end: None, + truncated: false, + }) +} + +fn fetch_doc(_app: &App, _id: DocumentId, _opts: FetchOpts) -> Result { + // Implemented in Task 4. + anyhow::bail!("fetch_doc not yet implemented") +} + +fn fetch_span( + _app: &App, + _id: DocumentId, + _line_start: u32, + _line_end: u32, + _opts: FetchOpts, +) -> Result { + // Implemented in Task 5. + anyhow::bail!("fetch_span not yet implemented") +} + +/// p9-fb-35: list chunks for a document in ordinal order, returning +/// the slice `[target_idx - n .. target_idx + n + 1]` (clamped). +/// Returns `(before, after)` with the target itself excluded. 
+fn surrounding_chunks( + app: &App, + doc_id: &DocumentId, + target: &ChunkId, + n: u32, +) -> Result<(Vec<Chunk>, Vec<Chunk>)> { + let chunks = list_chunks_in_order(app, doc_id)?; + let target_idx = chunks + .iter() + .position(|c| c.id == *target) + .ok_or_else(|| anyhow::anyhow!("chunk not found in doc chunk list"))?; + let n = n as usize; + let lo = target_idx.saturating_sub(n); + let hi = (target_idx + n + 1).min(chunks.len()); + let before: Vec<Chunk> = chunks[lo..target_idx].to_vec(); + let after: Vec<Chunk> = chunks[target_idx + 1..hi].to_vec(); + Ok((before, after)) +} + +/// p9-fb-35: ordinal-ordered chunk list for one document. Walks the +/// store, sorted by `chunks.created_at` (matches insertion order) +/// because the chunks table has no explicit ordinal column. +fn list_chunks_in_order(app: &App, doc_id: &DocumentId) -> Result<Vec<Chunk>> { + use rusqlite::params; + let conn = app.sqlite.lock_conn(); + let mut stmt = conn.prepare( + "SELECT chunk_id FROM chunks WHERE doc_id = ? ORDER BY created_at ASC, chunk_id ASC", + )?; + let ids: Vec<String> = stmt + .query_map(params![doc_id.0], |r| r.get::<_, String>(0))? + .collect::<Result<Vec<_>, _>>()?; + drop(stmt); + drop(conn); + let mut out: Vec<Chunk> = Vec::with_capacity(ids.len()); + for id in ids { + let cid = ChunkId(id); + if let Some(chunk) = ::get_chunk( + &app.sqlite, + &cid, + )? { + out.push(chunk); + } + } + Ok(out) +} + +fn doc_metadata_updated_at(doc: &CanonicalDocument) -> OffsetDateTime { + doc.metadata.updated_at +} + +/// p9-fb-35: serialize a `CanonicalDocument` back to markdown. +/// Loses round-trip fidelity for inline-styled spans (Strong / +/// Emph children become flat text); good enough for an agent +/// looking for verbatim context. Tests Task 4.
+pub(crate) fn fmt_canonical_to_markdown(doc: &CanonicalDocument) -> String { + let mut out = String::with_capacity(1024); + for (i, block) in doc.blocks.iter().enumerate() { + if i > 0 { + out.push_str("\n\n"); + } + match block { + Block::Heading(h) => { + for _ in 0..h.level.max(1).min(6) { + out.push('#'); + } + out.push(' '); + out.push_str(&h.text); + } + Block::Paragraph(t) | Block::Quote(t) => out.push_str(&t.text), + Block::List(l) => { + for (idx, item) in l.items.iter().enumerate() { + if idx > 0 { + out.push('\n'); + } + if l.ordered { + out.push_str(&format!("{}. {}", idx + 1, item.text)); + } else { + out.push_str(&format!("- {}", item.text)); + } + } + } + Block::Code(c) => { + out.push_str("```"); + if let Some(lang) = &c.lang { + out.push_str(lang); + } + out.push('\n'); + out.push_str(&c.code); + if !c.code.ends_with('\n') { + out.push('\n'); + } + out.push_str("```"); + } + Block::Table(t) => { + let header = t.headers.join(" | "); + out.push_str(&header); + out.push('\n'); + out.push_str(&"---|".repeat(t.headers.len())); + for row in &t.rows { + out.push('\n'); + out.push_str(&row.join(" | ")); + } + } + Block::ImageRef(img) => { + out.push_str(&format!("![{}]({})", img.alt, img.src)); + } + Block::AudioRef(_a) => { + out.push_str("(audio reference)"); + } + } + } + out +} + +/// p9-fb-35: free function entry for CLI / MCP. +pub fn fetch_with_config( + config: kebab_config::Config, + query: FetchQuery, + opts: FetchOpts, +) -> Result { + App::open_with_config(config)?.fetch(query, opts) +} +``` + +The `app.sqlite` field access depends on visibility. Check: + +```bash +grep -n "sqlite:\|pub(crate) sqlite\|sqlite.lock_conn\|pub fn lock_conn" crates/kebab-app/src/app.rs crates/kebab-store-sqlite/src/store.rs | head -10 +``` + +If `app.sqlite` is private, expose `pub(crate)` or add an accessor. If `lock_conn` is private to `kebab-store-sqlite`, expose `pub` or use a public method. Adapt as needed. 
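The ±N clamp in `surrounding_chunks` is easy to get wrong at the ends of a document, so it may help to check the index arithmetic in isolation. A minimal sketch, assuming the chunk list is reduced to plain indices; `neighbor_ranges` is a hypothetical helper for illustration, not part of the plan's code:

```rust
use std::ops::Range;

/// Hypothetical stand-in for the ±N windowing step: given the number of
/// chunks, the index of the target, and a radius `n`, return the half-open
/// index ranges of the neighbours before and after it (target excluded).
fn neighbor_ranges(
    len: usize,
    target_idx: usize,
    n: usize,
) -> Option<(Range<usize>, Range<usize>)> {
    if target_idx >= len {
        return None; // target is not in the list
    }
    // saturating_sub clamps the window at the start of the list.
    let lo = target_idx.saturating_sub(n);
    // saturating_add plus min clamps the window at the end of the list.
    let hi = target_idx.saturating_add(n).saturating_add(1).min(len);
    Some((lo..target_idx, target_idx + 1..hi))
}
```

With 5 chunks and the target at index 2, a radius of 2 gives `0..2` before and `3..5` after; with the target at index 0 the "before" range is empty.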
+ +If the `Chunk` struct's `id` field is named `chunk_id` instead of `id`, adapt the comparison `c.id == *target` accordingly. Verify: + +```bash +grep -A 3 "^pub struct Chunk\b" crates/kebab-core/src/traits.rs +``` + +- [ ] **Step 3.4: Wire fetch module** + +Edit `crates/kebab-app/src/lib.rs`. Add near other `mod` declarations: + +```rust +pub mod fetch; +pub use fetch::fetch_with_config; +``` + +Edit `crates/kebab-app/src/app.rs` — no change needed; `App::fetch` is in the `impl App { ... }` block in `fetch.rs`. + +- [ ] **Step 3.5: Run test (verify pass)** + +```bash +cargo test -p kebab-app --test fetch_integration fetch_chunk_returns_target_only_when_no_context +``` + +Expected: PASS. + +If `common::TestEnv::new()` / `ingest_md` don't exist with those exact names, look at what `tests/common/mod.rs` provides (fb-32 / fb-34 added several helpers): + +```bash +grep -n "pub fn\|pub struct" crates/kebab-app/tests/common/mod.rs | head -10 +``` + +Adapt the test to whatever scaffold exists. + +- [ ] **Step 3.6: Add `--context` test** + +Append to `crates/kebab-app/tests/fetch_integration.rs`: + +```rust +#[test] +fn fetch_chunk_with_context_returns_neighbors() { + let env = common::TestEnv::new(); + let body = "# H1\n\nA1\n\n# H2\n\nA2\n\n# H3\n\nA3\n\n# H4\n\nA4\n\n# H5\n\nA5\n"; + common::ingest_md(&env, "multi.md", body); + let app = env.app(); + + // Find the middle chunk (search for "A3"). 
+ let q = kebab_core::SearchQuery { + text: "A3".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 1, + filters: kebab_core::SearchFilters::default(), + }; + let hits = app.search(q).unwrap(); + let chunk_id = hits[0].chunk_id.clone(); + + let result = app + .fetch( + FetchQuery::Chunk(chunk_id), + FetchOpts { + context: Some(2), + max_tokens: None, + }, + ) + .unwrap(); + assert_eq!(result.kind, FetchKind::Chunk); + assert!(result.chunk.is_some()); + // We may not have exactly 2 each side depending on chunker, but + // total context (before+after) should be at most 4 and at least 1. + let total = result.context_before.len() + result.context_after.len(); + assert!(total >= 1, "at least one neighbor expected"); + assert!(total <= 4, "context capped at ±2 ⇒ max 4 neighbors"); +} + +#[test] +fn fetch_chunk_unknown_id_returns_chunk_not_found() { + let env = common::TestEnv::new(); + let app = env.app(); + let err = app + .fetch( + FetchQuery::Chunk(kebab_core::ChunkId("nonexistent-id".to_string())), + FetchOpts::default(), + ) + .unwrap_err(); + let msg = err.to_string(); + assert!( + msg.contains("chunk_not_found") || msg.contains("nonexistent-id"), + "expected chunk_not_found error, got: {msg}" + ); +} +``` + +- [ ] **Step 3.7: Run tests (verify pass)** + +```bash +cargo test -p kebab-app --test fetch_integration +``` + +Expected: 3 PASS. + +- [ ] **Step 3.8: Commit** + +```bash +git add crates/kebab-app/src/fetch.rs crates/kebab-app/src/lib.rs crates/kebab-app/tests/fetch_integration.rs +git commit -m "$(cat <<'EOF' +feat(app): App::fetch chunk mode + markdown serializer (fb-35) + +Chunk mode + ±N context. doc / span modes return placeholder +errors (filled by subsequent tasks). fmt_canonical_to_markdown +helper introduced now since doc mode (Task 4) consumes it. +Errors are typed StructuredError so classify preserves +chunk_not_found / doc_not_found through the wire layer. 
+ +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 4: `App::fetch` doc mode + budget + +**Files:** +- Modify: `crates/kebab-app/src/fetch.rs` +- Modify: `crates/kebab-app/tests/fetch_integration.rs` + +- [ ] **Step 4.1: Failing test** + +Append to `crates/kebab-app/tests/fetch_integration.rs`: + +```rust +#[test] +fn fetch_doc_returns_serialized_markdown() { + let env = common::TestEnv::new(); + let body = "# Heading One\n\nFirst paragraph.\n\n## Sub\n\nSecond.\n"; + common::ingest_md(&env, "doc.md", body); + let app = env.app(); + + let doc_summary = common::list_docs(&app); + let doc_id = doc_summary[0].doc_id.clone(); + + let result = app + .fetch(FetchQuery::Doc(doc_id), FetchOpts::default()) + .unwrap(); + assert_eq!(result.kind, FetchKind::Doc); + let text = result.text.expect("doc text"); + assert!(text.contains("Heading One"), "doc text contains heading: {text:?}"); + assert!(text.contains("First paragraph"), "doc text contains body"); + assert!(!result.truncated); +} + +#[test] +fn fetch_doc_unknown_id_returns_doc_not_found() { + let env = common::TestEnv::new(); + let app = env.app(); + let err = app + .fetch( + FetchQuery::Doc(kebab_core::DocumentId("nonexistent-doc".to_string())), + FetchOpts::default(), + ) + .unwrap_err(); + assert!(err.to_string().contains("doc_not_found")); +} + +#[test] +fn fetch_doc_with_max_tokens_truncates() { + let env = common::TestEnv::new(); + // Long doc — repeated paragraphs. + let p = "Lorem ipsum dolor sit amet consectetur adipiscing elit. 
".repeat(20); + let body = format!("# Big\n\n{p}\n"); + common::ingest_md(&env, "big.md", &body); + let app = env.app(); + let doc_id = common::list_docs(&app)[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Doc(doc_id), + FetchOpts { + context: None, + max_tokens: Some(20), // ~80 chars + }, + ) + .unwrap(); + assert!(result.truncated); + let text = result.text.expect("doc text"); + assert!(text.len() <= 100, "trimmed text length {}", text.len()); +} +``` + +If `common::list_docs` doesn't exist, look at how other tests find a doc id (e.g. via `app.list_docs(...)`): + +```bash +grep -n "list_docs\|list_documents" crates/kebab-app/src/lib.rs crates/kebab-app/src/app.rs | head -5 +``` + +Adapt — possibly inline the listing call. + +- [ ] **Step 4.2: Run tests (verify failure)** + +```bash +cargo test -p kebab-app --test fetch_integration fetch_doc +``` + +Expected: FAIL — `fetch_doc not yet implemented`. + +- [ ] **Step 4.3: Implement doc mode** + +Replace the placeholder `fetch_doc` in `crates/kebab-app/src/fetch.rs` with: + +```rust +fn fetch_doc(app: &App, id: DocumentId, opts: FetchOpts) -> Result { + let doc = ::get_document(&app.sqlite, &id)? 
+ .ok_or_else(|| { + StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "doc_not_found".to_string(), + message: format!("doc_id '{}' not found", id.0), + details: serde_json::Value::Null, + hint: None, + }) + })?; + + let mut text = fmt_canonical_to_markdown(&doc); + let mut truncated = false; + if let Some(max_tokens) = opts.max_tokens { + let max_chars = max_tokens.saturating_mul(4); + if text.chars().count() > max_chars { + text = trim_to_chars(&text, max_chars); + truncated = true; + } + } + + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + + Ok(FetchResult { + kind: FetchKind::Doc, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: None, + context_before: Vec::new(), + context_after: Vec::new(), + text: Some(text), + line_start: None, + line_end: None, + effective_end: None, + truncated, + }) +} + +/// p9-fb-35: trim string to N chars (Unicode-safe). Mirrors fb-34's +/// helper at `crates/kebab-app/src/app.rs` — kept local to avoid +/// re-exporting an internal helper. +fn trim_to_chars(s: &str, n: usize) -> String { + if s.chars().count() <= n { + return s.to_string(); + } + let mut out = String::with_capacity(n * 4); + for (i, c) in s.chars().enumerate() { + if i >= n { + break; + } + out.push(c); + } + out +} +``` + +- [ ] **Step 4.4: Run tests (verify pass)** + +```bash +cargo test -p kebab-app --test fetch_integration fetch_doc +``` + +Expected: 3 PASS. + +- [ ] **Step 4.5: Commit** + +```bash +git add crates/kebab-app/src/fetch.rs crates/kebab-app/tests/fetch_integration.rs +git commit -m "$(cat <<'EOF' +feat(app): App::fetch doc mode with budget (fb-35) + +Walks CanonicalDocument blocks, serializes to markdown, applies +chars/4 budget when opts.max_tokens is set. doc_not_found +preserved through StructuredError. 
+ +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 5: `App::fetch` span mode + PDF/audio rejection + +**Files:** +- Modify: `crates/kebab-app/src/fetch.rs` +- Modify: `crates/kebab-app/tests/fetch_integration.rs` + +- [ ] **Step 5.1: Failing test** + +Append to `crates/kebab-app/tests/fetch_integration.rs`: + +```rust +#[test] +fn fetch_span_returns_line_range() { + let env = common::TestEnv::new(); + let body = "Line one.\nLine two.\nLine three.\nLine four.\nLine five.\n"; + common::ingest_md(&env, "lines.md", body); + let app = env.app(); + let doc_id = common::list_docs(&app)[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Span { + doc_id, + line_start: 2, + line_end: 4, + }, + FetchOpts::default(), + ) + .unwrap(); + assert_eq!(result.kind, FetchKind::Span); + let text = result.text.expect("span text"); + // Lines 2-4 of the rendered markdown — exact content depends on + // canonicalization, so check that we got a 3-line slice. + assert_eq!(text.lines().count(), 3, "span should be 3 lines: {text:?}"); + assert_eq!(result.line_start, Some(2)); + assert_eq!(result.line_end, Some(4)); + assert_eq!(result.effective_end, Some(4)); + assert!(!result.truncated); +} + +#[test] +fn fetch_span_clamps_line_end_when_out_of_range() { + let env = common::TestEnv::new(); + common::ingest_md(&env, "short.md", "Line one.\nLine two.\n"); + let app = env.app(); + let doc_id = common::list_docs(&app)[0].doc_id.clone(); + + let result = app + .fetch( + FetchQuery::Span { + doc_id, + line_start: 1, + line_end: 999, + }, + FetchOpts::default(), + ) + .unwrap(); + let text = result.text.expect("span text"); + let actual_lines = text.lines().count(); + assert_eq!(result.effective_end, Some(actual_lines as u32)); + assert!(actual_lines < 999); +} +``` + +- [ ] **Step 5.2: Run tests (verify failure)** + +```bash +cargo test -p kebab-app --test fetch_integration fetch_span +``` + +Expected: FAIL. 
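The two span tests above fix the contract: 1-based inclusive lines, with an out-of-range `line_end` clamped to the last line. A minimal sketch of that slicing rule on a plain string, assuming a hypothetical `slice_lines` helper (not part of the plan's code). Returning `None` for a start past the last line is this sketch's own choice; the tests above do not cover that case:

```rust
/// Hypothetical helper: return lines `start..=end` (1-based, inclusive) of
/// `text`, plus the last line number actually emitted. An `end` past the
/// last line is clamped; an invalid or fully out-of-range request is `None`.
fn slice_lines(text: &str, start: u32, end: u32) -> Option<(String, u32)> {
    if start == 0 || end < start {
        return None; // not a valid 1-based range
    }
    let lines: Vec<&str> = text.lines().collect();
    let total = lines.len() as u32;
    if start > total {
        return None; // nothing to emit; also avoids slicing past the end
    }
    let effective_end = end.min(total);
    let lo = (start - 1) as usize; // 1-based start -> 0-based index
    let hi = effective_end as usize; // inclusive end -> exclusive bound
    Some((lines[lo..hi].join("\n"), effective_end))
}
```

The second tuple element plays the role of `effective_end`: it equals the requested end unless the range ran off the end of the text.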
+ +- [ ] **Step 5.3: Implement span mode** + +Replace the placeholder `fetch_span` in `crates/kebab-app/src/fetch.rs`: + +```rust +fn fetch_span( + app: &App, + id: DocumentId, + line_start: u32, + line_end: u32, + opts: FetchOpts, +) -> Result<FetchResult> { + let doc = ::get_document(&app.sqlite, &id)? + .ok_or_else(|| { + StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "doc_not_found".to_string(), + message: format!("doc_id '{}' not found", id.0), + details: serde_json::Value::Null, + hint: None, + }) + })?; + + // Reject line-incompatible source types (PDF / audio). + if matches!( + doc.metadata.source_type, + kebab_core::SourceType::Pdf | kebab_core::SourceType::Audio + ) { + return Err(StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "span_not_supported".to_string(), + message: format!( + "doc '{}' has source_type {:?}; line-based span fetch unsupported. \ + Use `fetch chunk` or `fetch doc` instead.", + id.0, doc.metadata.source_type + ), + details: serde_json::Value::Null, + hint: Some("kind = chunk or kind = doc instead".to_string()), + }))?; + } + + if line_start == 0 || line_end == 0 || line_end < line_start { + return Err(StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "invalid_input".to_string(), + message: format!( + "line_start ({line_start}) and line_end ({line_end}) must be 1-based with start <= end" + ), + details: serde_json::Value::Null, + hint: None, + }))?; + } + + let full = fmt_canonical_to_markdown(&doc); + let lines: Vec<&str> = full.lines().collect(); + let total = lines.len() as u32; + // A start past the last line would make the slice below panic; reject it. + if line_start > total { + return Err(StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "invalid_input".to_string(), + message: format!( + "line_start ({line_start}) exceeds document length ({total} lines)" + ), + details: serde_json::Value::Null, + hint: None, + }))?; + } + let effective_end = line_end.min(total); + let lo = (line_start - 1) as usize; + let hi = effective_end as usize; + let mut text = lines[lo..hi].join("\n"); + + let mut truncated = effective_end != line_end; + let mut effective_end_after_budget = effective_end; + if let Some(max_tokens) = opts.max_tokens { + let max_chars = max_tokens.saturating_mul(4); + if
text.chars().count() > max_chars { + text = trim_to_chars(&text, max_chars); + truncated = true; + // Recount lines after char-trim — effective_end may shrink. + let kept = text.lines().count() as u32; + effective_end_after_budget = (line_start - 1) + kept; + } + } + + let now = OffsetDateTime::now_utc(); + let stale = compute_stale( + doc_metadata_updated_at(&doc), + now, + app.config.search.stale_threshold_days, + ); + + Ok(FetchResult { + kind: FetchKind::Span, + doc_id: doc.doc_id.clone(), + doc_path: doc.workspace_path.clone(), + indexed_at: doc_metadata_updated_at(&doc), + stale, + chunk: None, + context_before: Vec::new(), + context_after: Vec::new(), + text: Some(text), + line_start: Some(line_start), + line_end: Some(line_end), + effective_end: Some(effective_end_after_budget), + truncated, + }) +} +``` + +If `kebab_core::SourceType` doesn't have `Pdf` / `Audio` variants exactly (might be lowercase, might be different names), adapt: + +```bash +grep -A 10 "^pub enum SourceType" crates/kebab-core/src/metadata.rs +``` + +- [ ] **Step 5.4: Run tests (verify pass)** + +```bash +cargo test -p kebab-app --test fetch_integration fetch_span +``` + +Expected: 2 PASS. + +- [ ] **Step 5.5: Run all kebab-app tests** + +```bash +cargo test -p kebab-app +``` + +Expected: all PASS — existing search / staleness / cursor tests unaffected. + +- [ ] **Step 5.6: Commit** + +```bash +git add crates/kebab-app/src/fetch.rs crates/kebab-app/tests/fetch_integration.rs +git commit -m "$(cat <<'EOF' +feat(app): App::fetch span mode + PDF/audio rejection (fb-35) + +Line-based slice over fmt_canonical_to_markdown output. +PDF / audio source_type → span_not_supported StructuredError. +Out-of-range line_end clamps to total; effective_end reflects +post-budget trim. 
+ +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 6: CLI `kebab fetch` subcommand + +**Files:** +- Modify: `crates/kebab-cli/src/main.rs` +- Modify: `crates/kebab-cli/src/wire.rs` + +- [ ] **Step 6.1: Add wire helper** + +Edit `crates/kebab-cli/src/wire.rs`. Append after `wire_search_response`: + +```rust +/// p9-fb-35: tag a `FetchResult` as `fetch_result.v1`. +pub fn wire_fetch_result(r: &kebab_core::FetchResult) -> Value { + let mut v = serde_json::to_value(r).expect("FetchResult serializes"); + if let Value::Object(ref mut map) = v { + map.insert( + "schema_version".to_string(), + Value::String("fetch_result.v1".to_string()), + ); + } + v +} +``` + +(Use the same shape pattern as `wire_search_response` from fb-34. The key insertion is idempotent / inserts the discriminator on top of serde's default.) + +- [ ] **Step 6.2: Add clap subcommand** + +Edit `crates/kebab-cli/src/main.rs`. Find the existing `Cmd::Inspect` enum branch and add a `Fetch` variant nearby: + +```bash +grep -n "Inspect {\|InspectWhat" crates/kebab-cli/src/main.rs | head -3 +``` + +In the `enum Cmd { ... }` definition, add: + +```rust +/// p9-fb-35: verbatim chunk / doc / span fetch. +Fetch { + #[command(subcommand)] + what: FetchWhat, +}, +``` + +Add the `FetchWhat` enum near `InspectWhat`: + +```rust +#[derive(Subcommand, Debug)] +enum FetchWhat { + /// Fetch a single chunk verbatim, optionally with surrounding context. + Chunk { + id: String, + /// p9-fb-35: include ±N chunks before and after the target. + #[arg(long)] + context: Option, + }, + /// Fetch the entire normalized markdown text of a document. + Doc { + id: String, + /// p9-fb-35: chars/4 budget cap. + #[arg(long)] + max_tokens: Option, + }, + /// Fetch a 1-based line range of a document. PDF / audio source + /// types are rejected (use `fetch chunk` instead). + Span { + doc_id: String, + line_start: u32, + line_end: u32, + /// p9-fb-35: chars/4 budget cap. 
+ #[arg(long)] + max_tokens: Option, + }, +} +``` + +In the match arm of `run()`, add `Cmd::Fetch { what } => { ... }`. Place it next to the `Cmd::Inspect` arm: + +```rust +Cmd::Fetch { what } => { + let cfg = kebab_config::Config::load(cli.config.as_deref())?; + let (query, opts) = match what { + FetchWhat::Chunk { id, context } => ( + kebab_core::FetchQuery::Chunk(kebab_core::ChunkId(id.clone())), + kebab_core::FetchOpts { + context: *context, + max_tokens: None, + }, + ), + FetchWhat::Doc { id, max_tokens } => ( + kebab_core::FetchQuery::Doc(kebab_core::DocumentId(id.clone())), + kebab_core::FetchOpts { + context: None, + max_tokens: *max_tokens, + }, + ), + FetchWhat::Span { + doc_id, + line_start, + line_end, + max_tokens, + } => ( + kebab_core::FetchQuery::Span { + doc_id: kebab_core::DocumentId(doc_id.clone()), + line_start: *line_start, + line_end: *line_end, + }, + kebab_core::FetchOpts { + context: None, + max_tokens: *max_tokens, + }, + ), + }; + let result = kebab_app::fetch_with_config(cfg, query, opts)?; + if cli.json { + println!("{}", serde_json::to_string(&wire::wire_fetch_result(&result))?); + } else { + render_fetch_plain(&result); + } + Ok(()) +} +``` + +Add the plain renderer function near the bottom of `main.rs`: + +```rust +/// p9-fb-35: human-friendly plain output. 
+fn render_fetch_plain(r: &kebab_core::FetchResult) { + println!("# {} ({})", r.doc_path.0, format_kind(r.kind)); + if r.stale { + println!("[stale; indexed_at = {}]", r.indexed_at); + } + match r.kind { + kebab_core::FetchKind::Chunk => { + for (label, chunks) in [ + ("=== before ===", &r.context_before), + ("=== target ===", &r.chunk.iter().cloned().collect::<Vec<_>>()), + ("=== after ===", &r.context_after), + ] { + if !chunks.is_empty() { + println!("\n{label}"); + for c in chunks { + println!("[{} § {}]\n{}\n", c.id.0, c.heading_path.last().map(|s| s.as_str()).unwrap_or(""), c.text); + } + } + } + } + kebab_core::FetchKind::Doc | kebab_core::FetchKind::Span => { + if let Some(text) = &r.text { + println!("\n{text}"); + } + if r.truncated { + eprintln!("[truncated; widen --max-tokens for fuller text]"); + } + } + } +} + +fn format_kind(k: kebab_core::FetchKind) -> &'static str { + match k { + kebab_core::FetchKind::Chunk => "chunk", + kebab_core::FetchKind::Doc => "doc", + kebab_core::FetchKind::Span => "span", + } +} +``` + +If `Chunk.id` is named `chunk_id` (verify), adjust accordingly. The format/visibility of the `Chunk` struct may require minor adaptation; inspect existing CLI plain renderers (`Cmd::Search` with hits) for the idiom. + +- [ ] **Step 6.3: Build CLI** + +```bash +cargo build -p kebab-cli +``` + +Expected: clean. + +- [ ] **Step 6.4: Verify --help** + +```bash +cargo run -q -p kebab-cli -- fetch --help 2>&1 | head -20 +cargo run -q -p kebab-cli -- fetch chunk --help 2>&1 | head -10 +cargo run -q -p kebab-cli -- fetch doc --help 2>&1 | head -10 +cargo run -q -p kebab-cli -- fetch span --help 2>&1 | head -10 +``` + +Expected: subcommand help shows `chunk` / `doc` / `span` and per-mode flags. + +- [ ] **Step 6.5: Run kebab-cli tests** + +```bash +cargo test -p kebab-cli +``` + +Expected: all PASS.
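`--max-tokens` on `fetch doc` / `fetch span` is described as a chars/4 budget cap. A minimal sketch of that arithmetic, assuming a hypothetical `apply_token_budget` helper (not the plan's `trim_to_chars`); it cuts on a char boundary so multi-byte text such as Hangul is never split mid-character:

```rust
/// Hypothetical helper mirroring the chars/4 heuristic: one "token" is
/// treated as four Unicode scalar values. Returns the possibly shortened
/// text and whether anything was cut.
fn apply_token_budget(text: &str, max_tokens: Option<usize>) -> (String, bool) {
    let Some(max_tokens) = max_tokens else {
        return (text.to_string(), false); // no cap requested
    };
    let max_chars = max_tokens.saturating_mul(4);
    // Byte offset of the first char past the budget, if there is one.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => (text[..byte_idx].to_string(), true),
        None => (text.to_string(), false), // already within budget
    }
}
```

A budget of 2 tokens keeps at most 8 chars, so a 10-char ASCII string is cut to 8 and flagged as truncated, while an 8-char string passes through unchanged.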
+ +- [ ] **Step 6.6: Commit** + +```bash +git add crates/kebab-cli/src/main.rs crates/kebab-cli/src/wire.rs +git commit -m "$(cat <<'EOF' +feat(cli): kebab fetch chunk / doc / span (fb-35) + +JSON output is fetch_result.v1; plain output is human-friendly +labeled sections (chunk: before / target / after; doc/span: full +text + stderr truncated hint). + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 7: CLI integration tests + +**Files:** +- Create: `crates/kebab-cli/tests/wire_fetch.rs` +- Modify: `crates/kebab-cli/tests/common/mod.rs` (if helper missing) + +- [ ] **Step 7.1: Add `run_fetch_with_args` helper** + +Edit `crates/kebab-cli/tests/common/mod.rs`. After existing `run_search_with_args`: + +```rust +/// p9-fb-35: invoke `kebab fetch` with arbitrary args. +pub fn run_fetch_with_args(cfg: &std::path::Path, args: &[&str]) -> (String, String) { + let exe = env!("CARGO_BIN_EXE_kebab"); + let mut cmd_args: Vec<&str> = vec!["--config"]; + let cfg_str = cfg.to_str().expect("utf8"); + cmd_args.push(cfg_str); + cmd_args.push("fetch"); + cmd_args.extend(args); + let out = std::process::Command::new(exe) + .args(&cmd_args) + .output() + .expect("kebab fetch"); + ( + String::from_utf8_lossy(&out.stdout).to_string(), + String::from_utf8_lossy(&out.stderr).to_string(), + ) +} +``` + +If the existing `run_search_with_args` has a different signature (e.g. takes `&workspace`), mirror it. + +- [ ] **Step 7.2: Write integration tests** + +Create `crates/kebab-cli/tests/wire_fetch.rs`: + +```rust +//! p9-fb-35: CLI fetch wire shape + plain output + exit codes. + +mod common; + +use serde_json::Value; + +#[test] +fn fetch_chunk_json_emits_fetch_result_v1() { + let (cfg, ws) = common::write_config(); + common::ingest(&cfg, &ws, "a.md", "# T\n\napples are red.\n"); + + // First find a chunk_id via search. 
+ let (search_stdout, _) = common::run_search_with_args( + &cfg, + &["--mode", "lexical", "--json", "--k", "1", "apples"], + ); + let search: Value = serde_json::from_str(search_stdout.trim()).expect("search json"); + let chunk_id = search["hits"][0]["chunk_id"] + .as_str() + .expect("chunk_id") + .to_string(); + + let (stdout, _) = common::run_fetch_with_args( + &cfg, + &["--json", "chunk", &chunk_id], + ); + let v: Value = serde_json::from_str(stdout.trim()).expect("fetch json"); + assert_eq!(v["schema_version"], "fetch_result.v1"); + assert_eq!(v["kind"], "chunk"); + assert!(v["chunk"].is_object()); + assert_eq!(v["truncated"], false); +} + +#[test] +fn fetch_doc_json_with_max_tokens_truncates() { + let (cfg, ws) = common::write_config(); + let body: String = "Lorem ipsum dolor sit amet. ".repeat(20); + common::ingest(&cfg, &ws, "big.md", &format!("# Big\n\n{body}\n")); + + // Discover doc_id via list-docs. + let (list_stdout, _) = common::run_with_args( + &cfg, + &["list", "docs", "--json"], + ); + // list docs --json prints `doc_summary.v1` ndjson; parse first line. 
+ let first = list_stdout.lines().next().expect("at least one doc"); + let summary: Value = serde_json::from_str(first).expect("doc_summary json"); + let doc_id = summary["doc_id"].as_str().expect("doc_id").to_string(); + + let (stdout, _) = common::run_fetch_with_args( + &cfg, + &["--json", "doc", &doc_id, "--max-tokens", "20"], + ); + let v: Value = serde_json::from_str(stdout.trim()).expect("fetch json"); + assert_eq!(v["kind"], "doc"); + assert_eq!(v["truncated"], true); +} + +#[test] +fn fetch_chunk_unknown_id_exits_with_error_v1() { + let (cfg, _ws) = common::write_config(); + let exe = env!("CARGO_BIN_EXE_kebab"); + let cfg_str = cfg.to_str().expect("utf8"); + let out = std::process::Command::new(exe) + .args(["--config", cfg_str, "--json", "fetch", "chunk", "nonexistent"]) + .output() + .expect("kebab fetch"); + assert_ne!(out.status.code(), Some(0), "must exit non-zero"); + let stderr = String::from_utf8_lossy(&out.stderr); + let err_line = stderr + .lines() + .find(|l| { + serde_json::from_str::(l) + .ok() + .and_then(|v| v.get("schema_version").and_then(|s| s.as_str()).map(String::from)) + .as_deref() + == Some("error.v1") + }) + .unwrap_or_else(|| panic!("no error.v1 on stderr: {stderr}")); + let v: Value = serde_json::from_str(err_line).expect("error.v1 json"); + assert_eq!(v["code"], "chunk_not_found"); +} +``` + +If `common::run_with_args` (generic dispatcher) doesn't exist, write a one-off `Command` invocation inline for the list-docs call. The goal is to obtain a doc_id; any working route is fine. + +- [ ] **Step 7.3: Run tests** + +```bash +cargo test -p kebab-cli --test wire_fetch +``` + +Expected: 3 PASS. + +- [ ] **Step 7.4: Run full kebab-cli suite** + +```bash +cargo test -p kebab-cli +``` + +Expected: all PASS, no regressions. 
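For reference, the truncation asserted in `fetch_doc_json_with_max_tokens_truncates` follows from the chars/4 budget: `--max-tokens 20` allows roughly 80 characters, while the ingested body is about 560. The sketch below illustrates that arithmetic only. `truncate_to_budget` is a hypothetical name, and the real trim in `kebab-app` operates on the normalized markdown and may differ in detail.

```rust
/// Hypothetical helper (not the actual kebab-app code): trims `text` to a
/// chars/4 token budget. Returns the possibly shortened text and a flag
/// telling whether truncation happened.
fn truncate_to_budget(text: &str, max_tokens: Option<usize>) -> (String, bool) {
    let tokens = match max_tokens {
        Some(t) => t,
        // No budget requested: return the text untouched.
        None => return (text.to_string(), false),
    };
    // One token is estimated as four characters.
    let max_chars = tokens.saturating_mul(4);
    // Look for the first character beyond the budget; cutting at its byte
    // offset keeps the result on a valid UTF-8 boundary.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => (text[..byte_idx].to_string(), true),
        None => (text.to_string(), false),
    }
}
```

Counting characters rather than bytes keeps the cut safe for Korean or other multi-byte text, at the cost of one linear scan.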
+ +- [ ] **Step 7.5: Commit** + +```bash +git add crates/kebab-cli/tests/ +git commit -m "$(cat <<'EOF' +test(cli): wire_fetch — chunk/doc + chunk_not_found integration (fb-35) + +3 integration tests: chunk JSON shape, doc truncated, unknown id +returns error.v1 with code = chunk_not_found. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 8: MCP `kebab__fetch` tool + +**Files:** +- Create: `crates/kebab-mcp/src/tools/fetch.rs` +- Modify: `crates/kebab-mcp/src/lib.rs` +- Create: `crates/kebab-mcp/tests/tools_call_fetch.rs` + +- [ ] **Step 8.1: Inspect MCP tool registration pattern** + +```bash +sed -n '1,80p' crates/kebab-mcp/src/lib.rs +ls crates/kebab-mcp/src/tools/ +``` + +Read `src/tools/search.rs` to mirror the input/handler pattern. + +- [ ] **Step 8.2: Implement fetch tool** + +Create `crates/kebab-mcp/src/tools/fetch.rs`: + +```rust +//! `fetch` tool — wraps `kebab_app::fetch_with_config` for chunk / +//! doc / span mode. Input shape matches the spec / SKILL.md. + +use rmcp::model::CallToolResult; +use schemars::JsonSchema; +use serde::{Deserialize, Serialize}; + +use crate::error::{to_tool_error, to_tool_success}; +use crate::state::KebabAppState; + +#[derive(Debug, Deserialize, Serialize, JsonSchema)] +pub struct FetchInput { + /// "chunk" | "doc" | "span" + pub kind: String, + /// Required when kind = "chunk". + pub chunk_id: Option, + /// Required when kind = "doc" or "span". + pub doc_id: Option, + /// Required when kind = "span" (1-based, inclusive). + pub line_start: Option, + pub line_end: Option, + /// chunk only: ±N surrounding chunks. + pub context: Option, + /// doc/span only: chars/4 budget. 
+ pub max_tokens: Option, +} + +pub fn handle(state: &KebabAppState, input: FetchInput) -> CallToolResult { + let query = match input.kind.as_str() { + "chunk" => match input.chunk_id { + Some(id) => kebab_core::FetchQuery::Chunk(kebab_core::ChunkId(id)), + None => return invalid_input("kind=chunk requires chunk_id"), + }, + "doc" => match input.doc_id { + Some(id) => kebab_core::FetchQuery::Doc(kebab_core::DocumentId(id)), + None => return invalid_input("kind=doc requires doc_id"), + }, + "span" => match (input.doc_id, input.line_start, input.line_end) { + (Some(id), Some(start), Some(end)) => kebab_core::FetchQuery::Span { + doc_id: kebab_core::DocumentId(id), + line_start: start, + line_end: end, + }, + _ => return invalid_input("kind=span requires doc_id, line_start, line_end"), + }, + other => return invalid_input(&format!("unknown kind '{other}'; expected chunk|doc|span")), + }; + + let opts = kebab_core::FetchOpts { + context: input.context, + max_tokens: input.max_tokens, + }; + + let cfg_clone = (*state.config).clone(); + let result = kebab_app::fetch_with_config(cfg_clone, query, opts); + match result { + Ok(r) => { + let mut v = match serde_json::to_value(&r) { + Ok(v) => v, + Err(e) => return to_tool_error(&anyhow::anyhow!("FetchResult serialize: {e}")), + }; + if let serde_json::Value::Object(ref mut map) = v { + map.insert( + "schema_version".to_string(), + serde_json::Value::String("fetch_result.v1".to_string()), + ); + } + match serde_json::to_string(&v) { + Ok(json) => to_tool_success(json), + Err(e) => to_tool_error(&anyhow::anyhow!(e)), + } + } + Err(e) => to_tool_error(&e), + } +} + +fn invalid_input(msg: &str) -> CallToolResult { + to_tool_error(&anyhow::anyhow!("invalid_input: {msg}")) +} +``` + +If `to_tool_error` already wraps in `error.v1.code = invalid_input` based on the message prefix, the `format!("invalid_input: ...")` is enough. Verify by reading `crates/kebab-mcp/src/error.rs` (or wherever `to_tool_error` lives). 
If not, construct an explicit `StructuredError`: + +```rust +use kebab_app::{ErrorV1, StructuredError}; + +fn invalid_input(msg: &str) -> CallToolResult { + let err = anyhow::Error::new(StructuredError(ErrorV1 { + schema_version: "error.v1".to_string(), + code: "invalid_input".to_string(), + message: msg.to_string(), + details: serde_json::Value::Null, + hint: None, + })); + to_tool_error(&err) +} +``` + +This guarantees `error.v1.code = "invalid_input"` propagates through. + +- [ ] **Step 8.3: Register tool** + +Edit `crates/kebab-mcp/src/lib.rs`. Find where `search` / `ask` / `schema` / `doctor` / `ingest_*` tools are registered (search for `tools::search` or `register`): + +```bash +grep -n "tools::search\|tools::ask\|register_tool\|mcp_tool" crates/kebab-mcp/src/lib.rs | head -10 +``` + +Add `mod fetch;` to the `mod tools { ... }` block (or wherever) and register the tool the same way as `search`. Inspect existing registration boilerplate and clone it. + +- [ ] **Step 8.4: Write integration test** + +Create `crates/kebab-mcp/tests/tools_call_fetch.rs`. Mirror `tests/tools_call_search.rs`: + +```rust +//! p9-fb-35: mcp__kebab__fetch tool — 3 modes + invalid_input. + +mod common; + +use serde_json::Value; + +#[test] +fn fetch_tool_chunk_returns_fetch_result_v1() { + let env = common::TestEnv::new(); + common::ingest_md(&env, "a.md", "# T\n\napples\n"); + + // Discover a chunk_id via the search tool. 
+ let chunk_id = common::call_search(&env, "apples") + .pointer("/hits/0/chunk_id") + .expect("search returns at least one hit") + .as_str() + .expect("chunk_id string") + .to_string(); + + let body = common::call_fetch_chunk(&env, &chunk_id); + let v: Value = serde_json::from_str(&body).expect("fetch json"); + assert_eq!(v["schema_version"], "fetch_result.v1"); + assert_eq!(v["kind"], "chunk"); +} + +#[test] +fn fetch_tool_invalid_kind_returns_invalid_input() { + let env = common::TestEnv::new(); + let body = common::call_fetch_raw( + &env, + serde_json::json!({"kind": "garbage"}), + ); + let v: Value = serde_json::from_str(&body).expect("error json"); + assert_eq!(v["schema_version"], "error.v1"); + assert_eq!(v["code"], "invalid_input"); +} +``` + +The `common::call_fetch_chunk` / `call_search` helpers depend on the existing kebab-mcp test scaffold. Inspect `crates/kebab-mcp/tests/common/mod.rs` (or `tests/tools_call_search.rs`'s test setup) and clone the pattern. Add helpers as needed. + +- [ ] **Step 8.5: Run MCP tests** + +```bash +cargo test -p kebab-mcp +``` + +Expected: all PASS. + +- [ ] **Step 8.6: Commit** + +```bash +git add crates/kebab-mcp/ +git commit -m "$(cat <<'EOF' +feat(mcp): kebab__fetch tool — chunk / doc / span (fb-35) + +Mirrors CLI surface: same input shape, same fetch_result.v1 +output. invalid_input error for missing kind-specific fields. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 9: Workspace test + clippy gate + +- [ ] **Step 9.1: Workspace test** + +```bash +cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -20 +``` + +Expected: all PASS. + +- [ ] **Step 9.2: Clippy** + +```bash +cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -10 +``` + +Expected: clean. New patterns to watch: +- `clippy::result_large_err` on `App::fetch` if it returns `Result`. We use `Result` so this should not fire — `StructuredError` wraps `ErrorV1` inside an `anyhow::Error`. 
+- `clippy::large_enum_variant` on `FetchQuery` (`Span` carries 3 fields). Smaller than `StreamEvent::Final`; should be fine. + +- [ ] **Step 9.3: Commit clippy fixes if needed** + +```bash +git add -A +git commit -m "chore: clippy fixes for fb-35" +``` + +(Skip if no fixes were necessary.) + +--- + +## Task 10: Documentation updates + +**Files:** +- Modify: `README.md` +- Modify: `docs/SMOKE.md` +- Modify: `tasks/p9/p9-fb-35-verbatim-fetch.md` +- Modify: `tasks/INDEX.md` +- Modify: `integrations/claude-code/kebab/SKILL.md` + +- [ ] **Step 10.1: README — fetch row** + +Find the 명령 table: + +```bash +grep -n "| .kebab " README.md | head -10 +``` + +Add a new row after `inspect`: + +```markdown +| `kebab fetch chunk [--context N]` / `kebab fetch doc [--max-tokens N]` / `kebab fetch span [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. | +``` + +- [ ] **Step 10.2: SMOKE.md — fetch walkthrough** + +Append a section after the fb-34 pagination block: + +```markdown +### Verbatim fetch (fb-35) + +```bash +# Search to get a chunk_id. +CHUNK_ID=$(kebab search "rust ownership" --json --k 1 | jq -r '.hits[0].chunk_id') + +# Fetch verbatim with surrounding context. +kebab fetch chunk "$CHUNK_ID" --context 2 --json | jq . + +# Fetch the full doc as markdown. +DOC_ID=$(kebab list docs --json | head -1 | jq -r .doc_id) +kebab fetch doc "$DOC_ID" --max-tokens 1000 --json | jq '{kind, truncated, len: (.text | length)}' + +# Fetch a line range (markdown / text only). +kebab fetch span "$DOC_ID" 1 5 --json | jq '{line_start, line_end, effective_end, text}' +``` + +PDF / audio docs reject `fetch span` with `error.v1.code = span_not_supported` — use `fetch chunk` (PDF chunks are page-aligned) or `fetch doc` instead. 
+``` + +- [ ] **Step 10.3: Spec status flip** + +Edit `tasks/p9/p9-fb-35-verbatim-fetch.md`: + +```diff +-status: open ++status: completed +``` + +Replace the `> ⏳ **백로그 only — 미구현.**` block with: + +```markdown +> ✅ **구현 완료.** 본 spec 은 구현 시점의 frozen 상태. post-merge deviation 은 [HOTFIXES.md](../HOTFIXES.md) 참조. + +상세 설계: `docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md`. +구현 계획: `docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md`. +``` + +- [ ] **Step 10.4: tasks/INDEX.md** + +```diff +- - [p9-fb-35 verbatim fetch](p9/p9-fb-35-verbatim-fetch.md) — ⏳ 미구현, brainstorm 필요 ++ - [p9-fb-35 verbatim fetch](p9/p9-fb-35-verbatim-fetch.md) — ✅ 머지 + v0.5.0 cut 후보 (2026-05-09) +``` + +- [ ] **Step 10.5: SKILL.md** + +Find the MCP tools table (around line 33-41) and add a row: + +```markdown +| `mcp__kebab__fetch` | verbatim text → `fetch_result.v1` (chunk / doc / span) | no | +``` + +After the `mcp__kebab__ask` section, add a new section: + +```markdown +### `mcp__kebab__fetch` — when you need raw text + +Use after `search` to read the verbatim chunk text + surrounding context, or to pull a full doc / line range. + +Input: +```json +{ "kind": "chunk", "chunk_id": "", "context": 2 } +{ "kind": "doc", "doc_id": "", "max_tokens": 1000 } +{ "kind": "span", "doc_id": "", "line_start": 1, "line_end": 5 } +``` + +- `chunk` mode: `--context N` returns ordinal-adjacent chunks before/after for surrounding paragraphs. +- `doc` mode: full normalized markdown. `max_tokens` (chars/4) caps the response — `truncated: true` when applied. +- `span` mode: 1-based line range. PDF / audio docs reject as `error.v1.code = span_not_supported` (use `chunk` mode instead — PDF chunks are page-aligned). +- `error.v1.code = chunk_not_found` / `doc_not_found` are non-retryable from the same query — re-issue search to get a fresh id. 
+``` + +- [ ] **Step 10.6: Commit docs** + +```bash +git add README.md docs/SMOKE.md tasks/p9/p9-fb-35-verbatim-fetch.md tasks/INDEX.md integrations/claude-code/kebab/SKILL.md +git commit -m "$(cat <<'EOF' +docs(fb-35): README + SMOKE + INDEX + skill notes + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 11: Smoke + push + PR + +- [ ] **Step 11.1: Manual smoke** + +```bash +cd /tmp/kebab-smoke +~/Workspace/projects/kebab/target/release/kebab --config /tmp/kebab-smoke/config.toml ingest +CHUNK_ID=$(~/Workspace/projects/kebab/target/release/kebab --config /tmp/kebab-smoke/config.toml search "test" --json --k 1 | jq -r '.hits[0].chunk_id') +~/Workspace/projects/kebab/target/release/kebab --config /tmp/kebab-smoke/config.toml fetch chunk "$CHUNK_ID" --context 1 --json | jq '{schema_version, kind, before: (.context_before | length), after: (.context_after | length)}' +``` + +Expected: +- `schema_version: "fetch_result.v1"`, `kind: "chunk"`, sane before/after counts. + +- [ ] **Step 11.2: Final workspace test** + +```bash +cd ~/Workspace/projects/kebab +cargo test --workspace --no-fail-fast -j 1 +``` + +Expected: all green. + +- [ ] **Step 11.3: Push branch** + +```bash +git push -u origin feat/fb-35-verbatim-fetch +``` + +- [ ] **Step 11.4: Open PR via gitea-pr** + +Build PR body at `/tmp/fb35-pr-body.md`: + +```markdown +## Summary + +- adds `kebab fetch chunk|doc|span` CLI subcommand + `mcp__kebab__fetch` MCP tool +- wire = `fetch_result.v1` (kind discriminator). chunk mode optional `--context N` returns ±N ordinal-adjacent chunks. doc mode serializes `CanonicalDocument` to markdown. span mode slices line range (1-based inclusive). +- PDF / audio source_type rejected on span fetch — `error.v1.code = span_not_supported`. +- chars/4 budget on doc/span (mirrors fb-34); chunk fetch unbounded. +- All errors typed via fb-34 `StructuredError` wrapper — `chunk_not_found` / `doc_not_found` / `span_not_supported` / `invalid_input` reach the wire. 
+ +## Test plan + +- [x] `cargo test --workspace --no-fail-fast -j 1` — green +- [x] `cargo clippy --workspace --all-targets -- -D warnings` — clean +- [x] new tests: + - `fetch_integration` (kebab-app): 7 tests covering chunk / chunk+context / doc / doc+budget / span / span clamp / unknown id + - `wire_fetch` (kebab-cli): 3 lexical-only integration tests (wire shape, doc truncation, error path) + - `tools_call_fetch` (kebab-mcp): 2 tests (chunk happy path + invalid_input) +- [x] manual smoke per `docs/SMOKE.md` "Verbatim fetch" walkthrough + +## Architectural notes + +- `App::fetch` is the new public method; `App::search` / `App::search_with_opts` / `App::ask` unchanged. +- `fmt_canonical_to_markdown` is a private helper in `kebab-app::fetch` — round-trip is best-effort (inline styling collapsed); good enough for an agent reading raw context. +- chunk_id stability: blake3(doc_id || chunker_version || ...) — stable across normal re-ingest, invalidated only by chunker_version cascade. Spec + SKILL.md note retry pattern. +- Source = `chunks.text` / `CanonicalDocument`. Raw bytes (`assets.storage_path`) NOT exposed by design — user can read directly via `cat $WORKSPACE_ROOT/`. 
+ +## Files of interest + +- spec: `docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md` +- plan: `docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md` +- core: `crates/kebab-core/src/fetch.rs` +- app: `crates/kebab-app/src/fetch.rs` +- CLI: `crates/kebab-cli/src/main.rs` + `wire.rs::wire_fetch_result` +- MCP: `crates/kebab-mcp/src/tools/fetch.rs` +- wire: `docs/wire-schema/v1/fetch_result.schema.json` +``` + +Open the PR: + +```bash +/Users/user/.claude/skills/gitea-ops/bin/gitea-pr \ + --title "feat(fb-35): verbatim fetch (chunk / doc / span)" \ + --body "$(cat /tmp/fb35-pr-body.md)" \ + --head feat/fb-35-verbatim-fetch \ + --base main +``` + +Capture the URL. + +- [ ] **Step 11.5: Cleanup** + +```bash +rm /tmp/fb35-pr-body.md +``` + +--- + +## Self-review + +- **Spec coverage:** + - §Behavior contract / 3 modes → Tasks 3, 4, 5 + - §CLI subcommand → Task 6 + - §Wire shape → Task 2 (schema) + Task 6 (CLI emit) + Task 8 (MCP emit) + - §Mode 동작 (chunk / doc / span semantics, error codes) → Tasks 3, 4, 5 + - §Budget integration → Tasks 4, 5 (`max_tokens` chars/4 trim) + - §Error codes → all tasks via `StructuredError` + - §MCP tool → Task 8 + - §Public surface delta → Tasks 1, 3 + - §Test plan → Tasks 3, 4, 5 (App), 7 (CLI), 8 (MCP) + - §Documentation → Task 10 + - §Risks (markdown round-trip, span line counting, chunk_id stability, PDF/audio rejection) → addressed in Task 4 (helper), Task 5 (line counting + reject), Task 10 (SKILL.md notes), Task 5 (rejection) + +- **Placeholder scan:** + - Task 6 / 7 / 8 have "If existing helper has different signature, mirror it" — concrete fallback paths spelled out. + - Task 3's `app.sqlite` access note instructs visibility check + fallback. No "TODO" / "fill in" remaining. 
+ +- **Type consistency:** + - `FetchQuery::{Chunk(ChunkId), Doc(DocumentId), Span { doc_id, line_start, line_end }}` consistent across Tasks 1, 3, 4, 5, 6, 8. + - `FetchOpts { context: Option, max_tokens: Option }` consistent. + - `FetchResult { kind, doc_id, doc_path, indexed_at, stale, chunk, context_before, context_after, text, line_start, line_end, effective_end, truncated }` consistent. + - `FetchKind { Chunk, Doc, Span }` snake_case rename pinned in Task 1's serde test. + - Error codes consistent: `chunk_not_found`, `doc_not_found`, `span_not_supported`, `invalid_input`. + +--- + +## Execution Handoff + +Plan complete and saved to `docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md`. Two execution options: + +**1. Subagent-Driven (recommended)** — fresh subagent per task, review between tasks. + +**2. Inline Execution** — execute tasks in this session. + +Which approach? diff --git a/docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md b/docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md new file mode 100644 index 0000000..0450e4d --- /dev/null +++ b/docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md @@ -0,0 +1,276 @@ +--- +title: "p9-fb-35 — Verbatim fetch design" +phase: P9 +component: kebab-core + kebab-app + kebab-cli + kebab-mcp + wire-schema +task_id: p9-fb-35 +status: design +target_version: 0.5.0 +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md +contract_sections: [§4 search, §5 storage, §10 UX] +date: 2026-05-09 +--- + +# p9-fb-35 — Verbatim fetch + +## Goal + +A surface that lets an agent deep-link fetch raw verbatim text using the `chunk_id` / `doc_id` of a search hit or RAG citation. Exposed through both the CLI and MCP. Three modes (chunk / doc / span). Line-based span on PDF / audio is explicitly rejected (`error.v1.code = span_not_supported`). Image OCR text is line-addressable, so span is allowed there. + +## Behavior contract + +### Source of truth + +All text is normalized markdown taken from `CanonicalDocument` / `chunks.text`. 
The original raw bytes (the file at `assets.storage_path`) are not exposed; users who need them can read the file directly. PDF / audio / image use the same surface (page-text / transcript / OCR text). However, line-based span is rejected for PDF / audio, while image OCR is line-addressable and therefore allowed. + +### CLI subcommand + +`kebab fetch` is a new subcommand with 3 modes: + +| mode | flags | +|------|-------| +| `kebab fetch chunk [--context N] [--json]` | with `--context`, also includes the chunks within ordinal ±N in the same doc | +| `kebab fetch doc [--max-tokens N] [--json]` | normalized markdown text of the doc. truncated when the budget trips. | +| `kebab fetch span [--max-tokens N] [--json]` | line range of the doc text (1-based, inclusive). rejected for PDF/audio. | + +`--context` applies to chunk mode only. `--max-tokens` applies to doc/span only (a chunk is bounded in size). + +### Wire shape — `fetch_result.v1` + +discriminated by `kind`: + +```json +{ + "schema_version": "fetch_result.v1", + "kind": "chunk" | "doc" | "span", + "doc_id": "", + "doc_path": "", + "indexed_at": "", + "stale": , + "chunk": {/* chunk_inspection.v1, kind=chunk */}, + "context_before": [/* chunk_inspection.v1[], kind=chunk */], + "context_after": [/* chunk_inspection.v1[], kind=chunk */], + "text": "", + "line_start": , + "line_end": , + "effective_end": , + "truncated": +} +``` + +Per-kind required fields are stated in the schema descriptions (JSON Schema conditional validation is not implemented at the v1 stub stage; enforcing it is the agent's responsibility). + +`indexed_at` / `stale`: same stamping as fb-32, based on `documents.updated_at`. + +### Mode 동작 (mode behavior) + +**chunk mode**: +1. `DocumentStore::get_chunk(chunk_id)`: if missing, `error.v1.code = chunk_not_found`. +2. With `--context N`, sort the doc's chunks by ordinal → extract the chunks within target ordinal ±N. Never crosses the doc boundary (clamp). +3. wire: `kind: "chunk"`, `chunk: `, `context_before: [...]`, `context_after: [...]`, `truncated: false`. + +**doc mode**: +1. `DocumentStore::get_document(doc_id)`: if missing, `error.v1.code = doc_not_found`. +2. Blocks of `CanonicalDocument` → markdown serialization (new `fmt_canonical_to_markdown` helper). +3. With `--max-tokens N`, apply the chars/4 estimated budget; on overflow, cut from the end and set truncated=true. 
(Line-level trim is a separate task; this is a plain char-trim.) +4. wire: `kind: "doc"`, `text: `, `truncated: `. + +**span mode**: +1. doc lookup is the same as in doc mode. +2. media_type check: PDF (`Page` citation) / audio (`Time` citation) are line-incompatible → `error.v1.code = span_not_supported`. +3. doc text → `text.lines()` slice `[line_start..=line_end]`. Clamp when line_end exceeds the total. +4. When `--max-tokens` applies, truncate further from the end and update `effective_end`. +5. wire: `kind: "span"`, `text`, `line_start`, `line_end` (requested), `effective_end` (actually emitted), `truncated`. + +### Budget integration + +Reuses the chars/4 estimate + truncate pattern from fb-34. Active only when `FetchOpts.max_tokens` is `Some(N)`. Does not apply to chunk mode (a chunk is bounded by the chunker unit). + +### Error codes + +Additions to the `error.v1.code` enum: +- `chunk_not_found`: chunk_id lookup miss. +- `doc_not_found`: doc_id lookup miss. +- `span_not_supported`: line-incompatible media (PDF / audio). +- `invalid_input`: a required per-mode field is missing in the MCP tool input (e.g. `kind: "chunk"` + `chunk_id: null`). + +Reuses the `StructuredError` wrapper (fb-34): the typed `ErrorV1` from `App::fetch` passes through the `classify` downcast and is preserved all the way to the wire. + +### MCP tool + +`mcp__kebab__fetch` is new. Input: + +```rust +pub struct FetchInput { + /// "chunk" | "doc" | "span" + pub kind: String, + pub chunk_id: Option, + pub doc_id: Option, + pub line_start: Option, + pub line_end: Option, + pub context: Option, + pub max_tokens: Option, +} +``` + +Validation: check the required fields for each `kind`, then call `App::fetch`. Output = `fetch_result.v1`. + +## Allowed / forbidden dependencies + +- `kebab-core`: new domain types. No new deps. +- `kebab-app`: existing deps suffice. Reuses the fb-32 staleness + fb-34 budget helpers. Markdown serialization is a simple fmt function (no separate dep needed). +- `kebab-cli`: adds a clap subcommand and a wire helper. +- `kebab-mcp`: adds a tool. +- `kebab-tui`: no changes. +- `kebab-search` / `kebab-rag`: no changes. 
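The span-mode slicing and clamping rules can be sketched as follows. This is a minimal illustration, assuming a hypothetical `slice_span` helper. Treating `line_start = 0` as 1 and an inverted range as an empty slice are assumptions of this sketch, not behavior stated in the spec, and the `--max-tokens` trim is left out.

```rust
/// Hypothetical helper (illustration only): slices a 1-based inclusive line
/// range out of `text`. Returns the joined lines and the effective end line.
fn slice_span(text: &str, line_start: u32, line_end: u32) -> (String, u32) {
    let lines: Vec<&str> = text.lines().collect();
    let total = lines.len() as u32;
    // Assumption: a start of 0 is treated as line 1.
    let start = line_start.max(1);
    // Clamp the requested end to the number of lines in the doc.
    let end = line_end.min(total);
    if start > end {
        // The requested range lies beyond the doc end (or is inverted):
        // emit empty text and report `start - 1` as the effective end.
        return (String::new(), start - 1);
    }
    let slice = &lines[(start - 1) as usize..end as usize];
    (slice.join("\n"), end)
}
```

With a five-line doc, requesting lines 4 to 99 yields the last two lines and an effective end of 5, while requesting lines 7 to 9 yields empty text and an effective end of 6.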
+ +## Public surface delta + +### kebab-core + +```rust +#[derive(Clone, Debug)] +pub enum FetchQuery { + Chunk(ChunkId), + Doc(DocumentId), + Span { + doc_id: DocumentId, + line_start: u32, + line_end: u32, + }, +} + +#[derive(Clone, Debug, Default)] +pub struct FetchOpts { + /// chunk mode only: ±N chunks. None = no context. + pub context: Option, + /// doc/span only: chars/4 budget. None = no cap. + pub max_tokens: Option, +} + +#[derive(Clone, Debug)] +pub struct FetchResult { + pub kind: FetchKind, + pub doc_id: DocumentId, + pub doc_path: WorkspacePath, + pub indexed_at: OffsetDateTime, + pub stale: bool, + // chunk + pub chunk: Option, + pub context_before: Vec, + pub context_after: Vec, + // doc / span + pub text: Option, + pub line_start: Option, + pub line_end: Option, + pub effective_end: Option, + pub truncated: bool, +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum FetchKind { Chunk, Doc, Span } +``` + +`Serialize` impl for `FetchResult` flattens to `fetch_result.v1` shape (or wire helper does the projection). + +### kebab-app + +```rust +impl App { + pub fn fetch(&self, query: FetchQuery, opts: FetchOpts) -> Result; +} + +pub fn fetch_with_config( + config: kebab_config::Config, + query: FetchQuery, + opts: FetchOpts, +) -> Result; + +// markdown serialization helper (private) +fn fmt_canonical_to_markdown(doc: &CanonicalDocument) -> String; +``` + +### kebab-cli + +```rust +// Cmd::Fetch: new enum variant +Fetch { + #[command(subcommand)] + what: FetchWhat, +} + +#[derive(Subcommand)] +enum FetchWhat { + Chunk { id: String, #[arg(long)] context: Option }, + Doc { id: String, #[arg(long)] max_tokens: Option }, + Span { + doc_id: String, + line_start: u32, + line_end: u32, + #[arg(long)] max_tokens: Option, + }, +} +``` + +```rust +// wire helper +pub fn wire_fetch_result(r: &FetchResult) -> Value; +``` + +### kebab-mcp + +`FetchInput` + `mcp__kebab__fetch` tool registration. 
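As an illustration of how `FetchOpts.context` might map to `context_before` / `context_after`, the ±N ordinal window with doc-boundary clamping can be sketched over plain indices. The `context_window` name and its index-based signature are hypothetical; the real implementation works on stored chunks and may be organized differently.

```rust
use std::ops::Range;

/// Hypothetical helper (illustration only): given the number of chunks in a
/// doc, the target chunk's position in ordinal order, and the requested
/// context N, returns the index ranges for the chunks before and after the
/// target. Assumes `target < total`.
fn context_window(
    total: usize,
    target: usize,
    context: Option<usize>,
) -> (Range<usize>, Range<usize>) {
    // `None` means no surrounding context at all.
    let n = context.unwrap_or(0);
    // Clamp at the start of the doc.
    let before_start = target.saturating_sub(n);
    // Clamp at the end of the doc.
    let after_start = (target + 1).min(total);
    let after_end = target.saturating_add(n).saturating_add(1).min(total);
    (before_start..target, after_start..after_end)
}
```

Both ranges are half-open, so an empty range (for example `0..0` for the first chunk) corresponds to an empty `context_before` or `context_after` array on the wire.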
+ +## Test plan + +| kind | description | +|------|-------------| +| unit (kebab-app) | chunk fetch (no context): chunk + empty context | +| unit (kebab-app) | chunk fetch `--context 2` on a multi-chunk doc returns exactly the ±2 ordinals | +| unit (kebab-app) | chunk fetch `--context 99` clamps at the doc boundary | +| unit (kebab-app) | doc fetch: markdown serialization result | +| unit (kebab-app) | doc fetch `--max-tokens N` budget trips → truncated=true + text cut | +| unit (kebab-app) | span fetch slices the exact line range | +| unit (kebab-app) | span line_end > total → effective_end clamped | +| unit (kebab-app) | span PDF doc → StructuredError(span_not_supported) | +| unit (kebab-app) | span audio doc → StructuredError(span_not_supported) | +| unit (kebab-app) | unknown chunk_id → StructuredError(chunk_not_found) | +| unit (kebab-app) | unknown doc_id → StructuredError(doc_not_found) | +| unit (kebab-app) | indexed_at + stale stamped correctly per fb-32 | +| integration (kebab-cli) | `kebab fetch chunk --json --context 1` wire validation | +| integration (kebab-cli) | `kebab fetch doc --max-tokens 100 --json` truncated=true | +| integration (kebab-cli) | `kebab fetch span 1 5 --json` line range | +| integration (kebab-cli) | `kebab fetch chunk ` → exit 2 + error.v1.code = chunk_not_found | +| integration (kebab-cli) | plain mode chunk: `[doc_path § heading]\n` form | +| integration (kebab-mcp) | `mcp__kebab__fetch` responds correctly in all 3 modes | +| integration (kebab-mcp) | `kind: "chunk"` + `chunk_id: null` → invalid_input | +| integration (wire-schema) | `fetch_result.schema.json` validates samples for all 3 modes | + +## Implementation steps (high-level) + +1. New wire schema `fetch_result.schema.json` + add 4 codes to the `error.v1` enum. +2. New `kebab-core` types (`FetchQuery`, `FetchOpts`, `FetchResult`, `FetchKind`). +3. `kebab-app::fetch` impl + `fmt_canonical_to_markdown` helper. +4. `kebab-cli::Cmd::Fetch` clap subcommand + wire helper + plain renderer. +5. `kebab-mcp` `kebab__fetch` tool + input validation. +6. Unit + integration tests. +7. README + SMOKE: fetch examples. +8. tasks/INDEX.md / spec status flip. +9. 
`tasks/HOTFIXES.md`: likely no deviation because this is a new surface (skip). +10. `integrations/claude-code/kebab/SKILL.md`: add a recipe ("agent fetched a chunk_id from search, wants surrounding context"). + +## Risks / notes + +- **Markdown serialization round-trip**: check whether `CanonicalDocument.blocks` round-trips with little loss. If loss is found, a follow-up task (fb-3X) could also keep the raw markdown in the store at ingest time. +- **chunk_id stability**: invalidated on a chunker_version cascade. Stated in the spec, with the retry pattern described in the skill notes. +- **`Chunk` is identical to `chunk_inspection.v1`**: `wire_chunk_inspection` can be reused. No new helper needed. +- **doc/span budget: no line trim**: char-level trim only. The agent may receive a line that is cut off. Little impact when the cap is small enough (e.g. 2000 chars). Line-aware trim is possible in a follow-up. +- **media_type detection**: branch on `documents.source_type` or on the citation kind (Line/Page/Time) of the first chunk. PDF/audio carry Page/Time citations, so a line range has no meaning for them. + +## Documentation updates (in the same implementation PR) + +- `README.md`: `kebab fetch chunk|doc|span` row in the command table. +- `docs/SMOKE.md`: fetch walkthrough (search → fetch chunk --context flow). +- `tasks/p9/p9-fb-35-verbatim-fetch.md`: `status: open → completed`, design/plan links. +- `tasks/INDEX.md`: fb-35 row ✅. +- `integrations/claude-code/kebab/SKILL.md`: new `mcp__kebab__fetch` row + recipe. diff --git a/docs/wire-schema/v1/fetch_result.schema.json b/docs/wire-schema/v1/fetch_result.schema.json new file mode 100644 index 0000000..a8d14ee --- /dev/null +++ b/docs/wire-schema/v1/fetch_result.schema.json @@ -0,0 +1,24 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://kb.local/wire/v1/fetch_result.schema.json", + "title": "FetchResult v1", + "description": "Verbatim text fetch from the indexed corpus. Discriminated by `kind`. All text is normalized markdown sourced from `CanonicalDocument` / `chunks.text` — original raw bytes are not exposed. 
PDF / audio span fetch returns `error.v1.code = span_not_supported`.", + "type": "object", + "required": ["schema_version", "kind", "doc_id", "doc_path", "indexed_at", "stale", "truncated"], + "properties": { + "schema_version": { "const": "fetch_result.v1" }, + "kind": { "enum": ["chunk", "doc", "span"] }, + "doc_id": { "type": "string" }, + "doc_path": { "type": "string" }, + "indexed_at": { "type": "string", "format": "date-time", "description": "fb-32 documents.updated_at" }, + "stale": { "type": "boolean", "description": "fb-32 staleness flag against config.search.stale_threshold_days" }, + "chunk": { "type": "object", "description": "kind=chunk: target chunk_inspection.v1 payload" }, + "context_before": { "type": "array", "description": "kind=chunk: --context N preceding chunks (ordinal-sorted)" }, + "context_after": { "type": "array", "description": "kind=chunk: --context N following chunks (ordinal-sorted)" }, + "text": { "type": "string", "description": "kind=doc/span: markdown text (truncated if budget tripped)" }, + "line_start": { "type": ["integer", "null"], "minimum": 1, "description": "kind=span: requested start line (1-based)" }, + "line_end": { "type": ["integer", "null"], "minimum": 1, "description": "kind=span: requested end line (1-based, inclusive)" }, + "effective_end": { "type": ["integer", "null"], "minimum": 0, "description": "kind=span: actual end line of emitted text (1-based, inclusive). Equals `line_end` on full slice; less than `line_end` when (a) requested range exceeded total lines (line clamp) or (b) `--max-tokens` budget trimmed the tail. Special case: `line_start - 1` (which is 0 when line_start=1) signals the entire requested range was beyond doc end — returned `text` is empty." }, + "truncated": { "type": "boolean", "description": "kind=doc/span: budget forced text truncation. Always false for chunk." 
}
+  }
+}
diff --git a/integrations/claude-code/kebab/SKILL.md b/integrations/claude-code/kebab/SKILL.md
index 34d5ef6..2faedda 100644
--- a/integrations/claude-code/kebab/SKILL.md
+++ b/integrations/claude-code/kebab/SKILL.md
@@ -28,12 +28,13 @@ User-specific trigger keywords (team names, system names, internal acronyms) bel
 
 ## MCP tools (preferred)
 
-When `kebab` is registered as an MCP server (see `~/.claude/mcp.json` example below), six tools are exposed as `mcp__kebab__`:
+When `kebab` is registered as an MCP server (see `~/.claude/mcp.json` example below), seven tools are exposed as `mcp__kebab__`:
 
 | tool | purpose | mutation |
 |------|---------|----------|
 | `mcp__kebab__search` | corpus search → `search_response.v1` (`{hits, next_cursor, truncated}`) | no |
 | `mcp__kebab__ask` | RAG answer → `answer.v1` | no |
+| `mcp__kebab__fetch` | verbatim text → `fetch_result.v1` (chunk / doc / span) | no |
 | `mcp__kebab__schema` | capability discovery → `schema.v1` | no |
 | `mcp__kebab__doctor` | health check → `doctor.v1` | no |
 | `mcp__kebab__ingest_file` | save single file → `ingest_report.v1` | yes |
@@ -69,6 +70,22 @@ Input:
 - **If `grounded == false`** → KB doesn't have enough context. Don't paraphrase the refusal as if it were an answer. Tell the user the KB came up dry and fall back to your own knowledge or ask for the source.
 - For follow-up turns on the same topic, pass `session_id` (e.g. `"team-onboarding-2026-05"`) and reuse it across the conversation. Sessions persist until `kebab reset --data-only`.
 
+### `mcp__kebab__fetch` — when you need raw text
+
+Use after `search` to read the verbatim chunk text + surrounding context, or to pull a full doc / line range.
+
+Input:
+```json
+{ "kind": "chunk", "chunk_id": "", "context": 2 }
+{ "kind": "doc", "doc_id": "", "max_tokens": 1000 }
+{ "kind": "span", "doc_id": "", "line_start": 1, "line_end": 5 }
+```
+
+- `chunk` mode: `context: N` returns ordinal-adjacent chunks before/after for surrounding paragraphs.
+- `doc` mode: full normalized markdown. `max_tokens` (chars/4) caps the response — `truncated: true` when applied.
+- `span` mode: 1-based inclusive line range. PDF / audio docs reject as `error.v1.code = span_not_supported` (use `chunk` mode instead — PDF chunks are page-aligned).
+- `error.v1.code = chunk_not_found` / `doc_not_found` are non-retryable from the same id — re-issue search to get a fresh one.
+
 ## CLI fallback
 
 If MCP tools aren't in scope (host without MCP support, or `mcp.json` not configured), call the CLI via Bash:
diff --git a/tasks/INDEX.md b/tasks/INDEX.md
index 1dbd5f9..fab95d7 100644
--- a/tasks/INDEX.md
+++ b/tasks/INDEX.md
@@ -123,7 +123,7 @@ P0~P5 are sequential. P6~P9 can run in parallel after P5.
  - [p9-fb-32 stale doc indicator](p9/p9-fb-32-stale-doc-indicator.md) — ✅ merged + v0.4.0 cut candidate (2026-05-09)
  - [p9-fb-33 streaming ask (ndjson delta)](p9/p9-fb-33-streaming-ask.md) — ✅ merged + v0.5.0 cut candidate (2026-05-09)
  - [p9-fb-34 output budget controls](p9/p9-fb-34-output-budget-controls.md) — ✅ merged + v0.5.0 cut candidate (2026-05-09)
- - [p9-fb-35 verbatim fetch](p9/p9-fb-35-verbatim-fetch.md) — ⏳ not implemented, needs brainstorm
+ - [p9-fb-35 verbatim fetch](p9/p9-fb-35-verbatim-fetch.md) — ✅ merged + v0.5.0 cut candidate (2026-05-09)
  - [p9-fb-36 search filter args](p9/p9-fb-36-search-filters.md) — ⏳ not implemented, needs brainstorm
  - [p9-fb-37 trace + stats](p9/p9-fb-37-trace-and-stats.md) — ⏳ not implemented, needs brainstorm (depends_on 27)
diff --git a/tasks/p9/p9-fb-35-verbatim-fetch.md b/tasks/p9/p9-fb-35-verbatim-fetch.md
index c8267da..8efe88f 100644
--- a/tasks/p9/p9-fb-35-verbatim-fetch.md
+++ b/tasks/p9/p9-fb-35-verbatim-fetch.md
@@ -3,8 +3,8 @@ phase: P9
 component: kebab-cli + kebab-app + wire-schema
 task_id: p9-fb-35
 title: "Verbatim fetch (`kebab fetch `) — citation deep-link"
-status: open
-target_version: 0.4.0
+status: completed
+target_version: 0.5.0
 depends_on: []
 unblocks: []
 contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
@@ -14,7 +14,10 @@ source_feedback: user dogfooding 2026-05-06
— agent search hit / ci
 
 # p9-fb-35 — Verbatim fetch
 
-> ⏳ **Backlog only — not implemented.** This spec is a skeleton built from dogfooding feedback. Before implementation starts, a design phase via [superpowers:brainstorming](../../docs/superpowers/) is required. The fetch unit (chunk vs doc vs span), the surrounding context (N chunks before/after), and the option policy are to be settled after the brainstorm.
+> ✅ **Implemented.** This spec is frozen as of implementation time. For post-merge deviations, see [HOTFIXES.md](../HOTFIXES.md).
+
+Detailed design: `docs/superpowers/specs/2026-05-09-p9-fb-35-verbatim-fetch-design.md`.
+Implementation plan: `docs/superpowers/plans/2026-05-09-p9-fb-35-verbatim-fetch.md`.
 
 ## Symptoms / motivation
 
@@ -34,7 +37,7 @@ source_feedback: user dogfooding 2026-05-06 — agent search hit / ci
 - chunk_id / doc_id exposure — check whether they are currently present in search_hit.v1, plus their stability.
 - context window — N chunks vs N tokens.
 - size limit for a full-doc fetch (integrate with the fb-34 budget).
-- fetch for pdf / image — extracted text vs original path.
+- fetch for pdf / audio — extracted text vs original path. (Image OCR lands as markdown lines, so span is allowed.)
 
 ## Risks / notes