Merge pull request 'feat(fb-27): introspection (kebab schema) + structured error wire' (#104) from feat/p9-fb-27-introspection-and-error-wire into main
Reviewed-on: #104
This commit was merged in pull request #104.
@@ -60,7 +60,7 @@ Read the relevant task spec's deps section before adding an import. New crates i

## Wire schema v1

All `--json` output carries a `schema_version` field (`ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, …). Schemas live in `docs/wire-schema/v1/`. The wire shape is the contract for external integrations (Claude Code skills, MCP, etc.); breaking it requires a `*.v2` major bump and parallel-running both for one phase.
All `--json` output carries a `schema_version` field. Current schemas: `ingest_report.v1`, `ingest_progress.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, `reset_report.v1`, `eval_run.v1`, `eval_compare.v1`, `list_docs.v1`, `schema.v1`, `error.v1`. Schemas live in `docs/wire-schema/v1/`. The wire shape is the contract for external integrations (Claude Code skills, MCP, etc.); breaking it requires a `*.v2` major bump and parallel-running both for one phase. In `--json` mode, fatal errors emit `error.v1` to stderr as ndjson (non-`--json` mode keeps plain stderr text); exit codes 0/1/2/3 are unchanged — `error.v1.code` provides fine-grained agent branching.
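
As an illustration of the "fine-grained agent branching" that `error.v1.code` is meant to enable, here is a minimal sketch of how an integration might map a code to a next step. The code strings are the ones emitted by the classifier in this pull request; the `AgentAction` enum and `action_for` function are hypothetical names invented for this example and are not part of kebab.

```rust
/// Hypothetical next steps an integration might take; not part of kebab.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentAction {
    FixConfig,
    RunIngest,
    StartModelServer,
    PullModel,
    Retry,
    ReportToUser,
}

/// Map an `error.v1.code` string to a next step.
/// Unknown codes fall back to `ReportToUser`, so a code added later
/// does not break the caller.
pub fn action_for(code: &str) -> AgentAction {
    match code {
        "config_invalid" => AgentAction::FixConfig,
        "not_indexed" => AgentAction::RunIngest,
        "model_unreachable" => AgentAction::StartModelServer,
        "model_not_pulled" => AgentAction::PullModel,
        "timeout" => AgentAction::Retry,
        // "io_error", "generic", and anything unrecognized.
        _ => AgentAction::ReportToUser,
    }
}
```

Keeping a catch-all arm is a deliberate choice in this sketch: the exit code still tells the caller that the command failed, and the code string only refines what to do about it.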

In-tree integration packages live under `integrations/<host>/` — currently `integrations/claude-code/kebab/` (a Claude Code skill that calls `kebab search --json` / `kebab ask --json`). Any wire schema major bump (v1→v2) MUST update each shipped integration in the same PR, same as the version-cascade rule below. Per-user trigger keywords (team / system / acronym) belong in the user's local copy of the skill, not in the repo-shipped frontmatter — keep `integrations/claude-code/kebab/SKILL.md`'s `description` generic.

4
Cargo.lock
generated
@@ -3558,6 +3558,8 @@ dependencies = [
 "kebab-core",
 "kebab-eval",
 "kebab-tui",
 "reqwest",
 "serde",
 "serde_json",
 "tempfile",
 "time",
@@ -3572,6 +3574,8 @@ dependencies = [
 "kebab-core",
 "serde",
 "serde_json",
 "tempfile",
 "thiserror 2.0.18",
 "toml",
 "tracing",
]

@@ -65,6 +65,7 @@ P0~P5 serial. P6~P9 can run in parallel after P5.
- **2026-05-04 P9 post-dogfooding (p9-fb-22)**: TUI input cursor mid-string editing + Ask follow-tail auto-scroll. Two Gitea issues: #94 (cursor cannot be moved after typing) and #95 (new responses do not auto-scroll). The `InputBuffer` cursor model was rebuilt on a byte-position basis: when the cursor is at the end it is backwards-compatible with the existing append behavior, and when it is mid-string `←/→/Home/End/Delete` edit in place. `AskState` gains `follow_tail: bool` (default true). `Paragraph::line_count(width)` (with the ratatui `unstable-rendered-line-info` feature enabled) computes the wrapped row count every frame and pins the scroll to the bottom while follow-tail is on. `j`/`k` turn follow-tail off and `Shift-G` turns it back on. 12 new InputBuffer unit tests + 6 new Ask integration tests. spec: `tasks/p9/p9-fb-22-tui-cursor-and-autoscroll.md`. The HOTFIXES entry `2026-05-04` is the source of truth for the live cursor model.
- **2026-05-03 P9 post-dogfooding (p9-fb-21)**: `i` is the universal Normal→Insert toggle (all panes). Previously mode_intercept intercepted `i` only for Library/Inspect/Jobs, and Search/Ask fell through (assuming automatic INSERT). Once a user dropped to NORMAL with Esc there was no key to return to Insert, a dead end that was reported during dogfooding. The `(Char('i'), Normal, _)` arm of mode_intercept now flips to INSERT regardless of pane. The Search chunk-inspect key was rebound `i`→`o` (vim "open") to resolve the conflict. The first fragment of the footer hint for every (pane, mode, filter) combination is `F1 도움말` ("F1 help"), for cheatsheet binding discoverability. An `i 입력모드` ("i insert mode") fragment was added to the Search/Ask Normal hints. The Global/Search/Ask sections of the cheatsheet popup were updated. 6 new unit tests + 3 existing tests updated. spec: `tasks/p9/p9-fb-21-tui-insert-key-discoverability.md` (status set directly to `completed`). The HOTFIXES entry is the source of truth for the Search `i`→`o` rebind.
- **2026-05-03 P9 dogfooding follow-up (p9-fb-10 follow-up)**: InputBuffer struct + migration of every text-input pane + cursor column alignment. New `kebab-tui::input::InputBuffer { content, cursor_col }`: `push_char` / `pop_char` / `clear` / `take` advance cursor_col in wide-char units (ASCII=1, Hangul/CJK=2, combining=0). `SearchState.input` / `AskState.input` / `FilterEdit.{tags_buf, lang_buf}` were replaced with InputBuffer. At render time, `f.set_cursor_position(...)` places the caret at the exact column using the `block.inner(area)`-based prompt width + cursor_col (right-edge clamp). In ratatui 0.28, cursor visibility is decided automatically by whether `cursor_position` is Some/None: Search/Ask/Filter give `Some`, so the caret is visible; Library/Inspect give `None`, so it is hidden. Korean lexical search is regression-pinned in `crates/kebab-app/tests/search_korean.rs` by asserting ingest → search → at least one result + a Korean file stem match. The `lexical_query` test helper was promoted to `crates/kebab-app/tests/common/mod.rs`. spec status `in_progress` → `completed`. spec: `tasks/p9/p9-fb-10-tui-cjk-input.md`.
- **2026-05-07 P9 post-dogfooding (p9-fb-27)**: `kebab schema [--json]` introspection command + introduction of the `error.v1` wire. Static (wire schemas / capabilities / models) + dynamic (stats) in one shot. In `--json` mode, fatal errors are emitted to stderr as ndjson (non-`--json` mode keeps the existing stderr text). Exit codes 0/1/2/3 are unchanged; `error.v1.code` provides fine-grained branching. Prerequisite for the fb-30 MCP `initialize` capability matrix. spec: `tasks/p9/p9-fb-27-introspection-and-error-wire.md`. design: `docs/superpowers/specs/2026-05-07-p9-fb-27-introspection-and-error-wire-design.md`.
- **2026-05-03 P9 dogfooding feedback 20/20 ✅**: every spec in `tasks/p9/p9-fb-01..20` has status `completed`. The UX noise the user collected by running `kebab` directly (no ingest progress display, mode confusion, CJK column drift, no multi-turn, no citations, etc.) has all been resolved, either in code or as spec-acknowledged-deferred. One full dogfooding cycle is complete: separately from the P9-5 desktop tauri work, the TUI/CLI user experience has stabilized by one step. The P9 phase row stays 🟡 because P9-5 has not been carried out yet.
- **2026-05-03 P9 dogfooding follow-up (p9-fb-13 follow-up)**: verb-form hint line rework. New `pub fn footer_hints(focus: Pane, mode: Mode, filter_open: bool) -> &'static str` (run.rs). Korean verb phrases (`"위로"` / `"아래로"` / `"필터"` / `"타이핑 검색어"` / `"Esc 로 NORMAL 모드"`, etc.) + mode-aware (NORMAL = navigation verbs, INSERT = typing + Esc reminder) + a separate branch for the Library filter overlay. 8 unit tests pin every (pane, mode, filter) combination: exhaustive non-empty, plus checks that the verb fragments exist for Library Normal/filter, Search Normal/Insert, Ask Normal/Insert, and Inspect Normal. spec status `in_progress` → `completed`: the verb-form item deferred by the partial p9-fb-13 is now closed.
- **2026-05-03 P9 dogfooding follow-up (p9-fb-13)**: TUI cheatsheet popup. New `kebab-tui::cheatsheet::render_cheatsheet(f, area, app)`: a 70%/60% centered modal with sections (Global / Library / Search / Ask / Inspect) + a global toggle table + a footer for the currently focused pane. `App.cheatsheet_visible: bool` field + `pub fn cheatsheet_visible()` getter. In the run loop, `cheatsheet_intercept(app, key)` dispatches before mode_intercept: `F1` toggles (open/close), `Esc` closes the popup when it is visible (bypassing mode_intercept so that closing the cheatsheet does not also trigger a mode flip), and every other key falls through (navigation works while the popup is open). Modifier-bearing F1 (Ctrl-F1, etc.) is ignored. **HOTFIXES record**: the spec's `?` trigger conflicted with Library's quick-Ask binding, so it was rebound to `F1`. The spec's verb-form hint line rework is a separate follow-up PR (the existing footer plays the same role). spec status `planned` → `in_progress` (partial because of the verb hint deferral). spec: `tasks/p9/p9-fb-13-tui-cheatsheet.md`.

@@ -79,8 +79,9 @@ kebab doctor
| `kebab tui` | Ratatui shell (Library + Search + Ask + Inspect panels; desktop in progress). In Library, the `r` key starts a background ingest: the status bar at the bottom of the screen shows progress, and on completion/abort the final line stays briefly and then hides automatically. While an ingest is running, `Esc` / `Ctrl-C` send the cancel signal (otherwise they quit). vim-style modes (`-- NORMAL --` / `-- INSERT --` at the right of the header): Library/Inspect default to NORMAL, Search/Ask default to INSERT. `i` goes Normal→Insert (all panes, p9-fb-21) and `Esc` goes Insert→Normal anywhere. Dispatch is mode-authoritative: Search's `j/k/o/g` and Ask's `e/j/k` act as commands only in NORMAL mode; in INSERT they are typed as input characters. (Search's chunk-inspect key was rebound `i`→`o` because `i` is the universal Insert toggle.) **`F1` opens the cheatsheet popup** (key mappings for the current pane + the global toggle table); close it with `Esc` / `F1`. The Search panel searches on a background worker after a 200ms debounce: key input never freezes the UI, and if the user keeps typing, stale results are discarded automatically (generation counter). The Ask panel is multi-turn: within one conversation the Q1/A1, Q2/A2 transcript accumulates, and the next question is answered with the previous turns as history. The answer body is rendered as markdown (bold/italic/inline code/heading/list/code fence/table/blockquote; raw `**bold**` is actually shown in bold). `Ctrl-L` starts a new conversation. Search's `g` key opens the hit's citation location in `$EDITOR` (default `vi`); after the editor exits, the TUI screen is redrawn cleanly. The CLI `kebab ask` prints raw markdown as-is (for terminal compatibility). Library's doc list truncates Korean / Japanese / Chinese (CJK) titles at the exact wide-char column width, so a Korean title never overflows one line (1 CJK character = 2 columns). The cursor in the Search/Ask/Filter inputs aligns by column over wide chars, so the caret sits exactly next to the character when typing Korean. `← / →` move the cursor within the input string (one move per character, even though a Korean character is 2 columns wide), `Home / End` jump to either end, and `Delete` deletes the char at the cursor; every input pane (Ask / Search / Library filter overlay) behaves the same (p9-fb-22). The Ask transcript follows the tail automatically as new answers accumulate below the viewport (auto-scroll); scrolling up with `j` / `k` freezes it, and `Shift-G` returns to the bottom and resumes auto-tail. The hint line at the bottom of the screen uses Korean verb phrases (`"위로"` / `"아래로"` / `"필터"` / `"타이핑 검색어"` / `"Esc 로 NORMAL 모드"` / `"i 입력모드"`, etc.) and branches automatically on the current (pane, mode) combination; **the first fragment is always `F1 도움말`** (guarantees cheatsheet discoverability). A status bar is always shown in every mode: `kebab v<version> │ <pane> │ <docs> docs │ <state>` (state: streaming/searching/indexing/idle; while an ingest is running, progress is absorbed into the same spot). On entering Ask, the 8-character prefix of the conversation id is shown as well. In both the Ask transcript and Inspect, `PgUp / PgDn` scroll by pages of 10 lines. Above Library's doc list, a `TITLE / TAGS / UPDATED / CHUNKS` column header row is shown (display-width aligned, Hangul / CJK safe). |
| `kebab reset [--all / --data-only / --vector-only / --config-only] [--yes]` | Wipes XDG data. **Irreversible.** On a TTY it shows a confirm prompt; otherwise `--yes` is required. `--vector-only` also truncates the SQLite `embedding_records` table (to prevent orphans) |
| `kebab eval run / compare` | Regression measurement against golden queries |
| `kebab schema [--json]` | Introspection: wire schemas / capabilities / models / stats in one shot. `--json` emits the `schema.v1` wire; human mode prints formatted output. |

Every command takes a `--json` flag. Output is the frozen wire schema v1 (`schema_version` always included, e.g. `ingest_report.v1`, `ingest_progress.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, `reset_report.v1`).
Every command takes a `--json` flag. Output is the frozen wire schema v1 (`schema_version` always included, e.g. `ingest_report.v1`, `ingest_progress.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, `reset_report.v1`, `schema.v1`). In `--json` mode, fatal errors are emitted to stderr as `error.v1` ndjson (exit codes 0/1/2/3 unchanged).

## Logical architecture

15
crates/kebab-app/src/error_signal.rs
Normal file
@@ -0,0 +1,15 @@
//! Typed signal re-exports + new signals introduced by fb-27.
//!
//! kebab-cli (and future kebab-tui / kebab-desktop) downcast on these to
//! build `error.v1` wire records. The existing signals
//! (`RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy`) live in
//! `doctor_signal.rs` — leave those unchanged and re-export via this
//! module so callers have one place to import from.
//!
//! See `docs/superpowers/specs/2026-05-07-p9-fb-27-introspection-and-error-wire-design.md`.

pub use crate::doctor_signal::{DoctorUnhealthy, NoHitSignal, RefusalSignal};

pub use kebab_llm_local::LlmError;
pub use kebab_config::ConfigInvalid;
pub use kebab_store_sqlite::NotIndexed;
@@ -56,13 +56,16 @@ use kebab_source_fs::FsSourceConnector;

mod app;
pub mod doctor_signal;
pub mod error_signal;
pub mod ingest_progress;
pub mod logging;
pub mod reset;
pub mod schema;

pub use app::App;
pub use ingest_progress::{AggregateCounts, IngestEvent, render_skipped_breakdown};
pub use reset::{ResetReport, ResetScope};
pub use schema::{Capabilities, Models, SCHEMA_V1_ID, SchemaV1, Stats, WireBlock, schema_with_config};

/// p9-fb-25: sentinel for files without an extension in
/// `IngestReport.skipped_by_extension` keys + `IngestItem.warnings`
@@ -71,21 +74,6 @@ pub use reset::{ResetReport, ResetScope};
/// compatibility break.
pub const NO_EXT_SENTINEL: &str = "<no-ext>";

/// Parser-version label persisted in `documents.parser_version` for
/// every Markdown file ingested through the `kb-parse-md` pipeline.
/// Kept in lock-step with the literal used in the `kb-store-sqlite`
/// idempotency / round-trip tests so the version label written by the
/// app and the one used in cross-crate fixtures match.
///
/// p9-fb-07 bumped this from `pulldown-cmark-0.x` to `md-frontmatter-v2`
/// because `kebab-normalize::derive_title` now applies a fallback chain
/// (frontmatter → H1 → H2 → first paragraph → file stem) when the
/// frontmatter title is blank. The bump invalidates `doc_id` for every
/// pre-existing Markdown document, so a re-ingest is required for the
/// new titles to land — this is the documented cascade behavior per
/// design §9.
const KEBAB_PARSE_MD_VERSION: &str = "md-frontmatter-v2";

/// Caller-supplied knobs for one [`ask`] invocation.
///
/// Re-exported from [`kebab_rag::AskOpts`] (P4-3 owns the type) so kb-cli's
@@ -330,7 +318,7 @@ pub fn ingest_with_config_opts(
            .context("kb-app::ingest: ensure Lance table")?;
    }

    let parser_version = ParserVersion(KEBAB_PARSE_MD_VERSION.to_string());
    let parser_version = ParserVersion(kebab_parse_md::PARSER_VERSION.to_string());
    let chunk_policy = chunk_policy_from_config(&app.config);

    // P6-4: build OCR / caption adapters once per ingest invocation,

151
crates/kebab-app/src/schema.rs
Normal file
@@ -0,0 +1,151 @@
//! `kebab schema` — introspection report. See spec
//! `docs/superpowers/specs/2026-05-07-p9-fb-27-introspection-and-error-wire-design.md`.

use serde::{Deserialize, Serialize};

use kebab_config::Config;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SchemaV1 {
    pub schema_version: String,
    pub kebab_version: String,
    pub wire: WireBlock,
    pub capabilities: Capabilities,
    pub models: Models,
    pub stats: Stats,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WireBlock {
    pub schemas: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Capabilities {
    pub json_mode: bool,
    pub ingest_progress: bool,
    pub ingest_cancellation: bool,
    pub rag_multi_turn: bool,
    pub search_cache: bool,
    pub incremental_ingest: bool,
    pub streaming_ask: bool,
    pub http_daemon: bool,
    pub mcp_server: bool,
    pub single_file_ingest: bool,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Models {
    pub parser_version: String,
    pub chunker_version: String,
    pub embedding_version: String,
    pub prompt_template_version: String,
    pub index_version: String,
    pub corpus_revision: u64,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Stats {
    pub doc_count: u64,
    pub chunk_count: u64,
    pub asset_count: u64,
    pub last_ingest_at: Option<String>,
}

const KEBAB_VERSION: &str = env!("CARGO_PKG_VERSION");

/// Wire schema id for [`SchemaV1`]. Single source of truth — `kebab-cli`
/// re-uses this via `kebab_app::schema::SCHEMA_V1_ID` when wrapping.
pub const SCHEMA_V1_ID: &str = "schema.v1";

// Authoritative list of wire schemas this binary emits. Keep in sync with
// `docs/wire-schema/v1/*.schema.json` and `kebab-cli::wire::wire_*` helpers.
const WIRE_SCHEMAS: &[&str] = &[
    "answer.v1",
    "search_hit.v1",
    "doc_summary.v1",
    "chunk_inspection.v1",
    "doctor.v1",
    "ingest_report.v1",
    "ingest_progress.v1",
    "reset_report.v1",
    "citation.v1",
    "schema.v1",
    "error.v1",
];

/// Build a [`SchemaV1`] introspection report for the given config.
///
/// Opens the SQLite store read-only via [`kebab_store_sqlite::SqliteStore::open_existing`]
/// so the caller (kebab-cli) does not need write access to the data dir.
/// Returns a [`kebab_store_sqlite::NotIndexed`] error (wrapped in `anyhow`)
/// if the database file does not exist — the CLI translates that to an
/// `error.v1` / `"not_indexed"` wire record.
#[doc(hidden)]
pub fn schema_with_config(cfg: &Config) -> anyhow::Result<SchemaV1> {
    let store = open_store_for_stats(cfg)?;
    let stats = collect_stats(&store)?;
    let models = collect_models(cfg, &store);
    Ok(SchemaV1 {
        schema_version: SCHEMA_V1_ID.to_string(),
        kebab_version: KEBAB_VERSION.to_string(),
        wire: WireBlock {
            schemas: WIRE_SCHEMAS.iter().map(|s| (*s).to_string()).collect(),
        },
        capabilities: capabilities_snapshot(),
        models,
        stats,
    })
}

fn capabilities_snapshot() -> Capabilities {
    Capabilities {
        json_mode: true,
        ingest_progress: true,
        ingest_cancellation: true,
        rag_multi_turn: true,
        search_cache: true,
        incremental_ingest: true,
        streaming_ask: false,
        http_daemon: false,
        mcp_server: false,
        single_file_ingest: false,
    }
}

fn open_store_for_stats(cfg: &Config) -> anyhow::Result<kebab_store_sqlite::SqliteStore> {
    // Mirror the data_dir resolution used in SqliteStore::open:
    // kebab_config::expand_path(&cfg.storage.data_dir, "") resolves tilde
    // and env vars. The SQLITE_FILE name ("kebab.sqlite") is the canonical
    // file name defined in kebab-store-sqlite.
    let data_dir = kebab_config::expand_path(&cfg.storage.data_dir, "");
    let db_path = data_dir.join("kebab.sqlite");
    kebab_store_sqlite::SqliteStore::open_existing(&db_path)
}

fn collect_stats(store: &kebab_store_sqlite::SqliteStore) -> anyhow::Result<Stats> {
    let counts = store.count_summary()?;
    Ok(Stats {
        doc_count: counts.doc_count,
        chunk_count: counts.chunk_count,
        asset_count: counts.asset_count,
        last_ingest_at: counts.last_ingest_at,
    })
}

fn collect_models(cfg: &Config, store: &kebab_store_sqlite::SqliteStore) -> Models {
    Models {
        // markdown parser only — pdf-page-v1 (P7) / image extractors (P6)
        // maintain their own versions; surface those when SchemaV1.models
        // becomes a multi-medium map (P+).
        parser_version: kebab_parse_md::PARSER_VERSION.to_string(),
        chunker_version: cfg.chunking.chunker_version.clone(),
        // EmbeddingModelCfg uses `.model` (not `.id`) — adapt from plan.
        embedding_version: cfg.models.embedding.model.clone(),
        prompt_template_version: cfg.rag.prompt_template_version.clone(),
        index_version: kebab_store_vector::INDEX_VERSION_STR.to_string(),
        // corpus_revision returns u64 directly (no Result) — matches
        // existing impl; treat 0 as the default for a fresh/unrevised store.
        corpus_revision: store.corpus_revision(),
    }
}
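
The `capabilities` block exists so that a caller can check what this binary supports before relying on a feature. A minimal sketch of that gating on the consumer side follows, using only the standard library. `CapabilityFlags`, `enabled`, and `ask_mode` are hypothetical names for this illustration; the flag names and their values in the usage note come from `capabilities_snapshot()` in this file.

```rust
/// Local mirror of a few `capabilities` flags from a `schema.v1` report.
/// Hypothetical type for illustration; the real struct is `Capabilities`.
#[derive(Debug, Clone, Copy)]
pub struct CapabilityFlags {
    pub json_mode: bool,
    pub rag_multi_turn: bool,
    pub streaming_ask: bool,
    pub mcp_server: bool,
}

/// Names of the capabilities that are switched on, in declaration order.
pub fn enabled(flags: &CapabilityFlags) -> Vec<&'static str> {
    [
        ("json_mode", flags.json_mode),
        ("rag_multi_turn", flags.rag_multi_turn),
        ("streaming_ask", flags.streaming_ask),
        ("mcp_server", flags.mcp_server),
    ]
    .iter()
    .filter(|(_, on)| *on)
    .map(|(name, _)| *name)
    .collect()
}

/// Pick how to call `ask`: stream only when the binary advertises it.
pub fn ask_mode(flags: &CapabilityFlags) -> &'static str {
    if flags.streaming_ask {
        "streaming"
    } else {
        "blocking"
    }
}
```

With the values from the snapshot above (`json_mode` and `rag_multi_turn` true, `streaming_ask` and `mcp_server` false), `ask_mode` selects the blocking path, and it would switch to streaming once a later build flips the flag.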
119
crates/kebab-app/tests/schema_report.rs
Normal file
@@ -0,0 +1,119 @@
//! Integration test: kebab_app::schema_with_config returns a SchemaV1
//! that is internally consistent with a freshly-ingested TempDir KB.

use std::fs;

use kebab_config::Config;
use kebab_core::SourceScope;

fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path) -> Config {
    let mut config = Config::defaults();
    config.workspace.root = workspace_root.to_string_lossy().into_owned();
    config.workspace.exclude.clear();
    config.storage.data_dir = data_dir.to_string_lossy().into_owned();
    config.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
    config.models.embedding.provider = "none".to_string();
    config.models.embedding.dimensions = 0;
    config.chunking.target_tokens = 80;
    config.chunking.overlap_tokens = 20;
    config
}

fn minimal_scope(workspace_root: &std::path::Path) -> SourceScope {
    SourceScope {
        root: workspace_root.to_path_buf(),
        include: vec![],
        exclude: vec![],
    }
}

#[test]
fn schema_report_reflects_freshly_ingested_kb() {
    let temp = tempfile::tempdir().expect("tempdir");
    let workspace_root = temp.path().join("workspace");
    let data_dir = temp.path().join("data");
    fs::create_dir_all(&workspace_root).unwrap();
    fs::create_dir_all(&data_dir).unwrap();

    fs::write(workspace_root.join("a.md"), "# A\n\nbody A.").unwrap();
    fs::write(workspace_root.join("b.md"), "# B\n\nbody B.").unwrap();

    let config = minimal_config(&data_dir, &workspace_root);
    let _report =
        kebab_app::ingest_with_config(config.clone(), minimal_scope(&workspace_root), false)
            .unwrap();

    let schema = kebab_app::schema_with_config(&config).unwrap();

    assert!(!schema.kebab_version.is_empty());
    assert!(
        schema.wire.schemas.contains(&"schema.v1".to_string()),
        "schema.v1 missing from wire.schemas: {:?}",
        schema.wire.schemas
    );
    assert!(
        schema.wire.schemas.contains(&"error.v1".to_string()),
        "error.v1 missing from wire.schemas: {:?}",
        schema.wire.schemas
    );
    assert!(schema.capabilities.json_mode);
    assert!(!schema.capabilities.streaming_ask);
    assert_eq!(
        schema.stats.doc_count, 2,
        "expected 2 docs (a.md + b.md): {:?}",
        schema.stats
    );
    assert!(
        schema.stats.last_ingest_at.is_some(),
        "last_ingest_at must be set after ingest: {:?}",
        schema.stats
    );
    assert!(
        schema.stats.chunk_count >= 2,
        "expected ≥2 chunks (a.md + b.md): {:?}",
        schema.stats
    );
    assert_eq!(
        schema.stats.asset_count, 2,
        "expected 2 assets (a.md + b.md): {:?}",
        schema.stats
    );
}

#[test]
fn schema_report_on_empty_kb_has_zero_counts() {
    // An empty workspace dir with no .md files: ingest_with_config scans 0
    // files but still creates + migrates kebab.sqlite. This seeds the DB so
    // open_existing (used inside schema_with_config) succeeds and returns
    // all-zero counts.
    let temp = tempfile::tempdir().expect("tempdir");
    let workspace_root = temp.path().join("workspace");
    let data_dir = temp.path().join("data");
    fs::create_dir_all(&workspace_root).unwrap();
    fs::create_dir_all(&data_dir).unwrap();

    let config = minimal_config(&data_dir, &workspace_root);
    // Run ingest over the empty workspace — creates kebab.sqlite, runs
    // migrations, records 0 docs. schema_with_config can then open_existing.
    let report =
        kebab_app::ingest_with_config(config.clone(), minimal_scope(&workspace_root), false)
            .unwrap();
    assert_eq!(report.new, 0, "empty workspace should yield 0 new docs");

    let schema = kebab_app::schema_with_config(&config).unwrap();
    assert_eq!(
        schema.stats.doc_count, 0,
        "empty KB doc_count: {:?}",
        schema.stats
    );
    assert_eq!(
        schema.stats.chunk_count, 0,
        "empty KB chunk_count: {:?}",
        schema.stats
    );
    assert!(
        schema.stats.last_ingest_at.is_none(),
        "last_ingest_at must be None when no docs ingested: {:?}",
        schema.stats
    );
}
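
Both tests run an ingest first because `schema_with_config` opens the store with `open_existing`, which is documented to fail with `NotIndexed` when the database file does not exist. The following is a rough, std-only sketch of that "open only if already present" contract. `NotIndexedSketch` and `open_existing_sketch` are hypothetical stand-ins, and the real `SqliteStore::open_existing` may well do more (for example opening the SQLite connection read-only), which this sketch does not attempt.

```rust
use std::fs::File;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the typed `NotIndexed` signal.
#[derive(Debug, PartialEq, Eq)]
pub struct NotIndexedSketch {
    pub expected: PathBuf,
}

/// Open the file at `db_path` only if it already exists.
/// Unlike a create-or-open call, this never creates the file, so it can
/// be used by read-only commands without touching the data directory.
pub fn open_existing_sketch(db_path: &Path) -> Result<File, NotIndexedSketch> {
    if !db_path.is_file() {
        return Err(NotIndexedSketch {
            expected: db_path.to_path_buf(),
        });
    }
    // Any open failure is reported as "not indexed" in this sketch.
    File::open(db_path).map_err(|_| NotIndexedSketch {
        expected: db_path.to_path_buf(),
    })
}
```

Returning a dedicated error type, rather than a bare I/O error, is what lets a caller tell "nothing has been ingested yet" apart from other failures.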
@@ -28,6 +28,7 @@ kebab-eval = { path = "../kebab-eval" }
# launches it.
kebab-tui = { path = "../kebab-tui" }
anyhow = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
clap = { version = "4", features = ["derive"] }
# p9-fb-02: ingest progress UI.
@@ -43,3 +44,6 @@ ctrlc = "3"

[dev-dependencies]
tempfile = { workspace = true }
# llm_unreachable_classifies_to_model_unreachable test needs a real
# reqwest::Error (private constructor) — built from a connect-refused call.
reqwest = { version = "0.12", default-features = false, features = ["blocking", "rustls-tls"] }

186
crates/kebab-cli/src/error_classify.rs
Normal file
@@ -0,0 +1,186 @@
//! Map `anyhow::Error` (returned by `kebab-app` facade calls) to the
//! `error.v1` wire shape. The classifier downcasts to known typed errors
//! re-exported via `kebab_app::error_signal` (LlmError, ConfigInvalid,
//! NotIndexed) and falls back to `code: "generic"` for everything else.
//!
//! Refusal / no-hit / doctor-unhealthy are NOT routed here — they remain
//! exit-code-only signals (see main.rs `exit_code()`).

use serde::{Deserialize, Serialize};
use serde_json::{Value, json};

use kebab_app::error_signal::{ConfigInvalid, LlmError, NotIndexed};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ErrorV1 {
    pub code: String,
    pub message: String,
    pub details: Value,
    pub hint: Option<String>,
}

pub fn classify(err: &anyhow::Error, verbose: bool) -> ErrorV1 {
    if let Some(s) = err.downcast_ref::<ConfigInvalid>() {
        return ErrorV1 {
            code: "config_invalid".to_string(),
            message: s.to_string(),
            details: json!({
                "path": s.path.to_string_lossy(),
                "cause": s.cause,
            }),
            hint: Some("check `--config <path>` and TOML syntax".to_string()),
        };
    }
    if let Some(s) = err.downcast_ref::<NotIndexed>() {
        return ErrorV1 {
            code: "not_indexed".to_string(),
            message: s.to_string(),
            details: json!({
                "expected": s.expected,
                "found": s.found,
            }),
            hint: Some("run `kebab init` then `kebab ingest`".to_string()),
        };
    }
    if let Some(s) = err.downcast_ref::<LlmError>() {
        return classify_llm(s);
    }
    if let Some(io) = err.downcast_ref::<std::io::Error>() {
        return ErrorV1 {
            code: "io_error".to_string(),
            message: io.to_string(),
            details: json!({"kind": format!("{:?}", io.kind())}),
            hint: None,
        };
    }
    let mut details = json!({});
    if verbose {
        let chain: Vec<String> = err.chain().map(|c| c.to_string()).collect();
        details = json!({"chain": chain});
    }
    ErrorV1 {
        code: "generic".to_string(),
        message: err.to_string(),
        details,
        hint: None,
    }
}

fn classify_llm(s: &LlmError) -> ErrorV1 {
    match s {
        LlmError::Unreachable { endpoint, source } => ErrorV1 {
            code: "model_unreachable".to_string(),
            message: format!("ollama unreachable at {endpoint}"),
            details: json!({
                "endpoint": endpoint,
                "source": source.to_string(),
            }),
            hint: Some(format!("ensure `ollama serve` is reachable at {endpoint}")),
        },
        LlmError::ModelNotPulled(model) => ErrorV1 {
            code: "model_not_pulled".to_string(),
            message: format!("ollama model `{model}` is not pulled"),
            details: json!({"model": model}),
            hint: Some(format!("run `ollama pull {model}`")),
        },
        LlmError::Timeout(e) => ErrorV1 {
            code: "timeout".to_string(),
            message: format!("ollama timeout: {e}"),
            details: json!({"source": e.to_string()}),
            hint: Some("increase timeout or check Ollama load".to_string()),
        },
        LlmError::Stream(body) => ErrorV1 {
            code: "generic".to_string(),
            message: format!("ollama HTTP error: {body}"),
            details: json!({"body": body}),
            hint: None,
        },
        LlmError::Malformed(line) => ErrorV1 {
            code: "generic".to_string(),
            message: format!("malformed response line: {line}"),
            details: json!({"line": line}),
            hint: None,
        },
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn config_invalid_classifies_to_config_invalid_code() {
        let err = anyhow::Error::new(ConfigInvalid {
            path: std::path::PathBuf::from("/tmp/x.toml"),
            cause: "missing".to_string(),
        });
        let v1 = classify(&err, false);
        assert_eq!(v1.code, "config_invalid");
        assert_eq!(v1.details.get("path").and_then(|p| p.as_str()), Some("/tmp/x.toml"));
        assert!(v1.hint.is_some());
    }

    #[test]
    fn not_indexed_classifies_correctly() {
        let err = anyhow::Error::new(NotIndexed {
            expected: "/data/k.sqlite".to_string(),
            found: None,
        });
        let v1 = classify(&err, false);
        assert_eq!(v1.code, "not_indexed");
    }

    #[test]
    fn llm_unreachable_classifies_to_model_unreachable() {
        // We cannot construct a reqwest::Error from scratch (private constructor).
        // Approach: send a real request to a guaranteed-unroutable endpoint
        // (port 1 is reserved + connect-refused on all conformant TCP stacks).
        // 500ms timeout chosen as headroom over 50ms baseline — heavily loaded
        // CI may hit timeout race instead of connect-refused, but either way
        // the resulting LlmError::Unreachable maps to "model_unreachable".
        let client = reqwest::blocking::Client::builder()
            .timeout(std::time::Duration::from_millis(500))
            .build().unwrap();
        let err = client.get("http://127.0.0.1:1").send().unwrap_err();
        let llm = LlmError::Unreachable {
            endpoint: "http://127.0.0.1:1".to_string(),
            source: err,
        };
        let anyhow_err = anyhow::Error::new(llm);
        let v1 = classify(&anyhow_err, false);
        assert_eq!(v1.code, "model_unreachable");
    }

    #[test]
    fn model_not_pulled_classifies_correctly() {
        let llm = LlmError::ModelNotPulled("gemma4:e4b".to_string());
        let v1 = classify(&anyhow::Error::new(llm), false);
        assert_eq!(v1.code, "model_not_pulled");
        assert_eq!(v1.details.get("model").and_then(|p| p.as_str()), Some("gemma4:e4b"));
    }

    #[test]
    fn unknown_error_classifies_to_generic() {
        let err = anyhow::anyhow!("something else");
        let v1 = classify(&err, false);
        assert_eq!(v1.code, "generic");
        assert!(v1.hint.is_none());
    }

    #[test]
    fn generic_with_verbose_includes_chain() {
        let err = anyhow::anyhow!("root").context("middle").context("leaf");
        let v1 = classify(&err, true);
        assert_eq!(v1.code, "generic");
        let chain = v1.details.get("chain").and_then(|c| c.as_array()).unwrap();
        assert_eq!(chain.len(), 3);
    }

    #[test]
    fn io_error_classifies_correctly() {
        let io = std::io::Error::new(std::io::ErrorKind::NotFound, "no such file");
        let err = anyhow::Error::new(io);
        let v1 = classify(&err, false);
        assert_eq!(v1.code, "io_error");
    }
}
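
The classifier above works by probing an erased error for known concrete types, most specific first, and falling back to `generic`. The same idea can be sketched with only the standard library, which may help when reading it without `anyhow` at hand. This is a simplified illustration, not the module's code: `MissingIndex` and `code_for` are hypothetical names, and `std::error::Error::downcast_ref` stands in for `anyhow::Error::downcast_ref`.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for a typed signal such as `NotIndexed`.
#[derive(Debug)]
pub struct MissingIndex {
    pub expected: String,
}

impl fmt::Display for MissingIndex {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "index not found at {}", self.expected)
    }
}

impl Error for MissingIndex {}

/// Return a stable code by probing for known concrete types first
/// and falling back to "generic" for everything else.
pub fn code_for(err: &(dyn Error + 'static)) -> &'static str {
    if err.downcast_ref::<MissingIndex>().is_some() {
        "not_indexed"
    } else if err.downcast_ref::<io::Error>().is_some() {
        "io_error"
    } else {
        "generic"
    }
}
```

One caveat worth keeping in mind: a downcast only matches the exact concrete type it is asked for, so an error that has been converted into a different type on the way up would land in the `generic` branch.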
@@ -9,6 +9,7 @@ use clap::{Parser, Subcommand};
use kebab_app::doctor_signal::{DoctorUnhealthy, NoHitSignal, RefusalSignal};

mod cancel;
mod error_classify;
mod progress;
mod wire;

@@ -176,6 +177,9 @@ enum Cmd {
|
||||
/// Health check.
|
||||
Doctor,
|
||||
|
||||
/// Print introspection report (wire schemas, capabilities, model versions, stats).
|
||||
Schema,
|
||||
|
||||
/// Launch the Ratatui shell (P9-1 — Library pane only; search /
|
||||
/// ask / inspect panes land with p9-2 / p9-3 / p9-4).
|
||||
Tui,
|
||||
@@ -277,10 +281,18 @@ fn main() -> ExitCode {
|
||||
// Refusals at exit code 1 print to stdout (already done by the
|
||||
// caller); errors go to stderr.
|
||||
if code != 1 {
|
||||
-eprintln!("error: {e}");
-if cli.verbose {
-for cause in e.chain().skip(1) {
-eprintln!(" caused by: {cause}");
+if cli.json {
+let v1 = error_classify::classify(&e, cli.verbose);
+let v = wire::wire_error_v1(&v1);
+eprintln!("{}", serde_json::to_string(&v).unwrap_or_else(|_| {
+"{\"schema_version\":\"error.v1\",\"code\":\"generic\",\"message\":\"serialize failed\",\"details\":{}}".to_string()
+}));
+} else {
+eprintln!("error: {e}");
+if cli.verbose {
+for cause in e.chain().skip(1) {
+eprintln!(" caused by: {cause}");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -603,6 +615,18 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
Cmd::Schema => {
|
||||
let cfg = kebab_config::Config::load(cli.config.as_deref())?;
|
||||
let report = kebab_app::schema_with_config(&cfg)?;
|
||||
if cli.json {
|
||||
let v = wire::wire_schema(&report);
|
||||
println!("{}", serde_json::to_string(&v)?);
|
||||
} else {
|
||||
print_schema_text(&report);
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
Cmd::Doctor => {
|
||||
let report = kebab_app::doctor_with_config_path(cli.config.as_deref())?;
|
||||
if cli.json {
|
||||
@@ -719,6 +743,50 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
|
||||
}
|
||||
}
|
||||
|
||||
fn print_schema_text(s: &kebab_app::SchemaV1) {
|
||||
println!("kebab v{}", s.kebab_version);
|
||||
println!();
|
||||
|
||||
println!("wire schemas");
|
||||
println!(" {}", s.wire.schemas.join(", "));
|
||||
println!();
|
||||
|
||||
println!("capabilities");
|
||||
let caps = [
|
||||
("json_mode", s.capabilities.json_mode),
|
||||
("ingest_progress", s.capabilities.ingest_progress),
|
||||
("ingest_cancellation", s.capabilities.ingest_cancellation),
|
||||
("rag_multi_turn", s.capabilities.rag_multi_turn),
|
||||
("search_cache", s.capabilities.search_cache),
|
||||
("incremental_ingest", s.capabilities.incremental_ingest),
|
||||
("streaming_ask", s.capabilities.streaming_ask),
|
||||
("http_daemon", s.capabilities.http_daemon),
|
||||
("mcp_server", s.capabilities.mcp_server),
|
||||
("single_file_ingest", s.capabilities.single_file_ingest),
|
||||
];
|
||||
for (name, on) in caps {
|
||||
let mark = if on { "✓" } else { "✗" };
|
||||
println!(" {mark} {name}");
|
||||
}
|
||||
println!();
|
||||
|
||||
println!("models");
|
||||
println!(" parser_version {}", s.models.parser_version);
|
||||
println!(" chunker_version {}", s.models.chunker_version);
|
||||
println!(" embedding_version {}", s.models.embedding_version);
|
||||
println!(" prompt_template_version {}", s.models.prompt_template_version);
|
||||
println!(" index_version {}", s.models.index_version);
|
||||
println!(" corpus_revision {}", s.models.corpus_revision);
|
||||
println!();
|
||||
|
||||
println!("stats");
|
||||
println!(" doc_count {}", s.stats.doc_count);
|
||||
println!(" chunk_count {}", s.stats.chunk_count);
|
||||
println!(" asset_count {}", s.stats.asset_count);
|
||||
let last = s.stats.last_ingest_at.as_deref().unwrap_or("(never)");
|
||||
println!(" last_ingest_at {last}");
|
||||
}
|
||||
|
||||
/// Minimal stdin/stdout confirm prompt for destructive ops. No new dep —
|
||||
/// uses stdlib `IsTerminal` (the caller is expected to have already
|
||||
/// short-circuited the non-TTY case). Returns `Ok(true)` only when the
|
||||
|
||||
@@ -133,6 +133,35 @@ pub fn wire_ingest_progress(
|
||||
Ok(tag_object(v, "ingest_progress.v1"))
|
||||
}
|
||||
|
||||
/// Wrap a [`kebab_app::SchemaV1`] as `schema.v1`.
|
||||
///
|
||||
/// Uses the idempotent re-tag pattern (mirrors `wire_doctor`) because
|
||||
/// `SchemaV1` already carries `schema_version: "schema.v1"` as a struct
|
||||
/// field. The re-tag is a defensive no-op when the field is present; it
|
||||
/// stamps the correct version if a future refactor ever drops the field.
|
||||
pub fn wire_schema(s: &kebab_app::SchemaV1) -> Value {
|
||||
let v = serde_json::to_value(s).expect("SchemaV1 serializes");
|
||||
if let Value::Object(ref map) = v {
|
||||
if matches!(
|
||||
map.get("schema_version"),
|
||||
Some(Value::String(s)) if s == kebab_app::SCHEMA_V1_ID
|
||||
) {
|
||||
return v;
|
||||
}
|
||||
}
|
||||
tag_object(v, kebab_app::SCHEMA_V1_ID)
|
||||
}
|
||||
|
||||
/// Wrap an [`crate::error_classify::ErrorV1`] as `error.v1`.
|
||||
///
|
||||
/// Uses the simple `tag_object` pattern because `ErrorV1` is a
|
||||
/// kebab-cli-local type that does NOT carry `schema_version` itself
|
||||
/// (kebab-core convention).
|
||||
pub fn wire_error_v1(e: &crate::error_classify::ErrorV1) -> Value {
|
||||
let v = serde_json::to_value(e).expect("ErrorV1 serializes");
|
||||
tag_object(v, "error.v1")
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
@@ -200,6 +229,51 @@ mod tests {
|
||||
assert_eq!(schema_of(&tagged), Some("x.v1"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn schema_wrapper_tags_schema_version() {
|
||||
use kebab_app::{Capabilities, Models, SchemaV1, Stats, WireBlock};
|
||||
let schema = SchemaV1 {
|
||||
schema_version: "schema.v1".to_string(),
|
||||
kebab_version: "0.2.1".to_string(),
|
||||
wire: WireBlock { schemas: vec!["answer.v1".to_string()] },
|
||||
capabilities: Capabilities {
|
||||
json_mode: true, ingest_progress: true, ingest_cancellation: true,
|
||||
rag_multi_turn: true, search_cache: true, incremental_ingest: true,
|
||||
streaming_ask: false, http_daemon: false, mcp_server: false,
|
||||
single_file_ingest: false,
|
||||
},
|
||||
models: Models {
|
||||
parser_version: "x".to_string(),
|
||||
chunker_version: "y".to_string(),
|
||||
embedding_version: "z".to_string(),
|
||||
prompt_template_version: "w".to_string(),
|
||||
index_version: "v".to_string(),
|
||||
corpus_revision: 7,
|
||||
},
|
||||
stats: Stats {
|
||||
doc_count: 1, chunk_count: 2, asset_count: 1,
|
||||
last_ingest_at: None,
|
||||
},
|
||||
};
|
||||
let v = wire_schema(&schema);
|
||||
assert_eq!(schema_of(&v), Some("schema.v1"));
|
||||
assert_eq!(v.get("kebab_version").and_then(Value::as_str), Some("0.2.1"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn error_wrapper_tags_schema_version_and_emits_code() {
|
||||
use crate::error_classify::ErrorV1;
|
||||
let err = ErrorV1 {
|
||||
code: "config_invalid".to_string(),
|
||||
message: "bad config".to_string(),
|
||||
details: serde_json::json!({"path": "/tmp/x"}),
|
||||
hint: Some("check the path".to_string()),
|
||||
};
|
||||
let v = wire_error_v1(&err);
|
||||
assert_eq!(schema_of(&v), Some("error.v1"));
|
||||
assert_eq!(v.get("code").and_then(Value::as_str), Some("config_invalid"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn reset_wrapper_tags_schema_version_and_serializes_scope() {
|
||||
let r = kebab_app::ResetReport {
|
||||
|
||||
105
crates/kebab-cli/tests/cli_error_wire.rs
Normal file
@@ -0,0 +1,105 @@
|
||||
//! Integration: spawn kebab and verify --json mode emits error.v1 ndjson
|
||||
//! on stderr while non-json mode emits the legacy `error:` text prefix.
|
||||
//!
|
||||
//! The `config_invalid` code is triggered by supplying an *existing* but
|
||||
//! malformed TOML file via `--config`. Note: supplying a *non-existent*
|
||||
//! path does NOT trigger this error — Config::load silently falls back to
|
||||
//! defaults when the specified config file is absent (by design, so that
|
||||
//! `kebab doctor` runs before `kebab init` is ever called). A file that
|
||||
//! exists but fails TOML parsing is the reliable path to `config_invalid`.
|
||||
|
||||
use std::process::Command;
|
||||
|
||||
fn kebab_bin() -> std::path::PathBuf {
|
||||
let manifest = env!("CARGO_MANIFEST_DIR");
|
||||
std::path::PathBuf::from(manifest)
|
||||
.parent()
|
||||
.unwrap()
|
||||
.parent()
|
||||
.unwrap()
|
||||
.join("target/debug/kebab")
|
||||
}
|
||||
|
||||
fn xdg_envs(tmp: &std::path::Path) -> [(&'static str, std::path::PathBuf); 4] {
|
||||
[
|
||||
("XDG_CONFIG_HOME", tmp.join("cfg")),
|
||||
("XDG_DATA_HOME", tmp.join("data")),
|
||||
("XDG_CACHE_HOME", tmp.join("cache")),
|
||||
("XDG_STATE_HOME", tmp.join("state")),
|
||||
]
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_mode_emits_error_v1_on_config_invalid() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
// Write a file that exists but is not valid TOML.
|
||||
let bad_config = tmp.path().join("bad.toml");
|
||||
std::fs::write(&bad_config, b"this is not { valid toml !!!").unwrap();
|
||||
|
||||
let mut cmd = Command::new(kebab_bin());
|
||||
cmd.args([
|
||||
"--json",
|
||||
"--config",
|
||||
bad_config.to_str().unwrap(),
|
||||
"ingest",
|
||||
]);
|
||||
for (k, v) in xdg_envs(tmp.path()) {
|
||||
cmd.env(k, v);
|
||||
}
|
||||
|
||||
let out = cmd.output().unwrap();
|
||||
assert!(
|
||||
!out.status.success(),
|
||||
"expected non-zero exit for config_invalid"
|
||||
);
|
||||
let exit_code = out.status.code().unwrap_or(-1);
|
||||
assert_eq!(exit_code, 2, "expected exit code 2, got {exit_code}");
|
||||
|
||||
let stderr = String::from_utf8(out.stderr).unwrap();
|
||||
let first_line = stderr.lines().next().expect("stderr must have at least one line");
|
||||
let v: serde_json::Value =
|
||||
serde_json::from_str(first_line).expect("stderr first line must be valid JSON");
|
||||
|
||||
assert_eq!(
|
||||
v.get("schema_version").and_then(|s| s.as_str()),
|
||||
Some("error.v1"),
|
||||
"schema_version must be error.v1"
|
||||
);
|
||||
assert_eq!(
|
||||
v.get("code").and_then(|s| s.as_str()),
|
||||
Some("config_invalid"),
|
||||
"code must be config_invalid"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn text_mode_emits_legacy_error_format() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
// Same trigger: an existing file with malformed TOML.
|
||||
let bad_config = tmp.path().join("bad.toml");
|
||||
std::fs::write(&bad_config, b"this is not { valid toml !!!").unwrap();
|
||||
|
||||
let mut cmd = Command::new(kebab_bin());
|
||||
cmd.args(["--config", bad_config.to_str().unwrap(), "ingest"]);
|
||||
for (k, v) in xdg_envs(tmp.path()) {
|
||||
cmd.env(k, v);
|
||||
}
|
||||
|
||||
let out = cmd.output().unwrap();
|
||||
assert!(
|
||||
!out.status.success(),
|
||||
"expected non-zero exit for config_invalid"
|
||||
);
|
||||
let exit_code = out.status.code().unwrap_or(-1);
|
||||
assert_eq!(exit_code, 2, "expected exit code 2, got {exit_code}");
|
||||
|
||||
let stderr = String::from_utf8(out.stderr).unwrap();
|
||||
assert!(
|
||||
stderr.starts_with("error:"),
|
||||
"text mode stderr must start with 'error:', got: {stderr:?}"
|
||||
);
|
||||
assert!(
|
||||
!stderr.trim_start().starts_with('{'),
|
||||
"text mode stderr must NOT be JSON, got: {stderr:?}"
|
||||
);
|
||||
}
|
||||
134
crates/kebab-cli/tests/cli_schema.rs
Normal file
@@ -0,0 +1,134 @@
|
||||
//! Integration: spawn the kebab binary and parse `kebab schema [--json]`.
|
||||
//!
|
||||
//! Each test builds an isolated TempDir-rooted XDG layout, runs
|
||||
//! `kebab ingest` over an empty workspace (which creates and migrates
|
||||
//! kebab.sqlite), then exercises `kebab schema` in JSON and text modes.
|
||||
//! Using an empty workspace avoids the embedding model dependency while
|
||||
//! still seeding the DB so `open_existing` inside schema_with_config
|
||||
//! succeeds (a NotIndexed error fires when the DB file is absent).
|
||||
|
||||
use std::process::Command;
|
||||
|
||||
fn kebab_bin() -> std::path::PathBuf {
|
||||
let manifest = env!("CARGO_MANIFEST_DIR");
|
||||
std::path::PathBuf::from(manifest)
|
||||
.parent()
|
||||
.unwrap()
|
||||
.parent()
|
||||
.unwrap()
|
||||
.join("target/debug/kebab")
|
||||
}
|
||||
|
||||
fn xdg_envs(tmp: &std::path::Path) -> [(&'static str, std::path::PathBuf); 4] {
|
||||
[
|
||||
("XDG_CONFIG_HOME", tmp.join("cfg")),
|
||||
("XDG_DATA_HOME", tmp.join("data")),
|
||||
("XDG_CACHE_HOME", tmp.join("cache")),
|
||||
("XDG_STATE_HOME", tmp.join("state")),
|
||||
]
|
||||
}
|
||||
|
||||
/// Seed kebab.sqlite by running `kebab ingest` over an empty workspace dir.
|
||||
/// This is the minimum required for `kebab schema` to succeed: the store
|
||||
/// uses `open_existing`, which errors when the DB file is absent.
|
||||
fn seed_db(tmp: &tempfile::TempDir) {
|
||||
let ws = tmp.path().join("ws");
|
||||
std::fs::create_dir_all(&ws).unwrap();
|
||||
|
||||
let mut cmd = Command::new(kebab_bin());
|
||||
cmd.args(["ingest", "--root", ws.to_str().unwrap(), "--summary-only"]);
|
||||
for (k, v) in xdg_envs(tmp.path()) {
|
||||
cmd.env(k, v);
|
||||
}
|
||||
let out = cmd.output().unwrap();
|
||||
assert!(
|
||||
out.status.success(),
|
||||
"seed ingest failed: {}",
|
||||
String::from_utf8_lossy(&out.stderr)
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cli_schema_json_emits_schema_v1() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
seed_db(&tmp);
|
||||
|
||||
let mut cmd = Command::new(kebab_bin());
|
||||
cmd.args(["--json", "schema"]);
|
||||
for (k, v) in xdg_envs(tmp.path()) {
|
||||
cmd.env(k, v);
|
||||
}
|
||||
let out = cmd.output().unwrap();
|
||||
assert!(
|
||||
out.status.success(),
|
||||
"kebab --json schema failed: {}",
|
||||
String::from_utf8_lossy(&out.stderr)
|
||||
);
|
||||
|
||||
let stdout = String::from_utf8(out.stdout).unwrap();
|
||||
let v: serde_json::Value = serde_json::from_str(&stdout).expect("stdout must be valid JSON");
|
||||
|
||||
assert_eq!(
|
||||
v.get("schema_version").and_then(|s| s.as_str()),
|
||||
Some("schema.v1"),
|
||||
"schema_version must be schema.v1"
|
||||
);
|
||||
assert!(
|
||||
v.get("kebab_version")
|
||||
.and_then(|s| s.as_str())
|
||||
.map(|s| !s.is_empty())
|
||||
.unwrap_or(false),
|
||||
"kebab_version must be a non-empty string"
|
||||
);
|
||||
|
||||
let caps = v
|
||||
.get("capabilities")
|
||||
.and_then(|c| c.as_object())
|
||||
.expect("capabilities must be a JSON object");
|
||||
assert_eq!(
|
||||
caps.get("json_mode").and_then(|b| b.as_bool()),
|
||||
Some(true),
|
||||
"capabilities.json_mode must be true"
|
||||
);
|
||||
assert_eq!(
|
||||
caps.get("mcp_server").and_then(|b| b.as_bool()),
|
||||
Some(false),
|
||||
"capabilities.mcp_server must be false (not yet shipped)"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cli_schema_text_mode_runs() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
seed_db(&tmp);
|
||||
|
||||
let mut cmd = Command::new(kebab_bin());
|
||||
cmd.args(["schema"]);
|
||||
for (k, v) in xdg_envs(tmp.path()) {
|
||||
cmd.env(k, v);
|
||||
}
|
||||
let out = cmd.output().unwrap();
|
||||
assert!(
|
||||
out.status.success(),
|
||||
"kebab schema (text) failed: {}",
|
||||
String::from_utf8_lossy(&out.stderr)
|
||||
);
|
||||
|
||||
let stdout = String::from_utf8(out.stdout).unwrap();
|
||||
assert!(
|
||||
stdout.contains("kebab v"),
|
||||
"text output must contain 'kebab v', got: {stdout}"
|
||||
);
|
||||
assert!(
|
||||
stdout.contains("capabilities"),
|
||||
"text output must contain 'capabilities', got: {stdout}"
|
||||
);
|
||||
assert!(
|
||||
stdout.contains("models"),
|
||||
"text output must contain 'models', got: {stdout}"
|
||||
);
|
||||
assert!(
|
||||
stdout.contains("stats"),
|
||||
"text output must contain 'stats', got: {stdout}"
|
||||
);
|
||||
}
|
||||
@@ -13,8 +13,12 @@ kebab-core = { path = "../kebab-core" }
|
||||
anyhow = { workspace = true }
|
||||
serde = { workspace = true }
|
||||
serde_json = { workspace = true }
|
||||
thiserror = { workspace = true }
|
||||
toml = "0.8"
|
||||
dirs = "5"
|
||||
# p9-fb-05: warn-log when current_dir() fails (chroot, deleted cwd)
|
||||
# during workspace.root resolution.
|
||||
tracing = { workspace = true }
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile = { workspace = true }
|
||||
|
||||
@@ -11,6 +11,19 @@ use serde::{Deserialize, Serialize};
|
||||
mod paths;
|
||||
pub use paths::{expand_path, expand_path_with_base};
|
||||
|
||||
/// Signal: `Config::from_file` / `Config::load` failed due to missing path,
|
||||
/// I/O failure, TOML parse failure, or post-parse validation failure.
|
||||
///
|
||||
/// Wrapped into `anyhow::Error` at the API boundary so callers that need
|
||||
/// structured details (e.g. kebab-cli's `error_classify`) can
|
||||
/// `downcast_ref::<ConfigInvalid>()` for the wire record.
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
#[error("config invalid at {path}: {cause}")]
|
||||
pub struct ConfigInvalid {
|
||||
pub path: PathBuf,
|
||||
pub cause: String,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
|
||||
pub struct Config {
|
||||
pub schema_version: u32,
|
||||
@@ -393,7 +406,12 @@ impl Config {
|
||||
/// values resolve against the config file's directory rather
|
||||
/// than the user's `cwd`.
|
||||
pub fn from_file(path: &Path) -> anyhow::Result<Self> {
|
||||
-let text = std::fs::read_to_string(path)?;
+let text = std::fs::read_to_string(path).map_err(|e| {
+anyhow::Error::new(ConfigInvalid {
+path: path.to_path_buf(),
+cause: format!("read_failed: {e}"),
+})
+})?;
|
||||
|
||||
// p9-fb-25: probe for the legacy `workspace.include` key — if
|
||||
// present, emit a one-shot deprecation warning. Detection uses
|
||||
@@ -417,7 +435,12 @@ impl Config {
|
||||
}
|
||||
}
|
||||
|
||||
-let mut cfg: Self = toml::from_str(&text)?;
+let mut cfg: Self = toml::from_str(&text).map_err(|e| {
+anyhow::Error::new(ConfigInvalid {
+path: path.to_path_buf(),
+cause: format!("parse_failed: {e}"),
+})
+})?;
|
||||
cfg.source_dir = path.parent().map(Path::to_path_buf);
|
||||
Ok(cfg)
|
||||
}
|
||||
@@ -934,3 +957,31 @@ max_context_tokens = 8000
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod fb27_tests {
|
||||
use super::*;
|
||||
use std::path::PathBuf;
|
||||
|
||||
#[test]
|
||||
fn config_invalid_carries_path_and_cause() {
|
||||
let nonexistent = PathBuf::from("/this/path/should/not/exist/kebab.toml");
|
||||
let err = Config::from_file(&nonexistent).unwrap_err();
|
||||
let signal = err.downcast_ref::<ConfigInvalid>()
|
||||
.expect("from_file error should downcast to ConfigInvalid");
|
||||
assert_eq!(signal.path, nonexistent);
|
||||
assert!(!signal.cause.is_empty(), "cause should be non-empty");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn config_invalid_on_malformed_toml() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let p = dir.path().join("bad.toml");
|
||||
std::fs::write(&p, "this is not [valid toml").unwrap();
|
||||
let err = Config::from_file(&p).unwrap_err();
|
||||
let signal = err.downcast_ref::<ConfigInvalid>()
|
||||
.expect("malformed TOML should downcast to ConfigInvalid");
|
||||
assert_eq!(signal.path, p);
|
||||
assert!(!signal.cause.is_empty(), "cause should be non-empty");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -19,3 +19,10 @@ pub mod frontmatter;
|
||||
|
||||
pub use blocks::parse_blocks;
|
||||
pub use frontmatter::{BodyHints, FrontmatterSpan, parse_frontmatter};
|
||||
|
||||
/// Parser-version label for Markdown files ingested through this crate.
|
||||
/// Re-exported so `kebab-app::schema_with_config` can embed it in
|
||||
/// `SchemaV1.models.parser_version` without duplicating the literal.
|
||||
///
|
||||
/// Kept in sync with `KEBAB_PARSE_MD_VERSION` in `kebab-app/src/lib.rs`.
|
||||
pub const PARSER_VERSION: &str = "md-frontmatter-v2";
|
||||
|
||||
@@ -34,4 +34,4 @@ pub use error::StoreError;
|
||||
pub use eval::{EvalQueryResultRecord, EvalRunRecord, EvalRunRow};
|
||||
pub use fts::rebuild_chunks_fts;
|
||||
pub use jobs::IngestRunRow;
|
||||
-pub use store::SqliteStore;
+pub use store::{CountSummary, NotIndexed, SqliteStore};
|
||||
|
||||
@@ -11,11 +11,27 @@ use std::sync::atomic::{AtomicU64, Ordering};
|
||||
use std::sync::{Mutex, MutexGuard};
|
||||
|
||||
use anyhow::{Context, Result};
|
||||
-use rusqlite::{Connection, OptionalExtension, params};
+use rusqlite::{Connection, OpenFlags, OptionalExtension, params};
|
||||
|
||||
use crate::error::StoreError;
|
||||
use crate::schema;
|
||||
|
||||
/// Signal: SQLite database file does not exist, or schema_version does
|
||||
/// not match the binary's expectation.
|
||||
///
|
||||
/// Distinct from generic I/O / SQL errors so kebab-cli can surface
|
||||
/// `code: "not_indexed"` with a hint to run `kebab init` / `kebab ingest`.
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
#[error("not indexed: expected={expected}, found={found:?}")]
|
||||
pub struct NotIndexed {
|
||||
pub expected: String,
|
||||
/// When the DB file exists but the schema is incompatible, this holds
|
||||
/// the highest applied migration version string (e.g. `"V005"`).
|
||||
/// `None` means the file was absent entirely (current Task 3 behavior;
|
||||
/// schema-mismatch wrapping is a deferred follow-up).
|
||||
pub found: Option<String>,
|
||||
}
|
||||
|
||||
/// Monotonic counter used to namespace per-process temp file names so
|
||||
/// concurrent `put_asset_with_bytes` calls in the same millisecond cannot
|
||||
/// collide on `<final>.tmp.<pid>.<n>`.
|
||||
@@ -59,6 +75,47 @@ pub struct SqliteStore {
|
||||
}
|
||||
|
||||
impl SqliteStore {
|
||||
/// Open an existing SQLite DB at `path`.
|
||||
///
|
||||
/// Unlike [`Self::open`], this does NOT create the file — if it is
|
||||
/// missing, returns a [`NotIndexed`] signal suitable for `error.v1`
|
||||
/// translation. Opens read-write to support WAL pragmas; callers should
|
||||
/// not issue mutations through this connection — use [`Self::open`] for
|
||||
/// ingest paths.
|
||||
///
|
||||
/// **Does not run migrations** — call [`Self::run_migrations`] next if
|
||||
/// you need the schema initialised.
|
||||
pub fn open_existing(path: &std::path::Path) -> anyhow::Result<Self> {
|
||||
let conn = Connection::open_with_flags(
|
||||
path,
|
||||
OpenFlags::SQLITE_OPEN_READ_WRITE | OpenFlags::SQLITE_OPEN_URI,
|
||||
)
|
||||
.map_err(|_| {
|
||||
anyhow::Error::new(NotIndexed {
|
||||
expected: path.to_string_lossy().to_string(),
|
||||
found: None,
|
||||
})
|
||||
})?;
|
||||
apply_pragmas(&conn)?;
|
||||
|
||||
let data_dir = path
|
||||
.parent()
|
||||
.unwrap_or_else(|| std::path::Path::new("."))
|
||||
.to_path_buf();
|
||||
|
||||
tracing::debug!(
|
||||
target: "kebab-store-sqlite",
|
||||
db = %path.display(),
|
||||
"opened existing sqlite store"
|
||||
);
|
||||
|
||||
Ok(Self {
|
||||
data_dir,
|
||||
copy_threshold_bytes: 0,
|
||||
conn: Mutex::new(conn),
|
||||
})
|
||||
}
|
||||
|
||||
/// Open (or create) the SQLite file under `config.storage.data_dir`,
|
||||
/// apply pragmas (foreign_keys / WAL / synchronous=NORMAL /
|
||||
/// temp_store=MEMORY), and create parent directories as needed.
|
||||
@@ -534,6 +591,61 @@ pub(crate) fn upsert_asset_row(
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// p9-fb-27: aggregate counts for `SchemaV1.stats` block.
|
||||
///
|
||||
/// Returned by [`SqliteStore::count_summary`] and consumed by
|
||||
/// `kebab-app::schema_with_config` to populate the `stats` sub-object of the
|
||||
/// `schema.v1` wire record.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct CountSummary {
|
||||
pub doc_count: u64,
|
||||
pub chunk_count: u64,
|
||||
pub asset_count: u64,
|
||||
/// ISO-8601 timestamp of the most-recently updated document row, or
|
||||
/// `None` when the store is empty.
|
||||
pub last_ingest_at: Option<String>,
|
||||
}
|
||||
|
||||
impl SqliteStore {
|
||||
/// Return aggregate counts from the three primary tables plus the
|
||||
/// most-recent `documents.updated_at` timestamp.
|
||||
///
|
||||
/// Uses `read_conn()` (no mutations) — mirrors the pattern used by
|
||||
/// [`Self::corpus_revision`].
|
||||
pub fn count_summary(&self) -> anyhow::Result<CountSummary> {
|
||||
let conn = self.read_conn();
|
||||
|
||||
let doc_count: u64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
|
||||
.context("count documents")?;
|
||||
|
||||
let chunk_count: u64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM chunks", [], |r| r.get(0))
|
||||
.context("count chunks")?;
|
||||
|
||||
let asset_count: u64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM assets", [], |r| r.get(0))
|
||||
.context("count assets")?;
|
||||
|
||||
let last_ingest_at: Option<String> = conn
|
||||
.query_row(
|
||||
"SELECT MAX(updated_at) FROM documents",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.optional()
|
||||
.context("max updated_at")?
|
||||
.flatten();
|
||||
|
||||
Ok(CountSummary {
|
||||
doc_count,
|
||||
chunk_count,
|
||||
asset_count,
|
||||
last_ingest_at,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
/// Apply the design §5 / task-spec pragmas. Called once per connection.
|
||||
/// Note: WAL is persistent (the journal-mode setting is sticky in the DB
|
||||
/// header) but `foreign_keys`, `synchronous`, and `temp_store` are
|
||||
@@ -548,3 +660,27 @@ fn apply_pragmas(conn: &Connection) -> Result<()> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn open_fresh_store() -> (tempfile::TempDir, SqliteStore) {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let mut cfg = kebab_config::Config::defaults();
|
||||
cfg.storage.data_dir = dir.path().to_string_lossy().into_owned();
|
||||
let store = SqliteStore::open(&cfg).unwrap();
|
||||
store.run_migrations().unwrap();
|
||||
(dir, store)
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn count_summary_zero_on_fresh_store() {
|
||||
let (_dir, store) = open_fresh_store();
|
||||
let s = store.count_summary().unwrap();
|
||||
assert_eq!(s.doc_count, 0);
|
||||
assert_eq!(s.chunk_count, 0);
|
||||
assert_eq!(s.asset_count, 0);
|
||||
assert!(s.last_ingest_at.is_none());
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
27
crates/kebab-store-sqlite/tests/not_indexed.rs
Normal file
@@ -0,0 +1,27 @@
|
||||
//! Signal test: `SqliteStore::open_existing` emits `NotIndexed` when the DB
|
||||
//! file is absent.
|
||||
|
||||
use kebab_store_sqlite::{NotIndexed, SqliteStore};
|
||||
|
||||
#[test]
|
||||
fn not_indexed_signal_emitted_when_db_missing() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let nonexistent_db = dir.path().join("does-not-exist.sqlite");
|
||||
let res = SqliteStore::open_existing(&nonexistent_db);
|
||||
let err = match res {
|
||||
Ok(_) => panic!("opening a missing DB should fail"),
|
||||
Err(e) => e,
|
||||
};
|
||||
let signal = err
|
||||
.downcast_ref::<NotIndexed>()
|
||||
.expect("missing DB error should downcast to NotIndexed");
|
||||
assert_eq!(signal.expected, nonexistent_db.to_string_lossy().as_ref());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn open_existing_does_not_create_missing_db() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let nonexistent_db = dir.path().join("does-not-exist.sqlite");
|
||||
let _ = SqliteStore::open_existing(&nonexistent_db);
|
||||
assert!(!nonexistent_db.exists(), "open_existing must NOT create the file");
|
||||
}
|
||||
@@ -28,4 +28,4 @@ mod arrow_batch;
|
||||
mod paths;
|
||||
mod store;
|
||||
|
||||
-pub use store::LanceVectorStore;
+pub use store::{INDEX_VERSION_STR, LanceVectorStore};
|
||||
|
||||
@@ -44,6 +44,12 @@ const INDEX_KIND: &str = "flat";
|
||||
/// `v1` so re-runs produce stable IDs.
|
||||
const INDEX_VERSION: &str = "v1";
|
||||
|
||||
/// Public view of [`INDEX_VERSION`] for `kebab-app::schema_with_config`.
|
||||
/// The value is the same string — exposed as `pub const` so the schema
|
||||
/// facade can embed it in `SchemaV1.models.index_version` without
|
||||
/// reaching into a private constant.
|
||||
pub const INDEX_VERSION_STR: &str = INDEX_VERSION;
|
||||
|
||||
/// Lance VectorStore.
|
||||
///
|
||||
/// Holds a single `lancedb::Connection` opened against
|
||||
|
||||
@@ -1426,6 +1426,23 @@ $ kebab doctor
|
||||
1 check failed.
|
||||
```
|
||||
|
||||
### 10.1 Capability matrix + introspection (fb-27)
|
||||
|
||||
`kebab schema [--json]` exposes the binary's capability set.
In a single call, the `schema.v1` wire schema returns `wire.schemas` (the list of
supported wire ids), `capabilities` (bool flags; placeholders for future surfaces
are always included), `models` (the six cascade-version axes), and `stats`
(doc/chunk/asset counts + last_ingest_at).
|
||||
|
||||
In `--json` mode, the `error.v1` wire schema emits fatal errors to stderr as
ndjson. The initial set has seven codes: `config_invalid` / `not_indexed` /
`model_unreachable` / `model_not_pulled` / `timeout` / `io_error` /
`generic`. Exit codes 0/1/2/3 are unchanged; `error.v1.code` is the source for
fine-grained agent branching. For the per-code shape of `details`, see
[docs/wire-schema/v1/error.schema.json](../../wire-schema/v1/error.schema.json).
The `2026-05-07 — p9-fb-27` entry in HOTFIXES is the source of truth for the
interim deviation in the `details` shape (a transitional form that applies until
the new typed signals IoFailure / OpTimeout are introduced).
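
On the consumer side, branching on `error.v1.code` might look roughly like the sketch below. It is not kebab code: the `Action` enum, the `action_for` function, and the choice of recovery action per code are all hypothetical, picked only to illustrate the idea. The seven code strings are the ones listed above.

```rust
/// Hypothetical recovery actions an agent might take; not part of kebab.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Action {
    FixConfig,
    RunIngest,
    StartModelServer,
    PullModel,
    Retry,
    CheckFilesystem,
    Escalate,
}

/// Map an `error.v1.code` string to an action (assumed mapping).
/// `generic` and any code this consumer does not know fall back to
/// `Escalate`, so the match stays total if more codes are added later.
fn action_for(code: &str) -> Action {
    match code {
        "config_invalid" => Action::FixConfig,
        "not_indexed" => Action::RunIngest,
        "model_unreachable" => Action::StartModelServer,
        "model_not_pulled" => Action::PullModel,
        "timeout" => Action::Retry,
        "io_error" => Action::CheckFilesystem,
        _ => Action::Escalate,
    }
}

fn main() {
    for code in ["config_invalid", "not_indexed", "generic"] {
        println!("{code} -> {:?}", action_for(code));
    }
}
```

Matching on the string with a catch-all arm, rather than rejecting unknown values, is one way to stay tolerant of additions to the code set; whether that suits a given integration is a judgment call.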
|
||||
|
||||
---
|
||||
|
||||
## 11. Freeze scope / change policy
|
||||
|
||||
35
docs/wire-schema/v1/error.schema.json
Normal file
@@ -0,0 +1,35 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"$id": "https://kebab.local/wire-schema/v1/error.schema.json",
|
||||
"title": "error.v1",
|
||||
"description": "Structured fatal error emitted on stderr in --json mode. The `details` shape varies per `code`; consumers should branch on `code` and treat `details` as best-effort context.",
|
||||
"type": "object",
|
||||
"required": ["schema_version", "code", "message", "details"],
|
||||
"properties": {
|
||||
"schema_version": { "const": "error.v1" },
|
||||
"code": {
|
||||
"type": "string",
|
||||
"enum": [
|
||||
"config_invalid",
|
||||
"not_indexed",
|
||||
"model_unreachable",
|
||||
"model_not_pulled",
|
||||
"timeout",
|
||||
"io_error",
|
||||
"generic"
|
||||
]
|
||||
},
|
||||
"message": { "type": "string" },
|
||||
"details": {
|
||||
"type": "object",
|
||||
"additionalProperties": true,
|
||||
"description": "Per-code free-form context. config_invalid: { path, cause }. not_indexed: { expected, found }. model_unreachable: { endpoint, source }. model_not_pulled: { model }. timeout: { source }. io_error: { kind }. generic: { chain (when --verbose) }."
|
||||
},
|
||||
"hint": {
|
||||
"anyOf": [
|
||||
{ "type": "string" },
|
||||
{ "type": "null" }
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
61
docs/wire-schema/v1/schema.schema.json
Normal file
@@ -0,0 +1,61 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://kebab.local/wire-schema/v1/schema.schema.json",
  "title": "schema.v1",
  "description": "kebab introspection report — wire schemas, capabilities, model versions, and index stats.",
  "type": "object",
  "required": ["schema_version", "kebab_version", "wire", "capabilities", "models", "stats"],
  "properties": {
    "schema_version": { "const": "schema.v1" },
    "kebab_version": { "type": "string" },
    "wire": {
      "type": "object",
      "required": ["schemas"],
      "properties": {
        "schemas": {
          "type": "array",
          "items": { "type": "string", "pattern": "^[a-z_]+\\.v[0-9]+$" }
        }
      }
    },
    "capabilities": {
      "type": "object",
      "additionalProperties": { "type": "boolean" },
      "required": [
        "json_mode", "ingest_progress", "ingest_cancellation",
        "rag_multi_turn", "search_cache", "incremental_ingest",
        "streaming_ask", "http_daemon", "mcp_server", "single_file_ingest"
      ]
    },
    "models": {
      "type": "object",
      "required": [
        "parser_version", "chunker_version", "embedding_version",
        "prompt_template_version", "index_version", "corpus_revision"
      ],
      "properties": {
        "parser_version": { "type": "string" },
        "chunker_version": { "type": "string" },
        "embedding_version": { "type": "string" },
        "prompt_template_version": { "type": "string" },
        "index_version": { "type": "string" },
        "corpus_revision": { "type": "integer", "minimum": 0 }
      }
    },
    "stats": {
      "type": "object",
      "required": ["doc_count", "chunk_count", "asset_count", "last_ingest_at"],
      "properties": {
        "doc_count": { "type": "integer", "minimum": 0 },
        "chunk_count": { "type": "integer", "minimum": 0 },
        "asset_count": { "type": "integer", "minimum": 0 },
        "last_ingest_at": {
          "anyOf": [
            { "type": "string", "format": "date-time" },
            { "type": "null" }
          ]
        }
      }
    }
  }
}

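Part of this schema can be spot-checked without a JSON Schema library. The following is a rough Python sketch, not a Draft 2020-12 validator: `check_schema_v1` is a hypothetical helper that only looks at the top-level required keys, the `wire.schemas` id pattern, the boolean type of capability flags, and the required `stats` keys.

```python
import re

# Same pattern as `wire.schemas.items.pattern` in the schema.
WIRE_ID = re.compile(r"^[a-z_]+\.v[0-9]+$")
REQUIRED_TOP = ("schema_version", "kebab_version", "wire", "capabilities", "models", "stats")
REQUIRED_STATS = ("doc_count", "chunk_count", "asset_count", "last_ingest_at")


def check_schema_v1(report):
    """Return a list of problems; an empty list means the spot-check passed."""
    problems = [f"missing: {key}" for key in REQUIRED_TOP if key not in report]
    if problems:
        return problems  # nested checks need the top-level keys to exist
    if report["schema_version"] != "schema.v1":
        problems.append("schema_version is not schema.v1")
    for wire_id in report["wire"].get("schemas", []):
        if not isinstance(wire_id, str) or not WIRE_ID.match(wire_id):
            problems.append(f"bad wire id: {wire_id!r}")
    for flag, value in report["capabilities"].items():
        if not isinstance(value, bool):
            problems.append(f"capability {flag} is not a boolean")
    for key in REQUIRED_STATS:
        if key not in report["stats"]:
            problems.append(f"missing: stats.{key}")
    return problems
```

For full validation, running the schema file through a real JSON Schema validator is the more reliable route; this sketch only catches gross shape errors.
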
@@ -57,6 +57,16 @@ kebab ask "<question>" --json
- `ask --json`'s `citations[]` mirrors `search_hit.v1` minus retrieval internals: same `doc_path` / `citation` shape.
- Schema reference lives in the kebab repo at `docs/wire-schema/v1/*.schema.json` if a field is unclear.

## Capability discovery

Before using streaming or multi-turn features, you can probe what this binary supports:

```bash
kebab schema --json
```

Returns a `schema.v1` object with: `wire.schemas` (supported wire ids), `capabilities` (boolean flags, e.g. `streaming_ask`, `rag_multi_turn`), `models` (the six version-cascade axes), and `stats` (doc, chunk, and asset counts plus `last_ingest_at`). Gate streaming / session flows on `capabilities.streaming_ask` / `capabilities.rag_multi_turn` being `true`. This call is cheap (no LLM) and can be run once per session.

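The gating described above can be sketched in a few lines. This assumes a Python caller that already holds the stdout text of `kebab schema --json`; `supports` is a hypothetical helper name, and treating a missing flag as unsupported is a choice made for this sketch.

```python
import json


def supports(schema_json, *flags):
    """True only if the report is schema.v1 and every named capability is exactly true."""
    report = json.loads(schema_json)
    if report.get("schema_version") != "schema.v1":
        return False
    capabilities = report.get("capabilities", {})
    # A missing flag yields None, which is not True, so it counts as unsupported.
    return all(capabilities.get(flag) is True for flag in flags)
```

A caller would check `supports(text, "streaming_ask")` before starting a streaming flow and fall back to the plain `--json` call otherwise. Since the probe is cheap, caching the parsed result for the session is enough.
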
## Quick health check

If a call fails or returns suspicious output, run `kebab doctor` first. It surfaces config-load, data-dir, and Ollama-reachability problems in one line each. Don't silently retry on errors; report the doctor output.

@@ -14,6 +14,45 @@ historical contract that was implemented; this file accumulates the
deltas so phase 5+ readers can find the live behavior without diffing
git history.

## 2026-05-07 — p9-fb-27 (post-dogfooding): introspection (`kebab schema`) + structured error wire

**Source feedback**: user dogfooding on 2026-05-06. An agent could not introspect a kebab instance's wire versions, capabilities, models, or index stats, and because errors were plain stderr text it had to branch on substrings.

**Live binding changes**:

- New command `kebab schema [--json]`: prints text, or `schema.v1` JSON with `--json`. Honors `--config <path>`.
- New wire `schema.v1`: `kebab_version` (`env!("CARGO_PKG_VERSION")`), `wire.schemas`, `capabilities` (10 booleans, including 4 for future surfaces), `models` (six axes: parser / chunker / embedding / prompt_template / index / corpus_revision), and `stats` (doc / chunk / asset counts plus `last_ingest_at`). `SchemaV1` carries its own `schema_version: "schema.v1"` field, the same idempotent re-tag pattern as `wire_doctor`.
- New wire `error.v1`: in `--json` mode, fatal errors are emitted to stderr as ndjson. Without `--json`, the existing plain stderr text is kept.
- Initial set of 7 error codes:
  - `config_invalid`: `ConfigInvalid` signal in kebab-config. The `cause` prefixes `read_failed:` / `parse_failed:` are underscore-slugged for stable agent matching.
  - `not_indexed`: `NotIndexed` in kebab-store-sqlite. Adds the `SqliteStore::open_existing` API, which opens with `OpenFlags::SQLITE_OPEN_READ_WRITE | SQLITE_OPEN_URI` to prevent a silent CREATE.
  - `model_unreachable`: `LlmError::Unreachable`.
  - `model_not_pulled`: `LlmError::ModelNotPulled`.
  - `timeout`: `LlmError::Timeout`.
  - `io_error`: detected from a `std::io::Error` in the error chain.
  - `generic`: catch-all; fills `details.chain` when verbose.
- Exit codes 0/1/2/3 are unchanged: only `RefusalSignal` / `NoHitSignal` / `DoctorUnhealthy` are inspected, yielding 1 / 1 / 3 respectively. All 5 new typed signals fall through to 2.
- New `kebab-app::error_signal` module: re-exports the 3 signals from `doctor_signal` and the new typed errors from one place.
- New `kebab-store-sqlite::SqliteStore::count_summary` method, which backs the `schema.v1.stats` block.
- `kebab_parse_md::PARSER_VERSION` and `kebab_store_vector::INDEX_VERSION_STR` are now exposed as `pub const`, so kebab-app's `Models` block reads from a single source of truth (satisfying the version-cascade rule).

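The exit-code rule in the list above can be illustrated as a small lookup. This is a Python illustration of the stated mapping, not the Rust implementation: `exit_code_for` is a hypothetical name, signals are represented as plain strings, and treating "no signal" as exit code 0 is an assumption of the sketch.

```python
# The only signals the exit-code decision inspects, per the entry above.
EXIT_BY_SIGNAL = {
    "RefusalSignal": 1,
    "NoHitSignal": 1,
    "DoctorUnhealthy": 3,
}


def exit_code_for(signal_name=None):
    """Exit code for a run that ended with `signal_name` (None means no error; assumed 0)."""
    if signal_name is None:
        return 0
    # Every other failure, including the new typed signals, falls through to 2.
    return EXIT_BY_SIGNAL.get(signal_name, 2)
```

Because `ConfigInvalid`, `NotIndexed`, and the `LlmError` variants all land on exit code 2, the exit code alone cannot tell them apart; that distinction is what `error.v1.code` carries.
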
**Spec contract impact**: design §10 gains a §10.1 capability matrix subsection that specifies the `schema.v1` / `error.v1` wires.

**Tests added**: kebab-config fb27_tests (2: ConfigInvalid downcast / malformed TOML), kebab-store-sqlite (3: NotIndexed signal + open_existing no-create regression + count_summary zero state), kebab-cli error_classify::tests (7: classification of the 7 codes + verbose chain), kebab-cli wire::tests (2: schema.v1 / error.v1 round-trip), kebab-app schema_report integration (2: ingested KB stats + empty KB), kebab-cli cli_schema integration (2: --json + text), kebab-cli cli_error_wire integration (2: --json error.v1 + legacy text). About 20 new tests.

**Known limitations (deferred; interim wire shape)**:

- The per-code `error.v1.details` shape deviates in places from the frozen design literal, because introducing further typed signals was deferred:
  - `io_error.details` = `{ "kind": "<ErrorKind debug string>" }` (not the spec literal's `{ path, op }`; to be corrected when an `IoFailure` typed signal is added).
  - `timeout.details` = `{ "source": "<error display>" }` (not the spec literal's `{ operation, elapsed_ms, deadline_ms }`; to be corrected when an `OpTimeout` typed signal and per-callsite stamping are added).
  - `model_unreachable.details` = `{ endpoint, source }` (spec literal: `{ endpoint, operation }`; `LlmError::Unreachable` has no `operation` field).
  - `model_not_pulled.details` = `{ model }` (spec literal: `{ model, endpoint, operation }`; `LlmError::ModelNotPulled` carries only the model id).
  - The `details` block in the JSON Schema literal `docs/wire-schema/v1/error.schema.json` is permissive (`additionalProperties: true` and no required keys), reflecting the shape actually emitted. The schema's description is to be updated when a follow-up task adds the typed signals.
- `Config::load(Some(/nonexistent))` silently falls back to defaults: when an agent passes `--config /wrong`, the default config is applied instead of raising `config_invalid`, and subsequent commands run with default behavior. A strict mode for `--config` should be considered in fb-28 (`--readonly` / `--quiet`) or a separate follow-up.
- A schema mismatch (DB migration version does not match) in `Config::from_file` is reported only as `NotIndexed.found = None`. A follow-up PR that reads the max version from `_refinery_schema_history` will fill in an exact value such as `found: Some("V005")`.
- `LlmError::Stream` / `Malformed` fall back to `code: "generic"`. A follow-up task should consider dedicated codes such as `stream_aborted` / `malformed_response` (see the future-extensions section of design §10.1).
- `not_indexed.details` emits only `{ expected, found }` (not the spec literal's `{ data_dir, expected, found }`): `expected` is the full DB path, so `data_dir` has to be derived by the caller, and the NotIndexed signal itself carries only a single path.
- The wire schema lists in README and CLAUDE.md are slightly out of sync as of the fb-27 merge (CLAUDE.md includes `eval_run.v1` / `eval_compare.v1` / `list_docs.v1`, but no corresponding files exist under docs/wire-schema/v1/). A separate follow-up will run a sweep to sync the docs with the actual wires.
- `SqliteStore::open_existing` opens with `SQLITE_OPEN_READ_WRITE` and states "callers should not issue mutations" only in its docs; there is no compiler enforcement. A follow-up PR should consider an `apply_read_pragmas` variant (splitting the WAL line out of `apply_pragmas`) combined with `SQLITE_OPEN_READ_ONLY` (WAL mode persists in the DB header, so read-only opens can also work).

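Given the interim shapes listed above, a consumer is safer reading `details` defensively than assuming the spec-literal keys. A minimal Python sketch under that assumption: `config_cause_kind` and `summarize_details` are hypothetical helper names, and the key table simply restates the shapes this entry says are emitted today.

```python
# Keys each code currently emits (interim shapes), not the spec literals.
INTERIM_KEYS = {
    "config_invalid": ("path", "cause"),
    "not_indexed": ("expected", "found"),
    "model_unreachable": ("endpoint", "source"),
    "model_not_pulled": ("model",),
    "timeout": ("source",),
    "io_error": ("kind",),
    "generic": ("chain",),
}


def config_cause_kind(details):
    """Return 'read_failed' or 'parse_failed' from the `cause` prefix, else None."""
    cause = details.get("cause")
    if not isinstance(cause, str) or ":" not in cause:
        return None
    slug = cause.split(":", 1)[0]
    return slug if slug in ("read_failed", "parse_failed") else None


def summarize_details(code, details):
    """Keep only the keys that are present for this code; nothing is required."""
    keys = INTERIM_KEYS.get(code, ())
    return {key: details[key] for key in keys if key in details}
```

If a later release moves a code to its spec-literal shape, a reader like this returns fewer keys instead of raising, which matches the "best-effort context" wording in the schema description.
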
**Amends**:

- design §10 (capability matrix subsection added).
- spec `tasks/p9/p9-fb-27-introspection-and-error-wire.md` (status `open` → `completed`).
- The 6 exit codes (`0/1/2/3/4/5`) proposed in the spec stub's `Goal (skeleton)`: only 0/1/2/3 were actually adopted.

## 2026-05-05 — p9-fb-25 (post-dogfooding): remove config workspace.include + visibility of supported formats

**Source feedback**: user dogfooding on 2026-05-05. Having both `workspace.include` and `workspace.exclude` in the config makes case 4 (neither matches) ambiguous, and since the set of processable formats (md / png / jpg / pdf) is fixed anyway, it should be stated explicitly to the user.

@@ -3,7 +3,7 @@ phase: P9
component: kebab-cli + kebab-app + wire-schema
task_id: p9-fb-27
title: "Introspection (`kebab schema`) + structured error wire"
-status: open
+status: completed
target_version: 0.3.0
depends_on: []
unblocks: [p9-fb-30]
@@ -14,7 +14,7 @@ source_feedback: user dogfooding 2026-05-06: agent cannot introspect a kebab instan…

# p9-fb-27 — Introspection + structured error wire

-> ⏳ **Backlog only; not implemented.** This spec is a skeleton from dogfooding feedback. Before implementation starts, a design phase via [superpowers:brainstorming](../../docs/superpowers/) must come first. The capability matrix definition, error code enumeration, and exit code mapping are to be finalized after the brainstorm.
+> ✅ **Implemented.** This spec is frozen as of implementation time. For post-merge deviations (notably the interim shape of `error.v1.details`), see the `2026-05-07 — p9-fb-27` entry in [HOTFIXES.md](../HOTFIXES.md), which is the live source of truth.

## Symptoms / Motivation
