kebab/crates/kebab-store-sqlite/src/lib.rs
altair823 27baec82ea fix(dogfood): auto-purge stored docs for filesystem-deleted files
Files deleted from disk (rm a.md) were leaving stale documents + chunks +
embeddings in the store, surfacing as ghost citations in search/ask.
Existing purge_orphan_at_workspace_path only handled stale rows left by a
content change (WHERE workspace_path=? AND asset_id != ?); a deleted file
produces no new asset_id to compare against.

Fix: a sweep after the walker scan. Compute (stored_paths - scanned_paths)
and, for each candidate, check filesystem existence; purge only when the
file is TRULY missing. The scope-narrowing case (file still on disk but
outside the include glob) is explicitly NOT purged, to protect users from
accidental data loss via config edits.
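A minimal Rust sketch of the sweep described above, assuming stored and scanned paths are plain workspace-relative strings. `deletion_candidates` and the injected `exists_on_disk` predicate are hypothetical names, not the actual `sweep_deleted_files` code; in a real run the predicate would be something like `|p| root.join(p).exists()`.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the stored workspace paths that should be
/// auto-purged. `exists_on_disk` is injected so the logic can be tested
/// without touching the real filesystem.
fn deletion_candidates<F>(stored: &[String], scanned: &[String], exists_on_disk: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    let scanned: HashSet<&str> = scanned.iter().map(String::as_str).collect();
    let mut out: Vec<String> = stored
        .iter()
        // stored_paths - scanned_paths
        .filter(|p| !scanned.contains(p.as_str()))
        // Scope-narrowing guard: a file still on disk is never purged.
        .filter(|p| !exists_on_disk(p.as_str()))
        .cloned()
        .collect();
    out.sort();
    out
}

fn main() {
    let stored = vec!["a.md".to_string(), "b.md".to_string(), "notes/c.txt".to_string()];
    // The walker only saw b.md on this run.
    let scanned = vec!["b.md".to_string()];
    // notes/c.txt is still on disk, just excluded by a narrowed include glob.
    let on_disk: HashSet<&str> = ["b.md", "notes/c.txt"].into_iter().collect();
    let purge = deletion_candidates(&stored, &scanned, |p| on_disk.contains(p));
    println!("{purge:?}"); // ["a.md"]
}
```

Injecting the existence check keeps the set difference testable. Note that `std::path::Path::exists` also returns `false` when the check itself fails (for example on a permission error); `Path::try_exists` surfaces such errors, which matters when `false` triggers a delete.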

Adds:
- DocumentStore::all_workspace_paths trait method + SqliteStore impl
- purge_deleted_workspace_path in store-sqlite (returns chunk_ids for
  vector delete; deletes doc CASCADE + asset row + copied storage file)
- sweep_deleted_files in kebab-app::ingest path; called once per ingest
  before the per-asset loop
- IngestReport.purged_deleted_files counter (additive, serde default)
- CLI ingest summary mentions purge count when > 0
- 2 integration tests: file_deletion_auto_purge + include_scope_narrowing_does_NOT_purge
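The pieces in the list above might fit together roughly as follows. This is a hypothetical in-memory sketch: `MemStore`, the `u64` chunk ids, the `Vec<u64>` standing in for the vector index and the summary wording are all assumptions. The real `purge_deleted_workspace_path` also deletes the asset row and the copied storage file, and the real counter carries a serde default; this sketch omits both.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical, simplified stand-in for the DocumentStore trait method.
trait DocumentStore {
    fn all_workspace_paths(&self) -> Vec<String>;
}

// Hypothetical in-memory store: workspace_path -> chunk ids of that document.
#[derive(Default)]
struct MemStore {
    docs: BTreeMap<String, Vec<u64>>,
}

impl DocumentStore for MemStore {
    fn all_workspace_paths(&self) -> Vec<String> {
        self.docs.keys().cloned().collect()
    }
}

// Drops the document (the real store relies on a CASCADE delete) and returns
// its chunk ids so the caller can delete the matching vectors.
fn purge_deleted_workspace_path(store: &mut MemStore, path: &str) -> Vec<u64> {
    store.docs.remove(path).unwrap_or_default()
}

#[derive(Debug, Default)]
struct IngestReport {
    purged_deleted_files: usize,
}

// Runs once per ingest, before the per-asset loop.
fn sweep_deleted_files(
    store: &mut MemStore,
    scanned: &HashSet<String>,
    exists_on_disk: impl Fn(&str) -> bool,
    vectors: &mut Vec<u64>,
    report: &mut IngestReport,
) {
    for path in store.all_workspace_paths() {
        // Only paths the walker skipped AND that are gone from disk are purged.
        if scanned.contains(&path) || exists_on_disk(path.as_str()) {
            continue;
        }
        let chunk_ids = purge_deleted_workspace_path(store, &path);
        vectors.retain(|id| !chunk_ids.contains(id));
        report.purged_deleted_files += 1;
    }
}

// The summary line only mentions the purge count when it is non-zero.
fn summary(report: &IngestReport) -> String {
    let mut line = String::from("ingest complete");
    if report.purged_deleted_files > 0 {
        line.push_str(&format!(", purged {} deleted file(s)", report.purged_deleted_files));
    }
    line
}

fn main() {
    let mut store = MemStore::default();
    store.docs.insert("a.md".into(), vec![1, 2]);
    store.docs.insert("b.md".into(), vec![3]);
    let mut vectors = vec![1, 2, 3];
    let scanned: HashSet<String> = ["b.md".to_string()].into_iter().collect();
    let mut report = IngestReport::default();
    sweep_deleted_files(&mut store, &scanned, |_| false, &mut vectors, &mut report);
    println!("{}", summary(&report)); // ingest complete, purged 1 deleted file(s)
}
```

Returning chunk ids from the store-side purge lets the relational delete and the vector delete stay in separate crates while still removing both halves of a ghost citation.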

Found while dogfooding (PR #142 1B + multi-root: kebab-docs + httpx + zod
+ lodash). Per user decision, only filesystem deletion auto-purges;
scope narrowing requires an explicit kebab reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 06:51:07 +00:00


//! `kb-store-sqlite` — SQLite-backed implementations of
//! [`kebab_core::DocumentStore`] and [`kebab_core::JobRepo`] (§7.2), plus the
//! asset writer that copies (or references) raw bytes per design §5.2.
//!
//! Schema is owned by `migrations/V001__init.sql` (workspace root), which
//! ships the full §5 DDL minus the FTS5 virtual table + triggers (those
//! land in P2-1's `V002`).
//!
//! Allowed deps per task spec: `kb-core`, `kb-config`, `rusqlite`,
//! `refinery`, `serde_json`, `time`, `blake3`, `tracing`, `anyhow`,
//! `thiserror`. `globset` was added in P3-3 to back the
//! `filter_chunks` helper, which `kb-store-vector` uses for its
//! post-filter pass; keeping the SQL JOIN in this crate spared
//! `kb-store-vector` direct `rusqlite` / `globset` deps. NOT
//! allowed: `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-vector`,
//! `kb-source-fs`, etc. (`kb-parse-md`, `kb-normalize`, `kb-chunk` may
//! appear as **dev-deps** — see `Cargo.toml` — to drive the contract
//! round-trip test off a real Markdown fixture.)
mod answers;
mod chat_sessions;
mod documents;
mod embeddings;
mod error;
mod eval;
mod filters;
mod fts;
mod jobs;
mod schema;
mod store;
pub mod stats_ext;
pub use embeddings::EmbeddingRecordRow;
pub use error::StoreError;
pub use eval::{EvalQueryResultRecord, EvalRunRecord, EvalRunRow};
pub use fts::rebuild_chunks_fts;
pub use jobs::IngestRunRow;
pub use store::{CountSummary, NotIndexed, SqliteStore, purge_deleted_workspace_path};
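The module docs describe `filter_chunks` as backing a post-filter pass in `kb-store-vector`. A hypothetical std-only sketch of that idea: vector hits are kept only if their chunk's workspace path, looked up from the relational side, passes a predicate. The `Hit` type, the `HashMap` lookup (standing in for the SQL JOIN) and the `allow` closure (standing in for a `globset` match) are assumptions, not the real `filter_chunks` signature.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a vector-search hit.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    chunk_id: u64,
    score: f32,
}

/// Keeps only hits whose chunk has a metadata row AND whose workspace path
/// satisfies `allow`. Order of the surviving hits is preserved.
fn post_filter<F: Fn(&str) -> bool>(hits: Vec<Hit>, chunk_paths: &HashMap<u64, String>, allow: F) -> Vec<Hit> {
    hits.into_iter()
        .filter(|h| chunk_paths.get(&h.chunk_id).map_or(false, |p| allow(p.as_str())))
        .collect()
}

fn main() {
    let chunk_paths: HashMap<u64, String> =
        [(1, "docs/a.md".to_string()), (2, "src/lib.rs".to_string())].into_iter().collect();
    // Chunk 3 has no metadata row (e.g. its document was already purged).
    let hits = vec![
        Hit { chunk_id: 1, score: 0.9 },
        Hit { chunk_id: 2, score: 0.8 },
        Hit { chunk_id: 3, score: 0.7 },
    ];
    // A prefix check stands in for a glob such as "docs/**".
    let kept = post_filter(hits, &chunk_paths, |p| p.starts_with("docs/"));
    println!("{kept:?}"); // keeps only chunk 1
}
```

In this sketch a hit with no metadata row is dropped rather than passed through, so a vector left behind by an incomplete purge would not reach the results.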