Files deleted from disk (rm a.md) were leaving stale documents + chunks + embeddings in the store, surfacing as ghost citations in search/ask. The existing purge_orphan_at_workspace_path only handled content-changed stale rows (WHERE workspace_path=? AND asset_id != ?); a file deletion has no new asset_id.

Fix: post-walker-scan sweep. Compute (stored_paths - scanned_paths), then check filesystem existence for each candidate and purge only when the file is TRULY missing. The scope-narrowing case (file on disk but outside the include glob) is explicitly NOT purged, to protect users from accidental data loss via config edits.

Adds:
- DocumentStore::all_workspace_paths trait method + SqliteStore impl
- purge_deleted_workspace_path in store-sqlite (returns chunk_ids for vector delete; deletes doc CASCADE + asset row + copied storage file)
- sweep_deleted_files in kebab-app::ingest path; called once per ingest before the per-asset loop
- IngestReport.purged_deleted_files counter (additive, serde default)
- CLI ingest summary mentions purge count when > 0
- 2 integration tests: file_deletion_auto_purge + include_scope_narrowing_does_NOT_purge

Dogfood discovery (PR #142 1B + multi-root: kebab-docs + httpx + zod + lodash). Per user decision: only filesystem deletion auto-purges; scope narrowing requires explicit kebab reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
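The candidate selection described in the commit message above (set difference, then an on-disk existence check) can be sketched roughly as follows. This is a minimal illustration, not the actual `sweep_deleted_files` code, which is not shown here: the function name `deleted_candidates`, the `exists_on_disk` closure, and the use of `String` paths in a `BTreeSet` are all assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not the real kebab API): returns the stored paths
/// that are safe to purge. A path qualifies only if the scan did not see it
/// AND it no longer exists on disk.
fn deleted_candidates(
    stored_paths: &BTreeSet<String>,
    scanned_paths: &BTreeSet<String>,
    exists_on_disk: impl Fn(&str) -> bool,
) -> Vec<String> {
    stored_paths
        // stored_paths - scanned_paths: known to the store, unseen by the walker
        .difference(scanned_paths)
        // Keep only paths that are truly missing; a file that still exists
        // but fell outside the include glob is left alone.
        .filter(|p| !exists_on_disk(p.as_str()))
        .cloned()
        .collect()
}

fn main() {
    let to_set = |items: &[&str]| -> BTreeSet<String> {
        items.iter().map(|s| s.to_string()).collect()
    };

    let stored = to_set(&["a.md", "b.md", "c.md"]);
    // Only a.md was seen by the walker on this ingest run.
    let scanned = to_set(&["a.md"]);
    // Stand-in for a real filesystem check (assumed here to be something
    // like std::path::Path::exists): b.md was deleted, c.md is still on
    // disk but outside the include glob.
    let on_disk = |p: &str| p == "a.md" || p == "c.md";

    let purge = deleted_candidates(&stored, &scanned, on_disk);
    println!("{:?}", purge); // prints ["b.md"]
}
```

Passing the existence check in as a closure keeps the sketch testable without touching the filesystem; `c.md` survives because it still exists, which mirrors the scope-narrowing case that is deliberately not purged.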
39 lines
1.5 KiB
Rust
//! `kb-store-sqlite` — SQLite-backed implementations of
//! [`kebab_core::DocumentStore`] and [`kebab_core::JobRepo`] (§7.2), plus the
//! asset writer that copies (or references) raw bytes per design §5.2.
//!
//! Schema is owned by `migrations/V001__init.sql` (workspace root), which
//! ships the full §5 DDL minus the FTS5 virtual table + triggers (those
//! land in P2-1's `V002`).
//!
//! Allowed deps per task spec: `kb-core`, `kb-config`, `rusqlite`,
//! `refinery`, `serde_json`, `time`, `blake3`, `tracing`, `anyhow`,
//! `thiserror`. `globset` was added in P3-3 to back the
//! `filter_chunks` helper (used by `kb-store-vector`'s post-filter
//! pass — moving the SQL JOIN into this crate kept `kb-store-vector`
//! from needing its own `rusqlite` / `globset` direct deps). NOT
//! allowed: `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-vector`,
//! `kb-source-fs`, etc. (`kb-parse-md`, `kb-normalize`, `kb-chunk` may
//! appear as **dev-deps** — see `Cargo.toml` — to drive the contract
//! round-trip test off a real Markdown fixture.)

mod answers;
mod chat_sessions;
mod documents;
mod embeddings;
mod error;
mod eval;
mod filters;
mod fts;
mod jobs;
mod schema;
mod store;
pub mod stats_ext;

pub use embeddings::EmbeddingRecordRow;
pub use error::StoreError;
pub use eval::{EvalQueryResultRecord, EvalRunRecord, EvalRunRow};
pub use fts::rebuild_chunks_fts;
pub use jobs::IngestRunRow;
pub use store::{CountSummary, NotIndexed, SqliteStore, purge_deleted_workspace_path};