feat(kebab-app + kebab-store-sqlite): p9-fb-19 search LRU cache + corpus_revision
Dogfooding item 15: removes the cost of recomputing SQLite FTS + Lance + RRF every time the same query is repeated in the TUI or elsewhere within a single process. An in-process LRU cache plus a monotonic `corpus_revision` counter makes every entry stale automatically once an ingest commit lands.

## Key changes

- **SQLite V004 migration**: `kv (key TEXT PRIMARY KEY, value TEXT) STRICT` + `corpus_revision = '0'` seed. A generic shape, so other scalars can live in the same table later.
- **`SqliteStore::corpus_revision()` / `bump_corpus_revision()`**: atomic `UPDATE ... CAST AS INTEGER + 1`. An INSERT-OR-IGNORE runs alongside it (paranoia for the case where the V004 seed is missing for some reason).
- **`kebab-app::ingest_with_config_cancellable`**: bumps when `new + updated > 0`; a no-op (skipped-only) reingest preserves the cache.
- **`App.search_cache: Option<Mutex<LruCache<SearchCacheKey, Vec<SearchHit>>>>`**: sized by `config.search.cache_capacity` (default 256, 0 disables). Adds `lru = "0.12"` as a workspace dep.
- **`SearchCacheKey`** = `query_norm` (NFKC + trim + lowercase) + `mode` + `k` + `snippet_chars` + `embedding_version` (vector/hybrid only; empty string for lexical) + `chunker_version` + `corpus_revision` snapshot.
- **`App::search`** rewrite: when the cache is enabled, lookup first; on a miss, call the existing `search_uncached`, then put. Straight-line when the cache is disabled or the lock fails.
- **`App::search_uncached`** (rename of the pre-fb-19 `search` body) + `search_uncached_with_config` facade: entered via CLI `kebab search --no-cache`.
- **`Config.search.cache_capacity: usize`** field, with `#[serde(default)]` so existing configs stay compatible.
- **CLI `--no-cache`** flag: for debugging (effectively a no-op because every CLI invocation is a new process, but the spec calls for it and it stays compatible with future long-lived processes).
- **frozen design §9 versioning** table gains a `corpus_revision` row (a different dimension from the existing `index_version` label: the label describes retrieval shape, `corpus_revision` acknowledges ingest commits).

## Tests

- `kebab-store-sqlite`: 3 new unit tests (fresh=0, monotonic bump, persist across reopen)
- `kebab-app`: 4 new integration tests (cached repeat returns the same hits, NFKC normalization collapses case/whitespace, `--no-cache` parity, first ingest bumps `corpus_revision`)
- Whole workspace: `cargo test --workspace --no-fail-fast -j 1` exits 0
- `cargo clippy --workspace --all-targets -- -D warnings` clean

## Docs

- README `kebab search` row: cache behavior, `--no-cache` note, `corpus_revision` invalidation mechanism
- docs/SMOKE.md: adds a `cache_capacity` line to the `[search]` section
- HANDOFF: 2026-05-03 entry
- spec status planned → in_progress

## Out of scope

- patch-and-merge incremental updates (hard because RRF normalization is computed over the whole hit set)
- SQLite-persisted cache (P+)
- Cache sharing across processes (in-process only; `corpus_revision` still gives O(1) cross-process invalidation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
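The invalidation idea in the message above can be illustrated without SQLite or the `lru` crate. The following is only a minimal sketch under stated assumptions: `ToyCache` and its methods are hypothetical names, a plain `HashMap` stands in for the LRU, and hits are reduced to `u32` ids. It shows why embedding a revision snapshot in the key makes older entries unreachable once the counter moves.

```rust
use std::collections::HashMap;

/// Hypothetical toy cache keyed by (normalized query, revision snapshot).
/// A plain `HashMap` stands in for the LRU, so nothing is evicted here.
pub struct ToyCache {
    entries: HashMap<(String, u64), Vec<u32>>,
}

impl ToyCache {
    pub fn new() -> Self {
        ToyCache { entries: HashMap::new() }
    }

    /// Look up hits for `query_norm` as of `corpus_revision`.
    pub fn get(&self, query_norm: &str, corpus_revision: u64) -> Option<&Vec<u32>> {
        self.entries.get(&(query_norm.to_string(), corpus_revision))
    }

    /// Store hits under the revision that was current when they were computed.
    pub fn put(&mut self, query_norm: &str, corpus_revision: u64, hits: Vec<u32>) {
        self.entries.insert((query_norm.to_string(), corpus_revision), hits);
    }

    /// Number of stored entries, including ones no lookup can reach anymore.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

An entry stored at revision 0 is never deleted when the revision becomes 1; it simply stops matching any new key. In a real LRU such entries would presumably age out as fresh ones are inserted.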
@@ -42,6 +42,13 @@ tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt", "json
 tracing-appender = "0.2"
 toml = "0.8"
 dirs = "5"
+# p9-fb-19: in-process LRU cache for `App::search`. Capacity from
+# `config.search.cache_capacity` (default 256, ~1.3 MB cap).
+lru = { workspace = true }
+# p9-fb-19: NFKC-normalize cache-key queries so `"Foo"` / `"FOO"` /
+# `" foo "` collapse to one entry. Same crate kebab-normalize +
+# kebab-core already use, no version drift.
+unicode-normalization = "0.1"
 
 [dev-dependencies]
 rusqlite = { workspace = true }
@@ -33,9 +33,11 @@
 //! in that mode [`App::embedder`] returns `None` and callers must fall
 //! back to lexical-only search.
 
-use std::sync::{Arc, OnceLock};
+use std::num::NonZeroUsize;
+use std::sync::{Arc, Mutex, OnceLock};
 
 use anyhow::{Context, Result, anyhow};
+use lru::LruCache;
 
 use kebab_core::{
     Answer, Embedder, IndexVersion, LanguageModel, Retriever, SearchHit, SearchMode,
@@ -69,6 +71,44 @@ pub struct App {
     /// client per query (cheap, but still measurable on a 50-query
     /// suite).
     llm: OnceLock<Arc<dyn LanguageModel>>,
+    /// p9-fb-19: in-process LRU search-result cache. Capacity comes
+    /// from `config.search.cache_capacity` (default 256, ~1.3 MB
+    /// cap). `None` when capacity is 0 (cache disabled). The
+    /// `corpus_revision` snapshot embedded in `SearchCacheKey`
+    /// invalidates every entry the moment a new ingest commit lands.
+    search_cache: Option<Mutex<LruCache<SearchCacheKey, Vec<SearchHit>>>>,
 }
 
+/// p9-fb-19: cache key for `App::search`. Includes every field that
+/// could change the result set:
+/// - normalized query (NFKC + trim + lowercase)
+/// - mode + k + snippet_chars (caller knobs)
+/// - embedding_version + chunker_version (model identity)
+/// - corpus_revision (monotonic counter that ingest bumps)
+///
+/// Lexical mode has no embedding identity → empty string in that
+/// slot, harmless because the rest of the key still distinguishes
+/// queries.
+#[derive(Clone, Debug, Eq, Hash, PartialEq)]
+pub(crate) struct SearchCacheKey {
+    pub query_norm: String,
+    pub mode: SearchMode,
+    pub k: u32,
+    pub snippet_chars: u32,
+    pub embedding_version: String,
+    pub chunker_version: String,
+    pub corpus_revision: u64,
+}
+
+impl SearchCacheKey {
+    /// Normalize `query.text` per spec p9-fb-19: NFKC + trim +
+    /// lowercase. Means `"Foo"` / `"FOO"` / `" foo "` collapse to a
+    /// single cache entry — redundant work avoided when the user's
+    /// input differs only in shape.
+    pub fn normalize_query(text: &str) -> String {
+        use unicode_normalization::UnicodeNormalization;
+        text.trim().nfkc().collect::<String>().to_lowercase()
+    }
+}
+
 impl App {
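As a rough illustration of how normalization and the derived `Hash`/`Eq` combine, here is a std-only sketch. It is an approximation, not the code from the diff: `normalize_query_approx`, `MiniKey`, and `distinct_keys` are hypothetical names, the key is cut down to two fields, and the NFKC step is left out because it needs the `unicode-normalization` crate.

```rust
use std::collections::HashSet;

/// Std-only approximation of `SearchCacheKey::normalize_query`:
/// trim + lowercase only. The NFKC step is intentionally omitted here.
pub fn normalize_query_approx(text: &str) -> String {
    text.trim().to_lowercase()
}

/// Hypothetical reduced key: two of the seven fields in the diff.
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct MiniKey {
    pub query_norm: String,
    pub corpus_revision: u64,
}

/// Count how many distinct cache keys a batch of raw queries maps to.
pub fn distinct_keys(queries: &[&str], corpus_revision: u64) -> usize {
    let keys: HashSet<MiniKey> = queries
        .iter()
        .map(|q| MiniKey {
            query_norm: normalize_query_approx(q),
            corpus_revision,
        })
        .collect();
    keys.len()
}
```

With this approximation, `"Foo"`, `"FOO"`, and `" foo "` map to one key. Full-width input such as `"ＦＯＯ"` would not, since folding compatibility forms to ASCII is the part NFKC contributes.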
@@ -85,22 +125,65 @@ impl App {
         sqlite
             .run_migrations()
             .context("kb-app: run SqliteStore migrations")?;
+        // p9-fb-19: build the LRU cache from config. Capacity 0 →
+        // `None` (cache disabled — every search hits the retrievers).
+        let search_cache = NonZeroUsize::new(config.search.cache_capacity)
+            .map(|cap| Mutex::new(LruCache::new(cap)));
         Ok(Self {
             config,
             sqlite: Arc::new(sqlite),
             embedder: OnceLock::new(),
             vector: OnceLock::new(),
             llm: OnceLock::new(),
+            search_cache,
         })
     }
 
     /// Run a [`SearchQuery`] through the configured retriever stack and
-    /// return the top-k hits.
+    /// return the top-k hits. p9-fb-19: result is served from the
+    /// in-process LRU cache when the same `(query_norm, mode, k,
+    /// snippet_chars, embedding_version, chunker_version,
+    /// corpus_revision)` tuple was seen before; cache miss falls
+    /// through to [`Self::search_uncached`].
     ///
     /// Reuses any previously-built embedder / vector store on this `App`
     /// — long-lived callers (kb-eval, future TUI) get amortized cost
     /// across calls.
     pub fn search(&self, query: SearchQuery) -> Result<Vec<SearchHit>> {
+        let Some(cache) = self.search_cache.as_ref() else {
+            // Cache disabled (capacity = 0) — straight-line.
+            return self.search_uncached(query);
+        };
+        // Build the cache key. embedding_version is empty for lexical
+        // mode (no embedder identity); for vector/hybrid we need the
+        // embedder built (which forces the cold-start cost), but
+        // that's the cost the cache exists to amortize across
+        // *subsequent* identical queries.
+        let key = self.build_cache_key(&query)?;
+        // Lock the cache long enough to lookup; clone the hit out so
+        // we can drop the lock before returning.
+        if let Ok(mut guard) = cache.lock() {
+            if let Some(hits) = guard.get(&key) {
+                tracing::debug!(
+                    target: "kebab-app",
+                    cache = "hit",
+                    corpus_revision = key.corpus_revision,
+                    "search served from LRU cache"
+                );
+                return Ok(hits.clone());
+            }
+        }
+        let hits = self.search_uncached(query)?;
+        if let Ok(mut guard) = cache.lock() {
+            guard.put(key, hits.clone());
+        }
+        Ok(hits)
+    }
+
+    /// p9-fb-19: bypass the LRU cache and run the search directly.
+    /// Used by `--no-cache` CLI invocations and by `search` itself
+    /// on cache miss. Identical behavior to the pre-fb-19 `search`.
+    pub fn search_uncached(&self, query: SearchQuery) -> Result<Vec<SearchHit>> {
         match query.mode {
             SearchMode::Lexical => {
                 let lex = LexicalRetriever::with_settings(
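The lookup, miss, compute, put flow in this hunk can be reproduced in miniature with the standard library alone. This is a sketch under assumptions rather than the project's code: `MiniApp` and its members are hypothetical, a `HashMap` stands in for `LruCache` (so there is no eviction), and the "search" is a dummy computation with a call counter so the cache effect is observable.

```rust
use std::collections::HashMap;
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical miniature of `App`; `HashMap` stands in for `LruCache`.
pub struct MiniApp {
    cache: Option<Mutex<HashMap<String, Vec<u32>>>>,
    uncached_calls: AtomicUsize,
}

impl MiniApp {
    pub fn new(cache_capacity: usize) -> Self {
        // Capacity 0 yields `None`, the same idiom as in the hunk above.
        let cache = NonZeroUsize::new(cache_capacity)
            .map(|cap| Mutex::new(HashMap::with_capacity(cap.get())));
        MiniApp { cache, uncached_calls: AtomicUsize::new(0) }
    }

    /// How many times the expensive path has run so far.
    pub fn uncached_calls(&self) -> usize {
        self.uncached_calls.load(Ordering::SeqCst)
    }

    /// Dummy stand-in for the real retriever stack.
    pub fn search_uncached(&self, query: &str) -> Vec<u32> {
        self.uncached_calls.fetch_add(1, Ordering::SeqCst);
        vec![query.len() as u32]
    }

    pub fn search(&self, query: &str) -> Vec<u32> {
        let Some(cache) = self.cache.as_ref() else {
            // Cache disabled: go straight to the expensive path.
            return self.search_uncached(query);
        };
        // A poisoned lock is treated like a miss instead of an error.
        if let Ok(guard) = cache.lock() {
            if let Some(hits) = guard.get(query) {
                return hits.clone();
            }
        }
        // The lock is not held while the expensive search runs.
        let hits = self.search_uncached(query);
        if let Ok(mut guard) = cache.lock() {
            guard.insert(query.to_string(), hits.clone());
        }
        hits
    }
}
```

Because the lock is released between the lookup and the put, two threads that miss on the same key at the same moment would both run the expensive path. That trades a little duplicate work for never holding the mutex across a search.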
@@ -257,6 +340,47 @@ impl App {
         Ok(self.llm.get().cloned().unwrap_or(llm))
     }
 
+    /// p9-fb-19: build a `SearchCacheKey` for `query`. For lexical
+    /// mode the embedding_version slot is left empty (no embedder
+    /// identity contributes to the result). For vector / hybrid
+    /// modes the embedder is built (cold-start) so the version
+    /// label can be read; that's the cost the cache exists to
+    /// amortize over the next few identical queries.
+    fn build_cache_key(&self, query: &SearchQuery) -> Result<SearchCacheKey> {
+        let embedding_version = match query.mode {
+            SearchMode::Lexical => String::new(),
+            SearchMode::Vector | SearchMode::Hybrid => {
+                let emb = self.embedder()?.ok_or_else(|| {
+                    anyhow!(
+                        "embeddings disabled; vector / hybrid search require an \
+                         embedder — switch to --mode lexical or enable a provider"
+                    )
+                })?;
+                vector_index_version(emb.as_ref()).0
+            }
+        };
+        Ok(SearchCacheKey {
+            query_norm: SearchCacheKey::normalize_query(&query.text),
+            mode: query.mode,
+            k: u32::try_from(query.k).unwrap_or(u32::MAX),
+            snippet_chars: u32::try_from(self.config.search.snippet_chars).unwrap_or(u32::MAX),
+            embedding_version,
+            chunker_version: self.config.chunking.chunker_version.clone(),
+            corpus_revision: self.sqlite.corpus_revision(),
+        })
+    }
+
+    /// p9-fb-19: clear the in-process search cache. Useful for tests
+    /// and for explicit user actions (e.g. a future `kebab cache
+    /// clear` admin command). No-op when the cache is disabled.
+    pub fn clear_search_cache(&self) {
+        if let Some(cache) = self.search_cache.as_ref() {
+            if let Ok(mut guard) = cache.lock() {
+                guard.clear();
+            }
+        }
+    }
+
     /// Resolve the embedder + vector store, surfacing the user-friendly
     /// "switch to --mode lexical" error when embeddings are disabled.
     fn require_embeddings(
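Two small decisions in `build_cache_key` are easy to miss: the `usize` knobs are clamped rather than rejected when they do not fit in `u32`, and the embedding slot is either empty (lexical) or mandatory (vector / hybrid). The sketch below isolates both. It is a hedged illustration only; `saturate_to_u32`, `Mode`, and `embedding_slot` are hypothetical names, and the error is a plain `String` instead of `anyhow::Error`.

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021

/// Clamp a `usize` knob into the `u32` slot of the cache key.
pub fn saturate_to_u32(value: usize) -> u32 {
    u32::try_from(value).unwrap_or(u32::MAX)
}

/// Hypothetical mirror of the three search modes.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Mode {
    Lexical,
    Vector,
    Hybrid,
}

/// Lexical contributes an empty label; the other modes must supply one.
pub fn embedding_slot(mode: Mode, label: Option<&str>) -> Result<String, String> {
    match mode {
        Mode::Lexical => Ok(String::new()),
        Mode::Vector | Mode::Hybrid => label
            .map(str::to_string)
            .ok_or_else(|| "embeddings disabled; switch to lexical mode".to_string()),
    }
}
```

One consequence of clamping is that every out-of-range value collapses to the same `u32::MAX` slot, so two such queries would share a key. For a top-k parameter that is unlikely to matter in practice.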
@@ -600,6 +600,26 @@ pub fn ingest_with_config_cancellable(
     };
     crate::ingest_progress::emit(progress, terminal_event);
 
+    // p9-fb-19: bump the persistent corpus_revision counter when a
+    // commit landed (any new / updated). This invalidates every
+    // entry in any in-process LRU search cache (in this process or
+    // a sibling) on the next lookup. No-op when nothing changed
+    // (skipped-only run) — the cache stays valid.
+    if new_count > 0 || updated_count > 0 {
+        match app.sqlite.bump_corpus_revision() {
+            Ok(rev) => tracing::debug!(
+                target: "kebab-app",
+                corpus_revision = rev,
+                "bumped corpus_revision after ingest commit"
+            ),
+            Err(e) => tracing::warn!(
+                target: "kebab-app",
+                error = %e,
+                "bump_corpus_revision failed; cache may serve stale results until process restart"
+            ),
+        }
+    }
+
     Ok(IngestReport {
         scope,
         scanned: scanned_count,
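To make the bump semantics concrete, here is an in-memory stand-in for the `kv` row and the "bump only on commit" rule. It is a sketch under stated assumptions, not the `SqliteStore` implementation: `KvStore` and `after_ingest` are hypothetical, the table is a `HashMap<String, String>`, and treating a missing or unparseable value as 0 is a choice made for this example.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the `kv (key TEXT PRIMARY KEY, value TEXT)` table.
pub struct KvStore {
    rows: HashMap<String, String>,
}

impl KvStore {
    pub fn new() -> Self {
        KvStore { rows: HashMap::new() }
    }

    /// A missing or unparseable row reads as 0 (an assumption of this sketch).
    pub fn corpus_revision(&self) -> u64 {
        self.rows
            .get("corpus_revision")
            .and_then(|v| v.parse::<u64>().ok())
            .unwrap_or(0)
    }

    /// Seed the row if absent (like INSERT OR IGNORE), then increment it.
    pub fn bump_corpus_revision(&mut self) -> u64 {
        let row = self
            .rows
            .entry("corpus_revision".to_string())
            .or_insert_with(|| "0".to_string());
        let next = row.parse::<u64>().unwrap_or(0) + 1;
        *row = next.to_string();
        next
    }
}

/// Bump only when the ingest run committed something; return the revision.
pub fn after_ingest(store: &mut KvStore, new_count: usize, updated_count: usize) -> u64 {
    if new_count > 0 || updated_count > 0 {
        store.bump_corpus_revision()
    } else {
        store.corpus_revision()
    }
}
```

A skipped-only run leaves the revision untouched, so keys built before and after it are identical and cached entries keep matching.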
@@ -1443,6 +1463,17 @@ pub fn search_with_config(
     App::open_with_config(config)?.search(query)
 }
 
+/// p9-fb-19: bypass the LRU search cache for one call. Same shape as
+/// [`search_with_config`] but routes through [`App::search_uncached`]
+/// — used by `kebab search --no-cache`.
+#[doc(hidden)]
+pub fn search_uncached_with_config(
+    config: kebab_config::Config,
+    query: SearchQuery,
+) -> anyhow::Result<Vec<SearchHit>> {
+    App::open_with_config(config)?.search_uncached(query)
+}
+
 // ── ask ──────────────────────────────────────────────────────────────────
 //
 // P4-3 wires `ask` end-to-end. The retriever is built per `opts.mode`;
@@ -49,6 +49,91 @@ fn lexical_search_empty_query_returns_empty() {
     assert!(hits.is_empty(), "blank query must short-circuit empty");
 }
 
+/// p9-fb-19 — `App::search` returns the same hit list for a repeated
+/// query (cache hit doesn't corrupt the result). Both calls share an
+/// `App` instance so the cache is in scope.
+#[test]
+fn cached_search_returns_same_hits_on_repeat() {
+    let env = TestEnv::lexical_only();
+    kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
+    let app = kebab_app::App::open_with_config(env.config.clone()).unwrap();
+    let first = app.search(lexical_query("ownership")).unwrap();
+    assert!(!first.is_empty(), "first call must return ≥1 hit");
+    let second = app.search(lexical_query("ownership")).unwrap();
+    assert_eq!(
+        first.len(),
+        second.len(),
+        "cached call must yield identical hit count"
+    );
+    for (a, b) in first.iter().zip(second.iter()) {
+        assert_eq!(a.chunk_id, b.chunk_id, "chunk_ids must align");
+        assert_eq!(a.rank, b.rank, "ranks must align");
+    }
+}
+
+/// p9-fb-19 — query normalization (NFKC + trim + lowercase) collapses
+/// `"Ownership"` / `"OWNERSHIP"` / `" ownership "` into one cache
+/// entry. Verified by ensuring all three forms return the same hits.
+#[test]
+fn cache_key_normalization_treats_case_and_whitespace_as_equivalent() {
+    let env = TestEnv::lexical_only();
+    kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
+    let app = kebab_app::App::open_with_config(env.config.clone()).unwrap();
+    let plain = app.search(lexical_query("ownership")).unwrap();
+    let upper = app.search(lexical_query("OWNERSHIP")).unwrap();
+    let padded = app.search(lexical_query(" Ownership ")).unwrap();
+    assert_eq!(plain.len(), upper.len());
+    assert_eq!(plain.len(), padded.len());
+    // chunk_ids are deterministic — same query class, same set.
+    let plain_ids: Vec<_> = plain.iter().map(|h| h.chunk_id.0.clone()).collect();
+    let upper_ids: Vec<_> = upper.iter().map(|h| h.chunk_id.0.clone()).collect();
+    assert_eq!(plain_ids, upper_ids);
+}
+
+/// p9-fb-19 — `--no-cache` (`search_uncached_with_config`) bypasses
+/// the cache. Result correctness is identical to `search_with_config`.
+#[test]
+fn search_uncached_returns_same_hits_as_cached() {
+    let env = TestEnv::lexical_only();
+    kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
+    let cached =
+        kebab_app::search_with_config(env.config.clone(), lexical_query("ownership"))
+            .unwrap();
+    let uncached = kebab_app::search_uncached_with_config(
+        env.config.clone(),
+        lexical_query("ownership"),
+    )
+    .unwrap();
+    assert_eq!(cached.len(), uncached.len());
+    for (a, b) in cached.iter().zip(uncached.iter()) {
+        assert_eq!(a.chunk_id, b.chunk_id);
+    }
+}
+
+/// p9-fb-19 — first ingest with commits bumps `corpus_revision` from
+/// 0 to ≥1. Verified by reading the persisted kv via a fresh
+/// SqliteStore handle (the field on `App` is `pub(crate)`).
+#[test]
+fn first_ingest_bumps_corpus_revision() {
+    let env = TestEnv::lexical_only();
+    let store_before =
+        kebab_store_sqlite::SqliteStore::open(&env.config).unwrap();
+    store_before.run_migrations().unwrap();
+    assert_eq!(store_before.corpus_revision(), 0, "fresh store seeds 0");
+
+    let report =
+        kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
+    assert!(report.new + report.updated > 0, "first ingest must commit ≥1 doc");
+
+    let store_after =
+        kebab_store_sqlite::SqliteStore::open(&env.config).unwrap();
+    assert!(
+        store_after.corpus_revision() >= 1,
+        "ingest commit must bump corpus_revision (got {})",
+        store_after.corpus_revision(),
+    );
+}
+
 #[test]
 fn vector_mode_with_provider_none_errors_clearly() {
     let env = TestEnv::lexical_only();