feat(kebab-app + kebab-store-sqlite): p9-fb-19 search LRU cache + corpus_revision
Dogfooding item 15: in the TUI, or anywhere within a single process, repeating the same query re-ran the SQLite FTS + Lance + RRF computation every time. This commit removes that cost with an in-process LRU cache plus a monotonic `corpus_revision` counter, so every cache entry goes stale automatically whenever an ingest commits.

## Key changes

- **SQLite V004 migration**: `kv (key TEXT PRIMARY KEY, value TEXT) STRICT` + a `corpus_revision = '0'` seed. The shape is generic so that future scalars can live in the same table.
- **`SqliteStore::corpus_revision()` / `bump_corpus_revision()`**: atomic `UPDATE ... CAST AS INTEGER + 1`. An INSERT-OR-IGNORE runs alongside it (paranoia for the case where the V004 seed is missing for whatever reason).
- **`kebab-app::ingest_with_config_cancellable`**: bumps when `new + updated > 0`; a no-op (skipped-only) reingest preserves the cache.
- **`App.search_cache: Option<Mutex<LruCache<SearchCacheKey, Vec<SearchHit>>>>`**: sized by `config.search.cache_capacity` (default 256, 0 disables). Adds `lru = "0.12"` as a workspace dep.
- **`SearchCacheKey`** = `query_norm` (NFKC + trim + lowercase) + `mode` + `k` + `snippet_chars` + `embedding_version` (vector/hybrid only; empty string for lexical) + `chunker_version` + `corpus_revision` snapshot.
- **`App::search`** rewrite: with the cache enabled, look up first, and on a miss call the existing `search_uncached` and put the result. If the cache is disabled or the lock fails, take the straight-line path.
- **`App::search_uncached`** (rename of the pre-fb-19 `search` body) + `search_uncached_with_config` facade: the entry point for CLI `kebab search --no-cache`.
- **`Config.search.cache_capacity: usize`** field, with `#[serde(default)]` for compatibility with existing configs.
- **CLI `--no-cache`** flag: for debugging (effectively a no-op in the CLI, since every invocation is a new process, but the spec requires it and it stays compatible with future long-lived processes).
- **frozen design §9 versioning** table gains a `corpus_revision` row (a different dimension from the existing `index_version` label: the label describes the retrieval shape, while corpus_revision is an ingest commit ack).
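The invalidation scheme described in this list can be sketched in a few lines of std-only Rust. This is a minimal illustration, not the code in this commit: `TinyLru`, `normalize_query`, `make_key`, `bump_if_committed`, and `demo` are hypothetical names, the key carries only a subset of the real `SearchCacheKey` fields, and NFKC is left out because it needs a Unicode normalization crate. The point it shows is that putting the `corpus_revision` snapshot into the key makes old entries unreachable after a bump, so no explicit flush is needed.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, reduced version of the commit's `SearchCacheKey`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SearchCacheKey {
    query_norm: String,
    mode: &'static str,
    k: usize,
    corpus_revision: u64,
}

/// Trim + lowercase only. The commit also applies NFKC, omitted here.
fn normalize_query(raw: &str) -> String {
    raw.trim().to_lowercase()
}

/// Assumed helper: builds a lexical key for the given revision snapshot.
fn make_key(raw: &str, corpus_revision: u64) -> SearchCacheKey {
    SearchCacheKey { query_norm: normalize_query(raw), mode: "lexical", k: 10, corpus_revision }
}

/// Bump rule from the commit message: only ingests that commit something bump.
fn bump_if_committed(revision: u64, new: usize, updated: usize) -> u64 {
    if new + updated > 0 { revision + 1 } else { revision }
}

/// Tiny LRU stand-in for `lru::LruCache`; `order` runs from least to most recent.
struct TinyLru<V> {
    capacity: usize,
    map: HashMap<SearchCacheKey, V>,
    order: VecDeque<SearchCacheKey>,
}

impl<V: Clone> TinyLru<V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used end of `order`.
    fn touch(&mut self, key: &SearchCacheKey) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let _ = self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    fn get(&mut self, key: &SearchCacheKey) -> Option<V> {
        let hit = self.map.get(key).cloned()?;
        self.touch(key);
        Some(hit)
    }

    fn put(&mut self, key: SearchCacheKey, value: V) {
        if self.capacity == 0 {
            return; // capacity 0 disables caching in this sketch
        }
        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
            // Evict the least recently used entry to make room.
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.touch(&key);
        self.map.insert(key, value);
    }
}

/// Returns (hit at same revision, hit after skipped-only reingest, hit after commit).
fn demo() -> (bool, bool, bool) {
    let mut cache: TinyLru<Vec<String>> = TinyLru::new(2);
    let mut revision = 0u64;
    cache.put(make_key("ownership", revision), vec!["chunk-1".to_string()]);
    let same_revision = cache.get(&make_key("  OWNERSHIP ", revision)).is_some();
    revision = bump_if_committed(revision, 0, 0); // skipped-only: no bump
    let after_noop = cache.get(&make_key("ownership", revision)).is_some();
    revision = bump_if_committed(revision, 1, 0); // one new doc: bump
    let after_commit = cache.get(&make_key("ownership", revision)).is_some();
    (same_revision, after_noop, after_commit)
}
```

Calling `demo()` yields `(true, true, false)`: differently cased and padded spellings share one entry, a skipped-only reingest keeps it reachable, and a committing ingest makes it unreachable. Stale entries are not removed eagerly in this sketch; they simply age out through normal LRU eviction.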
## Tests

- `kebab-store-sqlite`: 3 new unit tests (fresh=0, monotonic bump, persist across reopen)
- `kebab-app`: 4 new integration tests (cached repeat returns the same hits, NFKC normalization collapses case/whitespace, `--no-cache` parity, first ingest bumps corpus_revision)
- Full workspace `cargo test --workspace --no-fail-fast -j 1` exits 0
- `cargo clippy --workspace --all-targets -- -D warnings` clean

## Docs

- README `kebab search` row: cache behavior + `--no-cache` note + corpus_revision invalidation mechanism
- docs/SMOKE.md: `cache_capacity` line added to the `[search]` section
- HANDOFF: 2026-05-03 entry
- spec status planned → in_progress

## Out of scope

- patch-and-merge incremental updates (hard because RRF normalization is computed over the full hit set)
- SQLite-persisted cache (P+)
- cache sharing across processes (in-process only; cross-process invalidation via corpus_revision is O(1))

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -49,6 +49,91 @@ fn lexical_search_empty_query_returns_empty() {
    assert!(hits.is_empty(), "blank query must short-circuit empty");
}

/// p9-fb-19 — `App::search` returns the same hit list for a repeated
/// query (cache hit doesn't corrupt the result). Both calls share an
/// `App` instance so the cache is in scope.
#[test]
fn cached_search_returns_same_hits_on_repeat() {
    let env = TestEnv::lexical_only();
    kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
    let app = kebab_app::App::open_with_config(env.config.clone()).unwrap();
    let first = app.search(lexical_query("ownership")).unwrap();
    assert!(!first.is_empty(), "first call must return ≥1 hit");
    let second = app.search(lexical_query("ownership")).unwrap();
    assert_eq!(
        first.len(),
        second.len(),
        "cached call must yield identical hit count"
    );
    for (a, b) in first.iter().zip(second.iter()) {
        assert_eq!(a.chunk_id, b.chunk_id, "chunk_ids must align");
        assert_eq!(a.rank, b.rank, "ranks must align");
    }
}

/// p9-fb-19 — query normalization (NFKC + trim + lowercase) collapses
/// `"Ownership"` / `"OWNERSHIP"` / `" ownership "` into one cache
/// entry. Verified by ensuring all three forms return the same hits.
#[test]
fn cache_key_normalization_treats_case_and_whitespace_as_equivalent() {
    let env = TestEnv::lexical_only();
    kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
    let app = kebab_app::App::open_with_config(env.config.clone()).unwrap();
    let plain = app.search(lexical_query("ownership")).unwrap();
    let upper = app.search(lexical_query("OWNERSHIP")).unwrap();
    let padded = app.search(lexical_query(" Ownership ")).unwrap();
    assert_eq!(plain.len(), upper.len());
    assert_eq!(plain.len(), padded.len());
    // chunk_ids are deterministic — same query class, same set.
    let plain_ids: Vec<_> = plain.iter().map(|h| h.chunk_id.0.clone()).collect();
    let upper_ids: Vec<_> = upper.iter().map(|h| h.chunk_id.0.clone()).collect();
    assert_eq!(plain_ids, upper_ids);
}

/// p9-fb-19 — `--no-cache` (`search_uncached_with_config`) bypasses
/// the cache. Result correctness is identical to `search_with_config`.
#[test]
fn search_uncached_returns_same_hits_as_cached() {
    let env = TestEnv::lexical_only();
    kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
    let cached =
        kebab_app::search_with_config(env.config.clone(), lexical_query("ownership"))
            .unwrap();
    let uncached = kebab_app::search_uncached_with_config(
        env.config.clone(),
        lexical_query("ownership"),
    )
    .unwrap();
    assert_eq!(cached.len(), uncached.len());
    for (a, b) in cached.iter().zip(uncached.iter()) {
        assert_eq!(a.chunk_id, b.chunk_id);
    }
}

/// p9-fb-19 — first ingest with commits bumps `corpus_revision` from
/// 0 to ≥1. Verified by reading the persisted kv via a fresh
/// SqliteStore handle (the field on `App` is `pub(crate)`).
#[test]
fn first_ingest_bumps_corpus_revision() {
    let env = TestEnv::lexical_only();
    let store_before =
        kebab_store_sqlite::SqliteStore::open(&env.config).unwrap();
    store_before.run_migrations().unwrap();
    assert_eq!(store_before.corpus_revision(), 0, "fresh store seeds 0");

    let report =
        kebab_app::ingest_with_config(env.config.clone(), env.scope(), true).unwrap();
    assert!(report.new + report.updated > 0, "first ingest must commit ≥1 doc");

    let store_after =
        kebab_store_sqlite::SqliteStore::open(&env.config).unwrap();
    assert!(
        store_after.corpus_revision() >= 1,
        "ingest commit must bump corpus_revision (got {})",
        store_after.corpus_revision(),
    );
}

#[test]
fn vector_mode_with_provider_none_errors_clearly() {
    let env = TestEnv::lexical_only();
