Dogfooding item 15: in the TUI, or any other single long-lived process, repeating the same query re-ran the SQLite FTS + Lance + RRF computation every time. This change removes that cost with an in-process LRU cache plus a monotonic `corpus_revision` counter, so every cache entry automatically goes stale once an ingest commit lands.

## Key changes

- **SQLite V004 migration**: `kv (key TEXT PRIMARY KEY, value TEXT) STRICT` plus a `corpus_revision = '0'` seed. The shape is generic, so future scalars can live in the same table.
- **`SqliteStore::corpus_revision()` / `bump_corpus_revision()`**: atomic `UPDATE ... CAST AS INTEGER + 1`. An INSERT-OR-IGNORE runs alongside it as a paranoid guard for the case where the V004 seed is missing for some reason.
- **`kebab-app::ingest_with_config_cancellable`**: bumps the revision when `new + updated > 0`; a no-op (skipped-only) reingest preserves the cache.
- **`App.search_cache: Option<Mutex<LruCache<SearchCacheKey, Vec<SearchHit>>>>`**: sized by `config.search.cache_capacity` (default 256, 0 disables the cache). Adds `lru = "0.12"` as a workspace dep.
- **`SearchCacheKey`** = `query_norm` (NFKC + trim + lowercase) + `mode` + `k` + `snippet_chars` + `embedding_version` (vector/hybrid only; empty string for lexical) + `chunker_version` + `corpus_revision` snapshot.
- **`App::search`** rewrite: when the cache is enabled, look up first; on a miss, call the existing `search_uncached` and put the result. When the cache is disabled or the lock fails, take the straight-line uncached path.
- **`App::search_uncached`** (rename of the pre-fb-19 `search` body) + `search_uncached_with_config` facade: reached via CLI `kebab search --no-cache`.
- **`Config.search.cache_capacity: usize`** field, with `#[serde(default)]` so existing configs stay compatible.
- **CLI `--no-cache`** flag: for debugging (every CLI invocation is a new process, so it is effectively a no-op, but the spec requires it and it keeps compatibility with future long-lived processes).
- **frozen design §9 versioning** table gains a `corpus_revision` row (a different dimension from the existing `index_version` label: the label describes the retrieval shape, while `corpus_revision` acknowledges ingest commits).
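The invalidation scheme above can be sketched in a few lines. This is a minimal std-only illustration, not the code in `kebab-app`: the key carries only a subset of the fields, an unbounded `HashMap` stands in for `LruCache`, an `AtomicU64` stands in for the SQLite-backed counter, NFKC is skipped, and every name other than `SearchCacheKey`, `query_norm`, `mode`, `k`, and `corpus_revision` is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the real `SearchCacheKey`. Field names follow the commit
/// message; the field types and the reduced field set are assumptions.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SearchCacheKey {
    pub query_norm: String,
    pub mode: String,
    pub k: usize,
    pub corpus_revision: u64,
}

/// Trim + lowercase only. The real key also applies NFKC, which needs the
/// `unicode-normalization` crate and is omitted to keep this std-only.
pub fn normalize_query(q: &str) -> String {
    q.trim().to_lowercase()
}

/// Toy cache (hypothetical type): unbounded map instead of an LRU,
/// in-memory counter instead of the `kv` table.
#[derive(Default)]
pub struct SearchCache {
    revision: AtomicU64,
    entries: HashMap<SearchCacheKey, Vec<String>>,
    /// Counts how often the expensive path ran, for demonstration.
    pub uncached_calls: usize,
}

impl SearchCache {
    pub fn revision(&self) -> u64 {
        self.revision.load(Ordering::SeqCst)
    }

    /// Mirrors the ingest rule: bump only when something actually changed,
    /// so a skipped-only reingest keeps existing entries valid.
    pub fn on_ingest_commit(&self, new: usize, updated: usize) {
        if new + updated > 0 {
            self.revision.fetch_add(1, Ordering::SeqCst);
        }
    }

    pub fn search(&mut self, query: &str, mode: &str, k: usize) -> Vec<String> {
        // Snapshot the revision into the key: after a bump, old entries can
        // no longer be looked up and are therefore implicitly stale.
        let key = SearchCacheKey {
            query_norm: normalize_query(query),
            mode: mode.to_string(),
            k,
            corpus_revision: self.revision(),
        };
        if let Some(hits) = self.entries.get(&key) {
            return hits.clone();
        }
        // Placeholder for the real uncached search (FTS + vector + RRF).
        self.uncached_calls += 1;
        let hits = vec![format!("hit for {} @ rev {}", key.query_norm, key.corpus_revision)];
        self.entries.insert(key, hits.clone());
        hits
    }
}
```

Because the revision is part of the key, nothing has to walk the cache on ingest: stale entries simply stop matching. In this sketch they stay in the map forever, whereas a bounded LRU would eventually evict them.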
## Tests

- `kebab-store-sqlite`: 3 new unit tests (fresh=0, monotonic bump, persist across reopen)
- `kebab-app`: 4 new integration tests (cached repeat returns the same hits, query normalization collapses case/whitespace, `--no-cache` parity, first ingest bumps `corpus_revision`)
- Full workspace `cargo test --workspace --no-fail-fast -j 1` exits 0
- `cargo clippy --workspace --all-targets -- -D warnings` clean

## Docs

- README `kebab search` row: cache behavior + `--no-cache` guidance + `corpus_revision` invalidation mechanism
- docs/SMOKE.md `[search]` section: added a `cache_capacity` line
- HANDOFF: 2026-05-03 entry
- spec status planned → in_progress

## Out of scope

- patch-and-merge incremental updates (hard because RRF normalization is computed over the entire hit set)
- SQLite-persisted cache (P+)
- cross-process cache sharing (in-process only; `corpus_revision` keeps cross-process invalidation O(1))

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
67 lines
3.1 KiB
TOML
[package]
name = "kebab-app"
version = { workspace = true }
edition = { workspace = true }
rust-version = { workspace = true }
license = { workspace = true }
repository = { workspace = true }
description = "Facade — orchestrates components for kb-cli/tui/desktop"

[dependencies]
kebab-core = { path = "../kebab-core" }
kebab-config = { path = "../kebab-config" }
kebab-source-fs = { path = "../kebab-source-fs" }
kebab-parse-md = { path = "../kebab-parse-md" }
kebab-parse-types = { path = "../kebab-parse-types" }
kebab-normalize = { path = "../kebab-normalize" }
kebab-chunk = { path = "../kebab-chunk" }
kebab-store-sqlite = { path = "../kebab-store-sqlite" }
kebab-store-vector = { path = "../kebab-store-vector" }
kebab-search = { path = "../kebab-search" }
kebab-embed = { path = "../kebab-embed" }
kebab-embed-local = { path = "../kebab-embed-local" }
kebab-llm = { path = "../kebab-llm" }
kebab-llm-local = { path = "../kebab-llm-local" }
kebab-rag = { path = "../kebab-rag" }
# P6-4: image extractor + OCR + caption adapters live here. App
# threads them into the per-asset dispatch (see `ingest_one_asset`
# image branch). Trait-only consumption — no `kebab-parse-image`
# internals leak into kb-app code.
kebab-parse-image = { path = "../kebab-parse-image" }
# P7-3: PDF text extractor lives here. App threads it into the
# per-asset dispatch (see `ingest_one_asset` PDF branch) and runs the
# resulting `CanonicalDocument` through `kebab-chunk::PdfPageV1Chunker`.
kebab-parse-pdf = { path = "../kebab-parse-pdf" }
anyhow = { workspace = true }
blake3 = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
time = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt", "json"] }
tracing-appender = "0.2"
toml = "0.8"
dirs = "5"
# p9-fb-19: in-process LRU cache for `App::search`. Capacity from
# `config.search.cache_capacity` (default 256, ~1.3 MB cap).
lru = { workspace = true }
# p9-fb-19: NFKC-normalize cache-key queries so `"Foo"` / `"FOO"` /
# `" foo "` collapse to one entry. Same crate kebab-normalize +
# kebab-core already use, no version drift.
unicode-normalization = "0.1"

[dev-dependencies]
rusqlite = { workspace = true }
tempfile = { workspace = true }
# Image-pipeline integration tests use wiremock to stub Ollama for OCR
# / caption HTTP calls. Async runtime to host the mock server only;
# the kb-app code under test stays sync.
wiremock = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread"] }
image = { version = "0.25", default-features = false, features = ["png"] }
# P7-3 PDF integration tests build in-memory PDF fixtures via the same
# lopdf builder pattern `kebab-parse-pdf::tests::common` uses; pinned
# to the same major (0.32) so byte output is identical between the two
# fixture surfaces.
lopdf = "0.32"