feat(app): first-boot eager backfill for tokenized_korean_text
V007 → V009 업그레이드 시 기존 chunks 의 tokenized_korean_text 가 NULL — 첫 App::open_with_config 호출 시 자동으로 lindera ko-dic 으로 분해 후 UPDATE. chunks_au trigger 가 chunks_fts 를 자동 재-index. 사용자 재-ingest 불필요. - crates/kebab-store-sqlite/src/store.rs: backfill_tokenized_korean_text(progress_cb, tokenize) API. 1000 row 마다 commit + progress 콜백. idempotent (IS NULL 필터로 partial completion 재실행 안전). tokenizer 를 파라미터로 받아 §8 dep 경계 유지. - crates/kebab-app/src/app.rs::open_with_config: run_migrations 직후 backfill 호출. 실패 시 warn log 만 (App open 은 성공 — vector/hybrid mode 계속 가능). 500 row 마다 info log progress. - crates/kebab-store-sqlite/tests/fts.rs: backfill_tokenized_korean_text_populates_nullable_rows 단위 test (idempotency 포함). - clippy pre-existing 오류 수정 (redundant_closure, map_unwrap_or, cast_lossless, uninlined_format_args — kebab-app/ingest_log.rs, pdf_ocr_apply.rs, app.rs, tests/ocr_inspect_smoke.rs). Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §8.1, §8.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S4) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -15,14 +15,14 @@ fn seed_ocr_events(env: &TestEnv, store: &SqliteStore) {
|
||||
store
|
||||
.record_pdf_ocr_event(
|
||||
"run-aaa",
|
||||
&format!("2026-05-28T0{}:00:00Z", i),
|
||||
&format!("2026-05-28T0{i}:00:00Z"),
|
||||
Some("doc-abc"),
|
||||
"path/scanned.pdf",
|
||||
i + 1,
|
||||
Some(50_000),
|
||||
Some(200),
|
||||
Some(150),
|
||||
100 + (i as u64) * 20,
|
||||
100 + u64::from(i) * 20,
|
||||
42,
|
||||
true,
|
||||
None,
|
||||
|
||||
Reference in New Issue
Block a user