feat(fts): add V009 korean morphological tokenizer migration
Add a V009 migration to resolve the V007 trigram tokenizer's limitation where two-character Korean queries return 0 hits (Bug #8). The migration reverts to the unicode61 tokenizer and pre-fills the Korean morphological decomposition output into a separate column, `tokenized_korean_text`.

- migrations/V009__fts_korean_morphological.sql (new): column ADD, chunks_fts DROP + redefinition, CASE expressions in the 3 triggers, backfill INSERT, corpus_revision bump.
- design §5.5 updated: trigram → unicode61 + morphological column; trigger bodies now use CASE expressions.
- crates/kebab-store-sqlite/tests/fts.rs: rename the V007 verbatim test so that V009 is the source of truth; add the v009_bumps_corpus_revision unit test.
- store.rs: fix pre-existing clippy warnings bool_to_int_with_if and cast_lossless (pdf_ocr_events-related code, found while working on S1).

English substring matching regresses to V002 behavior (whole-token only); this is stated candidly in spec §3 Non-Goals and in the follow-up release notes (v0.20.1).

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S1)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
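The 2-character limitation named in the message can be illustrated with a minimal sketch. It assumes a trigram index stores every overlapping run of three Unicode characters, which is how SQLite FTS5 documents its trigram tokenizer; the helper name `trigrams` is hypothetical and is not part of this repository.

```rust
/// Hypothetical helper: split `text` into overlapping 3-character windows,
/// counted in Unicode scalar values rather than bytes.
fn trigrams(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    // `windows(3)` yields nothing when fewer than 3 characters are present,
    // so a 2-character query produces no trigram to look up.
    chars.windows(3).map(|w| w.iter().collect()).collect()
}
```

Under this assumption, `trigrams("검색")` is empty while `trigrams("검색엔진")` contains two entries, so the two-syllable query has nothing to match against the index. A whole-token tokenizer fed pre-segmented morphemes does not have this length floor.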
@@ -1028,7 +1028,7 @@ impl SqliteStore {
                 image_height,
                 ms,
                 chars,
-                if success { 1i32 } else { 0i32 },
+                i32::from(success),
                 reason,
                 ocr_engine
             ],
@@ -1042,7 +1042,7 @@ impl SqliteStore {
     /// means "delete everything older than now" (i.e. all past rows).
     pub fn prune_pdf_ocr_events(&self, retention_days: u32) -> anyhow::Result<u64> {
         use time::format_description::well_known::Rfc3339;
-        let cutoff = time::OffsetDateTime::now_utc() - time::Duration::days(retention_days as i64);
+        let cutoff = time::OffsetDateTime::now_utc() - time::Duration::days(i64::from(retention_days));
         let cutoff_ts = cutoff
             .format(&Rfc3339)
             .unwrap_or_else(|_| "1970-01-01T00:00:00Z".to_string());
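
Both store.rs changes swap one conversion form for a standard-library `From` conversion and should not alter behavior. The sketch below shows the two conversions in isolation; the function names `success_flag` and `retention_days_i64` are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper: encode a success flag as the integer stored in the row.
/// `i32::from(bool)` yields 1 for `true` and 0 for `false`, matching the
/// `if success { 1i32 } else { 0i32 }` form flagged by `bool_to_int_with_if`.
fn success_flag(success: bool) -> i32 {
    i32::from(success)
}

/// Hypothetical helper: widen a retention period in days to `i64`.
/// `i64::from(u32)` is lossless and, unlike `as`, fails to compile if the
/// source type is later changed to one that could truncate (`cast_lossless`).
fn retention_days_i64(retention_days: u32) -> i64 {
    i64::from(retention_days)
}
```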