docs(v0.20.1): polish PR-review findings (README/HOTFIXES/schema/SKILL)

opus PR-level final review (Approved with notes) 의 4 minor finding
mechanical 정정:

1. README.md — `kebab search` row 의 영어 substring 매칭 표현이
   V007 시절 그대로였음. V009 의 whole-token 회귀 (substring → V002
   동작) 를 정직히 명시 + vector/hybrid mode 권장 안내.
2. tasks/HOTFIXES.md — 2026-05-28 entry 의 file path 정정. lexical.rs
   는 lindera 호출자가 아니라 build_match_string 의 MIN_QUERY_CHARS
   3→2 갱신만; lindera helper 의 실제 owner 는 kebab-chunk/src/lib.rs.
   ingest.rs 는 본 PR scope 외, eager backfill hook 위치는 kebab-app/
   src/app.rs::App::open_with_config.
3. docs/wire-schema/v1/search_response.schema.json — `hint` field
   description 이 V007 trigram 3-char minimum 시절 advisory 시그니처
   그대로. v0.20.1 에서 helper retired + always-omit 사실 명시
   (forward-compat 차원에서 field 만 schema 에 보존).
4. integrations/claude-code/kebab/SKILL.md — `hint` field 설명의
   self-contradiction ("present only with trigram in edge cases" vs
   "Korean 2-char now supported") 해소. retired + reuse 가능 명시.

PR-level reviewer recommendation: "Merge as-is — block 사유 아님 (모든
finding minor)". 본 commit 은 reviewer 의 옵션 1 (별 docs hotfix
commit) 채택.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (PR-level finding follow-up)
This commit is contained in:
2026-05-28 12:53:00 +00:00
parent 53ec9b4dc5
commit 5d9ea588ed
4 changed files with 9 additions and 7 deletions

View File

@@ -22,10 +22,12 @@ git history.
**Root cause**: trigram 의 bucket 미존재. unicode61 기반 단순 3-gram 분해로는 2-char 한국어 단어를 충분히 커버 못함.
**Fix**: V009 migration + lindera-ko-dic 형태소분석기 + tokenized_korean_text column + first-boot eager backfill. branch `feat/korean-morphological-tokenizer` (8 commit + 5 follow-up).
- `migrations/V009__fts_korean_morphological.sql`FTS5 tokenize 함수 + tokenized_korean_text column 신설.
- `crates/kebab-search/src/lexical.rs` — lindera integration + 한국어 쿼리 prefix matching 재설계.
- `crates/kebab-app/src/ingest.rs` — first-boot V009 trigger + eager backfill.
**Fix**: V009 migration + lindera-ko-dic 형태소분석기 + tokenized_korean_text column + first-boot eager backfill. branch `feat/korean-morphological-tokenizer` (17 commit).
- `migrations/V009__fts_korean_morphological.sql``tokenized_korean_text` column ADD + chunks_fts (trigram → unicode61) + CASE expression triggers + corpus_revision bump.
- `crates/kebab-chunk/src/lib.rs::tokenize_korean_morphological` — lindera ko-dic 형태소 분석 helper (OnceLock 캐시 + None fallback).
- `crates/kebab-store-sqlite/src/store.rs::backfill_tokenized_korean_text` — 1000-row batch transaction + idempotent backfill (tokenize closure 주입으로 dependency-inversion).
- `crates/kebab-app/src/app.rs::App::open_with_config` — first-boot hook 에서 backfill 호출 (실패 시 warn log + App open 계속).
- `crates/kebab-search/src/lexical.rs::build_match_string``MIN_QUERY_CHARS` 3 → 2 로 낮춰 2자 한국어 query 통과 허용 (V007 시절 doc-comment 의 trigram 가정 갱신).
**Amends**: design §5.5 (FTS5 한국어 지원으로 갱신), §9 (index_version cascade — `fts5-v009-korean-morphological` suffix), HOTFIXES 2026-05-24 trigram entry (한국어 2자 query 미해결 footnote 해소).