Commit Graph

6 Commits

Author SHA1 Message Date
21b52bc285 style: cargo fmt --all (S3+S4+S5+S7 follow-up)
V009 morphological tokenizer 작업 (S3 chunk + S4 backfill + S5
short_query_hint 제거 + S7 신규 tests) 의 형식 정리. 동작 변경 없음.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S11)
2026-05-28 12:06:01 +00:00
c5de5f812b test(fts,app): V009 morphological tokenizer integration tests
신규 4 test 추가:

- crates/kebab-store-sqlite/tests/fts.rs:
  - fts_v009_korean_morphological_2char_query_hits: tokenized_korean_text
    column 이 채워진 chunk 의 '한국' 2-char query hit.
  - fts_v009_english_whole_token_only: V007 trigram substring 매칭
    회귀 (Path A) — 'token' query 가 'tokenizer' chunk 에서 0-hit.
- crates/kebab-app/tests/search_korean.rs:
  - korean_morphological_2char_query_lexical_mode: end-to-end
    한국어 wiki fixture ingest → '한국' / '서울' query hit.
  - korean_morphological_mixed_english_korean_query: 'Rust' English
    whole-token + '최적화' Korean morpheme hit.

crates/kebab-search/src/lexical.rs:
  - build_match_string() 의 MIN_TRIGRAM_CHARS(3) → MIN_QUERY_CHARS(2).
    V009 unicode61 은 최소 token 길이 제한 없어 2자 한국어 morpheme
    query 가 통과되어야 함. 1자 단독은 여전히 필터.
  - 관련 unit test 2개 V009 동작으로 갱신.

fixture text 는 lindera ko-dic 의 실제 segmentation 동작에 의존
(spec Appendix B prior-knowledge 예측). 실측 시 fixture 조정 가능.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §9.1, §9.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S7)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 11:38:52 +00:00
685007789a style: cargo fmt --all (round 4 ingest log feature follow-up)
Phase C4 executor 의 마지막 `fix(test): clippy + fmt fixes` commit 이
test file 부분만 fmt 적용. workspace 전체 fmt 누락 발견 → cargo fmt --all
적용. 모든 import alphabetical reorder + line wrapping 정합.

추가 untracked artifact 동시 commit:
- docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 line, ACCEPT)
- docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 line, ACCEPT)

workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:18:40 +00:00
6ac7fea7b9 feat(v0.17.0/A5): trigram-aware build_match_string + SearchResponse.hint
PR-A 본체. plan Task A4 Step 1c + A5.

- lexical.rs::build_match_string 재설계: whole-phrase + token-AND
  OR-combined, 3자 미만 토큰 drop, 후보 없음 시 None (빈 MATCH
  회피). raw single-quote mode 유지.
- SearchResponse.hint additive — empty result + trimmed < 3 chars
  + non-raw 케이스에 short_query_hint helper 가 set.
- CLI 'kebab search' 가 [hint] stderr 한 줄 (text mode).
- TUI SearchState.short_query_hint + poll_worker stale-aware set
  + fire_search/mark_input_changed reset + dynamic_status 표시.
- docs/wire-schema/v1/search_response.schema.json hint additive.
- 신규 unit tests (lexical 9 PASS, 기존 2 expectation 갱신) +
  통합 회귀 (search_korean: multi_token + mixed, 3 PASS) +
  BM25 snapshot regen (trigram token stream).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 11:54:25 +00:00
3f0b00439a review(p9-fb-10-task5): promote lexical_query to common + tighten Korean hit assertion
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 10:14:17 +00:00
60e583252e test(kebab-app): Korean query → FTS5 smoke pin
p9-fb-10: verifies that a Korean (Hangul) token survives the
ingest → FTS5 lexical search round-trip via the kebab-app facade.
NFC normalization is wired upstream in kebab-normalize; this test
only exercises end-to-end correctness — no AVX, no fastembed required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 10:08:32 +00:00