kebab

Author	SHA1	Message	Date
altair823	2619b7bff7	test(chunk): reflect the aliases:null field in AST snapshot fixtures After the aliases field was added to the Chunk struct (alias infrastructure), the chunk-*-ast-v1 snapshot fixtures were left un-updated and failed the drift check. chunk_id, text, policy_hash and tokenized are all unchanged; serialization gained only a single "aliases": null field (chunk generation logic unchanged, not a regression). Re-baked 10 fixtures (code c/cpp/go/java/js/kotlin/python/rust/ts + long_section) with UPDATE_SNAPSHOTS=1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 09:57:16 +00:00
altair823	e9b520216e	fix(expansion): per-alias sentinel orphan cleanup + cache robustness (PR #195 review) MAJOR: the chunk_id of alias dense vectors changed from the legacy single `{id}#alias` to per-line `{id}#alias#0`, `#alias#1`, …, but orphan cleanup deleted only the single sentinel, so `#alias#N` vectors leaked into LanceDB / embedding_records. - kebab-app: added the `alias_sentinel_ids_to_delete` helper (approach A), which puts the body + legacy `{id}#alias` + `{id}#alias#0`..`{id}#alias#{max-1}` all into the delete set. max = expansion.max_aliases_per_chunk (matches the hard cap in parse_aliases). All three LanceDB cleanup paths (parser-bump / edited-asset / deleted-file) use this helper. - kebab-store-sqlite: switched the explicit embedding_records DELETE in 4 paths (put_chunks / purge_*_except_doc_id / purge_orphan_at_workspace_path / purge_deleted_workspace_path) from an exact match (`|| '#alias'`) to a `{id}#alias%` prefix LIKE. Body chunk_ids are 32-char hex, so they contain no LIKE wildcards. MINOR 1: on an alias cache hit, a non-UTF-8 payload is demoted to a miss (taking the regeneration branch), matching the embedding path's decode-failure→miss demotion. MINOR 2: added a kind token ("doc") at the very front of the embedding version_key; the embedder adds a per-kind prefix, so this prevents collisions if query embeddings ever share the same cache. Regression tests: - kebab-app: 2 unit tests for alias_sentinel_ids_to_delete. - kebab-store-sqlite: 3 integration tests pinning that per-alias sentinel embedding_records disappear on all three cleanup paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 09:14:34 +00:00
altair823	a8fd76499c	feat(expansion): doc-side expansion with per-alias dense vectors + derived-artifact cache (V012) Aliases are indexed as individual per-line dense vectors (sentinel `{chunk}#alias#N`), and alias generation is skipped for boilerplate chunks. The bundled one-vector approach was dropped: averaging diluted specific expressions and actually caused a regression (13/18). Variant consistency 14/18 → 16/18, mean_spread@10 0.222 → 0.111 (CS corpus of ~1000 Namuwiki documents). `kebab-core::strip_alias_suffix` handles both the suffix form and the per-alias form. Derived-artifact cache (V012): embedding vectors + alias LLM results are cached under a key derived from the chunk content hash, so reindexing skips recomputation for chunks whose content is unchanged. cache_key = blake3(kind ‖ text_blake3 ‖ version_key)[:32]; version_key includes model/prompt/dimensions, consistent with the §9 cascade (automatic miss on a version bump). Measured: 3 gold answers, cold 1879s → warm 13s ≈ 145×. Purely additive, so no corpus_revision bump. search/ask work with only kebab.sqlite + lancedb, enabling a portable workflow: index on an external server, then copy just the DBs. Workspace version bumped 0.20.2 → 0.21.0 (minor) for the V012 schema migration + new surface. README/HANDOFF/ARCHITECTURE/HOTFIXES synced. Known limitations: 2 explanation-style cases (stack, svm) remain, and the grounded check misclassifies partial quotes as grounded (follow-up candidates). Measurement details: docs/superpowers/handoffs/2026-05-31-namu-wiki-alias-cache-study.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 08:24:04 +00:00
altair823	0282a81c67	fix(store): 4th CASCADE-replacement path + restore V011 CHECK (Task 4.5 review) Review MAJOR: purge_document_at_workspace_path_except_doc_id (the parser-bump path) was missing the explicit DELETE of original + sentinel embedding_records, so tombstones accumulated. Added, plus a regression test. MINOR: restored the V011 status CHECK (pending/committed/tombstone). NIT: comment noting that the foreign_keys PRAGMA is a no-op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 14:02:46 +00:00
altair823	f3587b7143	feat(store): strip sentinel alias candidates in filter_chunks (let committed pass) LanceDB candidates with a sentinel chunk_id ({orig}#alias) dropped out of the chunks JOIN and vanished before VectorRetriever could strip them. Candidates are now stripped to the original chunk_id with kebab_core::strip_alias_suffix and put into the IN-list/JOIN (the committed check uses the original body chunk), so they pass; the return value keeps the input candidate form (sentinel preserved), and VectorRetriever receives that sentinel and performs strip+dedup. Chose (b) a Rust-side strip over SQL replace (clearer). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 13:41:28 +00:00
altair823	483b1ec06b	feat(store): V011 drop embedding_records FK + explicit DELETE replacing CASCADE (sentinel alias vectors) Indexing alias dense vectors under a sentinel chunk_id ({orig}#alias) requires embedding_records to accept chunk_ids that are not in chunks. V001's chunk_id REFERENCES chunks ON DELETE CASCADE FK blocks this with SQLite error 787, so the table is recreated without the FK. status/vector_committed (V003) + 3 indexes preserved; the chunks_bd_tombstone_embeddings trigger is unchanged. legacy_alter_table=ON to avoid re-parsing a dangling trigger on DROP→RENAME. The lost CASCADE is replaced by explicit DELETEs in put_chunks + two purge paths (purge_orphan_at_workspace_path, purge_deleted_workspace_path), which remove the original + {id}#alias sentinel embedding_records together right before deleting chunks. corpus_revision baseline 2→3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 13:41:20 +00:00
altair823	b56469f010	fix(core): clippy uninlined_format_args in strip_alias tests (review MAJOR-1) Passes the workspace clippy --all-targets -D warnings gate. Inlined format! arguments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:24:04 +00:00
altair823	6ba8cb2c88	feat(search): VectorRetriever sentinel alias strip + dedup Hits on alias dense vectors ({orig}#alias) are stripped to the original chunk_id and hydrated; for body+alias duplicates only the first (highest-score) hit is kept. Overfetch 2→3 (to still have k after dedup). wire/RetrievalDetail unchanged. vector/hybrid: 0 regressions, clippy green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:09:32 +00:00
altair823	afa8af0f88	feat(app): index alias dense vectors separately + purge (sentinel)	2026-05-30 10:48:58 +00:00
altair823	b9d20d23d1	feat(config): ingest.expansion.embed_aliases flag (default off)	2026-05-30 10:31:07 +00:00
altair823	86b4e1ebd0	feat(core): ALIAS_SUFFIX + strip_alias_suffix (dense alias vectors)	2026-05-30 10:31:03 +00:00
altair823	116b3e6377	fix(app): clippy unused_self, make build_request an associated fn Passes the CI gate (clippy --workspace --all-targets -D warnings). Behavior unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 03:47:06 +00:00
altair823	a271352e33	feat(search): lexical body+alias merged search (pool-rescue) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 03:12:14 +00:00
altair823	cde4d75f6b	feat(app): ingest alias-generation hook (flag off by default, fail-soft)	2026-05-30 03:03:09 +00:00
altair823	bddcd53688	fix(app): parse_aliases prefix stripping corrupted aliases starting with a digit/hyphen (Task 4 review MAJOR-1) Greedy trim_start_matches → explicit strip_list_marker (strips only the marker+whitespace pattern, once). Preserves "3D 렌더링" ("3D rendering"), "2단계" ("step 2") and "-fast"; only the "- " / "1. " markers are removed. 2 regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 02:49:25 +00:00
altair823	2a207f9868	feat(app): ExpansionGenerator, per-chunk alias generation (fail-soft) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 02:36:20 +00:00
altair823	cc31868d24	feat(config): [ingest.expansion] flag (default off)	2026-05-30 02:26:41 +00:00
altair823	0df47febf0	test(store): doc-side expansion Task 2 review follow-ups (M1/M2/N1) - M1: added AND aliases <> '' to the chunk_aliases trigger guard (empty strings are not indexed) - M2: reindex idempotency test (one alias row after a re-put) - N1: body-isolation negative assertion (alias terms do not leak into chunks_fts) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 02:24:24 +00:00
altair823	b12a616ab2	feat(store): V010 chunk_aliases_fts + persist aliases in put_chunks Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 02:15:27 +00:00
altair823	848b75c069	feat(core): Chunk.aliases field (doc-side expansion) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 02:09:39 +00:00
altair823	0988f66331	feat(cli): kebab eval variants <run_id>, variant-consistency diagnostic report Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	82e02aa4fe	fix(eval): variant-consistency review H1/M1, guard against pool truncation + align answer judgment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	db4af0cc72	feat(eval): variant-consistency metric + A/B classification (rank churn / lexical gap) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	ab20202241	test(eval): Task 1 review nit, tests for 3+-member groups / group=None + divergent query id in the error message Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	a51e6395c0	feat(eval): GoldenQuery.group + group consistency validation (basis for variant consistency)	2026-05-30 00:53:24 +00:00
altair823	a3bb2580bf	test(rag): add rag-v3 dispatch integration test + refresh stale rag-v2 docs (code-review) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 04:46:27 +00:00
altair823	d93b757cf1	fix(cli): thread --config through kebab eval run/aggregate/compare (facade-rule) Cmd::Eval now loads Config via cli.config (same pattern as all other subcommands) before dispatching to the inner match. Each arm now calls the *_with_config variant: run_eval(&opts) → run_eval_with_config(&cfg, &opts) compute_aggregate(run_id) → compute_aggregate_with_config(&cfg, run_id) store_aggregate(run_id, ..) → store_aggregate_with_config(&cfg, run_id, ..) Compare already called compare_runs_with_config but sourced cfg from Config::load(None) — that redundant load is removed; cfg comes from the shared binding above. Fixes the same facade-rule regression pattern as P3-5 / P4-3: previously `kebab --config /build/dogfood/config.toml eval run` silently evaluated the XDG-default (empty) KB instead of the dogfood KB. Also fixes runner.rs test that hardcoded rag-v2 after commit `5719969` bumped the default prompt_template_version to rag-v3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 03:42:40 +00:00
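altair823	1454321b12	fix(rag): rag-v3 refusal/hedge phrasing follows answer language (Finding O-2) SYSTEM_PROMPT_RAG_V3: replaced the Korean literal refusal/hedge phrases with language-neutral wording. - "If the evidence is insufficient, answer '근거가 부족하다' (evidence is insufficient)." → "If the evidence is insufficient, say so in the answer language and answer without [#N] citations." - "If the evidence is ambiguous, state '확실하지 않다' (not certain) explicitly." → "If the evidence is ambiguous, state the uncertainty explicitly in the answer language." MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT: the same pattern replaced in two places. - "If the evidence is insufficient, answer '근거가 부족하다'." → "If the evidence is insufficient, say so in the answer language and answer without [#N] citations." - self-check's "immediately answer only '근거가 부족하다'." → "immediately answer only that the evidence is insufficient, in the answer language." The refusal detection logic (citation-marker based) is unchanged; only the wording was made language-neutral. test rag_v3_contains_v2_rules_plus_language_rule: the "확실하지 않다" assert is updated to a "불확실함" ("uncertainty") assert. Task spec grep: the rag-v2 mentions in tasks/p9/p9-fb-15 are historical descriptions from when it was introduced, so they stay frozen. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:43:28 +00:00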
altair823	649ec35108	feat(cli): add Ollama remote-endpoint hint to kebab init (Todo #8) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:19:34 +00:00
altair823	dece5e89fc	feat(bulk): document bulk search input schema + error shape hint (Todo #2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:16:21 +00:00
altair823	3cb49f1f9b	feat(cli): show title + doc_path in list docs human output (Todo #3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:03:10 +00:00
altair823	b82eaec21a	feat(config): default prompt_template_version rag-v2 -> rag-v3 (Todo #1)	2026-05-28 21:20:59 +00:00
altair823	6daa43375b	feat(rag): add rag-v3 system prompt with response-language matching (Todo #1)	2026-05-28 21:20:18 +00:00
altair823	fe20be8195	feat(chunk): N-gram supplement (Option β): sub-token emit for Korean compounds #4 (user request): promotes spec §6.2 Option β (additional sub-token emit) from a v0.21.x P9 follow-up to the v0.20.1 implementation. Resolves the dogfood ko-dic compound-noun limitation (single-token policy for `대한민국`, `한국정부`, `주민등록번호`, etc.). Implementation (`crates/kebab-chunk/src/lib.rs::tokenize_korean_morphological`): - New helper `is_hangul()`: tests for Hangul syllables (U+AC00..D7A3) + jamo (U+1100..11FF, U+3130..318F). - For each morpheme in the lindera output that is Hangul-only and at least 3 characters long, additionally emit sliding-window 2-grams, expanding the token list to e.g. `[한국정부, 한국, 국정, 정부]`. - English / numeric / mixed tokens get no supplement (to avoid false positives). Tests (`crates/kebab-chunk/tests/tokenize_korean.rs`): - `tokenize_korean_morphological_emits_2gram_for_long_morpheme`: checks the cases among the 5 probe fixtures where the supplement fires (measured: `서울특별시` → `[서울, 특별시, 특별, 별시]`, `대한민국` → `[대한민국, 대한, 한민, 민국]`). - `tokenize_korean_morphological_no_2gram_for_english`: guarantees that no English substrings (`Rus`, `ust`, `imi`) are emitted for the Rust optimization fixture. Dogfood evidence (`tasks/HOTFIXES.md` 2026-05-28 entry extended): - the queries '대한', '한민', '민국' all hit (sliding window of 대한민국). - sub-token queries such as '특별', '주민', '등록' hit. - the English query 'tokenizer' gets 0 hits because it is absent from the corpus (no supplement). - Trade-off: DB size +20-30% (Korean-heavy corpora), small false-positive risk. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2 (Option β promote) Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (post-implementation enhancement)	2026-05-28 13:48:05 +00:00
altair823	53ec9b4dc5	test(chunk): regenerate AST + long-section snapshots for V009 chunk field The S3 Chunk struct update (new tokenized_korean_text: Option<String> field in kebab-core) changes the serde serialization of every chunk snapshot JSON. Regenerated the baselines of the 10 snapshot fixtures (9 AST chunkers + markdown long-section) to the V009 shape. The only change in each snapshot is a `"tokenized_korean_text": null` field added to every chunk JSON (most fixtures are English code, so lindera's None fallback applies). No behavior change, only a cascade in the serde representation. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up via S11 sanity)	2026-05-28 12:27:37 +00:00
altair823	21b52bc285	style: cargo fmt --all (S3+S4+S5+S7 follow-up) Formatting cleanup for the V009 morphological tokenizer work (S3 chunk + S4 backfill + S5 short_query_hint removal + S7 new tests). No behavior change. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S11)	2026-05-28 12:06:01 +00:00
altair823	26f3a7756c	test(eval): regenerate runner_per_query_snapshot for V009 baseline Intended cascade of the BM25 distribution + hit-ordering changes caused by the V009 FTS5 tokenizer (trigram → unicode61 + morphemes). Regenerated the eval runner's per-query snapshot (run-1.json) to the V009 baseline. Regeneration: UPDATE_SNAPSHOTS=1 cargo test -p kebab-eval --test runner runner_per_query_snapshot_matches_fixture -j 4. 0 regressions: every test in kebab-eval (runner / loader / metrics_and_compare) passes. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §11.3 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S8)	2026-05-28 11:51:49 +00:00
altair823	881f949fcb	test(search): regenerate lexical_snapshot_run_1 for V009 BM25 distribution With the V009 FTS5 tokenizer changed from trigram → unicode61 + tokenized_korean_text column, the BM25 IDF computation changes, so fusion_score floating-point values differ slightly (hit order, snippets and structure identical). Regenerated to the V009 baseline the snapshot whose update was missed right after the S1 merge. No regression: `cargo test -p kebab-search --test lexical lexical_snapshot_run_1 -j 4` exit 0. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §11.3 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S1 follow-up via S7 reviewer)	2026-05-28 11:49:15 +00:00
altair823	c5de5f812b	test(fts,app): V009 morphological tokenizer integration tests Added 4 new tests: - crates/kebab-store-sqlite/tests/fts.rs: - fts_v009_korean_morphological_2char_query_hits: a 2-character '한국' query hits a chunk whose tokenized_korean_text column is populated. - fts_v009_english_whole_token_only: pins the regression of V007 trigram substring matching (Path A); a 'token' query gets 0 hits on a 'tokenizer' chunk. - crates/kebab-app/tests/search_korean.rs: - korean_morphological_2char_query_lexical_mode: end-to-end ingest of a Korean wiki fixture → the '한국' / '서울' queries hit. - korean_morphological_mixed_english_korean_query: 'Rust' English whole-token + '최적화' Korean morpheme hit. crates/kebab-search/src/lexical.rs: - build_match_string(): MIN_TRIGRAM_CHARS(3) → MIN_QUERY_CHARS(2). V009 unicode61 has no minimum token length, so 2-character Korean morpheme queries must pass through. A lone single character is still filtered. - Updated 2 related unit tests to the V009 behavior. The fixture text depends on lindera ko-dic's actual segmentation behavior (predicted from prior knowledge in spec Appendix B); fixtures may be adjusted once measured. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §9.1, §9.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S7) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 11:38:52 +00:00
altair823	f94e0c4a9b	feat(app): bump lexical_index_version to V009 (fts5-v009-korean-morphological) The V009 FTS5 tokenizer changed from trigram → unicode61 + a Korean morphological decomposition column. Added the `fts5-v009-korean-morphological` suffix to the lexical_index_version format to distinguish it from the V007 baseline; the eval runner's config_snapshot and search-cache invalidation pick this up automatically. Old format: lex:{chunker_version} New format: lex:{chunker_version}:fts5-v009-korean-morphological No wire schema shape change (only the string content of SearchHit.index_version changes). The lexical_index_version_is_returned_unchanged test uses an arbitrary IndexVersion string, so it is unchanged. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §11.1, §11.3 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S6)	2026-05-28 11:23:13 +00:00
altair823	923b959610	refactor(app): retire short_query_hint helper, keep wire field as None With the V009 unicode61 + morpheme tokenizer, 2-character Korean queries can now hit, so the V007-era "3+ characters recommended" hint is obsolete. SearchResponse.hint stays in the struct to preserve the wire schema and is always None. - kebab-app/src/app.rs: removed the short_query_hint function + doc-comment. The 2 call sites now set hint = None. - kebab-app/src/lib.rs: removed short_query_hint from the re-exports. - kebab-tui/{app.rs,search.rs,run.rs}: removed the short_query_hint field + the 4-call cascade. - kebab-cli/tests/wire_search_response.rs: deleted the search_plain_emits_short_query_hint_to_stderr test. Replaced search_json_emits_hint_field_for_short_query with search_json_hint_absent_for_short_query_v009 (verifies hint is always None). - kebab-search/src/lexical.rs::build_match_string: the V007 trigram multi-token OR-combine branch is redundant under V009 but kept (future extensibility); added a 1-line doc-comment. No wire schema shape change (the hint field at search_response.schema.json:33 is kept; the struct always sets it to None). Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §7.2, §7.3, §11.3 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S5) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 11:13:45 +00:00
altair823	b63af20b72	feat(app): first-boot eager backfill for tokenized_korean_text On a V007 → V009 upgrade, tokenized_korean_text is NULL for existing chunks; the first App::open_with_config call automatically tokenizes them with lindera ko-dic and UPDATEs them. The chunks_au trigger re-indexes chunks_fts automatically, so users do not need to re-ingest. - crates/kebab-store-sqlite/src/store.rs: backfill_tokenized_korean_text(progress_cb, tokenize) API. Commits + progress callback every 1000 rows. Idempotent (the IS NULL filter makes rerunning after a partial completion safe). Takes the tokenizer as a parameter to keep the §8 dependency boundary. - crates/kebab-app/src/app.rs::open_with_config: calls the backfill right after run_migrations. On failure only a warn log is emitted (App open still succeeds; vector/hybrid modes remain usable). Info-log progress every 500 rows. - crates/kebab-store-sqlite/tests/fts.rs: backfill_tokenized_korean_text_populates_nullable_rows unit test (including idempotency). - Fixed pre-existing clippy errors (redundant_closure, map_unwrap_or, cast_lossless, uninlined_format_args in kebab-app/ingest_log.rs, pdf_ocr_apply.rs, app.rs, tests/ocr_inspect_smoke.rs). Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §8.1, §8.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S4) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 11:01:00 +00:00
altair823	e8f44a57e3	fix(test): update first_ingest_bumps_corpus_revision baseline for V009 V004 seeds corpus_revision=0, V009 migration bumps to 1 (spec §5.2 — LRU cache invalidation). Test previously asserted fresh store = 0; now reads post-migration baseline dynamically and verifies that the ingest commit increments past it. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §5.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up)	2026-05-28 10:47:09 +00:00
altair823	4b4a8cbb3a	fix(test): retire V007 trigram-only fts tests, add V009 unicode61 sanity The 3 tests that verified V007 trigram-tokenizer substring matching are obsolete, because V009 unicode61 introduces an intended regression (spec §3 Non-Goals Path A): - fts_trigram_korean_3char_substring_hits: the '발생한' → '발생한다' hit relies on trigram substring matching, so it fails under V009 whole-token matching. - fts_trigram_korean_short_query_zero_hit_pinned: the 0-hit behavior of 2-character Korean queries is resolved by V009's morpheme column, so this pin itself is obsolete (S7 replaces it with a new 2-character hit test). - fts_trigram_english_substring_hits: the 'token' → 'tokenizer' hit fails under V009 unicode61 whole-token-only matching. Added: - fts_v009_unicode61_space_separated_korean_token_hits: sanity check of V009 unicode61 whole-token matching (token '충돌은' hits, substring '발생한' gets 0 hits). This is a baseline separate from the morphological tests S7 will add. S7 (plan §2 Step 7) will add v009_korean_morphological_2char_query_hits + v009_english_whole_token_only to pin both the regression and the new behavior. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §3, §9.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up)	2026-05-28 10:36:32 +00:00
altair823	4dc1c10be1	fix(test): update corpus_revision baseline to post-V009 migration V009 migration bumps corpus_revision by 1 at apply time (spec §5.2 — invalidates pre-V009 LRU search cache). Existing tests assumed V004 seed (0) was the final baseline; updated to expect 1 after migration: - fresh_store_starts_at_zero → fresh_store_starts_at_post_migration_baseline - bump_increments_monotonically: expected 1,2,3 → 2,3,4 (post-baseline) - revision_persists_across_reopen: 2 → 3 (V009 +1, +bump,+bump) Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §5.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up)	2026-05-28 10:33:50 +00:00
altair823	bd86f61c9c	fix(chunk): close S3 reviewer blockers: get_chunk read + AST chunker cascade The S3 spec compliance reviewer (sonnet) found 2 blockers: 1. crates/kebab-store-sqlite/src/documents.rs: the get_chunk SELECT did not read the tokenized_korean_text column, so the DB value was lost on read. Added it to the SELECT column list and the row.get index in the row → Chunk conversion, cascading through the ChunkRow struct + chunk_row_from_sql + get_chunk Chunk construction. 2. crates/kebab-chunk/src/code_*_ast_v1.rs (9 files): make_chunk hardcoded tokenized_korean_text: None, so code files with Korean comments got no FTS hits. Cascaded the tokenize_korean_morphological(text) call using the same pattern as tier2_shared. This commit is the S3 rework, made as a separate commit rather than an amend (keeping the S3 boundary). Satisfies the spec §6.2 invariant ("every chunker calls tokenize right before emitting a chunk"). Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 rework) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 10:30:53 +00:00
altair823	b134ae9dd5	feat(chunk): integrate lindera korean morphological tokenizer Uses lindera ko-dic to produce the morpheme sequence stored in V009's tokenized_korean_text column. It is called in the chunk builder pipeline right after chunk creation, pre-filling the chunk struct field, and the store's put_chunks INSERTs it within a single transaction. - crates/kebab-core/src/chunk.rs: added a tokenized_korean_text: Option<String> field to the Chunk struct (#[serde(default)]). - crates/kebab-chunk/src/lib.rs: tokenize_korean_morphological() helper + OnceLock caching + fallback (None) policy. - crates/kebab-chunk/Cargo.toml: added lindera features = ["embed-ko-dic"] (required to enable DictionaryKind::KoDic). - All chunkers (tier2_shared, md_heading_v1, pdf_page_v1, 9 code AST v1): pre-fill tokenized_korean_text in the Chunk literal. - crates/kebab-store-sqlite/src/documents.rs::put_chunks: updated the INSERT SQL column list + placeholder + binding (12th column). - crates/kebab-chunk/tests/tokenize_korean.rs: 2 unit tests. lindera 3.0.7 API corrections: load_dictionary_from_kind → load_embedded_dictionary, Token.text → Token.surface. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 10:22:15 +00:00
altair823	597d8b70ad	feat(deps): add lindera + lindera-ko-dic for korean morphological tokenizer Adds workspace dependencies only; the actual use is S3's kebab-chunk tokenize_korean_morphological() helper. - Cargo.toml (workspace): added lindera = "3", lindera-ko-dic = "3". - crates/kebab-chunk/Cargo.toml: per-crate dep (the embed-ko-dic feature on lindera-ko-dic enables the embedded KO-DIC dictionary blob). - crates/kebab-app/Cargo.toml: fts_korean_morphological in [features] (spec §6.3 Option A: marker role only, no disable path). License: lindera = MIT, lindera-ko-dic = MIT (confirmed via cargo info). Adopting cargo deny is a P9 follow-up. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.1, §10.1 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 10:03:58 +00:00
altair823	b106120e93	feat(fts): add V009 korean morphological tokenizer migration Adds the V009 migration to fix the V007 trigram tokenizer's limitation of 0 hits for 2-character Korean queries (Bug #8). Approach: revert to the unicode61 tokenizer and pre-fill the Korean morphological decomposition into a separate `tokenized_korean_text` column. - New migrations/V009__fts_korean_morphological.sql: column ADD, chunks_fts DROP + redefinition, CASE expressions in 3 triggers, backfill INSERT, corpus_revision bump. - Updated design §5.5: trigram → unicode61 + morpheme column; CASE-expression trigger bodies. - crates/kebab-store-sqlite/tests/fts.rs: renamed the V007 verbatim test to the V009 source of truth. Added a v009_bumps_corpus_revision unit test. - store.rs: fixed existing clippy bool_to_int_with_if + cast_lossless warnings (pdf_ocr_events-related code, found during S1). English substring matching regresses to V002 behavior (whole-token only); this is stated plainly in spec §3 Non-Goals and the subsequent release notes (v0.20.1). Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S1) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 09:48:46 +00:00
altair823	9a36a06f97	style: cargo fmt --all (v0.20.x logging r2 feature follow-up) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 06:34:01 +00:00
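
Commit fe20be8195 describes the Option β supplement: after segmentation, every Hangul-only morpheme of at least 3 characters also emits its sliding-window 2-grams. The following is a minimal sketch that assumes the morphemes were already produced by a segmenter (the lindera call is omitted). `is_hangul` and its Unicode ranges come from the commit message; `supplement_bigrams` is a hypothetical name, and the real `tokenize_korean_morphological` may be structured differently.

```rust
/// Hangul syllables (U+AC00..D7A3) plus the two jamo ranges named in the commit.
pub fn is_hangul(c: char) -> bool {
    matches!(
        c,
        '\u{AC00}'..='\u{D7A3}' | '\u{1100}'..='\u{11FF}' | '\u{3130}'..='\u{318F}'
    )
}

/// Hypothetical helper: keeps each morpheme and, for Hangul-only morphemes of
/// 3+ characters, appends its overlapping 2-character windows right after it.
pub fn supplement_bigrams(morphemes: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for m in morphemes {
        out.push(m.to_string());
        // Count characters, not bytes: each Hangul syllable is 3 bytes in UTF-8.
        let chars: Vec<char> = m.chars().collect();
        // English, numeric and mixed tokens are skipped to avoid false positives.
        if chars.len() >= 3 && chars.iter().all(|&c| is_hangul(c)) {
            for w in chars.windows(2) {
                out.push(w.iter().collect());
            }
        }
    }
    out
}
```

With the segmentation reported in the commit, `["서울", "특별시"]` becomes `["서울", "특별시", "특별", "별시"]`: `서울` has only 2 characters, so it gets no supplement.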
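
Commit e9b520216e fixes orphan cleanup by deleting the body id, the legacy `{id}#alias` sentinel and every per-alias `{id}#alias#N` up to `max_aliases_per_chunk - 1`. The following is a sketch of that delete set, assuming the helper takes the chunk id and the cap and returns owned strings. The real signature of `alias_sentinel_ids_to_delete` is not shown in the log, and the SQLite side uses a `{id}#alias%` prefix LIKE instead of an enumerated list.

```rust
/// Sketch of the delete set from commit e9b520216e (signature assumed).
pub fn alias_sentinel_ids_to_delete(chunk_id: &str, max_aliases_per_chunk: usize) -> Vec<String> {
    let mut ids = Vec::with_capacity(max_aliases_per_chunk + 2);
    ids.push(chunk_id.to_string()); // body vector
    ids.push(format!("{chunk_id}#alias")); // legacy single sentinel
    for n in 0..max_aliases_per_chunk {
        ids.push(format!("{chunk_id}#alias#{n}")); // per-alias sentinels #0..#max-1
    }
    ids
}
```

Because the cap matches the hard limit in `parse_aliases`, no chunk can have a sentinel outside this enumerated range, which is why a fixed list is sufficient on the LanceDB side.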
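
Commits a8fd76499c and 6ba8cb2c88 say `strip_alias_suffix` accepts both the legacy suffix form and the per-alias form, and that VectorRetriever keeps only the first (highest-score) hit per original chunk. The following is a sketch of both steps. It assumes hits arrive sorted by descending score and treats `#alias` followed by nothing or by `#` plus digits as a sentinel. `strip_and_dedup` is a hypothetical name, and the overfetch (3× k) that precedes this step is not shown.

```rust
use std::collections::HashSet;

const ALIAS_SUFFIX: &str = "#alias";

/// Maps `{id}#alias` and `{id}#alias#N` back to `{id}`; other ids are returned as-is.
pub fn strip_alias_suffix(id: &str) -> &str {
    if let Some(pos) = id.find(ALIAS_SUFFIX) {
        let rest = &id[pos + ALIAS_SUFFIX.len()..];
        let legacy = rest.is_empty();
        let per_alias = rest
            .strip_prefix('#')
            .is_some_and(|n| !n.is_empty() && n.bytes().all(|b| b.is_ascii_digit()));
        if legacy || per_alias {
            return &id[..pos];
        }
    }
    id
}

/// Hypothetical: strips sentinels and keeps the first hit per original chunk_id.
/// Assumes `hits` is sorted by descending score, so "first" means "highest score".
pub fn strip_and_dedup(hits: &[(String, f32)]) -> Vec<(String, f32)> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for (id, score) in hits {
        let base = strip_alias_suffix(id);
        if seen.insert(base) {
            out.push((base.to_string(), *score));
        }
    }
    out
}
```

Keeping the first occurrence, rather than summing scores, means an alias hit can lift a chunk's rank but cannot inflate it beyond its single best match.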
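
Commit bddcd53688 replaces a greedy `trim_start_matches` with `strip_list_marker`, which removes one list marker only when the marker is followed by a space. A greedy trim over a character class such as digits, `-`, `.` and spaces turns "3D 렌더링" into "D 렌더링". The following is a sketch of the stricter rule, covering only the `- ` and `N. ` markers mentioned in the commit. The body is an assumption, not the project's code.

```rust
/// Sketch: strips a single "- " or "N. " list marker; everything else is kept.
pub fn strip_list_marker(line: &str) -> &str {
    let t = line.trim();
    // Bullet: "-" counts only when followed by a space, so "-fast" survives.
    if let Some(rest) = t.strip_prefix("- ") {
        return rest.trim_start();
    }
    // Numbered: one or more ASCII digits immediately followed by ". ".
    // Digits are single-byte, so slicing at `digits` is a valid char boundary.
    let digits = t.bytes().take_while(u8::is_ascii_digit).count();
    if digits > 0 {
        if let Some(rest) = t[digits..].strip_prefix(". ") {
            return rest.trim_start();
        }
    }
    t
}
```

The marker is removed at most once, so an alias such as "- 2단계" loses only the bullet and keeps its leading digit.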
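
Commit b134ae9dd5 mentions that `tokenize_korean_morphological()` uses OnceLock caching with a None fallback. The following sketch illustrates only that pattern. `Segmenter`, `load_segmenter`, the `LOADS` counter and the whitespace split are hypothetical stand-ins, because the real helper loads lindera's embedded ko-dic and its API is not reproduced here. Returning None for text without Hangul is an assumption based on the "None fallback" noted in commit 53ec9b4dc5.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (hypothetical) expensive loader runs.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a loaded morphological tokenizer.
struct Segmenter;

impl Segmenter {
    /// Stand-in segmentation: a whitespace split, NOT morphological analysis.
    fn segment<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }
}

/// Hypothetical loader; a real one could fail and return None.
fn load_segmenter() -> Option<Segmenter> {
    LOADS.fetch_add(1, Ordering::SeqCst);
    Some(Segmenter)
}

/// Initializes the segmenter at most once per process and caches the result,
/// including a failed (None) load, so later calls never retry.
fn segmenter() -> Option<&'static Segmenter> {
    static SEG: OnceLock<Option<Segmenter>> = OnceLock::new();
    SEG.get_or_init(load_segmenter).as_ref()
}

fn has_hangul(s: &str) -> bool {
    s.chars().any(|c| ('\u{AC00}'..='\u{D7A3}').contains(&c))
}

/// Returns None for text without Hangul or if the segmenter is unavailable.
pub fn tokenize_korean_morphological(text: &str) -> Option<String> {
    if !has_hangul(text) {
        return None;
    }
    let seg = segmenter()?;
    Some(seg.segment(text).join(" "))
}
```

Caching the `Option` itself means a failed dictionary load degrades every later call to the None fallback instead of repeatedly paying the load cost.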


574 Commits