docs(v0.20.x): sync README + HANDOFF + ARCH + SKILL + HOTFIXES for V009

V009 한국어 morphological tokenizer 의 사용자 visible surface 변경 +
release notes scope 를 5 docs 에 cascade.

- README.md: kebab search 명령 row 에 한국어 2자 query 지원 명시.
- integrations/claude-code/kebab/SKILL.md: V007 3-char hint 제거 +
  V009 2자 한국어 query 지원 1줄.
- HANDOFF.md: C task status 완료 flip + v0.20.1 release notes scope
  에 본 변경 추가 + 머지 후 발견 summary 행.
- docs/ARCHITECTURE.md: embedding upgrade (e5-small → e5-large),
  lindera-ko-dic FTS5 한국어 지원, version notes 추가.
- tasks/HOTFIXES.md: 2026-05-28 entry — Bug #8 V009 해소, lindera-ko-dic
  실제 crate name (spec deviation), cargo-deny deferred, Path A
  영어 substring 회귀 명시.

Spec: tasks/p9/p9-9-v0.20.x-korean-morphological-tokenizer-spec.md §7.4
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-05-28 11:55:25 +00:00
parent 26f3a7756c
commit d13eb87401
5 changed files with 42 additions and 10 deletions

View File

@@ -60,7 +60,7 @@ Input:
- Cite back to the user as `doc_path § heading_path[-1]` so they can open the source.
- When `truncated: true`, the budget loop modified the page (snippet shortening or k reduction). `next_cursor` is **independent** — non-null whenever more hits may be reachable. Caller may widen `max_tokens` (re-issue same query for fuller snippets / more hits per page) or follow `next_cursor` (advance through more hits) or both. Mismatched cursor (corpus_revision changed) returns `error.v1.code = stale_cursor` — re-issue the search to obtain a fresh one.
- **`trace: true` (p9-fb-37)** — debug aid. Response carries an extra `trace` block: `lexical[]` + `vector[]` (pre-fusion candidates), `rrf_inputs[]` (RRF union before final cut), and `timing` (`lexical_ms`, `vector_ms`, `fusion_ms`, `total_ms`). Trace bypasses the search cache (always cold). Use sparingly — it bloats the wire response and is for diagnosing "why did this hit / not hit", not normal retrieval.
- **`hint` (v0.17.0)** — optional advisory string on `search_response.v1`. Present only when the result is empty AND the trimmed query is shorter than the FTS5 trigram tokenizer's 3-char minimum. Surface it to the user instead of retrying the same short query. Korean lexical search benefits most from ≥3-char keywords (`충돌` zero-hit, `충돌은` substring-hit). Raw FTS5 mode (`'...'`) opts out — the user opted into FTS5 syntax. Vector / hybrid modes carry the field too but it's rarely triggered (semantic embeddings handle short queries).
- **`hint` (v0.17.0, v0.20.1 한국어 2자 지원)** — optional advisory string on `search_response.v1`. Present only when the result is empty AND using the trigram tokenizer in certain edge cases. Korean queries now support 2-character words (`한국`, `서울` both hit). For best results use ≥3-character keywords in general queries. Raw FTS5 mode (`'...'`) opts out — the user opted into FTS5 syntax. Vector / hybrid modes carry the field too but it's rarely triggered (semantic embeddings handle short queries).
- **Column scoping (post-v0.17.1 dogfood)** — default lexical / hybrid matching is scoped to the `text` column only. The `heading_path` column is indexed (path segments like `app`, `src` plus JSON punctuation are 3-gram'd) but excluded from the default MATCH — past JSON noise produced false positives where a query coincidentally shared a 3-gram with a file's heading path. To deliberately search heading paths, escape into raw FTS5 mode with an explicit column filter: `'heading_path : <token>'` (e.g. `kebab search "'heading_path : agent'"`). Same applies to MCP `search` — quote the inner expression. Raw mode bypasses both column scoping and the 3-char `hint` short-circuit.
### `mcp__kebab__bulk_search`