feat(deps): add lindera + lindera-ko-dic for Korean morphological tokenizer

Adds the workspace dependencies only; actual use comes in S3, in the
kebab-chunk tokenize_korean_morphological() helper.

- Cargo.toml (workspace): add lindera = "3" and lindera-ko-dic = "3".
- crates/kebab-chunk/Cargo.toml: per-crate deps (the embed-ko-dic feature
  on lindera-ko-dic enables the embedded KO-DIC dictionary blob).
- crates/kebab-app/Cargo.toml: add fts_korean_morphological under [features]
  (spec §6.3 Option A: marker role only, no disable path).
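
For orientation, the two per-crate changes listed above might look roughly
like the fragment below. This is a sketch, not the committed manifests: the
`workspace = true` inheritance and the empty feature list are assumptions;
only the crate names, the embed-ko-dic feature, and the
fts_korean_morphological feature name come from this commit message.

```toml
# crates/kebab-chunk/Cargo.toml (assumed layout)
[dependencies]
# Inherit the version pinned in the workspace Cargo.toml (assumption).
lindera = { workspace = true }
# embed-ko-dic compiles the KO-DIC dictionary into the binary as a blob.
lindera-ko-dic = { workspace = true, features = ["embed-ko-dic"] }

# crates/kebab-app/Cargo.toml (assumed layout)
[features]
# Marker feature only: it gates no optional dependency (assumption based on
# "marker role only, no disable path").
fts_korean_morphological = []
```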

License: lindera = MIT, lindera-ko-dic = MIT
(confirmed via cargo info). Introducing cargo deny is a P9 follow-up.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.1, §10.1
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 10:03:58 +00:00
parent b106120e93
commit 597d8b70ad
4 changed files with 491 additions and 34 deletions

@@ -203,6 +203,10 @@ ort = { version = "=2.0.0-rc.9", default-features = false, features = [
tokenizers = { version = "0.21", default-features = false, features = ["onig"] }
hf-hub = { version = "0.4", default-features = false, features = ["ureq", "rustls-tls"] }
ndarray = "0.16"
+# Korean morphological tokenizer (FTS v0.20.x, §6.1). lindera-ko-dic bundles
+# the KO-DIC dictionary as an embedded blob via the embed-ko-dic feature.
+lindera = "3"
+lindera-ko-dic = "3"
# Disk-footprint trim for dev / test builds. Codegen, opt-level, and
# behavior are unchanged — only DWARF debug info is reduced (line