b134ae9dd5
feat(chunk): integrate lindera korean morphological tokenizer
...
V009 의 tokenized_korean_text column 에 들어갈 morpheme sequence
를 lindera ko-dic 으로 분해. chunk builder pipeline 의 chunk 생성
직후 시점에서 호출 → chunk struct 의 field 에 pre-fill → store
의 put_chunks 가 단일 transaction 안에서 INSERT.
- crates/kebab-core/src/chunk.rs: Chunk struct 에
tokenized_korean_text: Option<String> field 추가 (#[serde(default)]).
- crates/kebab-chunk/src/lib.rs: tokenize_korean_morphological()
helper + OnceLock 캐싱 + fallback (None) 정책.
- crates/kebab-chunk/Cargo.toml: lindera features = ["embed-ko-dic"]
추가 (DictionaryKind::KoDic 활성화에 필요).
- 모든 chunker (tier2_shared, md_heading_v1, pdf_page_v1, 9개
code AST v1): Chunk 리터럴에 tokenized_korean_text pre-fill.
- crates/kebab-store-sqlite/src/documents.rs::put_chunks: INSERT
SQL column list + placeholder + binding 갱신 (12번째 column).
- crates/kebab-chunk/tests/tokenize_korean.rs: 단위 테스트 2개.
lindera 3.0.7 API 정정: load_dictionary_from_kind →
load_embedded_dictionary, Token.text → Token.surface.
Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-28 10:22:15 +00:00
597d8b70ad
feat(deps): add lindera + lindera-ko-dic for korean morphological tokenizer
...
Workspace dependency 만 추가 — 실제 사용은 S3 의 kebab-chunk
tokenize_korean_morphological() helper.
- Cargo.toml (workspace): lindera = "3", lindera-ko-dic = "3" 추가.
- crates/kebab-chunk/Cargo.toml: per-crate dep (lindera-ko-dic 에
embed-ko-dic feature 로 KO-DIC 딕셔너리 embedded blob 활성화).
- crates/kebab-app/Cargo.toml: [features] 에 fts_korean_morphological
(spec §6.3 Option A — marker role only, disable path 없음).
License: lindera = MIT, lindera-ko-dic = MIT
(cargo info 로 확인). cargo deny 도입은 P9 follow-up.
Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.1, §10.1
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-28 10:03:58 +00:00
710945c4b0
refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b 재작성
...
design §3.7b 의 thin layer (ParsedBlock 류) 가 4 parser 중 1개 (markdown) 만 lift 를
경유하는 현실 — fan-in/fan-out 모두 1 → layer 의미 잃음. kebab-normalize (1097 LOC)
+ kebab-parse-types (98 LOC) 둘을 kebab-parse-md 로 흡수.
설계: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md
플랜: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md
HOTFIXES: tasks/HOTFIXES.md 의 2026-05-26 entry (design deviation)
- 5 사용 type + 3 forward-declared struct → kebab-parse-md::types module 의 pub explicit re-export.
- build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module.
- 4 hard-coded agent literal (lib.rs:122/128/134/153) + warning_agent body return + tracing target literal 모두 보존 — stage label 일관성.
- kebab-app callsite (lib.rs:51 use + :1119 context string) + Cargo.toml 의 2 dep (regular + dead) 제거.
- kebab-chunk + kebab-store-sqlite 의 [dev-dependencies] kebab-normalize → 제거 (kebab-parse-md 로 갈음). 통합 test source 의 use shift.
- test file 이동 (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/).
- workspace Cargo.toml: Hunk (a) members 2 entry 삭제 + Hunk (b) version 0.18.0 → 0.19.0 (frozen contract 변경).
- design §3.7b 4-단락 재작성 (원래 intent 보존 + 현재 상태 + 보존된 surface + future re-extraction trigger).
- design §8 graph 갱신 (3 edge 제거 + 2 forbidden bullet 의미 갱신 + commentary).
- ARCHITECTURE.md crate graph + directory tree mechanical 갱신.
- tasks/INDEX.md L169 closure mention + "Future work / deferred" 섹션 신설 (image/pdf normalize integration entry).
- tasks/HOTFIXES.md 신규 entry (4-block — design deviation Symptom).
- HANDOFF.md cross-link 한 줄.
- 3 dead struct (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) 는 보존 — v0.20+ image/pdf normalize integration 의 future surface (spec §11).
Wire / surface impact: 0건. CLI / TUI / MCP / --json 출력 / config / XDG path /
parser_version 모두 unchanged. wire-invisible provenance.events[].agent + tracing target
literal "kb-normalize" 도 보존 — old DB row 와 new DB row 의 audit log 일관성.
Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks).
cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warning (5m 46s).
cargo metadata --no-deps --format-version 1 | jq '.workspace_members | length' = 22.
cargo tree -p kebab-app --depth 2 | grep -E "kebab_(parse_types|normalize)" = 0 줄.
2026-05-26 15:00:59 +00:00
7c85de065a
chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix
...
cut PR v0.18.0 전 마지막 정리. 사용자 요청: "전체 코드베이스를 깔끔하고 알아보기 쉽게".
## Workspace lints
- `Cargo.toml` 의 `[workspace.lints.clippy]` 에 `pedantic = "warn"` (priority -1) + 의도적 allow-list 추가:
- cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss — ONNX i64 / hash modular reduction 등 의도적 truncation.
- doc_markdown / missing_errors_doc / missing_panics_doc — cosmetic doc style.
- too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names — informational only.
- format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps — intentional patterns (exhaustive match, future Result variants 등).
- default_trait_access — `Foo::default()` 가 idiomatic.
- float_cmp — NLI / RRF score 의 explicit threshold 비교 의도.
- struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason — domain-specific 의도.
- format_push_string / return_self_not_must_use / match_same_arms — builder / wire-label / hot-path 패턴 보존.
- needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option — remaining low-value style.
- 각 24 crate `Cargo.toml` 에 `[lints] workspace = true` 추가.
## Auto-fix
`cargo clippy --workspace --all-targets --fix` 적용 — 128 files changed, 552 insertions / 472 deletions. 주로:
- uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`.
- redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`.
- 그 외 mechanical refactor.
## 검증
- `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + 모든 lint group).
- `cargo test --workspace --no-fail-fast -j 1` — **1293 tests pass + 1 pre-existing flaky fail** (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, cleanup 무관). 회귀 0.
Wire 영향: 없음.
Behavior 영향: 없음 (mechanical refactor only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-26 03:01:58 +00:00
840c6c40a6
test(p10-1d-followup): cpp snapshot exercises actual CppAstExtractor
...
Reviewer nit #3 : the hand-built fixed_doc() only verified chunker 1:1
mapping. New tests invoke CppAstExtractor against tests/fixtures/sample.cpp
and snapshot the real extractor → chunker pipeline (14 blocks emitted
covering namespace::chunk::Class, ctor/dtor/operator/template/free-fn
convention, glue <top-level> blocks between units).
Adds kebab-parse-code as a dev-dep of kebab-chunk (same precedent as
kebab-parse-md). Both the existing hand-built test AND the new
extractor-driven tests are kept — the former for fast chunker-only
validation, the latter for end-to-end regression detection.
Added tests:
- code_cpp_ast_extractor_snapshot: asserts all 8 named symbol units are present
- code_cpp_ast_extractor_chunks_deterministic: chunker output is stable
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-21 22:43:57 +00:00
077f92f41e
build(p10-2): add serde_yaml dep to kebab-chunk for k8s-manifest-resource-v1
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-20 12:57:06 +00:00
911fb49550
refactor(rename): kb crates → kebab — Cargo packages, folders, Rust modules
...
프로젝트 이름 `kb` → `kebab` rename 의 첫 단계.
- workspace `Cargo.toml`: members `crates/kb-*` → `crates/kebab-*`,
repository URL `altair823/kb` → `altair823/kebab`.
- 18 crate 폴더 rename via `git mv` (history 보존).
- 각 crate `Cargo.toml`: `name = "kb-*"` → `"kebab-*"`, path deps
`../kb-*` → `../kebab-*`.
- 모든 `.rs`: `kb_<id>` snake-case 모듈 path 18 개 (`kb_core`,
`kb_config`, `kb_app`, `kb_cli`, `kb_eval`, `kb_search`, `kb_chunk`,
`kb_normalize`, `kb_source_fs`, `kb_parse_md`, `kb_parse_types`,
`kb_store_sqlite`, `kb_store_vector`, `kb_embed`, `kb_embed_local`,
`kb_llm`, `kb_llm_local`, `kb_rag`) → `kebab_<id>` 일괄 sed (단어
경계 \\b 사용해 영어 문장 안의 "kb" 약어 미오염).
CLI binary 이름 (`[[bin]] name = "kb"`), 환경변수 `KB_*`, XDG paths,
tracing target, 그리고 docs sweep 은 다음 commit 에서.
## 검증
- `cargo check --workspace` clean — 모든 crate 빌드 통과 후 commit.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 03:28:08 +00:00