kebab

Author	SHA1	Message	Date
altair823	b134ae9dd5	feat(chunk): integrate lindera korean morphological tokenizer Uses lindera ko-dic to decompose text into the morpheme sequence stored in V009's tokenized_korean_text column. Called in the chunk builder pipeline right after each chunk is created → pre-fills a field on the Chunk struct → the store's put_chunks INSERTs it inside a single transaction. - crates/kebab-core/src/chunk.rs: add a tokenized_korean_text: Option<String> field to the Chunk struct (#[serde(default)]). - crates/kebab-chunk/src/lib.rs: tokenize_korean_morphological() helper + OnceLock caching + fallback (None) policy. - crates/kebab-chunk/Cargo.toml: add lindera features = ["embed-ko-dic"] (required to enable DictionaryKind::KoDic). - All chunkers (tier2_shared, md_heading_v1, pdf_page_v1, the 9 code AST v1 chunkers): pre-fill tokenized_korean_text in Chunk literals. - crates/kebab-store-sqlite/src/documents.rs::put_chunks: update the INSERT SQL column list + placeholders + bindings (12th column). - crates/kebab-chunk/tests/tokenize_korean.rs: 2 unit tests. lindera 3.0.7 API corrections: load_dictionary_from_kind → load_embedded_dictionary, Token.text → Token.surface. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.2 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 10:22:15 +00:00
altair823	685007789a	style: cargo fmt --all (round 4 ingest log feature follow-up) The Phase C4 executor's last commit, `fix(test): clippy + fmt fixes`, applied fmt to the test file only. Found that workspace-wide fmt had been missed → applied cargo fmt --all. All imports reordered alphabetically + line wrapping made consistent. Also commits previously untracked artifacts: - docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 lines, ACCEPT) - docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 lines, ACCEPT) workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 04:18:40 +00:00
altair823	710945c4b0	refactor(parse-md): absorb kebab-normalize + kebab-parse-types (24 → 22 crates) + rewrite §3.7b In practice, only 1 of the 4 parsers (markdown) goes through the lift in design §3.7b's thin layer (the ParsedBlock family); with fan-in and fan-out both equal to 1, the layer has lost its meaning. Absorbs both kebab-normalize (1097 LOC) and kebab-parse-types (98 LOC) into kebab-parse-md. Design: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md Plan: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md HOTFIXES: the 2026-05-26 entry in tasks/HOTFIXES.md (design deviation) - 5 used types + 3 forward-declared structs → explicit pub re-exports in the kebab-parse-md::types module. - build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module. - The 4 hard-coded agent literals (lib.rs:122/128/134/153), the warning_agent body's return value, and the tracing target literal are all preserved, keeping stage labels consistent. - Removed the kebab-app callsites (lib.rs:51 use + :1119 context string) and the 2 deps (regular + dead) from its Cargo.toml. - Removed kebab-normalize from the [dev-dependencies] of kebab-chunk and kebab-store-sqlite (replaced by kebab-parse-md); `use` paths in the integration test sources shifted accordingly. - Moved the test file (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/). - workspace Cargo.toml: Hunk (a) deletes 2 members entries + Hunk (b) bumps version 0.18.0 → 0.19.0 (frozen contract change). - Rewrote design §3.7b as 4 paragraphs (original intent preserved + current state + preserved surface + future re-extraction trigger). - Updated the design §8 graph (3 edges removed + meaning of 2 forbidden bullets updated + commentary). - Mechanically updated the ARCHITECTURE.md crate graph + directory tree. - tasks/INDEX.md: closure mention at L169 + new "Future work / deferred" section (image/pdf normalize integration entry). - New tasks/HOTFIXES.md entry (4 blocks; Symptom: design deviation). - One-line cross-link in HANDOFF.md. - The 3 dead structs (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) are kept as the future surface for v0.20+ image/pdf normalize integration (spec §11). Wire / surface impact: none. CLI / TUI / MCP / --json output / config / XDG paths / parser_version are all unchanged. The wire-invisible provenance.events[].agent and the tracing target literal "kb-normalize" are also preserved, so audit logs stay consistent between old and new DB rows. Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks). cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warnings (5m 46s). cargo metadata --no-deps --format-version 1 | jq '.workspace_members | length' = 22. cargo tree -p kebab-app --depth 2 | grep -E "kebab_(parse_types|normalize)" = 0 lines.	2026-05-26 15:00:59 +00:00
altair823	7c85de065a	chore: workspace-wide cleanup: clippy::pedantic baseline + auto-fix Final cleanup before the v0.18.0 cut PR. User request: "make the entire codebase clean and easy to read". ## Workspace lints - Added `pedantic = "warn"` (priority -1) plus an intentional allow-list to `[workspace.lints.clippy]` in `Cargo.toml`: - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss: intentional truncation (ONNX i64, hash modular reduction, etc.). - doc_markdown / missing_errors_doc / missing_panics_doc: cosmetic doc style. - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names: informational only. - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps: intentional patterns (exhaustive match, future Result variants, etc.). - default_trait_access: `Foo::default()` is idiomatic. - float_cmp: intentional explicit threshold comparisons on NLI / RRF scores. - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason: intentional, domain-specific. - format_push_string / return_self_not_must_use / match_same_arms: preserve builder / wire-label / hot-path patterns. - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option: remaining low-value style. - Added `[lints] workspace = true` to the `Cargo.toml` of each of the 24 crates. ## Auto-fix Applied `cargo clippy --workspace --all-targets --fix`: 128 files changed, 552 insertions / 472 deletions. Mainly: - uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`. - redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`. - Other mechanical refactors. ## Verification - `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + all lint groups). - `cargo test --workspace --no-fail-fast -j 1`: 1293 tests pass + 1 pre-existing flaky failure (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, unrelated to this cleanup). 0 regressions. Wire impact: none. Behavior impact: none (mechanical refactor only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 03:01:58 +00:00
altair823	1969c8e3b5	fix(dogfood): k8s multi-resource YAML chunk_id collision P10 dogfooding found that a k8s manifest with 2+ documents (e.g. Deployment + Service in one file) fails to ingest: UNIQUE constraint failed: chunks.chunk_id Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per resource; with split_key None every resource from the same document gets the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's code_text_paragraph_v1 had the same bug (fixed in `df3c5b8`) but it calls build_chunk_no_symbol directly — the push_chunks_with_oversize path was never fixed. Fix: push_chunks_with_oversize gains a base_split_key parameter for the non-oversize single-chunk case. k8s chunker passes Some(resource.line_start) so each resource gets a distinct chunk_id; dockerfile / manifest pass None (1 chunk per file — no sibling collision, chunk_id stays stable). Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts chunk_id distinctness; new integration test tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real 2-document YAML end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 23:49:37 +00:00
altair823	840c6c40a6	test(p10-1d-followup): cpp snapshot exercises actual CppAstExtractor Reviewer nit #3: the hand-built fixed_doc() only verified chunker 1:1 mapping. New tests invoke CppAstExtractor against tests/fixtures/sample.cpp and snapshot the real extractor → chunker pipeline (14 blocks emitted covering namespace::chunk::Class, ctor/dtor/operator/template/free-fn convention, glue <top-level> blocks between units). Adds kebab-parse-code as a dev-dep of kebab-chunk (same precedent as kebab-parse-md). Both the existing hand-built test AND the new extractor-driven tests are kept — the former for fast chunker-only validation, the latter for end-to-end regression detection. Added tests: - code_cpp_ast_extractor_snapshot: asserts all 8 named symbol units are present - code_cpp_ast_extractor_chunks_deterministic: chunker output is stable Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 22:43:57 +00:00
altair823	b2a2902e38	feat(p10-1d): code-cpp-ast-v1 chunker + snapshot test Identical chunker body to code-c-ast-v1 (per-language work happens in the CppAstExtractor, Task C). Snapshot fixture covers nested namespace + class + ctor/dtor + method + operator overload + template fn + free fn + top-level main, verifying namespace::Class::method symbol convention per design §3.4. 5 chunks emitted: - <top-level> (includes, namespace opening) - kebab::chunk::MdHeadingV1Chunker (class unit) - kebab::identity (template function) - kebab::global_helper (free function in namespace) - main (top-level main function) Template function symbols emit without <T> parameters per spec convention. Namespace::Class::method pattern verified. All tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:46:12 +00:00
altair823	03cd41c48f	feat(p10-1d): code-c-ast-v1 chunker + snapshot test Mirrors code-go-ast-v1's chunker pattern. Snapshot test against tests/fixtures/sample.c (function + typedef struct + typedef enum + preprocessor) verifies symbol list + lang=c stamping. Chunks produced (4 total): - <top-level> glue: includes, defines, static vars, typedefs (lines 1-18) - parse_record function (lines 20-23) - print_record function (lines 25-27) - main function (lines 29-33) All chunks stamped with lang=c and chunker_version=code-c-ast-v1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:41:19 +00:00
altair823	0b7d8af759	feat(p10-3): code-text-paragraph-v1 chunker — paragraph + line-window fallback Blank-line paragraph segmentation (whitespace-only lines as boundaries, blank lines themselves never in any chunk's range). Paragraphs > 80 lines split into 80-line windows with 20-line overlap (stride 60), sharing the input lang and symbol=None per spec §9.3. tier2_shared exposes a new build_chunk_no_symbol helper so Chunk id/hash/token semantics stay identical with Tier 1/2. Extracts build_chunk_from_span as private core so build_chunk and build_chunk_no_symbol share mechanics without drift. 4 unit tests cover multi-paragraph shell (4 paragraphs, blank-line boundaries verified), 200-line oversize line-window split (chunks 1-80 / 61-140 / 121-200), empty file, and lang preservation when input is yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 11:22:48 +00:00
altair823	75c1c7b911	test(p10-2-followup): cover tier2_shared oversize fallback with >200-line k8s ConfigMap The risks section of spec p10-2 calls out a "huge ConfigMap", but no test exercised the line-window split branch of tier2_shared::push_chunks_with_oversize. This adds a 256-line ConfigMap fixture (generated inline) and asserts: - ≥2 chunks emitted (split happened), - all chunks share symbol `ConfigMap/prod/big`, - chunk_ids all distinct (id_for_chunk's #L{k} suffix disambiguation), - line ranges form a contiguous partition (prev.line_end + 1 == next.line_start). Reviewer nit #1 (PR #153 code-reviewer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 14:41:16 +00:00
altair823	22d4161728	feat(p10-2): manifest-file-v1 chunker (whole-file 1 chunk, symbol <manifest>) Emits 1 Chunk per manifest file (Cargo.toml / pyproject.toml / package.json / tsconfig.json / pom.xml / build.gradle / go.mod). Symbol unified to "<manifest>"; manifest type distinguished by code_lang (toml / json / xml / groovy / go-mod) read from Block::Code.lang. Oversize >200 lines splits via tier2_shared::push_chunks_with_oversize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:11:46 +00:00
altair823	51004ac593	feat(p10-2): dockerfile-file-v1 chunker (whole-file 1 chunk, symbol <dockerfile>) Reads entire Dockerfile / Dockerfile.* / *.dockerfile content and emits a single Chunk with symbol "<dockerfile>", code_lang "dockerfile", line range 1..EOF. Oversize >200 lines splits into line-windows sharing the symbol via tier2_shared::push_chunks_with_oversize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:09:13 +00:00
altair823	8996e73282	feat(p10-2): k8s-manifest-resource-v1 chunker + tier2_shared helper Splits multi-document YAML by ^---\s*$, requires apiVersion + kind string fields per document, emits 1 chunk per recognized k8s resource. Symbol = <kind>/<namespace>/<name> or <kind>/<name> (cluster-scoped). Invalid YAML returns 0 chunks (handled by p10-3 paragraph fallback). Oversize >200 lines splits into line-windows sharing the same symbol. tier2_shared module hosts the oversize fallback + Chunk-construction helper mirroring code_rust_ast_v1's Chunk shape. Task E (dockerfile) and Task F (manifest) will reuse it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:06:47 +00:00
altair823	813bdd1a16	test(p10-1c-jk): code-java-ast-v1 + code-kotlin-ast-v1 chunker snapshots Mirrors code_go_ast_snapshot pattern. In-memory CanonicalDocument (no kebab-parse-code dep — boundary §6.3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:57:37 +00:00
altair823	ab288135e9	test(p10-1c-go): code-go-ast-v1 chunker snapshot + full-suite gate Mirrors code_python_ast_snapshot / code_ts_ast_snapshot patterns. In-memory CanonicalDocument (no kebab-parse-code dep; boundary §6.3 respected). verify: - cargo test -p kebab-chunk --test code_go_ast_snapshot → 2/2 - cargo test --workspace --no-fail-fast -j 1 → 0 failures (all green) - cargo clippy --workspace --all-targets -- -D warnings → clean - SMOKE: confirmed chunk.ParseDoc symbol + code_lang_breakdown {"go": 1} Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 09:54:17 +00:00
altair823	d6bb6cfd3b	test(p10-1b): per-language chunker snapshots (python/ts/js) Mirrors code_rust_ast_snapshot pattern. In-memory CanonicalDocument build so no kebab-parse-code dep (boundary §6.3 respected). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:39:17 +00:00
altair823	97e9f558f4	test(p10-1a-2): code-rust-ast-v1 chunker snapshot + full-suite gate Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 22:14:57 +00:00
altair823	911fb49550	refactor(rename): kb crates → kebab: Cargo packages, folders, Rust modules First step of renaming the project from `kb` to `kebab`. - workspace `Cargo.toml`: members `crates/kb-` → `crates/kebab-`, repository URL `altair823/kb` → `altair823/kebab`. - Renamed 18 crate folders via `git mv` (history preserved). - Each crate's `Cargo.toml`: `name = "kb-"` → `"kebab-"`, path deps `../kb-` → `../kebab-`. - All `.rs` files: bulk sed of the 18 snake-case module paths `kb_<id>` (`kb_core`, `kb_config`, `kb_app`, `kb_cli`, `kb_eval`, `kb_search`, `kb_chunk`, `kb_normalize`, `kb_source_fs`, `kb_parse_md`, `kb_parse_types`, `kb_store_sqlite`, `kb_store_vector`, `kb_embed`, `kb_embed_local`, `kb_llm`, `kb_llm_local`, `kb_rag`) → `kebab_<id>` (using the word boundary `\b` so the "kb" abbreviation inside English sentences is left untouched). The CLI binary name (`[[bin]] name = "kb"`), the `KB_*` environment variables, XDG paths, tracing targets, and the docs sweep follow in the next commit. ## Verification - `cargo check --workspace` clean: committed only after every crate built. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 03:28:08 +00:00

18 Commits
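Commit 0b7d8af759 describes the `code-text-paragraph-v1` fallback: whitespace-only lines separate paragraphs, and any paragraph longer than 80 lines becomes 80-line windows with a 20-line overlap (stride 60). The Rust sketch below reproduces only that line arithmetic. `LineSpan`, `paragraphs`, `windows`, and `chunk_spans` are hypothetical names, not the kebab-chunk API, and the real chunker also builds full `Chunk` values (ids, hashes, token counts) that are omitted here.

```rust
/// A 1-based, inclusive line range within a file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineSpan {
    pub start: usize,
    pub end: usize,
}

/// Paragraphs longer than this are split into windows (80 per commit 0b7d8af759).
const MAX_LINES: usize = 80;
/// Lines shared by consecutive windows (20, giving a stride of 60).
const OVERLAP: usize = 20;

/// Splits `text` into paragraphs at whitespace-only lines.
/// Boundary lines themselves are never included in any span.
fn paragraphs(text: &str) -> Vec<LineSpan> {
    let mut spans = Vec::new();
    let mut open: Option<usize> = None;
    let mut last_line = 0;
    for (idx, line) in text.lines().enumerate() {
        let lineno = idx + 1;
        last_line = lineno;
        if line.trim().is_empty() {
            // A whitespace-only line closes the current paragraph, if any.
            if let Some(start) = open.take() {
                spans.push(LineSpan { start, end: lineno - 1 });
            }
        } else if open.is_none() {
            open = Some(lineno);
        }
    }
    if let Some(start) = open {
        spans.push(LineSpan { start, end: last_line });
    }
    spans
}

/// Splits one paragraph into overlapping windows when it exceeds MAX_LINES.
fn windows(span: LineSpan) -> Vec<LineSpan> {
    if span.end - span.start + 1 <= MAX_LINES {
        return vec![span];
    }
    let stride = MAX_LINES - OVERLAP;
    let mut out = Vec::new();
    let mut start = span.start;
    loop {
        let end = (start + MAX_LINES - 1).min(span.end);
        out.push(LineSpan { start, end });
        if end == span.end {
            break;
        }
        start += stride;
    }
    out
}

/// Paragraph segmentation followed by the line-window fallback.
pub fn chunk_spans(text: &str) -> Vec<LineSpan> {
    paragraphs(text).into_iter().flat_map(windows).collect()
}
```

For a single 200-line paragraph this yields lines 1-80, 61-140, and 121-200, the same ranges the commit lists for its oversize unit test.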
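Commit 8996e73282 splits multi-document YAML on lines matching `^---\s*$` and names each resource `<kind>/<namespace>/<name>`, or `<kind>/<name>` for cluster-scoped resources. Below is a dependency-free Rust sketch of just those two steps; it is an illustration, not the kebab-chunk code. Real YAML parsing (including the `apiVersion` + `kind` check) is out of scope, and `is_separator`, `split_documents`, and `resource_symbol` are hypothetical names.

```rust
/// Returns true for a YAML document separator line, i.e. one matching `^---\s*$`.
fn is_separator(line: &str) -> bool {
    line.strip_prefix("---")
        .is_some_and(|rest| rest.chars().all(char::is_whitespace))
}

/// Splits multi-document YAML into `(start_line, body)` pairs (1-based).
/// Separator lines are dropped and whitespace-only documents are skipped.
pub fn split_documents(yaml: &str) -> Vec<(usize, String)> {
    let mut docs = Vec::new();
    let mut current = String::new();
    let mut start = 1;
    for (idx, line) in yaml.lines().enumerate() {
        if is_separator(line) {
            let body = std::mem::take(&mut current);
            if !body.trim().is_empty() {
                docs.push((start, body));
            }
            // The next document starts on the line after the separator.
            start = idx + 2;
        } else {
            current.push_str(line);
            current.push('\n');
        }
    }
    if !current.trim().is_empty() {
        docs.push((start, current));
    }
    docs
}

/// `<kind>/<namespace>/<name>`, or `<kind>/<name>` for cluster-scoped resources.
pub fn resource_symbol(kind: &str, namespace: Option<&str>, name: &str) -> String {
    match namespace {
        Some(ns) => format!("{kind}/{ns}/{name}"),
        None => format!("{kind}/{name}"),
    }
}
```

Keeping each document's starting line number matters later: per commit 1969c8e3b5, the k8s chunker passes `Some(resource.line_start)` as the split key that keeps sibling chunk ids apart.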
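Commit 1969c8e3b5 traces a `UNIQUE constraint failed: chunks.chunk_id` error to every resource in one YAML file being hashed with `split_key = None`. The sketch below shows the mechanism with a hypothetical id scheme (document id + policy name + optional split key, hashed with std's `DefaultHasher`); kebab's actual hash function and id format do not appear in the log. Supplying each resource's first line as the key, as the fix does, makes the ids distinct.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical chunk-id scheme: hash of (document id, policy, optional split key).
/// DefaultHasher only keeps this sketch dependency-free; its output is not
/// guaranteed stable across Rust releases, so persisted ids need a fixed hash.
pub fn chunk_id(doc_id: &str, policy: &str, split_key: Option<usize>) -> String {
    let mut hasher = DefaultHasher::new();
    doc_id.hash(&mut hasher);
    policy.hash(&mut hasher);
    split_key.hash(&mut hasher);
    format!("{doc_id}#{:016x}", hasher.finish())
}

/// One id per resource. With `use_line_start == false` every resource gets
/// split_key None, reproducing the collision described in the commit.
pub fn resource_ids(doc_id: &str, line_starts: &[usize], use_line_start: bool) -> Vec<String> {
    line_starts
        .iter()
        .map(|&start| {
            let key = use_line_start.then_some(start);
            chunk_id(doc_id, "k8s-manifest-resource-v1", key)
        })
        .collect()
}

/// True when all ids are distinct, which a UNIQUE constraint on chunk_id requires.
pub fn all_distinct(ids: &[String]) -> bool {
    ids.iter().collect::<HashSet<_>>().len() == ids.len()
}
```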
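Commit b134ae9dd5 adds a `tokenize_korean_morphological()` helper with OnceLock caching and a `None` fallback. The sketch below illustrates that caching pattern with `std::sync::OnceLock`. The `Tokenizer` type is a hypothetical whitespace-splitting stand-in that does not call lindera, and joining token surfaces with spaces (plus returning `None` for input with no tokens) is an assumption about the column format, not something the log specifies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (expensive) load runs; for demonstration only.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the lindera tokenizer: it splits on whitespace
/// instead of doing morphological analysis with ko-dic.
struct Tokenizer;

impl Tokenizer {
    fn load() -> Result<Self, String> {
        LOADS.fetch_add(1, Ordering::SeqCst);
        Ok(Tokenizer)
    }

    fn surfaces(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_owned).collect()
    }
}

/// Initialized at most once; `None` records a failed load so it is not retried.
static TOKENIZER: OnceLock<Option<Tokenizer>> = OnceLock::new();

/// Returns space-joined token surfaces, or `None` when the tokenizer is
/// unavailable or produces no tokens (the "fallback (None)" policy).
pub fn tokenize_korean_morphological(text: &str) -> Option<String> {
    let tokenizer = TOKENIZER.get_or_init(|| Tokenizer::load().ok()).as_ref()?;
    let tokens = tokenizer.surfaces(text);
    if tokens.is_empty() {
        None
    } else {
        Some(tokens.join(" "))
    }
}
```

In this sketch, caching `Option<Tokenizer>` rather than `Tokenizer` means a failed load is remembered as well, so later chunks fall back to `None` without retrying the load.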
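Commits 8996e73282, 51004ac593, and 22d4161728 split files over 200 lines into line windows that share one symbol, and test commit 75c1c7b911 checks that those windows form a contiguous partition with distinct `#L{k}` id suffixes. The log states neither the window length nor what `k` is, so this Rust sketch assumes non-overlapping 200-line windows and `k` = the window's first line. `Piece`, `split_oversize`, `piece_id`, and `is_contiguous` are hypothetical names, and the sketch assumes `line_start <= line_end`.

```rust
/// Ranges longer than this are split (">200 lines" per the tier2 chunker commits).
const OVERSIZE_LINES: usize = 200;
/// Assumption: the window length is not stated in the log; 200 is used here.
const WINDOW_LINES: usize = 200;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Piece {
    pub symbol: String,
    pub line_start: usize,
    pub line_end: usize,
}

/// Splits an inclusive, 1-based line range into non-overlapping windows that
/// all share `symbol`. Ranges at or under the threshold stay whole.
pub fn split_oversize(symbol: &str, line_start: usize, line_end: usize) -> Vec<Piece> {
    if line_end - line_start + 1 <= OVERSIZE_LINES {
        return vec![Piece { symbol: symbol.to_owned(), line_start, line_end }];
    }
    (line_start..=line_end)
        .step_by(WINDOW_LINES)
        .map(|start| Piece {
            symbol: symbol.to_owned(),
            line_start: start,
            line_end: (start + WINDOW_LINES - 1).min(line_end),
        })
        .collect()
}

/// Hypothetical id suffix `#L{k}`, with k = the window's first line.
pub fn piece_id(base: &str, piece: &Piece) -> String {
    format!("{base}#L{}", piece.line_start)
}

/// The contiguity property asserted by the ConfigMap regression test.
pub fn is_contiguous(pieces: &[Piece]) -> bool {
    pieces.windows(2).all(|w| w[0].line_end + 1 == w[1].line_start)
}
```

For the 256-line ConfigMap case this produces two pieces, lines 1-200 and 201-256, both carrying the symbol `ConfigMap/prod/big`.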