feat(p10-1C-JK): Java + Kotlin AST chunkers: enable code indexing for the JVM family #152
Summary
Second half of 1C: a bundled PR for the JVM family (Java + Kotlin), sibling to 1C-Go (PR #151 / v0.12.0). Once merged, `.java` / `.kt` / `.kts` files can be dogfooded. This completes the 1C phase.
- frozen design: §1C, §3.4 Java/Kotlin row (`package.Class.method`).
- task spec (frozen on merge): `tasks/p10/p10-1c-jk-ast-chunker.md`.
- plan: `docs/superpowers/plans/2026-05-20-p10-1c-jk-ast-chunker.md`.
Frozen design decisions
- `package` declaration (design §3.4, same model as 1C-Go): Java `package_declaration` → `scoped_identifier` / `identifier`; Kotlin `package_header` → `qualified_identifier`.
- Symbols take the form `package.Class.method` (e.g. `com.kebab.chunk.MdHeadingV1Chunker.chunkDoc`), following the class-nesting pattern of 1B Python.
- `tree-sitter-kotlin = "0.3.8"` supports only tree-sitter 0.21-0.23, so it is incompatible with 0.26. `tree-sitter-kotlin-ng` v1.1.0 (the maintained fork in the tree-sitter-grammars org, maintained by Amaan Qureshi) is compatible with 0.24+ through the `tree-sitter-language = "0.1"` shim.
Java key features
- The root node is `program` (unlike `source_file` in 1C-Go).
- Comments are `line_comment` / `block_comment` (unlike Go's `comment`); `unit_start` folds both variants.
- Constructor symbols are `<pkg>.<Class>.<ClassName>` (Java's class-name-as-method-name convention). `compact_constructor_declaration` (record canonical constructor) is handled the same way.
Kotlin key features (8 grammar quirks handled)
- The root node is `source_file`.
- `qualified_identifier` for the package (unlike Java's `scoped_identifier`).
- A single `class_declaration` covers class / interface / enum class / data class / sealed class, distinguished by modifier. A `class_modifier` whose text is `"enum"` is detected so that enum body recursion is skipped.
- There is no `body` field; `class_body` (regular) / `enum_class_body` (enum) are looked up directly.
- `companion_object` is its own node; the name is optional and defaults to `Companion`. Symbol: `<pkg>.<Class>.Companion`, methods: `<pkg>.<Class>.Companion.<method>`.
- Top-level `function_declaration` is supported (Kotlin-specific) → `<pkg>.<fn>`. Inside a class → `<pkg>.<Class>.<method>` (same node kind, different `mod_path`).
- `secondary_constructor` has no name field, so it inherits the class name (preserving the Java duplication convention).
- `import` (singular, no `_declaration` suffix) is treated as glue.
Changes
- Dependencies: `tree-sitter-java = "0.23"`, `tree-sitter-kotlin-ng = "1.1"`.
- `kebab-parse-code`: `java.rs` (461 lines) + `kotlin.rs` (~470 lines) + 2 fixtures.
- `kebab-chunk`: `code-java-ast-v1` + `code-kotlin-ast-v1` chunkers (near-duplicates of 1A-2).
- `kebab-source-fs`: `.java` / `.kt` / `.kts` routing.
- `kebab-app`: dispatch allowlist + 4 match arms × 2 langs enabled.
- Version: `0.12.0 → 0.13.0` (minor: surface expansion).
Verification
- `cargo test --workspace --no-fail-fast -j 1` → exit 0, all green
- `cargo clippy --workspace --all-targets -- -D warnings` → clean
- Symbols: `com.foo.Foo.bar`, `com.foo.Bar.baz`, `com.foo.freeFn` ✅ (demonstrates the Kotlin-specific behavior)
- `code_lang_breakdown: { "java": 1, "kotlin": 1 }`
Impact
`Citation::Code` is unchanged; the `lang` values are extended with `"java"` / `"kotlin"`.
1C phase complete
Next phase: p10-1D (C + C++).
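To illustrate the symbol scheme this PR describes (`<pkg>.<Class>.<method>`, constructors reusing the class name, and the `Companion` default), here is a minimal Rust sketch. The function names and signatures are hypothetical and are not taken from `kebab-parse-code`; only the resulting symbol shapes come from the PR description.

```rust
/// Joins an optional package path, the enclosing type names (`mod_path`),
/// and a member name into a dotted symbol such as `com.foo.Foo.bar`.
/// Hypothetical helper: the real extractor's API is not shown in this PR.
fn qualified_symbol(package: Option<&str>, mod_path: &[&str], name: &str) -> String {
    let mut parts: Vec<&str> = Vec::new();
    if let Some(pkg) = package {
        if !pkg.is_empty() {
            parts.push(pkg);
        }
    }
    parts.extend_from_slice(mod_path);
    parts.push(name);
    parts.join(".")
}

/// Constructors have no name of their own, so the innermost enclosing class
/// name is reused, giving `<pkg>.<Class>.<Class>`.
/// Returns `None` when there is no enclosing class.
fn constructor_symbol(package: Option<&str>, mod_path: &[&str]) -> Option<String> {
    let class_name = mod_path.last().copied()?;
    Some(qualified_symbol(package, mod_path, class_name))
}

/// A Kotlin companion object without an explicit name falls back to `Companion`.
fn companion_name(explicit: Option<&str>) -> &str {
    explicit.unwrap_or("Companion")
}
```

Under these assumptions, a top-level Kotlin function is just the case where `mod_path` is empty, which is why the same helper can serve both `<pkg>.<fn>` and `<pkg>.<Class>.<method>`.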
🤖 Generated with Claude Code
Uses tree-sitter-kotlin-ng (bare tree-sitter-kotlin is stuck on tree-sitter 0.21-0.23, incompatible with our 0.26). Mirrors JavaAstExtractor (JVM family, source-side package extraction + class-nesting) with Kotlin grammar quirks:
- Root is `source_file`, not `program`.
- `package_header` child is `qualified_identifier` (its slice text is the dotted path); the bare `identifier` shape is also accepted as a fallback.
- `class_declaration` is the single node kind for `class` / `data class` / `sealed class` / `interface` / `enum class`, distinguished only by its `modifiers` child. Body is `class_body` for non-enum, `enum_class_body` for enum class; neither carries a `body` field name, so the extractor looks the body up by node kind rather than `child_by_field_name("body")`.
- `companion_object` is its own node kind (NOT object_declaration with a modifier); its `name` field is optional, so the extractor fills in the implicit Kotlin convention name `Companion`.
- `function_declaration` is allowed at top level (unlike Java), emitted as `<pkg>.<fn_name>`; the same node kind nested in `class_body` becomes `<pkg>.<...>.<Class>.<method>` via the same mod_path mechanism.
- `secondary_constructor` has no `name` field; symbol uses the enclosing class name (Java duplication convention: `<pkg>.<...>.<Class>.<Class>`).
- Enum bodies (`enum_class_body`) are NOT recursed; `enum_entry` is not emitted as a unit (matches the Java first-cut scope).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 1, APPROVE: Java + Kotlin AST chunkers (p10-1C-JK) fully verified; 1C phase complete
Summary
Reviewed all 12 commits / 27 files / +3470 -38 and ran the tests directly. All checks pass. Ready to merge.
Verification results
- `cargo test -p kebab-parse-code`
- `cargo test -p kebab-chunk`
- `cargo test -p kebab-source-fs`
- `cargo test -p kebab-app --test code_ingest_smoke`
- `cargo test -p kebab-app --test twin_files_idempotent`
- `cargo test -p kebab-app --test file_deletion_auto_purge`
- `cargo test -p kebab-app --test reset_orphans`
- `cargo test -p kebab-app --test twin_files_fetch_span`
- `cargo test -p kebab-app --lib`
- `cargo clippy --workspace --all-targets -- -D warnings`
Java correctness
- `extract_package`: `program` root → `package_declaration` → `scoped_identifier` | `identifier` text. Correct.
- Constructor `<pkg>.<Class>.<ClassName>` (name duplicated): `java_units_match_design_3_4_symbols` asserts `MdHeadingV1Chunker.MdHeadingV1Chunker` ✅
- `Builder` + methods: asserts `Builder.withName`, `Builder.build` ✅
- `line_comment` / `block_comment`: `unit_start` folds both kinds (including Javadoc) ✅
- `compact_constructor_declaration` is handled only in `walk_body`; the Java grammar allows it only inside a record body, so this placement is correct.
Kotlin correctness (8 quirks)
All 8 quirks listed in the PR body are handled correctly in the implementation.
`kotlin_units_match_design_3_4_symbols` asserts Companion · freeFunction · Singleton · Singleton.ping · `<top-level>`, so regression protection is sufficient.
Dependency boundary
- `cargo tree -p kebab-chunk --depth 1`: no tree-sitter crates ✅ (complies with design §6.3)
- `cargo tree -p kebab-parse-code --depth 1`: only `tree-sitter-java v0.23.5` + `tree-sitter-kotlin-ng v1.1.0` added ✅
- The rationale for choosing `tree-sitter-kotlin-ng` (bare `tree-sitter-kotlin` is stuck on tree-sitter 0.21–0.23) is documented in the Cargo.toml comment, ARCHITECTURE, and the PR body ✅
Documentation
README (`.java` / `.kt` / `.kts` + `--code-lang java/kotlin` flags + symbol examples) · HANDOFF (P10 row) · ARCHITECTURE (parser list + rationale for choosing ng) · SMOKE (1C-JK verification items) · tasks/INDEX + tasks/p10/INDEX · design §10.1: all updated accurately.
Minor note (not a blocker)
`compact_constructor_declaration` (Java record canonical constructor) and `annotation_type_declaration` are absent from `tests/fixtures/sample.java`, leaving a gap in unit coverage. The code paths exist and pass clippy. This is acceptable for the first cut and can be closed later by adding a record-type fixture.
🤖 Generated with Claude Code
[All 9 E2E smoke tests green ✓]
All 9 pass, including `java_file_ingests_and_searches_as_code_citation` and `kotlin_file_ingests_and_searches_as_code_citation`. Each test asserts parser_version · chunker_version · citation.lang · citation.symbol (fully-qualified) · SearchHit.code_lang, which is sufficient verification of the end-to-end wiring.
@@ -0,0 +1,322 @@
//! `code-java-ast-v1` — maps a tree-sitter-derived Java AST
[chunker boundary: complies with design §6.3 ✓]
No `tree-sitter` / `tree-sitter-java` imports. Confirmed directly that `cargo tree -p kebab-chunk --depth 1` shows no tree-sitter crates. The same holds for `code-kotlin-ast-v1`. The pure chunker boundary, which consumes only `kebab_core` types, is preserved.
@@ -0,0 +1,543 @@
//! `kebab-parse-code::java` — tree-sitter Java AST extractor (P10-1C-JK Task D).
[Java: 5 type declarations + constructor handling confirmed, LGTM]
`extract_package`: `program` root → `package_declaration` → `scoped_identifier` | `identifier` text. All 5 type declarations (class / interface / enum / record / annotation_type) are handled in `walk_top`. `compact_constructor_declaration` appears only inside a record body, so handling it exclusively in `walk_body` is correct. `unit_start` folds both `line_comment` and `block_comment`, so Javadoc (`/** ... */` = `block_comment`) is also handled correctly.
minor coverage note:
`compact_constructor_declaration` (Java record canonical constructor) and `annotation_type_declaration` are not in the `tests/fixtures/sample.java` fixture: the code paths exist but have no unit verification. This is acceptable for the first cut, and they will get their first coverage when a real Java codebase is ingested.
@@ -0,0 +1,627 @@
//! `kebab-parse-code::kotlin` — tree-sitter Kotlin AST extractor (P10-1C-JK Task G).
[All 8 Kotlin grammar quirks checked, LGTM]
The header comment lists the 8 quirks accurately, and the implementation matches:
- `source_file` root ✓ (`tree.root_node()` in `build_blocks`)
- `qualified_identifier` for package ✓ (`extract_package`: `package_header` → `qualified_identifier` text slice)
- `class_declaration` + `class_decl_is_enum` ✓ (modifier walk → `class_modifier` text == `"enum"`)
- `class_body` / `enum_class_body` via `first_child_of_kinds` ✓ (kind-based lookup, since there is no field name)
- `companion_object` + optional name → `"Companion"` fallback ✓ (`node_name_text(...).unwrap_or("Companion")`)
- Top-level `function_declaration` ✓ (`walk_top` `"function_declaration"` arm)
- `secondary_constructor`: inherits the class name via `mod_path.last()` ✓
- `"import"` (singular, no `_declaration` suffix) as glue ✓
The `kotlin_units_match_design_3_4_symbols` test asserts Companion · freeFunction · Singleton · Singleton.ping · `<top-level>`, so the quirk results have sufficient regression protection.
@@ -0,0 +1,69 @@
# p10-1C-JavaKotlin — Java + Kotlin AST chunkers
[Pre-implementation inaccuracy in the task spec; no HOTFIXES needed after freeze]
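The spec's statement "tree-sitter-kotlin's `package_header` → `identifier` text" differs from the actual grammar; the implementation correctly uses `qualified_identifier`. This is a pre-implementation error caused by not checking the grammar when the plan was written, not a behavioral deviation, so it is not a HOTFIXES item. The task spec is frozen on merge, so no correction is needed.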
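The kind-based lookup discussed in this review (`package_header` → `qualified_identifier`, with bare `identifier` accepted as a fallback) can be sketched roughly as follows. This is only an illustration: `Node` is a hypothetical stand-in for a tree-sitter node so that the example has no external dependencies, and the signatures of `first_child_of_kinds` and `extract_package` are assumptions, since only their names appear in the review.

```rust
/// Hypothetical stand-in for a syntax-tree node. The real extractor works on
/// tree-sitter nodes; this sketch deliberately avoids that dependency.
struct Node {
    kind: &'static str,
    text: String,
    children: Vec<Node>,
}

/// Small constructor helper to keep the example trees readable.
fn node(kind: &'static str, text: &str, children: Vec<Node>) -> Node {
    Node { kind, text: text.to_string(), children }
}

/// Returns the first direct child whose kind is one of `kinds`.
/// Lookup is by node kind because the grammar exposes no field name here.
fn first_child_of_kinds<'a>(parent: &'a Node, kinds: &[&str]) -> Option<&'a Node> {
    parent.children.iter().find(|c| kinds.contains(&c.kind))
}

/// Extracts the dotted package path from a Kotlin-style root node.
/// Accepts `qualified_identifier` and, as a fallback shape, `identifier`.
fn extract_package(root: &Node) -> Option<String> {
    let header = first_child_of_kinds(root, &["package_header"])?;
    let path = first_child_of_kinds(header, &["qualified_identifier", "identifier"])?;
    Some(path.text.clone())
}
```

A file without a `package_header` yields `None` in this sketch; how the actual extractor treats that case is not stated in the review.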