# p10-1D: C + C++ AST chunkers

**Status:** 🟡 In progress

**Contract sections:** §3.3 (chunker_version `code-c-ast-v1` + `code-cpp-ast-v1`), §3.4 (symbol path: C `func_name`, C++ `namespace::Class::method`), §3.5 (code_lang `c` + `cpp`, ext `.c`/`.h` / `.cpp`/`.cc`/`.cxx`/`.hpp`/`.hh`/`.hxx`), §6.1 (`kebab-parse-code/src/{c,cpp}.rs`), §6.2 (`kebab-chunk/src/code_{c,cpp}_ast_v1.rs`), §9.1 (Tier 1 AST per-language + oversize fallback), §10 (activation log).

**Design:** [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md) §1D (C + C++ part).

**Plan:** [2026-05-21-p10-1d-c-cpp-ast-chunker.md](../../docs/superpowers/plans/2026-05-21-p10-1d-c-cpp-ast-chunker.md).

## Goal

On top of the p10-1A-2 / 1B / 1C / p10-2 / p10-3 infrastructure, activate the two AST chunkers (C and C++) in a single PR. This is the last member of P10's Tier 1 chunker family. From the merge onward, `.c` / `.h` / `.cpp` / `.cc` / `.cxx` / `.hpp` / `.hh` / `.hxx` files can be dogfooded.

`.h` maps to C, as the design specifies. For a `.h` file in a C++ project, the tree-sitter-c parse may fail on C++ syntax such as namespace / template. On failure, the file is automatically picked up by the p10-3 Tier 3 fallback (already wired).

## Frozen design decisions (finalized by this task)

### C extractor (`code-c-ast-v1`)

- **Symbol** = function name only. Exactly as in design §3.4: no nesting, no namespace. Example: `parse_blocks`.
- **Top-level units**:
  - `function_definition` (named) → 1 unit, symbol = function name
  - `struct_specifier` (named, top-level) → 1 unit, symbol = struct name
  - `enum_specifier` (named, top-level) → 1 unit, symbol = enum name
  - `union_specifier` (named, top-level) → 1 unit, symbol = union name
  - `declaration` (top-level: typedef / global var / fn prototype) → glue ``
  - `preproc_include` / `preproc_def` / `preproc_function_def` / `preproc_ifdef` and other preprocessor nodes → glue ``
- **Static / extern / inline fn**: handled the same as an ordinary fn (the storage class qualifier is ignored; the symbol is only the fn name from the declarator).
- **Nested declarations inside a struct / enum** (also possible in C): the 1B Python class-nesting rule is not applied. Inner types are uncommon in C, and the usual pattern is an outer typedef wrapper, so only top-level units are emitted.
- **Empty file or zero units** → `` post-pass (1A-2 pattern).

### C++ extractor (`code-cpp-ast-v1`)

- **Symbol** = `namespace::Class::method` (exactly as in design §3.4). Without a namespace it is `Class::method` or `func_name`. Example: `kebab::chunk::MdHeadingV1Chunker::chunk_doc`.
- **Top-level units + recursion**:
  - `namespace_definition` (named) → recurse with namespace name pushed (Python class-nesting + Java/Kotlin package-prefix hybrid).
    - **Anonymous namespace** (`namespace { ... }`) → namespace name = `` push (consistent with the Python `` pattern).
  - `class_specifier` / `struct_specifier` (top-level or in namespace or nested in class, named) → recurse with class name pushed.
  - `function_definition` (top-level or in namespace or in class) → 1 unit, symbol per nesting (`namespace::Class::method` / `namespace::func` / `Class::method` / `func_name`).
  - `template_declaration` → recurse / emit depending on the inner declarator type (function template → method emit, class template → class recurse). Template type params (``, ``) are not included in the symbol (same as the Go generic handling).
  - `enum_specifier` (named) → 1 unit, symbol per nesting.
  - `concept_definition` (C++20) → 1 unit, symbol per nesting (treat as type-level definition).
  - `using_declaration` / `using_directive` / `preproc_include` / `preproc_def` and similar → glue ``.
  - Definitions inside an `extern "C"` block → handled as ordinary fns (the block itself is glue).
- **Method out-of-class definition** (defined outside the namespace in `Class::method` form): the prefix is restored from the `qualified_identifier` of tree-sitter-cpp's `function_declarator`, i.e. extracted from the declarator's own `Class::method` text.
- **Operator overload** (`operator+`, `operator()`, etc.): symbol = `Class::operator+`, kept as-is.
- **Constructor / destructor**: symbol = `Class::Class` / `Class::~Class` (convention).
- **Empty file or zero units** → `` post-pass.

### Common

- **`` glue grouping**: preprocessor directives, global vars, using declarations and anything else outside a semantic unit → 1 glue chunk per file.
- **Oversize fallback**: same as 1A-2, `AST_CHUNK_MAX_LINES = 200`.
- **Fallback guarantee for `.h`**: when the C parser fails, the p10-3 Tier 3 fallback wrapper (already wired) picks the file up → `Citation::Code { symbol: None, lang: "c" }` + `code-text-paragraph-v1`.

### Module layout

```
crates/kebab-parse-code/src/
├── c.rs                 [new]  - C AST extractor (PARSER_VERSION `tree-sitter-c-`)
├── cpp.rs               [new]  - C++ AST extractor (PARSER_VERSION `tree-sitter-cpp-`)
└── lib.rs               [edit] - pub use + expose C_PARSER_VERSION / CPP_PARSER_VERSION constants

crates/kebab-chunk/src/
├── code_c_ast_v1.rs     [new]  - VERSION_LABEL `code-c-ast-v1`. 1A-2 pattern (canonical Document → Vec).
├── code_cpp_ast_v1.rs   [new]  - VERSION_LABEL `code-cpp-ast-v1`. Same pattern.
└── lib.rs               [edit] - 2 pub use

crates/kebab-source-fs/src/media.rs   [no edit needed] - code_lang_for_path delegation pattern unchanged (single source of truth since Task C of p10-2).
crates/kebab-parse-code/src/lang.rs   [no edit needed] - the `.c`/`.h`/`.cpp` etc. mappings have existed since 1A-1.
crates/kebab-app/src/lib.rs           [edit] - add "c" + "cpp" to the allowlist + 4-arm match in ingest_one_code_asset. Add both to the tier3 fallback list as well.

crates/kebab-chunk/tests/             [new]
├── fixtures/sample.c           - C fixture (top-level fn + struct)
├── fixtures/sample.cpp         - C++ fixture (namespace + class + method)
├── code_c_ast_snapshot.rs      - C snapshot test
└── code_cpp_ast_snapshot.rs    - C++ snapshot test

crates/kebab-app/tests/code_ingest_smoke.rs   [edit] - 2 new integration tests (c + cpp). 16 + 2 = 18.

Cargo.toml workspace.dependencies     [edit] - tree-sitter-c + tree-sitter-cpp.
crates/kebab-parse-code/Cargo.toml    [edit] - new entries for the 2 deps above.
```

## Acceptance criteria

- `cargo test --workspace --no-fail-fast -j 1` PASS (memory-conscious `-j 1`).
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- Ingesting the C fixture (`tests/fixtures/sample.c`) and the C++ fixture (`tests/fixtures/sample.cpp`) yields stable chunk snapshots. Every chunk in the C snapshot is `Citation::Code { lang: "c", symbol: Some(), ... }`. The chunks in the C++ snapshot include namespace + class nesting (`kebab::chunk::Foo::bar`).
- With `.c` / `.cpp` files placed in an isolated TempDir KB, `kebab search --code-lang c --json` and `--code-lang cpp --json` each return `Citation::Code`. Integration tests: `tier1_c_ingest_searchable` + `tier1_cpp_ingest_searchable` (existing 16 + 2 = 18).
- `kebab schema --json | jq .stats.code_lang_breakdown` shows `"c"` + `"cpp"` counts (after ingesting .c/.cpp files).
- README + HANDOFF + docs/ARCHITECTURE + docs/SMOKE + tasks/INDEX + tasks/p10/INDEX updated.
- One line added to the frozen design 2026-04-27 §10 activation log.
- workspace `Cargo.toml` minor bump (0.15.0 → 0.16.0), gitea-release v0.16.0.

## Allowed dependencies

- `kebab-parse-code`: add `tree-sitter-c` + `tree-sitter-cpp` as workspace deps. Existing deps are kept.
- `kebab-chunk`: 2 new modules (`code_c_ast_v1.rs`, `code_cpp_ast_v1.rs`) with a language-agnostic body; importing tree-sitter is forbidden. Reuse the existing `tier2_shared::build_chunk` (pub(crate)).
- `kebab-app`, `kebab-source-fs`: no new crate deps.

## Forbidden dependencies

- `kebab-chunk` must not import tree-sitter-c / tree-sitter-cpp directly (boundary §6.3).
- `kebab-parse-code` must not import store / embed / llm / rag directly.
- UI crates (`kebab-cli` / `kebab-mcp` / `kebab-tui`) must not import `kebab-parse-code` / `kebab-chunk` directly; they go through the `kebab-app` facade only.

## Risks / notes

- **tree-sitter-c / tree-sitter-cpp compatibility**: both must be compatible with tree-sitter 0.26 (current workspace). Resolving them may require a fork that uses the `tree-sitter-language` shim (the tree-sitter-kotlin-ng pattern from 1C-JK). Prefer the most actively maintained crate on crates.io; if that fails, consider a separate fork.
- **`.h` parse failure**: when the C parser meets a C++ header (`namespace`, `template`, `class`), the result is a partial parse with error nodes. The 1A-2 extractor pattern ignores error nodes and continues with the recoverable parse, so the emitted result may be *incomplete*. If the chunk count then drops to 0, the file is automatically picked up by the p10-3 Tier 3 fallback (already wired). On a partial emit only part of the file is indexed and the Tier 3 fallback does not run. Review HOTFIXES after dogfooding.
- **Method out-of-class definition** (`Class::method` form): the prefix is restored when the declarator of tree-sitter-cpp's `function_definition` is a `qualified_identifier`. Verified by fixture.
- **Template specialization** (`template<> class Foo`): only the `class_specifier` name inside tree-sitter-cpp's `template_declaration` is extracted, so only `Foo` goes into the symbol and `` is not included. Consistent with the design's ignore-generics rule.
- **fn inside an `extern "C"` block**: handled as an ordinary fn. The outer wrapping block is glue.
- **Anonymous union / struct** (`struct { int x; }` inside a variable): uncommon; only named ones become units. Anonymous ones are glue.
- **typedef-wrapped struct/enum idiom** (`typedef struct { ... } Foo;`): ✅ resolved in v0.17.0 (2026-05-24) PR-B. The extractor's `type_definition` branch detects an inner anonymous `struct_specifier` / `enum_specifier` / `union_specifier` and emits a synthetic unit named after the declarator's typedef alias. This comes with a `PARSER_VERSION` bump `code-c-v1` → `code-c-v2` and a same-workspace_path orphan purge cascade. **Still unresolved**: the inner anonymous struct of a nested typedef (`typedef struct { struct {...} inner; } Outer;`) is still glue; the first-pass scope of v2 covers top-level typedef aliases only. See [HOTFIXES.md 2026-05-21 entry](../HOTFIXES.md) (frozen observation) + 2026-05-24 closure entry.
- **Macro-heavy code** (Linux kernel, etc.): even when a `#define FOO(x) ...` macro is function-like, the parser does not recognize it as a fn. It is handled as preprocessor glue and no symbol is captured. This is intended behavior (the parser does no macro expansion).
- **`__attribute__((...))`** annotations: the tree-sitter-c attribute node is a sibling next to the declarator. It can be ignored and does not affect function name extraction.
- **Fixture size**: sample.c is ~30 lines (top-level fn + struct + enum + preprocessor), sample.cpp is ~50 lines (nested namespace + class + method + template + free fn). Separate verification of the oversize fallback is already covered by the 1A-2 long_section_snapshot pattern (add a separate fixture if needed).
- **Post-merge deviations** go into the dated log in `tasks/HOTFIXES.md`, cross-linked from this spec's `Risks / notes`.
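
The C++ symbol rules in the frozen design decisions (`namespace::Class::method`, out-of-class `Class::method` declarators, `Class::operator+`, `Class::~Class`) can be illustrated with a minimal sketch. The `symbol_path` helper below is hypothetical and is not taken from `kebab-parse-code`. It assumes the extractor has already collected the enclosing namespace / class names (outermost first) and the declarator name as plain strings from the tree-sitter nodes. It also assumes that a qualified declarator found inside a namespace is appended to the enclosing scopes, which this spec does not state explicitly.

```rust
/// Hypothetical helper (not the real extractor): joins the enclosing scope
/// names, outermost first, with the declarator name into a `::`-separated
/// symbol path.
fn symbol_path(scopes: &[&str], declarator_name: &str) -> String {
    scopes
        .iter()
        .copied()
        // A qualified declarator such as `Foo::bar` contributes each of its
        // own segments, which restores the class prefix for out-of-class
        // method definitions.
        .chain(declarator_name.split("::"))
        // Drop empty segments, e.g. from a leading global qualifier `::foo`.
        .filter(|segment| !segment.is_empty())
        .collect::<Vec<_>>()
        .join("::")
}
```

Under these assumptions, `symbol_path(&["kebab", "chunk"], "Foo::bar")` gives `kebab::chunk::Foo::bar`, and a C-style free function with no scopes keeps its bare name. Template type parameters and the anonymous-namespace placeholder are deliberately left out of the sketch.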