Tasks: workspace deps / C extractor / C++ extractor / C chunker + snapshot / C++ chunker + snapshot / ingest dispatch + tier3_fallback_cv extension / 2 smoke tests / frozen design §10 / docs sync / workspace test gate / version bump 0.15.0 → 0.16.0 + gitea PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
34 KiB
p10-1D C + C++ AST Chunkers Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Activate C + C++ code ingest end-to-end. P10 Tier 1 chunker family final entry.
Architecture: Same shape as 1B (multi-language single PR) and 1C-JK (JVM family). 2 new tree-sitter grammars + 2 extractors + 2 chunkers + media routing (delegated via code_lang_for_path, no change) + app dispatch arms. C symbol = function name only; C++ symbol = namespace::Class::method via recursive class/namespace nesting (Java/Kotlin + Python hybrid).
Tech Stack: Rust 2024 workspace, tree-sitter 0.26 (already), tree-sitter-c + tree-sitter-cpp (NEW). 1A-2/1B/1C/p10-2/p10-3 infrastructure unchanged.
Memory note: Host has been OOM'd previously (재부팅 사례). Per-crate cargo only. ONE full-suite + clippy invocation in Task J. NO cargo test --workspace outside that gate.
Pre-flight
Branch feat/p10-1d-c-cpp already exists (spec commit 8add684).
- Disk hygiene:
df -h /점검. 80% 넘으면cargo clean.
Reference files:
- 1C-JK extractor:
crates/kebab-parse-code/src/{java,kotlin}.rs— closest template for source-side identifier prefix (package vs namespace). - 1B Python extractor:
crates/kebab-parse-code/src/python.rs— class-nesting recursion model (relevant for C++ class nesting). - 1A-2 chunker:
crates/kebab-chunk/src/code_rust_ast_v1.rs— duplicate-with-substitution pattern. - 1B/1C/p10-2/p10-3 dispatch generalization:
crates/kebab-app/src/lib.rs::ingest_one_code_asset(~L1796–2116). Current allowlist + 4-arm match. - spec:
tasks/p10/p10-1d-c-cpp-ast-chunker.md.
Task A: Workspace deps (tree-sitter-c + tree-sitter-cpp)
Files:
-
Modify:
Cargo.toml([workspace.dependencies], aftertree-sitter-kotlin-ng) -
Modify:
crates/kebab-parse-code/Cargo.toml -
Step 1:
cargo add tree-sitter-c tree-sitter-cpp -p kebab-parse-code. If either crate's actively-maintained name differs (e.g.tree-sitter-cppvstree-sitter-cpp-ng), verify on crates.io. Thetree-sitter-c0.24 /tree-sitter-cpp0.23 line is the most common; verify compatibility with workspacetree-sitter = "0.26"(likely already supported via thetree-sitter-languageshim). -
Step 2: Lift the two resolved versions into
[workspace.dependencies](aftertree-sitter-kotlin-ng):
# C/C++ family grammars for code ingest (kebab-parse-code, p10-1D).
tree-sitter-c = "<resolved>"
tree-sitter-cpp = "<resolved>"
Switch crate's Cargo.toml entries to { workspace = true }.
-
Step 3:
cargo build -p kebab-parse-code→ clean. Unused dep warning is fine. -
Step 4: Commit:
git add Cargo.toml Cargo.lock crates/kebab-parse-code/Cargo.toml
git commit -m "$(cat <<'EOF'
build(p10-1d): add tree-sitter-c + tree-sitter-cpp workspace deps
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
If a crate's resolved name has a non-obvious fork suffix (e.g. tree-sitter-cpp-ng), document it in the commit body.
Task B: C AST extractor (kebab-parse-code/src/c.rs)
Files:
-
Create:
crates/kebab-parse-code/src/c.rs -
Modify:
crates/kebab-parse-code/src/lib.rs(pub mod +C_PARSER_VERSIONconst) -
Step 1: Create
crates/kebab-parse-code/src/c.rs. Mirrorcrates/kebab-parse-code/src/go.rs(closest template — single-language, no namespace/package nesting, top-level units). Replace tree-sitter-go with tree-sitter-c:
//! p10-1D: C AST extractor.
use crate::traits::{Extractor, ExtractContext};
use anyhow::{Context, Result};
use kebab_core::{Block, BlockId, CanonicalDocument, CodeBlock, CommonBlock, /*..*/, SourceSpan, id_for_block, id_for_doc};
use tree_sitter::Parser;
pub const C_PARSER_VERSION: &str = concat!("tree-sitter-c-", env!("CARGO_PKG_VERSION"));
// Or use the tree-sitter-c crate version: better to hardcode for stability.
// Look at how go.rs / rust.rs / etc. set their PARSER_VERSION.
pub struct CAstExtractor {
parser: Parser,
}
impl CAstExtractor {
pub fn new() -> Self {
let mut parser = Parser::new();
parser.set_language(&tree_sitter_c::LANGUAGE.into()).expect("load tree-sitter-c");
Self { parser }
}
}
impl Extractor for CAstExtractor {
fn extract(&mut self, ctx: &ExtractContext, bytes: &[u8]) -> Result<CanonicalDocument> {
// ... mirror go.rs:
// 1. parse the tree
// 2. iterate source_file's named_children
// 3. for each top-level node:
// - function_definition → emit unit (symbol = fn name)
// - struct_specifier (named) → emit unit (symbol = struct name)
// - enum_specifier (named) → emit unit (symbol = enum name)
// - union_specifier (named) → emit unit (symbol = union name)
// - declaration → glue
// - preproc_include / preproc_def / preproc_function_def / preproc_ifdef → glue
// - else → glue
// 4. <top-level> glue chunk if any glue accumulated
// 5. <module> post-pass if 0 units
// ...
todo!("mirror go.rs structure with C-specific node-kind names")
}
}
ACTION: Read crates/kebab-parse-code/src/go.rs in full first. It's the closest template — single-language, no namespace prefix to thread through (C is even simpler than Go since there's no package). Port the structure: parse → iterate top-level → match on node-kind → emit units or accumulate glue.
Node-kind name reference (tree-sitter-c): function_definition, struct_specifier, enum_specifier, union_specifier, declaration, preproc_*. Confirm by checking the crate's node-types.json if uncertain.
Function name extraction: function_definition has a declarator field. The innermost identifier of that declarator is the function name. Mirror how go.rs extracts function names — it uses tree-sitter field traversal.
- Step 2: Register the module in
crates/kebab-parse-code/src/lib.rs:
pub mod c;
pub use c::{CAstExtractor, C_PARSER_VERSION};
- Step 3: Build:
cargo build -p kebab-parse-code 2>&1 | tail -5
Expected: clean.
- Step 4: Commit (no test yet — Task D adds the snapshot test):
git add crates/kebab-parse-code/src/c.rs crates/kebab-parse-code/src/lib.rs
git commit -m "$(cat <<'EOF'
feat(p10-1d): C AST extractor (tree-sitter-c)
Top-level units: function_definition (symbol = fn name), struct_specifier,
enum_specifier, union_specifier (each emits 1 unit with the symbol being
the named identifier). Preprocessor directives + top-level declarations
group into a <top-level> glue chunk. Empty file or zero units → <module>
post-pass.
C symbol = function name only — no namespace, no class nesting (design §3.4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task C: C++ AST extractor (kebab-parse-code/src/cpp.rs)
Files:
-
Create:
crates/kebab-parse-code/src/cpp.rs -
Modify:
crates/kebab-parse-code/src/lib.rs -
Step 1: Create
crates/kebab-parse-code/src/cpp.rs. The closest template iscrates/kebab-parse-code/src/java.rs(1C-JK) — it handles package prefix + class nesting via recursion. C++ adds namespace nesting (multiple levels possible).
Pseudocode:
//! p10-1D: C++ AST extractor.
use crate::traits::{Extractor, ExtractContext};
use anyhow::{Context, Result};
use kebab_core::{/* ... */};
use tree_sitter::{Node, Parser};
pub const CPP_PARSER_VERSION: &str = "tree-sitter-cpp-<resolved>";
pub struct CppAstExtractor { parser: Parser }
impl CppAstExtractor {
pub fn new() -> Self {
let mut parser = Parser::new();
parser.set_language(&tree_sitter_cpp::LANGUAGE.into()).expect("load tree-sitter-cpp");
Self { parser }
}
fn visit(&self, node: Node, source: &[u8], prefix: &[&str], units: &mut Vec<(String, Node)>, glue: &mut Vec<Node>) {
// prefix is the namespace/class chain so far (e.g. ["kebab", "chunk", "MdHeadingV1Chunker"]).
for child in node.named_children(&mut node.walk()) {
match child.kind() {
"namespace_definition" => {
let name = child.child_by_field_name("name")
.and_then(|n| n.utf8_text(source).ok())
.unwrap_or("<anonymous>");
let mut new_prefix = prefix.to_vec();
new_prefix.push(name);
let body = child.child_by_field_name("body").unwrap_or(child);
self.visit(body, source, &new_prefix, units, glue);
}
"class_specifier" | "struct_specifier" if child.child_by_field_name("name").is_some() => {
let name = child.child_by_field_name("name")
.and_then(|n| n.utf8_text(source).ok())
.unwrap_or("<anonymous>");
// Emit the class itself as a unit.
let symbol = build_symbol(prefix, &[], name); // e.g. "kebab::chunk::Foo"
units.push((symbol, child));
// Recurse for nested classes / methods.
let mut new_prefix = prefix.to_vec();
new_prefix.push(name);
let body = child.child_by_field_name("body").unwrap_or(child);
self.visit(body, source, &new_prefix, units, glue);
}
"function_definition" => {
// declarator may be qualified_identifier (out-of-class def) or plain identifier.
let symbol = extract_fn_symbol(child, source, prefix);
units.push((symbol, child));
// Do NOT recurse into function body — inner classes/lambdas left to a future revision.
}
"template_declaration" => {
// Recurse: unwrap to inner declarator (function_definition or class_specifier)
// and treat it as if it were directly there. Template params NOT in symbol.
self.visit(child, source, prefix, units, glue);
}
"enum_specifier" if child.child_by_field_name("name").is_some() => {
let name = child.child_by_field_name("name").and_then(|n| n.utf8_text(source).ok()).unwrap_or("<anonymous>");
let symbol = build_symbol(prefix, &[], name);
units.push((symbol, child));
}
"concept_definition" => {
let name = /* extract */;
let symbol = build_symbol(prefix, &[], &name);
units.push((symbol, child));
}
_ => glue.push(child),
}
}
}
}
fn build_symbol(prefix: &[&str], extras: &[&str], leaf: &str) -> String {
// Join with ::
let mut parts: Vec<&str> = prefix.iter().copied().collect();
parts.extend_from_slice(extras);
parts.push(leaf);
parts.join("::")
}
fn extract_fn_symbol(node: Node, source: &[u8], prefix: &[&str]) -> String {
// function_definition.declarator may be a function_declarator wrapping a
// qualified_identifier (out-of-class def like `void Foo::bar(){}`) or a
// plain identifier (free fn or in-namespace fn).
// Need to walk down to the leaf identifier and any qualifier chain.
// For qualified_identifier "Foo::bar::baz", break into ["Foo", "bar"] qualifier + "baz" leaf.
// ...
todo!("walk declarator → qualified_identifier → assemble symbol with prefix")
}
// Extractor impl: parse, visit(root, ...), emit chunks-of-blocks per (symbol, node) pair + <top-level> glue + <module> fallback.
This is the most intricate extractor in p10-1D. Action: read crates/kebab-parse-code/src/java.rs for the recursion pattern, then crates/kebab-parse-code/src/python.rs for the class-nesting pattern, and combine. tree-sitter-cpp's node-types.json (or a quick tree-sitter parse against a sample file) confirms exact node-kind names.
- Step 2: Register in
crates/kebab-parse-code/src/lib.rs:
pub mod cpp;
pub use cpp::{CppAstExtractor, CPP_PARSER_VERSION};
- Step 3: Build:
cargo build -p kebab-parse-code 2>&1 | tail -5
Expected: clean.
- Step 4: Commit:
git add crates/kebab-parse-code/src/cpp.rs crates/kebab-parse-code/src/lib.rs
git commit -m "$(cat <<'EOF'
feat(p10-1d): C++ AST extractor (tree-sitter-cpp)
Symbol = namespace::Class::method via recursive visit. namespace_definition
pushes namespace name (anonymous → <anonymous>). class_specifier / struct_specifier
(named) emit class unit + recurse with class name pushed. function_definition
emits method unit (symbol may include qualified_identifier prefix for
out-of-class definitions). template_declaration unwraps to inner declarator
(template params NOT in symbol). enum_specifier + concept_definition emit
type-level units. extern "C" block content + using/include/define → glue.
Constructor / destructor symbols use Class::Class / Class::~Class
convention. Operator overloads keep operator+ form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task D: C chunker + snapshot test
Files:
-
Create:
crates/kebab-chunk/src/code_c_ast_v1.rs -
Create:
crates/kebab-chunk/tests/fixtures/sample.c -
Create:
crates/kebab-chunk/tests/code_c_ast_snapshot.rs -
Modify:
crates/kebab-chunk/src/lib.rs -
Step 1: Create
crates/kebab-chunk/src/code_c_ast_v1.rs. Mirrorcrates/kebab-chunk/src/code_go_ast_v1.rs(closest 1-extractor pattern, no nesting):
//! p10-1D: C AST chunker.
use crate::tier2_shared::build_chunk;
use crate::{Chunker, ChunkPolicy};
use anyhow::Result;
use kebab_core::{Block, Chunk, Document};
pub const VERSION_LABEL: &str = "code-c-ast-v1";
pub struct CodeCAstV1Chunker;
impl Chunker for CodeCAstV1Chunker {
fn chunker_version(&self) -> &'static str { VERSION_LABEL }
fn policy_hash(&self, policy: &ChunkPolicy) -> String {
crate::tier2_shared::policy_hash(policy)
}
fn chunk(&self, doc: &Document, policy: &ChunkPolicy) -> Result<Vec<Chunk>> {
// Mirror code_go_ast_v1.rs's body — iterate doc.blocks, each Block::Code
// contributes 1 chunk via build_chunk. Apply oversize fallback per block
// via tier2_shared::push_chunks_with_oversize.
// ...
todo!("mirror code_go_ast_v1.rs verbatim, substituting VERSION_LABEL")
}
}
Read code_go_ast_v1.rs and port verbatim — the language-agnostic body iterates doc.blocks and emits chunks. Only the VERSION_LABEL and (potentially) symbol formatting helper change.
- Step 2: Create
tests/fixtures/sample.c(~30 lines, includes top-level fn, struct, enum, preprocessor):
#include <stdio.h>
#include <stdlib.h>
#define MAX_BUF 4096
typedef enum {
OK = 0,
ERR_PARSE,
ERR_IO,
} status_t;
typedef struct {
int id;
char name[64];
status_t status;
} record_t;
static int counter = 0;
int parse_record(const char *line, record_t *out) {
if (line == NULL || out == NULL) return ERR_PARSE;
return OK;
}
void print_record(const record_t *r) {
printf("[%d] %s (status=%d)\n", r->id, r->name, r->status);
}
int main(void) {
record_t r = { .id = 1, .name = "foo", .status = OK };
print_record(&r);
return 0;
}
Expected snapshot: 3 function units (parse_record, print_record, main) + 1 enum unit (status_t) + 1 struct unit (record_t) + 1 <top-level> glue (preproc + global var). Total ~6 chunks.
- Step 3: Create
tests/code_c_ast_snapshot.rsmirroringtests/code_go_ast_snapshot.rs. Assertions:
// Pseudocode:
// 1. Load fixture sample.c
// 2. Run CAstExtractor → Document
// 3. Run CodeCAstV1Chunker.chunk(&doc, &policy)
// 4. Assert chunks.len() == expected (6).
// 5. Assert symbols (from chunks[i].source_spans[0]::SourceSpan::Code.symbol) match expected list:
// ["status_t", "record_t", "parse_record", "print_record", "main", "<top-level>"]
// (order matches AST traversal order — verify by running once.)
// 6. Assert all chunks have lang = Some("c").
- Step 4: Register module in
crates/kebab-chunk/src/lib.rs:
pub mod code_c_ast_v1;
pub use code_c_ast_v1::CodeCAstV1Chunker;
- Step 5: Run test:
cargo test -p kebab-chunk --test code_c_ast_snapshot -- --nocapture 2>&1 | tail -25
Expected: PASS. If chunk count or symbol order differs from expectation, INSPECT the actual output and update the test's expected list to match (run once to learn, codify on second run).
- Step 6: Clippy + commit:
cargo clippy -p kebab-chunk --all-targets -- -D warnings
git add crates/kebab-chunk/src/code_c_ast_v1.rs \
crates/kebab-chunk/src/lib.rs \
crates/kebab-chunk/tests/fixtures/sample.c \
crates/kebab-chunk/tests/code_c_ast_snapshot.rs
git commit -m "$(cat <<'EOF'
feat(p10-1d): code-c-ast-v1 chunker + snapshot test
Mirrors code-go-ast-v1's chunker pattern (1 chunk per AST unit + <top-level>
glue + oversize fallback). Snapshot test against tests/fixtures/sample.c
(function + struct + enum + preprocessor) verifies symbol order + lang=c
stamping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task E: C++ chunker + snapshot test
Files:
-
Create:
crates/kebab-chunk/src/code_cpp_ast_v1.rs -
Create:
crates/kebab-chunk/tests/fixtures/sample.cpp -
Create:
crates/kebab-chunk/tests/code_cpp_ast_snapshot.rs -
Modify:
crates/kebab-chunk/src/lib.rs -
Step 1: Create
code_cpp_ast_v1.rs. Mirrorcode_c_ast_v1.rsverbatim, only VERSION_LABEL differs:
pub const VERSION_LABEL: &str = "code-cpp-ast-v1";
pub struct CodeCppAstV1Chunker;
impl Chunker for CodeCppAstV1Chunker {
fn chunker_version(&self) -> &'static str { VERSION_LABEL }
// ... identical body — both languages use the same Block::Code → Chunk emission ...
}
The actual symbol-formatting work happens in the EXTRACTOR (Task C). The chunker's job is to iterate blocks the extractor produced and emit Chunks. Both C and C++ chunkers are essentially identical bodies.
- Step 2: Create
tests/fixtures/sample.cpp(~50 lines, includes namespace + nested class + method + free fn + template):
#include <string>
#include <vector>
namespace kebab {
namespace chunk {
class MdHeadingV1Chunker {
public:
MdHeadingV1Chunker() = default;
~MdHeadingV1Chunker() = default;
std::string chunk_doc(const std::string& doc) {
return doc;
}
int operator()(int x) const {
return x * 2;
}
private:
int counter_ = 0;
};
template <typename T>
T identity(T value) {
return value;
}
} // namespace chunk
void global_helper() {
// free function in kebab namespace
}
} // namespace kebab
int main() {
kebab::chunk::MdHeadingV1Chunker c;
return 0;
}
Expected snapshot symbols (verify on first run, then codify):
kebab::chunk::MdHeadingV1Chunker(class unit)kebab::chunk::MdHeadingV1Chunker::MdHeadingV1Chunker(constructor)kebab::chunk::MdHeadingV1Chunker::~MdHeadingV1Chunker(destructor)kebab::chunk::MdHeadingV1Chunker::chunk_dockebab::chunk::MdHeadingV1Chunker::operator()kebab::chunk::identity(template fn)kebab::global_helpermain(free fn, no namespace)<top-level>(include + using)
~9 chunks total.
-
Step 3: Create
tests/code_cpp_ast_snapshot.rsmirroringcode_c_ast_snapshot.rs. Assert symbol list matches expected (run once to learn the actual order, codify). -
Step 4: Register module in
lib.rs:
pub mod code_cpp_ast_v1;
pub use code_cpp_ast_v1::CodeCppAstV1Chunker;
- Step 5: Run test:
cargo test -p kebab-chunk --test code_cpp_ast_snapshot -- --nocapture 2>&1 | tail -30
Expected: PASS.
- Step 6: Clippy + commit:
cargo clippy -p kebab-chunk --all-targets -- -D warnings
git add crates/kebab-chunk/src/code_cpp_ast_v1.rs \
crates/kebab-chunk/src/lib.rs \
crates/kebab-chunk/tests/fixtures/sample.cpp \
crates/kebab-chunk/tests/code_cpp_ast_snapshot.rs
git commit -m "$(cat <<'EOF'
feat(p10-1d): code-cpp-ast-v1 chunker + snapshot test
Identical chunker body to code-c-ast-v1; per-language work happens in the
CppAstExtractor (Task C). Snapshot fixture covers nested namespace +
class + ctor/dtor + method + operator overload + template fn + free fn +
top-level main, verifying namespace::Class::method symbol convention per
design §3.4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task F: ingest_one_code_asset dispatch + tier3 fallback list extension
Files:
-
Modify:
crates/kebab-app/src/lib.rs -
Step 1: Top-of-file
use kebab_chunk::{...}extend withCodeCAstV1Chunker+CodeCppAstV1Chunker:
use kebab_chunk::{
/* existing items */,
CodeCAstV1Chunker,
CodeCppAstV1Chunker,
};
- Step 2: Allowlist (around line 953) extend:
if matches!(lang.as_str(),
"rust" | "python" | "typescript" | "javascript" | "go" | "java" | "kotlin"
| "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod"
| "shell"
| "c" | "cpp")
- Step 3:
parser_versionmatch — add C/C++ arms (Tier 1, so they DO get a real parser version):
let parser_version = match code_lang {
// ... existing 7 Tier 1 + Tier 2 + shell arms ...
"c" => ParserVersion(kebab_parse_code::C_PARSER_VERSION.to_string()),
"cpp" => ParserVersion(kebab_parse_code::CPP_PARSER_VERSION.to_string()),
other => anyhow::bail!("unsupported code_lang: {other}"),
};
- Step 4:
chunker_versionmatch — add C/C++ arms:
let chunker_version = match code_lang {
// ... existing arms ...
"c" => CodeCAstV1Chunker.chunker_version(),
"cpp" => CodeCppAstV1Chunker.chunker_version(),
other => anyhow::bail!("unreachable chunker_version: {other}"),
};
- Step 5:
canonical_resultextract match — add C/C++ arms:
let canonical_result: anyhow::Result<CanonicalDocument> = match code_lang {
"rust" => RustAstExtractor::new().extract(&ctx, &bytes).context("..."),
// ... existing ...
"c" => CAstExtractor::new().extract(&ctx, &bytes)
.context("kb-parse-code::CAstExtractor::extract (code:c)"),
"cpp" => CppAstExtractor::new().extract(&ctx, &bytes)
.context("kb-parse-code::CppAstExtractor::extract (code:cpp)"),
// ... Tier 2 + shell ...
other => anyhow::bail!("unreachable (extract): {other}"),
};
(Add use kebab_parse_code::{CAstExtractor, CppAstExtractor}; at the top if not already wildcard-imported.)
- Step 6:
chunks_resultmatch — add C/C++ arms:
let chunks_result: anyhow::Result<Vec<Chunk>> = if extract_fell_back {
// ... existing ...
} else {
match code_lang {
"rust" => CodeRustAstV1Chunker.chunk(&canonical, chunk_policy).context("..."),
// ... existing ...
"c" => CodeCAstV1Chunker.chunk(&canonical, chunk_policy)
.context("kb-chunk::CodeCAstV1Chunker::chunk (code:c)"),
"cpp" => CodeCppAstV1Chunker.chunk(&canonical, chunk_policy)
.context("kb-chunk::CodeCppAstV1Chunker::chunk (code:cpp)"),
// ... existing ...
other => anyhow::bail!("unreachable (chunk): {other}"),
}
};
- Step 7:
tier3_fallback_cv(p10-3 Critical fix) — C/C++ are fallback-eligible (extract may fail on.hC++ headers or malformed code):
let tier3_fallback_cv = match code_lang {
"rust" | "python" | "typescript" | "javascript"
| "go" | "java" | "kotlin"
| "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod"
| "c" | "cpp" // p10-1d:
=> Some(CodeTextParagraphV1Chunker.chunker_version()),
_ => None,
};
(The exact location of this match is in ingest_one_code_asset between ~lines 1921-1927 per the p10-3 critical fix.)
- Step 8: Build:
cargo build -p kebab-app 2>&1 | tail -5
Expected: clean.
- Step 9: Per-crate test (no regression):
cargo test -p kebab-app --lib -- --nocapture 2>&1 | tail -10
Expected: 52 PASS (existing baseline).
- Step 10: Clippy + commit:
cargo clippy -p kebab-app --all-targets -- -D warnings
git add crates/kebab-app/src/lib.rs
git commit -m "$(cat <<'EOF'
feat(p10-1d): activate C + C++ in ingest_one_code_asset dispatch
Extends 4-arm match (parser_version / chunker_version / extract / chunks)
+ allowlist + tier3_fallback_cv list with "c" + "cpp" arms. C uses
CAstExtractor + CodeCAstV1Chunker; C++ uses CppAstExtractor +
CodeCppAstV1Chunker. Both langs are Tier 3-fallback-eligible (e.g. .h
file with C++ syntax may fail tree-sitter-c parse → Tier 3 paragraph
fallback).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task G: code_ingest_smoke integration tests (C + C++)
Files:
-
Modify:
crates/kebab-app/tests/code_ingest_smoke.rs -
Step 1: Append 2 tests at the end of the file (mirror the existing tier1 tests
c_ast_v1_*if present; if not, mirrorrust_ast_v1_*orgo_ast_v1_*):
#[test]
fn tier1_c_ingest_searchable() {
let env = TestEnv::lexical_only();
let workspace = env.workspace_root();
std::fs::write(
workspace.join("parser.c"),
"#include <stdio.h>\n\nint parse_record(const char *line) {\n if (line == NULL) return -1;\n return 0;\n}\n",
)
.unwrap();
let report = env.ingest().expect("ingest");
assert!(report.new_docs >= 1, "expected at least 1 new doc");
let hits = env.search_code_lang("c", "parse_record").expect("search");
assert!(!hits.is_empty(), "expected at least 1 c hit");
match &hits[0].citation {
Citation::Code { symbol, lang, .. } => {
assert_eq!(symbol.as_deref(), Some("parse_record"), "C symbol must be function name only");
assert_eq!(lang.as_deref(), Some("c"));
}
other => panic!("expected Citation::Code, got {other:?}"),
}
assert_eq!(
hits[0].chunker_version.as_ref().map(|c| c.0.as_str()),
Some("code-c-ast-v1"),
);
}
#[test]
fn tier1_cpp_ingest_searchable() {
let env = TestEnv::lexical_only();
let workspace = env.workspace_root();
std::fs::write(
workspace.join("chunker.cpp"),
"namespace kebab {\nnamespace chunk {\nclass Foo {\npublic:\n void bar() { /* impl */ }\n};\n}\n}\n",
)
.unwrap();
let report = env.ingest().expect("ingest");
assert!(report.new_docs >= 1);
let hits = env.search_code_lang("cpp", "bar").expect("search");
assert!(!hits.is_empty(), "expected at least 1 cpp hit");
match &hits[0].citation {
Citation::Code { symbol, lang, .. } => {
// Symbol could be "kebab::chunk::Foo::bar" or "kebab::chunk::Foo" depending on which chunk hits first.
assert!(
symbol.as_deref().map_or(false, |s| s.starts_with("kebab::chunk::Foo")),
"C++ symbol must start with namespace::Class prefix, got {:?}", symbol
);
assert_eq!(lang.as_deref(), Some("cpp"));
}
other => panic!("expected Citation::Code, got {other:?}"),
}
assert_eq!(
hits[0].chunker_version.as_ref().map(|c| c.0.as_str()),
Some("code-cpp-ast-v1"),
);
}
- Step 2: Run tests:
cargo test -p kebab-app --test code_ingest_smoke tier1_c_ingest tier1_cpp_ingest -- --nocapture 2>&1 | tail -30
Expected: 2 PASS.
- Step 3: Full smoke regression:
cargo test -p kebab-app --test code_ingest_smoke -- --nocapture 2>&1 | tail -30
Expected: 18 PASS (16 existing + 2 new).
- Step 4: Clippy + commit:
cargo clippy -p kebab-app --tests -- -D warnings
git add crates/kebab-app/tests/code_ingest_smoke.rs
git commit -m "$(cat <<'EOF'
test(p10-1d): integration smoke tests for C + C++
Verifies end-to-end ingest + search + Citation::Code shape:
- tier1_c_ingest_searchable: .c file → --code-lang c search → symbol
= function name (no nesting), lang = "c", chunker_version = "code-c-ast-v1".
- tier1_cpp_ingest_searchable: .cpp file → --code-lang cpp search →
symbol starts with namespace::Class prefix, lang = "cpp",
chunker_version = "code-cpp-ast-v1".
Brings code_ingest_smoke to 18 tests (Rust 3 + Python 1 + TS 1 + JS 1 +
Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1 + shell 1 +
yaml-fallback 1 + 2 reingest-unchanged regression + c 1 + cpp 1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task H: frozen design §10 activation log
Files:
-
Modify:
docs/superpowers/specs/2026-04-27-kebab-final-form-design.md -
Step 1: Find §10 activation log. Add p10-1D entry right after the p10-3 entry:
**p10-1D 활성화 (C + C++) (2026-05-21)**: Tier 1 chunker family 완료 — C (`code-c-ast-v1`, `.c`/`.h`) + C++ (`code-cpp-ast-v1`, `.cpp`/`.cc`/`.cxx`/`.hpp`/`.hh`/`.hxx`) AST chunker 활성화. C symbol = function name only; C++ symbol = `namespace::Class::method` (recursive namespace + class nesting). `.h` 가 C++ syntax 만나면 tree-sitter-c parse 실패 → p10-3 Tier 3 fallback 으로 자동 picked up.
- Step 2: Commit:
git add docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md \
docs/superpowers/specs/2026-04-27-kebab-final-form-design.md 2>/dev/null
git add docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
git commit -m "$(cat <<'EOF'
docs(p10-1d): activate C + C++ in frozen design §10
P10 Tier 1 chunker family complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task I: README + HANDOFF + ARCHITECTURE + SMOKE + tasks/INDEX + tasks/p10/INDEX
Files:
-
Modify:
README.md(Mermaid + ingest row),HANDOFF.md,docs/ARCHITECTURE.md,docs/SMOKE.md,tasks/INDEX.md,tasks/p10/INDEX.md -
Step 1 — README.md: Update the
kebab ingestrow's supported-langs list to include.c/.h→code-c-ast-v1and.cpp/.cc/.cxx/.hpp/.hh/.hxx→code-cpp-ast-v1. Extend--code-lang c/--code-lang cppin the enumeration. Update the Mermaidchunker[...]node to includecode-c-ast-v1, code-cpp-ast-v1in the brace. -
Step 2 — HANDOFF.md: P10 row append
, **1D ✅ (C + C++ AST chunkers, code-c-ast-v1 + code-cpp-ast-v1 — v0.16.0)**. Update 한 줄 요약 to include C/C++. Update 다음 후보 (drop p10-1D; remaining: P9-5 desktop / P8 audio). -
Step 3 — docs/ARCHITECTURE.md: code parser table row: append C + C++ row mention. Flowchart
pcodenode: append+ P10-1D. Directory tree chunkers list: addcode_c_ast_v1.rs+code_cpp_ast_v1.rs. -
Step 4 — docs/SMOKE.md: Add a "## P10-1D C + C++ AST chunker" section after the P10-3 section. Walkthrough with sample.c + sample.cpp ingest +
--code-lang c/--code-lang cppsearch assertions. Append verification checklist entry. -
Step 5 — tasks/INDEX.md + tasks/p10/INDEX.md: Flip p10-1D row ⏳ → ✅ (v0.16.0).
-
Step 6: Commit:
git add README.md HANDOFF.md docs/ARCHITECTURE.md docs/SMOKE.md tasks/INDEX.md tasks/p10/INDEX.md
git commit -m "$(cat <<'EOF'
docs(p10-1d): README/HANDOFF/ARCHITECTURE/SMOKE/INDEX sync
P10 Tier 1 chunker family complete (Rust + Python + TS + JS + Go + Java +
Kotlin + C + C++). Tier 2 (k8s + dockerfile + manifest) and Tier 3
(paragraph fallback) already active. p10-1D 활성화 + ✅ flip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
Task J: workspace test gate + clippy
-
Step 1: Disk check (
df -h /) + optionalcargo clean. -
Step 2:
cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -80. Expected: all PASS. -
Step 3:
cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -30. Expected: clean.
Task K: version bump + gitea PR + release
Files:
-
Modify:
Cargo.toml -
Step 1: Workspace
version = "0.15.0"→"0.16.0". -
Step 2:
cargo build -p kebab-clito refresh Cargo.lock. -
Step 3: Commit:
git add Cargo.toml Cargo.lock
git commit -m "$(cat <<'EOF'
chore: bump version 0.15.0 → 0.16.0 (p10-1d C + C++ AST chunkers)
Minor bump — additive new chunker_versions code-c-ast-v1 + code-cpp-ast-v1
+ new routing langs c / cpp + new tree-sitter-c / tree-sitter-cpp workspace
deps. P10 Tier 1 chunker family complete. No DB migration, no wire schema
major bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
-
Step 4: Push branch + open gitea PR via REST API. Title:
feat(p10-1d): C + C++ AST chunkers — P10 Tier 1 chunker family complete. -
Step 5: Wait for code-reviewer APPROVE → merge via gitea REST API → cut
gitea-release v0.16.0.
Verification matrix
| 검증 | 명령 | 기대 |
|---|---|---|
| C symbol | kebab search --code-lang c --json |
Citation::Code.symbol = "<fn_name>" |
| C++ symbol | kebab search --code-lang cpp --json |
Citation::Code.symbol = "namespace::Class::method" |
| .h fallback | .h with C++ syntax → ingest |
Tier 3 fallback: chunker_version = "code-text-paragraph-v1", lang = c |
| code_lang_breakdown | kebab schema --json |
"c": N, "cpp": M |
Risks reminder (구현 중 주의)
- tree-sitter grammar version resolution: tree-sitter 0.26 호환 grammar. crates.io 최신 버전 default.
- tree-sitter-cpp 의 node-kind 명: spec 의 가정 (
namespace_definition,class_specifier,function_definition,template_declaration,concept_definition, etc.) 이 실제 grammar 와 일치하는지 fixture parse 로 검증. - out-of-class method def 의 prefix 복원:
void Foo::bar()의 declarator 가function_declarator > qualified_identifier > namespace_identifier "Foo" + identifier "bar". spec 의extract_fn_symbol이 이 chain 정확히 walk. - Operator overload: tree-sitter-cpp 의
operator_name또는field_identifier"operator+" 형태. fixture 로 검증. - 머지 후 deviation 은
tasks/HOTFIXES.mddated 로그.