Minor bump — additive new chunker_versions code-c-ast-v1 + code-cpp-ast-v1
+ new routing langs c / cpp + new tree-sitter-c / tree-sitter-cpp workspace
deps. P10 Tier 1 chunker family complete. No DB migration, no wire schema
major bump.
Also lands the missing p10-3 try_skip_unchanged fallback-aware fix (Option
B1 — 7th param) that PR #155 was supposed to ship but never made it to main
(implementer reported commit SHA 2a39513 that didn't exist in the merged
branch). Same commit extends tier3_fallback_cv to include c/cpp.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #155 (p10-3) merged WITHOUT the reviewer's required Option B1 fix —
the implementer reported a commit SHA (2a39513) that never made it to main.
Result: every reingest of a Tier 3-fallback file (non-k8s YAML, invalid
YAML, AST extractor failure) re-runs full extract + chunk + embed because
the parser/chunker version comparison can never match (stored is
code-text-paragraph-v1 / none-v1, but caller uses Tier 1/2 dispatch
values).
This commit:
1. Adds the 7th param `fallback_chunker_version: Option<&ChunkerVersion>`
to try_skip_unchanged + the stored_is_tier3_fallback detection branch
(skip parser/chunker equality, keep embedder check).
2. Threads `None` through non-code call sites (md / image / pdf).
3. Code call site computes tier3_fallback_cv covering all Tier 1/2 langs
that can fall back: rust / python / ts / js / go / java / kotlin /
yaml / dockerfile / toml / json / xml / groovy / go-mod / c / cpp
(p10-1D additions).
4. Adds tier3_yaml_fallback_reingest_is_unchanged + tier3_shell_reingest_is_unchanged
regression tests (the originally-promised PR #155 regression coverage
that also never made it to main).
Smoke tests: 14 + 2 = 16 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends 4-arm match (parser_version / chunker_version / extract / chunks)
+ allowlist + tier3_fallback_cv with "c" + "cpp" arms. C uses CAstExtractor
+ CodeCAstV1Chunker; C++ uses CppAstExtractor + CodeCppAstV1Chunker. Both
langs are Tier 3-fallback-eligible (e.g. .h file with C++ syntax may fail
tree-sitter-c parse → Tier 3 paragraph fallback per p10-3 wrapper).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors code-go-ast-v1's chunker pattern. Snapshot test against
tests/fixtures/sample.c (function + typedef struct + typedef enum +
preprocessor) verifies symbol list + lang=c stamping.
Chunks produced (4 total):
- <top-level> glue: includes, defines, static vars, typedefs (lines 1-18)
- parse_record function (lines 20-23)
- print_record function (lines 25-27)
- main function (lines 29-33)
All chunks stamped with lang=c and chunker_version=code-c-ast-v1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symbol = namespace::Class::method via recursive build_blocks. namespace_definition
pushes namespace name (anonymous → <anonymous>). nested_namespace_specifier
(outer::inner) flattens all segments and pushes them. class_specifier / struct_specifier
(named) emit class unit + recurse with class name pushed. function_definition emits
method unit; symbol resolution unpacks declarator chain (pointer_declarator /
reference_declarator → function_declarator → identifier / field_identifier /
qualified_identifier / operator_name / destructor_name).
operator_cast (conversion operators, e.g. operator bool) handled as a direct
declarator kind on function_definition. template_declaration recurses with same
prefix (template params NOT in symbol). enum_specifier + concept_definition emit
type-level units. linkage_specification (extern "C") recurses into body with same
prefix. Other top-level nodes → <top-level> glue.
All 15 unit tests pass; build and clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Top-level units: function_definition (symbol = fn name from declarator's
innermost identifier), struct_specifier, enum_specifier, union_specifier
(each emits 1 unit with the named identifier as symbol). Preprocessor
directives + top-level declarations group into a <top-level> glue chunk.
Empty file or zero units → <module> post-pass.
C symbol = function name only — no namespace, no class nesting (design §3.4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standard crate names resolved cleanly: tree-sitter-c v0.24.2 and
tree-sitter-cpp v0.23.4 are both compatible with workspace tree-sitter 0.26.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frozen contract: single PR with code-c-ast-v1 + code-cpp-ast-v1. C symbol
= function name only (no nesting). C++ symbol = namespace::Class::method
(recursion). .h → C (design §3.5); C++ headers' parse failure picked up
by p10-3 Tier 3 fallback. tree-sitter-c + tree-sitter-cpp workspace deps,
version bump 0.15.0 → 0.16.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>