feat(v0.17.0/PR-C): code_lang_chunk_breakdown additive wire field

closure of HOTFIXES 2026-05-22 "code_lang_breakdown chunk granularity"
LOW. Chunk-level companion of the existing doc-count metric.

- crates/kebab-store-sqlite/src/store.rs: code_lang_chunk_breakdown()
  method. chunks INNER JOIN documents → COUNT(c.chunk_id) GROUP BY
  metadata_json.code_lang, NULL skipped. BTreeMap<String, u32>.
  + lib unit test code_lang_chunk_breakdown_counts_chunks_not_docs
  (1 rust doc + 3 chunks → rust=3 chunks vs rust=1 doc).
- crates/kebab-app/src/schema.rs: Stats.code_lang_chunk_breakdown
  additive field + collect_stats builder. tests_stats_ext 의
  stats_includes_code_lang_and_repo_breakdown_fields 가 신규 필드도
  검증.
- docs/wire-schema/v1/schema.schema.json: 신규 additive 필드
  명세 + 기존 code_lang_breakdown / repo_breakdown description
  정정 ("code chunk count" → "doc count", Gemini round 2 권고).
- tasks/HOTFIXES.md: 2026-05-24 PR-C closure entry.

wire additive, schema_version bump 불필요. v0.16.x 호출 호환.
cargo test --workspace --no-fail-fast -j 1 + clippy 통과.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-24 16:08:28 +00:00
parent ff9d5f5f86
commit 0def913abd
4 changed files with 178 additions and 2 deletions

View File

@@ -81,12 +81,17 @@
},
"code_lang_breakdown": {
"type": "object",
"description": "p10-1A-1: per-language code chunk count. Key = lowercase language name (e.g. 'rust', 'python'). Populated after 1A-2 lands; empty on markdown-only corpora.",
"description": "p10-1A-1: per-language **doc** count (one entry per indexed code document). Key = lowercase language name (e.g. 'rust', 'python'). Empty on markdown-only corpora. Pair with `code_lang_chunk_breakdown` for chunk-level granularity (one file's 200 chunks vs one doc).",
"additionalProperties": { "type": "integer", "minimum": 0 }
},
"repo_breakdown": {
"type": "object",
"description": "p10-1A-1: per-repo code chunk count. Key = repo name as detected by kebab-parse-code::repo. Empty on markdown-only corpora.",
"description": "p10-1A-1: per-repo **doc** count. Key = repo name as detected by kebab-parse-code::repo. Empty on markdown-only corpora.",
"additionalProperties": { "type": "integer", "minimum": 0 }
},
"code_lang_chunk_breakdown": {
"type": "object",
"description": "v0.17.0 PR-C: per-language **chunk** count (closes HOTFIXES 2026-05-22 'code_lang_breakdown chunk granularity'). Companion to `code_lang_breakdown` (doc count) — chunk-level granularity is the indexing-pressure metric (a 200-chunk PDF + a 5-chunk Rust file both appear as `1 doc` but `200` vs `5` chunks). Key = lowercase language name. Empty on markdown-only corpora.",
"additionalProperties": { "type": "integer", "minimum": 0 }
}
}