fix(dogfood): k8s multi-resource YAML chunk_id collision
P10 dogfooding found that a k8s manifest with 2+ documents (e.g.
Deployment + Service in one file) fails to ingest:
UNIQUE constraint failed: chunks.chunk_id
Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch
hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per
resource; with split_key None every resource from the same document gets
the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's
code_text_paragraph_v1 had the same bug (fixed in df3c5b8) but it calls
build_chunk_no_symbol directly — the push_chunks_with_oversize path was
never fixed.
Fix: push_chunks_with_oversize gains a base_split_key parameter for the
non-oversize single-chunk case. k8s chunker passes Some(resource.line_start)
so each resource gets a distinct chunk_id; dockerfile / manifest pass None
(1 chunk per file — no sibling collision, chunk_id stays stable).
Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts
chunk_id distinctness; new integration test
tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real
2-document YAML end-to-end.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -140,6 +140,17 @@ fn k8s_multi_doc_emits_one_chunk_per_resource() {
|
||||
for chunk in &chunks {
|
||||
assert_eq!(chunk.chunker_version.0, "k8s-manifest-resource-v1");
|
||||
}
|
||||
|
||||
// Every chunk from a multi-resource file must have a distinct chunk_id.
|
||||
// Without the fix, all non-oversize resources get split_key=None which
|
||||
// collapses to the same id_hash (= base_policy_hash) → UNIQUE constraint
|
||||
// violation on the second resource.
|
||||
let ids: std::collections::HashSet<_> = chunks.iter().map(|c| c.chunk_id.clone()).collect();
|
||||
assert_eq!(
|
||||
ids.len(),
|
||||
chunks.len(),
|
||||
"every k8s resource chunk must have a distinct chunk_id (multi-resource collision regression)"
|
||||
);
|
||||
}
|
||||
|
||||
/// A YAML document with an indentation error (tab in a space-indented context)
|
||||
|
||||
Reference in New Issue
Block a user