fix(dogfood): k8s multi-resource YAML chunk_id collision #158

Merged
altair823 merged 3 commits from fix/dogfood-k8s-multi-resource-chunk-id into main 2026-05-21 23:57:54 +00:00
Owner

Summary

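Fixes a HIGH-priority bug found during the P10 comprehensive dogfooding pass.

Bug: a k8s manifest YAML with 2+ documents (e.g. a Deployment and a Service in one file, separated by ---) fails to ingest with UNIQUE constraint failed: chunks.chunk_id. The document row is created, but 0 chunks are stored, so the file cannot be searched.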

Root cause: the non-oversize branch of tier2_shared::push_chunks_with_oversize hardcoded split_key = None. K8sManifestResourceV1Chunker calls push_chunks_with_oversize once per resource, and any resource of ≤ 200 lines goes through build_chunk(... None). Every resource in the same document shares doc_id, chunker_version, base_policy_hash, and an empty block_ids, and each also has split_key = None, so they all produce the same id_hash and therefore the same chunk_id; inserting the second chunk violates the UNIQUE constraint. code_text_paragraph_v1 from p10-3 had exactly the same bug, fixed in commit df3c5b8, but code_text_paragraph_v1 calls build_chunk_no_symbol directly, so the push_chunks_with_oversize path was never fixed.
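
The collision can be reproduced in isolation with a minimal sketch. Everything here is an assumption made for illustration: the struct ChunkIdInputs, the helper names, and the use of the standard library's DefaultHasher are stand-ins, since the actual id_hash function is not shown in this PR. Only the list of hashed fields comes from the description above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the fields the PR says feed into `id_hash`.
#[derive(Hash)]
struct ChunkIdInputs<'a> {
    doc_id: &'a str,
    chunker_version: &'a str,
    base_policy_hash: u64,
    block_ids: &'a [u64],
    split_key: Option<u32>,
}

/// Illustrative hash only; the real implementation is not part of this PR text.
fn chunk_id(inputs: &ChunkIdInputs<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}

/// Builds the id for one resource of a fixed, made-up document.
/// All fields except `split_key` are identical for every resource,
/// which mirrors the situation described in the root cause.
fn resource_chunk_id(split_key: Option<u32>) -> u64 {
    chunk_id(&ChunkIdInputs {
        doc_id: "doc-1",
        chunker_version: "k8s_manifest_resource_v1",
        base_policy_hash: 42,
        block_ids: &[],
        split_key,
    })
}
```

With this sketch, resource_chunk_id(None) returns the same value on every call, which is the duplicate that trips the UNIQUE constraint, while passing distinct Some(line_start) values gives each resource its own id.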

Fix: push_chunks_with_oversize gains a base_split_key: Option<u32> parameter, which is the split key for the non-oversize single-chunk case. The k8s chunker passes Some(resource.line_start), so each resource gets a distinct chunk_id. dockerfile / manifest pass None (1 chunk per file: no sibling collision, chunk_id unchanged, no migration churn).
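
The shape of the change can be sketched as follows. This is not the code from the PR: the Chunk struct, the simplified signature, and the behavior of the oversize branch (fixed windows keyed by their start line) are assumptions. Only the base_split_key parameter, its Option<u32> type, and the 200-line threshold are taken from the description.

```rust
/// Line threshold mentioned in the PR description.
const OVERSIZE_LINES: usize = 200;

/// Hypothetical, simplified chunk record.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    line_start: u32,
    line_end: u32,
    split_key: Option<u32>,
}

/// Sketch of the fixed helper: the non-oversize branch now uses the
/// caller-supplied `base_split_key` instead of a hardcoded `None`.
fn push_chunks_with_oversize(
    out: &mut Vec<Chunk>,
    line_start: u32,
    line_count: usize,
    base_split_key: Option<u32>,
) {
    if line_count == 0 {
        return; // nothing to emit
    }
    if line_count <= OVERSIZE_LINES {
        out.push(Chunk {
            line_start,
            line_end: line_start + line_count as u32 - 1,
            split_key: base_split_key,
        });
        return;
    }
    // Oversize branch (assumed behavior): emit fixed-size windows,
    // each keyed by its own start line.
    let mut offset = 0usize;
    while offset < line_count {
        let len = OVERSIZE_LINES.min(line_count - offset);
        let start = line_start + offset as u32;
        out.push(Chunk {
            line_start: start,
            line_end: start + len as u32 - 1,
            split_key: Some(start),
        });
        offset += len;
    }
}
```

Under these assumptions, a caller that emits several chunks per document (the k8s chunker) would pass Some(line_start) for each resource, while a caller that emits one chunk per file (dockerfile / manifest) would pass None and keep its existing ids.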

Regression coverage:

  • k8s_multi_doc_emits_one_chunk_per_resource now asserts chunk_id distinctness. The absence of this check is what let the bug slip through at p10-2.
  • New integration test tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real 2-document YAML end-to-end. The implementer used git stash to confirm that, without the fix, this test fails with exactly the UNIQUE constraint error.
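
A distinctness check of the kind described in the first bullet can be sketched with a HashSet. The helper name first_duplicate and the use of string ids are assumptions for illustration; the actual test code is not shown in this PR.

```rust
use std::collections::HashSet;

/// Returns the first chunk id that appears more than once, if any.
/// Hypothetical helper; ids are modeled as string slices here.
fn first_duplicate<'a>(chunk_ids: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    for &id in chunk_ids {
        // `insert` returns false when the id was already present.
        if !seen.insert(id) {
            return Some(id);
        }
    }
    None
}
```

A test could then assert that first_duplicate over the emitted chunk ids is None, which reports the offending id on failure rather than only a count mismatch.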

How it was found

P10 comprehensive dogfooding (/tmp/kebab-p10-dogfood/, 16 files: 9 AST langs + Tier 2/3). The 9 AST langs, dockerfile, manifest, shell, and the non-k8s YAML fallback all ingested correctly; only the multi-resource k8s YAML failed. The p10-2 integration test tier2_k8s_yaml_ingest_searchable uses only a single-Deployment fixture, so it did not catch this.

Test plan

  • [x] cargo test -p kebab-chunk --test k8s_manifest_resource_v1: 4/4 PASS (includes the distinctness assertion)
  • [x] cargo test -p kebab-chunk --test dockerfile_file_v1 --test manifest_file_v1: 5/5 PASS (confirms chunk_id is unchanged)
  • [x] cargo test -p kebab-app --test code_ingest_smoke: 19/19 PASS (18 existing + the new multi-resource test)
  • [x] cargo clippy -p kebab-chunk -p kebab-app --all-targets -- -D warnings: clean
  • [x] Confirmed via git stash that the regression test actually fails without the fix
  • [ ] post-merge: re-verify the dogfood KB with the 0.16.1 binary (confirm k8s/deploy.yaml ingests successfully)
  • [ ] post-merge: gitea-release v0.16.1

Branch

fix/dogfood-k8s-multi-resource-chunk-id (head: 08d72a1). 2 commits — fix + version bump 0.16.0 → 0.16.1 (patch, bug fix only).

🤖 Generated with Claude Code

altair823 added 2 commits 2026-05-21 23:55:06 +00:00
P10 dogfooding found that a k8s manifest with 2+ documents (e.g.
Deployment + Service in one file) fails to ingest:
  UNIQUE constraint failed: chunks.chunk_id

Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch
hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per
resource; with split_key None every resource from the same document gets
the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's
code_text_paragraph_v1 had the same bug (fixed in df3c5b8) but it calls
build_chunk_no_symbol directly — the push_chunks_with_oversize path was
never fixed.

Fix: push_chunks_with_oversize gains a base_split_key parameter for the
non-oversize single-chunk case. k8s chunker passes Some(resource.line_start)
so each resource gets a distinct chunk_id; dockerfile / manifest pass None
(1 chunk per file — no sibling collision, chunk_id stays stable).

Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts
chunk_id distinctness; new integration test
tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real
2-document YAML end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Patch bump — bug fix only (P10 dogfood-discovered k8s multi-resource
chunk_id collision). New binary needed to resume dogfooding. No wire
schema change, no DB migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
altair823 added 1 commit 2026-05-21 23:57:48 +00:00
PR #158 code-reviewer recommendation. Records the dogfood-discovered
k8s multi-resource chunk_id collision + the deliberate decision NOT to
bump chunker_version (dogfood-only stage, single-resource k8s chunk_id
shift is benign churn). Cross-link added to p10-2 spec Risks/notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
altair823 merged commit a0c0dca321 into main 2026-05-21 23:57:53 +00:00
altair823 deleted branch fix/dogfood-k8s-multi-resource-chunk-id 2026-05-21 23:57:54 +00:00
Reference: altair823-org/kebab#158