feat(p10-2): Tier 2 resource-aware chunkers (k8s + Dockerfile + manifest), enabling non-code resource files for indexing #153

Merged
altair823 merged 13 commits from feat/p10-2-tier2-resource into main 2026-05-20 14:22:58 +00:00
Summary

  • Enables the three Tier 2 resource-aware chunkers (design §9.2). They index non-code resource files with file/document-level chunking rather than AST chunking.
    • k8s-manifest-resource-v1: .yaml/.yml multi-document YAML whose documents carry apiVersion + kind. One chunk per document. symbol = <kind>/<namespace>/<name>, or <kind>/<name> when cluster-scoped.
    • dockerfile-file-v1: Dockerfile / Dockerfile.* / *.dockerfile, the whole file as one chunk. symbol = <dockerfile>.
    • manifest-file-v1: Cargo.toml / pyproject.toml / package.json / tsconfig.json / pom.xml / build.gradle / go.mod, the whole file as one chunk. symbol = <manifest>. The manifest type is distinguished by code_lang (toml / json / xml / groovy / go-mod).
  • New modules: crates/kebab-chunk/src/{k8s_manifest_resource_v1,dockerfile_file_v1,manifest_file_v1,tier2_shared}.rs. tier2_shared holds the shared oversize fallback (content over 200 lines is split into line windows) and the Chunk construction helper, so Chunk hash/id/token-count match 1A-2.
  • routing: code_lang_for_path (the design §3.5 single source of truth) is extended with basename-first matching: the Dockerfile.* prefix variant, 7 manifest basenames, and the Tier 2 extensions (.yaml/.yml/.toml/.json/.xml/.gradle/.dockerfile). Removes the duplicated 1A-1 era inline matching in kebab-source-fs/src/media.rs.
  • dispatch: the 7-arm match in ingest_one_code_asset gains branches for the 7 Tier 2 langs (yaml / dockerfile / toml / json / xml / groovy / go-mod). parser_version is unified to "none-v1" (Tier 2 has no parse step). synthesize_tier2_document builds a Document with a single Block::Code directly from the raw bytes.
  • frozen design update: 3 mapping lines in §3.5 (xml / groovy / go-mod) plus one activation line in §10.1. The same entry is added to the §10 activation log of the 2026-04-27 design.
  • wire schema: additive. Citation::Code.lang gains 7 new values, which show up in schema.v1.stats.code_lang_breakdown with no extra work. No changes to the schema body.
  • version bump: 0.13.0 → 0.14.0 (a single workspace version line; the cascade is automatic).
  • docs: README (supported lang table) / HANDOFF (p10-2 ✅) / docs/ARCHITECTURE (chunker tree + flowchart) / docs/SMOKE (k8s yaml + Dockerfile + Cargo.toml ingest+search verification steps) / tasks/INDEX + tasks/p10/INDEX (✅).

Non-k8s YAML (Helm values, CI yml, docker-compose, etc.) and invalid YAML are skipped in this phase; they will be wired in automatically once the p10-3 paragraph fallback is merged.

Test plan

  • [x] cargo test --workspace --no-fail-fast -j 1 PASS (-j 1 per the memory-conscious rule)
  • [x] cargo clippy --workspace --all-targets -- -D warnings clean
  • [x] 8 new kebab-chunk tests: k8s multi-doc (2 chunks + non-k8s skip) / invalid yaml (0 chunks) / cluster-scoped symbol / dockerfile single chunk / 4 manifest variants (toml/json/xml/go-mod)
  • [x] kebab-app/tests/code_ingest_smoke.rs, 12 tests (9 Tier 1 + 3 Tier 2): k8s yaml → Citation::Code { symbol: "Deployment/prod/api", lang: "yaml" }, Dockerfile → <dockerfile> / dockerfile, Cargo.toml → <manifest> / toml
  • [x] code_lang_for_path unit tests for basename priority + extension fallback (Task B)
  • [x] media.rs Tier 2 routing test (Task C)
  • [ ] post-merge dogfood: kebab ingest routes .yaml + Dockerfile + manifest files in a multi-root KB; check the --code-lang yaml search results and the schema breakdown
  • [ ] post-merge gitea-release v0.14.0

Branch

feat/p10-2-tier2-resource (head: 217dddb). Spec: tasks/p10/p10-2-tier2-resource-aware.md. Plan: docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md.

🤖 Generated with Claude Code

altair823 added 13 commits 2026-05-20 14:15:47 +00:00
Frozen contract for the p10-2 single PR: 3 chunker activation, k8s
identification via apiVersion+kind, Dockerfile/manifest basename matching,
code_lang_for_path source-of-truth consolidation, frozen design §3.5 +
§10.1 deltas, and version bump 0.13.0 → 0.14.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Branch feat/p10-2-tier2-resource. Tasks: serde_yaml dep / lang.rs basenames /
media.rs source-of-truth consolidation / 3 chunkers (k8s + dockerfile +
manifest) + tier2_shared helper / ingest dispatch / smoke tests / frozen
design §3.5+§10.1 / docs sync / version bump 0.13.0→0.14.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds basename-first matching for Dockerfile / Cargo.toml / pyproject.toml /
package.json / tsconfig.json / go.mod / pom.xml / build.gradle plus
Dockerfile.* prefix variant. Extension fallback adds .yaml/.yml/.dockerfile/
.toml/.json/.xml/.gradle → yaml/dockerfile/toml/json/xml/groovy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
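
The basename-first routing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `code_lang_for_path`: the function name `tier2_lang_for_path` is hypothetical, Tier 1 extensions are left out, and case-sensitive matching is an assumption.

```rust
use std::path::Path;

/// Hypothetical stand-in for the Tier 2 part of `code_lang_for_path`.
/// Returns the `code_lang` label for a path, or `None` when no Tier 2 rule matches.
/// Assumption: matching is case-sensitive.
pub fn tier2_lang_for_path(path: &Path) -> Option<&'static str> {
    let basename = path.file_name()?.to_str()?;

    // 1. Exact basenames are checked first, so `go.mod` is never
    //    treated as a file with a `.mod` extension.
    let by_basename = match basename {
        "Dockerfile" => Some("dockerfile"),
        "Cargo.toml" | "pyproject.toml" => Some("toml"),
        "package.json" | "tsconfig.json" => Some("json"),
        "pom.xml" => Some("xml"),
        "build.gradle" => Some("groovy"),
        "go.mod" => Some("go-mod"),
        _ => None,
    };
    if by_basename.is_some() {
        return by_basename;
    }

    // 2. `Dockerfile.*` prefix variants such as `Dockerfile.dev`.
    if basename.starts_with("Dockerfile.") {
        return Some("dockerfile");
    }

    // 3. Extension fallback for everything else.
    match path.extension()?.to_str()? {
        "yaml" | "yml" => Some("yaml"),
        "dockerfile" => Some("dockerfile"),
        "toml" => Some("toml"),
        "json" => Some("json"),
        "xml" => Some("xml"),
        "gradle" => Some("groovy"),
        _ => None,
    }
}
```

Under these assumptions, `tier2_lang_for_path(Path::new("Dockerfile.dev"))` yields `Some("dockerfile")`, while a path such as `src/main.rs` yields `None` and would be left to the Tier 1 rules.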
Replaces 1A-1 era inline match block with a single call to
kebab_parse_code::code_lang_for_path, per design §3.5 single-source-of-truth
rule. Adds Tier 2 routing test (yaml / dockerfile / toml / json / xml /
groovy / go-mod) and preserves all non-code extension branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits multi-document YAML by ^---\s*$, requires apiVersion + kind string
fields per document, emits 1 chunk per recognized k8s resource. Symbol =
<kind>/<namespace>/<name> or <kind>/<name> (cluster-scoped). Invalid YAML
returns 0 chunks (handled by p10-3 paragraph fallback). Oversize >200 lines
splits into line-windows sharing the same symbol.

tier2_shared module hosts the oversize fallback + Chunk-construction helper
mirroring code_rust_ast_v1's Chunk shape. Task E (dockerfile) and Task F
(manifest) will reuse it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
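
As a rough illustration of the document splitting and symbol shape described in this commit, here is a std-only sketch. The names `split_yaml_documents` and `k8s_symbol` are hypothetical, skipping blank documents is an assumption, and the `apiVersion` + `kind` check (which the real chunker presumably performs on parsed YAML) is deliberately omitted.

```rust
/// Splits multi-document YAML on separator lines matching `^---\s*$`.
/// Returns each non-blank document together with its 1-based starting line.
/// Assumption: blank documents are dropped.
pub fn split_yaml_documents(text: &str) -> Vec<(usize, String)> {
    let mut docs = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut start_line = 1;

    for (idx, line) in text.lines().enumerate() {
        // A separator is `---` followed only by whitespace.
        let is_separator = line
            .strip_prefix("---")
            .map_or(false, |rest| rest.trim().is_empty());
        if is_separator {
            flush(&mut docs, &mut current, start_line);
            start_line = idx + 2; // next document starts on the line after the separator
        } else {
            current.push(line);
        }
    }
    flush(&mut docs, &mut current, start_line);
    docs
}

/// Pushes the buffered lines as one document unless they are all blank.
fn flush(docs: &mut Vec<(usize, String)>, current: &mut Vec<&str>, start_line: usize) {
    if current.iter().any(|l| !l.trim().is_empty()) {
        docs.push((start_line, current.join("\n")));
    }
    current.clear();
}

/// Builds `<kind>/<namespace>/<name>`, or `<kind>/<name>` for
/// cluster-scoped resources that have no namespace.
pub fn k8s_symbol(kind: &str, namespace: Option<&str>, name: &str) -> String {
    match namespace {
        Some(ns) => format!("{kind}/{ns}/{name}"),
        None => format!("{kind}/{name}"),
    }
}
```

With this sketch, a namespaced Deployment named `api` in `prod` maps to `Deployment/prod/api`, matching the symbol checked in the smoke test, and a Namespace named `prod` maps to `Namespace/prod`.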
Reads entire Dockerfile / Dockerfile.* / *.dockerfile content and emits a
single Chunk with symbol "<dockerfile>", code_lang "dockerfile", line range
1..EOF. Oversize >200 lines splits into line-windows sharing the symbol via
tier2_shared::push_chunks_with_oversize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
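
The oversize fallback shared through `tier2_shared` might look something like the sketch below. The function name `line_windows` is hypothetical, and both the window size (equal to the 200-line threshold) and the use of non-overlapping windows are assumptions; the PR only states that content over 200 lines is split into line windows sharing one symbol.

```rust
/// Returns 1-based inclusive `(start, end)` line ranges that cover
/// `total_lines` in consecutive windows of at most `max_lines` lines.
/// Assumption: windows do not overlap.
pub fn line_windows(total_lines: usize, max_lines: usize) -> Vec<(usize, usize)> {
    assert!(max_lines > 0, "window size must be positive");
    let mut windows = Vec::new();
    let mut start = 1;
    while start <= total_lines {
        // Clamp the last window to the end of the file.
        let end = (start + max_lines - 1).min(total_lines);
        windows.push((start, end));
        start = end + 1;
    }
    windows
}
```

Under these assumptions a 450-line Dockerfile would produce three chunks covering lines 1 to 200, 201 to 400, and 401 to 450, each carrying the symbol `<dockerfile>`, while a file of 200 lines or fewer stays a single chunk.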
Emits 1 Chunk per manifest file (Cargo.toml / pyproject.toml / package.json /
tsconfig.json / pom.xml / build.gradle / go.mod). Symbol unified to
"<manifest>"; manifest type distinguished by code_lang (toml / json / xml /
groovy / go-mod) read from Block::Code.lang. Oversize >200 lines splits via
tier2_shared::push_chunks_with_oversize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds yaml / dockerfile / toml / json / xml / groovy / go-mod arms to the
existing 7-arm AST match. parser_version unified to "none-v1" for Tier 2.
synthesize_tier2_document builds a minimal Document (single Block::Code
with raw file text) since Tier 2 has no parse step. allowlist in
ingest_one_asset extended to admit Tier 2 langs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
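
The Tier 2 arms added to the dispatch can be pictured as a lookup from `code_lang` to chunker label. This is only a sketch: `tier2_chunker_version` and `TIER2_PARSER_VERSION` are hypothetical names, and routing purely on the lang label is inferred from the descriptions in this PR rather than taken from `ingest_one_code_asset` itself.

```rust
/// Tier 2 has no parse step, so every Tier 2 asset records the same parser label.
pub const TIER2_PARSER_VERSION: &str = "none-v1";

/// Maps a Tier 2 `code_lang` to the chunker_version label that handles it.
/// Returns `None` for langs outside Tier 2 (for example Tier 1 AST langs).
pub fn tier2_chunker_version(code_lang: &str) -> Option<&'static str> {
    match code_lang {
        "yaml" => Some("k8s-manifest-resource-v1"),
        "dockerfile" => Some("dockerfile-file-v1"),
        // All manifest kinds share one chunker; the lang label tells them apart.
        "toml" | "json" | "xml" | "groovy" | "go-mod" => Some("manifest-file-v1"),
        _ => None,
    }
}
```

Keeping the manifest kinds behind a single chunker is what lets the symbol stay fixed at `<manifest>` while `--code-lang` still filters by file type.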
Three new tests in code_ingest_smoke.rs verifying isolated-TempDir ingest +
--code-lang filter + Citation::Code.lang / .symbol shape for each Tier 2
chunker. Brings the suite to 12 tests (Rust 3 + Python 1 + TS 1 + JS 1 +
Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§3.5: add code_lang_for_path mappings xml / groovy / go-mod.
§10.1: add activation log entry for p10-2 (3 Tier 2 chunkers active).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-visible surface sync per the docs-split rule:
- README adds Tier 2 langs (yaml / dockerfile / toml / json / xml / groovy / go-mod) to the ingest support list and --code-lang options.
- HANDOFF flips p10-2 phase row to ✅ (v0.14.0) and updates the next-task candidates.
- ARCHITECTURE extends crates/kebab-chunk/src/ tree with k8s_manifest_resource_v1.rs / dockerfile_file_v1.rs / manifest_file_v1.rs / tier2_shared.rs, plus a Tier 2 note on the code-parser row and flowchart node.
- SMOKE adds a Tier 2 smoke walkthrough (k8s yaml + Dockerfile + Cargo.toml ingest + --code-lang search) and a P10-2 entry in the verification checklist.
- tasks/INDEX + tasks/p10/INDEX flip p10-2 to ✅ (v0.14.0).

Workspace test gate (-j 1) + clippy --workspace pass cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minor bump — additive code_lang values (xml / groovy / go-mod) + 3 new
chunker_version labels (k8s-manifest-resource-v1 / dockerfile-file-v1 /
manifest-file-v1) + frozen design §3.5 / §10.1 deltas. No DB migration,
no wire schema major bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
altair823 merged commit 17ee400fd5 into main 2026-05-20 14:22:58 +00:00
altair823 deleted branch feat/p10-2-tier2-resource 2026-05-20 14:23:00 +00:00
Reference: altair823-org/kebab#153