# p10-2: Tier 2 resource-aware chunkers (k8s + Dockerfile + manifest)

**Status:** 🟡 In progress

**Contract sections:** §3.3 (chunker_version `k8s-manifest-resource-v1` + `dockerfile-file-v1` + `manifest-file-v1`), §3.4 (citation symbol: `<kind>/<namespace>/<name>` / `` / ``), §3.5 (additional code_lang mappings `xml` / `groovy` / `go-mod`), §6.1 (update `kebab-parse-code/src/lang.rs` + clean up the inline duplication in `kebab-source-fs/src/media.rs`), §6.2 (`kebab-chunk/src/{k8s_manifest_resource_v1,dockerfile_file_v1,manifest_file_v1}.rs`), §9.2 (Tier 2 definition), §10.1 (one deactivation log line).

**Design:** [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md) §1.2 (Phase 2) + §9.2.

**Plan:** [2026-05-20-p10-2-tier2-resource-aware.md](../../docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md).

## Goal

Activate the three Tier 2 resource-aware chunkers in a single PR, on top of the p10-1A-2 / 1B / 1C infrastructure. These do file/document-level chunking rather than AST chunking, and follow the bundling pattern of 1B (Python+TS+JS). From the moment of merge, `.yaml` / `.yml` / `Dockerfile` / the seven manifest types can be dogfooded. Non-k8s YAML (Helm values, CI yml, docker-compose, etc.) and invalid YAML are skipped in this phase; they are wired up automatically once the p10-3 paragraph fallback is merged.

## Frozen design decisions (finalized by this task)

### Common

- **3 chunkers = self-contained**. No Tier 2 extractor module is added to `kebab-parse-code`; only `code_lang_for_path` in lang.rs is updated. Because these chunkers are not AST-based, the cost of an abstraction outweighs the code it would save.
- **`code_lang_for_path` = single source of truth** (design §3.5). The inline extension match in `kebab-source-fs/src/media.rs` is replaced by a call to this function (a small refactor that cleans up duplication accumulated since 1A-1).
- **parser_version** = `"none-v1"` for all three. It is a sentinel stating explicitly that Tier 2 has no parse stage. Only the chunker_version cascade is meaningful.
- **oversize fallback** = same policy as the AST chunkers (line-window split when `AST_CHUNK_MAX_LINES = 200` is exceeded).
  This guards against huge ConfigMaps, multi-stage Dockerfiles, and aggregate POMs. Split chunks share the same symbol (only the line range differs).
- **Frozen design updates** (within this PR):
  - Add three rows to the §3.5 `code_lang` mapping table:
    - XML (`.xml`, `pom.xml`) → `xml`
    - Groovy (`build.gradle`, `.gradle`) → `groovy`
    - Go module (`go.mod`) → `go-mod`
  - Add one line to the §10.1 deactivation log: "p10-2 activated: three Tier 2 chunkers active."

### k8s-manifest-resource-v1

- **Trigger**: `MediaType::Code("yaml")` (= `.yaml` / `.yml`).
- **k8s detection**: a YAML document is accepted only if its top-level mapping contains both `apiVersion: <string>` and `kind: <string>`. If either is missing or is not a string, that document is skipped (not the whole file; the other documents are processed normally).
- **Multi-document split implementation**: the multi-document iterator of `serde_yaml::Deserializer::from_str` does not expose line offsets, so the original text is pre-split on lines matching the regex `^---\s*$` and each slice is then deserialized. line_start/line_end are tracked during the pre-split step. The empty slice after a trailing `---` is skipped.
- **Symbol**: `<kind>/<namespace>/<name>` (when a namespace is present), or `<kind>/<name>` (cluster-scoped), or `<kind>/` (name missing). Examples: `Deployment/prod/api-server`, `ClusterRole/cluster-admin`, `ConfigMap/`.
- **Chunk text**: the original text of the pre-split slice, verbatim (not the deserialized form; the original is preserved).
- **Citation**: `Citation::Code { path, line_start, line_end, symbol: Some(<above>), lang: Some("yaml") }`.
- **Failure modes**:
  - Invalid YAML (any document fails to deserialize) → the whole file emits 0 chunks + warning log `invalid yaml: {path}`. The p10-3 paragraph fallback picks it up.
  - 0 accepted documents (all non-k8s) → the whole file emits 0 chunks. Same fallback.

### dockerfile-file-v1

- **Trigger**: `MediaType::Code("dockerfile")`: the file name is exactly `Dockerfile`, or has the prefix `Dockerfile.` (e.g. `Dockerfile.dev`), or has the extension `.dockerfile` (e.g. `myapp.dockerfile`).
- **Algorithm**: entire file text → emit 1 chunk.
- **Symbol**: uniformly ``.
- **Citation**: `Citation::Code { path, line_start: 1, line_end: , symbol: Some(""), lang: Some("dockerfile") }`.

### manifest-file-v1

- **Trigger**: the file name is one of the seven listed in design §9.2:

  | basename         | code_lang |
  |------------------|-----------|
  | `Cargo.toml`     | `toml`    |
  | `pyproject.toml` | `toml`    |
  | `package.json`   | `json`    |
  | `tsconfig.json`  | `json`    |
  | `go.mod`         | `go-mod`  |
  | `pom.xml`        | `xml`     |
  | `build.gradle`   | `groovy`  |

- **Excluded**: `build.gradle.kts` is handled by the Kotlin AST chunker from 1C-JK (code-kotlin-ast-v1), so it is not a target of this chunker.
- **Algorithm**: entire file text → emit 1 chunk.
- **Symbol**: uniformly `` (all seven). The manifest type is distinguished by `code_lang`. For example, `--code-lang go-mod` matches only go.mod, while `--code-lang toml` matches Cargo.toml + pyproject.toml.
- **Citation**: `Citation::Code { path, line_start: 1, line_end: , symbol: Some(""), lang: Some(<mapping above>) }`.

### Routing (kebab-app::ingest_one_code_asset)

Tier 2 branches are added next to the existing 7-arm AST match:

```text
"rust" | "python" | "typescript" | "javascript" | "go" | "java" | "kotlin"
    → existing AST chunker (1A-2 / 1B / 1C)
"yaml"       → k8s_manifest_resource_v1
"dockerfile" → dockerfile_file_v1
"toml" | "json" | "xml" | "groovy" | "go-mod" → manifest_file_v1
_            → skip (slot reserved for the p10-3 fallback)
```

Lookup order of `code_lang_for_path`: basename match first (`Cargo.toml` / `Dockerfile.*` / etc.) → extension fallback (`.yaml` / `.toml` / etc.).

## Acceptance criteria

- `cargo test --workspace --no-fail-fast -j 1` passes (memory-conscious: mostly per-crate runs; the full-suite gate runs once, right before the docs task).
- `cargo clippy --workspace --all-targets -- -D warnings` passes.
- Snapshot tests for each chunker are stable:
  - `crates/kebab-chunk/tests/fixtures/sample.yaml`: 2 k8s docs (Deployment + Service) + 1 non-k8s doc (apiVersion missing) → 2 chunks emitted, non-k8s doc skipped.
  - `crates/kebab-chunk/tests/fixtures/sample.dockerfile` → 1 chunk, symbol ``.
  - `crates/kebab-chunk/tests/fixtures/sample.Cargo.toml` + `sample.package.json` + `sample.pom.xml` + `sample.go.mod` (4 types) → 1 chunk each, symbol ``, with the mapped code_lang.
- Unit tests for `code_lang_for_path` covering basename-first matching + extension fallback.
- With yaml + Dockerfile + Cargo.toml placed in an isolated TempDir KB, `kebab search --code-lang yaml --json` / `--code-lang dockerfile --json` / `--code-lang toml --json` each return `Citation::Code` (3 tests added to the existing `code_ingest_smoke.rs`, 12 tests in total).
- `kebab schema --json | jq .stats.code_lang_breakdown` shows counts for `yaml` / `dockerfile` / `toml` / `json` / `xml` / `groovy` / `go-mod` (only the ones actually used appear).
- README + HANDOFF + docs/ARCHITECTURE + docs/SMOKE + tasks/INDEX + tasks/p10/INDEX updated.
- Frozen design: three mapping rows in §3.5 + one activation line in §10.1.
- Workspace `Cargo.toml` minor bump (0.13.0 → 0.14.0), gitea-release v0.14.0.

## Allowed dependencies

- `kebab-chunk`: three new modules (`k8s_manifest_resource_v1.rs` / `dockerfile_file_v1.rs` / `manifest_file_v1.rs`) and the dep entry `serde_yaml = { workspace = true }` (already present in the workspace). Existing deps (kebab-core / serde_json_canonicalizer / blake3 / anyhow / tracing) are kept.
- `kebab-parse-code`: only `lang.rs` is updated. No extractor module is added and no new crate dep is introduced.
- `kebab-source-fs/src/media.rs`: the inline match is replaced by a call to `code_lang_for_path`. Existing deps are kept (it already depends on kebab-parse-code).
- `kebab-app::ingest_one_code_asset`: the match arms are extended. No new crate dep.

## Forbidden dependencies

- `kebab-chunk` must not import store / embed / llm / rag / tree-sitter directly (boundary §6.3 is preserved).
- `kebab-parse-code` must not import store / embed / llm / rag directly.
- UI crates (`kebab-cli` / `kebab-mcp` / `kebab-tui` / `kebab-desktop`) must not import `kebab-parse-code` / `kebab-chunk` directly; they go through the `kebab-app` facade only.

## Risks / notes

- **serde_yaml provides no line offsets** → lines are tracked by splitting the original text on the regex `^---\s*$`.
  The empty slice after a trailing `---`, a first slice with no `---` prefix, and non-standard separators (e.g. `--- # comment`) are all verified with fixtures.
- **apiVersion / kind is not a string** (e.g. `kind: 42`): the value is accepted only after a string check with `serde_yaml::Value::as_str()`. A non-string value is treated as non-k8s.
- **cluster-scoped resources** (Namespace, ClusterRole, ClusterRoleBinding, …): a missing metadata.namespace is normal. The symbol takes the `<kind>/<name>` form.
- **metadata.name missing**: abnormal, but must not panic. Falls back to `<kind>/` + warning log.
- **Huge ConfigMap / Helm-rendered manifest**: handled by the `AST_CHUNK_MAX_LINES = 200` oversize fallback. Split chunks share the same symbol → at search time they are either deduped or shown as two user-visible hits (same behavior as the 1A-2 oversize case).
- **YAML anchors / merge keys (`&`, `<<`, `*`)**: serde_yaml resolves them automatically. Under the preserve-the-original policy, the chunk text stays as the original (pre-resolve) text, while parsing uses the resolved values.
- **Doc-purpose files such as `Dockerfile.example`**: caught by the extension/prefix matching. This may not match user intent, but it is outside the scope of this phase (the skip policy is governed by the size/built-in/generated policy from 1A-1). After dogfooding, decide whether a HOTFIXES entry is needed based on the false-positive rate.
- **`pom.xml` aggregate parent POM**: very large (hundreds to thousands of lines). Split by the oversize fallback. Verified once with a huge fixture.
- **`media.rs` cleanup**: the inline `match extension` duplication accumulated since 1A-1 is replaced by a call to `code_lang_for_path`. Existing unit-test behavior is preserved (the tests only check result values, so they should still pass).
- **Post-merge deviations** go into the dated log in `tasks/HOTFIXES.md` + a one-line cross-link in the `Risks / notes` of this spec.
- **[HOTFIXES 2026-05-21]** Ingest of multi-resource k8s YAML (2+ documents) failed with a `chunk_id` collision: the non-oversize branch of `push_chunks_with_oversize` hard-coded `split_key = None`. Fixed in PR #158 (v0.16.1) with a `base_split_key` parameter. See `tasks/HOTFIXES.md` 2026-05-21 entry.
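
The pre-split step described for `k8s-manifest-resource-v1` (split on `^---\s*$`, track 1-based line ranges, skip empty slices) can be sketched as follows. This is a minimal std-only illustration under stated assumptions, not the code in `kebab-chunk`: the names `DocSlice`, `split_documents`, `is_separator`, and `push_slice` are hypothetical, and the regex is replaced by an equivalent hand-rolled check to keep the sketch dependency-free. Following the regex literally, a line such as `--- # comment` is not treated as a separator here; how the actual chunker handles that case is decided by its fixtures, not by this sketch.

```rust
/// One pre-split YAML document: original text plus its 1-based line range.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct DocSlice<'a> {
    pub text: &'a str,
    pub line_start: usize,
    pub line_end: usize,
}

/// Hand-rolled equivalent of matching a line against `^---\s*$`.
fn is_separator(line: &str) -> bool {
    line.strip_prefix("---")
        .is_some_and(|rest| rest.trim().is_empty())
}

/// Record a slice unless it is empty or whitespace-only
/// (e.g. the slice after a trailing `---`).
fn push_slice<'a>(out: &mut Vec<DocSlice<'a>>, text: &'a str, line_start: usize, line_end: usize) {
    if !text.trim().is_empty() {
        out.push(DocSlice { text, line_start, line_end });
    }
}

/// Split `src` into document slices on separator lines, keeping the
/// original text of each slice and the line range it came from.
pub fn split_documents(src: &str) -> Vec<DocSlice<'_>> {
    let mut out = Vec::new();
    let mut slice_start = 0usize; // byte offset where the current slice begins
    let mut slice_line = 1usize; // 1-based line number at `slice_start`
    let mut offset = 0usize; // byte offset of the line being visited
    let mut line_no = 0usize; // 1-based number of the line being visited

    for line in src.split_inclusive('\n') {
        line_no += 1;
        if is_separator(line) {
            // Close the slice that ended on the previous line.
            push_slice(&mut out, &src[slice_start..offset], slice_line, line_no - 1);
            slice_start = offset + line.len();
            slice_line = line_no + 1;
        }
        offset += line.len();
    }
    // Close the final slice (no trailing separator required).
    push_slice(&mut out, &src[slice_start..], slice_line, line_no);
    out
}
```

Each returned slice would then be handed to the YAML deserializer for the `apiVersion` / `kind` check, while `text`, `line_start`, and `line_end` feed the chunk text and citation unchanged.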