From 47857b2622d03383cf2afa1d65c5b7ed5c9db61f Mon Sep 17 00:00:00 2001
From: altair823
Date: Wed, 20 May 2026 12:43:34 +0000
Subject: [PATCH 01/13] docs(p10-2): task spec for Tier 2 resource-aware chunkers (k8s + Dockerfile + manifest)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Frozen contract for the p10-2 single PR: 3 chunker activation, k8s identification via apiVersion+kind, Dockerfile/manifest basename matching, code_lang_for_path source-of-truth consolidation, frozen design §3.5 + §10.1 deltas, and version bump 0.13.0 → 0.14.0.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tasks/p10/p10-2-tier2-resource-aware.md | 120 ++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 tasks/p10/p10-2-tier2-resource-aware.md

diff --git a/tasks/p10/p10-2-tier2-resource-aware.md b/tasks/p10/p10-2-tier2-resource-aware.md
new file mode 100644
index 0000000..8ae66ad
--- /dev/null
+++ b/tasks/p10/p10-2-tier2-resource-aware.md
@@ -0,0 +1,120 @@
+# p10-2: Tier 2 resource-aware chunkers (k8s + Dockerfile + manifest)
+
+**Status:** 🟡 In progress
+**Contract sections:** §3.3 (chunker_version `k8s-manifest-resource-v1` + `dockerfile-file-v1` + `manifest-file-v1`), §3.4 (citation symbol: `<kind>/<namespace>/<name>` / `` / ``), §3.5 (additional code_lang mappings `xml` / `groovy` / `go-mod`), §6.1 (update `kebab-parse-code/src/lang.rs` + clean up the inline duplication in `kebab-source-fs/src/media.rs`), §6.2 (`kebab-chunk/src/{k8s_manifest_resource_v1,dockerfile_file_v1,manifest_file_v1}.rs`), §9.2 (Tier 2 definition), §10.1 (one deactivation-log line).
+**Design:** [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md) §1.2 (Phase 2) + §9.2.
+**Plan:** [2026-05-20-p10-2-tier2-resource-aware.md](../../docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md).
+
+## Goal
+
+Activate the three Tier 2 resource-aware chunkers in a single PR, on top of the p10-1A-2 / 1B / 1C infrastructure.
This is file/document-level chunking rather than AST chunking, following the bundling pattern of 1B (Python+TS+JS). From the moment of merge, `.yaml` / `.yml` / `Dockerfile` / the 7 manifest types can be dogfooded.
+
+Non-k8s YAML (Helm values, CI yml, docker-compose, etc.) and invalid YAML are skipped in this phase; they are wired up automatically once the p10-3 paragraph fallback merges.
+
+## Frozen design decisions (finalized by this task)
+
+### Common
+
+- **3 chunkers = self-contained**. No Tier 2 extractor module is added to `kebab-parse-code`; only `code_lang_for_path` in lang.rs is updated. Because there is no AST, the cost of an abstraction would outweigh the code it saves.
+- **`code_lang_for_path` = single source of truth** (design §3.5). The inline extension match in `kebab-source-fs/src/media.rs` is unified into a call to this function (cleans up duplication accumulated since 1A-1; a small refactor).
+- **parser_version** = `"none-v1"` everywhere. It is a sentinel stating explicitly that Tier 2 has no parse stage. Only the chunker_version cascade is meaningful.
+- **oversize fallback** = same policy as the AST chunkers (line-window split when `AST_CHUNK_MAX_LINES = 200` is exceeded). This guards against huge ConfigMaps / multi-stage Dockerfiles / aggregate POMs. Split chunks share the same symbol (only the line range differs).
+- **Frozen design updates** (within this PR):
+  - Add 3 rows to the §3.5 `code_lang` mapping table:
+    - XML (`.xml`, `pom.xml`) → `xml`
+    - Groovy (`build.gradle`, `.gradle`) → `groovy`
+    - Go module (`go.mod`) → `go-mod`
+  - Add one line to the §10.1 deactivation log: "p10-2 activated: 3 Tier 2 chunkers active."
+
+### k8s-manifest-resource-v1
+
+- **Trigger**: `MediaType::Code("yaml")` (= `.yaml` / `.yml`).
+- **k8s identification**: a YAML document is accepted only if its top-level mapping contains both `apiVersion: ` and `kind: `. If either is missing or is not a string, that document is skipped (not the whole file; the other documents are processed normally).
+- **Multi-document split implementation**: the multi-document iterator of `serde_yaml::Deserializer::from_str` does not expose line offsets, so the original text is pre-split on lines matching the regex `^---\s*$` and each slice is deserialized separately. line_start/line_end are tracked during the pre-split stage. The empty slice after a trailing `---` is skipped.
+- **Symbol**: `<kind>/<namespace>/<name>` (when a namespace is present), or `<kind>/<name>` (cluster-scoped), or `/` (name missing). Examples: `Deployment/prod/api-server`, `ClusterRole/cluster-admin`, `ConfigMap/`.
+- **Chunk text**: the original text of the pre-split slice, verbatim (not the deserialized form; the source is preserved).
+- **Citation**: `Citation::Code { path, line_start, line_end, symbol: Some(<above>), lang: Some("yaml") }`.
+- **Failure modes**:
+  - Invalid YAML (any document fails to deserialize) → the whole file emits 0 chunks + warning log `invalid yaml: {path}`. Picked up by the p10-3 paragraph fallback.
+  - 0 accepted documents (all non-k8s) → the whole file emits 0 chunks. Same fallback.
+
+### dockerfile-file-v1
+
+- **Trigger**: `MediaType::Code("dockerfile")`: the file name is exactly `Dockerfile`, or has the prefix `Dockerfile.` (e.g. `Dockerfile.dev`), or has the extension `.dockerfile` (e.g. `myapp.dockerfile`).
+- **Algorithm**: whole file text → emit 1 chunk.
+- **Symbol**: always ``.
+- **Citation**: `Citation::Code { path, line_start: 1, line_end: , symbol: Some(""), lang: Some("dockerfile") }`.
+
+### manifest-file-v1
+
+- **Trigger**: the file name is one of the 7 listed in design §9.2:
+  | basename         | code_lang |
+  |------------------|-----------|
+  | `Cargo.toml`     | `toml`    |
+  | `pyproject.toml` | `toml`    |
+  | `package.json`   | `json`    |
+  | `tsconfig.json`  | `json`    |
+  | `go.mod`         | `go-mod`  |
+  | `pom.xml`        | `xml`     |
+  | `build.gradle`   | `groovy`  |
+- **Excluded**: `build.gradle.kts` is handled by the 1C-JK Kotlin AST chunker (code-kotlin-ast-v1), so it is not a target of this chunker.
+- **Algorithm**: whole file text → emit 1 chunk.
+- **Symbol**: always `` (all 7 types). Manifest types are distinguished by `code_lang`. For example, `--code-lang go-mod` matches only go.mod, and `--code-lang toml` matches Cargo.toml + pyproject.toml.
+- **Citation**: `Citation::Code { path, line_start: 1, line_end: , symbol: Some(""), lang: Some(<mapping above>) }`.
+
+### Routing (kebab-app::ingest_one_code_asset)
+
+Add Tier 2 branches next to the existing 7-arm AST match:
+
+```text
+"rust" | "python" | "typescript" | "javascript"
+  | "go" | "java" | "kotlin"          → existing AST chunker (1A-2 / 1B / 1C)
+"yaml"                                → k8s_manifest_resource_v1
+"dockerfile"                          → dockerfile_file_v1
+"toml" | "json" | "xml"
+  | "groovy" | "go-mod"               → manifest_file_v1
+_                                     → skip (slot for the p10-3 fallback)
+```
+
+Lookup order of `code_lang_for_path`: basename match first (`Cargo.toml` / `Dockerfile.*` / etc.) → extension fallback (`.yaml` / `.toml` / etc.).
+
+## Acceptance criteria
+
+- `cargo test --workspace --no-fail-fast -j 1` passes (memory-conscious: mostly per-crate runs; the full-suite gate runs once, right before the docs task).
+- `cargo clippy --workspace --all-targets -- -D warnings` passes.
+- Snapshot tests for each chunker are stable:
+  - `crates/kebab-chunk/tests/fixtures/sample.yaml`: 2 k8s docs (Deployment + Service) + 1 non-k8s doc (apiVersion missing) → 2 chunks emitted, non-k8s doc skipped.
+  - `crates/kebab-chunk/tests/fixtures/sample.dockerfile` → 1 chunk, symbol ``.
+  - `crates/kebab-chunk/tests/fixtures/sample.Cargo.toml` + `sample.package.json` + `sample.pom.xml` + `sample.go.mod` (4 types) → 1 chunk each, symbol ``, with the mapped code_lang.
+- Unit tests for basename-first matching + extension fallback in `code_lang_for_path`.
+- With yaml + Dockerfile + Cargo.toml placed in an isolated TempDir KB, `kebab search --code-lang yaml --json` / `--code-lang dockerfile --json` / `--code-lang toml --json` each return `Citation::Code` (3 tests added to the existing `code_ingest_smoke.rs`, 12 tests total).
+- `kebab schema --json | jq .stats.code_lang_breakdown` shows counts for `yaml` / `dockerfile` / `toml` / `json` / `xml` / `groovy` / `go-mod` (only the ones actually used appear).
+- README + HANDOFF + docs/ARCHITECTURE + docs/SMOKE + tasks/INDEX + tasks/p10/INDEX updated.
+- Frozen design: 3 mapping rows in §3.5 + one activation line in §10.1.
+- Workspace `Cargo.toml` minor bump (0.13.0 → 0.14.0), gitea-release v0.14.0.
+
+## Allowed dependencies
+
+- `kebab-chunk`: 3 new modules (`k8s_manifest_resource_v1.rs` / `dockerfile_file_v1.rs` / `manifest_file_v1.rs`) and the dep entry `serde_yaml = { workspace = true }` (already present in the workspace). Existing deps (kebab-core / serde_json_canonicalizer / blake3 / anyhow / tracing) are kept.
+- `kebab-parse-code`: only `lang.rs` is updated. No extractor module is added, no new crate dep.
+- `kebab-source-fs/src/media.rs`: the inline match is cleaned up into a `code_lang_for_path` call. Existing deps are kept (it already depends on kebab-parse-code).
+- `kebab-app::ingest_one_code_asset`: the match branches are extended. No new crate dep.
+
+## Forbidden dependencies
+
+- `kebab-chunk` must not directly import store / embed / llm / rag / tree-sitter (boundary §6.3 is kept).
+- `kebab-parse-code` must not directly import store / embed / llm / rag.
+- UI crates (`kebab-cli` / `kebab-mcp` / `kebab-tui` / `kebab-desktop`) must not directly import `kebab-parse-code` / `kebab-chunk`; they go through the `kebab-app` facade only.
+
+## Risks / notes
+
+- **serde_yaml has no line offsets** → lines are tracked by splitting the original text on the regex `^---\s*$`. The empty slice after a trailing `---`, a first slice with no `---` prefix, and non-standard separators (e.g. `--- # comment`) are all verified with fixtures.
+- **apiVersion / kind is not a string** (e.g. `kind: 42`): a document is accepted only after a string check with `serde_yaml::Value::as_str()`. Non-string values are treated as non-k8s.
+- **Cluster-scoped resources** (Namespace, ClusterRole, ClusterRoleBinding, ...): a missing metadata.namespace is normal. The symbol takes the form `<kind>/<name>`.
+- **metadata.name missing**: abnormal, but must not panic. `/` fallback + warning log.
+- **Huge ConfigMap / Helm-rendered manifest**: `AST_CHUNK_MAX_LINES = 200` oversize fallback. Split chunks share the same symbol → at search time they are either deduped or shown as two user-visible hits (same behavior as the 1A-2 oversize case).
+- **YAML anchors / merge keys (`&`, `<<`, `*`)**: serde_yaml resolves anchors and aliases automatically (merge keys `<<` may additionally need `Value::apply_merge`; verify at implementation time). Under the source-preservation policy the chunk text stays as the original (pre-resolve), while parsing uses the resolved values.
+- **Doc-purpose files such as `Dockerfile.example`**: these are caught by the extension/prefix match. This may diverge from user intent but is out of scope for this phase (the skip policy is governed by 1A-1's size/built-in/generated policies). After dogfooding, decide on a HOTFIXES entry based on the false-positive rate.
+- **`pom.xml` aggregate parent POM**: very large (hundreds to thousands of lines). Split by the oversize fallback. Verify once with a huge fixture.
+- **`media.rs` cleanup**: the inline `match extension` duplication accumulated since 1A-1 is replaced with a `code_lang_for_path` call. Existing unit-test behavior is preserved (the tests only look at result values, so they must pass).
+- **Post-merge deviations** go in a dated `tasks/HOTFIXES.md` log entry plus a one-line cross-link in this spec's `Risks / notes`.
From 5ce7f60932d89fb2b0529c24420a601f77b10147 Mon Sep 17 00:00:00 2001
From: altair823
Date: Wed, 20 May 2026 12:55:36 +0000
Subject: [PATCH 02/13] docs(p10-2): implementation plan (11 tasks A-K, subagent-driven)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Branch feat/p10-2-tier2-resource.
Tasks: serde_yaml dep / lang.rs basenames / media.rs source-of-truth consolidation / 3 chunkers (k8s + dockerfile + manifest) + tier2_shared helper / ingest dispatch / smoke tests / frozen design §3.5+§10.1 / docs sync / version bump 0.13.0→0.14.0.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../2026-05-20-p10-2-tier2-resource-aware.md | 1343 +++++++++++++++++
 1 file changed, 1343 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md

diff --git a/docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md b/docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md
new file mode 100644
index 0000000..05770ad
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-20-p10-2-tier2-resource-aware.md
@@ -0,0 +1,1343 @@
+# p10-2 Tier 2 Resource-Aware Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task.
+
+**Goal:** Activate Tier 2 resource-aware chunkers (k8s manifest + Dockerfile + 7-file manifest set) in a single PR. This is file/document-level chunking rather than AST chunking. From the moment of merge, `.yaml` / `Dockerfile` / the 7 manifest types can be dogfooded.
+
+**Architecture:** Add 3 self-contained chunker modules to `kebab-chunk`. In `kebab-parse-code`, only lang.rs is updated (Tier 2 = no AST). The inline extension match in `kebab-source-fs/src/media.rs` is unified into a `code_lang_for_path` call (cleans up duplication accumulated since 1A-1). The match in `ingest_one_code_asset` routes the 7 Tier 2 langs (`yaml` / `dockerfile` / `toml` / `json` / `xml` / `groovy` / `go-mod`) to the new chunkers. parser_version = `"none-v1"` everywhere.
+
+**Tech Stack:** Rust 2024 workspace, `serde_yaml = "0.9"` (already in workspace.dependencies). No changes to the 1A-2 / 1B / 1C infrastructure.
+
+**Memory note:** Host has been OOM'd previously (it has had to be rebooted). Per-crate cargo only. ONE full-suite + clippy invocation in Task J.
NO `cargo test --workspace` outside that gate.
+
+---
+
+## Pre-flight
+
+Branch `feat/p10-2-tier2-resource` already exists (it includes spec commit 47857b2).
+
+- [ ] **Disk hygiene**: check `df -h /`. If usage exceeds 90%, run `cargo clean` (last cleanup recovered 38.7 GB).
+
+Reference files:
+- 1A-2 chunker: `crates/kebab-chunk/src/code_rust_ast_v1.rs`: the `AST_CHUNK_MAX_LINES = 200` / `POLICY_HASH_HEX_LEN = 16` / `BYTES_PER_TOKEN = 3` constants plus the `Document → Vec<Chunk>` pattern.
+- 1C-JK dispatch generalization: `crates/kebab-app/src/lib.rs::ingest_one_code_asset` (~L1794). Currently a 7-arm match (rust|python|typescript|javascript|go|java|kotlin). This is where the Tier 2 branches go.
+- 1A-1 code_lang_for_path: `crates/kebab-parse-code/src/lang.rs`. The basename-first matching pattern is introduced here.
+- 1A-1 media.rs: `crates/kebab-source-fs/src/media.rs`. Inline `match extension` duplication.
+- spec: `tasks/p10/p10-2-tier2-resource-aware.md`.
+
+---
+
+## Task A: add the serde_yaml dep to kebab-chunk
+
+**Files:**
+- Modify: `crates/kebab-chunk/Cargo.toml` (dependencies section)
+
+- [ ] **Step 1**: Add to the `[dependencies]` section of `crates/kebab-chunk/Cargo.toml` (on the line after serde_json):
+
+```toml
+serde_yaml = { workspace = true }
+```
+
+- [ ] **Step 2**: `cargo build -p kebab-chunk` → clean (ignore the unused-dep warning; the dep is used in Task D).
+
+- [ ] **Step 3**: Commit:
+
+```bash
+git add crates/kebab-chunk/Cargo.toml
+git commit -m "$(cat <<'EOF'
+build(p10-2): add serde_yaml dep to kebab-chunk for k8s-manifest-resource-v1
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+---
+
+## Task B: lang.rs - basename + extension additions + Tier 2 mapping
+
+**Files:**
+- Modify: `crates/kebab-parse-code/src/lang.rs`
+- Test: same file's test module
+
+- [ ] **Step 1 (failing test)**: Add to `#[cfg(test)] mod tests` in `lang.rs` (next to the existing tests):
+
+```rust
+#[test]
+fn tier2_basename_takes_precedence_over_extension() {
+    assert_eq!(code_lang_for_path(Path::new("Dockerfile")), Some("dockerfile"));
+    assert_eq!(code_lang_for_path(Path::new("foo/Dockerfile.dev")), Some("dockerfile"));
+    assert_eq!(code_lang_for_path(Path::new("myapp.dockerfile")), Some("dockerfile"));
+    assert_eq!(code_lang_for_path(Path::new("repo/Cargo.toml")), Some("toml"));
+    assert_eq!(code_lang_for_path(Path::new("pyproject.toml")), Some("toml"));
+    assert_eq!(code_lang_for_path(Path::new("repo/package.json")), Some("json"));
+    assert_eq!(code_lang_for_path(Path::new("tsconfig.json")), Some("json"));
+    assert_eq!(code_lang_for_path(Path::new("go.mod")), Some("go-mod"));
+    assert_eq!(code_lang_for_path(Path::new("pom.xml")), Some("xml"));
+    assert_eq!(code_lang_for_path(Path::new("build.gradle")), Some("groovy"));
+}
+
+#[test]
+fn tier2_extension_fallback() {
+    assert_eq!(code_lang_for_path(Path::new("k8s/deploy.yaml")), Some("yaml"));
+    assert_eq!(code_lang_for_path(Path::new("k8s/deploy.yml")), Some("yaml"));
+    assert_eq!(code_lang_for_path(Path::new("foo/bar.toml")), Some("toml"));
+    assert_eq!(code_lang_for_path(Path::new("foo/bar.json")), Some("json"));
+    assert_eq!(code_lang_for_path(Path::new("foo/bar.xml")), Some("xml"));
+    assert_eq!(code_lang_for_path(Path::new("foo/bar.gradle")), Some("groovy"));
+}
+```
+
+- [ ] **Step 2**: Run → FAIL.
+
+```bash
+cargo test -p kebab-parse-code lang::tests::tier2 -- --nocapture
+```
+
+Expected: function returns `None` for all new inputs.
+
+- [ ] **Step 3 (impl)**: Update the body of `code_lang_for_path` to the following shape (keep the existing extension matching and put the basename branch *at the very front*):
+
+```rust
+pub fn code_lang_for_path(path: &Path) -> Option<&'static str> {
+    // p10-2: basename takes precedence over extension (Dockerfile, Cargo.toml, ...).
+    let file_name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
+    let by_basename = match file_name {
+        "Dockerfile" => Some("dockerfile"),
+        "Cargo.toml" | "pyproject.toml" => Some("toml"),
+        "package.json" | "tsconfig.json" => Some("json"),
+        "go.mod" => Some("go-mod"),
+        "pom.xml" => Some("xml"),
+        "build.gradle" => Some("groovy"),
+        _ => None,
+    };
+    if by_basename.is_some() {
+        return by_basename;
+    }
+    // Dockerfile.* prefix variant (Dockerfile.dev, Dockerfile.prod, ...).
+    if file_name
+        .strip_prefix("Dockerfile.")
+        .is_some_and(|rest| !rest.is_empty())
+    {
+        return Some("dockerfile");
+    }
+
+    // Extension fallback.
+    let ext = path.extension().and_then(|e| e.to_str())?;
+    let lang = match ext {
+        "rs" => "rust",
+        "py" | "pyi" => "python",
+        "ts" | "tsx" | "mts" | "cts" => "typescript",
+        "js" | "mjs" | "cjs" | "jsx" => "javascript",
+        "go" => "go",
+        "java" => "java",
+        "kt" | "kts" => "kotlin",
+        // p10-2: Tier 2 extensions.
+        "yaml" | "yml" => "yaml",
+        "dockerfile" => "dockerfile",
+        "toml" => "toml",
+        "json" => "json",
+        "xml" => "xml",
+        "gradle" => "groovy",
+        _ => return None,
+    };
+    Some(lang)
+}
+```
+
+(Keep the existing function's extension arms as they are and add only the 7 Tier 2 lines above. If the existing code is formatted differently, keep that formatting and add only the Tier 2 lines.)
+
+- [ ] **Step 4**: Run → PASS.
+
+```bash
+cargo test -p kebab-parse-code lang::tests -- --nocapture
+```
+
+Expected: all lang::tests pass.
+
+- [ ] **Step 5**: Clippy + commit:
+
+```bash
+cargo clippy -p kebab-parse-code --all-targets -- -D warnings
+git add crates/kebab-parse-code/src/lang.rs
+git commit -m "$(cat <<'EOF'
+feat(p10-2): extend code_lang_for_path with Tier 2 basenames + extensions
+
+Adds basename-first matching for Dockerfile / Cargo.toml / pyproject.toml /
+package.json / tsconfig.json / go.mod / pom.xml / build.gradle plus
+Dockerfile.* prefix variant. Extension fallback adds .yaml/.yml/.dockerfile/
+.toml/.json/.xml/.gradle → yaml/dockerfile/toml/json/xml/groovy.
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+---
+
+## Task C: media.rs - unify the inline match into a code_lang_for_path call
+
+**Files:**
+- Modify: `crates/kebab-source-fs/src/media.rs`
+
+Applies the design §3.5 rule that "code_lang_for_path is the *only source of truth*". Cleans up duplication accumulated since 1A-1.
+
+- [ ] **Step 1**: Replace all of the code-matching arms of `match extension` in the current `media_type_for` function with a single call:
+
+```rust
+pub fn media_type_for(path: &Path) -> MediaType {
+    // p10-2: code_lang_for_path is the single source of truth for code lang.
+    if let Some(lang) = kebab_parse_code::code_lang_for_path(path) {
+        return MediaType::Code(lang.to_string());
+    }
+
+    // Existing non-code extension matching (markdown / pdf / images / etc.) stays as is.
+    let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
+    match ext {
+        // ... existing non-code arms unchanged ...
+        "md" | "markdown" => MediaType::Markdown,
+        // (existing code unchanged; only the code-lang arms are removed)
+        _ => MediaType::Other(ext.to_string()),
+    }
+}
+```
+
+(For the full function body: Read the current file, delete only the code arms, and add the if-let above at the very top of the function.)
+
+- [ ] **Step 2**: Keep the existing test module in media.rs: `code_files_map_to_media_code`, `go_files_map_to_media_code_go`, `java_kotlin_files_map_to_media_code`, etc. must all still pass. Additional Tier 2 test:
+
+```rust
+#[test]
+fn tier2_files_map_to_media_code() {
+    assert_eq!(media_type_for(Path::new("a/deploy.yaml")), MediaType::Code("yaml".into()));
+    assert_eq!(media_type_for(Path::new("a/Dockerfile")), MediaType::Code("dockerfile".into()));
+    assert_eq!(media_type_for(Path::new("a/Cargo.toml")), MediaType::Code("toml".into()));
+    assert_eq!(media_type_for(Path::new("a/pom.xml")), MediaType::Code("xml".into()));
+    assert_eq!(media_type_for(Path::new("a/build.gradle")), MediaType::Code("groovy".into()));
+    assert_eq!(media_type_for(Path::new("a/go.mod")), MediaType::Code("go-mod".into()));
+}
+```
+
+- [ ] **Step 3**: `cargo test -p kebab-source-fs` → existing + new tests all PASS. If matching for non-code extensions (md/pdf/etc.) broke, the non-code arms were not preserved in Step 1; check again.
+
+- [ ] **Step 4**: Clippy + commit:
+
+```bash
+cargo clippy -p kebab-source-fs --all-targets -- -D warnings
+git add crates/kebab-source-fs/src/media.rs
+git commit -m "$(cat <<'EOF'
+refactor(p10-2): media.rs delegates code lang to code_lang_for_path
+
+Replaces 1A-1 era inline match block with a single call to
+kebab_parse_code::code_lang_for_path, per design §3.5 single-source-of-truth
+rule. Adds Tier 2 routing test (yaml / dockerfile / toml / json / xml /
+groovy / go-mod) and preserves all non-code extension branches.
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+---
+
+## Task D: k8s-manifest-resource-v1 chunker
+
+**Files:**
+- Create: `crates/kebab-chunk/src/k8s_manifest_resource_v1.rs`
+- Create: `crates/kebab-chunk/src/tier2_shared.rs`
+- Create: `crates/kebab-chunk/tests/fixtures/sample_k8s.yaml`
+- Create: `crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs`
+- Modify: `crates/kebab-chunk/src/lib.rs` (pub use)
+
+The most complex chunker.
pre-split + serde_yaml deserialize + identify + emit + oversize fallback.
+
+- [ ] **Step 1 (fixture)**: Create `crates/kebab-chunk/tests/fixtures/sample_k8s.yaml`. 3 documents (2 k8s + 1 non-k8s):
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: api-server
+  namespace: prod
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: api-server
+  template:
+    metadata:
+      labels:
+        app: api-server
+    spec:
+      containers:
+        - name: api
+          image: example/api:1.2.3
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: api-server
+  namespace: prod
+spec:
+  selector:
+    app: api-server
+  ports:
+    - port: 80
+      targetPort: 8080
+---
+# Non-k8s document: apiVersion missing
+kind: ClusterIP
+foo: bar
+```
+
+- [ ] **Step 2 (failing test)**: Create `crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs`:
+
+```rust
+use kebab_chunk::{ChunkPolicy, Chunker, K8sManifestResourceV1Chunker};
+use kebab_core::{Asset, AssetId, Document, MediaType, ParserVersion, SourceSpan};
+use std::path::PathBuf;
+
+fn read_fixture(name: &str) -> String {
+    let path = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
+        .join("tests/fixtures")
+        .join(name);
+    std::fs::read_to_string(path).expect("read fixture")
+}
+
+fn make_doc(lang: &str, text: &str) -> Document {
+    // Tier 2 makes a Document directly (no extractor). Mirror what
+    // ingest_one_code_asset will do for Tier 2 in Task G.
+    // Only `Block` is needed here; an unused `Inline` import would fail `-D warnings`.
+    use kebab_core::Block;
+    Document {
+        doc_id: "test-doc".to_string(),
+        asset: Asset {
+            asset_id: AssetId("test".to_string()),
+            workspace_path: format!("test.{lang}"),
+            byte_len: text.len() as u64,
+            content_hash: "deadbeef".to_string(),
+            media_type: MediaType::Code(lang.to_string()),
+        },
+        parser_version: ParserVersion("none-v1".to_string()),
+        metadata: Default::default(),
+        blocks: vec![Block::Code {
+            text: text.to_string(),
+            lang: Some(lang.to_string()),
+            span: SourceSpan::Line { start: 1, end: text.lines().count() as u32 },
+        }],
+    }
+}
+
+#[test]
+fn k8s_multi_doc_emits_one_chunk_per_resource() {
+    let text = read_fixture("sample_k8s.yaml");
+    let doc = make_doc("yaml", &text);
+    let policy = ChunkPolicy::default();
+    let chunks = K8sManifestResourceV1Chunker.chunk(&doc, &policy).unwrap();
+
+    // 2 k8s resources accepted, 1 non-k8s skipped.
+    assert_eq!(chunks.len(), 2, "expected 2 k8s chunks, got {}", chunks.len());
+
+    let symbols: Vec<_> = chunks.iter()
+        .map(|c| c.source_span_symbol().unwrap_or_default().to_string())
+        .collect();
+    assert_eq!(symbols, vec![
+        "Deployment/prod/api-server".to_string(),
+        "Service/prod/api-server".to_string(),
+    ]);
+
+    // Each chunk's lang field is "yaml".
+    for c in &chunks {
+        assert_eq!(c.source_span_lang().as_deref(), Some("yaml"));
+    }
+}
+
+#[test]
+fn k8s_invalid_yaml_emits_zero_chunks() {
+    let invalid = "apiVersion: v1\nkind: Service\n\tbadtab: x\n"; // invalid YAML
+    let doc = make_doc("yaml", invalid);
+    let policy = ChunkPolicy::default();
+    let chunks = K8sManifestResourceV1Chunker.chunk(&doc, &policy).unwrap();
+    assert!(chunks.is_empty(), "invalid yaml -> 0 chunks");
+}
+
+#[test]
+fn k8s_cluster_scoped_resource_symbol() {
+    let cluster = r#"apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: cluster-admin
+rules: []
+"#;
+    let doc = make_doc("yaml", cluster);
+    let policy = ChunkPolicy::default();
+    let chunks = K8sManifestResourceV1Chunker.chunk(&doc, &policy).unwrap();
+    assert_eq!(chunks.len(), 1);
+    assert_eq!(
+        chunks[0].source_span_symbol().unwrap_or_default(),
+        "ClusterRole/cluster-admin"
+    );
+}
+```
+
+(`source_span_symbol()` / `source_span_lang()` on `Chunk` are helpers. If the real API differs, for example direct access to `SourceSpan::Code { symbol, lang, .. }` on `chunk.source_spans[0]`, write the test in the same pattern as the 1A-2 snapshot tests. Confirm the API matches after writing the Step 4 impl.)
+
+- [ ] **Step 3**: `cargo test -p kebab-chunk k8s_manifest_resource_v1` → FAIL ("K8sManifestResourceV1Chunker not found").
+
+- [ ] **Step 4a (shared helper)**: First create `crates/kebab-chunk/src/tier2_shared.rs`, the oversize-aware chunk emit helper that Tasks D/E/F all use. Before writing the impl, Read the Chunk construction code in `crates/kebab-chunk/src/code_rust_ast_v1.rs` (hash, token count, and where ChunkPolicy is applied) and mirror the same pattern:
+
+```rust
+//! p10-2: Tier 2 chunker shared helpers (oversize fallback + Chunk build).
+
+use crate::ChunkPolicy;
+use anyhow::Result;
+use kebab_core::{Chunk, Document, SourceSpan};
+
+pub(crate) const AST_CHUNK_MAX_LINES: u32 = 200;
+
+/// Push 1+ chunks for a region. ≤200 lines → 1 chunk.
>200 → line-window
+/// split with same symbol (only line range varies). Mirrors 1A-2's oversize
+/// fallback.
+#[allow(clippy::too_many_arguments)]
+pub(crate) fn push_chunks_with_oversize(
+    out: &mut Vec<Chunk>,
+    doc: &Document,
+    policy: &ChunkPolicy,
+    text: &str,
+    line_start: u32,
+    line_end: u32,
+    symbol: &str,
+    lang: &str,
+    chunker_version: &str,
+) -> Result<()> {
+    // saturating_sub avoids a u32 underflow panic if a caller passes line_end < line_start.
+    let n_lines = line_end.saturating_sub(line_start) + 1;
+    if n_lines <= AST_CHUNK_MAX_LINES {
+        out.push(build_chunk(doc, policy, text, line_start, line_end, symbol, lang, chunker_version)?);
+        return Ok(());
+    }
+    let lines: Vec<&str> = text.lines().collect();
+    let mut window_start = line_start;
+    let mut i = 0usize;
+    while i < lines.len() {
+        let take = (AST_CHUNK_MAX_LINES as usize).min(lines.len() - i);
+        let window_text = lines[i..i + take].join("\n");
+        let window_end = window_start + take as u32 - 1;
+        out.push(build_chunk(doc, policy, &window_text, window_start, window_end, symbol, lang, chunker_version)?);
+        i += take;
+        window_start = window_end + 1;
+    }
+    Ok(())
+}
+
+/// Build a single Chunk from a (text, line range, symbol, lang) tuple.
+/// MUST mirror code_rust_ast_v1.rs's Chunk construction so hash / token /
+/// ChunkPolicy semantics stay identical across Tier 1 and Tier 2.
+#[allow(clippy::too_many_arguments)] // 8 parameters, same as the caller above
+fn build_chunk(
+    doc: &Document,
+    policy: &ChunkPolicy,
+    text: &str,
+    line_start: u32,
+    line_end: u32,
+    symbol: &str,
+    lang: &str,
+    chunker_version: &str,
+) -> Result<Chunk> {
+    let span = SourceSpan::Code {
+        line_start,
+        line_end,
+        symbol: Some(symbol.to_string()),
+        lang: Some(lang.to_string()),
+    };
+    // TODO at impl time: replicate 1A-2's exact Chunk { id, text, source_spans,
+    // chunker_version, parser_version, policy_hash, token_count, content_hash, ... }
+    // field-fill, computing id / content_hash / policy_hash via the same helpers
+    // (blake3 + serde_json_canonicalizer) used by code_rust_ast_v1. The exact
+    // function names are in code_rust_ast_v1.rs's `chunk` impl; mirror them
+    // here.
+    todo!("mirror code_rust_ast_v1's Chunk construction; see Task D Step 4a comment")
+}
+```
+
+(`todo!()` marks a placeholder: at implementation time, copy the actual chunk-construction code from `code_rust_ast_v1.rs` as is. If 1A-2's Chunk construction is around 30 lines, move it into build_chunk, taking `span` / `chunker_version` as arguments.)
+
+- [ ] **Step 4b (k8s chunker impl)**: Write `crates/kebab-chunk/src/k8s_manifest_resource_v1.rs`:
+
+```rust
+//! p10-2: k8s manifest resource-aware chunker.
+//!
+//! YAML multi-document split with `apiVersion` + `kind` identification.
+//! 1 chunk per recognized resource, symbol `<kind>/<namespace>/<name>`.
+//! Invalid YAML or non-k8s document → 0 chunks (handled by p10-3 fallback).
+
+use crate::tier2_shared::push_chunks_with_oversize;
+use crate::{Chunker, ChunkPolicy};
+use anyhow::Result;
+use kebab_core::{Block, Chunk, Document};
+
+pub const VERSION_LABEL: &str = "k8s-manifest-resource-v1";
+
+pub struct K8sManifestResourceV1Chunker;
+
+impl Chunker for K8sManifestResourceV1Chunker {
+    fn chunker_version(&self) -> &'static str { VERSION_LABEL }
+
+    fn chunk(&self, doc: &Document, policy: &ChunkPolicy) -> Result<Vec<Chunk>> {
+        let Some(Block::Code { text, .. }) = doc.blocks.first() else {
+            return Ok(vec![]);
+        };
+
+        // Pre-split on ^---\s*$ to track line numbers (serde_yaml's
+        // multi-doc iterator doesn't expose line offsets).
+        let slices = split_yaml_documents(text);
+
+        let mut chunks = Vec::new();
+        for slice in slices {
+            // Any parse error → skip whole file (return empty; p10-3 fallback later).
+            let value: serde_yaml::Value = match serde_yaml::from_str(slice.text) {
+                Ok(v) => v,
+                Err(_) => return Ok(vec![]),
+            };
+            let Some(mapping) = value.as_mapping() else { continue };
+
+            let api = mapping.get("apiVersion").and_then(|v| v.as_str()).unwrap_or("");
+            let kind = mapping.get("kind").and_then(|v| v.as_str()).unwrap_or("");
+            if api.is_empty() || kind.is_empty() {
+                continue;
+            }
+
+            let metadata = mapping.get("metadata").and_then(|v| v.as_mapping());
+            let name = metadata
+                .and_then(|m| m.get("name"))
+                .and_then(|v| v.as_str())
+                .unwrap_or("");
+            let namespace = metadata
+                .and_then(|m| m.get("namespace"))
+                .and_then(|v| v.as_str());
+
+            let symbol = match namespace {
+                Some(ns) if !ns.is_empty() => format!("{kind}/{ns}/{name}"),
+                _ => format!("{kind}/{name}"),
+            };
+
+            push_chunks_with_oversize(
+                &mut chunks, doc, policy,
+                slice.text, slice.line_start, slice.line_end,
+                &symbol, "yaml", VERSION_LABEL,
+            )?;
+        }
+        Ok(chunks)
+    }
+}
+
+struct YamlSlice<'a> {
+    text: &'a str,
+    line_start: u32,
+    line_end: u32,
+}
+
+fn split_yaml_documents(text: &str) -> Vec<YamlSlice<'_>> {
+    let mut slices = Vec::new();
+    let lines: Vec<&str> = text.lines().collect();
+
+    // 0-indexed line numbers of every `---` separator line.
+    let mut separators: Vec<usize> = lines.iter().enumerate()
+        .filter_map(|(i, l)| {
+            let trimmed = l.trim_end();
+            if trimmed == "---" || trimmed.starts_with("--- ") || trimmed.starts_with("---\t") {
+                Some(i)
+            } else { None }
+        })
+        .collect();
+    separators.push(lines.len()); // sentinel after last line
+
+    let mut doc_start_line: usize = 0; // 0-indexed
+    for sep_line in separators {
+        if sep_line > doc_start_line {
+            let start_byte = byte_offset_of_line(text, doc_start_line);
+            let end_byte = byte_offset_of_line(text, sep_line);
+            let slice_text = &text[start_byte..end_byte];
+            if !slice_text.trim().is_empty() {
+                slices.push(YamlSlice {
+                    text: slice_text,
+                    line_start: (doc_start_line + 1) as u32,
+                    line_end: sep_line as u32,
+                });
+            }
+        }
+        doc_start_line = sep_line + 1;
+    }
+    slices
+}
+
+fn
byte_offset_of_line(text: &str, line_idx: usize) -> usize { + if line_idx == 0 { return 0; } + let mut count = 0usize; + for (i, c) in text.char_indices() { + if c == '\n' { + count += 1; + if count == line_idx { return i + 1; } + } + } + text.len() +} +``` + +- [ ] **Step 5**: `crates/kebab-chunk/src/lib.rs` ์— ์ถ”๊ฐ€ (tier2_shared ์€ pub ์•„๋‹˜ โ€” crate-internal): + +```rust +mod tier2_shared; +pub mod k8s_manifest_resource_v1; +pub use k8s_manifest_resource_v1::K8sManifestResourceV1Chunker; +``` + +- [ ] **Step 6**: `cargo test -p kebab-chunk k8s_manifest_resource_v1 -- --nocapture` โ†’ PASS. fixture ์˜ 2 k8s chunk + ๋น„-k8s skip + invalid yaml 0 chunk + cluster-scoped symbol ๊ฒ€์ฆ. + +- [ ] **Step 7**: Clippy + commit: + +```bash +cargo clippy -p kebab-chunk --all-targets -- -D warnings +git add crates/kebab-chunk/src/k8s_manifest_resource_v1.rs \ + crates/kebab-chunk/src/lib.rs \ + crates/kebab-chunk/tests/fixtures/sample_k8s.yaml \ + crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs +git commit -m "$(cat <<'EOF' +feat(p10-2): k8s-manifest-resource-v1 chunker (YAML multi-doc + apiVersion+kind identification) + +Splits multi-document YAML by ^---\s*$, requires apiVersion + kind string +fields per document, emits 1 chunk per recognized k8s resource. Symbol = +// or / (cluster-scoped). Invalid YAML +returns 0 chunks (handled by p10-3 paragraph fallback). Oversize >200 lines +splits into line-windows sharing the same symbol. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task E: dockerfile-file-v1 chunker + +**Files:** +- Create: `crates/kebab-chunk/src/dockerfile_file_v1.rs` +- Create: `crates/kebab-chunk/tests/fixtures/sample.dockerfile` +- Create: `crates/kebab-chunk/tests/dockerfile_file_v1.rs` +- Modify: `crates/kebab-chunk/src/lib.rs` + +- [ ] **Step 1 (fixture)**: `crates/kebab-chunk/tests/fixtures/sample.dockerfile` (5 ์ค„): + +```dockerfile +FROM rust:1.94-slim AS builder +WORKDIR /app +COPY . . 
+RUN cargo build --release +CMD ["/app/target/release/kebab"] +``` + +- [ ] **Step 2 (failing test)**: `crates/kebab-chunk/tests/dockerfile_file_v1.rs`: + +```rust +use kebab_chunk::{ChunkPolicy, Chunker, DockerfileFileV1Chunker}; +use kebab_core::{Asset, AssetId, Block, Document, MediaType, ParserVersion, SourceSpan}; +use std::path::PathBuf; + +fn read_fixture(name: &str) -> String { + let path = PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests/fixtures").join(name); + std::fs::read_to_string(path).unwrap() +} +fn make_doc(lang: &str, text: &str) -> Document { + Document { + doc_id: "test".into(), + asset: Asset { + asset_id: AssetId("a".into()), + workspace_path: "Dockerfile".into(), + byte_len: text.len() as u64, + content_hash: "deadbeef".into(), + media_type: MediaType::Code(lang.into()), + }, + parser_version: ParserVersion("none-v1".into()), + metadata: Default::default(), + blocks: vec![Block::Code { + text: text.into(), + lang: Some(lang.into()), + span: SourceSpan::Line { start: 1, end: text.lines().count() as u32 }, + }], + } +} + +#[test] +fn dockerfile_emits_single_chunk() { + let text = read_fixture("sample.dockerfile"); + let doc = make_doc("dockerfile", &text); + let chunks = DockerfileFileV1Chunker.chunk(&doc, &ChunkPolicy::default()).unwrap(); + assert_eq!(chunks.len(), 1); + assert_eq!(chunks[0].source_span_symbol().as_deref(), Some("")); + assert_eq!(chunks[0].source_span_lang().as_deref(), Some("dockerfile")); + // line_start = 1, line_end = 5 (5-line fixture). + let (ls, le) = chunks[0].source_span_lines(); + assert_eq!((ls, le), (1, 5)); +} +``` + +(`source_span_lines()` / `source_span_symbol()` API ๊ฐ€ 1A-2 ์™€ ๋™์ผํ•œ์ง€ ํ™•์ธ ํ›„ ๋ฏธ๋Ÿฌ๋ง.) + +- [ ] **Step 3**: `cargo test -p kebab-chunk dockerfile_file_v1` โ†’ FAIL. + +- [ ] **Step 4 (impl)**: `crates/kebab-chunk/src/dockerfile_file_v1.rs`. Task D Step 4a ์˜ `tier2_shared::push_chunks_with_oversize` ์žฌ์‚ฌ์šฉ: + +```rust +//! p10-2: dockerfile whole-file chunker (Tier 2). 
+
+use crate::tier2_shared::push_chunks_with_oversize;
+use crate::{Chunker, ChunkPolicy};
+use anyhow::Result;
+use kebab_core::{Block, Chunk, Document};
+
+pub const VERSION_LABEL: &str = "dockerfile-file-v1";
+
+pub struct DockerfileFileV1Chunker;
+
+impl Chunker for DockerfileFileV1Chunker {
+    fn chunker_version(&self) -> &'static str { VERSION_LABEL }
+
+    fn chunk(&self, doc: &Document, policy: &ChunkPolicy) -> Result<Vec<Chunk>> {
+        let Some(Block::Code { text, .. }) = doc.blocks.first() else {
+            return Ok(vec![]);
+        };
+        let total_lines = text.lines().count().max(1) as u32;
+        let mut chunks = Vec::new();
+        push_chunks_with_oversize(
+            &mut chunks, doc, policy,
+            text, 1, total_lines,
+            "", "dockerfile", VERSION_LABEL,
+        )?;
+        Ok(chunks)
+    }
+}
+```
+
+- [ ] **Step 5**: update `crates/kebab-chunk/src/lib.rs`:
+
+```rust
+pub mod dockerfile_file_v1;
+pub use dockerfile_file_v1::DockerfileFileV1Chunker;
+```
+
+- [ ] **Step 6**: `cargo test -p kebab-chunk dockerfile_file_v1` → PASS.
+
+- [ ] **Step 7**: Clippy + commit:
+
+```bash
+cargo clippy -p kebab-chunk --all-targets -- -D warnings
+git add crates/kebab-chunk/src/dockerfile_file_v1.rs \
+    crates/kebab-chunk/src/lib.rs \
+    crates/kebab-chunk/tests/fixtures/sample.dockerfile \
+    crates/kebab-chunk/tests/dockerfile_file_v1.rs
+git commit -m "$(cat <<'EOF'
+feat(p10-2): dockerfile-file-v1 chunker (whole-file 1 chunk, symbol )
+
+Reads entire Dockerfile / Dockerfile.* / *.dockerfile content and emits a
+single Chunk with symbol "", code_lang "dockerfile", line range
+1..EOF. Oversize >200 lines splits into line-windows sharing the symbol.
+ +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task F: manifest-file-v1 chunker + +**Files:** +- Create: `crates/kebab-chunk/src/manifest_file_v1.rs` +- Create: `crates/kebab-chunk/tests/fixtures/sample_cargo.toml` +- Create: `crates/kebab-chunk/tests/fixtures/sample_package.json` +- Create: `crates/kebab-chunk/tests/fixtures/sample_pom.xml` +- Create: `crates/kebab-chunk/tests/fixtures/sample_go.mod` +- Create: `crates/kebab-chunk/tests/manifest_file_v1.rs` +- Modify: `crates/kebab-chunk/src/lib.rs` + +- [ ] **Step 1 (fixtures)**: 4 ์ž‘์€ fixture (๊ฐ ~10 ์ค„): + +`sample_cargo.toml`: + +```toml +[package] +name = "demo" +version = "0.1.0" +edition = "2021" + +[dependencies] +serde = "1" +``` + +`sample_package.json`: + +```json +{ + "name": "demo", + "version": "0.1.0", + "dependencies": { + "react": "^18.0.0" + } +} +``` + +`sample_pom.xml`: + +```xml + + + 4.0.0 + com.demo + demo + 0.1.0 + +``` + +`sample_go.mod`: + +``` +module example.com/demo + +go 1.22 + +require github.com/spf13/cobra v1.8.0 +``` + +- [ ] **Step 2 (failing test)**: `crates/kebab-chunk/tests/manifest_file_v1.rs` ์— 4 ํ…Œ์ŠคํŠธ (๊ฐ fixture ๋งˆ๋‹ค): + +```rust +use kebab_chunk::{ChunkPolicy, Chunker, ManifestFileV1Chunker}; +use kebab_core::{Asset, AssetId, Block, Document, MediaType, ParserVersion, SourceSpan}; +use std::path::PathBuf; + +fn read_fixture(name: &str) -> String { + let path = PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests/fixtures").join(name); + std::fs::read_to_string(path).unwrap() +} +fn make_doc(lang: &str, text: &str) -> Document { + // Same as Task E's helper. Copy verbatim. 
+ Document { + doc_id: "test".into(), + asset: Asset { + asset_id: AssetId("a".into()), + workspace_path: format!("test-manifest"), + byte_len: text.len() as u64, + content_hash: "deadbeef".into(), + media_type: MediaType::Code(lang.into()), + }, + parser_version: ParserVersion("none-v1".into()), + metadata: Default::default(), + blocks: vec![Block::Code { + text: text.into(), + lang: Some(lang.into()), + span: SourceSpan::Line { start: 1, end: text.lines().count() as u32 }, + }], + } +} + +#[test] +fn cargo_toml_single_chunk_with_toml_lang() { + let text = read_fixture("sample_cargo.toml"); + let doc = make_doc("toml", &text); + let chunks = ManifestFileV1Chunker.chunk(&doc, &ChunkPolicy::default()).unwrap(); + assert_eq!(chunks.len(), 1); + assert_eq!(chunks[0].source_span_symbol().as_deref(), Some("")); + assert_eq!(chunks[0].source_span_lang().as_deref(), Some("toml")); +} + +#[test] +fn package_json_single_chunk_with_json_lang() { + let text = read_fixture("sample_package.json"); + let doc = make_doc("json", &text); + let chunks = ManifestFileV1Chunker.chunk(&doc, &ChunkPolicy::default()).unwrap(); + assert_eq!(chunks.len(), 1); + assert_eq!(chunks[0].source_span_symbol().as_deref(), Some("")); + assert_eq!(chunks[0].source_span_lang().as_deref(), Some("json")); +} + +#[test] +fn pom_xml_single_chunk_with_xml_lang() { + let text = read_fixture("sample_pom.xml"); + let doc = make_doc("xml", &text); + let chunks = ManifestFileV1Chunker.chunk(&doc, &ChunkPolicy::default()).unwrap(); + assert_eq!(chunks.len(), 1); + assert_eq!(chunks[0].source_span_symbol().as_deref(), Some("")); + assert_eq!(chunks[0].source_span_lang().as_deref(), Some("xml")); +} + +#[test] +fn go_mod_single_chunk_with_go_mod_lang() { + let text = read_fixture("sample_go.mod"); + let doc = make_doc("go-mod", &text); + let chunks = ManifestFileV1Chunker.chunk(&doc, &ChunkPolicy::default()).unwrap(); + assert_eq!(chunks.len(), 1); + assert_eq!(chunks[0].source_span_symbol().as_deref(), Some("")); 
+    assert_eq!(chunks[0].source_span_lang().as_deref(), Some("go-mod"));
+}
+```
+
+- [ ] **Step 3**: `cargo test -p kebab-chunk manifest_file_v1` → FAIL.
+
+- [ ] **Step 4 (impl)**: `crates/kebab-chunk/src/manifest_file_v1.rs`:
+
+```rust
+//! p10-2: manifest whole-file chunker (Tier 2). Cargo.toml / package.json / etc.
+
+use crate::tier2_shared::push_chunks_with_oversize;
+use crate::{Chunker, ChunkPolicy};
+use anyhow::Result;
+use kebab_core::{Block, Chunk, Document};
+
+pub const VERSION_LABEL: &str = "manifest-file-v1";
+
+pub struct ManifestFileV1Chunker;
+
+impl Chunker for ManifestFileV1Chunker {
+    fn chunker_version(&self) -> &'static str { VERSION_LABEL }
+
+    fn chunk(&self, doc: &Document, policy: &ChunkPolicy) -> Result<Vec<Chunk>> {
+        let Some(Block::Code { text, lang, .. }) = doc.blocks.first() else {
+            return Ok(vec![]);
+        };
+        let lang_str = lang.as_deref().unwrap_or("");
+        let total_lines = text.lines().count().max(1) as u32;
+        let mut chunks = Vec::new();
+        push_chunks_with_oversize(
+            &mut chunks, doc, policy,
+            text, 1, total_lines,
+            "", lang_str, VERSION_LABEL,
+        )?;
+        Ok(chunks)
+    }
+}
+```
+
+- [ ] **Step 5**: update `crates/kebab-chunk/src/lib.rs`:
+
+```rust
+pub mod manifest_file_v1;
+pub use manifest_file_v1::ManifestFileV1Chunker;
+```
+
+- [ ] **Step 6**: `cargo test -p kebab-chunk manifest_file_v1` → all 4 tests PASS.
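
The manifest chunker hands oversize handling to `push_chunks_with_oversize`, whose body is not shown in this section. As a rough sketch only, assuming the 200-line threshold mentioned in the commit messages and a hypothetical helper name `line_windows` (not part of the plan), the window arithmetic might look like this:

```rust
/// Hypothetical helper (illustration only): split an inclusive, 1-indexed
/// line range into consecutive windows of at most `max_lines` lines each.
fn line_windows(line_start: u32, line_end: u32, max_lines: u32) -> Vec<(u32, u32)> {
    assert!(max_lines > 0, "window size must be positive");
    let mut windows = Vec::new();
    let mut start = line_start;
    while start <= line_end {
        // Clamp the window end to the last line of the range.
        let end = line_end.min(start.saturating_add(max_lines - 1));
        windows.push((start, end));
        if end == u32::MAX {
            break; // avoid overflow when computing the next start
        }
        start = end + 1;
    }
    windows
}
```

Under this sketch a 5-line manifest stays in one window, while a 450-line file would be cut into three windows that would all carry the same symbol.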
+ +- [ ] **Step 7**: Clippy + commit: + +```bash +cargo clippy -p kebab-chunk --all-targets -- -D warnings +git add crates/kebab-chunk/src/manifest_file_v1.rs \ + crates/kebab-chunk/src/lib.rs \ + crates/kebab-chunk/tests/fixtures/sample_cargo.toml \ + crates/kebab-chunk/tests/fixtures/sample_package.json \ + crates/kebab-chunk/tests/fixtures/sample_pom.xml \ + crates/kebab-chunk/tests/fixtures/sample_go.mod \ + crates/kebab-chunk/tests/manifest_file_v1.rs +git commit -m "$(cat <<'EOF' +feat(p10-2): manifest-file-v1 chunker (whole-file 1 chunk, symbol ) + +Emits 1 Chunk per manifest file (Cargo.toml / pyproject.toml / package.json / +tsconfig.json / pom.xml / build.gradle / go.mod). Symbol unified to +""; manifest type distinguished by code_lang (toml / json / xml / +groovy / go-mod). Oversize >200 lines splits into line-windows. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task G: ingest_one_code_asset Tier 2 routing + +**Files:** +- Modify: `crates/kebab-app/src/lib.rs` (`ingest_one_code_asset` ํ•จ์ˆ˜ + ํ˜ธ์ถœ ๋ถ€ allowlist) + +ํ˜„์žฌ 7-arm match (rust|python|typescript|javascript|go|java|kotlin) ์˜†์— Tier 2 ๋ถ„๊ธฐ. Tier 2 ๋Š” `extract` ๋‹จ๊ณ„ ์—†์Œ โ€” `RawAsset` bytes ๋กœ ์ง์ ‘ `Document` ์ƒ์„ฑ. + +- [ ] **Step 1**: ํ•จ์ˆ˜ ๋ณธ๋ฌธ์˜ parser_version match ๊ฐฑ์‹ : + +```rust +let parser_version = match code_lang { + "rust" => ParserVersion(kebab_parse_code::RUST_PARSER_VERSION.to_string()), + // ... ๊ธฐ์กด 7 ์ค„ ๊ทธ๋Œ€๋กœ ... + "kotlin" => ParserVersion(kebab_parse_code::KOTLIN_PARSER_VERSION.to_string()), + // p10-2: Tier 2 has no parse step. + "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod" + => ParserVersion("none-v1".to_string()), + other => anyhow::bail!("unsupported code_lang: {other}"), +}; +``` + +- [ ] **Step 2**: chunker_version match ๊ฐฑ์‹  (Tier 2 ๋ถ„๊ธฐ ์ถ”๊ฐ€): + +```rust +let chunker_version = match code_lang { + "rust" => CodeRustAstV1Chunker.chunker_version(), + // ... ๊ธฐ์กด ... 
+ "kotlin" => CodeKotlinAstV1Chunker.chunker_version(), + "yaml" => K8sManifestResourceV1Chunker.chunker_version(), + "dockerfile" => DockerfileFileV1Chunker.chunker_version(), + "toml" | "json" | "xml" | "groovy" | "go-mod" + => ManifestFileV1Chunker.chunker_version(), + other => anyhow::bail!("unreachable chunker_version: {other}"), +}; +``` + +- [ ] **Step 3**: extract / chunk ๋‹จ๊ณ„ ๋ถ„๋ฆฌ. ํ˜„์žฌ๋Š” `let mut canonical = match code_lang { ... extract ... };` ํ›„ `let chunks = match code_lang { ... chunk(&canonical) ... };`. Tier 2 ๋Š” extract ์—†์ด ์ง์ ‘ Document ์ƒ์„ฑ: + +```rust +// p10-1B/1C: Tier 1 extractors return a canonical Document. +// p10-2: Tier 2 has no parser โ€” synthesize a Document with a single +// Block::Code carrying the whole file text. The chunker does the work. +let mut canonical = match code_lang { + "rust" => RustAstExtractor::new().extract(&ctx, &bytes).context("...")?, + // ... ๊ธฐ์กด 7 ์ค„ ... + "kotlin" => KotlinAstExtractor::new().extract(&ctx, &bytes).context("...")?, + "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod" => { + // Tier 2: no extractor. Build a minimal Document. + synthesize_tier2_document(asset, &bytes, code_lang, &parser_version)? + } + other => anyhow::bail!("unreachable (extract): {other}"), +}; + +let chunks = match code_lang { + "rust" => CodeRustAstV1Chunker.chunk(&canonical, chunk_policy).context("...")?, + // ... ๊ธฐ์กด ... 
+    "kotlin" => CodeKotlinAstV1Chunker.chunk(&canonical, chunk_policy).context("...")?,
+    "yaml" => K8sManifestResourceV1Chunker.chunk(&canonical, chunk_policy).context("kb-chunk::K8sManifestResourceV1Chunker::chunk")?,
+    "dockerfile" => DockerfileFileV1Chunker.chunk(&canonical, chunk_policy).context("kb-chunk::DockerfileFileV1Chunker::chunk")?,
+    "toml" | "json" | "xml" | "groovy" | "go-mod"
+        => ManifestFileV1Chunker.chunk(&canonical, chunk_policy).context("kb-chunk::ManifestFileV1Chunker::chunk")?,
+    other => anyhow::bail!("unreachable (chunk): {other}"),
+};
+```
+
+Add the `synthesize_tier2_document` helper to the same file (kebab-app/src/lib.rs):
+
+```rust
+fn synthesize_tier2_document(
+    asset: &RawAsset,
+    bytes: &[u8],
+    code_lang: &str,
+    parser_version: &ParserVersion,
+) -> anyhow::Result<Document> {
+    use kebab_core::{Asset, AssetId, Block, Document, MediaType, SourceSpan};
+    let text = std::str::from_utf8(bytes)
+        .with_context(|| format!("tier2 doc not utf-8: {}", asset.workspace_path))?
+        .to_string();
+    let n_lines = text.lines().count().max(1) as u32;
+    Ok(Document {
+        doc_id: asset.asset_id.0.clone(), // tentative; will be overwritten downstream
+        asset: Asset {
+            asset_id: AssetId(asset.asset_id.0.clone()),
+            workspace_path: asset.workspace_path.clone(),
+            byte_len: asset.byte_len,
+            content_hash: asset.content_hash.clone(),
+            media_type: MediaType::Code(code_lang.to_string()),
+        },
+        parser_version: parser_version.clone(),
+        metadata: Default::default(),
+        blocks: vec![Block::Code {
+            text,
+            lang: Some(code_lang.to_string()),
+            span: SourceSpan::Line { start: 1, end: n_lines },
+        }],
+    })
+}
+```
+
+(For the exact fields of `Document` / `Asset`, read `crates/kebab-core/src/document.rs` first and mirror it. If the field names in the code above differ, correct them.)
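
To make the span convention in `synthesize_tier2_document` concrete, here is a minimal sketch of just the UTF-8 check and the `lines().count().max(1)` rule in isolation. `whole_file_span` is a hypothetical name used only for illustration; the real helper builds a full `Document` and reports errors through `anyhow`.

```rust
/// Hypothetical stand-in (illustration only) for the span logic inside
/// `synthesize_tier2_document`: validate UTF-8, then return the inclusive
/// 1-indexed (start, end) line span of the whole file.
fn whole_file_span(bytes: &[u8]) -> Result<(u32, u32), std::str::Utf8Error> {
    // Non-UTF-8 input is rejected before any Document would be built.
    let text = std::str::from_utf8(bytes)?;
    // `str::lines` yields nothing for an empty file, so clamp to one line
    // to keep `end >= start`.
    let n_lines = text.lines().count().max(1) as u32;
    Ok((1, n_lines))
}
```

With this rule an empty file still gets the span 1..1, and a trailing newline does not add an extra line.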
+ +- [ ] **Step 4**: ํ˜ธ์ถœ ๋ถ€ allowlist ๊ฐฑ์‹  (ํ˜„์žฌ `matches!(lang.as_str(), "rust" | "python" | ...)`): + +```rust +if matches!(lang.as_str(), + "rust" | "python" | "typescript" | "javascript" + | "go" | "java" | "kotlin" + | "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod" +) { + return ingest_one_code_asset(...); +} +``` + +- [ ] **Step 5**: Build + per-crate test: + +```bash +cargo build -p kebab-app +cargo test -p kebab-app --lib -- --nocapture 2>&1 | tail -20 +``` + +Expected: build clean, ๊ธฐ์กด unit test (์žˆ๋‹ค๋ฉด) ๊ทธ๋Œ€๋กœ PASS. + +- [ ] **Step 6**: Clippy + commit: + +```bash +cargo clippy -p kebab-app --all-targets -- -D warnings +git add crates/kebab-app/src/lib.rs +git commit -m "$(cat <<'EOF' +feat(p10-2): activate Tier 2 chunkers in ingest_one_code_asset dispatch + +Adds yaml / dockerfile / toml / json / xml / groovy / go-mod arms to the +existing 7-arm AST match. parser_version unified to "none-v1" for Tier 2. +synthesize_tier2_document builds a minimal Document (single Block::Code +with raw file text) since Tier 2 has no parse step. allowlist in +ingest_one_asset extended to admit Tier 2 langs. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task H: code_ingest_smoke integration tests (Tier 2) + +**Files:** +- Modify: `crates/kebab-app/tests/code_ingest_smoke.rs` (3 ์ƒˆ test) + +๊ธฐ์กด 9 ํ…Œ์ŠคํŠธ ์˜†์— yaml / dockerfile / manifest ํ†ตํ•ฉ ingest ๊ฒ€์ฆ 1๊ฐœ์”ฉ ์ถ”๊ฐ€. 
+ +- [ ] **Step 1 (failing test)** โ€” ํŒŒ์ผ ๋์— ์ถ”๊ฐ€: + +```rust +#[test] +fn tier2_k8s_yaml_ingest_searchable() { + let kb = isolated_kb(); // TempDir KB helper, existing in this file + let path = kb.workspace_root().join("k8s/deploy.yaml"); + std::fs::create_dir_all(path.parent().unwrap()).unwrap(); + std::fs::write(&path, "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: api\n namespace: prod\nspec:\n replicas: 1\n").unwrap(); + kb.ingest_via_cli().expect("ingest"); + + let hits = kb.search_via_cli_json("--code-lang yaml api"); + assert!(!hits.is_empty(), "expected at least 1 yaml hit"); + assert_eq!(hits[0]["citation"]["lang"].as_str(), Some("yaml")); + assert_eq!(hits[0]["citation"]["symbol"].as_str(), Some("Deployment/prod/api")); +} + +#[test] +fn tier2_dockerfile_ingest_searchable() { + let kb = isolated_kb(); + let path = kb.workspace_root().join("Dockerfile"); + std::fs::write(&path, "FROM rust:1.94\nRUN cargo install foo\n").unwrap(); + kb.ingest_via_cli().expect("ingest"); + + let hits = kb.search_via_cli_json("--code-lang dockerfile cargo"); + assert!(!hits.is_empty()); + assert_eq!(hits[0]["citation"]["lang"].as_str(), Some("dockerfile")); + assert_eq!(hits[0]["citation"]["symbol"].as_str(), Some("")); +} + +#[test] +fn tier2_cargo_toml_ingest_searchable() { + let kb = isolated_kb(); + let path = kb.workspace_root().join("Cargo.toml"); + std::fs::write(&path, "[package]\nname = \"demo\"\nversion = \"0.1.0\"\n").unwrap(); + kb.ingest_via_cli().expect("ingest"); + + let hits = kb.search_via_cli_json("--code-lang toml demo"); + assert!(!hits.is_empty()); + assert_eq!(hits[0]["citation"]["lang"].as_str(), Some("toml")); + assert_eq!(hits[0]["citation"]["symbol"].as_str(), Some("")); +} +``` + +(helper API โ€” `isolated_kb()`, `workspace_root()`, `ingest_via_cli()`, `search_via_cli_json()` โ€” ๋Š” ๊ธฐ์กด 9 ํ…Œ์ŠคํŠธ๊ฐ€ ์“ฐ๋Š” ๊ทธ๋Œ€๋กœ. ๋ช…๋ช…์ด ๋‹ค๋ฅด๋ฉด ๊ทธ ํŒŒ์ผ์˜ ํŒจํ„ด ๋ฏธ๋Ÿฌ๋ง.) 
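
The symbol asserted in `tier2_k8s_yaml_ingest_searchable` follows the rule used by the k8s chunker in Task D. A standalone sketch of that rule, where the function name `k8s_symbol` is hypothetical and simply stands in for the inline `match` in the chunker:

```rust
/// Hypothetical helper (illustration only): build the citation symbol for a
/// k8s resource. Namespaced resources include the namespace; cluster-scoped
/// resources (no namespace, or an empty one) fall back to kind/name.
fn k8s_symbol(kind: &str, namespace: Option<&str>, name: &str) -> String {
    match namespace {
        Some(ns) if !ns.is_empty() => format!("{kind}/{ns}/{name}"),
        _ => format!("{kind}/{name}"),
    }
}
```

For the Deployment written by the test (kind `Deployment`, namespace `prod`, name `api`) this gives the asserted `Deployment/prod/api`; a cluster-scoped resource drops the middle segment.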
+
+- [ ] **Step 2**: Run. If Tasks D-G are not yet in place, expect FAIL ("yaml symbol not present").
+
+```bash
+cargo test -p kebab-app --test code_ingest_smoke tier2 -- --nocapture
+```
+
+- [ ] **Step 3**: With Tasks D-G all complete, the tests should PASS with no further code changes. If they FAIL, debug:
+- Check that the `Citation::Code` mapping (1A-1) in citation_helper fills `lang` / `symbol` on the wire.
+- Check that `code_lang_for_path` is actually called (kebab-source-fs/media.rs).
+
+- [ ] **Step 4**: once all 9 + 3 = 12 tests pass, commit:
+
+```bash
+git add crates/kebab-app/tests/code_ingest_smoke.rs
+git commit -m "$(cat <<'EOF'
+test(p10-2): integration smoke tests for Tier 2 (k8s yaml + Dockerfile + Cargo.toml)
+
+Three new tests in code_ingest_smoke.rs verifying isolated-TempDir ingest +
+--code-lang filter + Citation::Code.lang / .symbol shape for each Tier 2
+chunker. Brings the suite to 12 tests (Rust 3 + Python 1 + TS 1 + JS 1 +
+Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1).
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+---
+
+## Task I: update frozen design §3.5 + §10.1
+
+**Files:**
+- Modify: `docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md`
+
+- [ ] **Step 1**: add 3 lines near the end of the `code_lang` mapping list in §3.5 (at a suitable position next to the Shell / Make lines, just before the `code_lang_for_path` definition):
+
+```diff
+ - YAML / k8s manifest (`.yaml`, `.yml`) → `yaml`
+ - Dockerfile (`Dockerfile`, `*.dockerfile`) → `dockerfile`
+ - TOML (`.toml`) → `toml`
+ - JSON (`.json`) → `json`
++- XML (`.xml`, `pom.xml`) → `xml`
++- Groovy (`build.gradle`, `.gradle`) → `groovy`
++- Go module (`go.mod`) → `go-mod`
+ - Shell (`.sh`, `.bash`, `.zsh`) → `shell`
+```
+
+- [ ] **Step 2**: append to the end of the §10.1 deactivation log table (or line list), after the 1C-Go and 1C-JK activation lines:
+
+```
+| p10-2 | Tier 2 activation: the 3 chunkers k8s-manifest-resource-v1 + dockerfile-file-v1 + manifest-file-v1.
Additional code_lang mappings (xml / groovy / go-mod). | 2026-05-20 |
+```
+
+(For the exact format of §10.1, table vs bullet list, read the current file first and match that format.)
+
+- [ ] **Step 3**: commit:
+
+```bash
+git add docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md
+git commit -m "$(cat <<'EOF'
+docs(p10-2): activate Tier 2 in code-ingest design §10.1 + §3.5 mappings
+
+§3.5: add code_lang_for_path mappings xml / groovy / go-mod.
+§10.1: add deactivation log entry for p10-2 (3 Tier 2 chunkers active).
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+---
+
+## Task J: README + HANDOFF + ARCHITECTURE + SMOKE + tasks/INDEX + tasks/p10/INDEX + full-suite gate
+
+**Files:**
+- Modify: `README.md`
+- Modify: `HANDOFF.md`
+- Modify: `docs/ARCHITECTURE.md`
+- Modify: `docs/SMOKE.md`
+- Modify: `tasks/INDEX.md`
+- Modify: `tasks/p10/INDEX.md`
+
+- [ ] **Step 1 (README.md)**: mention the 7 Tier 2 langs in the ingest row of the **명령** (commands) table. Example, if the existing row has the form "지원 lang: rust / python / typescript / javascript / go / java / kotlin" ("supported lang"):
+
+```diff
+-지원 lang: rust / python / typescript / javascript / go / java / kotlin
++지원 lang: rust / python / typescript / javascript / go / java / kotlin / yaml (k8s) / dockerfile / toml / json / xml / groovy / go-mod
+```
+
+No change to the Configuration section (no new gating flag). The Mermaid diagram is unchanged (the code category already exists).
+
+- [ ] **Step 2 (HANDOFF.md)**: flip the p10-2 row of the phase table ⏳ → ✅. The "머지 후 발견된 버그 / 결정 (요약)" ("bugs / decisions found after merge (summary)") section needs no change (anything found post-merge goes in a separate PR).
+
+- [ ] **Step 3 (docs/ARCHITECTURE.md)**: add the Tier 2 entries (3 chunkers plus the shared helper) to the `crates/kebab-chunk/src/` subtree of the directory tree:
+
+```
+crates/kebab-chunk/src/
+├── code_*_ast_v1.rs (Tier 1, 7 files)
+├── k8s_manifest_resource_v1.rs (Tier 2, p10-2)
+├── dockerfile_file_v1.rs (Tier 2, p10-2)
+├── manifest_file_v1.rs (Tier 2, p10-2)
+├── tier2_shared.rs (Tier 2 helper, p10-2)
+└── ...
+```
+
+- [ ] **Step 4 (docs/SMOKE.md)**: add one line for the Tier 2 smoke check (yaml + Dockerfile + Cargo.toml ingest → search --code-lang yaml/dockerfile/toml).
+
+- [ ] **Step 5 (tasks/INDEX.md + tasks/p10/INDEX.md)**: p10-2 status ⏳ → ✅.
+
+- [ ] **Step 6 (Full-suite gate)** (memory-conscious):
+
+```bash
+df -h / # check free space
+cargo clean # if heavy
+cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -60
+cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -30
+```
+
+Expected: all tests PASS, clippy clean.
+
+- [ ] **Step 7**: commit:
+
+```bash
+git add README.md HANDOFF.md docs/ARCHITECTURE.md docs/SMOKE.md tasks/INDEX.md tasks/p10/INDEX.md
+git commit -m "$(cat <<'EOF'
+docs(p10-2): README/HANDOFF/ARCHITECTURE/SMOKE/INDEX + tasks/p10/INDEX
+
+User-visible surface sync per the docs-split rule: README adds Tier 2 langs
+in the command table; HANDOFF flips p10-2 to ✅; ARCHITECTURE adds the new
+chunker modules + tier2_shared.rs to the directory tree; SMOKE adds a
+yaml/Dockerfile/Cargo.toml smoke step; both INDEX files flip p10-2 to ✅.
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+---
+
+## Task K: version bump 0.13.0 → 0.14.0 + gitea PR
+
+**Files:**
+- Modify: `Cargo.toml` (workspace `version`)
+- Modify: `Cargo.lock` (updated automatically)
+
+- [ ] **Step 1**: in `Cargo.toml`, change `[workspace.package] version = "0.13.0"` → `"0.14.0"`.
+
+- [ ] **Step 2**: run `cargo build -p kebab` once to refresh `Cargo.lock`.
+
+- [ ] **Step 3**: commit:
+
+```bash
+git add Cargo.toml Cargo.lock
+git commit -m "$(cat <<'EOF'
+chore: bump version 0.13.0 → 0.14.0 (p10-2 Tier 2 resource-aware)
+
+Minor bump: additive code_lang values (xml / groovy / go-mod) + 3 new
+chunker_version labels (k8s-manifest-resource-v1 / dockerfile-file-v1 /
+manifest-file-v1) + frozen design §3.5 deltas. No DB migration, no wire
+schema major bump.
+
+Co-Authored-By: Claude Opus 4.7 (1M context)
+EOF
+)"
+```
+
+- [ ] **Step 4**: gitea PR open via gitea-ops skill. Branch `feat/p10-2-tier2-resource` → `main`. Title: `feat(p10-2): Tier 2 resource-aware chunkers (k8s + Dockerfile + manifest)`.
+
+- [ ] **Step 5**: merge immediately once the user APPROVEs (memory: feedback_pr_workflow). After the merge: pull main, clean up the branch, and run `gitea-release v0.14.0` (gitea-ops skill).
+
+---
+
+## Verification matrix (final, after Task K merge)
+
+| Check | Command | Expected |
+|------|------|------|
+| Tier 2 lang routing | `kebab schema --json \| jq '.stats.code_lang_breakdown'` (after ingesting Tier 2 files) | counts appear for yaml / dockerfile / toml / json / xml / groovy / go-mod |
+| k8s symbol shape | `kebab search --code-lang yaml --json` | citation.symbol = `<kind>/<namespace>/<name>` |
+| Dockerfile chunk | `kebab search --code-lang dockerfile --json` | citation.symbol = ``, line 1..EOF |
+| manifest chunk | `kebab search --code-lang toml --json` | citation.symbol = ``, lang mapping |
+| Non-k8s YAML skip | docker-compose.yml ingest | 0 chunk, IngestReport.skipped count +1 |
+| Invalid YAML skip | ingest of a deliberately invalid yaml | 0 chunk, IngestReport.skipped + warning |
+
+Manual verification is possible by following the Tier 2 section of `docs/SMOKE.md`.
+
+---
+
+## Risks recap (watch for these during implementation)
+
+- The `^---\s*$` regex is too narrow: the YAML standard allows whitespace + a comment after `---`. Verify with a fixture and relax the regex if needed.
+- `serde_yaml::Value::as_str()` returns None for boolean / number values, which enforces that apiVersion/kind are strings.
์ด๋ฏธ spec ๋ช…์‹œ. +- pre-1.0 ์˜ Cargo.toml workspace version ์œ„์น˜ โ€” `[workspace.package]` ๊ฐ€ ๋งž๋Š”์ง€ ํ˜„์žฌ ํŒŒ์ผ Read ํ›„ ํ™•์ธ. +- `synthesize_tier2_document` ์˜ `doc_id` ๊ฐ€ ์ž„์‹œ๊ฐ’ (`asset_id.0`) โ€” downstream ์˜ ์ง„์งœ doc_id ์ƒ์„ฑ ๋กœ์ง๊ณผ ์ถฉ๋Œ ๊ฐ€๋Šฅ. 1A-2 ์˜ extractor return ํ˜•์‹๊ณผ ๊ฐ™์€ doc_id ์ •์ฑ… ์ ์šฉ. Step 3 impl ์ž‘์„ฑ ์ „ RustAstExtractor ์˜ Document ์ƒ์„ฑ ์ฝ”๋“œ ํ™•์ธ. +- pom.xml ๊ฑฐ๋Œ€ fixture ๊ฐ€ 200 ์ค„ ๋„˜์–ด๊ฐ€๋ฉด oversize split ๊ฒ€์ฆ ์ข‹์Œ โ€” ํ•„์š”์‹œ Task F ์˜ sample_pom.xml ์„ ์ผ๋ถ€๋Ÿฌ ๊ธธ๊ฒŒ. From 077f92f41e1cc01e82bce1084e74e7884b4a9308 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 12:57:06 +0000 Subject: [PATCH 03/13] build(p10-2): add serde_yaml dep to kebab-chunk for k8s-manifest-resource-v1 Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-chunk/Cargo.toml | 1 + 1 file changed, 1 insertion(+) diff --git a/crates/kebab-chunk/Cargo.toml b/crates/kebab-chunk/Cargo.toml index be91c3c..a7f9534 100644 --- a/crates/kebab-chunk/Cargo.toml +++ b/crates/kebab-chunk/Cargo.toml @@ -13,6 +13,7 @@ serde_json_canonicalizer = "0.3" blake3 = { workspace = true } anyhow = { workspace = true } tracing = { workspace = true } +serde_yaml = { workspace = true } [dev-dependencies] # kb-parse-md / kb-normalize are dev-only โ€” used by the snapshot integration From aaa90b1754380111269f65fae8c38c98cf20549c Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 12:59:11 +0000 Subject: [PATCH 04/13] feat(p10-2): extend code_lang_for_path with Tier 2 basenames + extensions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds basename-first matching for Dockerfile / Cargo.toml / pyproject.toml / package.json / tsconfig.json / go.mod / pom.xml / build.gradle plus Dockerfile.* prefix variant. Extension fallback adds .yaml/.yml/.dockerfile/ .toml/.json/.xml/.gradle โ†’ yaml/dockerfile/toml/json/xml/groovy. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-parse-code/src/lang.rs | 56 ++++++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 4 deletions(-) diff --git a/crates/kebab-parse-code/src/lang.rs b/crates/kebab-parse-code/src/lang.rs index 19fbb38..9f974a2 100644 --- a/crates/kebab-parse-code/src/lang.rs +++ b/crates/kebab-parse-code/src/lang.rs @@ -10,18 +10,39 @@ use std::path::Path; /// `None` if the extension / filename is not recognized. /// /// Matching priority: -/// 1. exact filename match (e.g. `Dockerfile`, `Makefile`) -/// 2. lowercase extension match +/// 1. Tier 1 basename exact match (e.g. `Dockerfile`, `Makefile`) +/// 2. Tier 2 basename match (e.g. `Cargo.toml`, `package.json`, `build.gradle`) +/// 3. Tier 2 `Dockerfile.*` prefix variant +/// 4. Tier 1 + Tier 2 extension fallback (lowercase) pub fn code_lang_for_path(path: &Path) -> Option<&'static str> { if let Some(name) = path.file_name().and_then(|n| n.to_str()) { + // Tier 1 basename exact match match name { "Dockerfile" => return Some("dockerfile"), "Makefile" | "GNUmakefile" => return Some("make"), _ => {} } + + // Tier 2 basename match (configuration / manifest files) + match name { + "Cargo.toml" | "pyproject.toml" => return Some("toml"), + "package.json" | "tsconfig.json" => return Some("json"), + "go.mod" => return Some("go-mod"), + "pom.xml" => return Some("xml"), + "build.gradle" => return Some("groovy"), + _ => {} + } + + // Tier 2: `Dockerfile.*` prefix variant (e.g. 
`Dockerfile.dev`, `Dockerfile.prod`) + if name.starts_with("Dockerfile.") && name.len() > "Dockerfile.".len() { + return Some("dockerfile"); + } } + + // Extension fallback (Tier 1 + Tier 2) let ext = path.extension()?.to_str()?.to_ascii_lowercase(); match ext.as_str() { + // Tier 1 extensions "rs" => Some("rust"), "py" | "pyi" => Some("python"), "ts" | "tsx" | "mts" | "cts" => Some("typescript"), @@ -31,12 +52,15 @@ pub fn code_lang_for_path(path: &Path) -> Option<&'static str> { "kt" | "kts" => Some("kotlin"), "c" | "h" => Some("c"), "cpp" | "cc" | "cxx" | "hpp" | "hh" | "hxx" => Some("cpp"), + "sh" | "bash" | "zsh" => Some("shell"), + "mk" => Some("make"), + // Tier 2 extensions "yaml" | "yml" => Some("yaml"), "toml" => Some("toml"), "json" => Some("json"), - "sh" | "bash" | "zsh" => Some("shell"), - "mk" => Some("make"), + "xml" => Some("xml"), "dockerfile" => Some("dockerfile"), + "gradle" => Some("groovy"), _ => None, } } @@ -118,4 +142,28 @@ mod tests { assert_eq!(module_path_for_tsjs("a/b/c.ts"), "a/b/c"); assert_eq!(module_path_for_tsjs("packages/x/src/Foo.ts"), "packages/x/src/Foo"); } + + #[test] + fn tier2_basename_takes_precedence_over_extension() { + assert_eq!(code_lang_for_path(Path::new("Dockerfile")), Some("dockerfile")); + assert_eq!(code_lang_for_path(Path::new("foo/Dockerfile.dev")), Some("dockerfile")); + assert_eq!(code_lang_for_path(Path::new("myapp.dockerfile")), Some("dockerfile")); + assert_eq!(code_lang_for_path(Path::new("repo/Cargo.toml")), Some("toml")); + assert_eq!(code_lang_for_path(Path::new("pyproject.toml")), Some("toml")); + assert_eq!(code_lang_for_path(Path::new("repo/package.json")), Some("json")); + assert_eq!(code_lang_for_path(Path::new("tsconfig.json")), Some("json")); + assert_eq!(code_lang_for_path(Path::new("go.mod")), Some("go-mod")); + assert_eq!(code_lang_for_path(Path::new("pom.xml")), Some("xml")); + assert_eq!(code_lang_for_path(Path::new("build.gradle")), Some("groovy")); + } + + #[test] + fn 
tier2_extension_fallback() { + assert_eq!(code_lang_for_path(Path::new("k8s/deploy.yaml")), Some("yaml")); + assert_eq!(code_lang_for_path(Path::new("k8s/deploy.yml")), Some("yaml")); + assert_eq!(code_lang_for_path(Path::new("foo/bar.toml")), Some("toml")); + assert_eq!(code_lang_for_path(Path::new("foo/bar.json")), Some("json")); + assert_eq!(code_lang_for_path(Path::new("foo/bar.xml")), Some("xml")); + assert_eq!(code_lang_for_path(Path::new("foo/bar.gradle")), Some("groovy")); + } } From 22dba09857e289573324ad1885187d2108d15cb7 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:01:14 +0000 Subject: [PATCH 05/13] refactor(p10-2): media.rs delegates code lang to code_lang_for_path MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces 1A-1 era inline match block with a single call to kebab_parse_code::code_lang_for_path, per design §3.5 single-source-of-truth rule. Adds Tier 2 routing test (yaml / dockerfile / toml / json / xml / groovy / go-mod) and preserves all non-code extension branches. Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-source-fs/src/media.rs | 36 ++++++++++++++--------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/crates/kebab-source-fs/src/media.rs b/crates/kebab-source-fs/src/media.rs index 3f9e0c6..8dfc2df 100644 --- a/crates/kebab-source-fs/src/media.rs +++ b/crates/kebab-source-fs/src/media.rs @@ -12,6 +12,12 @@ use kebab_core::{AudioType, ImageType, MediaType}; /// `MediaType::Image(_)` / `MediaType::Audio(_)`. Anything else (including /// missing extension) → `MediaType::Other(ext)`. pub(crate) fn media_type_for(path: &Path) -> MediaType { + // p10-2: code_lang_for_path is the single source of truth for code lang + // (design §3.5). Delegate before falling back to extension branches. 
+ if let Some(lang) = kebab_parse_code::code_lang_for_path(path) { + return MediaType::Code(lang.to_string()); + } + let ext = path .extension() .and_then(|s| s.to_str()) @@ -36,23 +42,6 @@ pub(crate) fn media_type_for(path: &Path) -> MediaType { "flac" => MediaType::Audio(AudioType::Flac), "ogg" => MediaType::Audio(AudioType::Ogg), - // p10-1A-2: Rust is the only code lang activated in 1A. Other - // recognized code langs stay Other until their phase (1B+). - "rs" => MediaType::Code("rust".to_string()), - - // p10-1B: Python / TS / JS AST chunkers active. - "py" | "pyi" => MediaType::Code("python".into()), - // .mts / .cts are TypeScript ESM / CommonJS variants โ€” same grammar. - "ts" | "tsx" | "mts" | "cts" => MediaType::Code("typescript".into()), - "js" | "mjs" | "cjs" | "jsx" => MediaType::Code("javascript".into()), - - // p10-1C-Go: Go ingest activated. - "go" => MediaType::Code("go".into()), - - // p10-1C-JK: JVM family (Java + Kotlin) ingest activated. - "java" => MediaType::Code("java".into()), - "kt" | "kts" => MediaType::Code("kotlin".into()), - // Empty string (no extension) and any other extension: bucket as // Other and let downstream extractors decide if they support it. 
_ => MediaType::Other(ext), @@ -96,7 +85,8 @@ mod tests { media_type_for(Path::new("crates/kebab-core/src/lib.rs")), MediaType::Code("rust".to_string()) ); - assert_eq!(media_type_for(Path::new("Cargo.toml")), MediaType::Other("toml".to_string())); + // Cargo.toml is a Tier 2 code manifest (p10-2), handled by code_lang_for_path + assert_eq!(media_type_for(Path::new("Cargo.toml")), MediaType::Code("toml".to_string())); } #[test] @@ -149,4 +139,14 @@ mod tests { MediaType::Other(String::new()) ); } + + #[test] + fn tier2_files_map_to_media_code() { + assert_eq!(media_type_for(Path::new("a/deploy.yaml")), MediaType::Code("yaml".into())); + assert_eq!(media_type_for(Path::new("a/Dockerfile")), MediaType::Code("dockerfile".into())); + assert_eq!(media_type_for(Path::new("a/Cargo.toml")), MediaType::Code("toml".into())); + assert_eq!(media_type_for(Path::new("a/pom.xml")), MediaType::Code("xml".into())); + assert_eq!(media_type_for(Path::new("a/build.gradle")), MediaType::Code("groovy".into())); + assert_eq!(media_type_for(Path::new("a/go.mod")), MediaType::Code("go-mod".into())); + } } From 8996e7328232922b3d315d1be81959b8da591dc6 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:06:47 +0000 Subject: [PATCH 06/13] feat(p10-2): k8s-manifest-resource-v1 chunker + tier2_shared helper Splits multi-document YAML by ^---\s*$, requires apiVersion + kind string fields per document, emits 1 chunk per recognized k8s resource. Symbol = <kind>/<namespace>/<name> or <kind>/<name> (cluster-scoped). Invalid YAML returns 0 chunks (handled by p10-3 paragraph fallback). Oversize >200 lines splits into line-windows sharing the same symbol. tier2_shared module hosts the oversize fallback + Chunk-construction helper mirroring code_rust_ast_v1's Chunk shape. Task E (dockerfile) and Task F (manifest) will reuse it. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- Cargo.lock | 1 + .../src/k8s_manifest_resource_v1.rs | 169 +++++++++++++++ crates/kebab-chunk/src/lib.rs | 3 + crates/kebab-chunk/src/tier2_shared.rs | 142 ++++++++++++ .../tests/fixtures/sample_k8s.yaml | 34 +++ .../tests/k8s_manifest_resource_v1.rs | 205 ++++++++++++++++++ 6 files changed, 554 insertions(+) create mode 100644 crates/kebab-chunk/src/k8s_manifest_resource_v1.rs create mode 100644 crates/kebab-chunk/src/tier2_shared.rs create mode 100644 crates/kebab-chunk/tests/fixtures/sample_k8s.yaml create mode 100644 crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs diff --git a/Cargo.lock b/Cargo.lock index 311a7be..28cbf89 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -4181,6 +4181,7 @@ dependencies = [ "kebab-parse-md", "serde_json", "serde_json_canonicalizer", + "serde_yaml", "time", "tracing", ] diff --git a/crates/kebab-chunk/src/k8s_manifest_resource_v1.rs b/crates/kebab-chunk/src/k8s_manifest_resource_v1.rs new file mode 100644 index 0000000..71a4104 --- /dev/null +++ b/crates/kebab-chunk/src/k8s_manifest_resource_v1.rs @@ -0,0 +1,169 @@ +//! p10-2: k8s manifest resource-aware chunker. +//! +//! Splits a multi-document YAML file on `^---\s*$` boundaries, recognises +//! documents that have both `apiVersion` and `kind` string fields as k8s +//! resources, and emits one `Chunk` per resource (with oversize >200-line +//! fallback). Non-k8s documents are skipped; invalid YAML yields 0 chunks +//! for the entire file. 
+ +use crate::tier2_shared::{policy_hash, push_chunks_with_oversize}; +use anyhow::Result; +use kebab_core::{Block, CanonicalDocument, Chunk, ChunkPolicy, ChunkerVersion, Chunker}; + +pub const VERSION_LABEL: &str = "k8s-manifest-resource-v1"; + +#[derive(Clone, Copy, Debug, Default)] +pub struct K8sManifestResourceV1Chunker; + +impl Chunker for K8sManifestResourceV1Chunker { + fn chunker_version(&self) -> ChunkerVersion { + ChunkerVersion(VERSION_LABEL.to_string()) + } + + fn policy_hash(&self, policy: &ChunkPolicy) -> String { + policy_hash(policy) + } + + fn chunk(&self, doc: &CanonicalDocument, policy: &ChunkPolicy) -> Result<Vec<Chunk>> { + // Expect a single Block::Code carrying the full YAML text. + let text = match doc.blocks.first() { + Some(Block::Code(cb)) => cb.code.as_str(), + _ => return Ok(vec![]), + }; + + let slices = split_yaml_documents(text); + let mut chunks: Vec<Chunk> = Vec::new(); + + for slice in slices { + // Invalid YAML in any document → return 0 chunks for the file. + let value: serde_yaml::Value = match serde_yaml::from_str(slice.text) { + Ok(v) => v, + Err(_) => return Ok(vec![]), + }; + + let Some(mapping) = value.as_mapping() else { + continue; + }; + + let api = mapping + .get("apiVersion") + .and_then(|v| v.as_str()) + .unwrap_or(""); + let kind = mapping + .get("kind") + .and_then(|v| v.as_str()) + .unwrap_or(""); + + // Skip non-k8s documents. 
+ if api.is_empty() || kind.is_empty() { + continue; + } + + let metadata = mapping + .get("metadata") + .and_then(|v| v.as_mapping()); + let name = metadata + .and_then(|m| m.get("name")) + .and_then(|v| v.as_str()) + .unwrap_or(""); + let namespace = metadata + .and_then(|m| m.get("namespace")) + .and_then(|v| v.as_str()); + + let symbol = match namespace { + Some(ns) if !ns.is_empty() => format!("{kind}/{ns}/{name}"), + _ => format!("{kind}/{name}"), + }; + + push_chunks_with_oversize( + &mut chunks, + doc, + policy, + slice.text, + slice.line_start, + slice.line_end, + &symbol, + "yaml", + VERSION_LABEL, + )?; + } + + tracing::debug!( + target: "kebab-chunk", + doc_id = %doc.doc_id, + chunks = chunks.len(), + "k8s-manifest-resource-v1 chunked", + ); + + Ok(chunks) + } +} + +struct YamlSlice<'a> { + text: &'a str, + line_start: u32, + line_end: u32, +} + +/// Split raw YAML text into per-document slices on `---` separator lines. +/// Line numbers are 1-indexed. +fn split_yaml_documents(text: &str) -> Vec<YamlSlice<'_>> { + let lines: Vec<&str> = text.lines().collect(); + + // Collect indices of separator lines (0-based), then append a sentinel at + // the end so the last slice is always terminated. 
+ let mut separators: Vec<usize> = lines + .iter() + .enumerate() + .filter_map(|(i, l)| { + let trimmed = l.trim_end(); + if trimmed == "---" + || trimmed.starts_with("--- ") + || trimmed.starts_with("---\t") + { + Some(i) + } else { + None + } + }) + .collect(); + separators.push(lines.len()); + + let mut slices: Vec<YamlSlice<'_>> = Vec::new(); + let mut doc_start_line: usize = 0; // 0-based index of current doc start + + for sep_line in separators { + if sep_line > doc_start_line { + let start_byte = byte_offset_of_line(text, doc_start_line); + let end_byte = byte_offset_of_line(text, sep_line); + let slice_text = &text[start_byte..end_byte]; + if !slice_text.trim().is_empty() { + slices.push(YamlSlice { + text: slice_text, + line_start: (doc_start_line + 1) as u32, + line_end: sep_line as u32, + }); + } + } + doc_start_line = sep_line + 1; + } + + slices +} + +/// Return the byte offset of the start of `line_idx` (0-based line index). +fn byte_offset_of_line(text: &str, line_idx: usize) -> usize { + if line_idx == 0 { + return 0; + } + let mut count = 0usize; + for (i, c) in text.char_indices() { + if c == '\n' { + count += 1; + if count == line_idx { + return i + 1; + } + } + } + text.len() +} diff --git a/crates/kebab-chunk/src/lib.rs b/crates/kebab-chunk/src/lib.rs index 750d18e..516620a 100644 --- a/crates/kebab-chunk/src/lib.rs +++ b/crates/kebab-chunk/src/lib.rs @@ -24,6 +24,8 @@ mod code_rust_ast_v1; mod code_ts_ast_v1; mod md_heading_v1; mod pdf_page_v1; +mod tier2_shared; +pub mod k8s_manifest_resource_v1; pub use code_go_ast_v1::CodeGoAstV1Chunker; pub use code_java_ast_v1::CodeJavaAstV1Chunker; @@ -34,3 +36,4 @@ pub use code_rust_ast_v1::CodeRustAstV1Chunker; pub use code_ts_ast_v1::CodeTsAstV1Chunker; pub use md_heading_v1::MdHeadingV1Chunker; pub use pdf_page_v1::PdfPageV1Chunker; +pub use k8s_manifest_resource_v1::K8sManifestResourceV1Chunker; diff --git a/crates/kebab-chunk/src/tier2_shared.rs b/crates/kebab-chunk/src/tier2_shared.rs new file mode 100644 index 
0000000..f52173c --- /dev/null +++ b/crates/kebab-chunk/src/tier2_shared.rs @@ -0,0 +1,142 @@ +//! p10-2: Tier 2 chunker shared helpers (oversize fallback + Chunk build). +//! +//! Mirrors `code_rust_ast_v1`'s Chunk-construction pattern exactly so that +//! id / hashes / token-count / ChunkPolicy semantics stay identical across +//! Tier 1 (AST) and Tier 2 (resource-aware) chunkers. + +use anyhow::Result; +use kebab_core::{ + BlockId, CanonicalDocument, Chunk, ChunkPolicy, ChunkerVersion, DocumentId, SourceSpan, + id_for_chunk, +}; + +pub(crate) const AST_CHUNK_MAX_LINES: u32 = 200; +const BYTES_PER_TOKEN: usize = 3; +const POLICY_HASH_HEX_LEN: usize = 16; + +/// Compute the policy hash the same way `code_rust_ast_v1` does. +pub(crate) fn policy_hash(policy: &ChunkPolicy) -> String { + let bytes = serde_json_canonicalizer::to_vec(policy) + .expect("canonical JSON serialization of ChunkPolicy must not fail"); + let hex = blake3::hash(&bytes).to_hex().to_string(); + hex[..POLICY_HASH_HEX_LEN].to_string() +} + +/// Emit one chunk for `(text, line_start..=line_end, symbol, lang)`, splitting +/// into line-windows of at most `AST_CHUNK_MAX_LINES` if the slice is oversize. +/// Mirrors the oversize path in `code_rust_ast_v1`'s `chunk` impl. 
+#[allow(clippy::too_many_arguments)] +pub(crate) fn push_chunks_with_oversize( + out: &mut Vec<Chunk>, + doc: &CanonicalDocument, + policy: &ChunkPolicy, + text: &str, + line_start: u32, + line_end: u32, + symbol: &str, + lang: &str, + chunker_version: &str, +) -> Result<()> { + let n_lines = (line_end - line_start + 1).max(1); + let cv = ChunkerVersion(chunker_version.to_string()); + let base_policy_hash = policy_hash(policy); + + if n_lines <= AST_CHUNK_MAX_LINES { + out.push(build_chunk( + doc, + &cv, + &base_policy_hash, + text, + line_start, + line_end, + symbol, + lang, + None, + )); + return Ok(()); + } + + let lines: Vec<&str> = text.lines().collect(); + let total = lines.len(); + let mut window_start = line_start; + let mut i = 0usize; + while i < total { + let take = (AST_CHUNK_MAX_LINES as usize).min(total - i); + let window_text = lines[i..i + take].join("\n"); + let window_end = window_start + take as u32 - 1; + out.push(build_chunk( + doc, + &cv, + &base_policy_hash, + &window_text, + window_start, + window_end, + symbol, + lang, + Some(window_start), + )); + i += take; + window_start = window_end + 1; + } + Ok(()) +} + +/// Build a single `Chunk`, mirroring `make_chunk` in `code_rust_ast_v1.rs` +/// exactly (same id recipe, same token estimate, same field set). +/// +/// `split_key` is `Some(line_start_of_window)` for oversize splits, `None` +/// for normal single-chunk emission. Mirrors the `Some(part_ls)` / `None` +/// split_key pattern in 1A-2. 
+#[allow(clippy::too_many_arguments)] +fn build_chunk( + doc: &CanonicalDocument, + chunker_version: &ChunkerVersion, + base_policy_hash: &str, + text: &str, + line_start: u32, + line_end: u32, + symbol: &str, + lang: &str, + split_key: Option<u32>, +) -> Chunk { + let span = SourceSpan::Code { + line_start, + line_end, + symbol: Some(symbol.to_string()), + lang: Some(lang.to_string()), + }; + + // id_hash mirrors code_rust_ast_v1's make_chunk logic: + // split_key Some(k) => "{base_policy_hash}#L{k}" + // split_key None => base_policy_hash + let id_hash = match split_key { + Some(k) => format!("{base_policy_hash}#L{k}"), + None => base_policy_hash.to_string(), + }; + + // block_ids: Tier 2 chunkers have no per-block structure (the whole file + // is one Block::Code), so we pass an empty slice - same as using the doc- + // level slice without explicit block granularity. + let block_ids: Vec<BlockId> = vec![]; + + let chunk_id = id_for_chunk( + &DocumentId(doc.doc_id.0.clone()), + chunker_version, + &block_ids, + &id_hash, + ); + + let token_estimate = text.len().div_ceil(BYTES_PER_TOKEN); + + Chunk { + chunk_id, + doc_id: DocumentId(doc.doc_id.0.clone()), + block_ids, + text: text.to_string(), + heading_path: Vec::new(), + source_spans: vec![span], + token_estimate, + chunker_version: chunker_version.clone(), + policy_hash: base_policy_hash.to_string(), + } +} diff --git a/crates/kebab-chunk/tests/fixtures/sample_k8s.yaml b/crates/kebab-chunk/tests/fixtures/sample_k8s.yaml new file mode 100644 index 0000000..b7f61f0 --- /dev/null +++ b/crates/kebab-chunk/tests/fixtures/sample_k8s.yaml @@ -0,0 +1,34 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: api-server + namespace: prod +spec: + replicas: 3 + selector: + matchLabels: + app: api-server + template: + metadata: + labels: + app: api-server + spec: + containers: + - name: api + image: example/api:1.2.3 +--- +apiVersion: v1 +kind: Service +metadata: + name: api-server + namespace: prod +spec: + selector: + app: 
api-server + ports: + - port: 80 + targetPort: 8080 +--- +# Non-k8s document โ€” apiVersion missing +kind: ClusterIP +foo: bar diff --git a/crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs b/crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs new file mode 100644 index 0000000..42625a0 --- /dev/null +++ b/crates/kebab-chunk/tests/k8s_manifest_resource_v1.rs @@ -0,0 +1,205 @@ +//! Behavioural tests for `K8sManifestResourceV1Chunker`. +//! +//! Documents are constructed manually (no kebab-parse-code dependency) by +//! placing the raw YAML text into a single `Block::Code`, mirroring the +//! pattern used in `code_rust_ast_snapshot.rs`. + +use std::path::PathBuf; + +use kebab_chunk::K8sManifestResourceV1Chunker; +use kebab_core::{ + AssetId, Block, CanonicalDocument, ChunkPolicy, Chunker, ChunkerVersion, CodeBlock, + CommonBlock, Lang, Metadata, ParserVersion, Provenance, SourceSpan, SourceType, TrustLevel, + WorkspacePath, id_for_block, id_for_doc, +}; +use time::OffsetDateTime; + +// โ”€โ”€ helpers โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ + +fn fixtures_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") +} + +/// Build a `CanonicalDocument` with a single `Block::Code` containing `yaml_text`. 
+fn yaml_doc(yaml_text: &str) -> CanonicalDocument { + let wp = WorkspacePath("manifests/deploy.yaml".into()); + let aid = AssetId("c".repeat(64)); + let pv = ParserVersion("code-yaml-v1".into()); + let doc_id = id_for_doc(&wp, &aid, &pv); + + let line_count = yaml_text.lines().count() as u32; + let span = SourceSpan::Code { + line_start: 1, + line_end: line_count.max(1), + symbol: None, + lang: Some("yaml".into()), + }; + let bid = id_for_block(&doc_id, "code", &[], 0, &span); + let block = Block::Code(CodeBlock { + common: CommonBlock { + block_id: bid, + heading_path: vec![], + source_span: span, + }, + lang: Some("yaml".into()), + code: yaml_text.to_string(), + }); + + CanonicalDocument { + doc_id, + source_asset_id: aid, + workspace_path: wp, + title: "deploy.yaml".into(), + lang: Lang("und".into()), + blocks: vec![block], + metadata: Metadata { + aliases: vec![], + tags: vec![], + created_at: OffsetDateTime::from_unix_timestamp(1_700_000_000).unwrap(), + updated_at: OffsetDateTime::from_unix_timestamp(1_700_000_000).unwrap(), + source_type: SourceType::Note, + trust_level: TrustLevel::Primary, + user_id_alias: None, + user: Default::default(), + repo: Some("kebab".into()), + git_branch: Some("main".into()), + git_commit: Some("0".repeat(40)), + code_lang: Some("yaml".into()), + }, + provenance: Provenance { events: vec![] }, + parser_version: pv, + schema_version: 1, + doc_version: 1, + last_chunker_version: None, + last_embedding_version: None, + } +} + +fn policy() -> ChunkPolicy { + ChunkPolicy { + target_tokens: 500, + overlap_tokens: 80, + respect_markdown_headings: false, + chunker_version: ChunkerVersion("k8s-manifest-resource-v1".into()), + } +} + +// โ”€โ”€ tests โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ + +/// Three YAML documents: 2 valid k8s resources + 1 non-k8s (no apiVersion). 
+/// The chunker must emit exactly 2 chunks with the correct symbols and lang. +#[test] +fn k8s_multi_doc_emits_one_chunk_per_resource() { + let fixture_path = fixtures_dir().join("sample_k8s.yaml"); + let text = std::fs::read_to_string(&fixture_path) + .unwrap_or_else(|e| panic!("cannot read fixture {}: {e}", fixture_path.display())); + + let doc = yaml_doc(&text); + let chunks = K8sManifestResourceV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 2, + "expected 2 k8s chunks, got {}: {chunks:#?}", + chunks.len() + ); + + let symbols: Vec<&str> = chunks + .iter() + .map(|c| { + match &c.source_spans[0] { + SourceSpan::Code { symbol, .. } => { + symbol.as_deref().expect("symbol must be Some for k8s chunks") + } + other => panic!("expected Code span, got {other:?}"), + } + }) + .collect(); + + assert_eq!( + symbols, + vec!["Deployment/prod/api-server", "Service/prod/api-server"], + "symbols mismatch: {symbols:?}" + ); + + // Verify lang = "yaml" on every chunk. + for chunk in &chunks { + match &chunk.source_spans[0] { + SourceSpan::Code { lang, .. } => { + assert_eq!(lang.as_deref(), Some("yaml"), "lang must be 'yaml'"); + } + other => panic!("expected Code span, got {other:?}"), + } + } + + // Verify chunker_version label. + for chunk in &chunks { + assert_eq!(chunk.chunker_version.0, "k8s-manifest-resource-v1"); + } +} + +/// A YAML document with an indentation error (tab in a space-indented context) +/// must cause the chunker to return 0 chunks for the entire file. +#[test] +fn k8s_invalid_yaml_emits_zero_chunks() { + // serde_yaml 0.9 is lenient about duplicate keys (last wins), so use a + // genuine YAML structural error (unclosed flow sequence) to force a parse + // failure. 
+ let actually_bad = "apiVersion: v1\nkind: Service\nfoo: [\nbar\n"; + + let doc = yaml_doc(actually_bad); + let chunks = K8sManifestResourceV1Chunker + .chunk(&doc, &policy()) + .expect("chunk should not error - return Ok(vec![]) for invalid yaml"); + + assert_eq!( + chunks.len(), + 0, + "invalid YAML must yield 0 chunks, got {}: {chunks:#?}", + chunks.len() + ); +} + +/// A cluster-scoped resource (no `metadata.namespace`) must produce a symbol +/// of the form `<kind>/<name>` (two components, no namespace segment). +#[test] +fn k8s_cluster_scoped_resource_symbol() { + let yaml = "\ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: cluster-admin +rules: +- apiGroups: [\"*\"] + resources: [\"*\"] + verbs: [\"*\"] +"; + + let doc = yaml_doc(yaml); + let chunks = K8sManifestResourceV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 1, + "expected 1 chunk for cluster-scoped resource, got {}: {chunks:#?}", + chunks.len() + ); + + match &chunks[0].source_spans[0] { + SourceSpan::Code { symbol, lang, .. } => { + assert_eq!( + symbol.as_deref(), + Some("ClusterRole/cluster-admin"), + "cluster-scoped symbol must be <kind>/<name>" + ); + assert_eq!(lang.as_deref(), Some("yaml")); + } + other => panic!("expected Code span, got {other:?}"), + } +} From 51004ac59327680022afe5090875f391f0875272 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:09:13 +0000 Subject: [PATCH 07/13] feat(p10-2): dockerfile-file-v1 chunker (whole-file 1 chunk, symbol ) Reads entire Dockerfile / Dockerfile.* / *.dockerfile content and emits a single Chunk with symbol "", code_lang "dockerfile", line range 1..EOF. Oversize >200 lines splits into line-windows sharing the symbol via tier2_shared::push_chunks_with_oversize. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-chunk/src/dockerfile_file_v1.rs | 57 ++++++++ crates/kebab-chunk/src/lib.rs | 2 + .../kebab-chunk/tests/dockerfile_file_v1.rs | 134 ++++++++++++++++++ .../tests/fixtures/sample.dockerfile | 5 + 4 files changed, 198 insertions(+) create mode 100644 crates/kebab-chunk/src/dockerfile_file_v1.rs create mode 100644 crates/kebab-chunk/tests/dockerfile_file_v1.rs create mode 100644 crates/kebab-chunk/tests/fixtures/sample.dockerfile diff --git a/crates/kebab-chunk/src/dockerfile_file_v1.rs b/crates/kebab-chunk/src/dockerfile_file_v1.rs new file mode 100644 index 0000000..519d1ae --- /dev/null +++ b/crates/kebab-chunk/src/dockerfile_file_v1.rs @@ -0,0 +1,57 @@ +//! p10-2: dockerfile whole-file chunker (Tier 2). +//! +//! Reads entire Dockerfile content and emits a single Chunk with symbol +//! "", code_lang "dockerfile", line range 1..EOF. +//! Oversize >200 lines splits into line-windows sharing the symbol via +//! tier2_shared::push_chunks_with_oversize. + +use crate::tier2_shared::{policy_hash, push_chunks_with_oversize}; +use anyhow::Result; +use kebab_core::{Block, CanonicalDocument, Chunk, ChunkPolicy, ChunkerVersion, Chunker}; + +pub const VERSION_LABEL: &str = "dockerfile-file-v1"; + +#[derive(Clone, Copy, Debug, Default)] +pub struct DockerfileFileV1Chunker; + +impl Chunker for DockerfileFileV1Chunker { + fn chunker_version(&self) -> ChunkerVersion { + ChunkerVersion(VERSION_LABEL.to_string()) + } + + fn policy_hash(&self, policy: &ChunkPolicy) -> String { + policy_hash(policy) + } + + fn chunk(&self, doc: &CanonicalDocument, policy: &ChunkPolicy) -> Result<Vec<Chunk>> { + // Expect a single Block::Code carrying the full Dockerfile text. 
+ let text = match doc.blocks.first() { + Some(Block::Code(cb)) => cb.code.as_str(), + _ => return Ok(vec![]), + }; + + let total_lines = text.lines().count().max(1) as u32; + let mut chunks = Vec::new(); + + push_chunks_with_oversize( + &mut chunks, + doc, + policy, + text, + 1, + total_lines, + "", + "dockerfile", + VERSION_LABEL, + )?; + + tracing::debug!( + target: "kebab-chunk", + doc_id = %doc.doc_id, + chunks = chunks.len(), + "dockerfile-file-v1 chunked", + ); + + Ok(chunks) + } +} diff --git a/crates/kebab-chunk/src/lib.rs b/crates/kebab-chunk/src/lib.rs index 516620a..a700e91 100644 --- a/crates/kebab-chunk/src/lib.rs +++ b/crates/kebab-chunk/src/lib.rs @@ -26,6 +26,7 @@ mod md_heading_v1; mod pdf_page_v1; mod tier2_shared; pub mod k8s_manifest_resource_v1; +pub mod dockerfile_file_v1; pub use code_go_ast_v1::CodeGoAstV1Chunker; pub use code_java_ast_v1::CodeJavaAstV1Chunker; @@ -37,3 +38,4 @@ pub use code_ts_ast_v1::CodeTsAstV1Chunker; pub use md_heading_v1::MdHeadingV1Chunker; pub use pdf_page_v1::PdfPageV1Chunker; pub use k8s_manifest_resource_v1::K8sManifestResourceV1Chunker; +pub use dockerfile_file_v1::DockerfileFileV1Chunker; diff --git a/crates/kebab-chunk/tests/dockerfile_file_v1.rs b/crates/kebab-chunk/tests/dockerfile_file_v1.rs new file mode 100644 index 0000000..44dd94a --- /dev/null +++ b/crates/kebab-chunk/tests/dockerfile_file_v1.rs @@ -0,0 +1,134 @@ +//! Behavioural tests for `DockerfileFileV1Chunker`. +//! +//! Documents are constructed manually (no kebab-parse-code dependency) by +//! placing the raw Dockerfile text into a single `Block::Code`, mirroring the +//! pattern used in `k8s_manifest_resource_v1.rs`. 
+ +use std::path::PathBuf; + +use kebab_chunk::DockerfileFileV1Chunker; +use kebab_core::{ + AssetId, Block, CanonicalDocument, ChunkPolicy, Chunker, ChunkerVersion, CodeBlock, + CommonBlock, Lang, Metadata, ParserVersion, Provenance, SourceSpan, SourceType, TrustLevel, + WorkspacePath, id_for_block, id_for_doc, +}; +use time::OffsetDateTime; + +// โ”€โ”€ helpers โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ + +fn fixtures_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") +} + +/// Build a `CanonicalDocument` with a single `Block::Code` containing `dockerfile_text`. +fn dockerfile_doc(dockerfile_text: &str) -> CanonicalDocument { + let wp = WorkspacePath("build/Dockerfile".into()); + let aid = AssetId("d".repeat(64)); + let pv = ParserVersion("code-dockerfile-v1".into()); + let doc_id = id_for_doc(&wp, &aid, &pv); + + let line_count = dockerfile_text.lines().count() as u32; + let span = SourceSpan::Code { + line_start: 1, + line_end: line_count.max(1), + symbol: None, + lang: Some("dockerfile".into()), + }; + let bid = id_for_block(&doc_id, "code", &[], 0, &span); + let block = Block::Code(CodeBlock { + common: CommonBlock { + block_id: bid, + heading_path: vec![], + source_span: span, + }, + lang: Some("dockerfile".into()), + code: dockerfile_text.to_string(), + }); + + CanonicalDocument { + doc_id, + source_asset_id: aid, + workspace_path: wp, + title: "Dockerfile".into(), + lang: Lang("und".into()), + blocks: vec![block], + metadata: Metadata { + aliases: vec![], + tags: vec![], + created_at: OffsetDateTime::from_unix_timestamp(1_700_000_000).unwrap(), + updated_at: OffsetDateTime::from_unix_timestamp(1_700_000_000).unwrap(), + source_type: SourceType::Note, + trust_level: TrustLevel::Primary, + user_id_alias: None, + user: Default::default(), + repo: 
Some("kebab".into()), + git_branch: Some("main".into()), + git_commit: Some("0".repeat(40)), + code_lang: Some("dockerfile".into()), + }, + provenance: Provenance { events: vec![] }, + parser_version: pv, + schema_version: 1, + doc_version: 1, + last_chunker_version: None, + last_embedding_version: None, + } +} + +fn policy() -> ChunkPolicy { + ChunkPolicy { + target_tokens: 500, + overlap_tokens: 80, + respect_markdown_headings: false, + chunker_version: ChunkerVersion("dockerfile-file-v1".into()), + } +} + +// โ”€โ”€ tests โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ + +/// A simple 5-line Dockerfile fixture must emit exactly 1 chunk with the +/// correct symbol, lang, and line range. +#[test] +fn dockerfile_emits_single_chunk() { + let fixture_path = fixtures_dir().join("sample.dockerfile"); + let text = std::fs::read_to_string(&fixture_path) + .unwrap_or_else(|e| panic!("cannot read fixture {}: {e}", fixture_path.display())); + + let doc = dockerfile_doc(&text); + let chunks = DockerfileFileV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 1, + "expected 1 chunk, got {}: {chunks:#?}", + chunks.len() + ); + + // Inspect the Chunk's source_spans for symbol / lang / line range. + let span = chunks[0].source_spans.first().expect("at least one span"); + match span { + SourceSpan::Code { + line_start, + line_end, + symbol, + lang, + } => { + assert_eq!(*line_start, 1, "line_start must be 1"); + assert_eq!(*line_end, 5, "line_end must be 5 (5-line fixture)"); + assert_eq!( + symbol.as_deref(), + Some(""), + "symbol must be ''" + ); + assert_eq!(lang.as_deref(), Some("dockerfile"), "lang must be 'dockerfile'"); + } + other => panic!("expected SourceSpan::Code, got {other:?}"), + } + + // Verify chunker_version label. 
+ assert_eq!(chunks[0].chunker_version.0, "dockerfile-file-v1"); +} diff --git a/crates/kebab-chunk/tests/fixtures/sample.dockerfile b/crates/kebab-chunk/tests/fixtures/sample.dockerfile new file mode 100644 index 0000000..94352b8 --- /dev/null +++ b/crates/kebab-chunk/tests/fixtures/sample.dockerfile @@ -0,0 +1,5 @@ +FROM rust:1.94-slim AS builder +WORKDIR /app +COPY . . +RUN cargo build --release +CMD ["/app/target/release/kebab"] From 22d41617284e0ec55cf9a9dc8fc9a050591e5ded Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:11:46 +0000 Subject: [PATCH 08/13] feat(p10-2): manifest-file-v1 chunker (whole-file 1 chunk, symbol ) Emits 1 Chunk per manifest file (Cargo.toml / pyproject.toml / package.json / tsconfig.json / pom.xml / build.gradle / go.mod). Symbol unified to ""; manifest type distinguished by code_lang (toml / json / xml / groovy / go-mod) read from Block::Code.lang. Oversize >200 lines splits via tier2_shared::push_chunks_with_oversize. Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-chunk/src/lib.rs | 2 + crates/kebab-chunk/src/manifest_file_v1.rs | 58 ++++ .../tests/fixtures/sample_cargo.toml | 7 + .../kebab-chunk/tests/fixtures/sample_go.mod | 5 + .../tests/fixtures/sample_package.json | 7 + .../kebab-chunk/tests/fixtures/sample_pom.xml | 7 + crates/kebab-chunk/tests/manifest_file_v1.rs | 267 ++++++++++++++++++ 7 files changed, 353 insertions(+) create mode 100644 crates/kebab-chunk/src/manifest_file_v1.rs create mode 100644 crates/kebab-chunk/tests/fixtures/sample_cargo.toml create mode 100644 crates/kebab-chunk/tests/fixtures/sample_go.mod create mode 100644 crates/kebab-chunk/tests/fixtures/sample_package.json create mode 100644 crates/kebab-chunk/tests/fixtures/sample_pom.xml create mode 100644 crates/kebab-chunk/tests/manifest_file_v1.rs diff --git a/crates/kebab-chunk/src/lib.rs b/crates/kebab-chunk/src/lib.rs index a700e91..9b65e05 100644 --- a/crates/kebab-chunk/src/lib.rs +++ b/crates/kebab-chunk/src/lib.rs 
@@ -27,6 +27,7 @@ mod pdf_page_v1; mod tier2_shared; pub mod k8s_manifest_resource_v1; pub mod dockerfile_file_v1; +pub mod manifest_file_v1; pub use code_go_ast_v1::CodeGoAstV1Chunker; pub use code_java_ast_v1::CodeJavaAstV1Chunker; @@ -39,3 +40,4 @@ pub use md_heading_v1::MdHeadingV1Chunker; pub use pdf_page_v1::PdfPageV1Chunker; pub use k8s_manifest_resource_v1::K8sManifestResourceV1Chunker; pub use dockerfile_file_v1::DockerfileFileV1Chunker; +pub use manifest_file_v1::ManifestFileV1Chunker; diff --git a/crates/kebab-chunk/src/manifest_file_v1.rs b/crates/kebab-chunk/src/manifest_file_v1.rs new file mode 100644 index 0000000..1e859e0 --- /dev/null +++ b/crates/kebab-chunk/src/manifest_file_v1.rs @@ -0,0 +1,58 @@ +//! p10-2: manifest whole-file chunker (Tier 2). +//! +//! Reads entire manifest file (Cargo.toml / package.json / pom.xml / go.mod / +//! build.gradle / pyproject.toml / tsconfig.json) and emits a single Chunk +//! with symbol "", code_lang read from Block::Code.lang, line range +//! 1..EOF. Oversize >200 lines splits into line-windows sharing the symbol via +//! tier2_shared::push_chunks_with_oversize. + +use crate::tier2_shared::{policy_hash, push_chunks_with_oversize}; +use anyhow::Result; +use kebab_core::{Block, CanonicalDocument, Chunk, ChunkPolicy, ChunkerVersion, Chunker}; + +pub const VERSION_LABEL: &str = "manifest-file-v1"; + +#[derive(Clone, Copy, Debug, Default)] +pub struct ManifestFileV1Chunker; + +impl Chunker for ManifestFileV1Chunker { + fn chunker_version(&self) -> ChunkerVersion { + ChunkerVersion(VERSION_LABEL.to_string()) + } + + fn policy_hash(&self, policy: &ChunkPolicy) -> String { + policy_hash(policy) + } + + fn chunk(&self, doc: &CanonicalDocument, policy: &ChunkPolicy) -> Result<Vec<Chunk>> { + // Expect a single Block::Code carrying the full manifest text. 
+ let (text, lang) = match doc.blocks.first() { + Some(Block::Code(cb)) => (cb.code.as_str(), cb.lang.as_deref().unwrap_or("")), + _ => return Ok(vec![]), + }; + + let total_lines = text.lines().count().max(1) as u32; + let mut chunks = Vec::new(); + + push_chunks_with_oversize( + &mut chunks, + doc, + policy, + text, + 1, + total_lines, + "", + lang, + VERSION_LABEL, + )?; + + tracing::debug!( + target: "kebab-chunk", + doc_id = %doc.doc_id, + chunks = chunks.len(), + "manifest-file-v1 chunked", + ); + + Ok(chunks) + } +} diff --git a/crates/kebab-chunk/tests/fixtures/sample_cargo.toml b/crates/kebab-chunk/tests/fixtures/sample_cargo.toml new file mode 100644 index 0000000..cae4a85 --- /dev/null +++ b/crates/kebab-chunk/tests/fixtures/sample_cargo.toml @@ -0,0 +1,7 @@ +[package] +name = "demo" +version = "0.1.0" +edition = "2021" + +[dependencies] +serde = "1" diff --git a/crates/kebab-chunk/tests/fixtures/sample_go.mod b/crates/kebab-chunk/tests/fixtures/sample_go.mod new file mode 100644 index 0000000..b2af9d4 --- /dev/null +++ b/crates/kebab-chunk/tests/fixtures/sample_go.mod @@ -0,0 +1,5 @@ +module example.com/demo + +go 1.22 + +require github.com/spf13/cobra v1.8.0 diff --git a/crates/kebab-chunk/tests/fixtures/sample_package.json b/crates/kebab-chunk/tests/fixtures/sample_package.json new file mode 100644 index 0000000..a84f652 --- /dev/null +++ b/crates/kebab-chunk/tests/fixtures/sample_package.json @@ -0,0 +1,7 @@ +{ + "name": "demo", + "version": "0.1.0", + "dependencies": { + "react": "^18.0.0" + } +} diff --git a/crates/kebab-chunk/tests/fixtures/sample_pom.xml b/crates/kebab-chunk/tests/fixtures/sample_pom.xml new file mode 100644 index 0000000..c6dd05d --- /dev/null +++ b/crates/kebab-chunk/tests/fixtures/sample_pom.xml @@ -0,0 +1,7 @@ + + + 4.0.0 + com.demo + demo + 0.1.0 + diff --git a/crates/kebab-chunk/tests/manifest_file_v1.rs b/crates/kebab-chunk/tests/manifest_file_v1.rs new file mode 100644 index 0000000..297c563 --- /dev/null +++ 
b/crates/kebab-chunk/tests/manifest_file_v1.rs @@ -0,0 +1,267 @@ +//! Behavioural tests for `ManifestFileV1Chunker`. +//! +//! Documents are constructed manually (no kebab-parse-code dependency) by +//! placing the raw manifest text into a single `Block::Code`, mirroring the +//! pattern used in `dockerfile_file_v1.rs`. + +use std::path::PathBuf; + +use kebab_chunk::ManifestFileV1Chunker; +use kebab_core::{ + AssetId, Block, CanonicalDocument, ChunkPolicy, Chunker, ChunkerVersion, CodeBlock, + CommonBlock, Lang, Metadata, ParserVersion, Provenance, SourceSpan, SourceType, TrustLevel, + WorkspacePath, id_for_block, id_for_doc, +}; +use time::OffsetDateTime; + +// ── helpers ────────────────────────────────────────────────────────────────── + +fn fixtures_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") +} + +/// Build a `CanonicalDocument` with a single `Block::Code` containing manifest text. 
+fn manifest_doc(lang: &str, manifest_text: &str) -> CanonicalDocument { + let wp = WorkspacePath(format!("build/{}", manifest_filename(lang))); + let aid = AssetId("m".repeat(64)); + let pv = ParserVersion("code-manifest-v1".into()); + let doc_id = id_for_doc(&wp, &aid, &pv); + + let line_count = manifest_text.lines().count() as u32; + let span = SourceSpan::Code { + line_start: 1, + line_end: line_count.max(1), + symbol: None, + lang: Some(lang.into()), + }; + let bid = id_for_block(&doc_id, "code", &[], 0, &span); + let block = Block::Code(CodeBlock { + common: CommonBlock { + block_id: bid, + heading_path: vec![], + source_span: span, + }, + lang: Some(lang.into()), + code: manifest_text.to_string(), + }); + + CanonicalDocument { + doc_id, + source_asset_id: aid, + workspace_path: wp, + title: format!("Manifest ({})", lang), + lang: Lang("und".into()), + blocks: vec![block], + metadata: Metadata { + aliases: vec![], + tags: vec![], + created_at: OffsetDateTime::from_unix_timestamp(1_700_000_000).unwrap(), + updated_at: OffsetDateTime::from_unix_timestamp(1_700_000_000).unwrap(), + source_type: SourceType::Note, + trust_level: TrustLevel::Primary, + user_id_alias: None, + user: Default::default(), + repo: Some("kebab".into()), + git_branch: Some("main".into()), + git_commit: Some("0".repeat(40)), + code_lang: Some(lang.into()), + }, + provenance: Provenance { events: vec![] }, + parser_version: pv, + schema_version: 1, + doc_version: 1, + last_chunker_version: None, + last_embedding_version: None, + } +} + +fn manifest_filename(lang: &str) -> &'static str { + match lang { + "toml" => "Cargo.toml", + "json" => "package.json", + "xml" => "pom.xml", + "go-mod" => "go.mod", + _ => "manifest", + } +} + +fn policy() -> ChunkPolicy { + ChunkPolicy { + target_tokens: 500, + overlap_tokens: 80, + respect_markdown_headings: false, + chunker_version: ChunkerVersion("manifest-file-v1".into()), + } +} + +// ── tests ────────────────────────────────────────────────────────────────────
+ +/// A Cargo.toml fixture must emit exactly 1 chunk with the correct symbol, +/// lang, and line range. +#[test] +fn cargo_toml_single_chunk_with_toml_lang() { + let fixture_path = fixtures_dir().join("sample_cargo.toml"); + let text = std::fs::read_to_string(&fixture_path) + .unwrap_or_else(|e| panic!("cannot read fixture {}: {e}", fixture_path.display())); + + let doc = manifest_doc("toml", &text); + let chunks = ManifestFileV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 1, + "expected 1 chunk, got {}: {chunks:#?}", + chunks.len() + ); + + let span = chunks[0].source_spans.first().expect("at least one span"); + match span { + SourceSpan::Code { + line_start, + line_end: _, + symbol, + lang, + } => { + assert_eq!(*line_start, 1, "line_start must be 1"); + assert_eq!( + symbol.as_deref(), + Some(""), + "symbol must be ''" + ); + assert_eq!(lang.as_deref(), Some("toml"), "lang must be 'toml'"); + } + other => panic!("expected SourceSpan::Code, got {other:?}"), + } + + assert_eq!(chunks[0].chunker_version.0, "manifest-file-v1"); +} + +/// A package.json fixture must emit exactly 1 chunk with the correct symbol, +/// lang, and line range. 
+#[test] +fn package_json_single_chunk_with_json_lang() { + let fixture_path = fixtures_dir().join("sample_package.json"); + let text = std::fs::read_to_string(&fixture_path) + .unwrap_or_else(|e| panic!("cannot read fixture {}: {e}", fixture_path.display())); + + let doc = manifest_doc("json", &text); + let chunks = ManifestFileV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 1, + "expected 1 chunk, got {}: {chunks:#?}", + chunks.len() + ); + + let span = chunks[0].source_spans.first().expect("at least one span"); + match span { + SourceSpan::Code { + line_start, + line_end: _, + symbol, + lang, + } => { + assert_eq!(*line_start, 1, "line_start must be 1"); + assert_eq!( + symbol.as_deref(), + Some(""), + "symbol must be ''" + ); + assert_eq!(lang.as_deref(), Some("json"), "lang must be 'json'"); + } + other => panic!("expected SourceSpan::Code, got {other:?}"), + } + + assert_eq!(chunks[0].chunker_version.0, "manifest-file-v1"); +} + +/// A pom.xml fixture must emit exactly 1 chunk with the correct symbol, +/// lang, and line range. 
+#[test] +fn pom_xml_single_chunk_with_xml_lang() { + let fixture_path = fixtures_dir().join("sample_pom.xml"); + let text = std::fs::read_to_string(&fixture_path) + .unwrap_or_else(|e| panic!("cannot read fixture {}: {e}", fixture_path.display())); + + let doc = manifest_doc("xml", &text); + let chunks = ManifestFileV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 1, + "expected 1 chunk, got {}: {chunks:#?}", + chunks.len() + ); + + let span = chunks[0].source_spans.first().expect("at least one span"); + match span { + SourceSpan::Code { + line_start, + line_end: _, + symbol, + lang, + } => { + assert_eq!(*line_start, 1, "line_start must be 1"); + assert_eq!( + symbol.as_deref(), + Some(""), + "symbol must be ''" + ); + assert_eq!(lang.as_deref(), Some("xml"), "lang must be 'xml'"); + } + other => panic!("expected SourceSpan::Code, got {other:?}"), + } + + assert_eq!(chunks[0].chunker_version.0, "manifest-file-v1"); +} + +/// A go.mod fixture must emit exactly 1 chunk with the correct symbol, +/// lang, and line range. 
+#[test] +fn go_mod_single_chunk_with_go_mod_lang() { + let fixture_path = fixtures_dir().join("sample_go.mod"); + let text = std::fs::read_to_string(&fixture_path) + .unwrap_or_else(|e| panic!("cannot read fixture {}: {e}", fixture_path.display())); + + let doc = manifest_doc("go-mod", &text); + let chunks = ManifestFileV1Chunker + .chunk(&doc, &policy()) + .expect("chunk"); + + assert_eq!( + chunks.len(), + 1, + "expected 1 chunk, got {}: {chunks:#?}", + chunks.len() + ); + + let span = chunks[0].source_spans.first().expect("at least one span"); + match span { + SourceSpan::Code { + line_start, + line_end: _, + symbol, + lang, + } => { + assert_eq!(*line_start, 1, "line_start must be 1"); + assert_eq!( + symbol.as_deref(), + Some(""), + "symbol must be ''" + ); + assert_eq!(lang.as_deref(), Some("go-mod"), "lang must be 'go-mod'"); + } + other => panic!("expected SourceSpan::Code, got {other:?}"), + } + + assert_eq!(chunks[0].chunker_version.0, "manifest-file-v1"); +} From 226ce8b744a673699aa026ed32885a871ddfe17b Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:19:54 +0000 Subject: [PATCH 09/13] feat(p10-2): activate Tier 2 chunkers in ingest_one_code_asset dispatch Adds yaml / dockerfile / toml / json / xml / groovy / go-mod arms to the existing 7-arm AST match. parser_version unified to "none-v1" for Tier 2. synthesize_tier2_document builds a minimal Document (single Block::Code with raw file text) since Tier 2 has no parse step. allowlist in ingest_one_asset extended to admit Tier 2 langs. 
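For illustration, the Tier 2 routing described above can be sketched as a standalone lookup. This is a simplified sketch, not the code in this patch: `tier2_chunker_label` is a hypothetical helper, whereas the real dispatch matches on `code_lang` inline inside `ingest_one_code_asset` and calls the chunker types directly.

```rust
/// Hypothetical helper (not part of the patch): maps a `code_lang` label to
/// the chunker_version label of the Tier 2 chunker it is routed to.
/// Returns `None` for langs that are not Tier 2 (e.g. the AST-chunked ones).
fn tier2_chunker_label(code_lang: &str) -> Option<&'static str> {
    match code_lang {
        // k8s manifests are identified further by apiVersion + kind.
        "yaml" => Some("k8s-manifest-resource-v1"),
        "dockerfile" => Some("dockerfile-file-v1"),
        // All manifest flavours share one whole-file chunker.
        "toml" | "json" | "xml" | "groovy" | "go-mod" => Some("manifest-file-v1"),
        _ => None,
    }
}

fn main() {
    for lang in ["yaml", "dockerfile", "go-mod", "rust"] {
        println!("{lang} -> {:?}", tier2_chunker_label(lang));
    }
}
```

Keeping the manifest flavours on a single chunker means the manifest type is distinguished only by `code_lang`, which matches the commit description.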
Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-app/src/lib.rs | 163 ++++++++++++++++++++++++++++++++++-- 1 file changed, 158 insertions(+), 5 deletions(-) diff --git a/crates/kebab-app/src/lib.rs b/crates/kebab-app/src/lib.rs index 08c89cc..149b687 100644 --- a/crates/kebab-app/src/lib.rs +++ b/crates/kebab-app/src/lib.rs @@ -39,7 +39,7 @@ use std::sync::Arc; use anyhow::{Context, anyhow}; use serde::{Deserialize, Serialize}; -use kebab_chunk::{CodeGoAstV1Chunker, CodeJavaAstV1Chunker, CodeJsAstV1Chunker, CodeKotlinAstV1Chunker, CodePythonAstV1Chunker, CodeRustAstV1Chunker, CodeTsAstV1Chunker, MdHeadingV1Chunker, PdfPageV1Chunker}; +use kebab_chunk::{CodeGoAstV1Chunker, CodeJavaAstV1Chunker, CodeJsAstV1Chunker, CodeKotlinAstV1Chunker, CodePythonAstV1Chunker, CodeRustAstV1Chunker, CodeTsAstV1Chunker, DockerfileFileV1Chunker, K8sManifestResourceV1Chunker, ManifestFileV1Chunker, MdHeadingV1Chunker, PdfPageV1Chunker}; use kebab_core::{ Answer, Block, CanonicalDocument, Chunk, ChunkId, ChunkPolicy, ChunkerVersion, Chunker, DocFilter, DocSummary, DocumentId, DocumentStore, Embedder, EmbeddingInput, @@ -948,10 +948,11 @@ fn ingest_one_asset( force_reingest, ); } - // p10-1A-2 / 1B: code ingest dispatch. + // p10-1A-2 / 1B: code ingest dispatch. p10-2: Tier 2 langs added. MediaType::Code(lang) if matches!(lang.as_str(), - "rust" | "python" | "typescript" | "javascript" | "go" | "java" | "kotlin") => + "rust" | "python" | "typescript" | "javascript" | "go" | "java" | "kotlin" + | "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod") => { return ingest_one_code_asset( app, @@ -1831,6 +1832,9 @@ fn ingest_one_code_asset( "go" => ParserVersion(kebab_parse_code::GO_PARSER_VERSION.to_string()), "java" => ParserVersion(kebab_parse_code::JAVA_PARSER_VERSION.to_string()), "kotlin" => ParserVersion(kebab_parse_code::KOTLIN_PARSER_VERSION.to_string()), + // p10-2: Tier 2 has no parse step; sentinel "none-v1". 
+ "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod" + => ParserVersion("none-v1".to_string()), other => anyhow::bail!("unsupported code_lang: {other}"), }; @@ -1842,7 +1846,12 @@ fn ingest_one_code_asset( "javascript" => CodeJsAstV1Chunker.chunker_version(), "go" => CodeGoAstV1Chunker.chunker_version(), "java" => CodeJavaAstV1Chunker.chunker_version(), - "kotlin" => CodeKotlinAstV1Chunker.chunker_version(), + "kotlin" => CodeKotlinAstV1Chunker.chunker_version(), + // p10-2 Tier 2: + "yaml" => K8sManifestResourceV1Chunker.chunker_version(), + "dockerfile" => DockerfileFileV1Chunker.chunker_version(), + "toml" | "json" | "xml" | "groovy" | "go-mod" + => ManifestFileV1Chunker.chunker_version(), other => anyhow::bail!("unreachable chunker_version: {other}"), }; @@ -1890,6 +1899,10 @@ fn ingest_one_code_asset( "kotlin" => KotlinAstExtractor::new() .extract(&ctx, &bytes) .context("kb-parse-code::KotlinAstExtractor::extract (code:kotlin)")?, + // p10-2 Tier 2: no extractor; synthesize Document directly from raw bytes. + "yaml" | "dockerfile" | "toml" | "json" | "xml" | "groovy" | "go-mod" => { + synthesize_tier2_document(asset, &bytes, code_lang, &parser_version)? 
+ } other => anyhow::bail!("unreachable (extract): {other}"), }; @@ -1913,9 +1926,20 @@ fn ingest_one_code_asset( "java" => CodeJavaAstV1Chunker .chunk(&canonical, chunk_policy) .context("kb-chunk::CodeJavaAstV1Chunker::chunk (code:java)")?, - "kotlin" => CodeKotlinAstV1Chunker + "kotlin" => CodeKotlinAstV1Chunker .chunk(&canonical, chunk_policy) .context("kb-chunk::CodeKotlinAstV1Chunker::chunk (code:kotlin)")?, + // p10-2 Tier 2: + "yaml" => K8sManifestResourceV1Chunker + .chunk(&canonical, chunk_policy) + .context("kb-chunk::K8sManifestResourceV1Chunker::chunk")?, + "dockerfile" => DockerfileFileV1Chunker + .chunk(&canonical, chunk_policy) + .context("kb-chunk::DockerfileFileV1Chunker::chunk")?, + "toml" | "json" | "xml" | "groovy" | "go-mod" + => ManifestFileV1Chunker + .chunk(&canonical, chunk_policy) + .context("kb-chunk::ManifestFileV1Chunker::chunk")?, other => anyhow::bail!("unreachable (chunk): {other}"), }; @@ -2011,6 +2035,135 @@ fn ingest_one_code_asset( }) } +/// p10-2: Build a minimal [`CanonicalDocument`] for Tier 2 code assets +/// (yaml / dockerfile / toml / json / xml / groovy / go-mod) that have +/// no AST extractor. Produces a single `Block::Code` whose source span +/// covers the entire file, mirroring the shape the Tier 1 extractors +/// produce for glue / top-level regions. +fn synthesize_tier2_document( + asset: &RawAsset, + bytes: &[u8], + code_lang: &str, + parser_version: &ParserVersion, +) -> anyhow::Result<CanonicalDocument> { + use anyhow::Context as _; + use kebab_core::{ + BlockId, CodeBlock, CommonBlock, Lang, Metadata, Provenance, ProvenanceEvent, + ProvenanceKind, SourceSpan, SourceType, TrustLevel, id_for_block, id_for_doc, + }; + + let text = std::str::from_utf8(bytes) + .with_context(|| format!("tier2 doc not utf-8: {}", asset.workspace_path.0))? 
+ .to_string(); + + let doc_id = id_for_doc(&asset.workspace_path, &asset.asset_id, parser_version); + + let n_lines = text.lines().count().max(1) as u32; + let span = SourceSpan::Code { + line_start: 1, + line_end: n_lines, + symbol: Some("".to_string()), + lang: Some(code_lang.to_string()), + }; + let block_id: BlockId = id_for_block( + &doc_id, + "code", + &[], + 0, + &span, + ); + let block = kebab_core::Block::Code(CodeBlock { + common: CommonBlock { + block_id, + heading_path: vec![], + source_span: span, + }, + lang: Some(code_lang.to_string()), + code: text, + }); + + let now = time::OffsetDateTime::now_utc(); + let events = vec![ + ProvenanceEvent { + at: asset.discovered_at, + agent: "kb-source-fs".to_string(), + kind: ProvenanceKind::Discovered, + note: None, + }, + ProvenanceEvent { + at: now, + agent: "kb-app".to_string(), + kind: ProvenanceKind::Parsed, + note: Some(format!( + "parser_version={}; tier2_synthesized; lang={}", + parser_version.0, code_lang + )), + }, + ]; + + // Resolve abs path for repo detection (mirrors RustAstExtractor pattern). 
+ let workspace_root = std::path::PathBuf::new(); // not needed for detect_repo walk + let abs_path = match &asset.source_uri { + kebab_core::SourceUri::File(p) => p.clone(), + kebab_core::SourceUri::Kb(_) => workspace_root, + }; + let (repo, git_branch, git_commit) = match kebab_parse_code::detect_repo(&abs_path) { + Some(r) => (Some(r.name), r.branch, r.commit), + None => (None, None, None), + }; + + let title = { + let fname = asset.workspace_path.0 + .rsplit('/') + .next() + .unwrap_or(&asset.workspace_path.0); + // strip extension + match fname.rfind('.') { + Some(i) => fname[..i].to_string(), + None => fname.to_string(), + } + }; + + let metadata = Metadata { + aliases: vec![], + tags: vec![], + created_at: asset.discovered_at, + updated_at: asset.discovered_at, + source_type: SourceType::Note, + trust_level: TrustLevel::Primary, + user_id_alias: None, + user: serde_json::Map::new(), + repo, + git_branch, + git_commit, + code_lang: Some(code_lang.to_string()), + }; + + tracing::debug!( + target: "kebab-app", + "synthesized tier2 doc_id={} workspace_path={} lang={}", + doc_id.0, + asset.workspace_path.0, + code_lang, + ); + + Ok(kebab_core::CanonicalDocument { + doc_id, + source_asset_id: asset.asset_id.clone(), + workspace_path: asset.workspace_path.clone(), + title, + lang: Lang("und".to_string()), + blocks: vec![block], + metadata, + provenance: Provenance { events }, + parser_version: parser_version.clone(), + schema_version: 1, + doc_version: 1, + last_chunker_version: None, + last_embedding_version: None, + }) +} + /// Pull the BCP-47 language hint from the canonical document. 
P6-1 /// stamps `Lang("und")` by default; image-pipeline OCR / caption /// adapters special-case "und" so the hint is intentionally dropped From 166e1ddfaf28d872b6365485e67fa62523a4f0c1 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:23:01 +0000 Subject: [PATCH 10/13] test(p10-2): integration smoke tests for Tier 2 (k8s yaml + Dockerfile + Cargo.toml) Three new tests in code_ingest_smoke.rs verifying isolated-TempDir ingest + --code-lang filter + Citation::Code.lang / .symbol shape for each Tier 2 chunker. Brings the suite to 12 tests (Rust 3 + Python 1 + TS 1 + JS 1 + Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1). Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/kebab-app/tests/code_ingest_smoke.rs | 247 ++++++++++++++++++++ 1 file changed, 247 insertions(+) diff --git a/crates/kebab-app/tests/code_ingest_smoke.rs b/crates/kebab-app/tests/code_ingest_smoke.rs index 6cffb66..69ac528 100644 --- a/crates/kebab-app/tests/code_ingest_smoke.rs +++ b/crates/kebab-app/tests/code_ingest_smoke.rs @@ -603,6 +603,253 @@ fn kotlin_file_ingests_and_searches_as_code_citation() { ); } +/// p10-2 Task H: a `k8s/deploy.yaml` file with a Deployment resource is +/// ingested and the resulting `Citation::Code` hit must carry +/// `lang="yaml"`, `symbol="Deployment/prod/api"`, and `line_start >= 1`. +/// Exercises the k8s-manifest-resource-v1 chunker end-to-end. 
+#[test] +fn tier2_k8s_yaml_ingest_searchable() { + let env = TestEnv::lexical_only(); + + let k8s_dir = env.workspace_root.join("k8s"); + std::fs::create_dir_all(&k8s_dir).unwrap(); + std::fs::write( + k8s_dir.join("deploy.yaml"), + "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: api\n namespace: prod\nspec:\n replicas: 1\n", + ) + .unwrap(); + + let report = kebab_app::ingest_with_config(env.config.clone(), env.scope(), false) + .expect("ingest must succeed"); + assert_eq!(report.errors, 0, "no ingest errors: {report:?}"); + assert!(report.new >= 1, "yaml file ingested: {report:?}"); + + let yaml_item = report + .items + .as_ref() + .expect("items present") + .iter() + .find(|i| i.doc_path.0.ends_with("deploy.yaml")) + .expect("deploy.yaml item present"); + assert_eq!( + yaml_item.parser_version.as_ref().map(|p| p.0.as_str()), + Some("none-v1"), + "parser_version must be none-v1" + ); + assert_eq!( + yaml_item.chunker_version.as_ref().map(|c| c.0.as_str()), + Some("k8s-manifest-resource-v1"), + "chunker_version must be k8s-manifest-resource-v1" + ); + + let query = kebab_core::SearchQuery { + text: "api".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 10, + filters: kebab_core::SearchFilters { + code_lang: vec!["yaml".to_string()], + ..Default::default() + }, + }; + let hits = kebab_app::search_with_config(env.config.clone(), query) + .expect("search must succeed"); + + let h = hits + .iter() + .find(|h| matches!(&h.citation, Citation::Code { .. })) + .expect("at least one Citation::Code hit for 'api'"); + + match &h.citation { + Citation::Code { + lang, + symbol, + line_start, + .. 
+ } => { + assert_eq!(lang.as_deref(), Some("yaml"), "citation.lang must be 'yaml'"); + assert_eq!( + symbol.as_deref(), + Some("Deployment/prod/api"), + "citation.symbol must be 'Deployment/prod/api'" + ); + assert!(*line_start >= 1, "line_start must be >=1"); + } + _ => unreachable!(), + } + + assert_eq!( + h.code_lang.as_deref(), + Some("yaml"), + "SearchHit.code_lang must be 'yaml'" + ); +} + +/// p10-2 Task H: a `Dockerfile` is ingested and the resulting +/// `Citation::Code` hit must carry `lang="dockerfile"`, +/// `symbol=""`, and `line_start >= 1`. +/// Exercises the dockerfile-file-v1 chunker end-to-end. +#[test] +fn tier2_dockerfile_ingest_searchable() { + let env = TestEnv::lexical_only(); + + std::fs::write( + env.workspace_root.join("Dockerfile"), + "FROM rust:1.94\nRUN cargo install foo\n", + ) + .unwrap(); + + let report = kebab_app::ingest_with_config(env.config.clone(), env.scope(), false) + .expect("ingest must succeed"); + assert_eq!(report.errors, 0, "no ingest errors: {report:?}"); + assert!(report.new >= 1, "Dockerfile ingested: {report:?}"); + + let df_item = report + .items + .as_ref() + .expect("items present") + .iter() + .find(|i| i.doc_path.0.ends_with("Dockerfile")) + .expect("Dockerfile item present"); + assert_eq!( + df_item.parser_version.as_ref().map(|p| p.0.as_str()), + Some("none-v1"), + "parser_version must be none-v1" + ); + assert_eq!( + df_item.chunker_version.as_ref().map(|c| c.0.as_str()), + Some("dockerfile-file-v1"), + "chunker_version must be dockerfile-file-v1" + ); + + let query = kebab_core::SearchQuery { + text: "cargo".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 10, + filters: kebab_core::SearchFilters { + code_lang: vec!["dockerfile".to_string()], + ..Default::default() + }, + }; + let hits = kebab_app::search_with_config(env.config.clone(), query) + .expect("search must succeed"); + + let h = hits + .iter() + .find(|h| matches!(&h.citation, Citation::Code { .. 
})) + .expect("at least one Citation::Code hit for 'cargo'"); + + match &h.citation { + Citation::Code { + lang, + symbol, + line_start, + .. + } => { + assert_eq!( + lang.as_deref(), + Some("dockerfile"), + "citation.lang must be 'dockerfile'" + ); + assert_eq!( + symbol.as_deref(), + Some(""), + "citation.symbol must be ''" + ); + assert!(*line_start >= 1, "line_start must be >=1"); + } + _ => unreachable!(), + } + + assert_eq!( + h.code_lang.as_deref(), + Some("dockerfile"), + "SearchHit.code_lang must be 'dockerfile'" + ); +} + +/// p10-2 Task H: a `Cargo.toml` manifest is ingested and the resulting +/// `Citation::Code` hit must carry `lang="toml"`, `symbol=""`, +/// and `line_start >= 1`. +/// Exercises the manifest-file-v1 chunker end-to-end. +#[test] +fn tier2_cargo_toml_ingest_searchable() { + let env = TestEnv::lexical_only(); + + std::fs::write( + env.workspace_root.join("Cargo.toml"), + "[package]\nname = \"demo\"\nversion = \"0.1.0\"\n", + ) + .unwrap(); + + let report = kebab_app::ingest_with_config(env.config.clone(), env.scope(), false) + .expect("ingest must succeed"); + assert_eq!(report.errors, 0, "no ingest errors: {report:?}"); + assert!(report.new >= 1, "Cargo.toml ingested: {report:?}"); + + let toml_item = report + .items + .as_ref() + .expect("items present") + .iter() + .find(|i| i.doc_path.0.ends_with("Cargo.toml")) + .expect("Cargo.toml item present"); + assert_eq!( + toml_item.parser_version.as_ref().map(|p| p.0.as_str()), + Some("none-v1"), + "parser_version must be none-v1" + ); + assert_eq!( + toml_item.chunker_version.as_ref().map(|c| c.0.as_str()), + Some("manifest-file-v1"), + "chunker_version must be manifest-file-v1" + ); + + let query = kebab_core::SearchQuery { + text: "demo".to_string(), + mode: kebab_core::SearchMode::Lexical, + k: 10, + filters: kebab_core::SearchFilters { + code_lang: vec!["toml".to_string()], + ..Default::default() + }, + }; + let hits = kebab_app::search_with_config(env.config.clone(), query) + 
.expect("search must succeed"); + + let h = hits + .iter() + .find(|h| matches!(&h.citation, Citation::Code { .. })) + .expect("at least one Citation::Code hit for 'demo'"); + + match &h.citation { + Citation::Code { + lang, + symbol, + line_start, + .. + } => { + assert_eq!( + lang.as_deref(), + Some("toml"), + "citation.lang must be 'toml'" + ); + assert_eq!( + symbol.as_deref(), + Some(""), + "citation.symbol must be ''" + ); + assert!(*line_start >= 1, "line_start must be >=1"); + } + _ => unreachable!(), + } + + assert_eq!( + h.code_lang.as_deref(), + Some("toml"), + "SearchHit.code_lang must be 'toml'" + ); +} + /// Re-ingesting the same `.rs` file without changes must report /// `Unchanged` (incremental-skip path exercised). #[test] From 522ae7b8bcc0e2fd874e6fa5b927d81da6988767 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 13:24:16 +0000 Subject: [PATCH 11/13] =?UTF-8?q?docs(p10-2):=20activate=20Tier=202=20in?= =?UTF-8?q?=20code-ingest=20design=20=C2=A710.1=20+=20=C2=A73.5=20mappings?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ยง3.5: add code_lang_for_path mappings xml / groovy / go-mod. ยง10.1: add deactivation log entry for p10-2 (3 Tier 2 chunkers active). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/superpowers/specs/2026-04-27-kebab-final-form-design.md | 2 ++ docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md | 3 +++ 2 files changed, 5 insertions(+) diff --git a/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md b/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md index c216ed6..b946439 100644 --- a/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md +++ b/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md @@ -1549,6 +1549,8 @@ transitional ํ˜•ํƒœ) ์˜ source of truth. 
**p10-1C-JavaKotlin activation (Java + Kotlin) (2026-05-20)**: Java (`code-java-ast-v1`, `.java`) + Kotlin (`code-kotlin-ast-v1`, `.kt`/`.kts`) AST chunkers activated. symbol = `com.foo.Foo.bar` form (package + class + method/field). The Kotlin grammar uses `tree-sitter-kotlin-ng` (bare `tree-sitter-kotlin` is pinned to tree-sitter 0.21–0.23 and cannot be used). +**p10-2 activation (Tier 2 chunker) (2026-05-20)**: three Tier 2 resource-aware chunkers activated: k8s-manifest-resource-v1 (`.yaml`/`.yml`), dockerfile-file-v1 (`Dockerfile`), manifest-file-v1 (`Cargo.toml` and other config files). Additional code_lang mappings: XML (`.xml`, `pom.xml`), Groovy (`build.gradle`, `.gradle`), Go module (`go.mod`). + ### 10.2 MCP server transport (fb-30) `kebab mcp` is a stdio JSON-RPC server. Rust SDK = `rmcp 1.6`. Tool surface diff --git a/docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md b/docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md index 3a780eb..98f0b0a 100644 --- a/docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md +++ b/docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md @@ -237,6 +237,9 @@ pub struct Metadata { - Dockerfile (`Dockerfile`, `*.dockerfile`) → `dockerfile` - TOML (`.toml`) → `toml` - JSON (`.json`) → `json` +- XML (`.xml`, `pom.xml`) → `xml` +- Groovy (`build.gradle`, `.gradle`) → `groovy` +- Go module (`go.mod`) → `go-mod` - Shell (`.sh`, `.bash`, `.zsh`) → `shell` - Make (`Makefile`, `*.mk`) → `make` - Unsupported / Tier 3 fallback → null From 308666dbd5b6dd2e3ce4609cd83248eb9b2cba20 Mon Sep 17 00:00:00 2001 From: altair823 Date: Wed, 20 May 2026 14:10:13 +0000 Subject: [PATCH 12/13] docs(p10-2): README/HANDOFF/ARCHITECTURE/SMOKE/INDEX sync + tasks/p10/INDEX MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit User-visible surface sync per the docs-split rule: - README adds Tier 2 langs (yaml / dockerfile / toml / json / xml / groovy / go-mod) to the 
ingest support list and --code-lang options. - HANDOFF flips p10-2 phase row to ✅ (v0.14.0) and updates the next-task candidates. - ARCHITECTURE extends crates/kebab-chunk/src/ tree with k8s_manifest_resource_v1.rs / dockerfile_file_v1.rs / manifest_file_v1.rs / tier2_shared.rs, plus a Tier 2 note on the code-parser row and flowchart node. - SMOKE adds a Tier 2 smoke walkthrough (k8s yaml + Dockerfile + Cargo.toml ingest + --code-lang search) and a P10-2 entry in the verification checklist. - tasks/INDEX + tasks/p10/INDEX flip p10-2 to ✅ (v0.14.0). Workspace test gate (-j 1) + clippy --workspace pass cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) --- HANDOFF.md | 4 +-- README.md | 2 +- docs/ARCHITECTURE.md | 12 +++++-- docs/SMOKE.md | 81 ++++++++++++++++++++++++++++++++++++++++++++ tasks/INDEX.md | 2 +- tasks/p10/INDEX.md | 2 +- 6 files changed, 95 insertions(+), 8 deletions(-) diff --git a/HANDOFF.md b/HANDOFF.md index 3554fac..184b1e8 100644 --- a/HANDOFF.md +++ b/HANDOFF.md @@ -4,7 +4,7 @@ ## One-line summary -P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) merged. `kebab ingest` handles markdown / image / PDF / source code (Rust / Python / TS / JS / Go / Java / Kotlin). `kebab search` / `kebab ask` return results across media types with page / code citations. `kebab tui` provides 4 panels (Library + Search + Ask + Inspect). P10-1C (Go + Java + Kotlin) is complete; next candidates = P10-1D (C/C++) or P9-5 (desktop tauri) or the on-hold P8 (audio). +P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) merged. `kebab ingest` handles markdown / image / PDF / source code (Rust / Python / TS / JS / Go / Java / Kotlin) / Tier 2 resource files (yaml/k8s / dockerfile / toml / json / xml / groovy / go-mod). `kebab search` / `kebab ask` return results across media types with page / code citations. `kebab tui` provides 4 panels (Library + Search + Ask + Inspect).
P10-2 (Tier 2 resource-aware) is complete; next candidates = P10-1D (C/C++) or P10-3 (Tier 3 fallback) or P9-5 (desktop tauri) or the on-hold P8 (audio). ## Phase roadmap @@ -20,7 +20,7 @@ P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) merged. | **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ✅ done (3/3 component, page-level chunker + ingest wiring) | | **P8** | audio transcription + timestamp citation | `kebab-parse-audio` | P5 | ⏸ on hold (whisper-rs system dep needs a brainstorm) | | **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | 🟡 in progress (4/5 component: P9-1/2/3/4 done [Library / Search / Ask / Inspect], P9-5 desktop planned · dogfooding feedback **20/20 ✅**) | -| **P10** | code ingest framework | `kebab-parse-code` | P5 | 🟡 in progress: 1A-1 ✅ (wire schema + parse-code skeleton + filter flags), 1A-2 ✅ (Rust AST chunker, `code-rust-ast-v1`, v0.7.0), 1B ✅ (Python/TS/JS AST chunkers, v0.8.0 and later), **1C-Go ✅ (Go AST chunker, `code-go-ast-v1`, v0.12.0)**, **1C-JavaKotlin ✅ (Java + Kotlin AST chunkers, `code-java-ast-v1` / `code-kotlin-ast-v1`, v0.13.0)** | +| **P10** | code ingest framework | `kebab-parse-code` | P5 | 🟡 in progress: 1A-1 ✅ (wire schema + parse-code skeleton + filter flags), 1A-2 ✅ (Rust AST chunker, `code-rust-ast-v1`, v0.7.0), 1B ✅ (Python/TS/JS AST chunkers, v0.8.0 and later), **1C-Go ✅ (Go AST chunker, `code-go-ast-v1`, v0.12.0)**, **1C-JavaKotlin ✅ (Java + Kotlin AST chunkers, `code-java-ast-v1` / `code-kotlin-ast-v1`, v0.13.0)**, **2 ✅ (Tier 2 resource-aware: yaml/k8s + dockerfile + manifest, `k8s-manifest-resource-v1` / `dockerfile-file-v1` / `manifest-file-v1`, v0.14.0)** | P0~P5 run serially. P6~P9 can run in parallel after P5. 
diff --git a/README.md b/README.md
index fddb742..833432b 100644
--- a/README.md
+++ b/README.md
@@ -70,7 +70,7 @@ kebab doctor
 | Command | Behavior |
 |------|------|
 | `kebab init` | Creates the data directory + config.toml under the XDG path |
-| `kebab ingest []` | Indexes Markdown / images / PDF / Rust source code (idempotent). On a TTY, a progress bar on stderr; on non-TTY (CI / pipe), one line at a time on stderr; with `--json`, streams `ingest_progress.v1` lines to stdout, then a final `ingest_report.v1`. One Ctrl-C finishes the current asset and then aborts (partial commits preserved, idempotent re-run); a second Ctrl-C is a hard exit. When the Markdown title is missing from frontmatter, it is auto-filled in the order first H1 → H2 → first 80 characters of the first paragraph → filename (parser_version `md-frontmatter-v2`); already-indexed docs also pick up the new title on the next ingest. **Incremental** (p9-fb-23): the second and later ingests automatically skip parse/chunk/embed/vector upsert for unchanged docs (blake3 + parser/chunker/embedder versions all identical). The final summary shows an `N unchanged` count. `--force-reingest` ignores the skip and forces reprocessing. **Supported formats** (the extractor is chosen automatically; it cannot be set in config): Markdown (`.md`), images (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`), **source code** (`.rs` → `code-rust-ast-v1`, `.py` → `code-python-ast-v1`, `.ts`/`.tsx` → `code-ts-ast-v1`, `.js`/`.mjs`/`.cjs`/`.jsx` → `code-js-ast-v1`, `.go` → `code-go-ast-v1`, `.java` → `code-java-ast-v1`, `.kt`/`.kts` → `code-kotlin-ast-v1`; all tree-sitter AST chunkers). Other extensions are skipped automatically: the reason goes in `IngestItem.warnings` (e.g. `"unsupported media type: .docx"`), counts are grouped in `IngestReport.skipped_by_extension`, and the CLI / TUI summary shows the breakdown. Code chunks carry `citation.kind = "code"` with `citation.lang = ""` + `symbol` + line range, and `code_lang` + `repo` (the directory name found by the `.git/` walk-up) are backfilled at the SearchHit top level. The `--code-lang rust` / `--code-lang python` / `--code-lang typescript` / `--code-lang javascript` / `--code-lang go` / `--code-lang java` / `--code-lang kotlin` / `--media code` filters allow per-language and code-only search (p10-1A-1 filter flags). Python symbols use workspace path → dotted module path prefix (e.g. `kebab_eval.metrics.compute_mrr`), TS/JS symbols use a slash-style module path prefix (e.g. `src/Foo.Foo.search`), Go symbols take the form `package.Func` / `package.(*Receiver).Method`, and Java / Kotlin symbols take the form `com.foo.Foo.bar` (package + class + method/field). |
+| `kebab ingest []` | Indexes Markdown / images / PDF / Rust source code (idempotent). On a TTY, a progress bar on stderr; on non-TTY (CI / pipe), one line at a time on stderr; with `--json`, streams `ingest_progress.v1` lines to stdout, then a final `ingest_report.v1`. One Ctrl-C finishes the current asset and then aborts (partial commits preserved, idempotent re-run); a second Ctrl-C is a hard exit. When the Markdown title is missing from frontmatter, it is auto-filled in the order first H1 → H2 → first 80 characters of the first paragraph → filename (parser_version `md-frontmatter-v2`); already-indexed docs also pick up the new title on the next ingest. **Incremental** (p9-fb-23): the second and later ingests automatically skip parse/chunk/embed/vector upsert for unchanged docs (blake3 + parser/chunker/embedder versions all identical). The final summary shows an `N unchanged` count. `--force-reingest` ignores the skip and forces reprocessing. **Supported formats** (the extractor is chosen automatically; it cannot be set in config): Markdown (`.md`), images (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`), **source code** (`.rs` → `code-rust-ast-v1`, `.py` → `code-python-ast-v1`, `.ts`/`.tsx` → `code-ts-ast-v1`, `.js`/`.mjs`/`.cjs`/`.jsx` → `code-js-ast-v1`, `.go` → `code-go-ast-v1`, `.java` → `code-java-ast-v1`, `.kt`/`.kts` → `code-kotlin-ast-v1`, all tree-sitter AST chunkers; **Tier 2 resource files**: `.yaml`/`.yml` → `k8s-manifest-resource-v1` (apiVersion+kind parsing), `Dockerfile`/`Dockerfile.*`/`*.dockerfile` → `dockerfile-file-v1` (whole file), `Cargo.toml`/`pyproject.toml`/`.toml`/`package.json`/`tsconfig.json`/`.json`/`pom.xml`/`.xml`/`build.gradle`/`.gradle`/`go.mod` → `manifest-file-v1` (whole file); yaml (k8s) / dockerfile / toml / json / xml / groovy / go-mod supported). Other extensions are skipped automatically: the reason goes in `IngestItem.warnings` (e.g. `"unsupported media type: .docx"`), counts are grouped in `IngestReport.skipped_by_extension`, and the CLI / TUI summary shows the breakdown. Code chunks carry `citation.kind = "code"` with `citation.lang = ""` + `symbol` + line range, and `code_lang` + `repo` (the directory name found by the `.git/` walk-up) are backfilled at the SearchHit top level. The `--code-lang rust` / `--code-lang python` / `--code-lang typescript` / `--code-lang javascript` / `--code-lang go` / `--code-lang java` / `--code-lang kotlin` / `--code-lang yaml` / `--code-lang dockerfile` / `--code-lang toml` / `--code-lang json` / `--code-lang xml` / `--code-lang groovy` / `--code-lang go-mod` / `--media code` filters allow per-language and code-only search (p10-1A-1 filter flags). Python symbols use workspace path → dotted module path prefix (e.g. `kebab_eval.metrics.compute_mrr`), TS/JS symbols use a slash-style module path prefix (e.g. `src/Foo.Foo.search`), Go symbols take the form `package.Func` / `package.(*Receiver).Method`, and Java / Kotlin symbols take the form `com.foo.Foo.bar` (package + class + method/field). |
 | `kebab search --mode {lexical,vector,hybrid} "" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor ] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk] [--repo NAME ...] [--code-lang LIST]` | Search. hybrid uses RRF fusion and includes citations. Repeating the same query (normalized with NFKC + trim + lowercase) within one process hits the in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. When an ingest commit happens, the `kv['corpus_revision']` bump makes every entry stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`), which is not compatible with the pre-fb-34 bare array. mismatched cursor → `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` is a repeatable flag (`--tag rust --tag async`) with OR matching, `--media` takes `,`-separated multiple values with OR matching, and the remaining flags combine with AND. `--trust-min` is one of `primary\|secondary\|generated` (includes that level and above). `--ingested-after` is RFC3339 UTC; on parse failure, `error.v1.code = config_invalid` (exit 2). `--media md` is normalized as an alias of `markdown`. An unknown `--media` value always yields empty hits (not an error). **`--trace` (p9-fb-37)**: exposes lexical / vector pre-fusion candidates + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) in `search_response.v1.trace`. Trace requests bypass the cache (always cold, even without `--no-cache`). **`--bulk` (p9-fb-42)**: runs N queries at once from stdin ndjson. With `--json`, stdout is per-query ndjson (`bulk_search_item.v1`) + a stderr summary (`bulk_summary: total=N succeeded=S failed=F`). Cap 100. When an agent runs sub-queries in a batch after query decomposition, this is a single round-trip: reusing the App instance pays the cache / embedder cold-start cost only once. A per-query failure is isolated in the item's `error` (error.v1), and the other queries keep going. **code corpus filters (p10-1A-1):** `--repo` is repeatable (`--repo kebab --repo other`) with OR matching. `--code-lang` accepts repeated or comma-separated multiple values (`--code-lang rust,python`); unknown values yield empty hits. `--media code` includes all Tier 1/2/3 code chunks. As of 1A-1 there are no indexed code chunks, so the filters always return empty results; they take effect after 1A-2 (Rust AST chunker) is merged. |
 | `kebab list docs` | Lists indexed documents |
 | `kebab inspect doc ` / `kebab inspect chunk ` | Shows the raw record |
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index beafdec..cc1c391 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -22,7 +22,7 @@ Cargo workspace, function-call-based modular monolith. UI binary (`kebab-
 | OCR | Ollama vision LM (default `gemma4:e4b`); the `OcrEngine` trait allows a future swap to Tesseract / Apple Vision etc. (HOTFIXES P6-2) |
 | Image caption | Ollama vision LM, runtime gate `image.caption.enabled` (default OFF) |
 | PDF parser | `lopdf` per-page text; `chunker_version = "pdf-page-v1"` is hardcoded for PDF assets (HOTFIXES P7-3) |
-| code parser | `tree-sitter` + `tree-sitter-rust` / `tree-sitter-python` / `tree-sitter-typescript` / `tree-sitter-javascript` / `tree-sitter-go` / `tree-sitter-java` / `tree-sitter-kotlin-ng`: **parser-side** (`kebab-parse-code`), not chunker-side (design §6.3). chunker versions: Rust = `code-rust-ast-v1`, Python = `code-python-ast-v1`, TypeScript = `code-ts-ast-v1`, JavaScript = `code-js-ast-v1`, Go = `code-go-ast-v1`, Java = `code-java-ast-v1`, Kotlin = `code-kotlin-ast-v1`. `ast_chunk_max_lines = 200` is a fixed constant (HOTFIXES 2026-05-19: the Chunker trait does not expose per-medium config). The Kotlin grammar uses `tree-sitter-kotlin-ng`; bare `tree-sitter-kotlin` is pinned to tree-sitter 0.21–0.23 and cannot be used. |
+| code parser | `tree-sitter` + `tree-sitter-rust` / `tree-sitter-python` / `tree-sitter-typescript` / `tree-sitter-javascript` / `tree-sitter-go` / `tree-sitter-java` / `tree-sitter-kotlin-ng`: **parser-side** (`kebab-parse-code`), not chunker-side (design §6.3). chunker versions: Rust = `code-rust-ast-v1`, Python = `code-python-ast-v1`, TypeScript = `code-ts-ast-v1`, JavaScript = `code-js-ast-v1`, Go = `code-go-ast-v1`, Java = `code-java-ast-v1`, Kotlin = `code-kotlin-ast-v1`. `ast_chunk_max_lines = 200` is a fixed constant (HOTFIXES 2026-05-19: the Chunker trait does not expose per-medium config). The Kotlin grammar uses `tree-sitter-kotlin-ng`; bare `tree-sitter-kotlin` is pinned to tree-sitter 0.21–0.23 and cannot be used. **Tier 2 (p10-2)**: YAML/k8s → `serde_yaml` + `k8s-manifest-resource-v1` (apiVersion+kind per resource), Dockerfile → `dockerfile-file-v1` (whole-file), Cargo.toml/go.mod/.json/.xml/.groovy → `manifest-file-v1` (whole-file). Tier 2 chunkers live in `kebab-chunk`; no tree-sitter grammar needed (structure from file type, not AST). |
 | 1B symbol path | workspace path → module path: Python = dotted prefix (`kebab_eval.metrics.compute_mrr`), TypeScript/JavaScript = slash-style prefix (`src/Foo.Foo.search`). Rust 1A-2 uses file-scope nesting only (no workspace prefix; the inconsistency is accepted, HOTFIXES 2026-05-20). |
 | TUI | Ratatui + crossterm: P9-1 Library panel, P9-2/3/4 planned |
 | Desktop | Tauri 2 + `pdfjs-dist` (native PDF render backend forbidden): P9-5 |
@@ -52,7 +52,7 @@ flowchart TB
     ppdf["kebab-parse-pdf"]
     pimg["kebab-parse-image"]
     paud["kebab-parse-audio
(P8 on hold)"]
-    pcode["kebab-parse-code
(P10-1A-2 + P10-1B + P10-1C-Go + P10-1C-JK)"]
+    pcode["kebab-parse-code
(P10-1A-2 + P10-1B + P10-1C-Go + P10-1C-JK + P10-2)"]
     ptypes["kebab-parse-types"]
     norm["kebab-normalize"]
     chunk["kebab-chunk"]
@@ -165,7 +165,13 @@ kebab/
 │   ├── kebab-source-fs/     # workspace walk + checksum (P1-1)
 │   ├── kebab-parse-md/      # Markdown frontmatter + blocks (P1-2/3)
 │   ├── kebab-normalize/     # ParsedBlock → CanonicalDocument (P1-4)
-│   ├── kebab-chunk/         # heading-aware + pdf-page-v1 + code-rust-ast-v1 + code-python-ast-v1 + code-ts-ast-v1 + code-js-ast-v1 + code-go-ast-v1 + code-java-ast-v1 + code-kotlin-ast-v1 chunker (P1-5, P7-2, P10-1A-2, P10-1B, P10-1C-Go, P10-1C-JK)
+│   ├── kebab-chunk/         # heading-aware + pdf-page-v1 + code-*-ast-v1 (Tier 1) + k8s-manifest-resource-v1 + dockerfile-file-v1 + manifest-file-v1 + tier2_shared (P10-2) chunker (P1-5, P7-2, P10-1A-2, P10-1B, P10-1C-Go, P10-1C-JK, P10-2)
+│   │   └── src/
+│   │       ├── code_*_ast_v1.rs              # Tier 1 AST chunkers (rust/python/ts/js/go/java/kotlin)
+│   │       ├── k8s_manifest_resource_v1.rs   # Tier 2 (p10-2): YAML multi-doc, apiVersion+kind per resource
+│   │       ├── dockerfile_file_v1.rs         # Tier 2 (p10-2): whole-file Dockerfile
+│   │       ├── manifest_file_v1.rs           # Tier 2 (p10-2): whole-file Cargo.toml / go.mod / .json / .xml / .groovy
+│   │       └── tier2_shared.rs               # Tier 2 (p10-2): shared oversize fallback + Chunk builder helpers
 │   ├── kebab-store-sqlite/  # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
 │   ├── kebab-search/        # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
 │   ├── kebab-embed/ kebab-embed-local/  # Embedder trait + fastembed adapter (P3-1, P3-2)
diff --git a/docs/SMOKE.md b/docs/SMOKE.md
index b609a2f..97c9d59 100644
--- a/docs/SMOKE.md
+++ b/docs/SMOKE.md
@@ -422,6 +422,86 @@ KB search --mode hybrid "Hello" --code-lang go --json | \
 # expected: symbol = "main.Hello", lang = "go"
 ```
+## P10-2 Tier 2 resource file indexing
+
+Same isolated KB setup as P10-1C-Go. Place Tier 2 resource files such as `.yaml` / `Dockerfile` / `.toml` in the workspace and ingest them; each file is handled by the chunker that matches its extension.
+
+```bash
+# 1) Kubernetes manifest (YAML multi-doc)
+cat > /tmp/kebab-smoke/workspace/deploy.yaml <<'EOF'
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-app
+  namespace: default
+spec:
+  replicas: 2
+  selector:
+    matchLabels:
+      app: my-app
+  template:
+    metadata:
+      labels:
+        app: my-app
+    spec:
+      containers:
+        - name: app
+          image: my-app:latest
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-app-svc
+  namespace: default
+spec:
+  selector:
+    app: my-app
+  ports:
+    - port: 80
+EOF
+
+# 2) Dockerfile (whole file as a single chunk)
+cat > /tmp/kebab-smoke/workspace/Dockerfile <<'EOF'
+FROM rust:1.85 AS builder
+WORKDIR /app
+COPY . .
+RUN cargo build --release
+
+FROM debian:bookworm-slim
+COPY --from=builder /app/target/release/kebab /usr/local/bin/kebab
+ENTRYPOINT ["kebab"]
+EOF
+
+# 3) Cargo.toml (manifest: whole file as a single chunk)
+cp Cargo.toml /tmp/kebab-smoke/workspace/Cargo.toml
+
+# 4) ingest
+KB ingest
+
+# 5) per-language search (check citation.symbol)
+KB search --mode hybrid "Deployment" --code-lang yaml --json | \
+  jq '{hits: [.hits[] | {symbol: .citation.symbol, lang: .citation.lang}]}'
+# expected: symbol = "Deployment/default/my-app" (kind/namespace/name), lang = "yaml"
+
+KB search --mode hybrid "rust:1.85" --code-lang dockerfile --json | \
+  jq '{hits: [.hits[] | {symbol: .citation.symbol, lang: .citation.lang}]}'
+# expected: symbol = "", lang = "dockerfile"
+
+KB search --mode hybrid "kebab-cli" --code-lang toml --json | \
+  jq '{hits: [.hits[] | {symbol: .citation.symbol, lang: .citation.lang}]}'
+# expected: symbol = "", lang = "toml"
+
+# 6) check the Tier 2 language counts in schema stats
+KB --json schema | jq '.stats.code_lang_breakdown'
+# expected: {"yaml": N, "dockerfile": N, "toml": N, ...}
+```
+
+**Tier 2 citation.symbol convention**:
+
+- **YAML k8s resources**: `//` (e.g. `Deployment/default/my-app`). Without a `namespace`, `/`. Multi-doc YAML is split into one chunk per resource at the `---` separator.
+- **Dockerfile**: `` (fixed symbol; the whole file is a single chunk).
+- **TOML / JSON / XML / Groovy / go.mod**: `` (fixed symbol; the whole file is a single chunk). However, when a file exceeds the oversize threshold in `tier2_shared`, it falls back to line-based chunks.
+
 ## Verification checklist
 - `kebab doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
@@ -456,6 +536,7 @@ rm -rf /tmp/kebab-smoke # clean up everything
 - (P10-1B) Placing `.py` / `.ts` / `.tsx` / `.js` / `.mjs` / `.cjs` / `.jsx` files in the workspace includes them in the `new` counter of the `kebab ingest` result. If `--code-lang python` / `--code-lang typescript` / `--code-lang javascript` searches return results whose `citation.symbol` includes the module path prefix, the wiring is working. Confirm that the counts for those languages appear in `kebab schema --json | jq .stats.code_lang_breakdown`.
 - (P10-1C-Go) Placing `.go` files in the workspace makes `kebab ingest` process them with `code-go-ast-v1`. If a `--code-lang go` search returns results whose `citation.symbol` has the form `.` / `.(*Receiver).`, the wiring is working. Confirm that `"go": N` appears in `kebab schema --json | jq .stats.code_lang_breakdown`.
 - (P10-1C-JK) `.java` files are processed with `code-java-ast-v1`, and `.kt`/`.kts` files with `code-kotlin-ast-v1`. If `--code-lang java` / `--code-lang kotlin` searches return results whose `citation.symbol` has the form `com.foo.Foo.bar`, the wiring is working. Confirm that `"java": N` / `"kotlin": N` appear in `kebab schema --json | jq .stats.code_lang_breakdown`.
+- (P10-2) `.yaml`/`.yml` files produce one chunk per k8s resource via apiVersion+kind parsing (`k8s-manifest-resource-v1`). `Dockerfile`/`Dockerfile.*` is a single whole-file chunk (`dockerfile-file-v1`). `.toml`/`.json`/`.xml`/`.groovy`/`go.mod` are single whole-file chunks (`manifest-file-v1`). If `--code-lang yaml` / `--code-lang dockerfile` / `--code-lang toml` searches return results whose `citation.symbol` has the form `Deployment/default/my-app` / `` / `` respectively, the wiring is working. Confirm that `"yaml": N` / `"dockerfile": N` / `"toml": N` appear in `kebab schema --json | jq .stats.code_lang_breakdown`.
 - (P7-3 + follow-up) On a second ingest of a PDF with different bytes at the same path, `purge_vector_orphans_for_workspace_path` first deletes the old chunk_ids from LanceDB, then `purge_orphan_at_workspace_path` sweeps the old doc / chunks / embedding_records from SQLite. The new bytes are indexed under a new `doc_id`. In `IngestReport`, only that asset counts as `new+=1` (other assets are `updated`). Both stores stay consistent: searching for the old body no longer surfaces the old chunks.
 ### Embedding upgrade (fb-39b)
diff --git a/tasks/INDEX.md b/tasks/INDEX.md
index c9496fe..47922d5 100644
--- a/tasks/INDEX.md
+++ b/tasks/INDEX.md
@@ -145,7 +145,7 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.
   - p10-1C-Go Go AST chunker: 🟡 PR open (v0.12.0, `code-go-ast-v1`)
   - p10-1C-JavaKotlin Java + Kotlin AST chunkers: 🟢 PR open (v0.13.0, `code-java-ast-v1` / `code-kotlin-ast-v1`)
   - p10-1D C + C++ AST chunkers: ⏳
-  - p10-2 Tier 2 resource-aware: ⏳
+  - p10-2 Tier 2 resource-aware: ✅ merged (v0.14.0, `k8s-manifest-resource-v1` / `dockerfile-file-v1` / `manifest-file-v1`)
   - p10-3 Tier 3 paragraph + line-window fallback: ⏳
 ## Post-merge hotfixes
diff --git a/tasks/p10/INDEX.md b/tasks/p10/INDEX.md
index 2ec3bd9..c14a287 100644
--- a/tasks/p10/INDEX.md
+++ b/tasks/p10/INDEX.md
@@ -8,7 +8,7 @@
 | 1C-Go | Go AST chunker (`code-go-ast-v1`) | 🟡 PR open (v0.12.0) |
 | 1C-JavaKotlin | Java + Kotlin AST chunkers (`code-java-ast-v1` / `code-kotlin-ast-v1`) | 🟢 PR open (v0.13.0) |
 | 1D | C + C++ AST chunkers | ⏳ |
-| 2 | Tier 2 resource-aware (k8s / Dockerfile / manifest) | ⏳ |
+| 2 | Tier 2 resource-aware (k8s / Dockerfile / manifest) | ✅ merged (v0.14.0) |
 | 3 | Tier 3 paragraph + line-window fallback | ⏳ |
 Design: [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md)

From 217dddb4ba64f934ee59ed55643ff2506cc389d4 Mon Sep 17 00:00:00 2001
From: altair823
Date: Wed, 20 May 2026 14:14:38 +0000
Subject: [PATCH 13/13] chore: bump version 0.13.0 → 0.14.0 (p10-2 Tier 2 resource-aware)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Minor bump: additive code_lang values (xml / groovy / go-mod) + 3 new chunker_version labels (k8s-manifest-resource-v1 / dockerfile-file-v1 / manifest-file-v1) + frozen design §3.5 / §10.1 deltas. No DB migration, no wire schema major bump.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 Cargo.lock | 46 +++++++++++++++++++++++-----------------------
 Cargo.toml |  2 +-
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index 28cbf89..0839e76 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -4127,7 +4127,7 @@ dependencies = [
 [[package]]
 name = "kebab-app"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "base64 0.22.1",
@@ -4172,7 +4172,7 @@ dependencies = [
 [[package]]
 name = "kebab-chunk"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4188,7 +4188,7 @@ dependencies = [
 [[package]]
 name = "kebab-cli"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "clap",
@@ -4209,7 +4209,7 @@ dependencies = [
 [[package]]
 name = "kebab-config"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "dirs 5.0.1",
@@ -4224,7 +4224,7 @@ dependencies = [
 [[package]]
 name = "kebab-core"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4238,7 +4238,7 @@ dependencies = [
 [[package]]
 name = "kebab-embed"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4252,7 +4252,7 @@ dependencies = [
 [[package]]
 name = "kebab-embed-local"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "fastembed",
@@ -4265,7 +4265,7 @@ dependencies = [
 [[package]]
 name = "kebab-eval"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "kebab-app",
@@ -4284,7 +4284,7 @@ dependencies = [
 [[package]]
 name = "kebab-llm"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "kebab-core",
@@ -4293,7 +4293,7 @@ dependencies = [
 [[package]]
 name = "kebab-llm-local"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "kebab-config",
@@ -4310,7 +4310,7 @@ dependencies = [
 [[package]]
 name = "kebab-mcp"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "kebab-app",
@@ -4328,7 +4328,7 @@ dependencies = [
 [[package]]
 name = "kebab-normalize"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "kebab-core",
@@ -4343,7 +4343,7 @@ dependencies = [
 [[package]]
 name = "kebab-parse-code"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "gix",
@@ -4364,7 +4364,7 @@ dependencies = [
 [[package]]
 name = "kebab-parse-image"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "ab_glyph",
  "anyhow",
@@ -4388,7 +4388,7 @@ dependencies = [
 [[package]]
 name = "kebab-parse-md"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "kebab-core",
@@ -4405,7 +4405,7 @@ dependencies = [
 [[package]]
 name = "kebab-parse-pdf"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4418,7 +4418,7 @@ dependencies = [
 [[package]]
 name = "kebab-parse-types"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "kebab-core",
  "serde",
@@ -4426,7 +4426,7 @@ dependencies = [
 [[package]]
 name = "kebab-rag"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4447,7 +4447,7 @@ dependencies = [
 [[package]]
 name = "kebab-search"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "globset",
@@ -4466,7 +4466,7 @@ dependencies = [
 [[package]]
 name = "kebab-source-fs"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4485,7 +4485,7 @@ dependencies = [
 [[package]]
 name = "kebab-store-sqlite"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "blake3",
@@ -4506,7 +4506,7 @@ dependencies = [
 [[package]]
 name = "kebab-store-vector"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "arrow",
@@ -4530,7 +4530,7 @@ dependencies = [
 [[package]]
 name = "kebab-tui"
-version = "0.13.0"
+version = "0.14.0"
 dependencies = [
  "anyhow",
  "crossterm",
diff --git a/Cargo.toml b/Cargo.toml
index 6d3c9f9..cf98073 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -31,7 +31,7 @@ edition = "2024"
 rust-version = "1.85"
 license = "MIT OR Apache-2.0"
 repository = "https://github.com/altair823/kebab"
-version = "0.13.0"
+version = "0.14.0"
 [workspace.dependencies]
 anyhow = "1"
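
Reviewer note: the SMOKE.md hunk in this series documents a `citation.symbol` for k8s resources built from kind, namespace, and name (e.g. `Deployment/default/my-app`), with one chunk per `---`-separated document. A minimal sketch of that convention follows, under stated assumptions: the helper names (`k8s_symbol`, `top_field`, `metadata_field`, `k8s_symbols`) are hypothetical, the line-based scanning stands in for the `serde_yaml` parsing the ARCHITECTURE row mentions, and the two-segment form used when `namespace` is absent is my reading of the convention rather than something the patch shows.

```rust
/// Hypothetical helper: join kind / namespace / name into a symbol.
/// Assumption: when the namespace is missing, only kind and name are joined.
fn k8s_symbol(kind: &str, namespace: Option<&str>, name: &str) -> String {
    match namespace {
        Some(ns) if !ns.is_empty() => format!("{kind}/{ns}/{name}"),
        _ => format!("{kind}/{name}"),
    }
}

/// Value of a top-level `key: value` line (no leading indentation).
fn top_field<'a>(doc: &'a str, key: &str) -> Option<&'a str> {
    let prefix = format!("{key}:");
    doc.lines()
        .find_map(|l| l.strip_prefix(prefix.as_str()))
        .map(str::trim)
        .filter(|v| !v.is_empty())
}

/// Value of a two-space-indented `key: value` line inside the `metadata:` block.
/// Naive on purpose: a real chunker would parse the YAML instead.
fn metadata_field<'a>(doc: &'a str, key: &str) -> Option<&'a str> {
    let prefix = format!("  {key}:");
    doc.lines()
        .skip_while(|l| l.trim_end() != "metadata:")
        .skip(1)
        .take_while(|l| l.starts_with(' '))
        .find_map(|l| l.strip_prefix(prefix.as_str()))
        .map(str::trim)
        .filter(|v| !v.is_empty())
}

/// Split a multi-document YAML string on `---` and emit one symbol per
/// document that carries apiVersion + kind + metadata.name.
fn k8s_symbols(yaml: &str) -> Vec<String> {
    let mut docs: Vec<String> = vec![String::new()];
    for line in yaml.lines() {
        if line.trim_end() == "---" {
            docs.push(String::new());
        } else {
            let current = docs.last_mut().expect("docs is never empty");
            current.push_str(line);
            current.push('\n');
        }
    }
    docs.iter()
        .filter_map(|doc| {
            // Documents without apiVersion + kind are not treated as k8s resources.
            top_field(doc, "apiVersion")?;
            let kind = top_field(doc, "kind")?;
            let name = metadata_field(doc, "name")?;
            Some(k8s_symbol(kind, metadata_field(doc, "namespace"), name))
        })
        .collect()
}
```

Under these assumptions, a two-document manifest with a namespaced Deployment `my-app` and a Service `my-app-svc` without a namespace would yield `Deployment/default/my-app` and `Service/my-app-svc`; a YAML file with no `apiVersion` yields no symbols, which is where a fallback chunker would presumably take over.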