docs(p10-1a-1): wire schema + frozen design + README/HANDOFF/SMOKE + task index

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(p10-1a-1): patch wire.rs Stats fixture for new schema fields
2026-05-15 17:41:26 +09:00 · 2026-05-15 17:30:01 +09:00 · 2026-05-15 17:21:59 +09:00 · 2026-05-15 17:06:59 +09:00 · 2026-05-15 16:53:21 +09:00 · 2026-05-15 16:50:55 +09:00
75 changed files with 7109 additions and 242 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -27,7 +27,7 @@ cargo build --release                          # produces target/release/kebab

 `-j 1` for the full workspace test isn't optional: 18 integration-test binaries each link `lance` + `datafusion` + `arrow` + `tantivy` and the parallel link step exhausts memory (linker gets SIGKILL'd, build silently fails partway). Per-crate runs are fine in parallel.

-`target/` is 6–10 GB after a fresh build (DataFusion + Lance + fastembed + 18 × test-binary debug info). The dev/test profile is already trimmed (`debug = "line-tables-only"`, `split-debuginfo = "unpacked"` — see workspace `Cargo.toml`). Run `cargo clean` after phase merges if disk pressure shows up; backtraces still resolve to function + line.
+`target/` is 6–10 GB after a fresh build but **balloons to 90+ GB after a few task cycles** (each fb-* batch adds incremental compile artifacts on top of the existing 18 × test-binary debug info). The dev/test profile is already trimmed (`debug = "line-tables-only"`, `split-debuginfo = "unpacked"`; see workspace `Cargo.toml`). Run `cargo clean` **routinely after each merged PR**, not just "if pressure shows up": disk space is tight, and the only cost of `cargo clean` is a full rebuild on the next build. Verified pattern: 92 GB → 0 GB in seconds, and backtraces still resolve to function + line.

 ## The facade rule

--- a/Cargo.lock
+++ b/Cargo.lock
@@ -755,6 +755,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "63044e1ae8e69f3b5a92c736ca6269b8d12fa7efe39bf34ddb06d102cf0e2cab"
 dependencies = [
 "memchr",
+ "regex-automata",
 "serde",
 ]

@@ -931,6 +932,15 @@ version = "1.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9"

+[[package]]
+name = "clru"
+version = "0.6.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "197fd99cb113a8d5d9b6376f3aa817f32c1078f2343b714fff7d2ca44fdf67d5"
+dependencies = [
+ "hashbrown 0.16.1",
+]
+
 [[package]]
 name = "color_quant"
 version = "1.1.0"
@@ -2140,6 +2150,12 @@ version = "2.0.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "117240f60069e65410b3ae1bb213295bd828f707b5bec6596a1afc8793ce0cbc"

+[[package]]
+name = "dunce"
+version = "1.0.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "92773504d58c093f6de2459af4af33faa518c13451eb8f2b5698ed3d36e7c813"
+
 [[package]]
 name = "dyn-clone"
 version = "1.0.20"
@@ -2302,6 +2318,15 @@ dependencies = [
 "tokenizers",
 ]

+[[package]]
+name = "faster-hex"
+version = "0.9.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a2a2b11eda1d40935b26cf18f6833c526845ae8c41e58d09af6adeb6f0269183"
+dependencies = [
+ "serde",
+]
+
 [[package]]
 name = "fastrand"
 version = "2.4.1"
@@ -2738,6 +2763,583 @@ dependencies = [
 "weezl",
 ]

+[[package]]
+name = "gix"
+version = "0.70.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "736f14636705f3a56ea52b553e67282519418d9a35bb1e90b3a9637a00296b68"
+dependencies = [
+ "gix-actor",
+ "gix-commitgraph",
+ "gix-config",
+ "gix-date",
+ "gix-diff",
+ "gix-discover",
+ "gix-features",
+ "gix-fs",
+ "gix-glob",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-index",
+ "gix-lock",
+ "gix-object",
+ "gix-odb",
+ "gix-pack",
+ "gix-path",
+ "gix-protocol",
+ "gix-ref",
+ "gix-refspec",
+ "gix-revision",
+ "gix-revwalk",
+ "gix-sec",
+ "gix-shallow",
+ "gix-tempfile",
+ "gix-trace",
+ "gix-traverse",
+ "gix-url",
+ "gix-utils",
+ "gix-validate 0.9.4",
+ "once_cell",
+ "smallvec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-actor"
+version = "0.33.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "20018a1a6332e065f1fcc8305c1c932c6b8c9985edea2284b3c79dc6fa3ee4b2"
+dependencies = [
+ "bstr",
+ "gix-date",
+ "gix-utils",
+ "itoa",
+ "thiserror 2.0.18",
+ "winnow 0.6.26",
+]
+
+[[package]]
+name = "gix-bitmap"
+version = "0.2.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d982fc7ef0608e669851d0d2a6141dae74c60d5a27e8daa451f2a4857bbf41e2"
+dependencies = [
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-chunk"
+version = "0.4.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5c356b3825677cb6ff579551bb8311a81821e184453cbd105e2fc5311b288eeb"
+dependencies = [
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-command"
+version = "0.4.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "cb410b84d6575db45e62025a9118bdbf4d4b099ce7575a76161e898d9ca98df1"
+dependencies = [
+ "bstr",
+ "gix-path",
+ "gix-trace",
+ "shell-words",
+]
+
+[[package]]
+name = "gix-commitgraph"
+version = "0.26.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e23a8ec2d8a16026a10dafdb6ed51bcfd08f5d97f20fa52e200bc50cb72e4877"
+dependencies = [
+ "bstr",
+ "gix-chunk",
+ "gix-features",
+ "gix-hash",
+ "memmap2",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-config"
+version = "0.43.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "377c1efd2014d5d469e0b3cd2952c8097bce9828f634e04d5665383249f1d9e9"
+dependencies = [
+ "bstr",
+ "gix-config-value",
+ "gix-features",
+ "gix-glob",
+ "gix-path",
+ "gix-ref",
+ "gix-sec",
+ "memchr",
+ "once_cell",
+ "smallvec",
+ "thiserror 2.0.18",
+ "unicode-bom",
+ "winnow 0.6.26",
+]
+
+[[package]]
+name = "gix-config-value"
+version = "0.14.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8dc2c844c4cf141884678cabef736fd91dd73068b9146e6f004ba1a0457944b6"
+dependencies = [
+ "bitflags",
+ "bstr",
+ "gix-path",
+ "libc",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-date"
+version = "0.9.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "daa30058ec7d3511fbc229e4f9e696a35abd07ec5b82e635eff864a2726217e4"
+dependencies = [
+ "bstr",
+ "itoa",
+ "jiff",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-diff"
+version = "0.50.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "62afb7f4ca0acdf4e9dad92065b2eb1bf2993bcc5014b57bc796e3a365b17c4d"
+dependencies = [
+ "bstr",
+ "gix-hash",
+ "gix-object",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-discover"
+version = "0.38.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d0c2414bdf04064e0f5a5aa029dfda1e663cf9a6c4bfc8759f2d369299bb65d8"
+dependencies = [
+ "bstr",
+ "dunce",
+ "gix-fs",
+ "gix-hash",
+ "gix-path",
+ "gix-ref",
+ "gix-sec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-features"
+version = "0.40.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8bfdd4838a8d42bd482c9f0cb526411d003ee94cc7c7b08afe5007329c71d554"
+dependencies = [
+ "crc32fast",
+ "flate2",
+ "gix-hash",
+ "gix-trace",
+ "gix-utils",
+ "libc",
+ "once_cell",
+ "prodash",
+ "sha1_smol",
+ "thiserror 2.0.18",
+ "walkdir",
+]
+
+[[package]]
+name = "gix-fs"
+version = "0.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "182e7fa7bfdf44ffb7cfe7451b373cdf1e00870ac9a488a49587a110c562063d"
+dependencies = [
+ "fastrand",
+ "gix-features",
+ "gix-utils",
+]
+
+[[package]]
+name = "gix-glob"
+version = "0.18.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4e9c7249fa0a78f9b363aa58323db71e0a6161fd69860ed6f48dedf0ef3a314e"
+dependencies = [
+ "bitflags",
+ "bstr",
+ "gix-features",
+ "gix-path",
+]
+
+[[package]]
+name = "gix-hash"
+version = "0.16.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e81c5ec48649b1821b3ed066a44efb95f1a268b35c1d91295e61252539fbe9f8"
+dependencies = [
+ "faster-hex",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-hashtable"
+version = "0.7.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "189130bc372accd02e0520dc5ab1cef318dcc2bc829b76ab8d84bbe90ac212d1"
+dependencies = [
+ "gix-hash",
+ "hashbrown 0.14.5",
+ "parking_lot",
+]
+
+[[package]]
+name = "gix-index"
+version = "0.38.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "acd12e3626879369310fffe2ac61acc828613ef656b50c4ea984dd59d7dc85d8"
+dependencies = [
+ "bitflags",
+ "bstr",
+ "filetime",
+ "fnv",
+ "gix-bitmap",
+ "gix-features",
+ "gix-fs",
+ "gix-hash",
+ "gix-lock",
+ "gix-object",
+ "gix-traverse",
+ "gix-utils",
+ "gix-validate 0.9.4",
+ "hashbrown 0.14.5",
+ "itoa",
+ "libc",
+ "memmap2",
+ "rustix 0.38.44",
+ "smallvec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-lock"
+version = "16.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9739815270ff6940968441824d162df9433db19211ca9ba8c3fc1b50b849c642"
+dependencies = [
+ "gix-tempfile",
+ "gix-utils",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-object"
+version = "0.47.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ddc4b3a0044244f0fe22347fb7a79cca165e37829d668b41b85ff46a43e5fd68"
+dependencies = [
+ "bstr",
+ "gix-actor",
+ "gix-date",
+ "gix-features",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-path",
+ "gix-utils",
+ "gix-validate 0.9.4",
+ "itoa",
+ "smallvec",
+ "thiserror 2.0.18",
+ "winnow 0.6.26",
+]
+
+[[package]]
+name = "gix-odb"
+version = "0.67.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3e93457df69cd09573608ce9fa4f443fbd84bc8d15d8d83adecd471058459c1b"
+dependencies = [
+ "arc-swap",
+ "gix-date",
+ "gix-features",
+ "gix-fs",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-object",
+ "gix-pack",
+ "gix-path",
+ "gix-quote",
+ "parking_lot",
+ "tempfile",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-pack"
+version = "0.57.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fc13a475b3db735617017fb35f816079bf503765312d4b1913b18cf96f3fa515"
+dependencies = [
+ "clru",
+ "gix-chunk",
+ "gix-features",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-object",
+ "gix-path",
+ "memmap2",
+ "smallvec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-packetline"
+version = "0.18.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "123844a70cf4d5352441dc06bab0da8aef61be94ec239cb631e0ba01dc6d3a04"
+dependencies = [
+ "bstr",
+ "faster-hex",
+ "gix-trace",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-path"
+version = "0.10.22"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7cb06c3e4f8eed6e24fd915fa93145e28a511f4ea0e768bae16673e05ed3f366"
+dependencies = [
+ "bstr",
+ "gix-trace",
+ "gix-validate 0.10.1",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-protocol"
+version = "0.48.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6c61bd61afc6b67d213241e2100394c164be421e3f7228d3521b04f48ca5ba90"
+dependencies = [
+ "bstr",
+ "gix-date",
+ "gix-features",
+ "gix-hash",
+ "gix-ref",
+ "gix-shallow",
+ "gix-transport",
+ "gix-utils",
+ "maybe-async",
+ "thiserror 2.0.18",
+ "winnow 0.6.26",
+]
+
+[[package]]
+name = "gix-quote"
+version = "0.4.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e49357fccdb0c85c0d3a3292a9f6db32d9b3535959b5471bb9624908f4a066c6"
+dependencies = [
+ "bstr",
+ "gix-utils",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-ref"
+version = "0.50.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "47adf4c5f933429f8554e95d0d92eee583cfe4b95d2bf665cd6fd4a1531ee20c"
+dependencies = [
+ "gix-actor",
+ "gix-features",
+ "gix-fs",
+ "gix-hash",
+ "gix-lock",
+ "gix-object",
+ "gix-path",
+ "gix-tempfile",
+ "gix-utils",
+ "gix-validate 0.9.4",
+ "memmap2",
+ "thiserror 2.0.18",
+ "winnow 0.6.26",
+]
+
+[[package]]
+name = "gix-refspec"
+version = "0.28.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "59650228d8f612f68e7f7a25f517fcf386c5d0d39826085492e94766858b0a90"
+dependencies = [
+ "bstr",
+ "gix-hash",
+ "gix-revision",
+ "gix-validate 0.9.4",
+ "smallvec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-revision"
+version = "0.32.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3fe28bbccca55da6d66e6c6efc6bb4003c29d407afd8178380293729733e6b53"
+dependencies = [
+ "bitflags",
+ "bstr",
+ "gix-commitgraph",
+ "gix-date",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-object",
+ "gix-revwalk",
+ "gix-trace",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-revwalk"
+version = "0.18.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d4ecb80c235b1e9ef2b99b23a81ea50dd569a88a9eb767179793269e0e616247"
+dependencies = [
+ "gix-commitgraph",
+ "gix-date",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-object",
+ "smallvec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-sec"
+version = "0.10.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "47aeb0f13de9ef2f3033f5ff218de30f44db827ac9f1286f9ef050aacddd5888"
+dependencies = [
+ "bitflags",
+ "gix-path",
+ "libc",
+ "windows-sys 0.52.0",
+]
+
+[[package]]
+name = "gix-shallow"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ab72543011e303e52733c85bef784603ef39632ddf47f69723def52825e35066"
+dependencies = [
+ "bstr",
+ "gix-hash",
+ "gix-lock",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-tempfile"
+version = "16.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2558f423945ef24a8328c55d1fd6db06b8376b0e7013b1bb476cc4ffdf678501"
+dependencies = [
+ "gix-fs",
+ "libc",
+ "once_cell",
+ "parking_lot",
+ "tempfile",
+]
+
+[[package]]
+name = "gix-trace"
+version = "0.1.19"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6f23569e55f2ffaf958617353b9734a7d52a7c19c439eeaa5e3efc217fd2270e"
+
+[[package]]
+name = "gix-transport"
+version = "0.45.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "11187418489477b1b5b862ae1aedbbac77e582f2c4b0ef54280f20cfe5b964d9"
+dependencies = [
+ "bstr",
+ "gix-command",
+ "gix-features",
+ "gix-packetline",
+ "gix-quote",
+ "gix-sec",
+ "gix-url",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-traverse"
+version = "0.44.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2bec70e53896586ef32a3efa7e4427b67308531ed186bb6120fb3eca0f0d61b4"
+dependencies = [
+ "bitflags",
+ "gix-commitgraph",
+ "gix-date",
+ "gix-hash",
+ "gix-hashtable",
+ "gix-object",
+ "gix-revwalk",
+ "smallvec",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-url"
+version = "0.29.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "29218c768b53dd8f116045d87fec05b294c731a4b2bdd257eeca2084cc150b13"
+dependencies = [
+ "bstr",
+ "gix-features",
+ "gix-path",
+ "percent-encoding",
+ "thiserror 2.0.18",
+ "url",
+]
+
+[[package]]
+name = "gix-utils"
+version = "0.1.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ff08f24e03ac8916c478c8419d7d3c33393da9bb41fa4c24455d5406aeefd35f"
+dependencies = [
+ "fastrand",
+ "unicode-normalization",
+]
+
+[[package]]
+name = "gix-validate"
+version = "0.9.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "34b5f1253109da6c79ed7cf6e1e38437080bb6d704c76af14c93e2f255234084"
+dependencies = [
+ "bstr",
+ "thiserror 2.0.18",
+]
+
+[[package]]
+name = "gix-validate"
+version = "0.10.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5b1e63a5b516e970a594f870ed4571a8fdcb8a344e7bd407a20db8bd61dbfde4"
+dependencies = [
+ "bstr",
+ "thiserror 2.0.18",
+]
+
 [[package]]
 name = "glob"
 version = "0.3.3"
@@ -3525,7 +4127,7 @@ dependencies = [

 [[package]]
 name = "kebab-app"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "base64 0.22.1",
@@ -3569,7 +4171,7 @@ dependencies = [

 [[package]]
 name = "kebab-chunk"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -3584,7 +4186,7 @@ dependencies = [

 [[package]]
 name = "kebab-cli"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "clap",
@@ -3605,7 +4207,7 @@ dependencies = [

 [[package]]
 name = "kebab-config"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "dirs 5.0.1",
@@ -3620,7 +4222,7 @@ dependencies = [

 [[package]]
 name = "kebab-core"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -3634,7 +4236,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -3648,7 +4250,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed-local"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "fastembed",
@@ -3661,7 +4263,7 @@ dependencies = [

 [[package]]
 name = "kebab-eval"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "kebab-app",
@@ -3680,7 +4282,7 @@ dependencies = [

 [[package]]
 name = "kebab-llm"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -3689,7 +4291,7 @@ dependencies = [

 [[package]]
 name = "kebab-llm-local"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "kebab-config",
@@ -3706,7 +4308,7 @@ dependencies = [

 [[package]]
 name = "kebab-mcp"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "kebab-app",
@@ -3724,7 +4326,7 @@ dependencies = [

 [[package]]
 name = "kebab-normalize"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -3737,9 +4339,18 @@ dependencies = [
 "unicode-normalization",
 ]

+[[package]]
+name = "kebab-parse-code"
+version = "0.6.0"
+dependencies = [
+ "anyhow",
+ "gix",
+ "tempfile",
+]
+
 [[package]]
 name = "kebab-parse-image"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "ab_glyph",
 "anyhow",
@@ -3763,7 +4374,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-md"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -3780,7 +4391,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-pdf"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -3793,7 +4404,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-types"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "kebab-core",
 "serde",
@@ -3801,7 +4412,7 @@ dependencies = [

 [[package]]
 name = "kebab-rag"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -3822,7 +4433,7 @@ dependencies = [

 [[package]]
 name = "kebab-search"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "globset",
@@ -3841,13 +4452,14 @@ dependencies = [

 [[package]]
 name = "kebab-source-fs"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
 "ignore",
 "kebab-config",
 "kebab-core",
+ "kebab-parse-code",
 "serde",
 "serde_json",
 "tempfile",
@@ -3858,7 +4470,7 @@ dependencies = [

 [[package]]
 name = "kebab-store-sqlite"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -3879,7 +4491,7 @@ dependencies = [

 [[package]]
 name = "kebab-store-vector"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "arrow",
@@ -3903,7 +4515,7 @@ dependencies = [

 [[package]]
 name = "kebab-tui"
-version = "0.5.0"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "crossterm",
@@ -4846,6 +5458,17 @@ dependencies = [
 "thread-tree",
 ]

+[[package]]
+name = "maybe-async"
+version = "0.2.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "746873a384ad60adc5db74471dfaba74bd278afbdcfd81db93fafcdfc8b5ca0c"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn 2.0.117",
+]
+
 [[package]]
 name = "maybe-rayon"
 version = "0.1.1"
@@ -5702,6 +6325,16 @@ dependencies = [
 "unicode-ident",
 ]

+[[package]]
+name = "prodash"
+version = "29.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f04bb108f648884c23b98a0e940ebc2c93c0c3b89f04dbaf7eb8256ce617d1bc"
+dependencies = [
+ "log",
+ "parking_lot",
+]
+
 [[package]]
 name = "profiling"
 version = "1.0.17"
@@ -6841,6 +7474,12 @@ dependencies = [
 "unsafe-libyaml",
 ]

+[[package]]
+name = "sha1_smol"
+version = "1.0.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bbfa15b3dddfee50a0fff136974b3e1bde555604ba463834a7eb7deb6417705d"
+
 [[package]]
 name = "sha2"
 version = "0.10.9"
@@ -6861,6 +7500,12 @@ dependencies = [
 "lazy_static",
 ]

+[[package]]
+name = "shell-words"
+version = "1.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dc6fe69c597f9c37bfeeeeeb33da3530379845f10be461a66d16d03eca2ded77"
+
 [[package]]
 name = "shellexpand"
 version = "3.1.2"
@@ -7889,6 +8534,12 @@ version = "2.9.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "dbc4bc3a9f746d862c45cb89d705aa10f187bb96c76001afab07a0d35ce60142"

+[[package]]
+name = "unicode-bom"
+version = "2.0.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7eec5d1121208364f6793f7d2e222bf75a915c19557537745b195b253dd64217"
+
 [[package]]
 name = "unicode-ident"
 version = "1.0.24"
@@ -8587,6 +9238,15 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650"

+[[package]]
+name = "winnow"
+version = "0.6.26"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1e90edd2ac1aa278a5c4599b1d89cf03074b610800f866d4026dc199d7929a28"
+dependencies = [
+ "memchr",
+]
+
 [[package]]
 name = "winnow"
 version = "0.7.15"
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -23,6 +23,7 @@ members = [
    "crates/kebab-parse-pdf",
    "crates/kebab-tui",
    "crates/kebab-mcp",
+    "crates/kebab-parse-code",
 ]

 [workspace.package]
@@ -30,7 +31,7 @@ edition       = "2024"
 rust-version  = "1.85"
 license       = "MIT OR Apache-2.0"
 repository    = "https://github.com/altair823/kebab"
-version       = "0.5.0"
+version       = "0.6.0"

 [workspace.dependencies]
 anyhow       = "1"
@@ -81,6 +82,10 @@ rmcp         = { version = "1.6", default-features = false, features = ["server"
 # sync via reqwest::blocking — wiremock is dev-only there).
 wiremock     = "0.6"
 base64       = "0.22"
+# Pure-Rust git library for repo metadata detection (kebab-parse-code).
+# No `git` binary required. Default features are disabled; only `revision`
+# is enabled, for the HEAD name + commit SHA queries this crate needs.
+gix          = { version = "0.70", default-features = false, features = ["revision"] }

 # Disk-footprint trim for dev / test builds. Codegen, opt-level, and
 # behavior are unchanged — only DWARF debug info is reduced (line
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -20,6 +20,7 @@ P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) merged.
 | **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ✅ Done (3/3 components, page-level chunker + ingest wiring) |
 | **P8** | Audio transcription + timestamp citation | `kebab-parse-audio` | P5 | ⏸ On hold (the whisper-rs system dependency needs a brainstorm) |
 | **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | 🟡 In progress (4/5 components: P9-1/2/3/4 done [Library / Search / Ask / Inspect], P9-5 desktop planned · dogfooding feedback **20/20 ✅**) |
+| **P10** | code ingest framework | `kebab-parse-code` | P5 | 🟡 In progress (1A-1 about to merge): the 1A-1 merge freezes the additive-minor wire schema change and the new `kebab-parse-code` crate skeleton; the actual code chunker starts in 1A-2 |

 P0~P5 run serially. P6~P9 can run in parallel after P5.

--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@
 - **Rust toolchain** ≥ 1.85 (the workspace uses edition 2024 + resolver 3). [rustup](https://rustup.rs) is recommended.
 - **Ollama**: used by `kebab ask` and by image OCR/captioning. Install it from `https://ollama.com/download`, then run `ollama serve`. The default LLM is the gemma4 family (`ollama pull gemma4:e4b`); OCR and captioning use the same family, so pulling a single model is enough. For a larger variant, override the config with e.g. `gemma4:26b`. Set host:port in the config's `[models.llm].endpoint`.
 - **Build disk**: the first build leaves `target/` at 6–10 GB (Lance + DataFusion + fastembed). Check that you have the headroom.
-- **fastembed model**: the first `kebab ingest` automatically downloads `multilingual-e5-small` (~470 MB).
+- **fastembed model**: the first `kebab ingest` automatically downloads `multilingual-e5-large` (~1.3 GB, fb-39b). Setting `model = "multilingual-e5-small"` explicitly in `config.toml` keeps the previous model.

 ## 설치

@@ -71,7 +71,7 @@ kebab doctor
 |------|------|
 | `kebab init` | Creates the data directory + config.toml under the XDG paths |
 | `kebab ingest [<path>]` | Indexes Markdown / images / PDF (idempotent). On a TTY it shows a progress bar on stderr; on non-TTY (CI / pipe) it prints one line at a time to stderr; with `--json` it streams `ingest_progress.v1` lines to stdout, followed by a final `ingest_report.v1`. One Ctrl-C finishes the current asset and then aborts (partial commits are preserved; re-runs are idempotent); a second Ctrl-C is a hard exit. When a Markdown title is missing from the frontmatter, it is filled automatically in the order first H1 → H2 → first 80 characters of the first paragraph → file name (parser_version `md-frontmatter-v2`); already-indexed docs also pick up the new title on the next ingest. **Incremental** (p9-fb-23): from the second ingest onward, parse/chunk/embed/vector upsert is skipped automatically for unchanged docs (same blake3 and same parser/chunker/embedder versions). The final summary shows an `N unchanged` count. `--force-reingest` ignores the skip and forces reprocessing. **Supported formats** (the extractor is chosen automatically and cannot be set in config): Markdown (`.md`), images (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`). Other extensions are skipped automatically: the reason goes into `IngestItem.warnings` (e.g. `"unsupported media type: .docx"`), the counts are bucketed in `IngestReport.skipped_by_extension`, and the CLI / TUI summary shows the breakdown. |
-| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk]` | Search. hybrid uses RRF fusion and includes citations. Repeating the same query (normalized with NFKC + trim + lowercase) within one process hits the in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. An ingest commit bumps `kv['corpus_revision']`, which makes every entry stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`), which is not compatible with the pre-fb-34 bare array. A mismatched cursor yields `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` is a repeatable flag (`--tag rust --tag async`) with OR matching, `--media` takes multiple `,`-separated values with OR matching, and the remaining flags are combined with AND. `--trust-min` is one of `primary\|secondary\|generated` (includes that level and above). `--ingested-after` is RFC3339 UTC; a parse failure yields `error.v1.code = config_invalid` (exit 2). `--media md` is normalized as an alias of `markdown`. An unknown `--media` value always yields empty hits (not an error). **`--trace` (p9-fb-37)**: exposes the lexical / vector pre-fusion candidates + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) in `search_response.v1.trace`. Trace requests bypass the cache (always cold, even without `--no-cache`). **`--bulk` (p9-fb-42)**: runs N queries at once from ndjson on stdin. With `--json`, stdout carries per-query ndjson (`bulk_search_item.v1`) and stderr carries a summary (`bulk_summary: total=N succeeded=S failed=F`). Cap 100. When an agent decomposes a query and runs the sub-queries as a batch, this is a single round-trip: the App instance is reused, so the cache / embedder cold-start cost is paid only once. A per-query failure is isolated in the item's `error` (error.v1) and the other queries continue. |
+| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk] [--repo NAME ...] [--code-lang LIST] [--media code]` | Search. hybrid uses RRF fusion and includes citations. Repeating the same query (normalized with NFKC + trim + lowercase) within one process hits the in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. An ingest commit bumps `kv['corpus_revision']`, which makes every entry stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`), which is not compatible with the pre-fb-34 bare array. A mismatched cursor yields `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` is a repeatable flag (`--tag rust --tag async`) with OR matching, `--media` takes multiple `,`-separated values with OR matching, and the remaining flags are combined with AND. `--trust-min` is one of `primary\|secondary\|generated` (includes that level and above). `--ingested-after` is RFC3339 UTC; a parse failure yields `error.v1.code = config_invalid` (exit 2). `--media md` is normalized as an alias of `markdown`. An unknown `--media` value always yields empty hits (not an error). **`--trace` (p9-fb-37)**: exposes the lexical / vector pre-fusion candidates + RRF union + per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) in `search_response.v1.trace`. Trace requests bypass the cache (always cold, even without `--no-cache`). **`--bulk` (p9-fb-42)**: runs N queries at once from ndjson on stdin. With `--json`, stdout carries per-query ndjson (`bulk_search_item.v1`) and stderr carries a summary (`bulk_summary: total=N succeeded=S failed=F`). Cap 100. When an agent decomposes a query and runs the sub-queries as a batch, this is a single round-trip: the App instance is reused, so the cache / embedder cold-start cost is paid only once. A per-query failure is isolated in the item's `error` (error.v1) and the other queries continue. **code corpus filters (p10-1A-1):** `--repo` is repeatable (`--repo kebab --repo other`) with OR matching. `--code-lang` takes repeated or comma-separated values (`--code-lang rust,python`); an unknown value yields empty hits. `--media code` includes all code chunks across Tier 1/2/3. As of 1A-1 no code chunks are indexed, so these filters always return empty results; they take effect once 1A-2 (Rust AST chunker) is merged. |
 | `kebab list docs` | Lists indexed documents |
 | `kebab inspect doc <id>` / `kebab inspect chunk <id>` | Shows the raw record |
 | `kebab fetch chunk <id> [--context N]` / `kebab fetch doc <id> [--max-tokens N]` / `kebab fetch span <doc_id> <ls> <le> [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. |
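
The code corpus filter semantics described in the `kebab search` row of this hunk (OR within a repeated or comma-separated flag, unknown values simply matching nothing) can be sketched roughly as follows. This is a minimal illustration, not the kebab implementation: `CodeFilter`, `ChunkMeta`, `split_multi`, and `chunk_matches` are hypothetical names, and combining `--repo` with `--code-lang` by AND is an assumption carried over from the p9-fb-36 filter rule.

```rust
/// Hypothetical filter shape; the field names echo the `repo` / `code_lang`
/// vectors this PR adds to the query struct in bulk.rs.
#[derive(Debug, Default)]
pub struct CodeFilter {
    pub repo: Vec<String>,
    pub code_lang: Vec<String>,
}

/// Hypothetical per-chunk metadata. Non-code chunks carry `None`.
#[derive(Debug)]
pub struct ChunkMeta {
    pub repo: Option<String>,
    pub code_lang: Option<String>,
}

/// Flattens repeated and comma-separated flag values into one list,
/// so `["rust,python"]` and `["rust", "python"]` are equivalent.
pub fn split_multi(raw: &[&str]) -> Vec<String> {
    raw.iter()
        .copied()
        .flat_map(|value| value.split(','))
        .map(|value| value.trim().to_string())
        .filter(|value| !value.is_empty())
        .collect()
}

/// OR within one flag: any wanted value may match.
/// An empty list means the flag was not given, so it does not constrain.
fn field_matches(wanted: &[String], actual: Option<&str>) -> bool {
    wanted.is_empty() || actual.is_some_and(|a| wanted.iter().any(|w| w.as_str() == a))
}

/// AND across flags (assumed): every given flag must match.
pub fn chunk_matches(filter: &CodeFilter, chunk: &ChunkMeta) -> bool {
    field_matches(&filter.repo, chunk.repo.as_deref())
        && field_matches(&filter.code_lang, chunk.code_lang.as_deref())
}
```

Under this sketch an unknown `--code-lang` value needs no special case: no chunk carries it, so the result set is empty rather than an error, which matches the behavior the row describes.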
@@ -133,7 +133,7 @@ flowchart TB
    subgraph Pipeline["Domain + pipeline"]
        parse["parse-md / parse-pdf / parse-image"]
        chunker["chunker (md-heading-v1, pdf-page-v1)"]
-        embedder["embedder (fastembed multilingual-e5-small)"]
+        embedder["embedder (fastembed multilingual-e5-large)"]
        retriever["retriever (lexical / vector / hybrid RRF)"]
        rag["RAG pipeline"]
    end
@@ -178,7 +178,17 @@ flowchart TB

 ## Configuration

-- `~/.config/kebab/config.toml`: created under the XDG path by `kebab init`. Sections: `[workspace]` (root, exclude; the include field was removed, and supported formats are determined automatically), `[storage]`, `[chunking]`, `[models.embedding]`, `[models.llm]`, `[image.ocr]`, `[image.caption]`, `[search]`, `[rag]`, `[ui]`. `[ui] theme = "dark" | "light"` selects the TUI palette (default `"dark"`; unknown values fall back to dark). `[search] stale_threshold_days = 30` (p9-fb-32) is the threshold for the `stale` flag on search hits / RAG citations (default 30 days; `0` disables it). A `workspace.include = [...]` entry in an old config is silently ignored, with a one-time deprecation warning (p9-fb-25).
+- `~/.config/kebab/config.toml`: created under the XDG path by `kebab init`. Sections: `[workspace]` (root, exclude; the include field was removed, and supported formats are determined automatically), `[storage]`, `[chunking]`, `[models.embedding]`, `[models.llm]`, `[image.ocr]`, `[image.caption]`, `[search]`, `[rag]`, `[ui]`.
+  - `[models.embedding]`:
+    - `model` (default `"multilingual-e5-large"`, fb-39b): multilingual sentence embedding model, 1024-dim. The ONNX weights (~1.3 GB) are downloaded automatically on first run into the fastembed cache (`config.storage.model_dir/fastembed/`). `"multilingual-e5-small"` (384 dim) remains available for backwards compatibility when set explicitly in the TOML.
+    - `dimensions` (default `1024`): the model's embedding dimension. If the config and the stored LanceDB dimension disagree, searches return 0 results (orphan table). After changing the model, rebuilding the vector index with `kebab reset --vector-only && kebab ingest` is recommended.
+  - `[ui] theme = "dark" | "light"` selects the TUI palette (default `"dark"`; unknown values fall back to dark).
+  - `[search] stale_threshold_days = 30` (p9-fb-32): the threshold for the `stale` flag on search hits / RAG citations (default 30 days; `0` disables it). A `workspace.include = [...]` entry in an old config is silently ignored, with a one-time deprecation warning (p9-fb-25).
+- `[ingest.code]` (p10-1A-1): skip policy + chunker defaults for code ingest.
+  - `skip_generated_header = true`: skips a file when a generated marker (`@generated` / `DO NOT EDIT`, etc.) is detected in its first ~512 bytes.
+  - `max_file_bytes = 262144` (256 KiB) / `max_file_lines = 5000`: per-file caps; files that exceed them are skipped.
+  - `extra_skip_globs = []`: user-supplied extra skip patterns (`.gitignore` syntax).
+  - `.gitignore` honor: applied automatically. `.kebabignore` is an additional layer. Priority: built-in safety net (`node_modules/` / `target/` / `__pycache__/` / `.venv/` / `venv/` / `env/`) > `.gitignore` > `.kebabignore`.
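 - `[rag] prompt_template_version` (default `"rag-v2"`): RAG system prompt version. `"rag-v1"` is kept for legacy backwards compatibility (retained when the user sets it explicitly). Rules tightened in v2: (1) when citing a fact, quote the original text from the chunk in double quotes before the [#number], (2) no use of knowledge from model training, (3) state "확실하지 않다" ("not certain") when the evidence is ambiguous.
 - `--config <path>` flag: for temporary workspaces / isolated tests. Honored by both the CLI and the TUI.
 - `KEBAB_*` env: overrides for some keys (`KEBAB_RAG_SCORE_GATE`, `KEBAB_EVAL_GOLDEN`, `KEBAB_COMMIT_HASH`, etc.).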
--- a/crates/kebab-app/src/bulk.rs
+++ b/crates/kebab-app/src/bulk.rs
@@ -197,6 +197,8 @@ fn parse_one(raw: &Value) -> Result<(SearchQuery, SearchOpts), String> {
        media,
        ingested_after,
        doc_id,
+        repo: vec![],
+        code_lang: vec![],
    };

    let opts = SearchOpts {
--- a/crates/kebab-app/src/lib.rs
+++ b/crates/kebab-app/src/lib.rs
@@ -44,7 +44,7 @@ use kebab_core::{
    Answer, Block, CanonicalDocument, Chunk, ChunkId, ChunkPolicy, ChunkerVersion, Chunker,
    DocFilter, DocSummary, DocumentId, DocumentStore, Embedder, EmbeddingInput,
    EmbeddingKind, ExtractContext, Extractor, IngestReport, Lang, LanguageModel, MediaType,
-    ParserVersion, RawAsset, SearchHit, SearchQuery, SourceConnector, SourceScope,
+    ParserVersion, RawAsset, SearchHit, SearchQuery, SourceScope,
    SourceUri, VectorRecord, VectorStore,
 };
 use kebab_llm_local::OllamaLanguageModel;
@@ -305,8 +305,8 @@ pub fn ingest_with_config_opts(
    );
    let connector = FsSourceConnector::new(&app.config)
        .context("kb-app::ingest: build FsSourceConnector")?;
-    let assets = connector
-        .scan(&scope)
+    let (assets, fs_skips) = connector
+        .scan_with_skips(&scope)
        .context("kb-app::ingest: scan workspace")?;
    crate::ingest_progress::emit(
        progress,
@@ -675,6 +675,12 @@ pub fn ingest_with_config_opts(
        errors: error_count,
        duration_ms,
        skipped_by_extension,
+        skipped_gitignore: fs_skips.skipped_gitignore,
+        skipped_kebabignore: fs_skips.skipped_kebabignore,
+        skipped_builtin_blacklist: fs_skips.skipped_builtin_blacklist,
+        skipped_generated: fs_skips.skipped_generated,
+        skipped_size_exceeded: fs_skips.skipped_size_exceeded,
+        skip_examples: fs_skips.skip_examples,
        items: if summary_only { None } else { Some(items) },
    })
 }
--- a/crates/kebab-app/src/schema.rs
+++ b/crates/kebab-app/src/schema.rs
@@ -45,7 +45,7 @@ pub struct Models {
    pub corpus_revision: u64,
 }

-#[derive(Debug, Clone, Serialize, Deserialize)]
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
 pub struct Stats {
    pub doc_count: u64,
    pub chunk_count: u64,
@@ -63,6 +63,14 @@ pub struct Stats {
    /// p9-fb-37: docs whose `updated_at` exceeds the staleness threshold.
    #[serde(default)]
    pub stale_doc_count: u64,
+    /// p10-1A-1: code language breakdown (chunk counts by canonical lowercase
+    /// language identifier). Empty until 1A-2 produces code chunks.
+    #[serde(default)]
+    pub code_lang_breakdown: std::collections::BTreeMap<String, u32>,
+    /// p10-1A-1: repo breakdown (chunk counts by `metadata.repo` value).
+    /// Empty until 1A-2 produces code chunks.
+    #[serde(default)]
+    pub repo_breakdown: std::collections::BTreeMap<String, u32>,
 }

 const KEBAB_VERSION: &str = env!("CARGO_PKG_VERSION");
@@ -158,6 +166,9 @@ fn collect_stats(
        lang_breakdown: counts.lang_breakdown,
        index_bytes,
        stale_doc_count: counts.stale_doc_count,
+        // p10-1A-1: populated by 1A-2 code ingest; empty until then.
+        code_lang_breakdown: std::collections::BTreeMap::new(),
+        repo_breakdown: std::collections::BTreeMap::new(),
    })
 }

@@ -182,6 +193,32 @@ fn collect_models(cfg: &Config, store: &kebab_store_sqlite::SqliteStore) -> Mode
 mod tests_stats_ext {
    use super::*;

+    /// p10-1A-1: Stats must serialize `code_lang_breakdown` and
+    /// `repo_breakdown` so downstream consumers (MCP skill, Claude Code)
+    /// can branch on their presence.
+    #[test]
+    fn stats_includes_code_lang_and_repo_breakdown_fields() {
+        let stats = Stats::default();
+        let v = serde_json::to_value(&stats).unwrap();
+        assert!(
+            v.get("code_lang_breakdown").is_some(),
+            "Stats JSON must include code_lang_breakdown: {v}"
+        );
+        assert!(
+            v.get("repo_breakdown").is_some(),
+            "Stats JSON must include repo_breakdown: {v}"
+        );
+        // Empty BTreeMap serializes as `{}` — confirm it's an object, not null.
+        assert!(
+            v["code_lang_breakdown"].is_object(),
+            "code_lang_breakdown must be an object: {v}"
+        );
+        assert!(
+            v["repo_breakdown"].is_object(),
+            "repo_breakdown must be an object: {v}"
+        );
+    }
+
    #[test]
    fn stats_includes_breakdowns_and_bytes_on_fresh_corpus() {
        let dir = tempfile::tempdir().unwrap();
--- a/crates/kebab-chunk/src/md_heading_v1.rs
+++ b/crates/kebab-chunk/src/md_heading_v1.rs
@@ -472,6 +472,10 @@ mod tests {
                trust_level: TrustLevel::Primary,
                user_id_alias: None,
                user: Default::default(),
+                repo: None,
+                git_branch: None,
+                git_commit: None,
+                code_lang: None,
            },
            provenance: Provenance { events: vec![] },
            parser_version: kebab_core::ParserVersion("test-parser-0".into()),
--- a/crates/kebab-chunk/src/pdf_page_v1.rs
+++ b/crates/kebab-chunk/src/pdf_page_v1.rs
@@ -347,6 +347,10 @@ mod tests {
                trust_level: TrustLevel::Primary,
                user_id_alias: None,
                user: Default::default(),
+                repo: None,
+                git_branch: None,
+                git_commit: None,
+                code_lang: None,
            },
            provenance: Provenance { events: vec![] },
            parser_version,
@@ -512,6 +516,10 @@ mod tests {
                trust_level: TrustLevel::Primary,
                user_id_alias: None,
                user: Default::default(),
+                repo: None,
+                git_branch: None,
+                git_commit: None,
+                code_lang: None,
            },
            provenance: Provenance { events: vec![] },
            parser_version,
--- a/crates/kebab-cli/src/main.rs
+++ b/crates/kebab-cli/src/main.rs
@@ -46,6 +46,11 @@ struct Cli {
    command: Cmd,
 }

+// p10-1A-1: adding `repo` and `code_lang` Vec<String> fields pushed `Cmd`
+// over clippy's large_enum_variant threshold. The enum is short-lived
+// (parsed once at startup, never cloned in a hot path) — boxing would add
+// noise with no real benefit.
+#[allow(clippy::large_enum_variant)]
 #[derive(Subcommand, Debug)]
 enum Cmd {
    /// Initialise XDG dirs + workspace + `config.toml`.
@@ -165,6 +170,18 @@ enum Cmd {
        #[arg(long)]
        doc_id: Option<String>,

+        /// p10-1A-1: filter by repo name (`metadata.repo`). Repeatable;
+        /// multi-value = OR.  Empty = no filter (all repos returned).
+        #[arg(long = "repo", value_name = "NAME", num_args = 1)]
+        repo: Vec<String>,
+
+        /// p10-1A-1: filter by code language identifier (lowercase
+        /// canonical).  Repeatable or comma-separated.
+        /// Examples: `rust`, `python`, `typescript`.
+        /// Unknown values produce empty hits.
+        #[arg(long = "code-lang", value_name = "LANG", num_args = 1, value_delimiter = ',')]
+        code_lang: Vec<String>,
+
        /// p9-fb-37: emit pre-fusion lexical / vector / RRF candidate
        /// lists + per-stage timing in the response. Bypasses cache
        /// (debug intent — fresh run guaranteed). Requires embeddings
@@ -688,6 +705,8 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
            media,
            ingested_after,
            doc_id,
+            repo,
+            code_lang,
            trace,
            bulk,
        } => {
@@ -819,7 +838,7 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
                    None => None,
                };

-            // p9-fb-36: build SearchFilters from the 7 new flags.
+            // p9-fb-36 + p10-1A-1: build SearchFilters from CLI flags.
            let filters = kebab_core::SearchFilters {
                tags_any: tag.clone(),
                lang: lang.as_ref().map(|s| kebab_core::Lang(s.clone())),
@@ -828,6 +847,8 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
                media: media_norm,
                ingested_after: ingested_after_parsed,
                doc_id: doc_id.as_ref().map(|s| kebab_core::DocumentId(s.clone())),
+                repo: repo.clone(),
+                code_lang: code_lang.clone(),
            };

            let q = kebab_core::SearchQuery {
--- a/crates/kebab-cli/src/wire.rs
+++ b/crates/kebab-cli/src/wire.rs
@@ -239,7 +239,7 @@ mod tests {

    #[test]
    fn ingest_wrapper_tags_schema_version() {
-        use kebab_core::SourceScope;
+        use kebab_core::{SkipExamples, SourceScope};
        let r = IngestReport {
            scope: SourceScope {
                root: std::path::PathBuf::from("/tmp"),
@@ -254,6 +254,12 @@ mod tests {
            errors: 0,
            duration_ms: 0,
            skipped_by_extension: std::collections::BTreeMap::new(),
+            skipped_gitignore: 0,
+            skipped_kebabignore: 0,
+            skipped_builtin_blacklist: 0,
+            skipped_generated: 0,
+            skipped_size_exceeded: 0,
+            skip_examples: SkipExamples::default(),
            items: None,
        };
        let v = wire_ingest(&r);
@@ -328,6 +334,8 @@ mod tests {
                lang_breakdown: Default::default(),
                index_bytes: Default::default(),
                stale_doc_count: 0,
+                // p10-1A-1: new fields added to Stats; use Default for the test fixture.
+                ..Default::default()
            },
        };
        let v = wire_schema(&schema);
--- a/crates/kebab-cli/tests/wire_citation_5_variants_unchanged.rs
+++ b/crates/kebab-cli/tests/wire_citation_5_variants_unchanged.rs
@@ -0,0 +1,100 @@
+//! p10-1A-1 Task 13: regression — the 5 original Citation variants
+//! (Line, Page, Region, Caption, Time) serialize byte-identically to
+//! pre-Task-1 form.  No spurious `code`, `line_start`, or `symbol` keys
+//! may leak into these variants.
+
+use kebab_core::{Citation, WorkspacePath};
+
+#[test]
+fn line_variant_serialization_unchanged() {
+    let c = Citation::Line {
+        path: WorkspacePath::new("a.md".into()).unwrap(),
+        start: 1,
+        end: 2,
+        section: Some("§14".into()),
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "line");
+    assert_eq!(v["start"], 1);
+    assert_eq!(v["end"], 2);
+    assert_eq!(v["section"], "§14");
+    // Must not bleed Code-variant keys.
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+    assert!(v.get("symbol").is_none(), "symbol must be absent: {v}");
+    assert!(v.get("code").is_none(), "code must be absent: {v}");
+}
+
+#[test]
+fn line_variant_null_section_omitted() {
+    let c = Citation::Line {
+        path: WorkspacePath::new("b.md".into()).unwrap(),
+        start: 5,
+        end: 10,
+        section: None,
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "line");
+    // A `None` section must carry no value: either omitted or serialized as `null`.
+    assert!(v.get("section").is_none() || v["section"].is_null());
+}
+
+#[test]
+fn page_variant_serialization_unchanged() {
+    let c = Citation::Page {
+        path: WorkspacePath::new("a.pdf".into()).unwrap(),
+        page: 13,
+        section: None,
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "page");
+    assert_eq!(v["page"], 13);
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+    assert!(v.get("symbol").is_none(), "symbol must be absent: {v}");
+}
+
+#[test]
+fn region_variant_serialization_unchanged() {
+    let c = Citation::Region {
+        path: WorkspacePath::new("img.png".into()).unwrap(),
+        x: 10,
+        y: 20,
+        w: 100,
+        h: 200,
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "region");
+    assert_eq!(v["x"], 10);
+    assert_eq!(v["y"], 20);
+    assert_eq!(v["w"], 100);
+    assert_eq!(v["h"], 200);
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+}
+
+#[test]
+fn caption_variant_serialization_unchanged() {
+    let c = Citation::Caption {
+        path: WorkspacePath::new("a.png".into()).unwrap(),
+        model: "qwen2.5-vl:7b".into(),
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "caption");
+    assert_eq!(v["model"], "qwen2.5-vl:7b");
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+}
+
+#[test]
+fn time_variant_serialization_unchanged() {
+    let c = Citation::Time {
+        path: WorkspacePath::new("audio.mp3".into()).unwrap(),
+        start_ms: 1000,
+        end_ms: 5000,
+        speaker: Some("Alice".into()),
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "time");
+    assert_eq!(v["start_ms"], 1000);
+    assert_eq!(v["end_ms"], 5000);
+    assert_eq!(v["speaker"], "Alice");
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+    assert!(v.get("symbol").is_none(), "symbol must be absent: {v}");
+}
--- a/crates/kebab-cli/tests/wire_search_filters_code.rs
+++ b/crates/kebab-cli/tests/wire_search_filters_code.rs
@@ -0,0 +1,72 @@
+//! p10-1A-1 Task 15: CLI accepts --repo and --code-lang flags.
+//!
+//! These tests verify that clap registers the new flags. They drive
+//! `kebab search --help` (which builds the full clap command and
+//! exits 0) and check that each flag appears in the help output.
+//! They do not exercise a search round-trip.
+
+use std::process::Command;
+
+fn kebab() -> Command {
+    Command::new(env!("CARGO_BIN_EXE_kebab"))
+}
+
+/// `kebab search --help` must exit 0 and mention `--repo`.
+#[test]
+fn cli_search_help_mentions_repo_flag() {
+    let out = kebab()
+        .args(["search", "--help"])
+        .output()
+        .expect("failed to run kebab");
+    // clap help exits 0.
+    assert!(
+        out.status.success(),
+        "kebab search --help exited non-zero: {:?}",
+        out.status
+    );
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    assert!(
+        stdout.contains("--repo"),
+        "--repo flag must appear in search help output:\n{stdout}"
+    );
+}
+
+/// `kebab search --help` must exit 0 and mention `--code-lang`.
+#[test]
+fn cli_search_help_mentions_code_lang_flag() {
+    let out = kebab()
+        .args(["search", "--help"])
+        .output()
+        .expect("failed to run kebab");
+    assert!(
+        out.status.success(),
+        "kebab search --help exited non-zero: {:?}",
+        out.status
+    );
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    assert!(
+        stdout.contains("--code-lang"),
+        "--code-lang flag must appear in search help output:\n{stdout}"
+    );
+}
+
+/// `kebab search --help` must exit 0 and mention `--media`.
+/// Confirms `--media code` value pathway is available (media is
+/// a free-form Vec<String> that already accepted arbitrary values).
+#[test]
+fn cli_search_help_mentions_media_flag() {
+    let out = kebab()
+        .args(["search", "--help"])
+        .output()
+        .expect("failed to run kebab");
+    assert!(
+        out.status.success(),
+        "kebab search --help exited non-zero: {:?}",
+        out.status
+    );
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    assert!(
+        stdout.contains("--media"),
+        "--media flag must appear in search help output:\n{stdout}"
+    );
+}
--- a/crates/kebab-cli/tests/wire_search_hit_no_code_fields.rs
+++ b/crates/kebab-cli/tests/wire_search_hit_no_code_fields.rs
@@ -0,0 +1,47 @@
+//! p10-1A-1 Task 13: regression — markdown SearchHit omits `repo` and
+//! `code_lang` from JSON when both are `None`.
+//!
+//! Proves that adding optional fields to SearchHit does not silently
+//! inject spurious keys into the existing markdown corpus wire shape.
+
+use kebab_core::{
+    Citation, ChunkId, ChunkerVersion, DocumentId, IndexVersion, RetrievalDetail, ScoreKind,
+    SearchHit, WorkspacePath,
+};
+
+#[test]
+fn markdown_hit_omits_repo_and_code_lang() {
+    let hit = SearchHit {
+        rank: 1,
+        chunk_id: ChunkId("c1".into()),
+        doc_id: DocumentId("d1".into()),
+        doc_path: WorkspacePath::new("notes/foo.md".into()).unwrap(),
+        heading_path: vec!["A".into(), "B".into()],
+        section_label: Some("B".into()),
+        snippet: "hi".into(),
+        citation: Citation::Line {
+            path: WorkspacePath::new("notes/foo.md".into()).unwrap(),
+            start: 1,
+            end: 2,
+            section: None,
+        },
+        retrieval: RetrievalDetail::default(),
+        index_version: IndexVersion("v1".into()),
+        embedding_model: None,
+        chunker_version: ChunkerVersion("md-heading-v1".into()),
+        indexed_at: time::OffsetDateTime::UNIX_EPOCH,
+        stale: false,
+        score_kind: ScoreKind::Rrf,
+        repo: None,
+        code_lang: None,
+    };
+    let s = serde_json::to_string(&hit).unwrap();
+    assert!(
+        !s.contains("\"repo\""),
+        "repo should be absent from markdown hit JSON: {s}"
+    );
+    assert!(
+        !s.contains("\"code_lang\""),
+        "code_lang should be absent from markdown hit JSON: {s}"
+    );
+}
--- a/crates/kebab-config/src/lib.rs
+++ b/crates/kebab-config/src/lib.rs
@@ -45,6 +45,11 @@ pub struct Config {
    /// `dark`).
    #[serde(default = "UiCfg::defaults")]
    pub ui: UiCfg,
+    /// p10-1A-1: code ingest settings. `#[serde(default)]` so existing
+    /// config files without an `[ingest]` / `[ingest.code]` section
+    /// load cleanly with built-in defaults.
+    #[serde(default)]
+    pub ingest: IngestCfg,
    /// p9-fb-05: directory of the on-disk config file this `Config`
    /// was loaded from, if any. Populated by `Config::from_file` /
    /// `Config::load` — never serialized (`#[serde(skip)]`). Used by
@@ -265,6 +270,52 @@ impl UiCfg {
    }
 }

+/// p10-1A-1: top-level ingest configuration wrapper. Contains per-media-type
+/// sub-sections; currently only `code` is defined.
+#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
+#[serde(default)]
+pub struct IngestCfg {
+    pub code: IngestCodeCfg,
+}
+
+/// p10-1A-1: settings for the code ingest pipeline. All fields have
+/// reasonable defaults so the user need not set anything in `config.toml`
+/// to get working code ingest.
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+#[serde(default)]
+pub struct IngestCodeCfg {
+    /// Generated header sniff. Reads first ~512 bytes, checks 7 markers.
+    pub skip_generated_header: bool,
+    /// Max byte size per file. Bigger files skipped.
+    pub max_file_bytes: u64,
+    /// Max line count per file. Bigger files skipped (byte cap checked first).
+    pub max_file_lines: u32,
+    /// User extra skip globs (gitignore syntax). Applied on top of built-in
+    /// + `.gitignore` + `.kebabignore`.
+    pub extra_skip_globs: Vec<String>,
+    /// AST chunk size cap. Functions/classes longer than this fall back to
+    /// paragraph-based split (1A-2 and later).
+    pub ast_chunk_max_lines: u32,
+    /// Tier 3 fallback chunker: lines per chunk.
+    pub fallback_lines_per_chunk: u32,
+    /// Tier 3 fallback chunker: line overlap between adjacent chunks.
+    pub fallback_lines_overlap: u32,
+}
+
+impl Default for IngestCodeCfg {
+    fn default() -> Self {
+        Self {
+            skip_generated_header: true,
+            max_file_bytes: 262_144,
+            max_file_lines: 5_000,
+            extra_skip_globs: vec![],
+            ast_chunk_max_lines: 200,
+            fallback_lines_per_chunk: 80,
+            fallback_lines_overlap: 20,
+        }
+    }
+}
+
 impl Config {
    /// Defaults per design §6.4.
    pub fn defaults() -> Self {
@@ -302,9 +353,9 @@ impl Config {
            models: ModelsCfg {
                embedding: EmbeddingModelCfg {
                    provider: "fastembed".to_string(),
-                    model: "multilingual-e5-small".to_string(),
+                    model: "multilingual-e5-large".to_string(),
                    version: "v1".to_string(),
-                    dimensions: 384,
+                    dimensions: 1024,
                    batch_size: 64,
                },
                llm: LlmCfg {
@@ -336,6 +387,7 @@ impl Config {
            },
            image: ImageCfg::defaults(),
            ui: UiCfg::defaults(),
+            ingest: IngestCfg::default(),
            // p9-fb-05: defaults are not loaded from disk, so no
            // source_dir. Relative `workspace.root` (rare with
            // defaults) falls back to caller `cwd` via the
@@ -764,7 +816,8 @@ mod tests {
        let c = Config::defaults();
        assert_eq!(c.rag.score_gate, 0.30);
        assert_eq!(c.chunking.target_tokens, 500);
-        assert_eq!(c.models.embedding.dimensions, 384);
+        assert_eq!(c.models.embedding.model, "multilingual-e5-large");
+        assert_eq!(c.models.embedding.dimensions, 1024);
        assert_eq!(c.search.rrf_k, 60);
    }

@@ -947,9 +1000,9 @@ chunker_version = "md-heading-v1"

 [models.embedding]
 provider = "fastembed"
-model = "multilingual-e5-small"
+model = "multilingual-e5-large"
 version = "v1"
-dimensions = 384
+dimensions = 1024
 batch_size = 64

 [models.llm]
@@ -1059,6 +1112,49 @@ max_context_tokens = 8000
            }
        }
    }
+
+    #[test]
+    fn ingest_code_cfg_defaults() {
+        let cfg: IngestCodeCfg = toml::from_str("").unwrap();
+        assert_eq!(cfg.max_file_bytes, 262_144);
+        assert_eq!(cfg.max_file_lines, 5_000);
+        assert!(cfg.skip_generated_header);
+        assert!(cfg.extra_skip_globs.is_empty());
+        assert_eq!(cfg.ast_chunk_max_lines, 200);
+        assert_eq!(cfg.fallback_lines_per_chunk, 80);
+        assert_eq!(cfg.fallback_lines_overlap, 20);
+    }
+
+    #[test]
+    fn ingest_code_cfg_user_override() {
+        let toml = r#"
+            max_file_bytes = 1048576
+            max_file_lines = 20000
+            skip_generated_header = false
+            extra_skip_globs = ["**/fixtures/**", "**/snapshots/**"]
+        "#;
+        let cfg: IngestCodeCfg = toml::from_str(toml).unwrap();
+        assert_eq!(cfg.max_file_bytes, 1_048_576);
+        assert_eq!(cfg.max_file_lines, 20_000);
+        assert!(!cfg.skip_generated_header);
+        assert_eq!(cfg.extra_skip_globs.len(), 2);
+    }
+
+    #[test]
+    fn config_with_ingest_code_section() {
+        // Build a full valid Config serialization and patch only the
+        // [ingest.code] field we care about — avoids having to enumerate
+        // every required Config field in the test fixture.
+        let base = Config::defaults();
+        let mut toml_text = toml::to_string(&base).unwrap();
+        // Inject max_file_bytes override into the [ingest.code] table.
+        toml_text = toml_text.replace(
+            "max_file_bytes = 262144",
+            "max_file_bytes = 524288",
+        );
+        let cfg: Config = toml::from_str(&toml_text).unwrap();
+        assert_eq!(cfg.ingest.code.max_file_bytes, 524_288);
+    }
 }

 #[cfg(test)]
--- a/crates/kebab-core/src/citation.rs
+++ b/crates/kebab-core/src/citation.rs
@@ -37,6 +37,13 @@ pub enum Citation {
        end_ms: u64,
        speaker: Option<String>,
    },
+    Code {
+        path: WorkspacePath,
+        line_start: u32,
+        line_end: u32,
+        symbol: Option<String>,
+        lang: Option<String>,
+    },
 }

 impl Citation {
@@ -46,7 +53,8 @@ impl Citation {
            | Citation::Page { path, .. }
            | Citation::Region { path, .. }
            | Citation::Caption { path, .. }
-            | Citation::Time { path, .. } => path,
+            | Citation::Time { path, .. }
+            | Citation::Code { path, .. } => path,
        }
    }

@@ -80,6 +88,18 @@ impl Citation {
                    None => format!("{}#t={},{}", path.0, s, e),
                }
            }
+            Citation::Code {
+                path,
+                line_start,
+                line_end,
+                ..
+            } => {
+                if line_start == line_end {
+                    format!("{}#L{}", path.0, line_start)
+                } else {
+                    format!("{}#L{}-L{}", path.0, line_start, line_end)
+                }
+            }
        }
    }

@@ -354,4 +374,64 @@ mod tests {
        let r = Citation::parse("notes/x#evil.md#L7");
        assert!(r.is_err(), "path with embedded '#' must be rejected");
    }
+
+    #[test]
+    fn citation_code_variant_serializes_with_kind_tag() {
+        let c = Citation::Code {
+            path: WorkspacePath("crates/kebab-chunk/src/md_heading_v1.rs".into()),
+            line_start: 142,
+            line_end: 168,
+            symbol: Some("MdHeadingV1Chunker::chunk_doc".into()),
+            lang: Some("rust".into()),
+        };
+        let v = serde_json::to_value(&c).unwrap();
+        assert_eq!(v["kind"], "code");
+        assert_eq!(v["line_start"], 142);
+        assert_eq!(v["line_end"], 168);
+        assert_eq!(v["symbol"], "MdHeadingV1Chunker::chunk_doc");
+        assert_eq!(v["lang"], "rust");
+        // Existing 5 variants must NOT pick up these fields.
+        let line = Citation::Line {
+            path: WorkspacePath("notes/foo.md".into()),
+            start: 1,
+            end: 10,
+            section: None,
+        };
+        let lv = serde_json::to_value(&line).unwrap();
+        assert!(lv.get("line_start").is_none());
+        assert!(lv.get("symbol").is_none());
+    }
+
+    #[test]
+    fn citation_code_uri_format() {
+        let c = Citation::Code {
+            path: WorkspacePath("a/b.rs".into()),
+            line_start: 10,
+            line_end: 20,
+            symbol: None,
+            lang: Some("rust".into()),
+        };
+        assert_eq!(c.to_uri(), "a/b.rs#L10-L20");
+        // Single-line uses `#L10`.
+        let single = Citation::Code {
+            path: WorkspacePath("a/b.rs".into()),
+            line_start: 5,
+            line_end: 5,
+            symbol: None,
+            lang: None,
+        };
+        assert_eq!(single.to_uri(), "a/b.rs#L5");
+    }
+
+    #[test]
+    fn citation_code_path_accessor() {
+        let c = Citation::Code {
+            path: WorkspacePath("x.rs".into()),
+            line_start: 1,
+            line_end: 1,
+            symbol: None,
+            lang: None,
+        };
+        assert_eq!(c.path().0, "x.rs");
+    }
 }
--- a/crates/kebab-core/src/ingest.rs
+++ b/crates/kebab-core/src/ingest.rs
@@ -25,10 +25,46 @@ pub struct IngestReport {
    /// extension key under "<no-ext>". `BTreeMap` so the wire JSON
    /// has stable key order across runs.
    pub skipped_by_extension: std::collections::BTreeMap<String, u32>,
+    /// p10-1A-1: files skipped because they matched a repo-local `.gitignore`.
+    #[serde(default)]
+    pub skipped_gitignore: u32,
+    /// p10-1A-1: files skipped because they matched a `.kebabignore` entry.
+    #[serde(default)]
+    pub skipped_kebabignore: u32,
+    /// p10-1A-1: files skipped because they matched the built-in safety-net
+    /// blacklist (`node_modules/`, `target/`, `__pycache__/`, `.venv/`,
+    /// `venv/`, `env/`).
+    #[serde(default)]
+    pub skipped_builtin_blacklist: u32,
+    /// p10-1A-1: files skipped because their first ~512 bytes contained a
+    /// generated-file marker (`@generated`, `do not edit`, …).
+    #[serde(default)]
+    pub skipped_generated: u32,
+    /// p10-1A-1: files skipped because they exceeded `max_file_bytes` or
+    /// `max_file_lines` in `[ingest.code]`.
+    #[serde(default)]
+    pub skipped_size_exceeded: u32,
+    /// p10-1A-1: sample file paths per skip category (≤ 5 each).
+    #[serde(default)]
+    pub skip_examples: SkipExamples,
    /// `None` ↔ wire `items: null` (`--summary-only`).
    pub items: Option<Vec<IngestItem>>,
 }

+/// p10-1A-1: per-category sample of skipped file paths. Each category caps at
+/// 5 entries (oldest-first). Used for debugging "why was X not indexed?"
+#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
+pub struct SkipExamples {
+    #[serde(default)]
+    pub generated: Vec<String>,
+    #[serde(default)]
+    pub size_exceeded: Vec<String>,
+    #[serde(default)]
+    pub builtin_blacklist: Vec<String>,
+    #[serde(default)]
+    pub gitignore: Vec<String>,
+}
+
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct IngestItem {
    pub kind: IngestItemKind,
@@ -58,3 +94,55 @@ pub enum IngestItemKind {
    Unchanged,
    Error,
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::traits::SourceScope;
+
+    #[test]
+    fn skip_examples_default_is_empty() {
+        let s = SkipExamples::default();
+        assert!(s.generated.is_empty());
+        assert!(s.size_exceeded.is_empty());
+        assert!(s.builtin_blacklist.is_empty());
+        assert!(s.gitignore.is_empty());
+    }
+
+    #[test]
+    fn ingest_report_skip_counters_serialize() {
+        let r = IngestReport {
+            scope: SourceScope {
+                root: std::path::PathBuf::from("/tmp"),
+                include: vec![],
+                exclude: vec![],
+            },
+            scanned: 100,
+            new: 50,
+            updated: 0,
+            skipped: 0,
+            unchanged: 0,
+            errors: 0,
+            duration_ms: 1234,
+            skipped_by_extension: Default::default(),
+            skipped_gitignore: 30,
+            skipped_kebabignore: 5,
+            skipped_builtin_blacklist: 10,
+            skipped_generated: 3,
+            skipped_size_exceeded: 2,
+            skip_examples: SkipExamples {
+                generated: vec!["a/b.pb.rs".into()],
+                size_exceeded: vec![],
+                builtin_blacklist: vec!["node_modules/x.js".into()],
+                gitignore: vec![],
+            },
+            items: None,
+        };
+        let v = serde_json::to_value(&r).unwrap();
+        assert_eq!(v["skipped_gitignore"], 30);
+        assert_eq!(v["skipped_builtin_blacklist"], 10);
+        assert_eq!(v["skipped_generated"], 3);
+        assert_eq!(v["skipped_size_exceeded"], 2);
+        assert_eq!(v["skip_examples"]["generated"][0], "a/b.pb.rs");
+    }
+}
--- a/crates/kebab-core/src/lib.rs
+++ b/crates/kebab-core/src/lib.rs
@@ -59,7 +59,7 @@ pub use answer::{
    Answer, AnswerCitation, AnswerRetrievalSummary, ModelRef, RefusalReason, TokenUsage,
    TraceId, Turn,
 };
-pub use ingest::{IngestItem, IngestItemKind, IngestReport};
+pub use ingest::{IngestItem, IngestItemKind, IngestReport, SkipExamples};
 pub use jobs::{JobFilter, JobId, JobKind, JobRow, JobStatus};
 pub use vector::{VectorHit, VectorRecord};
 pub use errors::CoreError;
--- a/crates/kebab-core/src/metadata.rs
+++ b/crates/kebab-core/src/metadata.rs
@@ -17,6 +17,25 @@ pub struct Metadata {
    pub user_id_alias: Option<String>,
    /// Frontmatter keys we don't recognise are preserved here per §0 Q9.
    pub user: Map<String, Value>,
+
+    /// p10-1A-1: name of the source repo if the file lives inside a git
+    /// working tree (`.git/` walk-up). null otherwise.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub repo: Option<String>,
+
+    /// p10-1A-1: HEAD branch at ingest time. null when no repo or detached HEAD.
+    /// Informational only — current-state observability, not a partition key.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub git_branch: Option<String>,
+
+    /// p10-1A-1: HEAD commit (40-hex) at ingest time. null when no repo.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub git_commit: Option<String>,
+
+    /// p10-1A-1: programming language identifier (lowercase canonical). null
+    /// for markdown / pdf / image. Set by `kebab_parse_code::lang::code_lang_for_path`.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub code_lang: Option<String>,
 }

 #[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]
@@ -66,3 +85,54 @@ pub enum ProvenanceKind {
    Warning,
    Error,
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn metadata_repo_fields_default_to_none_and_omit_when_serialized() {
+        let m = Metadata {
+            aliases: vec![],
+            tags: vec![],
+            created_at: time::OffsetDateTime::UNIX_EPOCH,
+            updated_at: time::OffsetDateTime::UNIX_EPOCH,
+            source_type: SourceType::Markdown,
+            trust_level: TrustLevel::Primary,
+            user_id_alias: None,
+            user: Default::default(),
+            repo: None,
+            git_branch: None,
+            git_commit: None,
+            code_lang: None,
+        };
+        let v = serde_json::to_value(&m).unwrap();
+        assert!(v.get("repo").is_none());
+        assert!(v.get("git_branch").is_none());
+        assert!(v.get("git_commit").is_none());
+        assert!(v.get("code_lang").is_none());
+    }
+
+    #[test]
+    fn metadata_repo_fields_present_when_some() {
+        let m = Metadata {
+            aliases: vec![],
+            tags: vec![],
+            created_at: time::OffsetDateTime::UNIX_EPOCH,
+            updated_at: time::OffsetDateTime::UNIX_EPOCH,
+            source_type: SourceType::Markdown,
+            trust_level: TrustLevel::Primary,
+            user_id_alias: None,
+            user: Default::default(),
+            repo: Some("kebab".into()),
+            git_branch: Some("main".into()),
+            git_commit: Some("a".repeat(40)),
+            code_lang: Some("rust".into()),
+        };
+        let v = serde_json::to_value(&m).unwrap();
+        assert_eq!(v["repo"], "kebab");
+        assert_eq!(v["git_branch"], "main");
+        assert_eq!(v["git_commit"].as_str().unwrap().len(), 40);
+        assert_eq!(v["code_lang"], "rust");
+    }
+}
--- a/crates/kebab-core/src/search.rs
+++ b/crates/kebab-core/src/search.rs
@@ -61,6 +61,14 @@ pub struct SearchFilters {
    /// p9-fb-36: restrict hits to a single document. None = no filter.
    #[serde(default)]
    pub doc_id: Option<DocumentId>,
+    /// p10-1A-1: filter by `metadata.repo`. Empty = no filter; multi-value = OR.
+    #[serde(default)]
+    pub repo: Vec<String>,
+    /// p10-1A-1: filter by `metadata.code_lang`. Empty = no filter; multi-value = OR.
+    /// Identifiers are lowercase canonical names (`rust`, `python`, `typescript`, ...).
+    /// Unknown values produce empty hits (consistent with `media` policy).
+    #[serde(default)]
+    pub code_lang: Vec<String>,
 }

 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
@@ -89,6 +97,15 @@ pub struct SearchHit {
    /// Absent on old wire payloads (pre-fb-38), this defaults to `Rrf`, since hybrid is the default mode.
    #[serde(default)]
    pub score_kind: ScoreKind,
+    /// p10-1A-1: optional. Filled when the source file lives in a git repo
+    /// (`.git/` walk-up). null for markdown / pdf / image hits and for code
+    /// hits ingested via `kebab ingest-file` outside a repo boundary.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub repo: Option<String>,
+    /// p10-1A-1: optional. Programming language identifier (lowercase). Set for
+    /// every code/manifest/k8s chunk; null for markdown / pdf / image hits.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub code_lang: Option<String>,
 }

 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
@@ -101,6 +118,19 @@ pub struct RetrievalDetail {
    pub vector_rank: Option<u32>,
 }

+impl Default for RetrievalDetail {
+    fn default() -> Self {
+        Self {
+            method: SearchMode::Hybrid,
+            fusion_score: 0.0,
+            lexical_score: None,
+            vector_score: None,
+            lexical_rank: None,
+            vector_rank: None,
+        }
+    }
+}
+
 /// Filter for `kb-app::list_docs` (§7.2 DocumentStore::list_documents).
 #[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
 pub struct DocFilter {
@@ -257,6 +287,8 @@ mod tests {
            indexed_at: datetime!(2026-05-09 12:00:00 UTC),
            stale: true,
            score_kind: ScoreKind::Rrf,
+            repo: None,
+            code_lang: None,
        };
        let v = serde_json::to_value(&hit).unwrap();
        assert_eq!(v["indexed_at"], "2026-05-09T12:00:00Z");
@@ -429,4 +461,74 @@ mod tests {
        assert!(v["response"].is_null());
        assert_eq!(v["error"]["code"], "config_invalid");
    }
+
+    #[test]
+    fn search_hit_repo_and_code_lang_are_optional_and_omit_when_none() {
+        let hit = SearchHit {
+            rank: 1,
+            chunk_id: ChunkId("c1".into()),
+            doc_id: DocumentId("d1".into()),
+            doc_path: WorkspacePath("a.md".into()),
+            heading_path: vec![],
+            section_label: None,
+            snippet: "".into(),
+            citation: Citation::Line {
+                path: WorkspacePath("a.md".into()),
+                start: 1,
+                end: 2,
+                section: None,
+            },
+            retrieval: RetrievalDetail::default(),
+            index_version: IndexVersion("v1".into()),
+            embedding_model: None,
+            chunker_version: ChunkerVersion("md-heading-v1".into()),
+            indexed_at: time::OffsetDateTime::UNIX_EPOCH,
+            stale: false,
+            score_kind: ScoreKind::Rrf,
+            repo: None,
+            code_lang: None,
+        };
+        let v = serde_json::to_value(&hit).unwrap();
+        assert!(v.get("repo").is_none(), "repo should be omitted when None");
+        assert!(v.get("code_lang").is_none(), "code_lang should be omitted when None");
+    }
+
+    #[test]
+    fn search_hit_repo_and_code_lang_present_when_some() {
+        let hit = SearchHit {
+            rank: 1,
+            chunk_id: ChunkId("c1".into()),
+            doc_id: DocumentId("d1".into()),
+            doc_path: WorkspacePath("a.rs".into()),
+            heading_path: vec![],
+            section_label: None,
+            snippet: "".into(),
+            citation: Citation::Code {
+                path: WorkspacePath("a.rs".into()),
+                line_start: 1,
+                line_end: 2,
+                symbol: None,
+                lang: Some("rust".into()),
+            },
+            retrieval: RetrievalDetail::default(),
+            index_version: IndexVersion("v1".into()),
+            embedding_model: None,
+            chunker_version: ChunkerVersion("code-rust-ast-v1".into()),
+            indexed_at: time::OffsetDateTime::UNIX_EPOCH,
+            stale: false,
+            score_kind: ScoreKind::Rrf,
+            repo: Some("kebab".into()),
+            code_lang: Some("rust".into()),
+        };
+        let v = serde_json::to_value(&hit).unwrap();
+        assert_eq!(v["repo"], "kebab");
+        assert_eq!(v["code_lang"], "rust");
+    }
+
+    #[test]
+    fn search_filters_repo_and_code_lang_default_to_empty_vec() {
+        let f = SearchFilters::default();
+        assert!(f.repo.is_empty());
+        assert!(f.code_lang.is_empty());
+    }
 }
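Reviewer note: the `repo` / `code_lang` filter semantics documented above ("Empty = no filter; multi-value = OR") can be sketched as a small predicate. This is a minimal illustration only. The function name `passes_filter` is hypothetical, and treating a hit with no metadata value as excluded by a non-empty filter is an assumption; the store-side filtering code is not part of this diff.

```rust
/// Hypothetical helper: does a hit whose metadata value is `value`
/// pass a `Vec<String>` filter such as `SearchFilters::repo`?
fn passes_filter(filter: &[String], value: Option<&str>) -> bool {
    // Empty filter means "no filter": every hit passes.
    if filter.is_empty() {
        return true;
    }
    match value {
        // Multi-value filters are OR-ed: any exact match is enough.
        Some(v) => filter.iter().any(|f| f.as_str() == v),
        // Assumption: a hit without the field cannot match a non-empty filter.
        None => false,
    }
}
```

Under this sketch an unknown identifier (for example `cobol`) simply matches nothing, which is consistent with the "unknown values produce empty hits" policy in the doc comment.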
--- a/crates/kebab-embed-local/Cargo.toml
+++ b/crates/kebab-embed-local/Cargo.toml
@@ -5,14 +5,14 @@ edition       = { workspace = true }
 rust-version  = { workspace = true }
 license       = { workspace = true }
 repository    = { workspace = true }
-description   = "Local fastembed-rs adapter implementing kb_core::Embedder (multilingual-e5-small default)"
+description   = "Local fastembed-rs adapter implementing kb_core::Embedder (multilingual-e5-large default, e5-small backwards-compat)"

 [dependencies]
 kebab-config = { path = "../kebab-config" }
 kebab-embed = { path = "../kebab-embed" }
 # Default features bring `ort-download-binaries` (bundled ONNX runtime)
 # and `hf-hub-native-tls` (first-run model download). No extra features
-# needed for the multilingual-e5-small path.
+# needed for the multilingual-e5-{small,large} paths.
 fastembed = { workspace = true }
 tracing   = { workspace = true }
 anyhow    = { workspace = true }
--- a/crates/kebab-embed-local/src/lib.rs
+++ b/crates/kebab-embed-local/src/lib.rs
@@ -1,8 +1,9 @@
 //! `kb-embed-local` — `FastembedEmbedder`, a local ONNX-backed
 //! [`Embedder`](kebab_embed::Embedder) implementation.
 //!
-//! Wraps [`fastembed::TextEmbedding`] for the default `multilingual-e5-small`
-//! (384-dim) model. Honors `config.models.embedding.batch_size` and applies
+//! Wraps [`fastembed::TextEmbedding`]. Default is `multilingual-e5-large`
+//! (1024-dim, p9-fb-39b); `multilingual-e5-small` (384-dim) is also supported
+//! for backwards-compat. Honors `config.models.embedding.batch_size` and applies
 //! the e5 prefix convention (§11.3 of the design report):
 //!
 //! * `EmbeddingKind::Document` → `"passage: "` prefix
@@ -69,9 +70,9 @@ impl FastembedEmbedder {
            .with_context(|| format!("create fastembed cache dir {}", cache_dir.display()))?;

        // 2. Resolve the fastembed enum variant from
-        //    `config.models.embedding.model`. Currently only the default
-        //    `multilingual-e5-small` is wired; other model names error
-        //    out with a clear message rather than silently misconfiguring.
+        //    `config.models.embedding.model`. Currently `multilingual-e5-large`
+        //    (default) and `multilingual-e5-small` are wired; other model names
+        //    error out with a clear message rather than silently misconfiguring.
        let model_name = resolve_model(&config.models.embedding.model)?;

        // 3. Verify dim match BEFORE loading the model — if the config
@@ -100,7 +101,7 @@ impl FastembedEmbedder {
            target: "kebab-embed-local",
            model = %config.models.embedding.model,
            cache_dir = %cache_dir.display(),
-            "loading embedding model (first run will download ~470MB)"
+            "loading embedding model (first run downloads model weights — ~470MB for e5-small, ~1.3GB for e5-large)"
        );
        let inner = TextEmbedding::try_new(opts)
            .context("fastembed: TextEmbedding::try_new")?;
@@ -193,17 +194,18 @@ fn prefix_input(input: &EmbeddingInput<'_>) -> String {
 }

 /// Resolve a `config.models.embedding.model` string to a fastembed
-/// `EmbeddingModel` enum variant. Only `multilingual-e5-small` is wired
-/// for p3-2; additional model names should be added (and their dims
-/// pinned in tests) as needed.
+/// `EmbeddingModel` enum variant. Currently supports `multilingual-e5-small`
+/// (384-dim) and `multilingual-e5-large` (1024-dim); additional model names
+/// should be added (and their dims pinned in tests) as needed.
 fn resolve_model(name: &str) -> Result<EmbeddingModel> {
    match name {
        "multilingual-e5-small" => Ok(EmbeddingModel::MultilingualE5Small),
+        "multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),
        other => anyhow::bail!(
            "kb-embed-local: unsupported embedding model {other:?}; \
-             this adapter currently only ships `multilingual-e5-small`. \
-             Add a new arm to `resolve_model` (and a fastembed feature \
-             flag if needed) to support more."
+             this adapter currently ships `multilingual-e5-small` and \
+             `multilingual-e5-large`. Add a new arm to `resolve_model` \
+             (and a fastembed feature flag if needed) to support more."
        ),
    }
 }
@@ -294,6 +296,12 @@ mod tests {
        resolve_model("multilingual-e5-small").expect("default model resolves");
    }

+    #[test]
+    fn resolve_model_supports_e5_large() {
+        let m = resolve_model("multilingual-e5-large").expect("e5-large should resolve");
+        let _ = m;
+    }
+
    #[test]
    fn resolve_unknown_model_errors() {
        let err = resolve_model("not-a-real-model").expect_err("unknown model errors");
@@ -301,6 +309,21 @@ mod tests {
        assert!(msg.contains("unsupported embedding model"), "msg={msg}");
    }

+    // ── check_dim ────────────────────────────────────────────────────
+
+    #[test]
+    fn check_dim_passes_for_1024() {
+        check_dim(1024, 1024).expect("matching dims must pass");
+    }
+
+    #[test]
+    fn check_dim_rejects_384_vs_1024() {
+        let err = check_dim(384, 1024).expect_err("dim mismatch must error");
+        let msg = format!("{err}");
+        assert!(msg.contains("384") && msg.contains("1024"),
+            "error must mention both dims, got: {msg}");
+    }
+
    // expand_path tests live in `kb-config::paths`. The adapter imports
    // it and trusts the upstream coverage rather than duplicating it.
 }
--- a/crates/kebab-embed-local/tests/embed_model.rs
+++ b/crates/kebab-embed-local/tests/embed_model.rs
@@ -3,10 +3,11 @@
 //!
 //! ## Why every test in this file is `#[ignore]`
 //!
-//! The first call to `FastembedEmbedder::new` downloads ~470 MB of
-//! weights from Hugging Face into `data_dir/models/fastembed/`. Doing
-//! that on every `cargo test` invocation is wasteful, so the bare
-//! invocation skips this file entirely.
+//! The first call to `FastembedEmbedder::new` downloads ~1.3 GB of
+//! weights (multilingual-e5-large per p9-fb-39b default) from Hugging
+//! Face into `data_dir/models/fastembed/`. Doing that on every
+//! `cargo test` invocation is wasteful, so the bare invocation skips
+//! this file entirely.
 //!
 //! Run the full suite with:
 //! ```text
@@ -58,19 +59,20 @@ fn shared_embedder() -> &'static FastembedEmbedder {
 // ─── construction ─────────────────────────────────────────────────────

 #[test]
-#[ignore = "downloads ~470MB ONNX model on first run; CI-only"]
-fn default_config_constructs_with_dims_384() {
+#[ignore = "downloads ~1.3GB ONNX model on first run; CI-only"]
+fn default_config_constructs_with_dims_1024() {
+    // p9-fb-39b: default flipped to multilingual-e5-large (1024 dim).
    let emb = shared_embedder();
-    assert_eq!(emb.dimensions(), 384);
-    assert_eq!(emb.model_id().0, "multilingual-e5-small");
+    assert_eq!(emb.dimensions(), 1024);
+    assert_eq!(emb.model_id().0, "multilingual-e5-large");
    assert_eq!(emb.model_version().0, "v1");
 }

 #[test]
-#[ignore = "downloads ~470MB ONNX model on first run; CI-only"]
+#[ignore = "downloads ~1.3GB ONNX model on first run; CI-only"]
 fn mismatched_dims_in_config_errors_at_construction() {
    let (mut cfg, _tmp) = test_config();
-    cfg.models.embedding.dimensions = 512; // model is 384
+    cfg.models.embedding.dimensions = 512; // model is 1024 (e5-large default)
    // `FastembedEmbedder` deliberately does not implement `Debug`
    // (its inner ONNX session has no useful debug shape), so we
    // can't use `expect_err`; match the Result manually.
@@ -80,7 +82,7 @@ fn mismatched_dims_in_config_errors_at_construction() {
    };
    let msg = format!("{err}");
    assert!(msg.contains("dimension mismatch"), "msg={msg}");
-    assert!(msg.contains("384"), "msg={msg}");
+    assert!(msg.contains("1024"), "msg={msg}");
    assert!(msg.contains("512"), "msg={msg}");
 }

@@ -104,8 +106,8 @@ fn document_and_query_yield_different_vectors() {
        ])
        .expect("embed two inputs");
    assert_eq!(out.len(), 2);
-    assert_eq!(out[0].len(), 384);
-    assert_eq!(out[1].len(), 384);
+    assert_eq!(out[0].len(), 1024);
+    assert_eq!(out[1].len(), 1024);

    // Both vectors are L2-normalized → cosine similarity == dot product.
    let cos: f32 = out[0]
@@ -142,11 +144,11 @@ fn output_vectors_are_l2_normalized() {
    ];
    let out = emb.embed(&inputs).expect("embed");
    // Per `kebab_embed::assert_unit_norm` docs: `5e-4` is the safe bound at
-    // 384 dims (f32::EPSILON × √384 ≈ 2.3e-6, but ONNX kernels add
+    // 1024 dims (f32::EPSILON × √1024 ≈ 3.8e-6, but ONNX kernels add
    // their own per-component noise; 1e-3 is very generous and matches
    // the spec's `± 1e-3`).
    kebab_embed::assert_unit_norm(&out, 1e-3);
-    kebab_embed::assert_vector_shape(&out, 384);
+    kebab_embed::assert_vector_shape(&out, 1024);
 }

 // ─── determinism ──────────────────────────────────────────────────────
@@ -254,7 +256,7 @@ fn snapshot_aggregate_hash_is_stable() {
    // Round every component to 4 decimal places, hash deterministically.
    let mut hasher = DefaultHasher::new();
    for (i, v) in out.iter().enumerate() {
-        assert_eq!(v.len(), 384, "row {i} dim mismatch");
+        assert_eq!(v.len(), 1024, "row {i} dim mismatch");
        for x in v {
            let rounded: i32 = (*x * 1.0e4).round() as i32;
            rounded.hash(&mut hasher);
--- a/crates/kebab-eval/src/compare.rs
+++ b/crates/kebab-eval/src/compare.rs
@@ -184,6 +184,18 @@ pub fn render_report_md(report: &CompareReport) -> String {
            ),
        );
    }
+    for k in crate::metrics::TOP_K_VARIANTS {
+        let _ = writeln!(
+            out,
+            "| precision@{k}_chunk | {} | {} | {} |",
+            fmt(a.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN)),
+            fmt(b.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN)),
+            fmt_delta(
+                a.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
+                b.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
+            ),
+        );
+    }
    let _ = writeln!(
        out,
        "| citation_coverage | {} | {} | {} |",
@@ -419,6 +431,7 @@ fn build_deltas(
    }
    let mut hit = serde_json::Map::new();
    let mut recall = serde_json::Map::new();
+    let mut precision = serde_json::Map::new();
    for k in crate::metrics::TOP_K_VARIANTS {
        hit.insert(
            k.to_string(),
@@ -434,11 +447,19 @@ fn build_deltas(
                b.recall_at_k_doc.get(k).copied().unwrap_or(f32::NAN),
            ),
        );
+        precision.insert(
+            k.to_string(),
+            d(
+                a.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
+                b.precision_at_k_chunk.get(k).copied().unwrap_or(f32::NAN),
+            ),
+        );
    }
    serde_json::json!({
        "hit_at_k": hit,
        "mrr": d(a.mrr, b.mrr),
        "recall_at_k_doc": recall,
+        "precision_at_k_chunk": precision,
        "citation_coverage": d(a.citation_coverage, b.citation_coverage),
        "groundedness": d(a.groundedness, b.groundedness),
        "empty_result_rate": d(a.empty_result_rate, b.empty_result_rate),
@@ -484,6 +505,7 @@ mod tests {
            hit_at_k: Default::default(),
            mrr: 0.5,
            recall_at_k_doc: Default::default(),
+            precision_at_k_chunk: Default::default(),
            citation_coverage: f32::NAN,
            groundedness: 0.0,
            empty_result_rate: 0.0,
--- a/crates/kebab-eval/src/metrics.rs
+++ b/crates/kebab-eval/src/metrics.rs
@@ -58,6 +58,14 @@ pub struct AggregateMetrics {
    pub hit_at_k: BTreeMap<u32, f32>,
    pub mrr: f32,
    pub recall_at_k_doc: BTreeMap<u32, f32>,
+    /// p9-fb-39: chunk-level precision at k. Binary relevance via
+    /// `expected_chunk_ids` (a hit is "relevant" if its chunk_id is
+    /// in the golden's `expected_chunk_ids`). Denominator is k (fixed)
+    /// — `hits.len() < k` still divides by k, treating shortfall as
+    /// precision loss (mirrors `hit_at_k`). Queries with empty
+    /// `expected_chunk_ids` are skipped (mirrors `hit_at_k_chunk`).
+    #[serde(default)]
+    pub precision_at_k_chunk: BTreeMap<u32, f32>,
    #[serde(
        serialize_with = "serialize_f32_nan_as_null",
        deserialize_with = "deserialize_f32_or_nan"
@@ -187,6 +195,8 @@ pub(crate) fn aggregate_from_rows(
        TOP_K_VARIANTS.iter().map(|k| (*k, (0_u32, 0_u32))).collect();
    let mut recall_at_k_doc: BTreeMap<u32, (f64, u32)> =
        TOP_K_VARIANTS.iter().map(|k| (*k, (0.0_f64, 0_u32))).collect();
+    let mut precision_at_k_chunk: BTreeMap<u32, (f64, u32)> =
+        TOP_K_VARIANTS.iter().map(|k| (*k, (0.0_f64, 0_u32))).collect();

    let mut mrr_sum: f64 = 0.0;
    let mut mrr_denom: u32 = 0;
@@ -243,6 +253,18 @@ pub(crate) fn aggregate_from_rows(
            {
                mrr_sum += 1.0 / f64::from(rank);
            }
+            // p9-fb-39: precision@k_chunk — count of top-k hits whose
+            // chunk_id is in `expected`, divided by k (fixed denominator).
+            for k in TOP_K_VARIANTS {
+                let hits_in_topk_relevant = qr
+                    .hits_top_k
+                    .iter()
+                    .filter(|h| h.rank <= *k && expected.contains(&h.chunk_id))
+                    .count();
+                let entry = precision_at_k_chunk.get_mut(k).expect("init");
+                entry.0 += hits_in_topk_relevant as f64 / f64::from(*k);
+                entry.1 += 1;
+            }
        }

        // recall@k_doc (doc-level, requires non-empty expected_doc_ids
@@ -316,7 +338,8 @@ pub(crate) fn aggregate_from_rows(
                        | Citation::Page { path, .. }
                        | Citation::Region { path, .. }
                        | Citation::Caption { path, .. }
-                        | Citation::Time { path, .. } => !path.0.is_empty(),
+                        | Citation::Time { path, .. }
+                        | Citation::Code { path, .. } => !path.0.is_empty(),
                    });
                if covered {
                    citation_num += 1;
@@ -333,6 +356,7 @@ pub(crate) fn aggregate_from_rows(
            mrr_sum / f64::from(mrr_denom)
        }),
        recall_at_k_doc: round_recall_map(&recall_at_k_doc),
+        precision_at_k_chunk: round_recall_map(&precision_at_k_chunk),
        citation_coverage: ratio_or_nan(citation_num, citation_denom),
        groundedness: ratio_or_zero(groundedness_num, groundedness_denom),
        empty_result_rate: ratio_or_zero(empty_result_count, total_queries),
@@ -449,6 +473,8 @@ mod tests {
            indexed_at: OffsetDateTime::UNIX_EPOCH,
            stale: false,
            score_kind: kebab_core::ScoreKind::Rrf,
+            repo: None,
+            code_lang: None,
        }
    }

@@ -674,4 +700,114 @@ mod tests {
        assert_eq!(agg.failed_queries, 1);
        assert_eq!(agg.total_queries, 1);
    }
+
+    #[test]
+    fn precision_at_k_chunk_field_default_empty_on_old_json() {
+        // Old eval_runs.metrics_json predates fb-39 — no precision_at_k_chunk field.
+        // serde(default) yields empty BTreeMap.
+        let old = serde_json::json!({
+            "hit_at_k": {"1": 0.5, "3": 0.5, "5": 0.5, "10": 0.5},
+            "mrr": 0.5,
+            "recall_at_k_doc": {"1": 0.0, "3": 0.0, "5": 0.0, "10": 0.0},
+            "citation_coverage": null,
+            "groundedness": 0.0,
+            "empty_result_rate": 0.0,
+            "refusal_correctness": null,
+            "total_queries": 1,
+            "failed_queries": 0
+        });
+        let parsed: AggregateMetrics =
+            serde_json::from_value(old).expect("backwards-compat deserialize");
+        assert!(parsed.precision_at_k_chunk.is_empty());
+    }
+
+    #[test]
+    fn precision_at_k_chunk_exact_match() {
+        // expected = [c1, c2, c3]. Top-5 hits: [c1@1, c2@2, c3@3, x@4, y@5].
+        // P@5 = 3/5 = 0.6. P@10 = 3/10 = 0.3.
+        let queries = vec![gq("q1", &["c1", "c2", "c3"], &["d1"])];
+        let rows = vec![record(
+            "q1",
+            vec![
+                hit(1, "c1", "d1"),
+                hit(2, "c2", "d1"),
+                hit(3, "c3", "d1"),
+                hit(4, "x", "d1"),
+                hit(5, "y", "d1"),
+            ],
+            None,
+            None,
+        )];
+        let agg = aggregate_from_rows(&queries, &rows).unwrap();
+        assert_eq!(agg.precision_at_k_chunk[&5], 0.6);
+        assert_eq!(agg.precision_at_k_chunk[&10], 0.3);
+    }
+
+    #[test]
+    fn precision_at_k_chunk_partial_topk_divides_by_k() {
+        // expected = [c1, c2]. Hits: only [c1@1, c2@2, x@3] (3 results).
+        // P@5 = 2/5 = 0.4 (denominator is k, not hits.len()).
+        let queries = vec![gq("q1", &["c1", "c2"], &["d1"])];
+        let rows = vec![record(
+            "q1",
+            vec![hit(1, "c1", "d1"), hit(2, "c2", "d1"), hit(3, "x", "d1")],
+            None,
+            None,
+        )];
+        let agg = aggregate_from_rows(&queries, &rows).unwrap();
+        assert_eq!(agg.precision_at_k_chunk[&5], 0.4);
+        assert_eq!(agg.precision_at_k_chunk[&10], 0.2);
+    }
+
+    #[test]
+    fn precision_at_k_chunk_zero_relevant_in_topk() {
+        // expected = [c1]. Hits: [x@1, y@2, z@3] (none relevant).
+        // P@5 = 0/5 = 0.0.
+        let queries = vec![gq("q1", &["c1"], &["d1"])];
+        let rows = vec![record(
+            "q1",
+            vec![hit(1, "x", "d1"), hit(2, "y", "d1"), hit(3, "z", "d1")],
+            None,
+            None,
+        )];
+        let agg = aggregate_from_rows(&queries, &rows).unwrap();
+        assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
+    }
+
+    #[test]
+    fn precision_at_k_chunk_empty_expected_skipped() {
+        // expected_chunk_ids = []. Skipped → final BTreeMap entry value = 0.0
+        // (zero-denom path in round_recall_map). Mirrors recall_at_k_doc behavior.
+        let queries = vec![gq("q1", &[], &["d1"])];
+        let rows = vec![record("q1", vec![hit(1, "c1", "d1")], None, None)];
+        let agg = aggregate_from_rows(&queries, &rows).unwrap();
+        assert_eq!(agg.precision_at_k_chunk[&5], 0.0);
+    }
+
+    #[test]
+    fn precision_at_k_chunk_two_queries_averaged() {
+        // q1: expected=[c1], hits=[c1@1, x@2, y@3]   → P@5 = 1/5 = 0.2
+        // q2: expected=[c1, c2], hits=[c1@1, c2@2]  → P@5 = 2/5 = 0.4
+        // Avg P@5 = 0.3.
+        let queries = vec![
+            gq("q1", &["c1"], &["d1"]),
+            gq("q2", &["c1", "c2"], &["d2"]),
+        ];
+        let rows = vec![
+            record(
+                "q1",
+                vec![hit(1, "c1", "d1"), hit(2, "x", "d1"), hit(3, "y", "d1")],
+                None,
+                None,
+            ),
+            record(
+                "q2",
+                vec![hit(1, "c1", "d2"), hit(2, "c2", "d2")],
+                None,
+                None,
+            ),
+        ];
+        let agg = aggregate_from_rows(&queries, &rows).unwrap();
+        assert_eq!(agg.precision_at_k_chunk[&5], 0.3);
+    }
 }
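Reviewer note: the fixed-denominator rule for `precision_at_k_chunk` (divide by k even when fewer than k hits come back, skip queries with no expected chunks) can be checked in isolation with a standalone sketch. The function name `precision_at_k` and its slice/set signature are hypothetical and are not the crate's API; the real aggregation works on `hits_top_k` rows and rounds through `round_recall_map`.

```rust
use std::collections::HashSet;

/// Hypothetical standalone version of the per-query computation.
/// `ranked` holds chunk ids in rank order (rank 1 first).
/// Returns `None` when the query would be skipped (no expected chunks).
fn precision_at_k(ranked: &[&str], expected: &HashSet<&str>, k: usize) -> Option<f64> {
    if expected.is_empty() || k == 0 {
        return None;
    }
    // Count relevant hits among the first k results only.
    let relevant = ranked
        .iter()
        .take(k)
        .filter(|id| expected.contains(*id))
        .count();
    // Denominator is always k, so a short result list lowers precision.
    Some(relevant as f64 / k as f64)
}
```

With `expected = {c1, c2}` and only three hits `[c1, c2, x]`, this gives 2/5 at k = 5, matching the `partial_topk_divides_by_k` test above.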
--- a/crates/kebab-eval/tests/fixtures/eval/compare-1.json
+++ b/crates/kebab-eval/tests/fixtures/eval/compare-1.json
@@ -11,6 +11,12 @@
      "5": 0.666700005531311
    },
    "mrr": 0.41670000553131104,
+    "precision_at_k_chunk": {
+      "1": 0.33329999446868896,
+      "10": 0.06669999659061432,
+      "3": 0.11110000312328339,
+      "5": 0.13330000638961792
+    },
    "recall_at_k_doc": {
      "1": 0.33329999446868896,
      "10": 0.666700005531311,
@@ -32,6 +38,12 @@
      "5": 1.0
    },
    "mrr": 0.833299994468689,
+    "precision_at_k_chunk": {
+      "1": 0.666700005531311,
+      "10": 0.10000000149011612,
+      "3": 0.33329999446868896,
+      "5": 0.20000000298023224
+    },
    "recall_at_k_doc": {
      "1": 0.666700005531311,
      "10": 1.0,
@@ -53,6 +65,12 @@
      "5": 0.33329999446868896
    },
    "mrr": 0.41659998893737793,
+    "precision_at_k_chunk": {
+      "1": 0.33340001106262207,
+      "10": 0.0333000048995018,
+      "3": 0.22219999134540558,
+      "5": 0.06669999659061432
+    },
    "recall_at_k_doc": {
      "1": 0.33340001106262207,
      "10": 0.33329999446868896,
--- a/crates/kebab-eval/tests/metrics_and_compare.rs
+++ b/crates/kebab-eval/tests/metrics_and_compare.rs
@@ -87,6 +87,8 @@ fn hit(rank: u32, chunk_id: &str, doc_id: &str) -> SearchHit {
        indexed_at: OffsetDateTime::UNIX_EPOCH,
        stale: false,
        score_kind: kebab_core::ScoreKind::Rrf,
+        repo: None,
+        code_lang: None,
    }
 }

@@ -203,6 +205,7 @@ fn store_aggregate_rejects_missing_run() {
        hit_at_k: Default::default(),
        mrr: 0.0,
        recall_at_k_doc: Default::default(),
+        precision_at_k_chunk: Default::default(),
        citation_coverage: f32::NAN,
        groundedness: 0.0,
        empty_result_rate: 0.0,
--- a/crates/kebab-mcp/src/tools/search.rs
+++ b/crates/kebab-mcp/src/tools/search.rs
@@ -110,6 +110,8 @@ pub fn handle(state: &KebabAppState, input: SearchInput) -> CallToolResult {
        media,
        ingested_after,
        doc_id: input.doc_id.clone().map(kebab_core::DocumentId),
+        repo: vec![],
+        code_lang: vec![],
    };

    let query = kebab_core::SearchQuery {
--- a/crates/kebab-normalize/src/lib.rs
+++ b/crates/kebab-normalize/src/lib.rs
@@ -467,6 +467,10 @@ mod tests {
            trust_level: TrustLevel::Primary,
            user_id_alias: None,
            user,
+            repo: None,
+            git_branch: None,
+            git_commit: None,
+            code_lang: None,
        }
    }

--- a/crates/kebab-parse-code/Cargo.toml
+++ b/crates/kebab-parse-code/Cargo.toml
@@ -0,0 +1,15 @@
+[package]
+name = "kebab-parse-code"
+version       = { workspace = true }
+edition       = { workspace = true }
+rust-version  = { workspace = true }
+license       = { workspace = true }
+repository    = { workspace = true }
+description   = "Language-aware code parsing infrastructure (lang dispatch, .git/ detect, skip helpers) for the kebab pipeline (P10-1A-1)"
+
+[dependencies]
+anyhow      = { workspace = true }
+gix         = { workspace = true }
+
+[dev-dependencies]
+tempfile = { workspace = true }
--- a/crates/kebab-parse-code/src/lang.rs
+++ b/crates/kebab-parse-code/src/lang.rs
@@ -0,0 +1,42 @@
+//! Canonical extension → language identifier mapping (spec §3.5).
+//!
+//! Lowercase canonical identifiers, matching tree-sitter parser conventions:
+//! `rust`, `python`, `typescript`, `javascript`, `go`, `java`, `kotlin`, `c`,
+//! `cpp`, `yaml`, `toml`, `json`, `shell`, `make`, `dockerfile`.
+
+use std::path::Path;
+
+/// Returns the canonical language identifier for a given file path, or
+/// `None` if the extension / filename is not recognized.
+///
+/// Matching priority:
+///   1. exact filename match (e.g. `Dockerfile`, `Makefile`)
+///   2. lowercase extension match
+pub fn code_lang_for_path(path: &Path) -> Option<&'static str> {
+    if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
+        match name {
+            "Dockerfile" => return Some("dockerfile"),
+            "Makefile" | "GNUmakefile" => return Some("make"),
+            _ => {}
+        }
+    }
+    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
+    match ext.as_str() {
+        "rs" => Some("rust"),
+        "py" | "pyi" => Some("python"),
+        "ts" | "tsx" => Some("typescript"),
+        "js" | "mjs" | "cjs" | "jsx" => Some("javascript"),
+        "go" => Some("go"),
+        "java" => Some("java"),
+        "kt" | "kts" => Some("kotlin"),
+        "c" | "h" => Some("c"),
+        "cpp" | "cc" | "cxx" | "hpp" | "hh" | "hxx" => Some("cpp"),
+        "yaml" | "yml" => Some("yaml"),
+        "toml" => Some("toml"),
+        "json" => Some("json"),
+        "sh" | "bash" | "zsh" => Some("shell"),
+        "mk" => Some("make"),
+        "dockerfile" => Some("dockerfile"),
+        _ => None,
+    }
+}
--- a/crates/kebab-parse-code/src/lib.rs
+++ b/crates/kebab-parse-code/src/lib.rs
@@ -0,0 +1,22 @@
+//! `kebab-parse-code` — language-aware parsing for code corpora.
+//!
+//! Phase 1A-1 ships infrastructure only:
+//!
+//! - [`lang::code_lang_for_path`] — extension → language identifier.
+//! - [`repo::detect_repo`] — `.git/` walk-up → repo / branch / commit metadata.
+//! - [`skip::is_generated_file`] / [`skip::is_oversized`] — pre-ingest skip
+//!   helpers consulted by `kebab-source-fs`.
+//! - [`skip::BUILTIN_BLACKLIST`] — 6-entry safety-net pattern list.
+//!
+//! Per-language parser modules (`rust`, `python`, `typescript`, …) land in
+//! later phases (1A-2 onwards). The crate boundary follows other
+//! `kebab-parse-*` crates per design §8: must NOT depend on store / embed
+//! / llm / rag.
+
+pub mod lang;
+pub mod repo;
+pub mod skip;
+
+pub use lang::code_lang_for_path;
+pub use repo::{RepoMeta, detect_repo};
+pub use skip::{BUILTIN_BLACKLIST, is_generated_file, is_oversized};
--- a/crates/kebab-parse-code/src/repo.rs
+++ b/crates/kebab-parse-code/src/repo.rs
@@ -0,0 +1,61 @@
+//! Git repo auto-detection (spec §5.1).
+//!
+//! Walks up from `path` looking for a `.git/` directory. If found, reads
+//! repo dir name, current branch, and HEAD commit using `gix` (pure Rust;
+//! no `git` binary on PATH required).
+
+use std::path::Path;
+
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub struct RepoMeta {
+    pub name: String,
+    pub branch: Option<String>,
+    pub commit: Option<String>,
+}
+
+/// Walk up from `path` until a `.git/` directory is found. Returns repo
+/// metadata, or `None` if no repo boundary is reached before the filesystem
+/// root.
+///
+/// - `name`: name of the directory that contains `.git/`.
+/// - `branch`: current HEAD branch, `"detached"` if HEAD is detached or its
+///   name can't be read, or `None` if the repository can't be opened.
+/// - `commit`: 40-hex commit SHA at HEAD, or `None` for an empty repo or a
+///   read failure.
+///
+/// A `.git` *file* (worktree marker / submodule) yields `None` for `branch`
+/// and `commit`; `name` is the name of the directory holding that file.
+pub fn detect_repo(path: &Path) -> Option<RepoMeta> {
+    let mut cur = if path.is_dir() { path } else { path.parent()? };
+    loop {
+        let dotgit = cur.join(".git");
+        if dotgit.is_dir() {
+            let name = cur.file_name()?.to_string_lossy().into_owned();
+            let (branch, commit) = read_head(cur);
+            return Some(RepoMeta { name, branch, commit });
+        } else if dotgit.is_file() {
+            let name = cur.file_name()?.to_string_lossy().into_owned();
+            return Some(RepoMeta { name, branch: None, commit: None });
+        }
+        cur = cur.parent()?;
+    }
+}
+
+fn read_head(repo_dir: &Path) -> (Option<String>, Option<String>) {
+    match gix::open(repo_dir) {
+        Ok(repo) => {
+            let branch = repo
+                .head_name()
+                .ok()
+                .flatten()
+                .map(|n| n.shorten().to_string())
+                .or_else(|| Some("detached".to_string()));
+            let commit = repo
+                .head_id()
+                .ok()
+                .map(|id| id.to_string());
+            (branch, commit)
+        }
+        Err(_) => (None, None),
+    }
+}
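Review note: the walk-up loop in `detect_repo` can also be written with `Path::ancestors`. The block below is a minimal std-only sketch under stated assumptions, not the crate's code. `find_repo_root` is a hypothetical name, it only locates the directory (no branch or commit lookup), and it treats a `.git` file and a `.git` directory alike.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of `kebab-parse-code`): return the nearest
/// ancestor of `start` that contains a `.git` entry, file or directory.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start` itself first, then each parent in turn
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}

fn main() {
    // Build <tmp>/<unique>/myrepo/{.git,src/deep} and probe from a nested path.
    let base = std::env::temp_dir().join(format!("walkup-sketch-{}", std::process::id()));
    let nested = base.join("myrepo").join("src").join("deep");
    std::fs::create_dir_all(&nested).unwrap();
    std::fs::create_dir_all(base.join("myrepo").join(".git")).unwrap();

    // The probed file does not need to exist; only the ancestors are checked.
    let found = find_repo_root(&nested.join("file.rs"));
    assert_eq!(found, Some(base.join("myrepo")));

    std::fs::remove_dir_all(&base).unwrap();
}
```

Because `ancestors` starts with the path itself, the sketch needs no `is_dir` branch for the starting point; probing `<file>/.git` simply finds nothing and the search moves on to the parent.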
--- a/crates/kebab-parse-code/src/skip.rs
+++ b/crates/kebab-parse-code/src/skip.rs
@@ -0,0 +1,65 @@
+//! Pre-ingest skip helpers (spec §5.2 + §5.3 + §5.4).
+//!
+//! - [`BUILTIN_BLACKLIST`] — 6 gitignore-style patterns universal across
+//!   ecosystems. Source of truth: spec §5.2.
+//! - [`is_generated_file`] — reads the first 512 bytes and checks their first
+//!   10 lines for 7 case-insensitive markers.
+//! - [`is_oversized`] — byte cap then line cap.
+
+use anyhow::Result;
+use std::fs::File;
+use std::io::{BufRead, BufReader, Read};
+use std::path::Path;
+
+/// 6 built-in gitignore-style patterns, applied in addition to `.gitignore`
+/// and `.kebabignore`. Users can override them via `.kebabignore` negation
+/// (`!pattern`).
+pub const BUILTIN_BLACKLIST: &[&str] = &[
+    "**/node_modules/**",
+    "**/target/**",
+    "**/__pycache__/**",
+    "**/.venv/**",
+    "**/venv/**",
+    "**/env/**",
+];
+
+/// Read up to 512 bytes and check the first 10 lines of that prefix for any of
+/// 7 case-insensitive generated-file markers. Ok(true) on match, else Ok(false).
+pub fn is_generated_file(path: &Path) -> Result<bool> {
+    let mut buf = [0u8; 512];
+    let mut f = File::open(path)?;
+    let n = f.read(&mut buf)?;
+    if n == 0 {
+        return Ok(false);
+    }
+    let head = String::from_utf8_lossy(&buf[..n]); // lossy: the 512-byte cut may split a UTF-8 char
+    let lower: String = head.lines().take(10).collect::<Vec<_>>().join("\n").to_ascii_lowercase();
+    Ok(
+        lower.contains("@generated")
+            || lower.contains("code generated by")
+            || lower.contains("do not edit")
+            || lower.contains("do not modify")
+            || lower.contains("automatically generated")
+            || lower.contains("auto-generated")
+            || lower.contains("autogenerated"),
+    )
+}
+
+/// Check if `path` exceeds `max_bytes` or `max_lines`. Byte cap first
+/// (cheap), then line cap (streaming with early exit).
+pub fn is_oversized(path: &Path, max_bytes: u64, max_lines: u32) -> Result<bool> {
+    let meta = std::fs::metadata(path)?;
+    if meta.len() > max_bytes {
+        return Ok(true);
+    }
+    let reader = BufReader::new(File::open(path)?);
+    let mut count: u32 = 0;
+    for line in reader.lines() {
+        let _ = line?;
+        count = count.saturating_add(1);
+        if count > max_lines {
+            return Ok(true);
+        }
+    }
+    Ok(false)
+}
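Review note: a fixed-size read such as the 512-byte sniff in `is_generated_file` can end in the middle of a multi-byte UTF-8 sequence. The sketch below shows how strict and lossy decoding differ on such a prefix. `strict_head` and `lossy_head` are hypothetical helpers written for this note, not functions of the crate.

```rust
/// Hypothetical helper: strict decoding, empty string on any invalid byte.
fn strict_head(bytes: &[u8]) -> &str {
    std::str::from_utf8(bytes).unwrap_or("")
}

/// Hypothetical helper: lossy decoding, invalid bytes become U+FFFD.
fn lossy_head(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // A marker line followed by 'é' (two bytes in UTF-8), cut after its first byte.
    let mut bytes = b"// @generated\n".to_vec();
    bytes.extend_from_slice("é".as_bytes());
    let cut = &bytes[..bytes.len() - 1];

    // Strict decoding rejects the whole prefix, so the marker is lost.
    assert_eq!(strict_head(cut), "");
    // Lossy decoding keeps the valid part, so the marker is still visible.
    assert!(lossy_head(cut).contains("@generated"));
}
```

With strict decoding, one truncated character at the end of the buffer hides a marker that sits on the first line, which matters for files with non-ASCII header comments.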
--- a/crates/kebab-parse-code/tests/lang.rs
+++ b/crates/kebab-parse-code/tests/lang.rs
@@ -0,0 +1,65 @@
+use kebab_parse_code::code_lang_for_path;
+use std::path::Path;
+
+#[test]
+fn known_extensions_map_to_canonical_identifiers() {
+    let cases = [
+        ("foo.rs", Some("rust")),
+        ("foo.py", Some("python")),
+        ("foo.pyi", Some("python")),
+        ("foo.ts", Some("typescript")),
+        ("foo.tsx", Some("typescript")),
+        ("foo.js", Some("javascript")),
+        ("foo.mjs", Some("javascript")),
+        ("foo.cjs", Some("javascript")),
+        ("foo.jsx", Some("javascript")),
+        ("foo.go", Some("go")),
+        ("foo.java", Some("java")),
+        ("foo.kt", Some("kotlin")),
+        ("foo.kts", Some("kotlin")),
+        ("foo.c", Some("c")),
+        ("foo.h", Some("c")),
+        ("foo.cpp", Some("cpp")),
+        ("foo.cc", Some("cpp")),
+        ("foo.cxx", Some("cpp")),
+        ("foo.hpp", Some("cpp")),
+        ("foo.hh", Some("cpp")),
+        ("foo.hxx", Some("cpp")),
+        ("foo.yaml", Some("yaml")),
+        ("foo.yml", Some("yaml")),
+        ("foo.toml", Some("toml")),
+        ("foo.json", Some("json")),
+        ("foo.sh", Some("shell")),
+        ("foo.bash", Some("shell")),
+        ("foo.zsh", Some("shell")),
+        ("foo.mk", Some("make")),
+    ];
+    for (path, expected) in cases {
+        assert_eq!(
+            code_lang_for_path(Path::new(path)),
+            expected,
+            "path = {path}"
+        );
+    }
+}
+
+#[test]
+fn special_filenames_map_to_identifiers() {
+    assert_eq!(code_lang_for_path(Path::new("Dockerfile")), Some("dockerfile"));
+    assert_eq!(code_lang_for_path(Path::new("foo.dockerfile")), Some("dockerfile"));
+    assert_eq!(code_lang_for_path(Path::new("Makefile")), Some("make"));
+    assert_eq!(code_lang_for_path(Path::new("GNUmakefile")), Some("make"));
+}
+
+#[test]
+fn unknown_extension_returns_none() {
+    assert_eq!(code_lang_for_path(Path::new("foo.docx")), None);
+    assert_eq!(code_lang_for_path(Path::new("foo")), None);
+    assert_eq!(code_lang_for_path(Path::new("foo.unknown")), None);
+}
+
+#[test]
+fn case_insensitive() {
+    assert_eq!(code_lang_for_path(Path::new("Foo.RS")), Some("rust"));
+    assert_eq!(code_lang_for_path(Path::new("FOO.YAML")), Some("yaml"));
+}
--- a/crates/kebab-parse-code/tests/repo.rs
+++ b/crates/kebab-parse-code/tests/repo.rs
@@ -0,0 +1,62 @@
+use kebab_parse_code::repo::detect_repo;
+use std::fs;
+use std::process::Command;
+use tempfile::TempDir;
+
+fn init_git_repo(root: &std::path::Path) {
+    let run = |args: &[&str]| {
+        let status = Command::new("git").args(args).current_dir(root).status();
+        assert!(
+            matches!(status, Ok(s) if s.success()),
+            "git {args:?} failed: {status:?}"
+        );
+    };
+    run(&["init", "-q"]);
+    run(&["config", "user.email", "test@test"]);
+    run(&["config", "user.name", "test"]);
+    fs::write(root.join("README.md"), "hi").unwrap();
+    run(&["add", "README.md"]);
+    run(&["commit", "-q", "-m", "init"]);
+}
+
+#[test]
+fn detect_repo_returns_none_outside_git() {
+    let tmp = TempDir::new().unwrap();
+    let nested = tmp.path().join("a/b/c.txt");
+    fs::create_dir_all(nested.parent().unwrap()).unwrap();
+    fs::write(&nested, "x").unwrap();
+    assert!(detect_repo(&nested).is_none());
+}
+
+#[test]
+fn detect_repo_walks_up_to_git_dir() {
+    let tmp = TempDir::new().unwrap();
+    let repo_root = tmp.path().join("myrepo");
+    fs::create_dir_all(&repo_root).unwrap();
+    init_git_repo(&repo_root);
+    let nested = repo_root.join("src/deep/file.rs");
+    fs::create_dir_all(nested.parent().unwrap()).unwrap();
+    fs::write(&nested, "x").unwrap();
+
+    let meta = detect_repo(&nested).expect("should detect repo");
+    assert_eq!(meta.name, "myrepo");
+    assert!(meta.branch.is_some());
+    assert!(meta.commit.is_some());
+    assert_eq!(meta.commit.as_ref().unwrap().len(), 40);
+}
+
+#[test]
+fn detect_repo_returns_consistent_metadata_for_paths_in_same_repo() {
+    let tmp = TempDir::new().unwrap();
+    let repo_root = tmp.path().join("myrepo");
+    fs::create_dir_all(&repo_root).unwrap();
+    init_git_repo(&repo_root);
+    let f1 = repo_root.join("a.rs");
+    let f2 = repo_root.join("b.rs");
+    fs::write(&f1, "x").unwrap();
+    fs::write(&f2, "x").unwrap();
+    let m1 = detect_repo(&f1).unwrap();
+    let m2 = detect_repo(&f2).unwrap();
+    assert_eq!(m1.name, m2.name);
+    assert_eq!(m1.commit, m2.commit);
+}
--- a/crates/kebab-parse-code/tests/skip.rs
+++ b/crates/kebab-parse-code/tests/skip.rs
@@ -0,0 +1,74 @@
+use kebab_parse_code::skip::{BUILTIN_BLACKLIST, is_generated_file, is_oversized};
+use std::fs;
+use tempfile::NamedTempFile;
+
+#[test]
+fn generated_header_markers_trigger_skip() {
+    let cases = [
+        "// @generated\nfn foo() {}\n",
+        "// Code generated by tonic-build. DO NOT EDIT.\nfn x() {}\n",
+        "/* DO NOT EDIT */\nfn x() {}\n",
+        "/* do not modify */\nfn x() {}\n",
+        "// AUTOMATICALLY GENERATED\nfn x() {}\n",
+        "# auto-generated\ndef x(): pass\n",
+        "// autogenerated\nfn x() {}\n",
+    ];
+    for content in cases {
+        let f = NamedTempFile::new().unwrap();
+        fs::write(f.path(), content).unwrap();
+        assert!(is_generated_file(f.path()).unwrap(), "content: {content:?}");
+    }
+}
+
+#[test]
+fn normal_code_is_not_flagged_generated() {
+    let f = NamedTempFile::new().unwrap();
+    fs::write(f.path(), "fn main() {\n    println!(\"hi\");\n}\n").unwrap();
+    assert!(!is_generated_file(f.path()).unwrap());
+}
+
+#[test]
+fn is_generated_returns_false_for_empty_file() {
+    let f = NamedTempFile::new().unwrap();
+    fs::write(f.path(), "").unwrap();
+    assert!(!is_generated_file(f.path()).unwrap());
+}
+
+#[test]
+fn oversized_by_bytes_returns_true() {
+    let f = NamedTempFile::new().unwrap();
+    let body: String = "x".repeat(300_000);
+    fs::write(f.path(), &body).unwrap();
+    assert!(is_oversized(f.path(), 262_144, 5_000).unwrap());
+}
+
+#[test]
+fn oversized_by_lines_returns_true() {
+    let f = NamedTempFile::new().unwrap();
+    let body: String = "x\n".repeat(6_000);
+    fs::write(f.path(), &body).unwrap();
+    assert!(is_oversized(f.path(), 262_144, 5_000).unwrap());
+}
+
+#[test]
+fn small_file_returns_false_for_oversize() {
+    let f = NamedTempFile::new().unwrap();
+    fs::write(f.path(), "fn foo() {}\n").unwrap();
+    assert!(!is_oversized(f.path(), 262_144, 5_000).unwrap());
+}
+
+#[test]
+fn builtin_blacklist_has_exactly_six_entries() {
+    assert_eq!(BUILTIN_BLACKLIST.len(), 6);
+    let expected = [
+        "**/node_modules/**",
+        "**/target/**",
+        "**/__pycache__/**",
+        "**/.venv/**",
+        "**/venv/**",
+        "**/env/**",
+    ];
+    for pat in expected {
+        assert!(BUILTIN_BLACKLIST.contains(&pat), "missing pattern: {pat}");
+    }
+}
--- a/crates/kebab-parse-image/src/lib.rs
+++ b/crates/kebab-parse-image/src/lib.rs
@@ -190,6 +190,10 @@ impl Extractor for ImageExtractor {
            trust_level: TrustLevel::Primary,
            user_id_alias: None,
            user,
+            repo: None,
+            git_branch: None,
+            git_commit: None,
+            code_lang: None,
        };

        tracing::debug!(
--- a/crates/kebab-parse-md/src/frontmatter.rs
+++ b/crates/kebab-parse-md/src/frontmatter.rs
@@ -471,6 +471,10 @@ fn derive_metadata(
        trust_level,
        user_id_alias,
        user,
+        repo: None,
+        git_branch: None,
+        git_commit: None,
+        code_lang: None,
    }
 }

--- a/crates/kebab-parse-pdf/src/lib.rs
+++ b/crates/kebab-parse-pdf/src/lib.rs
@@ -194,6 +194,10 @@ impl Extractor for PdfTextExtractor {
            trust_level: TrustLevel::Primary,
            user_id_alias: None,
            user,
+            repo: None,
+            git_branch: None,
+            git_commit: None,
+            code_lang: None,
        };

        tracing::debug!(
--- a/crates/kebab-rag/src/pipeline.rs
+++ b/crates/kebab-rag/src/pipeline.rs
@@ -1170,6 +1170,8 @@ mod stream_event_serde_tests {
            indexed_at: datetime!(2026-05-09 12:00:00 UTC),
            stale: false,
            score_kind: kebab_core::ScoreKind::Rrf,
+            repo: None,
+            code_lang: None,
        }
    }

--- a/crates/kebab-rag/tests/common/mod.rs
+++ b/crates/kebab-rag/tests/common/mod.rs
@@ -171,6 +171,8 @@ pub fn mk_hit_with_indexed_at(
        indexed_at,
        stale: false,
        score_kind: kebab_core::ScoreKind::Rrf,
+        repo: None,
+        code_lang: None,
    }
 }

--- a/crates/kebab-search/src/hybrid.rs
+++ b/crates/kebab-search/src/hybrid.rs
@@ -509,6 +509,8 @@ mod tests {
            indexed_at: time::OffsetDateTime::UNIX_EPOCH,
            stale: false,
            score_kind: kebab_core::ScoreKind::Rrf,
+            repo: None,
+            code_lang: None,
        }
    }

@@ -760,6 +762,8 @@ mod tests {
                indexed_at: time::OffsetDateTime::UNIX_EPOCH,
                stale: false,
                score_kind: kebab_core::ScoreKind::Rrf,
+                repo: None,
+                code_lang: None,
            }
        }

--- a/crates/kebab-search/src/lexical.rs
+++ b/crates/kebab-search/src/lexical.rs
@@ -470,6 +470,8 @@ fn build_hit(
        // in `RagPipeline::ask` against the configured threshold.
        stale: false,
        score_kind: ScoreKind::Bm25,
+        repo: None,
+        code_lang: None,
    })
 }

--- a/crates/kebab-search/src/vector.rs
+++ b/crates/kebab-search/src/vector.rs
@@ -327,6 +327,8 @@ fn build_hit(
        // in `RagPipeline::ask` against the configured threshold.
        stale: false,
        score_kind: ScoreKind::Cosine,
+        repo: None,
+        code_lang: None,
    })
 }

--- a/crates/kebab-source-fs/Cargo.toml
+++ b/crates/kebab-source-fs/Cargo.toml
@@ -10,6 +10,7 @@ description   = "Local filesystem SourceConnector — walks workspace.root + app
 [dependencies]
 kebab-core = { path = "../kebab-core" }
 kebab-config = { path = "../kebab-config" }
+kebab-parse-code = { path = "../kebab-parse-code" }
 anyhow       = { workspace = true }
 serde        = { workspace = true }
 time         = { workspace = true }
--- a/crates/kebab-source-fs/src/connector.rs
+++ b/crates/kebab-source-fs/src/connector.rs
@@ -10,20 +10,20 @@
 //! }
 //! ```

-use std::path::PathBuf;
+use std::path::{Path, PathBuf};

 use anyhow::{Context, Result};
 use time::OffsetDateTime;

 use kebab_config::Config;
 use kebab_core::{
-    AssetStorage, Checksum, RawAsset, SourceConnector, SourceScope, SourceUri,
+    AssetStorage, Checksum, RawAsset, SkipExamples, SourceConnector, SourceScope, SourceUri,
    id_for_asset, to_posix,
 };

 use crate::hash::hash_file;
 use crate::media::media_type_for;
-use crate::walker::{build_overrides, read_kbignore, walk_files};
+use crate::walker::{SkipCategory, WalkOverrides, build_overrides, read_kbignore, walk_files_with_skips};

 /// Local-filesystem `SourceConnector`. Constructed once from `Config`,
 /// reused across `scan` calls.
@@ -36,10 +36,16 @@ use crate::walker::{build_overrides, read_kbignore, walk_files};
 ///     construction time.
 ///   - `copy_threshold_bytes`: `config.storage.copy_threshold_mb * 1 MiB`
 ///     pre-multiplied so we don't recompute per file.
+///   - `skip_generated_header`: `config.ingest.code.skip_generated_header`.
+///   - `max_file_bytes`: `config.ingest.code.max_file_bytes`.
+///   - `max_file_lines`: `config.ingest.code.max_file_lines`.
 pub struct FsSourceConnector {
    default_root: PathBuf,
    default_exclude: Vec<String>,
    copy_threshold_bytes: u64,
+    skip_generated_header: bool,
+    max_file_bytes: u64,
+    max_file_lines: u32,
 }

 impl FsSourceConnector {
@@ -59,109 +65,239 @@ impl FsSourceConnector {
            default_root: root,
            default_exclude: config.workspace.exclude.clone(),
            copy_threshold_bytes,
+            skip_generated_header: config.ingest.code.skip_generated_header,
+            max_file_bytes: config.ingest.code.max_file_bytes,
+            max_file_lines: config.ingest.code.max_file_lines,
        })
    }
-}

-impl SourceConnector for FsSourceConnector {
-    fn scan(&self, scope: &SourceScope) -> Result<Vec<RawAsset>> {
-        // `SourceScope::root` overrides config root when non-empty. This
-        // matches the design's "scope is the per-call lens; config is the
-        // default" split (§7.1).
+    /// Resolve the effective root and build the merged + per-source overrides.
+    fn resolve_scan_params(
+        &self,
+        scope: &SourceScope,
+    ) -> Result<(PathBuf, WalkOverrides)> {
        let root = if scope.root.as_os_str().is_empty() {
            self.default_root.clone()
        } else {
            scope.root.clone()
        };

-        // Union: config.workspace.exclude ∪ scope.exclude ∪ .kebabignore.
-        // Per §6.2 the union of `.kebabignore` and `config.workspace.exclude`
-        // is the filter set. `scope.exclude` is added on top so a caller
-        // can layer a per-call narrowing.
        let mut excludes = self.default_exclude.clone();
        excludes.extend(scope.exclude.iter().cloned());
-        // .kebabignore is re-read on every scan() so users can edit it without
-        // restarting any long-running process.
        let kbignore = read_kbignore(&root)?;

        let overrides = build_overrides(&root, &excludes, &kbignore)?;
+        Ok((root, overrides))
+    }

-        // TODO(P1-2/P1-3 router): apply SourceScope::include glob filter at the
-        // extractor router layer once that crate lands. SourceConnector emits all
-        // non-excluded files; routing by include-glob is a downstream concern
-        // (design §6.2 + §7.2 are silent on this split, treat it as router work).
-        //
-        // `scope.include` is intentionally ignored at this stage of the
-        // pipeline: per §6.2 the workspace-level include lives in
-        // `WorkspaceCfg` and is enforced by the asset writer / extractors.
-        // Surfacing it here would double-filter Markdown vs PDF before the
-        // extractor router gets to see them.
-        if !scope.include.is_empty() {
-            tracing::debug!(
-                count = scope.include.len(),
-                "FsSourceConnector ignores scope.include — handled by extractor router"
-            );
-        }
+    /// Scan the workspace and return the accepted assets together with
+    /// per-category skip counts and sample paths for `IngestReport`.
+    ///
+    /// This is the **preferred entry point** for `kebab-app`: it provides
+    /// all the information needed to populate `IngestReport.skipped_gitignore`,
+    /// `skipped_kebabignore`, `skipped_builtin_blacklist`, and `skip_examples`
+    /// without a second walker pass.
+    pub fn scan_with_skips(
+        &self,
+        scope: &SourceScope,
+    ) -> Result<(Vec<RawAsset>, FsScanSkips)> {
+        let (root, overrides) = self.resolve_scan_params(scope)?;

-        let files = walk_files(&root, &overrides)?;
+        log_scope_include_warning(scope);

-        let mut assets = Vec::with_capacity(files.len());
-        for abs in &files {
-            // `to_posix` does NFC + leading `./` strip + `#` rejection.
-            // Compute the workspace-relative path before handing to it so
-            // emitted `WorkspacePath` is always relative.
-            let rel = abs.strip_prefix(&root).unwrap_or(abs);
-            let workspace_path = match to_posix(rel) {
-                Ok(p) => p,
-                Err(e) => {
-                    // A path containing `#` is the only documented reason
-                    // `to_posix` fails today. Drop the file with a warning
-                    // rather than aborting the entire scan — a single bad
-                    // filename should not nuke a 10 000-file ingest.
-                    tracing::warn!(
-                        path = %abs.display(),
-                        error = %e,
-                        "skipping file: path is not a valid WorkspacePath",
+        let (files, skipped_entries) = walk_files_with_skips(&root, &overrides)?;
+
+        // Accumulate per-category skip counts and sample paths.
+        let mut fs_skips = FsScanSkips::default();
+        for entry in &skipped_entries {
+            match entry.category {
+                SkipCategory::BuiltinBlacklist => {
+                    fs_skips.skipped_builtin_blacklist =
+                        fs_skips.skipped_builtin_blacklist.saturating_add(1);
+                    push_sample(
+                        &mut fs_skips.skip_examples.builtin_blacklist,
+                        &entry.path,
+                        &root,
                    );
-                    continue;
                }
-            };
-
-            let media_type = media_type_for(abs);
-            let (byte_len, full_hex) = hash_file(abs)
-                .with_context(|| format!("hashing {}", abs.display()))?;
-            let checksum = Checksum(full_hex.clone());
-            let asset_id = id_for_asset(&full_hex);
-
-            // Storage variant signals *intent*, not an actual copy.
-            // P1-6 (asset writer) is responsible for the on-disk copy.
-            let stored = if byte_len > self.copy_threshold_bytes {
-                AssetStorage::Reference {
-                    path: abs.clone(),
-                    sha: checksum.clone(),
+                SkipCategory::Gitignore => {
+                    fs_skips.skipped_gitignore =
+                        fs_skips.skipped_gitignore.saturating_add(1);
+                    push_sample(
+                        &mut fs_skips.skip_examples.gitignore,
+                        &entry.path,
+                        &root,
+                    );
                }
-            } else {
-                AssetStorage::Copied { path: abs.clone() }
-            };
-
-            assets.push(RawAsset {
-                asset_id,
-                source_uri: SourceUri::File(abs.clone()),
-                workspace_path,
-                media_type,
-                byte_len,
-                checksum,
-                discovered_at: OffsetDateTime::now_utc(),
-                stored,
-            });
+                SkipCategory::Kebabignore => {
+                    fs_skips.skipped_kebabignore =
+                        fs_skips.skipped_kebabignore.saturating_add(1);
+                    // kebabignore intentionally NOT in skip_examples per spec §5.5.
+                }
+                SkipCategory::Other => {
+                    // DEFAULT_EXCLUDES or config.workspace.exclude — no dedicated
+                    // IngestReport counter; these are lumped into the existing
+                    // `skipped` field by kebab-app.
+                }
+            }
        }

-        // Determinism: sort by workspace_path. WorkspacePath is a String
-        // newtype with stable lexicographic ordering. Two scans of the
-        // same tree must produce identical Vec<RawAsset> modulo the
-        // wall-clock `discovered_at` field.
-        assets.sort_by(|a, b| a.workspace_path.0.cmp(&b.workspace_path.0));
+        // p10-1A-1: apply per-file generated-header + size-cap checks on files
+        // that passed the override (gitignore/builtin/kebabignore) matching.
+        // These run AFTER the walk-level skip attribution, BEFORE parse dispatch.
+        let mut accepted_files: Vec<PathBuf> = Vec::with_capacity(files.len());
+        for abs_path in files {
+            let rel_path = abs_path.strip_prefix(&root).unwrap_or(&abs_path);

+            // Generated-header sniff (config-gated).
+            if self.skip_generated_header
+                && kebab_parse_code::is_generated_file(&abs_path).unwrap_or(false)
+            {
+                fs_skips.skipped_generated =
+                    fs_skips.skipped_generated.saturating_add(1);
+                push_sample(
+                    &mut fs_skips.skip_examples.generated,
+                    &abs_path,
+                    &root,
+                );
+                tracing::debug!(
+                    path = %rel_path.display(),
+                    "skip: generated-file marker detected"
+                );
+                continue;
+            }
+
+            // Size-cap check (byte or line limit).
+            if kebab_parse_code::is_oversized(
+                &abs_path,
+                self.max_file_bytes,
+                self.max_file_lines,
+            )
+            .unwrap_or(false)
+            {
+                fs_skips.skipped_size_exceeded =
+                    fs_skips.skipped_size_exceeded.saturating_add(1);
+                push_sample(
+                    &mut fs_skips.skip_examples.size_exceeded,
+                    &abs_path,
+                    &root,
+                );
+                tracing::debug!(
+                    path = %rel_path.display(),
+                    max_bytes = self.max_file_bytes,
+                    max_lines = self.max_file_lines,
+                    "skip: file exceeds size cap"
+                );
+                continue;
+            }
+
+            accepted_files.push(abs_path);
+        }
+
+        let assets = build_assets(&accepted_files, &root, self.copy_threshold_bytes)?;
+        Ok((assets, fs_skips))
+    }
+}
+
+/// Per-category skip counts and sample paths returned alongside the asset list
+/// by [`FsSourceConnector::scan_with_skips`].
+///
+/// Filled in a single scan: ignore counts by the walker, generated/size counts by per-file checks.
+#[derive(Debug, Default)]
+pub struct FsScanSkips {
+    pub skipped_gitignore: u32,
+    pub skipped_kebabignore: u32,
+    pub skipped_builtin_blacklist: u32,
+    /// p10-1A-1: files skipped because their first ~512 bytes contained a
+    /// generated-file marker (`@generated`, `do not edit`, …).
+    pub skipped_generated: u32,
+    /// p10-1A-1: files skipped because they exceeded `max_file_bytes` or
+    /// `max_file_lines` in `[ingest.code]`.
+    pub skipped_size_exceeded: u32,
+    /// Sample paths per spec §5.5 (≤ 5 per category). Paths are
+    /// workspace-relative POSIX strings when available, absolute otherwise.
+    pub skip_examples: SkipExamples,
+}
+
+/// Push a path into a sample vec (cap = 5) as a workspace-relative POSIX
+/// string. Falls back to the lossy absolute path if relativisation fails.
+fn push_sample(samples: &mut Vec<String>, abs: &Path, root: &Path) {
+    if samples.len() >= 5 {
+        return;
+    }
+    let rel = abs.strip_prefix(root).unwrap_or(abs);
+    // Best-effort POSIX string; any non-UTF8 char → replacement char.
+    let s = rel.to_string_lossy().replace('\\', "/");
+    samples.push(s);
+}
+
+/// Convert a list of absolute file paths to `Vec<RawAsset>`, sorted by
+/// workspace-relative POSIX path for determinism.
+fn build_assets(
+    files: &[PathBuf],
+    root: &Path,
+    copy_threshold_bytes: u64,
+) -> Result<Vec<RawAsset>> {
+    let mut assets = Vec::with_capacity(files.len());
+    for abs in files {
+        let rel = abs.strip_prefix(root).unwrap_or(abs);
+        let workspace_path = match to_posix(rel) {
+            Ok(p) => p,
+            Err(e) => {
+                tracing::warn!(
+                    path = %abs.display(),
+                    error = %e,
+                    "skipping file: path is not a valid WorkspacePath",
+                );
+                continue;
+            }
+        };
+
+        let media_type = media_type_for(abs);
+        let (byte_len, full_hex) = hash_file(abs)
+            .with_context(|| format!("hashing {}", abs.display()))?;
+        let checksum = Checksum(full_hex.clone());
+        let asset_id = id_for_asset(&full_hex);
+
+        let stored = if byte_len > copy_threshold_bytes {
+            AssetStorage::Reference {
+                path: abs.clone(),
+                sha: checksum.clone(),
+            }
+        } else {
+            AssetStorage::Copied { path: abs.clone() }
+        };
+
+        assets.push(RawAsset {
+            asset_id,
+            source_uri: SourceUri::File(abs.clone()),
+            workspace_path,
+            media_type,
+            byte_len,
+            checksum,
+            discovered_at: OffsetDateTime::now_utc(),
+            stored,
+        });
+    }
+
+    assets.sort_by(|a, b| a.workspace_path.0.cmp(&b.workspace_path.0));
+    Ok(assets)
+}
+
+fn log_scope_include_warning(scope: &SourceScope) {
+    if !scope.include.is_empty() {
+        tracing::debug!(
+            count = scope.include.len(),
+            "FsSourceConnector ignores scope.include — handled by extractor router"
+        );
+    }
+}
+
+impl SourceConnector for FsSourceConnector {
+    fn scan(&self, scope: &SourceScope) -> Result<Vec<RawAsset>> {
+        // Delegate to scan_with_skips; discard the skip counts.
+        // Callers that need skip attribution should call scan_with_skips directly.
+        let (assets, _skips) = self.scan_with_skips(scope)?;
        Ok(assets)
    }
 }
@@ -401,4 +537,241 @@ mod tests {
        let v2 = conn2.scan(&SourceScope::default()).unwrap();
        assert!(matches!(v2[0].stored, AssetStorage::Copied { .. }));
    }
+
+    // ── IngestReport skip counter wiring tests ───────────────────────────────
+
+    #[test]
+    fn scan_with_skips_counts_gitignored_files() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+        std::fs::write(root.join("ok.md"), b"# ok").unwrap();
+        std::fs::write(root.join("skipme.log"), b"x").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_gitignore >= 1,
+            "skipped_gitignore should be >= 1; got {}",
+            skips.skipped_gitignore
+        );
+        assert!(
+            skips.skip_examples.gitignore.iter().any(|p| p.contains("skipme.log")),
+            "skip_examples.gitignore should contain 'skipme.log'; got: {:?}",
+            skips.skip_examples.gitignore
+        );
+        // kebabignore counter must be 0 — file matched gitignore, not kebabignore.
+        assert_eq!(skips.skipped_kebabignore, 0);
+    }
+
+    #[test]
+    fn scan_with_skips_counts_builtin_blacklist_dirs() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::create_dir_all(root.join("node_modules/foo")).unwrap();
+        std::fs::write(root.join("node_modules/foo/bar.js"), b"x").unwrap();
+        std::fs::write(root.join("ok.md"), b"# ok").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_builtin_blacklist >= 1,
+            "skipped_builtin_blacklist should be >= 1; got {}",
+            skips.skipped_builtin_blacklist
+        );
+        assert!(
+            skips.skip_examples.builtin_blacklist.iter().any(|p| p.contains("node_modules")),
+            "skip_examples.builtin_blacklist should contain a node_modules path; got: {:?}",
+            skips.skip_examples.builtin_blacklist
+        );
+    }
+
+    #[test]
+    fn scan_with_skips_kebabignore_increments_counter_no_example() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".kebabignore"), "*.secret\n").unwrap();
+        std::fs::write(root.join("ok.md"), b"x").unwrap();
+        std::fs::write(root.join("creds.secret"), b"pw").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_kebabignore >= 1,
+            "skipped_kebabignore should be >= 1; got {}",
+            skips.skipped_kebabignore
+        );
+        // Per spec §5.5: kebabignore is intentionally NOT in skip_examples.
+        assert!(
+            skips.skip_examples.gitignore.is_empty(),
+            "gitignore examples should be empty; got: {:?}",
+            skips.skip_examples.gitignore
+        );
+        assert!(
+            skips.skip_examples.builtin_blacklist.is_empty(),
+            "builtin_blacklist examples should be empty; got: {:?}",
+            skips.skip_examples.builtin_blacklist
+        );
+    }
+
+    #[test]
+    fn scan_with_skips_builtin_priority_over_gitignore() {
+        // node_modules/ matches both BUILTIN_BLACKLIST and a .gitignore entry.
+        // It must be attributed to builtin (spec §5.2 priority order).
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".gitignore"), "node_modules/\n").unwrap();
+        std::fs::create_dir_all(root.join("node_modules/pkg")).unwrap();
+        std::fs::write(root.join("node_modules/pkg/index.js"), b"x").unwrap();
+        std::fs::write(root.join("ok.md"), b"x").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_builtin_blacklist >= 1,
+            "builtin counter should be >= 1; got {}",
+            skips.skipped_builtin_blacklist
+        );
+        assert_eq!(
+            skips.skipped_gitignore, 0,
+            "gitignore counter must be 0 when builtin wins; got {}",
+            skips.skipped_gitignore
+        );
+    }
+
+    #[test]
+    fn skip_examples_cap_at_five() {
+        // Write 7 .log files — skip_examples.gitignore must cap at 5.
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+        for i in 0..7 {
+            std::fs::write(root.join(format!("f{i}.log")), b"x").unwrap();
+        }
+        std::fs::write(root.join("ok.md"), b"x").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert_eq!(skips.skipped_gitignore, 7, "should count all 7");
+        assert_eq!(
+            skips.skip_examples.gitignore.len(),
+            5,
+            "skip_examples.gitignore must cap at 5; got: {:?}",
+            skips.skip_examples.gitignore
+        );
+    }
+
+    // ── p10-1A-1: generated-header + size-cap skip tests ────────────────────
+
+    /// Helper: config with default ingest.code settings.
+    fn cfg_with_root_defaults(root: &str) -> Config {
+        // cfg_with_root already uses Config::defaults() which has
+        // skip_generated_header=true, max_file_bytes=262144, max_file_lines=5000.
+        cfg_with_root(root)
+    }
+
+    /// Helper: config with overridden size caps.
+    fn cfg_with_size_cap(root: &str, max_bytes: u64, max_lines: u32) -> Config {
+        let mut c = cfg_with_root(root);
+        c.ingest.code.max_file_bytes = max_bytes;
+        c.ingest.code.max_file_lines = max_lines;
+        c
+    }
+
+    #[test]
+    fn ingest_report_counts_generated_files() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join("normal.md"), "# hi").unwrap();
+        std::fs::write(root.join("autogen.rs"), "// @generated\nfn x() {}\n").unwrap();
+
+        let conn = FsSourceConnector::new(
+            &cfg_with_root_defaults(root.to_str().unwrap()),
+        )
+        .unwrap();
+        let (assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_generated >= 1,
+            "skipped_generated should be >= 1; got {}",
+            skips.skipped_generated
+        );
+        assert!(
+            skips.skip_examples.generated.iter().any(|p| p.contains("autogen")),
+            "skip_examples.generated should contain 'autogen'; got: {:?}",
+            skips.skip_examples.generated
+        );
+        // The normal.md file must NOT be skipped.
+        let asset_paths: Vec<_> = assets
+            .iter()
+            .map(|a| a.workspace_path.0.clone())
+            .collect();
+        assert!(
+            asset_paths.iter().any(|p| p.contains("normal")),
+            "normal.md should still be emitted; assets: {asset_paths:?}"
+        );
+    }
+
+    #[test]
+    fn ingest_report_counts_oversized_files_by_bytes() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join("normal.md"), "# hi").unwrap();
+        // Write a file larger than the 1024-byte cap.
+        let big: String = "x\n".repeat(1_000);
+        std::fs::write(root.join("huge.rs"), &big).unwrap();
+
+        let conn = FsSourceConnector::new(
+            &cfg_with_size_cap(root.to_str().unwrap(), 1024, 5_000),
+        )
+        .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_size_exceeded >= 1,
+            "skipped_size_exceeded should be >= 1; got {}",
+            skips.skipped_size_exceeded
+        );
+        assert!(
+            skips.skip_examples.size_exceeded.iter().any(|p| p.contains("huge")),
+            "skip_examples.size_exceeded should contain 'huge'; got: {:?}",
+            skips.skip_examples.size_exceeded
+        );
+    }
+
+    #[test]
+    fn ingest_report_size_cap_by_line_count() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        // 6000 lines but small per-line — line cap of 5000 should trigger.
+        let body: String = "x\n".repeat(6_000);
+        std::fs::write(root.join("longfile.rs"), &body).unwrap();
+
+        let conn = FsSourceConnector::new(
+            &cfg_with_size_cap(root.to_str().unwrap(), 262_144, 5_000),
+        )
+        .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_size_exceeded >= 1,
+            "skipped_size_exceeded should be >= 1 (line cap); got {}",
+            skips.skipped_size_exceeded
+        );
+    }
 }
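Review note: the `skip_examples_cap_at_five` test above pins down two separate behaviours: the counter keeps counting every skipped path, while the example list stops growing at five. A minimal, std-only sketch of that bookkeeping follows. `SkipBucket`, `MAX_EXAMPLES`, and `record_all` are hypothetical names for illustration and are not taken from the connector; the real `FsScanSkips` layout may differ.

```rust
/// Hypothetical stand-in for one per-source skip bucket (not the real type).
#[derive(Debug, Default)]
struct SkipBucket {
    /// Total number of skipped paths; never capped.
    count: u64,
    /// First few skipped paths, kept for diagnostics only.
    examples: Vec<String>,
}

/// Assumed cap, inferred from the test name `skip_examples_cap_at_five`.
const MAX_EXAMPLES: usize = 5;

impl SkipBucket {
    /// Always bump the counter; keep the path only while under the cap.
    fn record(&mut self, path: &str) {
        self.count += 1;
        if self.examples.len() < MAX_EXAMPLES {
            self.examples.push(path.to_string());
        }
    }
}

/// Record every path in `paths` into a fresh bucket.
fn record_all(paths: &[&str]) -> SkipBucket {
    let mut bucket = SkipBucket::default();
    for p in paths {
        bucket.record(p);
    }
    bucket
}
```

Keeping the counter uncapped means the report stays accurate even when the example list is truncated, which is what the `assert_eq!(skips.skipped_gitignore, 7, ...)` check relies on.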
--- a/crates/kebab-source-fs/src/lib.rs
+++ b/crates/kebab-source-fs/src/lib.rs
@@ -13,4 +13,4 @@ mod hash;
 mod media;
 mod walker;

-pub use connector::FsSourceConnector;
+pub use connector::{FsScanSkips, FsSourceConnector};
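Review note: the `walker.rs` changes that follow add `read_gitignore`, which strips comments and blank lines, passes `!` negations through untouched, and expands a trailing-slash pattern such as `dist/` into both `dist/` and `dist/**`. The std-only sketch below mirrors that normalization on an in-memory string so it can be checked in isolation. `normalize_gitignore` is a hypothetical helper name; the real function reads `<root>/.gitignore` from disk and returns a `Result`.

```rust
/// Hypothetical, I/O-free mirror of the line handling in `read_gitignore`.
fn normalize_gitignore(contents: &str) -> Vec<String> {
    let mut out = Vec::new();
    for line in contents.lines() {
        let trimmed = line.trim();
        // Skip blank lines and comments.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        // Negations are passed through unchanged, with no expansion.
        if trimmed.starts_with('!') {
            out.push(trimmed.to_string());
            continue;
        }
        // Keep the pattern itself in every case.
        out.push(trimmed.to_string());
        // For a directory pattern, also emit a glob for the files inside it.
        if let Some(stem) = trimmed.strip_suffix('/') {
            out.push(format!("{stem}/**"));
        }
    }
    out
}
```

With the input used by `gitignore_at_repo_root_excludes_matching_files` (`*.log` and `dist/`), this yields three patterns, and the extra `dist/**` entry is the one that lets a file-level lookup such as `dist/bundle.js` match.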
--- a/crates/kebab-source-fs/src/walker.rs
+++ b/crates/kebab-source-fs/src/walker.rs
@@ -1,12 +1,15 @@
 //! Directory walker with gitignore-style filtering and symlink-cycle
 //! protection.
 //!
-//! Filter set (per task spec, design §6.2):
-//!   - `config.workspace.exclude` (passed in by `FsSourceConnector`)
-//!   - `<root>/.kebabignore` (optional file at workspace root)
-//!   - default-excludes for `.DS_Store` and macOS resource forks (`._*`)
+//! Filter set, in order of application:
+//!   - DEFAULT_EXCLUDES (constants — VCS dirs, build artifacts, never-useful)
+//!   - `config.workspace.exclude` (user-supplied per workspace)
+//!   - `<root>/.kebabignore` (user-supplied kebab-specific exclude)
+//!   - Built-in safety-net blacklist (`node_modules/`, `target/`, etc. —
+//!     spec §5.2, applied via `kebab_parse_code::BUILTIN_BLACKLIST`)
+//!   - `<root>/.gitignore` (repo-root only, no nested cascade — spec §5.2)
 //!
-//! All three are merged via `ignore::overrides::OverrideBuilder`, which
+//! All five are merged via `ignore::overrides::OverrideBuilder`, which
 //! gives full gitignore semantics (anchors, `!` negation, `**`, etc.). We
 //! prepend `!` to each pattern because `OverrideBuilder` treats positive
 //! patterns as "include" and negative as "exclude" — see §"Filter set"
@@ -18,6 +21,16 @@
 //! `follow_links(true)`; we layer our own visited-set on top, keyed by the
 //! canonical path of every entry, and skip any entry we've already seen.
 //!
+//! ## Per-source skip attribution (spec §5.5)
+//!
+//! `build_overrides` returns a `WalkOverrides` struct that carries both a
+//! `combined` matcher (used for the actual walk decision) and three
+//! per-source matchers (`gitignore`, `kebabignore`, `builtin`). When
+//! `walk_files_with_skips` excludes an entry, `classify_skip` probes the
+//! per-source matchers in priority order (built-in > gitignore >
+//! kebabignore) to determine which `IngestReport` counter should be
+//! incremented, without requiring a second walker pass over the filesystem.
+//!
 //! ## Why `walkdir` instead of `ignore::WalkBuilder`?
 //!
 //! `ignore::WalkBuilder` bundles gitignore semantics + cycle detection in
@@ -32,7 +45,7 @@ use std::path::{Path, PathBuf};

 use anyhow::{Context, Result};
 use ignore::overrides::{Override, OverrideBuilder};
-use walkdir::{DirEntry, WalkDir};
+use walkdir::WalkDir;

 /// Default-excludes baked into the connector. These are NOT configurable;
 /// they cover noise that is never useful to ingest and would otherwise need
@@ -46,37 +59,222 @@ const DEFAULT_EXCLUDES: &[&str] = &[
    "**/._*",
 ];

-/// Build the merged `Override` from `config.workspace.exclude` ∪ `.kebabignore`
-/// ∪ baked-in default excludes.
+/// Per-source `Override` matchers for skip-counter attribution (spec §5.5).
+///
+/// `combined` is the merged union of all sources and is used for the actual
+/// "is this entry excluded?" decision in the walker. The three per-source
+/// matchers (`gitignore`, `kebabignore`, `builtin`) are used ONLY when
+/// classifying an already-excluded path for `IngestReport` counter wiring;
+/// they are never consulted for entries that pass the combined matcher.
+///
+/// DEFAULT_EXCLUDES and `config.workspace.exclude` have no matcher of their
+/// own: they map to none of the named counters (`SkipCategory::Other`).
+pub(crate) struct WalkOverrides {
+    /// Merged matcher over all five sources; used for the walk decision.
+    pub combined: Override,
+    /// Matcher built from `<root>/.gitignore` patterns only.
+    pub gitignore: Override,
+    /// Matcher built from `<root>/.kebabignore` patterns only.
+    pub kebabignore: Override,
+    /// Matcher built from `kebab_parse_code::BUILTIN_BLACKLIST` only.
+    pub builtin: Override,
+}
+
+/// Skip attribution category. Used by the connector when counting per-source
+/// skips for `IngestReport` (spec §5.5).
+///
+/// Priority order per spec §5.2: built-in > gitignore > kebabignore.
+/// A path matching multiple sources is attributed to the first match.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum SkipCategory {
+    BuiltinBlacklist,
+    Gitignore,
+    Kebabignore,
+    /// Matched DEFAULT_EXCLUDES or `config.workspace.exclude`. No dedicated
+    /// counter in `IngestReport` — lumped into the existing `skipped` field.
+    Other,
+}
+
+/// Build a single `Override` from a list of gitignore-style patterns, all
+/// registered as excludes (prepend `!`).
+///
+/// Empty pattern list → an `Override` that matches nothing (i.e. no
+/// exclusions). Callers must strip blanks / comments before passing.
+fn build_single_matcher(root: &Path, patterns: &[&str]) -> Result<Override> {
+    let mut builder = OverrideBuilder::new(root);
+    for pat in patterns {
+        builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("invalid pattern: {pat}"))?;
+    }
+    builder.build().context("failed to compile override")
+}
+
+/// Build the builtin-blacklist `Override`, adding directory-level patterns in
+/// addition to the `/**`-suffix ones from `BUILTIN_BLACKLIST`.
+///
+/// BUILTIN_BLACKLIST uses `**/X/**` patterns which match files *inside* X but
+/// NOT the directory entry `X` itself (because `**/X/**` requires a path
+/// component after X). The walker prunes at the directory level (`is_dir=true`),
+/// so we need `**/X` (no trailing `/**`) to also match the directory itself
+/// for attribution purposes.
+fn build_builtin_matcher(root: &Path) -> Result<Override> {
+    let mut builder = OverrideBuilder::new(root);
+    for pat in kebab_parse_code::BUILTIN_BLACKLIST {
+        // Register the original pattern (matches files inside the dir).
+        builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("builtin pattern: {pat}"))?;
+        // Also derive a directory-level match by stripping trailing `/**`.
+        // This makes `is_dir=true` checks on the directory itself work.
+        if let Some(dir_pat) = pat.strip_suffix("/**") {
+            builder
+                .add(&format!("!{dir_pat}"))
+                .with_context(|| format!("builtin dir pattern: {dir_pat}"))?;
+        }
+    }
+    builder.build().context("failed to compile builtin override")
+}
+
+/// Owned-string variant of `build_single_matcher` for caller-supplied
+/// `Vec<String>` sources (config.workspace.exclude, .kebabignore).
+fn build_single_matcher_owned(root: &Path, patterns: &[String]) -> Result<Override> {
+    let mut builder = OverrideBuilder::new(root);
+    for pat in patterns {
+        builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("invalid pattern: {pat}"))?;
+    }
+    builder.build().context("failed to compile override")
+}
+
+/// Build the merged `WalkOverrides` from all five filter sources, in order:
+/// DEFAULT_EXCLUDES, `config.workspace.exclude`, `.kebabignore`,
+/// built-in safety-net blacklist (`kebab_parse_code::BUILTIN_BLACKLIST`),
+/// and `<root>/.gitignore` (root-only, no nested cascade).
 ///
 /// Each input pattern is registered as an *exclude* (gitignore-style: a
 /// leading `!` flips a positive match to a negative one in the
 /// `OverrideBuilder` API). Order doesn't matter — the union is computed by
 /// the underlying gitignore engine.
+///
+/// The three per-source matchers (`gitignore`, `kebabignore`, `builtin`) are
+/// built in addition to the combined one so the connector can attribute skips
+/// to the correct `IngestReport` counter without a second walker pass.
 pub(crate) fn build_overrides(
    root: &Path,
    config_exclude: &[String],
    kbignore_patterns: &[String],
-) -> Result<Override> {
-    let mut builder = OverrideBuilder::new(root);
+) -> Result<WalkOverrides> {
+    let gitignore_patterns = read_gitignore(root)?;
+
+    // Per-source matchers (for attribution only).
+    let gitignore =
+        build_single_matcher(root, &gitignore_patterns.iter().map(|s| s.as_str()).collect::<Vec<_>>())?;
+    let kebabignore = build_single_matcher_owned(root, kbignore_patterns)?;
+    // Use the directory-aware builtin matcher so that `is_dir=true` checks on
+    // directory entries (e.g., `node_modules/`) are attributed to builtin rather
+    // than to an overlapping gitignore pattern.
+    let builtin = build_builtin_matcher(root)?;
+
+    // Combined matcher — union of all five sources.
+    let mut combined_builder = OverrideBuilder::new(root);

    for pat in DEFAULT_EXCLUDES {
-        builder
+        combined_builder
            .add(&format!("!{pat}"))
            .with_context(|| format!("invalid default-exclude pattern: {pat}"))?;
    }
    for pat in config_exclude {
-        builder
+        combined_builder
            .add(&format!("!{pat}"))
            .with_context(|| format!("invalid workspace.exclude pattern: {pat}"))?;
    }
    for pat in kbignore_patterns {
-        builder
+        combined_builder
            .add(&format!("!{pat}"))
            .with_context(|| format!("invalid .kebabignore pattern: {pat}"))?;
    }
+    for pat in kebab_parse_code::BUILTIN_BLACKLIST {
+        combined_builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("built-in blacklist pattern: {pat}"))?;
+    }
+    for pat in &gitignore_patterns {
+        combined_builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!(".gitignore pattern: {pat}"))?;
+    }
+    let combined = combined_builder
+        .build()
+        .context("failed to compile combined override set")?;

-    builder.build().context("failed to compile override set")
+    Ok(WalkOverrides {
+        combined,
+        gitignore,
+        kebabignore,
+        builtin,
+    })
+}
+
+/// Classify why a path was excluded, using per-source matchers in spec §5.2
+/// priority order: built-in > gitignore > kebabignore > other.
+///
+/// `rel` must be relative to the walker root (same as `Override::matched`
+/// expects). `is_dir` should match what the original walker saw.
+pub(crate) fn classify_skip(rel: &Path, is_dir: bool, ov: &WalkOverrides) -> SkipCategory {
+    if ov.builtin.matched(rel, is_dir).is_ignore() {
+        return SkipCategory::BuiltinBlacklist;
+    }
+    if ov.gitignore.matched(rel, is_dir).is_ignore() {
+        return SkipCategory::Gitignore;
+    }
+    if ov.kebabignore.matched(rel, is_dir).is_ignore() {
+        return SkipCategory::Kebabignore;
+    }
+    SkipCategory::Other
+}
+
+/// Read `<root>/.gitignore` (single-file, root-only — nested cascade is P+).
+/// Missing file → empty Vec. Comments / blanks stripped.
+///
+/// Trailing-slash patterns (`dist/`) in real gitignore mean "match the
+/// directory AND everything inside it". `Override::matched(path, is_dir)`
+/// honors a trailing-slash pattern only when `is_dir` is true, so the file
+/// `dist/bundle.js` would not be matched. We normalize by also emitting a
+/// `<stem>/**` variant so files inside the directory are caught.
+pub(crate) fn read_gitignore(root: &Path) -> Result<Vec<String>> {
+    let p = root.join(".gitignore");
+    if !p.exists() {
+        return Ok(vec![]);
+    }
+    let s = std::fs::read_to_string(&p)
+        .with_context(|| format!("read .gitignore at {}", p.display()))?;
+    let mut out = Vec::new();
+    for line in s.lines() {
+        let trimmed = line.trim();
+        if trimmed.is_empty() || trimmed.starts_with('#') {
+            continue;
+        }
+        // If the pattern starts with `!` (gitignore negation/un-ignore), pass through
+        // as-is. Trailing-slash normalization is unsafe here — the `!`-prefix and `/`-
+        // suffix combined confuse OverrideBuilder (would produce double-`!`).
+        if trimmed.starts_with('!') {
+            out.push(trimmed.to_string());
+            continue;
+        }
+        if let Some(stem) = trimmed.strip_suffix('/') {
+            // Keep the dir-only form so `is_dir=true` matches are still
+            // excluded (e.g., for skip_current_dir in the walker).
+            out.push(trimmed.to_string());
+            // Also emit a glob that catches files inside the directory,
+            // since `is_dir=false` won't satisfy the trailing-slash form.
+            out.push(format!("{stem}/**"));
+        } else {
+            out.push(trimmed.to_string());
+        }
+    }
+    Ok(out)
 }

 /// Read `<root>/.kebabignore` if it exists. Each non-blank, non-comment line is
@@ -96,51 +294,87 @@ pub(crate) fn read_kbignore(root: &Path) -> Result<Vec<String>> {
        .collect())
 }

-/// Iterate every regular file under `root`, applying `overrides` and
-/// detecting symlink cycles. Returns absolute file paths.
+/// Skipped-path record emitted by `walk_files_with_skips`.
+///
+/// `path` is the absolute path of the excluded entry (dir or file).
+/// For excluded directories, this is the directory itself — individual
+/// files inside are not enumerated (the subtree is pruned).
+pub(crate) struct SkippedEntry {
+    pub path: PathBuf,
+    pub category: SkipCategory,
+}
+
+/// Iterate every regular file under `root`, applying `overrides.combined` and
+/// detecting symlink cycles. Returns:
+///   - `accepted`: absolute paths of files that passed all filters.
+///   - `skipped`: entries that were excluded, with attribution.
+///
+/// For excluded *directories*, the directory path itself is returned (not the
+/// individual files inside — the subtree is pruned in one step, matching the
+/// walker's `skip_current_dir` behavior).
 ///
 /// Strategy:
 ///   - `walkdir::WalkDir::follow_links(true)` to traverse symlinks.
+///   - Manual per-entry check (instead of `filter_entry`) so we can capture
+///     the excluded paths for skip attribution.
 ///   - Maintain `visited: HashSet<PathBuf>` of *canonical* paths. Before
 ///     descending into a directory entry, canonicalize and check the set;
 ///     if already present, skip. This breaks `a -> b -> a` cycles in O(n)
 ///     per entry without a custom recursive walker.
-///   - For each yielded entry, ask `overrides` whether it is excluded; if
-///     so, drop it. If the entry is a directory, also short-circuit
-///     `WalkDir`'s descent via `it.skip_current_dir()`.
-pub(crate) fn walk_files(root: &Path, overrides: &Override) -> Result<Vec<PathBuf>> {
-    let mut out = Vec::new();
+pub(crate) fn walk_files_with_skips(
+    root: &Path,
+    overrides: &WalkOverrides,
+) -> Result<(Vec<PathBuf>, Vec<SkippedEntry>)> {
+    let mut accepted = Vec::new();
+    let mut skipped: Vec<SkippedEntry> = Vec::new();
    let mut visited: HashSet<PathBuf> = HashSet::new();

+    // Use a non-filtering iterator so we see excluded entries too.
    let walker = WalkDir::new(root).follow_links(true).into_iter();
-    let mut it = walker.filter_entry(|e| !is_excluded(e, root, overrides));
+    // `filter_entry` would let walkdir prune excluded directories for us,
+    // but it discards an entry before we can record why it was excluded.
+    //
+    // So we iterate the raw walker (no `filter_entry`), test each entry
+    // against `overrides.combined` ourselves, record the skip reason, and
+    // call `skip_current_dir` manually to prune an excluded directory's
+    // subtree.
+    let mut it = walker;

    while let Some(res) = it.next() {
        let entry = match res {
            Ok(e) => e,
            Err(err) => {
-                // `walkdir` surfaces I/O errors AND its own cycle detector
-                // (when follow_links is on it sometimes catches them).
-                // Either way: log and skip; do not abort the whole scan.
                tracing::warn!(error = %err, "walkdir entry error; skipping");
                continue;
            }
        };

        let path = entry.path();
+        let rel = match path.strip_prefix(root) {
+            Ok(p) => p,
+            Err(_) => path,
+        };
+        let is_dir = entry.file_type().is_dir();
+        let excluded = overrides.combined.matched(rel, is_dir).is_ignore();

-        // Cycle guard: only canonicalize symlinks (cheap on the common case
-        // of plain files/dirs) and on directories that are followed via a
-        // symlink. `walkdir`'s `path_is_symlink()` is true when the entry's
-        // *original* path is a symlink (it returns true for the link, not
-        // for the resolved target). For non-symlinked directories we still
-        // record the canonical path so a *later* symlink that points back
-        // to one of them is detected.
-        if entry.file_type().is_dir() {
+        if excluded {
+            let cat = classify_skip(rel, is_dir, overrides);
+            skipped.push(SkippedEntry {
+                path: path.to_path_buf(),
+                category: cat,
+            });
+            if is_dir {
+                // Prune the subtree — don't descend into excluded dirs.
+                it.skip_current_dir();
+            }
+            continue;
+        }
+
+        // Cycle guard for directories.
+        if is_dir {
            match std::fs::canonicalize(path) {
                Ok(canon) => {
                    if !visited.insert(canon) {
-                        // Already visited via another path → break cycle.
                        it.skip_current_dir();
                        continue;
                    }
@@ -157,26 +391,13 @@ pub(crate) fn walk_files(root: &Path, overrides: &Override) -> Result<Vec<PathBu
        }

        if entry.file_type().is_file() {
-            out.push(path.to_path_buf());
+            accepted.push(path.to_path_buf());
        }
    }

-    Ok(out)
+    Ok((accepted, skipped))
 }

-fn is_excluded(entry: &DirEntry, root: &Path, overrides: &Override) -> bool {
-    // `Override::matched(path, is_dir)` uses the path *relative to* the
-    // override builder's root. `walkdir` gives absolute paths when
-    // `WalkDir::new` was given an absolute path — strip the root prefix
-    // before consulting the override.
-    let rel = match entry.path().strip_prefix(root) {
-        Ok(p) => p,
-        Err(_) => entry.path(),
-    };
-    overrides
-        .matched(rel, entry.file_type().is_dir())
-        .is_ignore()
-}

 #[cfg(test)]
 mod tests {
@@ -187,7 +408,7 @@ mod tests {
        let dir = tempfile::tempdir().unwrap();
        let ov = build_overrides(dir.path(), &[], &[]).unwrap();
        // Default-excludes only; non-special files should not match.
-        let m = ov.matched(Path::new("notes/alpha.md"), false);
+        let m = ov.combined.matched(Path::new("notes/alpha.md"), false);
        assert!(!m.is_ignore());
    }

@@ -195,13 +416,13 @@ mod tests {
    fn default_excludes_ds_store_and_resource_forks() {
        let dir = tempfile::tempdir().unwrap();
        let ov = build_overrides(dir.path(), &[], &[]).unwrap();
-        assert!(ov.matched(Path::new(".DS_Store"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new(".DS_Store"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("notes/.DS_Store"), false).is_ignore()
+            ov.combined.matched(Path::new("notes/.DS_Store"), false).is_ignore()
        );
-        assert!(ov.matched(Path::new("._foo.md"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("._foo.md"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("notes/._sidecar"), false).is_ignore()
+            ov.combined.matched(Path::new("notes/._sidecar"), false).is_ignore()
        );
    }

@@ -214,13 +435,13 @@ mod tests {
            &[],
        )
        .unwrap();
-        assert!(ov.matched(Path::new("a.tmp"), false).is_ignore());
-        assert!(ov.matched(Path::new("notes/x.tmp"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("a.tmp"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("notes/x.tmp"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("node_modules/foo/bar.js"), false)
+            ov.combined.matched(Path::new("node_modules/foo/bar.js"), false)
                .is_ignore()
        );
-        assert!(!ov.matched(Path::new("alpha.md"), false).is_ignore());
+        assert!(!ov.combined.matched(Path::new("alpha.md"), false).is_ignore());
    }

    #[test]
@@ -233,9 +454,9 @@ mod tests {
            &["secret/**".to_string()],
        )
        .unwrap();
-        assert!(ov.matched(Path::new("a.tmp"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("a.tmp"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("secret/key.md"), false).is_ignore()
+            ov.combined.matched(Path::new("secret/key.md"), false).is_ignore()
        );
    }

@@ -257,4 +478,230 @@ mod tests {
        let v = read_kbignore(dir.path()).unwrap();
        assert_eq!(v, vec!["*.tmp".to_string(), "ignored/**".to_string()]);
    }
+
+    #[test]
+    fn built_in_blacklist_excludes_node_modules() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::create_dir_all(root.join("src")).unwrap();
+        fs::create_dir_all(root.join("node_modules/foo")).unwrap();
+        fs::write(root.join("src/main.rs"), "x").unwrap();
+        fs::write(root.join("node_modules/foo/bar.js"), "x").unwrap();
+
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        // Override::matched expects paths relative to the builder's root.
+        let m_in = overrides.combined.matched(Path::new("src/main.rs"), false);
+        let m_out = overrides.combined.matched(Path::new("node_modules/foo/bar.js"), false);
+
+        assert!(!m_in.is_ignore(), "src/main.rs should NOT be ignored");
+        assert!(m_out.is_ignore(), "node_modules/foo/bar.js SHOULD be ignored");
+    }
+
+    #[test]
+    fn built_in_blacklist_excludes_target_pycache_venv() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        for dir in ["target/x", "__pycache__/x", ".venv/x", "venv/x", "env/x"] {
+            fs::create_dir_all(root.join(dir)).unwrap();
+            fs::write(root.join(dir).join("y.txt"), "z").unwrap();
+        }
+        fs::create_dir_all(root.join("ok")).unwrap();
+        fs::write(root.join("ok/z.txt"), "z").unwrap();
+
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        // Override::matched expects paths relative to the builder's root.
+        for blacklisted in [
+            "target/x/y.txt",
+            "__pycache__/x/y.txt",
+            ".venv/x/y.txt",
+            "venv/x/y.txt",
+            "env/x/y.txt",
+        ] {
+            let m = overrides.combined.matched(Path::new(blacklisted), false);
+            assert!(m.is_ignore(), "{blacklisted} should be ignored");
+        }
+        let m_ok = overrides.combined.matched(Path::new("ok/z.txt"), false);
+        assert!(!m_ok.is_ignore(), "ok/z.txt should not be ignored");
+    }
+
+    #[test]
+    fn gitignore_at_repo_root_excludes_matching_files() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::create_dir_all(root.join("src")).unwrap();
+        fs::write(root.join(".gitignore"), "*.log\ndist/\n").unwrap();
+        fs::write(root.join("a.log"), "x").unwrap();
+        fs::write(root.join("src/main.rs"), "x").unwrap();
+        fs::create_dir_all(root.join("dist")).unwrap();
+        fs::write(root.join("dist/bundle.js"), "x").unwrap();
+
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        assert!(overrides.combined.matched(Path::new("a.log"), false).is_ignore());
+        assert!(overrides.combined.matched(Path::new("dist/bundle.js"), false).is_ignore());
+        assert!(!overrides.combined.matched(Path::new("src/main.rs"), false).is_ignore());
+    }
+
+    #[test]
+    fn gitignore_missing_is_no_op() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::write(root.join("a.log"), "x").unwrap();
+        fs::create_dir_all(root.join("src")).unwrap();
+        fs::write(root.join("src/main.rs"), "x").unwrap();
+
+        // No .gitignore present — patterns from .gitignore should not affect overrides.
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        assert!(!overrides.combined.matched(Path::new("a.log"), false).is_ignore());
+        assert!(!overrides.combined.matched(Path::new("src/main.rs"), false).is_ignore());
+    }
+
+    #[test]
+    fn gitignore_negation_with_trailing_slash_passes_through() {
+        use std::fs;
+        use tempfile::TempDir;
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        // Negation pattern. We don't fully implement gitignore negation
+        // semantics, but at minimum it must not produce double-`!` corruption.
+        fs::write(root.join(".gitignore"), "!keep/\n").unwrap();
+        // Just verify build_overrides doesn't error.
+        let result = build_overrides(root, &[], &[]);
+        assert!(result.is_ok(), "should not error on negation pattern: {:?}", result.err());
+    }
+
+    // ── Skip attribution tests ────────────────────────────────────────────────
+
+    #[test]
+    fn classify_skip_attributes_builtin_over_gitignore() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        // node_modules matches both BUILTIN_BLACKLIST and a hypothetical
+        // .gitignore entry. Builtin must win (priority order §5.2).
+        fs::write(root.join(".gitignore"), "node_modules/\n").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        // node_modules/ dir itself
+        let cat = classify_skip(Path::new("node_modules"), true, &ov);
+        assert_eq!(cat, SkipCategory::BuiltinBlacklist, "builtin must have priority");
+    }
+
+    #[test]
+    fn classify_skip_attributes_gitignore_for_log_files() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        let cat = classify_skip(Path::new("app.log"), false, &ov);
+        assert_eq!(cat, SkipCategory::Gitignore);
+    }
+
+    #[test]
+    fn classify_skip_attributes_kebabignore() {
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+
+        let ov = build_overrides(root, &[], &["*.secret".to_string()]).unwrap();
+        let cat = classify_skip(Path::new("creds.secret"), false, &ov);
+        assert_eq!(cat, SkipCategory::Kebabignore);
+    }
+
+    #[test]
+    fn walk_files_with_skips_counts_gitignored_files() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+        fs::write(root.join("ok.md"), "# ok").unwrap();
+        fs::write(root.join("skipme.log"), "x").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        let (accepted, skipped_entries) = walk_files_with_skips(root, &ov).unwrap();
+
+        let accepted_names: Vec<_> = accepted
+            .iter()
+            .map(|p| p.file_name().unwrap().to_string_lossy().into_owned())
+            .collect();
+        assert!(
+            accepted_names.iter().any(|n| n == "ok.md"),
+            "ok.md should be accepted; got: {accepted_names:?}"
+        );
+        assert!(
+            !accepted_names.iter().any(|n| n == "skipme.log"),
+            "skipme.log should not be accepted; got: {accepted_names:?}"
+        );
+
+        let gitignore_skipped: Vec<_> = skipped_entries
+            .iter()
+            .filter(|e| e.category == SkipCategory::Gitignore)
+            .collect();
+        assert!(
+            gitignore_skipped.iter().any(|e| e.path.file_name()
+                .map(|n| n == "skipme.log")
+                .unwrap_or(false)),
+            "skipme.log should appear in gitignore_skipped; skipped: {:?}",
+            skipped_entries.iter().map(|e| &e.path).collect::<Vec<_>>()
+        );
+    }
+
+    #[test]
+    fn walk_files_with_skips_counts_builtin_blacklist_dirs() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::create_dir_all(root.join("node_modules/foo")).unwrap();
+        fs::write(root.join("node_modules/foo/bar.js"), "x").unwrap();
+        fs::write(root.join("ok.md"), "# ok").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        let (accepted, skipped_entries) = walk_files_with_skips(root, &ov).unwrap();
+
+        let accepted_names: Vec<_> = accepted
+            .iter()
+            .map(|p| p.file_name().unwrap().to_string_lossy().into_owned())
+            .collect();
+        assert!(
+            accepted_names.iter().any(|n| n == "ok.md"),
+            "ok.md must be accepted; got: {accepted_names:?}"
+        );
+
+        let builtin_skipped: Vec<_> = skipped_entries
+            .iter()
+            .filter(|e| e.category == SkipCategory::BuiltinBlacklist)
+            .collect();
+        assert!(
+            !builtin_skipped.is_empty(),
+            "node_modules/ should produce at least one BuiltinBlacklist skip"
+        );
+        assert!(
+            builtin_skipped.iter().any(|e| e.path.components()
+                .any(|c| c.as_os_str() == "node_modules")),
+            "skipped path should contain node_modules; got: {:?}",
+            builtin_skipped.iter().map(|e| &e.path).collect::<Vec<_>>()
+        );
+    }
 }
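Reviewer note on the skip-attribution tests above: the priority rule they pin down (built-in blacklist wins over `.gitignore` when both match) can be sketched in isolation. This is a minimal sketch, not the crate's `classify_skip`: the `classify` function, the toy `gitignore_matches` matcher, and the `BUILTIN_DIRS` subset are all hypothetical stand-ins, and only the directory names and the expected categories are taken from the tests.

```rust
use std::path::Path;

/// Local stand-in for the diff's `SkipCategory`; only two variants are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipCategory {
    BuiltinBlacklist,
    Gitignore,
}

/// Directory names exercised by the tests above (assumed subset of the real list).
const BUILTIN_DIRS: &[&str] = &["node_modules", "target", "__pycache__", ".venv", "venv", "env"];

/// Toy matcher: understands only `*.ext` and `dir/` patterns, nothing else.
fn gitignore_matches(path: &Path, pattern: &str) -> bool {
    if let Some(ext) = pattern.strip_prefix("*.") {
        return path.extension().map_or(false, |e| e == ext);
    }
    if let Some(dir) = pattern.strip_suffix('/') {
        return path.components().any(|c| c.as_os_str() == dir);
    }
    false
}

/// Checks the built-in list first, so it wins whenever both rules match.
fn classify(path: &Path, gitignore: &[&str]) -> Option<SkipCategory> {
    let in_builtin_dir = path
        .components()
        .any(|c| BUILTIN_DIRS.iter().any(|d| c.as_os_str() == *d));
    if in_builtin_dir {
        return Some(SkipCategory::BuiltinBlacklist);
    }
    if gitignore.iter().any(|p| gitignore_matches(path, p)) {
        return Some(SkipCategory::Gitignore);
    }
    None
}
```

Ordering the checks this way is what makes the attribution deterministic: a path under `node_modules/` is always reported as a built-in skip, even if the repository's own `.gitignore` also lists it.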
--- a/crates/kebab-source-fs/tests/symlink_cycle.rs
+++ b/crates/kebab-source-fs/tests/symlink_cycle.rs
@@ -9,7 +9,7 @@
 //! Expected: `scan` returns in O(seconds), every emitted path is unique,
 //! and `alpha.md` appears at least once.
 //!
-//! The cycle guard lives in `walker::walk_files`; this test exists to
+//! The cycle guard lives in `walker::walk_files_with_skips`; this test exists to
 //! prove it catches the realistic shape (cycle through one or more
 //! symlinks) end-to-end via the public API.

@@ -100,7 +100,7 @@ fn two_step_directory_cycle_visited_set_breaks_loop() {
    //
    // Without the visited-set, walkdir would descend
    //   a → a/loop (=b) → a/loop/loop (=a) → … forever.
-    // The canonical-path visited-set in `walker::walk_files` must break
+    // The canonical-path visited-set in `walker::walk_files_with_skips` must break
    // the loop and yield a finite, deterministic result.
    let dir = tempfile::tempdir().unwrap();
    let root = dir.path();
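Reviewer note on the cycle guard referenced in the comments above: a canonical-path visited-set can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `walker::walk_files_with_skips`; the function name `walk_without_cycles` is hypothetical, and the real walker may track visited directories differently (for example through walkdir).

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects regular files under `root`, following symlinks, while refusing
/// to enter any directory whose canonical path was already walked.
fn walk_without_cycles(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut visited: HashSet<PathBuf> = HashSet::new();
    let mut files = Vec::new();
    let mut stack = vec![root.to_path_buf()];

    while let Some(dir) = stack.pop() {
        // Canonicalizing resolves symlinks, so `a/loop` and `b` share one key.
        let canonical = fs::canonicalize(&dir)?;
        if !visited.insert(canonical) {
            continue; // already walked via another route: this breaks the cycle
        }
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            // `fs::metadata` follows symlinks; skip entries that cannot be
            // resolved (for example a dangling link).
            let meta = match fs::metadata(&path) {
                Ok(meta) => meta,
                Err(_) => continue,
            };
            if meta.is_dir() {
                stack.push(path);
            } else if meta.is_file() {
                files.push(path);
            }
        }
    }
    Ok(files)
}
```

Because each canonical directory is entered at most once, a two-step loop such as `a/loop -> b` and `b/loop -> a` terminates, and a file in `a` is emitted exactly once regardless of which route reaches it first.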
--- a/crates/kebab-store-sqlite/snapshots/ingest_report.snapshot.json
+++ b/crates/kebab-store-sqlite/snapshots/ingest_report.snapshot.json
@@ -42,8 +42,19 @@
    ],
    "root": "/home/u/KB"
  },
+  "skip_examples": {
+    "builtin_blacklist": [],
+    "generated": [],
+    "gitignore": [],
+    "size_exceeded": []
+  },
  "skipped": 0,
+  "skipped_builtin_blacklist": 0,
  "skipped_by_extension": {},
+  "skipped_generated": 0,
+  "skipped_gitignore": 0,
+  "skipped_kebabignore": 0,
+  "skipped_size_exceeded": 0,
  "unchanged": 0,
  "updated": 1
 }
--- a/crates/kebab-store-sqlite/tests/idempotency.rs
+++ b/crates/kebab-store-sqlite/tests/idempotency.rs
@@ -42,6 +42,10 @@ fn make_metadata() -> Metadata {
        trust_level: TrustLevel::Primary,
        user_id_alias: None,
        user: Default::default(),
+        repo: None,
+        git_branch: None,
+        git_commit: None,
+        code_lang: None,
    }
 }

--- a/crates/kebab-store-sqlite/tests/incremental_ingest.rs
+++ b/crates/kebab-store-sqlite/tests/incremental_ingest.rs
@@ -51,6 +51,10 @@ fn make_doc() -> CanonicalDocument {
        trust_level: TrustLevel::Primary,
        user_id_alias: None,
        user: Default::default(),
+        repo: None,
+        git_branch: None,
+        git_commit: None,
+        code_lang: None,
    };
    CanonicalDocument {
        doc_id,
--- a/crates/kebab-store-sqlite/tests/ingest_report_snapshot.rs
+++ b/crates/kebab-store-sqlite/tests/ingest_report_snapshot.rs
@@ -35,6 +35,12 @@ fn fixture_report() -> IngestReport {
        errors: 0,
        duration_ms: 187,
        skipped_by_extension: std::collections::BTreeMap::new(),
+        skipped_gitignore: 0,
+        skipped_kebabignore: 0,
+        skipped_builtin_blacklist: 0,
+        skipped_generated: 0,
+        skipped_size_exceeded: 0,
+        skip_examples: kebab_core::SkipExamples::default(),
        items: Some(vec![
            IngestItem {
                kind: IngestItemKind::New,
--- a/crates/kebab-store-sqlite/tests/list_docs.rs
+++ b/crates/kebab-store-sqlite/tests/list_docs.rs
@@ -54,6 +54,10 @@ fn make_doc(
        trust_level: trust,
        user_id_alias: None,
        user: Default::default(),
+        repo: None,
+        git_branch: None,
+        git_commit: None,
+        code_lang: None,
    };
    let doc = CanonicalDocument {
        doc_id,
--- a/crates/kebab-tui/tests/inspect.rs
+++ b/crates/kebab-tui/tests/inspect.rs
@@ -79,6 +79,10 @@ fn make_doc() -> CanonicalDocument {
            trust_level: TrustLevel::Primary,
            user_id_alias: None,
            user,
+            repo: None,
+            git_branch: None,
+            git_commit: None,
+            code_lang: None,
        },
        provenance: Provenance {
            events: vec![ProvenanceEvent {
--- a/crates/kebab-tui/tests/search.rs
+++ b/crates/kebab-tui/tests/search.rs
@@ -56,6 +56,8 @@ fn make_hit(rank: u32, path: &str, snippet: &str, citation: Citation) -> SearchH
        indexed_at: time::OffsetDateTime::UNIX_EPOCH,
        stale: false,
        score_kind: kebab_core::ScoreKind::Rrf,
+        repo: None,
+        code_lang: None,
    }
 }

--- a/docs/SMOKE.md
+++ b/docs/SMOKE.md
@@ -113,6 +113,11 @@ max_context_tokens = 6000

 [ui]
 theme = "dark"                       # p9-fb-14 — TUI palette ("dark" / "light", default "dark")
+
+[ingest.code]
+skip_generated_header = true
+max_file_bytes = 262144
+max_file_lines = 5000
 ```

 `KEBAB_*` 환경변수로 override 가능 (`KEBAB_MODELS_LLM_MODEL=gemma4:26b kebab …` 등). 자세한 키 목록은 `crates/kebab-config/src/lib.rs` 의 `apply_env` 매치 암. `KEBAB_READONLY=1` — write-path 비활성화 (CI 안전망). `KEBAB_PROGRESS=plain` — non-TTY 환경에서 진행 상황을 plain 한 줄씩 stderr 출력 (spinner 대신).
@@ -329,4 +334,24 @@ rm -rf /tmp/kebab-smoke              # 통째로 정리
 - (P7-3) 한 PDF 가 N 페이지면 `kebab ingest` 가 N 개 (또는 그 이상의, 페이지 길면 multi-chunk) 의 chunk 를 한 transaction 안에서 commit. 500 페이지 책 → 500+ chunk 한 번에 → embedding throughput 가 bottleneck. 임베딩 활성 워크스페이스에서 큰 PDF 를 처음 ingest 하면 분-단위 시간 + WAL 크기 증가 가능 — P+ 스케일 hardening task 까지 정상 동작이지만 비용은 측정 가능.
 - (P7-3 + follow-up) 동일 path 에 byte 가 다른 PDF 를 두 번째 ingest 하면 `purge_vector_orphans_for_workspace_path` 가 옛 chunk_id 를 LanceDB 에서 먼저 삭제, 이어서 `purge_orphan_at_workspace_path` 가 옛 doc / chunks / embedding_records 를 SQLite 에서 sweep. 새 byte 가 새 `doc_id` 로 색인됨. `IngestReport` 에 그 자산만 `new+=1` (다른 자산은 `updated`). 두 store 모두 정합 — 옛 본문 검색 시 옛 chunks 가 더 이상 surface 되지 않음.

+### Embedding upgrade (fb-39b)
+
+Upgrade sequence from `multilingual-e5-small` to `multilingual-e5-large`:
+
+```bash
+# Clear the existing vector index (avoids orphan tables)
+kebab --config /tmp/kebab-smoke/config.toml reset --vector-only
+
+# Update [models.embedding] in config.toml:
+#   model = "multilingual-e5-large"
+#   dimensions = 1024
+
+# Re-ingest: on first run fastembed downloads the e5-large ONNX (~1.3 GB) automatically.
+# Expect download time plus time to re-embed every chunk (~3-4× vs e5-small).
+kebab --config /tmp/kebab-smoke/config.toml ingest
+
+# Compare small vs large with fb-39's P@k metric:
+kebab --config /tmp/kebab-smoke/config.toml eval run
+```
+
 자세한 history 와 발견된 버그는 [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) 참조.
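Reviewer note on the `[ingest.code]` keys added to the sample config in this file (`max_file_bytes = 262144`, `max_file_lines = 5000`): one plausible reading is that a file counts toward `skipped_size_exceeded` when it exceeds either limit. The sketch below assumes exactly that; the struct `CodeIngestLimits` and the function `exceeds_limits` are hypothetical names, and the real check may differ (for instance by inspecting file metadata instead of contents).

```rust
/// Hypothetical mirror of the two size keys under `[ingest.code]`.
struct CodeIngestLimits {
    max_file_bytes: u64,
    max_file_lines: usize,
}

/// Returns true when a file should be treated as size-exceeded.
/// Assumption: exceeding either limit alone is enough to skip the file.
fn exceeds_limits(contents: &str, limits: &CodeIngestLimits) -> bool {
    contents.len() as u64 > limits.max_file_bytes
        || contents.lines().count() > limits.max_file_lines
}
```

With the sample values, a 5001-line file of short lines is skipped even though it is far below 256 KiB, which is why the two limits are checked independently here.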
--- a/docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md
+++ b/docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md
@@ -0,0 +1,405 @@
+# fb-39b Embedding Model Upgrade Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Upgrade the default embedding model from `multilingual-e5-small` (384 dim) to `multilingual-e5-large` (1024 dim) so retrieval precision can improve on the Korean dogfooding corpus. Existing user TOMLs pinning `multilingual-e5-small` keep working unchanged.
+
+**Architecture:** Three-part code surface: a new arm in `kebab-embed-local::resolve_model`, defaults flipped in `kebab-config::Config::defaults` (and the TOML template), and the existing test that asserts the 384 default updated to match. LanceDB tables are already namespaced by `(model, dim)`, so an upgraded model writes to a fresh table; fb-23 incremental ingest detects the `embedding_version` mismatch and auto-re-embeds on the next ingest. No migration tooling: orphaned old-model tables are cleaned up via `kebab reset --vector-only`.
+
+**Tech Stack:** Rust 2024, fastembed 4.9.1 (`MultilingualE5Large` enum already shipped), LanceDB.
+
+**Spec:** `docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md`
+
+---
+
+## File map
+
+**Modify:**
+- `crates/kebab-embed-local/src/lib.rs` — add `multilingual-e5-large` arm in `resolve_model`. Update or add `check_dim` test for 1024.
+- `crates/kebab-config/src/lib.rs` — flip `Config::defaults().models.embedding.{model, dimensions}` and the TOML template at line ~952. Update default test at line 767.
+- `README.md` — `[models.embedding]` section: mention new default + small opt-out + dim mismatch hint.
+- `docs/SMOKE.md` — append "Embedding upgrade (fb-39b)" walkthrough showing the `kebab reset --vector-only && kebab ingest` sequence + first-run ONNX download warning.
+- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §5 storage / §9 versioning — update default model + dim references.
+- `tasks/HOTFIXES.md` — entry for embedding upgrade UX (orphan tables on model swap, reset --vector-only flow).
+- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` banner — append note "fb-39b lever applied (embedding upgrade) ✅".
+- `tasks/INDEX.md` — fb-39b row ✅ (new row alongside fb-39).
+
+**Create:**
+- `tasks/p9/p9-fb-39b-embedding-upgrade.md` — new task spec mirroring fb-39 frontmatter (status: completed, design + plan links).
+
+---
+
+## Task 1: Add multilingual-e5-large to kebab-embed-local
+
+**Files:**
+- Modify: `crates/kebab-embed-local/src/lib.rs`
+
+- [ ] **Step 1: Append failing tests**
+
+Find the existing `mod tests` (~line 230). Append:
+
+```rust
+#[test]
+fn resolve_model_supports_e5_large() {
+    let m = resolve_model("multilingual-e5-large").expect("e5-large should resolve");
+    // The fastembed enum is non-comparable in some versions, so we only
+    // confirm that resolution returns Ok. Do not construct the model in
+    // tests: that would trigger the 1.3 GB ONNX download.
+    let _ = m;
+}
+
+#[test]
+fn check_dim_passes_for_1024() {
+    check_dim(1024, 1024).expect("matching dims must pass");
+}
+
+#[test]
+fn check_dim_rejects_384_vs_1024() {
+    let err = check_dim(384, 1024).expect_err("dim mismatch must error");
+    let msg = format!("{err}");
+    assert!(msg.contains("384") && msg.contains("1024"),
+        "error must mention both dims, got: {msg}");
+}
+```
+
+- [ ] **Step 2: Run tests to confirm failures**
+
+```bash
+cargo test -p kebab-embed-local resolve_model_supports_e5_large
+cargo test -p kebab-embed-local check_dim_passes_for_1024
+```
+Expected: `resolve_model_supports_e5_large` fails (no arm); `check_dim_*` passes already (helper is generic).
+
+- [ ] **Step 3: Add arm to resolve_model**
+
+In `crates/kebab-embed-local/src/lib.rs`, find `fn resolve_model` (~line 199). Replace the match body:
+
+```rust
+fn resolve_model(name: &str) -> Result<EmbeddingModel> {
+    match name {
+        "multilingual-e5-small" => Ok(EmbeddingModel::MultilingualE5Small),
+        "multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),
+        other => anyhow::bail!(
+            "kb-embed-local: unsupported embedding model {other:?}; \
+             this adapter currently ships `multilingual-e5-small` and \
+             `multilingual-e5-large`. Add a new arm to `resolve_model` \
+             (and a fastembed feature flag if needed) to support more."
+        ),
+    }
+}
+```
+
+- [ ] **Step 4: Run tests — all pass**
+
+```bash
+cargo test -p kebab-embed-local
+cargo clippy -p kebab-embed-local --all-targets -- -D warnings
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add crates/kebab-embed-local/src/lib.rs
+git commit -m "feat(embed): add multilingual-e5-large arm to resolve_model (fb-39b)"
+```
+
+---
+
+## Task 2: Flip kebab-config default to e5-large + 1024 dim
+
+**Files:**
+- Modify: `crates/kebab-config/src/lib.rs`
+
+- [ ] **Step 1: Read existing default test + value sites**
+
+```bash
+grep -n "multilingual-e5-small\|dimensions: 384\|dimensions = 384\|default.*embedding" crates/kebab-config/src/lib.rs
+```
+
+Three sites to update:
+- `Config::defaults()` body (~line 307): `dimensions: 384` and `model: "multilingual-e5-small"`.
+- Default-assert test (~line 767): `assert_eq!(c.models.embedding.dimensions, 384)` and likely a sibling assertion on model.
+- TOML template at ~line 952: `dimensions = 384` (and likely `model = "multilingual-e5-small"`).
+
+- [ ] **Step 2: Add failing assertion to existing default test**
+
+Find the test at ~line 763-768 (likely `defaults_match_design_64_score_gate` or similar). Read it:
+
+```bash
+sed -n '760,780p' crates/kebab-config/src/lib.rs
+```
+
+If the test asserts `dimensions == 384`, change it to `1024`. If it doesn't assert the model name, add that assertion too, so the test ends with:
+
+```rust
+    assert_eq!(c.models.embedding.model, "multilingual-e5-large");
+    assert_eq!(c.models.embedding.dimensions, 1024);
+```
+
+- [ ] **Step 3: Run tests — expect failure**
+
+```bash
+cargo test -p kebab-config defaults_match
+```
+Expected: assertion failure on dimensions == 1024 (still 384) and/or model name.
+
+- [ ] **Step 4: Flip the defaults**
+
+In `crates/kebab-config/src/lib.rs:307` (the `EmbeddingCfg` defaults block):
+
+```rust
+EmbeddingCfg {
+    provider: "fastembed".to_string(),
+    model: "multilingual-e5-large".to_string(),
+    version: "v1".to_string(),
+    dimensions: 1024,
+    // ... preserve other fields (batch_size etc.) ...
+}
+```
+
+(Read the surrounding lines first to confirm field names. If the `version` field doesn't exist or has a different shape, update only `model` and `dimensions`.)
+
+- [ ] **Step 5: Flip the TOML template**
+
+In `crates/kebab-config/src/lib.rs` near line 952, the multi-line raw string contains the example TOML config. Find:
+
+```toml
+[models.embedding]
+provider = "fastembed"
+model = "multilingual-e5-small"
+...
+dimensions = 384
+```
+
+Replace with `model = "multilingual-e5-large"` and `dimensions = 1024`.
+
+- [ ] **Step 6: Run tests — pass**
+
+```bash
+cargo test -p kebab-config
+cargo clippy -p kebab-config --all-targets -- -D warnings
+```
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add crates/kebab-config/src/lib.rs
+git commit -m "feat(config): default embedding model multilingual-e5-large + 1024 dim (fb-39b)"
+```
+
+---
+
+## Task 3: Cross-crate test fixture sweep
+
+**Files:**
+- Modify: any test fixture broken by Task 2's default flip.
+
+- [ ] **Step 1: Find broken sites**
+
+```bash
+cargo build --workspace 2>&1 | tail -10
+cargo test --workspace --no-run -j 1 2>&1 | grep -E "error\[|FAILED" | head -20
+```
+
+Likely candidates:
+- `crates/kebab-app/tests/` — anywhere a test asserted `embedding.dimensions == 384`.
+- `crates/kebab-cli/tests/cli_schema.rs` — a capability/model assertion may include the embedding model name.
+
+For each failure, decide:
+- **Pin to small intentionally** (test exercises small-specific behavior): set `cfg.models.embedding.model = "multilingual-e5-small"; cfg.models.embedding.dimensions = 384;` explicitly.
+- **Inherit new default** (test just snapshots defaults): update assertion to `multilingual-e5-large` / `1024`.
+
+The vast majority of integration tests use `provider = "none"` (no embeddings) — those are unaffected.
+
+- [ ] **Step 2: Verify workspace builds**
+
+```bash
+cargo build --workspace 2>&1 | tail -5
+```
+
+- [ ] **Step 3: Run workspace tests**
+
+```bash
+cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -10
+cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -5
+```
+
+`-j 1` REQUIRED.
+
+Expected: all green.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add crates/
+git commit -m "fix(fb-39b): update test fixtures for embedding default flip"
+```
+
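+(Skip this commit if the Step 3 workspace test run is already green after Task 2 without any fixture edits, meaning no fixture broke. A clean `cargo build --workspace` alone is not enough, because it does not compile test targets.)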
+
+---
+
+## Task 4: Wire schema docs (design + HOTFIXES + new task spec)
+
+**Files:**
+- Modify: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`
+- Modify: `tasks/HOTFIXES.md`
+- Create: `tasks/p9/p9-fb-39b-embedding-upgrade.md`
+- Modify: `tasks/p9/p9-fb-39-retrieval-precision-tuning.md`
+- Modify: `tasks/INDEX.md`
+
+- [ ] **Step 1: Update design §5 storage and §9 versioning**
+
+```bash
+grep -n "multilingual-e5-small\|^## §5\|^### §5\|^## §9\|384" docs/superpowers/specs/2026-04-27-kebab-final-form-design.md | head -10
+```
+
+Update any reference to `multilingual-e5-small` or `dim 384` in the design doc to read `multilingual-e5-large` and `dim 1024`. Keep historical version mentions intact (e.g. "0.6.0 shipped with multilingual-e5-small") if any — but the "current default" line must reflect the new model.
+
+- [ ] **Step 2: Add HOTFIXES entry**
+
+Add to `tasks/HOTFIXES.md`, at the top of the dated log entries, using today's date `2026-05-10`:
+
+```markdown
+- **2026-05-10 fb-39b — embedding upgrade UX**: default embedding flipped from `multilingual-e5-small` (384 dim) to `multilingual-e5-large` (1024 dim). LanceDB tables are namespaced by `(model, dim)`, so the new model writes to a fresh table and the old `chunk_embeddings_multilingual-e5-small_384` table becomes an orphan. fb-23 incremental ingest auto-re-embeds chunks (embedding_version mismatch) into the new table on the next `kebab ingest`. To free disk before re-ingest, run `kebab reset --vector-only` first. This wipes both LanceDB and the SQLite `embedding_records` table. Search/ask against the new model returns no hits until `kebab ingest` populates the new table.
+```
+
+- [ ] **Step 3: Create `tasks/p9/p9-fb-39b-embedding-upgrade.md`**
+
+Mirror the fb-39 frontmatter shape:
+
+```markdown
+---
+phase: P9
+component: kebab-embed-local + kebab-config + kebab-store-vector + docs
+task_id: p9-fb-39b
+title: "Embedding model upgrade (multilingual-e5-large)"
+status: completed
+target_version: 0.7.0
+depends_on: [p9-fb-39]
+unblocks: []
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+contract_sections: [§4 search, §5 storage, §9 versioning cascade]
+source_feedback: User dogfooding 2026-05-06. After using the kebab CLI, Claude Code pointed out "noise mixed in at rank 5+" (the lever-application side of fb-39).
+---
+
+# p9-fb-39b — Embedding model upgrade
+
+> ✅ **Implementation complete.** Applies the embedding-model upgrade lever, one of fb-39's four candidate levers. The P@k metric (fb-39) can be used to compare small vs large.
+>
+> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md)
+> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md)
+
+## Summary
+
+- `multilingual-e5-small` (384 dim) → `multilingual-e5-large` (1024 dim) default flip.
+- Existing user TOMLs that pin small keep working unchanged (backwards-compat).
+- fb-23 incremental ingest detects the embedding_version mismatch → automatic re-embed.
+- Triggers the 0.6 → 0.7 minor bump (design §9 cascade rule).
+```
+
+- [ ] **Step 4: Append fb-39b note to fb-39 task spec banner**
+
+In `tasks/p9/p9-fb-39-retrieval-precision-tuning.md`, find the existing `> ✅ **Eval foundation 부분 구현 완료.**` banner. Append a line:
+
+```markdown
+> - fb-39b (lever applied: embedding upgrade): [`tasks/p9/p9-fb-39b-embedding-upgrade.md`](./p9-fb-39b-embedding-upgrade.md) ✅
+```
+
+- [ ] **Step 5: Add fb-39b row to INDEX**
+
+In `tasks/INDEX.md`, find the fb-39 row. Add a sibling row immediately below:
+
+```markdown
+    - [p9-fb-39b embedding upgrade](p9/p9-fb-39b-embedding-upgrade.md) — ✅ merged (2026-05-10) — multilingual-e5-large default
+```
+
+(Adapt format to match neighbor rows.)
+
+- [ ] **Step 6: Workspace test + clippy gate**
+
+```bash
+cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -10
+cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -5
+```
+
+`-j 1` REQUIRED.
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add docs/ tasks/
+git commit -m "docs(fb-39b): design + HOTFIXES + new task spec + INDEX"
+```
+
+---
+
+## Task 5: README + SMOKE walkthrough
+
+**Files:**
+- Modify: `README.md`
+- Modify: `docs/SMOKE.md`
+
+- [ ] **Step 1: Update README `[models.embedding]` section**
+
+```bash
+grep -n "models.embedding\|multilingual-e5-small\|fastembed" README.md | head -5
+```
+
+Locate the `[models.embedding]` config block in README. Update default values mentioned + add new bullet:
+
+```markdown
+- `model` (default `"multilingual-e5-large"`, fb-39b): multilingual sentence-embedding model, 1024-dim. The ONNX file (~1.3 GB) is downloaded automatically into the fastembed cache (`config.storage.model_dir/fastembed/`) on first run. `"multilingual-e5-small"` (384 dim) remains available for backwards-compat; pin it in the TOML.
+- `dimensions` (default `1024`): the model's embedding dimension. If the config and the dim stored in LanceDB disagree, search returns 0 results (orphan table). After changing the model, rebuilding the vector index with `kebab reset --vector-only && kebab ingest` is recommended.
+```
+
+- [ ] **Step 2: Append SMOKE walkthrough**
+
+Append to `docs/SMOKE.md` after fb-39 section (or at end if absent):
+
+````markdown
+### Embedding upgrade (fb-39b)
+
+Upgrade sequence from `multilingual-e5-small` to `multilingual-e5-large`:
+
+```bash
+# Clear the existing vector index (avoids orphan tables)
+kebab --config /tmp/kebab-smoke/config.toml reset --vector-only
+
+# Update [models.embedding] in config.toml:
+#   model = "multilingual-e5-large"
+#   dimensions = 1024
+
+# Re-ingest: on first run fastembed downloads the e5-large ONNX (~1.3 GB) automatically.
+# Expect download time plus time to re-embed every chunk (~3-4× vs e5-small).
+kebab --config /tmp/kebab-smoke/config.toml ingest
+
+# Compare small vs large with fb-39's P@k metric:
+kebab --config /tmp/kebab-smoke/config.toml eval run
+```
+````
+
+- [ ] **Step 3: Workspace test + clippy gate (sanity)**
+
+```bash
+cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -5
+cargo clippy --workspace --all-targets -- -D warnings 2>&1 | tail -3
+```
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add README.md docs/SMOKE.md
+git commit -m "docs(fb-39b): README + SMOKE — embedding upgrade walkthrough"
+```
+
+---
+
+## Final verification checklist
+
+- [ ] `cargo test --workspace --no-fail-fast -j 1` green
+- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean
+- [ ] `kebab schema --json | jq .models.embedding_version` reflects new model name (after a fresh ingest with new defaults)
+- [ ] Manual smoke: `kebab reset --vector-only && kebab ingest` against `/tmp/kebab-smoke` triggers ONNX download (first run) then completes ingest into the new `chunk_embeddings_multilingual-e5-large_1024` table
+- [ ] README + SMOKE + design + HOTFIXES + fb-39b spec + INDEX all updated
+- [ ] **Post-merge**: cut version bump 0.6 → 0.7 + tag (CLAUDE.md `Versioning cascade` release rule — embedding_version cascade triggers minor bump)
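Reviewer note on the plan above: the `(model, dim)` table namespacing and the dimension guard it relies on can be sketched as follows. The table-name format is inferred from the two names the plan mentions (`chunk_embeddings_multilingual-e5-small_384` and `chunk_embeddings_multilingual-e5-large_1024`). Both function bodies, the `String` error type, and the argument order of `check_dim` are assumptions; the crate's own `check_dim` presumably returns an `anyhow` error.

```rust
/// Hypothetical helper: one vector table per (model, dimensions) pair, so a
/// model swap never writes into a table with a different vector width.
fn table_name(model: &str, dimensions: usize) -> String {
    format!("chunk_embeddings_{model}_{dimensions}")
}

/// Hypothetical dimension guard. The error names both values so a config
/// that still says 384 against a 1024-dim model is easy to diagnose.
fn check_dim(configured: usize, actual: usize) -> Result<(), String> {
    if configured == actual {
        Ok(())
    } else {
        Err(format!(
            "embedding dimension mismatch: config says {configured}, model produces {actual}"
        ))
    }
}
```

Under this naming, flipping the default leaves the old table untouched on disk, which is why the plan pairs the upgrade with `kebab reset --vector-only` rather than a migration step.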
--- a/docs/superpowers/plans/2026-05-15-p10-1a-1-code-ingest-framework.md
+++ b/docs/superpowers/plans/2026-05-15-p10-1a-1-code-ingest-framework.md
--- a/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+++ b/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
@@ -37,6 +37,7 @@ related_tasks: ../../../tasks/INDEX.md
 | – | ignore | gitignore 문법 + `.kebabignore` | 익숙함 |
 | – | 에러 | thiserror per crate, anyhow at boundary | 추적성 + UX |
 | – | sync | watch=false default | v1 명시 ingest |
+| C+ | add code ingest | Tier 1/2/3 fan-out, keep e5-large, new Citation `code` variant | 2026-05-15 spec |

 ---

@@ -93,7 +94,7 @@ retrieval trace

 grounded ✓  qwen2.5:14b-instruct  rag-v1  3 chunks
 prompt   1184 tokens  completion 312 tokens  latency 1842 ms
-embedding multilingual-e5-small  index v1.0
+embedding multilingual-e5-large  index v1.0
 ```

 ### 1.3 `kebab ask` (refusal — score gate)
@@ -168,12 +169,12 @@ $ kebab search "Markdown chunking 규칙"

 `docs/wire-schema/v1/*.schema.json` 으로 동결. internal Rust struct ↔ wire 변환은 `From`/`TryFrom`. 모든 wire 객체는 `schema_version` 필드 필수.

-### 2.1 Citation (5 variants — discriminated by `kind`)
+### 2.1 Citation (6 variants — discriminated by `kind`)

 ```json
 {
  "schema_version": "citation.v1",
-  "kind": "line|page|region|caption|time",
+  "kind": "line|page|region|caption|time|code",
  "path": "notes/rust/kebab.md",
  "uri":  "notes/rust/kebab.md#L12-L34",

@@ -181,7 +182,20 @@ $ kebab search "Markdown chunking 규칙"
  "page":    { "page": 13, "section": "Experiment Setup" },
  "region":  { "x": 120, "y": 40, "w": 520, "h": 180 },
  "caption": { "model": "qwen2.5-vl:7b" },
-  "time":    { "start_ms": 822000, "end_ms": 850000, "speaker": "S1" }
+  "time":    { "start_ms": 822000, "end_ms": 850000, "speaker": "S1" },
+  "code":    { "start": 10, "end": 42, "lang": "rust", "repo": "kebab", "symbol": "fn ingest" }
+}
+```
+
+code variant example (p10-1A-1):
+
+```json
+{
+  "schema_version": "citation.v1",
+  "kind": "code",
+  "path": "crates/kebab-app/src/ingest.rs",
+  "uri":  "crates/kebab-app/src/ingest.rs#L10-L42",
+  "code": { "start": 10, "end": 42, "lang": "rust", "repo": "kebab", "symbol": "fn ingest" }
 }
 ```

@@ -212,8 +226,10 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
    "vector_rank": 2
  },
  "index_version": "v1.0",
-  "embedding_model": "multilingual-e5-small",
-  "chunker_version": "md-heading-v1"
+  "embedding_model": "multilingual-e5-large",
+  "chunker_version": "md-heading-v1",
+  "repo": null,        // p10-1A-1: optional, omitted when null (code corpus only)
+  "code_lang": null    // p10-1A-1: optional, omitted when null (code corpus only)
 }
 ```

@@ -264,7 +280,7 @@ Per-query failure 는 `bulk_search_item.v1.error` (error.v1) 에 격리, 다른
  "grounded": true,
  "refusal_reason": null,
  "model": { "id": "qwen2.5:14b-instruct", "provider": "ollama" },
-  "embedding": { "id": "multilingual-e5-small", "provider": "fastembed", "dimensions": 384 },
+  "embedding": { "id": "multilingual-e5-large", "provider": "fastembed", "dimensions": 1024 },
  "prompt_template_version": "rag-v1",
  "retrieval": {
    "trace_id": "ret_4a8b2c1e",
@@ -297,6 +313,17 @@ Per-query failure 는 `bulk_search_item.v1.error` (error.v1) 에 격리, 다른
  "scope": { "root": "/home/altair/KnowledgeBase", "include": ["**/*.md"], "exclude": [".git/**"] },
  "scanned": 142, "new": 12, "updated": 3, "skipped": 127, "errors": 0,
  "duration_ms": 4231,
+  "skipped_gitignore": 40,
+  "skipped_kebabignore": 5,
+  "skipped_builtin_blacklist": 80,
+  "skipped_generated": 2,
+  "skipped_size_exceeded": 1,
+  "skip_examples": {
+    "generated": ["crates/kebab-app/src/generated.rs"],
+    "size_exceeded": ["crates/kebab-app/fixtures/huge.rs"],
+    "builtin_blacklist": ["target/release/kebab"],
+    "gitignore": ["node_modules/lodash/index.js"]
+  },
  "items": [
    {
      "kind": "new|updated|skipped|error",
@@ -374,7 +401,7 @@ Per-query failure 는 `bulk_search_item.v1.error` (error.v1) 에 격리, 다른
  "token_estimate": 480,
  "chunker_version": "md-heading-v1",
  "embeddings": [
-    { "model": "multilingual-e5-small", "dimensions": 384, "embedding_id": "e_2f1a" }
+    { "model": "multilingual-e5-large", "dimensions": 1024, "embedding_id": "e_2f1a" }
  ]
 }
 ```
@@ -390,7 +417,7 @@ Per-query failure 는 `bulk_search_item.v1.error` (error.v1) 에 격리, 다른
    { "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
    { "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
    { "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
-    { "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
+    { "name": "embedding_model", "ok": true, "detail": "multilingual-e5-large (1024d)" },
    { "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
    { "name": "ollama_model_pulled", "ok": false, "detail": "qwen2.5:14b-instruct missing", "hint": "ollama pull qwen2.5:14b-instruct" }
  ]
@@ -434,6 +461,8 @@ pub struct PromptTemplateVersion(pub String);
 pub struct SchemaVersion(pub &'static str);
 ```

+Note: `chunker_version` family extended in phase 10 (per-language pattern, see 2026-05-15 spec §3.3 for canonical list). Each new language AST chunker registers its own `ChunkerVersion` label (e.g. `code-rust-ast-v1`, `code-python-ast-v1`). The existing `md-heading-v1` / `pdf-page-v1` labels are unaffected.
+
 ### 3.3 RawAsset

 ```rust
@@ -577,6 +606,11 @@ pub struct Metadata {
    pub trust_level: TrustLevel,
    pub user_id_alias: Option<String>,
    pub user: serde_json::Map<String, serde_json::Value>,
+    // p10-1A-1: code corpus fields — None for non-code assets.
+    pub repo: Option<String>,         // git repo name (top-level dir or remote basename)
+    pub git_branch: Option<String>,   // HEAD branch name at ingest time
+    pub git_commit: Option<String>,   // HEAD commit SHA (short, 12 chars) at ingest time
+    pub code_lang: Option<String>,    // lowercase language name (e.g. "rust", "python")
 }

 pub enum SourceType { Markdown, Note, Paper, Reference, Inbox }
@@ -1209,9 +1243,9 @@ chunker_version           = "md-heading-v1"

 [models.embedding]
 provider   = "fastembed"
-model      = "multilingual-e5-small"
+model      = "multilingual-e5-large"
 version    = "v1"
-dimensions = 384
+dimensions = 1024
 batch_size = 64

 [models.llm]
@@ -1370,8 +1404,11 @@ pub trait JobRepo {
 kebab-cli, kebab-tui, kebab-desktop
   └─> kebab-app
         ├─> kebab-source-fs
+         │     └─> kebab-parse-code (p10-1A-1: lang detect / repo detect / skip policy)
         ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
         │     └─> kebab-parse-types (parser intermediate)
+         ├─> kebab-parse-code
+         │     └─> kebab-core (domain types only — NO store/embed/llm/rag/UI)
         ├─> kebab-normalize
         │     └─> kebab-parse-types
         ├─> kebab-chunk
@@ -1474,7 +1511,7 @@ $ kebab doctor
 ✓ data_dir_writable     ~/.local/share/kebab
 ✓ sqlite_open           kebab.sqlite (schema v1)
 ✓ lancedb_open          lancedb/
-✓ embedding_model       multilingual-e5-small (384d)
+✓ embedding_model       multilingual-e5-large (1024d)
 ✓ ollama_reachable      http://127.0.0.1:11434
 ✗ ollama_model_pulled   qwen2.5:14b-instruct missing
                        hint: ollama pull qwen2.5:14b-instruct
@@ -1510,6 +1547,26 @@ agent 가 분기). HTTP-SSE transport 는 fb-29 deferral 따라 P+. classify
 모듈은 `kebab-app::error_wire` 에 single source — kebab-cli + kebab-mcp
 공유.

+### 10.3 Eval metrics (fb-39)
+
+#### Retrieval metrics (ground-truth curated)
+
+`kebab eval run` computes metrics over the golden query suite (`fixtures/golden_queries.yaml`). They are measurable only when the curator fills in `expected_chunk_ids` and `expected_doc_ids` (the shipped template is empty; each workspace fills in its own).
+
+| Metric | Definition | Condition |
+|--------|------------|-----------|
+| `hit_at_k` | Whether an expected chunk appears in the top-k (binary), averaged over queries as P(hit@k=true) | `expected_chunk_ids` filled |
+| `MRR` | Mean Reciprocal Rank (mean of the reciprocal rank of the first relevant chunk) | `expected_chunk_ids` filled |
+| `recall_at_k_doc` | Fraction of expected docs found in the top-k (`\|top-k_docs ∩ expected_doc_ids\| / \|expected_doc_ids\|`) | `expected_doc_ids` filled |
+| `precision_at_k_chunk` (fb-39) | Fraction of top-k chunk_ids that are in `expected_chunk_ids`. Denominator = k (fixed), so a top-k list shorter than k also counts as precision loss. Queries with an empty `expected_chunk_ids` are skipped. | `expected_chunk_ids` filled |
+
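The fixed-denominator definition of `precision_at_k_chunk` can be sketched as follows. This is a minimal illustration that assumes chunk ids are plain strings; the function name and signature are hypothetical and are not taken from kebab's eval code.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: precision@k over chunk ids with a fixed denominator k.
/// Returns `None` when the query must be skipped (empty expectations or k == 0).
pub fn precision_at_k_chunk(top_k: &[&str], expected: &[&str], k: usize) -> Option<f64> {
    if expected.is_empty() || k == 0 {
        return None;
    }
    let expected: HashSet<&str> = expected.iter().copied().collect();
    // Count hits among at most k results.
    let hits = top_k
        .iter()
        .take(k)
        .filter(|id| expected.contains(*id))
        .count();
    // The denominator stays k even when fewer than k results came back,
    // so a short result list lowers precision.
    Some(hits as f64 / k as f64)
}
```

With three results returned for k = 5 and two of them expected, the sketch yields 2/5 rather than 2/3, which is the "short top-k counts as precision loss" behavior described in the table.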
+#### Groundedness metrics (rule-based)
+
+| Metric | Definition |
+|--------|------------|
+| `must_contain` pass | The answer string contains every substring in `golden.must_contain` |
+| `forbidden` pass | The answer string contains none of the substrings in `golden.forbidden` |
+
 ---

 ## 11. Frozen scope / change policy
@@ -1535,6 +1592,8 @@ agent 가 분기). HTTP-SSE transport 는 fb-29 deferral 따라 P+. classify
 - real-time collab
 - enterprise auth

+Code ingest is no longer out of scope (2026-05-15 spec). Multi-workspace, watch mode, and history-aware features (git-blame-based citation, diff-aware re-chunking) remain out of scope.
+
 ---

 ## 12. Next steps
--- a/docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md
+++ b/docs/superpowers/specs/2026-05-10-p9-fb-39b-embedding-upgrade-design.md
@@ -0,0 +1,198 @@
+---
+title: "p9-fb-39b — Embedding model upgrade design (multilingual-e5-large)"
+phase: P9
+component: kebab-embed-local + kebab-store-vector + kebab-config + kebab-app
+task_id: p9-fb-39b
+status: design
+target_version: 0.7.0
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+contract_sections: [§4 search, §5 storage, §9 versioning cascade]
+date: 2026-05-10
+---
+
+# p9-fb-39b — Embedding model upgrade
+
+## Goal
+
+Apply the lever identified in fb-39: upgrade the embedding model from `multilingual-e5-small` (384 dim) to `multilingual-e5-large` (1024 dim), to improve retrieval precision on the dogfooding Korean corpus.
+
+Because fb-39 added the measurement tooling (P@5 / P@10), small vs. large can be compared once this PR merges.
+
+`bge-m3` was considered, but it is not in the `EmbeddingModel` enum of fastembed 4.9.1; loading the ONNX directly through the `UserDefinedEmbeddingModel` path is separate work (fb-39c candidate). This PR's scope is e5-large only.
+
+## Behavior contract
+
+### Embedding model
+
+- New default: `multilingual-e5-large` (1024 dim).
+- New arm in `kebab-embed-local::resolve_model`:
+
+```rust
+"multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),
+```
+
+The existing `multilingual-e5-small` arm stays as-is (backwards-compat opt-out).
+
+### Config defaults
+
+- `Config::defaults().models.embedding.model`: `"multilingual-e5-small"` → `"multilingual-e5-large"`.
+- `Config::defaults().models.embedding.dimensions`: `384` → `1024`.
+- The config.toml template generated by `kebab init` is updated the same way.
+
+An existing user TOML that explicitly sets `model = "multilingual-e5-small"` or `dimensions = 384` is kept as-is, because `serde` gives the user value priority. Opt-out is possible.
+
+### Cascade
+
+- `embedding_version`: changes automatically (the `config.models.embedding.model` value is emitted on the wire as-is): `multilingual-e5-small` → `multilingual-e5-large`.
+- fb-23 incremental ingest: the 4-input match (blake3 + parser_version + chunker_version + embedding_version) breaks on embedding_version, so every chunk is re-embedded. Text/parse/chunk costs are avoided; only the embed cost is paid.
+- `eval_runs.config_snapshot_json`: the new version is recorded automatically. Compare only runs that share the same version.
+- Of the 5 keys in the design §9 cascade rule, `embedding_version` changes, which triggers a binary release (CLAUDE.md `Versioning cascade` rule).
+
+### Migration policy
+
+If the dim of the vectors stored in LanceDB does not match `config.models.embedding.dimensions`:
+
+- `LanceVectorStore::open` (or the first call) compares them and, on mismatch, returns a new `ErrorV1`:
+  - `code = "embedding_dim_mismatch"`
+  - `message`: `"vector index dim 384 vs config dim 1024"`
+  - `hint`: `"기존 vector index 가 384-dim, config 는 1024-dim. 'kebab reset --vector-only && kebab ingest' 로 재구축."`
+- CLI: exit 1 + error.v1 on stderr (or plain stderr in non-`--json` mode).
+- No silent migration / auto-wipe: explicit user consent is required.
+
+remediation flow:
+
+```
+$ kebab search "..."
+error: vector index dim 384 vs config dim 1024
+
+Hint: 기존 vector index 가 384-dim, config 는 1024-dim.
+'kebab reset --vector-only && kebab ingest' 로 재구축.
+
+$ kebab reset --vector-only
+[wipe LanceDB + SQLite embedding_records]
+
+$ kebab ingest
+[full re-embed with new model — fastembed downloads e5-large ONNX (~1.3 GB) on first run]
+```
+
+### Wire shape
+
+No new wire fields. `"embedding_dim_mismatch"` is added to the namespace of valid `error.v1.code` values (a string, not an enum, so the change is additive).
+
+## Allowed / forbidden dependencies
+
+- `kebab-embed-local`: no new deps. Only a new fastembed enum variant.
+- `kebab-store-vector`: no new deps. Uses the LanceDB schema reader.
+- `kebab-config`: no new deps. Default values change.
+- `kebab-app`: no new deps. Error propagation.
+
+The rule that `kebab-core` must not depend on other `kebab-*` crates stays as-is.
+
+## Public surface delta
+
+### kebab-embed-local (`lib.rs`)
+
+```rust
+fn resolve_model(name: &str) -> Result<EmbeddingModel> {
+    match name {
+        "multilingual-e5-small" => Ok(EmbeddingModel::MultilingualE5Small),
+        "multilingual-e5-large" => Ok(EmbeddingModel::MultilingualE5Large),  // new
+        other => anyhow::bail!(/* ... */),
+    }
+}
+```
+
+### kebab-config (defaults + TOML template)
+
+```rust
+EmbeddingCfg {
+    provider: "fastembed".to_string(),
+    model: "multilingual-e5-large".to_string(),
+    dimensions: 1024,
+    // ... others ...
+}
+```
+
+The generated config.toml template is updated as well.
+
+### kebab-store-vector (`lib.rs` or a new helper)
+
+```rust
+impl LanceVectorStore {
+    pub fn open(...) -> Result<Self> {
+        // existing open logic ...
+        let stored_dim = read_schema_vector_dim(&table)?;
+        if stored_dim != config_dim {
+            anyhow::bail!(StructuredError(ErrorV1 {
+                code: "embedding_dim_mismatch".to_string(),
+                message: format!("vector index dim {stored_dim} vs config dim {config_dim}"),
+                hint: Some(format!(
+                    "기존 vector index 가 {stored_dim}-dim, config 는 {config_dim}-dim. \
+                     'kebab reset --vector-only && kebab ingest' 로 재구축."
+                )),
+                // ...
+            }));
+        }
+        Ok(...)
+    }
+}
+```
+
+(The exact LanceDB schema-reading API is to be confirmed during implementation: `Table::schema()` or inspecting `arrow_schema::Schema` directly.)
+
+## Test plan
+
+| kind | description |
+|------|-------------|
+| unit (kebab-embed-local) | `resolve_model("multilingual-e5-large")` returns Ok |
+| unit (kebab-embed-local) | `check_dim(1024, 1024)` ok |
+| unit (kebab-embed-local) | `check_dim(384, 1024)` Err — message mentions both dims |
+| unit (kebab-config) | `Config::defaults().models.embedding.model == "multilingual-e5-large"` |
+| unit (kebab-config) | `Config::defaults().models.embedding.dimensions == 1024` |
+| unit (kebab-config) | TOML `model = "multilingual-e5-small"` deserializes correctly (backwards-compat) |
+| unit (kebab-config) | generated config.toml template contains `model = "multilingual-e5-large"`, `dimensions = 1024` |
+| unit (kebab-store-vector) | mismatch fixture (384-dim stored + 1024 cfg) → `embedding_dim_mismatch` ErrorV1 |
+| integration (kebab-cli) | mismatch scenario: pre-existing 384-dim DB + new config → exit 1 + error.v1 stderr (`code = embedding_dim_mismatch`) + hint mentions reset --vector-only |
+| integration (kebab-cli) | fresh ingest + search with the small config → works (verifies the backwards-compat path) |
+
+To avoid downloading the `multilingual-e5-large` model, unit/integration tests use fixtures or mocks and never call the real model. The user downloads it into the fastembed cache on first dogfooding.
+
+## Implementation steps (high-level)
+
+1. `kebab-embed-local::resolve_model` arm + check_dim unit tests.
+2. `kebab-store-vector` dim mismatch detection + ErrorV1 + unit tests.
+3. `kebab-config` defaults flip + TOML template + unit tests.
+4. `kebab-cli` integration: mismatch error.v1 wire + backwards-compat path integration tests.
+5. README + SMOKE + design + HOTFIXES + status flip.
+
+5 tasks. Fits in a single PR and a single session.
+
+## Risks / notes
+
+- **First-run model download**: the e5-large ONNX is ~1.3 GB. It is downloaded automatically into the fastembed cache (`config.storage.model_dir/fastembed/`) on the first call. There is no progress display, so the user sees silent latency. Add a warning to `kebab doctor` or the README.
+- **Search/ingest latency**: e5-large takes ~3-4× the embedding time of e5-small. Ingest cost rises (one-time + new docs), and the per-call query embed at search time gets slower.
+- **Disk usage**: the vector dim grows ~2.7× (384 → 1024), so LanceDB grows by roughly 2.7×.
+- **HOTFIXES entry**: the dim mismatch UX (error.v1 + reset --vector-only flow) is new behavior not specified in the frozen design, so add one HOTFIXES entry.
+- **eval comparison**: fb-39 P@k is the measurement tool. A quantitative small vs. large comparison, with `expected_chunk_ids` filled in for the dogfooding corpus + golden set, is separate work (not required in this PR).
+- **Interaction with fb-23 incremental ingest**: the embedding_version change re-embeds every doc. The fb-23 unchanged path is never hit (expected behavior).
+- **release trigger**: the `embedding_version` change under the design §9 cascade rule requires a binary 0.6 → 0.7 minor bump per the CLAUDE.md `Versioning cascade` rule.
+
+## Out of scope
+
+- bge-m3 or the user-defined ONNX path (fb-39c candidate).
+- Other levers (RRF / cross-encoder / chunk policy).
+- Auto-migration / background re-vector.
+- LanceDB schema migration tooling (separate wipe + re-ingest).
+- Multi-model coexistence (small + large in the same KB).
+- Mandatory quantitative precision comparison (separate dogfooding).
+
+## Documentation updates (in the same implementation PR)
+
+- `README.md` `[models.embedding]` config section: default change + small opt-out note + reset command on dim mismatch.
+- `docs/SMOKE.md`: upgrade walkthrough (`kebab reset --vector-only && kebab ingest` sequence + first ONNX download latency warning).
+- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §5 storage / §9 versioning, in the appropriate subsections: state the new default + dim 1024.
+- `tasks/HOTFIXES.md`: dim mismatch UX entry.
+- `tasks/p9/p9-fb-39-retrieval-precision-tuning.md` banner: add "fb-39b lever applied (embedding upgrade) ✅" (the fb-39 spec status stays frozen).
+- `tasks/p9/p9-fb-39b-embedding-upgrade.md`: new task spec (create it, or handle it as an fb-39 sub-task via frontmatter).
+- `tasks/INDEX.md`: add the fb-39b row ✅.
+- After this PR merges: `chore: bump version 0.6 → 0.7` + tag (CLAUDE.md release procedure).
--- a/docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md
+++ b/docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md
@@ -0,0 +1,816 @@
+# kebab — Code Ingest Design
+
+Date: 2026-05-15
+Scope: extend the kebab workspace to a **code corpus** (`Tier 1` AST per-language + `Tier 2` resource-aware + `Tier 3` paragraph fallback). This is the follow-up to the frozen design doc `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` and the first spec to **partially** break one of its §11 non-scope items, "perfect parsing of every file format". Other non-scope items such as multi-workspace and watch mode stay as they are.
+
+Target user scenario: with *dozens of git repos* cloned under one parent directory (`workspace.root`), run semantic search + RAG over the whole corpus in one place.
+
+---
+
+## 0. Summary of frozen decisions
+
+| # | Decision | Value | Rationale |
+|---|----------|-------|-----------|
+| C1 | Scope unit | Multiple repos under one `workspace.root` (no multi-workspace) | frozen Q10 / §11 unchanged; a single parent directory line covers it |
+| C2 | Chunking strategy | Tier 1 = AST per-language, Tier 2 = resource-aware, Tier 3 = paragraph + line-window | Semantic units differ by language, so no uniform approach works |
+| C3 | Embedding model | Keep the existing `multilingual-e5-large` | Code + docs share one vector space → cross-corpus search. Avoids the embedding_version cascade |
+| C4 | ignore integration | Honor `.gitignore` automatically + `.kebabignore` as an additional layer + minimal built-in safety net | Natural for the user's mental model. `.gitignore` is the source of truth |
+| C5 | Repo detection | Auto-detect by `.git/` walk-up, identifier = dir name | Simple / deterministic / also works for repos with no git remote configured |
+| C6 | Branch handling | Working tree only. An ingest after a branch switch reprocesses incrementally via blake3 hash differences | git-history-aware indexing would shake the §3 domain model heavily (P+) |
+| C7 | Citation variant | Introduce a new `code` variant (line/page/region/caption/time/**code**) | Clear semantic separation; clean branching for agents / consumers |
+| C8 | Search hit additions | `SearchHit.repo`, `SearchHit.code_lang` (optional, additive minor) | Repo isolation / stats / filter |
+| C9 | New filters | `--media code`, `--code-lang <list>`, `--repo <name>` | Search ergonomics |
+| C10 | chunker_version | Per-language (`code-rust-ast-v1` etc.) | Each language chunker evolves independently; keeps the §9 cascade rule clean |
+| C11 | Crate structure | New crate `kebab-parse-code` (all language mods) + extend existing `kebab-chunk` modules | The 22 crates grow only once. Adding a language is one pair of modules |
+| C12 | Symbol path | Per-language convention (`mod::fn` / `pkg.cls.method` / `module/Class.method` …) | Follows each language's self-reference convention |
+| C13 | RAG prompt | Phase 1A keeps `rag-v2`. Consider `rag-v3` after measurement | YAGNI |
+| C14 | Special files | Manifest-like files (`Cargo.toml` etc.) become 1 chunk per whole file (`manifest-file-v1`) | For small files, seeing the whole file is more useful |
+| C15 | Phase split | 1A Rust → 1B Python+TS/JS → 1C Go+Java/Kotlin → 1D C/C++ → 2 Tier 2 → 3 Tier 3 | Incremental rollout, dogfoodable |
+| C16 | Minimal built-in skip | 6 entries: `node_modules/` `target/` `__pycache__/` `.venv/` `venv/` `env/` | `.gitignore` is primary; built-ins are a safety net |
+| C17 | Generated header sniff | 7 markers such as `@generated` / `DO NOT EDIT`, read from the first 512 bytes | Cuts first-dogfooding cost (protobuf etc.) |
+| C18 | Size cap | `max_file_bytes = 262144` (256 KiB), `max_file_lines = 5000` default | Blocks large fixtures / minified files |
+
+---
+
+## 1. Scope + non-scope
+
+### 1.1 Scope (what this spec freezes)
+
+- Code / config file ingest pipeline (parse → chunk → embed → store → retrieve → answer)
+- New Citation variant `code`
+- New SearchHit fields (`repo`, `code_lang`)
+- New search filters (`--media code`, `--code-lang`, `--repo`)
+- New chunker_version label family (`code-{lang}-ast-v1`, `k8s-manifest-resource-v1`, `dockerfile-file-v1`, `manifest-file-v1`, `code-text-paragraph-v1`)
+- New crate `kebab-parse-code`
+- Extension of existing `kebab-chunk` modules
+- Automatic repo detection + `metadata.repo` / `git_branch` / `git_commit`
+- ignore integration policy (`.gitignore` honor + `.kebabignore` + built-in)
+- generated / vendored / size cap skip policy
+- Finer-grained IngestReport counts
+- New config section `[ingest.code]`
+- Phase split (1A → 1B → 1C → 1D → 2 → 3)
+
+### 1.2 Non-scope (what this spec explicitly does *not* cover)
+
+- **Multi-workspace**: still a single `workspace.root`. Users arrange the parent directory themselves.
+- **Watch mode**: still explicit ingest only.
+- **git history aware indexing**: no per-branch / per-commit snapshot indexing. Only one point in time of the working tree.
+- **LSP / go-to-definition / find-references**: code *navigation* is handled well by the IDE / CC. kebab does *semantic search* + *RAG* only.
+- **Code-specific embedding model**: to be considered after Phase 2+ measurement. This spec keeps e5-large.
+- **`rag-v3` (code-aware prompt)**: to be considered after Phase 2+ measurement.
+- **Submodules / git worktrees**: only normal repos where `.git/` is a directory are recognized. For a submodule (`.git` file), only metadata.repo is affected: it is null or falls back to the parent repo name.
+- **Intentional cross-repo dedup**: only the incidental dedup from the blake3 content hash exists. No explicit dedup logic.
+- **`kebab://` URL handler**: P+, same as frozen §11.
+
+---
+
+## 2. Phase split + milestones
+
+Each phase = a separate task spec (`tasks/p10/p10-1a-1-code-ingest-framework.md` etc.) + a separate PR. **Phase 1A-1** is the heaviest phase: it brings in the *entire framework* (new crate skeleton, new Citation variant, repo metadata, new filters, the whole ignore policy, skip policy, finer-grained IngestReport). From 1A-2 on, phases only *add languages / chunkers*.
+
+| Phase | Content | New crate / module | New chunker_version | Milestone |
+|-------|---------|--------------------|---------------------|-----------|
+| **1A-1** | Entire framework: Citation `code` variant, SearchHit `repo`/`code_lang`, new filters (`--media code` / `--code-lang` / `--repo`), ignore integration policy, skip policy (built-in/generated/size), finer-grained IngestReport, config `[ingest.code]` section. `kebab-parse-code` crate **skeleton** (lang/repo/skip modules only, no language parsers) | New `kebab-parse-code`: infrastructure only, no language parser modules | *none* (0 chunkers added) | wire schema additive minor commit. Regression test verifies the existing markdown corpus is unaffected. Code ingest not yet active |
+| **1A-2** | The Rust AST chunker itself + tree-sitter-rust. Enables Rust file ingest | `rust.rs` parser module in the same crate + `kebab-chunk/code_rust_ast_v1.rs` | `code-rust-ast-v1` | kebab can dogfood on itself |
+| **1B** | Python + TS/JS AST ingest | `python.rs` / `typescript.rs` / `javascript.rs` modules in the same crate + chunkers | `code-python-ast-v1`, `code-ts-ast-v1`, `code-js-ast-v1` | Search over in-house ML code + web code |
+| **1C** | Go + Java + Kotlin AST ingest | Modules added to the same crate | `code-go-ast-v1`, `code-java-ast-v1`, `code-kotlin-ast-v1` | In-house backend search |
+| **1D** | C + C++ AST ingest | Modules added to the same crate | `code-c-ast-v1`, `code-cpp-ast-v1` | System code search (last) |
+| **2** | Tier 2 resource-aware: k8s manifest + Dockerfile + general manifests | Modules added to the same crate | `k8s-manifest-resource-v1`, `dockerfile-file-v1`, `manifest-file-v1` | k8s operations / DevOps search |
+| **3** | Tier 3 fallback: shell + unsupported extensions | Modules added to the same crate | `code-text-paragraph-v1` | Fallback for miscellaneous text |
+
+**Why Phase 1A is split into 1A-1 / 1A-2**: the *framework surface* that 1A brings in (Citation variant, SearchHit fields, 3 filters, skip policy, config section, finer-grained IngestReport, new crate) can be verified independently of the *language chunker itself*. After 1A-1 merges, a regression test verifies that the existing markdown corpus produces *byte-level identical* output, which separately confirms that the wire schema change is safe while code ingest is still inactive. 1A-2 focuses only on the Rust chunker itself; dogfooding becomes possible at the 1A-2 merge.
+
+**Binary version bump triggers**:
+- **1A-1 merge**: no bump. It is an additive minor wire change (the CLAUDE.md rule that additive minor wire changes are backward-compatible and do not count as this trigger applies). Code ingest is inactive, so the user's dogfood surface does not change.
+- **1A-2 merge**: minor bump (e.g. `0.6` → `0.7`). Dogfooding becoming possible is the bump trigger.
+- Whether later phases (1B/1C/1D/2/3) bump is decided in each phase's task spec; a patch bump if there are no wire / flag additions.
+
+---
+
+## 3. Domain model impact
+
+Extends §3 (domain model) and §2 (wire schema) of the frozen design `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` as *additive minor* changes. No breaking changes. The affected frozen design sections are listed in [§10 cascade](#10-변경-영향--cascade).
+
+### 3.1 New Citation variant: `code`
+
+Adds `code` to the 5 variants of frozen design §2.1 (`line` / `page` / `region` / `caption` / `time`), for 6 variants in total.
+
+```json
+{
+  "schema_version": "citation.v1",
+  "kind": "code",
+  "path": "kebab/crates/kebab-chunk/src/md_heading_v1.rs",
+  "uri":  "kebab/crates/kebab-chunk/src/md_heading_v1.rs#L142-L168",
+  "line_start": 142,
+  "line_end": 168,
+  "symbol": "MdHeadingV1Chunker::chunk_doc",
+  "lang": "rust"
+}
+```
+
+**Wire shape: flat**, following the same pattern as the existing 5 variants (`Citation::Line` also has `start` / `end` / `section` at top level, with no nesting). Because the enum is internally tagged via serde `#[serde(tag = "kind")]`, each variant's fields land at top level next to the tag.
+
+`symbol` is nullable: filled for Tier 1 AST chunks, `null` for Tier 2/3.
+`lang` uses the same identifier as the `--code-lang` filter (lowercase). May be null.
+
+As with the existing 5 variants, `path` + `uri` are always filled. `uri` keeps the `path#L<start>-L<end>` form (W3C Media Fragments).
+
+### 3.2 New optional SearchHit fields
+
+Adds two fields to the SearchHit of frozen design §2.2, both optional / nullable, additive minor:
+
+```json
+{
+  "schema_version": "search_hit.v1",
+  "rank": 1,
+  "score": 0.78,
+  "score_kind": "rrf",
+  "chunk_id": "...",
+  "doc_id": "...",
+  "doc_path": "kebab/crates/kebab-chunk/src/md_heading_v1.rs",
+  "heading_path": ["src", "md_heading_v1"],
+  "section_label": "MdHeadingV1Chunker::chunk_doc",
+  "snippet": "...",
+  "citation": { "kind": "code", "...": "citation.v1" },
+
+  "repo": "kebab",          // ← new, optional. Result of the .git/ walk-up.
+  "code_lang": "rust",      // ← new, optional. Filled for Tier 1/2/3 alike (including Tier 2 yaml etc.).
+
+  "retrieval": { "...": "..." },
+  "index_version": "v1.0",
+  "embedding_model": "multilingual-e5-large",
+  "chunker_version": "code-rust-ast-v1"
+}
+```
+
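+Existing consumers (Claude Code skill etc.) ignore the two fields when they do not recognize them, so this is backwards-compatible.
+
+For Markdown / PDF / image hits both fields are null. A code hit may also have a null `repo` when it comes from a *single-file ingest outside any repo* (`kebab ingest-file`).
+
+### 3.3 chunker_version naming (per-language)
+
+Extends the `chunker_version` label family of frozen design §3.2. **Independent per language**: a bug fix in one language's chunker does not invalidate other languages' chunks.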
+
+```text
+Existing:
+  md-heading-v1
+  pdf-page-v1
+
+Added in Phase 1A:
+  code-rust-ast-v1
+
+Added in Phase 1B:
+  code-python-ast-v1
+  code-ts-ast-v1
+  code-js-ast-v1
+
+Added in Phase 1C:
+  code-go-ast-v1
+  code-java-ast-v1
+  code-kotlin-ast-v1
+
+Added in Phase 1D:
+  code-c-ast-v1
+  code-cpp-ast-v1
+
+Added in Phase 2:
+  k8s-manifest-resource-v1
+  dockerfile-file-v1
+  manifest-file-v1
+
+Added in Phase 3:
+  code-text-paragraph-v1
+```
+
+cascade rule (frozen design §9):
+- Bug fix in one language's chunker → bump only that `code-{lang}-ast-vN` → invalidate only the matching chunks in `embedding_records` → the next ingest reprocesses only that language's files.
+- Change to shared code (e.g. the tree-sitter wrapper) → bump every affected language chunker together.
+
+### 3.4 symbol path format (per-language convention)
+
+The value of `Citation.code.symbol`. Follows each language's *self-reference convention* as-is.
+
+| Language | Format | Example |
+|------|------|------|
+| Rust | `mod::sub::fn_name`, `impl Type::method`, `Trait::method` | `chunk::md_heading_v1::MdHeadingV1Chunker::chunk_doc` |
+| Python | `pkg.module.Class.method`, `pkg.module.func` | `kebab_eval.metrics.compute_mrr` |
+| TS/JS | `module/Class.method`, `module/func`, `module/default` | `src/search/retriever/Retriever.search` |
+| Go | `package.Func`, `package.(Receiver).Method` | `chunk.(*MdHeadingV1Chunker).ChunkDoc` |
+| Java/Kotlin | `package.Class.method` | `com.kebab.chunk.MdHeadingV1Chunker.chunkDoc` |
+| C | `func_name` | `parse_blocks` |
+| C++ | `namespace::Class::method`, `namespace::func` | `kebab::chunk::MdHeadingV1Chunker::chunk_doc` |
+
+**top-level scope** (code outside any top-level fn / struct / class definition, e.g. a Rust `use` or Python `import` block) is written as `<top-level>`. It is not null: the marker states explicitly that the chunk covers a region *without* a semantic unit.
+
+**Only a module / namespace and no symbol** (e.g. a `lib.rs` that only collects Rust mod declarations): written as `<module>`.
+
+### 3.5 metadata extension
+
+Fields of frozen design §3.6 (Metadata / Provenance) that are filled during code ingest:
+
+```rust
+pub struct Metadata {
+    // existing fields ...
+    pub lang: Option<String>,         // BCP-47 (natural language). Usually null for code files. No dominant-language detection on comments inside code.
+    pub tags: Vec<String>,
+    // ...
+
+    // new (code ingest)
+    pub repo: Option<String>,         // Result of the .git/ walk-up. Dir name.
+    pub git_branch: Option<String>,   // HEAD branch at ingest time.
+    pub git_commit: Option<String>,   // HEAD commit SHA at ingest time (full 40 hex).
+    pub code_lang: Option<String>,    // Matches the tree-sitter parser name. lowercase.
+}
+```
+
+`code_lang` identifier normalization (the canonical definition for this spec):
+- Rust files (`.rs`) → `rust`
+- Python (`.py`, `.pyi`) → `python`
+- TypeScript (`.ts`, `.tsx`) → `typescript`
+- JavaScript (`.js`, `.jsx`, `.mjs`, `.cjs`) → `javascript`
+- Go (`.go`) → `go`
+- Java (`.java`) → `java`
+- Kotlin (`.kt`, `.kts`) → `kotlin`
+- C (`.c`, `.h`) → `c`
+- C++ (`.cpp`, `.cc`, `.cxx`, `.hpp`, `.hh`, `.hxx`) → `cpp`
+- YAML / k8s manifest (`.yaml`, `.yml`) → `yaml`
+- Dockerfile (`Dockerfile`, `*.dockerfile`) → `dockerfile`
+- TOML (`.toml`) → `toml`
+- JSON (`.json`) → `json`
+- Shell (`.sh`, `.bash`, `.zsh`) → `shell`
+- Make (`Makefile`, `*.mk`) → `make`
+- Unsupported / Tier 3 fallback → null
+
+Extension sniffing is decided in a single `kebab-parse-code` function, `code_lang_for_path(path: &Path) -> Option<&'static str>`. This function is the *only source of truth*.
+
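The normalization list above could be expressed as a single lookup along these lines. This is only a sketch under assumptions: the matching is case-sensitive and purely name/extension based, and the body is illustrative rather than the actual `kebab-parse-code` implementation.

```rust
use std::path::Path;

/// Hypothetical sketch of the extension -> `code_lang` mapping listed above.
pub fn code_lang_for_path(path: &Path) -> Option<&'static str> {
    // Files identified by their full name have no extension to inspect.
    let file_name = path.file_name()?.to_str()?;
    match file_name {
        "Dockerfile" => return Some("dockerfile"),
        "Makefile" => return Some("make"),
        _ => {}
    }
    let ext = path.extension()?.to_str()?;
    let lang = match ext {
        "rs" => "rust",
        "py" | "pyi" => "python",
        "ts" | "tsx" => "typescript",
        "js" | "jsx" | "mjs" | "cjs" => "javascript",
        "go" => "go",
        "java" => "java",
        "kt" | "kts" => "kotlin",
        "c" | "h" => "c",
        "cpp" | "cc" | "cxx" | "hpp" | "hh" | "hxx" => "cpp",
        "yaml" | "yml" => "yaml",
        "dockerfile" => "dockerfile",
        "toml" => "toml",
        "json" => "json",
        "sh" | "bash" | "zsh" => "shell",
        "mk" => "make",
        // Unsupported extension: the caller falls back to Tier 3 or skips.
        _ => return None,
    };
    Some(lang)
}
```

Keeping the mapping in one `match` means a new language touches exactly one place, which fits the "only source of truth" requirement.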
+---
+
+## 4. Wire schema v1 changes (all additive minor)
+
+### 4.1 Change summary
+
+| schema | Change | Impact |
+|--------|--------|--------|
+| `citation.v1` | Adds `kind = "code"` variant + `code: { line_start, line_end, symbol, lang }` key | additive minor (dropped by existing consumers that do not recognize it) |
+| `search_hit.v1` | Adds two optional fields, `repo` and `code_lang` | additive minor |
+| `ingest_report.v1` | Adds `skipped_generated`, `skipped_size_exceeded`, `skipped_builtin_blacklist`, `skipped_gitignore`, `skipped_kebabignore` counts + `skip_examples` | additive minor |
+| `schema.v1` | Adds a `code` category to `media_breakdown` and a new `code_lang_breakdown` table | additive minor |
+| `doctor.v1` | (no change) | - |
+| `answer.v1` | (no change in Phase 1A; it is only implicit that a citation object may be the code variant) | - |
+| `fetch_result.v1` | (no change; kind=chunk / doc / span as before) | - |
+
+### 4.2 JSON Schema files to modify
+
+```
+docs/wire-schema/v1/
+  citation.schema.json          ← add code variant
+  search_hit.schema.json        ← add repo / code_lang
+  ingest_report.schema.json     ← add skip counts + skip_examples
+  schema.schema.json            ← add code_lang_breakdown
+```
+
+If a schema file has `"additionalProperties": false` set, output carrying a new field does not validate until that field is defined. Declare each new field under `properties` and leave `required` unchanged (the new fields are optional).
+
+### 4.3 `--json` output compatibility verification
+
+During Phase 1A implementation, add unit tests that the hits / answers for the existing markdown corpus are *byte-level identical* to the previous output:
+
+- The `repo` / `code_lang` fields of `search_hit.v1` *do not appear in the output* for markdown hits (null fields are omitted on serialization).
+- The new count fields of `ingest_report.v1` are filled with `0` when code ingest does not run (or omitted when zero; decided at the task spec stage).
+- The `code` key of `citation.v1` is always absent for variants with `kind != "code"`.
+
+---
+
+## 5. Ingest pipeline changes
+
+### 5.1 Automatic repo detection
+
+```text
+fn detect_repo(path: &Path) -> Option<RepoMeta> {
+    // Walk upward from the parent directory of `path` until a .git/ (dir) is found.
+    // Never climb above workspace.root (boundary).
+    // If .git is a file (worktree marker / submodule) → metadata.repo = None,
+    //   metadata.git_branch / commit = None.
+    // If .git/ is a dir:
+    //   - repo_name = name of the dir that contains .git/
+    //   - branch = git symbolic-ref HEAD (if absent, detached HEAD → "detached")
+    //   - commit = git rev-parse HEAD (40 hex, or None for an empty repo)
+}
+```
+
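+Whether to shell out to the `git` binary or use the `gix` (gitoxide) library is decided in the task spec. Shelling out to `git` would introduce a PATH dependency (kebab has none elsewhere), so `gix` is preferred.
+
+Repo detection is invoked at most once per file during ingest, and a per-repo cache (in-memory HashMap) turns it into a lookup hit from the second file of the same repo onward.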
+
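The walk-up rule can be sketched with the standard library alone. This is a hypothetical illustration: `find_repo_root` and `repo_name` are made-up names, branch/commit lookup (via `gix` or `git`) and the per-repo cache are left out, and a `.git` file is treated as "no repo" as in the pseudocode above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: find the nearest ancestor of `file` that contains a
/// `.git/` directory, never climbing above `workspace_root`.
pub fn find_repo_root(file: &Path, workspace_root: &Path) -> Option<PathBuf> {
    let mut dir = file.parent()?;
    loop {
        // Boundary: stop as soon as we leave the workspace.
        if !dir.starts_with(workspace_root) {
            return None;
        }
        let marker = dir.join(".git");
        if marker.is_dir() {
            return Some(dir.to_path_buf());
        }
        // `.git` as a file marks a worktree or submodule: not treated as a repo here.
        if marker.is_file() {
            return None;
        }
        if dir == workspace_root {
            return None;
        }
        dir = dir.parent()?;
    }
}

/// The repo identifier is the name of the directory that holds `.git/`.
pub fn repo_name(repo_root: &Path) -> Option<String> {
    repo_root.file_name()?.to_str().map(str::to_owned)
}
```

A caller would typically key a `HashMap` on the returned root so that later files from the same repo skip the branch/commit lookup.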
+### 5.2 ignore integration (`.gitignore` + `.kebabignore` + built-in)
+
+**Priority** (earlier is stronger):
+1. **Built-in safety net**: always applied; users can negate it (`!pattern` in `.kebabignore`)
+2. **`.gitignore`**: each repo's `.gitignore` is honored automatically. Nested `.gitignore` files apply too (per-directory cascade).
+3. **`.kebabignore`**: kebab's own additional layer. Allowed at workspace.root and in each directory (current behavior unchanged).
+
+**Built-in safety net (6 entries only)**:
+```text
+**/node_modules/
+**/target/
+**/__pycache__/
+**/.venv/
+**/venv/
+**/env/
+```
+
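+`env/` is ambiguous (a user's subdirectory may happen to be named "env"), but it is included because the Python virtualenv convention is strong. Users override it with `!env/` in `.kebabignore`.
+
+**Implementation**:
+- Extend the existing `.kebabignore` handling code in `kebab-source-fs`.
+- Use the `ignore` crate (gitignore syntax) as-is. `.gitignore` and `.kebabignore` go through the same walker: `WalkBuilder` honors `.gitignore` through its standard filters, and `.kebabignore` can be registered with `WalkBuilder::add_custom_ignore_filename`, so the crate applies the same matching rules to both.
+- Built-ins are hardcoded in code, e.g. with `OverrideBuilder` (its globs are whitelist-style, so an ignore rule is written with a leading `!`).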
+
+### 5.3 Generated / vendored skip policy
+
+**Generated header sniff**: runs in the file scan stage of `kebab-source-fs`, *before the blake3 hash is computed* (keeps the incremental ingest fast path):
+
+```text
+use std::{fs::File, io::{self, Read}, path::Path};
+
+fn is_generated_file(path: &Path) -> io::Result<bool> {
+    let mut buf = [0u8; 512];
+    let n = File::open(path)?.read(&mut buf)?;
+    // Lossy decode: the 512-byte cut may split a multi-byte UTF-8 character,
+    // and a strict `from_utf8` would then discard the whole header.
+    let head = String::from_utf8_lossy(&buf[..n]);
+
+    // Line-level markers, matched case-insensitively (covers conventions across ecosystems).
+    Ok(head.lines().take(10).any(|line| {
+        let l = line.to_ascii_lowercase();
+        l.contains("@generated") ||
+        l.contains("code generated by") ||
+        l.contains("do not edit") ||
+        l.contains("do not modify") ||
+        l.contains("automatically generated") ||
+        l.contains("auto-generated") ||
+        l.contains("autogenerated")
+    }))
+}
+```
+
+Cost: 1 read syscall per file (≤512 bytes). Files already excluded by `.gitignore` / built-ins never reach this stage.
+
+**Register samples in the IngestReport on skip**, for debugging (an immediate answer when a user asks "why wasn't file X indexed?"):
+```json
+{
+  "skip_examples": {
+    "generated": [
+      "kebab/crates/proto/src/api.pb.rs",
+      "..."
+    ],
+    "size_exceeded": [
+      "vendor/data/large-fixture.json"
+    ],
+    "builtin_blacklist": ["..."],
+    "gitignore": ["..."]
+  }
+}
+```
+Only the first 5 per category. CLI text mode shows counts only; with `--json` the schema above is emitted as-is.
+
+### 5.4 Size cap
+
+```toml
+[ingest.code]
+max_file_bytes = 262144         # 256 KiB
+max_file_lines = 5000           # whichever is hit first
+```
+
+- The byte cap is a single `fs::metadata().len()` call, which is very fast.
+- The line cap runs after the byte cap passes: a streaming read counts up to 5000 lines and skips the file once that is exceeded.
+- Both are counted in `IngestReport.skipped_size_exceeded`, with samples in `skip_examples.size_exceeded`.
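
The two-step check described in the bullets above could look roughly like the following. It is a sketch under assumptions: `SizeVerdict` and `check_size` are hypothetical names, the byte length is passed in (in practice it would come from `fs::metadata().len()`), and lines are counted by splitting on `\n` bytes so that non-UTF-8 content does not abort the count.

```rust
use std::io::{self, BufRead};

/// Outcome of the size-cap check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SizeVerdict {
    Ok,
    TooManyBytes,
    TooManyLines,
}

/// Applies the byte cap first (cheap), then streams the content and stops
/// counting as soon as the line cap is exceeded.
pub fn check_size<R: BufRead>(
    byte_len: u64,
    reader: R,
    max_bytes: u64,
    max_lines: usize,
) -> io::Result<SizeVerdict> {
    if byte_len > max_bytes {
        // The reader is never touched when the byte cap already fails.
        return Ok(SizeVerdict::TooManyBytes);
    }
    let mut count = 0usize;
    for line in reader.split(b'\n') {
        line?; // propagate I/O errors, ignore the line content itself
        count += 1;
        if count > max_lines {
            return Ok(SizeVerdict::TooManyLines);
        }
    }
    Ok(SizeVerdict::Ok)
}
```

With the defaults from this section the call would be `check_size(len, reader, 262_144, 5_000)`; either non-`Ok` verdict maps to the same `skipped_size_exceeded` counter.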
+
+Rationale for the defaults:
+- 256 KiB: more than 100× the size of a typical code unit (a Rust fn, a Python class). It blocks the usual sizes (several MB) of minified JS, large fixtures, and generated clients.
+- 5000 lines: close to the limit of what one person can understand in a single file. Anything beyond that is usually generated.
+
+User override:
+```toml
+[ingest.code]
+max_file_bytes = 1048576    # to relax the cap to 1 MiB
+max_file_lines = 20000
+```
+
+### 5.5 IngestReport breakdown
+
+Added next to the existing `skipped_by_extension`:
+
+```json
+{
+  "schema_version": "ingest_report.v1",
+  "indexed": 1234,
+  "unchanged": 5678,
+  "updated": 12,
+  "deleted": 3,
+  "skipped_by_extension": 45,
+  "skipped_gitignore": 2104,
+  "skipped_kebabignore": 8,
+  "skipped_builtin_blacklist": 567,
+  "skipped_generated": 89,
+  "skipped_size_exceeded": 4,
+  "skip_examples": {
+    "generated": ["..."],
+    "size_exceeded": ["..."],
+    "builtin_blacklist": ["..."],
+    "gitignore": ["..."]
+  },
+  "warnings": [],
+  "duration_ms": 12345
+}
+```
+
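+`skipped_by_extension` counts *unsupported extensions*. Once code ingest lands, the Tier 3 fallback (`code-text-paragraph-v1`) covers more file types, so this share should shrink. What remains is binaries and other files that even Tier 3 cannot handle.
+
+Human output (TTY) is a short summary: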
+```text
+✓ indexed 1234 chunks (unchanged 5678, updated 12, deleted 3)
+  skipped: 2104 .gitignore, 567 built-in, 89 generated, 45 unsupported, 8 .kebabignore, 4 too-large
+  duration: 12.3s
+```
+
+---
+
+## 6. Crate structure
+
+### 6.1 New crate `kebab-parse-code`
+
+```text
+crates/kebab-parse-code/
+├── Cargo.toml
+└── src/
+    ├── lib.rs                  # common entry, dispatch by extension
+    ├── lang.rs                 # code_lang_for_path(), identifier normalization
+    ├── repo.rs                 # detect_repo() — gix wrapper
+    ├── skip.rs                 # generated header sniff, size cap
+    ├── rust.rs                 # tree-sitter-rust → CanonicalDocument (Phase 1A)
+    ├── python.rs               # tree-sitter-python → ...    (Phase 1B)
+    ├── typescript.rs           # ...                          (Phase 1B)
+    ├── javascript.rs           # ...                          (Phase 1B)
+    ├── go.rs                   # ...                          (Phase 1C)
+    ├── java.rs                 # ...                          (Phase 1C)
+    ├── kotlin.rs               # ...                          (Phase 1C)
+    ├── c.rs                    # ...                          (Phase 1D)
+    ├── cpp.rs                  # ...                          (Phase 1D)
+    ├── yaml_k8s.rs             # k8s manifest resource-aware  (Phase 2)
+    ├── dockerfile.rs           # ...                          (Phase 2)
+    ├── manifest.rs             # Cargo.toml / package.json 1-chunk (Phase 2)
+    └── text_paragraph.rs       # Tier 3 fallback              (Phase 3)
+```
+
+**Dependencies**:
+- Each phase adds its own `tree-sitter-*` deps. Phase 1A needs only `tree-sitter-rust` + `tree-sitter` (core).
+- `gix` (gitoxide), from Phase 1A onward.
+- `kebab-core`, `kebab-parse-types` (CanonicalDocument / Block / SourceSpan).
+
+**Dependency constraints** (inherited from frozen design §8):
+- `kebab-parse-code` follows the same isolation rule as the other `kebab-parse-*` crates: no direct import of store / embed / llm / rag.
+- UI crates (`kebab-cli` / `kebab-tui` / `kebab-mcp`) must not import this crate directly, only through the `kebab-app` facade.
+
+### 6.2 `kebab-chunk` module extension
+
+```text
+crates/kebab-chunk/src/
+├── lib.rs                          # exports added (accumulating per phase)
+├── md_heading_v1.rs                # existing
+├── pdf_page_v1.rs                  # existing
+├── code_rust_ast_v1.rs             # Phase 1A
+├── code_python_ast_v1.rs           # Phase 1B
+├── code_ts_ast_v1.rs               # ...
+├── code_js_ast_v1.rs               # ...
+├── code_go_ast_v1.rs               # Phase 1C
+├── code_java_ast_v1.rs             # ...
+├── code_kotlin_ast_v1.rs           # ...
+├── code_c_ast_v1.rs                # Phase 1D
+├── code_cpp_ast_v1.rs              # ...
+├── k8s_manifest_resource_v1.rs     # Phase 2
+├── dockerfile_file_v1.rs           # ...
+├── manifest_file_v1.rs             # ...
+└── code_text_paragraph_v1.rs       # Phase 3
+```
+
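+Each module is one chunker implementation, exposed from the lib via `pub use`. This follows the existing pattern (md_heading_v1 / pdf_page_v1).
+
+**No change to the Chunker trait**: the existing `Chunker` trait (frozen §7.2) has the signature `CanonicalDocument → Vec<Chunk>`, so code works through the same trait.
+
+### 6.3 Dependency graph changes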
+
+```text
+Existing:
+  kebab-app → kebab-parse-md, kebab-parse-pdf, kebab-parse-image
+            → kebab-chunk
+            → ...
+
+Added (Phase 1A):
+  kebab-app → kebab-parse-code (new)
+            → kebab-chunk (modules added)
+
+Added dependencies:
+- `kebab-app → kebab-parse-code`
+- `kebab-parse-code → tree-sitter`, `tree-sitter-rust`, `gix`
+- Build impact: adding `kebab-parse-code` adds one more `cargo test -p` unit to the workspace. The `-j 1` policy (frozen CLAUDE.md) still applies.
+
+### 6.4 `target/` disk impact
+
+The frozen CLAUDE.md warns that `target/` balloons to 90+ GB. The modules added by this spec bring new *integration tests*, each of which links another binary, so `target/` will grow further. **Running `cargo clean` after each phase merge is strongly recommended**: this applies the existing CLAUDE.md rule as-is and is restated at the end of every phase.
+
+---
+
+## 7. Search / RAG surface
+
+### 7.1 New search filters
+
+`kebab search` gains three additions on top of its existing filters (`--tag` / `--lang` / `--path-glob` / `--media` / `--ingested-after` / `--trust-min` / `--doc-id`): one new `--media` value and two new flags.
+
+```text
+--media code                       # umbrella: chunks from every code tier
+--code-lang <list>                 # repeatable or comma-separated, e.g. rust,python. OR matching.
+--repo <name>                      # repeatable. OR matching.
+```
+
+Consistent with existing policy:
+- Repeatable flags match with OR (`--repo kebab --repo other`).
+- Aliases such as `--code-lang rs` are not supported; only the *full identifier* (`rust`) is accepted, for consistency.
+- An unknown `--code-lang` value yields empty hits (same policy as `--media`).
+- Different filter flags combine with AND (`--media code --code-lang rust` means code and Rust).
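
The matching rules above (OR within a flag, AND across flags, unknown values matching nothing) can be sketched as follows. All names here (`CodeFilter`, `HitMeta`, `parse_list`) are hypothetical and only illustrate the semantics; they are not the actual `kebab-search` types.

```rust
/// Parsed values of the two new flags. An empty list means "no constraint".
#[derive(Debug, Default)]
pub struct CodeFilter {
    pub code_langs: Vec<String>,
    pub repos: Vec<String>,
}

/// The fields of a hit that the filter looks at (hypothetical view type).
pub struct HitMeta<'a> {
    pub code_lang: Option<&'a str>,
    pub repo: Option<&'a str>,
}

/// Flattens repeated and comma-separated flag values into one list.
pub fn parse_list(args: &[&str]) -> Vec<String> {
    args.iter()
        .flat_map(|a| a.split(','))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// OR within one flag: passes if unconstrained or the value equals any allowed entry.
fn matches_any(allowed: &[String], value: Option<&str>) -> bool {
    allowed.is_empty() || value.map_or(false, |v| allowed.iter().any(|a| a.as_str() == v))
}

impl CodeFilter {
    /// AND across flags: every constrained flag must match.
    pub fn matches(&self, hit: &HitMeta) -> bool {
        matches_any(&self.code_langs, hit.code_lang) && matches_any(&self.repos, hit.repo)
    }
}
```

Because matching is exact string equality, an alias such as `rs` never equals the stored `rust`, which is how an unknown `--code-lang` value ends up with empty hits rather than an error.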
+
+### 7.2 `kebab schema` stats extension
+
+A `code` category is added to `stats.media_breakdown` from frozen design §2.5 / p9-fb-37:
+
+```json
+{
+  "schema_version": "schema.v1",
+  "stats": {
+    "media_breakdown": {
+      "markdown": 1234,
+      "pdf": 56,
+      "image": 78,
+      "audio": 0,
+      "code": 4567,       // ← new
+      "other": 12
+    },
+    "lang_breakdown": {       // existing: natural language
+      "ko": 1100,
+      "en": 234,
+      "null": 134
+    },
+    "code_lang_breakdown": {  // ← new: programming language (chunk count)
+      "rust": 2345,
+      "python": 1234,
+      "typescript": 567,
+      "yaml": 89,
+      "go": 332
+    },
+    "repo_breakdown": {       // ← new: chunk count per repo
+      "kebab": 1234,
+      "internal-api": 567,
+      "...": "..."
+    },
+    "index_bytes": 1234567890,
+    "stale_doc_count": 12
+  }
+}
+```
+
+`repo_breakdown` is added as well, so users can check which repo has the most indexed chunks.
+
+### 7.3 RAG prompt (Phase 1A keeps `rag-v2` as-is)
+
+In Phase 1A, code chunks enter the prompt as *ordinary documents*:
+
+```text
+[#1] (code: kebab::chunk::md_heading_v1::MdHeadingV1Chunker::chunk_doc)
+fn chunk_doc(&self, doc: &CanonicalDocument) -> Result<Vec<Chunk>> {
+    ...
+}
+
+[#2] (code: kebab::chunk::pdf_page_v1::PdfPageV1Chunker::chunk_doc)
+...
+```
+
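+The prompt's source identifier can carry both the *file path* and the *symbol*: when a symbol exists, the *symbol* is shown; otherwise the file path is shown.
+
+The existing `rag-v2` rule ("when citing a fact, put the original text from the chunk in double quotes before the [#number]") may read a little awkwardly for code (double quotes in code are string literals). If measurement confirms this, a code-aware `rag-v3` will be introduced in Phase 2+.
+
+### 7.4 Impact on `kebab inspect` / `kebab fetch`
+
+The existing `kebab inspect chunk <id>` output shows `symbol` / `code_lang` for the `Citation::Code` variant. Text-mode output changes: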
+
+```text
+chunk_id:        abc123...
+doc_path:        kebab/crates/kebab-chunk/src/md_heading_v1.rs
+line range:      L142-L168
+symbol:          MdHeadingV1Chunker::chunk_doc          ← new (code variant only)
+code_lang:       rust                                   ← new
+repo:            kebab                                  ← new
+chunker_version: code-rust-ast-v1
+```
+
+`kebab fetch chunk` / `kebab fetch span` are unchanged: they return the body bytes as-is.
+
+---
+
+## 8. Config changes
+
+### 8.1 New `[ingest.code]` section
+
+```toml
+[ingest.code]
+# Enable the generated header sniff. Skip the file if any of the 7 markers appears in the first 512 bytes (first 10 lines).
+skip_generated_header = true
+
+# Max size per file, in bytes. Skip when exceeded.
+max_file_bytes = 262144     # 256 KiB
+
+# Max lines per file. Skip when exceeded. Checked after the byte cap passes.
+max_file_lines = 5000
+
+# Additional user skip patterns, gitignore syntax. Applied on top of built-in / .gitignore / .kebabignore.
+extra_skip_globs = []
+
+# Threshold at which an oversized AST chunk falls back to splitting.
+# A single fn / class longer than this many lines gets the paragraph fallback.
+ast_chunk_max_lines = 200    # max lines in a single chunk
+
+# Line-window size used by the Tier 3 fallback (paragraph + line-window).
+fallback_lines_per_chunk = 80
+fallback_lines_overlap = 20
+```
+
+Rationale for the defaults:
+- `skip_generated_header = true`: the safe default. The exception (a user who wants generated files indexed) sets it to false explicitly.
+- `max_file_bytes = 262144`: enough to block minified JS and large generated files.
+- `max_file_lines = 5000`: close to the limit of code one person can understand at once.
+- `ast_chunk_max_lines = 200`: human cognitive limit plus the retrieval token budget.
+- `fallback_lines_per_chunk = 80`, `overlap = 20`: a conservative default within RAG conventions.
+
+### 8.2 Defaults and override policy
+
+- Every key is optional. Missing keys take the defaults above.
+- No `KEBAB_*` env override (these are policy settings, not dev / debug knobs).
+- `--config <path>` allows isolated testing (no XDG dependency).
+
+### 8.3 Impact on the existing `[workspace]` section of `config.toml`
+
+No change. `workspace.root` and `exclude` stay as they are. Honoring `.gitignore` / `.kebabignore` is the *default behavior*, active without any config key; users who want to opt out override it with `!pattern` in `.kebabignore`.
+
+---
+
+## 9. Chunker details per tier
+
+### 9.1 Tier 1: AST per-language
+
+**Input**: `CanonicalDocument` with `Block::Code { lang: Some("rust"), code: "..." }`.
+**Output**: `Vec<Chunk>`, where each chunk is a *top-level semantic unit* of the AST or a fallback unit.
+
+**Rust example (Phase 1A)**:
+
+tree-sitter semantic units:
+- `function_item` → 1 chunk
+- each `function_item` inside an `impl_item` → 1 chunk per method
+- `struct_item` / `enum_item` / `trait_item` → 1 chunk (declaration + doc comment)
+- the *contents* of a `mod_item` → decomposed recursively
+- top-level `use` / `extern crate` / `const` / `static` items → gathered into one chunk (`<top-level>` symbol)
+
+**Fallback when `ast_chunk_max_lines` is exceeded**:
+- A single fn longer than 200 lines is split on paragraph (blank-line) boundaries.
+- Each sub-chunk's symbol is written as `function_name [part 1/N]`.
+- This happens inside `kebab-chunk/src/code_rust_ast_v1.rs`.
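
The fallback above can be sketched as a function that turns an oversized item into labeled line ranges. This is an illustration under assumptions, not the `code-rust-ast-v1` implementation: `split_oversized` is a hypothetical name, ranges are 1-based and relative to the item, every blank-line-separated paragraph becomes its own part, and a paragraph that is itself longer than the limit is left unsplit here.

```rust
/// Splits an item's lines into `(symbol label, start_line, end_line)` parts.
/// Lines are 1-based, inclusive, and relative to the first line of the item.
pub fn split_oversized(
    symbol: &str,
    lines: &[&str],
    max_lines: usize,
) -> Vec<(String, usize, usize)> {
    // Items within the limit stay a single chunk with the plain symbol.
    if lines.len() <= max_lines {
        return vec![(symbol.to_owned(), 1, lines.len())];
    }

    // Collect paragraph ranges; runs of blank lines act as separators.
    let mut ranges: Vec<(usize, usize)> = Vec::new();
    let mut start: Option<usize> = None;
    for (i, line) in lines.iter().enumerate() {
        if line.trim().is_empty() {
            if let Some(s) = start.take() {
                ranges.push((s + 1, i)); // 0-based s..i-1 becomes 1-based s+1..i
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        ranges.push((s + 1, lines.len()));
    }

    let total = ranges.len();
    ranges
        .into_iter()
        .enumerate()
        .map(|(n, (s, e))| (format!("{} [part {}/{}]", symbol, n + 1, total), s, e))
        .collect()
}
```

To obtain file-level citation lines, the item's own 1-based start line would be added to each range as an offset.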
+
+**Citation line range**: the tree-sitter node's `start_position().row` / `end_position().row` (0-indexed, so +1 makes them 1-based).
+
+**Example input → output**:
+```rust
+// src/lib.rs
+pub fn parse(input: &str) -> Result<Doc> {
+    // 50 lines
+}
+
+impl Chunker for Foo {
+    fn chunk_doc(&self, doc: &Doc) -> Vec<Chunk> {
+        // 80 lines
+    }
+
+    fn name(&self) -> &str { "foo-v1" }
+}
+```
+
+→ chunks:
+1. `parse`, lines 1-50, symbol = `parse`
+2. `Foo::chunk_doc`, lines 53-132, symbol = `Foo::chunk_doc` (a method of the impl)
+3. `Foo::name`, lines 134-134, symbol = `Foo::name`
+
+### 9.2 Tier 2: resource-aware
+
+**k8s-manifest-resource-v1**:
+- YAML multi-document split (`---` separator).
+- 1 chunk per document.
+- chunk metadata: `kind: Deployment`, `apiVersion`, `metadata.name`, `metadata.namespace`.
+- citation `symbol` field: `<kind>/<namespace>/<name>` (e.g., `Deployment/prod/api-server`). Without a namespace, `<kind>/<name>`.
+- On YAML parse failure (invalid YAML, or plain YAML that is not a k8s schema): fall back to `code-text-paragraph-v1`.
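
A minimal sketch of the two mechanical pieces above, document splitting and symbol formatting, assuming hypothetical helper names (`split_yaml_documents`, `k8s_symbol`). The split is deliberately naive: it treats any line consisting of `---` as a separator and does not parse YAML, so a `---` inside a block scalar would be split incorrectly. A real implementation would rely on a YAML parser.

```rust
/// Naively splits a multi-document YAML text on `---` separator lines.
/// Documents that contain only whitespace are dropped.
pub fn split_yaml_documents(text: &str) -> Vec<String> {
    fn flush(docs: &mut Vec<String>, current: &mut Vec<&str>) {
        if current.iter().any(|l| !l.trim().is_empty()) {
            docs.push(current.join("\n"));
        }
        current.clear();
    }

    let mut docs = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in text.lines() {
        if line.trim_end() == "---" {
            flush(&mut docs, &mut current);
        } else {
            current.push(line);
        }
    }
    flush(&mut docs, &mut current);
    docs
}

/// Formats the citation symbol: `<kind>/<namespace>/<name>`, or `<kind>/<name>`
/// when the namespace is missing or empty.
pub fn k8s_symbol(kind: &str, namespace: Option<&str>, name: &str) -> String {
    match namespace {
        Some(ns) if !ns.is_empty() => format!("{}/{}/{}", kind, ns, name),
        _ => format!("{}/{}", kind, name),
    }
}
```

Extracting `kind`, `metadata.name`, and `metadata.namespace` from each document is left out, since it depends on the YAML parser chosen in the Phase 2 task spec.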
+
+**dockerfile-file-v1**:
+- The whole Dockerfile = 1 chunk.
+- symbol = `<dockerfile>`.
+- citation line range = 1 to EOF.
+- ARG / FROM / RUN / COPY / CMD etc. are preserved as plain text inside the chunk.
+
+**manifest-file-v1** (Cargo.toml, package.json, pyproject.toml, go.mod, pom.xml, build.gradle, tsconfig.json, etc.):
+- The whole file = 1 chunk.
+- symbol = `<manifest>`.
+- citation line range = 1 to EOF.
+
+### 9.3 Tier 3: paragraph + line-window fallback
+
+**code-text-paragraph-v1**: the fallback for shell scripts, unsupported extensions, and AST failures:
+- Split into paragraphs on blank lines.
+- A paragraph longer than `fallback_lines_per_chunk` (default 80) is split into line windows with `fallback_lines_overlap` (default 20).
+- symbol is null. The citation is `Citation::Code { symbol: None, lang: Some("shell") }`, or with lang unspecified.
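
The line-window split for an oversized paragraph can be sketched as follows. `line_windows` is a hypothetical name, and the stepping rule (advance by `window - overlap`, let the last window end at the paragraph's final line) is one plausible reading of the two config keys rather than a confirmed detail of `code-text-paragraph-v1`.

```rust
/// Returns 1-based inclusive `(start, end)` line windows covering `total_lines`.
/// Consecutive windows share `overlap` lines; the last window may be shorter.
pub fn line_windows(total_lines: usize, window: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(window > 0 && overlap < window, "overlap must be smaller than window");
    let step = window - overlap;
    let mut out = Vec::new();
    let mut start = 1usize;
    while start <= total_lines {
        let end = (start + window - 1).min(total_lines);
        out.push((start, end));
        if end == total_lines {
            break; // the paragraph is fully covered
        }
        start += step;
    }
    out
}
```

With the defaults (80 / 20) the step is 60 lines, so a 200-line paragraph yields the windows 1-80, 61-140, and 121-200.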
+
+---
+
+## 10. Change impact / cascade
+
+### 10.1 Frozen design doc update (together with the merge of this spec)
+
+Update the following sections of `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`:
+
+| Section | Update |
+|------|-----------|
+| §0 Frozen decision summary | one line, "code ingest added", with a cross-link to the 2026-05-15 spec |
+| §2.1 Citation | 5 → 6 variants, `code` added |
+| §2.2 SearchHit | optional `repo`, `code_lang` fields |
+| §2.4 IngestReport | 5 skip counters + skip_examples |
+| §2 schema.v1 (fb-37 additions) | `code` media + `code_lang_breakdown` + `repo_breakdown` |
+| §3.2 Versions / labels | chunker_version family extended (per-language pattern) |
+| §3.6 Metadata | `repo`, `git_branch`, `git_commit`, `code_lang` fields |
+| §8 Module boundaries | `kebab-parse-code` added + dependency rules inherited |
+| §11 Freeze scope | state that "code ingest" is no longer out of scope; multi-workspace / watch / history-aware stay out of scope |
+
+### 10.2 Impact on the cascade rule (frozen §9)
+
+- `parser_version` cascade: each phase adds new parser versions (`code-rust-parse-v1` etc.). Existing markdown / pdf are unaffected.
+- `chunker_version` cascade: labels are per-language, so changing one language's chunker does not invalidate chunks of other languages.
+- `embedding_version` cascade: no change (`multilingual-e5-large` stays).
+- `prompt_template_version` cascade: Phase 1A keeps `rag-v2`, so no impact.
+- `index_version` cascade: no impact as long as the SQLite DDL is unchanged. The metadata.repo / git_branch / git_commit fields go inside the *Metadata* JSON blob, so no DDL change is needed (`documents.metadata_json TEXT` in frozen §5 is free-form).
+
+### 10.3 V00X migration?
+
+The SQLite DDL does not change, so no V00X migration is needed. The new keys (`repo`, `git_branch`, `git_commit`, `code_lang`) go inside the free-form `documents.metadata_json`. The metadata_json of existing markdown / pdf chunks stays as it is.
+
+### 10.4 Binary version bump
+
+See ["Binary version bump trigger summary" below the §2 phase-split table](#2-phase-분할--마일스톤). In short:
+- **1A-1 merge** → no bump (additive minor wire change + no user-facing surface change).
+- **1A-2 merge** → minor bump (`0.6` → `0.7`, user dogfooding starts).
+- Later phases decide in their own task specs (patch bump when no wire / flag is added).
+
+---
+
+## 11. Open questions (to be fixed at the Phase 1A task spec stage)
+
+This spec freezes only the *framework*. The following items are decided when the Phase 1A task spec is written:
+
+1. **Minimum size of an AST chunk**: is a 5-line fn its own chunk, or are items below a minimum threshold (e.g. ≥ 10 lines) merged with adjacent fns? *Impact*: chunk count explosion vs. retrieval misses.
+
+2. **doc_id collision risk**: if the `Cargo.toml` files of two repos happen to have identical content, the blake3 hash is identical, so are they the same doc? The ID recipe in frozen §4.2 needs checking. *Impact*: one doc would be attributed to two repos. Resolution: check whether the doc_id recipe includes repo / path.
+
+3. **`--code-lang` identifier normalization (canonical)**: full names only (`rust` / `python` / `typescript`), or also short aliases (`rs` / `py` / `ts`)? This spec takes full names only; any alias mapping would be specified in the task spec.
+
+4. **Timing of TUI surface changes**: part of Phase 1A, or a separate Phase 4 (TUI code rendering)? *Impact*: rendering the symbol/lang/repo of code citations in the TUI Library/Inspect panels. For now Phase 1A includes only the *minimal change* (citation display); separate interactions (e.g. LSP-style navigation on the `g` key) are P+.
+
+5. ***Depth limit* of the AST chunk symbol path**: deeply nested Rust impls / mods produce long paths such as `outer::inner::deepest::method`. A 60 char cap with middle elision (`outer::…::method`)? The cap policy is decided in the Phase 1A task spec.
+
+6. **Binary size impact of `gix`**: how much does the `kebab-parse-code` → `gix` dep add to the release binary? `git2` (libgit2) is ruled out because it is a C dep, whereas `gix` is pure Rust. Decide after measuring the binary size impact.
+
+7. **Scope of k8s manifest `kind` recognition**: how are *CRDs* (custom resources) handled beyond the standard `Deployment` / `Service` / `ConfigMap` etc.? Decided in the Phase 2 task spec. For now, *the `kind` field of every YAML document is used as-is* (CRDs are handled automatically).
+
+---
+
+## 12. Next steps
+
+1. **User review of this spec**: check for missing decisions, contradictions, and additional concerns.
+2. Once the review passes, create the `tasks/p10/` directory, add `tasks/p10/INDEX.md`, and add a phase 10 entry to `tasks/INDEX.md`.
+3. **Write the Phase 1A-1 task spec** (first): `tasks/p10/p10-1a-1-code-ingest-framework.md`, with `contract_sections` `[§2.1, §2.2, §2.4, §2 schema.v1, §3.6, §8, §11]` (no chunker is added; the §3.2 chunker_version update goes with 1A-2).
+4. **Write the Phase 1A-2 task spec** (after 1A-1 merges): `tasks/p10/p10-1a-2-rust-ast-chunker.md`, with `contract_sections` `[§2.1 (actual use of the code variant), §3.2 (code-rust-ast-v1 added), §3.4 (Rust symbol path)]`.
+5. Ship the frozen design doc (2026-04-27) update in the *Phase 1A-1 PR* (following the §10.1 table of this spec, except that the §3.2 chunker_version part lands in 1A-2).
+6. Write the Phase 1A-1 implementation plan (work units) with the writing-plans skill.
+7. After Phase 1A-1 merges, confirm the regression tests pass → write the Phase 1A-2 implementation plan → merge → dogfood kebab on itself → measure → decide whether to proceed to the next phase.
+
+---
+
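+## Appendix A: decision log (summary of the brainstorming with the user while writing this spec)
+
+A brief record of the *why* behind the decisions in this spec (for audit / future reconsideration):
+
+- **Scenario**: "dozens of repos under one parent dir + semantic search + RAG", extending kebab's cross-corpus value to code.
+- **Chunking strategy**: the user explicitly chose path B (AST per-language). The author recommended path C (start with A, measure, then upgrade), but the user's decision stands.
+- **Language scope**: the user's initial answer (Rust/Python/TS-JS/Go-Java-Kotlin/C/C++/Shell/Dockerfile/yaml) was reclassified into Tier 1/2/3, so AST is applied only where it is meaningful. This was the result of the author pushing back.
+- **Embedding model**: keep e5-large, for cross-corpus value and to avoid the cascade cost.
+- **Citation variant**: the user chose `(a)-2` (a new `code` variant). The author recommended `(a)-1` (reuse the line variant), but the clearer semantic separation decided it.
+- **Built-in blacklist**: the user asked to *shrink* it, ending at the short list in §5.2. `.gitignore` is the source of truth; the built-in list is only a safety net.
+- **Phase split**: the user asked for "as detailed a spec as possible → implement from Phase 1A". This spec freezes that framework; per-phase implementation goes into separate task specs.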
--- a/docs/wire-schema/v1/citation.schema.json
+++ b/docs/wire-schema/v1/citation.schema.json
@@ -7,7 +7,7 @@
  "required": ["schema_version", "kind", "path", "uri", "indexed_at", "stale"],
  "properties": {
    "schema_version": { "const": "citation.v1" },
-    "kind": { "enum": ["line", "page", "region", "caption", "time"] },
+    "kind": { "enum": ["line", "page", "region", "caption", "time", "code"] },
    "path": { "type": "string" },
    "uri":  { "type": "string" },
    "line":    { "type": "object" },
@@ -15,6 +15,7 @@
    "region":  { "type": "object" },
    "caption": { "type": "object" },
    "time":    { "type": "object" },
+    "code":    { "type": "object" },
    "indexed_at": { "type": "string", "format": "date-time" },
    "stale":      { "type": "boolean" }
  }
--- a/docs/wire-schema/v1/ingest_report.schema.json
+++ b/docs/wire-schema/v1/ingest_report.schema.json
@@ -38,6 +38,20 @@
      },
      "description": "p9-fb-25: per-extension skip count. Key = lowercase extension without leading dot (e.g. 'docx'). Files without extension key under '<no-ext>'."
    },
-    "items":          { "type": ["array", "null"] }
+    "items":          { "type": ["array", "null"] },
+    "skipped_gitignore":         { "type": "integer", "minimum": 0 },
+    "skipped_kebabignore":       { "type": "integer", "minimum": 0 },
+    "skipped_builtin_blacklist": { "type": "integer", "minimum": 0 },
+    "skipped_generated":         { "type": "integer", "minimum": 0 },
+    "skipped_size_exceeded":     { "type": "integer", "minimum": 0 },
+    "skip_examples": {
+      "type": "object",
+      "properties": {
+        "generated":         { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
+        "size_exceeded":     { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
+        "builtin_blacklist": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
+        "gitignore":         { "type": "array", "items": { "type": "string" }, "maxItems": 5 }
+      }
+    }
  }
 }
--- a/docs/wire-schema/v1/schema.schema.json
+++ b/docs/wire-schema/v1/schema.schema.json
@@ -78,6 +78,16 @@
          "type": "integer",
          "minimum": 0,
          "description": "p9-fb-37: docs whose updated_at exceeds config.search.stale_threshold_days. 0 when threshold=0."
+        },
+        "code_lang_breakdown": {
+          "type": "object",
+          "description": "p10-1A-1: per-language code chunk count. Key = lowercase language name (e.g. 'rust', 'python'). Populated after 1A-2 lands; empty on markdown-only corpora.",
+          "additionalProperties": { "type": "integer", "minimum": 0 }
+        },
+        "repo_breakdown": {
+          "type": "object",
+          "description": "p10-1A-1: per-repo code chunk count. Key = repo name as detected by kebab-parse-code::repo. Empty on markdown-only corpora.",
+          "additionalProperties": { "type": "integer", "minimum": 0 }
        }
      }
    }
--- a/docs/wire-schema/v1/search_hit.schema.json
+++ b/docs/wire-schema/v1/search_hit.schema.json
@@ -42,6 +42,8 @@
    "embedding_model": { "type": ["string", "null"] },
    "chunker_version": { "type": "string" },
    "indexed_at":      { "type": "string", "format": "date-time" },
-    "stale":           { "type": "boolean" }
+    "stale":           { "type": "boolean" },
+    "repo":            { "type": ["string", "null"] },
+    "code_lang":       { "type": ["string", "null"] }
  }
 }
--- a/fixtures/golden_queries.yaml
+++ b/fixtures/golden_queries.yaml
@@ -1,4 +1,4 @@
-# Golden query suite for `kb eval run` (P5-1 / P5-2).
+# Golden query suite for `kebab eval run` (P5-1 / P5-2 / fb-39).
 #
 # Top-level: list of queries. Required fields: `id`, `query`. All
 # others are optional and default to empty / null.
@@ -7,8 +7,16 @@
 # real rows in the active workspace's SQLite store at run time. Stale
 # references make the runner bail at start. The shipped template
 # leaves them empty so the file is loadable on any fresh workspace —
-# fill them in after a `kb ingest` to enable hit@k / MRR metrics
-# (P5-2).
+# fill them in after a `kebab ingest` to enable the metrics that
+# require ground truth (P5-2 + fb-39):
+#
+#   - `expected_chunk_ids` →  hit_at_k, MRR, precision_at_k_chunk (fb-39)
+#   - `expected_doc_ids`   →  recall_at_k_doc
+#
+# `precision_at_k_chunk` (fb-39): of the top-k retrieved hits, what
+# fraction's `chunk_id` is in `expected_chunk_ids`. Denominator is k
+# (fixed) — `top-k` shortfall is treated as precision loss. Queries
+# with empty `expected_chunk_ids` are skipped from this metric.
 #
 # `must_contain` / `forbidden` drive the rule-based groundedness
 # metric (P5-2).
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,22 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

+## 2026-05-10: p9-fb-39b: embedding upgrade UX
+
+**What changed**: the default embedding changed from `multilingual-e5-small` (384 dim) to `multilingual-e5-large` (1024 dim). LanceDB tables are namespaced by `(model, dim)`, so the new model writes to a fresh table and the old `chunk_embeddings_multilingual-e5-small_384` table becomes an orphan.
+
+**When the user TOML pins small**: backwards compatibility is kept. If the user set `[models.embedding] model = "multilingual-e5-small"` explicitly, small stays in use (the new default is ignored).
+
+**Idempotent re-embed**: when fb-23 incremental ingest detects an embedding_version mismatch, it automatically re-embeds existing chunks with the new model. The next `kebab ingest` call rewrites the embeddings of existing chunks into the new table.
+
+**Saving disk**: to clean up the previous model's orphan table first, run `kebab reset --vector-only` (wipes both LanceDB and the SQLite `embedding_records`). A following `kebab ingest` re-embeds every chunk with the new model and fills the new table.
+
+**search/ask results**: empty hits until the re-embed happens (the new model has no data yet). Search works normally after `kebab ingest`.
+
+**Relation to the spec contract**: changes embedding_model.id / dimensions in design §5 (storage) + §9 (versioning cascade). The wire `embedding_version` field changes (kebab-app schema.v1.models.embedding_version emits the config.models.embedding.model value as-is), which is a release trigger under the CLAUDE.md cascade rule. After this PR merges, `chore: bump version 0.5 → 0.6` + a tag are needed.
+
+**Spec deviation**: §Migration policy + §Public surface delta of design `2026-05-10-p9-fb-39b-embedding-upgrade-design.md` specified a new `error.v1.code = "embedding_dim_mismatch"` inside `LanceVectorStore::open`, but it was left out of the implementation. Reason: LanceDB tables are `(model, dim)` namespaced, so the condition surfaces as a silent orphan + empty hits (not a hard error). If an explicit error is needed, a separate startup health check is required (candidate for fb-39c or a doctor extension).
+
 ## 2026-05-09 — p9-fb-34: search wire wrapped in search_response.v1

 **무엇이 바뀌었나**: `kebab search --json` stdout 이 기존 `search_hit.v1[]` 배열에서 신규 `search_response.v1` object 로 교체. wrapper 가 `hits`, `next_cursor`, `truncated` 세 필드를 가짐.
--- a/tasks/INDEX.md
+++ b/tasks/INDEX.md
@@ -35,6 +35,7 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.
 | P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
 | P8 | [phase-8-audio.md](phase-8-audio.md) | 음성 transcription + timestamp citation | kebab-parse-audio | P5 |
 | P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |
+| P10 | [p10/INDEX.md](p10/INDEX.md) | Code ingest framework + AST chunkers | kebab-parse-code, kebab-source-fs (code walk) | P5 |

 ## Component task decomposition (per phase)

@@ -129,13 +130,23 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.

    ### 🎯 0.5.0 — RAG quality (cascade 동반: V00X + reindex)
    - [p9-fb-38 score semantics](p9/p9-fb-38-score-semantics.md) — ✅ 머지 (2026-05-10)
-    - [p9-fb-39 retrieval precision 튜닝](p9/p9-fb-39-retrieval-precision-tuning.md) — ⏳ 미구현, brainstorm 필요 (embedding_version cascade)
+    - [p9-fb-39 retrieval precision 튜닝](p9/p9-fb-39-retrieval-precision-tuning.md) — ✅ 머지 (2026-05-10) — eval foundation only, lever 적용 deferred
+    - [p9-fb-39b embedding upgrade](p9/p9-fb-39b-embedding-upgrade.md) — ✅ 머지 (2026-05-10) — multilingual-e5-large default
    - [p9-fb-40 fact-grounded answer](p9/p9-fb-40-fact-grounded-answer.md) — ✅ 머지 (2026-05-10)

    ### 🎯 0.6.0 또는 P+ — reasoning
    - [p9-fb-41 multi-hop reasoning](p9/p9-fb-41-multi-hop-reasoning.md) — ⏳ 미구현, brainstorm 필요 (XL, eval 인프라 선행)
    - [p9-fb-42 bulk multi-query + re-rank hint](p9/p9-fb-42-bulk-multi-query-rerank.md) — ✅ 머지 (2026-05-10) — bulk only, rerank hint deferred

+- P10 — [p10/](p10/) — code ingest (multi-task, sub-indexed in [p10/INDEX.md](p10/INDEX.md))
+  - [p10-1A-1 code ingest framework](p10/p10-1a-1-code-ingest-framework.md) — 🟡 진행 중
+  - p10-1A-2 Rust AST chunker — ⏳
+  - p10-1B Python + TS/JS AST chunkers — ⏳
+  - p10-1C Go + Java + Kotlin AST chunkers — ⏳
+  - p10-1D C + C++ AST chunkers — ⏳
+  - p10-2 Tier 2 resource-aware — ⏳
+  - p10-3 Tier 3 paragraph + line-window fallback — ⏳
+
 ## Post-merge 핫픽스

 머지 후 발견된 버그들과 그 follow-up PR들은 [HOTFIXES.md](HOTFIXES.md)에 dated 로그로 기록한다. 원래 task spec은 frozen 상태로 두고, post-merge 동작 변경은 HOTFIXES.md를 source of truth로 본다.
--- a/tasks/p10/INDEX.md
+++ b/tasks/p10/INDEX.md
@@ -0,0 +1,13 @@
+# Phase 10 — Code Ingest
+
+| ID | Subject | Status |
+|----|---------|--------|
+| 1A-1 | code ingest framework (wire schema, parse-code crate skeleton, filter flags, skip policy, config 절) | 🟡 진행 중 |
+| 1A-2 | Rust AST chunker | ⏳ |
+| 1B | Python + TS/JS AST chunkers | ⏳ |
+| 1C | Go + Java + Kotlin AST chunkers | ⏳ |
+| 1D | C + C++ AST chunkers | ⏳ |
+| 2 | Tier 2 resource-aware (k8s / Dockerfile / manifest) | ⏳ |
+| 3 | Tier 3 paragraph + line-window fallback | ⏳ |
+
+Design: [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md)
--- a/tasks/p10/p10-1a-1-code-ingest-framework.md
+++ b/tasks/p10/p10-1a-1-code-ingest-framework.md
@@ -0,0 +1,31 @@
+# p10-1A-1 — code ingest framework
+
+**Status:** 🟡 in progress
+**Contract sections:** §2.1 (Citation `code` variant), §2.2 (SearchHit repo/code_lang), §2.4 (IngestReport skip counters), §2 schema.v1 (code_lang_breakdown + repo_breakdown), §3.6 (Metadata fields), §8 (kebab-parse-code crate boundary), §11 (code ingest no longer out of scope).
+**Design:** [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md) §1A-1.
+**Plan:** [2026-05-15-p10-1a-1-code-ingest-framework.md](../../docs/superpowers/plans/2026-05-15-p10-1a-1-code-ingest-framework.md).
+
+## Goal
+
+Land the *framework surface* for code ingest — wire schema (additive minor), CLI filter flags, ignore policy, skip policy infrastructure, `kebab-parse-code` crate skeleton, `[ingest.code]` config section — without enabling any code chunker. 1A-2 plugs in the Rust AST chunker on top.
+
+## Acceptance criteria
+
+- `cargo test --workspace --no-fail-fast -j 1` passes.
+- Regression test (`wire_search_hit_no_code_fields`, `wire_citation_5_variants_unchanged`) passes — markdown corpus wire output unchanged.
+- `cargo clippy --workspace --all-targets -- -D warnings` passes.
+- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` updated per design §10.1.
+- README + HANDOFF + SMOKE updated.
+
+## Allowed dependencies
+
+- `kebab-parse-code` may depend on `kebab-core`, `anyhow`, `gix`. NOT on store / embed / llm / rag / UI.
+- Source-fs may depend on `kebab-parse-code`.
+
+## Forbidden dependencies
+
+- UI crates (cli / mcp / tui) must NOT import `kebab-parse-code` directly.
+
+## Risks / notes
+
+- `.gitignore` honor changes existing behavior for markdown corpora whose files live in gitignored areas. Regression test covers the standard case (no overlap). If a user reports missing docs after 1A-1 lands, log to HOTFIXES.
--- a/tasks/p9/p9-fb-39-retrieval-precision-tuning.md
+++ b/tasks/p9/p9-fb-39-retrieval-precision-tuning.md
@@ -1,20 +1,24 @@
 ---
 phase: P9
-component: kebab-search + kebab-rag + kebab-chunk
+component: kebab-eval + docs
 task_id: p9-fb-39
 title: "Retrieval precision 튜닝 (rank 5+ 노이즈 완화)"
-status: open
-target_version: 0.5.0
+status: completed
+target_version: 0.7.0
 depends_on: []
 unblocks: []
 contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
-contract_sections: [§3 chunking, §4 search, §7 RAG]
+contract_sections: [§3 chunking, §4 search, §7 RAG, §10.3 eval metrics]
 source_feedback: 사용자 도그푸딩 2026-05-06 — Claude Code 가 kebab CLI 사용 후 "rank 5+ 부터 노이즈 섞임" 지적. precision-at-k 가 k=5 이후 떨어짐.
 ---

 # p9-fb-39 — Retrieval precision 튜닝

-> ⏳ **백로그 only — 미구현.** 본 spec 은 도그푸딩 피드백 skeleton. 구현 착수 전 [superpowers:brainstorming](../../docs/superpowers/) 으로 설계 단계 선행 필요. 어느 lever (chunk policy / RRF k / score gate / cross-encoder / embedding 업그레이드) 부터 손볼지, eval golden set 선행 여부 brainstorm 후 결정.
+> ✅ **Eval foundation partially implemented.** P@k metrics (P@5, P@10) added. Applying this spec's levers (chunk policy / RRF / cross-encoder / embedding upgrade) is split into separate tasks (fb-39b onward).
+>
+> - Design: [`docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md`](../../docs/superpowers/specs/2026-05-10-p9-fb-39-eval-foundation-design.md)
+> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39-eval-foundation.md)
+> - fb-39b (lever applied: embedding upgrade): [`tasks/p9/p9-fb-39b-embedding-upgrade.md`](./p9-fb-39b-embedding-upgrade.md) ✅

 ## 증상 / 동기

--- a/tasks/p9/p9-fb-39b-embedding-upgrade.md
+++ b/tasks/p9/p9-fb-39b-embedding-upgrade.md
@@ -0,0 +1,76 @@
+---
+phase: P9
+component: kebab-embed-local + kebab-config + kebab-store-vector + docs
+task_id: p9-fb-39b
+title: "Embedding model upgrade (multilingual-e5-large)"
+status: completed
+target_version: 0.6.0
+depends_on: [p9-fb-39]
+unblocks: []
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+contract_sections: [§4 search, §5 storage, §9 versioning cascade]
+source_feedback: user dogfooding 2026-05-06. After using the kebab CLI, Claude Code pointed out "noise mixed in from rank 5+" (the lever-application side of fb-39).
+---
+
+# p9-fb-39b — Embedding model upgrade
+
+> ✅ **Implemented.** Applies the embedding model upgrade lever, one of fb-39's 4 lever candidates. The P@k metric (fb-39) allows a small vs large comparison.
+>
+> - Design update: [`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`](../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md) §5 / §9
+> - Plan: [`docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md`](../../docs/superpowers/plans/2026-05-10-p9-fb-39b-embedding-upgrade.md)
+
+## Summary
+
+- Default flip: `multilingual-e5-small` (384 dim) → `multilingual-e5-large` (1024 dim).
+- An existing user TOML that pins small keeps it (backwards-compat).
+- fb-23 incremental ingest detects the embedding_version mismatch → automatic re-embed.
+- Triggers the 0.5 → 0.6 minor bump (design §9 cascade rule, current Cargo.toml = 0.5.0).
+
+## Implementation items
+
+1. **config defaults flip**: `[models.embedding] model = "multilingual-e5-large"`, `dimensions = 1024`.
+2. **fastembed e5-large resolution**: add an e5-large arm to `resolve_model()` in `kebab-embed-local`.
+3. **fixture sweep**: check the default embedding model in every unit/integration test. Tests that do not set it in Config follow the new default (except `provider = "none"` tests).
+4. **design contract update**: update embedding_model.id + dimensions in design §5 (storage example) + §9 (versioning table).
+5. **HOTFIXES entry**: document the user re-ingest procedure + backwards-compat behavior.
+6. **README update**: update the defaults in the `[models.embedding]` section + the `dimensions` field description.
+7. **SMOKE.md append**: the embedding upgrade verification procedure within the smoke test (reset → update config → ingest → eval).
+8. **tasks/INDEX.md append**: add the p9-fb-39b row (sibling of p9-fb-39).
+
+## Allowed dependencies
+
+- `kebab-embed-local`: fastembed crate + `kebab-core`
+- `kebab-config`: toml crate
+- `kebab-store-vector`: lancedb crate (only the table-naming logic is affected)
+- `kebab-app`: wiring only (no API change)
+
+## Forbidden dependencies
+
+- parse-* crate (unrelated to parsing)
+- llm-* crate (unrelated to embedding)
+- search crate (search logic is already generic via the adapter pattern)
+
+## Test
+
+- `cargo test -p kebab-embed-local -- e5_large` (tests for the new arm)
+- `cargo test -p kebab-config -- embedding_defaults` (config defaults)
+- `cargo test --workspace --no-fail-fast -j 1` (full regression)
+- Smoke: `kebab --config /tmp/smoke.toml doctor | grep embedding` → `multilingual-e5-large (1024d)`
+- Smoke: `kebab --config /tmp/smoke.toml ingest` → embedding progress display + dimension check
+- P@k eval: `kebab eval run` (fb-39 golden set), comparing small vs large
+
+## Backward compat notes
+
+- A pre-fb-39b user who did not set the embedding in config gets the new default (large) automatically. Setting `model = "multilingual-e5-small"` in TOML keeps small.
+- `EmbeddingCfg.model` in `kebab-config` is a String field, so a value set in TOML overrides the default (standard serde behavior).
+- The orphan LanceDB table (`chunk_embeddings_multilingual-e5-small_384`) is treated as stale after the next `kebab ingest` run; the user can clean it up manually with `kebab reset --vector-only`.
+
+## Binary version bump
+
+- 0.5.0 → 0.6.0 (current Cargo.toml = 0.5.0; embedding_version cascade triggers minor bump per design §9).
+- Release notes: `embedding default: multilingual-e5-small (384d) → multilingual-e5-large (1024d), P@k metric ↑`.
+
+## Post-merge deviation
+
+- **`embedding_dim_mismatch` ErrorV1 dropped**: the design spec's §Migration policy specified dim mismatch detection inside `LanceVectorStore::open` plus a new `error.v1.code = "embedding_dim_mismatch"`, but the implementation omits both. Reason: LanceDB tables are namespaced by `(model, dim)` (`crates/kebab-store-vector/src/paths.rs:21`), so changing the model creates a new table automatically and orphans the old one. A dim mismatch therefore never becomes a hard error; it surfaces as zero search results (silent precision loss). The HOTFIXES entry is the documentation source. For an explicit error to be meaningful, a separate startup health check is needed, either as an fb-39c candidate or as a doctor extension in v0.7.0.
+- Impact: if the user does not run `kebab ingest` after changing the model, search returns zero results. The README + SMOKE walkthrough document the `reset --vector-only && ingest` sequence.
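
The `(model, dim)` namespacing described under "Post-merge deviation" can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `crates/kebab-store-vector`: `table_name`, `VectorHealth`, and `check_vector_health` are hypothetical names, and the only detail taken from the spec is the table-name shape seen in `chunk_embeddings_multilingual-e5-small_384`. It shows why switching models yields an empty result set rather than an error, and what the proposed startup health check might look at.

```rust
/// Hypothetical helper: builds a table name in the shape the spec shows,
/// e.g. `chunk_embeddings_multilingual-e5-small_384`.
fn table_name(model: &str, dim: usize) -> String {
    format!("chunk_embeddings_{model}_{dim}")
}

/// Outcome of the hypothetical startup check.
#[derive(Debug, PartialEq)]
enum VectorHealth {
    /// The table for the configured (model, dim) exists.
    Ready,
    /// No embedding tables at all: nothing has been ingested yet.
    Empty,
    /// Other embedding tables exist, but not the configured one.
    /// Search would return zero hits until the corpus is re-ingested.
    StaleOnly { expected: String, orphans: Vec<String> },
}

/// Compares the configured (model, dim) against the table names that
/// already exist in the vector store (passed in as plain strings here).
fn check_vector_health(existing: &[&str], model: &str, dim: usize) -> VectorHealth {
    let expected = table_name(model, dim);
    if existing.iter().any(|t| *t == expected.as_str()) {
        return VectorHealth::Ready;
    }
    // Any other table with the embedding prefix belongs to a different
    // (model, dim) pair and is an orphan from the configured model's view.
    let orphans: Vec<String> = existing
        .iter()
        .filter(|t| t.starts_with("chunk_embeddings_"))
        .map(|t| t.to_string())
        .collect();
    if orphans.is_empty() {
        VectorHealth::Empty
    } else {
        VectorHealth::StaleOnly { expected, orphans }
    }
}
```

With only the small table present and the large model configured, the check returns `StaleOnly`, which is the state where a doctor-style command could print the `kebab reset --vector-only && kebab ingest` hint instead of letting search return nothing.
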
Author	SHA1	Message	Date
th-kim0823	7bbd2c0cbf	docs(p10-1a-1): wire schema + frozen design + README/HANDOFF/SMOKE + task index Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:41:26 +09:00
th-kim0823	d13f58d28a	fix(p10-1a-1): patch wire.rs Stats fixture for new schema fields Task 16's new code_lang_breakdown / repo_breakdown fields broke the existing schema_wrapper_tags_schema_version test in wire.rs which constructs Stats { ... } literally. Use ..Default::default() since Stats now derives Default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:30:01 +09:00
th-kim0823	298f4adc81	feat(p10-1a-1): CLI filter flags + SchemaStats breakdowns + regression tests Task 13: add wire regression tests proving markdown SearchHit omits repo/code_lang when None, and all 5 original Citation variants serialize byte-identically without spurious Code-variant keys. Task 15: add --repo (repeatable) and --code-lang (repeatable, comma-separated) flags to `kebab search`; propagate both into SearchFilters instead of the previous vec![] stub. Add #[allow(clippy::large_enum_variant)] — Cmd is short-lived, boxing buys nothing. Task 16: add code_lang_breakdown and repo_breakdown BTreeMap fields to Stats (schema.v1); derive Default on Stats; populate both as empty in collect_stats (1A-2 fills them when code chunks land). Add unit test asserting both keys are present in the serialized object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:21:59 +09:00
th-kim0823	4e8b70a04b	feat(p10-1a-1): apply generated-header + size-cap skip per file Wire kebab_parse_code::is_generated_file and is_oversized into FsSourceConnector::scan_with_skips. Files that pass gitignore/builtin/ kebabignore matching are now checked for generated-file markers (config-gated via ingest.code.skip_generated_header) and byte/line caps (ingest.code.max_file_bytes / max_file_lines). FsScanSkips gains skipped_generated + skipped_size_exceeded counters; kebab-app threads them into IngestReport. Also fixes a pre-existing clippy::derivable_impls warning in IngestCfg. Three new connector tests cover all three paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:06:59 +09:00
th-kim0823	682f7dd3a2	feat(p10-1a-1): add [ingest.code] config section Add IngestCfg + IngestCodeCfg structs with serde defaults and embed ingest: IngestCfg into the top-level Config. Existing configs without an [ingest] section continue to load unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:53:21 +09:00
th-kim0823	40b3ea8408	chore(p10-1a-1): cleanup Task 11 review findings + sync Cargo.lock Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:50:55 +09:00
th-kim0823	9fce24b106	feat(p10-1a-1): wire IngestReport skip counters by category (gitignore/builtin/kebabignore) Refactor walker to expose WalkOverrides (combined + per-source matchers), add walk_files_with_skips that returns accepted files alongside skip attribution, wire FsSourceConnector::scan_with_skips into kebab-app so IngestReport.skipped_gitignore, skipped_kebabignore, skipped_builtin_blacklist, and skip_examples are populated instead of left at zero. Priority order per spec §5.2 (builtin > gitignore > kebabignore) enforced in classify_skip, with a directory-aware builtin matcher so pruned directory entries are correctly attributed to builtin rather than a coincident gitignore entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:42:28 +09:00
th-kim0823	8bbe25dc10	fix(p10-1a-1): guard .gitignore negation + sync doc comments Prevent double-`!` corruption when a `.gitignore` negation pattern (e.g. `!keep/`) hits the trailing-slash normalizer in `read_gitignore`. Also updates module-level and `build_overrides` doc to list all five filter sources in application order, and adds a regression test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:30:00 +09:00
th-kim0823	abfdcbd31d	feat(p10-1a-1): honor repo-root .gitignore in walker overrides Adds read_gitignore() (pub(crate), root-only, nested cascade deferred) and merges its patterns as a 5th group in build_overrides(). Trailing- slash patterns (dist/) are normalized to also emit a stem/** glob so files inside the directory are matched when is_dir=false. Two new tests cover both the happy path and the missing-file no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:25:19 +09:00
th-kim0823	69d1593bc5	feat(p10-1a-1): integrate built-in blacklist into walker overrides Wires `kebab_parse_code::BUILTIN_BLACKLIST` (6 patterns: node_modules, target, __pycache__, .venv, venv, env) into `build_overrides()` so the walker automatically excludes these directories even when the user has no `.kebabignore`. TDD cycle: 2 failing tests added first, then the pattern-add loop inserted after the existing kbignore block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:13:39 +09:00
th-kim0823	2a8451c033	fix(p10-1a-1): tighten kebab-parse-code manifest + tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:05:34 +09:00
th-kim0823	ff11f81f7f	feat(p10-1a-1): kebab-parse-code crate (lang + repo + skip) Tasks 5-8: new `kebab-parse-code` crate with three infrastructure modules for the code ingest framework. Ships lang.rs (extension→language identifier mapping), repo.rs (.git walk-up via gix 0.70 for RepoMeta), and skip.rs (BUILTIN_BLACKLIST, is_generated_file, is_oversized). 14 integration tests across three test files, all passing; clippy -D warnings clean. Note: gix pinned to 0.70 (not 0.83 as originally suggested) because 0.83 fails to compile against Rust 1.94.1 due to non-exhaustive match patterns in gix-hash. 0.70 resolves cleanly and has identical head_name/head_id API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:57:59 +09:00
th-kim0823	bf4ebf8d2a	feat(p10-1a-1): add Metadata.repo / git_branch / git_commit / code_lang Four optional, serde-skipped-when-None fields added to `Metadata` for code ingest context. All 11 downstream construction sites patched with `repo: None, git_branch: None, git_commit: None, code_lang: None`. Full workspace check (`--tests`) and per-crate test suite pass clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:44:18 +09:00
th-kim0823	351c7a0826	feat(p10-1a-1): add IngestReport skip counters + SkipExamples Adds five new u32 counters (skipped_gitignore, skipped_kebabignore, skipped_builtin_blacklist, skipped_generated, skipped_size_exceeded) and a SkipExamples struct (≤5 sample paths per category) to IngestReport. All new fields are #[serde(default)] for backward-compat deserialization. Downstream literal construction sites patched with zeros/empty; snapshot re-baked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:28:19 +09:00
th-kim0823	7329ba96ee	fix(p10-1a-1): patch missed SearchHit test-only construction sites Add repo: None, code_lang: None to the 3 SearchHit struct literals inside #[cfg(test)] blocks that were missed by the `fa4eeb5` sweep.	2026-05-15 15:17:10 +09:00
th-kim0823	fa4eeb5a87	feat(p10-1a-1): add SearchHit.repo / code_lang + SearchFilters.repo / code_lang Wire two new optional fields onto SearchHit (skip_serializing_if = None) and two Vec<String> filter fields onto SearchFilters (serde default). Add RetrievalDetail::Default impl (manual, uses SearchMode::Hybrid as sentinel). Patch all downstream SearchHit / SearchFilters literal constructors with repo: None / code_lang: None / vec![] as appropriate. Also covers Citation::Code arm in kebab-eval metrics match.	2026-05-15 15:04:23 +09:00
th-kim0823	3b1e878aed	feat(p10-1a-1): add Citation::Code variant Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 14:39:18 +09:00
th-kim0823	005a9011ea	plan(p10-1a-1): code ingest framework implementation plan + spec wire-shape fix 21-task plan: kebab-core domain types (Citation::Code variant, SearchHit repo/code_lang, IngestReport skip counters, Metadata extension), new kebab-parse-code crate (lang/repo/skip modules, gix dep), kebab-source-fs gitignore+blacklist integration, kebab-config [ingest.code] section, kebab-cli --repo/--code-lang flags, wire schema JSON update, frozen design doc update, README/HANDOFF/SMOKE update, task index. Each task follows a 5-step TDD cycle (test fail → impl → pass → commit). The code chunker is not in 1A-1; it is added in 1A-2. The spec's Citation::Code example did not match the flat wire shape of the existing 5 variants (top-level fields, not a nested `code: {...}`), so it is fixed here as well. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:31:22 +09:00
th-kim0823	c6d61b0b37	spec(p10): split Phase 1A into 1A-1 (framework) and 1A-2 (Rust chunker) The framework surface that 1A carries (Citation `code` variant, SearchHit repo/code_lang, --media code / --code-lang / --repo filters, skip policy, IngestReport breakdown, config section, kebab-parse-code crate skeleton) can be verified independently of the language chunker itself: after 1A-1 merges, a regression test verifies that the wire output for the existing markdown corpus is byte-level identical, and 1A-2 then focuses on the Rust AST chunker itself. The binary version bump trigger is also deferred to 1A-2 (1A-1 is a wire-additive minor with no user-surface change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:20:10 +09:00
th-kim0823	49487dc46b	spec(p10): code ingest design — Tier 1 AST + Tier 2 resource + Tier 3 fallback Expands the corpus to dozens of git repos (under one parent dir). Tier 1 (Rust/Python/TS-JS/Go/Java/Kotlin/C/C++) uses tree-sitter AST per-language chunkers, Tier 2 (k8s manifests / Dockerfile / Cargo.toml and the like) uses a resource-aware chunker, and Tier 3 (shell / fallback) uses paragraph + line-window. Embedding stays on multilingual-e5-large for cross-corpus search. Work proceeds from Phase 1A (Rust) through 1D (C/C++), then Phase 2 (Tier 2) and Phase 3 (Tier 3). Ignore integration (.gitignore honor + added .kebabignore + a minimal built-in safety net), generated-header sniffing, and a size cap contain the cost of the first dogfooding run. The new Citation variant `code`, the SearchHit repo/code_lang fields, and the --media code / --code-lang / --repo filters are all additive minor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:15:59 +09:00
altair823	2c2bf9bac5	Merge pull request 'docs(claude): cargo clean routinely between merges' (#135 ) from chore/cargo-clean-cadence into main Reviewed-on: #135	2026-05-10 15:02:00 +00:00
altair823	72798bd3ff	Merge pull request 'chore: bump version 0.5 → 0.6' (#138 ) from chore/bump-v0.6.0 into main Reviewed-on: #138	2026-05-10 15:01:45 +00:00
th-kim0823	c3177561b9	chore: bump version 0.5 → 0.6 v0.6.0 batches RAG quality batch: - fb-38 score semantics (search_hit.v1 score_kind) - fb-40 fact-grounded answer (rag-v2 prompt template) - fb-42 bulk multi-query (kebab search --bulk + mcp__kebab__bulk_search) - fb-39 eval foundation (precision_at_k_chunk metric) - fb-39b embedding upgrade (multilingual-e5-large default) embedding_version cascade triggers minor bump per design §9. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:56:51 +09:00
altair823	a465b71f99	Merge pull request 'feat(fb-39b): embedding upgrade — multilingual-e5-large default' (#137 ) from feat/fb-39b-embedding-upgrade into main Reviewed-on: #137	2026-05-10 14:53:21 +00:00
th-kim0823	787007172a	fix(fb-39b): address PR #137 round 2 review - target_version 0.7.0 → 0.6.0 (current Cargo.toml = 0.5.0; embedding_version cascade bumps to 0.6, not 0.7) - Summary (요약) bullet "0.6 → 0.7" corrected to "0.5 → 0.6" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:47:47 +09:00
th-kim0823	b954e9ce66	fix(fb-39b): address PR #137 round 1 review - CI-only embed_model.rs tests updated 384 → 1024 + e5-small → e5-large references (incl. file header download size, snapshot dim assert, L2 norm comment) - kebab-embed-local module docs + Cargo.toml description list both models (small + large) - Stale tracing message expanded with both model sizes - Task spec Post-merge deviation section: record dropped embedding_dim_mismatch ErrorV1 + reason (LanceDB (model, dim) namespacing makes hard-error redundant) - Task spec + HOTFIXES version bump 0.6→0.7 corrected to 0.5→0.6 (current Cargo.toml = 0.5.0; fb-42 0.6 cut deferred per user direction) - HOTFIXES "embedding_version bump 아님" ("not an embedding_version bump") line corrected: cascade rule DOES trigger release bump, plus deviation note for the dropped error Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:45:55 +09:00
th-kim0823	c62a8ff503	docs(fb-39b): design + HOTFIXES + new task spec + INDEX + README + SMOKE Tasks 4 + 5: comprehensive doc update for embedding upgrade (multilingual-e5-large). - design §5 + §9: update embedding_model / dimensions references (384 -> 1024) - HOTFIXES: add fb-39b entry with user re-ingest procedure + backwards-compat notes - tasks/p9-fb-39b-embedding-upgrade.md: new task spec (completed status) - INDEX.md: add fb-39b row under RAG quality phase - fb-39 task banner: append fb-39b link as lever implementation - README: update config defaults + fastembed model size + embedding field docs - SMOKE.md: append embedding upgrade verification section with e5-small -> e5-large sequence Wire schema: no change (additive at config level, new table created by existing code). Binary version: 0.6.0 -> 0.7.0 (cascade rule: embedding_model change = minor bump). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:28:48 +09:00
th-kim0823	69c94b6692	feat(embed,config): add multilingual-e5-large + flip default config (fb-39b) Task 1: Add multilingual-e5-large arm to kebab-embed-local::resolve_model with tests for 1024-dim variants and error cases. Task 2: Flip kebab-config defaults from e5-small (384-dim) to e5-large (1024-dim) across defaults(), test assertions, and TOML template. All tests pass; clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:05:36 +09:00
th-kim0823	d5321701ea	plan(fb-39b): embedding upgrade implementation plan 5 tasks: kebab-embed-local resolve_model arm + check_dim test, kebab-config defaults + TOML template flip, cross-crate fixture sweep (likely no-op since most tests use provider=none), docs (design + HOTFIXES + new task spec + INDEX), README + SMOKE walkthrough. Post-merge: 0.6 → 0.7 binary bump per CLAUDE.md cascade rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:02:37 +09:00
th-kim0823	2c3461c465	spec(fb-39b): embedding model upgrade design - multilingual-e5-small (384 dim) → multilingual-e5-large (1024 dim) - Cascade: embedding_version bump → fb-23 incremental ingest re-embeds all chunks - Migration policy: dim mismatch detection at LanceVectorStore::open → error.v1 (code = embedding_dim_mismatch) + hint "kebab reset --vector-only && kebab ingest" - Config defaults flip (model + dimensions). User TOML pinning small preserves backwards-compat - bge-m3 deferred (not in the fastembed enum; the UserDefinedEmbeddingModel ONNX path is a separate effort) - Release trigger: 0.6 → 0.7 minor bump per CLAUDE.md cascade rule Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 22:59:03 +09:00
altair823	240120ee80	Merge pull request 'feat(fb-39): eval foundation — precision_at_k_chunk metric' (#136 ) from feat/fb-39-eval-foundation into main Reviewed-on: #136	2026-05-10 13:41:04 +00:00
th-kim0823	5870a1de15	fix(fb-39): address PR #136 round 1 review kebab eval compare now surfaces precision_at_k_chunk delta in both human-readable table + deltas JSON. Snapshot fixture regenerated additively. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 22:39:11 +09:00
th-kim0823	f00fb376fe	docs(fb-39): golden header + design §10.3 eval + spec status + INDEX Strengthen fixtures/golden_queries.yaml header with precision_at_k_chunk explanation + measurement guidance. Add §10.3 Eval metrics section to frozen design documenting retrieval metrics (hit@k, MRR, recall@k_doc, P@k_chunk) + groundedness metrics. Flip p9-fb-39 spec status from open → completed (eval foundation only, lever deferral noted). Update tasks/INDEX.md fb-39 row mirror to fb-42 (merged, deferred note). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 22:35:15 +09:00
th-kim0823	bb0ec0469f	feat(eval): precision_at_k_chunk metric (P@5, P@10) (fb-39)	2026-05-10 22:26:21 +09:00
th-kim0823	7c6c2e8102	docs(claude): cargo clean routinely between merges target/ balloons to 90+ GB after a few task cycles (fb-* batches accumulate). User reported disk full mid-session twice — strengthen guidance from "if pressure shows up" to "routinely after each merged PR". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 21:48:43 +09:00
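
The `precision_at_k_chunk` metric (P@5, P@10) named in the fb-39 commits above is used to compare the small and large embedding models. A minimal sketch of precision@k over chunk ids follows. The function name, the signature, and the choice to always divide by `k` (so a result list shorter than `k` is penalized) are assumptions made for illustration; the actual `kebab-eval` implementation may differ.

```rust
use std::collections::HashSet;

/// Precision@k over chunk ids: the share of the top-k ranked chunks that
/// appear in the relevant set.
/// Assumption: the denominator is always `k`, even when fewer than `k`
/// results were returned. Returns 0.0 when `k == 0`.
fn precision_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    // Count how many of the first k ranked ids are relevant.
    let hits = ranked
        .iter()
        .take(k)
        .filter(|id| relevant.contains(*id))
        .count();
    hits as f64 / k as f64
}
```

For a ranking `c1..c5` with relevant chunks `{c1, c3, c9}`, this gives P@5 = 2/5. Irrelevant chunks at rank 5 and beyond lower P@10 directly, which matches the "noise at rank 5+" feedback that motivated the metric.
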