docs(p10-1a-1): wire schema + frozen design + README/HANDOFF/SMOKE + task index

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(p10-1a-1): patch wire.rs Stats fixture for new schema fields
2026-05-15 17:41:26 +09:00 · 2026-05-15 17:30:01 +09:00 · 2026-05-15 17:21:59 +09:00 · 2026-05-15 17:06:59 +09:00 · 2026-05-15 16:53:21 +09:00 · 2026-05-15 16:50:55 +09:00
25 changed files with 1492 additions and 168 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -4345,7 +4345,6 @@ version = "0.6.0"
 dependencies = [
 "anyhow",
 "gix",
- "kebab-core",
 "tempfile",
 ]

@@ -4460,6 +4459,7 @@ dependencies = [
 "ignore",
 "kebab-config",
 "kebab-core",
+ "kebab-parse-code",
 "serde",
 "serde_json",
 "tempfile",
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -20,6 +20,7 @@ P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) merged.
 | **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ✅ Done (3/3 components, page-level chunker + ingest wiring) |
 | **P8** | Audio transcription + timestamp citation | `kebab-parse-audio` | P5 | ⏸ On hold (the whisper-rs system dependency needs a brainstorm) |
 | **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | 🟡 In progress (4/5 components: P9-1/2/3/4 done [Library / Search / Ask / Inspect], P9-5 desktop planned · dogfooding feedback **20/20 ✅**) |
+| **P10** | code ingest framework | `kebab-parse-code` | P5 | 🟡 In progress (1A-1 about to merge): at the 1A-1 merge, the wire schema gets an additive minor change and the skeleton of the new crate `kebab-parse-code` is frozen; the actual code chunker arrives from 1A-2 onward |

 P0–P5 run serially. P6–P9 can run in parallel after P5.

--- a/README.md
+++ b/README.md
@@ -71,7 +71,7 @@ kebab doctor
 |------|------|
 | `kebab init` | Create the data directory + `config.toml` under the XDG paths |
 | `kebab ingest [<path>]` | Index Markdown / images / PDF (idempotent). On a TTY, a progress bar is drawn on stderr; non-TTY (CI / pipe) prints one line at a time to stderr; `--json` streams `ingest_progress.v1` lines to stdout, followed by a final `ingest_report.v1`. A first Ctrl-C finishes the current asset and then aborts (partial commits are kept; re-running is idempotent); a second Ctrl-C hard-exits. If the frontmatter has no Markdown title, one is filled in automatically from the first H1, then the first H2, then the first 80 characters of the first paragraph, then the file name (parser_version `md-frontmatter-v2`); previously indexed docs pick up the new title on the next ingest. **Incremental** (p9-fb-23): from the second ingest onward, parse/chunk/embed/vector upsert is skipped automatically for unchanged docs (same blake3 and same parser/chunker/embedder versions). The final summary shows an `N unchanged` count. `--force-reingest` ignores the skip and forces reprocessing. **Supported formats** (the extractor is chosen automatically and cannot be set in config): Markdown (`.md`), images (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`). Other extensions are skipped automatically: the reason is recorded in `IngestItem.warnings` (e.g. `"unsupported media type: .docx"`), the count is bucketed in `IngestReport.skipped_by_extension`, and the CLI / TUI summary shows the breakdown. |
-| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk]` | Search. hybrid uses RRF fusion; results include citations. Repeating the same query (normalized with NFKC + trim + lowercase) within the same process hits an in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. Every ingest commit bumps `kv['corpus_revision']`, which makes all cache entries stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) and is not compatible with the bare array emitted before fb-34. A mismatched cursor yields `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` is repeatable (`--tag rust --tag async`) with OR matching, `--media` takes `,`-separated values with OR matching, and different flags combine with AND. `--trust-min` is one of `primary\|secondary\|generated` (that level and above are included). `--ingested-after` takes RFC3339 UTC; a parse failure yields `error.v1.code = config_invalid` (exit 2). `--media md` is an alias, normalized to `markdown`. An unknown `--media` value always yields empty hits (not an error). **`--trace` (p9-fb-37)**: exposes the lexical / vector pre-fusion candidates, the RRF union, and per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) in `search_response.v1.trace`. Trace requests bypass the cache (always cold, even without `--no-cache`). **`--bulk` (p9-fb-42)**: runs N queries in one invocation from stdin ndjson. With `--json`, stdout gets per-query ndjson (`bulk_search_item.v1`) and stderr gets a summary (`bulk_summary: total=N succeeded=S failed=F`). Capped at 100. An agent that decomposes a query can run all sub-queries in a single round-trip; the App instance is reused, so cache / embedder cold-start cost is paid only once. A per-query failure is isolated in that item's `error` (error.v1), and the remaining queries keep running. |
+| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID] [--trace] [--bulk] [--repo NAME ...] [--code-lang LIST] [--media code]` | Search. hybrid uses RRF fusion; results include citations. Repeating the same query (normalized with NFKC + trim + lowercase) within the same process hits an in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. Every ingest commit bumps `kv['corpus_revision']`, which makes all cache entries stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`) and is not compatible with the bare array emitted before fb-34. A mismatched cursor yields `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` is repeatable (`--tag rust --tag async`) with OR matching, `--media` takes `,`-separated values with OR matching, and different flags combine with AND. `--trust-min` is one of `primary\|secondary\|generated` (that level and above are included). `--ingested-after` takes RFC3339 UTC; a parse failure yields `error.v1.code = config_invalid` (exit 2). `--media md` is an alias, normalized to `markdown`. An unknown `--media` value always yields empty hits (not an error). **`--trace` (p9-fb-37)**: exposes the lexical / vector pre-fusion candidates, the RRF union, and per-stage timing (`lexical_ms` / `vector_ms` / `fusion_ms` / `total_ms`) in `search_response.v1.trace`. Trace requests bypass the cache (always cold, even without `--no-cache`). **`--bulk` (p9-fb-42)**: runs N queries in one invocation from stdin ndjson. With `--json`, stdout gets per-query ndjson (`bulk_search_item.v1`) and stderr gets a summary (`bulk_summary: total=N succeeded=S failed=F`). Capped at 100. An agent that decomposes a query can run all sub-queries in a single round-trip; the App instance is reused, so cache / embedder cold-start cost is paid only once. A per-query failure is isolated in that item's `error` (error.v1), and the remaining queries keep running. **code corpus filters (p10-1A-1):** `--repo` is repeatable (`--repo kebab --repo other`) with OR matching. `--code-lang` accepts repeated flags or comma-separated values (`--code-lang rust,python`); unknown values yield empty hits. `--media code` includes code chunks from all of Tier 1/2/3. As of 1A-1 no code chunks are indexed yet, so these filters always return empty results; they take effect once 1A-2 (Rust AST chunker) is merged. |
 | `kebab list docs` | List indexed documents |
 | `kebab inspect doc <id>` / `kebab inspect chunk <id>` | Show the raw record |
 | `kebab fetch chunk <id> [--context N]` / `kebab fetch doc <id> [--max-tokens N]` / `kebab fetch span <doc_id> <ls> <le> [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. |
@@ -184,6 +184,11 @@ flowchart TB
    - `dimensions` (default `1024`): the model's embedding dimension. If the config value and the dimension stored in LanceDB disagree, search returns 0 results (orphan table). After changing models, rebuild the vector index with `kebab reset --vector-only && kebab ingest`.
  - `[ui] theme = "dark" | "light"` selects the TUI palette (default `"dark"`; unknown values fall back to dark).
  - `[search] stale_threshold_days = 30` (p9-fb-32): threshold for the `stale` flag on search hits / RAG citations (default 30 days; `0` disables it). A legacy `workspace.include = [...]` in an old config is silently ignored, with a one-time deprecation warning (p9-fb-25).
+- `[ingest.code]` (p10-1A-1): skip policy and chunker defaults for code ingest.
+  - `skip_generated_header = true`: skip a file when a generated marker (`@generated` / `DO NOT EDIT`, etc.) is found in its first ~512 bytes.
+  - `max_file_bytes = 262144` (256 KiB) / `max_file_lines = 5000`: per-file caps; a file exceeding either is skipped.
+  - `extra_skip_globs = []`: additional user skip patterns (`.gitignore` syntax).
+  - `ast_chunk_max_lines = 200` / `fallback_lines_per_chunk = 80` / `fallback_lines_overlap = 20`: chunker defaults, used from 1A-2 onward.
+  - `.gitignore` is honored automatically; `.kebabignore` is an additional layer. Precedence: built-in safety net (`node_modules/` / `target/` / `__pycache__/` / `.venv/` / `venv/` / `env/`) > `.gitignore` > `.kebabignore`.
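 - `[rag] prompt_template_version` (default `"rag-v2"`): RAG system prompt version. `"rag-v1"` is kept for legacy backwards-compat (retained when the user sets it explicitly). v2 adds stricter rules: (1) when citing a fact, quote the chunk's original wording in double quotes before the [#N] marker, (2) no drawing on training-data knowledge, (3) when the evidence is ambiguous, say so explicitly ("확실하지 않다", "not certain").
 - `--config <path>` flag: for temporary workspaces / isolated tests. Honored by both CLI and TUI.
 - `KEBAB_*` env: overrides selected keys (`KEBAB_RAG_SCORE_GATE`, `KEBAB_EVAL_GOLDEN`, `KEBAB_COMMIT_HASH`, etc.).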
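The `[ingest.code]` skip checks can be sketched roughly as follows. This is a hypothetical illustration, not the `kebab-source-fs` code: `CodeSkip`, `should_skip`, and `line_count` are invented names, the marker list holds only the two markers the README names (the real implementation checks 7), and checking the size caps before the generated-header sniff is an assumption. The byte cap being checked before the line cap follows the `IngestCodeCfg` doc comment.

```rust
/// Why a file was skipped (hypothetical names, not the crate's API).
#[derive(Debug, PartialEq)]
enum CodeSkip {
    SizeExceeded,
    Generated,
}

/// Bytes inspected for a generated-file marker ("first ~512 bytes").
const SNIFF_BYTES: usize = 512;

/// Partial marker list: only the two markers named in the README.
const GENERATED_MARKERS: &[&str] = &["@generated", "DO NOT EDIT"];

/// Count lines the way `str::lines` would: a final line without a
/// trailing newline still counts.
fn line_count(contents: &[u8]) -> usize {
    let newlines = contents.iter().filter(|&&b| b == b'\n').count();
    match contents.last() {
        Some(&b) if b != b'\n' => newlines + 1,
        _ => newlines,
    }
}

/// Returns `Some(reason)` if the file should be skipped, `None` to keep it.
fn should_skip(
    contents: &[u8],
    max_file_bytes: u64,
    max_file_lines: u32,
    skip_generated_header: bool,
) -> Option<CodeSkip> {
    // Byte cap first (cheap); `||` short-circuits, so lines are only
    // counted when the file is within the byte cap.
    if contents.len() as u64 > max_file_bytes
        || line_count(contents) > max_file_lines as usize
    {
        return Some(CodeSkip::SizeExceeded);
    }
    if skip_generated_header {
        let head = &contents[..contents.len().min(SNIFF_BYTES)];
        // Lossy decode: a multi-byte char cut at the 512-byte boundary
        // becomes U+FFFD instead of failing.
        let head = String::from_utf8_lossy(head);
        if GENERATED_MARKERS.iter().any(|m| head.contains(*m)) {
            return Some(CodeSkip::Generated);
        }
    }
    None
}
```

With the defaults (`262144` bytes, `5000` lines, sniff enabled), a file starting with `// @generated` is skipped as generated, and a 5,001-line file is skipped for size even though it is far below the byte cap.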
--- a/crates/kebab-app/src/lib.rs
+++ b/crates/kebab-app/src/lib.rs
@@ -44,7 +44,7 @@ use kebab_core::{
    Answer, Block, CanonicalDocument, Chunk, ChunkId, ChunkPolicy, ChunkerVersion, Chunker,
    DocFilter, DocSummary, DocumentId, DocumentStore, Embedder, EmbeddingInput,
    EmbeddingKind, ExtractContext, Extractor, IngestReport, Lang, LanguageModel, MediaType,
-    ParserVersion, RawAsset, SearchHit, SearchQuery, SkipExamples, SourceConnector, SourceScope,
+    ParserVersion, RawAsset, SearchHit, SearchQuery, SourceScope,
    SourceUri, VectorRecord, VectorStore,
 };
 use kebab_llm_local::OllamaLanguageModel;
@@ -305,8 +305,8 @@ pub fn ingest_with_config_opts(
    );
    let connector = FsSourceConnector::new(&app.config)
        .context("kb-app::ingest: build FsSourceConnector")?;
-    let assets = connector
-        .scan(&scope)
+    let (assets, fs_skips) = connector
+        .scan_with_skips(&scope)
        .context("kb-app::ingest: scan workspace")?;
    crate::ingest_progress::emit(
        progress,
@@ -675,12 +675,12 @@ pub fn ingest_with_config_opts(
        errors: error_count,
        duration_ms,
        skipped_by_extension,
-        skipped_gitignore: 0,
-        skipped_kebabignore: 0,
-        skipped_builtin_blacklist: 0,
-        skipped_generated: 0,
-        skipped_size_exceeded: 0,
-        skip_examples: SkipExamples::default(),
+        skipped_gitignore: fs_skips.skipped_gitignore,
+        skipped_kebabignore: fs_skips.skipped_kebabignore,
+        skipped_builtin_blacklist: fs_skips.skipped_builtin_blacklist,
+        skipped_generated: fs_skips.skipped_generated,
+        skipped_size_exceeded: fs_skips.skipped_size_exceeded,
+        skip_examples: fs_skips.skip_examples,
        items: if summary_only { None } else { Some(items) },
    })
 }
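The app now copies skip counters out of `fs_skips` instead of hardcoding zeros. One way such counters could be filled in a single walk is sketched below. This is hypothetical: `IgnoreLayer`, `LayerMatches`, `classify`, `ScanSkips`, and `MAX_EXAMPLES` are invented for illustration and are not the crate's `SkipCategory` / `FsScanSkips` types. The precedence order comes from the README (built-in safety net > `.gitignore` > `.kebabignore`); reading that precedence as "which single counter a path is charged to" is an assumption.

```rust
/// Ignore layer a path is charged to (hypothetical, not the crate's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum IgnoreLayer {
    BuiltinBlacklist,
    Gitignore,
    Kebabignore,
}

/// Which layers matched a path (stand-in for real matcher output).
struct LayerMatches {
    builtin: bool,
    gitignore: bool,
    kebabignore: bool,
}

/// Pick one layer per path using the documented precedence:
/// built-in safety net > `.gitignore` > `.kebabignore`.
fn classify(m: &LayerMatches) -> Option<IgnoreLayer> {
    if m.builtin {
        Some(IgnoreLayer::BuiltinBlacklist)
    } else if m.gitignore {
        Some(IgnoreLayer::Gitignore)
    } else if m.kebabignore {
        Some(IgnoreLayer::Kebabignore)
    } else {
        None
    }
}

/// Assumed cap on sample paths kept per layer.
const MAX_EXAMPLES: usize = 3;

/// Counters modeled on the fields the diff reads from `fs_skips`.
#[derive(Debug, Default)]
struct ScanSkips {
    skipped_gitignore: u64,
    skipped_kebabignore: u64,
    skipped_builtin_blacklist: u64,
    examples: Vec<(IgnoreLayer, String)>,
}

impl ScanSkips {
    fn record(&mut self, layer: IgnoreLayer, path: &str) {
        let counter = match layer {
            IgnoreLayer::BuiltinBlacklist => &mut self.skipped_builtin_blacklist,
            IgnoreLayer::Gitignore => &mut self.skipped_gitignore,
            IgnoreLayer::Kebabignore => &mut self.skipped_kebabignore,
        };
        *counter += 1;
        // Keep only the first few sample paths per layer for the report.
        let seen = self.examples.iter().filter(|(l, _)| *l == layer).count();
        if seen < MAX_EXAMPLES {
            self.examples.push((layer, path.to_string()));
        }
    }
}
```

Counting and sampling in the same pass is what lets `scan_with_skips` feed `IngestReport` without a second walker pass; capping the samples keeps the report small on large repos.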
--- a/crates/kebab-app/src/schema.rs
+++ b/crates/kebab-app/src/schema.rs
@@ -45,7 +45,7 @@ pub struct Models {
    pub corpus_revision: u64,
 }

-#[derive(Debug, Clone, Serialize, Deserialize)]
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
 pub struct Stats {
    pub doc_count: u64,
    pub chunk_count: u64,
@@ -63,6 +63,14 @@ pub struct Stats {
    /// p9-fb-37: docs whose `updated_at` exceeds the staleness threshold.
    #[serde(default)]
    pub stale_doc_count: u64,
+    /// p10-1A-1: code language breakdown (chunk counts by canonical lowercase
+    /// language identifier). Empty until 1A-2 produces code chunks.
+    #[serde(default)]
+    pub code_lang_breakdown: std::collections::BTreeMap<String, u32>,
+    /// p10-1A-1: repo breakdown (chunk counts by `metadata.repo` value).
+    /// Empty until 1A-2 produces code chunks.
+    #[serde(default)]
+    pub repo_breakdown: std::collections::BTreeMap<String, u32>,
 }

 const KEBAB_VERSION: &str = env!("CARGO_PKG_VERSION");
@@ -158,6 +166,9 @@ fn collect_stats(
        lang_breakdown: counts.lang_breakdown,
        index_bytes,
        stale_doc_count: counts.stale_doc_count,
+        // p10-1A-1: populated by 1A-2 code ingest; empty until then.
+        code_lang_breakdown: std::collections::BTreeMap::new(),
+        repo_breakdown: std::collections::BTreeMap::new(),
    })
 }

@@ -182,6 +193,32 @@ fn collect_models(cfg: &Config, store: &kebab_store_sqlite::SqliteStore) -> Mode
 mod tests_stats_ext {
    use super::*;

+    /// p10-1A-1: Stats must serialize `code_lang_breakdown` and
+    /// `repo_breakdown` so downstream consumers (MCP skill, Claude Code)
+    /// can branch on their presence.
+    #[test]
+    fn stats_includes_code_lang_and_repo_breakdown_fields() {
+        let stats = Stats::default();
+        let v = serde_json::to_value(&stats).unwrap();
+        assert!(
+            v.get("code_lang_breakdown").is_some(),
+            "Stats JSON must include code_lang_breakdown: {v}"
+        );
+        assert!(
+            v.get("repo_breakdown").is_some(),
+            "Stats JSON must include repo_breakdown: {v}"
+        );
+        // Empty BTreeMap serializes as `{}` — confirm it's an object, not null.
+        assert!(
+            v["code_lang_breakdown"].is_object(),
+            "code_lang_breakdown must be an object: {v}"
+        );
+        assert!(
+            v["repo_breakdown"].is_object(),
+            "repo_breakdown must be an object: {v}"
+        );
+    }
+
    #[test]
    fn stats_includes_breakdowns_and_bytes_on_fresh_corpus() {
        let dir = tempfile::tempdir().unwrap();
--- a/crates/kebab-cli/src/main.rs
+++ b/crates/kebab-cli/src/main.rs
@@ -46,6 +46,11 @@ struct Cli {
    command: Cmd,
 }

+// p10-1A-1: adding `repo` and `code_lang` Vec<String> fields pushed `Cmd`
+// over clippy's large_enum_variant threshold. The enum is short-lived
+// (parsed once at startup, never cloned in a hot path) — boxing would add
+// noise with no real benefit.
+#[allow(clippy::large_enum_variant)]
 #[derive(Subcommand, Debug)]
 enum Cmd {
    /// Initialise XDG dirs + workspace + `config.toml`.
@@ -165,6 +170,18 @@ enum Cmd {
        #[arg(long)]
        doc_id: Option<String>,

+        /// p10-1A-1: filter by repo name (`metadata.repo`). Repeatable;
+        /// multi-value = OR.  Empty = no filter (all repos returned).
+        #[arg(long = "repo", value_name = "NAME", num_args = 1)]
+        repo: Vec<String>,
+
+        /// p10-1A-1: filter by code language identifier (lowercase
+        /// canonical).  Repeatable or comma-separated.
+        /// Examples: `rust`, `python`, `typescript`.
+        /// Unknown values produce empty hits.
+        #[arg(long = "code-lang", value_name = "LANG", num_args = 1, value_delimiter = ',')]
+        code_lang: Vec<String>,
+
        /// p9-fb-37: emit pre-fusion lexical / vector / RRF candidate
        /// lists + per-stage timing in the response. Bypasses cache
        /// (debug intent — fresh run guaranteed). Requires embeddings
@@ -688,6 +705,8 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
            media,
            ingested_after,
            doc_id,
+            repo,
+            code_lang,
            trace,
            bulk,
        } => {
@@ -819,7 +838,7 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
                    None => None,
                };

-            // p9-fb-36: build SearchFilters from the 7 new flags.
+            // p9-fb-36 + p10-1A-1: build SearchFilters from CLI flags.
            let filters = kebab_core::SearchFilters {
                tags_any: tag.clone(),
                lang: lang.as_ref().map(|s| kebab_core::Lang(s.clone())),
@@ -828,8 +847,8 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
                media: media_norm,
                ingested_after: ingested_after_parsed,
                doc_id: doc_id.as_ref().map(|s| kebab_core::DocumentId(s.clone())),
-                repo: vec![],
-                code_lang: vec![],
+                repo: repo.clone(),
+                code_lang: code_lang.clone(),
            };

            let q = kebab_core::SearchQuery {
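The filter semantics that the new `--repo` / `--code-lang` flags document (values within one flag are OR-ed, different flags are AND-ed, an empty list means no filter, unknown values match nothing) can be sketched as a plain predicate. `ChunkMeta`, `any_of`, `chunk_passes`, and `split_repeated` are hypothetical helpers, not `kebab_core::SearchFilters` internals; `split_repeated` only mimics what clap's `value_delimiter = ','` does on a repeatable flag.

```rust
/// Code metadata on a chunk (hypothetical; markdown chunks have neither).
struct ChunkMeta<'a> {
    repo: Option<&'a str>,
    code_lang: Option<&'a str>,
}

/// Empty `wanted` = no filter; otherwise the chunk's value must equal
/// one of the wanted values (OR within a flag).
fn any_of(wanted: &[String], value: Option<&str>) -> bool {
    wanted.is_empty() || value.map_or(false, |v| wanted.iter().any(|w| w.as_str() == v))
}

/// Different flags combine with AND.
fn chunk_passes(repo: &[String], code_lang: &[String], meta: &ChunkMeta) -> bool {
    any_of(repo, meta.repo) && any_of(code_lang, meta.code_lang)
}

/// Mimics a repeatable flag with a ',' delimiter:
/// `--code-lang rust,python --code-lang go` -> ["rust", "python", "go"].
fn split_repeated(raw: &[&str]) -> Vec<String> {
    raw.iter()
        .copied()
        .flat_map(|r| r.split(','))
        .map(str::to_string)
        .collect()
}
```

Under this reading, a markdown chunk (no `repo`, no `code_lang`) passes only when both lists are empty, which matches the README note that code filters return nothing until 1A-2 indexes code chunks.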
--- a/crates/kebab-cli/src/wire.rs
+++ b/crates/kebab-cli/src/wire.rs
@@ -334,6 +334,8 @@ mod tests {
                lang_breakdown: Default::default(),
                index_bytes: Default::default(),
                stale_doc_count: 0,
+                // p10-1A-1: new fields added to Stats; use Default for the test fixture.
+                ..Default::default()
            },
        };
        let v = wire_schema(&schema);
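The fixture fix relies on Rust's struct update syntax: once `Stats` derives `Default`, a literal ending in `..Default::default()` fills every field it does not name, so future additive fields do not break existing fixtures. A minimal sketch with a trimmed, illustrative field set (not the full `Stats` definition):

```rust
use std::collections::BTreeMap;

// Trimmed stand-in for `Stats`; only a few fields are kept for illustration.
#[derive(Debug, Clone, Default, PartialEq)]
struct Stats {
    doc_count: u64,
    chunk_count: u64,
    stale_doc_count: u64,
    // Added later; the fixture below never names these.
    code_lang_breakdown: BTreeMap<String, u32>,
    repo_breakdown: BTreeMap<String, u32>,
}

fn fixture() -> Stats {
    Stats {
        doc_count: 3,
        chunk_count: 12,
        stale_doc_count: 0,
        // Every unnamed field comes from `Stats::default()`.
        ..Default::default()
    }
}
```

The `#[serde(default)]` attributes on the new fields do the same job on the deserialization side: older JSON without `code_lang_breakdown` / `repo_breakdown` still loads, with empty maps.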
--- a/crates/kebab-cli/tests/wire_citation_5_variants_unchanged.rs
+++ b/crates/kebab-cli/tests/wire_citation_5_variants_unchanged.rs
@@ -0,0 +1,100 @@
+//! p10-1A-1 Task 13: regression. The 5 original Citation variants
+//! (Line, Page, Region, Caption, Time) keep their pre-Task-1 JSON shape:
+//! the expected `kind` tag and fields are present, and no Code-variant
+//! keys (`code`, `line_start`, `symbol`) leak into them.
+
+use kebab_core::{Citation, WorkspacePath};
+
+#[test]
+fn line_variant_serialization_unchanged() {
+    let c = Citation::Line {
+        path: WorkspacePath::new("a.md".into()).unwrap(),
+        start: 1,
+        end: 2,
+        section: Some("§14".into()),
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "line");
+    assert_eq!(v["start"], 1);
+    assert_eq!(v["end"], 2);
+    assert_eq!(v["section"], "§14");
+    // Must not bleed Code-variant keys.
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+    assert!(v.get("symbol").is_none(), "symbol must be absent: {v}");
+    assert!(v.get("code").is_none(), "code must be absent: {v}");
+}
+
+#[test]
+fn line_variant_null_section_omitted() {
+    let c = Citation::Line {
+        path: WorkspacePath::new("b.md".into()).unwrap(),
+        start: 5,
+        end: 10,
+        section: None,
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "line");
+    // `section: None` is expected to be omitted (skip_serializing_if = is_none);
+    // an explicit `null` is also accepted here.
+    assert!(v.get("section").is_none() || v["section"].is_null());
+}
+
+#[test]
+fn page_variant_serialization_unchanged() {
+    let c = Citation::Page {
+        path: WorkspacePath::new("a.pdf".into()).unwrap(),
+        page: 13,
+        section: None,
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "page");
+    assert_eq!(v["page"], 13);
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+    assert!(v.get("symbol").is_none(), "symbol must be absent: {v}");
+}
+
+#[test]
+fn region_variant_serialization_unchanged() {
+    let c = Citation::Region {
+        path: WorkspacePath::new("img.png".into()).unwrap(),
+        x: 10,
+        y: 20,
+        w: 100,
+        h: 200,
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "region");
+    assert_eq!(v["x"], 10);
+    assert_eq!(v["y"], 20);
+    assert_eq!(v["w"], 100);
+    assert_eq!(v["h"], 200);
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+}
+
+#[test]
+fn caption_variant_serialization_unchanged() {
+    let c = Citation::Caption {
+        path: WorkspacePath::new("a.png".into()).unwrap(),
+        model: "qwen2.5-vl:7b".into(),
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "caption");
+    assert_eq!(v["model"], "qwen2.5-vl:7b");
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+}
+
+#[test]
+fn time_variant_serialization_unchanged() {
+    let c = Citation::Time {
+        path: WorkspacePath::new("audio.mp3".into()).unwrap(),
+        start_ms: 1000,
+        end_ms: 5000,
+        speaker: Some("Alice".into()),
+    };
+    let v = serde_json::to_value(&c).unwrap();
+    assert_eq!(v["kind"], "time");
+    assert_eq!(v["start_ms"], 1000);
+    assert_eq!(v["end_ms"], 5000);
+    assert_eq!(v["speaker"], "Alice");
+    assert!(v.get("line_start").is_none(), "line_start must be absent: {v}");
+    assert!(v.get("symbol").is_none(), "symbol must be absent: {v}");
+}
--- a/crates/kebab-cli/tests/wire_search_filters_code.rs
+++ b/crates/kebab-cli/tests/wire_search_filters_code.rs
@@ -0,0 +1,72 @@
+//! p10-1A-1 Task 15: CLI accepts --repo and --code-lang flags.
+//!
+//! These tests only check that the flags are registered: each one runs
+//! `kebab search --help` (clap prints help and exits 0) and asserts that
+//! the flag name appears in stdout. They do not parse flag values or
+//! verify that the filters reach the wire.
+
+use std::process::Command;
+
+fn kebab() -> Command {
+    Command::new(env!("CARGO_BIN_EXE_kebab"))
+}
+
+/// `kebab search --help` must exit 0 and mention `--repo`.
+#[test]
+fn cli_search_help_mentions_repo_flag() {
+    let out = kebab()
+        .args(["search", "--help"])
+        .output()
+        .expect("failed to run kebab");
+    // clap help exits 0.
+    assert!(
+        out.status.success(),
+        "kebab search --help exited non-zero: {:?}",
+        out.status
+    );
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    assert!(
+        stdout.contains("--repo"),
+        "--repo flag must appear in search help output:\n{stdout}"
+    );
+}
+
+/// `kebab search --help` must exit 0 and mention `--code-lang`.
+#[test]
+fn cli_search_help_mentions_code_lang_flag() {
+    let out = kebab()
+        .args(["search", "--help"])
+        .output()
+        .expect("failed to run kebab");
+    assert!(
+        out.status.success(),
+        "kebab search --help exited non-zero: {:?}",
+        out.status
+    );
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    assert!(
+        stdout.contains("--code-lang"),
+        "--code-lang flag must appear in search help output:\n{stdout}"
+    );
+}
+
+/// `kebab search --help` must exit 0 and mention `--media`.
+/// Confirms `--media code` value pathway is available (media is
+/// a free-form Vec<String> that already accepted arbitrary values).
+#[test]
+fn cli_search_help_mentions_media_flag() {
+    let out = kebab()
+        .args(["search", "--help"])
+        .output()
+        .expect("failed to run kebab");
+    assert!(
+        out.status.success(),
+        "kebab search --help exited non-zero: {:?}",
+        out.status
+    );
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    assert!(
+        stdout.contains("--media"),
+        "--media flag must appear in search help output:\n{stdout}"
+    );
+}
--- a/crates/kebab-cli/tests/wire_search_hit_no_code_fields.rs
+++ b/crates/kebab-cli/tests/wire_search_hit_no_code_fields.rs
@@ -0,0 +1,47 @@
+//! p10-1A-1 Task 13: regression — markdown SearchHit omits `repo` and
+//! `code_lang` from JSON when both are `None`.
+//!
+//! Proves that adding optional fields to SearchHit does not silently
+//! inject spurious keys into the existing markdown corpus wire shape.
+
+use kebab_core::{
+    Citation, ChunkId, ChunkerVersion, DocumentId, IndexVersion, RetrievalDetail, ScoreKind,
+    SearchHit, WorkspacePath,
+};
+
+#[test]
+fn markdown_hit_omits_repo_and_code_lang() {
+    let hit = SearchHit {
+        rank: 1,
+        chunk_id: ChunkId("c1".into()),
+        doc_id: DocumentId("d1".into()),
+        doc_path: WorkspacePath::new("notes/foo.md".into()).unwrap(),
+        heading_path: vec!["A".into(), "B".into()],
+        section_label: Some("B".into()),
+        snippet: "hi".into(),
+        citation: Citation::Line {
+            path: WorkspacePath::new("notes/foo.md".into()).unwrap(),
+            start: 1,
+            end: 2,
+            section: None,
+        },
+        retrieval: RetrievalDetail::default(),
+        index_version: IndexVersion("v1".into()),
+        embedding_model: None,
+        chunker_version: ChunkerVersion("md-heading-v1".into()),
+        indexed_at: time::OffsetDateTime::UNIX_EPOCH,
+        stale: false,
+        score_kind: ScoreKind::Rrf,
+        repo: None,
+        code_lang: None,
+    };
+    let s = serde_json::to_string(&hit).unwrap();
+    assert!(
+        !s.contains("\"repo\""),
+        "repo should be absent from markdown hit JSON: {s}"
+    );
+    assert!(
+        !s.contains("\"code_lang\""),
+        "code_lang should be absent from markdown hit JSON: {s}"
+    );
+}
--- a/crates/kebab-config/src/lib.rs
+++ b/crates/kebab-config/src/lib.rs
@@ -45,6 +45,11 @@ pub struct Config {
    /// `dark`).
    #[serde(default = "UiCfg::defaults")]
    pub ui: UiCfg,
+    /// p10-1A-1: code ingest settings. `#[serde(default)]` so existing
+    /// config files without an `[ingest]` / `[ingest.code]` section
+    /// load cleanly with built-in defaults.
+    #[serde(default)]
+    pub ingest: IngestCfg,
    /// p9-fb-05: directory of the on-disk config file this `Config`
    /// was loaded from, if any. Populated by `Config::from_file` /
    /// `Config::load` — never serialized (`#[serde(skip)]`). Used by
@@ -265,6 +270,52 @@ impl UiCfg {
    }
 }

+/// p10-1A-1: top-level ingest configuration wrapper. Contains per-media-type
+/// sub-sections; currently only `code` is defined.
+#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
+#[serde(default)]
+pub struct IngestCfg {
+    pub code: IngestCodeCfg,
+}
+
+/// p10-1A-1: settings for the code ingest pipeline. All fields have
+/// reasonable defaults so the user need not set anything in `config.toml`
+/// to get working code ingest.
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+#[serde(default)]
+pub struct IngestCodeCfg {
+    /// Generated header sniff. Reads first ~512 bytes, checks 7 markers.
+    pub skip_generated_header: bool,
+    /// Max byte size per file. Bigger files skipped.
+    pub max_file_bytes: u64,
+    /// Max line count per file. Bigger files skipped (byte cap checked first).
+    pub max_file_lines: u32,
+    /// User extra skip globs (gitignore syntax). Applied on top of built-in
+    /// + `.gitignore` + `.kebabignore`.
+    pub extra_skip_globs: Vec<String>,
+    /// AST chunk size cap. Functions/classes longer than this fall back to
+    /// paragraph-based split (1A-2 and later).
+    pub ast_chunk_max_lines: u32,
+    /// Tier 3 fallback chunker: lines per chunk.
+    pub fallback_lines_per_chunk: u32,
+    /// Tier 3 fallback chunker: line overlap between adjacent chunks.
+    pub fallback_lines_overlap: u32,
+}
+
+impl Default for IngestCodeCfg {
+    fn default() -> Self {
+        Self {
+            skip_generated_header: true,
+            max_file_bytes: 262_144,
+            max_file_lines: 5_000,
+            extra_skip_globs: vec![],
+            ast_chunk_max_lines: 200,
+            fallback_lines_per_chunk: 80,
+            fallback_lines_overlap: 20,
+        }
+    }
+}
+
 impl Config {
    /// Defaults per design §6.4.
    pub fn defaults() -> Self {
@@ -336,6 +387,7 @@ impl Config {
            },
            image: ImageCfg::defaults(),
            ui: UiCfg::defaults(),
+            ingest: IngestCfg::default(),
            // p9-fb-05: defaults are not loaded from disk, so no
            // source_dir. Relative `workspace.root` (rare with
            // defaults) falls back to caller `cwd` via the
@@ -1060,6 +1112,49 @@ max_context_tokens = 8000
            }
        }
    }
+
+    #[test]
+    fn ingest_code_cfg_defaults() {
+        let cfg: IngestCodeCfg = toml::from_str("").unwrap();
+        assert_eq!(cfg.max_file_bytes, 262_144);
+        assert_eq!(cfg.max_file_lines, 5_000);
+        assert!(cfg.skip_generated_header);
+        assert!(cfg.extra_skip_globs.is_empty());
+        assert_eq!(cfg.ast_chunk_max_lines, 200);
+        assert_eq!(cfg.fallback_lines_per_chunk, 80);
+        assert_eq!(cfg.fallback_lines_overlap, 20);
+    }
+
+    #[test]
+    fn ingest_code_cfg_user_override() {
+        let toml = r#"
+            max_file_bytes = 1048576
+            max_file_lines = 20000
+            skip_generated_header = false
+            extra_skip_globs = ["**/fixtures/**", "**/snapshots/**"]
+        "#;
+        let cfg: IngestCodeCfg = toml::from_str(toml).unwrap();
+        assert_eq!(cfg.max_file_bytes, 1_048_576);
+        assert_eq!(cfg.max_file_lines, 20_000);
+        assert!(!cfg.skip_generated_header);
+        assert_eq!(cfg.extra_skip_globs.len(), 2);
+    }
+
+    #[test]
+    fn config_with_ingest_code_section() {
+        // Build a full valid Config serialization and patch only the
+        // [ingest.code] field we care about — avoids having to enumerate
+        // every required Config field in the test fixture.
+        let base = Config::defaults();
+        let mut toml_text = toml::to_string(&base).unwrap();
+        // Inject max_file_bytes override into the [ingest.code] table.
+        toml_text = toml_text.replace(
+            "max_file_bytes = 262144",
+            "max_file_bytes = 524288",
+        );
+        let cfg: Config = toml::from_str(&toml_text).unwrap();
+        assert_eq!(cfg.ingest.code.max_file_bytes, 524_288);
+    }
 }

 #[cfg(test)]
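`IngestCodeCfg` defines a Tier 3 fallback chunker through `fallback_lines_per_chunk` (default 80) and `fallback_lines_overlap` (default 20), but that chunker only lands in 1A-2. A line-window split with overlap could look like the sketch below; `fallback_windows` is a hypothetical name, and the exact windowing rule (fixed stride of `per_chunk - overlap`, last window clipped to the end of the file) is an assumption.

```rust
/// Split `total_lines` into 1-based inclusive line ranges of at most
/// `per_chunk` lines, with `overlap` lines shared by adjacent windows.
fn fallback_windows(total_lines: u32, per_chunk: u32, overlap: u32) -> Vec<(u32, u32)> {
    assert!(
        per_chunk > 0 && overlap < per_chunk,
        "overlap must be smaller than the chunk size"
    );
    // Each new window starts `stride` lines after the previous one.
    let stride = per_chunk - overlap;
    let mut out = Vec::new();
    let mut start = 1;
    while start <= total_lines {
        let end = (start + per_chunk - 1).min(total_lines);
        out.push((start, end));
        if end == total_lines {
            break; // this window already reaches the last line
        }
        start += stride;
    }
    out
}
```

With the defaults, a 200-line file yields `(1, 80)`, `(61, 140)`, `(121, 200)`: each window repeats the previous window's last 20 lines, so a symbol straddling a boundary still appears whole in at least one chunk when it is shorter than the overlap.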
--- a/crates/kebab-source-fs/Cargo.toml
+++ b/crates/kebab-source-fs/Cargo.toml
@@ -10,6 +10,7 @@ description   = "Local filesystem SourceConnector — walks workspace.root + app
 [dependencies]
 kebab-core = { path = "../kebab-core" }
 kebab-config = { path = "../kebab-config" }
+kebab-parse-code = { path = "../kebab-parse-code" }
 anyhow       = { workspace = true }
 serde        = { workspace = true }
 time         = { workspace = true }
--- a/crates/kebab-source-fs/src/connector.rs
+++ b/crates/kebab-source-fs/src/connector.rs
@@ -10,20 +10,20 @@
 //! }
 //! ```

-use std::path::PathBuf;
+use std::path::{Path, PathBuf};

 use anyhow::{Context, Result};
 use time::OffsetDateTime;

 use kebab_config::Config;
 use kebab_core::{
-    AssetStorage, Checksum, RawAsset, SourceConnector, SourceScope, SourceUri,
+    AssetStorage, Checksum, RawAsset, SkipExamples, SourceConnector, SourceScope, SourceUri,
    id_for_asset, to_posix,
 };

 use crate::hash::hash_file;
 use crate::media::media_type_for;
-use crate::walker::{build_overrides, read_kbignore, walk_files};
+use crate::walker::{SkipCategory, WalkOverrides, build_overrides, read_kbignore, walk_files_with_skips};

 /// Local-filesystem `SourceConnector`. Constructed once from `Config`,
 /// reused across `scan` calls.
@@ -36,10 +36,16 @@ use crate::walker::{build_overrides, read_kbignore, walk_files};
 ///     construction time.
 ///   - `copy_threshold_bytes`: `config.storage.copy_threshold_mb * 1 MiB`
 ///     pre-multiplied so we don't recompute per file.
+///   - `skip_generated_header`: `config.ingest.code.skip_generated_header`.
+///   - `max_file_bytes`: `config.ingest.code.max_file_bytes`.
+///   - `max_file_lines`: `config.ingest.code.max_file_lines`.
 pub struct FsSourceConnector {
    default_root: PathBuf,
    default_exclude: Vec<String>,
    copy_threshold_bytes: u64,
+    skip_generated_header: bool,
+    max_file_bytes: u64,
+    max_file_lines: u32,
 }

 impl FsSourceConnector {
@@ -59,109 +65,239 @@ impl FsSourceConnector {
            default_root: root,
            default_exclude: config.workspace.exclude.clone(),
            copy_threshold_bytes,
+            skip_generated_header: config.ingest.code.skip_generated_header,
+            max_file_bytes: config.ingest.code.max_file_bytes,
+            max_file_lines: config.ingest.code.max_file_lines,
        })
    }
-}

-impl SourceConnector for FsSourceConnector {
-    fn scan(&self, scope: &SourceScope) -> Result<Vec<RawAsset>> {
-        // `SourceScope::root` overrides config root when non-empty. This
-        // matches the design's "scope is the per-call lens; config is the
-        // default" split (§7.1).
+    /// Resolve the effective root and build the merged + per-source overrides.
+    fn resolve_scan_params(
+        &self,
+        scope: &SourceScope,
+    ) -> Result<(PathBuf, WalkOverrides)> {
        let root = if scope.root.as_os_str().is_empty() {
            self.default_root.clone()
        } else {
            scope.root.clone()
        };

-        // Union: config.workspace.exclude ∪ scope.exclude ∪ .kebabignore.
-        // Per §6.2 the union of `.kebabignore` and `config.workspace.exclude`
-        // is the filter set. `scope.exclude` is added on top so a caller
-        // can layer a per-call narrowing.
        let mut excludes = self.default_exclude.clone();
        excludes.extend(scope.exclude.iter().cloned());
-        // .kebabignore is re-read on every scan() so users can edit it without
-        // restarting any long-running process.
        let kbignore = read_kbignore(&root)?;

        let overrides = build_overrides(&root, &excludes, &kbignore)?;
+        Ok((root, overrides))
+    }

-        // TODO(P1-2/P1-3 router): apply SourceScope::include glob filter at the
-        // extractor router layer once that crate lands. SourceConnector emits all
-        // non-excluded files; routing by include-glob is a downstream concern
-        // (design §6.2 + §7.2 are silent on this split, treat it as router work).
-        //
-        // `scope.include` is intentionally ignored at this stage of the
-        // pipeline: per §6.2 the workspace-level include lives in
-        // `WorkspaceCfg` and is enforced by the asset writer / extractors.
-        // Surfacing it here would double-filter Markdown vs PDF before the
-        // extractor router gets to see them.
-        if !scope.include.is_empty() {
-            tracing::debug!(
-                count = scope.include.len(),
-                "FsSourceConnector ignores scope.include — handled by extractor router"
-            );
-        }
+    /// Scan the workspace and return the accepted assets together with
+    /// per-category skip counts and sample paths for `IngestReport`.
+    ///
+    /// This is the **preferred entry point** for `kebab-app`: it provides
+    /// all the information needed to populate `IngestReport.skipped_gitignore`,
+    /// `skipped_kebabignore`, `skipped_builtin_blacklist`, and `skip_examples`
+    /// without a second walker pass.
+    pub fn scan_with_skips(
+        &self,
+        scope: &SourceScope,
+    ) -> Result<(Vec<RawAsset>, FsScanSkips)> {
+        let (root, overrides) = self.resolve_scan_params(scope)?;

-        let files = walk_files(&root, &overrides)?;
+        log_scope_include_warning(scope);

-        let mut assets = Vec::with_capacity(files.len());
-        for abs in &files {
-            // `to_posix` does NFC + leading `./` strip + `#` rejection.
-            // Compute the workspace-relative path before handing to it so
-            // emitted `WorkspacePath` is always relative.
-            let rel = abs.strip_prefix(&root).unwrap_or(abs);
-            let workspace_path = match to_posix(rel) {
-                Ok(p) => p,
-                Err(e) => {
-                    // A path containing `#` is the only documented reason
-                    // `to_posix` fails today. Drop the file with a warning
-                    // rather than aborting the entire scan — a single bad
-                    // filename should not nuke a 10 000-file ingest.
-                    tracing::warn!(
-                        path = %abs.display(),
-                        error = %e,
-                        "skipping file: path is not a valid WorkspacePath",
+        let (files, skipped_entries) = walk_files_with_skips(&root, &overrides)?;
+
+        // Accumulate per-category skip counts and sample paths.
+        let mut fs_skips = FsScanSkips::default();
+        for entry in &skipped_entries {
+            match entry.category {
+                SkipCategory::BuiltinBlacklist => {
+                    fs_skips.skipped_builtin_blacklist =
+                        fs_skips.skipped_builtin_blacklist.saturating_add(1);
+                    push_sample(
+                        &mut fs_skips.skip_examples.builtin_blacklist,
+                        &entry.path,
+                        &root,
                    );
-                    continue;
                }
-            };
-
-            let media_type = media_type_for(abs);
-            let (byte_len, full_hex) = hash_file(abs)
-                .with_context(|| format!("hashing {}", abs.display()))?;
-            let checksum = Checksum(full_hex.clone());
-            let asset_id = id_for_asset(&full_hex);
-
-            // Storage variant signals *intent*, not an actual copy.
-            // P1-6 (asset writer) is responsible for the on-disk copy.
-            let stored = if byte_len > self.copy_threshold_bytes {
-                AssetStorage::Reference {
-                    path: abs.clone(),
-                    sha: checksum.clone(),
+                SkipCategory::Gitignore => {
+                    fs_skips.skipped_gitignore =
+                        fs_skips.skipped_gitignore.saturating_add(1);
+                    push_sample(
+                        &mut fs_skips.skip_examples.gitignore,
+                        &entry.path,
+                        &root,
+                    );
                }
-            } else {
-                AssetStorage::Copied { path: abs.clone() }
-            };
-
-            assets.push(RawAsset {
-                asset_id,
-                source_uri: SourceUri::File(abs.clone()),
-                workspace_path,
-                media_type,
-                byte_len,
-                checksum,
-                discovered_at: OffsetDateTime::now_utc(),
-                stored,
-            });
+                SkipCategory::Kebabignore => {
+                    fs_skips.skipped_kebabignore =
+                        fs_skips.skipped_kebabignore.saturating_add(1);
+                    // kebabignore intentionally NOT in skip_examples per spec §5.5.
+                }
+                SkipCategory::Other => {
+                    // DEFAULT_EXCLUDES or config.workspace.exclude — no dedicated
+                    // IngestReport counter; these are lumped into the existing
+                    // `skipped` field by kebab-app.
+                }
+            }
        }

-        // Determinism: sort by workspace_path. WorkspacePath is a String
-        // newtype with stable lexicographic ordering. Two scans of the
-        // same tree must produce identical Vec<RawAsset> modulo the
-        // wall-clock `discovered_at` field.
-        assets.sort_by(|a, b| a.workspace_path.0.cmp(&b.workspace_path.0));
+        // p10-1A-1: apply per-file generated-header + size-cap checks on files
+        // that passed the override (gitignore/builtin/kebabignore) matching.
+        // These run AFTER the walk-level skip attribution, BEFORE parse dispatch.
+        let mut accepted_files: Vec<PathBuf> = Vec::with_capacity(files.len());
+        for abs_path in files {
+            let rel_path = abs_path.strip_prefix(&root).unwrap_or(&abs_path);

+            // Generated-header sniff (config-gated).
+            if self.skip_generated_header
+                && kebab_parse_code::is_generated_file(&abs_path).unwrap_or(false)
+            {
+                fs_skips.skipped_generated =
+                    fs_skips.skipped_generated.saturating_add(1);
+                push_sample(
+                    &mut fs_skips.skip_examples.generated,
+                    &abs_path,
+                    &root,
+                );
+                tracing::debug!(
+                    path = %rel_path.display(),
+                    "skip: generated-file marker detected"
+                );
+                continue;
+            }
+
+            // Size-cap check (byte or line limit).
+            if kebab_parse_code::is_oversized(
+                &abs_path,
+                self.max_file_bytes,
+                self.max_file_lines,
+            )
+            .unwrap_or(false)
+            {
+                fs_skips.skipped_size_exceeded =
+                    fs_skips.skipped_size_exceeded.saturating_add(1);
+                push_sample(
+                    &mut fs_skips.skip_examples.size_exceeded,
+                    &abs_path,
+                    &root,
+                );
+                tracing::debug!(
+                    path = %rel_path.display(),
+                    max_bytes = self.max_file_bytes,
+                    max_lines = self.max_file_lines,
+                    "skip: file exceeds size cap"
+                );
+                continue;
+            }
+
+            accepted_files.push(abs_path);
+        }
+
+        let assets = build_assets(&accepted_files, &root, self.copy_threshold_bytes)?;
+        Ok((assets, fs_skips))
+    }
+}
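The generated-header branch above delegates to `kebab_parse_code::is_generated_file`, whose body is not part of this diff. The `FsScanSkips` docs only say that it looks at roughly the first 512 bytes for markers such as `@generated` or `do not edit`. Here is a minimal sketch built on those assumptions. `looks_generated`, `SNIFF_BYTES`, and `GENERATED_MARKERS` are hypothetical names. The case-insensitive match is also an assumption, and the real marker list may contain more entries (the source elides them as "…").

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical sniff window; the field docs only say "~512 bytes".
const SNIFF_BYTES: usize = 512;

/// Hypothetical marker list: only the two markers named in the docs.
const GENERATED_MARKERS: &[&str] = &["@generated", "do not edit"];

/// True if the byte prefix contains a generated-file marker.
/// Matching is ASCII case-insensitive (an assumption).
fn prefix_looks_generated(prefix: &[u8]) -> bool {
    let text = String::from_utf8_lossy(prefix).to_ascii_lowercase();
    GENERATED_MARKERS.iter().any(|m| text.contains(m))
}

/// Reads at most SNIFF_BYTES from `path` and checks them for a marker.
fn looks_generated(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::with_capacity(SNIFF_BYTES);
    File::open(path)?
        .take(SNIFF_BYTES as u64)
        .read_to_end(&mut buf)?;
    Ok(prefix_looks_generated(&buf))
}
```

Reading through `Read::take` caps the I/O at the sniff window regardless of file size. As the connector does, a caller can treat an I/O error as "not generated" with `unwrap_or(false)`.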
+
+/// Per-category skip counts and sample paths returned alongside the asset list
+/// by [`FsSourceConnector::scan_with_skips`].
+///
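+/// The gitignore / kebabignore / built-in counters are attributed from the
+/// walker's per-source matchers without a second pass; `skipped_generated`
+/// and `skipped_size_exceeded` come from the per-file checks that run on the
+/// files the walker accepted.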
+#[derive(Debug, Default)]
+pub struct FsScanSkips {
+    pub skipped_gitignore: u32,
+    pub skipped_kebabignore: u32,
+    pub skipped_builtin_blacklist: u32,
+    /// p10-1A-1: files skipped because their first ~512 bytes contained a
+    /// generated-file marker (`@generated`, `do not edit`, …).
+    pub skipped_generated: u32,
+    /// p10-1A-1: files skipped because they exceeded `max_file_bytes` or
+    /// `max_file_lines` in `[ingest.code]`.
+    pub skipped_size_exceeded: u32,
+    /// Sample paths per spec §5.5 (≤ 5 per category; kebabignore skips get
+    /// no sample list). Paths are workspace-relative POSIX strings when
+    /// available, absolute otherwise.
+    pub skip_examples: SkipExamples,
+}
+
+/// Push a path into a sample vec (cap = 5) as a workspace-relative POSIX
+/// string. Falls back to the lossy absolute path if relativisation fails.
+fn push_sample(samples: &mut Vec<String>, abs: &Path, root: &Path) {
+    if samples.len() >= 5 {
+        return;
+    }
+    let rel = abs.strip_prefix(root).unwrap_or(abs);
+    // Best-effort POSIX string; any non-UTF8 char → replacement char.
+    let s = rel.to_string_lossy().replace('\\', "/");
+    samples.push(s);
+}
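`push_sample` and the `saturating_add` counters together implement "count every skip, keep at most five examples", which `skip_examples_cap_at_five` checks. The following std-only sketch reproduces that pairing so it can be tested in isolation. `Tally`, `record`, and `MAX_SAMPLES` are hypothetical names, and the value 5 mirrors the hard-coded cap in `push_sample`.

```rust
use std::path::Path;

/// Mirrors the cap used by `push_sample` (hypothetical constant name).
const MAX_SAMPLES: usize = 5;

/// Hypothetical per-category tally: an exact count plus a capped sample list.
#[derive(Debug, Default)]
struct Tally {
    count: u32,
    samples: Vec<String>,
}

impl Tally {
    /// Count every skip, but keep at most MAX_SAMPLES example paths,
    /// rendered relative to `root` with `/` separators.
    fn record(&mut self, abs: &Path, root: &Path) {
        self.count = self.count.saturating_add(1);
        if self.samples.len() < MAX_SAMPLES {
            // Fall back to the absolute path if `abs` is outside `root`.
            let rel = abs.strip_prefix(root).unwrap_or(abs);
            self.samples.push(rel.to_string_lossy().replace('\\', "/"));
        }
    }
}
```

Naming the cap as a constant keeps the connector and any tests that assert on it from drifting apart.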
+
+/// Convert a list of absolute file paths to `Vec<RawAsset>`, sorted by
+/// workspace-relative POSIX path for determinism.
+fn build_assets(
+    files: &[PathBuf],
+    root: &Path,
+    copy_threshold_bytes: u64,
+) -> Result<Vec<RawAsset>> {
+    let mut assets = Vec::with_capacity(files.len());
+    for abs in files {
+        let rel = abs.strip_prefix(root).unwrap_or(abs);
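+        // `to_posix` does NFC normalization, strips a leading `./`, and
+        // rejects `#`. `rel` is computed first so the emitted
+        // `WorkspacePath` is always workspace-relative.
+        let workspace_path = match to_posix(rel) {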
+            Ok(p) => p,
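+            Err(e) => {
+                // Drop the file with a warning rather than aborting the whole
+                // scan: a single bad filename should not fail a 10 000-file
+                // ingest.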
+                tracing::warn!(
+                    path = %abs.display(),
+                    error = %e,
+                    "skipping file: path is not a valid WorkspacePath",
+                );
+                continue;
+            }
+        };
+
+        let media_type = media_type_for(abs);
+        let (byte_len, full_hex) = hash_file(abs)
+            .with_context(|| format!("hashing {}", abs.display()))?;
+        let checksum = Checksum(full_hex.clone());
+        let asset_id = id_for_asset(&full_hex);
+
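+        // The storage variant signals *intent*, not an actual copy; the asset
+        // writer (P1-6) is responsible for the on-disk copy.
+        let stored = if byte_len > copy_threshold_bytes {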
+            AssetStorage::Reference {
+                path: abs.clone(),
+                sha: checksum.clone(),
+            }
+        } else {
+            AssetStorage::Copied { path: abs.clone() }
+        };
+
+        assets.push(RawAsset {
+            asset_id,
+            source_uri: SourceUri::File(abs.clone()),
+            workspace_path,
+            media_type,
+            byte_len,
+            checksum,
+            discovered_at: OffsetDateTime::now_utc(),
+            stored,
+        });
+    }
+
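+    // Determinism: sort by workspace path so two scans of the same tree yield
+    // identical output, modulo the wall-clock `discovered_at` field.
+    assets.sort_by(|a, b| a.workspace_path.0.cmp(&b.workspace_path.0));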
+    Ok(assets)
+}
+
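+/// Logs (at debug level) that `scope.include` is ignored. Include-glob
+/// routing is the extractor router's job; filtering here would
+/// double-filter Markdown vs PDF before the router sees them.
+fn log_scope_include_warning(scope: &SourceScope) {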
+    if !scope.include.is_empty() {
+        tracing::debug!(
+            count = scope.include.len(),
+            "FsSourceConnector ignores scope.include — handled by extractor router"
+        );
+    }
+}
+
+impl SourceConnector for FsSourceConnector {
+    fn scan(&self, scope: &SourceScope) -> Result<Vec<RawAsset>> {
+        // Delegate to scan_with_skips; discard the skip counts.
+        // Callers that need skip attribution should call scan_with_skips directly.
+        let (assets, _skips) = self.scan_with_skips(scope)?;
        Ok(assets)
    }
 }
@@ -401,4 +537,241 @@ mod tests {
        let v2 = conn2.scan(&SourceScope::default()).unwrap();
        assert!(matches!(v2[0].stored, AssetStorage::Copied { .. }));
    }
+
+    // ── IngestReport skip counter wiring tests ───────────────────────────────
+
+    #[test]
+    fn scan_with_skips_counts_gitignored_files() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+        std::fs::write(root.join("ok.md"), b"# ok").unwrap();
+        std::fs::write(root.join("skipme.log"), b"x").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_gitignore >= 1,
+            "skipped_gitignore should be >= 1; got {}",
+            skips.skipped_gitignore
+        );
+        assert!(
+            skips.skip_examples.gitignore.iter().any(|p| p.contains("skipme.log")),
+            "skip_examples.gitignore should contain 'skipme.log'; got: {:?}",
+            skips.skip_examples.gitignore
+        );
+        // kebabignore counter must be 0 — file matched gitignore, not kebabignore.
+        assert_eq!(skips.skipped_kebabignore, 0);
+    }
+
+    #[test]
+    fn scan_with_skips_counts_builtin_blacklist_dirs() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::create_dir_all(root.join("node_modules/foo")).unwrap();
+        std::fs::write(root.join("node_modules/foo/bar.js"), b"x").unwrap();
+        std::fs::write(root.join("ok.md"), b"# ok").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_builtin_blacklist >= 1,
+            "skipped_builtin_blacklist should be >= 1; got {}",
+            skips.skipped_builtin_blacklist
+        );
+        assert!(
+            skips.skip_examples.builtin_blacklist.iter().any(|p| p.contains("node_modules")),
+            "skip_examples.builtin_blacklist should contain a node_modules path; got: {:?}",
+            skips.skip_examples.builtin_blacklist
+        );
+    }
+
+    #[test]
+    fn scan_with_skips_kebabignore_increments_counter_no_example() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".kebabignore"), "*.secret\n").unwrap();
+        std::fs::write(root.join("ok.md"), b"x").unwrap();
+        std::fs::write(root.join("creds.secret"), b"pw").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_kebabignore >= 1,
+            "skipped_kebabignore should be >= 1; got {}",
+            skips.skipped_kebabignore
+        );
+        // Per spec §5.5 kebabignore has no skip_examples list; check that the
+        // path did not leak into the gitignore or builtin sample lists either.
+        assert!(
+            skips.skip_examples.gitignore.is_empty(),
+            "gitignore examples should be empty; got: {:?}",
+            skips.skip_examples.gitignore
+        );
+        assert!(
+            skips.skip_examples.builtin_blacklist.is_empty(),
+            "builtin_blacklist examples should be empty; got: {:?}",
+            skips.skip_examples.builtin_blacklist
+        );
+    }
+
+    #[test]
+    fn scan_with_skips_builtin_priority_over_gitignore() {
+        // node_modules/ matches both BUILTIN_BLACKLIST and a .gitignore entry.
+        // It must be attributed to builtin (spec §5.2 priority order).
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".gitignore"), "node_modules/\n").unwrap();
+        std::fs::create_dir_all(root.join("node_modules/pkg")).unwrap();
+        std::fs::write(root.join("node_modules/pkg/index.js"), b"x").unwrap();
+        std::fs::write(root.join("ok.md"), b"x").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_builtin_blacklist >= 1,
+            "builtin counter should be >= 1; got {}",
+            skips.skipped_builtin_blacklist
+        );
+        assert_eq!(
+            skips.skipped_gitignore, 0,
+            "gitignore counter must be 0 when builtin wins; got {}",
+            skips.skipped_gitignore
+        );
+    }
+
+    #[test]
+    fn skip_examples_cap_at_five() {
+        // Write 7 .log files — skip_examples.gitignore must cap at 5.
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+        for i in 0..7 {
+            std::fs::write(root.join(format!("f{i}.log")), b"x").unwrap();
+        }
+        std::fs::write(root.join("ok.md"), b"x").unwrap();
+
+        let conn =
+            FsSourceConnector::new(&cfg_with_root(root.to_str().unwrap()))
+                .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert_eq!(skips.skipped_gitignore, 7, "should count all 7");
+        assert_eq!(
+            skips.skip_examples.gitignore.len(),
+            5,
+            "skip_examples.gitignore must cap at 5; got: {:?}",
+            skips.skip_examples.gitignore
+        );
+    }
+
+    // ── p10-1A-1: generated-header + size-cap skip tests ────────────────────
+
+    /// Helper: config with default `ingest.code` settings.
+    fn cfg_with_root_defaults(root: &str) -> Config {
+        // cfg_with_root already uses Config::defaults() which has
+        // skip_generated_header=true, max_file_bytes=262144, max_file_lines=5000.
+        cfg_with_root(root)
+    }
+
+    /// Helper: config with overridden `ingest.code` size caps.
+    fn cfg_with_size_cap(root: &str, max_bytes: u64, max_lines: u32) -> Config {
+        let mut c = cfg_with_root(root);
+        c.ingest.code.max_file_bytes = max_bytes;
+        c.ingest.code.max_file_lines = max_lines;
+        c
+    }
+
+    #[test]
+    fn ingest_report_counts_generated_files() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join("normal.md"), "# hi").unwrap();
+        std::fs::write(root.join("autogen.rs"), "// @generated\nfn x() {}\n").unwrap();
+
+        let conn = FsSourceConnector::new(
+            &cfg_with_root_defaults(root.to_str().unwrap()),
+        )
+        .unwrap();
+        let (assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_generated >= 1,
+            "skipped_generated should be >= 1; got {}",
+            skips.skipped_generated
+        );
+        assert!(
+            skips.skip_examples.generated.iter().any(|p| p.contains("autogen")),
+            "skip_examples.generated should contain 'autogen'; got: {:?}",
+            skips.skip_examples.generated
+        );
+        // The normal.md file must NOT be skipped.
+        let asset_paths: Vec<_> = assets
+            .iter()
+            .map(|a| a.workspace_path.0.clone())
+            .collect();
+        assert!(
+            asset_paths.iter().any(|p| p.contains("normal")),
+            "normal.md should still be emitted; assets: {asset_paths:?}"
+        );
+    }
+
+    #[test]
+    fn ingest_report_counts_oversized_files_by_bytes() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        std::fs::write(root.join("normal.md"), "# hi").unwrap();
+        // 2 000 bytes across 1 000 lines: over the 1024-byte cap but under the
+        // 5 000-line cap, so only the byte limit can trigger.
+        let big: String = "x\n".repeat(1_000);
+        std::fs::write(root.join("huge.rs"), &big).unwrap();
+
+        let conn = FsSourceConnector::new(
+            &cfg_with_size_cap(root.to_str().unwrap(), 1024, 5_000),
+        )
+        .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_size_exceeded >= 1,
+            "skipped_size_exceeded should be >= 1; got {}",
+            skips.skipped_size_exceeded
+        );
+        assert!(
+            skips.skip_examples.size_exceeded.iter().any(|p| p.contains("huge")),
+            "skip_examples.size_exceeded should contain 'huge'; got: {:?}",
+            skips.skip_examples.size_exceeded
+        );
+    }
+
+    #[test]
+    fn ingest_report_size_cap_by_line_count() {
+        let dir = tempfile::tempdir().unwrap();
+        let root = dir.path();
+        // 6 000 lines (12 000 bytes): under the 262 144-byte cap but over the
+        // 5 000-line cap, so only the line limit can trigger.
+        let body: String = "x\n".repeat(6_000);
+        std::fs::write(root.join("longfile.rs"), &body).unwrap();
+
+        let conn = FsSourceConnector::new(
+            &cfg_with_size_cap(root.to_str().unwrap(), 262_144, 5_000),
+        )
+        .unwrap();
+        let (_assets, skips) = conn.scan_with_skips(&SourceScope::default()).unwrap();
+
+        assert!(
+            skips.skipped_size_exceeded >= 1,
+            "skipped_size_exceeded should be >= 1 (line cap); got {}",
+            skips.skipped_size_exceeded
+        );
+    }
 }
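The byte-cap and line-cap tests above go through `kebab_parse_code::is_oversized`, whose implementation is not shown in this diff. The sketch below shows one plausible shape. It assumes the byte cap is checked from file metadata before any read, and that lines are counted by streaming `\n` bytes with an early exit. `exceeds_caps` is a hypothetical name, and counting an unterminated last line as a line is an assumption.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical stand-in for `kebab_parse_code::is_oversized`.
/// True if `path` is larger than `max_bytes` or has more than `max_lines` lines.
fn exceeds_caps(path: &Path, max_bytes: u64, max_lines: u32) -> io::Result<bool> {
    // Cheap check first: the size comes from metadata, no read needed.
    if std::fs::metadata(path)?.len() > max_bytes {
        return Ok(true);
    }
    // Stream the file and count '\n' bytes, stopping once the cap is passed.
    let mut file = File::open(path)?;
    let mut buf = [0u8; 8192];
    let mut lines: u64 = 0;
    let mut last_byte: Option<u8> = None;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        lines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
        last_byte = Some(buf[n - 1]);
        if lines > u64::from(max_lines) {
            return Ok(true);
        }
    }
    // A final line without a trailing '\n' still counts (assumption).
    if matches!(last_byte, Some(b) if b != b'\n') {
        lines += 1;
    }
    Ok(lines > u64::from(max_lines))
}
```

Checking metadata first means files over the byte cap are rejected without being opened. Files under the byte cap are read at most once.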
--- a/crates/kebab-source-fs/src/lib.rs
+++ b/crates/kebab-source-fs/src/lib.rs
@@ -13,4 +13,4 @@ mod hash;
 mod media;
 mod walker;

-pub use connector::FsSourceConnector;
+pub use connector::{FsScanSkips, FsSourceConnector};
--- a/crates/kebab-source-fs/src/walker.rs
+++ b/crates/kebab-source-fs/src/walker.rs
@@ -1,12 +1,15 @@
 //! Directory walker with gitignore-style filtering and symlink-cycle
 //! protection.
 //!
-//! Filter set (per task spec, design §6.2):
-//!   - `config.workspace.exclude` (passed in by `FsSourceConnector`)
-//!   - `<root>/.kebabignore` (optional file at workspace root)
-//!   - default-excludes for `.DS_Store` and macOS resource forks (`._*`)
+//! Filter set (all five sources are merged into one union):
+//!   - DEFAULT_EXCLUDES (constants — VCS dirs, build artifacts, never-useful)
+//!   - `config.workspace.exclude` (user-supplied per workspace)
+//!   - `<root>/.kebabignore` (user-supplied kebab-specific exclude)
+//!   - Built-in safety-net blacklist (`node_modules/`, `target/`, etc. —
+//!     spec §5.2, applied via `kebab_parse_code::BUILTIN_BLACKLIST`)
+//!   - `<root>/.gitignore` (repo-root only, no nested cascade — spec §5.2)
 //!
-//! All three are merged via `ignore::overrides::OverrideBuilder`, which
+//! All five are merged via `ignore::overrides::OverrideBuilder`, which
 //! gives full gitignore semantics (anchors, `!` negation, `**`, etc.). We
 //! prepend `!` to each pattern because `OverrideBuilder` treats positive
 //! patterns as "include" and negative as "exclude" — see §"Filter set"
@@ -18,6 +21,16 @@
 //! `follow_links(true)`; we layer our own visited-set on top, keyed by the
 //! canonical path of every entry, and skip any entry we've already seen.
 //!
+//! ## Per-source skip attribution (spec §5.5)
+//!
+//! `build_overrides` returns a `WalkOverrides` struct that carries both a
+//! `combined` matcher (used by `walk_files_with_skips` for the actual walk
+//! decision) and three per-source matchers (`gitignore`, `kebabignore`,
+//! `builtin`). When an entry is excluded, `classify_skip` probes the
+//! per-source matchers in priority order (built-in > gitignore >
+//! kebabignore) to determine which `IngestReport` counter should be
+//! incremented, without a second walker pass over the filesystem.
+//!
 //! ## Why `walkdir` instead of `ignore::WalkBuilder`?
 //!
 //! `ignore::WalkBuilder` bundles gitignore semantics + cycle detection in
@@ -32,7 +45,7 @@ use std::path::{Path, PathBuf};

 use anyhow::{Context, Result};
 use ignore::overrides::{Override, OverrideBuilder};
-use walkdir::{DirEntry, WalkDir};
+use walkdir::WalkDir;

 /// Default-excludes baked into the connector. These are NOT configurable;
 /// they cover noise that is never useful to ingest and would otherwise need
@@ -46,37 +59,222 @@ const DEFAULT_EXCLUDES: &[&str] = &[
    "**/._*",
 ];

-/// Build the merged `Override` from `config.workspace.exclude` ∪ `.kebabignore`
-/// ∪ baked-in default excludes.
+/// Per-source `Override` matchers for skip-counter attribution (spec §5.5).
+///
+/// `combined` is the merged union of all sources — used for the actual
+/// "is this entry excluded?" decision in the walker. The three per-source
+/// matchers (`gitignore`, `kebabignore`, `builtin`) are used ONLY when
+/// classifying an already-excluded path for `IngestReport` counter wiring;
+/// they are never consulted for every walked file.
+///
+/// DEFAULT_EXCLUDES and `config.workspace.exclude` have no dedicated matcher:
+/// they map to none of the three named `IngestReport` counters, so an entry
+/// excluded only by them is classified as `SkipCategory::Other`.
+pub(crate) struct WalkOverrides {
+    /// Merged matcher over all five sources; used for the walk decision.
+    pub combined: Override,
+    /// Matcher built from `<root>/.gitignore` patterns only.
+    pub gitignore: Override,
+    /// Matcher built from `<root>/.kebabignore` patterns only.
+    pub kebabignore: Override,
+    /// Matcher built from `kebab_parse_code::BUILTIN_BLACKLIST` only.
+    pub builtin: Override,
+}
+
+/// Skip attribution category. Used by the connector when counting per-source
+/// skips for `IngestReport` (spec §5.5).
+///
+/// Priority order per spec §5.2: built-in > gitignore > kebabignore.
+/// A path matching multiple sources is attributed to the first match.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum SkipCategory {
+    BuiltinBlacklist,
+    Gitignore,
+    Kebabignore,
+    /// Matched DEFAULT_EXCLUDES or `config.workspace.exclude`. No dedicated
+    /// counter in `IngestReport` — lumped into the existing `skipped` field.
+    Other,
+}
+
+/// Build a single `Override` from a list of gitignore-style patterns, all
+/// registered as excludes (prepend `!`).
+///
+/// Empty pattern list → an `Override` that matches nothing (i.e. no
+/// exclusions). Callers must strip blanks / comments before passing.
+fn build_single_matcher(root: &Path, patterns: &[&str]) -> Result<Override> {
+    let mut builder = OverrideBuilder::new(root);
+    for pat in patterns {
+        builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("invalid pattern: {pat}"))?;
+    }
+    builder.build().context("failed to compile override")
+}
+
+/// Build the builtin-blacklist `Override`, adding directory-level patterns in
+/// addition to the `/**`-suffix ones from `BUILTIN_BLACKLIST`.
+///
+/// BUILTIN_BLACKLIST uses `**/X/**` patterns which match files *inside* X but
+/// NOT the directory entry `X` itself (because `**/X/**` requires a path
+/// component after X). The walker prunes at the directory level (`is_dir=true`),
+/// so we need `**/X` (no trailing `/**`) to also match the directory itself
+/// for attribution purposes.
+fn build_builtin_matcher(root: &Path) -> Result<Override> {
+    let mut builder = OverrideBuilder::new(root);
+    for pat in kebab_parse_code::BUILTIN_BLACKLIST {
+        // Register the original pattern (matches files inside the dir).
+        builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("builtin pattern: {pat}"))?;
+        // Also derive a directory-level match by stripping trailing `/**`.
+        // This makes `is_dir=true` checks on the directory itself work.
+        if let Some(dir_pat) = pat.strip_suffix("/**") {
+            builder
+                .add(&format!("!{dir_pat}"))
+                .with_context(|| format!("builtin dir pattern: {dir_pat}"))?;
+        }
+    }
+    builder.build().context("failed to compile builtin override")
+}
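The `/**` stripping in `build_builtin_matcher` is pure string work, so it can be checked without the `ignore` crate. The sketch below reproduces it. `expand_builtin` is a hypothetical name, and the input patterns are illustrative: the sketch assumes `BUILTIN_BLACKLIST` entries use the `**/X/**` form that the doc comment describes.

```rust
/// Hypothetical mirror of the expansion in `build_builtin_matcher`: every
/// `**/X/**` pattern also yields `**/X`, so the directory entry itself matches
/// when probed with `is_dir = true`. Each result is `!`-prefixed, the way the
/// patterns are registered as excludes with `OverrideBuilder`.
fn expand_builtin(patterns: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(patterns.len() * 2);
    for pat in patterns {
        out.push(format!("!{pat}"));
        if let Some(dir_pat) = pat.strip_suffix("/**") {
            out.push(format!("!{dir_pat}"));
        }
    }
    out
}
```

Patterns without a trailing `/**` pass through once, with only the `!` prefix added.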
+
+/// Owned-string variant of `build_single_matcher` for caller-supplied
+/// `Vec<String>` sources (config.workspace.exclude, .kebabignore).
+fn build_single_matcher_owned(root: &Path, patterns: &[String]) -> Result<Override> {
+    let mut builder = OverrideBuilder::new(root);
+    for pat in patterns {
+        builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("invalid pattern: {pat}"))?;
+    }
+    builder.build().context("failed to compile override")
+}
+
+/// Build the merged `WalkOverrides` from all five filter sources, in order:
+/// DEFAULT_EXCLUDES, `config.workspace.exclude`, `.kebabignore`,
+/// built-in safety-net blacklist (`kebab_parse_code::BUILTIN_BLACKLIST`),
+/// and `<root>/.gitignore` (root-only, no nested cascade).
 ///
 /// Each input pattern is registered as an *exclude* (gitignore-style: a
 /// leading `!` flips a positive match to a negative one in the
 /// `OverrideBuilder` API). Order doesn't matter — the union is computed by
 /// the underlying gitignore engine.
+///
+/// The three per-source matchers (`gitignore`, `kebabignore`, `builtin`) are
+/// built in addition to the combined one so the connector can attribute skips
+/// to the correct `IngestReport` counter without a second walker pass.
 pub(crate) fn build_overrides(
    root: &Path,
    config_exclude: &[String],
    kbignore_patterns: &[String],
-) -> Result<Override> {
-    let mut builder = OverrideBuilder::new(root);
+) -> Result<WalkOverrides> {
+    let gitignore_patterns = read_gitignore(root)?;
+
+    // Per-source matchers (for attribution only).
+    let gitignore =
+        build_single_matcher(root, &gitignore_patterns.iter().map(|s| s.as_str()).collect::<Vec<_>>())?;
+    let kebabignore = build_single_matcher_owned(root, kbignore_patterns)?;
+    // Use the directory-aware builtin matcher so that `is_dir=true` checks on
+    // directory entries (e.g., `node_modules/`) are attributed to builtin rather
+    // than to an overlapping gitignore pattern.
+    let builtin = build_builtin_matcher(root)?;
+
+    // Combined matcher — union of all five sources.
+    let mut combined_builder = OverrideBuilder::new(root);

    for pat in DEFAULT_EXCLUDES {
-        builder
+        combined_builder
            .add(&format!("!{pat}"))
            .with_context(|| format!("invalid default-exclude pattern: {pat}"))?;
    }
    for pat in config_exclude {
-        builder
+        combined_builder
            .add(&format!("!{pat}"))
            .with_context(|| format!("invalid workspace.exclude pattern: {pat}"))?;
    }
    for pat in kbignore_patterns {
-        builder
+        combined_builder
            .add(&format!("!{pat}"))
            .with_context(|| format!("invalid .kebabignore pattern: {pat}"))?;
    }
+    for pat in kebab_parse_code::BUILTIN_BLACKLIST {
+        combined_builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!("built-in blacklist pattern: {pat}"))?;
+    }
+    for pat in &gitignore_patterns {
+        combined_builder
+            .add(&format!("!{pat}"))
+            .with_context(|| format!(".gitignore pattern: {pat}"))?;
+    }
+    let combined = combined_builder
+        .build()
+        .context("failed to compile combined override set")?;

-    builder.build().context("failed to compile override set")
+    Ok(WalkOverrides {
+        combined,
+        gitignore,
+        kebabignore,
+        builtin,
+    })
+}
+
+/// Classify why a path was excluded, using per-source matchers in spec §5.2
+/// priority order: built-in > gitignore > kebabignore > other.
+///
+/// `rel` must be relative to the walker root (same as `Override::matched`
+/// expects). `is_dir` should match what the original walker saw.
+pub(crate) fn classify_skip(rel: &Path, is_dir: bool, ov: &WalkOverrides) -> SkipCategory {
+    if ov.builtin.matched(rel, is_dir).is_ignore() {
+        return SkipCategory::BuiltinBlacklist;
+    }
+    if ov.gitignore.matched(rel, is_dir).is_ignore() {
+        return SkipCategory::Gitignore;
+    }
+    if ov.kebabignore.matched(rel, is_dir).is_ignore() {
+        return SkipCategory::Kebabignore;
+    }
+    SkipCategory::Other
+}
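`classify_skip` probes the matchers in a fixed priority order, and the first match wins. The sketch below keeps that control flow but replaces the `ignore::overrides::Override` matchers with plain predicates, so it runs on the standard library alone. `Matchers` and `classify` are hypothetical names, and the predicates in the usage only loosely imitate real gitignore matching.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipCategory {
    BuiltinBlacklist,
    Gitignore,
    Kebabignore,
    Other,
}

/// Hypothetical stand-in for `WalkOverrides`: each per-source matcher is a
/// predicate `(rel_path, is_dir) -> excluded?` instead of an `Override`.
struct Matchers<'a> {
    builtin: &'a dyn Fn(&Path, bool) -> bool,
    gitignore: &'a dyn Fn(&Path, bool) -> bool,
    kebabignore: &'a dyn Fn(&Path, bool) -> bool,
}

/// First match wins, in the spec §5.2 order: built-in > gitignore > kebabignore.
fn classify(rel: &Path, is_dir: bool, m: &Matchers) -> SkipCategory {
    if (m.builtin)(rel, is_dir) {
        SkipCategory::BuiltinBlacklist
    } else if (m.gitignore)(rel, is_dir) {
        SkipCategory::Gitignore
    } else if (m.kebabignore)(rel, is_dir) {
        SkipCategory::Kebabignore
    } else {
        SkipCategory::Other
    }
}
```

When both the builtin and gitignore predicates match `node_modules`, the result is `BuiltinBlacklist`. This is the same outcome that `scan_with_skips_builtin_priority_over_gitignore` asserts for the real matchers.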
+
+/// Read `<root>/.gitignore` (single-file, root-only — nested cascade is P+).
+/// Missing file → empty Vec. Comments / blanks stripped.
+///
+/// Trailing-slash patterns (`dist/`) in real gitignore mean "match the
+/// directory AND everything inside it". `Override::matched(path, is_dir)`
+/// tests only the given path (not its parents), and the trailing-slash form
+/// matches only when `is_dir` is true, so `dist/bundle.js` checked with
+/// `is_dir = false` would not be matched. We normalize by also emitting a
+/// `<stem>/**` variant so files inside the directory are caught.
+pub(crate) fn read_gitignore(root: &Path) -> Result<Vec<String>> {
+    let p = root.join(".gitignore");
+    if !p.exists() {
+        return Ok(vec![]);
+    }
+    let s = std::fs::read_to_string(&p)
+        .with_context(|| format!("read .gitignore at {}", p.display()))?;
+    let mut out = Vec::new();
+    for line in s.lines() {
+        let trimmed = line.trim();
+        if trimmed.is_empty() || trimmed.starts_with('#') {
+            continue;
+        }
+        // Gitignore negation (`!pattern`, i.e. un-ignore) is not supported.
+        // Pass such lines through as-is and skip the trailing-slash
+        // normalization below, which is only meant for plain exclude patterns.
+        if trimmed.starts_with('!') {
+            out.push(trimmed.to_string());
+            continue;
+        }
+        if let Some(stem) = trimmed.strip_suffix('/') {
+            // Keep the dir-only form so `is_dir=true` matches are still
+            // excluded (e.g., for skip_current_dir in the walker).
+            out.push(trimmed.to_string());
+            // Also emit a glob that catches files inside the directory,
+            // since `is_dir=false` won't satisfy the trailing-slash form.
+            out.push(format!("{stem}/**"));
+        } else {
+            out.push(trimmed.to_string());
+        }
+    }
+    Ok(out)
 }

 /// Read `<root>/.kebabignore` if it exists. Each non-blank, non-comment line is
@@ -96,51 +294,87 @@ pub(crate) fn read_kbignore(root: &Path) -> Result<Vec<String>> {
        .collect())
 }

-/// Iterate every regular file under `root`, applying `overrides` and
-/// detecting symlink cycles. Returns absolute file paths.
+/// Skipped-path record emitted by `walk_files_with_skips`.
+///
+/// `path` is the absolute path of the excluded entry (dir or file).
+/// For excluded directories, this is the directory itself — individual
+/// files inside are not enumerated (the subtree is pruned).
+pub(crate) struct SkippedEntry {
+    pub path: PathBuf,
+    pub category: SkipCategory,
+}
+
+/// Iterate every regular file under `root`, applying `overrides.combined` and
+/// detecting symlink cycles. Returns:
+///   - `accepted`: absolute paths of files that passed all filters.
+///   - `skipped`: entries that were excluded, with attribution.
+///
+/// For excluded *directories*, the directory path itself is returned (not the
+/// individual files inside — the subtree is pruned in one step, matching the
+/// walker's `skip_current_dir` behavior).
 ///
 /// Strategy:
 ///   - `walkdir::WalkDir::follow_links(true)` to traverse symlinks.
+///   - Manual per-entry check (instead of `filter_entry`) so we can capture
+///     the excluded paths for skip attribution.
 ///   - Maintain `visited: HashSet<PathBuf>` of *canonical* paths. Before
 ///     descending into a directory entry, canonicalize and check the set;
 ///     if already present, skip. This breaks `a -> b -> a` cycles in O(n)
 ///     per entry without a custom recursive walker.
-///   - For each yielded entry, ask `overrides` whether it is excluded; if
-///     so, drop it. If the entry is a directory, also short-circuit
-///     `WalkDir`'s descent via `it.skip_current_dir()`.
-pub(crate) fn walk_files(root: &Path, overrides: &Override) -> Result<Vec<PathBuf>> {
-    let mut out = Vec::new();
+pub(crate) fn walk_files_with_skips(
+    root: &Path,
+    overrides: &WalkOverrides,
+) -> Result<(Vec<PathBuf>, Vec<SkippedEntry>)> {
+    let mut accepted = Vec::new();
+    let mut skipped: Vec<SkippedEntry> = Vec::new();
    let mut visited: HashSet<PathBuf> = HashSet::new();

+    // Iterate without `filter_entry`: it drops excluded entries before we
+    // can see them. Instead, each entry is tested against `overrides.combined`
+    // below; the skip is recorded, and excluded directories are pruned
+    // manually with `skip_current_dir()`.
    let walker = WalkDir::new(root).follow_links(true).into_iter();
-    let mut it = walker.filter_entry(|e| !is_excluded(e, root, overrides));
+    let mut it = walker;

    while let Some(res) = it.next() {
        let entry = match res {
            Ok(e) => e,
            Err(err) => {
-                // `walkdir` surfaces I/O errors AND its own cycle detector
-                // (when follow_links is on it sometimes catches them).
-                // Either way: log and skip; do not abort the whole scan.
                tracing::warn!(error = %err, "walkdir entry error; skipping");
                continue;
            }
        };

        let path = entry.path();
+        let rel = match path.strip_prefix(root) {
+            Ok(p) => p,
+            Err(_) => path,
+        };
+        let is_dir = entry.file_type().is_dir();
+        let excluded = overrides.combined.matched(rel, is_dir).is_ignore();

-        // Cycle guard: only canonicalize symlinks (cheap on the common case
-        // of plain files/dirs) and on directories that are followed via a
-        // symlink. `walkdir`'s `path_is_symlink()` is true when the entry's
-        // *original* path is a symlink (it returns true for the link, not
-        // for the resolved target). For non-symlinked directories we still
-        // record the canonical path so a *later* symlink that points back
-        // to one of them is detected.
-        if entry.file_type().is_dir() {
+        if excluded {
+            let cat = classify_skip(rel, is_dir, overrides);
+            skipped.push(SkippedEntry {
+                path: path.to_path_buf(),
+                category: cat,
+            });
+            if is_dir {
+                // Prune the subtree — don't descend into excluded dirs.
+                it.skip_current_dir();
+            }
+            continue;
+        }
+
+        // Cycle guard for directories.
+        if is_dir {
            match std::fs::canonicalize(path) {
                Ok(canon) => {
                    if !visited.insert(canon) {
-                        // Already visited via another path → break cycle.
                        it.skip_current_dir();
                        continue;
                    }
@@ -157,26 +391,13 @@ pub(crate) fn walk_files(root: &Path, overrides: &Override) -> Result<Vec<PathBu
        }

        if entry.file_type().is_file() {
-            out.push(path.to_path_buf());
+            accepted.push(path.to_path_buf());
        }
    }

-    Ok(out)
+    Ok((accepted, skipped))
 }

-fn is_excluded(entry: &DirEntry, root: &Path, overrides: &Override) -> bool {
-    // `Override::matched(path, is_dir)` uses the path *relative to* the
-    // override builder's root. `walkdir` gives absolute paths when
-    // `WalkDir::new` was given an absolute path — strip the root prefix
-    // before consulting the override.
-    let rel = match entry.path().strip_prefix(root) {
-        Ok(p) => p,
-        Err(_) => entry.path(),
-    };
-    overrides
-        .matched(rel, entry.file_type().is_dir())
-        .is_ignore()
-}

 #[cfg(test)]
 mod tests {
@@ -187,7 +408,7 @@ mod tests {
        let dir = tempfile::tempdir().unwrap();
        let ov = build_overrides(dir.path(), &[], &[]).unwrap();
        // Default-excludes only; non-special files should not match.
-        let m = ov.matched(Path::new("notes/alpha.md"), false);
+        let m = ov.combined.matched(Path::new("notes/alpha.md"), false);
        assert!(!m.is_ignore());
    }

@@ -195,13 +416,13 @@ mod tests {
    fn default_excludes_ds_store_and_resource_forks() {
        let dir = tempfile::tempdir().unwrap();
        let ov = build_overrides(dir.path(), &[], &[]).unwrap();
-        assert!(ov.matched(Path::new(".DS_Store"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new(".DS_Store"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("notes/.DS_Store"), false).is_ignore()
+            ov.combined.matched(Path::new("notes/.DS_Store"), false).is_ignore()
        );
-        assert!(ov.matched(Path::new("._foo.md"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("._foo.md"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("notes/._sidecar"), false).is_ignore()
+            ov.combined.matched(Path::new("notes/._sidecar"), false).is_ignore()
        );
    }

@@ -214,13 +435,13 @@ mod tests {
            &[],
        )
        .unwrap();
-        assert!(ov.matched(Path::new("a.tmp"), false).is_ignore());
-        assert!(ov.matched(Path::new("notes/x.tmp"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("a.tmp"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("notes/x.tmp"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("node_modules/foo/bar.js"), false)
+            ov.combined.matched(Path::new("node_modules/foo/bar.js"), false)
                .is_ignore()
        );
-        assert!(!ov.matched(Path::new("alpha.md"), false).is_ignore());
+        assert!(!ov.combined.matched(Path::new("alpha.md"), false).is_ignore());
    }

    #[test]
@@ -233,9 +454,9 @@ mod tests {
            &["secret/**".to_string()],
        )
        .unwrap();
-        assert!(ov.matched(Path::new("a.tmp"), false).is_ignore());
+        assert!(ov.combined.matched(Path::new("a.tmp"), false).is_ignore());
        assert!(
-            ov.matched(Path::new("secret/key.md"), false).is_ignore()
+            ov.combined.matched(Path::new("secret/key.md"), false).is_ignore()
        );
    }

@@ -257,4 +478,230 @@ mod tests {
        let v = read_kbignore(dir.path()).unwrap();
        assert_eq!(v, vec!["*.tmp".to_string(), "ignored/**".to_string()]);
    }
+
+    #[test]
+    fn built_in_blacklist_excludes_node_modules() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::create_dir_all(root.join("src")).unwrap();
+        fs::create_dir_all(root.join("node_modules/foo")).unwrap();
+        fs::write(root.join("src/main.rs"), "x").unwrap();
+        fs::write(root.join("node_modules/foo/bar.js"), "x").unwrap();
+
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        // Override::matched expects paths relative to the builder's root.
+        let m_in = overrides.combined.matched(Path::new("src/main.rs"), false);
+        let m_out = overrides.combined.matched(Path::new("node_modules/foo/bar.js"), false);
+
+        assert!(!m_in.is_ignore(), "src/main.rs should NOT be ignored");
+        assert!(m_out.is_ignore(), "node_modules/foo/bar.js SHOULD be ignored");
+    }
+
+    #[test]
+    fn built_in_blacklist_excludes_target_pycache_venv() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        for dir in ["target/x", "__pycache__/x", ".venv/x", "venv/x", "env/x"] {
+            fs::create_dir_all(root.join(dir)).unwrap();
+            fs::write(root.join(dir).join("y.txt"), "z").unwrap();
+        }
+        fs::create_dir_all(root.join("ok")).unwrap();
+        fs::write(root.join("ok/z.txt"), "z").unwrap();
+
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        // Override::matched expects paths relative to the builder's root.
+        for blacklisted in [
+            "target/x/y.txt",
+            "__pycache__/x/y.txt",
+            ".venv/x/y.txt",
+            "venv/x/y.txt",
+            "env/x/y.txt",
+        ] {
+            let m = overrides.combined.matched(Path::new(blacklisted), false);
+            assert!(m.is_ignore(), "{blacklisted} should be ignored");
+        }
+        let m_ok = overrides.combined.matched(Path::new("ok/z.txt"), false);
+        assert!(!m_ok.is_ignore(), "ok/z.txt should not be ignored");
+    }
+
+    #[test]
+    fn gitignore_at_repo_root_excludes_matching_files() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::create_dir_all(root.join("src")).unwrap();
+        fs::write(root.join(".gitignore"), "*.log\ndist/\n").unwrap();
+        fs::write(root.join("a.log"), "x").unwrap();
+        fs::write(root.join("src/main.rs"), "x").unwrap();
+        fs::create_dir_all(root.join("dist")).unwrap();
+        fs::write(root.join("dist/bundle.js"), "x").unwrap();
+
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        assert!(overrides.combined.matched(Path::new("a.log"), false).is_ignore());
+        assert!(overrides.combined.matched(Path::new("dist/bundle.js"), false).is_ignore());
+        assert!(!overrides.combined.matched(Path::new("src/main.rs"), false).is_ignore());
+    }
+
+    #[test]
+    fn gitignore_missing_is_no_op() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::write(root.join("a.log"), "x").unwrap();
+        fs::create_dir_all(root.join("src")).unwrap();
+        fs::write(root.join("src/main.rs"), "x").unwrap();
+
+        // No .gitignore present — patterns from .gitignore should not affect overrides.
+        let overrides = build_overrides(root, &[], &[]).unwrap();
+        assert!(!overrides.combined.matched(Path::new("a.log"), false).is_ignore());
+        assert!(!overrides.combined.matched(Path::new("src/main.rs"), false).is_ignore());
+    }
+
+    #[test]
+    fn gitignore_negation_with_trailing_slash_passes_through() {
+        use std::fs;
+        use tempfile::TempDir;
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        // Negation pattern. Gitignore un-ignore semantics are not
+        // implemented; this test only checks that such a line does not make
+        // `build_overrides` fail.
+        fs::write(root.join(".gitignore"), "!keep/\n").unwrap();
+        let result = build_overrides(root, &[], &[]);
+        assert!(result.is_ok(), "should not error on negation pattern: {:?}", result.err());
+    }
+
+    // ── Skip attribution tests ────────────────────────────────────────────────
+
+    #[test]
+    fn classify_skip_attributes_builtin_over_gitignore() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        // node_modules matches both BUILTIN_BLACKLIST and the .gitignore
+        // entry written below. Builtin must win (priority order §5.2).
+        fs::write(root.join(".gitignore"), "node_modules/\n").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        // node_modules/ dir itself
+        let cat = classify_skip(Path::new("node_modules"), true, &ov);
+        assert_eq!(cat, SkipCategory::BuiltinBlacklist, "builtin must have priority");
+    }
+
+    #[test]
+    fn classify_skip_attributes_gitignore_for_log_files() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        let cat = classify_skip(Path::new("app.log"), false, &ov);
+        assert_eq!(cat, SkipCategory::Gitignore);
+    }
+
+    #[test]
+    fn classify_skip_attributes_kebabignore() {
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+
+        let ov = build_overrides(root, &[], &["*.secret".to_string()]).unwrap();
+        let cat = classify_skip(Path::new("creds.secret"), false, &ov);
+        assert_eq!(cat, SkipCategory::Kebabignore);
+    }
+
+    #[test]
+    fn walk_files_with_skips_counts_gitignored_files() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::write(root.join(".gitignore"), "*.log\n").unwrap();
+        fs::write(root.join("ok.md"), "# ok").unwrap();
+        fs::write(root.join("skipme.log"), "x").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        let (accepted, skipped_entries) = walk_files_with_skips(root, &ov).unwrap();
+
+        let accepted_names: Vec<_> = accepted
+            .iter()
+            .map(|p| p.file_name().unwrap().to_string_lossy().into_owned())
+            .collect();
+        assert!(
+            accepted_names.iter().any(|n| n == "ok.md"),
+            "ok.md should be accepted; got: {accepted_names:?}"
+        );
+        assert!(
+            !accepted_names.iter().any(|n| n == "skipme.log"),
+            "skipme.log should not be accepted; got: {accepted_names:?}"
+        );
+
+        let gitignore_skipped: Vec<_> = skipped_entries
+            .iter()
+            .filter(|e| e.category == SkipCategory::Gitignore)
+            .collect();
+        assert!(
+            gitignore_skipped.iter().any(|e| e.path.file_name()
+                .map(|n| n == "skipme.log")
+                .unwrap_or(false)),
+            "skipme.log should appear in gitignore_skipped; skipped: {:?}",
+            skipped_entries.iter().map(|e| &e.path).collect::<Vec<_>>()
+        );
+    }
+
+    #[test]
+    fn walk_files_with_skips_counts_builtin_blacklist_dirs() {
+        use std::fs;
+        use tempfile::TempDir;
+
+        let tmp = TempDir::new().unwrap();
+        let root = tmp.path();
+        fs::create_dir_all(root.join("node_modules/foo")).unwrap();
+        fs::write(root.join("node_modules/foo/bar.js"), "x").unwrap();
+        fs::write(root.join("ok.md"), "# ok").unwrap();
+
+        let ov = build_overrides(root, &[], &[]).unwrap();
+        let (accepted, skipped_entries) = walk_files_with_skips(root, &ov).unwrap();
+
+        let accepted_names: Vec<_> = accepted
+            .iter()
+            .map(|p| p.file_name().unwrap().to_string_lossy().into_owned())
+            .collect();
+        assert!(
+            accepted_names.iter().any(|n| n == "ok.md"),
+            "ok.md must be accepted; got: {accepted_names:?}"
+        );
+
+        let builtin_skipped: Vec<_> = skipped_entries
+            .iter()
+            .filter(|e| e.category == SkipCategory::BuiltinBlacklist)
+            .collect();
+        assert!(
+            !builtin_skipped.is_empty(),
+            "node_modules/ should produce at least one BuiltinBlacklist skip"
+        );
+        assert!(
+            builtin_skipped.iter().any(|e| e.path.components()
+                .any(|c| c.as_os_str() == "node_modules")),
+            "skipped path should contain node_modules; got: {:?}",
+            builtin_skipped.iter().map(|e| &e.path).collect::<Vec<_>>()
+        );
+    }
 }
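A minimal, dependency-free sketch of the attribution rule that `classify_skip` applies: once the combined override has excluded a path, the first per-source matcher that also matches (built-in, then `.gitignore`, then `.kebabignore`) names the reason, and anything unclaimed (for example a default exclude) falls back to `Other`. The `Matcher` alias, `classify`, `example_order`, and the toy `builtin` / `gitignore` / `kebabignore` predicates are hypothetical stand-ins for the real `ignore::overrides::Override` matchers and do not implement glob semantics.

```rust
/// Skip reasons in attribution priority order (spec §5.2):
/// built-in > gitignore > kebabignore > other.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkipCategory {
    BuiltinBlacklist,
    Gitignore,
    Kebabignore,
    Other,
}

/// Hypothetical stand-in for one per-source matcher:
/// (path relative to the walker root, is_dir) -> matched?
pub type Matcher = fn(&str, bool) -> bool;

/// Return the category of the first matcher that claims `rel`. An excluded
/// path that no per-source matcher claims is attributed to `Other`.
pub fn classify(rel: &str, is_dir: bool, ordered: &[(SkipCategory, Matcher)]) -> SkipCategory {
    ordered
        .iter()
        .find(|(_, is_match)| is_match(rel, is_dir))
        .map(|(category, _)| *category)
        .unwrap_or(SkipCategory::Other)
}

// Toy predicates for illustration only; they are NOT glob matchers.
fn builtin(rel: &str, _is_dir: bool) -> bool {
    rel == "node_modules" || rel.starts_with("node_modules/")
}
fn gitignore(rel: &str, _is_dir: bool) -> bool {
    rel == "node_modules" || rel.ends_with(".log")
}
fn kebabignore(rel: &str, _is_dir: bool) -> bool {
    rel.ends_with(".secret")
}

/// Priority order: built-in blacklist, then .gitignore, then .kebabignore.
pub fn example_order() -> [(SkipCategory, Matcher); 3] {
    [
        (SkipCategory::BuiltinBlacklist, builtin as Matcher),
        (SkipCategory::Gitignore, gitignore as Matcher),
        (SkipCategory::Kebabignore, kebabignore as Matcher),
    ]
}
```

Keeping the order in one slice makes the priority explicit in a single place; the diff's `classify_skip` encodes the same order as three sequential `if` checks.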
--- a/crates/kebab-source-fs/tests/symlink_cycle.rs
+++ b/crates/kebab-source-fs/tests/symlink_cycle.rs
@@ -9,7 +9,7 @@
 //! Expected: `scan` returns in O(seconds), every emitted path is unique,
 //! and `alpha.md` appears at least once.
 //!
-//! The cycle guard lives in `walker::walk_files`; this test exists to
+//! The cycle guard lives in `walker::walk_files_with_skips`; this test exists to
 //! prove it catches the realistic shape (cycle through one or more
 //! symlinks) end-to-end via the public API.

@@ -100,7 +100,7 @@ fn two_step_directory_cycle_visited_set_breaks_loop() {
    //
    // Without the visited-set, walkdir would descend
    //   a → a/loop (=b) → a/loop/loop (=a) → … forever.
-    // The canonical-path visited-set in `walker::walk_files` must break
+    // The canonical-path visited-set in `walker::walk_files_with_skips` must break
    // the loop and yield a finite, deterministic result.
    let dir = tempfile::tempdir().unwrap();
    let root = dir.path();
--- a/docs/SMOKE.md
+++ b/docs/SMOKE.md
@@ -113,6 +113,11 @@ max_context_tokens = 6000

 [ui]
 theme = "dark"                       # p9-fb-14 — TUI palette ("dark" / "light", default "dark")
+
+[ingest.code]
+skip_generated_header = true
+max_file_bytes = 262144
+max_file_lines = 5000
 ```

 Settings can be overridden with `KEBAB_*` environment variables (`KEBAB_MODELS_LLM_MODEL=gemma4:26b kebab …`, etc.). For the full key list, see the `apply_env` match arms in `crates/kebab-config/src/lib.rs`. `KEBAB_READONLY=1` disables the write path (CI safety net). `KEBAB_PROGRESS=plain` prints progress to stderr as plain lines (instead of a spinner) in non-TTY environments.
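The `[ingest.code]` keys added to the SMOKE.md sample config suggest a per-file size gate behind the `skipped_size_exceeded` counter. A sketch, assuming a file is skipped when it exceeds *either* `max_file_bytes` or `max_file_lines`; `CodeIngestLimits` and `exceeds` are hypothetical names, and the defaults simply copy the sample values (262144 bytes = 256 KiB).

```rust
/// Hypothetical mirror of the size keys in `[ingest.code]`.
pub struct CodeIngestLimits {
    pub max_file_bytes: u64,
    pub max_file_lines: usize,
}

impl Default for CodeIngestLimits {
    fn default() -> Self {
        // Values from the SMOKE.md sample config (262144 bytes = 256 KiB).
        Self { max_file_bytes: 262_144, max_file_lines: 5_000 }
    }
}

impl CodeIngestLimits {
    /// Assumption: over either limit means "skip as size_exceeded".
    /// The byte check runs first, so `||` short-circuits and oversized
    /// files are never line-counted.
    pub fn exceeds(&self, contents: &str) -> bool {
        contents.len() as u64 > self.max_file_bytes
            || contents.lines().count() > self.max_file_lines
    }
}
```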
--- a/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
+++ b/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
@@ -37,6 +37,7 @@ related_tasks: ../../../tasks/INDEX.md
 | – | ignore | gitignore syntax + `.kebabignore` | familiarity |
 | – | errors | thiserror per crate, anyhow at boundary | traceability + UX |
 | – | sync | watch=false default | v1 ingest is explicit |
+| C+ | code ingest added | Tier 1/2/3 fan-out, keep e5-large, new Citation `code` variant | 2026-05-15 spec |

 ---

@@ -168,12 +169,12 @@ $ kebab search "Markdown chunking rules"

 Frozen as `docs/wire-schema/v1/*.schema.json`. Conversion between internal Rust structs and wire types uses `From`/`TryFrom`. Every wire object must carry a `schema_version` field.

-### 2.1 Citation (5 variants — discriminated by `kind`)
+### 2.1 Citation (6 variants — discriminated by `kind`)

 ```json
 {
  "schema_version": "citation.v1",
-  "kind": "line|page|region|caption|time",
+  "kind": "line|page|region|caption|time|code",
  "path": "notes/rust/kebab.md",
  "uri":  "notes/rust/kebab.md#L12-L34",

@@ -181,7 +182,20 @@ $ kebab search "Markdown chunking 규칙"
  "page":    { "page": 13, "section": "Experiment Setup" },
  "region":  { "x": 120, "y": 40, "w": 520, "h": 180 },
  "caption": { "model": "qwen2.5-vl:7b" },
-  "time":    { "start_ms": 822000, "end_ms": 850000, "speaker": "S1" }
+  "time":    { "start_ms": 822000, "end_ms": 850000, "speaker": "S1" },
+  "code":    { "start": 10, "end": 42, "lang": "rust", "repo": "kebab", "symbol": "fn ingest" }
+}
+```
+
+code variant example (p10-1A-1):
+
+```json
+{
+  "schema_version": "citation.v1",
+  "kind": "code",
+  "path": "crates/kebab-app/src/ingest.rs",
+  "uri":  "crates/kebab-app/src/ingest.rs#L10-L42",
+  "code": { "start": 10, "end": 42, "lang": "rust", "repo": "kebab", "symbol": "fn ingest" }
 }
 ```

@@ -213,7 +227,9 @@ Only the keys for the given variant are filled. `path` and `uri` are always filled (`uri` is
  },
  "index_version": "v1.0",
  "embedding_model": "multilingual-e5-large",
-  "chunker_version": "md-heading-v1"
+  "chunker_version": "md-heading-v1",
+  "repo": null,        // p10-1A-1: code corpus only; shown for reference, the key is omitted from output when null
+  "code_lang": null    // p10-1A-1: code corpus only; shown for reference, the key is omitted from output when null
 }
 ```

@@ -297,6 +313,17 @@ Per-query failures are isolated in `bulk_search_item.v1.error` (error.v1), other
  "scope": { "root": "/home/altair/KnowledgeBase", "include": ["**/*.md"], "exclude": [".git/**"] },
  "scanned": 142, "new": 12, "updated": 3, "skipped": 127, "errors": 0,
  "duration_ms": 4231,
+  "skipped_gitignore": 40,
+  "skipped_kebabignore": 5,
+  "skipped_builtin_blacklist": 80,
+  "skipped_generated": 2,
+  "skipped_size_exceeded": 1,
+  "skip_examples": {
+    "generated": ["crates/kebab-app/src/generated.rs"],
+    "size_exceeded": ["crates/kebab-app/fixtures/huge.rs"],
+    "builtin_blacklist": ["target"],
+    "gitignore": ["dist"]
+  },
  "items": [
    {
      "kind": "new|updated|skipped|error",
@@ -434,6 +461,8 @@ pub struct PromptTemplateVersion(pub String);
 pub struct SchemaVersion(pub &'static str);
 ```

+Note: the `chunker_version` family is extended in phase 10 with one label per language (see the 2026-05-15 spec §3.3 for the canonical list). Each new language AST chunker registers its own `ChunkerVersion` label (e.g. `code-rust-ast-v1`, `code-python-ast-v1`). The existing `md-heading-v1` / `pdf-page-v1` labels are unaffected.
+
 ### 3.3 RawAsset

 ```rust
@@ -577,6 +606,11 @@ pub struct Metadata {
    pub trust_level: TrustLevel,
    pub user_id_alias: Option<String>,
    pub user: serde_json::Map<String, serde_json::Value>,
+    // p10-1A-1: code corpus fields — None for non-code assets.
+    pub repo: Option<String>,         // git repo name (top-level dir or remote basename)
+    pub git_branch: Option<String>,   // HEAD branch name at ingest time
+    pub git_commit: Option<String>,   // HEAD commit SHA (short, 12 chars) at ingest time
+    pub code_lang: Option<String>,    // lowercase language name (e.g. "rust", "python")
 }

 pub enum SourceType { Markdown, Note, Paper, Reference, Inbox }
@@ -1370,8 +1404,11 @@ pub trait JobRepo {
 kebab-cli, kebab-tui, kebab-desktop
   └─> kebab-app
         ├─> kebab-source-fs
+         │     └─> kebab-parse-code (p10-1A-1: lang detect / repo detect / skip policy)
         ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
         │     └─> kebab-parse-types (parser intermediate)
+         ├─> kebab-parse-code
+         │     └─> kebab-core (domain types only — NO store/embed/llm/rag/UI)
         ├─> kebab-normalize
         │     └─> kebab-parse-types
         ├─> kebab-chunk
@@ -1555,6 +1592,8 @@ agent branches). HTTP-SSE transport is deferred to P+ per fb-29. classify
 - real-time collab
 - enterprise auth

+Code ingest is no longer out of scope (2026-05-15 spec). Multi-workspace, watch mode, and history-aware features (git-blame-based citations, diff-aware re-chunking) remain out of scope.
+
 ---

 ## 12. Next steps
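The new `Metadata` fields describe `repo` as the "top-level dir or remote basename" and `git_commit` as a 12-character short SHA. The helpers below sketch those two rules with the standard library only. All names, the remote-before-directory precedence, and the rejection of non-hex input are assumptions; the real detection lives in `kebab-parse-code` (which depends on `gix`) and may differ.

```rust
use std::path::Path;

/// Length of the abbreviated SHA stored in `Metadata::git_commit`.
pub const SHORT_SHA_LEN: usize = 12;

/// Hypothetical: repo name from a remote URL's last path segment, minus
/// `.git`. Handles `https://host/org/kebab.git` and `git@host:org/kebab.git`.
pub fn repo_from_remote(url: &str) -> Option<String> {
    let trimmed = url.trim_end_matches('/');
    // Split on '/' and on ':' (scp-style SSH remotes); take the last segment.
    let last = trimmed.rsplit(|c: char| c == '/' || c == ':').next()?;
    let name = last.strip_suffix(".git").unwrap_or(last);
    if name.is_empty() {
        None
    } else {
        Some(name.to_string())
    }
}

/// Hypothetical fallback: the name of the repo's top-level directory.
pub fn repo_from_dir(top_level: &Path) -> Option<String> {
    top_level.file_name().map(|n| n.to_string_lossy().into_owned())
}

/// Assumed precedence: remote basename first, then the directory name.
pub fn detect_repo(remote: Option<&str>, top_level: &Path) -> Option<String> {
    remote.and_then(repo_from_remote).or_else(|| repo_from_dir(top_level))
}

/// Abbreviate a full hex commit id; `None` if it is too short or not hex.
pub fn short_sha(full: &str) -> Option<&str> {
    if full.len() >= SHORT_SHA_LEN && full.bytes().all(|b| b.is_ascii_hexdigit()) {
        // All bytes are ASCII, so slicing at 12 is a valid char boundary.
        Some(&full[..SHORT_SHA_LEN])
    } else {
        None
    }
}
```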
--- a/docs/wire-schema/v1/citation.schema.json
+++ b/docs/wire-schema/v1/citation.schema.json
@@ -7,7 +7,7 @@
  "required": ["schema_version", "kind", "path", "uri", "indexed_at", "stale"],
  "properties": {
    "schema_version": { "const": "citation.v1" },
-    "kind": { "enum": ["line", "page", "region", "caption", "time"] },
+    "kind": { "enum": ["line", "page", "region", "caption", "time", "code"] },
    "path": { "type": "string" },
    "uri":  { "type": "string" },
    "line":    { "type": "object" },
@@ -15,6 +15,7 @@
    "region":  { "type": "object" },
    "caption": { "type": "object" },
    "time":    { "type": "object" },
+    "code":    { "type": "object" },
    "indexed_at": { "type": "string", "format": "date-time" },
    "stale":      { "type": "boolean" }
  }
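With `code` added to the `kind` enum, a code citation carries a `code` object and a `uri` of the form `path#L{start}-L{end}` (per the design-doc example `crates/kebab-app/src/ingest.rs#L10-L42`). A hypothetical Rust mirror, assuming `start`/`end` are inclusive 1-based line numbers and that `repo` / `symbol` may be absent; the schema itself only constrains `code` to `"type": "object"`.

```rust
/// Hypothetical Rust mirror of the `code` object in a citation.v1 value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CodeSpan {
    pub start: u32,             // first cited line (assumed 1-based, inclusive)
    pub end: u32,               // last cited line (assumed inclusive)
    pub lang: String,           // lowercase language name, e.g. "rust"
    pub repo: Option<String>,   // assumed optional
    pub symbol: Option<String>, // assumed optional, e.g. "fn ingest"
}

/// The six `kind` values the schema's enum now accepts.
pub const CITATION_KINDS: [&str; 6] = ["line", "page", "region", "caption", "time", "code"];

/// `uri` for a code citation: path, then `#L`, start line, `-L`, end line.
pub fn code_uri(path: &str, span: &CodeSpan) -> String {
    format!("{path}#L{}-L{}", span.start, span.end)
}
```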
--- a/docs/wire-schema/v1/ingest_report.schema.json
+++ b/docs/wire-schema/v1/ingest_report.schema.json
@@ -38,6 +38,20 @@
      },
      "description": "p9-fb-25: per-extension skip count. Key = lowercase extension without leading dot (e.g. 'docx'). Files without extension key under '<no-ext>'."
    },
-    "items":          { "type": ["array", "null"] }
+    "items":          { "type": ["array", "null"] },
+    "skipped_gitignore":         { "type": "integer", "minimum": 0 },
+    "skipped_kebabignore":       { "type": "integer", "minimum": 0 },
+    "skipped_builtin_blacklist": { "type": "integer", "minimum": 0 },
+    "skipped_generated":         { "type": "integer", "minimum": 0 },
+    "skipped_size_exceeded":     { "type": "integer", "minimum": 0 },
+    "skip_examples": {
+      "type": "object",
+      "properties": {
+        "generated":         { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
+        "size_exceeded":     { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
+        "builtin_blacklist": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
+        "gitignore":         { "type": "array", "items": { "type": "string" }, "maxItems": 5 }
+      }
+    }
  }
 }
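The five new counters and `skip_examples` (each array capped by `"maxItems": 5`) can be accumulated in a single pass. A sketch with a hypothetical `SkipTally`, assuming the examples are the first paths seen rather than a random sample. The schema defines example arrays only for `generated`, `size_exceeded`, `builtin_blacklist`, and `gitignore`, so a serializer would drop any `kebabignore` examples this sketch also collects.

```rust
use std::collections::BTreeMap;

/// Mirrors `"maxItems": 5` on each `skip_examples` array.
pub const MAX_SKIP_EXAMPLES: usize = 5;

/// Hypothetical accumulator for the new IngestReport skip fields. Keys are
/// the category suffixes used in the schema ("gitignore", "generated", ...).
#[derive(Debug, Default)]
pub struct SkipTally {
    pub counts: BTreeMap<&'static str, u64>,
    pub examples: BTreeMap<&'static str, Vec<String>>,
}

impl SkipTally {
    /// Count every skip, but keep only the first `MAX_SKIP_EXAMPLES` paths
    /// per category as examples.
    pub fn record(&mut self, category: &'static str, rel_path: &str) {
        *self.counts.entry(category).or_insert(0) += 1;
        let examples = self.examples.entry(category).or_default();
        if examples.len() < MAX_SKIP_EXAMPLES {
            examples.push(rel_path.to_string());
        }
    }

    /// Value for the matching `skipped_*` counter; 0 if never recorded.
    pub fn count(&self, category: &str) -> u64 {
        self.counts.get(category).copied().unwrap_or(0)
    }
}
```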
--- a/docs/wire-schema/v1/schema.schema.json
+++ b/docs/wire-schema/v1/schema.schema.json
@@ -78,6 +78,16 @@
          "type": "integer",
          "minimum": 0,
          "description": "p9-fb-37: docs whose updated_at exceeds config.search.stale_threshold_days. 0 when threshold=0."
+        },
+        "code_lang_breakdown": {
+          "type": "object",
+          "description": "p10-1A-1: per-language code chunk count. Key = lowercase language name (e.g. 'rust', 'python'). Populated after 1A-2 lands; empty on markdown-only corpora.",
+          "additionalProperties": { "type": "integer", "minimum": 0 }
+        },
+        "repo_breakdown": {
+          "type": "object",
+          "description": "p10-1A-1: per-repo code chunk count. Key = repo name as detected by kebab-parse-code::repo. Empty on markdown-only corpora.",
+          "additionalProperties": { "type": "integer", "minimum": 0 }
        }
      }
    }
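Both breakdowns are per-key chunk counts, and both stay empty on markdown-only corpora because markdown chunks carry no `code_lang` / `repo`. A std-only sketch; `ChunkInfo` and `breakdowns` are hypothetical, and lowercasing at aggregation time is an assumption (the schema only says the language keys are lowercase).

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal chunk view: only the fields the breakdowns need.
pub struct ChunkInfo<'a> {
    pub code_lang: Option<&'a str>,
    pub repo: Option<&'a str>,
}

/// Build (code_lang_breakdown, repo_breakdown). Chunks without the field
/// (e.g. markdown) are not counted, so both maps stay empty on a
/// markdown-only corpus.
pub fn breakdowns(chunks: &[ChunkInfo<'_>]) -> (BTreeMap<String, u64>, BTreeMap<String, u64>) {
    let mut by_lang = BTreeMap::new();
    let mut by_repo = BTreeMap::new();
    for c in chunks {
        if let Some(lang) = c.code_lang {
            // Schema keys are lowercase language names.
            *by_lang.entry(lang.to_ascii_lowercase()).or_insert(0) += 1;
        }
        if let Some(repo) = c.repo {
            *by_repo.entry(repo.to_string()).or_insert(0) += 1;
        }
    }
    (by_lang, by_repo)
}
```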
--- a/docs/wire-schema/v1/search_hit.schema.json
+++ b/docs/wire-schema/v1/search_hit.schema.json
@@ -42,6 +42,8 @@
    "embedding_model": { "type": ["string", "null"] },
    "chunker_version": { "type": "string" },
    "indexed_at":      { "type": "string", "format": "date-time" },
-    "stale":           { "type": "boolean" }
+    "stale":           { "type": "boolean" },
+    "repo":            { "type": ["string", "null"] },
+    "code_lang":       { "type": ["string", "null"] }
  }
 }
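`repo` and `code_lang` are nullable in the schema, and the design doc says they are omitted when null, which is presumably what the `wire_search_hit_no_code_fields` regression test checks for markdown hits. With serde this is usually `#[serde(skip_serializing_if = "Option::is_none")]`; the std-only sketch below shows the same rule with a hypothetical `push_code_fields` helper and a deliberately minimal JSON string escaper.

```rust
/// Minimal JSON string encoder: escapes `"`, `\` and control characters
/// (enough for this sketch, not a full serializer).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: append the p10 code fields to a list of JSON object
/// members, omitting each key entirely when its value is `None`.
pub fn push_code_fields(members: &mut Vec<String>, repo: Option<&str>, code_lang: Option<&str>) {
    for (key, value) in [("repo", repo), ("code_lang", code_lang)] {
        if let Some(v) = value {
            members.push(format!("\"{key}\":{}", json_string(v)));
        }
    }
}
```

Omitting the keys (rather than emitting `null`) means a markdown hit serializes exactly as it did before the schema change.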
--- a/tasks/INDEX.md
+++ b/tasks/INDEX.md
@@ -35,6 +35,7 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.
 | P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
 | P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kebab-parse-audio | P5 |
 | P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |
+| P10 | [p10/INDEX.md](p10/INDEX.md) | Code ingest framework + AST chunkers | kebab-parse-code, kebab-source-fs (code walk) | P5 |

 ## Component task decomposition (per phase)

@@ -137,6 +138,15 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.
    - [p9-fb-41 multi-hop reasoning](p9/p9-fb-41-multi-hop-reasoning.md) — ⏳ not implemented, needs brainstorm (XL, eval infrastructure first)
    - [p9-fb-42 bulk multi-query + re-rank hint](p9/p9-fb-42-bulk-multi-query-rerank.md) — ✅ merged (2026-05-10) — bulk only, rerank hint deferred

+- P10 — [p10/](p10/) — code ingest (multi-task, sub-indexed in [p10/INDEX.md](p10/INDEX.md))
+  - [p10-1A-1 code ingest framework](p10/p10-1a-1-code-ingest-framework.md) — 🟡 in progress
+  - p10-1A-2 Rust AST chunker — ⏳
+  - p10-1B Python + TS/JS AST chunkers — ⏳
+  - p10-1C Go + Java + Kotlin AST chunkers — ⏳
+  - p10-1D C + C++ AST chunkers — ⏳
+  - p10-2 Tier 2 resource-aware — ⏳
+  - p10-3 Tier 3 paragraph + line-window fallback — ⏳
+
 ## Post-merge hotfixes

 Bugs found after merge and their follow-up PRs are recorded as a dated log in [HOTFIXES.md](HOTFIXES.md). The original task specs stay frozen; for post-merge behavior changes, HOTFIXES.md is the source of truth.
--- a/tasks/p10/INDEX.md
+++ b/tasks/p10/INDEX.md
@@ -0,0 +1,13 @@
+# Phase 10 — Code Ingest
+
+| ID | Subject | Status |
+|----|---------|--------|
+| 1A-1 | code ingest framework (wire schema, parse-code crate skeleton, filter flags, skip policy, config section) | 🟡 in progress |
+| 1A-2 | Rust AST chunker | ⏳ |
+| 1B | Python + TS/JS AST chunkers | ⏳ |
+| 1C | Go + Java + Kotlin AST chunkers | ⏳ |
+| 1D | C + C++ AST chunkers | ⏳ |
+| 2 | Tier 2 resource-aware (k8s / Dockerfile / manifest) | ⏳ |
+| 3 | Tier 3 paragraph + line-window fallback | ⏳ |
+
+Design: [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md)
--- a/tasks/p10/p10-1a-1-code-ingest-framework.md
+++ b/tasks/p10/p10-1a-1-code-ingest-framework.md
@@ -0,0 +1,31 @@
+# p10-1A-1 — code ingest framework
+
+**Status:** 🟡 in progress
+**Contract sections:** §2.1 (Citation `code` variant), §2.2 (SearchHit repo/code_lang), §2.4 (IngestReport skip counters), §2 schema.v1 (code_lang_breakdown + repo_breakdown), §3.6 (Metadata fields), §8 (kebab-parse-code crate boundary), §11 (code ingest no longer out of scope).
+**Design:** [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md) §1A-1.
+**Plan:** [2026-05-15-p10-1a-1-code-ingest-framework.md](../../docs/superpowers/plans/2026-05-15-p10-1a-1-code-ingest-framework.md).
+
+## Goal
+
+Land the *framework surface* for code ingest — wire schema (additive minor), CLI filter flags, ignore policy, skip policy infrastructure, `kebab-parse-code` crate skeleton, `[ingest.code]` config section — without enabling any code chunker. 1A-2 plugs in the Rust AST chunker on top.
+
+## Acceptance criteria
+
+- `cargo test --workspace --no-fail-fast -j 1` passes.
+- Regression tests (`wire_search_hit_no_code_fields`, `wire_citation_5_variants_unchanged`) pass, confirming that wire output for a markdown corpus is unchanged.
+- `cargo clippy --workspace --all-targets -- -D warnings` passes.
+- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` updated per design §10.1.
+- README + HANDOFF + SMOKE updated.
+
+## Allowed dependencies
+
+- `kebab-parse-code` may depend on `kebab-core`, `anyhow`, `gix`. NOT on store / embed / llm / rag / UI.
+- Source-fs may depend on `kebab-parse-code`.
+
+## Forbidden dependencies
+
+- UI crates (cli / mcp / tui) must NOT import `kebab-parse-code` directly.
+
+## Risks / notes
+
+- Honoring `.gitignore` changes existing behavior for markdown corpora whose files live in gitignored paths. The regression test covers the standard case (no overlap). If a user reports missing docs after 1A-1 lands, log it in HOTFIXES.
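The `.gitignore` handling behind this risk note is described in commits `abfdcbd31d` and `8bbe25dc10` below: a trailing-slash pattern such as `dist/` also emits a `dist/**` glob so files inside the directory match, and a negation such as `!keep/` must not come out as `!!keep/`. A minimal sketch of that normalization, assuming one `.gitignore` line in and a list of glob strings out. `normalize_gitignore_line` is a hypothetical name, not kebab's `read_gitignore`, and the sketch ignores escapes such as `\!` and `\#`.

```rust
/// Hypothetical sketch: turn one `.gitignore` line into glob patterns.
/// A trailing-slash directory pattern (`dist/`) is kept and additionally
/// expanded to `dist/**` so that files under the directory also match.
/// A leading `!` negation is carried through exactly once.
fn normalize_gitignore_line(line: &str) -> Vec<String> {
    // Trailing whitespace is not significant in .gitignore (escapes ignored here).
    let line = line.trim_end();
    // Blank lines and comments contribute no patterns.
    if line.is_empty() || line.starts_with('#') {
        return Vec::new();
    }
    // Split off the negation marker first, so the trailing-slash logic
    // below never sees it and can never produce a doubled `!!`.
    let (prefix, body) = match line.strip_prefix('!') {
        Some(rest) => ("!", rest),
        None => ("", line),
    };
    // Always keep the pattern itself.
    let mut out = vec![format!("{prefix}{body}")];
    // `dist/` -> additionally `dist/**` (files inside the directory).
    if let Some(stem) = body.strip_suffix('/') {
        if !stem.is_empty() {
            out.push(format!("{prefix}{stem}/**"));
        }
    }
    out
}
```

Splitting off the `!` before touching the trailing slash is what keeps a negation from being applied twice.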
Author	SHA1	Message	Date
th-kim0823	7bbd2c0cbf	docs(p10-1a-1): wire schema + frozen design + README/HANDOFF/SMOKE + task index Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:41:26 +09:00
th-kim0823	d13f58d28a	fix(p10-1a-1): patch wire.rs Stats fixture for new schema fields Task 16's new code_lang_breakdown / repo_breakdown fields broke the existing schema_wrapper_tags_schema_version test in wire.rs which constructs Stats { ... } literally. Use ..Default::default() since Stats now derives Default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:30:01 +09:00
th-kim0823	298f4adc81	feat(p10-1a-1): CLI filter flags + SchemaStats breakdowns + regression tests Task 13: add wire regression tests proving markdown SearchHit omits repo/code_lang when None, and all 5 original Citation variants serialize byte-identically without spurious Code-variant keys. Task 15: add --repo (repeatable) and --code-lang (repeatable, comma-separated) flags to `kebab search`; propagate both into SearchFilters instead of the previous vec![] stub. Add #[allow(clippy::large_enum_variant)] — Cmd is short-lived, boxing buys nothing. Task 16: add code_lang_breakdown and repo_breakdown BTreeMap fields to Stats (schema.v1); derive Default on Stats; populate both as empty in collect_stats (1A-2 fills them when code chunks land). Add unit test asserting both keys are present in the serialized object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:21:59 +09:00
th-kim0823	4e8b70a04b	feat(p10-1a-1): apply generated-header + size-cap skip per file Wire kebab_parse_code::is_generated_file and is_oversized into FsSourceConnector::scan_with_skips. Files that pass gitignore/builtin/ kebabignore matching are now checked for generated-file markers (config-gated via ingest.code.skip_generated_header) and byte/line caps (ingest.code.max_file_bytes / max_file_lines). FsScanSkips gains skipped_generated + skipped_size_exceeded counters; kebab-app threads them into IngestReport. Also fixes a pre-existing clippy::derivable_impls warning in IngestCfg. Three new connector tests cover all three paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:06:59 +09:00
th-kim0823	682f7dd3a2	feat(p10-1a-1): add [ingest.code] config section Add IngestCfg + IngestCodeCfg structs with serde defaults and embed ingest: IngestCfg into the top-level Config. Existing configs without an [ingest] section continue to load unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:53:21 +09:00
th-kim0823	40b3ea8408	chore(p10-1a-1): cleanup Task 11 review findings + sync Cargo.lock Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:50:55 +09:00
th-kim0823	9fce24b106	feat(p10-1a-1): wire IngestReport skip counters by category (gitignore/builtin/kebabignore) Refactor walker to expose WalkOverrides (combined + per-source matchers), add walk_files_with_skips that returns accepted files alongside skip attribution, wire FsSourceConnector::scan_with_skips into kebab-app so IngestReport.skipped_gitignore, skipped_kebabignore, skipped_builtin_blacklist, and skip_examples are populated instead of left at zero. Priority order per spec §5.2 (builtin > gitignore > kebabignore) enforced in classify_skip, with a directory-aware builtin matcher so pruned directory entries are correctly attributed to builtin rather than a coincident gitignore entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:42:28 +09:00
th-kim0823	8bbe25dc10	fix(p10-1a-1): guard .gitignore negation + sync doc comments Prevent double-`!` corruption when a `.gitignore` negation pattern (e.g. `!keep/`) hits the trailing-slash normalizer in `read_gitignore`. Also updates module-level and `build_overrides` doc to list all five filter sources in application order, and adds a regression test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:30:00 +09:00
th-kim0823	abfdcbd31d	feat(p10-1a-1): honor repo-root .gitignore in walker overrides Adds read_gitignore() (pub(crate), root-only, nested cascade deferred) and merges its patterns as a 5th group in build_overrides(). Trailing- slash patterns (dist/) are normalized to also emit a stem/** glob so files inside the directory are matched when is_dir=false. Two new tests cover both the happy path and the missing-file no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:25:19 +09:00
th-kim0823	69d1593bc5	feat(p10-1a-1): integrate built-in blacklist into walker overrides Wires `kebab_parse_code::BUILTIN_BLACKLIST` (6 patterns: node_modules, target, __pycache__, .venv, venv, env) into `build_overrides()` so the walker automatically excludes these directories even when the user has no `.kebabignore`. TDD cycle: 2 failing tests added first, then the pattern-add loop inserted after the existing kbignore block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:13:39 +09:00
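Commit `9fce24b106` attributes each skipped path to exactly one filter source in the priority order builtin > gitignore > kebabignore, using a directory-aware builtin matcher. The sketch below illustrates that ordering under simplifying assumptions: the builtin list is the six directory names from commit `69d1593bc5`, matched against path components, and the gitignore/kebabignore matchers are plain closures standing in for the real `WalkOverrides` matchers. `SkipReason`, `matches_builtin`, and this `classify_skip` signature are hypothetical.

```rust
use std::path::Path;

/// The six built-in blacklist directory names listed in commit 69d1593bc5.
/// (The real constant is `kebab_parse_code::BUILTIN_BLACKLIST`; its exact
/// type and glob form are not shown in the log, so this is an assumption.)
const BUILTIN_DIRS: [&str; 6] = ["node_modules", "target", "__pycache__", ".venv", "venv", "env"];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipReason {
    Builtin,
    Gitignore,
    Kebabignore,
}

/// Directory-aware builtin check: a path is blacklisted if any of its
/// components is a blacklisted directory name, so a file under a pruned
/// `node_modules/` is still attributed to the builtin list.
fn matches_builtin(rel: &Path) -> bool {
    rel.components()
        .any(|c| BUILTIN_DIRS.iter().any(|d| c.as_os_str() == *d))
}

/// Attribute a skipped path to a single source, checked in priority order
/// builtin > gitignore > kebabignore. The two closures are hypothetical
/// stand-ins for the real per-source matchers.
fn classify_skip(
    rel: &Path,
    gitignore: impl Fn(&Path) -> bool,
    kebabignore: impl Fn(&Path) -> bool,
) -> Option<SkipReason> {
    if matches_builtin(rel) {
        Some(SkipReason::Builtin)
    } else if gitignore(rel) {
        Some(SkipReason::Gitignore)
    } else if kebabignore(rel) {
        Some(SkipReason::Kebabignore)
    } else {
        None
    }
}
```

Checking builtin first means `web/node_modules/a.js` is counted under the builtin blacklist even when a `.gitignore` entry also covers it.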
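Commit `4e8b70a04b` adds two per-file checks after ignore matching: a generated-file header check gated by `ingest.code.skip_generated_header`, and byte/line caps from `ingest.code.max_file_bytes` / `max_file_lines`, counted as `skipped_generated` and `skipped_size_exceeded`. A rough sketch of those checks follows. The marker strings, the five-line header window, the order of the two checks, and every signature here are assumptions, not the actual behavior of `kebab_parse_code::is_generated_file` / `is_oversized`.

```rust
/// Assumed markers and header window; kebab's real values are not in the log.
const GENERATED_MARKERS: [&str; 2] = ["@generated", "DO NOT EDIT"];
const HEADER_LINES: usize = 5;

/// True if any of the first few lines carries a generated-file marker.
fn is_generated_file(text: &str) -> bool {
    text.lines()
        .take(HEADER_LINES)
        .any(|line| GENERATED_MARKERS.iter().any(|m| line.contains(m)))
}

/// True if the file exceeds either the byte cap or the line cap.
fn is_oversized(text: &str, max_file_bytes: usize, max_file_lines: usize) -> bool {
    text.len() > max_file_bytes || text.lines().count() > max_file_lines
}

/// Which per-file skip applies, mirroring the two new skip counters.
#[derive(Debug, PartialEq, Eq)]
enum FileSkip {
    Generated,
    SizeExceeded,
}

/// Hypothetical combined check; the generated check only runs when the
/// config flag is on, and is (by assumption) evaluated before the size caps.
fn check_file(
    text: &str,
    skip_generated_header: bool,
    max_file_bytes: usize,
    max_file_lines: usize,
) -> Option<FileSkip> {
    if skip_generated_header && is_generated_file(text) {
        Some(FileSkip::Generated)
    } else if is_oversized(text, max_file_bytes, max_file_lines) {
        Some(FileSkip::SizeExceeded)
    } else {
        None
    }
}
```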