Merge pull request 'feat(config): config.toml v2→v3 schema reorganization: media settings unified under [ingest.*] + lossless automatic migration' (#207) from feat/config-schema-reorg into main

Reviewed-on: #207
refactor(config): address PR #207 review round 1: from_file parses toml::Value only once
2026-06-04 14:36:41 +00:00 · 2026-06-04 13:33:07 +00:00 · 2026-06-04 13:12:01 +00:00 · 2026-06-04 13:03:07 +00:00 · 2026-06-04 12:56:25 +00:00 · 2026-06-04 12:48:52 +00:00
34 changed files with 2553 additions and 315 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -4751,7 +4751,7 @@ dependencies = [

 [[package]]
 name = "kebab-app"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "base64 0.22.1",
@@ -4799,7 +4799,7 @@ dependencies = [

 [[package]]
 name = "kebab-chunk"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4817,7 +4817,7 @@ dependencies = [

 [[package]]
 name = "kebab-cli"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "clap",
@@ -4838,7 +4838,7 @@ dependencies = [

 [[package]]
 name = "kebab-config"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "dirs 5.0.1",
@@ -4854,7 +4854,7 @@ dependencies = [

 [[package]]
 name = "kebab-core"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4868,7 +4868,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4882,7 +4882,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed-candle"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "candle-core",
@@ -4902,7 +4902,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed-local"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "fastembed",
@@ -4915,7 +4915,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed-ollama"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "kebab-config",
@@ -4930,7 +4930,7 @@ dependencies = [

 [[package]]
 name = "kebab-eval"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "kebab-app",
@@ -4949,7 +4949,7 @@ dependencies = [

 [[package]]
 name = "kebab-llm"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -4958,7 +4958,7 @@ dependencies = [

 [[package]]
 name = "kebab-llm-local"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "kebab-config",
@@ -4975,7 +4975,7 @@ dependencies = [

 [[package]]
 name = "kebab-mcp"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "kebab-app",
@@ -4993,7 +4993,7 @@ dependencies = [

 [[package]]
 name = "kebab-nli"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "hf-hub",
@@ -5008,7 +5008,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-code"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "gix",
@@ -5031,7 +5031,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-image"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "ab_glyph",
 "anyhow",
@@ -5059,7 +5059,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-md"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -5076,7 +5076,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-pdf"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -5091,7 +5091,7 @@ dependencies = [

 [[package]]
 name = "kebab-rag"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -5113,7 +5113,7 @@ dependencies = [

 [[package]]
 name = "kebab-search"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "globset",
@@ -5132,7 +5132,7 @@ dependencies = [

 [[package]]
 name = "kebab-source-fs"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -5150,7 +5150,7 @@ dependencies = [

 [[package]]
 name = "kebab-store-sqlite"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "blake3",
@@ -5170,7 +5170,7 @@ dependencies = [

 [[package]]
 name = "kebab-store-vector"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "arrow",
@@ -5194,7 +5194,7 @@ dependencies = [

 [[package]]
 name = "kebab-tui"
-version = "0.27.0"
+version = "0.28.0"
 dependencies = [
 "anyhow",
 "crossterm",
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -32,7 +32,7 @@ edition       = "2024"
 rust-version  = "1.85"
 license       = "MIT OR Apache-2.0"
 repository    = "https://github.com/altair823/kebab"
-version       = "0.27.0"   # v0.27.0: PP-OCRv5 ONNX Rust-native OCR engine: `[image.ocr] engine = "paddle-onnx"` (default is still "ollama-vision") runs detection + recognition in-process (`ort` =2.0.0-rc.9, zero Python runtime). DBNet det + CTC rec; post-processing (min-area rect/unclip) is pure Rust. e2e CER 0.005 (synthetic Korean/English, better than the PoC's 0.024); large pages take <4 s on CPU (vs. ~50 s for Ollama vision). New config keys `det_model`/`rec_model`/`dict`/`score_thresh`/`unclip_ratio`/`max_boxes` + `KEBAB_IMAGE_OCR_*` env. The ingest signature `|ocr:1:{engine}:{engine_version}` triggers automatic re-indexing when the engine or model changes. New interface (engine values/config keys) → minor. See CLAUDE.md §Release
+version       = "0.28.0"   # v0.28.0: config schema v2→v3 reorganization: media-format settings are unified under the `[ingest.*]` umbrella (`[indexing]`→`[ingest]` scalars, `[chunking]`/`[image.ocr]`/`[image.caption]`/`[pdf.ocr]`→`[ingest.*]`). Existing v2 files are converted in memory at load time (the file on disk is untouched); to update the file, run `kebab config migrate` (values and comments preserved). Env names (LHS) are 100% preserved and only the RHS points at the new paths; new `KEBAB_PDF_OCR_{DET_MODEL,REC_MODEL,DICT,SCORE_THRESH,UNCLIP_RATIO,MAX_BOXES}`. `ingest_config_signature` is byte-identical (zero re-indexing). PdfOcrCfg gains the symmetric paddle keys. New interface (config layout rename + env additions) → minor. See CLAUDE.md §Release

 # pre-v0.18 workspace-wide cleanup: enable clippy::pedantic group with
 # intentional allow-list. The allowed lints are either cosmetic (doc style),
--- a/README.md
+++ b/README.md
@@ -68,7 +68,7 @@ rsync -a <src-data_dir>/lancedb/      user@server:<dst-data_dir>/lancedb/

 ### Multimedia indexing

-Markdown · PDF · images (OCR + caption) · source code (Rust/Python/TS/JS/Go/Java/Kotlin/C/C++ AST) · resources (YAML/Dockerfile/TOML/JSON/XML, etc.) are routed automatically to the appropriate chunker by file extension. Scanned PDFs with no embedded text get page-level OCR via `[pdf.ocr]` (opt-in). The full extension→chunker mapping is in [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
+Markdown · PDF · images (OCR + caption) · source code (Rust/Python/TS/JS/Go/Java/Kotlin/C/C++ AST) · resources (YAML/Dockerfile/TOML/JSON/XML, etc.) are routed automatically to the appropriate chunker by file extension. Scanned PDFs with no embedded text get page-level OCR via `[ingest.pdf.ocr]` (opt-in). The full extension→chunker mapping is in [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).

 ### RAG (grounded citations + refusal)

@@ -182,10 +182,11 @@ prompt_template_version = "rag-v3"   # answer language = question language. rag-v1/v2
 nli_threshold = 0.0                  # >0 (e.g. 0.5) enables mDeBERTa XNLI groundedness verification.
 ```

+- **`[ingest]`** (v0.28.0): the umbrella for the ingest settings of every format. Parallelism (`max_parallel_extractors`/`max_parallel_embeddings`/`watch_filesystem`, ← formerly `[indexing]`) and the per-format subsections (`[ingest.chunking]` ← formerly `[chunking]`, `[ingest.code]`, `[ingest.image.ocr]` ← formerly `[image.ocr]`, `[ingest.pdf.ocr]` ← formerly `[pdf.ocr]`) all live under it. An existing v2 `config.toml` can be left as-is; it is converted in memory at load time. To update the file to the new layout, run `kebab config migrate` (values and comments preserved).
 - **Derived-artifact cache**: embedding results are cached automatically by content hash (see "Core features" above). No settings.
 - **`[ingest.code]`**: skip policy for code ingest (`skip_generated_header`, `max_file_bytes`, `extra_skip_globs`). `.gitignore` is honored automatically; `.kebabignore` is an additional layer.
-- **`[image.ocr]`**: image OCR (default off / opt-in). `engine` selects the backend: `"ollama-vision"` (default, remote vision LM) or `"paddle-onnx"` (new in v0.27.0; runs PP-OCRv5 ONNX in-process, no Python runtime required, large pages <4 s on CPU, offline). `paddle-onnx` uses the models bundled in the workspace; override the paths with `det_model`/`rec_model`/`dict` and tune detection with `score_thresh` (0.3)/`unclip_ratio` (1.5)/`max_boxes` (1000) (`KEBAB_IMAGE_OCR_*` env supported likewise). Changing the engine or model automatically re-indexes the affected images.
-- **`[pdf.ocr]`**: page-level OCR for scanned PDFs (default off / opt-in, costs roughly tens of seconds per page). `engine` selects `"ollama-vision"`/`"paddle-onnx"`, same as `[image.ocr]`. After enabling it, reprocess anything indexed back in v0.19 with `kebab ingest --force-reingest`.
+- **`[ingest.image.ocr]`**: image OCR (default off / opt-in). `engine` selects the backend: `"ollama-vision"` (default, remote vision LM) or `"paddle-onnx"` (runs PP-OCRv5 ONNX in-process, no Python runtime required, large pages <4 s on CPU, offline). `paddle-onnx` uses the models bundled in the workspace; override the paths with `det_model`/`rec_model`/`dict` and tune detection with `score_thresh` (0.3)/`unclip_ratio` (1.5)/`max_boxes` (1000) (`KEBAB_IMAGE_OCR_*` env supported likewise; env names are unchanged in v3). Changing the engine or model automatically re-indexes the affected images.
+- **`[ingest.pdf.ocr]`**: page-level OCR for scanned PDFs (default off / opt-in, costs roughly tens of seconds per page). `engine` selects `"ollama-vision"`/`"paddle-onnx"`, same as `[ingest.image.ocr]`. In v3 the PDF section can carry its own paddle model path keys (`det_model`/`rec_model`/`dict`/`score_thresh`/`unclip_ratio`/`max_boxes`; `KEBAB_PDF_OCR_*` env likewise). After enabling it, reprocess previously indexed content with `kebab ingest --force-reingest`.
 - **`--config <path>`**: for temporary workspaces / isolated tests (honored by both CLI and TUI).
 - **`kebab config migrate`**: fills config sections added in newer versions into an existing `config.toml`, with explanatory comments (user-edited values, comments, and ordering are preserved; idempotent; an automatic `.bak` backup is written when anything changes). `--dry-run` previews the changes. `kebab doctor` tells you when an update is needed. A config.toml newly generated by `kebab init` also includes per-section comments.
 - **`KEBAB_*` env**: overrides for some keys (`KEBAB_RAG_SCORE_GATE`, `KEBAB_EVAL_GOLDEN`, etc.).
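
Reviewer note on the README hunk above: the v2→v3 section rename it describes can be summarized as a small lookup. The sketch below is only an illustration of that mapping, not the migration code from this PR; `v3_section` is a hypothetical helper name, and treating every section not named in the release note (for example `search`) as unchanged is an assumption.

```rust
/// Hypothetical helper (not part of the PR): maps a v2 section name to the
/// v3 path listed in the v0.28.0 release note. Returns `None` for sections
/// the note does not mention, which this sketch assumes are left unchanged.
fn v3_section(v2: &str) -> Option<&'static str> {
    match v2 {
        "indexing" => Some("ingest"),
        "chunking" => Some("ingest.chunking"),
        "image.ocr" => Some("ingest.image.ocr"),
        "image.caption" => Some("ingest.image.caption"),
        "pdf.ocr" => Some("ingest.pdf.ocr"),
        _ => None,
    }
}

fn main() {
    // Print the rename table for a few v2 section names.
    for old in ["indexing", "chunking", "image.ocr", "image.caption", "pdf.ocr", "search"] {
        match v3_section(old) {
            Some(new) => println!("[{old}] -> [{new}]"),
            None => println!("[{old}] unchanged"),
        }
    }
}
```

Only section paths move; per the release note, the `KEBAB_*` env names stay the same and point at the new paths.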
--- a/crates/kebab-app/src/app.rs
+++ b/crates/kebab-app/src/app.rs
@@ -924,7 +924,7 @@ impl App {
            k: u32::try_from(query.k).unwrap_or(u32::MAX),
            snippet_chars: u32::try_from(self.config.search.snippet_chars).unwrap_or(u32::MAX),
            embedding_version,
-            chunker_version: self.config.chunking.chunker_version.clone(),
+            chunker_version: self.config.ingest.chunking.chunker_version.clone(),
            corpus_revision: self.sqlite.corpus_revision(),
        })
    }
@@ -1025,7 +1025,7 @@ impl App {
 fn lexical_index_version(config: &kebab_config::Config) -> IndexVersion {
    IndexVersion(format!(
        "lex:{}:fts5-v009-korean-morphological",
-        config.chunking.chunker_version
+        config.ingest.chunking.chunker_version
    ))
 }

--- a/crates/kebab-app/src/lib.rs
+++ b/crates/kebab-app/src/lib.rs
@@ -54,7 +54,7 @@ use kebab_core::{
 use kebab_llm_local::OllamaLanguageModel;
 use kebab_parse_image::{
    OLLAMA_VISION_ENGINE, OcrEngine, OllamaVisionOcr, OnnxPaddleOcr, PADDLE_ONNX_ENGINE,
-    apply_caption, apply_ocr, engine_version_for_config,
+    apply_caption, apply_ocr, engine_version_for_paths,
 };
 use kebab_parse_md::{BodyHints, build_canonical_document, parse_blocks, parse_frontmatter};
 use kebab_source_fs::FsSourceConnector;
@@ -360,12 +360,12 @@ pub fn ingest_with_config_opts(
    // loop is correct and cheap. Construction failure (e.g. invalid
    // endpoint) aborts ingest fail-fast — better than silently disabling
    // OCR/caption mid-run.
-    let ocr_engine: Option<Box<dyn OcrEngine>> = if app.config.image.ocr.enabled {
+    let ocr_engine: Option<Box<dyn OcrEngine>> = if app.config.ingest.image.ocr.enabled {
        Some(build_image_ocr_engine(&app.config).context("kb-app::ingest: build image OCR engine")?)
    } else {
        None
    };
-    let caption_llm: Option<Box<dyn LanguageModel>> = if app.config.image.caption.enabled {
+    let caption_llm: Option<Box<dyn LanguageModel>> = if app.config.ingest.image.caption.enabled {
        Some(Box::new(OllamaLanguageModel::new(&app.config).context(
            "kb-app::ingest: build OllamaLanguageModel for caption",
        )?))
@@ -380,7 +380,7 @@ pub fn ingest_with_config_opts(
    // p10 / v0.20 sub-item 1: PDF OCR engine eager init (H-5 resolution).
    // Mirrors the image OCR pattern: built once per ingest, fallible → fail-fast.
    let pdf_ocr_engine: Option<Box<dyn OcrEngine>> =
-        if app.config.pdf.ocr.enabled || app.config.pdf.ocr.always_on {
+        if app.config.ingest.pdf.ocr.enabled || app.config.ingest.pdf.ocr.always_on {
            Some(
                build_pdf_ocr_engine(&app.config)
                    .context("kb-app::ingest: build pdf OCR engine")?,
@@ -825,7 +825,7 @@ fn mint_ingest_run_id(scope_json: &str, at: time::OffsetDateTime) -> String {
 type SqliteStoreAlias = kebab_store_sqlite::SqliteStore;

 /// v0.27.0 (T8): build the image OCR engine selected by
-/// `config.image.ocr.engine`. Returns a boxed trait object so the ingest
+/// `config.ingest.image.ocr.engine`. Returns a boxed trait object so the ingest
 /// pipeline is engine-agnostic. Construction is fail-fast (model load /
 /// hash / endpoint validation) — mirrors the prior concrete-type behaviour.
 ///
@@ -835,7 +835,7 @@ type SqliteStoreAlias = kebab_store_sqlite::SqliteStore;
 fn build_image_ocr_engine(
    config: &kebab_config::Config,
 ) -> anyhow::Result<Box<dyn OcrEngine>> {
-    match config.image.ocr.engine.as_str() {
+    match config.ingest.image.ocr.engine.as_str() {
        OLLAMA_VISION_ENGINE => Ok(Box::new(
            OllamaVisionOcr::new(config).context("build OllamaVisionOcr")?,
        )),
@@ -850,7 +850,7 @@ fn build_image_ocr_engine(
 }

 /// v0.27.0 (T8): build the PDF OCR engine selected by
-/// `config.pdf.ocr.engine`. The ollama-vision arm uses the PDF-specific
+/// `config.ingest.pdf.ocr.engine`. The ollama-vision arm uses the PDF-specific
 /// `model` / `languages` / `max_pixels` / `request_timeout_secs` knobs (and
 /// endpoint fallback to `models.llm.endpoint`). The paddle-onnx arm shares
 /// the same bundled ONNX models as image OCR (resolved from `image.ocr`
@@ -869,9 +869,9 @@ fn build_image_ocr_engine(
 fn build_pdf_ocr_engine(
    config: &kebab_config::Config,
 ) -> anyhow::Result<Box<dyn OcrEngine>> {
-    match config.pdf.ocr.engine.as_str() {
+    match config.ingest.pdf.ocr.engine.as_str() {
        OLLAMA_VISION_ENGINE => {
-            let cfg = &config.pdf.ocr;
+            let cfg = &config.ingest.pdf.ocr;
            let endpoint = match cfg.endpoint.as_deref() {
                Some(s) if !s.is_empty() => s.to_string(),
                _ => config.models.llm.endpoint.clone(),
@@ -2144,7 +2144,7 @@ fn sweep_deleted_files(
 ///   asset rollback on embed-fail is a P+ task).
 ///
 /// `chunker_version` is hard-coded to `pdf-page-v1` (HOTFIXES entry —
-/// `config.chunking.chunker_version` is single-valued today and serves
+/// `config.ingest.chunking.chunker_version` is single-valued today and serves
 /// the markdown path; per-medium config split is a P+ chunker registry
 /// task).
 #[allow(clippy::too_many_arguments)]
@@ -2229,15 +2229,15 @@ fn ingest_one_pdf_asset(
    // v0.20 sub-item 1: post-extract OCR enrichment (preserves the PR #187
    // registry dispatch invariant: extract_for is the normal entry).
    let (pdf_ocr_pages, pdf_ocr_ms_total): (Option<u32>, Option<u64>) =
-        if app.config.pdf.ocr.enabled || app.config.pdf.ocr.always_on {
+        if app.config.ingest.pdf.ocr.enabled || app.config.ingest.pdf.ocr.always_on {
            match pdf_ocr_engine {
                Some(engine) => {
                    let ocr_opts = crate::pdf_ocr_apply::PdfOcrOpts {
-                        enabled: app.config.pdf.ocr.enabled || app.config.pdf.ocr.always_on,
-                        always_on: app.config.pdf.ocr.always_on,
-                        valid_ratio_threshold: app.config.pdf.ocr.valid_ratio_threshold,
-                        min_char_count: app.config.pdf.ocr.min_char_count,
-                        lang_hint: app.config.pdf.ocr.lang_hint.clone().map(kebab_core::Lang),
+                        enabled: app.config.ingest.pdf.ocr.enabled || app.config.ingest.pdf.ocr.always_on,
+                        always_on: app.config.ingest.pdf.ocr.always_on,
+                        valid_ratio_threshold: app.config.ingest.pdf.ocr.valid_ratio_threshold,
+                        min_char_count: app.config.ingest.pdf.ocr.min_char_count,
+                        lang_hint: app.config.ingest.pdf.ocr.lang_hint.clone().map(kebab_core::Lang),
                        cancel: cancel.cloned(),
                    };
                    // v0.20.x Hook 2: pre-clone Arcs for capture by OCR closure.
@@ -2356,7 +2356,7 @@ fn ingest_one_pdf_asset(
        };

    // Per-medium chunker selection: PDF docs always use pdf-page-v1
-    // regardless of `config.chunking.chunker_version`. The chunker
+    // regardless of `config.ingest.chunking.chunker_version`. The chunker
    // validates every block carries `SourceSpan::Page`; failure here
    // means the parser drifted from its contract.
    let chunker = PdfPageV1Chunker;
@@ -3056,10 +3056,10 @@ fn build_body_hints(asset: &RawAsset) -> BodyHints {
 /// Build a `ChunkPolicy` from the active config.
 fn chunk_policy_from_config(config: &kebab_config::Config) -> ChunkPolicy {
    ChunkPolicy {
-        target_tokens: config.chunking.target_tokens,
-        overlap_tokens: config.chunking.overlap_tokens,
-        respect_markdown_headings: config.chunking.respect_markdown_headings,
-        chunker_version: ChunkerVersion(config.chunking.chunker_version.clone()),
+        target_tokens: config.ingest.chunking.target_tokens,
+        overlap_tokens: config.ingest.chunking.overlap_tokens,
+        respect_markdown_headings: config.ingest.chunking.respect_markdown_headings,
+        chunker_version: ChunkerVersion(config.ingest.chunking.chunker_version.clone()),
    }
 }

@@ -3090,21 +3090,31 @@ static PADDLE_OCR_VERSION_MEMO: std::sync::OnceLock<
    std::sync::Mutex<std::collections::HashMap<String, String>>,
 > = std::sync::OnceLock::new();

-/// T9: resolve the OCR `engine_version` string used inside the ingest config
+/// T9/v3: resolve the OCR `engine_version` string used inside the ingest config
 /// signature. ollama-vision is self-describing from `engine/model` (cheap, no
 /// I/O). paddle-onnx hashes the bundled/override model assets (memoized).
-fn ocr_engine_version_for_sig(config: &kebab_config::Config, engine: &str, model: &str) -> String {
+///
+/// v3: the caller passes the paddle paths (det/rec/dict) **per medium**: image from
+/// `[ingest.image.ocr]`, pdf from `[ingest.pdf.ocr]`. This removes the v2 asymmetry where
+/// pdf borrowed image's paddle settings. The migration (T5) fills the symmetric pdf keys
+/// with the image values, so the unconverted v2 → v3 signature stays byte-identical.
+fn ocr_engine_version_for_sig(
+    engine: &str,
+    model: &str,
+    det: Option<&str>,
+    rec: Option<&str>,
+    dict: Option<&str>,
+) -> String {
    if engine != PADDLE_ONNX_ENGINE {
        // ollama-vision (and any non-paddle engine): the daemon exposes no
        // stable per-model revision, so engine/model is the identity.
        return format!("ollama/{model}");
    }
-    let ocr = &config.image.ocr;
    let key = format!(
        "{}|{}|{}",
-        ocr.det_model.as_deref().unwrap_or("<bundled>"),
-        ocr.rec_model.as_deref().unwrap_or("<bundled>"),
-        ocr.dict.as_deref().unwrap_or("<bundled>"),
+        det.unwrap_or("<bundled>"),
+        rec.unwrap_or("<bundled>"),
+        dict.unwrap_or("<bundled>"),
    );
    let memo = PADDLE_OCR_VERSION_MEMO.get_or_init(|| std::sync::Mutex::new(std::collections::HashMap::new()));
    if let Some(v) = memo.lock().unwrap().get(&key) {
@@ -3114,7 +3124,7 @@ fn ocr_engine_version_for_sig(config: &kebab_config::Config, engine: &str, model
    // ingest the engine was already built (fail-fast) so the assets are
    // present and this succeeds; the path-derived identity below is an
    // unreachable-in-practice guard that keeps the signature total.
-    let version = engine_version_for_config(config).unwrap_or_else(|e| {
+    let version = engine_version_for_paths(det, rec, dict).unwrap_or_else(|e| {
        tracing::warn!(
            target: "kebab-app::ingest",
            error = %e,
@@ -3126,11 +3136,19 @@ fn ocr_engine_version_for_sig(config: &kebab_config::Config, engine: &str, model
    version
 }

+/// v3: test seam for the signature byte-stability golden. `ingest_config_signature`
+/// is private, so integration tests cannot call it directly. It is value-based, so even
+/// when struct paths change (media ingest unification) the output must match v2 byte for byte.
+#[doc(hidden)]
+pub fn test_ingest_config_signature(c: &kebab_config::Config, m: &MediaType) -> String {
+    ingest_config_signature(c, m)
+}
+
 fn ingest_config_signature(config: &kebab_config::Config, media: &MediaType) -> String {
    // Common (every media type): chunking parameters that move chunk
    // boundaries. `target_tokens` / `overlap_tokens` change re-chunking for
    // markdown / image / pdf / code alike, so a change re-indexes all types.
-    let c = &config.chunking;
+    let c = &config.ingest.chunking;
    let mut sig = format!(
        "chunk:{}:{}:{}:{}",
        c.target_tokens, c.overlap_tokens, c.respect_markdown_headings, c.chunker_version
@@ -3140,7 +3158,7 @@ fn ingest_config_signature(config: &kebab_config::Config, media: &MediaType) ->
            // OCR / caption only affect output when their `enabled` flag is
            // on; the model / prompt version matters only then. Off ↔ off is
            // a stable empty token so re-running the same config skips.
-            let ocr = &config.image.ocr;
+            let ocr = &config.ingest.image.ocr;
            if ocr.enabled {
                // v0.27.0 (T9): engine + engine_version so switching engine
                // (ollama-vision ↔ paddle-onnx) OR changing the model/assets
@@ -3148,12 +3166,18 @@ fn ingest_config_signature(config: &kebab_config::Config, media: &MediaType) ->
                sig.push_str(&format!(
                    "|ocr:1:{}:{}",
                    ocr.engine,
-                    ocr_engine_version_for_sig(config, &ocr.engine, &ocr.model)
+                    ocr_engine_version_for_sig(
+                        &ocr.engine,
+                        &ocr.model,
+                        ocr.det_model.as_deref(),
+                        ocr.rec_model.as_deref(),
+                        ocr.dict.as_deref(),
+                    )
                ));
            } else {
                sig.push_str("|ocr:0");
            }
-            let cap = &config.image.caption;
+            let cap = &config.ingest.image.caption;
            if cap.enabled {
                sig.push_str(&format!("|cap:1:{}", cap.prompt_template_version));
            } else {
@@ -3163,7 +3187,7 @@ fn ingest_config_signature(config: &kebab_config::Config, media: &MediaType) ->
        MediaType::Pdf => {
            // PDF OCR is active when EITHER `enabled` or `always_on` is set
            // (mirrors the ingest gate). `model` only matters when active.
-            let ocr = &config.pdf.ocr;
+            let ocr = &config.ingest.pdf.ocr;
            if ocr.enabled || ocr.always_on {
                // v0.27.0 (T9): engine + engine_version (same cascade rule as
                // image OCR above) alongside the enabled/always_on gate.
@@ -3172,7 +3196,13 @@ fn ingest_config_signature(config: &kebab_config::Config, media: &MediaType) ->
                    ocr.enabled,
                    ocr.always_on,
                    ocr.engine,
-                    ocr_engine_version_for_sig(config, &ocr.engine, &ocr.model)
+                    ocr_engine_version_for_sig(
+                        &ocr.engine,
+                        &ocr.model,
+                        ocr.det_model.as_deref(),
+                        ocr.rec_model.as_deref(),
+                        ocr.dict.as_deref(),
+                    )
                ));
            } else {
                sig.push_str("|pdfocr:0");
@@ -3739,7 +3769,7 @@ mod ingest_config_signature_tests {
    fn chunking_change_invalidates_all_types() {
        let base = Config::defaults();
        let mut bumped = base.clone();
-        bumped.chunking.target_tokens += 100;
+        bumped.ingest.chunking.target_tokens += 100;
        for m in [md(), img(), pdf(), code()] {
            assert_ne!(
                ingest_config_signature(&base, &m),
@@ -3749,14 +3779,14 @@ mod ingest_config_signature_tests {
        }

        let mut overlap = base.clone();
-        overlap.chunking.overlap_tokens += 10;
+        overlap.ingest.chunking.overlap_tokens += 10;
        assert_ne!(
            ingest_config_signature(&base, &md()),
            ingest_config_signature(&overlap, &md())
        );

        let mut headings = base.clone();
-        headings.chunking.respect_markdown_headings = !base.chunking.respect_markdown_headings;
+        headings.ingest.chunking.respect_markdown_headings = !base.ingest.chunking.respect_markdown_headings;
        assert_ne!(
            ingest_config_signature(&base, &md()),
            ingest_config_signature(&headings, &md())
@@ -3768,9 +3798,9 @@ mod ingest_config_signature_tests {
    #[test]
    fn image_ocr_toggle_invalidates_image_only() {
        let base = Config::defaults();
-        assert!(!base.image.ocr.enabled, "default OCR is off");
+        assert!(!base.ingest.image.ocr.enabled, "default OCR is off");
        let mut on = base.clone();
-        on.image.ocr.enabled = true;
+        on.ingest.image.ocr.enabled = true;

        assert_ne!(
            ingest_config_signature(&base, &img()),
@@ -3792,16 +3822,16 @@ mod ingest_config_signature_tests {
    fn image_ocr_model_matters_only_when_enabled() {
        let mut off_a = Config::defaults();
        let mut off_b = off_a.clone();
-        off_b.image.ocr.model = "some-other-model".to_string();
+        off_b.ingest.image.ocr.model = "some-other-model".to_string();
        assert_eq!(
            ingest_config_signature(&off_a, &img()),
            ingest_config_signature(&off_b, &img()),
            "OCR model is irrelevant while OCR is off"
        );

-        off_a.image.ocr.enabled = true;
+        off_a.ingest.image.ocr.enabled = true;
        let mut on_b = off_a.clone();
-        on_b.image.ocr.model = "some-other-model".to_string();
+        on_b.ingest.image.ocr.model = "some-other-model".to_string();
        assert_ne!(
            ingest_config_signature(&off_a, &img()),
            ingest_config_signature(&on_b, &img()),
@@ -3814,14 +3844,14 @@ mod ingest_config_signature_tests {
    fn image_caption_toggle_and_prompt_invalidate_image() {
        let base = Config::defaults();
        let mut on = base.clone();
-        on.image.caption.enabled = true;
+        on.ingest.image.caption.enabled = true;
        assert_ne!(
            ingest_config_signature(&base, &img()),
            ingest_config_signature(&on, &img())
        );

        let mut prompt = on.clone();
-        prompt.image.caption.prompt_template_version = "caption-v9".to_string();
+        prompt.ingest.image.caption.prompt_template_version = "caption-v9".to_string();
        assert_ne!(
            ingest_config_signature(&on, &img()),
            ingest_config_signature(&prompt, &img()),
@@ -3835,7 +3865,7 @@ mod ingest_config_signature_tests {
    fn pdf_ocr_toggle_invalidates_pdf_only() {
        let base = Config::defaults();
        let mut enabled = base.clone();
-        enabled.pdf.ocr.enabled = true;
+        enabled.ingest.pdf.ocr.enabled = true;
        assert_ne!(
            ingest_config_signature(&base, &pdf()),
            ingest_config_signature(&enabled, &pdf()),
@@ -3843,7 +3873,7 @@ mod ingest_config_signature_tests {
        );

        let mut always = base.clone();
-        always.pdf.ocr.always_on = true;
+        always.ingest.pdf.ocr.always_on = true;
        assert_ne!(
            ingest_config_signature(&base, &pdf()),
            ingest_config_signature(&always, &pdf()),
@@ -3921,13 +3951,13 @@ mod ingest_config_signature_tests {
        // ui
        other.ui.theme = "light".to_string();
        // image runtime-only (non-output) knobs
-        other.image.ocr.max_pixels += 100;
-        other.image.ocr.languages.push("jpn".to_string());
-        other.image.ocr.request_timeout_secs += 10;
+        other.ingest.image.ocr.max_pixels += 100;
+        other.ingest.image.ocr.languages.push("jpn".to_string());
+        other.ingest.image.ocr.request_timeout_secs += 10;
        // pdf runtime-only knobs
-        other.pdf.ocr.max_pixels += 100;
-        other.pdf.ocr.request_timeout_secs += 10;
-        other.pdf.ocr.languages.push("jpn".to_string());
+        other.ingest.pdf.ocr.max_pixels += 100;
+        other.ingest.pdf.ocr.request_timeout_secs += 10;
+        other.ingest.pdf.ocr.languages.push("jpn".to_string());

        for m in [md(), img(), pdf(), code()] {
            assert_eq!(
@@ -3946,10 +3976,10 @@ mod ingest_config_signature_tests {
    #[test]
    fn image_ocr_engine_switch_invalidates_image() {
        let mut ollama = Config::defaults();
-        ollama.image.ocr.enabled = true;
+        ollama.ingest.image.ocr.enabled = true;
        // same `model` string on both — only the engine differs
        let mut paddle = ollama.clone();
-        paddle.image.ocr.engine = "paddle-onnx".to_string();
+        paddle.ingest.image.ocr.engine = "paddle-onnx".to_string();
        assert_ne!(
            ingest_config_signature(&ollama, &img()),
            ingest_config_signature(&paddle, &img()),
@@ -3962,10 +3992,10 @@ mod ingest_config_signature_tests {
    #[test]
    fn image_ocr_engine_version_change_invalidates_image() {
        let mut a = Config::defaults();
-        a.image.ocr.enabled = true;
-        a.image.ocr.model = "gemma4:e4b".to_string();
+        a.ingest.image.ocr.enabled = true;
+        a.ingest.image.ocr.model = "gemma4:e4b".to_string();
        let mut b = a.clone();
-        b.image.ocr.model = "qwen2.5vl:3b".to_string();
+        b.ingest.image.ocr.model = "qwen2.5vl:3b".to_string();
        assert_ne!(
            ingest_config_signature(&a, &img()),
            ingest_config_signature(&b, &img()),
@@ -3978,10 +4008,10 @@ mod ingest_config_signature_tests {
    #[test]
    fn image_ocr_paddle_model_path_change_invalidates_image() {
        let mut base = Config::defaults();
-        base.image.ocr.enabled = true;
-        base.image.ocr.engine = "paddle-onnx".to_string();
+        base.ingest.image.ocr.enabled = true;
+        base.ingest.image.ocr.engine = "paddle-onnx".to_string();
        let mut overridden = base.clone();
-        overridden.image.ocr.det_model = Some("/some/other/det.onnx".to_string());
+        overridden.ingest.image.ocr.det_model = Some("/some/other/det.onnx".to_string());
        assert_ne!(
            ingest_config_signature(&base, &img()),
            ingest_config_signature(&overridden, &img()),
@@ -3994,11 +4024,11 @@ mod ingest_config_signature_tests {
    #[test]
    fn paddle_image_signature_stable_for_unrelated_change() {
        let mut base = Config::defaults();
-        base.image.ocr.enabled = true;
-        base.image.ocr.engine = "paddle-onnx".to_string();
+        base.ingest.image.ocr.enabled = true;
+        base.ingest.image.ocr.engine = "paddle-onnx".to_string();
        let mut other = base.clone();
        other.search.default_k += 3;
-        other.image.ocr.max_pixels += 100; // runtime-only knob
+        other.ingest.image.ocr.max_pixels += 100; // runtime-only knob
        assert_eq!(
            ingest_config_signature(&base, &img()),
            ingest_config_signature(&other, &img()),
@@ -4010,9 +4040,9 @@ mod ingest_config_signature_tests {
    #[test]
    fn pdf_ocr_engine_switch_invalidates_pdf() {
        let mut ollama = Config::defaults();
-        ollama.pdf.ocr.enabled = true;
+        ollama.ingest.pdf.ocr.enabled = true;
        let mut paddle = ollama.clone();
-        paddle.pdf.ocr.engine = "paddle-onnx".to_string();
+        paddle.ingest.pdf.ocr.engine = "paddle-onnx".to_string();
        assert_ne!(
            ingest_config_signature(&ollama, &pdf()),
            ingest_config_signature(&paddle, &pdf()),
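
Reviewer note on the `lib.rs` hunks above: the signature is assembled from config values rather than from struct paths, which is why moving the fields under `config.ingest.*` should leave the string unchanged. The sketch below rebuilds only the image branch's format as it appears in the diff (`chunk:…`, `|ocr:…`, `|cap:…`). `OcrSig`, `image_signature`, the integer types, and the `example-hash` engine version are illustrative assumptions, not names or values from the crate.

```rust
/// Hypothetical stand-in for the OCR fields that feed the signature.
struct OcrSig<'a> {
    enabled: bool,
    engine: &'a str,
    engine_version: &'a str,
}

/// Sketch of the image-branch signature format shown in the diff.
/// The inputs are plain values, so renaming the structs that hold them
/// cannot change the resulting bytes.
fn image_signature(
    target_tokens: usize,
    overlap_tokens: usize,
    respect_markdown_headings: bool,
    chunker_version: &str,
    ocr: &OcrSig,
    caption_prompt_version: Option<&str>, // None = caption disabled
) -> String {
    let mut sig = format!(
        "chunk:{target_tokens}:{overlap_tokens}:{respect_markdown_headings}:{chunker_version}"
    );
    if ocr.enabled {
        sig.push_str(&format!("|ocr:1:{}:{}", ocr.engine, ocr.engine_version));
    } else {
        sig.push_str("|ocr:0");
    }
    match caption_prompt_version {
        Some(v) => sig.push_str(&format!("|cap:1:{v}")),
        None => sig.push_str("|cap:0"),
    }
    sig
}

fn main() {
    // "example-hash" is a placeholder; the real engine version is derived
    // from the model assets.
    let ocr = OcrSig { enabled: true, engine: "paddle-onnx", engine_version: "example-hash" };
    println!("{}", image_signature(500, 80, true, "md-heading-v1", &ocr, None));
}
```

The chunk values `500`, `80`, `true` and `md-heading-v1` are the ones the golden test in `config_invalidation.rs` asserts as the default prefix.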
--- a/crates/kebab-app/src/schema.rs
+++ b/crates/kebab-app/src/schema.rs
@@ -205,7 +205,7 @@ fn collect_models(cfg: &Config, store: &kebab_store_sqlite::SqliteStore) -> Mode
        // maintain their own versions; surface those when SchemaV1.models
        // becomes a multi-medium map (P+).
        parser_version: kebab_parse_md::PARSER_VERSION.to_string(),
-        chunker_version: cfg.chunking.chunker_version.clone(),
+        chunker_version: cfg.ingest.chunking.chunker_version.clone(),
        active_parsers,
        active_chunkers,
        // EmbeddingModelCfg uses `.model` (not `.id`) — adapt from plan.
--- a/crates/kebab-app/tests/common/mod.rs
+++ b/crates/kebab-app/tests/common/mod.rs
@@ -62,8 +62,8 @@ impl TestEnv {
        // Drop in a small chunk policy so the fixture's small files
        // emit at least a couple of chunks even with overlap_tokens
        // honored.
-        config.chunking.target_tokens = 80;
-        config.chunking.overlap_tokens = 20;
+        config.ingest.chunking.target_tokens = 80;
+        config.ingest.chunking.overlap_tokens = 20;

        Self {
            temp,
--- a/crates/kebab-app/tests/config_invalidation.rs
+++ b/crates/kebab-app/tests/config_invalidation.rs
@@ -63,7 +63,7 @@ fn chunking_change_reindexes_all_types() {
    let scanned = first.scanned;

    // Bump target_tokens — folds into every type's signature.
-    env.config.chunking.target_tokens += 100;
+    env.config.ingest.chunking.target_tokens += 100;

    let second = reingest(&env);
    assert_eq!(second.scanned, scanned);
@@ -146,3 +146,24 @@ fn search_setting_change_reindexes_nothing() {
    assert_eq!(second.new, 0);
    assert_eq!(second.errors, 0);
 }
+
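+/// v3 invariant #1: the `ingest_config_signature` output string is value-based, so
+/// it must stay **byte-identical** to v2 after the struct-path reorg (media ingest
+/// consolidation); otherwise an upgrade triggers a full reindex. Golden for the paddle-onnx image branch format.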
+#[test]
+fn ingest_signature_image_paddle_byte_stable() {
+    let mut cfg = kebab_config::Config::defaults();
+    cfg.ingest.image.ocr.enabled = true;
+    cfg.ingest.image.ocr.engine = "paddle-onnx".into();
+    let sig = kebab_app::test_ingest_config_signature(
+        &cfg,
+        &kebab_core::MediaType::Image(kebab_core::ImageType::Png),
+    );
+    // Golden: chunk:... |ocr:1:paddle-onnx:<engine_version> |cap:0
+    assert!(
+        sig.starts_with("chunk:500:80:true:md-heading-v1"),
+        "chunk prefix drift: {sig}"
+    );
+    assert!(sig.contains("|ocr:1:paddle-onnx:"), "ocr token drift: {sig}");
+    assert!(sig.ends_with("|cap:0"), "cap token drift: {sig}");
+}
--- a/crates/kebab-app/tests/image_pipeline.rs
+++ b/crates/kebab-app/tests/image_pipeline.rs
@@ -34,11 +34,11 @@ fn cfg_with_image_pipeline(env: &TestEnv, mock_endpoint: &str) -> Config {
    let mut cfg = env.config.clone();
    // p9-fb-25: workspace.include removed; extension routing is now
    // handled by extractor matching alone (no config knob).
-    cfg.image.ocr.enabled = true;
-    cfg.image.ocr.endpoint = Some(mock_endpoint.to_string());
-    cfg.image.ocr.model = "vision-mock:1b".to_string();
-    cfg.image.ocr.max_pixels = 512;
-    cfg.image.caption.enabled = false; // tested separately below
+    cfg.ingest.image.ocr.enabled = true;
+    cfg.ingest.image.ocr.endpoint = Some(mock_endpoint.to_string());
+    cfg.ingest.image.ocr.model = "vision-mock:1b".to_string();
+    cfg.ingest.image.ocr.max_pixels = 512;
+    cfg.ingest.image.caption.enabled = false; // tested separately below
    cfg.models.llm.endpoint = mock_endpoint.to_string();
    cfg.models.llm.model = "vision-mock:1b".to_string();
    cfg
@@ -161,8 +161,8 @@ async fn ingest_image_with_ocr_and_caption_populates_both_fields() {
    let env = TestEnv::lexical_only();
    write_red_png(&env.workspace_root, "diagram.png");
    let mut cfg = cfg_with_image_pipeline(&env, &server.uri());
-    cfg.image.caption.enabled = true;
-    cfg.image.caption.max_pixels = 384;
+    cfg.ingest.image.caption.enabled = true;
+    cfg.ingest.image.caption.max_pixels = 384;

    let cfg_clone = cfg.clone();
    let scope = env.scope();
@@ -270,8 +270,8 @@ async fn image_indexed_with_filename_when_ocr_and_caption_disabled() {
    let mut cfg = env.config.clone();
    // p9-fb-25: workspace.include removed; extension routing is now
    // handled by extractor matching alone (no config knob).
-    cfg.image.ocr.enabled = false;
-    cfg.image.caption.enabled = false;
+    cfg.ingest.image.ocr.enabled = false;
+    cfg.ingest.image.caption.enabled = false;

    let cfg_clone = cfg.clone();
    let scope = env.scope();
@@ -334,8 +334,8 @@ async fn garbage_png_increments_errors_counter_exactly_once() {
    let mut cfg = env.config.clone();
    // p9-fb-25: workspace.include removed; extension routing is now
    // handled by extractor matching alone (no config knob).
-    cfg.image.ocr.enabled = false;
-    cfg.image.caption.enabled = false;
+    cfg.ingest.image.ocr.enabled = false;
+    cfg.ingest.image.caption.enabled = false;

    let cfg_clone = cfg.clone();
    let scope = env.scope();
--- a/crates/kebab-app/tests/ingest_log_smoke.rs
+++ b/crates/kebab-app/tests/ingest_log_smoke.rs
@@ -23,8 +23,8 @@ fn minimal_config(workspace: &std::path::Path, log_dir: &std::path::Path) -> Con
    cfg.storage.model_dir = model_dir.to_string_lossy().into_owned();
    cfg.models.embedding.provider = "none".to_string();
    cfg.models.embedding.dimensions = 0;
-    cfg.chunking.target_tokens = 80;
-    cfg.chunking.overlap_tokens = 20;
+    cfg.ingest.chunking.target_tokens = 80;
+    cfg.ingest.chunking.overlap_tokens = 20;
    cfg.logging = LoggingCfg {
        ingest_log_enabled: true,
        ingest_log_dir: log_dir.to_path_buf(),
--- a/crates/kebab-app/tests/ingest_pdf_ocr_smoke.rs
+++ b/crates/kebab-app/tests/ingest_pdf_ocr_smoke.rs
@@ -22,8 +22,8 @@ fn ollama_endpoint() -> String {

 fn make_ocr_env_real() -> TestEnv {
    let mut env = TestEnv::lexical_only();
-    env.config.pdf.ocr.enabled = true;
-    env.config.pdf.ocr.endpoint = Some(ollama_endpoint());
+    env.config.ingest.pdf.ocr.enabled = true;
+    env.config.ingest.pdf.ocr.endpoint = Some(ollama_endpoint());
    env.config.models.embedding.provider = "none".to_string();

    let src = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
@@ -92,8 +92,8 @@ fn ocr_text_indexed_and_searchable() {
 #[test]
 fn ingest_with_cancel_aborts_mid_pdf() {
    let mut env = TestEnv::lexical_only();
-    env.config.pdf.ocr.enabled = true;
-    env.config.pdf.ocr.endpoint = Some("http://127.0.0.1:1".to_string());
+    env.config.ingest.pdf.ocr.enabled = true;
+    env.config.ingest.pdf.ocr.endpoint = Some("http://127.0.0.1:1".to_string());

    let src = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
        .parent()
--- a/crates/kebab-app/tests/ingest_progress.rs
+++ b/crates/kebab-app/tests/ingest_progress.rs
@@ -196,9 +196,9 @@ fn pdf_ocr_progress_emits_started_finished_events() {
    config.storage.data_dir = data_dir.to_string_lossy().into_owned();
    config.models.embedding.provider = "none".to_string();
    config.models.embedding.dimensions = 0;
-    config.pdf.ocr.enabled = true;
+    config.ingest.pdf.ocr.enabled = true;
    if let Ok(endpoint) = std::env::var("KEBAB_PDF_OCR_ENDPOINT") {
-        config.pdf.ocr.endpoint = Some(endpoint);
+        config.ingest.pdf.ocr.endpoint = Some(endpoint);
    }

    let scope = kebab_core::SourceScope {
--- a/crates/kebab-app/tests/pdf_ocr_events_insert_smoke.rs
+++ b/crates/kebab-app/tests/pdf_ocr_events_insert_smoke.rs
@@ -49,9 +49,9 @@ async fn ingest_dual_write_doc_id_matches_ndjson() {
    let result = spawn_blocking(move || {
        let mut env = TestEnv::lexical_only();
        // Enable PDF OCR + set up mock endpoint
-        env.config.pdf.ocr.enabled = true;
-        env.config.pdf.ocr.endpoint = Some(mock_url.clone());
-        env.config.pdf.ocr.model = "qwen2.5vl:3b".to_string();
+        env.config.ingest.pdf.ocr.enabled = true;
+        env.config.ingest.pdf.ocr.endpoint = Some(mock_url.clone());
+        env.config.ingest.pdf.ocr.model = "qwen2.5vl:3b".to_string();
        // Enable ingest log
        let log_dir = env.temp.path().join("logs");
        std::fs::create_dir_all(&log_dir).unwrap();
--- a/crates/kebab-app/tests/pdf_pipeline.rs
+++ b/crates/kebab-app/tests/pdf_pipeline.rs
@@ -121,8 +121,8 @@ fn cfg_with_pdf(env: &TestEnv) -> Config {
    // PDF ingest does not need OCR / caption / LM — leave defaults
    // (ocr.enabled=false, caption.enabled=false). The image pipeline
    // construction step skips both adapters.
-    cfg.image.ocr.enabled = false;
-    cfg.image.caption.enabled = false;
+    cfg.ingest.image.ocr.enabled = false;
+    cfg.ingest.image.caption.enabled = false;
    cfg
 }

--- a/crates/kebab-app/tests/schema_active_versions.rs
+++ b/crates/kebab-app/tests/schema_active_versions.rs
@@ -12,8 +12,8 @@ fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path)
    cfg.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
    cfg.models.embedding.provider = "none".to_string();
    cfg.models.embedding.dimensions = 0;
-    cfg.chunking.target_tokens = 80;
-    cfg.chunking.overlap_tokens = 20;
+    cfg.ingest.chunking.target_tokens = 80;
+    cfg.ingest.chunking.overlap_tokens = 20;
    cfg
 }

--- a/crates/kebab-app/tests/schema_report.rs
+++ b/crates/kebab-app/tests/schema_report.rs
@@ -14,8 +14,8 @@ fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path)
    config.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
    config.models.embedding.provider = "none".to_string();
    config.models.embedding.dimensions = 0;
-    config.chunking.target_tokens = 80;
-    config.chunking.overlap_tokens = 20;
+    config.ingest.chunking.target_tokens = 80;
+    config.ingest.chunking.overlap_tokens = 20;
    config
 }

--- a/crates/kebab-config/src/lib.rs
+++ b/crates/kebab-config/src/lib.rs
@@ -12,6 +12,18 @@ mod paths;
 pub mod migrate;
 pub use paths::{expand_path, expand_path_with_base};

+/// Serializes an f32 by re-parsing its shortest round-trip (Display) form as an f64.
+/// This makes `0.3_f32` come out as `0.3` instead of leaking as `0.30000001192092896`.
+/// It keeps the lossless comparison of the toml_edit relocation intact during migration,
+/// and keeps the output of `kebab config migrate` human-readable.
+fn ser_f32_clean<S>(v: &f32, s: S) -> Result<S::Ok, S::Error>
+where
+    S: serde::Serializer,
+{
+    let clean: f64 = format!("{v}").parse().unwrap_or(f64::from(*v));
+    s.serialize_f64(clean)
+}
+
 /// Signal: `Config::from_file` / `Config::load` failed due to missing path,
 /// I/O failure, TOML parse failure, or post-parse validation failure.
 ///
@@ -39,32 +51,20 @@ pub struct Config {
    pub schema_version: u32,
    pub workspace: WorkspaceCfg,
    pub storage: StorageCfg,
-    pub indexing: IndexingCfg,
-    pub chunking: ChunkingCfg,
    pub models: ModelsCfg,
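+    /// v3: umbrella for the ingest settings of every media type. Parallelism (formerly
+    /// `[indexing]`), chunking, code, image, and pdf are all consolidated under `[ingest.*]`.
+    /// Marked `#[serde(default)]` so unmigrated or partial configs still load (the
+    /// automatic conversion is performed in memory by `Config::from_file`; T6).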
+    #[serde(default)]
+    pub ingest: IngestCfg,
    pub search: SearchCfg,
    pub rag: RagCfg,
-    /// Image-pipeline settings (P6: OCR, captioning). Tagged
-    /// `#[serde(default)]` so pre-P6 config files that predate the
-    /// `[image]` section still load — defaults disable OCR / caption
-    /// (they cost a model call per asset).
-    #[serde(default = "ImageCfg::defaults")]
-    pub image: ImageCfg,
    /// p9-fb-14: TUI palette + role-style mapping. `#[serde(default)]`
    /// so configs that predate this section still load (defaults to
    /// `dark`).
    #[serde(default = "UiCfg::defaults")]
    pub ui: UiCfg,
-    /// p10-1A-1: code ingest settings. `#[serde(default)]` so existing
-    /// config files without an `[ingest]` / `[ingest.code]` section
-    /// load cleanly with built-in defaults.
-    #[serde(default)]
-    pub ingest: IngestCfg,
-    /// v0.20.0 sub-item 1: PDF ingest pipeline settings. `#[serde(default)]`
-    /// so pre-v0.20 config files without a `[pdf]` section load with
-    /// built-in defaults (OCR disabled — opt-in for scanned PDF KB).
-    #[serde(default = "PdfCfg::defaults")]
-    pub pdf: PdfCfg,
    /// v0.20.x ingest log surface. `#[serde(default)]` so pre-v0.20
    /// config files without a `[logging]` section load with built-in
    /// defaults (enabled=true, dir=~/.local/state/kebab/logs).
@@ -104,13 +104,6 @@ pub struct StorageCfg {
    pub copy_threshold_mb: u64,
 }

-#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
-pub struct IndexingCfg {
-    pub max_parallel_extractors: u32,
-    pub max_parallel_embeddings: u32,
-    pub watch_filesystem: bool,
-}
-
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct ChunkingCfg {
    pub target_tokens: usize,
@@ -119,6 +112,17 @@ pub struct ChunkingCfg {
    pub chunker_version: String,
 }

+impl ChunkingCfg {
+    pub fn defaults() -> Self {
+        Self {
+            target_tokens: 500,
+            overlap_tokens: 80,
+            respect_markdown_headings: true,
+            chunker_version: "md-heading-v1".to_string(),
+        }
+    }
+}
+
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct ModelsCfg {
    pub embedding: EmbeddingModelCfg,
@@ -186,6 +190,7 @@ pub struct LlmCfg {
    pub model: String,
    pub context_tokens: usize,
    pub endpoint: String,
+    #[serde(serialize_with = "ser_f32_clean")]
    pub temperature: f32,
    pub seed: u64,
    /// v0.17.0 post-dogfood: Hard ceiling on a single HTTP exchange to
@@ -244,6 +249,7 @@ fn default_stale_threshold_days() -> u32 {
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct RagCfg {
    pub prompt_template_version: String,
+    #[serde(serialize_with = "ser_f32_clean")]
    pub score_gate: f32,
    pub explain_default: bool,
    pub max_context_tokens: usize,
@@ -293,7 +299,7 @@ pub struct RagCfg {
    ///
    /// Single-pass `ask` ignores this knob entirely — only multi-hop
    /// runs through the verification step (PR-9c-2 wires it).
-    #[serde(default = "default_nli_threshold")]
+    #[serde(default = "default_nli_threshold", serialize_with = "ser_f32_clean")]
    pub nli_threshold: f32,
 }

@@ -397,11 +403,11 @@ pub struct OcrCfg {
    pub dict: Option<String>,
    /// DBNet detection box score threshold (0.0..=1.0). Boxes whose mean
    /// probability is below this are dropped. Default `0.3`.
-    #[serde(default = "default_ocr_score_thresh")]
+    #[serde(default = "default_ocr_score_thresh", serialize_with = "ser_f32_clean")]
    pub score_thresh: f32,
    /// Polygon unclip ratio applied to each detected box before crop.
    /// Larger = more padding around the text. Default `1.5`.
-    #[serde(default = "default_ocr_unclip_ratio")]
+    #[serde(default = "default_ocr_unclip_ratio", serialize_with = "ser_f32_clean")]
    pub unclip_ratio: f32,
    /// Hard cap on detected boxes per image (runaway guard). Extra boxes
    /// past this count are truncated with a warning. Default `1000`.
@@ -583,7 +589,7 @@ pub struct PdfOcrCfg {
    /// Valid char ratio threshold (0.0..=1.0). Page with ratio below
    /// this is classified as scanned/mojibake → OCR fallback. Default
    /// `0.5`.
-    #[serde(default = "default_pdf_ocr_valid_ratio")]
+    #[serde(default = "default_pdf_ocr_valid_ratio", serialize_with = "ser_f32_clean")]
    pub valid_ratio_threshold: f32,
    /// Minimum char count per page below which page is auto-scanned.
    /// Default `20`.
@@ -592,6 +598,30 @@ pub struct PdfOcrCfg {
    /// Single-page lang hint. Default `Some("kor")`. `None` = no hint.
    #[serde(default = "default_pdf_ocr_lang_hint")]
    pub lang_hint: Option<String>,
+
+    // ── paddle-onnx engine overrides (v3) ───────────────────────────────
+    // Symmetric with `[ingest.image.ocr]`. Removes the v2 asymmetry where pdf paddle
+    // borrowed the image model paths; pdf now has its own keys. The migration
+    // (T5) copies the image values into these keys so the signature stays byte-identical.
+    // Every field is `#[serde(default)]`, so pre-v3 configs still load.
+    /// Override path to the detection ONNX model. `None` → bundled.
+    #[serde(default)]
+    pub det_model: Option<String>,
+    /// Override path to the recognition ONNX model. `None` → bundled.
+    #[serde(default)]
+    pub rec_model: Option<String>,
+    /// Override path to the character dictionary. `None` → bundled.
+    #[serde(default)]
+    pub dict: Option<String>,
+    /// DBNet detection box score threshold (0.0..=1.0). Default `0.3`.
+    #[serde(default = "default_ocr_score_thresh", serialize_with = "ser_f32_clean")]
+    pub score_thresh: f32,
+    /// Polygon unclip ratio applied to each detected box. Default `1.5`.
+    #[serde(default = "default_ocr_unclip_ratio", serialize_with = "ser_f32_clean")]
+    pub unclip_ratio: f32,
+    /// Hard cap on detected boxes per page (runaway guard). Default `1000`.
+    #[serde(default = "default_ocr_max_boxes")]
+    pub max_boxes: usize,
 }

 impl PdfOcrCfg {
@@ -608,6 +638,12 @@ impl PdfOcrCfg {
            valid_ratio_threshold: default_pdf_ocr_valid_ratio(),
            min_char_count: default_pdf_ocr_min_char_count(),
            lang_hint: default_pdf_ocr_lang_hint(),
+            det_model: None,
+            rec_model: None,
+            dict: None,
+            score_thresh: default_ocr_score_thresh(),
+            unclip_ratio: default_ocr_unclip_ratio(),
+            max_boxes: default_ocr_max_boxes(),
        }
    }
 }
@@ -659,12 +695,47 @@ impl UiCfg {
    }
 }

-/// p10-1A-1: top-level ingest configuration wrapper. Contains per-media-type
-/// sub-sections; currently only `code` is defined.
-#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
-#[serde(default)]
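+/// v3: umbrella for the ingest settings of every media type. The scalars (parallelism)
+/// come from the old `[indexing]`; the per-media sub-tables (chunking/code/image/pdf)
+/// come from the old top-level sections. Serialization order = field order: scalars
+/// (parallelism) first, sub-tables after (TOML requires bare keys before sub-table headers).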
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct IngestCfg {
+    #[serde(default = "default_max_parallel_extractors")]
+    pub max_parallel_extractors: u32,
+    #[serde(default = "default_max_parallel_embeddings")]
+    pub max_parallel_embeddings: u32,
+    #[serde(default)]
+    pub watch_filesystem: bool,
+    #[serde(default = "ChunkingCfg::defaults")]
+    pub chunking: ChunkingCfg,
+    #[serde(default)]
    pub code: IngestCodeCfg,
+    #[serde(default = "ImageCfg::defaults")]
+    pub image: ImageCfg,
+    #[serde(default = "PdfCfg::defaults")]
+    pub pdf: PdfCfg,
+}
+
+impl Default for IngestCfg {
+    fn default() -> Self {
+        Self {
+            max_parallel_extractors: default_max_parallel_extractors(),
+            max_parallel_embeddings: default_max_parallel_embeddings(),
+            watch_filesystem: false,
+            chunking: ChunkingCfg::defaults(),
+            code: IngestCodeCfg::default(),
+            image: ImageCfg::defaults(),
+            pdf: PdfCfg::defaults(),
+        }
+    }
+}
+
+fn default_max_parallel_extractors() -> u32 {
+    2
+}
+fn default_max_parallel_embeddings() -> u32 {
+    1
 }

 /// p10-1A-1: settings for the code ingest pipeline. All fields have
@@ -728,17 +799,6 @@ impl Config {
                runs_dir: "{data_dir}/runs".to_string(),
                copy_threshold_mb: 100,
            },
-            indexing: IndexingCfg {
-                max_parallel_extractors: 2,
-                max_parallel_embeddings: 1,
-                watch_filesystem: false,
-            },
-            chunking: ChunkingCfg {
-                target_tokens: 500,
-                overlap_tokens: 80,
-                respect_markdown_headings: true,
-                chunker_version: "md-heading-v1".to_string(),
-            },
            models: ModelsCfg {
                embedding: EmbeddingModelCfg {
                    provider: "fastembed".to_string(),
@@ -765,6 +825,15 @@ impl Config {
                },
                nli: NliCfg::defaults(),
            },
+            ingest: IngestCfg {
+                max_parallel_extractors: 2,
+                max_parallel_embeddings: 1,
+                watch_filesystem: false,
+                chunking: ChunkingCfg::defaults(),
+                code: IngestCodeCfg::default(),
+                image: ImageCfg::defaults(),
+                pdf: PdfCfg::defaults(),
+            },
            search: SearchCfg {
                default_k: 10,
                hybrid_fusion: "rrf".to_string(),
@@ -783,10 +852,7 @@ impl Config {
                multi_hop_max_pool_chunks: default_multi_hop_max_pool_chunks(),
                nli_threshold: default_nli_threshold(),
            },
-            image: ImageCfg::defaults(),
            ui: UiCfg::defaults(),
-            ingest: IngestCfg::default(),
-            pdf: PdfCfg::defaults(),
            logging: LoggingCfg::default(),
            // p9-fb-05: defaults are not loaded from disk, so no
            // source_dir. Relative `workspace.root` (rare with
@@ -898,29 +964,56 @@ impl Config {
            })
        })?;

-        // p9-fb-25: probe for the legacy `workspace.include` key — if
-        // present, emit a one-shot deprecation warning. Detection uses
-        // raw `toml::Value` lookup; the warning fires via a process-
-        // level OnceLock so a long-running TUI / CLI run doesn't spam
-        // the log on every Config::load.
-        if let Ok(value) = toml::from_str::<toml::Value>(&text) {
-            if value
-                .get("workspace")
-                .and_then(|v| v.get("include"))
-                .is_some()
-            {
-                static DEPRECATION_FIRED: std::sync::OnceLock<()> = std::sync::OnceLock::new();
-                DEPRECATION_FIRED.get_or_init(|| {
+        // Parse the raw `toml::Value` only once and share it between (1) the legacy
+        // `workspace.include` deprecation probe (p9-fb-25) and (2) `schema_version` detection
+        // (v3 auto-conversion). Config load runs on every CLI invocation, so parse once.
+        let probe = toml::from_str::<toml::Value>(&text).ok();
+
+        // p9-fb-25: if the legacy `workspace.include` key is present, emit a one-shot deprecation
+        // warning (a process-level OnceLock keeps long-running processes from spamming the log).
+        if probe
+            .as_ref()
+            .and_then(|v| v.get("workspace"))
+            .and_then(|v| v.get("include"))
+            .is_some()
+        {
+            static DEPRECATION_FIRED: std::sync::OnceLock<()> = std::sync::OnceLock::new();
+            DEPRECATION_FIRED.get_or_init(|| {
+                tracing::warn!(
+                    target: "kebab-config",
+                    config = %path.display(),
+                    "deprecated config: the `workspace.include` field is no longer used (p9-fb-25, v0.2.1+). Supported formats (md / png / jpg / pdf) are determined automatically by the extractors. It is safe to remove this field from the config; it is no longer enforced."
+                );
+            });
+        }
+
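+        // v3: if the file's schema_version is lower than CURRENT, convert it in memory
+        // (the file on disk is untouched; update it with `kebab config migrate`). Unmigrated v2
+        // files also load without losing settings (invariant #3). The non-additive relocation
+        // (v2→v3) is not covered by serde-default forward compatibility, so this step is mandatory.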
+        let parse_text = {
+            let from = probe
+                .as_ref()
+                .and_then(|v| v.get("schema_version").and_then(toml::Value::as_integer))
+                .unwrap_or(1) as u32;
+            if from < crate::migrate::CURRENT_SCHEMA_VERSION {
+                static MIGRATE_WARNED: std::sync::OnceLock<()> = std::sync::OnceLock::new();
+                MIGRATE_WARNED.get_or_init(|| {
                    tracing::warn!(
                        target: "kebab-config",
                        config = %path.display(),
-                        "deprecated config: the `workspace.include` field is no longer used (p9-fb-25, v0.2.1+). Supported formats (md / png / jpg / pdf) are determined automatically by the extractors. It is safe to remove this field from the config; it is no longer enforced."
+                        from,
+                        to = crate::migrate::CURRENT_SCHEMA_VERSION,
+                        "config uses an old schema; it was converted in memory for this run. To update the file, run `kebab config migrate`."
                    );
                });
+                crate::migrate::migrate_document(&text).new_text
+            } else {
+                text.clone()
            }
-        }
+        };

-        let mut cfg: Self = toml::from_str(&text).map_err(|e| {
+        let mut cfg: Self = toml::from_str(&parse_text).map_err(|e| {
            anyhow::Error::new(ConfigInvalid {
                path: path.to_path_buf(),
                cause: format!("parse_failed: {e}"),
@@ -963,33 +1056,33 @@ impl Config {
                // indexing
                "KEBAB_INDEXING_MAX_PARALLEL_EXTRACTORS" => {
                    if let Ok(n) = v.parse::<u32>() {
-                        self.indexing.max_parallel_extractors = n;
+                        self.ingest.max_parallel_extractors = n;
                    }
                }
                "KEBAB_INDEXING_MAX_PARALLEL_EMBEDDINGS" => {
                    if let Ok(n) = v.parse::<u32>() {
-                        self.indexing.max_parallel_embeddings = n;
+                        self.ingest.max_parallel_embeddings = n;
                    }
                }
                "KEBAB_INDEXING_WATCH_FILESYSTEM" => {
-                    self.indexing.watch_filesystem = parse_bool(v);
+                    self.ingest.watch_filesystem = parse_bool(v);
                }

                // chunking
                "KEBAB_CHUNKING_TARGET_TOKENS" => {
                    if let Ok(n) = v.parse::<usize>() {
-                        self.chunking.target_tokens = n;
+                        self.ingest.chunking.target_tokens = n;
                    }
                }
                "KEBAB_CHUNKING_OVERLAP_TOKENS" => {
                    if let Ok(n) = v.parse::<usize>() {
-                        self.chunking.overlap_tokens = n;
+                        self.ingest.chunking.overlap_tokens = n;
                    }
                }
                "KEBAB_CHUNKING_RESPECT_MARKDOWN_HEADINGS" => {
-                    self.chunking.respect_markdown_headings = parse_bool(v);
+                    self.ingest.chunking.respect_markdown_headings = parse_bool(v);
                }
-                "KEBAB_CHUNKING_CHUNKER_VERSION" => self.chunking.chunker_version = v.clone(),
+                "KEBAB_CHUNKING_CHUNKER_VERSION" => self.ingest.chunking.chunker_version = v.clone(),

                // models.embedding
                "KEBAB_MODELS_EMBEDDING_PROVIDER" => self.models.embedding.provider = v.clone(),
@@ -1122,18 +1215,18 @@ impl Config {

                // image.ocr
                "KEBAB_IMAGE_OCR_ENABLED" => {
-                    self.image.ocr.enabled = parse_bool(v);
+                    self.ingest.image.ocr.enabled = parse_bool(v);
                }
-                "KEBAB_IMAGE_OCR_ENGINE" => self.image.ocr.engine = v.clone(),
-                "KEBAB_IMAGE_OCR_MODEL" => self.image.ocr.model = v.clone(),
+                "KEBAB_IMAGE_OCR_ENGINE" => self.ingest.image.ocr.engine = v.clone(),
+                "KEBAB_IMAGE_OCR_MODEL" => self.ingest.image.ocr.model = v.clone(),
                "KEBAB_IMAGE_OCR_ENDPOINT" => {
                    // Empty env value is treated the same as "fall back
                    // to models.llm.endpoint" — i.e. set None.
-                    self.image.ocr.endpoint = if v.is_empty() { None } else { Some(v.clone()) };
+                    self.ingest.image.ocr.endpoint = if v.is_empty() { None } else { Some(v.clone()) };
                }
                "KEBAB_IMAGE_OCR_LANGUAGES" => {
                    // Comma-separated list, e.g. "eng,kor".
-                    self.image.ocr.languages = v
+                    self.ingest.image.ocr.languages = v
                        .split(',')
                        .map(|s| s.trim().to_string())
                        .filter(|s| !s.is_empty())
@@ -1141,66 +1234,66 @@ impl Config {
                }
                "KEBAB_IMAGE_OCR_MAX_PIXELS" => {
                    if let Ok(n) = v.parse::<u32>() {
-                        self.image.ocr.max_pixels = n;
+                        self.ingest.image.ocr.max_pixels = n;
                    }
                }
                "KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS" => {
                    if let Ok(n) = v.parse::<u64>() {
-                        self.image.ocr.request_timeout_secs = n;
+                        self.ingest.image.ocr.request_timeout_secs = n;
                    }
                }
                // paddle-onnx engine overrides (v0.27.0). Empty string → None
                // (fall back to bundled / KEBAB_IMAGE_OCR_MODEL_DIR).
                "KEBAB_IMAGE_OCR_DET_MODEL" => {
-                    self.image.ocr.det_model =
+                    self.ingest.image.ocr.det_model =
                        if v.is_empty() { None } else { Some(v.clone()) };
                }
                "KEBAB_IMAGE_OCR_REC_MODEL" => {
-                    self.image.ocr.rec_model =
+                    self.ingest.image.ocr.rec_model =
                        if v.is_empty() { None } else { Some(v.clone()) };
                }
                "KEBAB_IMAGE_OCR_DICT" => {
-                    self.image.ocr.dict = if v.is_empty() { None } else { Some(v.clone()) };
+                    self.ingest.image.ocr.dict = if v.is_empty() { None } else { Some(v.clone()) };
                }
                "KEBAB_IMAGE_OCR_SCORE_THRESH" => {
                    if let Ok(f) = v.parse::<f32>() {
-                        self.image.ocr.score_thresh = f;
+                        self.ingest.image.ocr.score_thresh = f;
                    }
                }
                "KEBAB_IMAGE_OCR_UNCLIP_RATIO" => {
                    if let Ok(f) = v.parse::<f32>() {
-                        self.image.ocr.unclip_ratio = f;
+                        self.ingest.image.ocr.unclip_ratio = f;
                    }
                }
                "KEBAB_IMAGE_OCR_MAX_BOXES" => {
                    if let Ok(n) = v.parse::<usize>() {
-                        self.image.ocr.max_boxes = n;
+                        self.ingest.image.ocr.max_boxes = n;
                    }
                }

                // image.caption (P6-3)
                "KEBAB_IMAGE_CAPTION_ENABLED" => {
-                    self.image.caption.enabled = parse_bool(v);
+                    self.ingest.image.caption.enabled = parse_bool(v);
                }
                "KEBAB_IMAGE_CAPTION_MAX_PIXELS" => {
                    if let Ok(n) = v.parse::<u32>() {
-                        self.image.caption.max_pixels = n;
+                        self.ingest.image.caption.max_pixels = n;
                    }
                }
                "KEBAB_IMAGE_CAPTION_PROMPT_TEMPLATE_VERSION" => {
-                    self.image.caption.prompt_template_version = v.clone();
+                    self.ingest.image.caption.prompt_template_version = v.clone();
                }

                // pdf.ocr (v0.20.0 sub-item 1)
-                "KEBAB_PDF_OCR_ENABLED" => self.pdf.ocr.enabled = parse_bool(v),
-                "KEBAB_PDF_OCR_ALWAYS_ON" => self.pdf.ocr.always_on = parse_bool(v),
-                "KEBAB_PDF_OCR_ENGINE" => self.pdf.ocr.engine = v.clone(),
-                "KEBAB_PDF_OCR_MODEL" => self.pdf.ocr.model = v.clone(),
+                "KEBAB_PDF_OCR_ENABLED" => self.ingest.pdf.ocr.enabled = parse_bool(v),
+                "KEBAB_PDF_OCR_ALWAYS_ON" => self.ingest.pdf.ocr.always_on = parse_bool(v),
+                "KEBAB_PDF_OCR_ENGINE" => self.ingest.pdf.ocr.engine = v.clone(),
+                "KEBAB_PDF_OCR_MODEL" => self.ingest.pdf.ocr.model = v.clone(),
                "KEBAB_PDF_OCR_ENDPOINT" => {
-                    self.pdf.ocr.endpoint = if v.is_empty() { None } else { Some(v.clone()) };
+                    self.ingest.pdf.ocr.endpoint = if v.is_empty() { None } else { Some(v.clone()) };
                }
                "KEBAB_PDF_OCR_LANGUAGES" => {
-                    self.pdf.ocr.languages = v
+                    self.ingest.pdf.ocr.languages = v
                        .split(',')
                        .map(|s| s.trim().to_string())
                        .filter(|s| !s.is_empty())
@@ -1208,26 +1301,54 @@ impl Config {
                }
                "KEBAB_PDF_OCR_MAX_PIXELS" => {
                    if let Ok(n) = v.parse::<u32>() {
-                        self.pdf.ocr.max_pixels = n;
+                        self.ingest.pdf.ocr.max_pixels = n;
                    }
                }
                "KEBAB_PDF_OCR_REQUEST_TIMEOUT_SECS" => {
                    if let Ok(n) = v.parse::<u64>() {
-                        self.pdf.ocr.request_timeout_secs = n;
+                        self.ingest.pdf.ocr.request_timeout_secs = n;
                    }
                }
                "KEBAB_PDF_OCR_VALID_RATIO_THRESHOLD" => {
                    if let Ok(n) = v.parse::<f32>() {
-                        self.pdf.ocr.valid_ratio_threshold = n.clamp(0.0, 1.0);
+                        self.ingest.pdf.ocr.valid_ratio_threshold = n.clamp(0.0, 1.0);
                    }
                }
                "KEBAB_PDF_OCR_MIN_CHAR_COUNT" => {
                    if let Ok(n) = v.parse::<u32>() {
-                        self.pdf.ocr.min_char_count = n;
+                        self.ingest.pdf.ocr.min_char_count = n;
                    }
                }
                "KEBAB_PDF_OCR_LANG_HINT" => {
-                    self.pdf.ocr.lang_hint = if v.is_empty() { None } else { Some(v.clone()) };
+                    self.ingest.pdf.ocr.lang_hint = if v.is_empty() { None } else { Some(v.clone()) };
+                }
+                // pdf paddle-onnx engine overrides (v3). Mirrors the image.ocr paddle pattern.
+                // Empty string → None (fall back to bundled / KEBAB_IMAGE_OCR_MODEL_DIR).
+                "KEBAB_PDF_OCR_DET_MODEL" => {
+                    self.ingest.pdf.ocr.det_model =
+                        if v.is_empty() { None } else { Some(v.clone()) };
+                }
+                "KEBAB_PDF_OCR_REC_MODEL" => {
+                    self.ingest.pdf.ocr.rec_model =
+                        if v.is_empty() { None } else { Some(v.clone()) };
+                }
+                "KEBAB_PDF_OCR_DICT" => {
+                    self.ingest.pdf.ocr.dict = if v.is_empty() { None } else { Some(v.clone()) };
+                }
+                "KEBAB_PDF_OCR_SCORE_THRESH" => {
+                    if let Ok(f) = v.parse::<f32>() {
+                        self.ingest.pdf.ocr.score_thresh = f;
+                    }
+                }
+                "KEBAB_PDF_OCR_UNCLIP_RATIO" => {
+                    if let Ok(f) = v.parse::<f32>() {
+                        self.ingest.pdf.ocr.unclip_ratio = f;
+                    }
+                }
+                "KEBAB_PDF_OCR_MAX_BOXES" => {
+                    if let Ok(n) = v.parse::<usize>() {
+                        self.ingest.pdf.ocr.max_boxes = n;
+                    }
                }

                // Unknown KEBAB_* keys are silently ignored — see
@@ -1413,11 +1534,68 @@ theme = "dark"
        assert_eq!(c, back);
    }

+    /// Invariant #3: `from_file` converts a v2 file to v3 in memory without
+    /// touching the disk, so even an unmigrated v2 file loses no settings.
+    #[test]
+    fn from_file_auto_migrates_v2_in_memory() {
+        let dir = tempfile::tempdir().unwrap();
+        let p = dir.path().join("config.toml");
+        std::fs::write(
+            &p,
+            "\
+schema_version = 2
+
+[workspace]
+root = \"/my/notes\"
+exclude = []
+
+[chunking]
+target_tokens = 777
+
+[image.ocr]
+enabled = true
+engine = \"ollama-vision\"
+model = \"gemma4:e4b\"
+languages = [\"kor\"]
+max_pixels = 1600
+",
+        )
+        .unwrap();
+        let c = Config::from_file(&p).expect("v2 auto-migrate load");
+        // The user's v2 values must survive at the new paths (not fall back to defaults).
+        assert_eq!(c.ingest.chunking.target_tokens, 777);
+        assert!(c.ingest.image.ocr.enabled);
+        assert_eq!(c.ingest.image.ocr.languages, vec!["kor"]);
+        // The on-disk file is unchanged (still schema_version = 2 and [chunking]).
+        let on_disk = std::fs::read_to_string(&p).unwrap();
+        assert!(
+            on_disk.contains("schema_version = 2"),
+            "file was modified:\n{on_disk}"
+        );
+        assert!(on_disk.contains("[chunking]"), "file was modified:\n{on_disk}");
+    }
+
+    #[test]
+    fn v3_layout_nests_media_under_ingest() {
+        let c = Config::defaults();
+        // The new paths must compile and be accessible.
+        assert_eq!(c.ingest.max_parallel_extractors, 2);
+        assert_eq!(c.ingest.chunking.target_tokens, 500);
+        assert_eq!(c.ingest.code.max_file_bytes, 262_144);
+        assert_eq!(c.ingest.image.ocr.engine, "ollama-vision");
+        assert_eq!(c.ingest.image.caption.max_pixels, 768);
+        assert_eq!(c.ingest.pdf.ocr.model, "qwen2.5vl:3b");
+        // The symmetric pdf paddle keys exist and carry their defaults.
+        assert_eq!(c.ingest.pdf.ocr.score_thresh, 0.3);
+        assert_eq!(c.ingest.pdf.ocr.max_boxes, 1000);
+        assert!(c.ingest.pdf.ocr.det_model.is_none());
+    }
+
    #[test]
    fn defaults_match_design_64_score_gate() {
        let c = Config::defaults();
        assert_eq!(c.rag.score_gate, 0.30);
-        assert_eq!(c.chunking.target_tokens, 500);
+        assert_eq!(c.ingest.chunking.target_tokens, 500);
        assert_eq!(c.models.embedding.model, "multilingual-e5-large");
        assert_eq!(c.models.embedding.dimensions, 1024);
        assert_eq!(c.search.rrf_k, 60);
@@ -1445,6 +1623,34 @@ theme = "dark"
        assert_eq!(c.search.default_k, 25);
    }

+    /// Invariant #2: env override names (LHS) are preserved 100%. Although the struct
+    /// paths changed, existing `KEBAB_*` scripts assign to the new paths and keep working.
+    #[test]
+    fn env_names_preserved_target_new_paths() {
+        let mut env = HashMap::new();
+        env.insert("KEBAB_CHUNKING_TARGET_TOKENS".into(), "640".into());
+        env.insert("KEBAB_INDEXING_MAX_PARALLEL_EXTRACTORS".into(), "6".into());
+        env.insert("KEBAB_IMAGE_OCR_ENABLED".into(), "true".into());
+        env.insert("KEBAB_PDF_OCR_ENGINE".into(), "paddle-onnx".into());
+        let c = Config::defaults().apply_env(&env);
+        assert_eq!(c.ingest.chunking.target_tokens, 640);
+        assert_eq!(c.ingest.max_parallel_extractors, 6);
+        assert!(c.ingest.image.ocr.enabled);
+        assert_eq!(c.ingest.pdf.ocr.engine, "paddle-onnx");
+    }
+
+    #[test]
+    fn env_pdf_paddle_symmetric_overrides() {
+        let mut env = HashMap::new();
+        env.insert("KEBAB_PDF_OCR_DET_MODEL".into(), "/d.onnx".into());
+        env.insert("KEBAB_PDF_OCR_SCORE_THRESH".into(), "0.4".into());
+        env.insert("KEBAB_PDF_OCR_MAX_BOXES".into(), "500".into());
+        let c = Config::defaults().apply_env(&env);
+        assert_eq!(c.ingest.pdf.ocr.det_model.as_deref(), Some("/d.onnx"));
+        assert!((c.ingest.pdf.ocr.score_thresh - 0.4).abs() < 1e-6);
+        assert_eq!(c.ingest.pdf.ocr.max_boxes, 500);
+    }
+
    #[test]
    fn env_unknown_key_is_ignored() {
        let baseline = Config::defaults();
@@ -1462,7 +1668,7 @@ theme = "dark"
            "777".to_string(),
        );
        let c = Config::defaults().apply_env(&env);
-        assert_eq!(c.chunking.target_tokens, 777);
+        assert_eq!(c.ingest.chunking.target_tokens, 777);
    }

    #[test]
@@ -1517,24 +1723,24 @@ theme = "dark"
            "true".to_string(),
        );
        let c = Config::defaults().apply_env(&env);
-        assert!(c.indexing.watch_filesystem);
+        assert!(c.ingest.watch_filesystem);
    }

    #[test]
    fn image_ocr_defaults_disabled_with_ollama_vision() {
        let c = Config::defaults();
-        assert!(!c.image.ocr.enabled);
-        assert_eq!(c.image.ocr.engine, "ollama-vision");
-        assert_eq!(c.image.ocr.model, "gemma4:e4b");
-        assert_eq!(c.image.ocr.languages, vec!["eng", "kor"]);
-        assert_eq!(c.image.ocr.max_pixels, 1600);
+        assert!(!c.ingest.image.ocr.enabled);
+        assert_eq!(c.ingest.image.ocr.engine, "ollama-vision");
+        assert_eq!(c.ingest.image.ocr.model, "gemma4:e4b");
+        assert_eq!(c.ingest.image.ocr.languages, vec!["eng", "kor"]);
+        assert_eq!(c.ingest.image.ocr.max_pixels, 1600);
    }

    /// v0.17.2 post-dogfood: matches the legacy hard-coded 300s cap so
    /// existing configs that omit the new field keep behaving identically.
    #[test]
    fn default_ocr_request_timeout_secs_is_300() {
-        assert_eq!(Config::defaults().image.ocr.request_timeout_secs, 300);
+        assert_eq!(Config::defaults().ingest.image.ocr.request_timeout_secs, 300);
    }

    #[test]
@@ -1545,7 +1751,7 @@ theme = "dark"
            "900".to_string(),
        );
        let c = Config::defaults().apply_env(&env);
-        assert_eq!(c.image.ocr.request_timeout_secs, 900);
+        assert_eq!(c.ingest.image.ocr.request_timeout_secs, 900);
    }

    /// post-v0.17.1 dogfood: a config file written before the OCR
@@ -1555,7 +1761,7 @@ theme = "dark"
    #[test]
    fn legacy_config_without_ocr_request_timeout_secs_uses_default() {
        let c: Config = toml::from_str(LEGACY_PRE_TIMEOUT_TOML).expect("parse legacy config");
-        assert_eq!(c.image.ocr.request_timeout_secs, 300);
+        assert_eq!(c.ingest.image.ocr.request_timeout_secs, 300);
    }

    // ── p9-fb-41: multi-hop RAG knobs ────────────────────────────────────
@@ -1707,14 +1913,14 @@ theme = "dark"
        );
        env.insert("KEBAB_IMAGE_OCR_MAX_PIXELS".to_string(), "2048".to_string());
        let c = Config::defaults().apply_env(&env);
-        assert!(c.image.ocr.enabled);
-        assert_eq!(c.image.ocr.model, "gemma4:31b");
+        assert!(c.ingest.image.ocr.enabled);
+        assert_eq!(c.ingest.image.ocr.model, "gemma4:31b");
        assert_eq!(
-            c.image.ocr.endpoint.as_deref(),
+            c.ingest.image.ocr.endpoint.as_deref(),
            Some("http://192.168.0.47:11434")
        );
-        assert_eq!(c.image.ocr.languages, vec!["eng", "kor", "jpn"]);
-        assert_eq!(c.image.ocr.max_pixels, 2048);
+        assert_eq!(c.ingest.image.ocr.languages, vec!["eng", "kor", "jpn"]);
+        assert_eq!(c.ingest.image.ocr.max_pixels, 2048);
    }

    /// Pre-P6 config files don't have an `[image]` section. The
@@ -1723,9 +1929,9 @@ theme = "dark"
    #[test]
    fn image_caption_defaults_disabled() {
        let c = Config::defaults();
-        assert!(!c.image.caption.enabled);
-        assert_eq!(c.image.caption.max_pixels, 768);
-        assert_eq!(c.image.caption.prompt_template_version, "caption-v1");
+        assert!(!c.ingest.image.caption.enabled);
+        assert_eq!(c.ingest.image.caption.max_pixels, 768);
+        assert_eq!(c.ingest.image.caption.prompt_template_version, "caption-v1");
    }

    #[test]
@@ -1744,9 +1950,9 @@ theme = "dark"
            "caption-v2".to_string(),
        );
        let c = Config::defaults().apply_env(&env);
-        assert!(c.image.caption.enabled);
-        assert_eq!(c.image.caption.max_pixels, 1024);
-        assert_eq!(c.image.caption.prompt_template_version, "caption-v2");
+        assert!(c.ingest.image.caption.enabled);
+        assert_eq!(c.ingest.image.caption.max_pixels, 1024);
+        assert_eq!(c.ingest.image.caption.prompt_template_version, "caption-v2");
    }

    /// `KEBAB_IMAGE_OCR_ENDPOINT=""` (empty value) should map to `None`
@@ -1757,7 +1963,7 @@ theme = "dark"
        let mut env = HashMap::new();
        env.insert("KEBAB_IMAGE_OCR_ENDPOINT".to_string(), String::new());
        let c = Config::defaults().apply_env(&env);
-        assert_eq!(c.image.ocr.endpoint, None);
+        assert_eq!(c.ingest.image.ocr.endpoint, None);
    }

    #[test]
@@ -1820,7 +2026,7 @@ explain_default = false
 max_context_tokens = 8000
 "#;
        let c: Config = toml::from_str(toml_text).expect("pre-P6 TOML must still parse");
-        assert_eq!(c.image, ImageCfg::defaults());
+        assert_eq!(c.ingest.image, ImageCfg::defaults());
    }

    /// p9-fb-25: legacy config with `workspace.include = [...]` must
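
Review note on the `KEBAB_PDF_OCR_*` arms added to `apply_env` above. The new keys follow two conventions already used for `image.ocr`: an empty string resets an optional path to `None`, and a numeric value that fails to parse leaves the current setting untouched. The sketch below isolates that behavior so it can be exercised without the real `Config`. `PdfOcr`, `opt_string`, and this trimmed `apply_env` are hypothetical stand-ins written for illustration, not the crate's actual types; only the env key names and the defaults `0.3` / `1000` are taken from the diff.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for the `[ingest.pdf.ocr]` section.
#[derive(Debug, Clone, PartialEq)]
struct PdfOcr {
    det_model: Option<String>,
    score_thresh: f32,
    max_boxes: usize,
}

/// An empty value clears the override; anything else is kept verbatim.
fn opt_string(v: &str) -> Option<String> {
    if v.is_empty() {
        None
    } else {
        Some(v.to_string())
    }
}

/// Applies `KEBAB_PDF_OCR_*`-style overrides to `cfg`.
/// Unparsable numbers and unknown keys leave the current value untouched.
fn apply_env(mut cfg: PdfOcr, env: &HashMap<String, String>) -> PdfOcr {
    for (k, v) in env {
        match k.as_str() {
            "KEBAB_PDF_OCR_DET_MODEL" => cfg.det_model = opt_string(v),
            "KEBAB_PDF_OCR_SCORE_THRESH" => {
                if let Ok(f) = v.parse::<f32>() {
                    cfg.score_thresh = f;
                }
            }
            "KEBAB_PDF_OCR_MAX_BOXES" => {
                if let Ok(n) = v.parse::<usize>() {
                    cfg.max_boxes = n;
                }
            }
            // Unknown keys are ignored.
            _ => {}
        }
    }
    cfg
}
```

One consequence worth keeping in mind when reading the tests: a typo such as `KEBAB_PDF_OCR_MAX_BOXES=many` is silently ignored rather than reported, so the previous value stays in effect.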
--- a/crates/kebab-config/src/migrate.rs
+++ b/crates/kebab-config/src/migrate.rs
@@ -5,11 +5,11 @@
 //! non-additive transformations (removing deprecated keys, etc.). The full contract is in the spec
 //! `docs/superpowers/specs/2026-05-31-config-migration-design.md`.

-use toml_edit::DocumentMut;
+use toml_edit::{DocumentMut, Item};

 /// The config schema version this binary understands. When a migration completes,
 /// the user file's `schema_version` is stamped with this value.
-pub const CURRENT_SCHEMA_VERSION: u32 = 2;
+pub const CURRENT_SCHEMA_VERSION: u32 = 3;

 /// A single change produced during one migration run.
 #[derive(Clone, Debug, PartialEq, serde::Serialize)]
@@ -77,19 +77,60 @@ fn section_comment(path: &str) -> Option<&'static str> {
        "models.nli" => "# NLI(groundedness) 모델.",
        "search" => "# 검색 기본 k·stale 기준·fusion.",
        "rag" => "# 답변 생성: prompt 템플릿·score gate·NLI.",
-        "image" => "# 이미지 OCR + 캡션(기본 off, asset 당 모델 호출 비용).",
-        "image.ocr" => "# 이미지 OCR(기본 off).",
-        "image.caption" => "# 이미지 캡션(기본 off).",
        "ui" => "# TUI 팔레트·role 스타일.",
-        "ingest" => "# ingest 정책(code skip 등).",
+        "ingest" => "# 모든 형식 ingest 우산: 병렬도 + chunking/code/image/pdf.",
+        "ingest.chunking" => "# 청크 크기·오버랩·heading 존중(전 형식 공통).",
        "ingest.code" => "# code ingest skip 정책(.gitignore 자동 honor).",
-        "pdf" => "# PDF ingest. scanned PDF OCR 은 기본 off(page 당 cost).",
-        "pdf.ocr" => "# scanned PDF page-단위 OCR(기본 off).",
+        "ingest.image" => "# 이미지 OCR + 캡션(기본 off, asset 당 모델 호출 비용).",
+        "ingest.image.ocr" => "# 이미지 OCR(기본 off).",
+        "ingest.image.caption" => "# 이미지 캡션(기본 off).",
+        "ingest.pdf" => "# PDF ingest. scanned PDF OCR 은 기본 off(page 당 cost).",
+        "ingest.pdf.ocr" => "# scanned PDF page-단위 OCR(기본 off).",
        "logging" => "# ingest 로그(기본 on, ~/.local/state/kebab/logs).",
        _ => return None,
    })
 }

+/// Inline comment for a leaf key: dotted path (e.g. `ingest.chunking.target_tokens`) → one line.
+/// It is attached after the value as a `  # ...` suffix (this returns only the body, without the `#`).
+fn key_comment(path: &str) -> Option<&'static str> {
+    Some(match path {
+        "workspace.root" => "색인 루트. 절대/~/${VAR}/상대(=이 파일 기준).",
+        "workspace.exclude" => "denylist glob.",
+        "storage.copy_threshold_mb" => "이 크기(MB) 초과 파일은 사본 대신 참조.",
+        "models.embedding.provider" => "fastembed | candle | ollama | none.",
+        "models.embedding.dimensions" => "모델 출력 차원. 틀리면 검색 0건.",
+        "models.embedding.num_threads" => "candle 전용 CPU 스레드 cap(0=auto).",
+        "models.embedding.endpoint" => "ollama provider 시 HTTP. 비우면 llm.endpoint fallback.",
+        "models.llm.request_timeout_secs" => "단일 HTTP 상한. 0=즉시실패(비활성화 아님).",
+        "ingest.max_parallel_extractors" => "동시 extractor 수.",
+        "ingest.max_parallel_embeddings" => "동시 임베딩 수.",
+        "ingest.chunking.target_tokens" => "청크 목표 토큰(전 형식 공통).",
+        "ingest.chunking.respect_markdown_headings" => "markdown heading 경계 존중.",
+        "ingest.image.ocr.enabled" => "이미지 OCR(기본 off, asset 당 비용).",
+        "ingest.image.ocr.engine" => "ollama-vision | paddle-onnx.",
+        "ingest.image.ocr.model" => "ollama-vision 전용. paddle-onnx 는 번들 모델 사용(이 값 무시).",
+        "ingest.image.ocr.request_timeout_secs" => "0=즉시실패(비활성화 아님).",
+        "ingest.image.ocr.score_thresh" => "DBNet box 점수 하한(paddle).",
+        "ingest.image.ocr.unclip_ratio" => "box 패딩 비율(paddle).",
+        "ingest.image.ocr.max_boxes" => "이미지당 box cap(paddle).",
+        "ingest.image.caption.enabled" => "이미지 캡션(기본 off).",
+        "ingest.pdf.ocr.enabled" => "scanned PDF OCR(기본 off, page 당 비용).",
+        "ingest.pdf.ocr.always_on" => "true=모든 page vision 호출(dual-text).",
+        "ingest.pdf.ocr.engine" => "ollama-vision | paddle-onnx.",
+        "ingest.pdf.ocr.model" => "ollama-vision 전용. paddle-onnx 는 번들 모델 사용.",
+        "ingest.pdf.ocr.valid_ratio_threshold" => "유효문자 비율 < 이면 scanned 판정.",
+        "ingest.pdf.ocr.min_char_count" => "page 문자수 < 이면 auto-scanned.",
+        "ingest.pdf.ocr.request_timeout_secs" => "0=즉시실패(비활성화 아님).",
+        "rag.score_gate" => "검색 점수 게이트.",
+        "rag.nli_threshold" => "0=NLI 게이트 off.",
+        "search.default_k" => "기본 검색 결과 수.",
+        "ui.theme" => "dark | light.",
+        "logging.ingest_log_enabled" => "ingest 로그(기본 on).",
+        _ => return None,
+    })
+}
+
 /// The "complete" document: Config::defaults() serialized with comments attached.
 /// The single source of truth for init and migrate reconciliation.
 pub fn annotated_default_document() -> DocumentMut {
@@ -123,6 +164,12 @@ fn annotate_table(table: &mut toml_edit::Table, prefix_path: &str) {
                    sub.decor_mut().set_prefix(format!("\n{c}\n"));
                }
                annotate_table(sub, &path);
+            } else if let Some(kc) = key_comment(&path) {
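+                // Scalar/array leaf: inline comment suffix after the value. For arrays (exclude, etc.)
+                // the suffix lands after the closing `]` even when serialized multi-line, so it stays valid.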
+                if let Some(v) = item.as_value_mut() {
+                    v.decor_mut().set_suffix(format!("  # {kc}"));
+                }
            }
        }
    }
@@ -191,12 +238,152 @@ pub fn step_1_to_2(doc: &mut DocumentMut, changes: &mut Vec<MigrationChange>) {
    }
 }

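+/// Removes the last key of `from_path` wholesale (decor included) and inserts it at the
+/// dotted path `to_path` (intermediate tables are created automatically). If the target key
+/// already exists it is not overwritten (explicit user values win). No source means no-op (idempotent).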
+fn move_table(
+    doc: &mut DocumentMut,
+    from_path: &[&str],
+    to_path: &[&str],
+    changes: &mut Vec<MigrationChange>,
+) {
+    // Walk down to the parent of `from` and remove the last key.
+    let (from_parent, from_key) = from_path.split_at(from_path.len() - 1);
+    let mut cur = doc.as_table_mut();
+    for k in from_parent {
+        match cur.get_mut(k).and_then(Item::as_table_mut) {
+            Some(t) => cur = t,
+            None => return, // source missing → no-op.
+        }
+    }
+    let Some(item) = cur.remove(from_key[0]) else {
+        return;
+    };
+
+    // Get the parent table of the `to` path (create it if missing), then insert at the last key.
+    let (to_parent, to_key) = to_path.split_at(to_path.len() - 1);
+    let mut cur = doc.as_table_mut();
+    for k in to_parent {
+        if cur.get(k).is_none() {
+            cur.insert(k, Item::Table(toml_edit::Table::new()));
+        }
+        cur = cur
+            .get_mut(k)
+            .and_then(Item::as_table_mut)
+            .expect("just inserted");
+    }
+    if cur.get(to_key[0]).is_none() {
+        cur.insert(to_key[0], item);
+        changes.push(MigrationChange {
+            kind: ChangeKind::AddedSection,
+            path: to_path.join("."),
+            detail: format!("{} → {}", from_path.join("."), to_path.join(".")),
+        });
+    }
+}
+
+/// Moves the bare scalar keys of the old `[indexing]` into `[ingest]` (key by key,
+/// not the table itself). Keys already present at the target are not overwritten.
+fn move_indexing_keys(doc: &mut DocumentMut, changes: &mut Vec<MigrationChange>) {
+    let Some(idx) = doc.as_table_mut().remove("indexing") else {
+        return;
+    };
+    let Some(idx_tbl) = idx.as_table().cloned() else {
+        return;
+    };
+    if doc.get("ingest").is_none() {
+        doc["ingest"] = Item::Table(toml_edit::Table::new());
+    }
+    let ingest = doc["ingest"].as_table_mut().expect("ingest table");
+    for (k, v) in idx_tbl.iter() {
+        if ingest.get(k).is_none() {
+            ingest.insert(k, v.clone());
+        }
+    }
+    changes.push(MigrationChange {
+        kind: ChangeKind::AddedKey,
+        path: "ingest".to_string(),
+        detail: "indexing → ingest (병렬도 키)".to_string(),
+    });
+}
+
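+/// v3: preserve pdf paddle behavior. In v2, pdf paddle borrowed the model paths from
+/// `[image.ocr]`. After relocation, the actual values of image.ocr's six paddle keys are copied
+/// to the symmetric pdf.ocr keys (keys pdf already sets are not overwritten; only when pdf uses paddle).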
+fn copy_image_paddle_to_pdf(doc: &mut DocumentMut) {
+    const PADDLE_KEYS: [&str; 6] = [
+        "det_model",
+        "rec_model",
+        "dict",
+        "score_thresh",
+        "unclip_ratio",
+        "max_boxes",
+    ];
+    let img = doc
+        .get("ingest")
+        .and_then(|i| i.get("image"))
+        .and_then(|i| i.get("ocr"))
+        .and_then(Item::as_table)
+        .cloned();
+    let Some(img) = img else {
+        return;
+    };
+    let pdf_is_paddle = doc
+        .get("ingest")
+        .and_then(|i| i.get("pdf"))
+        .and_then(|i| i.get("ocr"))
+        .and_then(|o| o.get("engine"))
+        .and_then(Item::as_str)
+        == Some("paddle-onnx");
+    if !pdf_is_paddle {
+        return;
+    }
+    let Some(pdf) = doc["ingest"]["pdf"]["ocr"].as_table_mut() else {
+        return;
+    };
+    for k in PADDLE_KEYS {
+        if pdf.get(k).is_none() {
+            if let Some(v) = img.get(k) {
+                pdf.insert(k, v.clone());
+            }
+        }
+    }
+}
+
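+/// v2 → v3: relocate the media tables under `[ingest.*]` (values and comments preserved) and
+/// preserve pdf paddle values. Idempotent (on an already-v3 file the source tables are absent, so every step is a no-op).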
+pub fn step_2_to_3(doc: &mut DocumentMut, changes: &mut Vec<MigrationChange>) {
+    move_indexing_keys(doc, changes);
+    move_table(doc, &["chunking"], &["ingest", "chunking"], changes);
+    move_table(doc, &["image", "ocr"], &["ingest", "image", "ocr"], changes);
+    move_table(
+        doc,
+        &["image", "caption"],
+        &["ingest", "image", "caption"],
+        changes,
+    );
+    move_table(doc, &["pdf", "ocr"], &["ingest", "pdf", "ocr"], changes);
+
+    // Remove the now-empty [image] / [pdf] shells.
+    for empty in ["image", "pdf"] {
+        if let Some(t) = doc.get(empty).and_then(Item::as_table) {
+            if t.is_empty() {
+                doc.as_table_mut().remove(empty);
+            }
+        }
+    }
+
+    copy_image_paddle_to_pdf(doc);
+}
+
 /// Apply steps from the file's schema_version (1 if absent) up to CURRENT.
 fn run_steps(doc: &mut DocumentMut, from: u32, changes: &mut Vec<MigrationChange>) {
    if from < 2 {
        step_1_to_2(doc, changes);
    }
-    // Future step: if from < 3 { step_2_to_3(...) } ...
+    if from < 3 {
+        step_2_to_3(doc, changes);
+    }
 }

 /// Given the user's config.toml text: step chain + reconciliation + version
@@ -250,16 +437,35 @@ pub fn migrate_document(text: &str) -> MigrationOutcome {
 mod tests {
    use super::*;

+    #[test]
+    fn annotated_default_has_per_key_comments() {
+        let text = annotated_default_document().to_string();
+        // Representative keys carry inline comments.
+        assert!(text.contains("# 색인 루트"), "workspace.root comment missing:\n{text}");
+        assert!(text.contains("0=즉시실패"), "request_timeout comment missing:\n{text}");
+        assert!(
+            text.contains("paddle-onnx 는 번들 모델"),
+            "ocr.model comment missing:\n{text}"
+        );
+        // Adding comments does not break parsing.
+        let back: crate::Config = toml::from_str(&text).expect("parse annotated default");
+        assert_eq!(back, crate::Config::defaults());
+    }
+
    #[test]
    fn annotated_default_has_all_sections_and_parses_back_to_defaults() {
        let doc = annotated_default_document();
        let text = doc.to_string();
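-        // PdfCfg/ImageCfg/ModelsCfg/IngestCfg have no scalar fields, so a bare
-        // `[pdf]` etc. is not emitted; only sub-tables such as `[pdf.ocr]` are serialized.
+        // v3: every media format section is consolidated under `[ingest.*]`. IngestCfg
+        // has scalar (parallelism) fields, so a bare `[ingest]` is serialized together
+        // with its sub-tables.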
        for section in [
            "[workspace]",
+            "[ingest]",
+            "[ingest.chunking]",
            "[ingest.code]",
-            "[pdf.ocr]",
+            "[ingest.image.ocr]",
+            "[ingest.pdf.ocr]",
            "[logging]",
            "[ui]",
        ] {
@@ -382,6 +588,66 @@ include = [\"*.md\"]
        assert_eq!(again.new_text, outcome.new_text);
    }

+    fn changes_after_second_pass(text: &str) -> Vec<MigrationChange> {
+        let mut doc: DocumentMut = text.parse().unwrap();
+        let mut ch = Vec::new();
+        step_2_to_3(&mut doc, &mut ch);
+        ch
+    }
+
+    #[test]
+    fn step_2_to_3_relocates_media_tables() {
+        let v2 = "\
+schema_version = 2
+
+[indexing]
+max_parallel_extractors = 4
+watch_filesystem = true
+
+[chunking]
+target_tokens = 700
+
+[image.ocr]
+enabled = true
+engine = \"paddle-onnx\"
+det_model = \"/custom/det.onnx\"
+
+[image.caption]
+enabled = true
+
+[pdf.ocr]
+enabled = false
+engine = \"paddle-onnx\"
+";
+        let mut doc: DocumentMut = v2.parse().unwrap();
+        let mut changes = Vec::new();
+        step_2_to_3(&mut doc, &mut changes);
+        let out = doc.to_string();
+        // Present at the new locations.
+        assert!(out.contains("[ingest]"), "{out}");
+        assert!(out.contains("max_parallel_extractors = 4"));
+        assert!(out.contains("watch_filesystem = true"));
+        assert!(out.contains("[ingest.chunking]"));
+        assert!(out.contains("target_tokens = 700"));
+        assert!(out.contains("[ingest.image.ocr]"));
+        assert!(out.contains("det_model = \"/custom/det.onnx\""));
+        assert!(out.contains("[ingest.image.caption]"));
+        assert!(out.contains("[ingest.pdf.ocr]"));
+        // Removed from the old locations.
+        assert!(!out.contains("[indexing]"));
+        assert!(!out.contains("\n[chunking]"));
+        assert!(!out.contains("\n[image.ocr]"));
+        assert!(!out.contains("\n[image.caption]"));
+        assert!(!out.contains("\n[pdf.ocr]"));
+        // pdf paddle behavior preserved: the image paddle det_model is copied to the symmetric pdf key.
+        let reparsed: DocumentMut = out.parse().unwrap();
+        let pdf_det = reparsed["ingest"]["pdf"]["ocr"].get("det_model");
+        assert_eq!(pdf_det.and_then(|v| v.as_str()), Some("/custom/det.onnx"));
+        // Idempotent.
+        let again = changes_after_second_pass(&out);
+        assert!(again.is_empty(), "not idempotent: {again:?}");
+    }
+
    #[test]
    fn migrate_document_missing_schema_version_treated_as_v1() {
        let old = "[workspace]\nroot = \"/n\"\n";
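
Review note on `move_table` above. As written in the diff, the source item is removed from its parent before the target key is checked, so when the target already exists the removed item is not reinserted anywhere. The doc comment only promises "do not overwrite", so this may be intended, but it does mean a v2 table can disappear if a same-named `[ingest.*]` table is already present. A minimal sketch of a "check the destination first" variant follows. It uses a plain `BTreeMap` tree instead of `toml_edit` (so decor and comments are not modeled), and `Node`, `Table`, `get`, `dest_is_free`, `take`, and `move_entry` are all hypothetical names introduced for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a TOML item: a scalar or a nested table.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Value(String),
    Table(BTreeMap<String, Node>),
}

type Table = BTreeMap<String, Node>;

/// Read-only lookup along a dotted path.
fn get<'a>(root: &'a Table, path: &[&str]) -> Option<&'a Node> {
    let (last, parents) = path.split_last()?;
    let mut cur = root;
    for k in parents {
        match cur.get(*k)? {
            Node::Table(t) => cur = t,
            Node::Value(_) => return None,
        }
    }
    cur.get(*last)
}

/// True if `to` can be created without overwriting anything.
fn dest_is_free(root: &Table, to: &[&str]) -> bool {
    let Some((last, parents)) = to.split_last() else {
        return false;
    };
    let mut cur = root;
    for k in parents {
        match cur.get(*k) {
            None => return true, // missing parents will be created
            Some(Node::Table(t)) => cur = t,
            Some(Node::Value(_)) => return false, // a scalar blocks the path
        }
    }
    !cur.contains_key(*last)
}

/// Removes and returns the entry at `path`, if present.
fn take(root: &mut Table, path: &[&str]) -> Option<Node> {
    let (last, parents) = path.split_last()?;
    let mut cur = root;
    for k in parents {
        match cur.get_mut(*k) {
            Some(Node::Table(t)) => cur = t,
            _ => return None,
        }
    }
    cur.remove(*last)
}

/// Moves `from` to `to`. Returns false, leaving the tree untouched,
/// when the source is missing or the destination is occupied.
fn move_entry(root: &mut Table, from: &[&str], to: &[&str]) -> bool {
    if !dest_is_free(root, to) {
        return false;
    }
    let Some(item) = take(root, from) else {
        return false;
    };
    let (last, parents) = to.split_last().expect("dest_is_free rejects empty paths");
    let mut cur = root;
    for k in parents {
        let next = cur
            .entry(k.to_string())
            .or_insert_with(|| Node::Table(Table::new()));
        match next {
            Node::Table(t) => cur = t,
            Node::Value(_) => unreachable!("dest_is_free checked the parents"),
        }
    }
    cur.insert(last.to_string(), item);
    true
}
```

With this ordering a second pass is still a no-op (the source is gone), and an occupied destination leaves the old table in place instead of discarding it, at the cost of the migrated file still containing the legacy section in that case.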
--- a/crates/kebab-config/tests/fixtures/user_v2_config.toml
+++ b/crates/kebab-config/tests/fixtures/user_v2_config.toml
@@ -0,0 +1,149 @@
+# kebab config — `~/.config/kebab/config.toml`.
+#
+## `workspace.root` accepts:
+#   • absolute paths       (`/home/me/KnowledgeBase`)
+#   • tilde                (`~/KnowledgeBase`)         ← default
+#   • env vars             (`${XDG_DATA_HOME}/kebab`)
+#   • relative paths       (`./notes`, `notes`, `../shared/x`)
+#     — relative paths resolve against the directory of THIS
+#       config file, NOT the user's `cwd` at invocation time.
+#
+# Supported formats (chosen automatically by the extractor; cannot be set in config):
+#   • Markdown: .md
+#   • Images:   .png .jpg .jpeg  (OCR + caption)
+#   • PDF:      .pdf
+# Other extensions are skipped automatically at ingest with a warning. To ingest
+# only part of a folder, pass the root explicitly with `kebab ingest <path>`,
+# or denylist via a `.kebabignore` file / this `workspace.exclude`.
+#
+# Override individual keys at runtime with `KEBAB_*` env vars
+# (e.g. `KEBAB_WORKSPACE_ROOT=/tmp/test kebab ingest`).
+schema_version = 2
+
+[workspace]
+root = "/Users/user/Obsidian/Default"
+exclude = [
+    ".git/**",
+    "node_modules/**",
+    ".obsidian/**",
+]
+
+[storage]
+data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"
+sqlite = "{data_dir}/kebab.sqlite"
+vector_dir = "{data_dir}/lancedb"
+asset_dir = "{data_dir}/assets"
+artifact_dir = "{data_dir}/artifacts"
+model_dir = "{data_dir}/models"
+runs_dir = "{data_dir}/runs"
+copy_threshold_mb = 100
+
+[indexing]
+max_parallel_extractors = 2
+max_parallel_embeddings = 1
+watch_filesystem = false
+
+[chunking]
+target_tokens = 500
+overlap_tokens = 80
+respect_markdown_headings = true
+chunker_version = "md-heading-v1"
+
+[models.embedding]
+provider = "ollama"
+endpoint = "http://127.0.0.1:11434"
+# endpoint = "http://192.168.0.2:11943"
+model = "snowflake-arctic-embed2"
+# provider = "candle"
+# model    = "snowflake-arctic-embed-l-v2.0"
+version = "v1"
+dimensions = 1024
+batch_size = 64
+num_threads = 0
+
+[models.llm]
+provider = "ollama"
+model = "gemma4:e4b"
+context_tokens = 32768
+# endpoint = "http://127.0.0.1:11434"
+endpoint = "http://192.168.0.2:11943"
+temperature = 0.0
+seed = 0
+request_timeout_secs = 300
+
+# NLI(groundedness) 모델.
+[models.nli]
+model = "Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
+provider = "onnx"
+
+[search]
+default_k = 10
+hybrid_fusion = "rrf"
+rrf_k = 60
+snippet_chars = 220
+cache_capacity = 256
+stale_threshold_days = 30
+
+[rag]
+prompt_template_version = "rag-v3"
+score_gate = 0.30000001192092896
+explain_default = false
+max_context_tokens = 8000
+multi_hop_max_depth = 3
+multi_hop_max_sub_queries_per_iter = 5
+multi_hop_max_pool_chunks = 15
+nli_threshold = 0.0
+
+[image.ocr]
+enabled = true
+engine = "paddle-onnx"
+# engine = "ollama-vision"
+model = "gemma4:e4b"
+languages = [
+    "eng",
+    "kor",
+]
+max_pixels = 1600
+request_timeout_secs = 300
+
+[image.caption]
+enabled = true
+max_pixels = 768
+prompt_template_version = "caption-v1"
+
+[ui]
+theme = "dark"
+
+# code ingest skip 정책(.gitignore 자동 honor).
+[ingest.code]
+skip_generated_header = false
+max_file_bytes = 262144
+max_file_lines = 5000
+extra_skip_globs = []
+ast_chunk_max_lines = 200
+fallback_lines_per_chunk = 80
+fallback_lines_overlap = 20
+
+# scanned PDF page-단위 OCR(기본 off).
+[pdf.ocr]
+enabled = false
+always_on = false
+engine = "paddle-onnx"
+# engine = "ollama-vision"
+model = "qwen2.5vl:3b"
+languages = [
+    "eng",
+    "kor",
+]
+max_pixels = 2048
+request_timeout_secs = 180
+valid_ratio_threshold = 0.5
+min_char_count = 20
+lang_hint = "kor"
+
+# ingest 로그(기본 on, ~/.local/state/kebab/logs).
+[logging]
+ingest_log_enabled = true
+ingest_log_dir = "{state_dir}/logs"
+keep_recent_runs = 100
+retention_days = 30
--- a/crates/kebab-config/tests/migrate_v3.rs
+++ b/crates/kebab-config/tests/migrate_v3.rs
@@ -0,0 +1,48 @@
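+//! Lossless golden test for the v3 migration, using a real user v2 config.
+//!
+//! Invariant: values, comments, and commented-out alternative lines the user edited must all
+//! survive the [ingest.*] relocation, parse to the same values as a v3 Config, and re-running
+//! the migration must be idempotent.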
+use kebab_config::migrate::migrate_document;
+
+const USER_V2: &str = include_str!("fixtures/user_v2_config.toml");
+
+#[test]
+fn user_v2_migrates_losslessly() {
+    let out = migrate_document(USER_V2);
+    assert_eq!(out.from_schema_version, 2);
+    assert_eq!(out.to_schema_version, 3);
+    let t = &out.new_text;
+
+    // User values preserved.
+    assert!(t.contains("root = \"/Users/user/Obsidian/Default\""), "{t}");
+    assert!(t.contains("model = \"snowflake-arctic-embed2\""));
+    assert!(t.contains("endpoint = \"http://192.168.0.2:11943\""));
+    // User comments and commented-out alternative lines preserved.
+    assert!(t.contains("# engine = \"ollama-vision\""), "alternative comment lost:\n{t}");
+    assert!(t.contains("# provider = \"candle\""));
+    // New locations.
+    assert!(t.contains("[ingest.image.ocr]"));
+    assert!(t.contains("[ingest.pdf.ocr]"));
+    assert!(t.contains("[ingest.chunking]"));
+    assert!(t.contains("[ingest.image.caption]"));
+    // Old top-level locations removed.
+    assert!(!t.contains("\n[chunking]"));
+    assert!(!t.contains("\n[image.ocr]"));
+    assert!(!t.contains("\n[indexing]"));
+
+    // Parses as a v3 Config with identical values.
+    let cfg: kebab_config::Config = toml::from_str(t).expect("v3 parse");
+    assert!(cfg.ingest.image.ocr.enabled);
+    assert_eq!(cfg.ingest.image.ocr.engine, "paddle-onnx");
+    assert_eq!(cfg.models.embedding.model, "snowflake-arctic-embed2");
+    assert_eq!(cfg.models.llm.endpoint, "http://192.168.0.2:11943");
+    // pdf paddle values preserved (v2 asymmetry → copied to the symmetric pdf keys). The user's pdf.ocr
+    // has engine=paddle-onnx and no det_model of its own, so it stays on the bundled model (None).
+    assert_eq!(cfg.ingest.pdf.ocr.engine, "paddle-onnx");
+
+    // Idempotent.
+    let again = migrate_document(t);
+    assert!(!again.changed(), "re-run produced changes: {:?}", again.changes);
+    assert_eq!(again.new_text, *t);
+}
--- a/crates/kebab-config/tests/pdf_ocr.rs
+++ b/crates/kebab-config/tests/pdf_ocr.rs
@@ -47,20 +47,20 @@ lang_hint = "kor"
 #[test]
 fn pdf_ocr_defaults_off_with_qwen_3b() {
    let cfg = Config::defaults();
-    assert!(!cfg.pdf.ocr.enabled);
-    assert!(!cfg.pdf.ocr.always_on);
-    assert_eq!(cfg.pdf.ocr.engine, "ollama-vision");
-    assert_eq!(cfg.pdf.ocr.model, "qwen2.5vl:3b");
-    assert!(cfg.pdf.ocr.endpoint.is_none());
+    assert!(!cfg.ingest.pdf.ocr.enabled);
+    assert!(!cfg.ingest.pdf.ocr.always_on);
+    assert_eq!(cfg.ingest.pdf.ocr.engine, "ollama-vision");
+    assert_eq!(cfg.ingest.pdf.ocr.model, "qwen2.5vl:3b");
+    assert!(cfg.ingest.pdf.ocr.endpoint.is_none());
    assert_eq!(
-        cfg.pdf.ocr.languages,
+        cfg.ingest.pdf.ocr.languages,
        vec!["eng".to_string(), "kor".to_string()]
    );
-    assert_eq!(cfg.pdf.ocr.max_pixels, 2048);
-    assert_eq!(cfg.pdf.ocr.request_timeout_secs, 180); // Bug #11: 600 → 60 → 180 (HOTFIXES 2026-05-28)
-    assert!((cfg.pdf.ocr.valid_ratio_threshold - 0.5).abs() < 1e-6);
-    assert_eq!(cfg.pdf.ocr.min_char_count, 20);
-    assert_eq!(cfg.pdf.ocr.lang_hint.as_deref(), Some("kor"));
+    assert_eq!(cfg.ingest.pdf.ocr.max_pixels, 2048);
+    assert_eq!(cfg.ingest.pdf.ocr.request_timeout_secs, 180); // Bug #11: 600 → 60 → 180 (HOTFIXES 2026-05-28)
+    assert!((cfg.ingest.pdf.ocr.valid_ratio_threshold - 0.5).abs() < 1e-6);
+    assert_eq!(cfg.ingest.pdf.ocr.min_char_count, 20);
+    assert_eq!(cfg.ingest.pdf.ocr.lang_hint.as_deref(), Some("kor"));
 }

 // Test 3: env var override, the typical case with 4 keys.
@@ -80,12 +80,12 @@ fn pdf_ocr_env_overrides() {

    let cfg = Config::defaults().apply_env(&env);

-    assert!(cfg.pdf.ocr.enabled);
-    assert_eq!(cfg.pdf.ocr.model, "qwen2.5vl:7b");
-    assert!(cfg.pdf.ocr.always_on);
-    assert!((cfg.pdf.ocr.valid_ratio_threshold - 0.75).abs() < 1e-6);
+    assert!(cfg.ingest.pdf.ocr.enabled);
+    assert_eq!(cfg.ingest.pdf.ocr.model, "qwen2.5vl:7b");
+    assert!(cfg.ingest.pdf.ocr.always_on);
+    assert!((cfg.ingest.pdf.ocr.valid_ratio_threshold - 0.75).abs() < 1e-6);

    // Keys without an env override keep their defaults.
-    assert_eq!(cfg.pdf.ocr.engine, "ollama-vision");
-    assert_eq!(cfg.pdf.ocr.min_char_count, 20);
+    assert_eq!(cfg.ingest.pdf.ocr.engine, "ollama-vision");
+    assert_eq!(cfg.ingest.pdf.ocr.min_char_count, 20);
 }
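
Review note: the tests above rely on env override names staying stable while the assignment target moves under `ingest`. A minimal sketch of that pattern follows, assuming trimmed-down stand-in structs and a `HashMap`-based `apply_env`; the real `Config`, its whitelist, and its handling of unparsable values are not shown in this diff, so everything below is illustrative only.

```rust
use std::collections::HashMap;

// Hypothetical, trimmed-down stand-ins for the real config structs.
#[derive(Debug, Default, PartialEq)]
pub struct ChunkingCfg {
    pub target_tokens: u32,
}

#[derive(Debug, Default, PartialEq)]
pub struct IngestCfg {
    pub max_parallel_extractors: u32,
    pub chunking: ChunkingCfg,
}

#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub ingest: IngestCfg,
}

impl Config {
    /// Applies env overrides. The key strings keep their v2 names; only the
    /// field being assigned lives under `ingest` now.
    pub fn apply_env(mut self, env: &HashMap<String, String>) -> Self {
        for (key, value) in env {
            match key.as_str() {
                "KEBAB_CHUNKING_TARGET_TOKENS" => {
                    if let Ok(v) = value.parse() {
                        self.ingest.chunking.target_tokens = v;
                    }
                }
                "KEBAB_INDEXING_MAX_PARALLEL_EXTRACTORS" => {
                    if let Ok(v) = value.parse() {
                        self.ingest.max_parallel_extractors = v;
                    }
                }
                // Assumption of this sketch: unknown keys are ignored.
                _ => {}
            }
        }
        self
    }
}
```

Keeping the key string on the left and changing only the assigned field is what lets existing `KEBAB_*` scripts keep working after the struct move.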
--- a/crates/kebab-eval/src/runner.rs
+++ b/crates/kebab-eval/src/runner.rs
@@ -220,7 +220,7 @@ fn build_config_snapshot(cfg: &kebab_config::Config, eval_k: usize) -> Result<se
    Ok(serde_json::json!({
        "config": cfg_value,
        "eval_k": eval_k,
-        "chunker_version": cfg.chunking.chunker_version,
+        "chunker_version": cfg.ingest.chunking.chunker_version,
        "embedding": {
            "model": cfg.models.embedding.model,
            "version": cfg.models.embedding.version,
--- a/crates/kebab-parse-image/src/caption.rs
+++ b/crates/kebab-parse-image/src/caption.rs
@@ -18,7 +18,7 @@
 //!
 //! The original P6-3 spec asked for a cargo feature `caption` (default
 //! OFF at compile time). We collapse this into a single runtime gate
-//! (`config.image.caption.enabled = false`, default OFF). Reasoning:
+//! (`config.ingest.image.caption.enabled = false`, default OFF). Reasoning:
 //! the captioning module's only extra deps are `base64` + `image` +
 //! `kebab-llm` trait — all already pulled in by the rest of the
 //! crate. A cargo feature would only complicate the build matrix
@@ -50,13 +50,13 @@ const CAPTION_MAX_TOKENS: usize = 96;

 /// Run a caption pass and return the resulting `ModelCaption`.
 ///
-/// Pure raw operation — does **not** consult `config.image.caption.enabled`.
+/// Pure raw operation — does **not** consult `config.ingest.image.caption.enabled`.
 /// The runtime feature gate lives in [`apply_caption`]; this entry
 /// always invokes the LM. Tests pinning the produced `ModelCaption`
 /// shape can call this directly without flipping the config flag.
 ///
 /// Honours the `[MIN_CAPTION_LONG_EDGE, MAX_CAPTION_LONG_EDGE]` clamp
-/// on `config.image.caption.max_pixels` so a hostile config cannot
+/// on `config.ingest.image.caption.max_pixels` so a hostile config cannot
 /// blow up prompt cost.
 pub fn caption_image(
    llm: &dyn LanguageModel,
@@ -65,15 +65,16 @@ pub fn caption_image(
    cfg: &kebab_config::Config,
 ) -> Result<ModelCaption> {
    let max_pixels = cfg
+        .ingest
        .image
        .caption
        .max_pixels
        .clamp(MIN_CAPTION_LONG_EDGE, MAX_CAPTION_LONG_EDGE);
-    if max_pixels != cfg.image.caption.max_pixels {
+    if max_pixels != cfg.ingest.image.caption.max_pixels {
        tracing::warn!(
            target: "kebab-parse-image",
            "image.caption.max_pixels = {} clamped to {} (legal range [{}, {}])",
-            cfg.image.caption.max_pixels,
+            cfg.ingest.image.caption.max_pixels,
            max_pixels,
            MIN_CAPTION_LONG_EDGE,
            MAX_CAPTION_LONG_EDGE
@@ -129,7 +130,7 @@ pub fn caption_image(
    let caption_text = text.trim().to_string();

    let model_ref = llm.model_ref();
-    let prompt_v = &cfg.image.caption.prompt_template_version;
+    let prompt_v = &cfg.ingest.image.caption.prompt_template_version;
    let model_version = format!(
        "{provider}/{prompt}",
        provider = model_ref.provider,
@@ -151,7 +152,7 @@ pub fn caption_image(
    })
 }

-/// Pipeline entry point — gate-checks `config.image.caption.enabled`
+/// Pipeline entry point — gate-checks `config.ingest.image.caption.enabled`
 /// then mutates `block.caption` in place via [`caption_image`].
 ///
 /// When `enabled = false` the function is a clean no-op (returns
@@ -167,7 +168,7 @@ pub fn apply_caption(
    cfg: &kebab_config::Config,
    events: &mut Vec<ProvenanceEvent>,
 ) -> Result<()> {
-    if !cfg.image.caption.enabled {
+    if !cfg.ingest.image.caption.enabled {
        tracing::debug!(
            target: "kebab-parse-image",
            "captioning skipped — image.caption.enabled = false"
--- a/crates/kebab-parse-image/src/lib.rs
+++ b/crates/kebab-parse-image/src/lib.rs
@@ -34,7 +34,10 @@ pub mod paddle_onnx;

 pub use caption::{apply_caption, caption_image};
 pub use ocr::{OLLAMA_VISION_ENGINE, OcrEngine, OllamaVisionOcr, apply_ocr};
-pub use paddle_onnx::{ModelPaths, OnnxPaddleOcr, PADDLE_ONNX_ENGINE, engine_version_for_config};
+pub use paddle_onnx::{
+    ModelPaths, OnnxPaddleOcr, PADDLE_ONNX_ENGINE, engine_version_for_config,
+    engine_version_for_paths,
+};

 use anyhow::{Context, Result};
 use kebab_core::{
--- a/crates/kebab-parse-image/src/ocr.rs
+++ b/crates/kebab-parse-image/src/ocr.rs
@@ -39,7 +39,7 @@ use crate::image_prep;
 /// Engine name written into `OcrText.engine` for the Ollama-vision adapter.
 pub const OLLAMA_VISION_ENGINE: &str = "ollama-vision";

-/// Lower bound on `config.image.ocr.max_pixels`. Anything below this is
+/// Lower bound on `config.ingest.image.ocr.max_pixels`. Anything below this is
 /// silently bumped to keep the model from receiving an unreadable thumbnail.
 const MIN_LONG_EDGE: u32 = 256;

@@ -126,14 +126,14 @@ pub struct OllamaVisionOcr {

 impl OllamaVisionOcr {
    /// Build an adapter from a workspace [`kebab_config::Config`].
-    /// Reads `config.image.ocr.{model, endpoint, languages, max_pixels}`;
+    /// Reads `config.ingest.image.ocr.{model, endpoint, languages, max_pixels}`;
    /// when `endpoint` is empty falls back to `config.models.llm.endpoint`
    /// so the same Ollama host serves both LLM and OCR by default.
    ///
    /// Construction does NOT touch the network — the first HTTP call
    /// happens inside [`OcrEngine::recognize`].
    pub fn new(config: &kebab_config::Config) -> Result<Self> {
-        let ocr = &config.image.ocr;
+        let ocr = &config.ingest.image.ocr;
        let endpoint = match ocr.endpoint.as_deref() {
            Some(s) if !s.is_empty() => s.to_string(),
            _ => config.models.llm.endpoint.clone(),
--- a/crates/kebab-parse-image/src/paddle_onnx.rs
+++ b/crates/kebab-parse-image/src/paddle_onnx.rs
@@ -122,7 +122,7 @@ impl ModelPaths {
    /// [`from_default_dir`]: ModelPaths::from_default_dir
    pub fn from_config(config: &kebab_config::Config) -> Self {
        let defaults = Self::from_default_dir();
-        let ocr = &config.image.ocr;
+        let ocr = &config.ingest.image.ocr;
        Self {
            det: ocr.det_model.as_ref().map(PathBuf::from).unwrap_or(defaults.det),
            rec: ocr.rec_model.as_ref().map(PathBuf::from).unwrap_or(defaults.rec),
@@ -138,7 +138,7 @@ impl OnnxPaddleOcr {
    /// here are fail-fast (matches the Ollama adapter's construction contract).
    pub fn new(config: &kebab_config::Config) -> Result<Self> {
        let paths = ModelPaths::from_config(config);
-        let ocr = &config.image.ocr;
+        let ocr = &config.ingest.image.ocr;
        Self::from_paths(
            &paths,
            ocr.score_thresh,
@@ -474,6 +474,26 @@ pub fn engine_version_for_config(config: &kebab_config::Config) -> Result<String
    compute_engine_version(&ModelPaths::from_config(config))
 }

+/// v3: computes `engine_version` from explicit (det, rec, dict) overrides.
+/// Lets `ingest_config_signature` pass per-media paths (`[ingest.image.ocr]`
+/// for images, `[ingest.pdf.ocr]` for PDFs), removing the v2 asymmetry where
+/// pdf borrowed the image paddle paths. A `None` override falls back to the
+/// bundled model. Like `engine_version_for_config`, this reads ~17 MB, so
+/// callers should memoize per (det, rec, dict) triple.
+pub fn engine_version_for_paths(
+    det: Option<&str>,
+    rec: Option<&str>,
+    dict: Option<&str>,
+) -> Result<String> {
+    let defaults = ModelPaths::from_default_dir();
+    let paths = ModelPaths {
+        det: det.map(PathBuf::from).unwrap_or(defaults.det),
+        rec: rec.map(PathBuf::from).unwrap_or(defaults.rec),
+        dict: dict.map(PathBuf::from).unwrap_or(defaults.dict),
+    };
+    compute_engine_version(&paths)
+}
+
 /// blake3 over det + rec + dict bytes → stable `engine_version`.
 fn compute_engine_version(paths: &ModelPaths) -> Result<String> {
    let mut hasher = blake3::Hasher::new();
@@ -882,8 +902,8 @@ mod tests {
        assert!(def.dict.ends_with("korean_dict.txt"), "{:?}", def.dict);

        // Override det + dict; rec stays bundled (partial override allowed).
-        cfg.image.ocr.det_model = Some("/custom/det.onnx".to_string());
-        cfg.image.ocr.dict = Some("/custom/dict.txt".to_string());
+        cfg.ingest.image.ocr.det_model = Some("/custom/det.onnx".to_string());
+        cfg.ingest.image.ocr.dict = Some("/custom/dict.txt".to_string());
        let ov = ModelPaths::from_config(&cfg);
        assert_eq!(ov.det, PathBuf::from("/custom/det.onnx"));
        assert_eq!(ov.dict, PathBuf::from("/custom/dict.txt"));
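
Review note: the doc comment on `engine_version_for_paths` asks callers to memoize per (det, rec, dict) triple. One possible shape for such a cache is sketched below. `EngineVersionCache` and `PathTriple` are hypothetical names, the expensive computation is injected as a closure so the sketch stays self-contained, and the error type is simplified to `String` instead of the `anyhow` error used in the crate.

```rust
use std::collections::HashMap;

/// Cache key: the (det, rec, dict) override triple; `None` means "bundled".
type PathTriple = (Option<String>, Option<String>, Option<String>);

/// Hypothetical memoizer around an expensive engine-version computation.
pub struct EngineVersionCache<F>
where
    F: FnMut(Option<&str>, Option<&str>, Option<&str>) -> Result<String, String>,
{
    compute: F,
    cache: HashMap<PathTriple, String>,
}

impl<F> EngineVersionCache<F>
where
    F: FnMut(Option<&str>, Option<&str>, Option<&str>) -> Result<String, String>,
{
    pub fn new(compute: F) -> Self {
        Self { compute, cache: HashMap::new() }
    }

    /// Returns the cached version for this triple, computing it at most once.
    /// Errors are not cached, so a later call retries the computation.
    pub fn get(
        &mut self,
        det: Option<&str>,
        rec: Option<&str>,
        dict: Option<&str>,
    ) -> Result<String, String> {
        let key: PathTriple = (
            det.map(str::to_owned),
            rec.map(str::to_owned),
            dict.map(str::to_owned),
        );
        if let Some(hit) = self.cache.get(&key) {
            return Ok(hit.clone());
        }
        let version = (self.compute)(det, rec, dict)?;
        self.cache.insert(key, version.clone());
        Ok(version)
    }
}
```

With a cache like this, image and pdf signatures that share the same overrides would hash the model files once, while differing overrides get separate entries.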
--- a/crates/kebab-parse-image/tests/caption.rs
+++ b/crates/kebab-parse-image/tests/caption.rs
@@ -22,8 +22,8 @@ use crate::common::red_100x50_png;

 fn cfg_with_caption_enabled() -> Config {
    let mut cfg = Config::defaults();
-    cfg.image.caption.enabled = true;
-    cfg.image.caption.max_pixels = 512;
+    cfg.ingest.image.caption.enabled = true;
+    cfg.ingest.image.caption.max_pixels = 512;
    cfg
 }

@@ -67,7 +67,7 @@ fn mk_mock(canned: &str) -> MockLanguageModel {
 #[test]
 fn apply_caption_no_op_when_feature_disabled() {
    let mut cfg = Config::defaults();
-    cfg.image.caption.enabled = false;
+    cfg.ingest.image.caption.enabled = false;
    let mock = mk_mock("ignored");
    let mut block = empty_image_block();
    let mut events: Vec<ProvenanceEvent> = Vec::new();
@@ -292,8 +292,8 @@ fn caption_image_deterministic_with_identical_inputs() {
 #[test]
 fn caption_image_clamps_oversized_max_pixels() {
    let mut cfg = Config::defaults();
-    cfg.image.caption.enabled = true;
-    cfg.image.caption.max_pixels = 99_999; // way over MAX_CAPTION_LONG_EDGE
+    cfg.ingest.image.caption.enabled = true;
+    cfg.ingest.image.caption.max_pixels = 99_999; // way over MAX_CAPTION_LONG_EDGE
    let captured_images: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));
    let mock = CapturingMock {
        captured_system: Arc::new(Mutex::new(None)),
@@ -339,8 +339,8 @@ fn caption_integration_real_ollama_describes_image() {
    use kebab_llm_local::OllamaLanguageModel;

    let mut cfg = Config::defaults();
-    cfg.image.caption.enabled = true;
-    cfg.image.caption.max_pixels = 768;
+    cfg.ingest.image.caption.enabled = true;
+    cfg.ingest.image.caption.max_pixels = 768;
    if let Ok(ep) = std::env::var("KEBAB_MODELS_LLM_ENDPOINT") {
        cfg.models.llm.endpoint = ep;
    } else {
--- a/crates/kebab-parse-image/tests/ocr.rs
+++ b/crates/kebab-parse-image/tests/ocr.rs
@@ -19,10 +19,10 @@ use crate::common::red_100x50_png;

 fn cfg_for_endpoint(endpoint: &str) -> Config {
    let mut cfg = Config::defaults();
-    cfg.image.ocr.endpoint = Some(endpoint.to_string());
-    cfg.image.ocr.model = "gemma4:e4b".to_string();
-    cfg.image.ocr.languages = vec!["eng".to_string(), "kor".to_string()];
-    cfg.image.ocr.max_pixels = 1024;
+    cfg.ingest.image.ocr.endpoint = Some(endpoint.to_string());
+    cfg.ingest.image.ocr.model = "gemma4:e4b".to_string();
+    cfg.ingest.image.ocr.languages = vec!["eng".to_string(), "kor".to_string()];
+    cfg.ingest.image.ocr.max_pixels = 1024;
    cfg
 }

@@ -375,9 +375,9 @@ async fn ocr_integration_real_ollama_transcribes_text() {
    };
    let cfg = {
        let mut c = Config::defaults();
-        c.image.ocr.endpoint = Some(endpoint);
-        c.image.ocr.model = model;
-        c.image.ocr.max_pixels = 1024;
+        c.ingest.image.ocr.endpoint = Some(endpoint);
+        c.ingest.image.ocr.model = model;
+        c.ingest.image.ocr.max_pixels = 1024;
        c
    };
    let text = tokio::task::spawn_blocking(move || run_recognize(cfg, bytes, None))
--- a/docs/SMOKE.md
+++ b/docs/SMOKE.md
@@ -95,12 +95,14 @@ model_dir = "{data_dir}/models"
 runs_dir = "{data_dir}/runs"
 copy_threshold_mb = 100

-[indexing]
+# v0.28.0: umbrella for all per-format ingest settings. Parallelism (formerly [indexing]) becomes [ingest] scalars,
+# and chunking/code/image/pdf move under [ingest.*]. Old v2 files are converted automatically at load.
+[ingest]
 max_parallel_extractors = 2
 max_parallel_embeddings = 1
 watch_filesystem = false

-[chunking]
+[ingest.chunking]
 target_tokens = 500
 overlap_tokens = 80
 respect_markdown_headings = true
@@ -329,7 +331,7 @@ MCP tool 동등:
 [workspace]
 include = ["**/*.md", "**/*.png", "**/*.jpg"]

-[image.ocr]
+[ingest.image.ocr]
 enabled = true                        # transcribe text inside images with a vision LM
 engine = "ollama-vision"
 model = "gemma4:e4b"                  # a vision model available in your environment
@@ -337,12 +339,12 @@ endpoint = "http://192.168.0.47:11434"  # 비우면 models.llm.endpoint fallback
 languages = ["eng", "kor"]
 max_pixels = 1600                     # long-edge cap

-[image.caption]
+[ingest.image.caption]
 enabled = true                        # generate a one-sentence objective description with a vision LM
 max_pixels = 768
 prompt_template_version = "caption-v1"

-[pdf.ocr]
+[ingest.pdf.ocr]
 enabled = true               # enable the OCR path for the smoke test (manual invoke)
 always_on = false
 engine = "ollama-vision"
@@ -358,13 +360,13 @@ lang_hint = "kor"

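 Each image asset costs one OCR call plus one caption call, roughly 3-6 seconds with `gemma4:e4b`. Recommended for workspaces made up mostly of diagrams, camera photos, and screenshots. For books and scans, use the P7 PDF pipeline.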

-**v0.27.0: paddle-onnx engine (offline, no Ollama needed).** Setting `[image.ocr] engine = "paddle-onnx"` runs PP-OCRv5 ONNX in-process (no remote vision LM; under 4 s on CPU for a large page). To turn off embedding as well, set `[models.embedding] provider = "none"` (lexical-only); the whole OCR→FTS5 search path can then be smoke-tested without Ollama:
+**v0.27.0: paddle-onnx engine (offline, no Ollama needed).** Setting `[ingest.image.ocr] engine = "paddle-onnx"` runs PP-OCRv5 ONNX in-process (no remote vision LM; under 4 s on CPU for a large page). To turn off embedding as well, set `[models.embedding] provider = "none"` (lexical-only); the whole OCR→FTS5 search path can then be smoke-tested without Ollama:

 ```toml
 [models.embedding]
 provider = "none"            # lexical-only, no Ollama needed

-[image.ocr]
+[ingest.image.ocr]
 enabled = true
 engine = "paddle-onnx"       # PP-OCRv5 ONNX in-process (no Python, no remote calls)
 model = "ppocrv5-mobile-kor"
--- a/docs/release-notes/v0.28.0-draft.md
+++ b/docs/release-notes/v0.28.0-draft.md
@@ -0,0 +1,66 @@
+---
+title: kebab v0.28.0 release notes (draft)
+created: 2026-06-04
+status: draft
+release_trigger:
+  - config schema v2→v3 reorganization (config layout rename + new env keys), pre-1.0 minor bump
+  - change to a frozen design (2026-06-04-config-schema-reorg-design)
+---
+
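+# kebab v0.28.0: config schema v2→v3, unified media ingest
+
+A minor release following v0.27.0 (PP-OCRv5 ONNX OCR). The media-format
+settings that were scattered across `config.toml` now live under a single
+**`[ingest.*]` umbrella**. **Existing users do not need to change anything**:
+old v2 files are converted in memory at load, and search and index results are byte-for-byte identical (zero re-indexing).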
+
+---
+
+## What changed
+
+Media-format settings moved from the top level to `[ingest.*]`.
+
+| v2 (top-level) | v3 (`[ingest.*]`) |
+|---|---|
+| `[indexing]` (scalars) | `[ingest]` scalars (`max_parallel_extractors`, etc.) |
+| `[chunking]` | `[ingest.chunking]` |
+| `[image.ocr]` | `[ingest.image.ocr]` |
+| `[image.caption]` | `[ingest.image.caption]` |
+| `[pdf.ocr]` | `[ingest.pdf.ocr]` |
+| `[ingest.code]` | `[ingest.code]` (unchanged) |
+
+As a side effect, `[ingest.pdf.ocr]` can now hold its own paddle-onnx model path and tuning keys
+(`det_model`/`rec_model`/`dict`/`score_thresh`/`unclip_ratio`/`max_boxes`);
+v2 borrowed the image.ocr values. Six new `KEBAB_PDF_OCR_*` env keys were added.
+
+## Trade-off
+
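+This is a non-additive **rename** migration (the first one), so old section names
+left in place could be silently ignored by serde as unknown keys. Two safeguards
+cover this: (1) `Config::from_file` converts to v3 in memory at load, so even an
+unmigrated v2 file loses no settings; (2) `kebab config migrate` rewrites the file on disk
+into the new layout, preserving every value, comment, and commented-out alternative line, and is idempotent.
+
+Env override names (e.g. `KEBAB_CHUNKING_TARGET_TOKENS`,
+`KEBAB_INDEXING_MAX_PARALLEL_EXTRACTORS`) are **kept as-is**: only the assignment
+target moves to the new path, so existing `KEBAB_*` scripts need no changes.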
+
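+## Mitigation: zero re-indexing guaranteed
+
+`ingest_config_signature` is value-based, so its output string stays
+**byte-identical** to v2 even though the struct paths changed. The paddle model
+paths are now parameters passed by the caller per media type (image/pdf), and the
+migration preserves v2's image↔pdf paddle asymmetry by copying values. In dogfooding
+(v0.28.0 release build), after a first ingest with a v2 config, both (a) re-ingest with
+the same v2 config (auto-converted) and (b) re-ingest after `config migrate` to v3 reported
+`new=0 updated=0 unchanged=2`, demonstrating that upgrading triggers no re-indexing.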
+
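+## Upgrade procedure
+
+1. Nothing is required: the old `config.toml` loads as-is (converted automatically,
+   disk untouched, with a one-time warning).
+2. To move the file to the new layout, run `kebab config migrate` (automatic `.bak` backup,
+   `--dry-run` to preview). `kebab doctor` tells you when an update is needed.
+3. A config newly created by `kebab init` uses the v3 layout with per-option comments.
+
+Search and RAG results and the index are unchanged.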
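
Review note: the release notes rely on the migration copying image paddle values into the pdf table without overwriting anything the user set for pdf. A minimal sketch of that copy-if-absent rule follows, assuming plain `BTreeMap`s as stand-ins for TOML tables. `copy_paddle_if_absent` is a hypothetical name and is not the `copy_image_paddle_to_pdf` function from this PR, which operates on `toml_edit` documents.

```rust
use std::collections::BTreeMap;

/// The six paddle keys that v2 PDF OCR implicitly borrowed from `[image.ocr]`.
const PADDLE_KEYS: [&str; 6] = [
    "det_model", "rec_model", "dict", "score_thresh", "unclip_ratio", "max_boxes",
];

/// Stand-in for a TOML table: key -> raw value text.
type Table = BTreeMap<String, String>;

/// Copies paddle values from the image table into the pdf table, skipping keys
/// the pdf table already sets. Returns the keys that were copied.
pub fn copy_paddle_if_absent(image_ocr: &Table, pdf_ocr: &mut Table) -> Vec<&'static str> {
    let mut copied = Vec::new();
    for key in PADDLE_KEYS {
        if pdf_ocr.contains_key(key) {
            continue; // an explicit pdf value wins
        }
        if let Some(value) = image_ocr.get(key) {
            pdf_ocr.insert(key.to_string(), value.clone());
            copied.push(key);
        }
    }
    copied
}
```

Because existing pdf keys are skipped, running the function a second time copies nothing, which matches the idempotence the migration promises.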
--- a/docs/superpowers/plans/2026-06-04-config-schema-reorg.md
+++ b/docs/superpowers/plans/2026-06-04-config-schema-reorg.md
--- a/docs/superpowers/specs/2026-05-31-config-migration-design.md
+++ b/docs/superpowers/specs/2026-05-31-config-migration-design.md
@@ -266,3 +266,6 @@ config load 체크 직후 `config_migration` 체크 1개 추가:
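 - Difference from the kickoff handoff document: kickoff §4.2 proposed only a "chain of per-version conversion functions",
   but given kebab's serde-default behavior, additive changes are a poor fit for steps (they are version-independent), so
   **reconciliation is promoted to a first-class mechanism** and steps are reserved for non-additive changes.
+- The 2026-06-04 v3 reorganization (the first non-additive rename) added `step_2_to_3` (relocating
+  the media tables under `[ingest.*]`) plus in-memory auto-conversion at `Config::from_file` load time; see
+  `docs/superpowers/specs/2026-06-04-config-schema-reorg-design.md`.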
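
Review note: the load-time auto-conversion needs a cheap decision on whether a file must go through `migrate_document` first. A rough sketch of that check follows, assuming a line scan instead of a real TOML parse. `needs_in_memory_migration` and `declared_schema_version` are hypothetical helpers, and treating a missing `schema_version` as "not outdated" is an assumption of this sketch, not something the PR states.

```rust
/// Current schema version after this change.
const CURRENT_SCHEMA_VERSION: u32 = 3;

/// Table headers that only exist in the v2 layout.
const LEGACY_HEADERS: [&str; 5] =
    ["[indexing]", "[chunking]", "[image.ocr]", "[image.caption]", "[pdf.ocr]"];

/// Reads `schema_version = N` from the top-level part of the file, if present.
fn declared_schema_version(text: &str) -> Option<u32> {
    for line in text.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            break; // top-level keys end at the first table header
        }
        if let Some(rest) = line.strip_prefix("schema_version") {
            let value = rest.trim_start().strip_prefix('=')?;
            // Drop a trailing inline comment before parsing.
            let value = value.split('#').next().unwrap_or("").trim();
            return value.parse().ok();
        }
    }
    None
}

/// True when the text should go through the in-memory migration first.
/// Simplification: headers followed by an inline comment are not detected.
pub fn needs_in_memory_migration(text: &str) -> bool {
    let outdated =
        declared_schema_version(text).map_or(false, |v| v < CURRENT_SCHEMA_VERSION);
    let has_legacy_table = text
        .lines()
        .any(|line| LEGACY_HEADERS.contains(&line.trim()));
    outdated || has_legacy_table
}
```

Checking for legacy headers in addition to the version number covers files that never carried a `schema_version` key but still use the v2 layout.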
--- a/docs/superpowers/specs/2026-06-04-config-schema-reorg-design.md
+++ b/docs/superpowers/specs/2026-06-04-config-schema-reorg-design.md
@@ -0,0 +1,297 @@
+# config schema reorganization (v2 → v3): per-media `[ingest]` consolidation + per-option comments
+
+- Status: design finalized (brainstorming complete)
+- Date: 2026-06-04
+- Prerequisites: `docs/superpowers/specs/2026-05-31-config-migration-design.md` (migration engine), `#197` (engine), `#198` (`kebab config migrate` surface)
+- Affected crates: `kebab-config` (schema + migration), `kebab-app` (call-site sweep + signature), `kebab-eval` (config_snapshot), `kebab-cli` (`config migrate`/`init` output)
+- contract_sections: design §6 (Config schema / XDG), §9 (versioning cascade: signature invariance guaranteed)
+
+## 1. Motivation
+
+As options accumulated, `config.toml` (13 sections / ~60 fields) picked up the following cruft:
+
+1. **OCR duplication and asymmetry**: `[image.ocr]` and `[pdf.ocr]` duplicate `enabled/engine/model/endpoint/languages/max_pixels/request_timeout_secs` almost verbatim. On top of that, the paddle-onnx keys (`det_model`/`rec_model`/`dict`/`score_thresh`/`unclip_ratio`/`max_boxes`) exist only in `[image.ocr]`, and the PDF paddle path reads them from there (`kebab-app/src/lib.rs:3102`: `ocr_engine_version_for_sig` reads `config.image.ocr`). This is a hidden asymmetry: a pdf setting that has to be looked up under image.
+2. **Per-media settings are scattered**: images in `[image]`, PDF in `[pdf]`, code in `[ingest.code]`, chunking in `[chunking]`. There is no rule for where the settings of format X live.
+3. **`endpoint` duplicated four times**: `models.llm`/`models.embedding`/`image.ocr`/`pdf.ocr`. The rule "empty falls back to `models.llm.endpoint`" exists only in code and is not visible in the file. (However, **per-component endpoints are in real use**, e.g. local embedding with a remote llm, so they must not be merged.)
+4. **`request_timeout_secs` duplicated three times**, each with the "`0` does not mean disabled" pitfall.
+5. **`kebab init` emits all 60+ fields at once**, while users actually touch only `workspace.root`, the endpoint, and model names.
+6. Further observations from a real user file: `score_gate = 0.30000001192092896` (f32→f64 serialization residue), and a dead field where `model="gemma4:e4b"` remains although `engine="paddle-onnx"`.
+
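+## 2. Goals / Non-goals
+
+**Goals**
+
+- Place media-format settings consistently under a single `[ingest.*]` umbrella (a future format is added in one place, `[ingest.<format>]`).
+- Remove the OCR asymmetry: image and pdf **each own their full OCR settings, paddle paths included** (fully symmetric).
+- **Lossless conversion**: preserve every value, comment, ordering, and user-written commented-out alternative line of an existing v2 file.
+- **Per-option comments**: attach a one-line description next to each key in `kebab init` output and on newly added keys.
+- **Zero unnecessary re-indexing** on upgrade (parser_version signature unchanged).
+- Env override names **remain unbroken**.
+
+**Non-goals (YAGNI)**
+
+- Semantic validation of config values (range checks, etc.): separate work.
+- Downgrade (v3→v2).
+- Hiding or reducing knobs: explicitly excluded (the user chose a "complete conversion"). Keep the full set of options, well documented.
+- Merging endpoints: per-component overrides stay (in real use).
+- **Writing the file automatically at load**: still a non-goal (inherited from the 2026-05-31 spec). The *in-memory* conversion of §5.3 is not a write, so it is allowed separately.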
+
+## 3. New schema (v3)
+
+Shape of the `kebab init` output with per-option comments attached (values are the defaults):
+
+```toml
+# kebab config — `~/.config/kebab/config.toml`.
+# (header: workspace.root path rules / supported formats / KEBAB_* overrides, carried over from the existing header)
+schema_version = 3
+
+# Workspace to index.
+[workspace]
+root = "~/KnowledgeBase"      # Index root. Absolute, ~, ${VAR}, or relative (to this file).
+exclude = [".git/**", "node_modules/**", ".obsidian/**"]  # denylist glob.
+
+# XDG storage paths (data/sqlite/vectors/assets/models).
+[storage]
+data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"  # Root for all outputs.
+sqlite = "{data_dir}/kebab.sqlite"   # Metadata + FTS5 DB.
+vector_dir = "{data_dir}/lancedb"    # Embedding vector store.
+asset_dir = "{data_dir}/assets"      # Copies of originals (_external, etc.).
+artifact_dir = "{data_dir}/artifacts"
+model_dir = "{data_dir}/models"      # fastembed/candle/nli model cache.
+runs_dir = "{data_dir}/runs"         # eval run outputs.
+copy_threshold_mb = 100              # Files larger than this are referenced instead of copied.
+
+# Multilingual sentence embedding. A dimension mismatch yields zero search results.
+[models.embedding]
+provider = "fastembed"   # fastembed | candle | ollama | none.
+model = "multilingual-e5-large"
+version = "v1"           # Part of the model identity (cache key). Update it when you change the model.
+dimensions = 1024        # Model output dimension. A wrong value yields zero search results.
+batch_size = 64
+num_threads = 0          # candle-only CPU thread cap (0=auto). A lever for avoiding NUMA effects.
+# endpoint = "..."       # HTTP endpoint for the ollama provider. Empty falls back to models.llm.endpoint.
+
+# Ollama host:port + model.
+[models.llm]
+provider = "ollama"
+model = "gemma4:e4b"
+context_tokens = 32768
+endpoint = "http://127.0.0.1:11434"
+temperature = 0.0
+seed = 0
+request_timeout_secs = 300   # Cap on a single HTTP request. 0=fail immediately (not "disabled"). Raise it for large models on CPU.
+
+# NLI (groundedness) model.
+[models.nli]
+model = "Xenova/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
+provider = "onnx"
+
+# Indexing settings common to all formats (parallelism + filesystem watch).   ← formerly [indexing]
+[ingest]
+max_parallel_extractors = 2
+max_parallel_embeddings = 1
+watch_filesystem = false
+
+# Chunk size, overlap, heading handling (common to all formats: markdown/pdf/code/image).  ← formerly [chunking]
+[ingest.chunking]
+target_tokens = 500
+overlap_tokens = 80
+respect_markdown_headings = true
+chunker_version = "md-heading-v1"
+
+# code ingest skip policy (.gitignore is honored automatically).
+[ingest.code]
+skip_generated_header = true
+max_file_bytes = 262144
+max_file_lines = 5000
+extra_skip_globs = []
+ast_chunk_max_lines = 200
+fallback_lines_per_chunk = 80
+fallback_lines_overlap = 20
+
+# Image OCR (off by default, cost per asset).   ← formerly [image.ocr]
+[ingest.image.ocr]
+enabled = false
+engine = "ollama-vision"     # ollama-vision | paddle-onnx.
+model = "gemma4:e4b"         # ollama-vision only. paddle-onnx uses the bundled model (this value is ignored).
+languages = ["eng", "kor"]
+max_pixels = 1600
+request_timeout_secs = 300   # 0=fail immediately (not "disabled").
+# --- paddle-onnx only (applies when engine=paddle-onnx) ---
+# det_model = "..."          # Empty uses the bundled ppocrv5_mobile_det.onnx.
+# rec_model = "..."          # Empty uses the bundled korean rec model.
+# dict = "..."               # Empty uses the bundled korean_dict.txt.
+score_thresh = 0.3           # Lower bound on the DBNet box score.
+unclip_ratio = 1.5           # Box padding ratio.
+max_boxes = 1000             # Per-image box cap (runaway guard).
+
+# Image captions (off by default).   ← formerly [image.caption]
+[ingest.image.caption]
+enabled = false
+max_pixels = 768
+prompt_template_version = "caption-v1"
+
+# Per-page OCR for scanned PDFs (off by default, cost per page).   ← formerly [pdf.ocr]
+[ingest.pdf.ocr]
+enabled = false
+always_on = false            # true=call vision on every page (dual text for vector PDFs).
+engine = "ollama-vision"     # ollama-vision | paddle-onnx.
+model = "qwen2.5vl:3b"       # ollama-vision only. paddle-onnx uses the bundled model.
+languages = ["eng", "kor"]
+max_pixels = 2048
+request_timeout_secs = 180   # 0=fail immediately (not "disabled").
+valid_ratio_threshold = 0.5  # A valid-character ratio below this marks the page as scanned → OCR fallback.
+min_char_count = 20          # A page with fewer characters than this is treated as scanned automatically.
+lang_hint = "kor"            # Language hint for a single page (empty means none).
+# --- paddle-onnx only (new, symmetric with image) ---
+# det_model / rec_model / dict = "..."   # Empty uses the bundled models.
+score_thresh = 0.3
+unclip_ratio = 1.5
+max_boxes = 1000
+
+# Search: default k, stale threshold, fusion.
+[search]
+default_k = 10
+hybrid_fusion = "rrf"
+rrf_k = 60
+snippet_chars = 220
+cache_capacity = 256
+stale_threshold_days = 30
+
+# Answer generation: prompt template, score gate, multi-hop, NLI.
+[rag]
+prompt_template_version = "rag-v3"
+score_gate = 0.3             # Serialized cleanly via a serialize_with helper (removes the old f32 residue).
+explain_default = false
+max_context_tokens = 8000
+multi_hop_max_depth = 3
+multi_hop_max_sub_queries_per_iter = 5
+multi_hop_max_pool_chunks = 15
+nli_threshold = 0.0          # 0=NLI gate off.
+
+# TUI palette.
+[ui]
+theme = "dark"
+
+# ingest log (on by default).
+[logging]
+ingest_log_enabled = true
+ingest_log_dir = "{state_dir}/logs"
+keep_recent_runs = 100
+retention_days = 30
+```
+
+## 4. Field mapping (v2 → v3)
+
+| v2 location | v3 location | Notes |
+|---------|---------|------|
+| `[workspace]` `[storage]` `[search]` `[rag]` `[ui]` `[logging]` `[models.*]` | same | unchanged |
+| `[indexing].*` (3 keys) | `[ingest].*` (bare keys) | `IndexingCfg` dissolved → `IngestCfg` scalars |
+| `[chunking]` | `[ingest.chunking]` | deliberately not named `markdown` (common to all formats) |
+| `[ingest.code]` | `[ingest.code]` | already nested, not moved |
+| `[image.ocr]` | `[ingest.image.ocr]` | same keys |
+| `[image.caption]` | `[ingest.image.caption]` | same keys |
+| `[pdf.ocr]` | `[ingest.pdf.ocr]` | same keys + **6 new symmetric paddle keys** (`det_model`/`rec_model`/`dict`/`score_thresh`/`unclip_ratio`/`max_boxes`) |
+
+The new keys (pdf paddle symmetry) are all `#[serde(default)]` + `Option`/default value → harmless when absent from a v2 file.
+
+## 5. Rust structure changes (`kebab-config/src/lib.rs`)
+
+### 5.1 Structs
+
+```rust
+pub struct Config {
+    pub schema_version: u32,
+    pub workspace: WorkspaceCfg,
+    pub storage: StorageCfg,
+    pub models: ModelsCfg,
+    pub ingest: IngestCfg,   // ← absorbs indexing/chunking/image/pdf
+    pub search: SearchCfg,
+    pub rag: RagCfg,
+    pub ui: UiCfg,
+    pub logging: LoggingCfg,
+    #[serde(skip)] source_dir: Option<PathBuf>,
+}
+
+pub struct IngestCfg {
+    // ← formerly IndexingCfg (scalars first: TOML serialization emits scalars before tables)
+    pub max_parallel_extractors: u32,
+    pub max_parallel_embeddings: u32,
+    pub watch_filesystem: bool,
+    // Sub-tables
+    pub chunking: ChunkingCfg,
+    pub code: IngestCodeCfg,
+    pub image: ImageCfg,     // { ocr: OcrCfg, caption: CaptionCfg }
+    pub pdf: PdfCfg,         // { ocr: PdfOcrCfg }
+}
+```
+
+- The `IndexingCfg` struct is deleted (absorbed as scalars). The **inner fields** of `ChunkingCfg`/`ImageCfg`/`OcrCfg`/`CaptionCfg`/`PdfCfg`/`IngestCodeCfg` are **unchanged** (only the parent path moves).
+- `PdfOcrCfg` gains the 6 symmetric paddle keys.
+- Removed top-level fields: `indexing`/`chunking`/`image`/`pdf`.
+- The scalars-first field order keeps `defaults_are_serde_roundtrip_stable` passing.
+
+### 5.2 call-site sweep (~65 sites, 7 src files)
+
+Mechanical substitution: `config.chunking.X`→`config.ingest.chunking.X`, `config.image.ocr`→`config.ingest.image.ocr`, `config.pdf.ocr`→`config.ingest.pdf.ocr`, `config.indexing.X`→`config.ingest.X`. Targets: `kebab-app/src/{lib.rs,app.rs,schema.rs}`, `kebab-eval/src/runner.rs`. `kebab-parse-image` receives leaf structs (`&OcrCfg`, etc.) directly → unaffected (verified).
+
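+### 5.3 In-memory auto-conversion at load (required for correctness)
+
+v3 is the first **non-additive rename**, so deserializing an unmigrated v2 file into the v3 struct fails to find `[chunking]`/`[image]`/`[pdf]`/`[indexing]`, and **user settings silently fall back to defaults**. (All earlier migrations were additive, so serde defaults kept loading compatible; v3 is the first to break that assumption.)
+
+→ Change to `Config::from_file`: when the text has `schema_version < CURRENT` (or a legacy table is detected), apply `migrate::migrate_document(text)` **in memory** and deserialize the resulting `new_text`. **No disk write** (updating the file remains the job of `kebab config migrate`, inheriting the "automatic write is a non-goal" stance of the 2026-05-31 spec; an in-memory conversion is not a write, so there is no conflict). A one-time `tracing::warn!`: "config is at schema vN; this run converted it to v3 in memory. To update the file, run `kebab config migrate`."
+
+- On parse failure `migrate_document` returns the input unchanged → the existing `ConfigInvalid` path is kept.
+- The `source_dir` stamp is still `path.parent()` after conversion.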
+
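+## 6. Migration `step_2_to_3` (`kebab-config/src/migrate.rs`)
+
+Add `if from < 3 { step_2_to_3(doc, changes) }` to `run_steps`. `step_2_to_3` performs **table relocation** (toml_edit, preserving values, key comments, and ordering):
+
+1. Move the 3 keys of `[indexing]` to bare keys in `[ingest]`. Remove the original `[indexing]`.
+2. Move the `[chunking]` table to `[ingest.chunking]` (as a whole). Remove the original.
+3. `[image.ocr]`→`[ingest.image.ocr]`, `[image.caption]`→`[ingest.image.caption]`. Remove the original `[image]`.
+4. `[pdf.ocr]`→`[ingest.pdf.ocr]`. Remove the original `[pdf]`.
+5. **Preserve pdf paddle behavior (important)**: in v2, pdf borrowed the paddle values of `image.ocr` (`det_model`/`rec_model`/`dict`/`score_thresh`/`unclip_ratio`/`max_boxes`) when running paddle (the §1 asymmetry). So right after the move, **copy the actual values of these 6 keys from `[image.ocr]` into the symmetric keys of `[ingest.pdf.ocr]`** (behavior stays identical even when the user tuned image paddle). If both are at defaults, the copied values equal the defaults and nothing changes. The copy does not overwrite keys the user already set explicitly in `[pdf.ocr]`.
+6. The existing `[ingest.code]` stays put (already in the right place). But since `[ingest]` now receives bare keys, verify that serialization order stays consistent.
+
+The move must **carry the user item's decor (inline comments after values + the user's commented-out alternative lines)**: in toml_edit, detach the `Item` with `Table::remove` and `insert` it under the new parent. Idempotent (no-op if already in v3 form).
+
+After the move, the existing `reconcile(annotated_default, doc)`:
+- adds missing keys (notably the 6 symmetric pdf paddle keys) together with their comments.
+- stamps `schema_version` → 3.
+
+Bump `CURRENT_SCHEMA_VERSION: u32 = 3`.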
+
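+## 7. Per-option comment infrastructure
+
+- New `key_comment(path: &str) -> Option<&'static str>` (sibling of `section_comment`). Maps a dotted leaf path (`ingest.chunking.target_tokens`, etc.) to a one-line comment.
+- Extend `annotate_table`: scalar leaves also get an inline/prefix comment when `key_comment` has one.
+- **Scope**: every key of `annotated_default_document` (= `kebab init` + the reconcile reference). Only keys that reconcile **newly adds** carry comments (existing user keys are value-inviolable → no comment injected, user alternative comments preserved).
+- Register the comment text for every key in §3 in `key_comment` (in one batch at implementation time).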
+
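+## 8. Invariants / regression guards
+
+1. **Signature invariance**: the output string of `ingest_config_signature` (lib.rs:3129) is **byte-identical** to the v2 binary. It is value-based, so it must be independent of struct path changes. Update the source of the paddle paths read by `ocr_engine_version_for_sig`: the image signature reads `config.ingest.image.ocr`, and **the pdf signature reads the new symmetric keys of `config.ingest.pdf.ocr`**. Behavior is preserved by the value copy of §6.5 (image paddle values → symmetric pdf keys): in a migrated file the symmetric pdf keys equal the v2-era image values, so the signature is identical. A golden-string regression test is required.
+2. **Env names preserved**: every LHS (key string) in the `apply_env` whitelist stays as-is; only the RHS (assignment target) moves to the new struct path. Only the new pdf paddle keys `KEBAB_PDF_OCR_{DET_MODEL,REC_MODEL,DICT,SCORE_THRESH,UNCLIP_RATIO,MAX_BOXES}` are added. All existing env tests stay green.
+3. **Lossless golden**: use the user's real v2 config (attached; includes the `score_gate` residue and commented alternative lines) as a fixture: `migrate_document` → (a) all user values preserved, (b) user comments/alternative lines preserved, (c) new locations such as `[ingest.image.ocr]` exist, (d) the result parses as a v3 `Config` with values equal to their original meaning, (e) re-running is idempotent.
+4. **Load auto-conversion**: a test that reading v2 text through `Config::from_file` (disk untouched) fills `config.ingest.chunking.target_tokens` and the like with the user's values (no loss to defaults).
+5. **Float serialization cleanup**: the serialization of `Config::defaults()` contains no `0.30000001192092896`, and `score_gate = 0.3`. Implementation: `#[serde(serialize_with = "ser_f32_clean")]` on f32 fields (re-parses the shortest round-trip output of f32 Display as f64 and serializes that); struct types and call sites are unchanged, local to kebab-config. Residue values in existing user files are preserved by toml_edit (values are inviolable) and stay as they are; they are cleaned up only on regeneration (consistent with non-goals §2).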
+
+## 9. Version / documentation cascade
+
+- **minor bump** (interface change: config section rename + new keys). `Cargo.toml` workspace version.
+- **schema_version 2→3** (above).
+- **Dogfooding required** (CLAUDE.md Dogfood trigger: CLI/config surface): run `kebab config migrate` on a real v2 file (attached) and confirm lossless conversion + auto-conversion + zero re-indexing. evidence → HOTFIXES + release notes.
+- **Doc sync (same PR)**: README Configuration section + `docs/SMOKE.md` config example blocks (new layout) + HOTFIXES dated entry + a v3 rename cross-link in the Risks/notes of `2026-05-31-config-migration-design.md`.
+
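+## 10. Risks
+
+| Risk | Mitigation |
+|--------|------|
+| Comments lost when moving tables | Move the whole `Item` via toml_edit `remove`→`insert`; golden test (§8.3) |
+| Signature change → full re-index | Golden-string regression test (§8.1); value format preserved |
+| Adding symmetric pdf paddle keys changes existing pdf paddle behavior | The §6.5 migration copies the actual values of the 6 image paddle keys into the symmetric pdf keys → behavior and signature identical (§8.1) |
+| Missed call sites | Enforced by the compiler (field removed → unmodified sites fail to compile); clippy gate |
+| Cost of in-memory auto-conversion on every load | One toml_edit parse per run; negligible |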
+```
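
Review note: §8.5 of the spec describes `ser_f32_clean` as re-parsing f32's shortest round-trip `Display` output as f64. The conversion at its core can be sketched without serde as follows. `clean_f32` is a hypothetical helper name; the actual `serialize_with` function in the PR is not shown in this part of the diff.

```rust
/// Converts an f32 to the f64 a reader would expect: `0.3_f32` becomes `0.3_f64`
/// rather than the widened `0.30000001192092896`.
pub fn clean_f32(value: f32) -> f64 {
    // f32's Display prints the shortest decimal that round-trips to the same f32.
    let shortest = value.to_string();
    // Fall back to the plain widening conversion if the text fails to parse.
    shortest.parse::<f64>().unwrap_or(f64::from(value))
}
```

A plain `f64::from(0.3_f32)` keeps the f32 rounding error visible in the written file; going through the decimal text drops it, which is why a regenerated config shows `score_gate = 0.3`.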
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,35 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

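+## 2026-06-04: config schema v2→v3 reorganization, media ingest consolidation (v0.28.0)
+
+**What changed.** The media-format settings in `config.toml` are now consolidated under the `[ingest.*]` umbrella. This is the first non-additive rename migration.
+
+Rename mapping: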
+
+| v2 (top-level) | v3 (`[ingest.*]`) |
+|---|---|
+| `[indexing]` (scalar keys) | `[ingest]` scalars (`max_parallel_extractors`/`max_parallel_embeddings`/`watch_filesystem`) |
+| `[chunking]` | `[ingest.chunking]` |
+| `[image.ocr]` | `[ingest.image.ocr]` |
+| `[image.caption]` | `[ingest.image.caption]` |
+| `[pdf.ocr]` | `[ingest.pdf.ocr]` |
+| `[ingest.code]` | `[ingest.code]` (unchanged) |
+
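+**Three guaranteed invariants.**
+
+1. **Signature is byte-invariant**: the output of `ingest_config_signature` is value-based, so it stays byte-identical to v2 even after the struct paths are reorganized. Upgrading does not trigger a full reindex. The paddle paths (det/rec/dict) are now parameters that the caller passes per media type (`ocr_engine_version_for_sig` + `engine_version_for_paths`); the v2 asymmetry in which pdf borrowed the image paddle settings is preserved by the value copy in `step_2_to_3` (`copy_image_paddle_to_pdf`).
+2. **Env override names 100% preserved**: the key strings in the `apply_env` whitelist (LHS, e.g. `KEBAB_CHUNKING_TARGET_TOKENS`/`KEBAB_INDEXING_MAX_PARALLEL_EXTRACTORS`) are unchanged; only the assignment targets (RHS) move to `self.ingest.*`. Existing `KEBAB_*` scripts keep working. Six new keys, `KEBAB_PDF_OCR_{DET_MODEL,REC_MODEL,DICT,SCORE_THRESH,UNCLIP_RATIO,MAX_BOXES}`, are added (symmetric with the image.ocr paddle keys).
+3. **In-memory auto-conversion on load**: `Config::from_file` converts files with `schema_version < 3` to v3 in memory (via `migrate_document`) without touching the disk. Even an unmigrated v2 file loses no settings. To update the file itself, run `kebab config migrate` (preserves values, comments, and alternative lines; idempotent).
+
+`PdfOcrCfg` gains the six symmetric paddle keys. `ser_f32_clean` cleans up f32 serialization (`0.30000001192092896`→`0.3`). Per-option inline comments (`key_comment`) are attached to configs produced by init/migrate.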
+
+**Dogfooding (v0.28.0 release build).**
+
+1. **Converting the user's real v2 config** (`/build/dogfood/config-v3-test/`): `kebab config migrate` → `v2 → v3 (11 changes)`, with a `.bak` backup. schema_version goes 2→3 and only the section headers move under `[ingest.*]` (`[indexing]`/`[chunking]`/`[image.ocr]`/`[image.caption]`/`[pdf.ocr]` → `[ingest.*]`). User values are preserved (`root`, `model = "snowflake-arctic-embed2"`, `endpoint = "http://192.168.0.2:11943"`, and `score_gate = 0.30000001192092896` left as-is), as are the commented-out alternatives (`# engine = "ollama-vision"`, `# provider = "candle"`). Re-running is idempotent (`config 이미 최신입니다`, "config is already up to date").
+
+2. **Zero reindexing demonstrated** (`/build/dogfood/config-v3-reindex/`, lexical-only with `provider = "none"`): the first ingest with the v2 config (schema_version=2 on disk) reports `new=2`. (a) Re-ingesting with the same v2 config (in-memory auto-conversion) → `new=0 updated=0 unchanged=2`. (b) Re-ingesting after migrating the on-disk file to v3 with `config migrate` → `new=0 updated=0 unchanged=2`. Signature byte-invariance is thus confirmed as zero reindexing on the real upgrade paths (v2 auto-converted ↔ v3 on disk), which verifies invariant #1.
+
 ## 2026-06-04: PP-OCRv5 ONNX Rust-native OCR engine (v0.27.0)

 **What was added.** Attached a second backend, `paddle-onnx`, to image OCR. The existing `ollama-vision`
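
Invariant #2 of the v0.28.0 entry (env key strings unchanged, assignment targets moved under `self.ingest.*`) can be sketched as follows. This is a trimmed-down illustration, not the project's `apply_env`: the struct shapes, the `usize` field types, and the choice to take name/value pairs instead of reading the process environment are all assumptions made so the example stays self-contained. Only the two env names and the `ingest` nesting come from the entry above.

```rust
// Hypothetical, trimmed-down config types; field types are assumed.
#[derive(Debug, Default)]
struct ChunkingCfg {
    target_tokens: usize,
}

#[derive(Debug, Default)]
struct IngestCfg {
    max_parallel_extractors: usize,
    chunking: ChunkingCfg,
}

#[derive(Debug, Default)]
struct Config {
    ingest: IngestCfg,
}

impl Config {
    /// Applies whitelisted overrides from (name, value) pairs and returns
    /// the names that were applied. Unknown names and unparsable values
    /// are skipped.
    fn apply_env<'a, I>(&mut self, vars: I) -> Vec<&'a str>
    where
        I: IntoIterator<Item = (&'a str, &'a str)>,
    {
        let mut applied = Vec::new();
        for (name, raw) in vars {
            let parsed = match raw.parse::<usize>() {
                Ok(value) => value,
                Err(_) => continue,
            };
            match name {
                // LHS: the v2-era key string, kept verbatim.
                // RHS: the v3 struct path under `ingest`.
                "KEBAB_CHUNKING_TARGET_TOKENS" => {
                    self.ingest.chunking.target_tokens = parsed;
                }
                "KEBAB_INDEXING_MAX_PARALLEL_EXTRACTORS" => {
                    self.ingest.max_parallel_extractors = parsed;
                }
                _ => continue,
            }
            applied.push(name);
        }
        applied
    }
}
```

Because the match arms keep the old key strings, a script that exports `KEBAB_CHUNKING_TARGET_TOKENS` would keep working even though the field it sets now lives at `ingest.chunking.target_tokens`.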
Author	SHA1	Message	Date
altair823	403e162ac0	Merge pull request 'feat(config): config.toml v2→v3 schema reorganization, media [ingest.*] consolidation + lossless automatic migration' (#207 ) from feat/config-schema-reorg into main Reviewed-on: #207	2026-06-04 14:36:41 +00:00
altair823	fdf09c369c	refactor(config): address PR #207 review round 1: single toml::Value parse in from_file	2026-06-04 13:33:07 +00:00
altair823	e7b58017fd	docs(config): v3 reorganization dogfooding evidence + release notes Dogfooding (release build): conversion of the user's real v2 config (values and comments preserved, idempotent) + zero reindexing demonstrated (unchanged on both the v2 auto-conversion and v3 on-disk paths). v0.28.0 release notes draft (four paragraphs: changes/trade-off/mitigation/upgrade). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 13:12:01 +00:00
altair823	90812e981f	docs(config): v3 reorganization surface sync + minor version bump 0.27.0→0.28.0 README Configuration ([ingest.*] layout + migrate guidance), SMOKE config example, dated HOTFIXES entry (rename mapping + 3 invariants), cross-link to the earlier migration spec. Interface change (config layout rename + env additions) = minor. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 13:03:07 +00:00
altair823	15e6918cef	feat(config): RHS update preserving env names + 6 new pdf paddle env keys All key strings (LHS) in the apply_env whitelist are unchanged; only the assignment targets become self.ingest.* (invariant #2). New KEBAB_PDF_OCR_{DET_MODEL,REC_MODEL,DICT,SCORE_THRESH,UNCLIP_RATIO,MAX_BOXES} (symmetric with the image.ocr paddle pattern). Gates: clippy --workspace --all-targets 0, kebab-config/app/eval tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:56:25 +00:00
altair823	a8ec354188	test(config): v3 lossless golden: relocation + idempotence on the user's real v2 config Uses the user's real config (including comments, alternative lines, and score_gate=0.3000…1192) as a fixture. Verifies value and comment preservation + matching v3 parse + idempotence. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:48:52 +00:00
altair823	2686a4f27d	feat(config): in-memory v2→v3 auto-conversion on from_file load (disk untouched) If schema_version < CURRENT, convert in memory via migrate_document, then parse. The on-disk file is unchanged (update it with kebab config migrate). One-time warn. Invariant #3. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:47:59 +00:00
altair823	25e94feab8	feat(config): step_2_to_3: media table [ingest.] relocation + pdf paddle value preservation move_table (moves the whole item including decor) + move_indexing_keys (parallelism keys) + copy_image_paddle_to_pdf (preserves the v2 asymmetry). CURRENT_SCHEMA_VERSION=3. section_comment updated to ingest. paths. Idempotent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:47:14 +00:00
altair823	7b7330cdf2	feat(config): per-option inline comments (key_comment), attached on init/reconcile Adds a leaf branch to annotate_table: a one-line comment suffix after scalar/array key values. Dotted path → comment mapping (workspace.root, ocr.model, request_timeout, etc.). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:45:46 +00:00
altair823	3d45994693	refactor(config): per-media signature paddle paths + byte-invariance golden ocr_engine_version_for_sig now takes det/rec/dict as arguments from the (per-media) caller: image uses [ingest.image.ocr], pdf uses [ingest.pdf.ocr]. Removes the v2 pdf↔image paddle asymmetry. Adds engine_version_for_paths (kebab-parse-image). The output string is value-based, so it is byte-identical to v2 (invariant #1). Adds a test seam + golden. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:44:27 +00:00
altair823	d5c69f6715	refactor(config): v3 path call-site sweep (kebab-app/kebab-eval/kebab-parse-image) Inserts .ingest into parent paths (leaf structs unchanged). Covers all src + test call sites. The v2 TOML fixtures in the kebab-cli tests are kept to exercise the from_file auto-conversion (T6) path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:40:06 +00:00
altair823	148c8b7040	refactor(config): v3 layout: media ingest consolidation + pdf paddle symmetry + float serialization Consolidates Config's top-level indexing/chunking/image/pdf fields into a single ingest: IngestCfg. Leaf structs are unchanged; only the parent paths move under [ingest.]. Adds the six symmetric paddle keys (det/rec/dict/score_thresh/unclip_ratio/max_boxes) to PdfOcrCfg. Cleans up f32 serialization with ser_f32_clean (0.3000000119→0.3). Updates the apply_env RHS to self.ingest. (env key string LHS unchanged). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 12:37:09 +00:00
altair823	898cdaa043	docs(config): v3 schema reorganization design + implementation plan	2026-06-04 12:25:49 +00:00