Merge pull request 'feat(ingest): log progress of asset-internal phases (asset_chunked/expansion_progress/asset_timings)' (#201) from feat/ingest-progress-detail into main

Reviewed-on: #201
fix(ingest-progress): apply review feedback (store_ms boundary fix + duplicate expansion frame guard)
2026-06-02 17:26:47 +00:00 · 2026-06-02 14:49:02 +00:00 · 2026-06-02 14:13:36 +00:00 · 2026-06-02 13:58:33 +00:00 · 2026-06-02 13:58:27 +00:00 · 2026-06-02 12:25:45 +00:00
36 changed files with 4735 additions and 112 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -11,6 +11,7 @@ members = [
    "crates/kebab-search",
    "crates/kebab-embed",
    "crates/kebab-embed-local",
+    "crates/kebab-embed-candle",
    "crates/kebab-llm",
    "crates/kebab-llm-local",
    "crates/kebab-rag",
@@ -30,7 +31,7 @@ edition       = "2024"
 rust-version  = "1.85"
 license       = "MIT OR Apache-2.0"
 repository    = "https://github.com/altair823/kebab"
-version       = "0.21.0"   # v0.21.0: doc-side expansion aliases (individual dense vectors) + derivation cache (V012, content-hash key). See CLAUDE.md §Release dogfooding trigger
+version       = "0.24.0"   # v0.24.0: detailed ingest progress logging. New wire events asset_chunked / expansion_progress / asset_timings (additive to ingest_progress.v1), CLI progress-bar sub-message + one-line phase timing. Makes asset-internal parse/chunk/expansion/embed/store visible. wire v1 backward-compat. See CLAUDE.md §Release

 # pre-v0.18 workspace-wide cleanup: enable clippy::pedantic group with
 # intentional allow-list. The allowed lints are either cosmetic (doc style),
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -30,6 +30,9 @@ P0~P5 run serially. P6~P9 can run in parallel after P5.

 ## Bugs / decisions found after merge (summary)

+- **candle embedding backend diversification** (2026-06-01, Track 1, v0.22.0): added opt-in `provider = "candle"`, which runs the same `multilingual-e5-large` model in pure Rust (candle) to avoid the onnxruntime 48-thread double-free on dual-socket NUMA servers. `[models.embedding].num_threads` (+ env `KEBAB_EMBED_THREADS`) caps CPU threads. The fastembed default behavior and vectors are unchanged and `embedding_version` is kept (zero re-indexing). Phase 0 spike parity: cosine 1.000000. Details: HOTFIXES, same date.
+- **config migration** (2026-05-31, PR #198): added `kebab config migrate`, which fills sections missing from an existing config.toml (with comments) and cleans up deprecated entries (idempotent, `.bak`, dry-run, values/comments preserved). `schema_version` 1→2; `init` also writes section comments; doctor gains a `config_migration` check. Details: HOTFIXES, same date.
+
 The dated log of every deviation / hotfix found after merge lives in [tasks/HOTFIXES.md](tasks/HOTFIXES.md). This summary lists only the items that "save a lot of time if you know them when taking over":

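 - **2026-05-31 Phase 2 doc-side expansion aliases (individual dense vectors) + derivation cache (V012)**: v0.21.0 cut. At index time the LLM generates per-chunk aliases ("same meaning, different wording"), and each alias line is indexed as an **individual dense vector** (sentinel `{chunk}#alias#N`); a single bundled vector regressed due to averaging dilution and was dropped. Boilerplate chunks are skipped. `[ingest.expansion]` is off by default. Measurements (Namuwiki CS corpus, ~1000 documents): variant consistency 14/18 → **16/18**, spread 0.222→0.111, and aliases were cleared of causing false positives in the control group. The cost bottleneck (2.5h of aliasing for 18 documents) was resolved by the **derivation cache (V012, keyed by chunk content hash)**: for the 3 answer items, cold 1879s → warm 13s, **≈ 145x**, caching both embeddings and alias-LLM output, consistent with the version_key cascade. search/ask run on `kebab.sqlite`+`lancedb` alone, which enables a portable workflow of indexing on an external server and copying only the DB. **Decision/known limitation**: the grounded/refusal judgment misclassifies partial citations as grounded (honest refusals are counted as false positives); a candidate for separate improvement. Two explanatory-type cases (stack, svm) remain. Details: `tasks/HOTFIXES.md` (2026-05-31); measurements: `docs/superpowers/handoffs/2026-05-31-namu-wiki-alias-cache-study.md`.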
--- a/README.md
+++ b/README.md
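@@ -51,7 +51,24 @@ embedding vectors and alias-LLM results are cached by chunk **content hash**

 ### External compute + local search workflow

-search/ask work without asset files, using only `kebab.sqlite` + `lancedb`. Run the expensive indexing (embedding, OCR, alias generation) on a powerful server, then copy just these two artifacts to the local machine to search and ask as-is.
+search/ask work from the KB artifacts alone, without the original files (chunk bodies are stored in SQLite and document paths are recorded as relative paths). Run the expensive indexing (embedding, OCR, alias generation) on a powerful machine (e.g. candle Metal GPU on an Apple Silicon Mac), then copy **just the two artifacts** to another machine (e.g. a NUMA server) to search and ask as-is.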
+
+**What to copy: the two paths defined in `[storage]`:**
+
+| Copy target | config key (`[storage]`) | Default path | Contents |
+|-----------|------------------------|-----------|------|
+| `kebab.sqlite` | `sqlite = "{data_dir}/kebab.sqlite"` | `{data_dir}/kebab.sqlite` | documents, chunks, bodies, FTS5, metadata |
+| `lancedb/` | `vector_dir = "{data_dir}/lancedb"` | `{data_dir}/lancedb/` | embedding vectors |
+
+`{data_dir}` is `[storage].data_dir` (e.g. `~/.local/share/kebab`). `models/` (`model_dir`) and `assets/` (`asset_dir`) **need not be copied**: each machine downloads models into its own cache, and the original asset bytes are not used for search or ask (copy `assets/` as well if you want to keep the ability to re-read and re-index originals ingested as single files or via `stdin`).
+
+```bash
+# Copy once ingest has finished (no writes in progress)
+rsync -a <src-data_dir>/kebab.sqlite  user@server:<dst-data_dir>/
+rsync -a <src-data_dir>/lancedb/      user@server:<dst-data_dir>/lancedb/
+```
+
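+Requirements: **the same `kebab` version and the same embedding model/dimensions on both sides** (`[models.embedding].model`, `dimensions`). The provider may differ (e.g. Mac `candle`/Metal ↔ server `candle`/CPU or `fastembed`; the same model yields compatible vectors). Only copy while ingest is not running.

 ### Multimedia indexing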

@@ -70,7 +87,7 @@ Markdown · PDF · images (OCR + caption) · source code (Rust/Python/TS/JS/Go
 | Command | Behavior |
 |------|------|
 | `kebab init` | Creates the data directory + config.toml under XDG paths |
-| `kebab ingest [<path>]` | Scans the workspace and indexes new/changed documents (idempotent · incremental; `--force-reingest` forces reprocessing). Unsupported extensions are skipped automatically |
+| `kebab ingest [<path>]` | Scans the workspace and indexes new/changed documents (idempotent · incremental; `--force-reingest` forces reprocessing). Unsupported extensions are skipped automatically. The progress bar shows the per-document chunk count · a live alias-expansion counter · per-phase elapsed time (parse/chunk/expand/embed/store) when each document finishes (`--json` emits these as `asset_chunked`/`expansion_progress`/`asset_timings` events) |
 | `kebab ingest-file <path>` | Single-file ingest (may be outside the workspace; deterministic copy into `_external/`) |
 | `kebab ingest-stdin --title <T>` | Ingests a markdown body from stdin |
 | `kebab search --mode {lexical,vector,hybrid} "<query>" [flags]` | Search (default hybrid = RRF fusion, with citations). See `--help` for filter/budget flags |
@@ -97,9 +114,36 @@ root = "~/KnowledgeBase"   # Folder to index. Absolute / tilde / env / relative
                           # Relative paths resolve against the config.toml location (independent of cwd).

 [models.embedding]
+provider = "fastembed"            # "fastembed" (default, onnxruntime) / "candle" (pure Rust)
+                                  # / "none" (lexical-only). candle runs the same model with the
+                                  # same vectors in pure Rust; an opt-in backend that avoids the onnxruntime
+                                  # 48-thread double-free on NUMA servers (no re-indexing needed).
 model = "multilingual-e5-large"   # Multilingual sentence embedding (1024-dim).
                                   # ONNX (~1.3GB) is downloaded automatically on first ingest.
+                                  # The candle provider downloads safetensors (~2GB).
 dimensions = 1024                 # If config and the LanceDB stored dim disagree, search returns 0 hits.
+num_threads = 0                   # CPU thread cap, candle only (0=auto=#cores).
+                                  # env KEBAB_EMBED_THREADS takes precedence. For NUMA node binding,
+                                  # combine with numactl. Ignored by the fastembed provider.
+```
+
+**Apple Silicon GPU acceleration (candle / macOS)**: on M-series Macs, running candle
+embeddings on the GPU (Metal) speeds up large ingests considerably compared with the CPU.
+Enable the `embed_metal` feature at build or install time:
+
+```bash
+# Build only:
+cargo build --release --features embed_metal
+# Global install (~/.cargo/bin/kebab):
+cargo install --path crates/kebab-cli --features embed_metal --locked
+```
+
+The vectors come from the same model as CPU candle and are therefore compatible, so a
+`kebab.sqlite` + `lancedb/` indexed on a Mac GPU can be copied as-is to a Linux server
+(CPU candle) and queried there. If the indexing log shows `candle device = Metal (GPU)`,
+the GPU is in use. The metal feature is macOS-only (Linux/servers use the default CPU build).
+
+```toml

 [models.llm]
 endpoint = "http://localhost:11434"   # Ollama host:port
@@ -123,6 +167,7 @@ nli_threshold = 0.0                  # if >0 (e.g. 0.5), mDeBERTa XNLI groundedn
 - **`[ingest.code]`**: skip policy for code ingest (`skip_generated_header`, `max_file_bytes`, `extra_skip_globs`). `.gitignore` is honored automatically; `.kebabignore` is an additional layer.
 - **`[pdf.ocr]`**: page-level OCR for scanned PDFs (off by default / opt-in; costs roughly tens of seconds per page). After enabling it, reprocess documents indexed under v0.19 with `kebab ingest --force-reingest`.
 - **`--config <path>`**: for temporary workspaces / isolated tests (honored by both CLI and TUI).
+- **`kebab config migrate`**: fills config sections added in newer versions into an existing `config.toml`, with explanatory comments (user-edited values, comments, and ordering are preserved; idempotent; a `.bak` backup is written automatically on change). `--dry-run` previews the changes. `kebab doctor` reports when an update is needed. A config.toml newly created by `kebab init` also includes per-section comments.
 - **`KEBAB_*` env**: overrides for some keys (`KEBAB_RAG_SCORE_GATE`, `KEBAB_EVAL_GOLDEN`, etc.).
 - **XDG layout**: `~/.config/kebab/`, `~/.local/share/kebab/`, `~/.cache/kebab/`, `~/.local/state/kebab/`.

--- a/crates/kebab-app/Cargo.toml
+++ b/crates/kebab-app/Cargo.toml
@@ -18,6 +18,7 @@ kebab-store-vector = { path = "../kebab-store-vector" }
 kebab-search = { path = "../kebab-search" }
 kebab-embed = { path = "../kebab-embed" }
 kebab-embed-local = { path = "../kebab-embed-local" }
+kebab-embed-candle = { path = "../kebab-embed-candle" }
 kebab-llm = { path = "../kebab-llm" }
 kebab-llm-local = { path = "../kebab-llm-local" }
 kebab-rag = { path = "../kebab-rag" }
@@ -71,6 +72,7 @@ base64               = { workspace = true }
 rusqlite             = { workspace = true }

 [dev-dependencies]
+kebab-config = { path = "../kebab-config" }
 # doc-side expansion (Phase 2) Task 4: ExpansionGenerator unit tests build
 # MockLanguageModel (gated behind kebab-llm's `mock` feature, default OFF in
 # [dependencies]). Enabling it here turns it on for the test build only.
@@ -98,6 +100,8 @@ reqwest      = { version = "0.12", default-features = false, features = ["blocki
 # no disable path; this feature only exists to honor what spec §6.3 specifies.
 default = ["fts_korean_morphological"]
 fts_korean_morphological = []
+# opt-in (macOS): candle embedder runs on the Apple Silicon GPU. See kebab-embed-candle.
+embed_metal = ["kebab-embed-candle/metal"]

 [lints]
 workspace = true
--- a/crates/kebab-app/src/app.rs
+++ b/crates/kebab-app/src/app.rs
@@ -43,6 +43,7 @@ use kebab_core::{
    Answer, DocumentStore, Embedder, ExtractContext, Extractor, IndexVersion, LanguageModel,
    MediaType, Retriever, SearchHit, SearchMode, SearchOpts, SearchQuery, VectorStore,
 };
+use kebab_embed_candle::CandleEmbedder;
 use kebab_embed_local::FastembedEmbedder;
 use kebab_llm_local::OllamaLanguageModel;
 use kebab_parse_code::{
@@ -833,9 +834,26 @@ impl App {
        if let Some(e) = self.embedder.get() {
            return Ok(Some(e.clone()));
        }
-        let emb: Arc<dyn Embedder + Send + Sync> = Arc::new(
-            FastembedEmbedder::new(&self.config).context("kb-app: load FastembedEmbedder")?,
-        );
+        // Provider branch (Track 1 spec §3). `embeddings_disabled()` above
+        // already handled `"none"`; here we route the live providers.
+        // `fastembed`/`onnx`/(empty) keep the default onnxruntime path
+        // (vectors unchanged — `embedding_version` is preserved); `candle`
+        // selects the pure-Rust NUMA-safe backend.
+        let provider = self.config.models.embedding.provider.as_str();
+        let emb: Arc<dyn Embedder + Send + Sync> = match provider {
+            "fastembed" | "onnx" | "" => Arc::new(
+                FastembedEmbedder::new(&self.config).context("kb-app: load FastembedEmbedder")?,
+            ),
+            "candle" => Arc::new(
+                CandleEmbedder::new(&self.config).context("kb-app: load CandleEmbedder")?,
+            ),
+            other => {
+                return Err(anyhow!(
+                    "kb-app: unknown embedding provider {other:?}; expected one of \
+                     `fastembed` (default), `candle`, or `none` (lexical-only)"
+                ));
+            }
+        };
        // `set` returns Err if another thread won the race; in that case
        // the loser still returns the (now-cached) winner via `get()`.
        let _ = self.embedder.set(emb.clone());
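Reviewer note: the provider branch above can be read as a small string-to-backend mapping. The following is a minimal std-only sketch of that mapping, not the code in this PR: the `Provider` enum and `parse_provider` function are hypothetical names introduced for illustration, and `"none"` is rejected here only because the diff states it is handled earlier by `embeddings_disabled()`.

```rust
/// Hypothetical enum for this sketch; the diff matches on the raw string
/// and constructs the embedder directly.
#[derive(Debug, PartialEq, Eq)]
enum Provider {
    Fastembed,
    Candle,
}

/// Maps the `[models.embedding].provider` string to a backend choice.
/// Assumes `"none"` was already filtered out by the caller, as in the diff.
fn parse_provider(raw: &str) -> Result<Provider, String> {
    match raw {
        // Empty and "onnx" keep the default onnxruntime path.
        "fastembed" | "onnx" | "" => Ok(Provider::Fastembed),
        "candle" => Ok(Provider::Candle),
        other => Err(format!(
            "unknown embedding provider {other:?}; expected `fastembed`, `candle`, or `none`"
        )),
    }
}
```

The match is case-sensitive, so a value such as `"Candle"` falls through to the error arm, which mirrors the behavior of the string match in the diff.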
--- a/crates/kebab-app/src/ingest_progress.rs
+++ b/crates/kebab-app/src/ingest_progress.rs
@@ -47,11 +47,21 @@ pub struct AggregateCounts {
 ///
 /// ```text
 /// ScanStarted < ScanCompleted
-///   < (AssetStarted [< (PdfOcrStarted < PdfOcrFinished)*] < AssetFinished)*
+///   < ( AssetStarted
+///         [< (PdfOcrStarted < PdfOcrFinished)*]
+///         [< AssetChunked]
+///         [< ExpansionProgress*]
+///         [< AssetTimings]
+///       < AssetFinished )*
 ///   < (Completed | Aborted)
 /// ```
 ///
-/// `[]` = optional, per-PDF asset only (v0.20.0 sub-item 1).
+/// `[]` = optional. `PdfOcr*` is per-PDF asset only (v0.20.0 sub-item 1).
+/// `AssetChunked` / `ExpansionProgress` / `AssetTimings` are the v0.24.0
+/// asset-internal phase events: `AssetChunked` fires once right after
+/// chunking (markdown / image / PDF); `ExpansionProgress` is a throttled
+/// counter through the alias-expansion loop (markdown, expansion enabled
+/// only); `AssetTimings` reports per-phase wall-clock once (markdown only).
 ///
 /// Embed-batch events (`embed_batch_started` / `embed_batch_finished`
 /// in §2.4a) are reserved for a future iteration and are not emitted
@@ -82,6 +92,41 @@ pub enum IngestEvent {
        result: IngestItemKind,
        chunks: u32,
    },
+    /// v0.24.0 (additive): emitted right after an asset is chunked, before
+    /// expansion / embed / store. Surfaces "this document is N chunks"
+    /// immediately so a single large document no longer looks frozen at
+    /// `idx/total` while its per-chunk phases churn. `chunks` is the chunk
+    /// count for asset `idx`.
+    AssetChunked { idx: u32, total: u32, chunks: u32 },
+    /// v0.24.0 (additive): throttled progress through the per-chunk
+    /// expansion (alias-LLM) loop — the slowest inner phase for large
+    /// documents (~1–4s per chunk against a remote GPU Ollama). `done` is
+    /// the number of chunks processed so far (cache hits included, so the
+    /// counter still advances on a warm re-run); `chunks` is the asset's
+    /// total chunk count. Emitted on every 25th chunk, or once ≥1s has
+    /// passed since the last frame (see the loop in `ingest_one_asset`),
+    /// plus a final `done == chunks` frame if not already reported.
+    ExpansionProgress {
+        idx: u32,
+        total: u32,
+        done: u32,
+        chunks: u32,
+    },
+    /// v0.24.0 (additive): per-phase wall-clock (milliseconds) for asset
+    /// `idx`, emitted once the asset's markdown pipeline finishes. Lets a
+    /// user see *where* the time went (parse / chunk / expansion / embed /
+    /// store) without parsing logs. Only the markdown path emits this; the
+    /// image / PDF paths surface `AssetChunked` but skip phase timing (their
+    /// phase shapes differ — OCR / caption rather than expansion).
+    AssetTimings {
+        idx: u32,
+        total: u32,
+        parse_ms: u64,
+        chunk_ms: u64,
+        expansion_ms: u64,
+        embed_ms: u64,
+        store_ms: u64,
+    },
    /// Run finished normally. `counts` is the final aggregate.
    Completed { counts: AggregateCounts },
    /// Run finished by user cancellation. `counts` is the partial
@@ -199,6 +244,79 @@ mod tests {
        assert_eq!(v.get("media").and_then(|s| s.as_str()), Some("markdown"));
    }

+    #[test]
+    fn asset_chunked_serializes_with_discriminator() {
+        // v0.24.0 additive variant — `kind` must be snake_case
+        // `asset_chunked` so wire v1 consumers branch on it cleanly.
+        let ev = IngestEvent::AssetChunked {
+            idx: 3,
+            total: 10,
+            chunks: 142,
+        };
+        let v = serde_json::to_value(&ev).unwrap();
+        assert_eq!(
+            v.get("kind").and_then(|s| s.as_str()),
+            Some("asset_chunked")
+        );
+        assert_eq!(v.get("idx").and_then(serde_json::Value::as_u64), Some(3));
+        assert_eq!(
+            v.get("chunks").and_then(serde_json::Value::as_u64),
+            Some(142)
+        );
+    }
+
+    #[test]
+    fn expansion_progress_serializes_with_discriminator() {
+        let ev = IngestEvent::ExpansionProgress {
+            idx: 1,
+            total: 5,
+            done: 25,
+            chunks: 200,
+        };
+        let v = serde_json::to_value(&ev).unwrap();
+        assert_eq!(
+            v.get("kind").and_then(|s| s.as_str()),
+            Some("expansion_progress")
+        );
+        assert_eq!(v.get("done").and_then(serde_json::Value::as_u64), Some(25));
+        assert_eq!(
+            v.get("chunks").and_then(serde_json::Value::as_u64),
+            Some(200)
+        );
+    }
+
+    #[test]
+    fn asset_timings_serializes_all_phase_fields() {
+        let ev = IngestEvent::AssetTimings {
+            idx: 2,
+            total: 7,
+            parse_ms: 12,
+            chunk_ms: 3,
+            expansion_ms: 45_000,
+            embed_ms: 800,
+            store_ms: 20,
+        };
+        let v = serde_json::to_value(&ev).unwrap();
+        assert_eq!(
+            v.get("kind").and_then(|s| s.as_str()),
+            Some("asset_timings")
+        );
+        // All five phase fields are present (plain u64, always serialized).
+        for (field, want) in [
+            ("parse_ms", 12u64),
+            ("chunk_ms", 3),
+            ("expansion_ms", 45_000),
+            ("embed_ms", 800),
+            ("store_ms", 20),
+        ] {
+            assert_eq!(
+                v.get(field).and_then(serde_json::Value::as_u64),
+                Some(want),
+                "field {field}"
+            );
+        }
+    }
+
    #[test]
    fn ingest_event_completed_has_counts() {
        let ev = IngestEvent::Completed {
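Reviewer note: the per-asset ordering documented in the `IngestEvent` doc comment can be checked mechanically on the consumer side. The sketch below is one possible std-only validator for that grammar, under the assumption that a consumer has already mapped each wire `kind` to a local enum; `Kind`, `rank`, and `is_valid_asset_sequence` are hypothetical names that do not exist in the PR.

```rust
/// Hypothetical consumer-side mirror of the per-asset event kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    AssetStarted,
    PdfOcrStarted,
    PdfOcrFinished,
    AssetChunked,
    ExpansionProgress,
    AssetTimings,
    AssetFinished,
}

/// Position of each kind in the documented ordering.
fn rank(k: Kind) -> u8 {
    match k {
        Kind::AssetStarted => 0,
        Kind::PdfOcrStarted | Kind::PdfOcrFinished => 1,
        Kind::AssetChunked => 2,
        Kind::ExpansionProgress => 3,
        Kind::AssetTimings => 4,
        Kind::AssetFinished => 5,
    }
}

/// True when `events` (one asset's frames) follows
/// AssetStarted < (PdfOcrStarted < PdfOcrFinished)* < [AssetChunked]
///   < ExpansionProgress* < [AssetTimings] < AssetFinished.
fn is_valid_asset_sequence(events: &[Kind]) -> bool {
    if events.first() != Some(&Kind::AssetStarted)
        || events.last() != Some(&Kind::AssetFinished)
    {
        return false;
    }
    let mut prev_rank = 0u8;
    let mut ocr_open = false;
    for (i, &k) in events.iter().enumerate() {
        let r = rank(k);
        if i > 0 {
            if r < prev_rank {
                return false; // went backwards in the grammar
            }
            // Only the OCR pair and ExpansionProgress may repeat.
            if r == prev_rank && !matches!(r, 1 | 3) {
                return false;
            }
        }
        match k {
            Kind::PdfOcrStarted => {
                if ocr_open {
                    return false;
                }
                ocr_open = true;
            }
            Kind::PdfOcrFinished => {
                if !ocr_open {
                    return false;
                }
                ocr_open = false;
            }
            // Any other event while an OCR page is open is out of order.
            _ => {
                if ocr_open {
                    return false;
                }
            }
        }
        prev_rank = r;
    }
    true
}
```

Because all three new events are optional, a pre-v0.24.0 stream of just `AssetStarted` followed by `AssetFinished` still validates, which is the additive property the doc comment describes.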
--- a/crates/kebab-app/src/lib.rs
+++ b/crates/kebab-app/src/lib.rs
@@ -143,40 +143,10 @@ pub fn init_workspace(force: bool) -> anyhow::Result<()> {
    std::fs::create_dir_all(&workspace_root)?;

    if !cfg_path.exists() || force {
-        let cfg = kebab_config::Config::defaults();
-        let toml_text = toml::to_string_pretty(&cfg)?;
-        // p9-fb-05: prepend a header comment documenting the path
-        // policy so a user editing this file knows what's allowed
-        // for `workspace.root` (and how relative paths resolve).
-        // The actual key lives inside `[workspace]` further down;
-        // we keep the explanation up top because users skim header
-        // comments first.
-        let header = "\
-# kebab config — `~/.config/kebab/config.toml`.
-#
-# `workspace.root` accepts:
-#   • absolute paths       (`/home/me/KnowledgeBase`)
-#   • tilde                (`~/KnowledgeBase`)         ← default
-#   • env vars             (`${XDG_DATA_HOME}/kebab`)
-#   • relative paths       (`./notes`, `notes`, `../shared/x`)
-#     — relative paths resolve against the directory of THIS
-#       config file, NOT the user's `cwd` at invocation time.
-#
-# Supported formats (chosen automatically by the extractor; cannot be set in config):
-#   • Markdown: .md
-#   • Images:   .png .jpg .jpeg  (OCR + caption)
-#   • PDF:      .pdf
-# Other extensions are skipped automatically at ingest with a warning. To ingest
-# only part of the target folder, pass the root via `kebab ingest <path>`, or
-# denylist with a `.kebabignore` file / this file's `workspace.exclude`.
-#
-# Override individual keys at runtime with `KEBAB_*` env vars
-# (e.g. `KEBAB_WORKSPACE_ROOT=/tmp/test kebab ingest`).
-\n";
-        let mut combined = String::with_capacity(header.len() + toml_text.len());
-        combined.push_str(header);
-        combined.push_str(&toml_text);
-        std::fs::write(&cfg_path, combined)?;
+        // init and migrate share the same "annotated default" document
+        // (single source of truth for the comment catalog and header = kebab_config::migrate).
+        let doc = kebab_config::migrate::annotated_default_document();
+        std::fs::write(&cfg_path, doc.to_string())?;
    }

    Ok(())
@@ -510,6 +480,8 @@ pub fn ingest_with_config_opts(
        let item = ingest_one_asset(
            &app,
            &asset,
+            idx,
+            scanned_count,
            &parser_version,
            &chunk_policy,
            embedder.as_ref(),
@@ -1130,6 +1102,8 @@ fn embed_with_cache(
 fn ingest_one_asset(
    app: &App,
    asset: &RawAsset,
+    idx: u32,
+    total: u32,
    parser_version: &ParserVersion,
    chunk_policy: &ChunkPolicy,
    embedder: Option<&Arc<dyn Embedder + Send + Sync>>,
@@ -1162,18 +1136,23 @@ fn ingest_one_asset(
            return ingest_one_image_asset(
                app,
                asset,
+                idx,
+                total,
                chunk_policy,
                embedder,
                vector_store,
                existing_doc_ids,
                image_pipeline,
                force_reingest,
+                progress,
            );
        }
        MediaType::Pdf => {
            return ingest_one_pdf_asset(
                app,
                asset,
+                idx,
+                total,
                chunk_policy,
                embedder,
                vector_store,
@@ -1282,6 +1261,10 @@ fn ingest_one_asset(
        return Ok(item);
    }

+    // v0.24.0 phase timing: parse spans from here (byte read) through
+    // `build_canonical_document`, i.e. everything before the chunker runs.
+    let t_parse = std::time::Instant::now();
+
    let bytes = std::fs::read(&path)
        .with_context(|| format!("read asset bytes from {}", path.display()))?;

@@ -1316,9 +1299,26 @@ fn ingest_one_asset(
        build_canonical_document(asset, metadata, parsed_blocks, parser_version, all_warnings)
            .context("kb-parse-md::build_canonical_document")?;

+    let parse_ms = u64::try_from(t_parse.elapsed().as_millis()).unwrap_or(u64::MAX);
+
+    let t_chunk = std::time::Instant::now();
    let mut chunks = MdHeadingV1Chunker
        .chunk(&canonical, chunk_policy)
        .context("kb-chunk::MdHeadingV1Chunker::chunk")?;
+    let chunk_ms = u64::try_from(t_chunk.elapsed().as_millis()).unwrap_or(u64::MAX);
+
+    // v0.24.0: surface the chunk count immediately, before the (potentially
+    // very slow) expansion / embed phases — so a single large document no
+    // longer looks frozen at `idx/total` while its chunks churn.
+    let total_chunks = u32::try_from(chunks.len()).unwrap_or(u32::MAX);
+    crate::ingest_progress::emit(
+        progress,
+        crate::ingest_progress::IngestEvent::AssetChunked {
+            idx,
+            total,
+            chunks: total_chunks,
+        },
+    );

     // Phase 2 doc-side expansion: when the flag is on, generate aliases per chunk (fail-soft).
     // derivation cache (§3.4): for the same chunk text + same alias version_key, the LLM
@@ -1326,6 +1326,7 @@ fn ingest_one_asset(
    let mut alias_cache_hit = 0_usize;
    let mut alias_cache_miss = 0_usize;
    let mut alias_touch_keys: Vec<String> = Vec::new();
+    let t_expansion = std::time::Instant::now();
    if app.config.ingest.expansion.enabled {
        let exp = &app.config.ingest.expansion;
        let alias_version_key = format!(
@@ -1343,6 +1344,12 @@ fn ingest_one_asset(
            Ok(llm) => {
                let generator =
                    crate::expansion::ExpansionGenerator::new(&llm, exp.max_aliases_per_chunk);
+                // v0.24.0: throttled live counter through the per-chunk
+                // expansion loop. Emit on every 25th chunk or once ≥1s has
+                // passed, so fast (cached) chunks cannot flood the mpsc channel.
+                let mut done: u32 = 0;
+                let mut last_emit = std::time::Instant::now();
+                let mut last_done: u32 = 0;
                for chunk in &mut chunks {
                    let key = kebab_core::derivation_cache_key(
                        "alias",
@@ -1375,6 +1382,40 @@ fn ingest_one_asset(
                                .derivation_cache_put(&key, "alias", a.as_bytes())?;
                        }
                    }
+                    // Cache hits count toward `done` too (the brief: show the
+                    // warm-run fast-forward). Throttle: every 25 chunks or
+                    // ≥1s since the last emit.
+                    done += 1;
+                    if done % 25 == 0
+                        || last_emit.elapsed() >= std::time::Duration::from_secs(1)
+                    {
+                        crate::ingest_progress::emit(
+                            progress,
+                            crate::ingest_progress::IngestEvent::ExpansionProgress {
+                                idx,
+                                total,
+                                done,
+                                chunks: total_chunks,
+                            },
+                        );
+                        last_emit = std::time::Instant::now();
+                        last_done = done;
+                    }
+                }
+                // Final frame so the counter lands on `done == chunks`, but only
+                // if the last in-loop emit didn't already report this `done`
+                // (avoids a duplicate frame when the last chunk already triggered
+                // an emit, and skips a 0/0 frame when there are no chunks).
+                if done != last_done {
+                    crate::ingest_progress::emit(
+                        progress,
+                        crate::ingest_progress::IngestEvent::ExpansionProgress {
+                            idx,
+                            total,
+                            done,
+                            chunks: total_chunks,
+                        },
+                    );
                }
            }
            Err(e) => {
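Reviewer note: the throttle and final-frame guard in this hunk are easier to reason about in isolation. The following is a minimal std-only sketch under stated assumptions: `ProgressThrottle` and `emitted_frames` are hypothetical names, the clock is simulated by adding a fixed `per_chunk` duration per iteration so the result is deterministic, and `every` is assumed to be non-zero.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper mirroring the loop's rule: emit on every `every`-th
/// chunk, or when `min_interval` has passed since the last emit.
struct ProgressThrottle {
    every: u32, // assumed non-zero
    min_interval: Duration,
    last_emit: Instant,
    last_done: u32,
}

impl ProgressThrottle {
    fn new(every: u32, min_interval: Duration, now: Instant) -> Self {
        ProgressThrottle { every, min_interval, last_emit: now, last_done: 0 }
    }

    /// Returns true when a frame should be emitted for `done` at `now`.
    fn tick(&mut self, done: u32, now: Instant) -> bool {
        let due = done % self.every == 0
            || now.duration_since(self.last_emit) >= self.min_interval;
        if due {
            self.last_emit = now;
            self.last_done = done;
        }
        due
    }

    /// True when a closing frame is still needed for `done`.
    fn needs_final(&self, done: u32) -> bool {
        done != self.last_done
    }
}

/// Simulates the loop: returns the `done` values that would be emitted
/// when each of `chunks` chunks takes exactly `per_chunk`.
fn emitted_frames(chunks: u32, per_chunk: Duration) -> Vec<u32> {
    let start = Instant::now();
    let mut throttle = ProgressThrottle::new(25, Duration::from_secs(1), start);
    let mut now = start;
    let mut out = Vec::new();
    for done in 1..=chunks {
        now += per_chunk; // simulated time, no sleeping
        if throttle.tick(done, now) {
            out.push(done);
        }
    }
    if throttle.needs_final(chunks) {
        out.push(chunks);
    }
    out
}
```

With fast chunks the count rule dominates (60 chunks at 1ms each yield frames at 25, 50, 60), while with slow chunks the time rule takes over; in both cases the `last_done` guard keeps the closing frame from being duplicated.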
@@ -1385,6 +1426,7 @@ fn ingest_one_asset(
            }
        }
    }
+    let expansion_ms = u64::try_from(t_expansion.elapsed().as_millis()).unwrap_or(u64::MAX);

    // Stamp chunker + embedding versions so Task 7's skip detection has
    // data on the second run.
@@ -1397,7 +1439,7 @@ fn ingest_one_asset(
    // (per-document tx semantics per design §5.8); composing them is
    // the kb-app job. A failure mid-way leaves the DB in a state the
    // next ingest run can re-converge (UPSERT + DELETE-then-INSERT).
-    purge_vector_orphans_for_workspace_path(app, asset, vector_store)?;
+    let t_store = std::time::Instant::now();
    app.sqlite
        .put_asset_with_bytes(asset, &bytes)
        .context("DocumentStore::put_asset_with_bytes")?;
@@ -1410,8 +1452,16 @@ fn ingest_one_asset(
    app.sqlite
        .put_chunks(&canonical.doc_id, &chunks)
        .context("DocumentStore::put_chunks")?;
+    let store_ms = u64::try_from(t_store.elapsed().as_millis()).unwrap_or(u64::MAX);

    // Embed + vector upsert (only when both sides are configured).
+    let t_embed = std::time::Instant::now();
+    // Stale-vector purge is LanceDB I/O, so it belongs to the embed/vector
+    // phase — not the SQLite `store` phase. Keeping it here makes `store_ms`
+    // mean "SQLite persist only" and `embed_ms` cover all vector-store work
+    // (purge + upsert), so per-phase timings attribute the bottleneck
+    // correctly (review fix). Runs before any new upsert, as before.
+    purge_vector_orphans_for_workspace_path(app, asset, vector_store)?;
    let mut emb_cache_hit = 0_usize;
    let mut emb_cache_miss = 0_usize;
    if let (Some(emb), Some(vec_store)) = (embedder, vector_store) {
@@ -1541,6 +1591,22 @@ fn ingest_one_asset(
        }
    }

+    let embed_ms = u64::try_from(t_embed.elapsed().as_millis()).unwrap_or(u64::MAX);
+
+    // v0.24.0: phase-timing breakdown for this asset (markdown path only).
+    crate::ingest_progress::emit(
+        progress,
+        crate::ingest_progress::IngestEvent::AssetTimings {
+            idx,
+            total,
+            parse_ms,
+            chunk_ms,
+            expansion_ms,
+            embed_ms,
+            store_ms,
+        },
+    );
+
     // Refresh last_used_at for the alias keys that hit (preserves LRU, §3.5).
    app.sqlite.derivation_cache_touch(&alias_touch_keys)?;
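Reviewer note: every phase in this function converts `Instant::elapsed()` to a saturating `u64` millisecond count before it goes into `AssetTimings`. The sketch below isolates that conversion and shows one way a consumer might pick the bottleneck phase; `saturating_ms`, `Timings`, and `slowest_phase` are hypothetical names for illustration, and only the five field names come from the PR.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Same conversion as in the diff: u128 millis narrowed to u64,
/// saturating instead of panicking on overflow.
fn saturating_ms(d: Duration) -> u64 {
    u64::try_from(d.as_millis()).unwrap_or(u64::MAX)
}

/// Hypothetical consumer-side copy of the `asset_timings` phase fields.
struct Timings {
    parse_ms: u64,
    chunk_ms: u64,
    expansion_ms: u64,
    embed_ms: u64,
    store_ms: u64,
}

/// Returns the name and duration of the slowest phase.
fn slowest_phase(t: &Timings) -> (&'static str, u64) {
    let phases = [
        ("parse", t.parse_ms),
        ("chunk", t.chunk_ms),
        ("expansion", t.expansion_ms),
        ("embed", t.embed_ms),
        ("store", t.store_ms),
    ];
    phases
        .iter()
        .copied()
        .max_by_key(|&(_, ms)| ms)
        .expect("phases array is never empty")
}
```

Note that after the review fix in this PR, `embed_ms` also covers the stale-vector purge, so a large `embed_ms` can reflect LanceDB I/O as well as model inference.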

@@ -1594,12 +1660,15 @@ fn ingest_one_asset(
 fn ingest_one_image_asset(
    app: &App,
    asset: &RawAsset,
+    idx: u32,
+    total: u32,
    chunk_policy: &ChunkPolicy,
    embedder: Option<&Arc<dyn Embedder + Send + Sync>>,
    vector_store: Option<&Arc<kebab_store_vector::LanceVectorStore>>,
    existing_doc_ids: &std::collections::HashSet<String>,
    image_pipeline: &ImagePipeline<'_>,
    force_reingest: bool,
+    progress: Option<&std::sync::mpsc::Sender<crate::ingest_progress::IngestEvent>>,
 ) -> anyhow::Result<kebab_core::IngestItem> {
    let ocr_engine = image_pipeline.ocr_engine;
    let caption_llm = image_pipeline.caption_llm;
@@ -1752,6 +1821,17 @@ fn ingest_one_image_asset(
        .chunk(&canonical, chunk_policy)
        .context("kb-chunk::MdHeadingV1Chunker::chunk (image)")?;

+    // v0.24.0: surface chunk count for the image path too (phase timing is
+    // markdown-only, but AssetChunked is consistent across media).
+    crate::ingest_progress::emit(
+        progress,
+        crate::ingest_progress::IngestEvent::AssetChunked {
+            idx,
+            total,
+            chunks: u32::try_from(chunks.len()).unwrap_or(u32::MAX),
+        },
+    );
+
    // 5. Persist + embed — identical sequence to markdown.
    // Stamp chunker + embedding versions (image uses MdHeadingV1Chunker
    // for its single-block doc, so we record that version).
@@ -2157,6 +2237,8 @@ fn sweep_deleted_files(
 fn ingest_one_pdf_asset(
    app: &App,
    asset: &RawAsset,
+    idx: u32,
+    total: u32,
    chunk_policy: &ChunkPolicy,
    embedder: Option<&Arc<dyn Embedder + Send + Sync>>,
    vector_store: Option<&Arc<kebab_store_vector::LanceVectorStore>>,
@@ -2360,6 +2442,16 @@ fn ingest_one_pdf_asset(
        .chunk(&canonical, chunk_policy)
        .context("kb-chunk::PdfPageV1Chunker::chunk")?;

+    // v0.24.0: surface chunk count for the PDF path too.
+    crate::ingest_progress::emit(
+        progress,
+        crate::ingest_progress::IngestEvent::AssetChunked {
+            idx,
+            total,
+            chunks: u32::try_from(chunks.len()).unwrap_or(u32::MAX),
+        },
+    );
+
    // Stamp chunker + embedding versions so Task 7's skip detection has
    // data on the second run.
    canonical.last_chunker_version = Some(chunker.chunker_version());
@@ -3211,6 +3303,48 @@ pub fn doctor_with_config_path(
        hint: data_hint,
    });

+    // config_migration: is the user's file in sync with the new schema (dry-run migration)?
+    // Checked only when the file exists (without one, defaults are in use, so migration is moot).
+    if cfg_path.exists() {
+        if let Ok(text) = std::fs::read_to_string(&cfg_path) {
+            let outcome = kebab_config::migrate::migrate_document(&text);
+            let (mok, detail, hint) = if outcome.changed() {
+                let added = outcome
+                    .changes
+                    .iter()
+                    .filter(|c| {
+                        matches!(
+                            c.kind,
+                            kebab_config::migrate::ChangeKind::AddedSection
+                                | kebab_config::migrate::ChangeKind::AddedKey
+                        )
+                    })
+                    .count();
+                let removed = outcome.changes.len() - added;
+                (
+                    false,
+                    format!(
+                        "{} pending changes (added {added}, removed {removed} deprecated)",
+                        outcome.changes.len()
+                    ),
+                    Some("run `kebab config migrate` to update your config.toml".to_string()),
+                )
+            } else {
+                (
+                    true,
+                    format!("config up to date (schema v{})", outcome.to_schema_version),
+                    None,
+                )
+            };
+            checks.push(DoctorCheck {
+                name: "config_migration".to_string(),
+                ok: mok,
+                detail,
+                hint,
+            });
+        }
+    }
+
    let ok = checks.iter().all(|c| c.ok);
    Ok(DoctorReport {
        schema_version: "doctor.v1".to_string(),
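Reviewer note: the doctor check derives `removed` as "everything that is not an addition". A minimal sketch of that counting follows, assuming a local stand-in for the change kinds: `AddedSection` and `AddedKey` appear in the diff, while `RemovedDeprecated` and the `summarize` function are hypothetical names used only to make the example self-contained.

```rust
/// Local stand-in for `kebab_config::migrate::ChangeKind`.
/// `RemovedDeprecated` is an assumed variant name, not taken from the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChangeKind {
    AddedSection,
    AddedKey,
    RemovedDeprecated,
}

/// Builds the same detail string shape as the doctor check.
fn summarize(changes: &[ChangeKind]) -> String {
    let added = changes
        .iter()
        .filter(|k| matches!(**k, ChangeKind::AddedSection | ChangeKind::AddedKey))
        .count();
    // Anything that is not an addition is counted as a deprecated removal.
    let removed = changes.len() - added;
    format!(
        "{} pending changes (added {added}, removed {removed} deprecated)",
        changes.len()
    )
}
```

One consequence of the subtraction is that any future non-addition change kind would be reported as "removed deprecated" unless the filter is extended.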
@@ -3227,6 +3361,66 @@ pub fn doctor() -> anyhow::Result<DoctorReport> {
    doctor_with_config_path(None)
 }

+/// Result of `kebab config migrate` (source of the wire `config_migration.v1`).
+#[derive(Clone, Debug, PartialEq, serde::Serialize)]
+pub struct ConfigMigrationReport {
+    /// Always `"config_migration.v1"`.
+    pub schema_version: String,
+    pub config_path: String,
+    pub dry_run: bool,
+    pub from_schema_version: u32,
+    pub to_schema_version: u32,
+    pub changed: bool,
+    pub backup_path: Option<String>,
+    pub changes: Vec<kebab_config::migrate::MigrationChange>,
+}
+
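+/// Migrates the user's config.toml to the new schema (facade).
+/// Falls back to the XDG default when `config_path` is not given. With `dry_run=true` neither the file nor a backup is touched.
+/// Safety: on change, writes a `.bak` backup, stages the result in a tmp file, validates it by round-trip, then atomically renames.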
+pub fn config_migrate_with_config_path(
+    config_path: Option<&std::path::Path>,
+    dry_run: bool,
+) -> anyhow::Result<ConfigMigrationReport> {
+    let path: PathBuf = match config_path {
+        Some(p) => p.to_path_buf(),
+        None => kebab_config::Config::xdg_config_path(),
+    };
+    if !path.exists() {
+        anyhow::bail!(
+            "config 파일이 없습니다: {} — 먼저 `kebab init` 을 실행하세요.",
+            path.display()
+        );
+    }
+    let text = std::fs::read_to_string(&path)?;
+    let outcome = kebab_config::migrate::migrate_document(&text);
+
+    let mut backup_path = None;
+    if !dry_run && outcome.changed() {
+        let bak = path.with_extension("toml.bak");
+        std::fs::copy(&path, &bak)?;
+        backup_path = Some(bak.display().to_string());
+        let tmp = path.with_extension("toml.tmp");
+        std::fs::write(&tmp, &outcome.new_text)?;
+        if kebab_config::Config::from_file(&tmp).is_err() {
+            std::fs::remove_file(&tmp).ok();
+            anyhow::bail!("마이그레이션 결과가 유효하지 않아 원본을 보존합니다.");
+        }
+        std::fs::rename(&tmp, &path)?;
+    }
+
+    Ok(ConfigMigrationReport {
+        schema_version: "config_migration.v1".to_string(),
+        config_path: path.display().to_string(),
+        dry_run,
+        from_schema_version: outcome.from_schema_version,
+        to_schema_version: outcome.to_schema_version,
+        changed: outcome.changed(),
+        backup_path,
+        changes: outcome.changes,
+    })
+}
+
 /// Single-file ingest (p9-fb-31). Copies the file to
 /// `<workspace.root>/_external/<blake3-12>.<ext>` and runs the
 /// per-medium ingest pipeline on that single asset. Returns an
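The safety contract in the `config_migrate_with_config_path` doc comment (backup, stage in a tmp file, validate, rename) can be sketched in isolation. This is a minimal std-only sketch, not the crate's code: `replace_with_backup` and its `validate` callback are hypothetical names, and the callback merely stands in for the `Config::from_file` round-trip check.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of kebab): replace `path` with `new_text`
/// only if `validate` accepts the staged temp file. Returns the backup path.
fn replace_with_backup(
    path: &Path,
    new_text: &str,
    validate: impl Fn(&Path) -> bool,
) -> io::Result<PathBuf> {
    // 1. Keep a copy of the original next to it.
    let bak = path.with_extension("toml.bak");
    fs::copy(path, &bak)?;

    // 2. Stage the new content in a sibling temp file (same directory, so
    //    the final rename stays on one filesystem).
    let tmp = path.with_extension("toml.tmp");
    fs::write(&tmp, new_text)?;

    // 3. Validate the staged file; on failure drop it and keep the original.
    if !validate(&tmp) {
        let _ = fs::remove_file(&tmp);
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "staged file failed validation",
        ));
    }

    // 4. Swap the staged file into place.
    fs::rename(&tmp, path)?;
    Ok(bak)
}
```

Staging the temp file beside the target keeps the rename on one filesystem, which is what makes the swap atomic on POSIX systems. A failed validation leaves the original untouched, although the `.bak` copy has already been written by that point.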
--- a/crates/kebab-app/src/schema.rs
+++ b/crates/kebab-app/src/schema.rs
@@ -108,6 +108,7 @@ const WIRE_SCHEMAS: &[&str] = &[
    "doc_summary.v1",
    "chunk_inspection.v1",
    "doctor.v1",
+    "config_migration.v1",
    "ingest_report.v1",
    "ingest_progress.v1",
    "reset_report.v1",
--- a/crates/kebab-app/tests/config_migrate.rs
+++ b/crates/kebab-app/tests/config_migrate.rs
@@ -0,0 +1,82 @@
+use std::fs;
+
+#[test]
+fn migrate_writes_backup_and_atomic_with_dry_run_noop() {
+    let dir = tempfile::tempdir().unwrap();
+    let cfg = dir.path().join("config.toml");
+    fs::write(
+        &cfg,
+        "schema_version = 1\n\n[workspace]\nroot = \"/n\"\ninclude = [\"*.md\"]\n",
+    )
+    .unwrap();
+
+    // dry-run: neither the file nor a backup is touched.
+    let report = kebab_app::config_migrate_with_config_path(Some(&cfg), true).unwrap();
+    assert!(report.changed);
+    assert!(report.dry_run);
+    assert!(report.backup_path.is_none());
+    assert!(!dir.path().join("config.toml.bak").exists());
+    assert!(
+        fs::read_to_string(&cfg).unwrap().contains("include"),
+        "dry-run modified file"
+    );
+
+    // Real run: backup created and file updated.
+    let report = kebab_app::config_migrate_with_config_path(Some(&cfg), false).unwrap();
+    assert!(report.changed);
+    assert!(!report.dry_run);
+    assert!(report.backup_path.is_some());
+    assert!(dir.path().join("config.toml.bak").exists());
+    let new = fs::read_to_string(&cfg).unwrap();
+    assert!(!new.contains("include"));
+    assert!(new.contains("[ingest.expansion]"));
+
+    // Idempotent: a second run reports changed = false.
+    let report = kebab_app::config_migrate_with_config_path(Some(&cfg), false).unwrap();
+    assert!(!report.changed);
+}
+
+#[test]
+fn migrate_missing_file_errors() {
+    let dir = tempfile::tempdir().unwrap();
+    let cfg = dir.path().join("nope.toml");
+    assert!(kebab_app::config_migrate_with_config_path(Some(&cfg), false).is_err());
+}
+
+#[test]
+fn annotated_default_serialization_contains_section_comments() {
+    let doc = kebab_config::migrate::annotated_default_document();
+    let text = doc.to_string();
+    assert!(text.contains("doc-side 별칭"), "section comment missing:\n{text}");
+    assert!(text.contains("[ingest.expansion]"));
+}
+
+#[test]
+fn doctor_flags_outdated_config() {
+    let dir = tempfile::tempdir().unwrap();
+    let cfg = dir.path().join("config.toml");
+    fs::write(
+        &cfg,
+        "schema_version = 1\n\n[workspace]\nroot = \"/n\"\ninclude=[\"*.md\"]\n",
+    )
+    .unwrap();
+    let report = kebab_app::doctor_with_config_path(Some(&cfg)).unwrap();
+    let check = report
+        .checks
+        .iter()
+        .find(|c| c.name == "config_migration")
+        .unwrap();
+    assert!(!check.ok, "outdated config should fail check");
+    assert!(check.hint.as_deref().unwrap().contains("config migrate"));
+    assert!(!report.ok, "overall doctor should be false");
+
+    // After migrating, the check passes.
+    kebab_app::config_migrate_with_config_path(Some(&cfg), false).unwrap();
+    let report = kebab_app::doctor_with_config_path(Some(&cfg)).unwrap();
+    let check = report
+        .checks
+        .iter()
+        .find(|c| c.name == "config_migration")
+        .unwrap();
+    assert!(check.ok, "after migrate should pass");
+}
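The idempotency asserted in `migrate_writes_backup_and_atomic_with_dry_run_noop` (a second run reports `changed = false`) follows from reconciliation being additive only: missing entries are copied from the defaults and existing user values are never rewritten. A toy model of that rule, assuming a hypothetical `Node` type in place of `toml_edit` items; it ignores comments/decor and the `schema_version` special case:

```rust
use std::collections::BTreeMap;

/// Toy config value: a scalar (kept as text) or a nested table.
/// Hypothetical stand-in for a TOML item, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Scalar(String),
    Table(BTreeMap<String, Node>),
}

/// Copy every entry that `reference` has and `user` lacks; never touch an
/// existing user value. Returns the dotted paths that were added.
fn reconcile(
    reference: &BTreeMap<String, Node>,
    user: &mut BTreeMap<String, Node>,
    prefix: &str,
) -> Vec<String> {
    let mut added = Vec::new();
    for (key, ref_node) in reference {
        let path = if prefix.is_empty() {
            key.clone()
        } else {
            format!("{prefix}.{key}")
        };
        match user.get_mut(key) {
            // Missing entirely: copy the whole subtree, record the parent path once.
            None => {
                user.insert(key.clone(), ref_node.clone());
                added.push(path);
            }
            // Present as a table: recurse so only the missing leaves are added.
            Some(Node::Table(user_tbl)) => {
                if let Node::Table(ref_tbl) = ref_node {
                    added.extend(reconcile(ref_tbl, user_tbl, &path));
                }
            }
            // Present as a scalar: the user's value wins, nothing to do.
            Some(Node::Scalar(_)) => {}
        }
    }
    added
}
```

Because a second pass finds nothing missing, it returns an empty list, which is the property the `changed = false` assertion relies on.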
--- a/crates/kebab-app/tests/ingest_progress.rs
+++ b/crates/kebab-app/tests/ingest_progress.rs
@@ -69,40 +69,74 @@ fn progress_event_sequence_matches_design_section_2_4a() {
        other => panic!("expected Completed last, got {other:?}"),
    }

-    // Middle: 3 AssetStarted/AssetFinished pairs in monotonic idx order.
-    let asset_events: Vec<&IngestEvent> = events[2..events.len() - 1].iter().collect();
-    assert_eq!(
-        asset_events.len(),
-        6,
-        "expected 3 (Started + Finished) pairs, got {asset_events:?}"
-    );
-    for (chunk_idx, pair) in asset_events.chunks(2).enumerate() {
-        let expected_idx = chunk_idx as u32 + 1;
-        match (pair[0], pair[1]) {
-            (
-                IngestEvent::AssetStarted {
-                    idx: si,
-                    total: st,
-                    media,
-                    ..
-                },
-                IngestEvent::AssetFinished {
-                    idx: fi,
-                    total: ft,
-                    result,
-                    chunks,
-                },
-            ) => {
-                assert_eq!(*si, expected_idx, "Started idx mismatch: {pair:?}");
-                assert_eq!(*fi, expected_idx, "Finished idx mismatch: {pair:?}");
-                assert_eq!(*st, 3, "Started total mismatch");
-                assert_eq!(*ft, 3, "Finished total mismatch");
-                assert_eq!(media, "markdown", "fixture is markdown only");
-                assert_eq!(*result, IngestItemKind::New, "first ingest → New");
-                assert!(*chunks >= 1, "chunks: {pair:?}");
+    // Middle (v0.24.0 ordering invariant §2.4a): per asset the stream is
+    //   AssetStarted < AssetChunked < [ExpansionProgress*] < AssetTimings
+    //     < AssetFinished
+    // Expansion is disabled in the lexical fixture, so no ExpansionProgress
+    // frames appear here — but AssetChunked + AssetTimings are emitted for
+    // every markdown asset.
+    let middle = &events[2..events.len() - 1];
+
+    // 3 AssetStarted events, monotonic idx 1..=3, all markdown, total = 3.
+    let started: Vec<u32> = middle
+        .iter()
+        .filter_map(|e| match e {
+            IngestEvent::AssetStarted {
+                idx, total, media, ..
+            } => {
+                assert_eq!(*total, 3, "Started total mismatch: {e:?}");
+                assert_eq!(media, "markdown", "fixture is markdown only: {e:?}");
+                Some(*idx)
            }
-            other => panic!("expected Started+Finished pair, got {other:?}"),
-        }
+            _ => None,
+        })
+        .collect();
+    assert_eq!(started, vec![1, 2, 3], "AssetStarted idx order: {middle:?}");
+
+    // 3 AssetFinished events, monotonic idx 1..=3, each New with ≥1 chunk.
+    let finished: Vec<u32> = middle
+        .iter()
+        .filter_map(|e| match e {
+            IngestEvent::AssetFinished {
+                idx,
+                total,
+                result,
+                chunks,
+            } => {
+                assert_eq!(*total, 3, "Finished total mismatch: {e:?}");
+                assert_eq!(*result, IngestItemKind::New, "first ingest → New: {e:?}");
+                assert!(*chunks >= 1, "chunks: {e:?}");
+                Some(*idx)
+            }
+            _ => None,
+        })
+        .collect();
+    assert_eq!(finished, vec![1, 2, 3], "AssetFinished idx order: {middle:?}");
+
+    // v0.24.0 additive events: exactly one AssetChunked + one AssetTimings
+    // per asset, each strictly bracketed by that asset's Started / Finished.
+    for target in 1u32..=3 {
+        let started_at = middle
+            .iter()
+            .position(|e| matches!(e, IngestEvent::AssetStarted { idx, .. } if *idx == target))
+            .unwrap_or_else(|| panic!("missing AssetStarted for idx {target}: {middle:?}"));
+        let finished_at = middle
+            .iter()
+            .position(|e| matches!(e, IngestEvent::AssetFinished { idx, .. } if *idx == target))
+            .unwrap_or_else(|| panic!("missing AssetFinished for idx {target}: {middle:?}"));
+        let chunked_at = middle
+            .iter()
+            .position(|e| matches!(e, IngestEvent::AssetChunked { idx, chunks, .. } if *idx == target && *chunks >= 1))
+            .unwrap_or_else(|| panic!("missing AssetChunked for idx {target}: {middle:?}"));
+        let timings_at = middle
+            .iter()
+            .position(|e| matches!(e, IngestEvent::AssetTimings { idx, .. } if *idx == target))
+            .unwrap_or_else(|| panic!("missing AssetTimings for idx {target}: {middle:?}"));
+        assert!(
+            started_at < chunked_at && chunked_at < timings_at && timings_at < finished_at,
+            "idx {target} ordering: started={started_at} chunked={chunked_at} \
+             timings={timings_at} finished={finished_at}: {middle:?}"
+        );
    }
 }

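The per-asset ordering invariant checked in this test (Started < Chunked < [ExpansionProgress*] < Timings < Finished) can also be expressed as a small predicate over the event stream. A minimal sketch, assuming a simplified hypothetical `Ev` enum that keeps only the asset index; the real `IngestEvent` variants carry more fields. It is slightly stricter than the test, since it also rejects duplicate non-expansion events:

```rust
/// Simplified stand-in for the per-asset events (hypothetical, index only).
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Started(u32),
    Chunked(u32),
    Expansion(u32),
    Timings(u32),
    Finished(u32),
}

/// True when asset `idx` emits exactly one Started, Chunked, Timings and
/// Finished, in that order, with any Expansion frames between Chunked and
/// Timings. Events of other assets are ignored.
fn asset_order_ok(events: &[Ev], idx: u32) -> bool {
    // Map this asset's events to their lifecycle phase number.
    let phases: Vec<u8> = events
        .iter()
        .filter_map(|e| match e {
            Ev::Started(i) if *i == idx => Some(0),
            Ev::Chunked(i) if *i == idx => Some(1),
            Ev::Expansion(i) if *i == idx => Some(2),
            Ev::Timings(i) if *i == idx => Some(3),
            Ev::Finished(i) if *i == idx => Some(4),
            _ => None,
        })
        .collect();
    // Phases must never go backwards.
    let non_decreasing = phases.windows(2).all(|w| w[0] <= w[1]);
    // Ignoring optional expansion frames, each phase appears exactly once.
    let required: Vec<u8> = phases.iter().copied().filter(|p| *p != 2).collect();
    non_decreasing && required == [0, 1, 3, 4]
}
```

Calling `asset_order_ok(&events, idx)` for each index in `1..=3` would cover the same three assets as the loop in the test.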
--- a/crates/kebab-cli/Cargo.toml
+++ b/crates/kebab-cli/Cargo.toml
@@ -51,5 +51,10 @@ tempfile     = { workspace = true }
 rusqlite     = { workspace = true }
 time         = { workspace = true }

+[features]
+# opt-in (macOS): build the `kebab` binary with candle on the Apple Silicon GPU.
+#   cargo build --release --features embed_metal
+embed_metal = ["kebab-app/embed_metal"]
+
 [lints]
 workspace = true
--- a/crates/kebab-cli/src/main.rs
+++ b/crates/kebab-cli/src/main.rs
@@ -60,6 +60,12 @@ enum Cmd {
        force: bool,
    },

+    /// Manage config.toml (schema migration, etc.).
+    Config {
+        #[command(subcommand)]
+        what: ConfigWhat,
+    },
+
    /// Scan the workspace and ingest new/updated documents.
    Ingest {
        /// Workspace root override.
@@ -346,6 +352,16 @@ enum Cmd {
    },
 }

+#[derive(Subcommand, Debug)]
+enum ConfigWhat {
+    /// Migrate an existing config.toml to the current schema (adds missing sections; idempotent; keeps a .bak backup).
+    Migrate {
+        /// Print the pending changes without modifying the file.
+        #[arg(long)]
+        dry_run: bool,
+    },
+}
+
 #[derive(Subcommand, Debug)]
 enum ListWhat {
    /// List documents currently indexed.
@@ -616,6 +632,24 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
                .map(|v| v.eq_ignore_ascii_case("plain"))
                .unwrap_or(false);
            let mode = progress::ProgressMode::from_flags(cli.json, cli.quiet, plain_env);
+
+            // Surface the active embedding backend/device on the terminal so the
+            // user sees it without grepping kb.log (the per-device tracing line
+            // only lands in the log file at --verbose). Suppressed under
+            // --json/--quiet. The Metal note reflects the build (`embed_metal`);
+            // the confirmed runtime device is in kb.log (`candle device = ...`).
+            if !cli.json && !cli.quiet {
+                let backend = match cfg.models.embedding.provider.as_str() {
+                    "candle" if cfg!(feature = "embed_metal") => "candle (Metal/GPU 빌드)",
+                    "candle" => "candle (CPU, 순수 Rust)",
+                    "fastembed" | "onnx" | "" => "fastembed (onnxruntime)",
+                    "none" => "비활성 (lexical-only)",
+                    other => other,
+                };
+                eprintln!("임베딩 백엔드: {backend} · 모델 {} ({}-dim)",
+                    cfg.models.embedding.model, cfg.models.embedding.dimensions);
+            }
+
            let (tx, rx) = std::sync::mpsc::channel::<kebab_app::IngestEvent>();
            let display_handle =
                std::thread::spawn(move || progress::ProgressDisplay::new(mode).run(rx));
@@ -1310,6 +1344,42 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
            Ok(())
        }

+        Cmd::Config { what } => match what {
+            ConfigWhat::Migrate { dry_run } => {
+                let report =
+                    kebab_app::config_migrate_with_config_path(cli.config.as_deref(), *dry_run)?;
+                if cli.json {
+                    println!(
+                        "{}",
+                        serde_json::to_string(&wire::wire_config_migration(&report))?
+                    );
+                } else if !report.changed {
+                    println!(
+                        "config 이미 최신입니다 (schema v{}).",
+                        report.to_schema_version
+                    );
+                } else {
+                    let verb = if report.dry_run { "변경 예정" } else { "적용됨" };
+                    println!(
+                        "config 마이그레이션 {verb}: v{} → v{} ({} changes)",
+                        report.from_schema_version,
+                        report.to_schema_version,
+                        report.changes.len()
+                    );
+                    for c in &report.changes {
+                        println!("  - [{:?}] {} — {}", c.kind, c.path, c.detail);
+                    }
+                    if let Some(bak) = &report.backup_path {
+                        println!("백업: {bak}");
+                    }
+                    if report.dry_run {
+                        println!("(--dry-run: 파일 미수정. 적용하려면 --dry-run 없이 재실행)");
+                    }
+                }
+                Ok(())
+            }
+        },
+
        Cmd::Doctor => {
            let report = kebab_app::doctor_with_config_path(cli.config.as_deref())?;
            if cli.json {
--- a/crates/kebab-cli/src/progress.rs
+++ b/crates/kebab-cli/src/progress.rs
@@ -157,6 +157,54 @@ impl ProgressDisplay {
                // in Completed handles the final state. No per-asset bar update
                // here avoids the duplicate-frame artifact in TTY scrollback.
            }
+            // v0.24.0: asset-internal phase visibility. AssetChunked /
+            // ExpansionProgress use the bar *message* (live sub-progress for
+            // the current asset) — distinct from the per-file position draw,
+            // so a single large document no longer looks frozen. AssetTimings
+            // prints a one-line breakdown when the asset finishes.
+            IngestEvent::AssetChunked { idx, total, chunks } => {
+                if let Some(bar) = self.bar.as_ref() {
+                    bar.set_message(format!("→ {chunks} chunks"));
+                }
+                if !tty && !quiet {
+                    let mut err = std::io::stderr().lock();
+                    let _ = writeln!(err, "ingest: {idx}/{total} → {chunks} chunks");
+                }
+            }
+            IngestEvent::ExpansionProgress {
+                done, chunks, ..
+            } => {
+                if let Some(bar) = self.bar.as_ref() {
+                    bar.set_message(format!("별칭 확장 {done}/{chunks}"));
+                }
+                // Non-TTY: suppressed by default — throttled though it is, one
+                // line per emit would still spam CI logs. The bar message
+                // covers the interactive case; --json carries every frame.
+            }
+            IngestEvent::AssetTimings {
+                parse_ms,
+                chunk_ms,
+                expansion_ms,
+                embed_ms,
+                store_ms,
+                ..
+            } => {
+                if let Some(bar) = self.bar.as_ref() {
+                    bar.set_message("");
+                }
+                if !quiet {
+                    let mut err = std::io::stderr().lock();
+                    let _ = writeln!(
+                        err,
+                        "  ⏱ parse {} · chunk {} · expand {} · embed {} · store {}",
+                        fmt_ms(*parse_ms),
+                        fmt_ms(*chunk_ms),
+                        fmt_ms(*expansion_ms),
+                        fmt_ms(*embed_ms),
+                        fmt_ms(*store_ms),
+                    );
+                }
+            }
            IngestEvent::Completed { counts } => {
                if let Some(bar) = self.bar.take() {
                    bar.finish_and_clear();
@@ -239,6 +287,17 @@ fn emit_json(event: &IngestEvent) -> anyhow::Result<()> {
    Ok(())
 }

+/// Render a phase duration (milliseconds) compactly for the human-mode
+/// `AssetTimings` line: `< 1000ms` stays in `ms`, larger spans collapse to
+/// one-decimal seconds so a 45-second expansion reads `45.0s`, not `45000ms`.
+fn fmt_ms(ms: u64) -> String {
+    if ms >= 1000 {
+        format!("{:.1}s", ms as f64 / 1000.0)
+    } else {
+        format!("{ms}ms")
+    }
+}
+
 /// Format the current wall-clock as RFC 3339 — used by `wire_ingest_progress`
 /// so every emitted event carries an `ts` field per §2.4a / the wire schema.
 pub(crate) fn now_rfc3339() -> anyhow::Result<String> {
@@ -285,6 +344,15 @@ mod tests {
        }
    }

+    #[test]
+    fn fmt_ms_switches_unit_at_one_second() {
+        assert_eq!(fmt_ms(0), "0ms");
+        assert_eq!(fmt_ms(999), "999ms");
+        assert_eq!(fmt_ms(1000), "1.0s");
+        assert_eq!(fmt_ms(45_000), "45.0s");
+        assert_eq!(fmt_ms(1500), "1.5s");
+    }
+
    #[test]
    fn now_rfc3339_parses_back() {
        let s = now_rfc3339().unwrap();
--- a/crates/kebab-cli/src/wire.rs
+++ b/crates/kebab-cli/src/wire.rs
@@ -225,6 +225,12 @@ pub fn wire_bulk_search_item(item: &kebab_core::BulkSearchItem) -> Value {
    v
 }

+/// Serializes `config_migration.v1`. `ConfigMigrationReport` carries its own
+/// `schema_version` field (as doctor does), so it is serialized as-is.
+pub fn wire_config_migration(r: &kebab_app::ConfigMigrationReport) -> Value {
+    serde_json::to_value(r).expect("ConfigMigrationReport serializes")
+}
+
 #[cfg(test)]
 mod tests {
    use super::*;
--- a/crates/kebab-config/Cargo.toml
+++ b/crates/kebab-config/Cargo.toml
@@ -15,6 +15,7 @@ serde        = { workspace = true }
 serde_json   = { workspace = true }
 thiserror    = { workspace = true }
 toml         = "0.8"
+toml_edit    = "0.22"
 dirs         = "5"
 # p9-fb-05: warn-log when current_dir() fails (chroot, deleted cwd)
 # during workspace.root resolution.
--- a/crates/kebab-config/src/lib.rs
+++ b/crates/kebab-config/src/lib.rs
@@ -9,6 +9,7 @@ use std::path::{Path, PathBuf};
 use serde::{Deserialize, Serialize};

 mod paths;
+pub mod migrate;
 pub use paths::{expand_path, expand_path_with_base};

 /// Signal: `Config::from_file` / `Config::load` failed due to missing path,
@@ -154,11 +155,21 @@ impl NliCfg {

 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct EmbeddingModelCfg {
+    /// `fastembed` (default, onnxruntime) or `candle` (pure-Rust,
+    /// NUMA-safe). `none` disables embeddings (lexical-only). Unknown
+    /// values error at embedder construction.
    pub provider: String,
    pub model: String,
    pub version: String,
    pub dimensions: usize,
    pub batch_size: usize,
+    /// Cap on the CPU worker threads the `candle` provider spins up
+    /// (sizes the global rayon pool; env `KEBAB_EMBED_THREADS` overrides).
+    /// `0` = auto (rayon default = #cores). Lever to sidestep the
+    /// onnxruntime 48-thread NUMA double-free; ignored by the `fastembed`
+    /// provider. Defaulted on load so pre-0.22 config files still parse.
+    #[serde(default)]
+    pub num_threads: u32,
 }

 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
@@ -669,7 +680,7 @@ impl Config {
    /// Defaults per design §6.4.
    pub fn defaults() -> Self {
        Self {
-            schema_version: 1,
+            schema_version: crate::migrate::CURRENT_SCHEMA_VERSION,
            workspace: WorkspaceCfg {
                root: "~/KnowledgeBase".to_string(),
                exclude: vec![
@@ -706,6 +717,7 @@ impl Config {
                    version: "v1".to_string(),
                    dimensions: 1024,
                    batch_size: 64,
+                    num_threads: 0,
                },
                llm: LlmCfg {
                    provider: "ollama".to_string(),
@@ -963,6 +975,11 @@ impl Config {
                        self.models.embedding.batch_size = n;
                    }
                }
+                "KEBAB_MODELS_EMBEDDING_NUM_THREADS" => {
+                    if let Ok(n) = v.parse::<u32>() {
+                        self.models.embedding.num_threads = n;
+                    }
+                }

                // models.llm
                "KEBAB_MODELS_LLM_PROVIDER" => self.models.llm.provider = v.clone(),
--- a/crates/kebab-config/src/migrate.rs
+++ b/crates/kebab-config/src/migrate.rs
@@ -0,0 +1,399 @@
+//! config.toml migration engine (pure transformation, no I/O).
+//!
+//! Two mechanisms: (1) reconciliation adds sections/keys that the default
+//! structure has but the user's file lacks, comments included; (2) a step
+//! chain applies `schema_version`-keyed non-additive transforms (removing
+//! deprecated keys, etc.). Full contract: `docs/superpowers/specs/2026-05-31-config-migration-design.md`.
+
+use toml_edit::DocumentMut;
+
+/// Config schema version this binary understands. When a migration completes,
+/// the user's file is stamped with this value as its `schema_version`.
+pub const CURRENT_SCHEMA_VERSION: u32 = 2;
+
+/// A single change produced by one migration run.
+#[derive(Clone, Debug, PartialEq, serde::Serialize)]
+pub struct MigrationChange {
+    pub kind: ChangeKind,
+    /// Dotted path, e.g. `ingest.expansion`, `workspace.include`.
+    pub path: String,
+    /// One-line description for humans and the wire format.
+    pub detail: String,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq, serde::Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ChangeKind {
+    AddedSection,
+    AddedKey,
+    RemovedDeprecated,
+}
+
+/// Migration summary (output of the pure transformation step). The I/O layer
+/// fills in `backup_path` and the like before emitting it on the wire.
+#[derive(Clone, Debug, PartialEq, serde::Serialize)]
+pub struct MigrationOutcome {
+    pub from_schema_version: u32,
+    pub to_schema_version: u32,
+    pub changes: Vec<MigrationChange>,
+    /// Serialized text of the transformed document (identical to the input when the run is a no-op).
+    pub new_text: String,
+}
+
+impl MigrationOutcome {
+    pub fn changed(&self) -> bool {
+        !self.changes.is_empty()
+    }
+}
+
+/// Header at the very top of the document (path policy, etc.), moved here from the former init header.
+const HEADER: &str = "\
+# kebab config — `~/.config/kebab/config.toml`.
+#
+# `workspace.root` accepts: 절대 / tilde(~) / env(${VAR}) / 상대 경로.
+#   상대 경로의 base 는 cwd 가 아니라 THIS config 파일의 디렉토리.
+#
+# 처리 가능한 형식 (extractor 가 자동 결정 — config 에 명시할 수 없음):
+#   • Markdown: .md
+#   • 이미지:   .png .jpg .jpeg  (OCR + caption)
+#   • PDF:      .pdf
+#
+# 런타임 override: `KEBAB_*` env (예: KEBAB_WORKSPACE_ROOT=/tmp kebab ingest).
+#
+# 이 파일은 `kebab config migrate` 로 새 스키마에 맞춰 갱신할 수 있다
+# (빠진 섹션 추가 + 손본 값·주석 보존).
+";
+
+/// Comment placed above a table header (`[section]`). Maps a dotted path to its comment line(s).
+fn section_comment(path: &str) -> Option<&'static str> {
+    Some(match path {
+        "workspace" => "# 색인 대상 워크스페이스.",
+        "storage" => "# XDG 저장 경로(데이터/sqlite/벡터/에셋/모델).",
+        "indexing" => "# 병렬도 + 파일시스템 watch.",
+        "chunking" => "# 청크 크기·오버랩·heading 존중.",
+        "models" => "# embedding / llm / nli 모델.",
+        "models.embedding" => "# 다국어 sentence embedding. dim 불일치 시 검색 0건.",
+        "models.llm" => "# Ollama host:port + 모델.",
+        "models.nli" => "# NLI(groundedness) 모델.",
+        "search" => "# 검색 기본 k·stale 기준·fusion.",
+        "rag" => "# 답변 생성: prompt 템플릿·score gate·NLI.",
+        "image" => "# 이미지 OCR + 캡션(기본 off, asset 당 모델 호출 비용).",
+        "image.ocr" => "# 이미지 OCR(기본 off).",
+        "image.caption" => "# 이미지 캡션(기본 off).",
+        "ui" => "# TUI 팔레트·role 스타일.",
+        "ingest" => "# ingest 정책(code skip 등).",
+        "ingest.code" => "# code ingest skip 정책(.gitignore 자동 honor).",
+        "ingest.expansion" => "# doc-side 별칭 확장(기본 off). 패러프레이즈 강건성↑, LLM 비용 큼.",
+        "pdf" => "# PDF ingest. scanned PDF OCR 은 기본 off(page 당 cost).",
+        "pdf.ocr" => "# scanned PDF page-단위 OCR(기본 off).",
+        "logging" => "# ingest 로그(기본 on, ~/.local/state/kebab/logs).",
+        _ => return None,
+    })
+}
+
+/// The "complete" document: `Config::defaults()` serialized with comments attached.
+/// Single source of truth for both init and migrate reconciliation.
+pub fn annotated_default_document() -> DocumentMut {
+    let defaults = crate::Config::defaults();
+    let pretty = toml::to_string_pretty(&defaults).expect("defaults serialize");
+    let mut doc: DocumentMut = pretty.parse().expect("defaults parse as toml_edit");
+
+    // Header: attached as the prefix of the first top-level item.
+    if let Some((mut first_key, _)) = doc.as_table_mut().iter_mut().next() {
+        first_key.leaf_decor_mut().set_prefix(format!("{HEADER}\n"));
+    }
+
+    annotate_table(doc.as_table_mut(), "");
+    doc
+}
+
+/// Recursively attaches header comments to sub-tables. `prefix_path` is the accumulated dotted path.
+/// `annotated_default_document` always starts from comment-free defaults,
+/// so attaching unconditionally never duplicates a comment.
+fn annotate_table(table: &mut toml_edit::Table, prefix_path: &str) {
+    let keys: Vec<String> = table.iter().map(|(k, _)| k.to_string()).collect();
+    for key in keys {
+        let path = if prefix_path.is_empty() {
+            key.clone()
+        } else {
+            format!("{prefix_path}.{key}")
+        };
+        if let Some(item) = table.get_mut(&key) {
+            if let Some(sub) = item.as_table_mut() {
+                if let Some(c) = section_comment(&path) {
+                    sub.decor_mut().set_prefix(format!("\n{c}\n"));
+                }
+                annotate_table(sub, &path);
+            }
+        }
+    }
+}
+
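+/// Using the `reference` table (the annotated defaults) as the baseline, copies
+/// every item missing from the `user` table, decor (comments) included. Existing
+/// keys are never modified (user values are untouchable). Recurses when both sides are tables.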
+pub fn reconcile(
+    reference: &toml_edit::Table,
+    user: &mut toml_edit::Table,
+    prefix_path: &str,
+    changes: &mut Vec<MigrationChange>,
+) {
+    for (key, ref_item) in reference.iter() {
+        let path = if prefix_path.is_empty() {
+            key.to_string()
+        } else {
+            format!("{prefix_path}.{key}")
+        };
+        match user.get_mut(key) {
+            None => {
+                // The stamp step owns the schema_version key (no change is recorded).
+                if path == "schema_version" {
+                    user.insert(key, ref_item.clone());
+                    continue;
+                }
+                let kind = if ref_item.is_table() {
+                    ChangeKind::AddedSection
+                } else {
+                    ChangeKind::AddedKey
+                };
+                user.insert(key, ref_item.clone());
+                changes.push(MigrationChange {
+                    kind,
+                    path: path.clone(),
+                    detail: section_comment(&path).map_or_else(
+                        || format!("{key} 추가"),
+                        |c| c.trim_start_matches("# ").to_string(),
+                    ),
+                });
+            }
+            Some(existing) => {
+                if let (Some(ref_tbl), Some(user_tbl)) =
+                    (ref_item.as_table(), existing.as_table_mut())
+                {
+                    reconcile(ref_tbl, user_tbl, &path, changes);
+                }
+                // Unless both sides are tables (e.g. scalars), the user's value is left untouched.
+            }
+        }
+    }
+}
+
+/// v1 → v2: removes the deprecated `workspace.include` (p9-fb-25). Idempotent.
+pub fn step_1_to_2(doc: &mut DocumentMut, changes: &mut Vec<MigrationChange>) {
+    if let Some(ws) = doc.get_mut("workspace").and_then(|i| i.as_table_mut()) {
+        if ws.remove("include").is_some() {
+            changes.push(MigrationChange {
+                kind: ChangeKind::RemovedDeprecated,
+                path: "workspace.include".to_string(),
+                detail: "p9-fb-25: 처리 형식은 extractor 가 자동 결정 — 더 이상 사용 안 함."
+                    .to_string(),
+            });
+        }
+    }
+}
+
+/// Applies each step from the file's schema_version (1 if absent) up to CURRENT.
+fn run_steps(doc: &mut DocumentMut, from: u32, changes: &mut Vec<MigrationChange>) {
+    if from < 2 {
+        step_1_to_2(doc, changes);
+    }
+    // Future steps: if from < 3 { step_2_to_3(...) } ...
+}
+
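+/// Takes the text of a user config.toml, applies the step chain, reconciliation,
+/// and the version stamp, and returns the result. Pure function (no I/O). On a parse
+/// failure it returns from=1, no changes, and new_text equal to the input (the caller handles the parse error separately).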
+pub fn migrate_document(text: &str) -> MigrationOutcome {
+    let mut doc: DocumentMut = match text.parse() {
+        Ok(d) => d,
+        Err(_) => {
+            return MigrationOutcome {
+                from_schema_version: 1,
+                to_schema_version: CURRENT_SCHEMA_VERSION,
+                changes: Vec::new(),
+                new_text: text.to_string(),
+            };
+        }
+    };
+    let from = doc
+        .get("schema_version")
+        .and_then(toml_edit::Item::as_integer)
+        .unwrap_or(1) as u32;
+
+    let mut changes = Vec::new();
+
+    // 1. Non-additive step chain.
+    run_steps(&mut doc, from, &mut changes);
+
+    // 2. Additive reconciliation (version-independent).
+    let reference = annotated_default_document();
+    let ref_table = reference.as_table().clone();
+    reconcile(&ref_table, doc.as_table_mut(), "", &mut changes);
+
+    // 3. schema_version stamp.
+    let current_in_file = doc
+        .get("schema_version")
+        .and_then(toml_edit::Item::as_integer)
+        .unwrap_or(0) as u32;
+    if current_in_file != CURRENT_SCHEMA_VERSION {
+        doc["schema_version"] = toml_edit::value(i64::from(CURRENT_SCHEMA_VERSION));
+    }
+
+    MigrationOutcome {
+        from_schema_version: from,
+        to_schema_version: CURRENT_SCHEMA_VERSION,
+        changes,
+        new_text: doc.to_string(),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn annotated_default_has_all_sections_and_parses_back_to_defaults() {
+        let doc = annotated_default_document();
+        let text = doc.to_string();
+        // PdfCfg/ImageCfg/ModelsCfg/IngestCfg have no scalar fields, so a bare
+        // `[pdf]` etc. is not emitted; only sub-tables such as `[pdf.ocr]` are serialized.
+        for section in [
+            "[workspace]",
+            "[ingest.expansion]",
+            "[pdf.ocr]",
+            "[logging]",
+            "[ui]",
+        ] {
+            assert!(text.contains(section), "missing {section}:\n{text}");
+        }
+        assert!(text.contains("# "), "no comments attached");
+        let back: crate::Config = toml::from_str(&text).expect("parse annotated default");
+        assert_eq!(back, crate::Config::defaults());
+    }
+
+    #[test]
+    fn reconcile_adds_missing_section_preserving_user_values_and_comments() {
+        // ingest has only code and lacks expansion (the scenario that motivated this in v0.21.0),
+        // logging is missing entirely, the user changed default_k, and the file carries comments.
+        let user_text = "\
+schema_version = 1
+
+[workspace]
+root = \"/my/notes\"   # 내 워크스페이스
+
+[search]
+default_k = 25
+
+[ingest.code]
+skip_generated_header = true
+";
+        let mut user: DocumentMut = user_text.parse().unwrap();
+        let reference = annotated_default_document();
+        let ref_tbl = reference.as_table().clone();
+        let mut changes = Vec::new();
+        reconcile(&ref_tbl, user.as_table_mut(), "", &mut changes);
+        let out = user.to_string();
+
+        // Only expansion is added, with its comment, to the partially present [ingest].
+        assert!(out.contains("[ingest.expansion]"), "expansion not added:\n{out}");
+        // logging, missing entirely, is added.
+        assert!(out.contains("[logging]"), "logging not added");
+        // User values, comments, and existing sections are preserved.
+        assert!(out.contains("root = \"/my/notes\""));
+        assert!(out.contains("# 내 워크스페이스"));
+        assert!(out.contains("default_k = 25"));
+        assert!(out.contains("skip_generated_header = true"));
+        // Comments are attached to the new section.
+        assert!(out.contains("doc-side 별칭"));
+        // Recurses into the partially present parent and records the leaf path.
+        assert!(
+            changes
+                .iter()
+                .any(|c| c.kind == ChangeKind::AddedSection && c.path == "ingest.expansion"),
+            "changes: {changes:?}"
+        );
+        // A wholly missing parent is recorded once, under the parent path.
+        assert!(
+            changes
+                .iter()
+                .any(|c| c.kind == ChangeKind::AddedSection && c.path == "logging")
+        );
+    }
+
+    #[test]
+    fn reconcile_does_not_overwrite_user_value_differing_from_default() {
+        let user_text = "\
+schema_version = 2
+
+[rag]
+score_gate = 0.8
+";
+        let mut user: DocumentMut = user_text.parse().unwrap();
+        let reference = annotated_default_document();
+        let ref_tbl = reference.as_table().clone();
+        let mut changes = Vec::new();
+        reconcile(&ref_tbl, user.as_table_mut(), "", &mut changes);
+        let out = user.to_string();
+        assert!(out.contains("score_gate = 0.8"), "user value clobbered:\n{out}");
+        assert!(!changes.iter().any(|c| c.path == "rag.score_gate"));
+    }
+
+    #[test]
+    fn step_1_to_2_removes_deprecated_workspace_include() {
+        let user_text = "\
+[workspace]
+root = \"/n\"
+include = [\"*.md\"]
+";
+        let mut user: DocumentMut = user_text.parse().unwrap();
+        let mut changes = Vec::new();
+        step_1_to_2(&mut user, &mut changes);
+        let out = user.to_string();
+        assert!(!out.contains("include"), "include not removed:\n{out}");
+        assert!(
+            changes
+                .iter()
+                .any(|c| c.kind == ChangeKind::RemovedDeprecated && c.path == "workspace.include")
+        );
+        let mut changes2 = Vec::new();
+        step_1_to_2(&mut user, &mut changes2);
+        assert!(changes2.is_empty());
+    }
+
+    fn read_schema_version(text: &str) -> u32 {
+        let doc: DocumentMut = text.parse().unwrap();
+        doc.get("schema_version")
+            .and_then(toml_edit::Item::as_integer)
+            .unwrap_or(1) as u32
+    }
+
+    #[test]
+    fn migrate_document_stamps_version_and_is_idempotent() {
+        let old = "\
+schema_version = 1
+
+[workspace]
+root = \"/n\"
+include = [\"*.md\"]
+";
+        let outcome = migrate_document(old);
+        assert_eq!(outcome.from_schema_version, 1);
+        assert_eq!(outcome.to_schema_version, CURRENT_SCHEMA_VERSION);
+        assert!(outcome.changed());
+        assert!(!outcome.new_text.contains("include"));
+        assert!(outcome.new_text.contains("[ingest.expansion]"));
+        assert_eq!(read_schema_version(&outcome.new_text), CURRENT_SCHEMA_VERSION);
+
+        let again = migrate_document(&outcome.new_text);
+        assert!(!again.changed(), "not idempotent: {:?}", again.changes);
+        assert_eq!(again.new_text, outcome.new_text);
+    }
+
+    #[test]
+    fn migrate_document_missing_schema_version_treated_as_v1() {
+        let old = "[workspace]\nroot = \"/n\"\n";
+        let outcome = migrate_document(old);
+        assert_eq!(outcome.from_schema_version, 1);
+        assert_eq!(read_schema_version(&outcome.new_text), CURRENT_SCHEMA_VERSION);
+    }
+}
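Review note: the tests above pin three properties of `reconcile`: missing sections are added, a partially present parent is recursed into so the leaf path is recorded, and existing user values are never overwritten. The sketch below illustrates that merge rule on a toy tree using only the standard library. It is not the PR's implementation: the `Node` enum, the `table` helper, and the `added` path list are hypothetical stand-ins for `toml_edit` tables and the `changes` vector, and comment handling is omitted entirely.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a TOML item (hypothetical; the PR uses `toml_edit`).
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Value(String),
    Table(BTreeMap<String, Node>),
}

/// Hypothetical helper that builds a table from `(key, node)` pairs.
fn table(pairs: &[(&str, Node)]) -> BTreeMap<String, Node> {
    pairs
        .iter()
        .map(|(k, v)| ((*k).to_owned(), v.clone()))
        .collect()
}

/// Copy entries present in `reference` but absent from `user`.
/// Existing user entries are never overwritten. The dotted path of each
/// added entry is pushed onto `added`.
fn reconcile(
    reference: &BTreeMap<String, Node>,
    user: &mut BTreeMap<String, Node>,
    prefix: &str,
    added: &mut Vec<String>,
) {
    for (key, ref_node) in reference {
        let path = if prefix.is_empty() {
            key.clone()
        } else {
            format!("{prefix}.{key}")
        };
        match user.get_mut(key) {
            // Partially present parent: recurse so only missing children are added.
            Some(Node::Table(user_tbl)) => {
                if let Node::Table(ref_tbl) = ref_node {
                    reconcile(ref_tbl, user_tbl, &path, added);
                }
            }
            // The user already set a value here: leave it untouched.
            Some(Node::Value(_)) => {}
            // Wholly missing: copy the reference subtree and record it once.
            None => {
                user.insert(key.clone(), ref_node.clone());
                added.push(path);
            }
        }
    }
}
```

Running the function a second time on its own output adds nothing, which is the idempotence that `migrate_document_stamps_version_and_is_idempotent` checks end to end.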
--- a/crates/kebab-embed-candle/Cargo.toml
+++ b/crates/kebab-embed-candle/Cargo.toml
@@ -0,0 +1,47 @@
+[package]
+name = "kebab-embed-candle"
+version       = { workspace = true }
+edition       = { workspace = true }
+rust-version  = { workspace = true }
+license       = { workspace = true }
+repository    = { workspace = true }
+description   = "Pure-Rust candle adapter implementing kebab_core::Embedder (multilingual-e5-large, NUMA-safe thread cap)"
+
+[dependencies]
+kebab-core = { path = "../kebab-core" }
+kebab-config = { path = "../kebab-config" }
+# candle stack — pinned to the workspace-locked crates.io release (0.10.x),
+# same versions the Phase 0 spike compiled so build artifacts are reused.
+candle-core = "0.10.2"
+candle-nn = "0.10.2"
+candle-transformers = "0.10.2"
+tokenizers = "0.21"
+hf-hub = { version = "0.4", features = ["ureq"] }
+serde_json = { workspace = true }
+# Thread cap: a one-shot global rayon pool sizes candle's CPU threads
+# (the Phase 0 spike proved RAYON_NUM_THREADS caps candle), so a NUMA host
+# avoids the heap corruption caused by onnxruntime's hard-coded 48 intra-op threads.
+rayon = "1"
+anyhow = { workspace = true }
+tracing = { workspace = true }
+
+[features]
+# opt-in: run candle on the Apple Silicon GPU (Metal). macOS-only — the build
+# enables candle's metal backend and `select_device()` picks Metal (CPU fallback
+# on failure). Lets an M-series Mac ingest e5-large on GPU (10×+ vs CPU); the
+# resulting vectors are cross-compatible with the CPU path (same model), so the
+# Linux server can serve queries on CPU candle.
+metal = ["candle-core/metal", "candle-nn/metal", "candle-transformers/metal"]
+
+[dev-dependencies]
+# Integration-test binaries can only see the library's public API + these,
+# not the library's own (non-dev) dependencies — so rayon/kebab-config/kebab-core
+# are repeated here for tests/parity.rs and tests/thread_cap.rs.
+kebab-embed-local = { path = "../kebab-embed-local" }
+kebab-config = { path = "../kebab-config" }
+kebab-core = { path = "../kebab-core" }
+rayon = "1"
+tempfile = { workspace = true }
+
+[lints]
+workspace = true
--- a/crates/kebab-embed-candle/src/lib.rs
+++ b/crates/kebab-embed-candle/src/lib.rs
@@ -0,0 +1,444 @@
+//! `kebab-embed-candle` — [`CandleEmbedder`], a pure-Rust (candle)
+//! implementation of [`Embedder`](kebab_core::Embedder).
+//!
+//! Runs the same `intfloat/multilingual-e5-large` model as the default
+//! [`FastembedEmbedder`](kebab_embed_local) but through `candle`
+//! (`candle-transformers`' XLM-RoBERTa) instead of onnxruntime. Motivation:
+//! fastembed 4.9's onnxruntime hard-codes 48 intra-op threads, which corrupts
+//! the heap (double-free) on dual-socket NUMA hosts. candle's CPU backend
+//! sizes its threads off the global rayon pool, so a one-shot
+//! [`rayon::ThreadPoolBuilder`] cap (config `num_threads` / env
+//! `KEBAB_EMBED_THREADS`) keeps the worker count NUMA-safe.
+//!
+//! Output parity with the onnxruntime path was proven by the Phase 0 spike
+//! (cosine 1.000000); this crate absorbs that pipeline verbatim:
+//!
+//! 1. e5 prefix (`passage: ` for documents, `query: ` for queries — the same
+//!    convention as `kebab-embed-local`'s `prefix_input`);
+//! 2. tokenize (max_len 512, batch-longest padding, special tokens);
+//! 3. XLM-RoBERTa forward on the selected device (`Device::Cpu`, or Metal when built with the `metal` feature);
+//! 4. attention-mask-weighted mean pooling;
+//! 5. L2 normalization.
+//!
+//! Model files (`config.json`, `tokenizer.json`, `model.safetensors`) are
+//! fetched via `hf-hub` into `{config.storage.model_dir}/candle/`.
+//!
+//! This crate is **opt-in** (`config.models.embedding.provider = "candle"`);
+//! the default provider stays `fastembed`. See
+//! `docs/superpowers/specs/2026-06-01-embed-candle-track-spec.md`.
+
+use std::sync::Mutex;
+
+use anyhow::{Context, Result};
+use candle_core::{DType, Device, Tensor};
+use candle_nn::VarBuilder;
+use candle_transformers::models::xlm_roberta::{Config as XlmConfig, XLMRobertaModel};
+use kebab_config::{Config, expand_path};
+use kebab_core::{Embedder, EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion};
+use tokenizers::{PaddingParams, PaddingStrategy, Tokenizer, TruncationParams};
+
+/// Subdirectory under `config.storage.model_dir` where the candle adapter
+/// caches safetensors + tokenizer. Mirrors `kebab-embed-local`'s
+/// `fastembed/` subdir so the two backends never collide.
+const CANDLE_CACHE_SUBDIR: &str = "candle";
+
+/// HuggingFace repo id for the multilingual e5 large model. Same weights the
+/// onnxruntime path uses, just the safetensors variant candle can read.
+const HF_MODEL: &str = "intfloat/multilingual-e5-large";
+
+/// The only `config.models.embedding.model` value the candle adapter accepts
+/// (the e5-large weights `HF_MODEL` resolves to). Guards against silently
+/// downloading e5-large while `model_id()` reports a different name.
+const SUPPORTED_MODEL: &str = "multilingual-e5-large";
+
+/// Token truncation length (e5 was trained at 512).
+const MAX_LEN: usize = 512;
+
+/// Env var that overrides `config.models.embedding.num_threads`. Read once in
+/// [`CandleEmbedder::new`]; unset/unparseable falls back to the config field, and an effective `0` means "leave rayon default".
+const ENV_EMBED_THREADS: &str = "KEBAB_EMBED_THREADS";
+
+/// Pure-Rust candle adapter. Construct via [`CandleEmbedder::new`]; the
+/// constructor downloads the model on first use, so share one instance.
+pub struct CandleEmbedder {
+    // candle's `forward` is `&self`, but `XLMRobertaModel` is not guaranteed
+    // `Sync`; the `Mutex` both supplies that bound and serializes inference
+    // (callers batch sequentially anyway — same rationale as
+    // `FastembedEmbedder`).
+    model: Mutex<XLMRobertaModel>,
+    tokenizer: Tokenizer,
+    device: Device,
+    model_id: EmbeddingModelId,
+    version: EmbeddingVersion,
+    dimensions: usize,
+    batch_size: usize,
+}
+
+impl CandleEmbedder {
+    /// Build an embedder from `Config`. Applies the NUMA thread cap, fetches
+    /// the model into `{model_dir}/candle/`, and validates that the model's
+    /// hidden size matches `config.models.embedding.dimensions` before
+    /// returning.
+    pub fn new(config: &Config) -> Result<Self> {
+        // 1. NUMA thread cap. A parseable env `KEBAB_EMBED_THREADS` wins over
+        //    the config field; an effective `0` leaves rayon's default.
+        //    `build_global` errors if the pool was already initialized; that is
+        //    tolerated so a second embedder (or a prior rayon user) is a no-op.
+        let n_threads = std::env::var(ENV_EMBED_THREADS)
+            .ok()
+            .and_then(|v| v.parse::<usize>().ok())
+            .unwrap_or(config.models.embedding.num_threads as usize);
+        if n_threads > 0 {
+            if apply_thread_cap(n_threads) {
+                tracing::info!(
+                    target: "kebab-embed-candle",
+                    num_threads = n_threads,
+                    "capped global rayon pool for candle CPU backend"
+                );
+            } else {
+                tracing::debug!(
+                    target: "kebab-embed-candle",
+                    requested = n_threads,
+                    "global rayon pool already initialized; thread cap not applied"
+                );
+            }
+        }
+
+        // 1b. Model guard. `HF_MODEL` is hard-coded (candle currently only wires
+        //     e5-large), so if the operator configured a *different* model name
+        //     we must NOT silently download e5-large and then label its vectors
+        //     with the configured name via `model_id()` — that would mislabel
+        //     `embedding_version` and corrupt a mixed index. Fail fast, before
+        //     the ~2GB download.
+        let want = config.models.embedding.model.as_str();
+        if want != SUPPORTED_MODEL && want != HF_MODEL {
+            anyhow::bail!(
+                "candle provider currently supports only '{SUPPORTED_MODEL}' (or \
+                 the HF id '{HF_MODEL}'), but config.models.embedding.model = \
+                 '{want}'. Use provider=fastembed for other models, or set \
+                 model = \"{SUPPORTED_MODEL}\"."
+            );
+        }
+
+        // 2. Resolve `{data_dir}/models/candle/` exactly like the fastembed
+        //    adapter resolves its own subdir.
+        let data_dir = expand_path(&config.storage.data_dir, "");
+        let model_dir = expand_path(&config.storage.model_dir, &data_dir.to_string_lossy());
+        let cache_dir = model_dir.join(CANDLE_CACHE_SUBDIR);
+        std::fs::create_dir_all(&cache_dir)
+            .with_context(|| format!("create candle cache dir {}", cache_dir.display()))?;
+
+        let device = select_device();
+
+        // 3. Fetch model files via hf-hub into the candle cache.
+        tracing::info!(
+            target: "kebab-embed-candle",
+            cache_dir = %cache_dir.display(),
+            model = HF_MODEL,
+            "loading candle embedding model (first run downloads ~2GB safetensors)"
+        );
+        let api = hf_hub::api::sync::ApiBuilder::new()
+            .with_cache_dir(cache_dir.clone())
+            .build()
+            .context("kb-embed-candle: build hf-hub api")?;
+        let repo = api.model(HF_MODEL.to_string());
+        let config_path = repo.get("config.json").context("download config.json")?;
+        let tokenizer_path = repo
+            .get("tokenizer.json")
+            .context("download tokenizer.json")?;
+        let weights_path = repo
+            .get("model.safetensors")
+            .context("download model.safetensors")?;
+
+        // 4. Build the candle XLM-RoBERTa model.
+        let cfg_json = std::fs::read_to_string(&config_path)
+            .with_context(|| format!("read {}", config_path.display()))?;
+        let cfg: XlmConfig =
+            serde_json::from_str(&cfg_json).context("kb-embed-candle: parse XLM-R config")?;
+
+        // Validate dim BEFORE building the model so a misconfigured
+        // `dimensions` fails cheaply (matches FastembedEmbedder's contract).
+        check_dim(cfg.hidden_size, config.models.embedding.dimensions)?;
+
+        let vb = unsafe {
+            VarBuilder::from_mmaped_safetensors(&[weights_path], DType::F32, &device)
+                .context("kb-embed-candle: mmap safetensors")?
+        };
+        let model =
+            XLMRobertaModel::new(&cfg, vb).context("kb-embed-candle: build XLMRobertaModel")?;
+
+        let mut tokenizer = Tokenizer::from_file(&tokenizer_path)
+            .map_err(|e| anyhow::anyhow!("kb-embed-candle: load tokenizer: {e}"))?;
+        tokenizer
+            .with_padding(Some(PaddingParams {
+                strategy: PaddingStrategy::BatchLongest,
+                ..Default::default()
+            }))
+            .with_truncation(Some(TruncationParams {
+                max_length: MAX_LEN,
+                ..Default::default()
+            }))
+            .map_err(|e| anyhow::anyhow!("kb-embed-candle: set truncation: {e}"))?;
+
+        tracing::info!(
+            target: "kebab-embed-candle",
+            dimensions = cfg.hidden_size,
+            layers = cfg.num_hidden_layers,
+            "candle embedding model loaded"
+        );
+
+        Ok(Self {
+            model: Mutex::new(model),
+            tokenizer,
+            device,
+            model_id: EmbeddingModelId(config.models.embedding.model.clone()),
+            version: EmbeddingVersion(config.models.embedding.version.clone()),
+            dimensions: cfg.hidden_size,
+            batch_size: config.models.embedding.batch_size.max(1),
+        })
+    }
+
+    /// Embed one batch of **already-prefixed** strings (the e5 `query:`/
+    /// `passage:` prefix is applied by the caller [`CandleEmbedder::embed`])
+    /// through the candle pipeline: tokenize → forward → masked mean pool → L2.
+    fn embed_batch(&self, prefixed: &[String]) -> Result<Vec<Vec<f32>>> {
+        let encodings = self
+            .tokenizer
+            .encode_batch(prefixed.to_vec(), true)
+            .map_err(|e| anyhow::anyhow!("kb-embed-candle: encode_batch: {e}"))?;
+
+        let bsz = encodings.len();
+        // `embed` already returns early on empty input and `.chunks()` never
+        // yields an empty slice, so this is currently unreachable — but guard
+        // the index so a future refactor can't turn it into a panic.
+        let Some(first) = encodings.first() else {
+            return Ok(Vec::new());
+        };
+        let seq = first.get_ids().len();
+
+        let mut ids = Vec::with_capacity(bsz * seq);
+        let mut mask = Vec::with_capacity(bsz * seq);
+        for enc in &encodings {
+            ids.extend(enc.get_ids().iter().copied());
+            mask.extend(enc.get_attention_mask().iter().map(|&m| m as f32));
+        }
+
+        let input_ids = Tensor::from_vec(ids, (bsz, seq), &self.device)?;
+        let attn_f32 = Tensor::from_vec(mask, (bsz, seq), &self.device)?;
+        let token_type_ids = input_ids.zeros_like()?;
+
+        let hidden = {
+            let guard = self
+                .model
+                .lock()
+                .unwrap_or_else(std::sync::PoisonError::into_inner);
+            // forward: (input_ids, attention_mask, token_type_ids, past,
+            // encoder_hidden, encoder_mask)
+            guard.forward(&input_ids, &attn_f32, &token_type_ids, None, None, None)?
+        };
+
+        // attention-mask-weighted mean pooling
+        let mask3 = attn_f32.unsqueeze(2)?; // (b, seq, 1)
+        let summed = hidden.broadcast_mul(&mask3)?.sum(1)?; // (b, hidden)
+        // counts ≥ 1 always: every input is e5-prefixed AND special tokens are
+        // added (encode_batch(_, true)), so no row has an all-zero mask. If that
+        // invariant ever breaks, broadcast_div would emit NaN vectors.
+        let counts = mask3.sum(1)?; // (b, 1)
+        let mean = summed.broadcast_div(&counts)?;
+
+        // L2 normalize
+        let norm = mean.sqr()?.sum_keepdim(1)?.sqrt()?;
+        let normalized = mean.broadcast_div(&norm)?;
+
+        // `.contiguous()` before host copy: broadcast ops can leave a strided
+        // view, which `to_vec2` rejects on the Metal backend (CPU tolerates it).
+        Ok(normalized.contiguous()?.to_vec2::<f32>()?)
+    }
+}
+
+impl Embedder for CandleEmbedder {
+    fn model_id(&self) -> EmbeddingModelId {
+        self.model_id.clone()
+    }
+
+    fn model_version(&self) -> EmbeddingVersion {
+        self.version.clone()
+    }
+
+    fn dimensions(&self) -> usize {
+        self.dimensions
+    }
+
+    fn embed(&self, inputs: &[EmbeddingInput<'_>]) -> Result<Vec<Vec<f32>>> {
+        if inputs.is_empty() {
+            return Ok(Vec::new());
+        }
+
+        // e5 prefix per §11.3 BEFORE tokenization (same convention as
+        // FastembedEmbedder so the two backends produce comparable vectors).
+        let prefixed: Vec<String> = inputs.iter().map(prefix_input).collect();
+
+        let mut out: Vec<Vec<f32>> = Vec::with_capacity(prefixed.len());
+        for chunk in prefixed.chunks(self.batch_size) {
+            let batch = self.embed_batch(chunk)?;
+            for v in &batch {
+                if v.len() != self.dimensions {
+                    anyhow::bail!(
+                        "candle returned vector of length {} but adapter expects {}",
+                        v.len(),
+                        self.dimensions
+                    );
+                }
+            }
+            out.extend(batch);
+        }
+
+        debug_assert_eq!(out.len(), inputs.len());
+        Ok(out)
+    }
+}
+
+/// Build the e5-prefixed string for one [`EmbeddingInput`]. Free function so
+/// a unit test can pin the format without loading the model. Byte-identical to
+/// `kebab-embed-local`'s `prefix_input` — the two backends MUST agree here or
+/// their vectors diverge.
+fn prefix_input(input: &EmbeddingInput<'_>) -> String {
+    match input.kind {
+        EmbeddingKind::Document => format!("passage: {}", input.text),
+        EmbeddingKind::Query => format!("query: {}", input.text),
+    }
+}
+
+/// Select the compute device. When built with the `metal` feature (Apple
+/// Silicon GPU), try Metal and fall back to CPU on failure; otherwise use CPU.
+/// Metal only compiles/runs on macOS; the Linux server builds the CPU path.
+/// e5-large vectors are model-defined, so Metal-produced and CPU-produced
+/// embeddings are cross-compatible (a Mac can ingest on GPU, the server can query on CPU).
+fn select_device() -> Device {
+    #[cfg(feature = "metal")]
+    {
+        match Device::new_metal(0) {
+            Ok(d) => {
+                tracing::info!(target: "kebab-embed-candle", "candle device = Metal (GPU)");
+                return d;
+            }
+            Err(e) => {
+                tracing::warn!(
+                    target: "kebab-embed-candle",
+                    error = %e,
+                    "Metal device unavailable; falling back to CPU"
+                );
+            }
+        }
+    }
+    tracing::info!(target: "kebab-embed-candle", "candle device = CPU");
+    Device::Cpu
+}
+
+/// Apply a one-shot global rayon thread cap (the NUMA-safety lever). Returns
+/// `true` if this call set the pool, `false` if it was already initialized
+/// (cap not applied) or `n_threads == 0`. `#[doc(hidden)] pub` so the
+/// thread-cap test can drive it without loading the 2GB model.
+#[doc(hidden)]
+pub fn apply_thread_cap(n_threads: usize) -> bool {
+    if n_threads == 0 {
+        return false;
+    }
+    rayon::ThreadPoolBuilder::new()
+        .num_threads(n_threads)
+        .build_global()
+        .is_ok()
+}
+
+/// Compare model hidden size against the configured dim. Extracted so a unit
+/// test can exercise the error branch without loading the model.
+pub(crate) fn check_dim(model_dim: usize, cfg_dim: usize) -> Result<()> {
+    if model_dim != cfg_dim {
+        anyhow::bail!(
+            "dimension mismatch: model={model_dim}, config={cfg_dim}; \
+             update `config.models.embedding.dimensions` to match the model \
+             (or pick a different model)."
+        );
+    }
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    // ── prefix_input ─────────────────────────────────────────────────
+    // Pin the exact e5 prefix strings; these MUST match
+    // kebab-embed-local::prefix_input or candle vs fastembed parity breaks.
+
+    #[test]
+    fn prefix_document_uses_passage() {
+        let input = EmbeddingInput {
+            text: "hello world",
+            kind: EmbeddingKind::Document,
+        };
+        assert_eq!(prefix_input(&input), "passage: hello world");
+    }
+
+    #[test]
+    fn prefix_query_uses_query() {
+        let input = EmbeddingInput {
+            text: "hello world",
+            kind: EmbeddingKind::Query,
+        };
+        assert_eq!(prefix_input(&input), "query: hello world");
+    }
+
+    #[test]
+    fn prefix_handles_empty_text() {
+        let doc = EmbeddingInput {
+            text: "",
+            kind: EmbeddingKind::Document,
+        };
+        let qry = EmbeddingInput {
+            text: "",
+            kind: EmbeddingKind::Query,
+        };
+        assert_eq!(prefix_input(&doc), "passage: ");
+        assert_eq!(prefix_input(&qry), "query: ");
+    }
+
+    // ── check_dim ────────────────────────────────────────────────────
+
+    #[test]
+    fn check_dim_passes_for_1024() {
+        check_dim(1024, 1024).expect("matching dims must pass");
+    }
+
+    #[test]
+    fn check_dim_rejects_384_vs_1024() {
+        let err = check_dim(384, 1024).expect_err("dim mismatch must error");
+        let msg = format!("{err}");
+        assert!(
+            msg.contains("384") && msg.contains("1024"),
+            "error must mention both dims, got: {msg}"
+        );
+    }
+
+    // ── model guard ──────────────────────────────────────────────────
+    // A non-e5-large model name must fail fast (BEFORE the ~2GB download),
+    // so we never download e5-large yet label its vectors with another name
+    // via model_id() — which would mislabel embedding_version.
+
+    #[test]
+    fn new_rejects_unsupported_model() {
+        let mut config = kebab_config::Config::defaults();
+        config.models.embedding.model = "multilingual-e5-small".to_string();
+        // num_threads defaults to 0, so no global rayon side effect here.
+        // `.err()` (not `expect_err`) avoids requiring `CandleEmbedder: Debug`
+        // — it holds a Mutex/Tokenizer and intentionally derives no Debug.
+        let err = CandleEmbedder::new(&config)
+            .err()
+            .expect("unsupported model must error");
+        let msg = format!("{err:#}");
+        assert!(
+            msg.contains("candle provider currently supports only"),
+            "expected model-guard error, got: {msg}"
+        );
+    }
+}
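Review note: steps 4 and 5 of the pipeline in `embed_batch` (attention-mask-weighted mean pooling, then L2 normalization) are easier to check by hand on a single sequence without tensors. The following is a minimal sketch in plain Rust, assuming one sequence given as `seq` rows of `dim` floats and a 0/1 mask. The function name and the zero-count / zero-norm guards are illustrative assumptions; the crate itself relies on the invariant that every row has at least one unmasked token.

```rust
/// Mean-pool token vectors over positions whose mask is non-zero, then
/// L2-normalize the pooled vector.
///
/// `hidden` holds `seq` rows of `dim` floats; `mask` holds `seq` entries.
fn masked_mean_pool_l2(hidden: &[Vec<f32>], mask: &[u32]) -> Vec<f32> {
    assert_eq!(hidden.len(), mask.len(), "one mask entry per token");
    let dim = hidden.first().map_or(0, Vec::len);

    // Sum only the unmasked token vectors and count them.
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (row, &m) in hidden.iter().zip(mask) {
        if m != 0 {
            for (acc, x) in sum.iter_mut().zip(row) {
                *acc += x;
            }
            count += 1.0;
        }
    }

    // Illustrative guard (an assumption, not in the crate): an all-zero mask
    // would otherwise divide by zero and produce NaN.
    if count == 0.0 {
        return sum;
    }
    let mean: Vec<f32> = sum.iter().map(|s| s / count).collect();

    // L2 normalization; skip it for an all-zero vector.
    let norm = mean.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return mean;
    }
    mean.iter().map(|x| x / norm).collect()
}
```

For example, rows `[3, 0]`, `[0, 4]`, `[100, 100]` with mask `[1, 1, 0]` pool to the mean `[1.5, 2.0]`, whose norm is 2.5, so the padded third row has no effect on the unit-length result.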
--- a/crates/kebab-embed-candle/tests/parity.rs
+++ b/crates/kebab-embed-candle/tests/parity.rs
@@ -0,0 +1,96 @@
+//! Parity test (spec §7, `#[ignore]` — needs the ~2GB model + network).
+//!
+//! Confirms the candle backend reproduces the onnxruntime `FastembedEmbedder`
+//! vectors closely enough that no re-index is required (spec D-reindex):
+//! per-sentence cosine ≥ 0.9999, and reports the dimension-wise max absolute
+//! difference (the number the re-index decision hangs on).
+//!
+//! Run manually:
+//!   CARGO_TARGET_DIR=/build/out/cargo-target/target \
+//!   cargo test -p kebab-embed-candle --release -- --ignored --nocapture
+//!
+//! Uses the canonical dogfood config so both backends resolve the same model
+//! identifiers and cache roots.
+
+use kebab_config::Config;
+use kebab_core::{Embedder, EmbeddingInput, EmbeddingKind};
+use kebab_embed_candle::CandleEmbedder;
+use kebab_embed_local::FastembedEmbedder;
+
+const DOGFOOD_CONFIG: &str = "/build/dogfood/config.toml";
+
+/// Mixed Korean / English parity set (≥ 8 sentences, mirrors the Phase 0 spike).
+const SENTENCES: &[&str] = &[
+    "The quick brown fox jumps over the lazy dog.",
+    "오늘 날씨가 정말 좋아서 산책을 나가고 싶다.",
+    "Rust is a systems programming language focused on safety and performance.",
+    "벡터 검색은 임베딩 사이의 코사인 유사도를 이용한다.",
+    "Machine learning models require large amounts of training data.",
+    "한국어와 영어가 섞인 문장도 멀티링구얼 모델은 잘 처리한다.",
+    "The capital of France is Paris, a city known for its art and culture.",
+    "이 프로젝트는 로컬 우선 지식 베이스와 검색 증강 생성을 목표로 한다.",
+    "Database indexing dramatically speeds up query performance.",
+    "임베딩 모델을 candle 로 옮기면 NUMA 서버에서 안전하게 돌릴 수 있다.",
+];
+
+fn cosine(a: &[f32], b: &[f32]) -> f32 {
+    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
+    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
+    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
+    dot / (na * nb)
+}
+
+#[test]
+#[ignore = "needs ~2GB model + network; run manually for the re-index decision"]
+fn candle_matches_fastembed() {
+    let config = Config::load(Some(std::path::Path::new(DOGFOOD_CONFIG)))
+        .expect("load dogfood config for parity baseline");
+
+    let candle = CandleEmbedder::new(&config).expect("build CandleEmbedder");
+    let fastembed = FastembedEmbedder::new(&config).expect("build FastembedEmbedder");
+
+    // Cover BOTH prefix paths (`passage:` for Document, `query:` for Query) so
+    // a query-side prefix/pooling divergence can't slip through (reviewer note).
+    let inputs: Vec<EmbeddingInput> = SENTENCES
+        .iter()
+        .flat_map(|s| {
+            [EmbeddingKind::Document, EmbeddingKind::Query]
+                .into_iter()
+                .map(move |kind| EmbeddingInput { text: s, kind })
+        })
+        .collect();
+
+    let cv = candle.embed(&inputs).expect("candle embed");
+    let fv = fastembed.embed(&inputs).expect("fastembed embed");
+
+    assert_eq!(cv.len(), fv.len(), "embedding counts must match");
+    assert_eq!(cv.len(), inputs.len(), "one vector per input");
+    assert_eq!(candle.dimensions(), 1024);
+
+    let mut min_cos = f32::INFINITY;
+    let mut max_abs_diff = 0f32;
+    for (i, inp) in inputs.iter().enumerate() {
+        assert_eq!(cv[i].len(), 1024, "candle dim");
+        assert_eq!(fv[i].len(), 1024, "fastembed dim");
+        let c = cosine(&cv[i], &fv[i]);
+        min_cos = min_cos.min(c);
+        let diff = cv[i]
+            .iter()
+            .zip(&fv[i])
+            .map(|(a, b)| (a - b).abs())
+            .fold(0f32, f32::max);
+        max_abs_diff = max_abs_diff.max(diff);
+        let kind = match inp.kind {
+            EmbeddingKind::Document => "doc",
+            EmbeddingKind::Query => "qry",
+        };
+        let preview: String = inp.text.chars().take(36).collect();
+        println!("[{i:>2}] {kind} cos={c:.6} max_abs_diff={diff:.6e}  {preview}");
+    }
+
+    println!("PARITY_SUMMARY cosine_min={min_cos:.6} max_abs_diff={max_abs_diff:.6e}");
+    assert!(
+        min_cos >= 0.9999,
+        "candle vs fastembed cosine_min={min_cos:.6} < 0.9999 — investigate before merge"
+    );
+}
--- a/crates/kebab-embed-candle/tests/thread_cap.rs
+++ b/crates/kebab-embed-candle/tests/thread_cap.rs
@@ -0,0 +1,32 @@
+//! Thread-cap test (spec §7). Own integration binary → clean process, so the
+//! one-shot global rayon pool is initialized exactly once, by us.
+//!
+//! Verifies that `apply_thread_cap(4)` sizes the global rayon pool to 4, which
+//! is the lever that keeps candle's CPU backend NUMA-safe (vs onnxruntime's
+//! hard-coded 48 intra-op threads).
+
+use kebab_embed_candle::apply_thread_cap;
+
+#[test]
+fn thread_cap_sizes_global_rayon_pool() {
+    // Must run before any other rayon use in this process. As the only test in
+    // this binary that touches rayon, that holds.
+    let applied = apply_thread_cap(4);
+    assert!(applied, "first build_global call should succeed");
+    assert_eq!(
+        rayon::current_num_threads(),
+        4,
+        "global rayon pool must be capped at the requested 4 threads"
+    );
+
+    // A second cap attempt is a no-op (pool already built), not a panic.
+    assert!(
+        !apply_thread_cap(8),
+        "second build_global must report not-applied"
+    );
+    assert_eq!(
+        rayon::current_num_threads(),
+        4,
+        "thread count must stay at the first cap"
+    );
+}
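Review note: the test above covers `apply_thread_cap` itself, but the precedence rule that feeds it in `CandleEmbedder::new` (a parseable `KEBAB_EMBED_THREADS` wins, otherwise the config field, and an effective `0` means no cap) is only exercised indirectly. A sketch of that rule as a pure function, which could be unit-tested without touching the global rayon pool, might look like this. The function name `resolve_thread_cap` is hypothetical and not part of the PR.

```rust
/// Resolve the effective thread cap.
///
/// A parseable env value wins over the config value. Unset or unparseable
/// env input falls back to the config value. An effective value of 0 means
/// "do not cap; leave the pool default", reported here as `None`.
fn resolve_thread_cap(env_value: Option<&str>, config_threads: usize) -> Option<usize> {
    let n = env_value
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or(config_threads);
    (n > 0).then_some(n)
}
```

Note one consequence of this rule as written in the PR: setting the env var to `0` disables a non-zero config cap, whereas an unparseable value such as `abc` leaves the config cap in force.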
--- a/crates/kebab-tui/src/ingest_progress.rs
+++ b/crates/kebab-tui/src/ingest_progress.rs
@@ -154,7 +154,14 @@ fn apply_event(state: &mut IngestState, event: IngestEvent) {
        }
        // v0.20.0 sub-item 1: per-page PDF OCR events — TUI does not
        // surface per-page OCR progress in v1; no counter to update.
-        IngestEvent::PdfOcrStarted { .. } | IngestEvent::PdfOcrFinished { .. } => {}
+        IngestEvent::PdfOcrStarted { .. }
+        | IngestEvent::PdfOcrFinished { .. }
+        // v0.24.0 asset-internal phase events: the status-bar reducer tracks
+        // per-asset counters, not sub-asset phase progress, so these are
+        // no-ops here (the CLI / --json surfaces render them).
+        | IngestEvent::AssetChunked { .. }
+        | IngestEvent::ExpansionProgress { .. }
+        | IngestEvent::AssetTimings { .. } => {}
    }
 }

--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -66,7 +66,8 @@ flowchart TB
    end
    subgraph Adapters ["traits + adapters"]
        embed["kebab-embed<br/>(trait)"]
-        embedlocal["kebab-embed-local<br/>(fastembed)"]
+        embedlocal["kebab-embed-local<br/>(fastembed, default)"]
+        embedcandle["kebab-embed-candle<br/>(candle, NUMA-safe opt-in)"]
        llm["kebab-llm<br/>(trait)"]
        llmlocal["kebab-llm-local<br/>(Ollama)"]
        search["kebab-search"]
@@ -92,6 +93,7 @@ flowchart TB
    app --> sqlite
    app --> vector
    app --> embedlocal
+    app --> embedcandle
    app --> llmlocal
    app --> search
    app --> rag
@@ -104,6 +106,8 @@ flowchart TB
    paud --> core
    pcode --> core
    embedlocal --> embed
+    embedcandle --> core
+    embedcandle --> config
    llmlocal --> llm
    rag --> search
    rag --> llm
@@ -180,6 +184,7 @@ kebab/
 │   ├── kebab-store-sqlite/                            # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3). src/derivation_cache.rs = derivation_cache 테이블 저장소 (V012, v0.21.0)
 │   ├── kebab-search/                                  # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
 │   ├── kebab-embed/  kebab-embed-local/                  # Embedder trait + fastembed adapter (P3-1, P3-2)
+│   ├── kebab-embed-candle/                             # candle (pure-Rust) Embedder, NUMA-safe opt-in provider=candle (Track 1, v0.22.0)
 │   ├── kebab-store-vector/                            # LanceDB VectorStore (P3-3, P7-3 follow-up)
 │   ├── kebab-llm/  kebab-llm-local/                      # LanguageModel trait + Ollama adapter (P4-1, P4-2)
 │   ├── kebab-rag/                                     # RAG pipeline (P4-3)
--- a/docs/DOGFOOD.md
+++ b/docs/DOGFOOD.md
@@ -683,6 +683,20 @@ ajv-cli validate -s docs/wire-schema/v1/<schema>.schema.json -d <output>

 ---

+### config migrate (schema migration, v0.21.1)
+
+```bash
+# Mimic an old schema (missing sections + a deprecated key), then migrate.
+printf 'schema_version = 1\n\n[workspace]\nroot = "~/MyNotes"\ninclude = ["*.md"]\n\n[search]\ndefault_k = 25\n' \
+  > "$DOGFOOD/old.toml"
+"$RELEASE_BIN" --config "$DOGFOOD/old.toml" config migrate --dry-run    # preview, file untouched
+"$RELEASE_BIN" --config "$DOGFOOD/old.toml" config migrate              # .bak + missing sections added with comments
+"$RELEASE_BIN" --config "$DOGFOOD/old.toml" config migrate              # idempotent
+"$RELEASE_BIN" --config "$DOGFOOD/old.toml" doctor | grep config_migration  # confirm ok
+```
+
+Expected: the dry-run leaves the file untouched → applying creates `old.toml.bak` (byte-identical to the original), makes `[ingest.expansion]`, `[logging]`, and `[pdf.ocr]` visible, preserves the hand-edited `default_k` and comments, and removes `workspace.include` → a rerun is idempotent → doctor reports `config_migration` ok. The v0.21.1 evidence is in `tasks/HOTFIXES.md`, 2026-05-31.
+
 ## §10 Eval (P5)

 ### §10.1 Basic eval run
--- a/docs/SMOKE.md
+++ b/docs/SMOKE.md
@@ -107,11 +107,16 @@ respect_markdown_headings = true
 chunker_version = "md-heading-v1"

 [models.embedding]
-provider = "fastembed"               # "none" 으로 두면 lexical-only — Ollama 불필요
+provider = "fastembed"               # "fastembed" (default) / "candle" (pure Rust, NUMA-safe)
+                                     # / "none" (lexical-only, no Ollama needed)
+                                     # ⚠ with provider="candle", also change model/dimensions below
+                                     #   to multilingual-e5-large / 1024
+                                     #   (candle currently supports e5-large only).
 model = "multilingual-e5-small"
 version = "v1"
 dimensions = 384
 batch_size = 64
+num_threads = 0                      # candle-only CPU thread cap (0=auto). env KEBAB_EMBED_THREADS takes precedence.

 [models.llm]
 provider = "ollama"
@@ -690,6 +695,20 @@ KB --json schema | jq '.stats.code_lang_breakdown'
 - Partial OCR / caption failures do not increment the `errors` counter; they show up only as Provenance Warning events in `kebab inspect doc <id>` or in `--debug` logs.
 - (P7-3) With a `*.pdf` asset in the workspace, the `kebab ingest` output counts PDFs in the `new` counter too. `kebab inspect doc <pdf_doc_id>` shows `parser_version = "pdf-text-v1"` + a `Block::Paragraph` per page + `SourceSpan::Page { page, char_start, char_end }`. If a `kebab search --mode hybrid` for a word from the body includes PDF chunks in the results with `source_span.kind = "page"`, the wiring is correct. Encrypted PDFs are classified as `errors+=1`, and the `error` field keeps the `qpdf --decrypt` guidance. Empty/scanned pages (pages where no text could be extracted from the PDF) appear as 0 chunks + `Provenance::Warning` ("scanned candidate") and are not searchable until the P+ scanned-PDF OCR fallback lands.

+## config migrate (migration)
+
+```bash
+# Mimic an old-schema config with missing sections to check migrate behavior.
+printf 'schema_version = 1\n\n[workspace]\nroot = "/tmp/kb"\ninclude = ["*.md"]\n' \
+  > /tmp/kebab-smoke/old.toml
+kebab --config /tmp/kebab-smoke/old.toml config migrate --dry-run   # preview changes (file untouched)
+kebab --config /tmp/kebab-smoke/old.toml config migrate             # apply (.bak backup)
+kebab --config /tmp/kebab-smoke/old.toml config migrate             # idempotent: "config 이미 최신입니다"
+kebab --config /tmp/kebab-smoke/old.toml --json config migrate --dry-run | jq .schema_version
+```
+
+Expected: the dry-run prints the sections to be added (`[ingest.expansion]`, `[logging]`, etc.) and the `workspace.include` to be removed, and **does not modify the file**. Applying creates `old.toml.bak` (identical to the original), adds the missing sections with comments, and preserves the user's hand-edited values and comments. A rerun is idempotent (`config 이미 최신입니다`), and `--json` emits `config_migration.v1`.
+
 ## Cleanup

 ```bash
--- a/docs/release-notes/v0.22.0-draft.md
+++ b/docs/release-notes/v0.22.0-draft.md
@@ -0,0 +1,97 @@
+---
+title: kebab v0.22.0 release notes (draft)
+created: 2026-06-01
+status: draft
+release_trigger:
+  - New config surface (provider=candle, num_threads / KEBAB_EMBED_THREADS): pre-1.0 minor bump
+  - Embedding backend diversification (adds the NUMA-safe candle provider, opt-in)
+---
+
+# kebab v0.22.0: candle embedding provider (NUMA-safe, opt-in)
+
+A minor release following v0.21.1 (config migration). To avoid the ingest crash
+that onnxruntime's hardcoded thread count caused on dual-socket NUMA servers, it
+adds an opt-in provider that runs the same embedding model in **pure Rust (candle)**.
+**Default behavior is unchanged**: existing users do not need to change anything.
+
+---
+
+## Key changes
+
+### candle embedding provider (`provider = "candle"`)
+
+**What changed.** `[models.embedding].provider` now accepts `"candle"`. You can pick
+one of `"fastembed"` (default, onnxruntime), `"candle"` (pure Rust), or `"none"`
+(lexical-only). The candle provider uses **exactly the same model** as fastembed
+(`intfloat/multilingual-e5-large`, 1024-dim) with the same pipeline: e5 prefix →
+mean pooling → L2 normalization. On first use it downloads the safetensors (~2GB)
+automatically into `{model_dir}/candle/`.
+
+```toml
+[models.embedding]
+provider = "candle"     # default is "fastembed"; candle is recommended only on NUMA servers
+num_threads = 8         # candle CPU thread cap (0 = auto = #cores)
+```
+
+```bash
+# The cap can also be set via env (takes precedence over config)
+KEBAB_EMBED_THREADS=8 kebab ingest
+```
+
+**Trade-off.** Being pure Rust, candle has higher CPU latency than onnxruntime's
+native SIMD kernels (~4× in the Phase 0 spike). The **default therefore stays
+fastembed**, and candle is an opt-in to enable only in NUMA environments where
+onnxruntime crashes. On a single workstation fastembed is faster.
+
+**Mitigation (why it is safe).** candle's CPU backend sizes its threads from the
+global rayon pool. `num_threads` (or env `KEBAB_EMBED_THREADS`) caps that pool once,
+which cuts off the failure path at its source: the 48 intra-op threads onnxruntime
+hardcoded → NUMA heap corruption → double-free. If you also need NUMA node
+binding, combine it with `numactl`.
+
+**Upgrade procedure.** Reindexing is **not required**. candle and fastembed vectors
+are effectively identical (Phase 0 spike cosine 1.000000), so `embedding_version`
+is kept and the existing LanceDB index is reused as-is. Switching providers does
+not change search results. An existing `config.toml` loads unchanged, with `num_threads`
+defaulting to `0` (auto): neither `kebab config migrate` nor manual edits are needed.
+
+---
+
+## Other changes
+
+- New crate `kebab-embed-candle` (isolates the candle dependency tree in this crate;
+  no kebab-* dependencies other than `kebab-core`/`kebab-config`).
+- The Phase 0 feasibility spike (`spike-embed-candle`) was removed after being absorbed into production.
+- Docs: the candle provider is reflected in the README Configuration section, the
+  `docs/SMOKE.md` config example, and the `docs/ARCHITECTURE.md` crate graph/tree.
+
+## Verification / dogfooding
+
+- **Parity (candle vs onnxruntime)**: with identical e5-large weights, cosine_min =
+  1.000000 and per-dimension max absolute error = **2.01e-7**. The vectors are
+  effectively identical → `embedding_version` kept (zero reindexing). Reproduce with
+  `crates/kebab-embed-candle/tests/parity.rs` (`--ignored`).
+- **Full dogfooding (2026-06-02)**: reindexed the whole dogfood corpus with
+  `provider=candle`: **997 docs / 23,151 chunks, 0 errors** (≈9.5 h, single-socket VM).
+  Confirms candle processes 23k+ chunks without memory errors.
+- **A1 (taskset/numactl) refuted**: on the NUMA server, pinning to 4 threads with
+  `taskset -c 0-3` still crashed onnxruntime (segfault at 6/5150). Reducing threads
+  is not a fix; **only `provider=candle` is a real fix** (candle never calls
+  onnxruntime).
+- **Final acceptance gate (user)**: on that dual-socket NUMA server, ingest with
+  `provider=candle` completes with EXIT=0; deployment and real use double as this check.
+
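+## Performance notes (important)
+
+candle CPU embedding is roughly **3~4× slower** than onnxruntime (the cost of
+pure-Rust kernels for e5-large at 512 tokens). Measured: ~1.86 s/chunk using about
+4 CPU cores. **This is an intended trade-off**: the very path where onnxruntime drives
+every core hard with AVX-512 is what corrupts the heap on NUMA and crashes. "Slow but finishes" > "fast but crashes".
+
+- Intel **MKL acceleration was tried, with a negative result**: MKL used more cores
+  (8~9) but was 38~50% slower (oversubscription + MKL 2020.1 overhead). Not adopted.
+- More cores/threads do not make it faster (core count is not the bottleneck). If speed
+  is critical, the levers are shorter chunks, a smaller model, or a GPU (to be reviewed separately).
+- The 9.5 h is a **one-time cost of the first full index**; later incremental ingests
+  only process new/changed documents and are cheap. On a single (non-NUMA) workstation the
+  default `fastembed` is faster, so candle stays a NUMA-host-only opt-in.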
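The precedence rule in the v0.22.0 notes (env `KEBAB_EMBED_THREADS` over `[models.embedding].num_threads`, with `0` meaning auto) can be sketched in a few lines of Rust. This is a minimal illustration, not the code in `kebab-embed-candle`: the function names `resolve_embed_threads` and `embed_threads_from_env` are hypothetical, and treating an unparsable or zero env value as "fall back to the config" is an assumption the notes do not spell out.

```rust
/// Resolve the effective embedding thread cap (hypothetical helper).
/// Returns `None` for "auto", meaning the thread pool keeps its default size.
/// Assumption: an env value that is empty, unparsable, or zero is ignored
/// and the config value decides.
pub fn resolve_embed_threads(config_num_threads: u32, env_value: Option<&str>) -> Option<usize> {
    // The env override wins whenever it parses to a positive integer.
    let from_env = env_value
        .and_then(|raw| raw.trim().parse::<usize>().ok())
        .filter(|&n| n > 0);
    if from_env.is_some() {
        return from_env;
    }
    // In the config, 0 means auto.
    match config_num_threads {
        0 => None,
        n => Some(n as usize),
    }
}

/// Convenience wrapper that reads the real process environment.
pub fn embed_threads_from_env(config_num_threads: u32) -> Option<usize> {
    let raw = std::env::var("KEBAB_EMBED_THREADS").ok();
    resolve_embed_threads(config_num_threads, raw.as_deref())
}
```

A `Some(n)` result would be applied once at startup to the global rayon pool; `None` leaves the pool untouched.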
--- a/docs/superpowers/handoffs/2026-05-31-config-migration-kickoff.md
+++ b/docs/superpowers/handoffs/2026-05-31-config-migration-kickoff.md
@@ -0,0 +1,77 @@
+# config migration: work handoff (kickoff)
+
+> 2026-05-31. A feature that **automatically migrates existing user files when the
+> config.toml schema evolves**. A new session picks this up from this document plus memory [[project_paraphrase_robustness]].
+> The full flow is brainstorm → spec → plan → implementation (methodology, §5).
+
+## 1. Motivation
+
+v0.21.0 added the `[ingest.expansion]` (alias) section. An existing user config.toml
+stays **behaviorally compatible** via serde defaults (it loads as off), but the section
+is **never written to the file**, so users who open it cannot discover the new feature
+or its knobs. The DB has V00X refinery migrations, but **config has no migration mechanism**. This work builds one.
+
+## 2. Current state (code, current main = v0.21.0)
+
+- **Reads are already forward-compatible**: every new section/field in
+  `crates/kebab-config/src/lib.rs` is `#[serde(default)]` (e.g. ImageCfg L50, UiCfg L55,
+  ingest.code L60, PdfCfg L65, logging L70, nli L132 …). Missing fields load as defaults, so
+  **existing configs do not break**. → Behavioral compatibility is already in place; what remains is *updating the file*.
+- **`schema_version: u32`** (lib.rs:38, currently `1`): **decorative, unused for validation
+  or migration**. It is the natural version axis for migrations.
+- **Only init writes the file**: `kebab init` generates the default config with
+  `toml::to_string(&Config::defaults())` (around lib.rs:1349). **No path updates an existing file.**
+- **Deprecation precedent**: the old `workspace.include` is ignored at load time with a one-time
+  deprecation warning (p9-fb-25). A reference pattern for the migration's "deprecated cleanup".
+
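+## 3. The core problem: preserving comments and order
+
+Rewriting the whole file with `toml::to_string` **wipes out every comment, alignment,
+and ordering the user hand-edited**. This is the inherent difficulty of config migration. Three approaches:
+
+| Approach | Comments preserved | Complexity | Notes |
+|------|-----------|--------|------|
+| A. Full rewrite (load → reserialize) | ✗ | Low | User values survive, comments are lost |
+| B. Use `toml_edit` to append/modify only the missing sections, with comments | ✓ | Medium | Adds a dependency; most user-friendly |
+| C. Back up (.bak), regenerate, and show a diff | △ | Low | Safe, but users restore comments by hand |
+
+→ **B (`toml_edit`)** is best for preserving hand-edited configs. The dependency/complexity
+trade-off is decided in the brainstorm.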
+
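+## 4. Design decisions (brainstorm starting points)
+
+1. **Trigger**: an explicit `kebab config migrate` command vs automatic on `load` (+ backup).
+   Automatic is convenient but raises predictability/safety concerns (write permissions, corruption).
+   An explicit command plus a "migration needed" notice from `kebab doctor` may be the sensible choice.
+2. **Version axis**: a chain of per-version transform functions keyed on `schema_version` (v1→v2→…,
+   borrowing the DB refinery pattern). Each step covers "sections added / formats changed / deprecated items removed in this version".
+3. **Operations**: (a) add new sections with comments (b) clean up/move deprecated fields
+   (c) convert changed formats. All **idempotent** (safe to rerun).
+4. **Safety**: a hand-edited config must never be damaged → **mandatory backup (.bak)**, a dry-run option,
+   and the original preserved on failure.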
+
+## 5. Methodology (same as the v0.21.0 work; see PR #195/#196)
+
+brainstorm (skip the user-confirmation gate, self-review) → spec (self-review) → plan (TDD,
+bite-sized) → executor (opus) or OMC teammate implementation → **gitea-pr review loop**
+(round-1 review opus, closure verify sonnet) → merge. Always build with
+`CARGO_TARGET_DIR=/build/out/cargo-target/target cargo … -j 4 > /tmp/x.log 2>&1; echo EXIT=$?`
+(never `cargo | grep`). PRs go through the gitea REST API (`~/.netrc`); gh does not work.
+
+## 6. Related files
+
+- `crates/kebab-config/src/lib.rs`: the `Config` struct, `schema_version`, the serde default
+  pattern, `load`/`defaults`/`to_string`. The migration module goes here or in a new `migrate.rs`.
+- `crates/kebab-cli/src/*`: a `config migrate` (or `config`) subcommand next to the `init` command.
+- `migrations/V0XX__*.sql`: reference for borrowing the DB migrations' version-chain pattern.
+- The `toml_edit` crate (comment-preserving editing): the dependency candidate for approach B.
+
+## 7. Cautions
+
+- Config migration is a **user-facing surface** → keep README (Configuration)/HOTFIXES in sync
+  (this session's pattern, [[feedback_readme_sync_rule]]). Spell out the migration's *behavioral details*
+  in the spec ([[feedback_design_detail_docs]]).
+- Whether a `schema_version` bump is a release trigger is a separate call: unlike the DB schema (V00X),
+  a config version does not invalidate data, so an additive change may not be a release trigger
+  (distinct from the DB/wire criteria in CLAUDE.md §Versioning).
+- Idempotency + backup + dry-run are the three axes of safety.
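The safety requirements in §4 item 4 of the kickoff (mandatory `.bak`, a dry-run option, original preserved on failure) could be met roughly as follows. This is a sketch using only `std`; `write_with_backup` and `with_suffix` are hypothetical names, and the "write to a sibling `.tmp`, then rename" strategy is one assumed way to keep the original intact, not something the kickoff prescribes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Append a suffix to the full file name (config.toml -> config.toml.bak).
fn with_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut name = path.as_os_str().to_os_string();
    name.push(suffix);
    PathBuf::from(name)
}

/// Hypothetical helper: replace `path` with `new_contents`, keeping a backup.
/// Returns the backup path, or `None` when `dry_run` is set, in which case
/// neither the file nor a backup is touched.
pub fn write_with_backup(
    path: &Path,
    new_contents: &str,
    dry_run: bool,
) -> io::Result<Option<PathBuf>> {
    if dry_run {
        return Ok(None);
    }
    let backup = with_suffix(path, ".bak");
    let tmp = with_suffix(path, ".tmp");
    // 1. Back up the original first; an existing .bak is overwritten.
    fs::copy(path, &backup)?;
    // 2. Write the new contents next to the original, never over it.
    fs::write(&tmp, new_contents)?;
    // 3. Swap the new file in. Any failure before this line leaves the
    //    original config unchanged.
    fs::rename(&tmp, path)?;
    Ok(Some(backup))
}
```

Because every fallible step before the rename works on a different file, an error returned early means the user's config is still the one they wrote.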
--- a/docs/superpowers/plans/2026-05-31-config-migration.md
+++ b/docs/superpowers/plans/2026-05-31-config-migration.md
--- a/docs/superpowers/specs/2026-05-31-config-migration-design.md
+++ b/docs/superpowers/specs/2026-05-31-config-migration-design.md
@@ -0,0 +1,268 @@
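+# config migration: design (spec)
+
+> 2026-05-31. The design contract for a feature that **automatically updates existing user
+> files when the config.toml schema evolves**. This spec finalizes the brainstorm results of the kickoff handoff document
+> [`2026-05-31-config-migration-kickoff.md`](../handoffs/2026-05-31-config-migration-kickoff.md).
+> Plan → implementation follow from this document.
+
+## 0. Decision summary (brainstorm gate)
+
+| Axis | Decision | Rationale |
+|----|------|------|
+| **Trigger** | Explicit `kebab config migrate` command + `kebab doctor` notice | Predictability and safety. Automatic writes at load time risk write-permission problems, concurrent runs, and corruption. |
+| **Comment preservation** | Partial edits with `toml_edit` | Preserves 100% of the user's hand-edited values, comments, order, and alignment. Only what is missing gets added. |
+| **Versioning mechanism** | Hybrid: reconciliation (additive) + step chain (non-additive) | kebab's config accumulated sections while `schema_version` stayed at `1`, so the version number alone cannot tell "what is missing" → structural comparison is the essence. |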
+
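+## 1. Motivation (kickoff §1 restated)
+
+v0.21.0 added sections such as `[ingest.expansion]`. An existing user config.toml stays
+**behaviorally compatible** via serde defaults (it loads as off), but those sections are
+**never written to the file**, so users who open it cannot discover the new features or
+their knobs. The DB has V00X refinery migrations; config has none. This work builds one.
+
+Key point: this is **a *file visibility* problem, not data invalidation**. Read compatibility already exists
+(`#[serde(default)]`), so what needs building is *updating the user's file to match the new schema*.
+
+## 2. Non-goals (YAGNI)
+
+- **Semantic validation** of config values (e.g. range-checking score_gate): a separate feature, out of scope here.
+- **Automatic migration at load time**: explicitly excluded (trigger decision). A separate task later if needed.
+- **Downgrade** (new → old schema): one direction only.
+- **Readjusting existing user values** (overwriting a user value because a default changed): never.
+  Migration only *adds what is missing* and *cleans up deprecated items*. Values the user set are inviolable.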
+
+## 3. Architecture: two mechanisms
+
+A migration is applied to the user's file (`toml_edit::DocumentMut`) in the following order.
+
+```
+original file → [1. step chain (non-additive)] → [2. reconciliation (additive)] → [3. schema_version stamp] → result
+```
+
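+### 3.1 Reconciliation (additive, the core mechanism)
+
+**Definition**: "add every table/key that exists in the default Config structure but not in
+the user's file to the user's file, together with an explanatory comment." It works regardless of version and is idempotent.
+
+**Reference document = annotated default**: `annotated_default_document()` is the single source of truth.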
+
+```
+fn annotated_default_document() -> toml_edit::DocumentMut
+//  Serializes Config::defaults() into a toml_edit Document, then attaches
+//  the descriptions from the comment catalog (§3.3) to each table/key's decor (prefix).
+//  → This document is the definition of the "complete config.toml".
+```
+
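+`kebab init` also writes this function's output to the file verbatim (§5.2). That is, **init and
+migrate share the same reference document** → a single source for comments and structure.
+
+**reconcile algorithm** (reference document `ref` → user document `user`, recursive):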
+
+```
+for each (key, ref_item) in ref (in document order):
+    if key is not in user:
+        copy ref_item into user wholesale (including decor = comments).  → change: added_section / added_key
+    else if ref_item and user[key] are both tables:
+        recurse(ref_item, user[key])    # compare children only
+    else:
+        # key already exists (even if its value differs from the default) → leave untouched. (values are inviolable)
+```
+
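+- **Insertion position**: missing keys are **appended at the end** of their table (deterministic, simple).
+  The user's existing order is preserved and only new items go at the back.
+- **Nested tables**: if `[ingest]` exists but `[ingest.expansion]` does not, only the `expansion`
+  subtable is added. If `[ingest]` itself is missing, `[ingest]` and everything under it are added.
+- **Value-inviolability example**: if the user changed `score_gate = 0.8` and the default is 0.6,
+  the key exists, so **0.8 stays**. Migration does not revert it to 0.6.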
+
+### 3.2 Step chain (non-additive)
+
+Per-version transform functions keyed on `schema_version`. They handle non-additive changes
+(removing deprecated items, renames, format conversions). Borrows the DB refinery pattern.
+
+```
+const CURRENT_SCHEMA_VERSION: u32 = 2;   // 1 → 2 in this work
+
+fn step_1_to_2(doc: &mut DocumentMut, changes: &mut Vec<MigrationChange>)
+//  v1 → v2 transform: removes the old `workspace.include` key (deprecated in p9-fb-25).
+//   - if doc["workspace"]["include"] exists, remove it → change: removed_deprecated.
+//   - otherwise noop (idempotent).
+```
+
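+- **Execution range**: steps are applied in order from the file's `schema_version` (treated as 1 when absent) up to `CURRENT`.
+  If the file is already at `CURRENT` or above, no steps run.
+- Each step is **individually idempotent** (rerunning it on an already-migrated state is a noop).
+- The only step in this work is `1→2` (removing workspace.include). The accumulated section additions
+  (image/ui/ingest/pdf/logging/expansion) are **all handled by reconciliation**, so they are
+  not written as steps. The step chain holds only "transforms that structure cannot express".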
+
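+### 3.3 Comment catalog
+
+The "section/key → Korean explanatory comment" mapping is defined statically in one place, the
+migration module of kebab-config. It is the single source: not duplicated from README/SMOKE, and canonical here.
+
+- The existing `init_workspace` header (the path-policy explanation, `kebab-app/src/lib.rs:147~`)
+  moves to a **document-level prefix** (attached by `annotated_default_document`).
+- Per-section comments borrow the knob descriptions from the README Configuration section, kept **concise** (1~2 lines).
+  Example: `[ingest.expansion]` → `# doc-side 별칭 확장 (기본 off). 검색 패러프레이즈 강건성↑.`
+- Keep comment text short and restrained. For full documentation, point to the generated file, README, and SMOKE.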
+
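+### 3.4 Idempotency guarantee (safety axis 1)
+
+- reconciliation: keys that already exist are skipped → on a second run, changes is empty.
+- step: each step is noop-safe.
+- Result: **rerunning after a migration yields `changed=false` and an unmodified file.** This is the core
+  assertion of both the doctor check (§5.3) and the idempotency test.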
+
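+## 4. Three axes of safety (kickoff §4.4)
+
+1. **Idempotency**: §3.4.
+2. **Backup**: `<config>.bak` is created (a copy of the original) right before the file is modified. An existing `.bak` is overwritten
+   (for simplicity; the changes can be checked beforehand with dry-run). No backup is made in dry-run.
+3. **dry-run**: `--dry-run` only computes and prints the changes and **modifies neither the file nor the backup**.
+
+**Original preserved on failure (atomic write)**: the edited result is first written to `<config>.tmp` and then
+swapped in with `rename(tmp, config)`. A failure at any stage before the rename leaves the original unchanged. Order:
+`create backup → write tmp → verify tmp (re-parse round-trip) → atomic rename`.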
+
+## 5. Surface
+
+### 5.1 CLI: `kebab config migrate`
+
+A new top-level `Config` subcommand group (clap nested, following the `Inspect`/`List` pattern):
+
+```
+kebab config migrate [--dry-run] [--json]
+```
+
+- Honors the global `--config <path>` (facade rule). Falls back to the XDG default path when unset.
+- Errors if the target file does not exist: `config 파일이 없습니다. 먼저 kebab init 을 실행하세요.`
+  (with `--json`: `error.v1`, code `config_not_found`).
+- Human-readable output: the change list (added sections/keys, removed deprecated items) + backup path + "N changes
+  applied" or "already up to date".
+- `--json`: `config_migration.v1` (§5.4).
+
+**facade**: kebab-cli calls kebab-app's
+`config_migrate_with_config_path(config_path: Option<&Path>, dry_run: bool)
+-> anyhow::Result<ConfigMigrationReport>` (file read/backup/atomic write
+orchestration lives in the app layer, pure transforms in the config layer; see §6).
+
+### 5.2 Impact on `kebab init` (user-visible)
+
+`init_workspace` changes to write the output of `annotated_default_document()`. Result: the config.toml
+that init generates **includes per-section comments** (previously only a header). This is a user-visible surface
+change, so the config example blocks in the README Configuration section and docs/SMOKE.md must be synced.
+
+### 5.3 New `kebab doctor` check (additive)
+
+One `config_migration` check is added right after the config load check:
+
+- Internally runs a dry-run migration → if changes is empty, `ok=true`,
+  detail `config up to date (schema v2)`, hint=None.
+- If there are changes, `ok=false`, detail `N pending changes (added M sections, removed K
+  deprecated)`, hint `run kebab config migrate to update your config.toml`.
+- **Trade-off (settled)**: `DoctorCheck` only has `ok: bool`, and by convention the hint is shown when
+  `ok==false`, so "migration needed" is signaled as `ok=false`. This makes the overall
+  `DoctorReport.ok` (the AND of all checks) false: in an environment that *works fully but whose
+  config uses the old schema*, `kebab doctor` reports "unhealthy". This is accepted as
+  intended behavior (doctor checks "is there anything to clean up", and the hint gives the exact
+  corrective command). Treating an additive change that only adds keys as a "health failure" is arguably
+  heavy-handed, but simplicity is chosen over introducing a separate warn state (a schema/surface change).
+- The `doctor.v1` schema is unchanged (one more row in the checks array: additive, backward-compat).
+
+### 5.4 wire schema `config_migration.v1` (new)
+
+New file `docs/wire-schema/v1/config_migration.schema.json`. `--json` output:
+
+```json
+{
+  "schema_version": "config_migration.v1",
+  "dry_run": true,
+  "config_path": "/home/me/.config/kebab/config.toml",
+  "from_schema_version": 1,
+  "to_schema_version": 2,
+  "changed": true,
+  "backup_path": null,
+  "changes": [
+    { "kind": "added_section",     "path": "ingest.expansion", "detail": "doc-side 별칭 확장 (기본 off)" },
+    { "kind": "added_key",         "path": "logging.enabled",  "detail": "ingest 로그 활성화" },
+    { "kind": "removed_deprecated","path": "workspace.include","detail": "p9-fb-25: extractor 자동 결정" }
+  ]
+}
+```
+
+- `changed`: whether a change actually occurred (or would occur, in dry-run). If false, changes=[].
+- `backup_path`: the `.bak` path when actually applied, `null` in dry-run.
+- `kind` enum: `added_section | added_key | removed_deprecated`. (Room to extend later with `renamed`,
+  `reformatted`; this work covers these three.)
+- A new additive schema → no impact on existing integrations. Not a wire major bump (a v1 addition).
+
+## 6. Code placement (crate boundaries)
+
+| Location | Responsibility | Notes |
+|------|------|------|
+| `crates/kebab-config/src/migrate.rs` (new) | **Pure transforms**: `annotated_default_document`, `reconcile`, step chain, `CURRENT_SCHEMA_VERSION`, comment catalog, the `MigrationChange`/`ConfigMigrationReport` types, `migrate_document(doc) -> Vec<MigrationChange>` | No I/O. Testable as string in → string out. |
+| `crates/kebab-config/Cargo.toml` | Adds the `toml_edit = "0.22"` dependency | The core of comment-preserving editing. |
+| `crates/kebab-app/src/lib.rs` | **I/O orchestration**: `config_migrate_with_config_path` (read → migrate_document → backup → tmp write → atomic rename), `init_workspace` changed to use `annotated_default_document`, new doctor check | facade. fs side effects live in the app layer. |
+| `crates/kebab-cli/src/main.rs` | `Config { Migrate { dry_run } }` subcommand, human-readable output | Calls only the kebab-app facade. |
+| `crates/kebab-cli/src/wire.rs` | `wire_config_migration(report) -> Value` | Serializes `config_migration.v1`. |
+| `docs/wire-schema/v1/config_migration.schema.json` (new) | wire contract | |
+
+**Boundary rationale**: kebab-config already *reads* files (`from_file`), but *writes* live in
+`init_workspace` (app). For consistency and testability, pure transforms go in config and side effects (backup, write)
+go in app. Both doctor (app) and cli reuse the same pure transform.
+
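+## 7. The new meaning of schema_version
+
+- Before: always `1`, decorative, unused for validation or logic.
+- Now: a marker for "the schema version this file is synced to" + the axis of the step chain.
+- `Config::defaults().schema_version` and `CURRENT_SCHEMA_VERSION` are bumped to **2**.
+  When a migration completes, the user file's `schema_version` is stamped with `CURRENT`.
+- The read path (`from_file`) still **does not reject** on `schema_version` (forward-compat
+  is kept). That is, an old binary reading a new file and a new binary reading an old file both work.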
+
+## 8. Documentation sync (user-facing surface)
+
+- **README.md Configuration section**: one line on `kebab config migrate` + a note that the init config carries
+  section comments. The config example block matches the `annotated_default_document` output.
+- **docs/SMOKE.md**: sync the config example block. Add a migrate dry-run smoke step.
+- **docs/DOGFOOD.md**: add a migrate scenario to the config section (old file → migrate →
+  check section visibility).
+- **tasks/HOTFIXES.md**: a dated entry after merge (`## YYYY-MM-DD — config 마이그레이션`),
+  with dogfooding evidence (N missing sections added to an old config + workspace.include removal, idempotency confirmed).
+- **HANDOFF.md**: one line if applicable.
+
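+## 9. Release-trigger assessment
+
+- A new CLI subcommand (`config migrate`) + doctor check + changed init output = **user-visible
+  surface change** → dogfooding required, README sync required.
+- The `schema_version` bump (1→2) is **additive** (no data invalidation, read compatibility kept) →
+  it does not meet the DB/wire breaking criteria in CLAUDE.md §Versioning. Since surface changes have
+  accumulated, though, it may warrant a **minor bump**. The actual bump/release cut timing is the user's call.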
+
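+## 10. Test strategy (the TDD basis for the plan)
+
+Pure transforms (kebab-config) are the center of testing: string in/out, no fs needed:
+
+1. **reconciliation adds**: old config string (sections missing) → migrate → the missing sections are added with
+   comments, and existing keys, comments, and order are preserved.
+2. **Values inviolable**: a user-changed value (e.g. `score_gate = 0.8`) survives the migrate.
+3. **Idempotency**: migrate the migrate output again → `changed=false`, identical string.
+4. **step (workspace.include removal)**: a string with the old key → removed + change recorded. Noop if absent.
+5. **schema_version stamp**: the result has `schema_version = 2`. It is added to files that lacked it.
+6. **Comment preservation**: comments the user attached to arbitrary keys remain intact after the migrate.
+7. (app) **backup, atomic, preserved on failure**: backup file created, tmp rename, original unchanged on corrupt input.
+8. (app) **dry-run**: no file or backup created, report.changed accurate.
+9. (cli/wire) the serialized shape of `config_migration.v1`.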
+
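+## 11. Risks / notes
+
+- `toml_edit` is a new dependency, added to kebab-config. It coexists with `toml` (0.8): the serde path
+  still uses `toml`, only the editing path uses `toml_edit`. Confirm the latest 0.22.x version at implementation time.
+- Reconciliation's "append at the end" can disturb the aesthetic order the user arranged (new sections
+  pile up at the back), but preserving values, comments, and existing order comes first, and it was adopted for being simple and deterministic.
+- The first step (`1→2`) merely cleans up `workspace.include`, which is effectively ignored already; the step chain is laid down mainly
+  as a *framework* for future non-additive changes.
+- Difference from the kickoff handoff: kickoff §4.2 proposed only a "chain of per-version transform functions", but
+  given kebab's serde-default nature, additive changes are ill-suited to steps (they are version-independent) →
+  **reconciliation is promoted to the first-class mechanism** and steps are limited to non-additive changes.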
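The reconcile algorithm of §3.1 in the design spec can be sketched on a toy document model. The spec operates on `toml_edit::DocumentMut`; the `Item` and `Change` types below are hypothetical std-only stand-ins used purely for illustration, and comments (decor) are left out. The sketch keeps the three rules from the spec: missing keys are copied wholesale and appended at the end, tables present on both sides are recursed into, and existing values are never touched.

```rust
/// Toy stand-in for a TOML item (assumption: the real code uses toml_edit).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Value(String),
    /// Insertion-ordered table, so the user's key order is preserved.
    Table(Vec<(String, Item)>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    AddedSection(String),
    AddedKey(String),
}

/// Add everything present in `reference` but missing from `user`.
/// `prefix` is the dotted path of the table being compared ("" at the root).
pub fn reconcile(
    reference: &[(String, Item)],
    user: &mut Vec<(String, Item)>,
    prefix: &str,
    changes: &mut Vec<Change>,
) {
    for (key, ref_item) in reference {
        let path = if prefix.is_empty() {
            key.clone()
        } else {
            format!("{prefix}.{key}")
        };
        match user.iter().position(|(k, _)| k == key) {
            None => {
                // Missing: copy the whole item and append it at the end.
                changes.push(match ref_item {
                    Item::Table(_) => Change::AddedSection(path),
                    Item::Value(_) => Change::AddedKey(path),
                });
                user.push((key.clone(), ref_item.clone()));
            }
            Some(idx) => {
                // Present: recurse only when both sides are tables.
                // Otherwise the user's value stays exactly as it is.
                if let (Item::Table(ref_children), Item::Table(user_children)) =
                    (ref_item, &mut user[idx].1)
                {
                    reconcile(ref_children, user_children, &path, changes);
                }
            }
        }
    }
}
```

Running `reconcile` a second time on its own output records no changes, which is the idempotency property §3.4 relies on.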
--- a/docs/superpowers/specs/2026-06-01-embed-candle-track-spec.md
+++ b/docs/superpowers/specs/2026-06-01-embed-candle-track-spec.md
@@ -0,0 +1,78 @@
+# Track 1 Spec: candle e5-large embedding provider (NUMA-safe)
+
+- Date: 2026-06-01
+- Umbrella: [meta-spec](./2026-06-01-embedding-numa-backends-meta-spec.md) / [meta-plan](./2026-06-01-embedding-numa-backends-meta-plan.md)
+- Prerequisite: Phase 0 spike PASS + independent verification (cosine 1.000000, thread cap works, latency ~4×). Commit 76841af.
+- Branch: `feat/embed-candle`
+
+## 1. Goal
+
+To avoid fastembed's (onnxruntime) "48 hardcoded intra-op threads → NUMA heap corruption", add an embedding provider that runs the same model, `multilingual-e5-large`, on **candle (pure Rust)**. Opt-in, quality-neutral, with a NUMA thread cap.
+
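+## 2. Settled decisions (user-approved 2026-06-01)
+
+- **D-reindex**: the goal is to **keep `embedding_version` (zero reindexing)**. During implementation, measure the **per-dimension max absolute error** between candle and onnxruntime vectors to confirm they are effectively identical (e.g. max abs diff < 1e-5), and settle it by measuring zero regression on the golden suite. Bump the version and reindex only if a meaningful difference shows up.
+- **D-default**: the global default provider **stays onnxruntime**; candle is **opt-in** (`models.embedding.provider = "candle"`).
+- **Early exit**: if candle meets the golden baseline, the ollama/A2 tracks are skipped (only the A1 stopgap doc is done separately).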
+
+## 3. Architecture
+
+- **New crate `kebab-embed-candle`**: implements `kebab_core::Embedder`. Isolates candle's large dependency tree in this crate.
+  - Allowed deps: `candle-core`/`candle-nn`/`candle-transformers` (0.10.x), `tokenizers`, `hf-hub`, `kebab-core`, `kebab-config`, `anyhow`, `tracing`. **No other `kebab-*` dependencies** (beyond core/config), per the design §8 boundary.
+- **Injection branch**: `embedder()` in `kebab-app/src/app.rs` (currently :829-837, unconditionally constructing `FastembedEmbedder::new`) branches on `config.models.embedding.provider`:
+  - `"fastembed"` | `"onnx"` | (empty/legacy) → `FastembedEmbedder` (default, existing behavior kept).
+  - `"candle"` → `CandleEmbedder`.
+  - Unknown value → a clear error.
+- **Facade rule respected**: UI crates depend only on `kebab-app`. `kebab-app` adds the `kebab-embed-candle` dependency.
+
+## 4. CandleEmbedder behavior (pipeline validated in the spike)
+
+- Model: `model.safetensors` + `config.json` + `tokenizer.json` of `intfloat/multilingual-e5-large` are cached via `hf-hub` under `{model_dir}/candle/` (config `storage.model_dir`).
+- Loaded with `candle_transformers::models::xlm_roberta::{Config, XLMRobertaModel}` (CPU `Device::Cpu`).
+- `embed()`: e5 prefix (`query: `/`passage: `, keyed on the `EmbeddingInput` kind, same convention as `prefix_input` in `kebab-embed-local`) → tokenize (max_len 512, batch-longest padding, special tokens) → forward → **attention-mask-weighted mean pooling** → **L2 normalization**.
+- `dimensions()` = 1024, `model_id`/`model_version` = the config values (same identifiers as before).
+- **Thread cap**: new config field `models.embedding.num_threads` (u32, 0=auto) + env `KEBAB_EMBED_THREADS`. `CandleEmbedder::new` applies `rayon::ThreadPoolBuilder::new().num_threads(n).build_global()` once (ignored if the pool is already initialized). With 0/auto it is left unset (rayon default). NUMA node binding is combined with `numactl` (A1), to be documented.
+- `Mutex<XLMRobertaModel>`, or unnecessary if forward takes `&self`: candle's forward can take `&self`, but confirm it is `Send+Sync` provided there is no interior mutability.
+
+## 5. Config changes
+
+- Add `num_threads: u32` (default 0) to `EmbeddingModelCfg`. env `KEBAB_EMBED_THREADS`.
+- Document the allowed `provider` values: `fastembed` (default)/`candle`.
+- Update the default toml + `Config::default()`, and check the impact on existing tests.
+
+## 6. Versioning/cascade
+
+- Per D-reindex, `embedding_version` is kept (identical vectors). The cascade (design §9) is not triggered, and the existing index is reused. (Bump only if the max abs diff check fails.)
+- No wire schema changes.
+
+## 7. Tests (deliverables)
+
+- **Unit** (`kebab-embed-candle`): `dimensions()==1024`; `embed()` output has L2≈1; empty input gives empty output; prefix application verified.
+- **Parity test** (`#[ignore]`, needs the 2GB model + network): candle vs `FastembedEmbedder` on the same sentences, cosine ≥ 0.9999 + reported max abs diff. Excluded from CI by default; run manually/in dogfooding.
+- **Integration** (`kebab-cli` or `kebab-app`): ingest a small fixture with `provider="candle"` → chunk/embedding counts > 0 and one successful search. (Needs the model → `#[ignore]` or a feature.)
+- **Thread cap**: with `num_threads=4` set, check `rayon::current_num_threads()==4`.
+- **Regression**: the default behavior of the existing fastembed path is unchanged (when provider is unspecified).
+- clippy `-D warnings`, serial build with `-j 4`.
+
+## 8. Quality gate (before merge)
+
+- Run the `kebab-eval` golden suite (`/build/dogfood/golden_queries.yaml`) with provider=candle → MRR/hit@k ≥ the current baseline (zero regression). [[feedback_search_quality_dogfood]]
+- Spot-check paraphrase robustness (#195/#196).
+
+## 9. Docs/release (same PR as the merge)
+
+- README: add `provider=candle` + `num_threads`/`KEBAB_EMBED_THREADS` to Configuration. Sync the SMOKE config example. [[feedback_readme_sync_rule]]
+- ARCHITECTURE: add `kebab-embed-candle` to the crate graph + directory tree.
+- HANDOFF: one line after merge (embedding backend diversification).
+- HOTFIXES: a dated entry for this date (NUMA double-free diagnosis + candle provider introduction + spike parity evidence).
+- Version bump: new config surface (provider=candle, num_threads) = pre-1.0 minor bump (0.21.1 → 0.22.0), with release notes.
+
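+## 10. Out of scope / follow-ups
+
+- Isolating build cost by feature-gating the candle crate (follow-up).
+- Automatic NUMA node binding (for now, combined externally with numactl).
+- The ollama/A2/A1 tracks (skipped if candle passes the gate).
+
+## 11. Remaining gate (run by the user; Claude cannot)
+
+- On that dual-socket NUMA server, a 5150-doc ingest with `provider=candle` **completes with EXIT=0 and no double-free**. Verification is scheduled before/after the PR merge. (meta-spec §4.3)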
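The last two stages of the `embed()` pipeline in §4 of the Track 1 spec, attention-mask-weighted mean pooling followed by L2 normalization, can be illustrated on plain vectors. This is a sketch of the math only: the actual `CandleEmbedder` works on candle tensors, and the function names `masked_mean_pool` and `l2_normalize` are hypothetical.

```rust
/// Attention-mask-weighted mean pooling.
/// `tokens` is [seq_len][hidden]; `mask` is [seq_len] with 1 for real tokens
/// and 0 for padding. Padding positions contribute nothing to the mean.
pub fn masked_mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let hidden = tokens.first().map_or(0, |t| t.len());
    let mut sum = vec![0.0f32; hidden];
    let mut count = 0.0f32;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m == 0 {
            continue; // skip padding
        }
        for (acc, x) in sum.iter_mut().zip(tok) {
            *acc += *x;
        }
        count += 1.0;
    }
    // Guard against an all-padding sequence (assumption: return zeros then).
    let denom = count.max(1.0);
    sum.iter().map(|s| s / denom).collect()
}

/// Scale a vector to unit L2 norm; a zero vector is returned unchanged.
pub fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}
```

With batch-longest padding, shorter sentences in a batch carry trailing padding tokens, which is why the mask has to weight the mean rather than a plain average over `seq_len`.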
--- a/docs/superpowers/specs/2026-06-01-embedding-numa-backends-meta-plan.md
+++ b/docs/superpowers/specs/2026-06-01-embedding-numa-backends-meta-plan.md
@@ -0,0 +1,77 @@
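+# Meta-Plan: execution plan for NUMA-safe embedding backends
+
+- Date: 2026-06-01
+- Umbrella spec: [2026-06-01-embedding-numa-backends-meta-spec.md](./2026-06-01-embedding-numa-backends-meta-spec.md)
+- Execution model: per-track worktree isolation + omc teammate (omc-teams, sequential single-team). Stages within a track: spec → plan → implementation → tests → PR.
+
+## 0. Immediately (in parallel with this plan, no code)
+
+- **Document the A1 stopgap + hand it to the user**: `numactl --cpunodebind=0 --membind=0 kebab ingest` (or `taskset -c 0-11`). For unblocking the current outage. It is part of track 4's deliverables, but is communicated right away.
+- Check once whether A1 completes the 5150-doc run on the user's NUMA server → confirms the causality "threads/NUMA are the cause" (reinforces meta-spec §1).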
+
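+## 1. Track execution order & gates
+
+`candle → ollama → A2 → A1 (formal documentation)`. The next track does not start until the current track's PR is open and its NUMA verification is scheduled.
+
+**Early exit (D1 settled)**: if candle or ollama meets acceptable quality (golden ≥ baseline, no regression) + NUMA safety, **stop there** and skip the remaining tracks. Proceed to A2 → A1 only if both fall short on quality. Since candle uses the same e5-large, it is the likely end point if parity passes.
+
+### Track 1: candle (`feat/embed-candle`)
+
+- **Phase 0: feasibility spike (gate, top priority)**
+  - In a worktree, add the candle + candle-transformers dependencies and load the `intfloat/multilingual-e5-large` safetensors with `xlm_roberta::XLMRobertaModel` (CPU).
+  - Embed a few sentences → (a) cosine parity with onnxruntime e5-large vectors, (b) CPU latency, (c) thread capping via `RAYON_NUM_THREADS`, (d) correctness of padding_idx position embeddings.
+  - Output: a spike report (parity numbers + latency + thread-control confirmation). **Must pass before Phase 1.**
+- **Phase 1: spec**: write the track spec (Embedder implementation, config provider="candle", embedding_version, reindexing procedure, test matrix).
+- **Phase 2: plan**: implementation plan.
+- **Phase 3: implementation**: `kebab-embed-candle` (new crate) or a provider branch inside `kebab-embed-local`. Embedder implementation + app.rs injection branch + config.
+- **Phase 4: tests**: unit/integration + parity + golden. Builds are serial, `-j 4`.
+- **Phase 5: PR + verification**: gitea PR. 5150-doc run completed on the user's NUMA server + golden baseline confirmed.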
+
+### Track 2: ollama (`feat/embed-ollama`)
+
+- spec → plan → implementation (`OllamaEmbedder`: calls `/api/embed`, provider="ollama", model choice [e5 GGUF or bge-m3]) → tests (parity/golden, no double-free thanks to process isolation) → PR + NUMA verification.
+
+### Track 3: A2 (`feat/embed-ort-direct`)
+
+- spec → plan → implementation (bypass fastembed, direct `ort` session + `with_intra_threads(N)` + NUMA affinity, reproduce tokenize/mean-pool/L2, provider="onnx" stays the default) → tests (cosine≈1.0 against existing e5 vectors, zero reindexing) → PR + NUMA verification.
+- **Quality-neutral safety net**: can become the default immediately, without reindexing.
+
+### Track 4: formalizing A1 (`docs/embed-numa-affinity`)
+
+- Launcher wrapping/docs + (optional) an affinity hint via a config knob. Sync README/SMOKE/HOTFIXES.
+
+## 2. Operating omc teammates (following the memory conventions)
+
+- spawn: omc-teams tmux pane + brief file. **sequential single-team** (no concurrent multi-team spawns).
+- Model routing: executor + initial draft + round-1 review = **opus**; closure verify / micro-patch round = **sonnet**. (`OMC_TEAM_ROLE_OVERRIDES` env)
+- Right after spawning a worker, run the completion polling shell with `run_in_background=true` (detects phase=completed/failed → notifies the main session automatically).
+- Builds/tests are serial, `-j 4` by default. Use `CARGO_TARGET_DIR=/build` (do not routinely clean).
+
+## 3. Worktrees / branches
+
+| Track | Branch | worktree |
+|---|---|---|
+| 1 candle | `feat/embed-candle` | new |
+| 2 ollama | `feat/embed-ollama` | new |
+| 3 A2 | `feat/embed-ort-direct` | new |
+| 4 A1 | `docs/embed-numa-affinity` | new |
+
+After each track merges, the next track rebases. No shared state across tracks (independent providers).
+
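+## 4. Risk register
+
+- candle fails Phase 0 parity → track 1 is demoted, ollama goes first.
+- candle CPU latency is excessive relative to onnxruntime → opt-in provider only.
+- The ollama model is not e5 → possible golden regression → default promotion on hold.
+- NUMA verification depends on user availability → each PR stays "merge-pending" until verified.
+- The ort rc.9 bug itself may reproduce in A2 too → NUMA verification is required to confirm A2 survives even with the thread cap.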
+
+## 5. Progress (live)
+
+- [x] candle feasibility desk research (xlm_roberta module exists + cembedd precedent), 2026-06-01
+- [ ] Confirm the A1 stopgap on the user's NUMA server
+- [x] Track 1 Phase 0 spike: **VERDICT=PASS** (2026-06-01). cosine min=mean=1.000000 (same as onnxruntime), RAYON thread cap works, latency ~4× (67.5 vs 16.8 ms/sentence, 4 vs 12 threads). Commit 76841af. → **Early exit likely**: candle meets the quality baseline automatically → ollama/A2/A1 expected to be unnecessary. Remaining gates = golden measurement + 5150-doc run on the NUMA server.
+- [ ] Track 1 spec/plan/impl/test/PR (in progress)
+- [ ] Track 2 …
+- [ ] Track 3 …
+- [ ] Track 4 …
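The parity numbers quoted for the Phase 0 spike (cosine min/mean and a per-dimension max absolute difference) are straightforward to compute once both backends have produced vectors for the same sentences. The following is a minimal sketch of those metrics, not the spike's actual harness: `Parity`, `parity_report`, `cosine`, and `max_abs_diff` are hypothetical names, and returning `0.0` for a zero-norm vector is an assumption.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Largest per-dimension absolute difference between two vectors.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Parity {
    pub cosine_min: f32,
    pub cosine_mean: f32,
    pub max_abs_diff: f32,
}

/// Compare two backends' embeddings sentence by sentence.
/// Returns `None` for empty input or mismatched sentence counts.
pub fn parity_report(lhs: &[Vec<f32>], rhs: &[Vec<f32>]) -> Option<Parity> {
    if lhs.is_empty() || lhs.len() != rhs.len() {
        return None;
    }
    let mut cosine_min = f32::INFINITY;
    let mut cosine_sum = 0.0f32;
    let mut worst_diff = 0.0f32;
    for (a, b) in lhs.iter().zip(rhs) {
        let c = cosine(a, b);
        cosine_min = cosine_min.min(c);
        cosine_sum += c;
        worst_diff = worst_diff.max(max_abs_diff(a, b));
    }
    Some(Parity {
        cosine_min,
        cosine_mean: cosine_sum / lhs.len() as f32,
        max_abs_diff: worst_diff,
    })
}
```

A gate such as "cosine ≥ 0.9999 and max abs diff below a small threshold" then reduces to two comparisons on the returned struct.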
--- a/docs/superpowers/specs/2026-06-01-embedding-numa-backends-meta-spec.md
+++ b/docs/superpowers/specs/2026-06-01-embedding-numa-backends-meta-spec.md
@@ -0,0 +1,102 @@
+# Meta-Spec: NUMA-safe embedding backends (multi-track)
+
+- Date: 2026-06-01
+- Status: DRAFT (umbrella)
+- Scope: `kebab-embed-local` and the embedder injection path. Umbrella spec for four tracks.
+- Sub-deliverables: each track has its own spec (`tasks/` or `docs/superpowers/specs/`) and plan, both referencing this meta-spec.
+
+## 1. Problem
+
+On a CPU-only Ollama server (Intel Xeon Silver 4214 × 2 sockets = 48 logical, 2 NUMA nodes), `kebab ingest` dies from heap corruption on every run:
+
+```
+ingest [>     ] 3/5150  double free or corruption (!prev)
+중지됨 (core dumped)
+```
+
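+Root cause (confirmed in code): fastembed 4.9.1 (`text_embedding/impl.rs:52,80`) **hardcodes** the ONNX intra-op thread count to `available_parallelism()` (=48), and `InitOptions` has no API to override it. On dual-socket NUMA, the onnxruntime (`ort 2.0.0-rc.9`) thread pool corrupts the heap. Diagnostic evidence: the entry for this date in `tasks/HOTFIXES.md` + the conversation log.
+
+- Not a model/disk/AVX/data problem (the 2.08GB model is intact, AVX-512 fully available). Purely a threads/NUMA × native-runtime bug.
+- The official onnxruntime docs also recommend binding intra-op threads to a single node on dual-socket NUMA.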
+
+## 2. Goals / non-goals
+
+Goals:
+- Secure an embedding path that **completes the 5150-doc corpus without a double free** on that NUMA server.
+- Keep search quality at or above the golden-suite (MRR/hit@k) baseline.
+- Implement it as backends selectable via `models.embedding.provider` (reusing the existing provider field).
+
+Non-goals:
+- Automatic ranking tuning (separately deferred, `[[project_ranking_deferred]]`).
+- Improving embedding model quality as such (this work focuses on NUMA stability).
+- GPU paths.
+
+## 3. Shared architecture
+
+- There is a **single** swap point: `FastembedEmbedder::new(&config)` at `crates/kebab-app/src/app.rs:836`.
+- The trait surface is small: `kebab_core::Embedder` (`traits.rs:127`), with `model_id / model_version / dimensions / embed`. A new backend implements only these four.
+- Config: `models.embedding.provider` (already exists), `model`, `version`, `dimensions`, `batch_size`. Per-track thread/affinity knobs may be added.
+
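+## 4. Cross-cutting policies (common to all tracks)
+
+### 4.1 embedding_version & reindexing
+- If the vectors change (candle, ollama): **bump `embedding_version` → full reindex** (design §9 cascade). A2/A1 use the same onnxruntime e5-large, so vectors are unchanged → no reindex needed.
+- Each track spec states the reindex cost and procedure.
+
+### 4.2 Quality verification (mandatory gate)
+- Tracks that change vectors must measure MRR/hit@k with the `kebab-eval` golden suite (`/build/dogfood/golden_queries.yaml`) before merging, and must be **at or above baseline** to be promoted to default. Below baseline, the backend stays an opt-in provider only.
+- Check for regressions in paraphrase robustness (#195/#196).
+
+### 4.3 NUMA server verification (mandatory gate, run by the user)
+- **Conclusive evidence is only available on that server (Claude has no access).** A track counts as "verified" only once the user confirms on that server that ingesting the 5150-doc corpus **completes without a double free (EXIT=0)**.
+- Each track spec documents the user-run verification procedure (commands + expected output).
+
+### 4.4 Thread/NUMA control
+- Each backend must be able to cap its intra-op/worker threads and pin them to one NUMA node. If it cannot cap them, the track fails.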
+
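+## 5. Tracks
+
+Preference/implementation order: **candle → ollama → A2 → A1**. (A1 is a no-code stopgap, so it is documented immediately to relieve the current outage, independent of implementation order.)
+
+| # | Track | Backend | Vectors change (reindex) | Key risks | Isolation branch |
+|---|------|--------|----|------|------|
+| 1 | candle | Pure Rust (candle `xlm_roberta`) | Yes | XLM-R padding_idx / parity / CPU performance | `feat/embed-candle` |
+| 2 | ollama | Separate process (Ollama `/api/embed`) | Yes | Model is not e5 → quality; ingest depends on Ollama | `feat/embed-ollama` |
+| 3 | A2 | onnxruntime directly (`ort` session) | No | Fidelity of reproducing tokenization/pooling after bypassing fastembed | `feat/embed-ort-direct` |
+| 4 | A1 | onnxruntime + launch wrapper (taskset/numactl) | No | Almost no code change; docs/launcher only | `docs/embed-numa-affinity` |
+
+### 5.1 Per-track test matrix (detailed in each track spec)
+
+All tracks:
+- Unit: `embed()` returns vectors with the correct dim and normalization (L2≈1).
+- Integration: `kebab ingest` on a small fixture → chunk/embedding counts.
+- **NUMA server verification** (§4.3): complete the 5150-doc run.
+
+Additional for vector-changing tracks (candle/ollama):
+- Parity: cosine similarity against onnxruntime e5-large on identical inputs (where possible), or golden-suite equivalence.
+- Golden: MRR/hit@k ≥ baseline (§4.2).
+- Verify the reindex procedure.
+
+Additional for vector-invariant tracks (A2/A1):
+- Regression: cosine ≈ 1.0 against existing e5-large vectors (A2 uses the same runtime, so they should be effectively identical).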
+
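+## 6. Decisions (finalized 2026-06-01)
+
+- **D1 early stop (confirmed by user)**: Proceed through the tracks in preference order, but if candle or ollama meets **the acceptable quality bar + NUMA safety**, **stop there** (later tracks are not pursued). If both are too low in quality, continue through A2 → A1.
+  - **Acceptable quality bar**: golden-suite MRR/hit@k shows no meaningful regression against the current e5-large (onnxruntime) baseline. candle uses the same e5-large weights, so passing parity satisfies this almost automatically → candle is likely the final stop. ollama uses a different model, so borderline results are the user's call.
+  - A2/A1 are the **fallback** if both candle and ollama fail (A2 needs zero reindexing and is quality-neutral).
+- **D2 immediate mitigation**: A1 (taskset/numactl) needs no code, so it is offered to the user as a workaround right away, independent of this work.
+- **D3 location of meta artifacts**: this meta-spec and the meta-plan live in `docs/superpowers/specs/`. Per-track specs are written when each track is reached.
+- **D4 impact on the frozen design**: diversifying embedding backends may require updating design §(embedding); sync when a track merges.
+
+## 7. Success criteria
+
+- At least one track completes the 5150-doc run (EXIT=0) on that NUMA server.
+- Any backend promoted to default is at or above the golden baseline.
+- Each track is verified independently on its own branch/worktree with documented tests.
+
+## 8. Sequencing gates
+
+1. The candle **spike** (Phase 0) must demonstrate parity + CPU performance + thread control before the full candle implementation proceeds. On failure, demote/skip the candle track and move to ollama.
+2. Once a track has an open PR and its NUMA server verification scheduled, the next track starts (omc-teams sequential single-team constraint).
+3. Vector-changing tracks must not be promoted to default before passing the golden gate.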
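Reviewer sketch (not part of the patch): the unit and parity checks in §5.1 of the meta-spec above reduce to a few vector operations. The following is a minimal std-only illustration, assuming embeddings are plain `f32` slices; the helper names (`l2_norm`, `cosine`, `max_abs_diff`, `looks_like_valid_embedding`) are hypothetical and are not taken from `kebab-core` or the parity test.

```rust
/// L2 (Euclidean) norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Scales `v` to unit length in place; an all-zero vector is left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let n = l2_norm(v);
    if n > 0.0 {
        for x in v.iter_mut() {
            *x /= n;
        }
    }
}

/// Cosine similarity; `None` when lengths differ or either norm is zero.
fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let (na, nb) = (l2_norm(a), l2_norm(b));
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    Some(dot / (na * nb))
}

/// Largest per-dimension absolute difference; `None` when lengths differ.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max))
}

/// Unit-level gate: expected dimension and L2 norm within `tol` of 1.
fn looks_like_valid_embedding(v: &[f32], dims: usize, tol: f32) -> bool {
    v.len() == dims && (l2_norm(v) - 1.0).abs() <= tol
}
```

A parity test would embed the same sentences with both backends, then take the minimum of `cosine` and the maximum of `max_abs_diff` across all pairs.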
--- a/docs/wire-schema/v1/config_migration.v1.schema.json
+++ b/docs/wire-schema/v1/config_migration.v1.schema.json
@@ -0,0 +1,38 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "config_migration.v1",
+  "description": "Result of `kebab config migrate` — schema reconciliation of a user's config.toml.",
+  "type": "object",
+  "required": [
+    "schema_version",
+    "config_path",
+    "dry_run",
+    "from_schema_version",
+    "to_schema_version",
+    "changed",
+    "changes"
+  ],
+  "properties": {
+    "schema_version": { "const": "config_migration.v1" },
+    "config_path": { "type": "string" },
+    "dry_run": { "type": "boolean" },
+    "from_schema_version": { "type": "integer" },
+    "to_schema_version": { "type": "integer" },
+    "changed": { "type": "boolean" },
+    "backup_path": { "type": ["string", "null"] },
+    "changes": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "required": ["kind", "path", "detail"],
+        "properties": {
+          "kind": {
+            "enum": ["added_section", "added_key", "removed_deprecated"]
+          },
+          "path": { "type": "string" },
+          "detail": { "type": "string" }
+        }
+      }
+    }
+  }
+}
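Reviewer sketch (not part of the patch): the `changes` array in the schema above records what reconciliation added. The HOTFIXES entry describes the real engine as `toml_edit`-based and comment-preserving; the sketch below illustrates only the additive, idempotent rule ("add what is missing, never touch existing values") on a flattened `path → value` map. The flattened representation, the `Change` struct, and the example keys and values in the usage note are simplifying assumptions.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// One entry of the report's `changes` array (`kind`, `path`, `detail`).
#[derive(Debug, PartialEq)]
struct Change {
    kind: &'static str,
    path: String,
    detail: String,
}

/// Adds every default key the user config lacks and reports each addition.
/// Existing user values are never overwritten, so a second run is a no-op.
fn reconcile(
    user: &mut BTreeMap<String, String>,
    defaults: &BTreeMap<String, String>,
) -> Vec<Change> {
    let mut changes = Vec::new();
    for (path, value) in defaults {
        // Only vacant slots are filled; occupied ones keep the user's value.
        if let Entry::Vacant(slot) = user.entry(path.clone()) {
            slot.insert(value.clone());
            changes.push(Change {
                kind: "added_key",
                path: path.clone(),
                detail: format!("default = {value}"),
            });
        }
    }
    changes
}
```

With a user map holding `search.default_k = 25` and defaults that also contain a `logging.level` key, the first call reports one `added_key` change and leaves `25` untouched; the second call returns an empty list, which corresponds to `changed: false` in the report.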
--- a/docs/wire-schema/v1/ingest_progress.schema.json
+++ b/docs/wire-schema/v1/ingest_progress.schema.json
@@ -14,6 +14,9 @@
        "scan_completed",
        "asset_started",
        "asset_finished",
+        "asset_chunked",
+        "expansion_progress",
+        "asset_timings",
        "embed_batch_started",
        "embed_batch_finished",
        "pdf_ocr_started",
@@ -33,7 +36,13 @@
      "enum": ["new", "updated", "skipped", "error"],
      "description": "asset_finished: per-asset outcome (mirrors `ingest_report.v1.items[].kind`)."
    },
-    "chunks":       { "type": "integer", "minimum": 0, "description": "asset_finished: chunk count produced for this asset." },
+    "chunks":       { "type": "integer", "minimum": 0, "description": "asset_finished / asset_chunked / expansion_progress (v0.24.0): chunk count produced for this asset." },
+    "done":         { "type": "integer", "minimum": 0, "description": "expansion_progress (v0.24.0, additive): chunks processed so far in the per-chunk alias-expansion loop (cache hits included). Throttled: emitted at most every 25 chunks or once per second, plus a final frame where done == chunks." },
+    "parse_ms":     { "type": "integer", "minimum": 0, "description": "asset_timings (v0.24.0, additive): parse phase wall-clock (ms). Markdown path only." },
+    "chunk_ms":     { "type": "integer", "minimum": 0, "description": "asset_timings (v0.24.0, additive): chunk phase wall-clock (ms). Markdown path only." },
+    "expansion_ms": { "type": "integer", "minimum": 0, "description": "asset_timings (v0.24.0, additive): alias-expansion phase wall-clock (ms). Markdown path only; 0 when expansion is disabled." },
+    "embed_ms":     { "type": "integer", "minimum": 0, "description": "asset_timings (v0.24.0, additive): embed + vector phase wall-clock (ms) — embedding, vector upsert, and stale-vector purge. Markdown path only." },
+    "store_ms":     { "type": "integer", "minimum": 0, "description": "asset_timings (v0.24.0, additive): SQLite persist phase wall-clock (ms) — put_asset/document/blocks/chunks only. Markdown path only." },
    "n_chunks":     { "type": "integer", "minimum": 0, "description": "embed_batch_started / embed_batch_finished: chunks in this embedding batch." },
    "ms":           { "type": "integer", "minimum": 0, "description": "embed_batch_finished / pdf_ocr_finished: wall-clock duration (ms). pdf_ocr_finished skip path 의 의미는 mixed (DCTDecode 부재 시 0, engine 실패 시 latency-before-bail)." },
    "chars":        { "type": "integer", "minimum": 0, "description": "pdf_ocr_finished: char count of OCR result. Skip 시 0." },
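Reviewer sketch (not part of the patch): the `done` description above specifies a throttle (at most every 25 chunks or once per second, plus a final frame where `done == chunks`), and the review follow-up guards that final frame with `done != last_done`. A minimal sketch of that rule follows. The type `ExpansionThrottle`, its method names, and the "25 chunks since the last emitted frame" reading of "every 25 chunks" are assumptions, not the actual `kebab-app` code.

```rust
use std::time::{Duration, Instant};

/// Decides when an `expansion_progress` frame should be emitted.
struct ExpansionThrottle {
    every: usize,       // emit once this many chunks have passed since the last frame
    interval: Duration, // ...or once this much time has passed
    last_done: usize,   // `done` value of the most recent emitted frame
    last_emit: Instant,
}

impl ExpansionThrottle {
    fn new(every: usize, interval: Duration) -> Self {
        Self { every, interval, last_done: 0, last_emit: Instant::now() }
    }

    /// Called after each chunk (cache hits included) with the running count.
    fn should_emit(&mut self, done: usize) -> bool {
        let due = done.saturating_sub(self.last_done) >= self.every
            || self.last_emit.elapsed() >= self.interval;
        if due {
            self.last_done = done;
            self.last_emit = Instant::now();
        }
        due
    }

    /// Final frame, guarded so it is skipped when the last throttled frame
    /// already reported `done` (throttle multiples) or when `done` is 0.
    fn should_emit_final(&mut self, done: usize) -> bool {
        if done != self.last_done {
            self.last_done = done;
            true
        } else {
            false
        }
    }
}

/// Returns the `done` values that would be emitted for a fast expansion loop.
fn simulate_frames(chunks: usize) -> Vec<usize> {
    let mut throttle = ExpansionThrottle::new(25, Duration::from_secs(1));
    let mut frames = Vec::new();
    for done in 1..=chunks {
        // The per-chunk alias expansion (or cache lookup) would happen here.
        if throttle.should_emit(done) {
            frames.push(done);
        }
    }
    if throttle.should_emit_final(chunks) {
        frames.push(chunks);
    }
    frames
}
```

When the loop finishes well within one second, 50 chunks yield frames at 25 and 50 with no duplicate final frame, 30 chunks yield 25 and 30, and 0 chunks yield no frame at all.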
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,206 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

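+## 2026-06-02: detailed ingest progress logging (visibility into per-asset phases, v0.24.0)
+
+**What was wrong.** Ingest progress events existed only at the asset (document)
+level (`asset_started` / `asset_finished`), so the parse / chunk / **expansion
+(alias LLM, called sequentially per chunk)** / embed / store phases inside a document
+were invisible. Expansion costs ~1-4 s per chunk (remote GPU Ollama), and a large document has hundreds to a thousand
+chunks, so that one document takes tens of minutes while the progress bar appears stuck at `1/5150` and the user cannot see the bottleneck.
+
+**What was added (wire `ingest_progress.v1`, additive, compatibility preserved).**
+Three variants were added to `IngestEvent`. Because the enum uses `#[serde(tag="kind")]`, adding a new `kind`
+is compatible with wire v1:
+
+- `asset_chunked { idx, total, chunks }`: emitted right after chunking (before expansion/embed)
+  to expose "this document has N chunks" immediately. Emitted on all three paths: markdown / image / pdf.
+- `expansion_progress { idx, total, done, chunks }`: emitted **throttled** during the expansion
+  loop (every 25 chunks or ≥1s, plus one more frame with `done == chunks` at the end).
+  Cache-hit chunks also count toward `done` (makes the fast-forward of a warm reindex visible). Emitting
+  on every chunk is forbidden, to avoid flooding the channel.
+- `asset_timings { idx, total, parse_ms, chunk_ms, expansion_ms, embed_ms,
+  store_ms }`: per-phase elapsed time for asset processing. Emitted on the **markdown path only**
+  (image/pdf have a different phase shape and are skipped; they emit only AssetChunked).
+
+**Design decision: AssetTimings event vs AssetFinished fields.** IMPL_BRIEF §1 proposed
+optional phase-timing fields on `AssetFinished`, while §2 proposed a new
+`AssetTimings` event as the alternative (recommended). The latter was chosen. `AssetFinished` is built at the call site
+(the `ingest_with_config_progress` loop), but the timing data exists only
+inside `ingest_one_asset`, so filling those fields would require changing `kebab_core::IngestItem`
+(a wire-stable struct) or adding separate plumbing. `ingest_one_asset` already
+holds the `progress` handle, so emitting a new event directly respects the crate
+boundary (kebab-core unchanged) and is cleaner. `AssetFinished` is untouched.
+
+**CLI rendering (`kebab-cli` progress.rs).** `asset_chunked` → progress bar message `→ N
+chunks`. `expansion_progress` → message `별칭 확장 {done}/{chunks}` ("alias expansion", live).
+`asset_timings` → at asset completion, a single line `⏱ parse Xs · chunk Ys · expand Zs · embed Ws
+· store Vs` (`fmt_ms`: milliseconds below 1s, seconds with one decimal at ≥1s). `--json` is handled automatically because
+`emit_json` serializes arbitrary events. `--quiet` suppresses output; on a non-TTY,
+expansion_progress is suppressed by default to avoid log flooding (covered by the progress bar message).
+
+**Verification.** `cargo clippy --workspace --all-targets -- -D warnings` exit 0,
+`cargo test -p kebab-app -p kebab-cli` exit 0. Unit tests: ingest_progress.rs
+(`kind` discrimination when serializing the 3 new variants + rewritten ordering invariant), progress.rs (`fmt_ms` unit
+switch), integration (new events flow through `--json` and human stderr). Live smoke test: a 2-document ingest
+shows `asset_chunked`/`asset_timings` in `--json` and the human `⏱ parse…·store…` line.
+The live expansion counter needs a remote LLM, so it is covered by unit/integration tests.
+
+**Review follow-up.** (1) Corrected the `store_ms` boundary: the stale-vector orphan purge (LanceDB I/O) was
+moved out of `store_ms` (SQLite persist only) into `embed_ms` (vector phase). Diagnostic
+accuracy: store_ms now means only the SQLite put_* calls (removes the misattribution where 920ms on an edit reindex was
+actually vector deletion). The purge is still unconditional and runs before the new upsert, so it is
+functionally equivalent. (2) The final `expansion_progress` frame is guarded by `done != last_done`, which
+removes the duplicate frame when chunks is a multiple of the throttle step and the 0/0 frame when chunks==0.
+
+**Known limitations.** The image/pdf paths have no phase timings (AssetChunked only).
+Suppressing expansion_progress on a non-TTY is intentional (use `--json` to observe everything when needed).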
+
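+## 2026-06-02: show ingest backend/device + KB transfer docs (v0.23.1)
+
+**Motivation.** Users could not see in the terminal whether a Metal build actually used the GPU and had to check Activity
+Monitor (the device log from `select_device()` goes only to the kb.log file, and the default
+EnvFilter=warn means `--verbose` is required). The README was also not specific
+about which DB files to move.
+
+**What.** (1) At ingest start, `kebab-cli` prints the embedding backend/model/dimensions as one stderr line
+(`임베딩 백엔드: candle (Metal/GPU 빌드) · 모델 …`, i.e. "embedding backend: candle (Metal/GPU build) · model …"), suppressed under
+`--json`/`--quiet`. The Metal label is based on `cfg!(feature="embed_metal")` (a build-time fact); the confirmed runtime
+device is still in kb.log (`candle device = …`). (2) The README section "외부 계산 + 로컬 검색" ("external compute + local search")
+now lists the 2 copy targets (`kebab.sqlite`/`sqlite`, `lancedb/`/`vector_dir`), the `[storage]`
+config keys, that `models/` and `assets/` need not be copied, the same-version/same-model requirement, and an rsync example.
+
+**Scope.** CLI output + docs only. No behavior, wire, schema, or vector changes. Version 0.23.0 → 0.23.1.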
+
+## 2026-06-02: candle Metal (Apple Silicon GPU) opt-in build feature
+
+**Motivation.** candle CPU embedding is slow at ~1.5-1.9 s/chunk for e5-large/512-tok,
+and giving it more cores (rayon/MKL) does not speed it up (the bottleneck is kernel efficiency). A large corpus (tens of thousands of chunks)
+takes hours on CPU. User workflow: **index quickly on the GPU of an M4 Pro Mac → copy only sqlite +
+lancedb to the Linux NUMA server → the server answers queries with CPU candle** (compatible because the vectors come from the same model;
+KB portability was confirmed via the 06-01 entry + relative workspace_path + stored chunks.text).
+
+**What.** Added a `metal` feature to `kebab-embed-candle`, which
+enables the metal backend of `candle-core/-nn/-transformers`. In a metal build `select_device()`
+picks `Device::new_metal(0)` (falling back to CPU on failure); non-metal builds keep the existing
+`Device::Cpu`. Added `.contiguous()` before the host copy (Metal rejects strided views in
+`to_vec2`; CPU tolerates them). Feature passthrough: `kebab-app/embed_metal` →
+`kebab-cli/embed_metal`. Build: `cargo build --release --features embed_metal` (macOS).
+
+**Constraints / verification split.** metal **compiles on macOS only**; Linux CPU machines (dev/server)
+build only the non-metal path (verified: clippy 0 + 6 candle unit tests + thread_cap + parity, exit 0).
+**Metal execution, speed, and vector parity (GPU vs CPU) are verified by the user on the M4 Pro** (not possible in Claude's
+Linux environment). The log line `candle device = Metal (GPU)` confirms GPU use.
+
+**Compatibility.** Default (non-metal) behavior and vectors are unchanged. No wire/schema changes. Version 0.22.0 →
+**0.23.0** (new opt-in build feature surface).
+
+amends: `docs/superpowers/specs/2026-06-01-embed-candle-track-spec.md` (§10 follow-up: GPU acceleration).
+
+## 2026-06-01: candle embedding provider (avoids the NUMA double free, opt-in)
+
+**What was wrong.** On a dual-socket NUMA server, running a large ingest (5150-doc) with
+`provider=fastembed` (onnxruntime) ran onnxruntime with a hard-coded 48 intra-op threads,
+which corrupted the heap under NUMA and killed the process with a double free. There was no
+config surface to reduce the thread count, and the ORT binding in fastembed 4.9 does not
+expose one.
+
+**Diagnosis / decision (approved by user 2026-06-01).** Decided to add an embedding
+provider that runs the same model, `intfloat/multilingual-e5-large`, on **candle (pure Rust)**.
+candle's CPU backend takes its thread count from the size of the global rayon pool,
+so a single `rayon::ThreadPoolBuilder::build_global` cap can pin
+the threads to a NUMA-safe number. **Goal: zero reindexing** (`embedding_version`
+kept). The Phase 0 spike (commit 76841af) demonstrated candle vs onnxruntime parity at **cosine
+1.000000**, and the parity test in this Track 1 implementation re-measured the per-dimension max
+absolute error to confirm it.
+
+**What was touched.**
+- New crate `crates/kebab-embed-candle`: implements `kebab_core::Embedder`
+  (`CandleEmbedder`). It absorbs the spike pipeline (safetensors via hf-hub → XLM-RoBERTa
+  forward → attention-mask mean pooling → L2 → e5 prefix) into
+  production. The candle dependency tree is isolated in this crate (zero dependencies on other kebab-*
+  crates besides core/config, per the design §8 boundary). Model cache: `{model_dir}/candle/`.
+- Thread cap: `[models.embedding].num_threads` (u32, default 0=auto) + env
+  `KEBAB_EMBED_THREADS` (takes precedence). In `CandleEmbedder::new`, if n>0 the global rayon
+  pool is capped once (a no-op if already initialized).
+- Injection branch: `kebab-app::App::embedder()` branches on `config.models.embedding.provider`:
+  `fastembed`/`onnx`/(empty) → the existing `FastembedEmbedder` (behavior unchanged),
+  `candle` → `CandleEmbedder`, unknown value → error. `none` keeps the existing lexical-only mode.
+- Removed the spike crate `crates/spike-embed-candle` (its lessons were absorbed into production).
+- Version 0.21.1 → **0.22.0** (new config surface, pre-1.0 minor bump).
+
+**Parity evidence.** candle vs `FastembedEmbedder` (onnxruntime), same 10 sentences
+(mixed Korean/English, e5 `passage:`/`query:` prefixes): **cosine_min = 1.000000,
+per-dimension max absolute error = 2.01e-7** (f32 kernel rounding level, about 50× below
+the threshold that would affect ranking). Reproduce: `cargo test -p kebab-embed-candle --release -- --ignored
+--nocapture` (`crates/kebab-embed-candle/tests/parity.rs`; needs the ~2GB model, so it is
+excluded from CI by default). These numbers are the basis for keeping `embedding_version` (zero reindexing).
+
+**Compatibility.** Behavior and vectors of the fastembed default path are unchanged. `embedding_version`
+is kept → existing indexes are reused (zero reindexing). No wire schema changes. Old config.toml files
+parse as-is, with `num_threads` filled in by the serde default (0).
+
+**Remaining gate (run by user, not possible for Claude).** Whether ingest with `provider=candle`
+on that dual-socket NUMA server completes with EXIT=0 and no double free. The user's
+deployment and real-world use double as this verification (meta-spec §4.3).
+
+**Dogfooding (2026-06-02, single-socket 12-thread VM).** Full reindex of `/build/dogfood/corpus` with
+`provider=candle` + `config-candle.toml` (expansion off, to isolate the embedder):
+**scanned=998, new=997, errors=0, stderr=0, KB 997 docs /
+23,151 chunks**, duration ≈ 34,329 s (9.5 h). candle completed 23k+ chunks with zero memory
+errors, the opposite of onnxruntime dying at 6/5150 on the server.
+(This VM is non-NUMA, so it does not reproduce NUMA itself, but candle never calls
+onnxruntime, so the same class of crash is structurally impossible.)
+
+**A1 (taskset/numactl) workaround refuted on the real server (2026-06-02).** The user ran
+`taskset -c 0-3 kebab ingest` (fastembed/onnx binary) on the NUMA server → even limited to 4 cores,
+it died at 6/5150 with `세그멘테이션 오류 (core dumped)` (segmentation fault). Reducing threads did not
+eliminate the onnxruntime heap corruption (the crash point only moved from 3 to 6). Conclusion: this crash is
+not a thread-*count* problem but a memory-safety defect in onnxruntime native code →
+**A1 is an unreliable workaround. candle (onnxruntime-free) is the only real fix.**
+
+**MKL acceleration: negative result (2026-06-02).** To make "candle use more cores", the candle
+`mkl` feature (Intel MKL) was benchmarked (e5-large, 512-tok chunks, N=32):
+pure-Rust 1857 ms/chunk (381% CPU) vs MKL 2574 ms/chunk (896% CPU, rayon12+mkl12)
+/ 2792 ms/chunk (817% CPU, rayon1+mkl12). **MKL uses more cores but is 38-50% slower in every
+configuration** (MKL 2020.1 sgemm + thread overhead/oversubscription; candle 0.10.2 also fails to link
+because f16 `hgemm_` is unresolved, so the benchmark worked around it with a stub that is never called). In addition,
+pure-Rust throughput is unchanged between rayon 8↔12 (~1.86 s/chunk): the bottleneck is not core count
+but candle e5-large/512tok kernel efficiency. **Conclusion: MKL not adopted, pure Rust kept (safest
+and faster on CPU). The speed levers are chunk length/model size/GPU, not cores.**
+
+amends: `docs/superpowers/specs/2026-06-01-embed-candle-track-spec.md`.
+
+## 2026-05-31: config migration (`kebab config migrate`)
+
+**Trigger**: Even as the config.toml schema evolves (e.g. `[ingest.expansion]` in v0.21.0), existing user files stay compatible only in *behavior* via serde defaults; new sections are never written to the file, so users had no way to learn that the knobs exist. Unlike the DB with its V00X refinery migrations, config had no migration mechanism, so one was added. Design: `docs/superpowers/specs/2026-05-31-config-migration-design.md`, plan: `docs/superpowers/plans/2026-05-31-config-migration.md`, PR #198.
+
+### Mechanism
+
+`kebab config migrate` performs (1) **reconciliation**: sections/keys that exist in the `Config::defaults()` structure but are missing from the user's file are added with comments via `toml_edit` (version-independent, idempotent), and (2) a **step chain**: `schema_version`-driven non-additive transforms (the first step, v1→v2, removes `workspace.include`, p9-fb-25). `init` and migrate share `annotated_default_document()` as the single source of comments and headers, so init-generated configs also carry section comments. The `schema_version` default goes 1→2 (sync marker + step axis). Three safety axes = idempotency, backup (`.bak`, byte-identical to the original), and dry-run, plus tmp atomic rename (round-trip verified). Pure transforms live in `kebab-config/migrate.rs`; the I/O facade is `kebab-app`.
+
+### Dogfooding evidence (v0.21.0 release binary)
+
+A mock old-schema config (`schema_version=1`, `[workspace]`+`[search]`+`[rag]`, with `workspace.include` present, and user-edited `default_k=25`/`score_gate=0.8` plus inline comments):
+
+| Scenario | Result |
+|----------|------|
+| `migrate --dry-run` | Lists 22 changes, **file not modified** |
+| `migrate` | Applies v1→v2, `.bak` is **byte-identical to the original** (diff 0) |
+| Values/comments preserved | `root="~/MyNotes" # 내가 직접 바꾼…`, `default_k=25`, `score_gate=0.8` retained |
+| Deprecated cleanup | `workspace.include` removed (grep 0) |
+| Visibility | `[ingest.expansion]`·`[logging]`·`[pdf.ocr]` appear |
+| Idempotent | Re-run → `config 이미 최신입니다 (schema v2)` |
+| doctor | `✓ config_migration  config up to date (schema v2)` |
+| `--json` | `config_migration.v1` (kind=added_section/removed_deprecated) |
+
+### Known limitations / decisions
+
+- Missing sections are appended at the end of the table (their order is not preserved; values, comments, and existing order are preserved).
+- A wholly missing parent is recorded as one entry at the parent path; a partially present parent is recorded at leaf paths (difference in recursion depth).
+- A doctor `config_migration` ok=false makes the overall `DoctorReport.ok` false (intentional; the hint gives the corrective command, no warn state introduced).
+- The `schema_version` bump (1→2) is additive (no data invalidation, read compatibility kept) → not a trigger for a DB/wire breaking release. The new CLI subcommand + doctor check + changed init output are user-visible surface.
+
 ## 2026-05-31: doc-side expansion alias improvements + derivation cache (V012)
 
 **Trigger**: Validate the effect of Phase 2 doc-side expansion (aliases) at real-use scale (Korean Namuwiki, ~1000-document CS corpus), and resolve the alias generation cost revealed in the process with a "content-hash-based derivation cache". v0.21.0 cut. Measurement details: `docs/superpowers/handoffs/2026-05-31-namu-wiki-alias-cache-study.md`, design: `docs/superpowers/specs/2026-05-31-derivation-cache-design.md`.
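Reviewer sketch (not part of the patch): the 2026-06-02 HOTFIXES entry in the diff above describes the CLI timing line and the `fmt_ms` rule (milliseconds below 1 s, one-decimal seconds from 1 s up). A minimal illustration follows. The exact unit suffixes, the `AssetTimings` struct layout, and the `render_timings` name are assumptions; only the five field names and the overall line shape come from the entry.

```rust
/// Formats a duration given in milliseconds: whole milliseconds below 1s,
/// otherwise seconds with one decimal place. Suffix spelling is an assumption.
fn fmt_ms(ms: u64) -> String {
    if ms < 1000 {
        format!("{ms}ms")
    } else {
        format!("{:.1}s", ms as f64 / 1000.0)
    }
}

/// Per-phase wall-clock values carried by an `asset_timings` event.
struct AssetTimings {
    parse_ms: u64,
    chunk_ms: u64,
    expansion_ms: u64,
    embed_ms: u64,
    store_ms: u64,
}

/// Renders the one-line phase summary shown when an asset finishes.
fn render_timings(t: &AssetTimings) -> String {
    format!(
        "⏱ parse {} · chunk {} · expand {} · embed {} · store {}",
        fmt_ms(t.parse_ms),
        fmt_ms(t.chunk_ms),
        fmt_ms(t.expansion_ms),
        fmt_ms(t.embed_ms),
        fmt_ms(t.store_ms),
    )
}
```

Switching units at the 1 s boundary keeps short phases such as parse and chunk readable in milliseconds while long phases such as expansion stay compact.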
Author	SHA1	Message	Date
altair823	acb4fa6c65	Merge pull request 'feat(ingest): per-asset phase progress logging (asset_chunked/expansion_progress/asset_timings)' (#201) from feat/ingest-progress-detail into main Reviewed-on: #201	2026-06-02 17:26:47 +00:00
altair823	8bfa4ba76e	fix(ingest-progress): address review: correct the store_ms boundary + guard duplicate expansion frames - Removed the stale-vector orphan purge (LanceDB I/O) from store_ms → moved to the embed/vector phase (embed_ms). store_ms now means only SQLite put_* (diagnostic accuracy; removes the 920ms misattribution on edit reindex). The purge is still unconditional and runs before the upsert. - Guarded the final expansion_progress frame with done != last_done (removes the duplicate frame on throttle multiples + the 0/0 frame when chunks==0). - schema/HOTFIXES: corrected the store_ms/embed_ms descriptions + removed a dangling IMPL_REPORT reference. clippy -D warnings 0, test 312 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 14:49:02 +00:00
altair823	ad0ccf4ccf	chore(ingest-progress): remove process artifacts before PR Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 14:13:36 +00:00
altair823	b351523e51	docs(worktree): IMPL_BRIEF + IMPL_REPORT for ingest-progress-detail Work input (brief) and output evidence (report: changes/events/exit-code verification/smoke samples/remaining risks). Worktree meta artifacts that the main session may drop when tidying the PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 13:58:33 +00:00
altair823	a48b055358	feat(ingest): per-asset phase progress logging (asset_chunked/expansion_progress/asset_timings) + v0.24.0 Adds visibility into the phases inside a document to ingest progress events, which were previously asset (document) level only. Fixes the progress bar appearing stuck at 1/N while a large document spends tens of minutes in expansion (alias LLM, sequential per chunk). wire ingest_progress.v1 additive (backward-compat): - asset_chunked {idx,total,chunks}: right after chunking, on all markdown/image/pdf paths - expansion_progress {idx,total,done,chunks}: throttled in the expansion loop (25 chunks or 1s, done==chunks at the end). Cache hits also count toward done - asset_timings {idx,total,parse_ms,chunk_ms,expansion_ms,embed_ms,store_ms}: per-phase wall-clock on the markdown path Design: to avoid changing kebab_core::IngestItem (wire-stable), timings are emitted directly by ingest_one_asset as a new AssetTimings event (AssetFinished unchanged). CLI (progress.rs): progress bar sub-message (→ N chunks / 별칭 확장 done/chunks) + one phase-timing line at asset completion (fmt_ms). TUI reducer no-op arm. Verification: clippy -D warnings exit 0; cargo test -p kebab-app -p kebab-cli 312 passed/0 failed. Rewrote the ordering-invariant test + added new serialization tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 13:58:27 +00:00
altair823	581e1d5d55	feat(cli): show embedding backend/device in one line at ingest + README KB transfer docs (v0.23.1) - kebab-cli ingest: at start, prints `임베딩 백엔드: <provider> (Metal/GPU 빌드\|CPU) · 모델 …` to stderr (suppressed under --json/--quiet). The Metal label is based on cfg!(feature=embed_metal); the confirmed runtime device is in kb.log (`candle device = …`). - README: the '외부 계산 + 로컬 검색' section gains the copy targets (kebab.sqlite/sqlite, lancedb/vector_dir) + [storage] config keys + no need to copy models/assets + same-version/model requirement + rsync example. - Version 0.23.0 → 0.23.1 (CLI output + docs only, behavior/schema unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:25:45 +00:00
altair823	c17d6e67a8	Merge pull request 'feat(embed): candle Metal (Apple Silicon GPU) opt-in build feature' (#200 ) from feat/embed-candle-metal into main Reviewed-on: #200	2026-06-02 11:40:52 +00:00
altair823	af8fd34716	docs(embed): add cargo install --features embed_metal instructions to README Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:38:28 +00:00
altair823	369aeb3d24	feat(embed): candle Metal (Apple Silicon GPU) opt-in build feature + v0.23.0 - kebab-embed-candle: `metal` feature → candle metal backend; select_device() picks Device::new_metal(0) (CPU fallback) under the feature, else Device::Cpu. .contiguous() before to_vec2 (Metal rejects strided views; CPU tolerates). - feature passthrough: kebab-app/embed_metal → kebab-cli/embed_metal. Build on macOS: cargo build --release --features embed_metal. - default (non-metal) path unchanged: clippy 0, candle units + thread_cap + parity pass. - README + HOTFIXES: Mac-GPU-ingest → copy sqlite+lancedb → server CPU-query workflow. - version 0.22.0 → 0.23.0 (opt-in build surface). macOS-only compile; Metal execution/speed/parity validated by user on M4 Pro (not buildable on the Linux CI/dev machine). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:37:08 +00:00
altair823	99f8cfa691	Merge pull request 'feat(embed): candle embedding provider (NUMA-safe, opt-in)' (#199) from feat/embed-candle into main Reviewed-on: #199	2026-06-02 10:14:39 +00:00
altair823	d85d7348a5	docs(embed-candle): record dogfooding + A1 refutation + MKL negative-result evidence - HOTFIXES + release-notes: full candle dogfooding run, 997 docs/23,151 chunks/0 errors (9.5h) - A1 (taskset -c 0-3) refuted on the real server: onnxruntime segfaults even when limited to 4 cores → candle is the only real fix - MKL acceleration negative result: uses more cores but is 38-50% slower → not adopted, pure Rust kept - Reconfirmed parity at 2.01e-7, documented the performance trade-off Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 09:08:12 +00:00
altair823	edac3ae737	chore(embed-candle): address PR #199 round-1 review: note the candle model caveat in SMOKE.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 17:01:35 +00:00
altair823	6ec4e6809f	fix(embed-candle): address round-1 review - commit track-spec + meta-spec/plan into branch (HIGH: dangling `amends:` ref) - inline parity evidence (cosine 1.0, max_abs_diff 2.01e-7) into HOTFIXES + release notes; drop refs to deleted IMPL_REPORT/SPIKE_REPORT (MEDIUM) - model guard: reject non-e5-large `model` before the 2GB download so model_id() can't mislabel vectors (MEDIUM) + unit test - parity test now covers BOTH query: and passage: prefixes (MEDIUM) - guard encodings.first() index; document zero-attention/pooling invariant; clarify embed_batch prefixing doc (LOW) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 16:54:20 +00:00
altair823	1011c75fff	chore(embed-candle): remove spike/impl process artifacts before PR Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 16:37:46 +00:00
altair823	8f7b6ee538	feat(embed): candle embedding provider (NUMA-safe, opt-in) + v0.22.0 On a dual-socket NUMA server, fastembed (onnxruntime) hard-codes 48 intra-op threads, corrupting the heap under NUMA so that ingest dies with a double free. To avoid this, add an opt-in embedding provider that runs the same multilingual-e5-large model in pure Rust (candle). - New crate kebab-embed-candle: CandleEmbedder (kebab_core::Embedder). hf-hub safetensors → XLMRobertaModel forward → mask mean-pool → L2 → e5 prefix. The candle dependency tree is isolated in this crate (zero kebab-* dependencies besides core/config). - Thread cap: [models.embedding].num_threads + env KEBAB_EMBED_THREADS → caps the global rayon pool once (the NUMA-safety lever). - kebab-app::embedder() branches on provider (fastembed/onnx/"" → existing path unchanged, candle → CandleEmbedder, unknown value → error). - Removed the Phase 0 spike crate (absorbed into production). - Version 0.21.1 → 0.22.0 (new config surface, pre-1.0 minor bump). Parity: cosine_min=1.000000, max abs diff=2.01e-7 (< 1e-5) → embedding_version kept, zero reindexing. fastembed default behavior/vectors unchanged. No wire schema changes. Verification (file + exit code): clippy -D warnings EXIT=0 (0 warnings), test EXIT=0 (candle unit 5 + thread_cap rayon=4 + config 68), parity #[ignore] EXIT=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 14:52:25 +00:00
altair823	76841af7d3	spike(embed-candle): candle e5-large feasibility check: VERDICT PASS Track 1 / Phase 0 isolated spike. Runs intfloat/multilingual-e5-large on candle (pure Rust) and compares it with the existing onnxruntime FastembedEmbedder. Results: - Parity: cosine min=mean=1.000000 on 10 Korean/English sentences (exact match) - padding_idx: XLM-R convention correct (double-checked via source + parity) - Thread control: RAYON_NUM_THREADS=4 confirmed to cap compute threads 12→4 (the fastembed 4.9.1 hard-coded-48 + no-override problem is structurally absent) - latency: batch=32 candle 2.161s vs fastembed 0.536s (~4×, 4 vs 12 threads) → recommend proceeding with the full candle implementation (GREEN). Details in SPIKE_REPORT.md. candle dependencies are isolated to crates/spike-embed-candle. No behavior change in production crates. Conclusive NUMA verification must be run by the user on that dual-socket server (meta-spec §4.3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 14:23:51 +00:00
altair823	980e20fd8d	docs: add config migrate playbook to SMOKE/DOGFOOD SMOKE gains config migrate smoke steps (dry-run/apply/idempotent/--json); DOGFOOD §9 gains a schema migration scenario (.bak byte-identical, values preserved, visibility, idempotent, doctor). Tag moved so this is included in v0.21.1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 13:58:08 +00:00
altair823	cd79ed326c	chore: bump version 0.21.0 → 0.21.1 config migration (kebab config migrate, PR #198): new CLI subcommand + doctor check + init section comments + wire config_migration.v1 + schema_version 1→2. Additive change (no data invalidation), hence a patch bump. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 13:51:56 +00:00
altair823	9dbf9d781d	Merge pull request 'feat(config): config.toml migration (kebab config migrate)' (#198) from feat/config-migration into main Reviewed-on: #198	2026-05-31 13:48:10 +00:00
altair823	9501edd82b	docs: sync config migrate surface (README/HOTFIXES/HANDOFF) README Configuration gains a kebab config migrate bullet, HOTFIXES a dated entry (mechanism + dogfooding evidence table + limitations), HANDOFF one line. The lib.rs backup path keeps with_extension (review nit: works correctly for .toml configs, avoids regression risk). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 13:25:42 +00:00
altair823	4b4a4c0b32	fix(config): keep the detailed list of supported extensions in the init header Restored the HEADER of annotated_default_document so it preserves the detailed '처리 가능한 형식' ("supported formats") list (.md / .png .jpg .jpeg / .pdf) from the existing init header. Keeps the init_template contract of p9-fb-25 (supported-extension guidance). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:46:45 +00:00
altair823	f2cc325cf3	feat(cli): kebab config migrate subcommand + wire config_migration.v1 - Cmd::Config { Migrate { --dry-run } }, emits config_migration.v1 under --json. - wire_config_migration (ConfigMigrationReport carries schema_version itself). - Registered config_migration.v1 in schema.rs WIRE_SCHEMAS + added the JSON schema file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:09:31 +00:00
altair823	b7e022a5e3	feat(app): config migrate facade + shared init comments + doctor check - config_migrate_with_config_path: backup (.bak) + atomic write (tmp→rename) + dry-run, with round-trip verification so the original is preserved on failure. Returns ConfigMigrationReport. - init_workspace uses annotated_default_document() (section comments included). - Added a config_migration check to doctor (ok=false + hint when out of sync). - 4 tests in tests/config_migrate.rs (backup/atomic/dry-run/idempotent/doctor) pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:09:31 +00:00
altair823	bd7c4fd7ef	feat(config): config migration engine (reconcile + step chain) - Added the toml_edit 0.22 dependency - migrate.rs: CURRENT_SCHEMA_VERSION=2, annotated_default_document (shared source of the comment catalog), reconcile (adds missing sections/keys with comments, never touches values), step_1_to_2 (removes workspace.include), migrate_document (step+reconcile+stamp) - schema_version default 1 → 2 - 56 tests green, clippy -D warnings clean Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:09:31 +00:00
altair823	4dcb4a45d6	feat(config): migrate module scaffolding + toml_edit dependency Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:41:32 +00:00
altair823	6d86214060	docs(plan): config migration implementation plan (TDD, 13 tasks) Separates reconcile (additive) from the step chain (non-additive); init/migrate share annotated_default_document; app facade with backup + atomic write; doctor check; CLI config migrate; wire config_migration.v1. bite-sized TDD steps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:39:31 +00:00
altair823	6bbb8f854b	docs(spec): config migration design contract Spec that finalizes the brainstorm results from the kickoff handoff (#197). Trigger = explicit command `kebab config migrate` + doctor guidance; comment preservation = partial edits via toml_edit; mechanism = hybrid of reconciliation (additive) + step chain (non-additive). init/migrate share the commented default document. Three safety axes (idempotent, backup, dry-run) + atomic write. New wire schema config_migration.v1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:34:19 +00:00
altair823	2a4df4d48d	Merge pull request 'docs: config migration handoff kickoff' (#197) from docs/config-migration-kickoff into main	2026-05-31 11:11:17 +00:00
altair823	16f3d6eef2	docs: config migration handoff kickoff Handoff document for a separate session on automatically migrating existing user files when the config.toml schema evolves. Covers the current state (serde default forward-compat exists / no file migration / schema_version is decorative), the core difficulty (comment preservation), three design options (full rewrite / toml_edit append / backup), triggers (command vs automatic), and methodology (the v0.21.0 PR #195/#196 pattern). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:11:08 +00:00