chore: bump version 0.17.1 → 0.17.2

v0.17.1 post-dogfood polish cut. Bundles two PRs into one release:

- PR #164: separate `[image.ocr] request_timeout_secs` knob (closes an item left undone in v0.17.1). Applies the LLM pattern to the OCR adapter unchanged, but as its own knob so that OCR and LLM can be tuned independently (their cold-start patterns differ).
- PR #165: FTS5 column filter that restricts matching to `text` and keeps `heading_path` out of it, plus a raw-mode escape hatch (closes the JSON noise noted in the 2026-05-24 v0.17.0 trigram entry). lexical.rs wraps the non-raw branch result as `text : (<expr>)`; the index itself stays verbatim as of V007. Raw mode `'heading_path : <token>'` opts back in.

Both are additive (old configs stay compatible) and need no re-ingest. Only the binary has to be swapped.

Adds a v0.17.2 entry to the HANDOFF one-line summary and post-merge decisions sections. Updates the anchors of the two HOTFIXES entries from `post-v0.17.1 dogfood` to `v0.17.2`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge pull request 'feat(search): heading_path FTS5 text column filter' (#165) from feat/heading-text-column-filter into main
2026-05-25 05:55:50 +00:00 · 2026-05-25 05:48:22 +00:00 · 2026-05-25 05:45:41 +00:00 · 2026-05-25 05:44:21 +00:00 · 2026-05-25 05:40:51 +00:00 · 2026-05-25 05:14:27 +00:00
14 changed files with 548 additions and 77 deletions
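
The `text : (<expr>)` wrap described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `lexical.rs::build_match_string`: the helper name `wrap_for_text_column` and the `raw_mode` flag are assumptions made for the example. Only the FTS5 column-filter syntax (`<column> : (<query>)`) and the pass-through of raw queries come from the text above.

```rust
/// Hypothetical helper (name and signature are assumptions, not the
/// project's API): restricts a non-raw FTS5 MATCH expression to the
/// `text` column and leaves raw-mode queries untouched.
fn wrap_for_text_column(expr: &str, raw_mode: bool) -> String {
    if raw_mode {
        // Raw mode: the caller's FTS5 syntax is passed through as-is, so
        // an explicit `heading_path : <token>` filter still works.
        expr.to_string()
    } else {
        // FTS5 column filter: `<column> : (<query>)` limits the whole
        // parenthesised expression to that column.
        format!("text : ({expr})")
    }
}

fn main() {
    // Non-raw query: matching is limited to the `text` column.
    println!("{}", wrap_for_text_column("\"abc\" OR \"def\"", false));
    // Raw query: explicit opt-in to heading search.
    println!("{}", wrap_for_text_column("heading_path : install", true));
}
```

Because only the match string changes, the stored index is untouched, which is consistent with the "no re-ingest" note above.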
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -4127,7 +4127,7 @@ dependencies = [

 [[package]]
 name = "kebab-app"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "base64 0.22.1",
@@ -4172,7 +4172,7 @@ dependencies = [

 [[package]]
 name = "kebab-chunk"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4189,7 +4189,7 @@ dependencies = [

 [[package]]
 name = "kebab-cli"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "clap",
@@ -4210,7 +4210,7 @@ dependencies = [

 [[package]]
 name = "kebab-config"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "dirs 5.0.1",
@@ -4225,7 +4225,7 @@ dependencies = [

 [[package]]
 name = "kebab-core"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4239,7 +4239,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4253,7 +4253,7 @@ dependencies = [

 [[package]]
 name = "kebab-embed-local"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "fastembed",
@@ -4266,7 +4266,7 @@ dependencies = [

 [[package]]
 name = "kebab-eval"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "kebab-app",
@@ -4285,7 +4285,7 @@ dependencies = [

 [[package]]
 name = "kebab-llm"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -4294,7 +4294,7 @@ dependencies = [

 [[package]]
 name = "kebab-llm-local"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "kebab-config",
@@ -4311,7 +4311,7 @@ dependencies = [

 [[package]]
 name = "kebab-mcp"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "kebab-app",
@@ -4329,7 +4329,7 @@ dependencies = [

 [[package]]
 name = "kebab-normalize"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -4344,7 +4344,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-code"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "gix",
@@ -4367,7 +4367,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-image"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "ab_glyph",
 "anyhow",
@@ -4391,7 +4391,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-md"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "kebab-core",
@@ -4408,7 +4408,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-pdf"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4421,7 +4421,7 @@ dependencies = [

 [[package]]
 name = "kebab-parse-types"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "kebab-core",
 "serde",
@@ -4429,7 +4429,7 @@ dependencies = [

 [[package]]
 name = "kebab-rag"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4450,7 +4450,7 @@ dependencies = [

 [[package]]
 name = "kebab-search"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "globset",
@@ -4469,7 +4469,7 @@ dependencies = [

 [[package]]
 name = "kebab-source-fs"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4488,7 +4488,7 @@ dependencies = [

 [[package]]
 name = "kebab-store-sqlite"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "blake3",
@@ -4509,7 +4509,7 @@ dependencies = [

 [[package]]
 name = "kebab-store-vector"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "arrow",
@@ -4533,7 +4533,7 @@ dependencies = [

 [[package]]
 name = "kebab-tui"
-version = "0.17.0"
+version = "0.17.2"
 dependencies = [
 "anyhow",
 "crossterm",
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -31,7 +31,7 @@ edition       = "2024"
 rust-version  = "1.85"
 license       = "MIT OR Apache-2.0"
 repository    = "https://github.com/altair823/kebab"
-version       = "0.17.0"
+version       = "0.17.2"

 [workspace.dependencies]
 anyhow       = "1"
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -4,7 +4,7 @@

 ## One-line summary

-P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) + all of P10 merged (currently **v0.16.1**). `kebab ingest` handles markdown / image / PDF / source code (Rust / Python / TS / JS / Go / Java / Kotlin / C / C++) / Tier 2 resource files (yaml/k8s / dockerfile / toml / json / xml / groovy / go-mod) + Tier 3 paragraph fallback (shell / non-k8s YAML / AST failure cases). `kebab search` / `kebab ask` return results across media with page / code citations. `kebab tui` provides 4 panels (Library + Search + Ask + Inspect). Completing P10-1D (C + C++) finished the Tier 1 chunker family, and the k8s multi-resource chunk_id collision found right afterwards in dogfooding round 2 was resolved by the v0.16.1 hotfix. The only structurally remaining component is P9-5 (desktop tauri); P8 (audio) is on hold at the user's request.
+P0–P5 + P6 + P7 + P9-1/2/3/4 (Library / Search / Ask / Inspect) + all of P10 merged (currently **v0.17.2**). `kebab ingest` handles markdown / image / PDF / source code (Rust / Python / TS / JS / Go / Java / Kotlin / C / C++) / Tier 2 resource files (yaml/k8s / dockerfile / toml / json / xml / groovy / go-mod) + Tier 3 paragraph fallback (shell / non-k8s YAML / AST failure cases). `kebab search` / `kebab ask` return results across media with page / code citations. `kebab tui` provides 4 panels (Library + Search + Ask + Inspect). **v0.17.0 cut (2026-05-24)**: Korean trigram FTS5 tokenizer (PR #159) + C typedef alias unit (PR #160) + `code_lang_chunk_breakdown` additive (PR #161). **v0.17.1 cut (2026-05-25)**: after extended dogfooding, the `[models.llm] request_timeout_secs` config knob (PR #162) + docs on installing ollama without sudo and the recommended `kebab ask --stream` UX (PR #163). **v0.17.2 cut (2026-05-25)**: v0.17.1 post-dogfood polish, consisting of a separate `[image.ocr] request_timeout_secs` knob (PR #164, closes an item left undone in v0.17.1) + an FTS5 column filter for text-only matching that keeps `heading_path` out, with a raw-mode escape hatch (PR #165, closes the JSON noise from the 2026-05-24 v0.17.0 trigram entry). For the detailed impact see the [v0.17.0 release notes](https://gitea.altair823.xyz/altair823-org/kebab/releases/tag/v0.17.0) + [v0.17.1 release notes](https://gitea.altair823.xyz/altair823-org/kebab/releases/tag/v0.17.1) + [v0.17.2 release notes](https://gitea.altair823.xyz/altair823-org/kebab/releases/tag/v0.17.2). The only structurally remaining component is P9-5 (desktop tauri); P8 (audio) is on hold at the user's request.

 ## Phase roadmap

@@ -32,7 +32,11 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.

 The dated log of every deviation / hotfix found after merge lives in [tasks/HOTFIXES.md](tasks/HOTFIXES.md). This summary lists only the items that "save a lot of time to know when taking over the project":

-- **2026-05-24 v0.17.0 Korean trigram tokenizer adopted (closure of 2026-05-22 Korean lexical)**: `chunks_fts` moves from FTS5 `unicode61` to `trigram` via the V007 migration (automatic backfill, no re-ingest needed). `lexical.rs::build_match_string` is redesigned to be trigram-aware: a multi-token Korean query (`해시 충돌`) hits as a whole-phrase candidate, and a mixed Korean/English query (`Rust 충돌은`) is OR-combined. Queries of 2 characters or fewer return 0 hits plus a `hint` in CLI/TUI/wire. English lexical search also becomes substring matching (recall ↑ / word boundaries ↓). `kebab.sqlite` grows roughly 2-5x (trigram index). Details: `tasks/HOTFIXES.md` (2026-05-24).
+- **2026-05-25 v0.17.2 post-v0.17.1 polish (PR #164 + #165)**: closes two v0.17.1 follow-ups. (1) Separate `[image.ocr] request_timeout_secs` knob: removes the hard 300s `crates/kebab-parse-image/src/ocr.rs::REQUEST_TIMEOUT` and applies the LLM-side pattern (PR #162) to the OCR adapter. Split into its own knob by user decision (OCR and LLM cold-start patterns differ, so they are tuned independently). Closes an item left undone in v0.17.1. (2) Closes the issue where the `heading_path` column of `chunks_fts` was trigram-indexed including its JSON notation and path segments, which allowed query false positives. `lexical.rs::build_match_string` wraps the non-raw branch result as `text : (<expr>)`: the heading index stays verbatim as of V007 and only matching is restricted to text. To search headings explicitly, use the raw-mode escape hatch `'heading_path : <token>'` (SKILL.md updated). Both are additive (old configs compatible) and need no re-ingest. Details: `tasks/HOTFIXES.md` (the two 2026-05-25 v0.17.2 entries).
+- **2026-05-25 v0.17.1 post-dogfood (PR #162 + #163)**: a bundle of two follow-ups found in extended dogfooding (16 GB, CPU only, trying gemma4:e4b). (1) `crates/kebab-llm-local/src/ollama.rs::REQUEST_TIMEOUT` hard 300s → `[models.llm] request_timeout_secs` config + env override (additive, default 300; the docs state that `=0` means "immediate timeout", not disable). (2) README + SMOKE docs on installing ollama without sudo / systemd, recommended ≤4B Q4 models, and the recommended `kebab ask --stream` UX. Additive only: old config / wire stay compatible. Details: `tasks/HOTFIXES.md` (2026-05-25).
+- **2026-05-24 v0.17.0 PR-C `code_lang_chunk_breakdown` additive (closure of 2026-05-22 LOW)**: a new key in `schema.v1.stats` that aggregates chunk counts. Sister to the existing `code_lang_breakdown` (doc count). Also corrects the JSON schema descriptions of the two existing fields, which wrongly said "chunk count", to "doc count". Wire additive: no schema_version bump needed. Details: `tasks/HOTFIXES.md` (2026-05-24 PR-C).
+- **2026-05-24 v0.17.0 PR-B C typedef alias unit (closure of 2026-05-21)**: the `type_definition` branch of `kebab-parse-code::c::extract_blocks` emits an inner anonymous struct/enum/union as a synthetic unit named after the declarator's typedef alias. Bumps `PARSER_VERSION code-c-v1` → `code-c-v2` and adds a `purge_workspace_path_for_parser_bump` cascade for the same-asset/different-doc_id case (new helpers `stale_chunk_ids_for_workspace_path_except_doc_id` + `purge_document_at_workspace_path_except_doc_id`). No user action needed (the next ingest reprocesses automatically). Details: `tasks/HOTFIXES.md` (2026-05-24 PR-B).
+- **2026-05-24 v0.17.0 PR-A Korean trigram tokenizer adopted (closure of 2026-05-22 Korean lexical)**: `chunks_fts` moves from FTS5 `unicode61` to `trigram` via the V007 migration (automatic backfill, no re-ingest needed). `lexical.rs::build_match_string` is redesigned to be trigram-aware: a multi-token Korean query (`해시 충돌`) hits as a whole-phrase candidate, and a mixed Korean/English query (`Rust 충돌은`) is OR-combined. Queries of 2 characters or fewer return 0 hits plus a `hint` in CLI/TUI/wire. English lexical search also becomes substring matching (recall ↑ / word boundaries ↓). `kebab.sqlite` grows roughly 2-5x (trigram index). Details: `tasks/HOTFIXES.md` (2026-05-24).
 - **2026-05-22 P10 comprehensive dogfooding round 2 (limits of Korean lexical search)**: Korean queries in `kebab search --mode lexical` got almost 0 hits under the FTS5 `unicode61` tokenizer (it tokenizes by space-delimited word units, so partial matching is impossible). The default hybrid mode searches Korean normally because the `multilingual-e5-small` vector side carries it. **closure**: the 2026-05-24 v0.17.0 entry above.
 - **2026-05-20 P10-1B (Rust 1A symbol path inconsistency + expression-level functions not emitted)**: (a) Rust `code-rust-ast-v1` uses file-scope nesting only (no workspace path prefix), while 1B's Python/TypeScript/JavaScript use a workspace path → module path prefix (inconsistency accepted; a retrofit needs a chunker_version bump + reindex and is on hold until the user explicitly asks); (b) expression-level functions in TS/JS such as `const foo = () => {...}` are handled as `<top-level>` glue (only declaration-level units are in 1B's first-pass scope). Details: the two `tasks/HOTFIXES.md` (2026-05-20) entries.
 - **2026-05-19 P10-1A-2 (code_rust_ast_v1.rs + SourceType)**: the `AST_CHUNK_MAX_LINES` constant does not read `IngestCodeCfg.ast_chunk_max_lines` and is fixed at the module constant 200 (the Chunker trait does not expose per-medium config); with no `SourceType::Code` variant, code files are classified as `SourceType::Note`. Both items are recorded in `tasks/HOTFIXES.md` (2026-05-19).
@@ -86,7 +90,7 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.
 The only structurally incomplete component is P9-5. Everything else is either a dogfooding follow-up (see "P10 dogfooding backlog" below) or awaiting a user decision.

 - **P9-5 desktop tauri**: the last remaining P9 component. `kebab-desktop` crate + Tauri app, on a separate branch. The PDF citation rendering UI is high value. Matches the user's priorities (P9 first, mostly books/PDFs).
-- **P10 dogfooding round 2 follow-up**: Korean lexical tokenizer ✅ v0.17.0. Remaining: code_lang_breakdown per-chunk aggregation (LOW, PR-C) / C typedef-wrapped struct (LOW, PR-B). See the "P10 dogfooding backlog" section below.
+- **P10 dogfooding round 2 follow-up**: ✅ all three items closed by the v0.17.0 cut (2026-05-24) (Korean trigram PR-A + C typedef alias PR-B + code_lang_chunk_breakdown additive PR-C). Cross-links: the "P10 dogfooding backlog" section below + `tasks/HOTFIXES.md` (2026-05-24 PR-A/B/C).
 - **P8 audio brainstorm**: needs a user decision on whether to accept the whisper-rs system dependency or use an external transcription endpoint. On hold given the user's pattern (mostly books + PDFs, no interest in audio).
 - **fb-41 multi-hop reasoning**: ⏳ not implemented, XL; needs eval infrastructure first plus a brainstorm.
 - **Rust symbol path retrofit**: Rust `code-rust-ast-v1` symbols are file-scope-only (1B+ use a module prefix). It costs a `code-rust-ast-v2` bump + a Rust corpus re-ingest, so it is on hold until the user explicitly asks. HOTFIXES `2026-05-20`.
@@ -107,8 +111,9 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.

 Follow-up candidates found in P10 comprehensive dogfooding round 2 (`/build/cache/dogfood-p10b/`, 8 OSS repos + 10 Korean wiki documents). Details and priority rationale: `tasks/HOTFIXES.md` (2026-05-22).

-- **Korean lexical tokenizer**: ✅ merged in v0.17.0 (2026-05-24). V007 trigram migration with automatic backfill + `build_match_string` redesign + CLI/TUI/wire hint. See HOTFIXES `2026-05-24`.
-- **code_lang_breakdown per-chunk aggregation (LOW)**: change the per-language distribution in `schema.v1.stats` from doc count to chunk count. Small, wire-additive field.
+- **Korean lexical tokenizer**: ✅ merged in v0.17.0 (2026-05-24) as PR-A (#159). V007 trigram migration with automatic backfill + `build_match_string` redesign + CLI/TUI/wire hint. See HOTFIXES `2026-05-24 PR-A`.
+- **code_lang_chunk_breakdown per-chunk aggregation (LOW)**: ✅ merged in v0.17.0 (2026-05-24) as PR-C (#161). Additive field in `schema.v1.stats`. See HOTFIXES `2026-05-24 PR-C`.
+- **C typedef-wrapped struct (LOW)**: ✅ merged in v0.17.0 (2026-05-24) as PR-B (#160). `type_definition` branch + `PARSER_VERSION code-c-v2` bump + orphan purge cascade. See HOTFIXES `2026-05-24 PR-B`.
 - **ranking glue chunk bias (deferred)**: an automatic heuristic risks misaligning with user intent. Keep surface changes at 0 until the user explicitly asks. Re-brainstorm after 1+ weeks of real use.

 ## Verified operational behavior (release binary, fastembed enabled)
--- a/README.md
+++ b/README.md
@@ -6,6 +6,20 @@

 - **Rust toolchain** ≥ 1.85 (the workspace uses edition 2024 + resolver 3). [rustup](https://rustup.rs) recommended.
 - **Ollama**: used by `kebab ask` and by image OCR/caption. Install from `https://ollama.com/download`, then run `ollama serve`. The default LLM is the gemma4 family (`ollama pull gemma4:e4b`); OCR / caption use the same family, so pulling one model is enough. For a larger variant, override it in config with e.g. `gemma4:26b`. Set host:port in `[models.llm].endpoint`.
+  - **Recommended models for CPU-only / RAM ≤ 16 GB hosts**: gemma4:e4b (8B) is heavy for CPU inference, so a single RAG answer easily exceeds 5 minutes, hits the default 300 s limit of `[models.llm] request_timeout_secs`, and fails with `error: kb-rag: llm.generate_stream` (HOTFIXES 2026-05-25). Switching to a ≤ 4B Q4 model such as `gemma3:4b` / `qwen2.5:3b` / `phi3:mini` gives stable answers in 1-3 minutes (verified in extended dogfooding). If model storage is a concern, relocate it with the `OLLAMA_MODELS=/path` env var.
+  - **`request_timeout_secs` knob (v0.17.0)**: raise the limit with `[models.llm] request_timeout_secs = 1200` (or `KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS=1200`) to try larger models as well. Note that RAM stays occupied longer while the response is in flight. **`= 0` does not disable the timeout; it means "immediate timeout"** (per reqwest semantics). For "effectively unlimited", use a large finite value such as `u64::MAX` or `86400`.
+  - **Install without sudo (isolated directory)**: if you would rather not have `install.sh` touch `/usr/local/bin/ollama` and a `systemd` unit, download only the binary tarball, unpack it into a user directory, and relocate the models via env.
+    ```bash
+    mkdir -p /opt/ollama/{models,logs}
+    curl -fL https://ollama.com/download/ollama-linux-amd64.tar.zst -o /tmp/ollama.tar.zst
+    zstd -d /tmp/ollama.tar.zst -o /tmp/ollama.tar && tar -xf /tmp/ollama.tar -C /opt/ollama/
+    # Unpacks bin/ollama + lib/ollama/. The model directory is separated via OLLAMA_MODELS.
+    OLLAMA_MODELS=/opt/ollama/models OLLAMA_HOST=127.0.0.1:11434 \
+        /opt/ollama/bin/ollama serve > /opt/ollama/logs/serve.log 2>&1 &
+    /opt/ollama/bin/ollama pull gemma3:4b
+    ```
+    Use this as-is when you want to keep the load off the root disk (`~/.ollama/models` is the default). Useful in containers without systemd, WSL2, company machines, and similar environments.
+  - **`kebab ask --stream` recommended (fb-33)**: when model cold start is long (8B+ or the first call), streaming tokens as ndjson on stderr with `--stream` shows the first token quickly even within the 5-minute timeout, which improves perceived responsiveness. For the same inference time, progressive output is more stable than wait-and-pray. CLI: `kebab ask "..." --stream 2> events.ndjson > final.json`. MCP hosts are also advised to use it automatically when the `streaming_ask` capability flag is `true`.
 - **Build disk**: on the first build `target/` takes 6–10 GB (Lance + DataFusion + fastembed). Check free space.
 - **fastembed model**: the first `kebab ingest` auto-downloads `multilingual-e5-large` (~1.3 GB, fb-39b). Set `model = "multilingual-e5-small"` in `config.toml` to use the previous model.
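
The `= 0` caveat in the README text above follows from how the knob is consumed: the number of seconds is turned into a `std::time::Duration` and handed to the HTTP client builder. The sketch below only shows that conversion; the helper name `request_timeout` is an assumption for illustration and no HTTP client is involved.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of the codebase): mirrors the
/// `Duration::from_secs(cfg.request_timeout_secs)` conversion in the diff.
fn request_timeout(secs: u64) -> Duration {
    Duration::from_secs(secs)
}

fn main() {
    let default = request_timeout(300); // legacy ceiling, still the default
    let extended = request_timeout(1200); // e.g. a larger model on a CPU-only host
    let zero = request_timeout(0); // NOT "no timeout"

    // A zero-length duration is a real value, not a sentinel: a client that
    // applies it as a deadline has no time budget left at all.
    println!("default={default:?} extended={extended:?} zero={zero:?}");
    println!("zero.is_zero() = {}", zero.is_zero());
}
```

A large finite value such as `86400` keeps the same code path while giving a budget of one day.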

--- a/crates/kebab-config/src/lib.rs
+++ b/crates/kebab-config/src/lib.rs
@@ -122,6 +122,23 @@ pub struct LlmCfg {
    pub endpoint: String,
    pub temperature: f32,
    pub seed: u64,
+    /// v0.17.0 post-dogfood: Hard ceiling on a single HTTP exchange to
+    /// the LLM endpoint (Ollama, etc.). Cold-loading an 8B+ model on
+    /// CPU-only hosts can spend 60-90s on model load + several minutes
+    /// on a first inference, blowing past the old hard-coded 300s cap
+    /// and surfacing as `error: kb-rag: llm.generate_stream` to the
+    /// user. Config-driven so 16-GB / CPU-only deployments using small
+    /// (≤4B) models can keep the original 300s and large-model dogfood
+    /// can dial it up (e.g. 1200s) without rebuilding.
+    ///
+    /// **Edge case — `0` is NOT a disable sentinel.**
+    /// `reqwest::ClientBuilder::timeout(Duration::from_secs(0))` sets a
+    /// 0-second read timeout, so every request fails *immediately* with
+    /// `error: kb-rag: ollama timeout`. To approximate "no cap", use a
+    /// large finite value (e.g. `u64::MAX` ≈ 5.8 × 10¹¹ years, or
+    /// just a generous number like `86400`).
+    #[serde(default = "default_llm_request_timeout_secs")]
+    pub request_timeout_secs: u64,
 }

 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
@@ -147,6 +164,13 @@ fn default_cache_capacity() -> usize {
    256
 }

+/// v0.17.0 post-dogfood: matches the legacy hard-coded ceiling so
+/// existing configs that omit the field keep behaving identically.
+/// Overridable per config / `KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS`.
+fn default_llm_request_timeout_secs() -> u64 {
+    300
+}
+
 fn default_stale_threshold_days() -> u32 {
    30
 }
@@ -204,6 +228,22 @@ pub struct OcrCfg {
    /// Cap the long edge of the image (in pixels) before sending. Larger
    /// images bloat prompt cost. Default `1600`.
    pub max_pixels: u32,
+    /// v0.17.2 post-dogfood: Hard ceiling on a single HTTP exchange to
+    /// the OCR endpoint. Sister knob to [`LlmCfg::request_timeout_secs`]
+    /// — kept separate because OCR latency is typically shorter than
+    /// chat-LLM cold start, and large vision models on CPU-only hosts
+    /// occasionally need a different budget. See HOTFIXES 2026-05-25
+    /// for the rationale.
+    ///
+    /// **Edge case — `0` is NOT a disable sentinel.** Same semantics as
+    /// [`LlmCfg::request_timeout_secs`]: `Duration::from_secs(0)` means
+    /// "every request fails immediately" (reqwest 0.12.x — the read
+    /// timeout is applied as a 0-second deadline), not "no timeout".
+    /// To approximate "no cap", use a large finite value (e.g.
+    /// `u64::MAX` ≈ 5.8 × 10¹¹ years, or just a generous number like
+    /// `86400`).
+    #[serde(default = "default_ocr_request_timeout_secs")]
+    pub request_timeout_secs: u64,
 }

 impl OcrCfg {
@@ -215,10 +255,18 @@ impl OcrCfg {
            endpoint: None,
            languages: vec!["eng".to_string(), "kor".to_string()],
            max_pixels: 1600,
+            request_timeout_secs: default_ocr_request_timeout_secs(),
        }
    }
 }

+/// v0.17.2 post-dogfood: matches the legacy hard-coded ceiling so
+/// existing configs that omit the field keep behaving identically.
+/// Overridable per config / `KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS`.
+fn default_ocr_request_timeout_secs() -> u64 {
+    300
+}
+
 /// Caption settings (P6-3). Caption uses the same Ollama-vision /
 /// `LanguageModel` pipeline as the rest of the workspace; the trait
 /// abstraction is the part the spec demands. `enabled` defaults to
@@ -363,12 +411,14 @@ impl Config {
                    // Unified on the gemma4 family: the OCR (P6-2) and
                    // caption (P6-3) adapters use the same family. Users who
                    // want a larger variant (gemma4:26b etc.) can override it
-                    // in their own config.toml.
+                    // in their own config.toml. On CPU-only / ≤16 GB RAM hosts
+                    // a ≤4B Q4 model such as gemma3:4b is recommended (see README).
                    model: "gemma4:e4b".to_string(),
                    context_tokens: 32768,
                    endpoint: "http://127.0.0.1:11434".to_string(),
                    temperature: 0.0,
                    seed: 0,
+                    request_timeout_secs: default_llm_request_timeout_secs(),
                },
            },
            search: SearchCfg {
@@ -621,6 +671,11 @@ impl Config {
                        self.models.llm.seed = n;
                    }
                }
+                "KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS" => {
+                    if let Ok(n) = v.parse::<u64>() {
+                        self.models.llm.request_timeout_secs = n;
+                    }
+                }

                // search
                "KEBAB_SEARCH_DEFAULT_K" => {
@@ -691,6 +746,11 @@ impl Config {
                        self.image.ocr.max_pixels = n;
                    }
                }
+                "KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS" => {
+                    if let Ok(n) = v.parse::<u64>() {
+                        self.image.ocr.request_timeout_secs = n;
+                    }
+                }

                // image.caption (P6-3)
                "KEBAB_IMAGE_CAPTION_ENABLED" => {
@@ -803,6 +863,83 @@ fn parse_bool(s: &str) -> bool {
 mod tests {
    use super::*;

+    /// Legacy TOML fixture written before the `request_timeout_secs`
+    /// knobs (LLM in v0.17.1, OCR follow-up) existed. Shared by
+    /// `legacy_config_without_request_timeout_secs_uses_default`
+    /// (LLM-side) and `legacy_config_without_ocr_request_timeout_secs_uses_default`
+    /// (OCR-side) so both invariants pin against the same on-disk
+    /// shape — schema drift in the legacy form only needs one edit.
+    const LEGACY_PRE_TIMEOUT_TOML: &str = r#"
+schema_version = 1
+
+[workspace]
+root = "/tmp/x"
+exclude = []
+
+[storage]
+data_dir = "/tmp/x"
+sqlite = "/tmp/x/kebab.sqlite"
+vector_dir = "/tmp/x/lancedb"
+asset_dir = "/tmp/x/assets"
+artifact_dir = "/tmp/x/artifacts"
+model_dir = "/tmp/x/models"
+runs_dir = "/tmp/x/runs"
+copy_threshold_mb = 100
+
+[indexing]
+max_parallel_extractors = 2
+max_parallel_embeddings = 1
+watch_filesystem = false
+
+[chunking]
+target_tokens = 500
+overlap_tokens = 80
+respect_markdown_headings = true
+chunker_version = "md-heading-v1"
+
+[models.embedding]
+provider = "fastembed"
+model = "multilingual-e5-large"
+version = "v1"
+dimensions = 1024
+batch_size = 64
+
+[models.llm]
+provider = "ollama"
+model = "gemma3:4b"
+context_tokens = 4096
+endpoint = "http://127.0.0.1:11434"
+temperature = 0.0
+seed = 0
+
+[search]
+default_k = 10
+hybrid_fusion = "rrf"
+rrf_k = 60
+snippet_chars = 220
+
+[rag]
+prompt_template_version = "rag-v2"
+score_gate = 0.3
+explain_default = false
+max_context_tokens = 8000
+
+[image.ocr]
+enabled = false
+engine = "ollama-vision"
+model = "gemma3:4b"
+languages = ["eng"]
+max_pixels = 1600
+
+[image.caption]
+enabled = false
+max_pixels = 768
+prompt_template_version = "caption-v1"
+
+[ui]
+theme = "dark"
+"#;
+
    #[test]
    fn defaults_are_serde_roundtrip_stable() {
        let c = Config::defaults();
@@ -873,6 +1010,35 @@ mod tests {
        assert!((c.models.llm.temperature - 0.7).abs() < 1e-6);
    }

+    /// v0.17.0 post-dogfood: matches the legacy hard-coded 300s cap so
+    /// existing configs that omit the new field are not affected.
+    #[test]
+    fn default_llm_request_timeout_secs_is_300() {
+        assert_eq!(Config::defaults().models.llm.request_timeout_secs, 300);
+    }
+
+    #[test]
+    fn env_overrides_models_llm_request_timeout_secs() {
+        let mut env = HashMap::new();
+        env.insert(
+            "KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS".to_string(),
+            "1200".to_string(),
+        );
+        let c = Config::defaults().apply_env(&env);
+        assert_eq!(c.models.llm.request_timeout_secs, 1200);
+    }
+
+    /// v0.17.0 post-dogfood: a config file written before the field
+    /// existed (no `request_timeout_secs` key) must still parse and fall
+    /// back to the 300s default — backwards-compat invariant. Fixture
+    /// shared with the OCR-side invariant via [`LEGACY_PRE_TIMEOUT_TOML`].
+    #[test]
+    fn legacy_config_without_request_timeout_secs_uses_default() {
+        let c: Config = toml::from_str(LEGACY_PRE_TIMEOUT_TOML)
+            .expect("parse legacy config");
+        assert_eq!(c.models.llm.request_timeout_secs, 300);
+    }
+
    #[test]
    fn env_overrides_indexing_watch_filesystem_bool() {
        let mut env = HashMap::new();
@@ -894,6 +1060,38 @@ mod tests {
        assert_eq!(c.image.ocr.max_pixels, 1600);
    }

+    /// v0.17.2 post-dogfood: matches the legacy hard-coded 300s cap so
+    /// existing configs that omit the new field keep behaving identically.
+    #[test]
+    fn default_ocr_request_timeout_secs_is_300() {
+        assert_eq!(
+            Config::defaults().image.ocr.request_timeout_secs,
+            300
+        );
+    }
+
+    #[test]
+    fn env_overrides_image_ocr_request_timeout_secs() {
+        let mut env = HashMap::new();
+        env.insert(
+            "KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS".to_string(),
+            "900".to_string(),
+        );
+        let c = Config::defaults().apply_env(&env);
+        assert_eq!(c.image.ocr.request_timeout_secs, 900);
+    }
+
+    /// v0.17.2 post-dogfood: a config file written before the OCR
+    /// timeout field existed must still parse and fall back to the
+    /// 300s default — backwards-compat invariant. Fixture shared
+    /// with the LLM-side invariant via [`LEGACY_PRE_TIMEOUT_TOML`].
+    #[test]
+    fn legacy_config_without_ocr_request_timeout_secs_uses_default() {
+        let c: Config = toml::from_str(LEGACY_PRE_TIMEOUT_TOML)
+            .expect("parse legacy config");
+        assert_eq!(c.image.ocr.request_timeout_secs, 300);
+    }
+
    #[test]
    fn image_ocr_env_overrides() {
        let mut env = HashMap::new();
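
The two env-override arms added in this file share one pattern: parse the value as `u64`, and keep the current setting if parsing fails. A standalone sketch of that pattern, assuming a free function in place of the real `Config::apply_env` method (the name `apply_timeout_override` is hypothetical), might look like this:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one `apply_env` match arm: returns the
/// overridden timeout when `key` holds a valid `u64`, otherwise `current`.
fn apply_timeout_override(current: u64, env: &HashMap<String, String>, key: &str) -> u64 {
    match env.get(key).map(|v| v.parse::<u64>()) {
        // A well-formed value replaces the current setting.
        Some(Ok(n)) => n,
        // Missing or unparseable values are ignored, as in the diff's
        // `if let Ok(n) = v.parse::<u64>()` guard.
        _ => current,
    }
}

fn main() {
    let key = "KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS";
    let mut env = HashMap::new();
    env.insert(key.to_string(), "900".to_string());
    println!("{}", apply_timeout_override(300, &env, key));

    // A malformed value leaves the 300 s default in place.
    env.insert(key.to_string(), "fifteen minutes".to_string());
    println!("{}", apply_timeout_override(300, &env, key));
}
```

One consequence of this design is that a typo in the env value fails silently, so the effective timeout is worth confirming when an override does not seem to take effect.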
--- a/crates/kebab-llm-local/src/ollama.rs
+++ b/crates/kebab-llm-local/src/ollama.rs
@@ -48,10 +48,17 @@ use serde::{Deserialize, Serialize};

 use crate::error::LlmError;

-/// Hard ceiling on a single HTTP exchange. Cold-loading a 14B model on
-/// first call can take ~30s; 5 minutes is generous without being
-/// open-ended.
-const REQUEST_TIMEOUT: Duration = Duration::from_secs(300);
+// v0.17.0 post-dogfood: the per-request ceiling now lives in
+// `kebab_config::LlmCfg::request_timeout_secs` (default 300s) so users
+// running larger models on CPU-only hosts can extend it without a
+// rebuild. Cold-loading an 8B+ model on first call routinely takes
+// 60-90 s plus multi-minute inference; 300s was the legacy hard
+// ceiling and remains the default for back-compat.
+//
+// Edge case: `request_timeout_secs = 0` becomes
+// `Duration::from_secs(0)` which is reqwest's "fail immediately", NOT
+// "disable". The field doc explains the workaround (use u64::MAX or a
+// large finite value).

 /// `reqwest::blocking` adapter implementing [`LanguageModel`] over Ollama's
 /// local HTTP API. Construction is cheap and offline; the first network
@@ -79,7 +86,7 @@ impl OllamaLanguageModel {
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self> {
        let llm = &config.models.llm;
        let client = reqwest::blocking::Client::builder()
-            .timeout(REQUEST_TIMEOUT)
+            .timeout(Duration::from_secs(llm.request_timeout_secs))
            .build()?;
        Ok(Self {
            client,
@@ -262,9 +269,11 @@ struct OllamaLine {
 ///
 /// Timeout invariant: the iterator has no inherent stop condition for an
 /// indefinitely-stalled server — only the underlying
-/// `reqwest::blocking::Client`'s read timeout (`REQUEST_TIMEOUT`, 300s)
-/// breaks the hang. Callers needing tighter cancellation should adjust
-/// the client timeout in [`OllamaLanguageModel::new`].
+/// `reqwest::blocking::Client`'s read timeout (configured via
+/// `kebab_config::LlmCfg::request_timeout_secs`, default 300 s) breaks
+/// the hang. Callers needing tighter / looser bounds should set
+/// `[models.llm] request_timeout_secs = N` (or
+/// `KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS=N`) before building.
 struct OllamaStream {
    reader: BufReader<reqwest::blocking::Response>,
    line_buf: Vec<u8>,
--- a/crates/kebab-parse-image/src/ocr.rs
+++ b/crates/kebab-parse-image/src/ocr.rs
@@ -39,10 +39,6 @@ use crate::image_prep;
 /// Engine name written into `OcrText.engine` for the Ollama-vision adapter.
 pub const OLLAMA_VISION_ENGINE: &str = "ollama-vision";

-/// Hard ceiling on the OCR HTTP exchange. Cold-loading a vision model on
-/// first call can take ~30s; 5 minutes is generous without being open-ended.
-const REQUEST_TIMEOUT: Duration = Duration::from_secs(300);
-
 /// Lower bound on `config.image.ocr.max_pixels`. Anything below this is
 /// silently bumped to keep the model from receiving an unreadable thumbnail.
 const MIN_LONG_EDGE: u32 = 256;
@@ -139,7 +135,13 @@ impl OllamaVisionOcr {
            Some(s) if !s.is_empty() => s.to_string(),
            _ => config.models.llm.endpoint.clone(),
        };
-        Self::build(endpoint, ocr.model.clone(), ocr.languages.clone(), ocr.max_pixels)
+        Self::build(
+            endpoint,
+            ocr.model.clone(),
+            ocr.languages.clone(),
+            ocr.max_pixels,
+            ocr.request_timeout_secs,
+        )
    }

    /// Build directly from explicit fields. Useful for tests that need
@@ -153,8 +155,15 @@ impl OllamaVisionOcr {
        model: impl Into<String>,
        languages: Vec<String>,
        max_pixels: u32,
+        request_timeout_secs: u64,
    ) -> Result<Self> {
-        Self::build(endpoint.into(), model.into(), languages, max_pixels)
+        Self::build(
+            endpoint.into(),
+            model.into(),
+            languages,
+            max_pixels,
+            request_timeout_secs,
+        )
    }

    /// Shared validation + construction. Centralised so `new` and
@@ -164,6 +173,7 @@ impl OllamaVisionOcr {
        model: String,
        languages: Vec<String>,
        requested_max_pixels: u32,
+        request_timeout_secs: u64,
    ) -> Result<Self> {
        if endpoint.is_empty() {
            anyhow::bail!(
@@ -183,7 +193,7 @@ impl OllamaVisionOcr {
            );
        }
        let client = reqwest::blocking::Client::builder()
-            .timeout(REQUEST_TIMEOUT)
+            .timeout(Duration::from_secs(request_timeout_secs))
            .build()
            .context("building OCR HTTP client")?;
        Ok(Self {
@@ -375,6 +385,7 @@ mod tests {
            "m",
            vec!["eng".into(), "kor".into()],
            1024,
+            300,
        )
        .unwrap();
        let p = engine.build_prompt(Some(&Lang("ko".into())));
@@ -389,6 +400,7 @@ mod tests {
            "m",
            vec!["eng".into()],
            1024,
+            300,
        )
        .unwrap();
        let p = engine.build_prompt(Some(&Lang("und".into())));
@@ -400,7 +412,7 @@ mod tests {
    /// the constructor cannot drift to "silently accept a bad config".
    #[test]
    fn build_rejects_empty_endpoint() {
-        let r = OllamaVisionOcr::from_parts("", "m", vec![], 1024);
+        let r = OllamaVisionOcr::from_parts("", "m", vec![], 1024, 300);
        let err = r.expect_err("empty endpoint must bail").to_string();
        assert!(
            err.contains("endpoint is empty"),
@@ -413,7 +425,7 @@ mod tests {
    /// so testing `from_parts` covers both.
    #[test]
    fn build_rejects_empty_model_after_trim() {
-        let r = OllamaVisionOcr::from_parts("http://x", "   ", vec![], 1024);
+        let r = OllamaVisionOcr::from_parts("http://x", "   ", vec![], 1024, 300);
        let err = r.expect_err("empty model must bail").to_string();
        assert!(
            err.contains("model is empty"),
@@ -428,10 +440,10 @@ mod tests {
    #[test]
    fn build_clamps_max_pixels_outside_legal_range() {
        let too_small =
-            OllamaVisionOcr::from_parts("http://x", "m", vec![], 1).unwrap();
+            OllamaVisionOcr::from_parts("http://x", "m", vec![], 1, 300).unwrap();
        assert_eq!(too_small.max_pixels(), MIN_LONG_EDGE);
        let too_big =
-            OllamaVisionOcr::from_parts("http://x", "m", vec![], u32::MAX).unwrap();
+            OllamaVisionOcr::from_parts("http://x", "m", vec![], u32::MAX, 300).unwrap();
        assert_eq!(too_big.max_pixels(), MAX_LONG_EDGE);
    }
 }
--- a/crates/kebab-parse-image/tests/ocr.rs
+++ b/crates/kebab-parse-image/tests/ocr.rs
@@ -322,7 +322,8 @@ async fn ocr_downscales_large_image_before_sending() {
 #[test]
 fn from_parts_clamps_max_pixels_into_legal_range() {
    // Below MIN_LONG_EDGE — bumped up to the floor.
-    let too_small = OllamaVisionOcr::from_parts("http://x", "m", vec![], 10).unwrap();
+    let too_small =
+        OllamaVisionOcr::from_parts("http://x", "m", vec![], 10, 300).unwrap();
    assert_eq!(
        too_small.max_pixels(),
        256,
@@ -331,7 +332,7 @@ fn from_parts_clamps_max_pixels_into_legal_range() {

    // Above MAX_LONG_EDGE — capped at the ceiling.
    let too_big =
-        OllamaVisionOcr::from_parts("http://x", "m", vec![], 99_999).unwrap();
+        OllamaVisionOcr::from_parts("http://x", "m", vec![], 99_999, 300).unwrap();
    assert_eq!(
        too_big.max_pixels(),
        4096,
@@ -339,7 +340,8 @@ fn from_parts_clamps_max_pixels_into_legal_range() {
    );

    // Inside the legal range — pass through untouched.
-    let in_range = OllamaVisionOcr::from_parts("http://x", "m", vec![], 1024).unwrap();
+    let in_range =
+        OllamaVisionOcr::from_parts("http://x", "m", vec![], 1024, 300).unwrap();
    assert_eq!(in_range.max_pixels(), 1024);
 }
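
The timeout plumbing in the OCR diff above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the crate's code: `resolve_request_timeout` and its parameters are hypothetical, the precedence (env override, then config value, then the 300 s default) is inferred from this commit's description, and falling back when the env value fails to parse is a guess at reasonable behavior.

```rust
use std::time::Duration;

/// Default taken from the diff (`300` is passed at every test call site).
const DEFAULT_REQUEST_TIMEOUT_SECS: u64 = 300;

/// Hypothetical helper: pick the timeout from an optional env string and an
/// optional config value. Assumed precedence: env > config > default.
fn resolve_request_timeout(env_value: Option<&str>, config_value: Option<u64>) -> Duration {
    let secs = env_value
        // Assumption: an unparsable env value is ignored rather than rejected.
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .or(config_value)
        .unwrap_or(DEFAULT_REQUEST_TIMEOUT_SECS);
    // `0` is passed through unchanged; per the HOTFIXES note it does not
    // disable the timeout.
    Duration::from_secs(secs)
}
```

A caller might feed it `std::env::var("KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS").ok().as_deref()` together with the parsed config value, then hand the resulting `Duration` to the HTTP client builder.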

--- a/crates/kebab-search/src/lexical.rs
+++ b/crates/kebab-search/src/lexical.rs
@@ -169,12 +169,26 @@ impl Retriever for LexicalRetriever {
 /// branch. Korean compounds typically split into 2-char eojeols (e.g.
 /// `해시 충돌`), so a naive token AND drops the dominant usage pattern.
 ///
+/// post-v0.17.1 dogfood — `text` column filter (closure of HOTFIXES
+/// 2026-05-24 `heading_path_json` noise). The `chunks_fts` virtual
+/// table indexes both `heading_path` (the JSON-serialized
+/// `chunks.heading_path_json` per V002/V007 triggers) and `text`. Under
+/// the trigram tokenizer the JSON punctuation (`[`, `"`, `,`) plus the
+/// path segments (`app`, `src`, …) become indexable 3-grams, so a
+/// query can hit a chunk purely because its file's heading JSON shares
+/// a path segment with the query — false positives that have no body
+/// relevance. The default match expression therefore scopes to the
+/// `text` column. The `heading_path` column stays indexed (V007 / §5.5
+/// verbatim block is preserved) so a user who *wants* heading matching
+/// can opt in via raw mode (`'heading_path : foo'`).
+///
 /// Rules:
 ///
 /// - Raw mode (unchanged): the query is wrapped in a single pair of
 ///   `'...'` → strip the quotes and pass the inner text through verbatim.
 ///   The user has explicitly opted into FTS5 syntax (e.g.
-///   `'rust AND cargo'`, `'foo*'`).
+///   `'rust AND cargo'`, `'foo*'`, `'heading_path : agent'`). No column
+///   scoping is applied — the raw expression is honored as-is.
 ///
 /// - Otherwise build up to two MATCH candidates:
 ///   1. **whole-phrase**: the entire trimmed input wrapped as one FTS5
@@ -191,6 +205,10 @@ impl Retriever for LexicalRetriever {
 ///
 /// - A single-token long query (`러스트`, `foo`) yields `whole == token_and`
 ///   → return the bare quoted form so the OR doesn't duplicate.
+///
+/// - Finally wrap the combined expression in `text : (<expr>)` so the
+///   match is scoped to the body column. FTS5's column-filter syntax
+///   accepts an arbitrary OR/AND sub-expression inside the parens.
 fn build_match_string(text: &str) -> Option<String> {
    let trimmed = text.trim();
    if trimmed.is_empty() {
@@ -218,13 +236,14 @@ fn build_match_string(text: &str) -> Option<String> {
        (!toks.is_empty()).then(|| toks.join(" "))
    };

-    match (whole_candidate, token_and_candidate) {
-        (None, None) => None,
-        (Some(w), None) => Some(w),
-        (None, Some(a)) => Some(a),
-        (Some(w), Some(a)) if w == a => Some(w),
-        (Some(w), Some(a)) => Some(format!("({w}) OR ({a})")),
-    }
+    let expression = match (whole_candidate, token_and_candidate) {
+        (None, None) => return None,
+        (Some(w), None) => w,
+        (None, Some(a)) => a,
+        (Some(w), Some(a)) if w == a => w,
+        (Some(w), Some(a)) => format!("({w}) OR ({a})"),
+    };
+    Some(format!("text : ({expression})"))
 }

 /// Return `Some(inner)` if `s` is wrapped in a matching pair of single
@@ -587,9 +606,11 @@ mod tests {
    #[test]
    fn build_match_string_default_emits_or_of_phrase_and_and() {
        // Two long tokens: both whole-phrase and token-AND candidates
-        // exist and differ, so the builder combines them with OR.
+        // exist and differ, so the builder combines them with OR
+        // inside a `text : (...)` column filter (post-v0.17.1 dogfood:
+        // text-only scoping to avoid heading_path_json false positives).
        let s = build_match_string("rust cargo").unwrap();
-        assert_eq!(s, r#"("rust cargo") OR ("rust" "cargo")"#);
+        assert_eq!(s, r#"text : (("rust cargo") OR ("rust" "cargo"))"#);
    }

    #[test]
@@ -597,28 +618,41 @@ mod tests {
        // `*`, `(`, `)`, `:`, `^`, `"` should all be wrapped inside
        // FTS5 string-literal quotes so they're treated as literal
        // text rather than FTS5 operators. Every token is ≥3 chars,
-        // so both the whole-phrase and token-AND candidates exist.
+        // so both the whole-phrase and token-AND candidates exist,
+        // wrapped in the `text : (...)` column filter.
        let s = build_match_string(r#"foo* (bar) baz:qux ^head he"llo"#).unwrap();
        assert_eq!(
            s,
-            r#"("foo* (bar) baz:qux ^head he""llo") OR ("foo*" "(bar)" "baz:qux" "^head" "he""llo")"#
+            r#"text : (("foo* (bar) baz:qux ^head he""llo") OR ("foo*" "(bar)" "baz:qux" "^head" "he""llo"))"#
        );
        // The doubled `""` is FTS5's way of embedding a literal quote
        // inside a string literal. Appears in both whole-phrase and
        // token-AND halves.
        assert!(s.contains(r#"he""llo"#));
-        // Sanity: the combined expression is `(...) OR (...)` so it
-        // starts with `(` and ends with `)`.
-        assert!(s.starts_with('(') && s.ends_with(')'));
+        // Sanity: outermost wrapper is the column filter.
+        assert!(s.starts_with("text : ("));
+        assert!(s.ends_with(')'));
    }

    #[test]
    fn build_match_string_passthrough_when_single_quoted() {
-        // The FTS5 expression is preserved verbatim.
+        // Raw mode bypasses column scoping — the FTS5 expression is
+        // preserved verbatim, including any explicit column filter
+        // (e.g. `'heading_path : foo'`) the user opts into.
        let s = build_match_string("'foo OR bar*'").unwrap();
        assert_eq!(s, "foo OR bar*");
    }

+    /// Raw mode preserves an explicit `heading_path :` column filter
+    /// — opt-in path for users who deliberately want heading matching
+    /// (post-v0.17.1 dogfood default scopes to `text` only).
+    #[test]
+    fn build_match_string_raw_mode_preserves_heading_filter() {
+        let s = build_match_string("'heading_path : agent'").unwrap();
+        assert_eq!(s, "heading_path : agent");
+        assert!(!s.starts_with("text : "));
+    }
+
    // ── v0.17.0 trigram-aware redesign coverage ──────────────────────────

    /// 2-char Korean query (`충돌`) yields neither a whole-phrase nor a
@@ -634,38 +668,43 @@ mod tests {
    /// `해시 충돌` — both tokens are 2 chars (dropped from the AND), but
    /// the whole-phrase candidate (`"해시 충돌"`, 5 chars total) survives.
    /// This is the dominant Korean usage pattern targeted by A5.
+    /// The whole-phrase candidate is then wrapped in the `text : (...)`
+    /// column filter.
    #[test]
    fn build_match_string_whole_phrase_only_when_all_tokens_short() {
        let s = build_match_string("해시 충돌").unwrap();
-        assert_eq!(s, r#""해시 충돌""#);
+        assert_eq!(s, r#"text : ("해시 충돌")"#);
    }

    /// Single long token: whole-phrase and token-AND candidates collapse
    /// to the same string. The builder returns the bare quoted form so
-    /// the MATCH expression doesn't carry a redundant `(x) OR (x)`.
+    /// the MATCH expression doesn't carry a redundant `(x) OR (x)`,
+    /// wrapped in `text : (...)`.
    #[test]
    fn build_match_string_single_long_token_no_duplicate_or() {
-        assert_eq!(build_match_string("러스트").unwrap(), r#""러스트""#);
-        assert_eq!(build_match_string("rust").unwrap(), r#""rust""#);
+        assert_eq!(build_match_string("러스트").unwrap(), r#"text : ("러스트")"#);
+        assert_eq!(build_match_string("rust").unwrap(), r#"text : ("rust")"#);
    }

    /// Mixed Korean+English multi-token query where every token is ≥3
-    /// chars: both candidates exist and differ, OR-combined.
+    /// chars: both candidates exist and differ, OR-combined inside
+    /// `text : (...)`.
    #[test]
    fn build_match_string_mixed_lang_emits_or_of_phrase_and_and() {
        let s = build_match_string("Rust 충돌은").unwrap();
-        assert_eq!(s, r#"("Rust 충돌은") OR ("Rust" "충돌은")"#);
+        assert_eq!(s, r#"text : (("Rust 충돌은") OR ("Rust" "충돌은"))"#);
    }

    /// One ≥3 token + one <3 token: short token is dropped from the
    /// AND, leaving a single long token there; whole-phrase exists
-    /// independently. Both candidates differ → OR-combined.
+    /// independently. Both candidates differ → OR-combined inside
+    /// `text : (...)`.
    #[test]
    fn build_match_string_drops_short_token_in_and_keeps_whole() {
        // "키" (1 char) dropped from AND; "해시테이블" (5 chars) kept.
        // Whole phrase "키 해시테이블" (7 chars) keeps the short token.
        let s = build_match_string("키 해시테이블").unwrap();
-        assert_eq!(s, r#"("키 해시테이블") OR ("해시테이블")"#);
+        assert_eq!(s, r#"text : (("키 해시테이블") OR ("해시테이블"))"#);
    }

    #[test]
--- a/crates/kebab-search/tests/lexical.rs
+++ b/crates/kebab-search/tests/lexical.rs
@@ -1060,3 +1060,99 @@ fn lexical_snapshot_run_1() {
    let expected: serde_json::Value = serde_json::from_str(&baseline_text).unwrap();
    assert_eq!(actual, expected, "lexical run-1 snapshot drift");
 }
+
+// ── post-v0.17.1 dogfood — `text` column filter ──────────────────────────
+
+/// Heading-only token (unique to `chunks.heading_path_json`, absent
+/// from `chunks.text`) must NOT hit in default mode after the column
+/// filter clamp. Pins HOTFIXES 2026-05-24 closure — the JSON
+/// punctuation + path segments in `heading_path_json` are no longer
+/// matchable from a plain query.
+#[test]
+fn lexical_heading_only_token_does_not_hit_default_mode() {
+    let env = Env::new();
+    let conn = env.raw_conn();
+    insert_document(
+        &conn,
+        &id32("d"),
+        "notes/heading-only.md",
+        "Heading-only fixture",
+        "en",
+        "primary",
+        &[],
+    );
+    insert_chunk(
+        &conn,
+        &id32("c1"),
+        &id32("d"),
+        "bravo charlie delta echo",
+        &["kubernetes-agent-controller"],
+        Some("Heading"),
+        r#"[{"kind":"line","start":1,"end":2}]"#,
+        "v1",
+    );
+    drop(conn);
+
+    let r = env.retriever();
+    let hits = r
+        .search(&SearchQuery {
+            // "kubernetes-agent-controller" is in heading_path only.
+            text: "kubernetes-agent-controller".to_string(),
+            mode: SearchMode::Lexical,
+            k: 10,
+            filters: SearchFilters::default(),
+        })
+        .unwrap();
+    assert!(
+        hits.is_empty(),
+        "heading-only token must not hit text column; got {} hits",
+        hits.len()
+    );
+}
+
+/// Raw mode (`'heading_path : <token>'`) is the opt-in escape hatch
+/// for users who deliberately want heading-column matching after the
+/// default text-only clamp. The same fixture that 0-hits in default
+/// mode must hit when the user explicitly scopes to `heading_path`.
+#[test]
+fn lexical_raw_mode_can_opt_into_heading_path_filter() {
+    let env = Env::new();
+    let conn = env.raw_conn();
+    insert_document(
+        &conn,
+        &id32("d"),
+        "notes/heading-only.md",
+        "Heading-only fixture",
+        "en",
+        "primary",
+        &[],
+    );
+    insert_chunk(
+        &conn,
+        &id32("c1"),
+        &id32("d"),
+        "bravo charlie delta echo",
+        &["kubernetes-agent-controller"],
+        Some("Heading"),
+        r#"[{"kind":"line","start":1,"end":2}]"#,
+        "v1",
+    );
+    drop(conn);
+
+    let r = env.retriever();
+    let hits = r
+        .search(&SearchQuery {
+            // Raw mode: outer single quotes opt out of column-filter
+            // wrapping and pass the FTS5 expression through verbatim.
+            text: "'heading_path : \"kubernetes-agent-controller\"'".to_string(),
+            mode: SearchMode::Lexical,
+            k: 10,
+            filters: SearchFilters::default(),
+        })
+        .unwrap();
+    assert_eq!(
+        hits.len(),
+        1,
+        "raw-mode heading_path filter must hit the seeded chunk"
+    );
+}
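
The match-string rules exercised by the tests above can be condensed into a standalone sketch. It is an approximation for illustration only: `build_match_sketch`, `fts5_quote`, `strip_raw_quotes`, and `MIN_TRIGRAM_CHARS` are hypothetical names, and the details the diff does not show (whitespace tokenization, no normalization of the whole phrase, treating an empty raw expression as no query) are assumptions.

```rust
/// Assumed threshold: the trigram tokenizer needs at least 3 characters.
const MIN_TRIGRAM_CHARS: usize = 3;

/// Wrap `s` as an FTS5 string literal, doubling embedded `"` characters.
fn fts5_quote(s: &str) -> String {
    format!("\"{}\"", s.replace('"', "\"\""))
}

/// Return the inner text when `s` is wrapped in one pair of single quotes.
fn strip_raw_quotes(s: &str) -> Option<&str> {
    s.strip_prefix('\'')?.strip_suffix('\'')
}

fn build_match_sketch(text: &str) -> Option<String> {
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return None;
    }
    // Raw mode: pass the inner expression through, with no column scoping.
    if let Some(inner) = strip_raw_quotes(trimmed) {
        return (!inner.trim().is_empty()).then(|| inner.to_string());
    }
    // Candidate 1: the whole input as one phrase, if long enough.
    let whole = (trimmed.chars().count() >= MIN_TRIGRAM_CHARS).then(|| fts5_quote(trimmed));
    // Candidate 2: implicit AND of every token with at least 3 characters.
    let toks: Vec<String> = trimmed
        .split_whitespace()
        .filter(|t| t.chars().count() >= MIN_TRIGRAM_CHARS)
        .map(fts5_quote)
        .collect();
    let token_and = (!toks.is_empty()).then(|| toks.join(" "));
    let expression = match (whole, token_and) {
        (None, None) => return None,
        (Some(w), None) => w,
        (None, Some(a)) => a,
        (Some(w), Some(a)) if w == a => w,
        (Some(w), Some(a)) => format!("({w}) OR ({a})"),
    };
    // Scope the default match to the body column only.
    Some(format!("text : ({expression})"))
}
```

The expected strings in the unit tests of this diff double as a specification for the sketch: `rust cargo` yields the OR of both candidates inside `text : (...)`, while a single-quoted input such as `'heading_path : agent'` comes back without the wrapper.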
--- a/docs/SMOKE.md
+++ b/docs/SMOKE.md
@@ -21,6 +21,30 @@ cargo build --release -p kebab-cli   # debug 도 무방. 디버그가 더 빠르
 # On a separate host such as a Mac
 OLLAMA_HOST=0.0.0.0:11434 ollama serve
 ollama pull gemma4:e4b           # the default; use gemma4:26b for a larger variant
+# On CPU-only hosts or with ≤ 16 GB RAM, prefer a ≤ 4B Q4 model (gemma3:4b, qwen2.5:3b, etc.).
+# With 8B+ models the first RAG answer easily exceeds the 5-minute limit
+# (the default [models.llm] request_timeout_secs) and fails with `error: kb-rag: llm.generate_stream`.
+# To raise the limit, add request_timeout_secs = 1200 to the config
+# or set the env var KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS=1200. See HOTFIXES 2026-05-25.
+```
+
+Installing into an isolated directory without sudo / systemd (useful for containers, WSL2,
+and corporate machines):
+
+```bash
+# Download only the tarball, unpack it into a user directory, and keep models in a separate directory via OLLAMA_MODELS.
+mkdir -p /opt/ollama/{models,logs}
+curl -fL https://ollama.com/download/ollama-linux-amd64.tar.zst -o /tmp/ollama.tar.zst
+zstd -d /tmp/ollama.tar.zst -o /tmp/ollama.tar && tar -xf /tmp/ollama.tar -C /opt/ollama/
+OLLAMA_MODELS=/opt/ollama/models OLLAMA_HOST=127.0.0.1:11434 \
+    /opt/ollama/bin/ollama serve > /opt/ollama/logs/serve.log 2>&1 &
+/opt/ollama/bin/ollama pull gemma3:4b
+# To stop: pkill -f "ollama serve"
+```
+
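+For models with a long cold start (8B+ or the first call), try `kebab ask --stream`:
+tokens arrive progressively on stderr as ndjson, so the first token surfaces quickly
+even within the 5-minute timeout (fb-33). See the "Streaming ask (fb-33)" section below for the full commands.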
 ```

 Verify reachability from this machine:
--- a/integrations/claude-code/kebab/SKILL.md
+++ b/integrations/claude-code/kebab/SKILL.md
@@ -61,6 +61,7 @@ Input:
 - When `truncated: true`, the budget loop modified the page (snippet shortening or k reduction). `next_cursor` is **independent** — non-null whenever more hits may be reachable. Caller may widen `max_tokens` (re-issue same query for fuller snippets / more hits per page) or follow `next_cursor` (advance through more hits) or both. Mismatched cursor (corpus_revision changed) returns `error.v1.code = stale_cursor` — re-issue the search to obtain a fresh one.
 - **`trace: true` (p9-fb-37)** — debug aid. Response carries an extra `trace` block: `lexical[]` + `vector[]` (pre-fusion candidates), `rrf_inputs[]` (RRF union before final cut), and `timing` (`lexical_ms`, `vector_ms`, `fusion_ms`, `total_ms`). Trace bypasses the search cache (always cold). Use sparingly — it bloats the wire response and is for diagnosing "why did this hit / not hit", not normal retrieval.
 - **`hint` (v0.17.0)** — optional advisory string on `search_response.v1`. Present only when the result is empty AND the trimmed query is shorter than the FTS5 trigram tokenizer's 3-char minimum. Surface it to the user instead of retrying the same short query. Korean lexical search benefits most from ≥3-char keywords (`충돌` zero-hit, `충돌은` substring-hit). Raw FTS5 mode (`'...'`) opts out — the user opted into FTS5 syntax. Vector / hybrid modes carry the field too but it's rarely triggered (semantic embeddings handle short queries).
+- **Column scoping (post-v0.17.1 dogfood)** — default lexical / hybrid matching is scoped to the `text` column only. The `heading_path` column is indexed (path segments like `app`, `src` plus JSON punctuation are 3-gram'd) but excluded from the default MATCH — past JSON noise produced false positives where a query coincidentally shared a 3-gram with a file's heading path. To deliberately search heading paths, escape into raw FTS5 mode with an explicit column filter: `'heading_path : <token>'` (e.g. `kebab search "'heading_path : agent'"`). Same applies to MCP `search` — quote the inner expression. Raw mode bypasses both column scoping and the 3-char `hint` short-circuit.

 ### `mcp__kebab__bulk_search`

--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,65 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

+## 2026-05-25 — v0.17.0 post-dogfood: `[models.llm] request_timeout_secs` knob + recommended-model guide
+
+Found during v0.17.0 follow-up dogfooding: when a user tried the default `gemma4:e4b` (8B Q4, 9.6 GB) on a CPU-only host with 16 GB RAM, the first RAG answer always exceeded the 5-minute (hard-coded 300 s) limit and failed with `error: kb-rag: llm.generate_stream`. Memory was also under pressure, with ollama RSS at 10.7 GB and only 2 GB free. For the results of the 32-minute follow-up dogfooding run (199 mem-monitor samples), see this entry in `tasks/HOTFIXES.md` and the dogfooding report in the conversation.
+
+**Changes**:
+- Additive `request_timeout_secs: u64` field on `crates/kebab-config/src/lib.rs::LlmCfg` (`#[serde(default = "default_llm_request_timeout_secs")]`, default `300`). Old configs that omit the field still parse and behave identically (3 new unit tests pin default / env override / legacy parse).
+- Env override `KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS`.
+- Removed the `REQUEST_TIMEOUT` constant from `crates/kebab-llm-local/src/ollama.rs`. `OllamaLanguageModel::new` now builds the reqwest blocking client with `Duration::from_secs(llm.request_timeout_secs)`. The doc comment is updated to match.
+- The prerequisites section of `README.md` and the ollama guidance in `docs/SMOKE.md` gain a one-line pointer to the recommended models (≤ 4B Q4: `gemma3:4b` / `qwen2.5:3b` / `phi3:mini`) and the timeout knob, warning up front about the timeout pattern when trying 8B+ models.
+- The LlmCfg literal in `crates/kebab-config/src/lib.rs::Config::defaults` gains `request_timeout_secs: default_llm_request_timeout_secs()` plus a one-line comment with the CPU-only recommendation.
+
+**Not done (out of scope), closure update**:
+- ~~`crates/kebab-parse-image/src/ocr.rs::REQUEST_TIMEOUT` is the same hard-coded 300 s. OCR is usually short, so it hurts less than the LLM case, but for consistency revisit it next round with the same knob (or a separate one).~~ → **closure**: see the 2026-05-25 v0.17.2 OCR timeout entry below (new separate knob `[image.ocr] request_timeout_secs`, PR #164).
+- ~~Emphasize the `kebab ask --stream` (fb-33) recommendation: it surfaces the first token quickly during a 5-minute cold start, a UX improvement. Follow up with one extra line in README/SKILL.md.~~ → **closure**: already added in all three places (README, SMOKE, SKILL.md) by PR #163 (v0.17.1 cut): the cold-start recommendation paragraph at `README.md:22`, the examples at `docs/SMOKE.md:45/209`, and the usage guide at `SKILL.md:114/119`. The not-done marker in this entry was outdated.
+
+**Follow-up dogfooding baseline preserved**: `/build/cache/dogfood-v017/` (466 MB workspace + DB + memory.log), `/build/cache/ollama/` (21 GB binary + gemma3:4b/gemma4:e4b models). Kept for regression comparison in the next round.
+
+Cross-link: `crates/kebab-config/src/lib.rs::LlmCfg::request_timeout_secs`, `crates/kebab-llm-local/src/ollama.rs::OllamaLanguageModel::new`.
+
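+## 2026-05-25 — v0.17.2: `[image.ocr] request_timeout_secs` knob (closure of v0.17.1 not-done item, PR #164)
+
+Closes the first not-done item of the v0.17.1 entry. Applies to the OCR adapter the same pattern that freed the LLM side via `[models.llm] request_timeout_secs` in v0.17.1. Reason for a separate knob (user decision): OCR is usually shorter than LLM calls and has a different cold-start pattern, so the two knobs must be independently adjustable to make it easy to give only the vision model a different timeout on 16 GB / CPU-only hosts.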
+
+**Changes**:
+- Additive `request_timeout_secs: u64` field on `crates/kebab-config/src/lib.rs::OcrCfg` (`#[serde(default = "default_ocr_request_timeout_secs")]`, default `300`). Old configs that omit the field still parse and behave identically (3 new unit tests pin default / env override / legacy parse).
+- Env override `KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS`.
+- Removed the `REQUEST_TIMEOUT` constant from `crates/kebab-parse-image/src/ocr.rs`. The `OllamaVisionOcr::build` signature gains `request_timeout_secs: u64`, and `new(&Config)` passes `config.image.ocr.request_timeout_secs`. `from_parts` (a test-only surface) gets the same signature extension; all 9 caller call sites (`crates/kebab-parse-image/src/ocr.rs::tests`: 5 tests / 6 call sites; `crates/kebab-parse-image/tests/ocr.rs::from_parts_clamps_max_pixels_into_legal_range`: 1 test / 3 call sites) now pass an explicit `300`.
+- `OcrCfg::defaults()` gains `request_timeout_secs: default_ocr_request_timeout_secs()`. `Config::defaults()` goes through `ImageCfg::defaults()`, so the value cascades.
+
+**Same edge case**: `0` does not disable the timeout; it means "time out immediately" (the reqwest semantics of `Duration::from_secs(0)`). The OcrCfg field doc carries the same note as the LlmCfg doc comment.
+
+**User impact**: existing v0.17.x KBs and configs need no changes. The new field is filled by the serde default and behavior is unchanged (300 s). If a vision model has a long cold start, set `KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS=600` or `[image.ocr] request_timeout_secs = 600` in the config.
+
+Cross-link: `crates/kebab-config/src/lib.rs::OcrCfg::request_timeout_secs`, `crates/kebab-parse-image/src/ocr.rs::OllamaVisionOcr::build`.
+
+## 2026-05-25 — v0.17.2: `heading_path` FTS5 column filter (text-only matching, closure of 2026-05-24 `heading_path_json` noise, PR #165)
+
+Closes the `heading_path_json` JSON noise that the v0.17.0 Korean trigram tokenizer adoption entry (2026-05-24, below) left unfixed. The trigram tokenizer 3-gram-indexes the JSON punctuation (`[`, `"`, `,`) in the `chunks_fts.heading_path` column (the V002/V007 triggers INSERT `chunks.heading_path_json` as-is) along with the path segments inside it (`app`, `src`), so queries can coincidentally produce false-positive hits. User decision (column filter vs. converting headings to plain text): **column filter**. The `heading_path` index stays exactly as in V007 verbatim, and only the match target is restricted to the `text` column. No V008 migration and no change to the design §5.5 verbatim block are needed.
+
+**Changes**:
+- `crates/kebab-search/src/lexical.rs::build_match_string` wraps the combined expression in `text : (<expr>)` in the non-raw branch. FTS5 column filter syntax (`column:expr`) accepts OR/AND sub-expressions, so the Korean trigram builder's `(whole) OR (token_and)` form goes in unchanged.
+- Raw mode (`'...'`) is unchanged: a user can deliberately opt in to an explicit column filter such as `'heading_path : agent'` (escape hatch).
+- 9 unit tests (8 updated + 1 new) + 2 new integration tests (`crates/kebab-search/tests/lexical.rs`) = 11 total:
+  - `build_match_string_*`: 8 expected strings updated (column filter prefix added)
+  - `build_match_string_raw_mode_preserves_heading_filter`: new unit test; raw mode preserves `heading_path : ...`
+  - `lexical_heading_only_token_does_not_hit_default_mode`: new integration test; a heading-only unique token yields 0 hits in default mode
+  - `lexical_raw_mode_can_opt_into_heading_path_filter`: new integration test; the same fixture hits in raw mode
+- Added one bullet on column scoping and the heading_path raw-mode escape hatch to the search section of `integrations/claude-code/kebab/SKILL.md` (reflects the round 1 review follow-up suggestion; included in this PR).
+
+**User impact**:
+- Blocks false positives in default lexical / hybrid search where only the heading matched. Recall for Korean / English substring matching is unchanged (tokens present in the text body still hit), and body-search precision goes up.
+- Users who deliberately searched by heading enter raw mode with `'heading_path : <token>'`. This is the same on every surface: CLI / TUI / MCP.
+- No change in `kebab.sqlite` size (the indexed columns are kept as-is). No re-ingest needed (only the match scope at FTS query time changes).
+- BM25 score impact: both `lexical_snapshot_run_1` and `hybrid_snapshot_run_1` keep identical scores after the column filter (those queries matched only the text body, so the filter does not affect the score distribution). No fixture regeneration needed.
+
+**MCP / agent visibility**: no change to the `search_response.v1` wire shape. For users / agents who intend to search headings, one bullet on column scoping and the heading_path raw-mode escape hatch was added to the search section of `integrations/claude-code/kebab/SKILL.md` (round 1 follow-up). The new escape hatch (`'heading_path : <token>'`) builds on the same single-quote opt-out pattern as v0.17.0 raw mode (`'foo OR bar*'`): not a new surface, just use of a documented column filter.
+
+Cross-link: `crates/kebab-search/src/lexical.rs::build_match_string`, `migrations/V007__fts_trigram.sql` (kept verbatim), design §5.5 (kept verbatim; only query-time behavior changes).
+
 ## 2026-05-24 — v0.17.0: Korean trigram FTS5 tokenizer adopted (closure of 2026-05-22 Korean lexical)
 
 The V007 migration switches the `chunks_fts` tokenizer from `unicode61` to `trigram`. The `chunks` source rows, embeddings, and vector index are untouched; only the FTS shadow is rebuilt with an automatic backfill, so users do not need to rerun `kebab ingest` (swap the binary and V007 applies on the next open). The other two follow-ups from the same round (`code_lang_chunk_breakdown`, C typedef) ship as separate PRs (PR-C / PR-B).
@@ -26,7 +85,7 @@ V007 migration 으로 `chunks_fts` 의 tokenizer 를 `unicode61` → `trigram`

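 **Disk usage**: a trigram index is typically 2-10x the size of a unicode61 index. After the V007 automatic backfill the `kebab.sqlite` file grows (roughly 2-5x, or several hundred MB, on the dogfooding KB). Called out in the release notes.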

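-**`heading_path_json` JSON noise (observed, not fixed)**: trigram 3-gram-indexes JSON punctuation (`[`, `"`, `,`) and the words inside it (`app`, `src`), so a query can coincidentally overlap JSON syntax or common path words and produce false positives. v0.17.0 keeps the column layout; after dogfooding, decide between a column filter (restricted to `{text} : <q>`) and converting headings to plain text. To be filed as a follow-up dogfooding entry.
+**`heading_path_json` JSON noise (observed, not fixed)**: trigram 3-gram-indexes JSON punctuation (`[`, `"`, `,`) and the words inside it (`app`, `src`), so a query can coincidentally overlap JSON syntax or common path words and produce false positives. v0.17.0 keeps the column layout; after dogfooding, decide between a column filter (restricted to `{text} : <q>`) and converting headings to plain text. To be filed as a follow-up dogfooding entry. → **closure**: see the 2026-05-25 v0.17.2 heading text column filter entry above (column filter approach adopted, no V008 migration needed, PR #165).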

 **MCP / agent visibility**: additive `hint: Option<String>` field on `search_response.v1`. Set only when the result is empty, the query's `trimmed.chars().count()` is below 3, and raw mode is not in use (helper `kebab_app::short_query_hint`). The search section of `integrations/claude-code/kebab/SKILL.md` gains a note: "for Korean lexical search use 3 or more characters; check the `hint` field".
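
The three conditions for setting `hint` can be illustrated with a small sketch. The real helper is `kebab_app::short_query_hint`, whose signature and message text are not shown in this diff, so `short_query_hint_sketch`, `is_raw_mode`, the `hit_count` parameter, and the hint wording below are all hypothetical.

```rust
/// Assumed threshold: the trigram tokenizer's 3-character minimum.
const MIN_TRIGRAM_CHARS: usize = 3;

/// Raw mode: the trimmed query is wrapped in one pair of single quotes.
fn is_raw_mode(trimmed: &str) -> bool {
    trimmed.len() >= 2 && trimmed.starts_with('\'') && trimmed.ends_with('\'')
}

/// Hypothetical stand-in for the real helper. Returns a hint only when the
/// result is empty, the query is under 3 characters, and raw mode is off.
fn short_query_hint_sketch(query: &str, hit_count: usize) -> Option<String> {
    let trimmed = query.trim();
    // Count characters, not bytes, so a 2-syllable Korean word counts as 2.
    let too_short = trimmed.chars().count() < MIN_TRIGRAM_CHARS;
    if hit_count > 0 || !too_short || is_raw_mode(trimmed) {
        return None;
    }
    // Placeholder wording; the actual message is not part of this diff.
    Some(format!(
        "query is shorter than {} characters; try a longer keyword",
        MIN_TRIGRAM_CHARS
    ))
}
```

Counting with `chars()` rather than `len()` matters here: `충돌` is 6 bytes in UTF-8 but only 2 characters, which is why it falls under the threshold while `충돌은` does not.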

--- a/tasks/INDEX.md
+++ b/tasks/INDEX.md
@@ -148,6 +148,18 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.
  - p10-2 Tier 2 resource-aware — ✅ merged (v0.14.0, `k8s-manifest-resource-v1` / `dockerfile-file-v1` / `manifest-file-v1`)
  - p10-3 Tier 3 paragraph + line-window fallback — ✅ merged (v0.15.0, `code-text-paragraph-v1`)

+  ### 🎯 P10 Dogfooding Feedback (v0.17.0)
+
+  Three follow-ups found in dogfooding round 2 (2026-05-22). spec + plan: `docs/superpowers/specs/2026-05-22-korean-trigram-tokenizer-design.md`, `docs/superpowers/plans/2026-05-22-korean-trigram-tokenizer.md`. release: [v0.17.0](https://gitea.altair823.xyz/altair823-org/kebab/releases/tag/v0.17.0).
+
+  - **PR-A Korean trigram FTS5 tokenizer + lexical builder + hint** — ✅ merged (#159, 2026-05-24). `chunks_fts` moves from `unicode61` to `trigram` via the V007 migration. `lexical.rs::build_match_string` redesigned to be trigram-aware (whole-phrase OR token-AND, tokens under 3 characters dropped, raw FTS5 mode kept). Additive `SearchResponse.hint` field + CLI/TUI guidance. English lexical search also changes to substring matching.
+  - **PR-B C typedef alias unit + parser_version cascade** — ✅ merged (#160, 2026-05-24). `type_definition` branch: a synthetic unit named after the alias of a top-level typedef-wrapped anonymous struct/enum/union. `PARSER_VERSION` bumped `code-c-v1` → `code-c-v2` + same-workspace_path orphan purge cascade.
+  - **PR-C `code_lang_chunk_breakdown` additive wire field** — ✅ merged (#161, 2026-05-24). Sister field on `schema.v1.stats` that aggregates chunk counts + corrected the JSON schema descriptions of the existing `code_lang_breakdown` / `repo_breakdown` (misstated "chunk count" → "doc count").
+
+  **v0.17.1 post-dogfood polish** (release: [v0.17.1](https://gitea.altair823.xyz/altair823-org/kebab/releases/tag/v0.17.1)):
+  - **PR #162 `[models.llm] request_timeout_secs` config + recommended-model guide** — ✅ merged (2026-05-25). Knob to avoid the 5-minute hard timeout during CPU inference with 8B+ models. Additive serde default + env override + 0-edge doc. README + SMOKE gain a paragraph recommending ≤4B Q4 models for CPU-only / ≤16GB RAM hosts.
+  - **PR #163 install ollama without sudo + recommend ask --stream (docs only)** — ✅ merged (2026-05-25). README + SMOKE gain the tarball + OLLAMA_MODELS env install pattern and a recommendation to use progressive tokens for models with a long cold start (p9-fb-33 surface).
+
 ## Post-merge hotfixes
 
 Bugs found after merge and their follow-up PRs are recorded as a dated log in [HOTFIXES.md](HOTFIXES.md). The original task specs stay frozen; HOTFIXES.md is the source of truth for post-merge behavior changes.
Author	SHA1	Message	Date
altair823	1640ecf288	chore: bump version 0.17.1 → 0.17.2 v0.17.1 post-dogfood polish cut. Releases two PRs together: - PR #164: separate `[image.ocr] request_timeout_secs` knob (closure of the v0.17.1 not-done item). Applies the LLM pattern to the OCR adapter as a separate knob (independently adjustable because OCR and LLM cold-start patterns differ). - PR #165: `heading_path` FTS5 column filter for text-only matching + raw-mode escape hatch (closure of the JSON noise from the 2026-05-24 v0.17.0 trigram entry). lexical.rs wraps the non-raw branch result in `text : (<expr>)`; the index itself stays as in V007 verbatim. Opt in via raw mode `'heading_path : <token>'`. Both are additive (old configs compatible) and need no re-ingest; just swap the binary. Added a v0.17.2 entry to the HANDOFF one-line summary and the post-merge decisions section. The anchors of the two HOTFIXES entries change from `post-v0.17.1 dogfood` to `v0.17.2`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:55:50 +00:00
altair823	90e77631a8	Merge pull request 'feat(search): heading_path FTS5 text column filter' (#165 ) from feat/heading-text-column-filter into main	2026-05-25 05:48:22 +00:00
altair823	fa251db48f	chore(search): address PR #165 round 2 review The MCP / agent visibility paragraph of the HOTFIXES entry contradicted the round 1 decision to add to SKILL.md (it wrongly stated `no separate SKILL.md update needed`). Now states the update and makes explicit that the new escape hatch builds on the v0.17.0 raw mode pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:45:41 +00:00
altair823	3114c31841	chore(search): address PR #165 round 1 review - Corrected the test count wording in HOTFIXES: the arithmetically ambiguous `9 new / updated unit tests` becomes `9 unit tests (8 updated + 1 new) + 2 new integration tests = 11 total`. - Added one bullet on column scoping + the heading_path raw-mode escape hatch to the search section of SKILL.md (Claude Code integration). Reflects the round 1 follow-up suggestion, so agents intending heading search can discover the new escape hatch `'heading_path : <token>'`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:44:21 +00:00
altair823	271329efbd	feat(search): heading_path FTS5 text column filter (default text-only matching) Closes the heading_path_json JSON noise (HOTFIXES 2026-05-24) that the v0.17.0 trigram tokenizer entry left unfixed. The trigram tokenizer 3-gram-indexes the JSON punctuation in the chunks_fts.heading_path column (the V002/V007 triggers INSERT chunks.heading_path_json as-is) along with the path segments inside it (app, src), so queries can coincidentally produce false-positive hits. Column filter adopted: the heading index is kept (V007 verbatim unchanged) and only the match target is restricted to the text column. - build_match_string wraps the combined expression in `text : (<expr>)` in the non-raw branch. FTS5 column filter syntax accepts OR/AND sub-expressions. - Raw mode (`'...'`) is unchanged: a user can deliberately opt in explicitly, e.g. `'heading_path : agent'` (escape hatch). - Updated the expected strings of 8 existing build_match_string unit tests + new `build_match_string_raw_mode_preserves_heading_filter`. - New regression pin `lexical_heading_only_token_does_not_hit_default_mode` (a heading-only unique token yields 0 hits in default mode). - New `lexical_raw_mode_can_opt_into_heading_path_filter`: the same fixture hits in raw mode (pins the escape hatch behavior). User impact: body precision of lexical / hybrid search goes up. No change in recall (text body token matching is identical). No re-ingest needed (only matching at FTS query time changes). lexical_snapshot_run_1 + hybrid_snapshot need no fixture regeneration either (those queries match the text body, so BM25 is identical). HOTFIXES: marked the `heading_path_json` noise item of the 2026-05-24 v0.17.0 entry as closed + added a new 2026-05-25 post-v0.17.1 dogfood entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:40:51 +00:00
altair823	f2867540d2	Merge pull request 'feat(ocr): request_timeout_secs config knob' (#164 ) from feat/ocr-timeout-config into main	2026-05-25 05:14:27 +00:00
altair823	e118844256	chore(ocr): PR #164 회차 1 리뷰 반영 - HOTFIXES 헤더 `v0.17.2` (vaporware) → `post-v0.17.1 dogfood` 로 변경, release tag 결정과 무관하게 정확한 anchor. - HOTFIXES caller 수 `6 (5+3)` → `9 call site (6+3)` 으로 정정. - OcrCfg.request_timeout_secs doc 의 edge case 가 LlmCfg sister doc 과 동일한 구체 예제 (`u64::MAX`, `86400`) + reqwest 0.12.x 명시 주석으로 강화. - LLM + OCR 양쪽의 legacy TOML fixture (78 줄 거의 동일) 를 module-level `LEGACY_PRE_TIMEOUT_TOML` const 로 추출. 두 test 가 동일 source 공유 → 옛 schema 가 또 변하면 한 곳만 수정. reqwest::Duration::ZERO fact-check (회차 1 점 5) 는 회차 2 reply 에서 검증 결과 보고. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:13:09 +00:00
altair823	41c5edc517	feat(image.ocr): request_timeout_secs config knob + closure of v0.17.1 open items Applies to the OCR adapter the same pattern by which v0.17.1 (PR #162) turned the LLM-side hard-coded 300s into [models.llm] request_timeout_secs. By user decision it is a separate knob ([image.ocr] request_timeout_secs): OCR has a different cold start pattern from the LLM, so independent tuning is more convenient. - OcrCfg.request_timeout_secs: u64 (serde default 300) - KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS env override - Added a timeout argument to the OllamaVisionOcr::build / from_parts signatures - Removed the REQUEST_TIMEOUT constant - 3 new unit tests (default / env / legacy parse), following the LlmCfg pattern as-is - Closed both open items of the HOTFIXES 2026-05-25 v0.17.1 entry (OCR timeout = this PR, --stream docs = already done in PR #163) No impact on existing configs / old KBs: the new field is filled with its default and behavior is identical (300s). If a vision model has a long cold start, the timeout can be raised via env or config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:06:53 +00:00
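The default / env override / legacy config behavior listed in commit 41c5edc517 can be illustrated with a small std-only sketch. The helper name `resolve_ocr_timeout`, its signature, and the fallback when the env value does not parse are assumptions for illustration; the real OcrCfg uses a serde default and is not reproduced here. Only the 300-second default and the `KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS` variable name come from the commit message.

```rust
use std::time::Duration;

/// Default taken from the commit message (serde default 300).
const DEFAULT_OCR_REQUEST_TIMEOUT_SECS: u64 = 300;

/// Hypothetical resolver: a parseable env value wins over the config value,
/// and a missing config key (legacy config) falls back to the default.
/// Assumption: an unparseable env value is ignored rather than treated as an error.
fn resolve_ocr_timeout(config_value: Option<u64>, env_value: Option<&str>) -> Duration {
    let secs = env_value
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .or(config_value)
        .unwrap_or(DEFAULT_OCR_REQUEST_TIMEOUT_SECS);
    Duration::from_secs(secs)
}
```

A caller could feed it the real environment with `std::env::var("KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS").ok().as_deref()`; taking the env value as a parameter keeps the function deterministic in tests.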
altair823	d02149c010	docs(v0.17.1): HANDOFF + INDEX, v0.17.1 cut sync - HANDOFF one-line summary v0.17.0 → v0.17.1 + added the release URL (v0.17.1 cut: notes that PR #162 + #163 ship as one bundle). - Post-merge deviations section: added the 2026-05-25 v0.17.1 entry. - Added a 'v0.17.1 post-dogfood polish' subsection at the bottom of the INDEX P10 Dogfooding Feedback section (one line each for PR #162 and #163). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:35:58 +00:00
altair823	0c69b9621b	chore: bump version 0.17.0 → 0.17.1 v0.17.1 patch release, cut after merging the two v0.17.0 post-dogfood follow-up PRs. - PR #162: [models.llm] request_timeout_secs config + recommended model guide - PR #163: guide for installing ollama without sudo + kebab ask --stream UX recommendation Both are additive only (config field) + docs only: no wire-breaking change, no impact on existing users. patch bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:34:12 +00:00
altair823	0d69d85757	Merge pull request 'docs: install ollama without sudo + recommend ask --stream (v0.17.0 post-dogfood)' (#163) from docs/ollama-install-and-stream into main Reviewed-on: #163	2026-05-25 03:26:24 +00:00
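altair823	a67300317b	docs(ollama): guide for installing without sudo + recommend ask --stream (v0.17.0 post-dogfood) Moves two patterns used during the extended dogfooding into README + SMOKE. (1) Install ollama into an isolated directory without sudo / systemd: download the tarball, unpack it into a user directory such as /opt/ollama/{bin,models,logs}, and separate the model location with the OLLAMA_MODELS env. Useful in environments with restricted root privileges such as containers / WSL2 / company machines. The same pattern was verified on the dogfooding machine with /build/cache/ollama. (2) For models with a long cold start (8B+ or the first call), `kebab ask --stream` is recommended: even with the same inference time, progressive tokens surface quickly within the 5-minute timeout limit. States the p9-fb-33 streaming path explicitly as a UX improvement recommendation. No code changes; docs only. The same sub-bullet + bash snippet appear in both README and SMOKE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:23:35 +00:00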
altair823	abb05ebc23	Merge pull request 'feat: [models.llm] request_timeout_secs config + recommended model guide' (#162) from feat/llm-timeout-config into main Reviewed-on: #162	2026-05-25 03:21:19 +00:00
altair823	26fdc4f344	docs(llm-timeout): document the 0-as-disable pitfall + HOTFIXES typo + terminology cleanup Addresses the PR #162 worker review. - MEDIUM (W2) + LOW (W1): in reqwest, request_timeout_secs = 0 does not mean disable but an instant timeout (every request fails immediately). Stated in three places (LlmCfg field rustdoc + ollama.rs module-level comment + README) + recommends a large finite value such as u64::MAX / 86400. - NIT (W1): typo '답변이 인 5분' → '답변이 5분' in the HOTFIXES 2026-05-25 entry (1 character removed). - NIT (W1): unified the internal jargon '확장 도그푸딩' (extended dogfooding) in README + HOTFIXES to '후속 도그푸딩' (follow-up dogfooding). No change in code behavior; doc only. cargo test request_timeout 3 PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:14:41 +00:00
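The pitfall documented in commit 26fdc4f344 could also be caught in code. The sketch below is hypothetical: the commit only adds documentation, and the helper name `checked_request_timeout` and its error message are invented for illustration. It rejects 0 (which the commit describes as an instant timeout rather than "disabled") and otherwise converts the configured seconds into a `Duration`.

```rust
use std::time::Duration;

/// Hypothetical guard, not part of the described change.
/// Assumption (from the commit message): a zero timeout makes every request
/// fail immediately, so it is rejected instead of being read as "no timeout".
fn checked_request_timeout(secs: u64) -> Result<Duration, String> {
    if secs == 0 {
        return Err(
            "request_timeout_secs = 0 is an instant timeout, not 'disabled'; \
             use a large finite value such as 86400"
                .to_string(),
        );
    }
    // Large finite values, including u64::MAX, are valid for Duration::from_secs.
    Ok(Duration::from_secs(secs))
}
```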
altair823	3f5e0e6e90	feat(llm): [models.llm] request_timeout_secs config + recommended model guide Bundles two findings from the v0.17.0 extended dogfooding (2026-05-25) into one PR. (1) Pulled the hard-coded 300s timeout of llm.generate_stream out into a config knob. 8B+ models (gemma4:e4b etc.) could not finish the first RAG answer within 5 minutes in a CPU-only environment and failed with `error: kb-rag: llm.generate_stream`. - Added request_timeout_secs: u64 as an additive field on kebab-config::LlmCfg (#[serde(default = "default_llm_request_timeout_secs")], default 300). An old config that omits the key still parses as-is and behaves identically. - env override KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS. - Removed the REQUEST_TIMEOUT constant in kebab-llm-local::ollama.rs → OllamaLanguageModel::new builds the reqwest client with Duration::from_secs(llm.request_timeout_secs). The doc comment is updated to match. - 3 new unit tests: default 300 pin / env override / legacy config (field missing) backward-compat. (2) docs: one paragraph in the README prerequisites section + the docs/SMOKE.md ollama notes: CPU only / RAM ≤ 16 GB environments ⇒ ≤ 4B Q4 models recommended (gemma3:4b / qwen2.5:3b / phi3:mini). Advance notice of the timeout pattern when trying 8B+. How to use the request_timeout_secs knob. HOTFIXES 2026-05-25 entry: records the two changes above + open items (the same hard-coded 300s in the kebab-parse-image OCR is listed as an out-of-scope follow-up + a follow-up to emphasize the ask --stream recommendation). workspace cargo test -j 1 + clippy pass. The code change is backwards-compat (additive serde field), so there is no impact on existing users. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:01:03 +00:00
altair823	578a60e3bb	docs(v0.17.0): HANDOFF, version + PR-A/B/C closure entries (R1) - One-line summary v0.16.1 → v0.17.0 + release notes URL + one-line summaries of PR-A/B/C. - Post-merge deviations section: added 2026-05-24 closure entries for PR-B / PR-C in addition to PR-A. - P10 round 2 follow-up line under 'Next task candidates': all three items marked as v0.17.0 closure. - The chunk_breakdown + C typedef items under 'P10 dogfooding backlog' are also marked ✅ v0.17.0 closure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:55:47 +00:00
altair823	64f518e08e	docs(v0.17.0): HANDOFF + INDEX, v0.17.0 cut sync (R1) - HANDOFF one-line summary v0.16.1 → v0.17.0, release notes URL, one-line summaries of all three of PR-A/B/C. Added PR-B / PR-C closure entries to the post-merge deviations section. Marked the three items under "Next task candidates" + "P10 backlog" as ✅ v0.17.0 closure. - New "P10 Dogfooding Feedback (v0.17.0)" subsection at the bottom of the INDEX P10 section, listing the 3 PR-A/B/C items (format recommended in Gemini round 2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:54:39 +00:00