v0.18.0 cut PR. fb-41 multi-hop RAG + NLI verification 의 user-visible surface (PR #176-180) + post-PR9 cleanup/refactor (PR #181) ship 마무리.
## 변경 사항
### Version
- workspace `Cargo.toml`: 0.17.2 → 0.18.0. Cargo.lock 자동 cascade (24 kebab-* crate 모두 0.18.0).
### Frozen design contract
- `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`:
- §3.8 RAG types — RefusalReason 에 NliVerificationFailed + NliModelUnavailable + MultiHopDecomposeFailed 추가 + Multi-hop RAG + NLI verification 의 ask_multi_hop facade + step 8.5 NLI hook + HopRecord / VerificationSummary 명시.
- §9 versioning rules 표 — nli_model_version row 신규 (선택 — v0.19+ second adapter 시 wire surface candidate).
### Status transitions
- `docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md`: status approved-by-team → completed.
- `docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md`: status approved-by-team → completed (spec_status 도).
### User-facing docs
- `README.md`: 명령 표의 `kebab ask` row 에 `--multi-hop` flag + NLI 옵션 안내 한 단락 (mDeBERTa-v3 XNLI 280 MB 자동 다운로드 / RAM peak ~7-8 GB / threshold tuning 0.5 prod / 0.0 disable).
- `docs/SMOKE.md`: `[rag] nli_threshold = 0.0` config 예시 + 활성화 절차 + first-run download + RAM 권장 inline 안내.
### Handoff + dashboard
- `HANDOFF.md`: 한 줄 요약 의 현재 version 0.17.2 → 0.18.0. v0.18.0 cut entry 추가 (fb-41 multi-hop + NLI + cleanup ship). Component 카운트 단락에 fb-41 PR-9 의 kebab-nli + ask_multi_hop 추가 명시. 머지 후 결정 절 맨 위에 v0.18.0 fb-41 entry 신규.
- `tasks/INDEX.md`: p9-fb-41 ⏳ → ✅ 머지 (v0.18.0). v0.18.0 subsection 신규 — PR #176-181 의 6 sub-PR + cleanup 각 한 줄 요약.
## 비범위 / 별 작업
- HOTFIXES.md 의 fb-41 entry 는 이미 PR #180 (PR-9d closure) 에서 작성 완료 — 본 cut PR 에서 추가 anchor 불필요.
- SKILL.md 의 v0.18+ NLI 안내는 이미 PR-9c-2 에서 inline 추가 완료.
## 검증
- `cargo check --workspace -j 1` 통과 (모든 24 crate v0.18.0 확인).
- frozen design 의 RefusalReason enum 확장이 kebab-core 의 production code 와 정합 (PR-9c-1 시점부터 동일 variants 있음).
Wire 영향: 없음 (additive minor 는 PR-9c-1 에서 이미 ship, 본 commit 은 documentation cascade only).
Behavior 영향: 없음.
머지 후 `gitea-release v0.18.0` 으로 tag + release notes 작성.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.17.1 post-dogfood polish cut. 두 PR 묶어 release:
- PR #164 — `[image.ocr] request_timeout_secs` 별 노브 (v0.17.1
미진행 closure). LLM 패턴을 OCR 어댑터에 동일 적용, 별 노브로
분리 (OCR vs LLM 의 cold start 패턴 차이로 독립 조절).
- PR #165 — `heading_path` FTS5 column filter 로 text-only 매칭
+ raw-mode escape hatch (2026-05-24 v0.17.0 trigram entry 의
JSON 노이즈 closure). lexical.rs 가 non-raw 분기 결과를
`text : (<expr>)` 로 wrap, 색인 자체는 V007 verbatim 그대로
유지. raw mode `'heading_path : <token>'` 로 opt-in 가능.
둘 다 additive (옛 config 호환) + re-ingest 불필요. binary 교체만.
HANDOFF 한 줄 요약 + 머지 후 결정 절에 v0.17.2 entry 추가.
HOTFIXES 의 두 entry anchor 가 `post-v0.17.1 dogfood` → `v0.17.2`
로 갱신.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.17.1 patch release — v0.17.0 post-dogfood follow-up 두 PR 머지 후.
- PR #162: [models.llm] request_timeout_secs config + 권장 모델 가이드
- PR #163: sudo 없이 ollama 설치 가이드 + kebab ask --stream UX 권장
둘 다 additive only (config field) + docs only — wire breaking 없음,
기존 사용자 영향 없음. patch bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.17.0 release cut — PR-A (한국어 trigram FTS tokenizer + lexical
builder + hint surface) + PR-B (C typedef alias unit + parser_version
cascade + orphan purge) + PR-C (code_lang_chunk_breakdown additive
wire field) 셋 머지 후.
Breaking changes:
- V007 migration (chunks_fts unicode61 → trigram) — chunks 원본 /
embedding / vector 불변, FTS shadow 자동 backfill. 사용자는 다음
open 시 V007 즉시 적용 (re-ingest 불필요). kebab.sqlite 파일 크기
~2-5배 또는 수백 MB 증가.
- 영어 lexical 검색이 substring 매칭으로 동작 변경 (token →
tokenization/tokenizer 도 hit, recall ↑ / 단어 경계 ↓).
- C parser_version code-c-v1 → code-c-v2 (typedef alias 추출
cascade). 같은 file 의 옛 doc/chunks/vector 는 same-workspace_path
orphan purge 가 자동 정리.
Additive (backwards-compat):
- SearchResponse.hint additive field — 한국어 2자 query 등 trigram
비호환 시 안내.
- schema.v1.stats.code_lang_chunk_breakdown additive field — chunk
단위 언어별 분포.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Patch bump — bug fix only (P10 dogfood-discovered k8s multi-resource
chunk_id collision). New binary needed to resume dogfooding. No wire
schema change, no DB migration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minor bump — additive new chunker_versions code-c-ast-v1 + code-cpp-ast-v1
+ new routing langs c / cpp + new tree-sitter-c / tree-sitter-cpp workspace
deps. P10 Tier 1 chunker family complete. No DB migration, no wire schema
major bump.
Also lands the missing p10-3 try_skip_unchanged fallback-aware fix (Option
B1 — 7th param) that PR #155 was supposed to ship but never made it to main
(implementer reported commit SHA 2a39513 that didn't exist in the merged
branch). Same commit extends tier3_fallback_cv to include c/cpp.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standard crate names resolved cleanly: tree-sitter-c v0.24.2 and
tree-sitter-cpp v0.23.4 are both compatible with workspace tree-sitter 0.26.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minor bump — additive new chunker_version "code-text-paragraph-v1" + new
routing lang "shell" + new Tier 1/2 → Tier 3 fallback wrapper behavior.
No DB migration, no wire schema major bump (Citation::Code.lang values
remain a free string field).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bare tree-sitter-kotlin v0.3.8 requires tree-sitter >=0.21,<0.23 which
conflicts with the workspace's tree-sitter 0.26 (links = "tree-sitter"
is a singleton). tree-sitter-kotlin-ng v1.1.0 (from
tree-sitter-grammars/tree-sitter-kotlin) uses the tree-sitter-language
0.1 shim which is compatible with tree-sitter 0.26. Using
tree-sitter-kotlin-ng as the Kotlin grammar crate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dogfood-discovered file-deletion auto-purge (PR #148) lands. minor bump
사유: additive wire field IngestReport.purged_deleted_files + 새 CLI
summary surface (purged N) + 새 사용자-가시 동작 (rm a.md 후 ingest 시
자동 정리). design §10.4 도그푸딩-ready surface 확장 트리거.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dogfood-discovered routing additions (PR #147) land:
- .mts / .cts → MediaType::Code(typescript)
- .mdx → MediaType::Markdown
minor bump 사유: 사용자 도그푸딩 surface 확장 — 이전에 skip 되던 28+ 파일이
이제 색인됨. design §10.4 dogfooding-ready surface 확장 = minor trigger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dogfood-discovered fix (PR #146) lands: idempotent re-ingest now correctly
returns Unchanged for twin files (identical content at different paths)
via document-centric try_skip_unchanged lookup.
patch bump 사유: advertised idempotency 의 정상 동작 복원. 새 wire / config / surface 변경 없음.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dogfood-discovered fixes (PR #145) land in production:
- schema.v1.repo_breakdown 가 실제로 채워짐 (이전: 항상 빈 BTreeMap)
- workspace.include glob 가 walker 에서 enforce 됨 (이전: 완전 무시)
patch bump 사유: 둘 다 advertised surface 의 정상 동작 복원.
새 wire / config / surface 변경 없음.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dogfood-discovered code_lang/repo filter bug (PR #144) fix lands in
production. patch bump because:
- 1A-1 advertised CLI flags --code-lang / --repo were live but inert
(SearchFilters fields propagated but never applied to retriever SQL)
- fix restores intended behavior; no new wire surface
- user has dogfooded against httpx + zod + lodash and re-validating
needs the fixed binary
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tasks 5-8: new `kebab-parse-code` crate with three infrastructure modules
for the code ingest framework. Ships lang.rs (extension→language identifier
mapping), repo.rs (.git walk-up via gix 0.70 for RepoMeta), and skip.rs
(BUILTIN_BLACKLIST, is_generated_file, is_oversized). 14 integration tests
across three test files, all passing; clippy -D warnings clean.
Note: gix pinned to 0.70 (not 0.83 as originally suggested) because 0.83
fails to compile against Rust 1.94.1 due to non-exhaustive match patterns
in gix-hash. 0.70 resolves cleanly and has identical head_name/head_id API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Opaque base64(JSON{offset, corpus_revision}). Mismatch or
malformed input returns ErrorV1 with code = stale_cursor.
base64 promoted to workspace dep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fb-32 머지로 wire schema 가 search_hit.v1 / citation.v1 의 required
필드를 두 개 (indexed_at, stale) 확장 — additive minor 로 분류했지만
strict validator 입장에서는 한 번 깨진 셈이라 minor bump.
surface 변경 (사용자 도그푸딩 영향):
- 모든 search hit / RAG citation 의 wire JSON 에 indexed_at (RFC3339) +
stale (bool) 두 필드 추가
- CLI plain 출력 — stale doc 의 doc_path 옆에 [stale] tag (TTY = 노란색)
- TUI Search/Inspect/Ask pane — stale doc 의 doc_path 좌측에 [STALE] 배지
(Theme::Warning role)
- config.toml [search] stale_threshold_days 신규 (default 30, 0 = 비활성)
- env KEBAB_SEARCH_STALE_THRESHOLD_DAYS
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
agent integration MVP (fb-30 MCP server stdio) 머지 후 patch bump.
trigger:
- 신규 CLI surface `kebab mcp` (additive, 기존 명령 동작 무영향)
- new crate `kebab-mcp` (lib only, transitive 만)
- capability flag `mcp_server: false → true` (additive on schema.v1)
- design §10.2 MCP transport 절 추가 (additive subsection)
CLAUDE.md release 규약상 wire schema additive / surface 추가는
minor bump 영역이지만, pre-1.0 (0.x.y) 단계에서는 fb-26~31 group
누적 시까지 0.3.x patch 로 묶고, 큰 jump 시 0.4.0 cut 으로 운용.
fb-30 단독으로 break 없는 additive 라 0.3.1 patch 적정.
후속 fb-26 / 28 / 31 머지 시점에 추가 0.3.x patch 누적, breaking
schema 변경 시 0.4.0 minor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empty lib + serve_stdio entry that bails until Task 3 wires rmcp. Adds
rmcp 1.6 to workspace dependencies (server + macros + transport-io +
schemars features) + tokio multi-thread/io-util/io-std local extensions.
schemars declared as "1" (resolved to 1.2.1) — matches rmcp 1.6's ^1.0
requirement (verified via crates.io /dependencies; plan literal was 0.9
which would conflict). Path-style refs for kebab-app / kebab-config /
kebab-core follow workspace convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.1.0 release tagged on 2026-05-05 — first official release covering
P0~P4 + P5 + P6~P7 + P9 (UI) 의 도그푸딩 사이클 1회 완성.
이후 도그푸딩 follow-up (p9-fb-21 ~ p9-fb-24) 까지 v0.1.0 에 포함:
- p9-fb-22: TUI input cursor mid-string editing + Ask follow-tail.
- p9-fb-23: incremental ingest (skip unchanged docs).
- p9-fb-24: TUI status/key bar + Library 컬럼 헤더 + PgUp/PgDn.
다음 dev cycle 부터 0.2.0. minor bump rationale (semver pre-1.0):
- Wire schema additive (`IngestReport.unchanged`, `IngestEvent` 의
`Unchanged` variant) — backward-compat 유지 but 신규 surface.
- 신규 CLI flag (`--force-reingest`).
- 신규 SQLite migration V006 (incremental ingest 컬럼).
- TUI surface 변경 (status bar 항상 노출, Library 헤더 row, PgUp/PgDn,
cursor 편집).
- IngestOpts struct 도입 (AskOpts 패턴) — 기존 wrapper 보존하므로
caller breaking 은 없음.
기존 ~723 워크스페이스 테스트 무수정 통과 (CARGO_PKG_VERSION 가 env!
로 동적 — TUI status bar 테스트 자동 추적).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
도그푸딩 item 15 — TUI / 같은 process 안에서 동일 query 반복 시 SQLite
FTS + Lance + RRF 재계산이 매번 발생하던 비용 해소. in-process LRU
캐시 + 모노토닉 corpus_revision 카운터로 ingest commit 발생 시 모든
entry 자동 stale.
## 핵심 변경
- **SQLite V004 migration**: `kv (key TEXT PRIMARY KEY, value TEXT)
STRICT` + `corpus_revision = '0'` seed. 미래의 다른 scalar 도 같은
테이블에 들어갈 수 있는 generic shape.
- **`SqliteStore::corpus_revision()` / `bump_corpus_revision()`** —
`UPDATE ... CAST AS INTEGER + 1` atomic. INSERT-OR-IGNORE 도 함께
실행 (V004 seed 가 무슨 이유로 누락된 케이스 paranoid).
- **`kebab-app::ingest_with_config_cancellable`** — `new + updated > 0`
시 bump, no-op (skipped-only) reingest 는 cache 보존.
- **`App.search_cache: Option<Mutex<LruCache<SearchCacheKey, Vec<
SearchHit>>>>`** — `config.search.cache_capacity` (default 256, 0
비활성). `lru = "0.12"` workspace dep 추가.
- **`SearchCacheKey`** = `query_norm` (NFKC + trim + lowercase) +
`mode` + `k` + `snippet_chars` + `embedding_version` (vector/hybrid
만, lexical 은 빈 문자열) + `chunker_version` + `corpus_revision`
snapshot.
- **`App::search`** rewrite — cache 활성 시 lookup → miss 면 기존
`search_uncached` 호출 후 put. cache 비활성이거나 lock 실패면
straight-line.
- **`App::search_uncached`** (rename of pre-fb-19 `search` body) +
`search_uncached_with_config` facade — CLI `kebab search --no-cache`
로 진입.
- **`Config.search.cache_capacity: usize`** field, `#[serde(default)]`
로 기존 config 호환.
- **CLI `--no-cache`** flag — 디버깅용 (CLI 는 매 호출이 새 process
라 사실상 no-op 이지만 spec 명시 + 향후 long-lived process 호환).
- **frozen design §9 versioning** 표에 `corpus_revision` row 추가
(기존 `index_version` 라벨과 다른 차원: 라벨은 retrieval 형상,
corpus_revision 은 ingest commit ack).
## 테스트
- `kebab-store-sqlite` 신규 3 unit (fresh=0, monotonic bump, persist
across reopen)
- `kebab-app` 신규 4 integration (cached repeat 같은 hits, NFKC 정규화
로 case/whitespace collapse, --no-cache parity, first ingest bumps
corpus_revision)
- 워크스페이스 전체 `cargo test --workspace --no-fail-fast -j 1` exit 0
- `cargo clippy --workspace --all-targets -- -D warnings` clean
## 문서
- README `kebab search` 행: 캐시 동작 + `--no-cache` 안내 + corpus_
revision 무효화 메커니즘
- docs/SMOKE.md `[search]` 절에 `cache_capacity` 라인 추가
- HANDOFF: 2026-05-03 entry
- spec status planned → in_progress
## Out of scope
- patch-and-merge incremental (RRF 정규화 전체 hit set 기준이라 어려움)
- SQLite 영속 cache (P+)
- 다른 process 간 cache 공유 (in-process 만 — corpus_revision 이
cross-process 무효화는 O(1))
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
새 crate `kebab-tui` 가 §8 facade rule 따라 `kebab-app` 만 import.
Ratatui 0.28 + crossterm 0.28 기반 shell 이 다음을 제공:
- `App` 구조체: config + focus + library + 3 Option sub-state slot
(search/ask/inspect — p9-2/3/4 가 자기 모듈에서 채우는 parallel-safety
contract). p9-1 외에 App 정의 손대지 않음.
- `Pane` enum (Library/Search/Ask/Inspect/Jobs).
- `KeyOutcome` (Continue/Quit/SwitchPane/Refresh).
- `LibraryState` + 내부 inner: docs / list_state / filter / filter_edit /
needs_refresh / loading / pending_g.
- `render_library` (Frame, area, &App) — heading/body, filter overlay
toggleable, Korean/wide-char 너비는 unicode-width 로 계산.
- `handle_key_library`: j/k/Down/Up 이동, gg/G 끝, f 필터 overlay,
/=>Search ?=>Ask Enter=>Inspect, q/Esc 종료. error overlay 가 켜
있으면 어떤 키든 dismiss.
- 필터 overlay: tags_any (CSV) + lang 두 필드, Tab cycle, Enter
apply→Refresh, Esc cancel.
- `ErrorOverlay`: anyhow chain 캡쳐 후 popup 렌더 (Clear + 빨간 border).
- 터미널 lifecycle: `TuiTerminal` 가 enter raw mode + alt screen,
Drop 이 종료 시 (panic 포함) restore — 사용자 쉘 깨지지 않게.
- 비동기 없음: facade 호출은 main thread 동기. v1 의 brief hang 수용.
CLI: `kebab tui` 서브커맨드 추가, --config 받아 App::new + run.
테스트 10건 (`tests/library.rs`, TestBackend 사용):
- 빈 library / 3-doc render / q,Esc quit / / Search 전환 / ? Ask 전환
- Enter 빈 list 무동작 / Enter Inspect 전환 / j 이동 (3-step clamp) /
f 필터 overlay → 입력 → Enter Refresh.
Test seam: `App::populate_library_for_testing` (#[doc(hidden)]) 가
`pub(crate)` inner 를 우회. spec parallel-safety contract 그대로 유지.
Spec deviation (HOTFIXES `2026-05-02 P9-1`):
- `render_library` 의 `<B: Backend>` generic 제거 — ratatui 0.28 의 Frame
이 backend-agnostic.
- `populate_library_for_testing` 추가 (test seam, 공식 API 아님).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`PdfTextExtractor`(MediaType::Pdf) lopdf 기반 per-page 텍스트 추출.
페이지마다 `Block::Paragraph` + `SourceSpan::Page { page, char_start, char_end }`
emit. 본문이 비거나 추출 panic 인 페이지는 빈 paragraph + `Provenance::Warning`
("scanned candidate") 로 표시 — 이후 OCR fallback (별도 task) 의 입력.
핵심 동작:
- `lopdf::Document::load_mem` + `is_encrypted()` → 암호화 PDF 는 명시 에러
(`qpdf --decrypt` 안내).
- 페이지 단위 `extract_text(&[page])` 를 `catch_unwind` 로 감싸 malformed
page panic 을 recoverable warning 으로 변환.
- `/Info` dict 에서 Title/Producer/Creator best-effort 추출. UTF-16BE BOM
prefixed 문자열도 디코드 (한국어 등 non-ASCII Title 정상 처리).
- 9개 통합 테스트: 3-page emit, scanned-mixed warning, encrypted refuse,
corrupt header error, page_count 메타, UTF-16BE Title, filename
fallback, determinism, snapshot.
`parser_version = "pdf-text-v1"`. Allowed deps: `lopdf 0.32` + `pdf-extract 0.7`
(원본 spec 그대로). 본문 다국어 OCR fallback 은 §9.2 후속 task (out of scope).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev/test profile 의 DWARF 디버그 정보 양을 줄이고 별도 .dwo 파일로
분리해서 target/ 디스크 사용량 절감.
## 동작 영향
없음. line-tables-only 는 line 번호는 유지하고 column 번호만 뺌
(backtrace 함수+라인 다 보임). split-debuginfo 는 symbol 정보를
별도 파일에 둘 뿐 binary codegen / opt-level 동일. 즉 컴파일러가
만드는 코드는 byte-identical.
release profile 안 건드림 → CI / `cargo run --release` 는 upstream
default 와 동일.
## Cargo.toml 코멘트 rename leftover 도 같이 정리
워크스페이스 root `Cargo.toml` 의 dep 코멘트 5 곳에 `kb-eval` /
`kb-store-sqlite` / `kb-store-vector` / `kb-rag` / `kb-llm-local`
옛 이름 남아있던 거 `kebab-*` 로 갱신 (rename PR #29 sweep 이
crates/ + docs/ + tasks/ 만 대상으로 해서 빠진 부분).
## 검증
- `cargo build --workspace` clean (fresh build).
- `cargo test --workspace --no-fail-fast -j 1` clean (모든 테스트
green; 0 FAILED).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P4 terminal task. Implements the user-facing payoff: retrieve →
score gate → pack → render → generate → cite-validate → persist.
After this commit, `kb ask` actually works against an Ollama
backend; the pipeline grounds the answer in retrieved chunks and
refuses cleanly when the gate trips or the model self-judges.
New crate kb-rag:
- pub struct RagPipeline { retriever, llm, docs, config } — all
Arc<dyn Trait + Send + Sync> so the pipeline shares + Sync.
- pub fn ask(query, opts) -> Result<Answer> drives the nine-stage
flow per spec §1.
- pub struct AskOpts { k, explain, mode, temperature, seed,
stream_sink: Option<mpsc::Sender<String>> }. k acts as a floor
over config.search.default_k so a low-k caller can't starve
retrieval (documented in field doc).
Pipeline stages:
1. Retrieve via the injected dyn Retriever.
2. Score gate: empty hits → NoChunks refusal (no LLM call); top-1 <
config.rag.score_gate → ScoreGate refusal (no LLM call) with
top-3 candidates listed in the synthesized answer text.
3. Pack: budget = config.rag.max_context_tokens.saturating_sub
(prompt overhead). Per-hit `[#n] doc=… heading=… span=…\n<text>`
with deterministic enumeration. If every hit's chunk is
unfetchable from the store (deleted between search and pack),
fall back to NoChunks refusal with a tracing::warn rather than
feeding an empty [근거] to the LLM.
4. Render rag-v1 prompt with the spec's verbatim Korean system
string + `[질문]/[근거]` user template.
5. Generate via dyn LanguageModel. Single-thread token loop owns
the iterator; tokens optionally forward to opts.stream_sink (a
`mpsc::Sender<String>`). SendError silently dropped — caller
cancellation never panics the pipeline. After Done the loop
reads (acc, finish_reason, usage) in lockstep with no race.
max_completion = llm.context_tokens().saturating_sub
(used_for_input).max(64) — explicitly NOT capped by
config.rag.max_context_tokens (that's the packing budget for
[근거], not the LM completion ceiling).
6. Citation extract via STRICT regex `\[#(\d{1,3})\]` (compiled
once via OnceLock). Loose forms `[1]`, `[ #1 ]`, `[#foo]`,
`[#1234]`, `vec![1]` are all rejected to prevent prose
false-positives.
7. Citation validate covers four cases:
- unknown marker (e.g. `[#7]` when only 3 packed) →
LlmSelfJudge refusal.
- empty answer with hits → LlmSelfJudge.
- non-empty + no marker + matches `근거 (가|이) 부족` regex →
LlmSelfJudge (model self-refused with the canonical phrase;
phrase match logged via tracing::debug for observability).
- non-empty + no marker + no refusal phrase → LlmSelfJudge
(silent ungrounded answers are still refusals).
- non-empty + ≥1 valid marker → grounded = true.
8. Build Answer per kb_core::Answer shape:
- citations: filter packed list to exactly the markers cited.
Wire format `marker: Some("[1]")` (square-bracketed bare
index) per design §2.3, distinct from the prompt-side
`[#n]` grammar.
- embedding ModelRef: read from config.models.embedding for
Vector/Hybrid; None for Lexical. Documented deviation since
the Retriever trait doesn't expose the embedder. For
ScoreGate/NoChunks refusals on Vector/Hybrid the embedding
model is still recorded — the vector retriever WAS consulted
even when the gate tripped.
- TraceId minted as `ret_<8-hex>` from blake3(query, top_score,
model_id, ns).
- retrieval AnswerRetrievalSummary populated.
- usage from the final Done chunk; latency_ms wall-clock
fallback when the LLM reports zero.
- created_at OffsetDateTime::now_utc().
9. Persist via SqliteStore::put_answer (new inherent method on
SqliteStore, not on the DocumentStore trait — answers aren't
documents and adding to kb-core was forbidden). Always inserts,
refusals included. packed_chunks_json is null unless
opts.explain == true.
kb-store-sqlite extension:
- pub fn put_answer(&Answer, query, packed_chunks_json) ->
Result<AnswerId>. Maps all 22 fields of the answers table per
V001 schema in a single INSERT under a transaction.
kb-app::ask wired:
- bail!("not yet wired (P4-3)") replaced with a real body that
builds the retriever per opts.mode (Lexical | Vector | Hybrid),
instantiates OllamaLanguageModel from config, constructs
RagPipeline, calls pipeline.ask. AskOpts moves to kb-rag and is
re-exported via `pub use kb_rag::AskOpts` so kb-cli's
`use kb_app::AskOpts` keeps working.
- kb-app/Cargo.toml gains kb-rag, kb-llm, kb-llm-local. P3-5's
forbids on these are lifted by P4-3 spec — kb-app is the
orchestrator and ask requires both the trait crate and the
Ollama adapter.
- kb-cli/main.rs's AskOpts literal updated with stream_sink: None
for the CLI path (TUI in P9 will plumb a real sink).
Tests (kb-rag: 18; kb-app: 1 ignored):
- 3 unit in src/pipeline.rs: marker regex strictness (rejects all
loose forms with byte-equal expectations), Send+Sync compile
check, embedding_ref_for behavior across modes.
- 15 integration in tests/pipeline.rs covering every spec test row
+ the new "all chunks unfetchable falls back to NoChunks" guard:
empty-hits, score-gate, grounded happy path, unknown-marker,
prose-`[1]` rejection, `vec![1]` rejection, refusal-phrase,
packing-budget overflow, streaming-forwards-to-mpsc, dropped-
receiver-no-panic, usage-from-final-Done, answers-row-inserted-
for-each-refusal-kind, determinism temp=0 seed=0, Answer JSON
shape, unfetchable-chunks-fall-back-to-no-chunks (the new
M3 test).
- kb-app/tests/ask_smoke.rs: 1 #[ignore]'d real-Ollama smoke that
drives the wired ask end-to-end against `localhost:11434`.
Workspace: 319 passed / 26 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean.
Allowed deps respected (kb-core, kb-config, kb-search, kb-llm,
kb-store-sqlite, serde, serde_json, regex, time, tracing,
thiserror) plus forced waivers anyhow (Retriever / LanguageModel
trait return types) and blake3 (TraceId minting). Forbidden
(kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-
vector direct, kb-embed* direct, kb-llm-local direct, kb-tui,
kb-desktop) all absent from `cargo tree -p kb-rag` — concrete
adapters reach the pipeline only through trait objects.
Out of scope: reranker between retrieve and pack (P+), multi-turn
chat memory (P+), LLM-as-judge eval (P5 uses rule-based
must_contain), --json streaming (buffers per §0 Q5 hybrid).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>