P6-1 / P6-2 / P6-3 의 라이브러리 파이프라인 (`ImageExtractor`,
`OllamaVisionOcr`, `apply_caption`) 이 모두 머지되어 있지만
`kebab-app::ingest_with_config` 의 dispatch 가 markdown 만 처리하므로
CLI 에서 이미지 자산이 색인되지 않는 미완 구간 존재. 본 spec 은
그 wiring 을 별도 component task 로 잡아 P6-1/2/3 의 frozen contract
는 보존하고 통합만 본 task 의 contract 로 진행되게 한다.
핵심 결정 (사용자 brainstorming 반영):
- 청킹 옵션 A — `kebab-chunk::md_heading_v1` 에 image-only document
분기 추가, 단일 합성 청크 emit.
- 청크 텍스트 포맷 (β) — `<alt>\n\n<ocr.joined>\n\n<caption.text>`
plain concatenation. 라벨 없음. 빈 부분 drop.
- 실패 정책 (b) Lenient — extract 성공이면 doc 저장, OCR/caption
부분 실패는 Provenance Warning + `errors` 카운터 미증가.
- LM 인스턴스 — ingest 세션당 1회 빌드, `&dyn LanguageModel` 공유.
- 책 / 스캔 PDF — P6-4 scope 외, P7 PDF 라인이 책임.
- P6-5 (image-scale-hardening) 미시작 — 사용자 시나리오가
\"다이어그램 / 스크린샷 / 카메라 사진\" 으로 좁아져 불필요.
INDEX.md: P6 \"3 components\" → \"4 components\".
contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
sections: §3.4 ImageRefBlock, §6.1 ingest pipeline, §7.2
Extractor/Chunker traits, §9.1 image extraction policy.
- 새 모듈 `crates/kebab-parse-image/src/image_prep.rs` — OCR + caption
+ 향후 PDF/video 가 공유할 단일 다운스케일 헬퍼 (`downscale_to_png`)
추출. 기존 ocr.rs / caption.rs 의 거의 동일 알고리즘 두 벌을 한
곳으로 통합. 1px 후행 클램프 / PNG passthrough hot path / 에러
메시지 패턴이 한 곳에서 관리됨.
- src/ocr.rs: `downscale_to_long_edge` 제거 → `image_prep::downscale_to_png`
호출. `image::ImageReader / ImageFormat / Cursor` import 도 정리.
- src/caption.rs:
• `caption_image` / `apply_caption` 의 disabled 처리 비대칭 해소.
`caption_image` 는 raw 연산 (gate 없음), `apply_caption` 만
`cfg.image.caption.enabled` 게이트 검사. 호출자가 같은 함수에서
같은 의미를 얻음.
• `apply_caption` 의 caption.model / model_version `String::clone`
2회 → 0회. caption move 전에 ProvenanceEvent.note 를 먼저 빌드.
• 다운스케일 로직 통째로 image_prep 위임.
• `MIN_CAPTION_LONG_EDGE` / `MAX_CAPTION_LONG_EDGE` 를 `pub const`
로 노출 (P6-2 의 `MAX_DECODE_DIM` 가시성 컨벤션과 일관).
- tests/caption.rs:
• `caption_image_errors_when_feature_disabled` 를
`caption_image_runs_regardless_of_enabled_flag` 로 교체 — 새
책임 분리 의미 검증.
• `caption_image_clamps_oversized_max_pixels` 가 literal 1536 대신
`kebab_parse_image::caption::MAX_CAPTION_LONG_EDGE` 상수 참조.
- tasks/HOTFIXES.md: `model_version` 형태 deviation 한 단락 추가
(spec literal `provider` → `<provider>/<prompt_template_version>`
확장 + 사유).
cargo test -p kebab-parse-image — 42 pass + 2 ignored
(13 unit + 12 P6-1 + 8 P6-2 + 9 P6-3).
cargo clippy --workspace --all-targets -- -D warnings — pass.
- src/ocr.rs:
• `OllamaVisionOcr` 에 `#[derive(Debug)]` 추가 (test 의 expect_err
바운드 충족용; reqwest::blocking::Client 도 Debug 구현).
• 신규 unit 테스트 3건 (`build_rejects_empty_endpoint`,
`build_rejects_empty_model_after_trim`,
`build_clamps_max_pixels_outside_legal_range`) — 회차 2 에서
추가된 `fn build` 가드의 회귀 신호.
- src/lib.rs:
• 모듈-레벨 doc-comment 에 OCR 트러스트 정책 한 줄 추가
(\"LLM-driven default can hallucinate; OcrText.engine carries
source identity\"). lib 사용자가 ocr 모듈 doc 까지 안 들어가도
의도 캐치 가능.
cargo test -p kebab-parse-image — 31 pass + 1 ignored
(11 unit + 12 P6-1 integration + 8 P6-2 integration).
cargo clippy -p kebab-parse-image --all-targets -- -D warnings — pass.
- src/ocr.rs:
• `OllamaVisionOcr::new` 와 `from_parts` 의 입력 검증을 공통
`fn build` 으로 통합. 두 생성자가 빈 endpoint / 빈 model /
`max_pixels` 클램프 동일 invariant 를 공유 — \"테스트는 통과하지만
프로덕션은 panic\" 분기 차단.
• `max_pixels` clamp 가 실제로 발동 시 `tracing::warn!` 로 사유
기록 (사용자가 \"왜 항상 4096?\" 디버깅 가능).
• `downscale_to_long_edge` 의 long-axis 가 `f32` 라운딩으로 1px
초과하는 코너 케이스 (예: max=1601, long=4001) 후행 클램프로
엄격히 묶음. doc-comment 의 \"long edge is at most max_long_edge\"
가 실제 동작과 정확히 일치.
- tests/ocr.rs:
• 통합 테스트의 이중 게이트 (`#[ignore]` + `KEBAB_OCR_INTEGRATION=1`)
제거. `--ignored` 만으로 실행 의도 단일 신호화 — `kebab-llm-local`
의 통합 테스트 컨벤션과 일관됨. endpoint / model 의 env 오버라이드는
유지.
cargo test -p kebab-parse-image — 28 pass + 1 ignored.
cargo test -p kebab-config — 21 pass.
cargo clippy --workspace --all-targets -- -D warnings — pass.
- crates/kebab-config/src/lib.rs:
• `OcrCfg.endpoint: String` (\"\" sentinel) → `Option<String>` 으로 교체.
`#[serde(default)]` 적용. `KEBAB_IMAGE_OCR_ENDPOINT=\"\"` (빈 값) 도
None 으로 매핑하는 분기 추가.
• 신규 회귀 테스트 `image_ocr_endpoint_empty_env_value_is_none`.
- crates/kebab-parse-image/src/ocr.rs:
• `OllamaVisionOcr::new` 의 endpoint fallback 로직을 새 `Option<String>`
스키마에 맞춰 정리 (`as_deref` + match).
• `OllamaGenerateResponse` 의 dead `_other: HashMap<String, Value>` 필드
제거. `serde_json::Value` import 도 같이 정리.
• `OllamaGenerateRequest.images: Vec<&'a str>` → `[&'a str; 1]`
(호출당 vec! 알로케이션 제거, multi-image 는 OcrEngine trait 가
단일 이미지를 받으므로 OOS).
• `downscale_to_long_edge` 단일-디코드로 리팩터. PNG passthrough
hot path 보존 (header sniff 만으로 분기), 그 외 모든 경로는
decode 1회 + (필요 시) resize + PNG re-encode 1회로 통일.
• `pub fn max_pixels(&self) -> u32` accessor 추가 — clamp 결과
검증 용 (단순 inspector).
- crates/kebab-parse-image/tests/ocr.rs:
• `cfg_for_endpoint` / 통합 테스트가 `Some(endpoint)` 형태로 갱신.
• `from_parts_clamps_max_pixels_into_legal_range` 가 새 accessor
로 실제 클램프 결과 (256 / 4096 / 1024) 를 검증하도록 강화.
• 통합 테스트가 폰트 부재 시 panic 대신 skip 하도록 분기.
- crates/kebab-parse-image/tests/common/mod.rs:
• `hello_world_png` 가 `anyhow::Result<Vec<u8>>` 반환하도록 변경.
expect(\"DejaVu Sans Bold required\") 메시지를 \"only the opt-in
OCR integration fixture needs this font\" 로 의도 명확화.
cargo test -p kebab-parse-image — 28 pass + 1 ignored.
cargo test -p kebab-config — 21 pass (+1 회귀).
cargo clippy --workspace --all-targets -- -D warnings — pass.
Reviewer-suggested workspace.dependencies 통합 (reqwest / base64) 은
P6-3 와 함께 처리할 수 있도록 follow-up 으로 두고 본 PR scope 에서
제외 (회차 1 본문에서 명시).
- 새 모듈 `crates/kebab-parse-image/src/ocr.rs` 추가. spec 의 `OcrEngine`
trait 그대로 + `OllamaVisionOcr` default 구현 + `apply_ocr` 헬퍼.
- `OllamaVisionOcr`: `<endpoint>/api/generate` 비스트리밍 호출,
`images: [base64]` 필드로 이미지 전달, 프롬프트는 언어 힌트
+ 화이트리스트 언어 목록 포함. 응답 prose 를 `OcrText.joined` 로,
prepared image 전체 영역 단일 region (confidence 1.0) 으로 wrap.
기본 모델 `gemma4:e4b`. endpoint 비어 있으면 `models.llm.endpoint`
로 fallback.
- 이미지 전처리: long-edge `config.image.ocr.max_pixels` (기본 1600,
256~4096 클램프) 초과 시 PNG 로 재인코딩 (image::imageops::resize,
Triangle filter). PNG 입력이 max 이내면 zero-copy passthrough.
- `apply_ocr` 는 OCR 성공 시 block.ocr 를 Some 으로 채우고
ProvenanceKind::OcrApplied 이벤트 추가. 실패 시 block.ocr 는
None 그대로 + provenance 미기록 (부분 상태 누출 금지).
- `kebab-config`: 새 `ImageCfg.ocr: OcrCfg` 블록 (enabled/engine/model
/endpoint/languages/max_pixels). `#[serde(default)]` 로 pre-P6
TOML 호환. `KEBAB_IMAGE_OCR_*` 환경변수 5종 추가.
## Spec deviation
원래 P6-2 spec 은 Tesseract 를 default OCR 엔진으로 지정했으나, dev /
CI 호스트에서 `libtesseract-dev` 시스템 패키지 설치를 피하려고
Ollama-vision 으로 default 를 교체. `OcrEngine` trait 추상화는 spec
그대로 보존 — Tesseract / Apple Vision / PaddleOCR 어댑터는 같은
trait 으로 추후 feature-gate 추가 가능. 자세한 내역은
`tasks/HOTFIXES.md` 2026-05-02 항목 참조.
Trust 측면: vision LM 은 hallucinate 가능. `OcrText.engine = "ollama-vision"`
필드로 consumer 가 엔진 별 신뢰 분기 가능.
## 테스트
- 신규 (`tests/ocr.rs`, 8 + 1 ignored):
- 200 happy → OcrText 디코딩 (joined / engine / engine_version /
region count / bbox / confidence)
- 빈 응답 → 빈 regions
- 5xx → Err with status + body 포함
- 200 error envelope → Err
- apply_ocr → block.ocr Some + Provenance OcrApplied 1건
- apply_ocr error → block.ocr None 유지 + events 미기록
- 4000×3000 PNG → max_pixels=1024 까지 다운스케일, aspect ratio 보존
- from_parts max_pixels 클램프
- opt-in `KEBAB_OCR_INTEGRATION=1` 통합 (실제 192.168.0.47 Ollama
`gemma4:e4b` 로 \"Hello World 2026\" 전사 검증 완료)
- 신규 (`src/ocr.rs` unit): truncate, build_prompt 언어/힌트 처리
- `kebab-config` 테스트 +3: defaults, env override, pre-P6 TOML 호환
전체: `cargo test -p kebab-parse-image` 28 pass + 1 ignored,
`cargo test -p kebab-config` 20 pass,
`cargo clippy --workspace --all-targets -- -D warnings` pass.
contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
sections: §3.4 ImageRefBlock.ocr, §3.7a OcrText / OcrRegion, §9.1 OCR
vs caption provenance.
- src/exif_extract.rs:
• `gps_decimal` 에 ±90 / ±180 범위 검증 추가. 비정상 EXIF (예: 위도
300°) 가 들어와도 wire 에 흘러나가지 않고 silent drop.
• GPSLatitudeRef / GPSLongitudeRef 가 빠진 좌표는 양수 가정으로
내보내지 않고 None 반환 — 모호한 부호를 그대로 두는 대신 손상된
메타데이터로 처리.
• `read_from_container` 실패 시 `tracing::debug!` 한 줄로 사유 기록
(운영시 \"EXIF 없음\" vs \"EXIF 손상\" 구분 단서).
- src/dims.rs: `match Some/None` 을 `anyhow::Context::context()?` 로
압축. import 한 줄 추가.
- src/lib.rs: `Vec::with_capacity` 를 dim_warning 분기에 따라
`2` / `3` 으로 정확히 맞추고 의미 주석 한 줄 추가.
- tests/common/mod.rs: `build_exif_blob_gps` 를 `GpsFlavor`
파라미터로 일반화 (`Valid` / `NoRef` / `OutOfRange`). JPEG 스플라이스
로직은 `splice_exif_into_jpeg` 헬퍼로 추출.
- tests/extractor.rs: 회귀 테스트 2건 추가 — `*Ref` 누락 좌표 드롭,
out-of-range 위도 드롭 (경도는 정상 통과 검증).
cargo test -p kebab-parse-image — 16건 (4 unit + 12 integration) pass.
cargo clippy -p kebab-parse-image --all-targets -- -D warnings — pass.
Codebase-specific guidance only (build/test/clippy commands, the
facade rule, spec contract location, allowed/forbidden deps,
versioning cascade, wire schema, post-merge HOTFIXES pattern,
smoke testing, naming/paths, Gitea remote). Defers to README.md
for the high-level overview, dependency graph, phase roadmap,
and directory tree rather than restating them.
Generated by `/init` skill against the current main (post-rename).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev/test profile 의 DWARF 디버그 정보 양을 줄이고 별도 .dwo 파일로
분리해서 target/ 디스크 사용량 절감.
## 동작 영향
없음. line-tables-only 는 line 번호는 유지하고 column 번호만 뺌
(backtrace 함수+라인 다 보임). split-debuginfo 는 symbol 정보를
별도 파일에 둘 뿐 binary codegen / opt-level 동일. 즉 컴파일러가
만드는 코드는 byte-identical.
release profile 안 건드림 → CI / `cargo run --release` 는 upstream
default 와 동일.
## Cargo.toml 코멘트 rename leftover 도 같이 정리
워크스페이스 root `Cargo.toml` 의 dep 코멘트 5 곳에 `kb-eval` /
`kb-store-sqlite` / `kb-store-vector` / `kb-rag` / `kb-llm-local`
옛 이름 남아있던 거 `kebab-*` 로 갱신 (rename PR #29 sweep 이
crates/ + docs/ + tasks/ 만 대상으로 해서 빠진 부분).
## 검증
- `cargo build --workspace` clean (fresh build).
- `cargo test --workspace --no-fail-fast -j 1` clean (모든 테스트
green; 0 FAILED).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
두 reviewer 의 should-fix 4 건 + nit 5 건 push 전 반영.
## should-fix
- `citation_coverage`: 빈 citations[] 가 `Iterator::all` vacuous-true 로
1.0 새는 거 차단 — `!is_empty() && all(non-empty path)` 로 변경.
또한 `_store: &SqliteStore` dead 인자 시그니처에서 제거 (호출 사이트
+ 테스트 helper 정리).
- `refusal_correctness`: lexical-only run 에서 `answer == None` 인 경우
분모 증가 안 함 (NaN/null 출력) — 자동 fail 처리하던 게 metric 의미를
왜곡함. 새 unit test `refusal_correctness_nan_for_non_rag_run` 추가.
- `groundedness`: `must_contain.is_empty() && forbidden.is_empty()`인
golden 은 분모에서 제외. unconfigured entry 가 free 1.0 받지 않게.
새 unit test `groundedness_skips_unconfigured_goldens` 추가.
- `kb-cli/Cargo.toml` rationale 코멘트 사실 오류 정정 — kb-eval →
kb-app 의존이지 그 반대 아님.
## nits
- `KB_EVAL_GOLDEN` / `DEFAULT_GOLDEN_PATH` 중복 — `metrics::` 의
`pub(crate)` 로 단일화, `runner` 가 import.
- `render_report_md` 의 `{:?}` `ComparisonKind` → 명시적 lowercase
매핑 함수 (`win`/`loss`/`draw`/`regression`) — JSON 직렬화 컨벤션과
통일.
- `extract_chunker_version` `None == None` 매치 silent 위험에 대한
defensive 코멘트.
- `delta_null_when_either_nan` 테스트의 `let mut` suppress hack →
struct update syntax 로 정리.
- `empty_store` test helper + 매번 `mem::forget(tmp)` 죽은 코드 제거.
## 추가 spec doc
`tasks/p5/p5-2-metrics-compare.md` deviations 섹션 4 항목 추가:
- `kb-eval` crate-level `kb-app` dep — P5-1 inheritance, 새 모듈 surface
는 import 안 함.
- `citation_coverage` 약화된 resolver — `document_exists_by_path` 기다리는
중.
- `refusal_correctness` non-RAG 런 NaN.
- `groundedness` no-check golden skip.
## 검증
- `cargo test -p kb-eval` 35/35 (18 unit + 2 loader + 8 integration + 7
runner; 새 3 unit test).
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `compare_report_snapshot_matches_fixture` 변경 없이 통과 — 새 동작이
스냅샷 입력 (lexical-only, no must_contain, no should-refuse) 영향 없음.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20
--config across subcommands, PR #24 --config in kb ask, PR #25 RRF
fusion_score normalization). README still framed the project as
"spec frozen, code 0 lines"; phase docs and task specs all carried
status: planned. Sweep:
- README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes
applied"; command table marks each subcommand's owning phase and
current status (kb ask = ✅ via P4-3, kb eval = ⏳ P5);
phase roadmap table grew a Status column (P0–P4 completed, P5
next, P6–P9 pending); component count bumped 30 → 31 to reflect
P3-5 (app-wiring, post-spec); core decisions table notes the
RRF [0,1] normalization invariant; build+실행 section drops the
"P0 미시작" caveat; new pointers to HOTFIXES.md and SMOKE.md.
- docs/SMOKE.md (new): ~80-line recipe for running the full
pipeline against an isolated /tmp/kb-smoke/ workspace via
--config, without polluting ~/.config/kb/ or
~/.local/share/kb/. Covers fixture seeding, sample config.toml
with the post-merge defaults, doctor → ingest → list →
search × 3 modes → inspect → ask sequence, verification
checklist, and known-behaviour notes (fastembed model
download, RAG response time, --config hard-fail on missing
path).
- tasks/phase-{0..4}-*.md: status frontmatter flipped planned →
completed.
- tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same
status flip across all 17 component task specs (1+6+2+5+3).
P5–P9 stay planned.
cargo test --workspace: 319 passed; clippy clean (no source
changes in this commit, just docs + frontmatter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Bug
config.rag.score_gate default 0.05 was incompatible with hybrid RRF
fusion_score: raw RRF tops out at num_retrievers / (k_rrf + 1) ≈
0.0328 at the default k_rrf=60, so every hybrid `kb ask` tripped
ScoreGate refusal even when the top hit was perfectly aligned across
both retrievers. Symptomatic on the post-P4-3 manual smoke at
/tmp/kb-smoke/ pointed at 192.168.0.47 Ollama:
$ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid
근거 부족. KB에 해당 내용 없음. # top fusion_score = 0.0164
Per-mode score_gate (lexical_score_gate / vector_score_gate /
hybrid_score_gate) was rejected because it forces every consumer
(CLI, eval, TUI) to know which mode picks which threshold. Score
normalization solves it at the source.
## Fix
crates/kb-search/src/hybrid.rs divides every fused score by
2 / (k_rrf + 1), the theoretical RRF maximum with two retrievers
each contributing rank 1. After normalization:
- both retrievers agree on rank 1 → fusion_score = 1.0
- only one retriever finds the chunk → caps near 0.5
- typical mixed ranks → falls between 0 and 0.5
RRF's rank-ordering invariants are preserved (every score divides
by the same positive constant), so sort + tiebreak behaviour is
unchanged. Wire schema label `fusion_score` keeps its slot in
RetrievalDetail; only the magnitude shifts, and only for hybrid
mode (lexical / vector were already in [0, 1]).
Verification: re-ran the four-scenario smoke at /tmp/kb-smoke/ with
default score_gate = 0.05 — all four (Korean→Korean, English→
English, cross-language Korean↔English, out-of-corpus) succeed
with the expected grounded / refusal classification, top
fusion_score now ≈ 0.5.
## Tests
One unit test (rrf_formula_matches_known_value) updated to expect
the normalized value `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of
the raw `1/61 + 1/62 ≈ 0.0325`. The integration snapshot fixture
crates/kb-search/tests/fixtures/search/hybrid/run-1.json already
used presence checks (fusion_score_positive: true) rather than
absolute values, so it doesn't need regeneration. Workspace 319
tests pass; clippy clean across both feature configs.
## Docs
This commit also adds tasks/HOTFIXES.md as a dated post-merge log
covering this fix and the two earlier --config-flag regressions
(P3-5 hotfix #20 across ingest/search/list/inspect/doctor; P4-3
follow-up #24 for kb ask). Original task specs in tasks/p<N>/
*.md stay frozen as the historical contract; HOTFIXES.md is the
live source of truth for post-merge deltas. Each affected task
spec gets a "Risks/notes" addendum pointing back to HOTFIXES.md
so a reader landing on the spec sees the active behaviour:
- tasks/INDEX.md gains a "Post-merge 핫픽스" section.
- tasks/phase-3-vector-hybrid.md updates the RRF formula text to
show the normalized form.
- tasks/p3/p3-4-hybrid-fusion.md "Behavior contract" RRF bullet
notes the normalization and reason.
- tasks/p3/p3-5-app-wiring.md "Risks/notes" notes the --config
fix.
- tasks/p4/p4-3-rag-pipeline.md "Risks/notes" notes the kb-ask
--config fix and the score_gate-RRF incompatibility (closed by
the normalization in p3-4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier P3-5 hotfix wired --config through ingest / search / list /
inspect / doctor by switching kb-cli to call the *_with_config
companions. P4-3 added the ask body but kb-cli's Cmd::Ask arm still
called bare kb_app::ask(query, opts) — same bug as before, ask
silently fell back to ~/.config/kb/config.toml regardless of what the
user passed.
Caught during the post-P4-3 smoke against /tmp/kb-smoke/ pointed at
192.168.0.47 Ollama with gemma4:26b: the answer's wire JSON reported
`model.id = "qwen2.5:14b-instruct"` (the user's XDG default) instead
of `gemma4:26b` from the explicit --config, plus the score_gate /
data_dir / model fields all reflected XDG defaults. After this fix
the same invocation correctly returns model.id=gemma4:26b,
embedding=multilingual-e5-small (from the smoke config), grounded=true
with `[#2]` citation pointing at rust/ownership.md.
Same minimal pattern as the P3-5 hotfix:
- Build the Config once via Config::load(cli.config.as_deref()).
- Call kb_app::ask_with_config(cfg, query, opts) instead of
kb_app::ask(query, opts).
Workspace 319 tests pass; cargo clippy --workspace --all-targets --
-D warnings clean.
Smoke verified across four scenarios:
- Korean→Korean-body lookup: grounded with rust/ownership.md citation.
- English→Korean-body cross-language: grounded with arch/rag-
architecture.md citation.
- Korean→English-body cross-language: grounded with arch/embeddings.md
citation.
- Out-of-corpus query: LlmSelfJudge refusal with "근거가 부족하다."
Out of scope (filed for follow-up):
- config.rag.score_gate default 0.05 is incompatible with hybrid
RRF scores. RRF top score is bounded by 2/k_rrf (≈0.033 at k_rrf
=60), so the spec default trips ScoreGate on every hybrid query.
Workaround: lower the gate to 0.005 in the user's config; long-
term fix needs either per-mode gate config or RRF score
normalization to [0,1]. Tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P4 terminal task. Implements the user-facing payoff: retrieve →
score gate → pack → render → generate → cite-validate → persist.
After this commit, `kb ask` actually works against an Ollama
backend; the pipeline grounds the answer in retrieved chunks and
refuses cleanly when the gate trips or the model self-judges.
New crate kb-rag:
- pub struct RagPipeline { retriever, llm, docs, config } — all
Arc<dyn Trait + Send + Sync> so the pipeline shares + Sync.
- pub fn ask(query, opts) -> Result<Answer> drives the nine-stage
flow per spec §1.
- pub struct AskOpts { k, explain, mode, temperature, seed,
stream_sink: Option<mpsc::Sender<String>> }. k acts as a floor
over config.search.default_k so a low-k caller can't starve
retrieval (documented in field doc).
Pipeline stages:
1. Retrieve via the injected dyn Retriever.
2. Score gate: empty hits → NoChunks refusal (no LLM call); top-1 <
config.rag.score_gate → ScoreGate refusal (no LLM call) with
top-3 candidates listed in the synthesized answer text.
3. Pack: budget = config.rag.max_context_tokens.saturating_sub
(prompt overhead). Per-hit `[#n] doc=… heading=… span=…\n<text>`
with deterministic enumeration. If every hit's chunk is
unfetchable from the store (deleted between search and pack),
fall back to NoChunks refusal with a tracing::warn rather than
feeding an empty [근거] to the LLM.
4. Render rag-v1 prompt with the spec's verbatim Korean system
string + `[질문]/[근거]` user template.
5. Generate via dyn LanguageModel. Single-thread token loop owns
the iterator; tokens optionally forward to opts.stream_sink (a
`mpsc::Sender<String>`). SendError silently dropped — caller
cancellation never panics the pipeline. After Done the loop
reads (acc, finish_reason, usage) in lockstep with no race.
max_completion = llm.context_tokens().saturating_sub
(used_for_input).max(64) — explicitly NOT capped by
config.rag.max_context_tokens (that's the packing budget for
[근거], not the LM completion ceiling).
6. Citation extract via STRICT regex `\[#(\d{1,3})\]` (compiled
once via OnceLock). Loose forms `[1]`, `[ #1 ]`, `[#foo]`,
`[#1234]`, `vec![1]` are all rejected to prevent prose
false-positives.
7. Citation validate covers four cases:
- unknown marker (e.g. `[#7]` when only 3 packed) →
LlmSelfJudge refusal.
- empty answer with hits → LlmSelfJudge.
- non-empty + no marker + matches `근거 (가|이) 부족` regex →
LlmSelfJudge (model self-refused with the canonical phrase;
phrase match logged via tracing::debug for observability).
- non-empty + no marker + no refusal phrase → LlmSelfJudge
(silent ungrounded answers are still refusals).
- non-empty + ≥1 valid marker → grounded = true.
8. Build Answer per kb_core::Answer shape:
- citations: filter packed list to exactly the markers cited.
Wire format `marker: Some("[1]")` (square-bracketed bare
index) per design §2.3, distinct from the prompt-side
`[#n]` grammar.
- embedding ModelRef: read from config.models.embedding for
Vector/Hybrid; None for Lexical. Documented deviation since
the Retriever trait doesn't expose the embedder. For
ScoreGate/NoChunks refusals on Vector/Hybrid the embedding
model is still recorded — the vector retriever WAS consulted
even when the gate tripped.
- TraceId minted as `ret_<8-hex>` from blake3(query, top_score,
model_id, ns).
- retrieval AnswerRetrievalSummary populated.
- usage from the final Done chunk; latency_ms wall-clock
fallback when the LLM reports zero.
- created_at OffsetDateTime::now_utc().
9. Persist via SqliteStore::put_answer (new inherent method on
SqliteStore, not on the DocumentStore trait — answers aren't
documents and adding to kb-core was forbidden). Always inserts,
refusals included. packed_chunks_json is null unless
opts.explain == true.
kb-store-sqlite extension:
- pub fn put_answer(&Answer, query, packed_chunks_json) ->
Result<AnswerId>. Maps all 22 fields of the answers table per
V001 schema in a single INSERT under a transaction.
kb-app::ask wired:
- bail!("not yet wired (P4-3)") replaced with a real body that
builds the retriever per opts.mode (Lexical | Vector | Hybrid),
instantiates OllamaLanguageModel from config, constructs
RagPipeline, calls pipeline.ask. AskOpts moves to kb-rag and is
re-exported via `pub use kb_rag::AskOpts` so kb-cli's
`use kb_app::AskOpts` keeps working.
- kb-app/Cargo.toml gains kb-rag, kb-llm, kb-llm-local. P3-5's
forbids on these are lifted by P4-3 spec — kb-app is the
orchestrator and ask requires both the trait crate and the
Ollama adapter.
- kb-cli/main.rs's AskOpts literal updated with stream_sink: None
for the CLI path (TUI in P9 will plumb a real sink).
Tests (kb-rag: 18; kb-app: 1 ignored):
- 3 unit in src/pipeline.rs: marker regex strictness (rejects all
loose forms with byte-equal expectations), Send+Sync compile
check, embedding_ref_for behavior across modes.
- 15 integration in tests/pipeline.rs covering every spec test row
+ the new "all chunks unfetchable falls back to NoChunks" guard:
empty-hits, score-gate, grounded happy path, unknown-marker,
prose-`[1]` rejection, `vec![1]` rejection, refusal-phrase,
packing-budget overflow, streaming-forwards-to-mpsc, dropped-
receiver-no-panic, usage-from-final-Done, answers-row-inserted-
for-each-refusal-kind, determinism temp=0 seed=0, Answer JSON
shape, unfetchable-chunks-fall-back-to-no-chunks (the new
M3 test).
- kb-app/tests/ask_smoke.rs: 1 #[ignore]'d real-Ollama smoke that
drives the wired ask end-to-end against `localhost:11434`.
Workspace: 319 passed / 26 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean.
Allowed deps respected (kb-core, kb-config, kb-search, kb-llm,
kb-store-sqlite, serde, serde_json, regex, time, tracing,
thiserror) plus forced waivers anyhow (Retriever / LanguageModel
trait return types) and blake3 (TraceId minting). Forbidden
(kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-
vector direct, kb-embed* direct, kb-llm-local direct, kb-tui,
kb-desktop) all absent from `cargo tree -p kb-rag` — concrete
adapters reach the pipeline only through trait objects.
Out of scope: reranker between retrieve and pack (P+), multi-turn
chat memory (P+), LLM-as-judge eval (P5 uses rule-based
must_contain), --json streaming (buffers per §0 Q5 hybrid).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First real LanguageModel implementation. Wraps Ollama's local HTTP
API at POST {endpoint}/api/generate with stream:true, parses the
NDJSON streaming response into TokenChunk events, and maps Ollama
error states to a thiserror-derived LlmError with actionable hints.
Synchronous trait surface; reqwest::blocking handles the HTTP I/O.
Public surface:
- pub struct OllamaLanguageModel
- pub fn new(config: &Config) -> Result<Self> — lazy connect; never
hits the network. Spec line 96.
- pub enum LlmError { Unreachable, ModelNotPulled, Timeout, Stream,
Malformed }. Lives in this crate per spec — kb-core / kb-llm stay
free of error taxonomy.
- impl kb_core::LanguageModel via re-export from kb-llm.
Streaming:
- POST body shape per spec §11.2: model, prompt = system + "\n\n" +
user, stream: true, options { temperature, seed, num_ctx, stop }.
- OllamaStream owns BufReader<reqwest::blocking::Response>, reads
NDJSON lines via read_until(b'\n'), parses each as
{response, done, done_reason?, prompt_eval_count?, eval_count?,
total_duration?}. Token frame → TokenChunk::Token; done frame →
TokenChunk::Done { finish_reason, usage }.
- done_reason mapping: "length" → Length, "abort" → Aborted,
"stop" / missing / unknown → Stop (forward-compat with future
Ollama tags).
- Missing prompt_eval_count / eval_count default to 0 + tracing::warn
(do NOT fail). Spec line 135.
- EOF without a done line synthesizes Done { Aborted, zeros } so
downstream pipelines never deadlock waiting for a terminal frame.
- UTF-8: line-delimited framing means each JSON line is a complete
UTF-8 sequence — no cross-HTTP-chunk codepoint splits to worry
about. read_until accumulates whole lines regardless of how the
underlying reqwest body chunks.
Error mapping (LlmError):
- reqwest::Error::is_connect() → Unreachable { endpoint, source }
with hint "ensure `ollama serve` is running and reachable at
<endpoint>".
- reqwest::Error::is_timeout() → Timeout.
- 200 with non-NDJSON first line (e.g., transparent-proxy HTML
error page) → Stream(truncated body) — distinguished from
Malformed by the iterator's has_emitted flag.
- 404 with body containing model_id (case-insensitive) OR English
"model" + "not found" → ModelNotPulled(model_id) with hint
"ollama pull <model_id>". Tightened beyond spec to survive
Ollama localizing the error message (Korean / Japanese / etc.)
while keeping the original English-substring fallback.
- Other 4xx/5xx → Stream(truncated body).
- Mid-stream JSON parse failure (after at least one valid line) →
Malformed(line). Truncate all error bodies to 512 chars
(chars-based, multibyte safe) so an nginx 500 page can't blow up
the diagnostic.
- Trailing slash in endpoint stripped before formatting the URL —
endpoint = "http://x:1234/" produces .../api/generate, not
.../api//generate. Pinned by trailing-slash test.
Tokio note: reqwest 0.12's blocking feature internally wraps a
private current-thread tokio runtime, so cargo tree --edges normal
shows tokio. The auditable invariant is "no top-level tokio dep +
no async surface exposed to callers" — verified: src/ has zero
async/await/tokio::*. default-features = false drops default-tls
(rustls only) but does NOT drop tokio. Documented honestly in
Cargo.toml + lib.rs. Switching to ureq would remove tokio
entirely; deferred since reqwest is the spec's allowed dep.
Tests (24 total: 23 default + 1 ignored):
- 7 unit in src/ollama.rs: prompt-build, options-build, finish-
reason mapping, truncate_body bounds (under_cap / over_cap_marker
/ multibyte_chars_not_bytes), 404+model-id heuristic.
- 3 in tests/construction.rs: ModelRef shape, context_tokens
passthrough, lazy-connect proven via port-1 pointing.
- 13 in tests/streaming.rs: streamed tokens then Done, multibyte
chars within a line round-trip (renamed from "split across
chunks" to honestly reflect what's tested), Unreachable-with-
hint, 4xx→Stream, 404→ModelNotPulled, concat-equals-canned,
done_reason length / abort, missing eval counts default to zero,
missing done_reason defaults to Stop, determinism-by-mock,
trailing-slash endpoint, non-NDJSON 200 body → Stream not
Malformed.
- 1 #[ignore] in tests/integration.rs: real Ollama on
localhost:11434 with the configured model. Opt-in via
cargo test -p kb-llm-local -- --ignored after `ollama serve`
+ `ollama pull`.
Workspace: 288 passed / 25 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean. No native-tls,
no openssl in the dep graph.
Allowed deps respected: kb-core, kb-config, kb-llm, reqwest 0.12
(default-features=false; blocking, json, rustls-tls), serde,
serde_json, tracing, thiserror plus anyhow (forced by trait return
type). wiremock + tokio in [dev-dependencies] only.
Out of scope: llama.cpp / candle adapters (P+), Ollama embed
endpoint (separate adapter inside kb-embed-local if requested),
cancellation / abort tokens (P+), connection-pool tuning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes the kb-llm trait crate so concrete LLM adapters (p4-2
Ollama, future llama.cpp / candle) target a stable surface. Pure re-
export of kb_core::{LanguageModel, GenerateRequest, TokenChunk,
FinishReason, TokenUsage, ModelRef} plus a feature-gated deterministic
mock for downstream RAG tests (p4-3) that need an LLM trait object
without an Ollama dependency.
MockLanguageModel (cfg(feature = "mock"), default OFF):
- Holds canned_response + canned_finish + canned_usage + (model_id,
provider, context_tokens). Pure in-memory; no I/O.
- generate_stream() honors GenerateRequest.stop: scans every non-empty
stop string against the canned response, takes the earliest byte
position (Iterator::min returns the first equal element on ties so
declaration order in req.stop wins), truncates with a direct byte-
slice (str::find returns a UTF-8 char boundary by contract).
- When a stop matches, finish_reason is overridden to Stop (matches
OpenAI / Ollama real-world behaviour); otherwise the caller's
canned_finish passes through verbatim.
- Emits one TokenChunk::Token per Unicode scalar value (char), NOT per
grapheme cluster — Hangul jamo, emoji ZWJ sequences, combining
marks split. Acceptable for trait-shape testing; real adapters MAY
combine. Documented in module docs.
- Always terminates with TokenChunk::Done { finish_reason, usage } even
if the canned response is empty. The returned iterator is a boxed
Vec<TokenChunk>::into_iter().map(Ok), trivially Send.
- Real adapters MAY return Err from generate_stream itself (e.g.
connection refused) before any chunk is yielded; the mock never does.
Documented for the trait re-exporter consumer audience.
Helpers:
- assert_finish_chunk(chunks) — asserts the last chunk is a Done.
Useful for proptests asserting trait contract over random inputs.
Tests:
- cargo test -p kb-llm (no features): 2 reexport / dyn-dispatch tests.
- cargo test -p kb-llm --features mock: 9 tests including 100-case
proptest over random Unicode strings asserting Done terminator,
char-count == streamed Token chunks, concat == canned (truncated by
stop), plus explicit cases for stop-string truncation, first-stop-
match precedence, model_ref dimensions=None invariant, finish reason
pass-through.
- All 271 workspace tests pass; clippy clean for both default and
mock-on feature configurations.
Symbol gating verified:
- cargo build --release -p kb-llm (default): nm shows zero
MockLanguageModel symbols.
- cargo build --release -p kb-llm --features mock: three trait-impl
symbols present. Spec invariant "release builds MUST NOT include
MockLanguageModel" enforced at the symbol level.
Allowed deps respected: only kb-core (path) and anyhow (workspace,
forced by trait return type). Dropped kb-config / serde / thiserror /
tracing from the spec's allowed list — they are listed as Allowed but
nothing in this skeleton crate references them, and dropping them
keeps the dependency graph slim for downstream consumers. p4-2/p4-3
will add what they need at their own dep sites.
Forbidden deps (reqwest, ureq, tokio, whisper-rs, kb-source-fs,
kb-parse-md, kb-normalize, kb-chunk, kb-store-*, kb-embed*, kb-search,
kb-rag, kb-tui, kb-desktop) all absent from cargo tree -p kb-llm.
Out of scope: real adapter (p4-2 Ollama), token counting against the
real tokenizer, server-side cancellation / abort signals (P+).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues surfaced during the post-P3-5 manual smoke test against a
six-document workspace:
1. --config flag was silently ignored. kb-cli read cli.config only
while building SourceScope inside the Ingest arm, then called
kb_app::ingest(scope, summary_only) which internally re-loads
Config::load(None) — falling back to ~/.config/kb/config.toml
regardless of what the user passed. Same pattern in search,
list, inspect, doctor. Users had to rely on KB_* env vars to
point at a non-default config.
2. Search output collapsed RRF hybrid scores to "0.02" because
`{:.2}` truncated the (0, 0.033]-bounded fused score, and
chunks from the same document showed up as identical lines
("3. 0.02 arch/rag-architecture.md") since heading_path was
never printed.
Fix:
- kb-app: doctor/ingest/search/list/inspect already had
*_with_config(Config, ...) seams introduced for integration tests
(#[doc(hidden)] pub). Repurpose them as the official "config-explicit"
API — kb-cli now builds the Config once via
Config::load(cli.config.as_deref()) at the top of every subcommand
and threads it into the *_with_config variant. Module doc-comment
updated to reflect three callers (CLI --config, integration tests,
TUI session) instead of "test-only seam".
- kb-app: doctor() rewritten as doctor_with_config_path(Option<&Path>)
that respects an explicit path. config_loaded probe now reports the
actual path checked, returning a clear hard error if --config points
at a non-existent or malformed file (defaults would silently mask
user intent). data_dir_writable resolves storage.data_dir from the
loaded config (with env overrides applied via Config::apply_env) so
--config users see their custom paths reflected. Original doctor()
signature kept as a None-passing wrapper.
- kb-cli: ingest/search/list/inspect/doctor each call the
*_with_config* companion. Search printer switches to {:.4} score
formatting (RRF hybrid range bounded by ~2/k_rrf ≈ 0.033 at k_rrf=60
default) and appends `> head1 / head2` when heading_path is non-
empty so chunks from the same document are visually distinguishable.
Verified manually:
- `kb --config /tmp/kb-smoke/config.toml doctor` reports the
custom config path + custom data_dir, not the XDG defaults.
- `kb --config /tmp/kb-smoke/config.toml search "..." --mode hybrid`
returns hits with distinct 4-digit scores and heading paths
("rust/ownership.md > Rust 소유권 모델 / Borrow checker").
Workspace 269 passed / 24 ignored / 0 failed; cargo clippy
--workspace --all-targets -- -D warnings clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the P0 `bail!("not yet wired")` stubs in kb-app with real
bodies that compose the libraries shipped through P3-4. After this
commit, `cargo run -p kb-cli -- index` actually walks the workspace
and persists chunks (SQLite + optionally LanceDB), and
`cargo run -p kb-cli -- search --mode {lexical,vector,hybrid}` returns
real SearchHits with citations. `kb-app::ask` stays stubbed; P4-3
owns it.
App lifecycle (crates/kb-app/src/app.rs):
- Internal pub(crate) struct App holds the Config plus
Arc<SqliteStore> eagerly, with embedder + LanceVectorStore behind
OnceLock<Arc<...>> for memoization. First call pays the ~470MB
fastembed init / Lance open; subsequent calls return the cached
Arc::clone. OnceLock::set race losers fall back to get().cloned()
so the lazy-init is concurrent-safe.
- One-shot CLI invocations pay the cost once at most. The P9 TUI
(which holds an App for the session) gets memoization for free.
ingest pipeline (lib.rs):
- FsSourceConnector::scan(&scope) → per asset:
parse_frontmatter → parse_blocks → build_canonical_document →
MdHeadingV1Chunker.chunk → put_asset_with_bytes → put_document →
put_blocks → put_chunks. One transaction per document per design
§5.8 (kb-store-sqlite's put_* methods own the transactions).
- When provider != "none" and dimensions > 0: build embedder once,
embed each doc's chunks as Document kind, ensure_table once at the
top of the run, then upsert the VectorRecord batch. Lexical-only
config (provider == "none") skips both — verified by
ingest_provider_none_skips_lance test.
- Per-asset parse failures recorded as IngestItemKind::Error with
the warning attached; the run continues. Only structural failures
(DB unreachable etc.) abort.
- Aggregate counts (assets_scanned / new / updated / skipped /
errors / chunks_indexed / embeddings_indexed / duration_ms) flow
into both the JobRepo progress_json AND a dedicated ingest_runs
row written via SqliteStore::record_ingest_run (new
pub(crate) helper added to kb-store-sqlite — see below).
summary_only=true writes items_json=NULL but still populates the
count columns.
search dispatch:
- SearchMode::Lexical → LexicalRetriever directly.
- SearchMode::Vector → VectorRetriever with embedder + LanceVectorStore.
- SearchMode::Hybrid → HybridRetriever composing the two.
- Vector / Hybrid with provider=none returns a clear error naming the
config key to flip ("models.embedding.provider").
list_docs / inspect_doc / inspect_chunk delegate straight to
DocumentStore trait methods. Returns Err with actionable message on
not-found.
Test seam: each public free function has a matching
#[doc(hidden)] pub fn *_with_config(cfg, ...) companion that
integration tests invoke directly (the public form internally calls
load_config()). pub(crate) would not reach across the integration-
tests crate boundary; #[doc(hidden)] keeps it out of rustdoc and the
function comment flags it as test-only.
kb-store-sqlite additions:
- pub struct IngestRunRow + pub fn record_ingest_run on SqliteStore
for the kb-app aggregate-counts persistence path. Helper writes
the ingest_runs row directly with all aggregate columns; jobs
table still gets a JobRepo create/update_progress/finish trio in
parallel.
Tests (11 default, 2 #[ignore] AVX-gated):
- ingest_lexical: round-trip, idempotent, summary_only_drops_items,
provider_none_skips_lance (asserts no .lance dir on disk),
records_ingest_runs_row_with_aggregate_counts, tags_any filter,
inspect_doc_not_found, inspect_chunk_not_found.
- search_lexical: lexical hits with embedding_model=None,
empty_query_returns_empty, vector_mode_with_provider_none returns
clear error.
- search_vector: hybrid mode end-to-end (#[ignore], AVX), Vector
mode embedding_model assertion (#[ignore], AVX). Both run on the
AVX VM in ~21s combined (first run pays the model download).
- TestEnv pins workspace.root + storage.{data_dir,model_dir} to a
TempDir so tests don't touch the user's $HOME/.local/share.
- Fixture workspace at crates/kb-app/tests/fixtures/workspace/ has
three small markdown files with varied frontmatter (rust+cargo+
python tags) so the tags_any filter test exercises a non-trivial
predicate.
Workspace 269 passed / 24 ignored / 0 failed (was 261/22). cargo
clippy --workspace --all-targets -- -D warnings clean. CLI smoke
verified manually: `cargo run -p kb-cli -- index` returns a real
IngestReport JSON; `cargo run -p kb-cli -- search "..."` returns
hits with citations; `cargo run -p kb-cli -- list docs` lists the
indexed documents.
Allowed deps respected: kb-source-fs, kb-parse-md, kb-parse-types,
kb-normalize, kb-chunk, kb-store-sqlite, kb-search, kb-store-vector,
kb-embed, kb-embed-local plus existing tracing / anyhow / serde /
toml / dirs and now blake3 (run_id) + time. Forbidden (kb-llm*,
kb-rag, kb-tui, kb-desktop, kb-parse-{pdf,image,audio}) absent from
cargo tree -p kb-app.
Out of scope per spec: ask body (P4-3), --rebuild-fts wiring,
--resume checkpointing (P+), --watch (P+), TUI / desktop integration
(P9 consumes this facade).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1–P3 shipped libraries but kb-app facade is still all `bail!("not yet
wired")` stubs, so the CLI is structurally complete but unusable.
Insert p3-5 between P3-4 and P4-1 to swap the facade bodies — ingest,
search, list_docs, inspect_doc, inspect_chunk — into real
compositions of the libraries shipped through P3-4. `ask` stays
stubbed; P4-3 owns it.
After p3-5 merges:
- `kb index` walks a workspace and persists chunks (SQLite +
optionally LanceDB).
- `kb search --mode {lexical,vector,hybrid}` returns real SearchHits
with citations.
- `kb list` / `kb inspect doc|chunk` round-trip from the store.
Updates:
- New task spec at tasks/p3/p3-5-app-wiring.md (depends_on
p1-6/p2-2/p3-2/p3-3/p3-4; unblocks p4-3/p9-1/p9-2/p9-4).
- tasks/INDEX.md bumps P3 component count 4 → 5 and adds the link.
- tasks/phase-3-vector-hybrid.md replaces the speculative
`embed_index` facade signature with the actual frozen kb-app
surface and updates the phase completion checklist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Composes the existing LexicalRetriever (P2-2) with a new VectorRetriever
wrapper around LanceVectorStore (P3-3) into a single Retriever that
dispatches by SearchMode. For SearchMode::Hybrid, fuses lexical and
vector candidates via Reciprocal Rank Fusion and populates the full
RetrievalDetail per SearchHit so kb search --explain can attribute
scores back to each side.
Public surface (kb-search crate):
- pub struct VectorRetriever — Arc<dyn VectorStore + Send + Sync>,
Arc<dyn Embedder>, Arc<SqliteStore>, IndexVersion at construction.
- pub struct HybridRetriever { lexical, vector, fusion, k }.
- pub enum FusionPolicy { Rrf { k_rrf: u32 } }.
VectorRetriever:
- Embeds query.text as EmbeddingKind::Query before delegating to
VectorStore::search(query_vec, query.k * 2, &query.filters). Over-
fetches by ×2 for filter losses; LanceVectorStore applies the
filters internally so they propagate naturally.
- Hydrates each VectorHit into a full SearchHit by joining on
chunk_id in a single IN-clause batch (no N+1): doc_path,
section_label, chunker_version, source_spans for citation, plus
embedding_model from embedder.model_id().
- Snippet trimmed to config.search.snippet_chars (vector mode lacks
FTS5 highlighting; chunk text prefix is the next-best signal).
- Citation built from the chunk's first source span via the shared
citation_helper module — extracted from lexical.rs so both
retrievers compute citations identically (Byte/empty fallback to
Line{1,1} preserved with tracing::warn).
- RetrievalDetail.method = Vector for standalone calls; both
fusion_score and vector_score set to the LanceVectorStore-shifted
cosine score; lexical_* None.
HybridRetriever:
- Lexical / Vector modes delegate 1:1 — no rebuild of RetrievalDetail.
- Hybrid mode runs both retrievers with k * 2 fanout, fuses with
RRF (score(c) = Σ 1/(k_rrf + rank_m(c))), sorts fused-score DESC
with deterministic tiebreaker (lex_rank ASC then chunk_id ASC),
takes top query.k. Fusion math runs in f64 throughout; cast to
f32 only at the SearchHit boundary where bounded magnitude (≤
~0.033 at k_rrf=60) makes f32 precision sufficient for ranking.
- Per-hit lexical preferred for snippet/citation/heading_path/
chunker_version/embedding_model when the chunk appears in both
retrievers — FTS5 highlighting is more user-relevant than vector's
truncated text. Vector-only chunks fall through to vector hit data.
- index_version returns format!("hybrid:{}+{}", lex_iv, vec_iv) at
construction; mismatched lex/vec versions trigger a tracing::warn
so users notice stale indexes (spec line 143).
kb-search additions:
- citation_helper.rs — pub(crate) citation_from_first_span shared
between lexical and vector retrievers. Extracted from lexical.rs;
no behavior drift.
Tests (38 default + 3 ignored):
- 12 unit tests in hybrid.rs covering RRF math (1/61 + 1/62 within
f32 epsilon × 10 tolerance), lexical/vector mode delegation, hybrid
preserves single-side hits with the missing side's RetrievalDetail
None, deterministic tiebreaker on identical fused scores, composite
index_version, mismatched-version warn at construction.
- 2 unit tests in vector.rs covering the snippet-prefix and citation
fallback paths.
- 11 unit tests in lexical.rs (unchanged from P2-2).
- 13 lexical integration tests (unchanged).
- 3 #[ignore] AVX-gated hybrid integration tests: disjoint-corpus
recall (lex returns A,B; vec returns C,D; hybrid returns all 4),
determinism over two queries, snapshot stability against
tests/fixtures/search/hybrid/run-1.json. Snapshot fixture was
regenerated against this branch on an AVX-enabled VM and contains
4 real chunks (c1/c2 lex+vec, c3/c4 vec-only).
- KB_UPDATE_SNAPSHOTS=1 path now panics after writing instead of
silently passing — matches the P3-2/P3-3 fail-loud-instead-of-
silent-pass philosophy.
Allowed deps respected (kb-core, kb-config, kb-store-sqlite,
kb-store-vector, kb-embed, tracing, thiserror) plus pre-existing
kb-search deps from P2-2 (rusqlite, globset, serde_json, anyhow).
kb-embed-local does NOT appear — VectorRetriever takes Arc<dyn Embedder>
trait object; the concrete adapter is runtime-injected by kb-app.
Out of scope: reranker (P+), score calibration across modes (RRF is
rank-comparable so absolute calibration is P+), multimodal retrieval
(P6+).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>