altair823 9dde01eb9f fix(rag): normalize RRF fusion_score to [0,1] + log post-merge hotfixes

## Bug

config.rag.score_gate default 0.05 was incompatible with hybrid RRF
fusion_score: raw RRF tops out at num_retrievers / (k_rrf + 1) ≈
0.0328 at the default k_rrf=60, so every hybrid `kb ask` tripped
ScoreGate refusal even when the top hit was perfectly aligned across
both retrievers. The symptom appeared in the post-P4-3 manual smoke
test at /tmp/kb-smoke/, pointed at the Ollama instance on 192.168.0.47:

    $ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid
    근거 부족. KB에 해당 내용 없음.        # top fusion_score = 0.0164

Per-mode score_gate (lexical_score_gate / vector_score_gate /
hybrid_score_gate) was rejected because it forces every consumer
(CLI, eval, TUI) to know which mode picks which threshold. Score
normalization solves it at the source.
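
The mismatch can be reproduced with a few lines of arithmetic. The
sketch below is illustrative only: `raw_rrf` and `passes_gate` are
hypothetical names, not the functions in kb-search or kb-rag, and the
`>=` comparison is an assumed gate rule.

```rust
/// Raw reciprocal-rank-fusion score: the sum of 1 / (k_rrf + rank) over the
/// retrievers that returned the chunk. Ranks are 1-based; `None` means the
/// retriever did not return the chunk.
fn raw_rrf(ranks: &[Option<u32>], k_rrf: f64) -> f64 {
    ranks
        .iter()
        .flatten()
        .map(|&rank| 1.0 / (k_rrf + f64::from(rank)))
        .sum()
}

/// Hypothetical stand-in for the ScoreGate check (assumed rule: the answer
/// is refused when the top score is below the gate).
fn passes_gate(top_score: f64, score_gate: f64) -> bool {
    top_score >= score_gate
}
```

With k_rrf = 60.0, `raw_rrf(&[Some(1), Some(1)], 60.0)` is 2/61, which
is still below a gate of 0.05, so even the best possible raw score is
refused.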

## Fix

crates/kb-search/src/hybrid.rs divides every fused score by
2 / (k_rrf + 1), the theoretical RRF maximum with two retrievers
each contributing rank 1. After normalization:

- both retrievers agree on rank 1 → fusion_score = 1.0
- only one retriever finds the chunk → capped at 0.5 (reached at rank 1)
- mixed ranks from both retrievers → between 0 and 1 (e.g. ranks 1
  and 2 give ≈ 0.9919)

RRF's rank-ordering invariants are preserved (every score is divided
by the same positive constant), so sort and tiebreak behaviour is
unchanged. The wire schema label `fusion_score` keeps its slot in
RetrievalDetail; only the magnitude shifts, and only for hybrid
mode (lexical / vector were already in [0, 1]).
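
A minimal sketch of the normalization described above, assuming exactly
two retrievers and 1-based ranks. `normalized_rrf` is a hypothetical
helper for illustration; the actual code in
crates/kb-search/src/hybrid.rs may be structured differently.

```rust
/// Normalized RRF for two retrievers: the raw fused score divided by
/// 2 / (k_rrf + 1), the raw maximum when both retrievers return rank 1.
/// `None` means that retriever did not return the chunk.
fn normalized_rrf(lexical_rank: Option<u32>, vector_rank: Option<u32>, k_rrf: f64) -> f64 {
    let raw: f64 = [lexical_rank, vector_rank]
        .iter()
        .flatten()
        .map(|&rank| 1.0 / (k_rrf + f64::from(rank)))
        .sum();
    // Every score is divided by the same positive constant, so ordering
    // between chunks is the same as with the raw scores.
    let max_raw = 2.0 / (k_rrf + 1.0);
    raw / max_raw
}
```

At k_rrf = 60.0, `normalized_rrf(Some(1), Some(1), 60.0)` gives 1.0 and
`normalized_rrf(Some(1), None, 60.0)` gives 0.5, both of which clear the
default score_gate of 0.05.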

Verification: re-ran the four-scenario smoke at /tmp/kb-smoke/ with
default score_gate = 0.05 — all four (Korean→Korean, English→
English, cross-language Korean↔English, out-of-corpus) succeed
with the expected grounded / refusal classification, top
fusion_score now ≈ 0.5.

## Tests

One unit test (rrf_formula_matches_known_value) updated to expect
the normalized value `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of
the raw `1/61 + 1/62 ≈ 0.0325`. The integration snapshot fixture
crates/kb-search/tests/fixtures/search/hybrid/run-1.json already
used presence checks (fusion_score_positive: true) rather than
absolute values, so it doesn't need regeneration. All 319 workspace
tests pass; clippy is clean across both feature configs.
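
For reference, the updated expectation works out as follows at the
default k_rrf = 60, with ranks 1 and 2 from the two retrievers:

```latex
\[
\frac{\frac{1}{61} + \frac{1}{62}}{\frac{2}{61}}
  = \frac{61}{2}\left(\frac{1}{61} + \frac{1}{62}\right)
  = \frac{1}{2}\left(1 + \frac{61}{62}\right)
  = \frac{123}{124} \approx 0.9919
\]
```

The raw value before normalization is 1/61 + 1/62 = 123/3782 ≈ 0.0325.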

## Docs

This commit also adds tasks/HOTFIXES.md as a dated post-merge log
covering this fix and the two earlier --config-flag regressions
(P3-5 hotfix #20 across ingest/search/list/inspect/doctor; P4-3
follow-up #24 for kb ask). Original task specs in tasks/p<N>/
*.md stay frozen as the historical contract; HOTFIXES.md is the
live source of truth for post-merge deltas. Each affected task
spec gets a "Risks/notes" addendum pointing back to HOTFIXES.md
so a reader landing on the spec sees the active behaviour:

- tasks/INDEX.md gains a "Post-merge hotfixes" section.
- tasks/phase-3-vector-hybrid.md updates the RRF formula text to
  show the normalized form.
- tasks/p3/p3-4-hybrid-fusion.md "Behavior contract" RRF bullet
  notes the normalization and reason.
- tasks/p3/p3-5-app-wiring.md "Risks/notes" notes the --config
  fix.
- tasks/p4/p4-3-rag-pipeline.md "Risks/notes" notes the kb-ask
  --config fix and the score_gate-RRF incompatibility (closed by
  the normalization in p3-4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 16:16:01 +00:00

4.7 KiB

| title | source | date |
|---|---|---|
| KB Work Unit Index | kb_local_rust_report.md | 2026-04-27 |

KB Work Unit Index

Breaks the Phase roadmap in kb_local_rust_report.md down into architecture-level work units. Each task document is a unit that can be started and reviewed independently.

Dependency graph

P0 ── P1 ── P2 ── P3 ── P4 ── P5
                              │
                              ├─ P6 (image)
                              ├─ P7 (pdf)
                              ├─ P8 (audio)
                              └─ P9 (TUI/desktop)

P0 through P5 are sequential. P6 through P9 can run in parallel after P5.
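
The graph can be read as a simple readiness rule. The sketch below is a hypothetical helper, not part of any kb-* crate; it assumes phases are identified by their number and that a phase is ready once its single direct prerequisite is complete.

```rust
/// Returns true when `phase` may start, given the phases already completed.
/// Mirrors the graph above: P1..P5 each depend on the previous phase, and
/// P6..P9 each depend only on P5.
fn can_start(phase: u8, completed: &[u8]) -> bool {
    match phase {
        0 => true,
        1..=5 => completed.contains(&(phase - 1)),
        6..=9 => completed.contains(&5),
        _ => false, // not a phase in this index
    }
}
```

For example, `can_start(7, &[0, 1, 2, 3, 4, 5])` is true whether or not P6 has finished, which is what allows P6 through P9 to proceed in parallel.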

Work units

| # | Code | Title | Key output crates | Depends on |
|---|---|---|---|---|
| P0 | phase-0-skeleton.md | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | – |
| P1 | phase-1-markdown-ingestion.md | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 |
| P2 | phase-2-lexical-search.md | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 |
| P3 | phase-3-vector-hybrid.md | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 |
| P4 | phase-4-local-llm-rag.md | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 |
| P5 | phase-5-evaluation.md | Golden query / regression eval | kb-eval | P4 |
| P6 | phase-6-image.md | Image ingestion (OCR + caption) | kb-parse-image | P5 |
| P7 | phase-7-pdf.md | PDF text + page citation | kb-parse-pdf | P5 |
| P8 | phase-8-audio.md | Audio transcription + timestamp citation | kb-parse-audio | P5 |
| P9 | phase-9-ui.md | TUI + desktop app | kb-tui, kb-desktop | P5 |

Component task decomposition (per phase)

Component-level breakdown of each phase. The sweet spot is one AI sub-agent session per task.

P0 — p0/ — 1 component
- p0-1 skeleton
P1 — p1/ — 6 components
P2 — p2/ — 2 components
- p2-1 fts-schema
- p2-2 lexical-retriever
P3 — p3/ — 5 components
P4 — p4/ — 3 components
P5 — p5/ — 2 components
- p5-1 golden-fixture-runner
- p5-2 metrics-compare
P6 — p6/ — 3 components
P7 — p7/ — 2 components
- p7-1 pdf-text-extractor
- p7-2 pdf-page-chunker
P8 — p8/ — 2 components
- p8-1 whisper-adapter
- p8-2 segment-chunker
P9 — p9/ — 5 components

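Post-merge hotfixes

Bugs found after merge and their follow-up PRs are recorded in HOTFIXES.md as a dated log. The original task specs stay frozen, and HOTFIXES.md is the source of truth for post-merge behaviour changes.

Conventions common to all tasks

- Do not violate dependency boundaries (Allowed / Forbidden). See report §19.
- No search results or RAG answers without citations.
- Never destroy original files. Regenerate derived artifacts only.
- Record versions (parser/chunker/embedding/index/prompt) on every record.
- A phase is complete when `cargo check --workspace && cargo test --workspace` passes and the CLI demo for that phase's completion criteria passes.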
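
One way to picture the per-record version rule is a small struct carried alongside each record. This is only a sketch under assumptions: `PipelineVersions` and `stale_components` are hypothetical names, the versions are modelled as plain strings, and the real kb-core types may look different.

```rust
/// Hypothetical per-record version stamp covering the five components
/// named in the convention above.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PipelineVersions {
    parser: String,
    chunker: String,
    embedding: String,
    index: String,
    prompt: String,
}

impl PipelineVersions {
    /// Names of the components whose recorded version differs from `current`,
    /// i.e. the derived artifacts that would need regenerating.
    fn stale_components(&self, current: &PipelineVersions) -> Vec<&'static str> {
        let pairs = [
            ("parser", &self.parser, &current.parser),
            ("chunker", &self.chunker, &current.chunker),
            ("embedding", &self.embedding, &current.embedding),
            ("index", &self.index, &current.index),
            ("prompt", &self.prompt, &current.prompt),
        ];
        pairs
            .iter()
            .filter(|(_, recorded, now)| recorded != now)
            .map(|(name, _, _)| *name)
            .collect()
    }
}
```

Comparing a stored stamp against the running configuration then shows which derived artifacts are out of date, without touching the original files.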
