## Bug
config.rag.score_gate default 0.05 was incompatible with hybrid RRF
fusion_score: raw RRF tops out at num_retrievers / (k_rrf + 1) ≈
0.0328 at the default k_rrf=60, so every hybrid `kb ask` tripped
ScoreGate refusal even when the top hit was perfectly aligned across
both retrievers. Symptomatic on the post-P4-3 manual smoke at
/tmp/kb-smoke/ pointed at 192.168.0.47 Ollama:
$ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid
근거 부족. KB에 해당 내용 없음. # top fusion_score = 0.0164
Per-mode score_gate (lexical_score_gate / vector_score_gate /
hybrid_score_gate) was rejected because it forces every consumer
(CLI, eval, TUI) to know which mode picks which threshold. Score
normalization solves it at the source.
## Fix
crates/kb-search/src/hybrid.rs divides every fused score by
2 / (k_rrf + 1), the theoretical RRF maximum with two retrievers
each contributing rank 1. After normalization:
- both retrievers agree on rank 1 → fusion_score = 1.0
- only one retriever finds the chunk → caps near 0.5
- typical mixed ranks → falls between 0 and 0.5
RRF's rank-ordering invariants are preserved (every score divides
by the same positive constant), so sort + tiebreak behaviour is
unchanged. Wire schema label `fusion_score` keeps its slot in
RetrievalDetail; only the magnitude shifts, and only for hybrid
mode (lexical / vector were already in [0, 1]).
Verification: re-ran the four-scenario smoke at /tmp/kb-smoke/ with
default score_gate = 0.05 — all four (Korean→Korean, English→
English, cross-language Korean↔English, out-of-corpus) succeed
with the expected grounded / refusal classification, top
fusion_score now ≈ 0.5.
## Tests
One unit test (rrf_formula_matches_known_value) updated to expect
the normalized value `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of
the raw `1/61 + 1/62 ≈ 0.0325`. The integration snapshot fixture
crates/kb-search/tests/fixtures/search/hybrid/run-1.json already
used presence checks (fusion_score_positive: true) rather than
absolute values, so it doesn't need regeneration. Workspace 319
tests pass; clippy clean across both feature configs.
## Docs
This commit also adds tasks/HOTFIXES.md as a dated post-merge log
covering this fix and the two earlier --config-flag regressions
(P3-5 hotfix #20 across ingest/search/list/inspect/doctor; P4-3
follow-up #24 for kb ask). Original task specs in tasks/p<N>/
*.md stay frozen as the historical contract; HOTFIXES.md is the
live source of truth for post-merge deltas. Each affected task
spec gets a "Risks/notes" addendum pointing back to HOTFIXES.md
so a reader landing on the spec sees the active behaviour:
- tasks/INDEX.md gains a "Post-merge 핫픽스" section.
- tasks/phase-3-vector-hybrid.md updates the RRF formula text to
show the normalized form.
- tasks/p3/p3-4-hybrid-fusion.md "Behavior contract" RRF bullet
notes the normalization and reason.
- tasks/p3/p3-5-app-wiring.md "Risks/notes" notes the --config
fix.
- tasks/p4/p4-3-rag-pipeline.md "Risks/notes" notes the kb-ask
--config fix and the score_gate-RRF incompatibility (closed by
the normalization in p3-4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
72 lines
5.8 KiB
Markdown
72 lines
5.8 KiB
Markdown
---
|
|
title: "Post-merge hotfixes log"
|
|
date: 2026-05-01
|
|
---
|
|
|
|
# Post-merge hotfixes log
|
|
|
|
Bugs discovered AFTER a phase task was merged, and the small follow-up
|
|
PRs that close them. Each entry: what broke, how it surfaced, what the
|
|
fix touched, and which task spec it amends.
|
|
|
|
The original task specs in `tasks/p<N>/p<N>-<M>-*.md` stay frozen as the
|
|
historical contract that was implemented; this file accumulates the
|
|
deltas so phase 5+ readers can find the live behavior without diffing
|
|
git history.
|
|
|
|
## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands
|
|
|
|
**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`.
|
|
|
|
**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config.
|
|
|
|
**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them.
|
|
|
|
**Fix** (PR #20, fix/cli-config-flag-and-search-output):
|
|
- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`.
|
|
- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
|
|
- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
|
|
- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
|
|
|
|
**Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
|
|
|
|
### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up)
|
|
|
|
**Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.
|
|
|
|
**Symptom**: `kb --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
|
|
|
|
**Fix** (PR #24, fix/cli-ask-honor-config-flag):
|
|
- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`.
|
|
|
|
**Amends**: tasks/p4/p4-3-rag-pipeline.md.
|
|
|
|
## 2026-05-01 — RRF `fusion_score` incompatible with `config.rag.score_gate` default
|
|
|
|
**Discovered**: post-P4-3 manual smoke. Top hybrid result returned `fusion_score = 0.0164` against `score_gate = 0.05` → ScoreGate refusal on every hybrid query.
|
|
|
|
**Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.
|
|
|
|
**Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
|
|
- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
|
|
- One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
|
|
- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
|
|
|
|
**Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.
|
|
|
|
**Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).
|
|
|
|
**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
|
|
|
|
## How to add an entry
|
|
|
|
Each fix gets a dated subsection with five fields:
|
|
|
|
- **Discovered**: when / how the bug surfaced (smoke, integration test, user report).
|
|
- **Symptom**: what the user saw / what was wrong.
|
|
- **Root cause**: the actual code or design issue.
|
|
- **Fix**: PR number / branch + a one-paragraph summary of the change.
|
|
- **Amends**: which `tasks/p<N>/...` spec docs the fix retroactively contradicts. Spec text stays frozen; this log is the live source of truth for post-merge deltas.
|
|
|
|
If a fix is large enough that the original spec is no longer a useful reference, promote the entry into a new task spec (e.g., `p<N>-<M+1>-<topic>.md`) and link from here.
|