fix(rag): normalize RRF fusion_score to [0,1] + add post-merge hotfix log #25

altair823 · 2026-05-01T16:16:45Z

altair823 commented

2026-05-01 16:16:45 +00:00

## Summary

This PR closes the RRF `score_gate` bug found in a hands-on smoke test after the P4-3 merge, and collects the post-merge hotfixes found so far into a dated log at `tasks/HOTFIXES.md`.

## Bug

The `config.rag.score_gate` default of `0.05` is incompatible with the range of the hybrid RRF `fusion_score`. The raw RRF maximum is bounded by `num_retrievers / (k_rrf + 1)`, which is `≈ 0.0328` at the default `k_rrf=60`, so even a perfectly aligned chunk that ranks 1 in both retrievers is rejected with a ScoreGate refusal.

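Smoke reproduction (the CLI answers "Insufficient evidence. No such content in the KB."):

    $ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid
    근거 부족. KB에 해당 내용 없음.        # top fusion_score = 0.0164

`fusion_score` also lived in a different range per mode, which is a semantic problem in its own right: Lexical / Vector use `[0, 1]`, while Hybrid uses `(0, 0.033]`. Any reader of the wire schema had to carry mode-specific branching logic.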
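The arithmetic behind the refusal can be checked with a minimal sketch. The function names below (`rrf_term`, `raw_rrf_upper_bound`, `passes_score_gate`) are illustrative and not taken from kb-search, and the `>=` comparison is an assumption about how the gate compares scores.

```rust
/// Raw RRF contribution from one retriever that placed a chunk at a 1-based `rank`.
fn rrf_term(k_rrf: u32, rank: u32) -> f64 {
    1.0 / (f64::from(k_rrf) + f64::from(rank))
}

/// Largest raw RRF score when all `num_retrievers` put the same chunk at rank 1.
fn raw_rrf_upper_bound(k_rrf: u32, num_retrievers: u32) -> f64 {
    f64::from(num_retrievers) * rrf_term(k_rrf, 1)
}

/// Hypothetical gate: a result passes only when its top score reaches the threshold.
/// (Assumption: the real ScoreGate may compare differently.)
fn passes_score_gate(top_score: f64, score_gate: f64) -> bool {
    top_score >= score_gate
}
```

With `k_rrf = 60`, `raw_rrf_upper_bound(60, 2)` is 2/61 ≈ 0.0328 and a single rank-1 hit is 1/61 ≈ 0.0164 (the score seen in the smoke run), so neither can reach a gate of 0.05.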

## Fix decision

Options:

- (A) Per-mode `score_gate` (`lexical_score_gate` / `vector_score_gate` / `hybrid_score_gate`): every consumer (CLI, eval, TUI) would have to know a different threshold per mode. Rejected.
- (B) **Normalize the RRF score to [0, 1]**: a one-line change at the source that restores consistency across modes. Adopted.
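The trade-off between the two options can be sketched as follows. This is an illustration only: `PerModeGates` and `passes_single_gate` are hypothetical names, and the `Lexical` / `Vector` variants of `SearchMode` are assumed here (only `SearchMode::Hybrid` appears in the spec text).

```rust
/// Retrieval modes; the variant names other than `Hybrid` are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Option (A), rejected: one threshold per mode that every consumer must look up.
struct PerModeGates {
    lexical_score_gate: f64,
    vector_score_gate: f64,
    hybrid_score_gate: f64,
}

impl PerModeGates {
    fn gate_for(&self, mode: SearchMode) -> f64 {
        match mode {
            SearchMode::Lexical => self.lexical_score_gate,
            SearchMode::Vector => self.vector_score_gate,
            SearchMode::Hybrid => self.hybrid_score_gate,
        }
    }
}

/// Option (B), adopted: scores share [0, 1], so one threshold serves every mode
/// and the mode argument no longer influences the decision.
fn passes_single_gate(_mode: SearchMode, fusion_score: f64, score_gate: f64) -> bool {
    fusion_score >= score_gate
}
```

Under option (A) each of the CLI, eval, and TUI would need its own call to something like `gate_for`; under option (B) the mode can be dropped from the gate logic entirely.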

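## Implementation

In `crates/kb-search/src/hybrid.rs`, the raw RRF score is normalized by dividing it by `2 / (k_rrf + 1)`, the theoretical maximum when each of the two retrievers puts the chunk at rank 1:

    let rrf_normalizer = 2.0_f64 / (k_rrf_f + 1.0);
    // ... after accumulating the raw score
    rrf /= rrf_normalizer;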

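Meaning after normalization:

- Both retrievers put the same chunk at rank 1 → `fusion_score = 1.0`
- Only one retriever finds the chunk → capped at ≈ 0.5
- Other rank combinations → between 0 and 1: a chunk returned by only one retriever stays at or below 0.5, while a chunk ranked by both can exceed it (ranks 1 and 2 give ≈ 0.9919)

RRF's rank-ordering invariant is preserved, because every score is divided by the same positive constant. Sort and tiebreak behaviour is byte-identical. The wire schema label `fusion_score` keeps its slot; only the magnitude shifts.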
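The cases listed above can be reproduced with a small standalone function. This is a sketch under assumptions, not the code in `hybrid.rs`: `normalized_rrf` and its `Option<u32>` rank parameters are names chosen here, with `None` standing for a retriever that did not return the chunk.

```rust
/// Normalized RRF for two retrievers; ranks are 1-based, `None` means "not returned".
fn normalized_rrf(k_rrf: u32, lexical_rank: Option<u32>, vector_rank: Option<u32>) -> f64 {
    let k = f64::from(k_rrf);
    // Theoretical raw maximum: both retrievers place the chunk at rank 1.
    let normalizer = 2.0_f64 / (k + 1.0);
    // Missing retrievers contribute nothing to the raw sum.
    let raw: f64 = [lexical_rank, vector_rank]
        .iter()
        .flatten()
        .map(|&rank| 1.0 / (k + f64::from(rank)))
        .sum();
    raw / normalizer
}
```

For `k_rrf = 60`, `normalized_rrf(60, Some(1), Some(1))` evaluates to 1.0 and `normalized_rrf(60, Some(1), None)` to 0.5; any lower single-retriever rank lands below 0.5.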

## Verification

Smoke re-run (with the default `score_gate = 0.05` restored):

| Scenario | Result | Top fusion |
|---|---|---|
| Korean query → Korean corpus | grounded `[#2]` rust/ownership.md | 0.5 |
| English query → English corpus | grounded `[#1]` arch/rag-architecture.md | ~0.5 |
| Korean query → English corpus (cross-lang) | grounded `[#1]` arch/embeddings.md | ~0.5 |
| out-of-corpus | LlmSelfJudge refusal "근거가 부족하다" | n/a |

All 319 workspace tests pass. clippy is clean.

## Test changes

- `rrf_formula_matches_known_value` (unit test) now expects the normalized `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of the raw `1/61 + 1/62 ≈ 0.0325`.
- The integration snapshot fixture (`crates/kb-search/tests/fixtures/search/hybrid/run-1.json`) relies on a presence check (`fusion_score_positive: true`) rather than absolute values, so it needs no regeneration.
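Both points can be illustrated together: the known value moves from ≈ 0.0325 to ≈ 0.9919, while a positivity check holds either way. The helpers below are a sketch with hypothetical names (`raw_rrf`, `normalize_rrf`, `fusion_score_positive` as a function), not the actual test or fixture code.

```rust
/// Raw RRF score for a chunk, given its 1-based ranks in the retrievers that returned it.
fn raw_rrf(k_rrf: u32, ranks: &[u32]) -> f64 {
    ranks
        .iter()
        .map(|&rank| 1.0 / (f64::from(k_rrf) + f64::from(rank)))
        .sum()
}

/// Divide by the two-retriever maximum, 2 / (k_rrf + 1).
fn normalize_rrf(raw: f64, k_rrf: u32) -> f64 {
    raw / (2.0 / (f64::from(k_rrf) + 1.0))
}

/// Presence check in the spirit of the fixture's `fusion_score_positive` flag.
fn fusion_score_positive(score: f64) -> bool {
    score > 0.0
}
```

Because normalization multiplies by a positive constant, a score that was positive before stays positive after, which is why a presence-based snapshot is unaffected.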

## Documentation updates

New `tasks/HOTFIXES.md`: a dated log of post-merge hotfixes. It registers these three fixes:

1. **2026-05-01**: `--config` flag silently ignored across all kb-cli subcommands (P3-5 follow-up, PR #20).
2. **2026-05-01**: `--config` regression in `kb ask` (P4-3 follow-up, PR #24).
3. **2026-05-01**: RRF `fusion_score` incompatible with `config.rag.score_gate` default (this PR).

Each entry has five fields: `Discovered / Symptom / Root cause / Fix / Amends`. The original task specs stay frozen, and HOTFIXES.md operates as the live source of truth.

Each affected task spec gets a "Risks/notes" addendum that links to HOTFIXES.md:

- `tasks/INDEX.md`: new "Post-merge 핫픽스" (post-merge hotfixes) section.
- `tasks/phase-3-vector-hybrid.md`: RRF formula updated to the normalized form.
- `tasks/p3/p3-4-hybrid-fusion.md`: RRF item in the Behavior contract updated.
- `tasks/p3/p3-5-app-wiring.md`: `--config` hotfix note.
- `tasks/p4/p4-3-rag-pipeline.md`: `--config` hotfix note plus score_gate bug note.

## Changed files

- `crates/kb-search/src/hybrid.rs` (normalization + test update)
- `tasks/HOTFIXES.md` (new)
- `tasks/INDEX.md`, `tasks/phase-3-vector-hybrid.md`, `tasks/p3/p3-4-hybrid-fusion.md`, `tasks/p3/p3-5-app-wiring.md`, `tasks/p4/p4-3-rag-pipeline.md` (addendum)

## Follow-up

P5-1 can now start. When the eval task measures grounded/refusal classification and citation_coverage, it can rely on the normalized `[0, 1]` `fusion_score`.


altair823 added 1 commit 2026-05-01 16:16:45 +00:00

fix(rag): normalize RRF fusion_score to [0,1] + log post-merge hotfixes 9dde01eb9f

## Bug

config.rag.score_gate default 0.05 was incompatible with hybrid RRF
fusion_score: raw RRF tops out at num_retrievers / (k_rrf + 1) ≈
0.0328 at the default k_rrf=60, so every hybrid `kb ask` tripped
ScoreGate refusal even when the top hit was perfectly aligned across
both retrievers. Symptomatic on the post-P4-3 manual smoke at
/tmp/kb-smoke/ pointed at 192.168.0.47 Ollama:

    $ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid
    근거 부족. KB에 해당 내용 없음.        # top fusion_score = 0.0164

Per-mode score_gate (lexical_score_gate / vector_score_gate /
hybrid_score_gate) was rejected because it forces every consumer
(CLI, eval, TUI) to know which mode picks which threshold. Score
normalization solves it at the source.

## Fix

crates/kb-search/src/hybrid.rs divides every fused score by
2 / (k_rrf + 1), the theoretical RRF maximum with two retrievers
each contributing rank 1. After normalization:

- both retrievers agree on rank 1 → fusion_score = 1.0
- only one retriever finds the chunk → caps near 0.5
- other rank combinations → fall between 0 and 1 (at most 0.5 when
  only one retriever returns the chunk)

RRF's rank-ordering invariants are preserved (every score divides
by the same positive constant), so sort + tiebreak behaviour is
unchanged. Wire schema label `fusion_score` keeps its slot in
RetrievalDetail; only the magnitude shifts, and only for hybrid
mode (lexical / vector were already in [0, 1]).

Verification: re-ran the four-scenario smoke at /tmp/kb-smoke/ with
default score_gate = 0.05 — all four (Korean→Korean, English→
English, cross-language Korean↔English, out-of-corpus) succeed
with the expected grounded / refusal classification, top
fusion_score now ≈ 0.5.

## Tests

One unit test (rrf_formula_matches_known_value) updated to expect
the normalized value `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of
the raw `1/61 + 1/62 ≈ 0.0325`. The integration snapshot fixture
crates/kb-search/tests/fixtures/search/hybrid/run-1.json already
used presence checks (fusion_score_positive: true) rather than
absolute values, so it doesn't need regeneration. Workspace 319
tests pass; clippy clean across both feature configs.

## Docs

This commit also adds tasks/HOTFIXES.md as a dated post-merge log
covering this fix and the two earlier --config-flag regressions
(P3-5 hotfix #20 across ingest/search/list/inspect/doctor; P4-3
follow-up #24 for kb ask). Original task specs in tasks/p<N>/
*.md stay frozen as the historical contract; HOTFIXES.md is the
live source of truth for post-merge deltas. Each affected task
spec gets a "Risks/notes" addendum pointing back to HOTFIXES.md
so a reader landing on the spec sees the active behaviour:

- tasks/INDEX.md gains a "Post-merge 핫픽스" section.
- tasks/phase-3-vector-hybrid.md updates the RRF formula text to
  show the normalized form.
- tasks/p3/p3-4-hybrid-fusion.md "Behavior contract" RRF bullet
  notes the normalization and reason.
- tasks/p3/p3-5-app-wiring.md "Risks/notes" notes the --config
  fix.
- tasks/p4/p4-3-rag-pipeline.md "Risks/notes" notes the kb-ask
  --config fix and the score_gate-RRF incompatibility (closed by
  the normalization in p3-4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude-reviewer-01 reviewed 2026-05-01 16:17:24 +00:00

claude-reviewer-01 left a comment

Code review of the RRF normalization and the post-merge hotfix log. COMMENT only because of the self-merge gate.

The ScoreGate-RRF compatibility bug, found after P4-3 in a hands-on workspace (192.168.0.47 Ollama + gemma4:26b), is closed by source-level normalization. Key decisions:

1. **Score normalization instead of per-mode score_gate**: the fix is closed at the source, so callers (CLI / eval / TUI) do not need to know a different threshold per mode.
2. **Rank-ordering invariant preserved**: every score is divided by the same positive constant `2/(k_rrf+1)`, so sort and tiebreak are byte-identical. The unit tests and the integration snapshot both pass (the snapshot uses presence checks, so it never depended on absolute values).
3. **`Answer.retrieval.score_gate` is a meaningful value again**: it is comparable on the `[0,1]` range, so it works properly as a user-facing config.

Verification: all four scenarios pass at `/tmp/kb-smoke/` with the default `score_gate = 0.05`: Korean, English, and cross-language grounded answers, plus the out-of-corpus refusal.

Documentation:

- New `tasks/HOTFIXES.md`: a dated log of post-merge bugs. All three found so far (P3-5 hotfix --config, P4-3 hotfix --config-in-ask, RRF normalize) are registered.
- The five affected spec docs get a "Risks/notes" addendum and a cross-link to HOTFIXES.md. The original task specs stay frozen; HOTFIXES.md is the live source of truth.

Workspace: 319 passed / 0 failed. clippy clean.

The inline comments are all notes on decisions. The merge can go ahead. Next is P5-1 golden-fixture-runner.


crates/kb-search/src/hybrid.rs

						
@@ -194,0 +195,4 @@
        // Raw RRF: `Σ 1/(k_rrf + rank_m(c))` over the retrievers a chunk
        // appears in. With two retrievers the raw upper bound is
        // `2/(k_rrf + 1)` — at k_rrf=60 that's only ≈0.0328, which makes
        // a single `config.rag.score_gate` default of 0.05 silently

claude-reviewer-01 commented

2026-05-01 16:17:24 +00:00

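The meaning of the normalization factor `2 / (k_rrf + 1)` is pinned down in a comment block. It records in-source that `2/(k_rrf+1)` is the theoretical maximum when both retrievers put a chunk at rank 1, that a chunk found by only one retriever caps at ≈ 0.5, and that RRF rank ordering is preserved. When someone later asks "why 2/(k_rrf+1)?", the answer is right there.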


crates/kb-search/src/hybrid.rs

						
@@ -194,2 +212,4 @@
        let FusionPolicy::Rrf { k_rrf } = self.fusion;
        let k_rrf_f = f64::from(k_rrf);
        // Both retrievers can contribute, so the per-mode RRF max is
        // 2 / (k_rrf + 1). Even when a chunk lands in only one mode, we

claude-reviewer-01 commented

2026-05-01 16:17:24 +00:00

Dividing the RRF score by `2 / (k_rrf + 1)` is a one-line change that solves two problems at once: (1) `fusion_score` was incomparable across modes (Lexical/Vector `[0,1]` vs Hybrid `(0, 0.033]`), and (2) the `score_gate` default of 0.05 always tripped in hybrid mode. Because every score is divided by the same positive constant, the rank-ordering invariant is preserved: sort and tiebreak behaviour is byte-identical.
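The ordering claim can be checked on a toy result list. This is a sketch only: `ranked_ids` and `normalize_all` are hypothetical helpers, and the tiebreak on ascending chunk id is an assumption made for the example, not the tiebreak rule used in kb-search.

```rust
/// Sort (chunk_id, score) pairs by score descending, then chunk_id ascending.
/// The chunk_id tiebreak is a hypothetical rule chosen for this sketch.
fn ranked_ids(mut hits: Vec<(u32, f64)>) -> Vec<u32> {
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    hits.into_iter().map(|(id, _)| id).collect()
}

/// Divide every raw score by the same positive constant, 2 / (k_rrf + 1).
fn normalize_all(hits: &[(u32, f64)], k_rrf: u32) -> Vec<(u32, f64)> {
    let normalizer = 2.0 / (f64::from(k_rrf) + 1.0);
    hits.iter().map(|&(id, raw)| (id, raw / normalizer)).collect()
}
```

Sorting the raw list and the normalized list through `ranked_ids` yields the same id sequence, including for chunks whose raw scores tie.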


tasks/HOTFIXES.md

						
@@ -0,0 +1,71 @@
        ---

claude-reviewer-01 commented

2026-05-01 16:17:24 +00:00

New dated log of post-merge hotfixes. This fix is the third entry (after P3-5 hotfix #20 and P4-3 hotfix #24). Each entry uses the five-field format (Discovered / Symptom / Root cause / Fix / Amends); the original task specs stay frozen while HOTFIXES.md acts as the live source of truth. When P5+ starts, a reader can learn the actual behaviour without diffing git history.

tasks/INDEX.md

						
@@ -83,2 +83,4 @@
          - [p9-5 desktop-tauri](p9/p9-5-desktop-tauri.md)
        ## Post-merge 핫픽스

claude-reviewer-01 commented

2026-05-01 16:17:24 +00:00

New "Post-merge 핫픽스" section at the top level of INDEX, so that a reader new to the workspace finds HOTFIXES.md next to the task spec list. It codifies the split (specs frozen, HOTFIXES.md as the live source of truth) at the INDEX level.

tasks/p3/p3-4-hybrid-fusion.md

						
@@ -94,3 +94,3 @@
 - `SearchMode::Hybrid`:
   - run `lexical.search(query)` and `vector.search(query)` in sequence (fan-out is fine; not required).
-  - fuse with RRF: `score(c) = Σ_{m ∈ {lex, vec}} 1 / (k_rrf + rank_m(c))` where `k_rrf` from config (default 60). `rank_m` is 1-based; chunks not appearing in retriever `m` contribute 0.
+  - fuse with RRF: `raw(c) = Σ_{m ∈ {lex, vec}} 1 / (k_rrf + rank_m(c))` where `k_rrf` from config (default 60). `rank_m` is 1-based; chunks not appearing in retriever `m` contribute 0.

claude-reviewer-01 commented

2026-05-01 16:17:24 +00:00

Adds an explicit normalization step to the original spec's RRF formula: "divide by `num_retrievers / (k_rrf + 1)`", plus the reason (score consistency with lexical/vector and compatibility with the score_gate default). It is cross-linked to HOTFIXES.md so the amendment history can be followed.
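Written out for the two-retriever case, the amended contract reads as follows. This is a restatement assembled from the raw formula in the spec line and the normalizer named in this comment; the typeset symbol names are chosen here rather than quoted from the spec.

```latex
\[
\mathrm{raw}(c) = \sum_{m \in \{\mathrm{lex},\,\mathrm{vec}\}} \frac{1}{k_{\mathrm{rrf}} + \mathrm{rank}_m(c)},
\qquad
\mathrm{fusion\_score}(c) = \frac{\mathrm{raw}(c)}{2 / (k_{\mathrm{rrf}} + 1)}
= \frac{k_{\mathrm{rrf}} + 1}{2}\,\mathrm{raw}(c)
\]
```

A retriever that does not return the chunk contributes 0 to the sum, so the normalized score stays within (0, 1] and reaches 1 only when both retrievers rank the chunk first.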


tasks/p4/p4-3-rag-pipeline.md

						
@@ -179,6 +179,8 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
        ## Risks / notes
        - Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
        - **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).

claude-reviewer-01 commented

2026-05-01 16:17:24 +00:00

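Both hotfixes are noted in the RAG pipeline spec: (a) the regression where the `--config` flag was ignored, and (b) the `score_gate` × RRF incompatibility. The second was closed by the normalization in p3-4, but RAG is the caller that uses score_gate, so the RAG spec keeps a cross-reference as well. A future reader debugging a ScoreGate refusal can check both paths immediately.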
