fix(mcp): HOTFIX #15 - MCP ask multi_hop dispatch-divergence assertion (fixture reinforcement) #183

Merged
altair823 merged 2 commits from fix/hotfix-15-mcp-ask-multi-hop-flaky into main 2026-05-26 07:02:28 +00:00
Owner

Summary

First hotfix right after the v0.18.0 cut. After PR-7 (v0.18 dogfood probe-first) was merged, PR-5's test `ask_tool_routes_multi_hop_true_to_decompose_first` began failing deterministically because it still asserts the stale empty-KB dispatch contract. This is a test-only fix: production code is untouched.

Design: docs/superpowers/specs/2026-05-26-hotfix-15-mcp-ask-multi-hop-flaky-spec.md (3-round OMC reviewer verdict: APPROVE)
Plan: docs/superpowers/plans/2026-05-26-hotfix-15-mcp-ask-multi-hop-flaky-plan.md (3-round OMC reviewer verdict: ACCEPT-WITH-RESERVATIONS)

Changes

minimal_config

Added `cfg.rag.score_gate = 0.0` to bypass the probe's second gate. This isolates the test config; the production default stays at 0.30.
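To illustrate why setting the gate to 0.0 lets the fixture through, here is a minimal sketch of the two probe gates, based only on the `top_score < score_gate` condition quoted in the commit message. The function name `probe_passes` and its `Option<f32>` signature are hypothetical and are not the kebab-rag API.

```rust
/// Hypothetical sketch of the probe's two gates; not the actual kebab-rag code.
/// Gate 1: the probe must return at least one chunk (modeled as `Some(score)`).
/// Gate 2: the best score must not fall below `score_gate`.
fn probe_passes(top_score: Option<f32>, score_gate: f32) -> bool {
    match top_score {
        // Empty probe pool: nothing to answer from, so the caller refuses.
        None => false,
        // Refuse when `top_score < score_gate`, pass otherwise.
        Some(score) => !(score < score_gate),
    }
}
```

Under this model, a low but non-negative top score such as 0.05 passes with `score_gate = 0.0` and is refused with the production default of 0.30, which is why the override belongs in the test config only.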

Fixture corpus: 1 file

`workspace_root/note.md`: "This note is about a compound containing X and Y in detail." The token_and branch of `build_match_string` (FTS5 implicit AND) requires all three of `compound`, `about`, and `and` to match. An empirical check in the SQLite REPL (V007 trigram DDL) confirmed 1 hit.
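The implicit-AND requirement can be modeled roughly as "every term must occur in the body". The sketch below is a simplification, assuming case-insensitive substring matching as a stand-in for a trigram tokenizer; it is not `build_match_string` and does not run FTS5. The helper name `matches_all_terms` is hypothetical.

```rust
/// Simplified model of an implicit-AND match: every term must occur in the body.
/// Case-insensitive substring search only approximates a trigram tokenizer for
/// terms of three or more characters; real FTS5 has additional rules.
fn matches_all_terms(body: &str, terms: &[&str]) -> bool {
    let haystack = body.to_lowercase();
    terms
        .iter()
        .all(|term| haystack.contains(&term.to_lowercase()))
}
```

With the fixture body above and the terms `compound`, `about`, and `and`, this model reports a match. A body that lacks any one of the three terms does not match, which is the class of defect the empirical REPL check was meant to catch.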

New `_multi_hop_short_circuits_when_probe_empty` test (REQUIRED)

Pins the MCP-layer wire shape of the probe-empty short-circuit. It uses a fresh `tempfile::tempdir()` to avoid races. Expected values: `schema_version=answer.v1`, `refusal_reason=no_chunks`, `is_error=Some(false)`.
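As a sketch of what pinning this wire shape means, the three expected values can be written as a plain struct. `AskWireShape` and `expected_probe_empty_shape` are hypothetical names used for illustration; the real kebab-mcp response types are not shown in this PR.

```rust
/// Hypothetical stand-in for the three fields the new test pins.
#[derive(Debug, Clone, PartialEq)]
struct AskWireShape {
    schema_version: String,
    refusal_reason: Option<String>,
    is_error: Option<bool>,
}

/// Builds the shape expected for a probe-empty short-circuit.
fn expected_probe_empty_shape() -> AskWireShape {
    AskWireShape {
        schema_version: "answer.v1".to_string(),
        refusal_reason: Some("no_chunks".to_string()),
        // Assumption: the refusal is carried as a regular answer payload,
        // so the tool call itself is not flagged as an error.
        is_error: Some(false),
    }
}
```

A test in this style would compare the parsed response against the expected value field by field, so any drift in the refusal reason or the error flag would fail loudly.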

Module doc rewrite

Reflects the new contract (probe-first → decompose if the probe passes → LLM) and enumerates the contract that each of the two tests pins.
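The ordering in this contract can be sketched as a step trace. This is an illustrative model only: the function `multi_hop_steps` and the step labels are hypothetical and do not correspond to real kebab functions.

```rust
/// Hypothetical step trace for the probe-first contract; labels are illustrative.
fn multi_hop_steps(probe_passed: bool) -> Vec<&'static str> {
    // The probe always runs first.
    let mut steps = vec!["probe"];
    if !probe_passed {
        // Short-circuit: refuse before decomposition or any LLM call.
        steps.push("refuse_no_chunks");
        return steps;
    }
    // Only a passing probe reaches decomposition and the LLM.
    steps.push("decompose");
    steps.push("llm");
    steps
}
```

In this model the two tests map onto the two branches: one pins the passing path that reaches decomposition, the other pins the short-circuit that never reaches the LLM.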

New dated entry in HOTFIXES.md

Added a `## 2026-05-26 — HOTFIX #15 ...` level-1 dated entry immediately before `## 2026-05-25 — fb-41 pre-v0.18 dogfood ...` (date-top convention). It has four blocks: Symptom / Root cause / Action / Amends.

OMC team review process

| Phase | Reviewers | Rounds | Verdict |
|---|---|---|---|
| spec | critic + verifier | 1 | APPROVE (light revision) |
| plan | critic-plan + verifier-plan | 3 | ACCEPT-WITH-RESERVATIONS |
| implementation | executor (opus) | 1 | DONE |

Cumulative findings across the 3 rounds:

| Round | CRITICAL | MAJOR | MINOR | NIT |
|---|---|---|---|---|
| Round 1 spec | 0 | 0 | 0 | 0 (APPROVE 3 notes) |
| Round 1 plan | 1 (C1 fixture FTS5 empirical false) | 2 (M1/M2) | 2 | 2 |
| Round 2 plan | 0 | 2 (A1/A2 spec/plan desync) | 0 | 5 |
| Round 3 plan | 0 | 0 | 1 (N5b fictional helper) | 2 |

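Key point: the round-1 empirical SQLite REPL check caught the token-combination flaw in the fixture body (without "about", 0 hits), so the fix did not ship a fixture that reproduces the failure on its own. Over the 3 rounds, the stale spec/plan fragments and the fictional helpers in the dispatch task were all closed.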

Verification

  • `cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 -- --nocapture` → 3 passed, 0 failed.
  • `cargo test -p kebab-mcp -j 1` → baseline still PASS.
  • `cargo test --workspace --no-fail-fast -j 1` → 1306 pass, 0 failed. HOTFIX #15 known flaky 1 → 0. No new regressions.
  • `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean.

Out of scope

  • Stamping `PROMPT_TEMPLATE_VERSION_MULTI_HOP` in `refuse_no_chunks` when the caller is multi-hop: this has wire-surface implications and is a separate task (spec §6).
  • `ask_multi_hop` probe cost optimization: separate v0.19.0 perf task.
  • Extracting the probe step into its own function (refactor): separate v0.19.0 task.

Test Plan

  • [x] `cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1`: 3/3 pass.
  • [x] `cargo test --workspace --no-fail-fast -j 1`: 1306 pass, 0 failed.
  • [x] `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean.
  • [x] Production code touched: 0.
  • [x] Wire schema / Cargo.toml version changes: 0.

Assisted-by: Claude Code

altair823 added 2 commits 2026-05-26 06:55:09 +00:00
After PR-7 (v0.18 dogfood probe-first) was merged, PR-5's test `ask_tool_routes_multi_hop_true_to_decompose_first` fails deterministically because of the stale empty-KB contract. This is a test-only fix: production code is untouched.

- `minimal_config`: `score_gate = 0.0` (bypasses the probe's second gate, `top_score < score_gate`; test config isolation).
- fixture `workspace_root/note.md`: "This note is about a compound containing X and Y in detail." The token_and branch of `build_match_string` (FTS5 implicit AND) requires all three of `compound`, `about`, and `and` to match. An empirical check in the SQLite REPL (V007 trigram DDL) confirmed 1 hit.
- Existing assertions are preserved. The single-pass branch still uses the query "anything", which does not match the fixture, so the NoChunks refusal is retained.
- New `_multi_hop_short_circuits_when_probe_empty` test (REQUIRED, following a round-1 critic HIGH finding and verifier escalation): pins the MCP-layer wire shape of the probe-empty short-circuit. `kebab-rag::multi_hop_empty_probe_pool_refuses_before_any_llm_call` pins only the RAG layer, so the MCP layer had no safety net.
- Module doc updated: enumerates the contract that each of the two tests pins. The inline comments (lines 94-101) are also aligned with the new contract.
- New dated entry in HOTFIXES.md: `## 2026-05-26 — HOTFIX #15 ...` (date-top convention).

Verification: `cargo test --workspace -j 1` shows 0 regressions (known flaky 1 → 0). `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean.

Wire / behavior / version cascade: 0.

Refs: docs/superpowers/specs/2026-05-26-hotfix-15-mcp-ask-multi-hop-flaky-spec.md (review 3 rounds APPROVE)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec and plan drafts plus review artifacts from OMC team `hotfix-15-mcp-flaky`.

- spec: the analyst diagnosed the root cause (PR-7's probe-first behavior mismatches the stale empty-KB contract in PR-5's test) and recommended Option A (test-only fix). 3 rounds of review (critic + verifier) closed CRITICAL C1 (fixture/query yields 0 FTS5 hits), MAJOR M1/M2, and other findings.
- plan: the planner wrote 7 steps plus a subagent dispatch task. 3 rounds of review (critic-plan + verifier-plan) covered empirical SQLite REPL verification, level-1 dated entry placement, and alignment with the actual KebabHandler/KebabAppState pattern.

Implementation: commit 429287f (executor).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude-reviewer-01 approved these changes 2026-05-26 06:55:42 +00:00
claude-reviewer-01 left a comment
Member

Round 1: HOTFIX #15 PR review.

This is the executor's implementation, following the ACCEPT-WITH-RESERVATIONS verdict from OMC team `hotfix-15-mcp-flaky`'s 3-round spec/plan review (critic + verifier + critic-plan + verifier-plan). Round 1 self-review, production-quality:

Praise (prose):

  1. The value of empirical SQLite REPL verification was demonstrated: the round-1 critic-plan's CRITICAL C1 (a self-reproducing flaw where fixture plus query yields 0 FTS5 hits), a defect that 3 reviewers had missed, was found by combining the SQLite REPL, the V007 trigram DDL, and an analysis of the FTS5 implicit AND in the token_and branch of `build_match_string`. This is a process upgrade from theoretical single-token analysis to empirical multi-token verification.

  2. The test-only fix follows the minimum-diff principle: production code (kebab-rag / kebab-mcp/src / kebab-app) touched 0 times, wire schema changes 0, Cargo.toml version cascade 0. PR-7's probe-first dogfood fix keeps its intended production behavior.

  3. Layer separation in the new `_multi_hop_short_circuits_when_probe_empty` test: `kebab-rag::multi_hop_empty_probe_pool_refuses_before_any_llm_call` pins only the RAG layer, so the MCP-layer wire shape had no safety net. The new test adds MCP-layer coverage, which justifies the round-1 critic HIGH finding and the verifier escalation.

  4. The cumulative closure matrix across the 3 rounds is clear: CRITICAL 1 + MAJOR 4 (M1/M2/A1/A2) + MINOR 1 (N5b) + NIT 7. All are closed or (for some NITs) deferred. Each round is traceable from finding to fix to empirical verification.

  5. HOTFIXES.md follows the date-top convention: the `## YYYY-MM-DD` level-1 dated entry pattern, consistent with the 4 existing entries.

  6. Contract consistency in the module doc and inline comment rewrite: PR-7's new dispatch shape (probe-first → decompose if the probe passes → LLM) is synchronized between the doc and the tests' assertions. Future contributors can trace the contract from a single source.

No further actionable items. 1306 tests pass (HOTFIX #15 known flaky 1 → 0), clippy is clean, and production behavior is unchanged.

OK to merge. Next step: diagnose the root cause of the consistent S3 `nli_model_unavailable` failure (the S3 follow-up subsection of the fb-41 PR-9 closure entry in HOTFIXES).

altair823 merged commit 1a224bf983 into main 2026-05-26 07:02:28 +00:00
altair823 deleted branch fix/hotfix-15-mcp-ask-multi-hop-flaky 2026-05-26 07:02:30 +00:00

Reference: altair823-org/kebab#183