feat(mcp): fb-41 PR-5: MCP ask multi_hop arg + SKILL.md guidance

This is **PR-5** of the fb-41 multi-hop RAG work, landing right after the
PR-4 merge. It is the sister surface to PR-4's CLI `--multi-hop` flag: an
agent (Claude Code or another MCP host) can pass the `multi_hop: true`
option when calling `mcp__kebab__ask`.

Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-5 section)

## MCP surface

- `crates/kebab-mcp/src/tools/ask.rs`:
  - Add `AskInput.multi_hop: Option<bool>`. The JsonSchema derive reflects
    it in tools/list automatically, so agent capability discovery sees the
    new field.
  - `handle()` sets `AskOpts.multi_hop = input.multi_hop.unwrap_or(false)`,
    so existing callers (field omitted / null) stay on single-pass.
- `crates/kebab-mcp/src/lib.rs` (tools/list):
  - Add one line about multi-hop to the `ask` tool description (decompose →
    retrieve → synthesize, 2-5× LLM cost, per-hop trace on Answer.hops).
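
The defaulting rule in `handle()` can be sketched in isolation. This is a minimal, std-only illustration: the `AskInput` below is a trimmed stand-in for the real struct, and `resolve_multi_hop` is a hypothetical helper name, not a function in `kebab-mcp`.

```rust
/// Trimmed stand-in for the MCP `ask` input (illustration only).
/// The real struct also carries `query`, `session_id` and `mode`.
pub struct AskInput {
    pub multi_hop: Option<bool>,
}

/// Hypothetical helper mirroring `input.multi_hop.unwrap_or(false)`.
/// `None` stands for both an omitted field and an explicit JSON `null`.
pub fn resolve_multi_hop(input: &AskInput) -> bool {
    input.multi_hop.unwrap_or(false)
}
```

With this rule, `resolve_multi_hop(&AskInput { multi_hop: None })` is `false`, so callers that predate the field keep the single-pass path, and only an explicit `Some(true)` opts in.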

## SKILL.md guidance

- `mcp__kebab__ask` section of `integrations/claude-code/kebab/SKILL.md`:
  - Add `multi_hop: false` to the input shape JSON example.
  - Add `hops` (multi-hop only) to the Returns line.
  - New bullet (p9-fb-41): opt-in conditions / cost trade-off / use cases
    (compound questions / prereq chains / cross-doc reasoning) / the
    per-hop trace shape of `Answer.hops` / handling of the
    `multi_hop_decompose_failed` refusal.
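
As a rough illustration of how a host might consume the per-hop trace, the sketch below mirrors one `Answer.hops[]` entry and derives the "Searched in N hops" note that SKILL.md suggests. The field names come from the SKILL.md bullet; the Rust types, the `Hop` and `hops_note` names, and the rule that "non-trivial" means more than one entry are all assumptions made for this example.

```rust
/// Hypothetical mirror of one `Answer.hops[]` entry.
/// Field names follow SKILL.md; the types are assumptions.
#[derive(Debug, Clone)]
pub struct Hop {
    pub iter: u32,
    /// One of "decompose" / "decide" / "synthesize".
    pub kind: String,
    pub sub_queries: Vec<String>,
    pub context_chunks_added: u32,
    pub forced_stop: bool,
    pub llm_call_ms: u64,
}

/// Returns a short note for the user, or `None` for a trivial trace.
/// Assumption: a trace is "non-trivial" when it has more than one
/// entry, and N is simply the number of entries.
pub fn hops_note(hops: &[Hop]) -> Option<String> {
    if hops.len() > 1 {
        Some(format!("Searched in {} hops", hops.len()))
    } else {
        None
    }
}
```

A host could apply a different threshold (for example, counting only `decide` entries); the point is only that the note is derived from the trace rather than from the answer text.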

## Tests (new `tests/tools_call_ask_multi_hop.rs`, 2 Ollama-free pins)

- `ask_tool_routes_multi_hop_true_to_decompose_first`: pins the dispatch
  divergence. An invalid LLM endpoint (`http://127.0.0.1:1`,
  request_timeout_secs=2) forces the model to be unreachable. With
  multi_hop=true, decompose is called first → `error.v1`
  (code=model_unreachable) / isError=true. With multi_hop=false
  (single-pass) on an empty KB, retrieve runs first → no LLM call →
  `answer.v1` grounded=false / isError=false. The two different shapes
  are the evidence that dispatch really routes to different paths.
- `ask_input_schema_advertises_multi_hop_field`: the JsonSchema of
  AskInput exposes the `multi_hop` property. Regression pin for MCP host
  capability discovery (the input schema in tools/list).

The AskInput literal in the existing `tools_call_ask.rs` also gains
`multi_hop: None` (minimal cascade from adding the struct field).
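
The reasoning behind the first pin can be modeled as a toy decision function. This is a sketch of the ordering argument only, not kebab's dispatch code: `Step`, `Outcome`, `first_step` and `stop_at_first_step` are hypothetical names, and anything past the first pipeline step is deliberately left unmodeled.

```rust
/// First step each pipeline runs, as described for the test.
#[derive(Debug, PartialEq)]
pub enum Step {
    Decompose, // multi-hop: LLM call before any retrieval
    Retrieve,  // single-pass: retrieval before any LLM call
}

/// Wire shape returned when the request ends at the first step.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// `error.v1`, code=model_unreachable, isError=true.
    ModelUnreachable,
    /// `answer.v1`, grounded=false, isError=false.
    Refusal,
}

pub fn first_step(multi_hop: bool) -> Step {
    if multi_hop { Step::Decompose } else { Step::Retrieve }
}

/// `Some(outcome)` if the first step already ends the request,
/// `None` if the pipeline would continue (not modeled here).
pub fn stop_at_first_step(
    multi_hop: bool,
    kb_empty: bool,
    llm_reachable: bool,
) -> Option<Outcome> {
    match first_step(multi_hop) {
        Step::Decompose if !llm_reachable => Some(Outcome::ModelUnreachable),
        Step::Retrieve if kb_empty => Some(Outcome::Refusal),
        _ => None,
    }
}
```

Under the test's fixture (empty KB, unreachable LLM), the two values of `multi_hop` produce different outcomes, which is why observing both wire shapes is enough to show the argument reached the dispatcher.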

## Unchanged

- `prompt_template_version` (`rag-multi-hop-v1`): left as is.
- TUI surface: PR-6's responsibility.
- error.v1 mapping: PR-4's enum reservation stays as is (no error_wire
  promotion).

## Verification

- `cargo test -p kebab-mcp -j 1`: the 2 new tools_call_ask_multi_hop tests
  plus the existing ask / search / bulk_search / fetch / ingest / schema /
  doctor / tools_list / initialize tests all pass (no regressions).
- `cargo clippy -p kebab-mcp --all-targets -j 1 -- -D warnings` clean.
- Serial single-crate build (16 GB RAM constraint).

## Next PRs

- PR-6: TUI Ask panel multi-hop toggle (F2 / Ctrl-T) + hop trace render +
  cheatsheet update.
- v0.18.0 cut (after the PR-6 merge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:06:28 +00:00
parent f28a422f79
commit 8a2f7affa6
5 changed files with 159 additions and 4 deletions

@@ -49,7 +49,7 @@ pub fn build_tools_vec() -> Vec<Tool> {
),
Tool::new(
"ask",
"RAG question answering over the knowledge base. Returns answer.v1 JSON. Pass session_id for multi-turn context.",
"RAG question answering over the knowledge base. Returns answer.v1 JSON. Pass session_id for multi-turn context. Set multi_hop=true for compound / cross-doc questions (decompose → retrieve → synthesize; 2-5× LLM cost; per-hop trace on Answer.hops).",
schema_for_type::<tools::ask::AskInput>(),
),
Tool::new(

@@ -20,6 +20,15 @@ pub struct AskInput {
pub session_id: Option<String>,
/// Optional retrieval mode override ("lexical" / "vector" / "hybrid"). Default "hybrid".
pub mode: Option<String>,
/// p9-fb-41: opt the ask into the multi-hop pipeline. Default `false`.
/// When `true`, the query is decomposed into sub-questions, each
/// retrieved independently, then synthesized over the merged
/// chunk pool. Cost trade-off: 2-5× LLM calls vs. single-pass.
/// Use for compound questions / cross-doc reasoning / prereq
/// chains; keep `false` for simple fact lookups. The full
/// per-hop trace (`decompose` / `decide` / `synthesize`) is
/// exposed on `Answer.hops`.
pub multi_hop: Option<bool>,
}
pub fn handle(state: &KebabAppState, input: AskInput) -> CallToolResult {
@@ -38,7 +47,7 @@ pub fn handle(state: &KebabAppState, input: AskInput) -> CallToolResult {
history: Vec::new(),
conversation_id: None,
turn_index: None,
multi_hop: false,
multi_hop: input.multi_hop.unwrap_or(false),
};
let cfg_clone = (*state.config).clone();
let result = match input.session_id {

@@ -55,6 +55,7 @@ async fn ask_tool_returns_answer_v1_with_refusal_on_empty_kb() {
// Test env uses provider="none" — Hybrid would hard-error on embedding.
// Pass Lexical explicitly so the test stays functional.
mode: Some("lexical".to_string()),
multi_hop: None,
},
)
})

@@ -0,0 +1,144 @@
//! p9-fb-41 PR-5: MCP `ask` tool with `multi_hop: true` argument.
//!
//! Two Ollama-free pins:
//!
//! 1. `ask_tool_routes_multi_hop_true_to_decompose_first` — multi-hop
//! dispatch differs from single-pass on dispatch shape. Single-pass
//! retrieves *first* (empty KB → `NoChunks` refusal, no LLM call,
//! `grounded=false`). Multi-hop calls *decompose first* (no
//! retrieval yet), so an empty KB + no Ollama yields `error.v1`
//! with `code=model_unreachable` — different wire shape than the
//! refusal envelope. The two surfaces' divergence is the signal
//! that the `multi_hop` arg actually routed the dispatch.
//! 2. `ask_input_schema_advertises_multi_hop_field` — `AskInput`'s
//! `JsonSchema` exposes the new field so MCP host capability
//! discovery (tools/list) renders it for agents.
//!
//! A live-Ollama end-to-end multi-hop pin lands in a follow-up
//! `#[ignore]` test (same pattern as `wire_ask_stale.rs`).
use kebab_config::Config;
use kebab_core::SourceScope;
use kebab_mcp::{KebabAppState, KebabHandler};
use rmcp::model::RawContent;
fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path) -> Config {
let mut cfg = Config::defaults();
cfg.storage.data_dir = data_dir.to_string_lossy().into_owned();
cfg.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
cfg.workspace.root = workspace_root.to_string_lossy().into_owned();
cfg.workspace.exclude.clear();
cfg.models.embedding.provider = "none".to_string();
cfg.models.embedding.dimensions = 0;
// Force the LLM endpoint to a known-unreachable port so this test
// is robust against whether a real Ollama happens to be running
// on 127.0.0.1:11434 (the developer's box; CI; etc.). Combined
// with a tight `request_timeout_secs`, the multi-hop dispatch
// surfaces `model_unreachable` quickly and deterministically.
cfg.models.llm.endpoint = "http://127.0.0.1:1".to_string();
cfg.models.llm.request_timeout_secs = 2;
cfg
}
/// The dispatch contract: with an empty KB, single-pass `ask` short-
/// circuits at retrieval (no LLM call) and returns a refusal Answer
/// (`grounded=false`, `isError=false`). Multi-hop calls *decompose
/// first*, so the same empty KB + unreachable LLM yields `error.v1`
/// with `code=model_unreachable` (`isError=true`). The divergence
/// confirms the `multi_hop` arg actually rerouted the dispatch.
#[tokio::test]
async fn ask_tool_routes_multi_hop_true_to_decompose_first() {
let dir = tempfile::tempdir().unwrap();
let data_dir = dir.path().join("data");
let workspace_root = dir.path().join("notes");
std::fs::create_dir_all(&data_dir).unwrap();
std::fs::create_dir_all(&workspace_root).unwrap();
let cfg = minimal_config(&data_dir, &workspace_root);
let scope = SourceScope {
root: workspace_root.clone(),
include: vec![],
exclude: vec![],
};
let _ = kebab_app::ingest_with_config(cfg.clone(), scope, false).unwrap();
let state = KebabAppState::new(cfg, None);
let handler = KebabHandler::new(state);
// Multi-hop branch — decompose runs first, hits the unreachable
// endpoint, MCP wraps as error.v1.
let state_mh = handler.state().clone();
let mh = tokio::task::spawn_blocking(move || {
kebab_mcp::tools::ask::handle(
&state_mh,
kebab_mcp::tools::ask::AskInput {
query: "compound about X and Y".to_string(),
session_id: None,
mode: Some("lexical".to_string()),
multi_hop: Some(true),
},
)
})
.await
.unwrap();
assert!(
mh.is_error.unwrap_or(false),
"multi_hop=true must reach the LLM (decompose first) — got {mh:?}"
);
let mh_text = match &mh.content.first().unwrap().raw {
RawContent::Text(t) => t.text.clone(),
other => panic!("expected text, got {other:?}"),
};
let mh_v: serde_json::Value = serde_json::from_str(&mh_text).unwrap();
assert_eq!(mh_v["schema_version"], "error.v1");
assert_eq!(
mh_v["code"], "model_unreachable",
"multi-hop dispatch must hit the LLM and surface model_unreachable; got {mh_v}"
);
// Single-pass branch — empty KB short-circuits at retrieve, no LLM
// call happens, refusal Answer comes back as isError=false.
let state_sp = handler.state().clone();
let sp = tokio::task::spawn_blocking(move || {
kebab_mcp::tools::ask::handle(
&state_sp,
kebab_mcp::tools::ask::AskInput {
query: "anything".to_string(),
session_id: None,
mode: Some("lexical".to_string()),
multi_hop: Some(false),
},
)
})
.await
.unwrap();
assert!(
!sp.is_error.unwrap_or(false),
"single-pass empty-KB refusal must NOT be isError — got {sp:?}"
);
let sp_text = match &sp.content.first().unwrap().raw {
RawContent::Text(t) => t.text.clone(),
other => panic!("expected text, got {other:?}"),
};
let sp_v: serde_json::Value = serde_json::from_str(&sp_text).unwrap();
assert_eq!(sp_v["schema_version"], "answer.v1");
assert_eq!(sp_v["grounded"], false);
}
/// AskInput's JSON-schema (rendered for tools/list) advertises the
/// new `multi_hop` field. Pins agent / MCP host capability discovery
/// against accidental schema-rename or omission.
#[test]
fn ask_input_schema_advertises_multi_hop_field() {
let schema = schemars::schema_for!(kebab_mcp::tools::ask::AskInput);
let v = serde_json::to_value(&schema).unwrap();
let props = v
.get("properties")
.and_then(|p| p.as_object())
.expect("AskInput schema must declare properties");
assert!(
props.contains_key("multi_hop"),
"AskInput.multi_hop must surface in the JsonSchema — got keys: {:?}",
props.keys().collect::<Vec<_>>()
);
}

@@ -80,13 +80,14 @@ Use when the user wants a synthesized answer, not a list of links.
Input:
```json
{ "query": "<question>", "session_id": "<optional-slug>", "mode": "hybrid" }
{ "query": "<question>", "session_id": "<optional-slug>", "mode": "hybrid", "multi_hop": false }
```
- Returns `answer.v1`: `answer` (markdown), `citations[]`, `grounded` (bool), `refusal_reason`, `model`, `conversation_id`, `turn_index`.
- Returns `answer.v1`: `answer` (markdown), `citations[]`, `grounded` (bool), `refusal_reason`, `model`, `conversation_id`, `turn_index`, `hops` (multi-hop only).
- **If `grounded == false`** → KB doesn't have enough context. Don't paraphrase the refusal as if it were an answer. Tell the user the KB came up dry and fall back to your own knowledge or ask for the source.
- For follow-up turns on the same topic, pass `session_id` (e.g. `"team-onboarding-2026-05"`) and reuse it across the conversation. Sessions persist until `kebab reset --data-only`.
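- p9-fb-40: the default is `prompt_template_version = "rag-v2"`. Answers are stricter: facts are quoted as verbatim spans, the model's trained knowledge is not used, and "not sure" may appear when the evidence is ambiguous. If the user sets `[rag] prompt_template_version = "rag-v1"`, the legacy behavior applies.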
- **p9-fb-41 `multi_hop: true`**: opts the ask into the multi-hop pipeline. The query is decomposed into sub-questions, each retrieved independently (LLM-driven decide loop, up to `rag.multi_hop_max_depth` iters), then synthesized over the merged chunk pool. Cost trade-off: 2-5× LLM calls vs. single-pass. **Use** for compound questions ("What is the difference between X and Y?"), prereq chains, and cross-doc reasoning where one chunk alone is insufficient. **Don't** use it for simple fact-finding (single-pass is faster + cheaper). When set, `answer.v1.hops[]` carries the per-hop trace (`{iter, kind, sub_queries[], context_chunks_added, forced_stop, llm_call_ms}`); surface a brief "Searched in N hops" note when the trace is non-trivial. Decompose-failure (model emitted non-JSON) → `refusal_reason = "multi_hop_decompose_failed"`; treat like any other refusal.
### `mcp__kebab__fetch` — when you need raw text