feat(mcp): fb-41 PR-5 - MCP ask multi_hop arg + SKILL.md guidance

fb-41 multi-hop RAG, **PR-5** (immediately after the PR-4 merge). This is the sister surface to PR-4's CLI `--multi-hop` flag: an agent (Claude Code or another MCP host) can pass `multi_hop: true` when calling `mcp__kebab__ask`.

Design: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md
Plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md (PR-5 section)

## MCP surface

- `crates/kebab-mcp/src/tools/ask.rs`:
  - Add `AskInput.multi_hop: Option<bool>`. The JsonSchema derive reflects it in tools/list automatically, so agent capability discovery picks up the new field.
  - `handle()` sets `AskOpts.multi_hop = input.multi_hop.unwrap_or(false)`, so existing callers (field omitted or null) stay on single-pass.
- `crates/kebab-mcp/src/lib.rs` (tools/list):
  - Add one line about multi-hop to the `ask` tool description (decompose → retrieve → synthesize, 2-5× LLM cost, per-hop trace on Answer.hops).

## SKILL.md guidance

- `mcp__kebab__ask` section of `integrations/claude-code/kebab/SKILL.md`:
  - Add `multi_hop: false` to the input-shape JSON example.
  - Add `hops` (multi-hop only) to the Returns entry.
  - New bullet (p9-fb-41): opt-in conditions, cost trade-off, use cases (compound questions / prereq chains / cross-doc reasoning), the per-hop trace shape of `Answer.hops`, and handling of the `multi_hop_decompose_failed` refusal.

## Tests (new `tests/tools_call_ask_multi_hop.rs`, 2 Ollama-free pins)

- `ask_tool_routes_multi_hop_true_to_decompose_first`: pins dispatch divergence. An invalid LLM endpoint (`http://127.0.0.1:1`, request_timeout_secs=2) forces the model to be unreachable. With multi_hop=true, decompose is called first → `error.v1` (code=model_unreachable) / isError=true. With multi_hop=false (single-pass), retrieve runs first on an empty KB → no LLM call → `answer.v1` grounded=false / isError=false. The two different response shapes are evidence that dispatch actually routes to different paths.
- `ask_input_schema_advertises_multi_hop_field`: the JsonSchema of AskInput exposes the `multi_hop` property. Regression pin for MCP host capability discovery (the input schema in tools/list).

The AskInput literal in the existing `tools_call_ask.rs` also gains `multi_hop: None` (minimal cascade from the new struct field).

## Unchanged

- `prompt_template_version` (`rag-multi-hop-v1`): unchanged.
- TUI surface: PR-6's responsibility.
- error.v1 mapping: PR-4's enum reservation as-is (no error_wire promotion).

## Verification

- `cargo test -p kebab-mcp -j 1`: the 2 new tools_call_ask_multi_hop tests plus the existing ask / search / bulk_search / fetch / ingest / schema / doctor / tools_list / initialize etc. all pass (no regressions).
- `cargo clippy -p kebab-mcp --all-targets -j 1 -- -D warnings` clean.
- Serial single-crate build (16 GB RAM constraint).

## Next PR

- PR-6: TUI Ask panel multi-hop toggle (F2 / Ctrl-T) + hop trace render + cheatsheet update.
- v0.18.0 cut (after PR-6 merges).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
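
Illustrative sketch (not the crate's actual code): the defaulting rule and the dispatch divergence described in the commit message can be modelled with std-only Rust. Only the `multi_hop: Option<bool>` field and the `unwrap_or(false)` rule come from the message; the trimmed-down `AskInput` / `AskOpts` structs, the `FirstStage` enum, and the `resolve_opts` / `first_stage` helpers are hypothetical stand-ins.

```rust
// Hypothetical, simplified stand-ins for the types named in the commit message.
// The real AskInput / AskOpts carry more fields and derive serde + JsonSchema.

/// Request payload as an MCP host would send it (other fields omitted).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AskInput {
    pub query: String,
    /// `None` models both an omitted field and an explicit JSON `null`.
    pub multi_hop: Option<bool>,
}

/// Resolved options handed to the ask pipeline.
#[derive(Debug, Clone, PartialEq)]
pub struct AskOpts {
    pub multi_hop: bool,
}

/// Hypothetical label for the first pipeline stage each mode reaches.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FirstStage {
    /// Multi-hop: an LLM decompose call comes first.
    Decompose,
    /// Single-pass: retrieval comes first, before any LLM call.
    Retrieve,
}

/// Mirrors `AskOpts.multi_hop = input.multi_hop.unwrap_or(false)`.
pub fn resolve_opts(input: &AskInput) -> AskOpts {
    AskOpts {
        multi_hop: input.multi_hop.unwrap_or(false),
    }
}

/// Picks the first stage from the resolved flag.
pub fn first_stage(opts: &AskOpts) -> FirstStage {
    if opts.multi_hop {
        FirstStage::Decompose
    } else {
        FirstStage::Retrieve
    }
}
```

Under these assumptions, a legacy caller that omits the field resolves to `multi_hop: false` and reaches retrieval first, which is why the dispatch test can tell the two paths apart with an unreachable LLM endpoint and an empty KB.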
2026-05-25 09:06:28 +00:00
parent f28a422f79
commit 8a2f7affa6
5 changed files with 159 additions and 4 deletions
--- a/integrations/claude-code/kebab/SKILL.md
+++ b/integrations/claude-code/kebab/SKILL.md
@@ -80,13 +80,14 @@ Use when the user wants a synthesized answer, not a list of links.

 Input:
 ```json
-{ "query": "<question>", "session_id": "<optional-slug>", "mode": "hybrid" }
+{ "query": "<question>", "session_id": "<optional-slug>", "mode": "hybrid", "multi_hop": false }
 ```

- Returns `answer.v1`: `answer` (markdown), `citations[]`, `grounded` (bool), `refusal_reason`, `model`, `conversation_id`, `turn_index`.
+- Returns `answer.v1`: `answer` (markdown), `citations[]`, `grounded` (bool), `refusal_reason`, `model`, `conversation_id`, `turn_index`, `hops` (multi-hop only).
 - **If `grounded == false`** → KB doesn't have enough context. Don't paraphrase the refusal as if it were an answer. Tell the user the KB came up dry and fall back to your own knowledge or ask for the source.
 - For follow-up turns on the same topic, pass `session_id` (e.g. `"team-onboarding-2026-05"`) and reuse it across the conversation. Sessions persist until `kebab reset --data-only`.
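 - p9-fb-40: the default is `prompt_template_version = "rag-v2"`. Answers are stricter: facts are cited as verbatim spans, trained knowledge is off-limits, and "확실하지 않다" ("not certain") may appear when the evidence is ambiguous. If the user sets `[rag] prompt_template_version = "rag-v1"` explicitly, the legacy behavior applies.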
+- **p9-fb-41 `multi_hop: true`** — opt the ask into the multi-hop pipeline. The query is decomposed into sub-questions, each retrieved independently (LLM-driven decide loop, up to `rag.multi_hop_max_depth` iters), then synthesized over the merged chunk pool. Cost trade-off: 2–5× LLM calls vs. single-pass. **Use** for compound questions ("X 와 Y 의 차이는?", prereq chains, cross-doc reasoning where one chunk alone is insufficient). **Don't** for simple fact-finding (single-pass is faster + cheaper). When set, `answer.v1.hops[]` carries the per-hop trace (`{iter, kind, sub_queries[], context_chunks_added, forced_stop, llm_call_ms}`) — surface a brief "Searched in N hops" note when the trace is non-trivial. Decompose-failure (model emitted non-JSON) → `refusal_reason = "multi_hop_decompose_failed"`; treat like any other refusal.

 ### `mcp__kebab__fetch` — when you need raw text