kebab/integrations/claude-code/kebab/SKILL.md
th-kim0823 600c6182fc docs(fb-40): rag-v2 prompt + README + design + SKILL + INDEX
- README: [rag] prompt_template_version default rag-v2 + the three strengthened V2 rules
- design §7: rag-v2 body text + V1 legacy note
- SKILL.md: note on the change in mcp__kebab__ask response behavior
- task spec: status open → completed, design + plan links
- INDEX: fb-40 merged (2026-05-10)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:37:28 +09:00


name: kebab
description: Local knowledge base + RAG over the user's pre-indexed documents (wiki crawls, Markdown notes, PDFs, images). Use when answering questions that need internal context the user has indexed locally, e.g. team-specific procedures, internal runbooks, infrastructure docs, credentials registries, project-specific conventions. Also use when a domain question (Kubernetes, MLOps, internal tooling, etc.) needs additional grounding from indexed docs before answering. Do NOT use for general public questions, code in the working directory, or anything obviously outside the indexed corpus.

kebab — local KB / RAG access

kebab indexes the user's personal documents and exposes them via lexical / vector / hybrid search and a local-LLM RAG answer. All output speaks frozen wire schema v1 — every JSON record carries a schema_version field.

Two surfaces ship: an MCP server (kebab mcp, preferred — process stays hot across calls) and a CLI (~/.cargo/bin/kebab, fallback for hosts without MCP).

When to invoke

Trigger when the user's question matches any of:

  • Refers to internal/organization-specific systems, procedures, or jargon that a generic public answer would miss.
  • Names a runbook or procedure the user is likely to have indexed ("how do I X", "what's our policy on Y", "where's the doc for Z").
  • Domain-technical question where additional internal context (custom CRDs, internal naming, team conventions) would change the answer vs. a generic public answer.
  • User explicitly references "the wiki", "내부 문서", "kb", or asks "do we have docs on X".

Skip when:

  • The question is about public OSS, language semantics, or anything in the current working directory.
  • The user is editing kebab's own source — that's a code task, not a KB query.
  • A previous kebab call in this session already returned grounded: false on a near-identical query (don't loop).

User-specific trigger keywords (team names, system names, internal acronyms) belong in a per-user override of this SKILL.md, not in this repo-shipped version.

MCP tools (preferred)

When kebab is registered as an MCP server (see ~/.claude/mcp.json example below), seven tools are exposed as mcp__kebab__<name>:

| tool | purpose | mutation |
| --- | --- | --- |
| mcp__kebab__search | corpus search → search_response.v1 ({hits, next_cursor, truncated}) | no |
| mcp__kebab__ask | RAG answer → answer.v1 | no |
| mcp__kebab__fetch | verbatim text → fetch_result.v1 (chunk / doc / span) | no |
| mcp__kebab__schema | capability discovery → schema.v1 | no |
| mcp__kebab__doctor | health check → doctor.v1 | no |
| mcp__kebab__ingest_file | save single file → ingest_report.v1 | yes |
| mcp__kebab__ingest_stdin | save markdown blob → ingest_report.v1 | yes |

Mutation tools require explicit user intent — never auto-invoke.

mcp__kebab__search — when you need the source

Use when the user wants to find a doc, or when you (the model) need raw chunks to reason from before answering.

Input:

{ "query": "<query>", "mode": "hybrid", "k": 10, "max_tokens": null, "snippet_chars": null, "cursor": null, "tags": null, "lang": null, "path_glob": null, "trust_min": null, "media": null, "ingested_after": null, "doc_id": null, "trace": null }
  • mode = "hybrid" is the default-correct choice. Use "vector" for semantic-only ("docs about X concept"), "lexical" for exact strings ("the literal flag --foo-bar").
  • max_tokens / snippet_chars / cursor (p9-fb-34) — agent budget controls. Set max_tokens to cap result wire size (chars/4 estimate); set cursor to the previous response's next_cursor to fetch the next page.
  • p9-fb-36 filter inputs: tags (string array; OR within the array), lang (BCP-47 language code), path_glob (glob pattern matched against doc path), trust_min ("primary" | "secondary" | "generated"; includes that level and above), media (string array; IN-list of "markdown" | "pdf" | "image" | "audio" | "other"; alias "md" → "markdown"), ingested_after (RFC3339 UTC string), doc_id (exact doc UUID). Filters combine with AND across keys. An invalid ingested_after or an unknown trust_min → error.v1.code = invalid_input. An unknown media value → empty hits, no error.
  • Output is search_response.v1: { hits: search_hit.v1[], next_cursor: string|null, truncated: bool }. Iterate response.hits[] for individual hits. Key hit fields: rank, score, doc_path, heading_path[], section_label, snippet, citation (line range / page), chunk_id.
  • hits[].score_kind (p9-fb-38): "rrf" (hybrid) / "bm25" (lexical) / "cosine" (vector). Declares the meaning of the top-level score — it is a ranking signal, not a confidence value. If you need a trust threshold, use retrieval.lexical_score (BM25 raw) / retrieval.vector_score (cosine raw) instead of the top-level score.
  • Cite back to the user as doc_path § heading_path[-1] so they can open the source.
  • When truncated: true, the budget loop modified the page (snippet shortening or k reduction). next_cursor is independent — non-null whenever more hits may be reachable. Caller may widen max_tokens (re-issue same query for fuller snippets / more hits per page) or follow next_cursor (advance through more hits) or both. Mismatched cursor (corpus_revision changed) returns error.v1.code = stale_cursor — re-issue the search to obtain a fresh one.
  • trace: true (p9-fb-37) — debug aid. Response carries an extra trace block: lexical[] + vector[] (pre-fusion candidates), rrf_inputs[] (RRF union before final cut), and timing (lexical_ms, vector_ms, fusion_ms, total_ms). Trace bypasses the search cache (always cold). Use sparingly — it bloats the wire response and is for diagnosing "why did this hit / not hit", not normal retrieval.
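The citation and score guidance above can be sketched in Python as follows. This is a minimal illustration, assuming a hit is an already-parsed search_hit.v1 dict and that the raw scores sit under a nested retrieval object, as the names retrieval.lexical_score / retrieval.vector_score suggest. The helper names cite and trust_signal and the sample hit are hypothetical, not part of kebab.

```python
from typing import Any, Optional


def cite(hit: dict[str, Any]) -> str:
    """Format a hit as 'doc_path § last heading' (doc_path alone if no headings)."""
    headings = hit.get("heading_path") or []
    return f"{hit['doc_path']} § {headings[-1]}" if headings else hit["doc_path"]


def trust_signal(hit: dict[str, Any]) -> Optional[float]:
    """Return a raw retrieval score for thresholding instead of the top-level score.

    Assumption: raw scores live under hit["retrieval"].
    """
    retrieval = hit.get("retrieval") or {}
    if hit.get("score_kind") == "bm25":
        return retrieval.get("lexical_score")
    # For "cosine" and "rrf" hits, prefer the cosine value when it is present.
    vector = retrieval.get("vector_score")
    return vector if vector is not None else retrieval.get("lexical_score")


# Hypothetical sample hit, for illustration only.
sample_hit = {
    "rank": 1,
    "score": 0.032,
    "score_kind": "rrf",
    "doc_path": "runbooks/deploy.md",
    "heading_path": ["Deploy", "Rollback"],
    "retrieval": {"lexical_score": 7.4, "vector_score": 0.81},
}

print(cite(sample_hit))
print(trust_signal(sample_hit))
```

Keeping the threshold on a raw score means a cutoff tuned for cosine values is never applied to an RRF rank score by accident.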

mcp__kebab__ask — when you need the answer

Use when the user wants a synthesized answer, not a list of links.

Input:

{ "query": "<question>", "session_id": "<optional-slug>", "mode": "hybrid" }
  • Returns answer.v1: answer (markdown), citations[], grounded (bool), refusal_reason, model, conversation_id, turn_index.
  • If grounded == false → KB doesn't have enough context. Don't paraphrase the refusal as if it were an answer. Tell the user the KB came up dry and fall back to your own knowledge or ask for the source.
  • For follow-up turns on the same topic, pass session_id (e.g. "team-onboarding-2026-05") and reuse it across the conversation. Sessions persist until kebab reset --data-only.
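  • p9-fb-40: the default prompt_template_version is "rag-v2". Answers are stricter: facts are quoted as verbatim spans, the model may not draw on its training knowledge, and "not certain" ("확실하지 않다") may appear when the evidence is ambiguous. If the user explicitly sets [rag] prompt_template_version = "rag-v1", the legacy behavior applies.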
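The grounded / refusal handling described above might look like this in Python. It is a sketch under stated assumptions: the input is an already-parsed answer.v1 dict with the fields listed above, and the function name render_answer, the sample payloads, and the refusal_reason value are all hypothetical.

```python
from typing import Any


def render_answer(answer: dict[str, Any]) -> str:
    """Turn a parsed answer.v1 dict into reply text without dressing up a refusal."""
    if not answer.get("grounded"):
        reason = answer.get("refusal_reason") or "no reason given"
        # Say plainly that the KB came up dry; the caller decides on a fallback.
        return f"The KB had no grounded answer ({reason})."
    sources: list[str] = []
    for citation in answer.get("citations", []):
        if citation["doc_path"] not in sources:
            sources.append(citation["doc_path"])
    return f"{answer['answer']}\n\nSources: {', '.join(sources)}"


# Hypothetical payloads, for illustration only.
grounded = {
    "answer": "Rotate the key, then redeploy.",
    "grounded": True,
    "refusal_reason": None,
    "citations": [{"doc_path": "runbooks/keys.md"}, {"doc_path": "runbooks/keys.md"}],
}
refused = {"answer": "", "grounded": False, "refusal_reason": "example-reason", "citations": []}

print(render_answer(grounded))
print(render_answer(refused))
```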

mcp__kebab__fetch — when you need raw text

Use after search to read the verbatim chunk text + surrounding context, or to pull a full doc / line range.

Input:

{ "kind": "chunk", "chunk_id": "<id>", "context": 2 }
{ "kind": "doc", "doc_id": "<id>", "max_tokens": 1000 }
{ "kind": "span", "doc_id": "<id>", "line_start": 1, "line_end": 5 }
  • chunk mode: context: N returns ordinal-adjacent chunks before/after for surrounding paragraphs.
  • doc mode: full normalized markdown. max_tokens (chars/4) caps the response — truncated: true when applied.
  • span mode: 1-based inclusive line range. PDF / audio docs reject as error.v1.code = span_not_supported (use chunk mode instead — PDF chunks are page-aligned).
  • error.v1.code = chunk_not_found / doc_not_found are non-retryable from the same id — re-issue search to get a fresh one.
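As a rough illustration of the span rules and error codes above, the following Python sketch validates a span request before sending it and maps each documented error code to its recovery step. The helper names span_request and recover_from_fetch_error are hypothetical; only the field names and error codes come from the text above.

```python
from typing import Any


def span_request(doc_id: str, line_start: int, line_end: int) -> dict[str, Any]:
    """Build a span-mode fetch input; lines are 1-based and inclusive."""
    if line_start < 1 or line_end < line_start:
        raise ValueError("span lines are 1-based and line_end must be >= line_start")
    return {"kind": "span", "doc_id": doc_id, "line_start": line_start, "line_end": line_end}


def recover_from_fetch_error(code: str) -> str:
    """Map a fetch error.v1 code to the next step described above."""
    if code == "span_not_supported":
        return "use chunk mode"  # PDF / audio docs have no line ranges
    if code in ("chunk_not_found", "doc_not_found"):
        return "re-issue search"  # the id is gone; retrying it will not help
    return "report error"


print(span_request("<id>", 1, 5))
print(recover_from_fetch_error("span_not_supported"))
```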

CLI fallback

If MCP tools aren't in scope (host without MCP support, or mcp.json not configured), call the CLI via Bash:

kebab search "<query>" --mode hybrid --json 2>/dev/null
kebab ask "<question>" --json 2>/dev/null
kebab ask "<question>" --session <stable-id> --json 2>/dev/null
kebab ask "<question>" --stream  # ndjson answer_event.v1 on stderr, final answer.v1 on stdout

Same wire shapes as MCP. CLI pays cold start (~1-2s) per call — prefer MCP when available.

--stream (p9-fb-33) emits retrieval_done → token* → final events on stderr while the answer streams; the final stdout line is the standard answer.v1 for backwards compatibility. Use it when you need progressive token consumption; otherwise the default non-streaming path is simpler. Refusal paths (score-gate / no-chunks) emit retrieval_done and then no token / final events; read the stdout answer.v1 for the canonical refusal signal.
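Because stdout carries the canonical result even when streaming, a caller can ignore the stderr events entirely and read only the last stdout line. The Python sketch below assumes stdout has already been captured as text; the function name final_answer and the sample payload (including its refusal_reason value) are hypothetical.

```python
import json
from typing import Any


def final_answer(stdout_text: str) -> dict[str, Any]:
    """Parse the last non-empty stdout line, which carries the final answer.v1."""
    lines = [line for line in stdout_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("stdout was empty; no answer.v1 to parse")
    return json.loads(lines[-1])


# Hypothetical captured stdout from a refused streaming run, for illustration only.
captured = '{"answer": "", "grounded": false, "refusal_reason": "example-reason"}\n'

answer = final_answer(captured)
print("refused" if not answer["grounded"] else answer["answer"])
```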

MCP host config

Register kebab mcp once in your host's MCP config. For Claude Code, edit ~/.claude/mcp.json:

{
  "mcpServers": {
    "kebab": {
      "command": "kebab",
      "args": ["mcp"]
    }
  }
}

Claude Code spawns kebab mcp at session start; the process stays alive across all tool calls so SQLite / Lance / fastembed are hot after the first call (~1-2s cold, sub-100ms thereafter). For Cursor / OpenAI Agents / Copilot CLI host examples plus per-tool input/output reference, see docs/mcp-usage.md in the kebab repo.
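If the host config already lists other MCP servers, the kebab entry should be added alongside them rather than replacing the file. A minimal Python sketch of that merge, working on an already-parsed config dict and leaving file I/O to the caller; the function name with_kebab_server and the "other" server entry are hypothetical.

```python
import copy
from typing import Any


def with_kebab_server(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an MCP host config with the kebab server registered."""
    merged = copy.deepcopy(config)  # never mutate the caller's dict
    servers = merged.setdefault("mcpServers", {})
    # setdefault keeps an existing kebab entry, so user customizations survive.
    servers.setdefault("kebab", {"command": "kebab", "args": ["mcp"]})
    return merged


# Hypothetical existing config, for illustration only.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(with_kebab_server(existing))
```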

Parsing tips

  • MCP tools return JSON content blocks; CLI prints one JSON value to stdout, progress / warnings to stderr. Capture stdout only: kebab search ... --json 2>/dev/null.
  • search output can be large for broad queries. Project relevant fields when summarizing — for CLI: jq '.hits[] | {rank, doc_path, heading: .heading_path[-1], snippet}' (note: .hits[], not .[] — fb-34 wrapped the array). Use --max-tokens N (CLI) / max_tokens (MCP) to cap wire size in advance.
  • Pagination: search_response.v1.next_cursor is opaque base64 — pass back as --cursor (CLI) or cursor (MCP) for the next page. null means no more hits. corpus_revision mismatch returns error.v1.code = stale_cursor — re-issue search to obtain a fresh cursor.
  • search_response.v1.truncated = true means budget forced snippet shortening or k reduction. Independent of next_cursor: widen max_tokens for fuller snippets, follow next_cursor for more hits, or both.
  • ask's citations[] mirrors search_hit.v1 minus retrieval internals — same doc_path / citation shape.
  • Schema reference lives in the kebab repo at docs/wire-schema/v1/*.schema.json if a field is unclear.
  • search_hit.v1 and answer.v1.citations[] carry indexed_at (RFC3339) + stale (bool). When stale == true, the source doc hasn't been re-processed since config.search.stale_threshold_days. Surface this caveat to the user when summarizing — the cited snapshot may not reflect current reality.
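The pagination rule above (follow next_cursor until it is null) can be sketched as a small loop. This Python example assumes a caller-supplied search callable that takes a query and a cursor and returns a parsed search_response.v1 dict; all_hits, fake_search, the max_pages cap, and the sample cursor value are hypothetical.

```python
from typing import Any, Callable, Optional

SearchFn = Callable[[str, Optional[str]], dict[str, Any]]


def all_hits(search: SearchFn, query: str, max_pages: int = 5) -> list[dict[str, Any]]:
    """Collect hits across pages by following next_cursor, up to max_pages."""
    hits: list[dict[str, Any]] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = search(query, cursor)
        hits.extend(page["hits"])
        cursor = page.get("next_cursor")
        if cursor is None:  # null cursor: no more hits are reachable
            break
    return hits


def fake_search(query: str, cursor: Optional[str]) -> dict[str, Any]:
    """Stand-in for the real tool call, serving two canned pages."""
    pages = {
        None: {"hits": [{"rank": 1}, {"rank": 2}], "next_cursor": "cGFnZTI=", "truncated": False},
        "cGFnZTI=": {"hits": [{"rank": 3}], "next_cursor": None, "truncated": True},
    }
    return pages[cursor]


print([hit["rank"] for hit in all_hits(fake_search, "deploy rollback")])
```

The page cap keeps a broad query from walking the whole corpus; the loop treats the cursor as opaque and never inspects it.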

Capability discovery

Before using streaming or multi-turn features, probe what this binary supports — call mcp__kebab__schema (or CLI kebab schema --json):

Returns schema.v1:

  • wire.schemas: supported wire ids.
  • capabilities: bool flags, e.g. streaming_ask, rag_multi_turn.
  • models: version cascade (6-axis).
  • stats: doc / chunk / asset counts + last_ingest_at, plus the p9-fb-37 health surface:
    • media_breakdown: per-kind doc counts (5 zero-padded keys: markdown / pdf / image / audio / other).
    • lang_breakdown: doc counts per BCP-47 lang (NULL keyed as the literal string "null").
    • index_bytes.{sqlite,lancedb}: on-disk byte sums.
    • stale_doc_count: docs older than config.search.stale_threshold_days.

Gate streaming / session flows on capabilities.streaming_ask / capabilities.rag_multi_turn being true. The call is cheap (no LLM); make it once per session.
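Gating on the capability flags might be done as follows. This is a small Python sketch, assuming schema.v1 has been parsed into a dict with a capabilities object as described; the helper name supports and the sample excerpt are hypothetical.

```python
from typing import Any


def supports(schema: dict[str, Any], capability: str) -> bool:
    """True only when schema.v1 explicitly reports the capability flag as true."""
    return schema.get("capabilities", {}).get(capability) is True


# Hypothetical schema.v1 excerpt, for illustration only.
schema = {"capabilities": {"streaming_ask": True, "rag_multi_turn": False}}

use_stream = supports(schema, "streaming_ask")
use_session = supports(schema, "rag_multi_turn")
print(use_stream, use_session)
```

A missing flag is treated the same as false, so an older binary that predates a capability falls back to the plain path.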

Quick health check

If a call fails or returns suspicious output, call mcp__kebab__doctor (or CLI kebab doctor) first — it surfaces config-load / data-dir / Ollama-reachability problems in one line each. Don't silently retry on errors; report the doctor output.

Workflow recipes

Recipe A — user asks an internal-context question, you want grounded answer:

  1. Call mcp__kebab__ask (or CLI kebab ask "<question>" --json).
  2. If grounded, cite citations[].doc_path in your reply and quote the answer (translate / condense as needed).
  3. If !grounded, call mcp__kebab__search with the same query and look at top 3 hits — sometimes content exists but RAG threshold rejected it. If hits look relevant, summarize from snippets and cite. If still nothing, tell the user.
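Recipe A can be sketched as a single function with the tool calls injected, so the control flow is visible without a running kebab. In this Python example, ask and search stand in for the real tool calls and looks_relevant stands in for your own relevance judgment; all of these names, the stubs, and the sample data are hypothetical.

```python
from typing import Any, Callable

Json = dict[str, Any]


def recipe_a(
    question: str,
    ask: Callable[[str], Json],
    search: Callable[[str], Json],
    looks_relevant: Callable[[Json], bool],
) -> Json:
    """Ask first; on an ungrounded answer, inspect the top 3 search hits."""
    answer = ask(question)
    if answer.get("grounded"):
        cites = [c["doc_path"] for c in answer.get("citations", [])]
        return {"via": "ask", "text": answer["answer"], "cite": cites}
    relevant = [h for h in search(question)["hits"][:3] if looks_relevant(h)]
    if relevant:
        text = " ".join(h["snippet"] for h in relevant)
        return {"via": "search", "text": text, "cite": [h["doc_path"] for h in relevant]}
    return {"via": "none", "text": "", "cite": []}  # tell the user the KB came up dry


def stub_ask(question: str) -> Json:
    return {"grounded": False, "answer": "", "citations": []}


def stub_search(question: str) -> Json:
    hit = {"doc_path": "wiki/vpn.md", "snippet": "Use the staging VPN profile."}
    return {"hits": [hit], "next_cursor": None, "truncated": False}


result = recipe_a("how do I reach staging?", stub_ask, stub_search, lambda h: "VPN" in h["snippet"])
print(result)
```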

Recipe B — domain question where internal context might exist:

  1. Call mcp__kebab__search with key terms (cheap — no LLM).
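  2. If there are no hits, or the top hit's score is low (< ~0.3; the top-level score's scale depends on score_kind, so for a stable cutoff threshold on retrieval.vector_score instead), answer from general knowledge without mentioning the KB.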
  3. If top hit is relevant, fold its content into your answer and cite doc_path.

Recipe C — user wants to know "what's in the KB about X":

  1. Call mcp__kebab__search with the topic.
  2. List unique doc_paths back to the user as a discovery surface.
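For Recipe C, the de-duplication step is the only logic involved. A short Python sketch, assuming a parsed search_response.v1 dict; the function name unique_doc_paths and the sample response are hypothetical.

```python
from typing import Any


def unique_doc_paths(response: dict[str, Any]) -> list[str]:
    """List each doc_path once, in rank order of first appearance."""
    paths: list[str] = []
    for hit in response.get("hits", []):
        if hit["doc_path"] not in paths:
            paths.append(hit["doc_path"])
    return paths


# Hypothetical response, for illustration only.
response = {
    "hits": [
        {"rank": 1, "doc_path": "wiki/onboarding.md"},
        {"rank": 2, "doc_path": "wiki/vpn.md"},
        {"rank": 3, "doc_path": "wiki/onboarding.md"},
    ],
    "next_cursor": None,
    "truncated": False,
}
print(unique_doc_paths(response))
```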

Recipe D — agent fetched a web doc, save to KB:

When you've fetched a markdown article (e.g. via WebFetch) that the user might query later:

  1. Call mcp__kebab__ingest_stdin with:
    • content: the markdown body
    • title: a stable title (article H1 or page title)
    • source_uri: the URL you fetched from

The doc lands in <workspace.root>/_external/<hash>.md and is indexed for search / ask immediately. Subsequent calls with identical content are no-ops (content-hash dedup).

Don't ingest the same article in a loop: dedup makes it safe, but it wastes embedding cost.

For files already on disk the user references, prefer mcp__kebab__ingest_file with the path — kebab handles the copy + dedup.
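The ingest_stdin input for Recipe D, together with the explicit-intent rule for mutation tools, could be assembled like this. The Python sketch below uses the three input fields named above; the function name ingest_stdin_input, the user_requested flag, and the sample values are hypothetical.

```python
def ingest_stdin_input(
    content: str, title: str, source_uri: str, *, user_requested: bool
) -> dict[str, str]:
    """Build the mcp__kebab__ingest_stdin input, refusing without user intent."""
    if not user_requested:
        raise PermissionError("ingest mutates the KB; wait for an explicit user request")
    if not content.strip():
        raise ValueError("content is empty; nothing to ingest")
    return {"content": content, "title": title.strip(), "source_uri": source_uri}


# Hypothetical article, for illustration only.
payload = ingest_stdin_input(
    "# Example article\n\nBody text.",
    "Example article",
    "https://example.com/article",
    user_requested=True,
)
print(sorted(payload))
```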

Don't

  • Don't auto-invoke mcp__kebab__ingest_file / mcp__kebab__ingest_stdin / kebab ingest / kebab reset / kebab init. Those mutate state — the user must explicitly request.
  • Don't pass user-supplied raw text into the query without trimming — long queries (> a few hundred chars) waste embedding budget. Extract the question.
  • Don't fabricate doc_paths. If you didn't see a doc in search / ask output, it's not in the KB.
  • Don't use kebab tui from a skill — it's interactive only.