Why: `kebab search --media code` has been functionally supported since v0.18.0 (handled first-class through a path outside MEDIA_KINDS; schema.v1.media_breakdown.code exists). However, the value lists in the SearchArgs clap doc-comment and in SKILL.md line 57 are stale: `code` is missing. A user who reads only --help could wrongly conclude that code is unsupported.
Change: sync 2 surfaces: the multi-line clap doc-comment at main.rs lines 158-160 and integrations/claude-code/kebab/SKILL.md line 57. No changes to the Rust binary surface / wire schema.
Out of scope (follow-up): crates/kebab-mcp/tools/search.rs:44, crates/kebab-core/src/search.rs:32+52, crates/kebab-app/src/ingest_progress.rs:69, and crates/kebab-cli/tests/wire_schema_breakdowns.rs:35 carry the same stale list. They fall outside the grep boundary of spec ACCEPT (round 1c), so they are not included in this round.
Refs: docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix2-spec.md §4.3 / §4.3a.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| name | description |
|---|---|
| kebab | Local knowledge base + RAG over the user's pre-indexed documents (wiki crawls, Markdown notes, PDFs, images). Use when answering questions that need internal context the user has indexed locally — e.g. team-specific procedures, internal runbooks, infrastructure docs, credentials registries, project-specific conventions. Also use when a domain question (Kubernetes, MLOps, internal tooling, etc.) needs additional grounding from indexed docs before answering. Do NOT use for general public questions, code in the working directory, or anything obviously outside the indexed corpus. |
kebab — local KB / RAG access
kebab indexes the user's personal documents and exposes them via lexical / vector / hybrid search and a local-LLM RAG answer. All output speaks frozen wire schema v1 — every JSON record carries a schema_version field.
Two surfaces ship: an MCP server (kebab mcp, preferred — process stays hot across calls) and a CLI (~/.cargo/bin/kebab, fallback for hosts without MCP).
When to invoke
Trigger when the user's question matches any of:
- Refers to internal/organization-specific systems, procedures, or jargon that a generic public answer would miss.
- Names a runbook or procedure the user is likely to have indexed ("how do I X", "what's our policy on Y", "where's the doc for Z").
- Domain-technical question where additional internal context (custom CRDs, internal naming, team conventions) would change the answer vs. a generic public answer.
- User explicitly references "the wiki", "내부 문서", "kb", or asks "do we have docs on X".
Skip when:
- The question is about public OSS, language semantics, or anything in the current working directory.
- The user is editing kebab's own source — that's a code task, not a KB query.
- A previous `kebab` call in this session already returned `grounded: false` on a near-identical query (don't loop).
User-specific trigger keywords (team names, system names, internal acronyms) belong in a per-user override of this SKILL.md, not in this repo-shipped version.
MCP tools (preferred)
When kebab is registered as an MCP server (see ~/.claude/mcp.json example below), eight tools are exposed as mcp__kebab__<name>:
| tool | purpose | mutation |
|---|---|---|
| mcp__kebab__search | corpus search → search_response.v1 ({hits, next_cursor, truncated}) | no |
| mcp__kebab__bulk_search | N queries in one call → bulk_search_response.v1 ({results, summary}) | no |
| mcp__kebab__ask | RAG answer → answer.v1 | no |
| mcp__kebab__fetch | verbatim text → fetch_result.v1 (chunk / doc / span) | no |
| mcp__kebab__schema | capability discovery → schema.v1 | no |
| mcp__kebab__doctor | health check → doctor.v1 | no |
| mcp__kebab__ingest_file | save single file → ingest_report.v1 | yes |
| mcp__kebab__ingest_stdin | save markdown blob → ingest_report.v1 | yes |
Mutation tools require explicit user intent — never auto-invoke.
mcp__kebab__search — when you need the source
Use when the user wants to find a doc, or when you (the model) need raw chunks to reason from before answering.
Input:
{ "query": "<query>", "mode": "hybrid", "k": 10, "max_tokens": null, "snippet_chars": null, "cursor": null, "tags": null, "lang": null, "path_glob": null, "trust_min": null, "media": null, "ingested_after": null, "doc_id": null, "trace": null }
- `mode = "hybrid"` is the default-correct choice. Use `"vector"` for semantic-only ("docs about X concept"), `"lexical"` for exact strings ("the literal flag `--foo-bar`").
- `max_tokens` / `snippet_chars` / `cursor` (p9-fb-34): agent budget controls. Set `max_tokens` to cap result wire size (chars/4 estimate); set `cursor` to the previous response's `next_cursor` to fetch the next page.
- p9-fb-36 filter inputs: `tags` (string array; OR-within, AND across keys), `lang` (BCP-47 language code), `path_glob` (glob pattern matched against doc path), `trust_min` (`"primary"` | `"secondary"` | `"generated"`; includes that level and above), `media` (string array; IN-list of `"markdown"` | `"pdf"` | `"image"` | `"audio"` | `"code"` | `"other"`; alias `"md"` → `"markdown"`), `ingested_after` (RFC3339 UTC string), `doc_id` (exact doc UUID). AND combinator across keys. Invalid `ingested_after` or unknown `trust_min` → `error.v1.code = invalid_input`. Unknown `media` value → empty hits, no error.
- Output is `search_response.v1`: `{ hits: search_hit.v1[], next_cursor: string|null, truncated: bool }`. Iterate `response.hits[]` for individual hits. Key hit fields: `rank`, `score`, `doc_path`, `heading_path[]`, `section_label`, `snippet`, `citation` (line range / page), `chunk_id`.
- `hits[].score_kind` (p9-fb-38): `"rrf"` (hybrid) / `"bm25"` (lexical) / `"cosine"` (vector). Declares the meaning of the top-level `score`: it is a ranking signal, not a confidence value. If you need a trust threshold, use `retrieval.lexical_score` (BM25 raw) / `retrieval.vector_score` (cosine raw) instead of the top-level `score`.
- Cite back to the user as `doc_path § heading_path[-1]` so they can open the source.
- When `truncated: true`, the budget loop modified the page (snippet shortening or k reduction). `next_cursor` is independent: it is non-null whenever more hits may be reachable. Caller may widen `max_tokens` (re-issue same query for fuller snippets / more hits per page) or follow `next_cursor` (advance through more hits) or both. Mismatched cursor (`corpus_revision` changed) returns `error.v1.code = stale_cursor`; re-issue the search to obtain a fresh one.
- `trace: true` (p9-fb-37): debug aid. Response carries an extra `trace` block: `lexical[]` + `vector[]` (pre-fusion candidates), `rrf_inputs[]` (RRF union before final cut), and `timing` (`lexical_ms`, `vector_ms`, `fusion_ms`, `total_ms`). Trace bypasses the search cache (always cold). Use sparingly: it bloats the wire response and is for diagnosing "why did this hit / not hit", not normal retrieval.
- `hint` (v0.17.0): optional advisory string on `search_response.v1`. Present only when the result is empty AND the trimmed query is shorter than the FTS5 trigram tokenizer's 3-char minimum. Surface it to the user instead of retrying the same short query. Korean lexical search benefits most from ≥3-char keywords (`충돌` zero-hit, `충돌은` substring-hit). Raw FTS5 mode (`'...'`) opts out, because the user opted into FTS5 syntax. Vector / hybrid modes carry the field too but it's rarely triggered (semantic embeddings handle short queries).
- Column scoping (post-v0.17.1 dogfood): default lexical / hybrid matching is scoped to the `text` column only. The `heading_path` column is indexed (path segments like `app`, `src` plus JSON punctuation are 3-gram'd) but excluded from the default MATCH; past JSON noise produced false positives where a query coincidentally shared a 3-gram with a file's heading path. To deliberately search heading paths, escape into raw FTS5 mode with an explicit column filter: `'heading_path : <token>'` (e.g. `kebab search "'heading_path : agent'"`). Same applies to MCP `search`: quote the inner expression. Raw mode bypasses both column scoping and the 3-char `hint` short-circuit.
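The filter rules and citation format above can be sketched in a small client-side helper. This is a minimal sketch, not part of kebab: `build_search_input` and `format_citation` are hypothetical names, and omitting unset keys (instead of sending `null`) is an assumption. Validating `media` locally is useful because an unknown value silently yields empty hits rather than an error.

```python
from typing import Any, Optional

# Value lists copied from the filter description above.
MEDIA_KINDS = {"markdown", "pdf", "image", "audio", "code", "other"}
TRUST_LEVELS = {"primary", "secondary", "generated"}


def build_search_input(
    query: str,
    *,
    mode: str = "hybrid",
    k: int = 10,
    media: Optional[list] = None,
    trust_min: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a search input and catch typos early."""
    if trust_min is not None and trust_min not in TRUST_LEVELS:
        raise ValueError(f"unknown trust_min: {trust_min!r}")
    payload: dict = {"query": query.strip(), "mode": mode, "k": k}
    if media is not None:
        # Expand the documented "md" alias, then reject unknown kinds locally.
        normalized = ["markdown" if m == "md" else m for m in media]
        unknown = sorted(set(normalized) - MEDIA_KINDS)
        if unknown:
            raise ValueError(f"unknown media kinds: {unknown}")
        payload["media"] = normalized
    if trust_min is not None:
        payload["trust_min"] = trust_min
    return payload


def format_citation(hit: dict) -> str:
    """Render a hit as `doc_path § heading_path[-1]`."""
    heading_path = hit.get("heading_path") or []
    if heading_path:
        return f"{hit['doc_path']} § {heading_path[-1]}"
    return hit["doc_path"]
```

For example, `build_search_input("rotate keys", media=["md", "code"])` produces a payload whose `media` is `["markdown", "code"]`, while a typo such as `"vidoe"` raises before any call is made.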
mcp__kebab__bulk_search
Runs N queries in one call to make agent loops more efficient. Each query uses the same input shape as mcp__kebab__search (query required, everything else optional). Cap 100.
Input:
{"queries": [{"query": "..."}, {"query": "...", "mode": "lexical"}, ...]}
Output: bulk_search_response.v1 envelope with results: [bulk_search_item.v1] (each item = {query, response | null, error | null}) + summary: {total, succeeded, failed}. A per-query failure is isolated in that item's error; the other queries keep running.
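Because failures are isolated per item, a caller usually wants to separate successes from failures before reasoning over the hits. The sketch below assumes the envelope shape described above; `split_bulk_results` is a hypothetical name, and treating `item["query"]` as an opaque value is an assumption, since the document does not say whether it echoes a string or the full query object.

```python
def split_bulk_results(response: dict) -> tuple:
    """Split a bulk_search_response.v1 envelope into (succeeded, failed).

    succeeded: list of (query, hits) pairs
    failed:    list of (query, error) pairs
    """
    succeeded, failed = [], []
    for item in response["results"]:
        if item.get("error") is not None:
            failed.append((item["query"], item["error"]))
        else:
            succeeded.append((item["query"], item["response"]["hits"]))
    return succeeded, failed
```

Comparing `len(failed)` against `summary.failed` is a cheap sanity check that no item was misclassified.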
mcp__kebab__ask — when you need the answer
Use when the user wants a synthesized answer, not a list of links.
Input:
{ "query": "<question>", "session_id": "<optional-slug>", "mode": "hybrid", "multi_hop": false }
- Returns `answer.v1`: `answer` (markdown), `citations[]`, `grounded` (bool), `refusal_reason`, `model`, `conversation_id`, `turn_index`, `hops` (multi-hop only).
- If `grounded == false` → KB doesn't have enough context. Don't paraphrase the refusal as if it were an answer. Tell the user the KB came up dry and fall back to your own knowledge or ask for the source.
- For follow-up turns on the same topic, pass `session_id` (e.g. `"team-onboarding-2026-05"`) and reuse it across the conversation. Sessions persist until `kebab reset --data-only`.
- p9-fb-40: the default is `prompt_template_version = "rag-v2"`. Answers are stricter: facts are quoted as verbatim spans, the model may not draw on its trained knowledge, and "확실하지 않다" ("not certain") can appear when the grounding is ambiguous. If the user explicitly sets `[rag] prompt_template_version = "rag-v1"`, the legacy behavior applies.
- p9-fb-41 `multi_hop: true`: opt the ask into the multi-hop pipeline. The query is decomposed into sub-questions, each retrieved independently (LLM-driven decide loop, up to `rag.multi_hop_max_depth` iters), then synthesized over the merged chunk pool. Cost trade-off: 2–5× LLM calls vs. single-pass. Use for compound questions ("What is the difference between X and Y?", prereq chains, cross-doc reasoning where one chunk alone is insufficient). Don't use it for simple fact-finding (single-pass is faster + cheaper). When set, `answer.v1.hops[]` carries the per-hop trace (`{iter, kind, sub_queries[], context_chunks_added, forced_stop, llm_call_ms}`); surface a brief "Searched in N hops" note when the trace is non-trivial. Decompose-failure (model emitted non-JSON) → `refusal_reason = "multi_hop_decompose_failed"`; treat like any other refusal.
- v0.18+ multi-hop NLI verification: multi-hop ask (`mcp__kebab__ask` with `multi_hop: true`) runs a post-synthesize NLI groundedness gate when `[rag] nli_threshold > 0` is set in the user's config. `answer.v1.verification.nli_passed == true` means the generated answer is entailed by the retrieved chunks (grounded); `false` means the answer is refused with `refusal_reason = "nli_verification_failed"` and the `verification` block still ships so the agent can show what entailment score was rejected. Threshold tuning: 0.5 is the production default, 0.9 is strict mode. If the NLI model download / inference fails the pipeline emits `refusal_reason = "nli_model_unavailable"`; the user-side workaround is `[rag] nli_threshold = 0` then retry multi-hop. Single-pass `ask` (`multi_hop: false` / unset) is unaffected: it keeps the LLM self-judge gate as the only verification.
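The handling rules above (never present a refusal as an answer, cite sources, mention hops) can be sketched as one function over an `answer.v1` record. `summarize_answer` is a hypothetical helper, and reading "non-trivial trace" as "more than one hop" is an assumption.

```python
def summarize_answer(answer: dict) -> str:
    """Turn an answer.v1 record into user-facing text."""
    if not answer.get("grounded"):
        # A refusal is reported as a refusal, never paraphrased as an answer.
        reason = answer.get("refusal_reason") or "unknown"
        return f"KB came up dry (refusal_reason: {reason})."

    lines = [answer["answer"]]

    # De-duplicate cited paths while preserving first-seen order.
    paths = []
    for citation in answer.get("citations") or []:
        if citation["doc_path"] not in paths:
            paths.append(citation["doc_path"])
    if paths:
        lines.append("Sources: " + ", ".join(paths))

    # `hops` is only present for multi-hop asks.
    hops = answer.get("hops") or []
    if len(hops) > 1:
        lines.append(f"Searched in {len(hops)} hops.")
    return "\n".join(lines)
```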
mcp__kebab__fetch — when you need raw text
Use after search to read the verbatim chunk text + surrounding context, or to pull a full doc / line range.
Input:
{ "kind": "chunk", "chunk_id": "<id>", "context": 2 }
{ "kind": "doc", "doc_id": "<id>", "max_tokens": 1000 }
{ "kind": "span", "doc_id": "<id>", "line_start": 1, "line_end": 5 }
- `chunk` mode: `context: N` returns ordinal-adjacent chunks before/after for surrounding paragraphs.
- `doc` mode: full normalized markdown. `max_tokens` (chars/4) caps the response; `truncated: true` when applied.
- `span` mode: 1-based inclusive line range. PDF / audio docs reject as `error.v1.code = span_not_supported` (use `chunk` mode instead, since PDF chunks are page-aligned).
- `error.v1.code = chunk_not_found` / `doc_not_found` are non-retryable from the same id; re-issue search to get a fresh one.
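The `span` → `chunk` fallback for PDF / audio docs might look like the sketch below. It assumes the caller supplies a `fetch` callable that wraps `mcp__kebab__fetch` and returns the parsed record, and that an `error.v1` record exposes a top-level `code` key; both are assumptions, as this document does not describe how a host surfaces tool errors. `fetch_lines` is a hypothetical name.

```python
from typing import Callable


def fetch_lines(
    fetch: Callable[[dict], dict],
    doc_id: str,
    chunk_id: str,
    line_start: int,
    line_end: int,
) -> dict:
    """Try a span fetch; fall back to the chunk when spans are unsupported."""
    result = fetch({
        "kind": "span",
        "doc_id": doc_id,
        "line_start": line_start,  # 1-based, inclusive
        "line_end": line_end,
    })
    if result.get("code") == "span_not_supported":
        # PDF / audio docs: read the chunk plus two neighbours instead.
        result = fetch({"kind": "chunk", "chunk_id": chunk_id, "context": 2})
    return result
```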
CLI fallback
If MCP tools aren't in scope (host without MCP support, or mcp.json not configured), call the CLI via Bash:
kebab search "<query>" --mode hybrid --json 2>/dev/null
kebab ask "<question>" --json 2>/dev/null
kebab ask "<question>" --session <stable-id> --json 2>/dev/null
kebab ask "<question>" --stream # ndjson answer_event.v1 on stderr, final answer.v1 on stdout
Same wire shapes as MCP. CLI pays cold start (~1-2s) per call — prefer MCP when available.
--stream (p9-fb-33) emits retrieval_done → token* → final events on stderr while the answer streams; the final stdout line is the standard answer.v1 for backwards compat. Use when you need progressive token consumption; otherwise the default non-streaming path is simpler. Refusal paths (score-gate / no-chunks) emit retrieval_done then no token/final — read stdout answer.v1 for the canonical refusal signal.
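When the CLI fallback is driven from a script rather than a shell, the same "stdout only" rule applies. The following is a minimal sketch using the flags shown above; `search_argv` and `run_search` are hypothetical names, and `kebab` is assumed to be on `PATH`. The exit-code contract is not described here, so the sketch does not rely on it and leaves inspection of the parsed record to the caller.

```python
import json
import subprocess


def search_argv(query: str, mode: str = "hybrid") -> list:
    """Build the argv for `kebab search "<query>" --mode <mode> --json`."""
    return ["kebab", "search", query, "--mode", mode, "--json"]


def run_search(query: str, mode: str = "hybrid") -> dict:
    """Run the CLI and parse the single JSON value printed on stdout."""
    proc = subprocess.run(
        search_argv(query, mode),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # progress / warnings go to stderr; drop them
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)
```

Passing the query as one argv element avoids shell quoting problems with queries that contain quotes, such as the raw FTS5 form.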
MCP host config
Register kebab mcp once in your host's MCP config. For Claude Code, edit ~/.claude/mcp.json:
{
"mcpServers": {
"kebab": {
"command": "kebab",
"args": ["mcp"]
}
}
}
Claude Code spawns kebab mcp at session start; the process stays alive across all tool calls so SQLite / Lance / fastembed are hot after the first call (~1-2s cold, sub-100ms thereafter). For Cursor / OpenAI Agents / Copilot CLI host examples plus per-tool input/output reference, see docs/mcp-usage.md in the kebab repo.
Parsing tips
- MCP tools return JSON content blocks; CLI prints one JSON value to stdout, progress / warnings to stderr. Capture stdout only: `kebab search ... --json 2>/dev/null`.
- `search` output can be large for broad queries. Project relevant fields when summarizing. For CLI: `jq '.hits[] | {rank, doc_path, heading: .heading_path[-1], snippet}'` (note: `.hits[]`, not `.[]`, because fb-34 wrapped the array). Use `--max-tokens N` (CLI) / `max_tokens` (MCP) to cap wire size in advance.
- Pagination: `search_response.v1.next_cursor` is opaque base64; pass it back as `--cursor` (CLI) or `cursor` (MCP) for the next page. `null` means no more hits. `corpus_revision` mismatch returns `error.v1.code = stale_cursor`; re-issue search to obtain a fresh cursor.
- `search_response.v1.truncated = true` means budget forced snippet shortening or k reduction. Independent of `next_cursor`: widen `max_tokens` for fuller snippets, follow `next_cursor` for more hits, or both.
- `ask`'s `citations[]` mirrors `search_hit.v1` minus retrieval internals: same `doc_path` / `citation` shape.
- Schema reference lives in the kebab repo at `docs/wire-schema/v1/*.schema.json` if a field is unclear.
- `search_hit.v1` and `answer.v1.citations[]` carry `indexed_at` (RFC3339) + `stale` (bool). When `stale == true`, the source doc hasn't been re-processed since `config.search.stale_threshold_days`. Surface this caveat to the user when summarizing, since the cited snapshot may not reflect current reality.
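The pagination and projection tips combine naturally into a generator. This sketch assumes a caller-supplied `search` callable that takes a search input and returns a parsed `search_response.v1`; `iter_projected_hits` is a hypothetical name, and the page cap is an arbitrary safety limit rather than anything kebab defines. The projection mirrors the `jq` filter above.

```python
from typing import Callable, Iterator


def iter_projected_hits(
    search: Callable[[dict], dict],
    query: str,
    max_pages: int = 5,
) -> Iterator[dict]:
    """Follow next_cursor page by page, yielding a compact view of each hit."""
    cursor = None
    for _ in range(max_pages):
        response = search({"query": query, "mode": "hybrid", "cursor": cursor})
        for hit in response["hits"]:
            heading_path = hit.get("heading_path") or []
            yield {
                "rank": hit["rank"],
                "doc_path": hit["doc_path"],
                "heading": heading_path[-1] if heading_path else None,
                "snippet": hit["snippet"],
            }
        cursor = response.get("next_cursor")
        if cursor is None:  # null cursor: no more hits
            return
```

A `stale_cursor` error is deliberately not handled here; per the tip above, the right response is to restart the search from the first page.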
Capability discovery
Before using streaming or multi-turn features, probe what this binary supports — call mcp__kebab__schema (or CLI kebab schema --json):
Returns schema.v1: wire.schemas (supported wire ids), capabilities (bool flags, e.g. streaming_ask, rag_multi_turn), models (version cascade 6-axis), stats (doc/chunk/asset count + last_ingest_at, plus p9-fb-37 health surface: media_breakdown per-kind doc counts (6 zero-padded keys: markdown / pdf / image / audio / code / other), lang_breakdown per BCP-47 lang (NULL keyed as the literal string "null"), index_bytes.{sqlite,lancedb} on-disk byte sums, stale_doc_count for docs older than config.search.stale_threshold_days). Gate streaming / session flows on capabilities.streaming_ask / capabilities.rag_multi_turn being true. Cheap call (no LLM), once per session.
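Gating on capability flags can be as small as the sketch below, which takes a parsed `schema.v1` record and reports which required flags are not `true`. `missing_capabilities` is a hypothetical helper; treating an absent flag the same as `false` is an assumption chosen to fail closed on older binaries.

```python
def missing_capabilities(schema: dict, required: list) -> list:
    """Return the required capability flags that are absent or not true."""
    capabilities = schema.get("capabilities") or {}
    return [name for name in required if capabilities.get(name) is not True]
```

For instance, only use `--stream` when `missing_capabilities(schema, ["streaming_ask"])` returns an empty list.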
Quick health check
If a call fails or returns suspicious output, call mcp__kebab__doctor (or CLI kebab doctor) first — it surfaces config-load / data-dir / Ollama-reachability problems in one line each. Don't silently retry on errors; report the doctor output.
Workflow recipes
Recipe A — user asks an internal-context question, you want grounded answer:
- Call `mcp__kebab__ask` (or CLI `kebab ask "<question>" --json`).
- If `grounded`, cite `citations[].doc_path` in your reply and quote the `answer` (translate / condense as needed).
- If `!grounded`, call `mcp__kebab__search` with the same query and look at the top 3 hits; sometimes content exists but the RAG threshold rejected it. If hits look relevant, summarize from snippets and cite. If still nothing, tell the user.
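Recipe A can be sketched as a single function with the two tools injected as callables. `answer_with_fallback`, `ask`, and `search` are hypothetical names standing in for whatever wraps `mcp__kebab__ask` / `mcp__kebab__search` in the host. The sketch only collects the top 3 hits; judging whether they "look relevant" is still left to the caller.

```python
from typing import Callable


def answer_with_fallback(
    ask: Callable[[dict], dict],
    search: Callable[[dict], dict],
    question: str,
) -> dict:
    """Ask first; on refusal, fall back to raw search hits for review."""
    answer = ask({"query": question})
    if answer.get("grounded"):
        return {
            "source": "ask",
            "text": answer["answer"],
            "cite": [c["doc_path"] for c in answer.get("citations") or []],
        }

    # Content may exist even though the RAG gate refused: inspect top hits.
    response = search({"query": question, "mode": "hybrid", "k": 3})
    hits = (response.get("hits") or [])[:3]
    if hits:
        return {
            "source": "search",
            "text": "\n".join(h["snippet"] for h in hits),
            "cite": [h["doc_path"] for h in hits],
        }
    return {"source": "none", "text": "", "cite": []}
```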
Recipe B — domain question where internal context might exist:
- Call `mcp__kebab__search` with key terms (cheap, no LLM).
- If the top hit's `score` is low (< ~0.3) or there are no hits, answer from general knowledge without mentioning the KB.
- If the top hit is relevant, fold its content into your answer and cite `doc_path`.
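The Recipe B cut-off can be expressed as a small predicate. This is a sketch under the recipe's own rough threshold of about 0.3; `kb_is_relevant` is a hypothetical name, and the threshold is a heuristic on a ranking signal, so it likely needs tuning per `score_kind`.

```python
def kb_is_relevant(response: dict, min_score: float = 0.3) -> bool:
    """True when the best-ranked hit clears the rough score cut-off."""
    hits = response.get("hits") or []
    if not hits:
        return False
    top = min(hits, key=lambda hit: hit["rank"])  # rank 1 is the best hit
    return top["score"] >= min_score
```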
Recipe C — user wants to know "what's in the KB about X":
- Call `mcp__kebab__search` with the topic.
- List unique `doc_path`s back to the user as a discovery surface.
Recipe D — agent fetched a web doc, save to KB:
When you've fetched a markdown article (e.g. via WebFetch) that the user might query later:
- Call `mcp__kebab__ingest_stdin` with:
  - `content`: the markdown body
  - `title`: a stable title (article H1 or page title)
  - `source_uri`: the URL you fetched from
The doc lands in <workspace.root>/_external/<hash>.md and is indexed for search / ask immediately. Subsequent calls with identical content are no-ops (content-hash dedup).
Don't ingest the same article in a loop; dedup makes it safe but wastes embedding cost.
For files already on disk the user references, prefer mcp__kebab__ingest_file with the path — kebab handles the copy + dedup.
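One way to avoid repeat ingests within a session is a local guard that remembers what has already been sent. The sketch below is purely client-side: `IngestGuard` is a hypothetical class, and its SHA-256 digest is only an in-memory key, not a claim about the hash kebab uses for its own dedup. The payload keys are the three inputs listed in Recipe D.

```python
import hashlib
from typing import Optional


class IngestGuard:
    """Remember content already sent to ingest_stdin during this session."""

    def __init__(self) -> None:
        self._seen = set()

    def payload(self, content: str, title: str, source_uri: str) -> Optional[dict]:
        """Return an ingest_stdin input, or None if this content was sent before."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return None
        self._seen.add(digest)
        return {"content": content, "title": title, "source_uri": source_uri}
```

The guard only decides whether to build a payload; invoking the tool still requires explicit user intent.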
Don't
- Don't auto-invoke `mcp__kebab__ingest_file` / `mcp__kebab__ingest_stdin` / `kebab ingest` / `kebab reset` / `kebab init`. Those mutate state, so the user must explicitly request them.
- Don't pass user-supplied raw text into the query without trimming. Long queries (> a few hundred chars) waste embedding budget. Extract the question.
- Don't fabricate `doc_path`s. If you didn't see a doc in `search` / `ask` output, it's not in the KB.
- Don't use `kebab tui` from a skill; it's interactive only.