phase: P4
component: kebab-llm-local (Ollama adapter)
task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate"
status: completed
depends_on: [p4-1]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]
p4-2 — Ollama adapter
Goal
Implement OllamaLanguageModel against Ollama's local HTTP API (POST /api/generate with stream: true). The adapter honors temperature and seed for determinism, maps Ollama error states to LlmError per §10, and surfaces helpful hints (e.g., ollama pull <model>).
Why now / why this size
First real LM. Required for kebab ask to function. Isolated from RAG pipeline so swapping providers stays config-only.
Allowed dependencies
- kebab-core
- kebab-config
- kebab-llm
- reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }
- serde, serde_json
- tracing
- thiserror
Forbidden dependencies
tokio, async-std, kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-*, kebab-embed*, kebab-search, kebab-rag, kebab-tui, kebab-desktop. (Streaming reads the reqwest::blocking::Response body incrementally as line-delimited JSON through its std::io::Read implementation; no async runtime is needed.)
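The parenthetical above says line-delimited JSON can be consumed without an async runtime. A minimal sketch of that idea, using only the standard library: any value implementing std::io::Read can be wrapped in a BufReader and walked line by line. The helper name ndjson_lines is hypothetical, and in the usage note a std::io::Cursor stands in for the HTTP response body.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: yields each non-empty line of a line-delimited body.
/// `body` is anything implementing `Read`; lines are produced lazily, so the
/// caller sees each JSON line as soon as it has been read.
fn ndjson_lines<R: Read>(body: R) -> impl Iterator<Item = io::Result<String>> {
    BufReader::new(body)
        .lines()
        // Drop blank lines but keep errors so the caller can surface them.
        .filter(|line| !matches!(line, Ok(l) if l.trim().is_empty()))
}
```

Calling `ndjson_lines(std::io::Cursor::new("{\"a\":1}\n\n{\"b\":2}\n"))` yields the two JSON lines and skips the blank one. Each yielded string would then go to a JSON parser such as serde_json.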
Inputs
| input | type | source |
|---|---|---|
| kebab-config::Config.models.llm | endpoint, model, context, temperature, seed | runtime |
| GenerateRequest | kebab_core::GenerateRequest | RAG pipeline |
| Ollama HTTP server (local) | http://127.0.0.1:11434 | external process |
Outputs
| output | type | downstream |
|---|---|---|
| streaming TokenChunk iterator | per §7.2 | kebab-rag |
| ModelRef | { id, provider="ollama", dimensions=None } | Answer.model |
Public surface (signatures only — no new types)
    pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }

    impl OllamaLanguageModel {
        pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
    }

    impl kebab_core::LanguageModel for OllamaLanguageModel {
        fn model_ref(&self) -> kebab_core::ModelRef;
        fn context_tokens(&self) -> usize;
        fn generate_stream(&self, req: kebab_core::GenerateRequest)
            -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
    }
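
To illustrate how a caller might consume the boxed iterator returned by generate_stream, here is a rough sketch. The TokenChunk enum below is a simplified, hypothetical stand-in for kebab_core::TokenChunk (the real shape is defined in design §7.2), and String replaces anyhow::Error so the example needs only the standard library.

```rust
// Hypothetical stand-in for kebab_core::TokenChunk; fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum TokenChunk {
    Token(String),
    Done { prompt_tokens: u32, completion_tokens: u32 },
}

// Same shape as the trait's return type, with String as the error type.
type ChunkStream = Box<dyn Iterator<Item = Result<TokenChunk, String>> + Send>;

/// Drains a stream, concatenating tokens until `Done` or the first error.
/// Returns the text plus the final `Done` chunk, if one arrived.
fn collect_answer(stream: ChunkStream) -> Result<(String, Option<TokenChunk>), String> {
    let mut text = String::new();
    for chunk in stream {
        match chunk? {
            TokenChunk::Token(t) => text.push_str(&t),
            done @ TokenChunk::Done { .. } => return Ok((text, Some(done))),
        }
    }
    // The stream ended without a Done chunk; the caller decides if that is an error.
    Ok((text, None))
}
```

Because the trait returns a plain Iterator, a consumer such as kebab-rag can also print each token as it arrives instead of collecting them.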
Behavior contract
- HTTP: POST {endpoint}/api/generate with body

      {
        "model": "<config.models.llm.model>",
        "prompt": "<system + '\n\n' + user>",
        "stream": true,
        "options": {
          "temperature": <config.temperature ?? req.temperature ?? 0.0>,
          "seed": <config.seed ?? req.seed ?? 0>,
          "num_ctx": <config.context_tokens>,
          "stop": <req.stop>
        }
      }

- Response is line-delimited JSON. Each line:
  - {"response": "...", "done": false} → emit TokenChunk::Token(text)
  - {"response": "", "done": true, "prompt_eval_count": p, "eval_count": c, "total_duration": ns, ...} → emit final TokenChunk::Done { finish_reason: Stop, usage: TokenUsage { prompt_tokens: p, completion_tokens: c, latency_ms: total_duration / 1_000_000 } }.
- HTTP errors:
  - connection refused → LlmError::Unreachable, anyhow message includes hint: ensure 'ollama serve' is running and reachable at <endpoint>.
  - 404 with model "<id>" not found → LlmError::ModelNotPulled(model_id), hint ollama pull <model_id>.
  - timeouts → LlmError::Timeout.
  - other 4xx/5xx → LlmError::Stream(body).
- UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting TokenChunk::Token.
- Determinism: with temperature=0 and fixed seed, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality.
- model_ref().provider = "ollama", dimensions = None.
- Reachability check: OllamaLanguageModel::new does NOT eagerly hit the network; first failure surfaces on generate_stream. Use kebab doctor (separate task) to probe.
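
The HTTP error rules in the contract above could be expressed as one classification function. The sketch below is an assumption-laden illustration: LlmError mirrors only the variant names listed in the contract (the real type is defined per design §10), and Failure and classify are hypothetical names for whatever the adapter derives from a reqwest error or response.

```rust
// Hypothetical mirror of the LlmError variants named in the contract.
#[derive(Debug, PartialEq)]
enum LlmError {
    Unreachable,
    ModelNotPulled(String),
    Timeout,
    Stream(String),
}

// Hypothetical transport outcomes; a real adapter would build these from reqwest.
enum Failure<'a> {
    ConnectionRefused,
    TimedOut,
    Status { code: u16, body: &'a str },
}

/// Maps a failure to an error plus an optional user-facing hint.
fn classify(failure: Failure<'_>, endpoint: &str, model_id: &str) -> (LlmError, Option<String>) {
    match failure {
        Failure::ConnectionRefused => (
            LlmError::Unreachable,
            Some(format!("ensure 'ollama serve' is running and reachable at {endpoint}")),
        ),
        Failure::TimedOut => (LlmError::Timeout, None),
        // Only a 404 whose body mentions "not found" counts as a missing model.
        Failure::Status { code: 404, body } if body.contains("not found") => (
            LlmError::ModelNotPulled(model_id.to_string()),
            Some(format!("ollama pull {model_id}")),
        ),
        // Every other status falls through to a generic stream error.
        Failure::Status { body, .. } => (LlmError::Stream(body.to_string()), None),
    }
}
```

Keeping the hint separate from the error value lets the caller attach it to the anyhow message, which is where the contract says the hint should appear.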
Storage / wire effects
- Reads/writes only the local HTTP socket. No DB or filesystem effects.
Test plan
| kind | description | fixture / data |
|---|---|---|
| unit | construction with default config returns expected ModelRef | inline |
| unit | streamed line {"response":"hi","done":false} followed by {"done":true,...} produces 2 chunks then Done | mocked via wiremock or tiny_http |
| unit | UTF-8 splits across two HTTP chunks reassemble correctly | mocked HTTP |
| unit | unreachable endpoint → LlmError::Unreachable with hint | mocked (closed port) |
| unit | 404 missing model → LlmError::ModelNotPulled with hint | mocked HTTP |
| unit | concatenation of streamed tokens equals server's full text | mocked HTTP |
| determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
| #[ignore] integration | real Ollama on localhost:11434 with qwen2.5:14b-instruct produces non-empty output | requires user opt-in |
All non-ignored tests run under cargo test -p kebab-llm-local. Real-LM integration runs via cargo test -p kebab-llm-local -- --ignored.
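
The UTF-8 split case in the test plan can be exercised without any HTTP at all. The following is a minimal sketch of one way to buffer incomplete byte sequences, relying on std::str::from_utf8 and Utf8Error::valid_up_to. The type name Utf8Reassembler is hypothetical, and the sketch assumes the input is eventually valid UTF-8: bytes that are genuinely invalid (rather than merely incomplete) would stay buffered, which a real adapter would need to handle.

```rust
/// Hypothetical helper: accumulates raw bytes and releases only complete UTF-8 text.
#[derive(Default)]
struct Utf8Reassembler {
    pending: Vec<u8>,
}

impl Utf8Reassembler {
    /// Appends `bytes` and returns the longest valid UTF-8 prefix buffered so far.
    /// A trailing incomplete sequence stays in `pending` for the next call.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let valid_len = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        // Split into the validated prefix and the unfinished tail.
        let tail = self.pending.split_off(valid_len);
        let ready = std::mem::replace(&mut self.pending, tail);
        String::from_utf8(ready).expect("prefix was validated above")
    }
}
```

For example, feeding the bytes of "héllo" as `[..2]` and then `[2..]` splits the two-byte "é" across calls: the first call returns "h" and the second returns "éllo".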
Definition of Done
- cargo check -p kebab-llm-local passes
- cargo test -p kebab-llm-local passes (mocked tests; real LM behind #[ignore])
- No async runtime present (uses reqwest::blocking)
- No imports outside Allowed dependencies
- PR links design §11.2, §0 Q5, §10
Out of scope
- llama.cpp / candle adapters (P+).
- Embedding via Ollama's /api/embed endpoint (alternate adapter inside kebab-embed-local if requested later).
- Cancellation / abort tokens (P+).
- Connection pooling tuning (default reqwest::blocking is sufficient for single-user CLI).
Risks / notes
- Ollama versions sometimes change response field names. Pin a target version range and assert on missing fields with a friendly message.
- prompt_eval_count / eval_count may be absent on older Ollama; default to 0 and emit a warning span, do NOT fail the stream.
- If Ollama returns a done line with done_reason: "length", map to FinishReason::Length.
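
The two notes about the final done line (missing counters and done_reason) can be combined into one small mapping. This sketch assumes the line has already been parsed into optional fields; DoneLine, Usage, and summarize are hypothetical names, Usage only mirrors the TokenUsage field names from the behavior contract, and the u64 field types are assumptions. Emitting the tracing warning is left out to keep the example std-only.

```rust
#[derive(Debug, PartialEq)]
enum FinishReason {
    Stop,
    Length,
}

// Hypothetical, already-parsed view of the final line; absent fields are None.
struct DoneLine<'a> {
    done_reason: Option<&'a str>,
    prompt_eval_count: Option<u64>,
    eval_count: Option<u64>,
    total_duration_ns: Option<u64>,
}

// Mirrors the TokenUsage field names from the contract; types are assumptions.
#[derive(Debug, PartialEq)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    latency_ms: u64,
}

/// Builds the finish reason and usage without failing on missing counters.
fn summarize(line: &DoneLine<'_>) -> (FinishReason, Usage) {
    let finish = match line.done_reason {
        Some("length") => FinishReason::Length,
        _ => FinishReason::Stop, // absent or any other reason is treated as Stop here
    };
    let usage = Usage {
        prompt_tokens: line.prompt_eval_count.unwrap_or(0),
        completion_tokens: line.eval_count.unwrap_or(0),
        // total_duration is reported in nanoseconds; integer division truncates.
        latency_ms: line.total_duration_ns.unwrap_or(0) / 1_000_000,
    };
    (finish, usage)
}
```

Treating every reason other than "length" as Stop is a simplification made for this sketch, not something the contract states.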