kebab/tasks/p4/p4-2-ollama-adapter.md
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit. Bulk-updates every occurrence of the word `kb` in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path notation `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (P6+ will use the same
  prefix when it starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. Both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- design doc / initial report / SMOKE / HOTFIXES / phase epic / task
  spec: all references unified.
- The `git -c user.name=kb` in task-decomposition.md is author
  information recorded from past git history, so it is kept as-is (the
  author in the actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` as
  well (not yet reflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (renamed files are all updated to
  their new paths).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00


phase: P4
component: kebab-llm-local (Ollama adapter)
task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate"
status: completed
depends_on: [p4-1]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]

p4-2 — Ollama adapter

Goal

Implement OllamaLanguageModel against Ollama's local HTTP API (POST /api/generate with stream: true). Honors temperature/seed for determinism, maps Ollama error states to LlmError per §10, and surfaces helpful hints (e.g., ollama pull <model>).

Why now / why this size

First real LM. Required for kebab ask to function. Isolated from RAG pipeline so swapping providers stays config-only.

Allowed dependencies

  • kebab-core
  • kebab-config
  • kebab-llm
  • reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }
  • serde, serde_json
  • tracing
  • thiserror

Forbidden dependencies

  • tokio, async-std, kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-*, kebab-embed*, kebab-search, kebab-rag, kebab-tui, kebab-desktop. (Streaming reads the reqwest::blocking::Response body as line-delimited JSON through its std::io::Read implementation; no async runtime is needed in this crate.)
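
Since `reqwest::blocking::Response` implements `std::io::Read`, a line-delimited body can be consumed with a plain `BufReader` and no async runtime. The following is a minimal sketch under that assumption: the function name `ndjson_lines` is hypothetical, and an in-memory byte slice stands in for the HTTP response so the example runs with the standard library alone.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Lazily yields the non-empty lines of a line-delimited body.
/// Hypothetical helper: any blocking reader works, including
/// `reqwest::blocking::Response`, which implements `Read`.
fn ndjson_lines<R: Read>(body: R) -> impl Iterator<Item = io::Result<String>> {
    BufReader::new(body)
        .lines()
        // Drop blank lines, but keep I/O errors so the caller can see them.
        .filter(|line| !matches!(line, Ok(text) if text.trim().is_empty()))
}

fn main() -> io::Result<()> {
    // In-memory stand-in for a streamed HTTP response body.
    let body: &[u8] =
        b"{\"response\":\"hi\",\"done\":false}\n\n{\"response\":\"\",\"done\":true}\n";
    for line in ndjson_lines(body) {
        println!("{}", line?);
    }
    Ok(())
}
```

Each yielded line would then be handed to a JSON decoder such as serde_json, which is omitted here.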

Inputs

| input | type | source |
|---|---|---|
| kebab-config::Config.models.llm | endpoint, model, context, temperature, seed | runtime |
| GenerateRequest | kebab_core::GenerateRequest | RAG pipeline |
| Ollama HTTP server (local) | http://127.0.0.1:11434 | external process |

Outputs

| output | type | downstream |
|---|---|---|
| streaming TokenChunk | iterator per §7.2 | kebab-rag |
| ModelRef | { id, provider="ollama", dimensions=None } | Answer.model |

Public surface (signatures only — no new types)

pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }

impl OllamaLanguageModel {
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kebab_core::LanguageModel for OllamaLanguageModel {
    fn model_ref(&self) -> kebab_core::ModelRef;
    fn context_tokens(&self) -> usize;
    fn generate_stream(&self, req: kebab_core::GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
}
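
To illustrate how a caller might drain the boxed iterator returned by generate_stream, here is a minimal sketch. The types below are local stand-ins, not the real kebab_core definitions: their fields are guessed from the behavior contract, `String` replaces `anyhow::Error` to keep the example dependency-free, and `collect_answer` is a hypothetical helper.

```rust
// Stand-ins for kebab_core types; the real definitions may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum FinishReason {
    Stop,
    Length,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct TokenUsage {
    prompt_tokens: u32,
    completion_tokens: u32,
    latency_ms: u64,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

// Same shape as the return type of generate_stream, with String errors.
type ChunkStream = Box<dyn Iterator<Item = Result<TokenChunk, String>> + Send>;

/// Concatenates tokens until `Done`; stops at the first stream error.
fn collect_answer(stream: ChunkStream) -> Result<(String, Option<TokenUsage>), String> {
    let mut text = String::new();
    let mut usage = None;
    for chunk in stream {
        match chunk? {
            TokenChunk::Token(token) => text.push_str(&token),
            TokenChunk::Done { usage: final_usage, .. } => {
                usage = Some(final_usage);
                break;
            }
        }
    }
    Ok((text, usage))
}

fn main() {
    let chunks: Vec<Result<TokenChunk, String>> = vec![
        Ok(TokenChunk::Token("Hello, ".to_string())),
        Ok(TokenChunk::Token("world".to_string())),
        Ok(TokenChunk::Done {
            finish_reason: FinishReason::Stop,
            usage: TokenUsage { prompt_tokens: 5, completion_tokens: 2, latency_ms: 40 },
        }),
    ];
    let stream: ChunkStream = Box::new(chunks.into_iter());
    let (text, usage) = collect_answer(stream).expect("stream should not fail");
    println!("{text} ({usage:?})");
}
```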

Behavior contract

  • HTTP: POST {endpoint}/api/generate with body
    {
      "model": "<config.models.llm.model>",
      "prompt": "<system + '\n\n' + user>",
      "stream": true,
      "options": {
        "temperature": <config.temperature ?? req.temperature ?? 0.0>,
        "seed":        <config.seed ?? req.seed ?? 0>,
        "num_ctx":     <config.context_tokens>,
        "stop":        <req.stop>
      }
    }
    
  • Response is line-delimited JSON. Each line:
    • {"response": "...", "done": false} → emit TokenChunk::Token(text)
    • {"response": "", "done": true, "prompt_eval_count": p, "eval_count": c, "total_duration": ns, ...} → emit final TokenChunk::Done { finish_reason: Stop, usage: TokenUsage { prompt_tokens: p, completion_tokens: c, latency_ms: total_duration / 1_000_000 } }.
  • HTTP errors:
    • connection refused → LlmError::Unreachable, anyhow message includes hint: ensure 'ollama serve' is running and reachable at <endpoint>.
    • 404 with model "<id>" not found → LlmError::ModelNotPulled(model_id), hint: ollama pull <model_id>.
    • timeouts → LlmError::Timeout.
    • other 4xx/5xx → LlmError::Stream(body).
  • UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting TokenChunk::Token.
  • Determinism: with temperature=0 and fixed seed, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality.
  • model_ref().provider = "ollama", dimensions = None.
  • Reachability check: OllamaLanguageModel::new does NOT eagerly hit the network; first failure surfaces on generate_stream. Use kebab doctor (separate task) to probe.
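
The UTF-8 boundary rule above can be sketched as a small byte buffer that releases only complete characters. This is one possible approach, not the adapter's actual code: `Utf8Buffer` is a hypothetical helper, and replacing genuinely invalid bytes with U+FFFD is an assumption the spec does not state.

```rust
/// Accumulates raw bytes and releases only complete UTF-8 text.
/// Hypothetical helper; not part of the spec's public surface.
#[derive(Default)]
struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Appends `chunk` and returns all text that is complete so far,
    /// keeping a truncated trailing sequence for the next call.
    fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        loop {
            match std::str::from_utf8(&self.pending) {
                Ok(text) => {
                    out.push_str(text);
                    self.pending.clear();
                    return out;
                }
                Err(err) => {
                    let valid = err.valid_up_to();
                    // Bytes before `valid` were just validated, so this cannot fail.
                    out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
                    match err.error_len() {
                        // Input ended mid-character: wait for more bytes.
                        None => {
                            self.pending.drain(..valid);
                            return out;
                        }
                        // Invalid bytes (assumption): substitute U+FFFD and go on.
                        Some(len) => {
                            out.push('\u{FFFD}');
                            self.pending.drain(..valid + len);
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    let mut buf = Utf8Buffer::default();
    // "é" is the two bytes 0xC3 0xA9, split here across two chunks.
    let first = buf.push(&[b'h', 0xC3]);
    let second = buf.push(&[0xA9, b'!']);
    println!("{first:?} then {second:?}");
}
```

Holding back only the truncated tail means a token is never delayed longer than the bytes needed to finish one character.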

Storage / wire effects

  • Reads/writes only the local HTTP socket. No DB or filesystem effects.

Test plan

| kind | description | fixture / data |
|---|---|---|
| unit | construction with default config returns expected ModelRef | inline |
| unit | streamed line {"response":"hi","done":false} followed by {"done":true,...} produces 2 chunks then Done | mocked via wiremock or tiny_http |
| unit | UTF-8 splits across two HTTP chunks reassemble correctly | mocked HTTP |
| unit | unreachable endpoint → LlmError::Unreachable with hint | mocked (closed port) |
| unit | 404 missing model → LlmError::ModelNotPulled with hint | mocked HTTP |
| unit | concatenation of streamed tokens equals server's full text | mocked HTTP |
| determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
| #[ignore] integration | real Ollama on localhost:11434 with qwen2.5:14b-instruct produces non-empty output | requires user opt-in |

All non-ignored tests under cargo test -p kebab-llm-local. Real-LM integration runs via cargo test -p kebab-llm-local -- --ignored.

Definition of Done

  • cargo check -p kebab-llm-local passes
  • cargo test -p kebab-llm-local passes (mocked tests; real LM behind #[ignore])
  • No async runtime present (uses reqwest::blocking)
  • No imports outside Allowed dependencies
  • PR links design §11.2, §0 Q5, §10

Out of scope

  • llama.cpp / candle adapters (P+).
  • Embedding via Ollama's /api/embed endpoint (alternate adapter inside kebab-embed-local if requested later).
  • Cancellation / abort tokens (P+).
  • Connection pooling tuning (default reqwest::blocking is sufficient for single-user CLI).

Risks / notes

  • Ollama versions sometimes change response field names. Pin a target version range and assert on missing fields with a friendly message.
  • prompt_eval_count / eval_count may be absent on older Ollama; default to 0 and emit a warning span, do NOT fail the stream.
  • If Ollama returns a done line with done_reason: "length", map to FinishReason::Length.
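
The last two notes can be sketched as a tolerant mapping from an already-decoded done line to usage and finish reason. Everything here is illustrative: `DoneLine`, `summarize`, and the local `FinishReason` / `TokenUsage` stand-ins are hypothetical, JSON decoding is omitted, and two behaviors are assumptions beyond the notes, namely that a missing total_duration defaults to 0 and that any done_reason other than "length" maps to Stop. The returned list of missing field names is what a caller could log as a warning.

```rust
// Stand-ins for kebab_core types; the real definitions may differ.
#[derive(Debug, PartialEq)]
enum FinishReason {
    Stop,
    Length,
}

#[derive(Debug, PartialEq)]
struct TokenUsage {
    prompt_tokens: u64,
    completion_tokens: u64,
    latency_ms: u64,
}

/// Fields of the final `done: true` line after decoding (hypothetical).
/// Every field is optional because older servers may omit it.
#[derive(Default)]
struct DoneLine {
    done_reason: Option<String>,
    prompt_eval_count: Option<u64>,
    eval_count: Option<u64>,
    total_duration: Option<u64>, // nanoseconds
}

/// Builds usage and finish reason without failing on absent counters.
/// Returns the names of missing counters so the caller can warn about them.
fn summarize(done: &DoneLine) -> (FinishReason, TokenUsage, Vec<&'static str>) {
    let mut missing = Vec::new();
    let mut or_zero = |value: Option<u64>, name: &'static str| {
        value.unwrap_or_else(|| {
            missing.push(name);
            0
        })
    };
    let usage = TokenUsage {
        prompt_tokens: or_zero(done.prompt_eval_count, "prompt_eval_count"),
        completion_tokens: or_zero(done.eval_count, "eval_count"),
        // Assumption: an absent duration is reported as 0 ms.
        latency_ms: done.total_duration.unwrap_or(0) / 1_000_000,
    };
    let finish_reason = match done.done_reason.as_deref() {
        Some("length") => FinishReason::Length,
        // Assumption: anything else, including a missing reason, is Stop.
        _ => FinishReason::Stop,
    };
    (finish_reason, usage, missing)
}

fn main() {
    let old_server = DoneLine { total_duration: Some(1_500_000_000), ..DoneLine::default() };
    let (reason, usage, missing) = summarize(&old_server);
    println!("{reason:?} {usage:?} missing={missing:?}");
}
```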