---
phase: P4
component: kebab-llm-local (Ollama adapter)
task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate"
status: completed
depends_on: [p4-1]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, "design §6.4 [models.llm]", design §0 Q5 streaming, design §10 errors]
---

# p4-2 — Ollama adapter

## Goal

Implement `OllamaLanguageModel` against Ollama's local HTTP API (`POST /api/generate` with `stream: true`). The adapter honors `temperature` and `seed` so runs can be made reproducible. It maps Ollama error states to `LlmError` per design §10 and surfaces actionable hints (e.g., `ollama pull <model>`).

## Why now / why this size

This is the first real language model, and `kebab ask` cannot function without it. It stays isolated from the RAG pipeline so that swapping providers remains a config-only change.

## Allowed dependencies

- `kebab-core`
- `kebab-config`
- `kebab-llm`
- `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }`
- `serde`, `serde_json`
- `tracing`
- `thiserror`

## Forbidden dependencies

- `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`.

No async runtime is needed. `reqwest::blocking::Response` implements `std::io::Read`, so the line-delimited JSON stream can be read line by line through a `std::io::BufReader`. `bytes_stream` only exists on the async `reqwest::Response`.

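The line-by-line reading described above needs only `std::io`. The sketch below assumes the response body is newline-delimited. `NdjsonLines` is a hypothetical helper name, and a generic `R: Read` stands in for the response so the example runs without `reqwest`. Each line is decoded only once it is complete, which also satisfies the UTF-8 boundary rule in the behavior contract.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: yields one `String` per `\n`-terminated line of a
/// blocking byte stream. In the adapter, `R` would be the
/// `reqwest::blocking::Response` returned for `POST /api/generate`.
pub struct NdjsonLines<R: Read> {
    reader: BufReader<R>,
    buf: Vec<u8>,
}

impl<R: Read> NdjsonLines<R> {
    pub fn new(inner: R) -> Self {
        NdjsonLines { reader: BufReader::new(inner), buf: Vec::new() }
    }
}

impl<R: Read> Iterator for NdjsonLines<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            self.buf.clear();
            // `read_until` keeps pulling from the underlying reader until it
            // sees `\n` or EOF. The bytes of a multi-byte UTF-8 character that
            // arrive in two separate HTTP chunks therefore land in one buffer.
            match self.reader.read_until(b'\n', &mut self.buf) {
                Ok(0) => return None, // EOF with nothing left to emit
                Ok(_) => {
                    // Drop the line terminator (`\n` or `\r\n`).
                    while matches!(self.buf.last().copied(), Some(b'\n' | b'\r')) {
                        self.buf.pop();
                    }
                    if self.buf.is_empty() {
                        continue; // skip blank lines
                    }
                    // Decode only now that the whole line is buffered.
                    let bytes = std::mem::take(&mut self.buf);
                    return Some(
                        String::from_utf8(bytes)
                            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
                    );
                }
                Err(e) => return Some(Err(e)),
            }
        }
    }
}
```

In the real crate, each yielded line would then be parsed with `serde_json` and mapped to a `TokenChunk`.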
## Inputs

| input | type | source |
|-------|------|--------|
| `kebab_config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process |

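Both the config and the request can carry `temperature` and `seed`. The request body in the behavior contract resolves them as `config ?? req ?? default`. The sketch below expresses that precedence with `Option::or`. `LlmConfig`, `Request` and `GenerateOptions` are hypothetical stand-ins, not the real `kebab_config` / `kebab_core` types, whose field names may differ.

```rust
/// Hypothetical stand-in for the relevant part of `Config.models.llm`.
#[derive(Default)]
pub struct LlmConfig {
    pub temperature: Option<f32>,
    pub seed: Option<u64>,
    pub context_tokens: usize,
}

/// Hypothetical stand-in for the relevant part of `GenerateRequest`.
#[derive(Default)]
pub struct Request {
    pub temperature: Option<f32>,
    pub seed: Option<u64>,
    pub stop: Vec<String>,
}

/// Resolved values for the `options` object sent to `/api/generate`.
#[derive(Debug, PartialEq)]
pub struct GenerateOptions {
    pub temperature: f32,
    pub seed: u64,
    pub num_ctx: usize,
    pub stop: Vec<String>,
}

/// Applies `config ?? req ?? default`: a value set in config wins, then the
/// request's value, then the fixed default (0.0 / 0).
pub fn resolve_options(cfg: &LlmConfig, req: &Request) -> GenerateOptions {
    GenerateOptions {
        temperature: cfg.temperature.or(req.temperature).unwrap_or(0.0),
        seed: cfg.seed.or(req.seed).unwrap_or(0),
        num_ctx: cfg.context_tokens,
        stop: req.stop.clone(),
    }
}
```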
## Outputs

| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` |
| `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` |

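The sketch below shows how decoded stream lines could become the streaming `TokenChunk` output. It returns the boxed `Send` iterator shape that `generate_stream` uses. It assumes JSON decoding has already happened (with `serde_json` in the real crate). `TokenChunk`, `FinishReason` and `TokenUsage` here are hypothetical local stand-ins that only mirror the names in design §7.2, and `String` stands in for the error type.

```rust
/// Hypothetical stand-ins mirroring design §7.2 names; not the real kebab_core types.
/// (A `Length` variant for `done_reason: "length"` is omitted for brevity.)
#[derive(Debug, PartialEq)]
pub enum FinishReason {
    Stop,
}

#[derive(Debug, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub latency_ms: u64,
}

#[derive(Debug, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

/// One already-decoded `/api/generate` line, reduced to the fields used here.
pub struct StreamLine {
    pub response: String,
    pub done: bool,
    pub prompt_eval_count: u64,
    pub eval_count: u64,
    pub total_duration: u64, // nanoseconds, as reported by Ollama
}

fn to_chunk(line: StreamLine) -> TokenChunk {
    if !line.done {
        return TokenChunk::Token(line.response);
    }
    TokenChunk::Done {
        finish_reason: FinishReason::Stop,
        usage: TokenUsage {
            prompt_tokens: line.prompt_eval_count,
            completion_tokens: line.eval_count,
            latency_ms: line.total_duration / 1_000_000, // ns -> ms
        },
    }
}

/// Wraps decoded lines into a boxed `Send` iterator that stops right after
/// the first `Done` chunk, so anything after `done: true` is ignored.
pub fn token_stream<I>(lines: I) -> Box<dyn Iterator<Item = Result<TokenChunk, String>> + Send>
where
    I: Iterator<Item = Result<StreamLine, String>> + Send + 'static,
{
    let mut finished = false;
    Box::new(lines.map_while(move |line| {
        if finished {
            return None;
        }
        let chunk = line.map(to_chunk);
        finished = matches!(chunk, Ok(TokenChunk::Done { .. }));
        Some(chunk)
    }))
}
```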
## Public surface (signatures only — no new types)

```rust
pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }

impl OllamaLanguageModel {
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kebab_core::LanguageModel for OllamaLanguageModel {
    fn model_ref(&self) -> kebab_core::ModelRef;
    fn context_tokens(&self) -> usize;
    fn generate_stream(&self, req: kebab_core::GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
}
```

## Behavior contract

- HTTP: `POST {endpoint}/api/generate` with body

  ```json
  {
    "model": "<config.models.llm.model>",
    "prompt": "<system + '\n\n' + user>",
    "stream": true,
    "options": {
      "temperature": <config.temperature ?? req.temperature ?? 0.0>,
      "seed": <config.seed ?? req.seed ?? 0>,
      "num_ctx": <config.context_tokens>,
      "stop": <req.stop>
    }
  }
  ```

- The response is line-delimited JSON, one object per line:
  - `{"response": "...", "done": false}` → emit `TokenChunk::Token(text)`.
  - `{"response": "", "done": true, "prompt_eval_count": p, "eval_count": c, "total_duration": ns, ...}` → emit a final `TokenChunk::Done { finish_reason: Stop, usage: TokenUsage { prompt_tokens: p, completion_tokens: c, latency_ms: total_duration / 1_000_000 } }`. Ollama reports `total_duration` in nanoseconds.
- Error mapping:
  - connection refused → `LlmError::Unreachable`; the `anyhow` message includes `hint: ensure 'ollama serve' is running and reachable at <endpoint>`.
  - HTTP 404 with `model "<id>" not found` → `LlmError::ModelNotPulled(model_id)`, with hint `ollama pull <model_id>`.
  - timeout → `LlmError::Timeout`.
  - any other HTTP 4xx/5xx → `LlmError::Stream(body)`.
- UTF-8 boundaries: accumulate raw bytes until a complete `\n`-terminated line is available, and only then decode and parse it. A multi-byte character split across two HTTP chunks is therefore reassembled before any `TokenChunk::Token` is emitted.
- Determinism: with `temperature=0` and a fixed `seed`, Ollama's output is reproducible, modulo nondeterminism in the model runtime itself. Tests against a real Ollama therefore use a fixed seed and may compare an aggregate fingerprint within a tolerance, NOT byte equality. Tests against the mock can require exact equality.
- `model_ref().provider = "ollama"`, `dimensions = None`.
- Reachability: `OllamaLanguageModel::new` does NOT hit the network eagerly. The first failure surfaces from `generate_stream`. Probing is the job of `kebab doctor` (separate task).

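The error mapping above can be kept independent of the HTTP client. The sketch below assumes a hypothetical `Failure` enum that summarizes what the client observed. In the real adapter that information would come from inspecting the `reqwest` error (e.g. `is_connect()` / `is_timeout()`) and the response status. `LlmError` is a local stand-in with the four variants named in the contract. The hint is returned separately so it can be attached to the `anyhow` message.

```rust
/// Hypothetical stand-in for the `LlmError` variants named in design §10.
#[derive(Debug, PartialEq)]
pub enum LlmError {
    Unreachable,
    ModelNotPulled(String),
    Timeout,
    Stream(String),
}

/// Hypothetical summary of what the HTTP client observed.
pub enum Failure<'a> {
    ConnectionRefused,
    TimedOut,
    Status { code: u16, body: &'a str },
}

/// Maps a transport failure to an `LlmError` plus an optional user-facing hint.
pub fn classify(failure: Failure<'_>, endpoint: &str, model: &str) -> (LlmError, Option<String>) {
    match failure {
        Failure::ConnectionRefused => (
            LlmError::Unreachable,
            Some(format!("hint: ensure 'ollama serve' is running and reachable at {endpoint}")),
        ),
        Failure::TimedOut => (LlmError::Timeout, None),
        // A 404 whose body says the model was not found: it was never pulled.
        Failure::Status { code: 404, body } if body.contains("not found") => (
            LlmError::ModelNotPulled(model.to_string()),
            Some(format!("hint: ollama pull {model}")),
        ),
        // Any other 4xx/5xx: keep the body for diagnostics.
        Failure::Status { body, .. } => (LlmError::Stream(body.to_string()), None),
    }
}
```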
## Storage / wire effects

- Reads/writes only the local HTTP socket. No DB or filesystem effects.

## Test plan

| kind | description | fixture / data |
|------|-------------|----------------|
| unit | construction with default config returns the expected `ModelRef` | inline |
| unit | streamed line `{"response":"hi","done":false}` followed by `{"done":true,...}` yields two chunks: `Token("hi")`, then `Done` | mocked via `wiremock` or `tiny_http` |
| unit | a UTF-8 character split across two HTTP chunks is reassembled correctly | mocked HTTP |
| unit | unreachable endpoint → `LlmError::Unreachable` with hint | mocked (closed port) |
| unit | 404 for a missing model → `LlmError::ModelNotPulled` with hint | mocked HTTP |
| unit | concatenation of streamed tokens equals the server's full text | mocked HTTP |
| determinism | identical request + temperature=0 + seed=0 produces an identical token stream against the mock | mocked HTTP |
| `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in |

All non-ignored tests run under `cargo test -p kebab-llm-local`. The real-LM integration test runs via `cargo test -p kebab-llm-local -- --ignored`.

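The "unreachable endpoint" row needs a closed port, and no mocking library is required to get one. The std-only sketch below uses hypothetical helper names. It binds an ephemeral loopback port, records the address, and drops the listener, leaving an address that refuses connections. It is best-effort, since another process could claim the port in between.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Returns a loopback address that almost certainly has no listener:
/// bind port 0 so the OS picks a free ephemeral port, record it, then
/// drop the listener so connections to it are refused.
pub fn closed_local_addr() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    drop(listener);
    Ok(addr)
}

/// Endpoint string a test could put into the adapter's config.
pub fn closed_endpoint() -> io::Result<String> {
    Ok(format!("http://{}", closed_local_addr()?))
}
```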
## Definition of Done

- [ ] `cargo check -p kebab-llm-local` passes
- [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
- [ ] No async runtime present (uses `reqwest::blocking`)
- [ ] No imports outside Allowed dependencies
- [ ] PR links report §11.2, design §0 Q5, and design §10

## Out of scope

- llama.cpp / candle adapters (P+).
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later).
- Cancellation / abort tokens (P+).
- Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI).

## Risks / notes

- Ollama versions sometimes rename response fields. Pin a target version range and fail with a friendly message when a required field is missing.
- `prompt_eval_count` / `eval_count` may be absent on older Ollama versions. Default them to `0` and emit a `tracing` warning; do NOT fail the stream.
- If Ollama returns a `done` line with `done_reason: "length"`, map it to `FinishReason::Length`.

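The last two notes can be sketched together under a few assumptions. `done_reason` and the counts arrive as `Option`s after decoding, and the types are local stand-ins. Missing-field warnings are returned as field names so the caller can emit them with `tracing::warn!`. The helpers `finish_reason` and `usage_from` are hypothetical and not part of the spec.

```rust
/// Hypothetical stand-ins mirroring design §7.2 names.
#[derive(Debug, PartialEq)]
pub enum FinishReason {
    Stop,
    Length,
}

#[derive(Debug, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub latency_ms: u64,
}

/// Maps Ollama's optional `done_reason`. Only "length" is special-cased;
/// any other value, or no field at all (older servers), is a normal stop.
pub fn finish_reason(done_reason: Option<&str>) -> FinishReason {
    match done_reason {
        Some("length") => FinishReason::Length,
        _ => FinishReason::Stop,
    }
}

/// Builds usage from counts that older Ollama versions may omit. A missing
/// count becomes 0 and its field name is returned as a warning, so the
/// stream never fails just because usage data is incomplete.
pub fn usage_from(
    prompt_eval_count: Option<u64>,
    eval_count: Option<u64>,
    total_duration_ns: u64,
) -> (TokenUsage, Vec<&'static str>) {
    let mut warnings = Vec::new();
    let mut count_or_zero = |value: Option<u64>, field: &'static str| {
        value.unwrap_or_else(|| {
            warnings.push(field);
            0
        })
    };
    let usage = TokenUsage {
        prompt_tokens: count_or_zero(prompt_eval_count, "prompt_eval_count"),
        completion_tokens: count_or_zero(eval_count, "eval_count"),
        latency_ms: total_duration_ns / 1_000_000, // ns -> ms
    };
    (usage, warnings)
}
```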