Files
kebab/crates
altair823 3e38a9bcb4 feat(p4-2): kb-llm-local crate — Ollama HTTP adapter (reqwest::blocking)
First real LanguageModel implementation. Wraps Ollama's local HTTP
API at POST {endpoint}/api/generate with stream:true, parses the
NDJSON streaming response into TokenChunk events, and maps Ollama
error states to a thiserror-derived LlmError with actionable hints.
Synchronous trait surface; reqwest::blocking handles the HTTP I/O.

Public surface:
- pub struct OllamaLanguageModel
- pub fn new(config: &Config) -> Result<Self> — lazy connect; never
  hits the network. Spec line 96.
- pub enum LlmError { Unreachable, ModelNotPulled, Timeout, Stream,
  Malformed }. Lives in this crate per spec — kb-core / kb-llm stay
  free of error taxonomy.
- impl kb_core::LanguageModel via re-export from kb-llm.

Streaming:
- POST body shape per spec §11.2: model, prompt = system + "\n\n" +
  user, stream: true, options { temperature, seed, num_ctx, stop }.
- OllamaStream owns BufReader<reqwest::blocking::Response>, reads
  NDJSON lines via read_until(b'\n'), parses each as
  {response, done, done_reason?, prompt_eval_count?, eval_count?,
  total_duration?}. Token frame → TokenChunk::Token; done frame →
  TokenChunk::Done { finish_reason, usage }.
- done_reason mapping: "length" → Length, "abort" → Aborted,
  "stop" / missing / unknown → Stop (forward-compat with future
  Ollama tags).
- Missing prompt_eval_count / eval_count default to 0 + tracing::warn
  (do NOT fail). Spec line 135.
- EOF without a done line synthesizes Done { Aborted, zeros } so
  downstream pipelines never deadlock waiting for a terminal frame.
- UTF-8: line-delimited framing means each JSON line is a complete
  UTF-8 sequence — no cross-HTTP-chunk codepoint splits to worry
  about. read_until accumulates whole lines regardless of how the
  underlying reqwest body chunks.

Error mapping (LlmError):
- reqwest::Error::is_connect() → Unreachable { endpoint, source }
  with hint "ensure `ollama serve` is running and reachable at
  <endpoint>".
- reqwest::Error::is_timeout() → Timeout.
- 200 with non-NDJSON first line (e.g., transparent-proxy HTML
  error page) → Stream(truncated body) — distinguished from
  Malformed by the iterator's has_emitted flag.
- 404 with body containing model_id (case-insensitive) OR English
  "model" + "not found" → ModelNotPulled(model_id) with hint
  "ollama pull <model_id>". Tightened beyond spec to survive
  Ollama localizing the error message (Korean / Japanese / etc.)
  while keeping the original English-substring fallback.
- Other 4xx/5xx → Stream(truncated body).
- Mid-stream JSON parse failure (after at least one valid line) →
  Malformed(line). Truncate all error bodies to 512 chars
  (chars-based, multibyte safe) so an nginx 500 page can't blow up
  the diagnostic.
- Trailing slash in endpoint stripped before formatting the URL —
  endpoint = "http://x:1234/" produces .../api/generate, not
  .../api//generate. Pinned by trailing-slash test.

Tokio note: reqwest 0.12's blocking feature internally wraps a
private current-thread tokio runtime, so cargo tree --edges normal
shows tokio. The auditable invariant is "no top-level tokio dep +
no async surface exposed to callers" — verified: src/ has zero
async/await/tokio::*. default-features = false drops default-tls
(rustls only) but does NOT drop tokio. Documented honestly in
Cargo.toml + lib.rs. Switching to ureq would remove tokio
entirely; deferred since reqwest is the spec's allowed dep.

Tests (24 total: 23 default + 1 ignored):
- 7 unit in src/ollama.rs: prompt-build, options-build, finish-
  reason mapping, truncate_body bounds (under_cap / over_cap_marker
  / multibyte_chars_not_bytes), 404+model-id heuristic.
- 3 in tests/construction.rs: ModelRef shape, context_tokens
  passthrough, lazy-connect proven via port-1 pointing.
- 13 in tests/streaming.rs: streamed tokens then Done, multibyte
  chars within a line round-trip (renamed from "split across
  chunks" to honestly reflect what's tested), Unreachable-with-
  hint, 4xx→Stream, 404→ModelNotPulled, concat-equals-canned,
  done_reason length / abort, missing eval counts default to zero,
  missing done_reason defaults to Stop, determinism-by-mock,
  trailing-slash endpoint, non-NDJSON 200 body → Stream not
  Malformed.
- 1 #[ignore] in tests/integration.rs: real Ollama on
  localhost:11434 with the configured model. Opt-in via
  cargo test -p kb-llm-local -- --ignored after `ollama serve`
  + `ollama pull`.

Workspace: 288 passed / 25 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean. No native-tls,
no openssl in the dep graph.

Allowed deps respected: kb-core, kb-config, kb-llm, reqwest 0.12
(default-features=false; blocking, json, rustls-tls), serde,
serde_json, tracing, thiserror plus anyhow (forced by trait return
type). wiremock + tokio in [dev-dependencies] only.

Out of scope: llama.cpp / candle adapters (P+), Ollama embed
endpoint (separate adapter inside kb-embed-local if requested),
cancellation / abort tokens (P+), connection-pool tuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:28:34 +00:00
..