feat(p4-2): kb-llm-local crate — Ollama HTTP adapter (reqwest::blocking)

First real LanguageModel implementation. Wraps Ollama's local HTTP
API at POST {endpoint}/api/generate with stream:true, parses the
NDJSON streaming response into TokenChunk events, and maps Ollama
error states to a thiserror-derived LlmError with actionable hints.
Synchronous trait surface; reqwest::blocking handles the HTTP I/O.

Public surface:
- pub struct OllamaLanguageModel
- pub fn new(config: &Config) -> Result<Self> — lazy connect; never
  hits the network. Spec line 96.
- pub enum LlmError { Unreachable, ModelNotPulled, Timeout, Stream,
  Malformed }. Lives in this crate per spec — kb-core / kb-llm stay
  free of error taxonomy.
- impl kb_core::LanguageModel via re-export from kb-llm.

Streaming:
- POST body shape per spec §11.2: model, prompt = system + "\n\n" +
  user, stream: true, options { temperature, seed, num_ctx, stop }.
- OllamaStream owns BufReader<reqwest::blocking::Response>, reads
  NDJSON lines via read_until(b'\n'), parses each as
  {response, done, done_reason?, prompt_eval_count?, eval_count?,
  total_duration?}. Token frame → TokenChunk::Token; done frame →
  TokenChunk::Done { finish_reason, usage }.
- done_reason mapping: "length" → Length, "abort" → Aborted,
  "stop" / missing / unknown → Stop (forward-compat with future
  Ollama tags).
- Missing prompt_eval_count / eval_count default to 0 + tracing::warn
  (do NOT fail). Spec line 135.
- EOF without a done line synthesizes Done { Aborted, zeros } so
  downstream pipelines never deadlock waiting for a terminal frame.
- UTF-8: line-delimited framing means each JSON line is a complete
  UTF-8 sequence — no cross-HTTP-chunk codepoint splits to worry
  about. read_until accumulates whole lines regardless of how the
  underlying reqwest body chunks.

Error mapping (LlmError):
- reqwest::Error::is_connect() → Unreachable { endpoint, source }
  with hint "ensure `ollama serve` is running and reachable at
  <endpoint>".
- reqwest::Error::is_timeout() → Timeout.
- 200 with non-NDJSON first line (e.g., transparent-proxy HTML
  error page) → Stream(truncated body) — distinguished from
  Malformed by the iterator's has_emitted flag.
- 404 with body containing model_id (case-insensitive) OR English
  "model" + "not found" → ModelNotPulled(model_id) with hint
  "ollama pull <model_id>". Tightened beyond spec to survive
  Ollama localizing the error message (Korean / Japanese / etc.)
  while keeping the original English-substring fallback.
- Other 4xx/5xx → Stream(truncated body).
- Mid-stream JSON parse failure (after at least one valid line) →
  Malformed(line). Truncate all error bodies to 512 chars
  (chars-based, multibyte safe) so an nginx 500 page can't blow up
  the diagnostic.
- Trailing slash in endpoint stripped before formatting the URL —
  endpoint = "http://x:1234/" produces .../api/generate, not
  .../api//generate. Pinned by trailing-slash test.

Tokio note: reqwest 0.12's blocking feature internally wraps a
private current-thread tokio runtime, so cargo tree --edges normal
shows tokio. The auditable invariant is "no top-level tokio dep +
no async surface exposed to callers" — verified: src/ has zero
async/await/tokio::*. default-features = false drops default-tls
(rustls only) but does NOT drop tokio. Documented honestly in
Cargo.toml + lib.rs. Switching to ureq would remove tokio
entirely; deferred since reqwest is the spec's allowed dep.

Tests (24 total: 23 default + 1 ignored):
- 7 unit in src/ollama.rs: prompt-build, options-build, finish-
  reason mapping, truncate_body bounds (under_cap / over_cap_marker
  / multibyte_chars_not_bytes), 404+model-id heuristic.
- 3 in tests/construction.rs: ModelRef shape, context_tokens
  passthrough, lazy-connect proven via port-1 pointing.
- 13 in tests/streaming.rs: streamed tokens then Done, multibyte
  chars within a line round-trip (renamed from "split across
  chunks" to honestly reflect what's tested), Unreachable-with-
  hint, 4xx→Stream, 404→ModelNotPulled, concat-equals-canned,
  done_reason length / abort, missing eval counts default to zero,
  missing done_reason defaults to Stop, determinism-by-mock,
  trailing-slash endpoint, non-NDJSON 200 body → Stream not
  Malformed.
- 1 #[ignore] in tests/integration.rs: real Ollama on
  localhost:11434 with the configured model. Opt-in via
  cargo test -p kb-llm-local -- --ignored after `ollama serve`
  + `ollama pull`.

Workspace: 288 passed / 25 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean. No native-tls,
no openssl in the dep graph.

Allowed deps respected: kb-core, kb-config, kb-llm, reqwest 0.12
(default-features=false; blocking, json, rustls-tls), serde,
serde_json, tracing, thiserror plus anyhow (forced by trait return
type). wiremock + tokio in [dev-dependencies] only.

Out of scope: llama.cpp / candle adapters (P+), Ollama embed
endpoint (separate adapter inside kb-embed-local if requested),
cancellation / abort tokens (P+), connection-pool tuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-01 14:28:34 +00:00
parent 9ceabebf38
commit 3e38a9bcb4
9 changed files with 1356 additions and 1 deletions

80
Cargo.lock generated
View File

@@ -418,6 +418,16 @@ dependencies = [
"stable_deref_trait",
]
[[package]]
name = "assert-json-diff"
version = "2.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "47e4f2b81832e72834d7518d8487a0396a28cc408186a2e8854c0f98011faf12"
dependencies = [
"serde",
"serde_json",
]
[[package]]
name = "async-channel"
version = "2.5.0"
@@ -683,7 +693,7 @@ version = "3.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "519bd3116aeeb42d5372c29d982d16d0170d3d4a5ed85fc7dd91642ffff3c67c"
dependencies = [
"darling 0.20.11",
"darling 0.21.3",
"ident_case",
"prettyplease",
"proc-macro2",
@@ -1854,6 +1864,24 @@ dependencies = [
"sqlparser",
]
[[package]]
name = "deadpool"
version = "0.12.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0be2b1d1d6ec8d846f05e137292d0b89133caf95ef33695424c09568bdd39b1b"
dependencies = [
"deadpool-runtime",
"lazy_static",
"num_cpus",
"tokio",
]
[[package]]
name = "deadpool-runtime"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "092966b41edc516079bdf31ec78a2e0588d1d0c08f78b91d8307215928642b2b"
[[package]]
name = "deepsize"
version = "0.2.0"
@@ -2789,6 +2817,12 @@ version = "1.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6dbf3de79e51f3d586ab4cb9d5c3e2c14aa28ed23d180cf89b4df0454a69cc87"
[[package]]
name = "httpdate"
version = "1.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "df3b46402a9d5adb4c86a0cf463f42e19994e3ee891101b1841f30a545cb49a9"
[[package]]
name = "humantime"
version = "2.3.0"
@@ -2809,6 +2843,7 @@ dependencies = [
"http",
"http-body",
"httparse",
"httpdate",
"itoa",
"pin-project-lite",
"smallvec",
@@ -2830,6 +2865,7 @@ dependencies = [
"tokio",
"tokio-rustls",
"tower-service",
"webpki-roots 1.0.7",
]
[[package]]
@@ -3448,6 +3484,23 @@ dependencies = [
"proptest",
]
[[package]]
name = "kb-llm-local"
version = "0.1.0"
dependencies = [
"anyhow",
"kb-config",
"kb-core",
"kb-llm",
"reqwest",
"serde",
"serde_json",
"thiserror 2.0.18",
"tokio",
"tracing",
"wiremock",
]
[[package]]
name = "kb-normalize"
version = "0.1.0"
@@ -5855,6 +5908,7 @@ dependencies = [
"base64 0.22.1",
"bytes",
"encoding_rs",
"futures-channel",
"futures-core",
"futures-util",
"h2",
@@ -5892,6 +5946,7 @@ dependencies = [
"wasm-bindgen-futures",
"wasm-streams",
"web-sys",
"webpki-roots 1.0.7",
]
[[package]]
@@ -8034,6 +8089,29 @@ dependencies = [
"memchr",
]
[[package]]
name = "wiremock"
version = "0.6.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08db1edfb05d9b3c1542e521aea074442088292f00b5f28e435c714a98f85031"
dependencies = [
"assert-json-diff",
"base64 0.22.1",
"deadpool",
"futures",
"http",
"http-body-util",
"hyper",
"hyper-util",
"log",
"once_cell",
"regex",
"serde",
"serde_json",
"tokio",
"url",
]
[[package]]
name = "wit-bindgen"
version = "0.46.0"