Establishes the kb-llm trait crate so concrete LLM adapters (p4-2
Ollama, future llama.cpp / candle) target a stable surface. Pure re-
export of kb_core::{LanguageModel, GenerateRequest, TokenChunk,
FinishReason, TokenUsage, ModelRef} plus a feature-gated deterministic
mock for downstream RAG tests (p4-3) that need an LLM trait object
without an Ollama dependency.
MockLanguageModel (cfg(feature = "mock"), default OFF):
- Holds canned_response + canned_finish + canned_usage + (model_id,
provider, context_tokens). Pure in-memory; no I/O.
- generate_stream() honors GenerateRequest.stop: it scans every
non-empty stop string against the canned response, takes the earliest
byte position (Iterator::min returns the first equal element on ties,
so declaration order in req.stop wins), then truncates with a direct
byte slice (str::find returns a UTF-8 char boundary by contract).
- When a stop matches, finish_reason is overridden to Stop (matches
OpenAI / Ollama real-world behaviour); otherwise the caller's
canned_finish passes through verbatim.
- Emits one TokenChunk::Token per Unicode scalar value (char), NOT per
grapheme cluster, so Hangul jamo, emoji ZWJ sequences, and combining
marks are split across chunks. Acceptable for trait-shape testing;
real adapters MAY combine. Documented in module docs.
- Always terminates with TokenChunk::Done { finish_reason, usage } even
if the canned response is empty. The returned iterator is a boxed
Vec<TokenChunk>::into_iter().map(Ok), trivially Send.
- Real adapters MAY return Err from generate_stream itself (e.g.
connection refused) before any chunk is yielded; the mock never does.
This is documented for consumers of the re-exported trait.
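The stop handling and per-char streaming described above can be
sketched roughly as follows. This is a self-contained illustration,
not the crate's code: the enums are simplified stand-ins for the
kb_core types (the usage field on Done is omitted and the Token payload
shape is assumed), and truncate_at_stop / mock_chunks are hypothetical
names.

```rust
// Simplified stand-ins for the kb_core types (assumed shapes).
#[derive(Debug, Clone, PartialEq)]
enum FinishReason {
    Stop,
    Length,
}

#[derive(Debug, Clone, PartialEq)]
enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason },
}

/// Hypothetical helper: cut `canned` at the earliest byte position at
/// which any non-empty stop string matches. Returns the kept prefix and
/// whether a stop matched.
fn truncate_at_stop<'a>(canned: &'a str, stops: &[&str]) -> (&'a str, bool) {
    let cut = stops
        .iter()
        .filter(|s| !s.is_empty()) // empty stop strings are ignored
        .filter_map(|s| canned.find(*s)) // byte offset of the first match
        .min(); // earliest match across all stop strings
    match cut {
        // str::find returns the start of a match, which is a char
        // boundary, so this slice cannot panic.
        Some(pos) => (&canned[..pos], true),
        None => (canned, false),
    }
}

/// Hypothetical helper: one Token per char, always followed by Done.
fn mock_chunks(canned: &str, stops: &[&str], canned_finish: FinishReason) -> Vec<TokenChunk> {
    let (text, stopped) = truncate_at_stop(canned, stops);
    // A stop match overrides the canned finish reason.
    let finish_reason = if stopped { FinishReason::Stop } else { canned_finish };
    let mut chunks: Vec<TokenChunk> = text
        .chars()
        .map(|c| TokenChunk::Token(c.to_string()))
        .collect();
    chunks.push(TokenChunk::Done { finish_reason });
    chunks
}
```

With canned text "héllo wörld" and stops ["wör", "llo"], the cut lands
at byte 3 (the "llo" match), so the sketch streams "h", "é", then Done
with FinishReason::Stop, regardless of the order of the stop list.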
Helpers:
- assert_finish_chunk(chunks) — asserts the last chunk is a Done.
Useful for proptests asserting trait contract over random inputs.
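A helper of this kind might look like the sketch below. The signature
and the simplified TokenChunk enum are assumptions made for
illustration; the real helper operates on the kb_core type and may
differ.

```rust
// Simplified stand-in for kb_core::TokenChunk (assumed shape).
#[derive(Debug)]
enum TokenChunk {
    Token(String),
    Done,
}

/// Panics unless the final chunk is Done. An empty slice also fails,
/// since the stream is expected to always end with Done.
fn assert_finish_chunk(chunks: &[TokenChunk]) {
    match chunks.last() {
        Some(TokenChunk::Done) => {}
        other => panic!("expected stream to end with Done, got {:?}", other),
    }
}
```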
Tests:
- cargo test -p kb-llm (no features): 2 reexport / dyn-dispatch tests.
- cargo test -p kb-llm --features mock: 9 tests, including a 100-case
proptest over random Unicode strings asserting the Done terminator,
char-count == streamed Token chunks, and concat == canned (truncated
by stop), plus explicit cases for stop-string truncation,
first-stop-match precedence, the model_ref dimensions=None invariant,
and finish reason pass-through.
- All 271 workspace tests pass; clippy clean for both default and
mock-on feature configurations.
Symbol gating verified:
- cargo build --release -p kb-llm (default): nm shows zero
MockLanguageModel symbols.
- cargo build --release -p kb-llm --features mock: three trait-impl
symbols present. Spec invariant "release builds MUST NOT include
MockLanguageModel" enforced at the symbol level.
Allowed deps respected: only kb-core (path) and anyhow (workspace,
forced by trait return type). kb-config / serde / thiserror / tracing
are on the spec's allowed list but are not declared as dependencies:
nothing in this skeleton crate references them, and omitting them
keeps the dependency graph slim for downstream consumers. p4-2/p4-3
will add what they need at their own dep sites.
Forbidden deps (reqwest, ureq, tokio, whisper-rs, kb-source-fs,
kb-parse-md, kb-normalize, kb-chunk, kb-store-*, kb-embed*, kb-search,
kb-rag, kb-tui, kb-desktop) all absent from cargo tree -p kb-llm.
Out of scope: real adapter (p4-2 Ollama), token counting against the
real tokenizer, server-side cancellation / abort signals (P+).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>