Establishes the kb-llm trait crate so concrete LLM adapters (p4-2
Ollama, future llama.cpp / candle) target a stable surface. Pure re-
export of kb_core::{LanguageModel, GenerateRequest, TokenChunk,
FinishReason, TokenUsage, ModelRef} plus a feature-gated deterministic
mock for downstream RAG tests (p4-3) that need an LLM trait object
without an Ollama dependency.
MockLanguageModel (cfg(feature = "mock"), default OFF):
- Holds canned_response + canned_finish + canned_usage + (model_id,
provider, context_tokens). Pure in-memory; no I/O.
- generate_stream() honors GenerateRequest.stop: scans every non-empty
stop string against the canned response, takes the earliest byte
position (Iterator::min returns the first equal element on ties so
declaration order in req.stop wins), truncates with a direct byte-
slice (str::find returns a UTF-8 char boundary by contract).
- When a stop matches, finish_reason is overridden to Stop (matches
OpenAI / Ollama real-world behaviour); otherwise the caller's
canned_finish passes through verbatim.
- Emits one TokenChunk::Token per Unicode scalar value (char), NOT per
grapheme cluster — Hangul jamo, emoji ZWJ sequences, combining
marks split. Acceptable for trait-shape testing; real adapters MAY
combine. Documented in module docs.
- Always terminates with TokenChunk::Done { finish_reason, usage } even
if the canned response is empty. The returned iterator is a boxed
Vec<TokenChunk>::into_iter().map(Ok), trivially Send.
- Real adapters MAY return Err from generate_stream itself (e.g.
connection refused) before any chunk is yielded; the mock never does.
Documented for the trait re-exporter consumer audience.
Helpers:
- assert_finish_chunk(chunks) — asserts the last chunk is a Done.
Useful for proptests asserting trait contract over random inputs.
Tests:
- cargo test -p kb-llm (no features): 2 reexport / dyn-dispatch tests.
- cargo test -p kb-llm --features mock: 9 tests including 100-case
proptest over random Unicode strings asserting Done terminator,
char-count == streamed Token chunks, concat == canned (truncated by
stop), plus explicit cases for stop-string truncation, first-stop-
match precedence, model_ref dimensions=None invariant, finish reason
pass-through.
- All 271 workspace tests pass; clippy clean for both default and
mock-on feature configurations.
Symbol gating verified:
- cargo build --release -p kb-llm (default): nm shows zero
MockLanguageModel symbols.
- cargo build --release -p kb-llm --features mock: three trait-impl
symbols present. Spec invariant "release builds MUST NOT include
MockLanguageModel" enforced at the symbol level.
Allowed deps respected: only kb-core (path) and anyhow (workspace,
forced by trait return type). Dropped kb-config / serde / thiserror /
tracing from the spec's allowed list — they are listed as Allowed but
nothing in this skeleton crate references them, and dropping them
keeps the dependency graph slim for downstream consumers. p4-2/p4-3
will add what they need at their own dep sites.
Forbidden deps (reqwest, ureq, tokio, whisper-rs, kb-source-fs,
kb-parse-md, kb-normalize, kb-chunk, kb-store-*, kb-embed*, kb-search,
kb-rag, kb-tui, kb-desktop) all absent from cargo tree -p kb-llm.
Out of scope: real adapter (p4-2 Ollama), token counting against the
real tokenizer, server-side cancellation / abort signals (P+).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
118 lines
4.7 KiB
Rust
118 lines
4.7 KiB
Rust
//! Deterministic mock language model for downstream tests.
|
|
//!
|
|
//! Compiled only when the `mock` feature is enabled. Default builds
|
|
//! (`cargo build --release -p kb-llm`) MUST NOT contain the `MockLanguageModel`
|
|
//! symbol — verifiable by symbol scan (`nm`/`cargo bloat`).
|
|
//!
|
|
//! ## Streaming contract
|
|
//!
|
|
//! For every call to [`MockLanguageModel::generate_stream`]:
|
|
//!
|
|
//! 1. The configured `canned_response` is examined for any of `req.stop`. If
|
|
//! one or more stop strings are substrings of the response, the response
|
|
//! is truncated at the **earliest byte position** of any match (i.e., the
|
|
//! first stop string to land — ties broken by the order entries appear in
|
|
//! `req.stop`, since `Iterator::min` returns the first equal element on
|
|
//! ties, breaking by `req.stop` declaration order).
|
|
//! 2. The (possibly truncated) string is iterated by Unicode scalar
|
|
//! (`str::chars()`) and each character is yielded as
|
|
//! [`TokenChunk::Token`]`(c.to_string())`. This makes streaming UTF-8 safe
|
|
//! by construction (no character is split across chunks). Emits one
|
|
//! `TokenChunk` per Unicode scalar value (`char`), not per grapheme
|
|
//! cluster — Hangul jamo, emoji ZWJ sequences, and combining marks split
|
|
//! into multiple chunks. Acceptable for trait-shape testing; real adapters
|
|
//! MAY combine.
|
|
//! 3. After all tokens, a single terminal [`TokenChunk::Done`] is yielded
|
|
//! with:
|
|
//! * `finish_reason = FinishReason::Stop` if a stop string truncated the
|
|
//! canned text — mirroring real LLM behavior, which reports Stop on
|
|
//! stop-sequence termination regardless of the configured finish.
|
|
//! * `finish_reason = canned_finish.clone()` otherwise.
|
|
//! * `usage = canned_usage.clone()` always.
|
|
//!
|
|
//! ## Non-effects
|
|
//!
|
|
//! - No network. No filesystem. No async runtime.
|
|
//! - No tokenizer. `usage.prompt_tokens` / `completion_tokens` are whatever
|
|
//! the constructor was given — the mock does not count.
|
|
|
|
use kb_core::{
|
|
FinishReason, GenerateRequest, LanguageModel, ModelRef, TokenChunk, TokenUsage,
|
|
};
|
|
|
|
/// Deterministic test double. See module docs for the streaming recipe.
|
|
pub struct MockLanguageModel {
|
|
pub model_id: String,
|
|
pub provider: String,
|
|
pub context_tokens: usize,
|
|
pub canned_response: String,
|
|
pub canned_finish: FinishReason,
|
|
pub canned_usage: TokenUsage,
|
|
}
|
|
|
|
impl MockLanguageModel {
|
|
/// Apply `req.stop` to `canned_response`. Returns `(truncated_text,
|
|
/// stop_hit)` where `stop_hit` is true iff any stop string was found.
|
|
fn apply_stop<'a>(canned: &'a str, stop: &[String]) -> (&'a str, bool) {
|
|
// Earliest byte position wins. Ties break by first occurrence in
|
|
// `stop` (Iterator::min returns the first equal element, and we
|
|
// iterate `stop` in its declared order). Empty stop strings are
|
|
// ignored — they would otherwise match at position 0 and silently
|
|
// eat the entire response.
|
|
let earliest = stop
|
|
.iter()
|
|
.filter(|s| !s.is_empty())
|
|
.filter_map(|s| canned.find(s.as_str()))
|
|
.min();
|
|
match earliest {
|
|
// `str::find` returns a UTF-8 char boundary by contract, so direct byte-slice is sound.
|
|
Some(idx) => (&canned[..idx], true),
|
|
None => (canned, false),
|
|
}
|
|
}
|
|
}
|
|
|
|
impl LanguageModel for MockLanguageModel {
|
|
fn model_ref(&self) -> ModelRef {
|
|
ModelRef {
|
|
id: self.model_id.clone(),
|
|
provider: self.provider.clone(),
|
|
// Per §3.8: `dimensions` carries the embedder's output dim and is
|
|
// intentionally None for chat models.
|
|
dimensions: None,
|
|
}
|
|
}
|
|
|
|
fn context_tokens(&self) -> usize {
|
|
self.context_tokens
|
|
}
|
|
|
|
fn generate_stream(
|
|
&self,
|
|
req: GenerateRequest,
|
|
) -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<TokenChunk>> + Send>> {
|
|
let (truncated, stop_hit) = Self::apply_stop(&self.canned_response, &req.stop);
|
|
|
|
// Pre-materialize the full chunk sequence into an owned Vec. This
|
|
// sidesteps lifetime juggling around `&self.canned_response` inside
|
|
// a `'static` iterator and trivially gives `Send` (Vec<TokenChunk>
|
|
// is Send because TokenChunk is Send).
|
|
let mut chunks: Vec<TokenChunk> = truncated
|
|
.chars()
|
|
.map(|c| TokenChunk::Token(c.to_string()))
|
|
.collect();
|
|
|
|
let finish_reason = if stop_hit {
|
|
FinishReason::Stop
|
|
} else {
|
|
self.canned_finish.clone()
|
|
};
|
|
chunks.push(TokenChunk::Done {
|
|
finish_reason,
|
|
usage: self.canned_usage.clone(),
|
|
});
|
|
|
|
Ok(Box::new(chunks.into_iter().map(Ok)))
|
|
}
|
|
}
|