Provide the kebab-llm crate that re-exports the LanguageModel trait and helper types (GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef), plus a MockLanguageModel for downstream tests.
Why now / why this size
kebab-rag (p4-3) consumes a LanguageModel trait object. Providing the trait and a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
Allowed dependencies
kebab-core
kebab-config
serde
thiserror
tracing
[features] mock = []: opt-in feature flag exposing MockLanguageModel. Default OFF; builds that do not enable it (including release builds) compile the mock out entirely.
```rust
pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};

/// Test-only deterministic mock. Compiled only when the `mock` feature is on.
#[cfg(feature = "mock")]
pub struct MockLanguageModel {
    pub model_id: String,
    pub provider: String,
    pub context_tokens: usize,
    pub canned_response: String, // emitted token-by-token
    pub canned_finish: kebab_core::FinishReason,
    pub canned_usage: kebab_core::TokenUsage,
}

#[cfg(feature = "mock")]
impl kebab_core::LanguageModel for MockLanguageModel {
    /* per §7.2 */
}
```
Behavior contract
MockLanguageModel::generate_stream returns a Box<dyn Iterator> that yields the canned response one Unicode character (Rust char) at a time as TokenChunk::Token, then a final TokenChunk::Done { finish_reason, usage }.
The mock honors GenerateRequest.stop: if any stop string appears in the canned response, the response is truncated at the earliest match before any tokens are emitted, and the stop string itself is not emitted.
The mock must NOT touch the network or filesystem.
Real adapters (p4-2+) MUST NOT live in this crate.
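The streaming and stop-string contract above can be sketched as follows. This is a minimal, self-contained sketch, not the kebab-core API: FinishReason::Stop, TokenUsage.completion_tokens, GenerateRequest.prompt, the Vec<String> type of GenerateRequest.stop, and the generate_stream signature are all hypothetical stand-ins, and the cfg(feature = "mock") gate is omitted so the snippet compiles on its own.

```rust
// Hypothetical stand-ins for the kebab-core types (design §7.2 is authoritative).
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,          // hypothetical variant
    Error(String), // FinishReason::Error(msg), as named in the risks section
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: usize,
    pub completion_tokens: usize, // hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

pub struct GenerateRequest {
    pub prompt: String,    // hypothetical field
    pub stop: Vec<String>, // assumed type for GenerateRequest.stop
}

// Hypothetical trait signature; only the streaming method is sketched.
pub trait LanguageModel {
    fn generate_stream(&self, req: &GenerateRequest) -> Box<dyn Iterator<Item = TokenChunk>>;
}

pub struct MockLanguageModel {
    pub canned_response: String,
    pub canned_finish: FinishReason,
    pub canned_usage: TokenUsage,
}

impl LanguageModel for MockLanguageModel {
    fn generate_stream(&self, req: &GenerateRequest) -> Box<dyn Iterator<Item = TokenChunk>> {
        // Cut at the earliest occurrence of any stop string. Empty stop strings
        // are skipped, because find("") always matches at byte 0.
        let cut = req
            .stop
            .iter()
            .filter(|s| !s.is_empty())
            .filter_map(|s| self.canned_response.find(s.as_str()))
            .min()
            .unwrap_or(self.canned_response.len());
        // `find` returns a char-boundary byte index, so this slice cannot panic.
        let text = &self.canned_response[..cut];

        // One Token per Unicode scalar value (Rust `char`); no network or filesystem.
        let tokens: Vec<TokenChunk> = text
            .chars()
            .map(|c| TokenChunk::Token(c.to_string()))
            .collect();
        let done = TokenChunk::Done {
            finish_reason: self.canned_finish.clone(),
            usage: self.canned_usage.clone(),
        };
        Box::new(tokens.into_iter().chain(std::iter::once(done)))
    }
}
```

In this sketch the mock reports canned_finish unchanged even when a stop string truncated the text; whether truncation should switch the finish reason is not specified by the contract above.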
Storage / wire effects
None.
Test plan
| kind | description | fixture / data |
|---|---|---|
| unit | mock streams 5 tokens then Done | inline |
| unit | mock honors stop strings | inline |
| unit | trait dyn dispatch via Box<dyn LanguageModel> works | inline |
| unit | concatenation of streamed TokenChunk::Token equals canned text (truncated by stop strings) | inline |
| contract | model_ref() populates provider and leaves dimensions = None | inline |
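The stream-shape checks in the table (five Token chunks then Done; concatenated text equals the canned text) could share one small test helper. This is a sketch under assumptions: the types are hypothetical stand-ins for the kebab-core ones, and drain_stream is a hypothetical helper, not part of the crate's stated API.

```rust
// Hypothetical stand-ins for the kebab-core types.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,          // hypothetical variant
    Error(String), // FinishReason::Error(msg)
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: usize,
    pub completion_tokens: usize, // hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

/// Hypothetical test helper: drains a stream and returns the concatenated text,
/// the number of Token chunks, and the terminal Done payload.
/// Panics if Done is missing or if any chunk arrives after Done.
pub fn drain_stream<I: IntoIterator<Item = TokenChunk>>(
    stream: I,
) -> (String, usize, FinishReason, TokenUsage) {
    let mut text = String::new();
    let mut count = 0;
    let mut done: Option<(FinishReason, TokenUsage)> = None;
    for chunk in stream {
        assert!(done.is_none(), "chunk received after Done");
        match chunk {
            TokenChunk::Token(t) => {
                text.push_str(&t);
                count += 1;
            }
            TokenChunk::Done { finish_reason, usage } => done = Some((finish_reason, usage)),
        }
    }
    let (finish_reason, usage) = done.expect("stream ended without Done");
    (text, count, finish_reason, usage)
}
```

Each table row then reduces to building a stream, calling drain_stream, and comparing the returned text, count, and Done payload.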
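All tests run under cargo test -p kebab-llm. Tests that use MockLanguageModel also need the mock feature enabled (for example, cargo test -p kebab-llm --features mock), since it is off by default.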
Definition of Done
cargo check -p kebab-llm passes
cargo test -p kebab-llm passes
No HTTP / async runtime deps present
PR links design §7.2 LanguageModel, §0 Q5
Out of scope
Real adapter (p4-2).
Exact token counting against the model's actual tokenizer; usage.prompt_tokens is best-effort, as reported by the adapter.
Server-side cancellation / abort signals (P+).
Risks / notes
Real adapters may receive byte chunks that end partway through a multi-byte UTF-8 character. Because TokenChunk::Token carries a String, adapters must buffer incomplete UTF-8 sequences internally and emit only complete characters.
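One way an adapter could do this buffering is with std::str::from_utf8 and the valid_up_to and error_len methods of std::str::Utf8Error, which distinguish a truncated trailing character (error_len() returns None) from genuinely invalid bytes. Utf8Buffer is a hypothetical name, and replacing invalid bytes with U+FFFD is an assumed policy, not something this spec requires.

```rust
/// Hypothetical adapter-side helper: accumulates raw bytes from a backend
/// stream and releases only complete UTF-8 text, holding back a trailing
/// partial character until the next chunk arrives.
#[derive(Default)]
pub struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Appends `bytes` and returns all complete text decoded so far.
    /// Invalid byte sequences become U+FFFD (assumed policy) so the stream never stalls.
    pub fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let mut out = String::new();
        loop {
            match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // from_utf8 guarantees pending[..valid] is valid UTF-8.
                    out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
                    match e.error_len() {
                        // Incomplete character at the end: keep it for the next push.
                        None => {
                            self.pending.drain(..valid);
                            return out;
                        }
                        // Invalid bytes: replace them and keep scanning the rest.
                        Some(bad) => {
                            out.push('\u{FFFD}');
                            self.pending.drain(..valid + bad);
                        }
                    }
                }
            }
        }
    }

    /// Call at end of stream; leftover bytes are an incomplete character.
    pub fn finish(&mut self) -> String {
        let rest = String::from_utf8_lossy(&self.pending).into_owned();
        self.pending.clear();
        rest
    }
}
```

Each String returned by push can then be wrapped in TokenChunk::Token, skipping empty results, so no chunk ever carries half a character.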
TokenChunk::Done { finish_reason, usage } must always fire, even on error: adapters convert errors into FinishReason::Error(msg) and emit a final Done.
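The always-Done rule can be enforced with a small wrapper around the backend stream. This is a sketch under assumptions: the types are hypothetical stand-ins, finalize_stream is a hypothetical helper, FinishReason::Stop is an assumed variant for normal completion, and usage is passed in up front for simplicity, whereas a real adapter would likely fill it from the backend's final response.

```rust
// Hypothetical stand-ins for the kebab-core types.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,          // hypothetical variant for normal completion
    Error(String), // FinishReason::Error(msg)
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: usize,
    pub completion_tokens: usize, // hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

/// Hypothetical adapter helper: turns a fallible backend stream into a
/// TokenChunk stream that always ends with exactly one Done.
/// The first error ends the stream and becomes FinishReason::Error(msg).
pub fn finalize_stream<I, E>(backend: I, usage: TokenUsage) -> impl Iterator<Item = TokenChunk>
where
    I: IntoIterator<Item = Result<String, E>>,
    E: std::fmt::Display,
{
    let mut inner = backend.into_iter();
    let mut finished = false;
    std::iter::from_fn(move || {
        if finished {
            return None; // Done has already been emitted.
        }
        match inner.next() {
            Some(Ok(text)) => Some(TokenChunk::Token(text)),
            Some(Err(err)) => {
                finished = true;
                Some(TokenChunk::Done {
                    finish_reason: FinishReason::Error(err.to_string()),
                    usage: usage.clone(),
                })
            }
            None => {
                finished = true;
                Some(TokenChunk::Done { finish_reason: FinishReason::Stop, usage: usage.clone() })
            }
        }
    })
}
```

Items after the first error are never pulled from the backend, so consumers see a clean Token* Done sequence in both the success and the error case.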