Address 8 issues found in spec audit (post PR #2):
1. §refs label: distinguish design vs report sections in p3-1 / p3-2 / p4-2 /
p9-1 / p9-5 contract_sections (e.g., "report §11.2 Ollama" not "§11.2").
2. mock feature gate: gate MockEmbedder (p3-1) and MockLanguageModel (p4-1)
behind `mock` cargo feature, default OFF; add CI symbol-scan as DoD item.
3. Warning type unification: p1-2 frontmatter now emits
`kb_parse_types::Warning` (matches p1-3 / p1-4); drops crate-internal type.
4. p4-3 streaming thread: explicitly single-threaded inside RagPipeline::ask;
collection + sink.send share the calling thread, no race. UI concurrency
is the caller's responsibility (TUI worker thread pattern in p9-3).
5. p6-2 tesseract version: noted that `tesseract` 0.13 has no stable Rust
`version()` accessor; use TessVersion FFI or shell-out + cache approach.
6. p9-* App struct extensions: introduce `kb_tui::{Library,Search,Ask,Inspect}State`
slots in p9-1 forward-decl form; p9-2/3/4 fill bodies in their own crate
without editing `App`. Parallel-safety contract added.
7. p3-3 cosine score: shift `(sim+1)/2` instead of clamp; preserve ranking
signal between unrelated and opposite vectors. Clamp reserved for NaN.
8. fixtures/ root: p0-1 DoD now creates all fixture subdirs with .gitkeep so
downstream tasks have a stable target path.
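
The shift in item 7 can be sketched as below. This is an illustration only, not the p3-3 code: the function name `shifted_score` and the choice to map NaN to 0.0 are assumptions, since the note only says the clamp is reserved for NaN.

```rust
/// Map a cosine similarity in [-1, 1] onto a score in [0, 1] by shifting,
/// so that opposite vectors (-1) still rank below unrelated ones (0).
/// Hypothetical helper; the NaN fallback value of 0.0 is an assumption.
pub fn shifted_score(sim: f32) -> f32 {
    if sim.is_nan() {
        // NaN carries no ranking signal, so pin it to the lowest score.
        return 0.0;
    }
    (sim + 1.0) / 2.0
}
```

A plain clamp to [0, 1] would score both `sim = -1.0` and `sim = 0.0` as 0.0; the shift keeps them apart at 0.0 and 0.5.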
Provide the kb-llm crate that re-exports the LanguageModel trait and helper types (GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef), plus a MockLanguageModel for downstream tests.
Why now / why this size
kb-rag (p4-3) consumes a LanguageModel trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
Allowed dependencies
kb-core
kb-config
serde
thiserror
tracing
[features] mock = [] — opt-in feature flag exposing MockLanguageModel. Default OFF. Release builds compile mock out entirely.
pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};

/// Test-only deterministic mock. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
pub struct MockLanguageModel {
    pub model_id: String,
    pub provider: String,
    pub context_tokens: usize,
    pub canned_response: String, // emitted token-by-token
    pub canned_finish: kb_core::FinishReason,
    pub canned_usage: kb_core::TokenUsage,
}

#[cfg(feature = "mock")]
impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
Behavior contract
MockLanguageModel::generate_stream produces a Box<dyn Iterator> that yields the canned response one Unicode character at a time as TokenChunk::Token, then a final TokenChunk::Done { finish_reason, usage }.
The mock honors GenerateRequest.stop: if any stop string appears in the canned response, truncate before emitting.
The mock must NOT touch the network or filesystem.
Real adapters (p4-2+) MUST NOT live in this crate.
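
The streaming and stop-string rules above could look roughly like the sketch below. It uses local stand-in types rather than the real `kb_core` definitions, whose exact shapes (design §7.2) may differ. The free functions `truncate_at_stop` and `mock_stream` are hypothetical names, and excluding the stop string itself from the output is an assumption the contract does not spell out.

```rust
// Stand-in types: the real definitions live in kb_core and may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,
    Length,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct TokenUsage {
    pub prompt_tokens: usize,
    pub completion_tokens: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

/// Cut `text` at the earliest occurrence of any non-empty stop string.
/// Assumption: the stop string itself is not part of the output.
pub fn truncate_at_stop<'a>(text: &'a str, stop: &[String]) -> &'a str {
    let cut = stop
        .iter()
        .filter(|s| !s.is_empty())
        .filter_map(|s| text.find(s.as_str()))
        .min()
        .unwrap_or(text.len());
    // `find` returns a byte index on a char boundary, so slicing is safe.
    &text[..cut]
}

/// Yield one `Token` per Unicode scalar value, then exactly one `Done`.
/// No network or filesystem access is involved.
pub fn mock_stream(
    canned: &str,
    stop: &[String],
    finish_reason: FinishReason,
    usage: TokenUsage,
) -> Box<dyn Iterator<Item = TokenChunk>> {
    let tokens: Vec<TokenChunk> = truncate_at_stop(canned, stop)
        .chars()
        .map(|c| TokenChunk::Token(c.to_string()))
        .collect();
    let done = TokenChunk::Done { finish_reason, usage };
    Box::new(tokens.into_iter().chain(std::iter::once(done)))
}
```

With a canned response of `"hello"` and no stop strings this yields five `Token` chunks followed by `Done`; with the stop string `"lo"` the concatenated tokens are `"hel"`.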
Storage / wire effects
None.
Test plan
| kind | description | fixture / data |
| --- | --- | --- |
| unit | mock streams 5 tokens then Done | inline |
| unit | mock honors stop strings | inline |
| unit | trait dyn dispatch via Box<dyn LanguageModel> works | inline |
| unit | concatenation of streamed TokenChunk::Token equals canned text (truncated by stop strings) | inline |
| contract | model_ref() populates provider and leaves dimensions = None | inline |
All tests under cargo test -p kb-llm.
Definition of Done
cargo check -p kb-llm passes
cargo test -p kb-llm passes
No HTTP / async runtime deps present
PR links design §7.2 LanguageModel, §0 Q5
Out of scope
Real adapter (p4-2).
Token counting against the actual tokenizer (best-effort via usage.prompt_tokens reported by the adapter).
Server-side cancellation / abort signals (P+).
Risks / notes
Real adapters may receive byte sequences that end partway through a UTF-8 code point mid-stream; because the trait emits TokenChunk::Token(String), adapters must buffer across UTF-8 boundaries internally.
TokenChunk::Done { usage } must always fire, even on error — adapters convert errors into FinishReason::Error(msg) and a final Done.
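
One way an adapter might handle the UTF-8 boundary buffering noted above is sketched here. `Utf8Buffer` is a hypothetical helper, not part of kb-llm or any adapter; it relies only on the standard library's `std::str::from_utf8` and `Utf8Error::valid_up_to`.

```rust
/// Accumulates raw bytes from a transport and releases only complete UTF-8.
/// Hypothetical helper; the name and API are assumptions for illustration.
#[derive(Default)]
pub struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Append `bytes` and return the longest valid UTF-8 prefix, keeping any
    /// trailing bytes that did not validate for the next call.
    ///
    /// This sketch does not tell truly invalid bytes apart from an incomplete
    /// trailing sequence; a real adapter would likely inspect
    /// `Utf8Error::error_len()` and surface or replace invalid input.
    pub fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let valid_len = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            // `valid_up_to` is the length of the leading valid portion.
            Err(e) => e.valid_up_to(),
        };
        // Keep the unvalidated tail, hand out the validated head.
        let rest = self.pending.split_off(valid_len);
        let ready = std::mem::replace(&mut self.pending, rest);
        String::from_utf8(ready).expect("prefix was validated as UTF-8")
    }
}
```

Feeding the two-byte encoding of `é` (0xC3 0xA9) split across two reads returns `"a"` for `[b'a', 0xC3]` and then `"éb"` for `[0xA9, b'b']`, so each emitted `Token` always holds a valid `String`.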