kebab/tasks/p4/p4-1-llm-trait.md
kb bc1b3147cd refactor(spec): cleanup pass over component specs
Address 8 issues found in spec audit (post PR #2):

1. §refs label: distinguish design vs report sections in p3-1 / p3-2 / p4-2 /
   p9-1 / p9-5 contract_sections (e.g., "report §11.2 Ollama" not "§11.2").
2. mock feature gate: gate MockEmbedder (p3-1) and MockLanguageModel (p4-1)
   behind `mock` cargo feature, default OFF; add CI symbol-scan as DoD item.
3. Warning type unification: p1-2 frontmatter now emits
   `kb_parse_types::Warning` (matches p1-3 / p1-4); drops crate-internal type.
4. p4-3 streaming thread: explicitly single-threaded inside RagPipeline::ask;
   collection + sink.send share the calling thread, no race. UI concurrency
   is the caller's responsibility (TUI worker thread pattern in p9-3).
5. p6-2 tesseract version: noted that `tesseract` 0.13 has no stable Rust
   `version()` accessor; use TessVersion FFI or shell-out + cache approach.
6. p9-* App struct extensions: introduce `kb_tui::{Library,Search,Ask,Inspect}State`
   slots in p9-1 forward-decl form; p9-2/3/4 fill bodies in their own crate
   without editing `App`. Parallel-safety contract added.
7. p3-3 cosine score: shift `(sim+1)/2` instead of clamp; preserve ranking
   signal between unrelated and opposite vectors. Clamp reserved for NaN.
8. fixtures/ root: p0-1 DoD now creates all fixture subdirs with .gitkeep so
   downstream tasks have a stable target path.
2026-04-27 23:38:13 +00:00


phase: P4
component: kb-llm (trait crate)
task_id: p4-1
title: LanguageModel trait + GenerateRequest/TokenChunk
status: planned
depends_on: p0-1
unblocks: p4-2, p4-3
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: §7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef

p4-1 — LanguageModel trait crate

Goal

Provide the kb-llm crate that re-exports the LanguageModel trait and helper types (GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef), plus a MockLanguageModel for downstream tests.

Why now / why this size

kb-rag (p4-3) consumes a LanguageModel trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.

Allowed dependencies

  • kb-core
  • kb-config
  • serde
  • thiserror
  • tracing
  • [features] mock = [] — opt-in feature flag exposing MockLanguageModel. Default OFF. Release builds compile mock out entirely.

Forbidden dependencies

  • reqwest, ureq, tokio, whisper-rs, kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-*, kb-embed*, kb-search, kb-rag, kb-tui, kb-desktop

Inputs

| input | type | source |
|---|---|---|
| GenerateRequest | kb_core::GenerateRequest | RAG pipeline |
| concrete adapter at runtime | dyn LanguageModel | p4-2+ |

Outputs

| output | type | downstream |
|---|---|---|
| streaming TokenChunk iterator | Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send> | RAG pipeline |
| ModelRef identity | kb_core::ModelRef | Answer.model |

Public surface (signatures only; no new types apart from the feature-gated mock)

pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};

/// Test-only deterministic mock. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
pub struct MockLanguageModel {
    pub model_id: String,
    pub provider: String,
    pub context_tokens: usize,
    pub canned_response: String,                 // emitted token-by-token
    pub canned_finish: kb_core::FinishReason,
    pub canned_usage:  kb_core::TokenUsage,
}

#[cfg(feature = "mock")]
impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }

Behavior contract

  • MockLanguageModel::generate_stream produces a Box<dyn Iterator> that yields the canned response one Unicode character at a time as TokenChunk::Token, then a final TokenChunk::Done { finish_reason, usage }.
  • The mock honors GenerateRequest.stop: if any stop string appears in the canned response, the response is truncated at the earliest match before any token is emitted.
  • model_ref() returns ModelRef { id, provider, dimensions: None }.
  • The mock must NOT touch the network or filesystem.
  • Real adapters (p4-2+) MUST NOT live in this crate.
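
The streaming rules above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the enums and structs are local stand-ins for the kb_core types (whose real definitions live in design §7.1), `mock_stream` and `truncate_at_stop` are hypothetical helper names, the error type is a plain String in place of anyhow, "one Unicode character" is read as one Unicode scalar value, and the stop string itself is assumed to be excluded from the output.

```rust
// Local stand-ins for the kb_core types; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,
    Length,
    Error(String),
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct TokenUsage {
    pub prompt_tokens: usize,
    pub completion_tokens: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

/// Cut `text` at the earliest occurrence of any non-empty stop string.
/// Assumption: the matched stop string is not part of the result.
pub fn truncate_at_stop<'a>(text: &'a str, stop: &[String]) -> &'a str {
    let cut = stop
        .iter()
        .filter(|s| !s.is_empty())
        .filter_map(|s| text.find(s.as_str()))
        .min()
        .unwrap_or(text.len());
    // `find` returns a byte offset on a char boundary, so slicing is safe.
    &text[..cut]
}

/// Yield one `Token` per Unicode scalar value, then exactly one `Done`.
/// No network or filesystem access is involved.
pub fn mock_stream(
    canned: &str,
    stop: &[String],
    finish: FinishReason,
    usage: TokenUsage,
) -> Box<dyn Iterator<Item = Result<TokenChunk, String>> + Send> {
    // Truncation happens up front, before the first token is produced.
    let tokens: Vec<String> = truncate_at_stop(canned, stop)
        .chars()
        .map(|c| c.to_string())
        .collect();
    let done = TokenChunk::Done { finish_reason: finish, usage };
    Box::new(
        tokens
            .into_iter()
            .map(|t| Ok(TokenChunk::Token(t)))
            .chain(std::iter::once(Ok(done))),
    )
}
```

Collecting the tokens before boxing keeps the returned iterator free of borrows, which is what lets it satisfy the `+ Send` bound without lifetime parameters.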

Storage / wire effects

  • None.

Test plan

| kind | description | fixture / data |
|---|---|---|
| unit | mock streams 5 tokens then Done | inline |
| unit | mock honors stop strings | inline |
| unit | trait dyn dispatch via Box<dyn LanguageModel> works | inline |
| unit | concatenation of streamed TokenChunk::Token equals canned text (truncated by stop strings) | inline |
| contract | model_ref() populates provider and leaves dimensions = None | inline |

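All tests run under cargo test -p kb-llm. Because the mock feature is off by default, tests that use MockLanguageModel need it enabled (for example, cargo test -p kb-llm --features mock).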
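
As a rough illustration of the dyn-dispatch and model_ref() cases in the test plan, the sketch below uses hypothetical stand-ins: the real trait and ModelRef come from kb_core (design §7.2, §3.8), the `Option<usize>` type for `dimensions` is an assumption, `boxed_mock` is an invented helper, and `generate_stream` plus the `#[cfg(feature = "mock")]` gate are left out for brevity.

```rust
// Hypothetical stand-in for kb_core::ModelRef; the field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelRef {
    pub id: String,
    pub provider: String,
    pub dimensions: Option<usize>,
}

// Reduced stand-in for kb_core::LanguageModel (generate_stream omitted).
pub trait LanguageModel: Send + Sync {
    fn model_ref(&self) -> ModelRef;
}

// Trimmed-down mock carrying only the fields this sketch needs.
pub struct MockLanguageModel {
    pub model_id: String,
    pub provider: String,
}

impl LanguageModel for MockLanguageModel {
    fn model_ref(&self) -> ModelRef {
        ModelRef {
            id: self.model_id.clone(),
            provider: self.provider.clone(),
            // A language model reports no embedding dimensions.
            dimensions: None,
        }
    }
}

/// Build the mock behind a trait object, as a consumer holding
/// `Box<dyn LanguageModel>` would see it.
pub fn boxed_mock() -> Box<dyn LanguageModel> {
    Box::new(MockLanguageModel {
        model_id: "mock-1".to_string(),
        provider: "mock".to_string(),
    })
}
```

A test would then call `boxed_mock().model_ref()` and assert on `provider` and `dimensions` without naming the concrete type.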

Definition of Done

  • cargo check -p kb-llm passes
  • cargo test -p kb-llm passes
  • No HTTP / async runtime deps present
  • PR links design §7.2 LanguageModel, §0 Q5

Out of scope

  • Real adapter (p4-2).
  • Token counting against the actual tokenizer (best-effort via usage.prompt_tokens reported by the adapter).
  • Server-side cancellation / abort signals (P+).

Risks / notes

  • Real backends can deliver incomplete UTF-8 byte sequences mid-stream; because the trait emits TokenChunk::Token(String), adapters must buffer across UTF-8 boundaries internally.
  • TokenChunk::Done { finish_reason, usage } must always fire, even on error: adapters convert errors into FinishReason::Error(msg) and still emit a final Done.
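
One way an adapter might satisfy both notes is sketched below, under stated assumptions. The types are local stand-ins for kb_core, `Utf8Buffer` and `to_chunks` are hypothetical names, the backend is modelled as an iterator of byte chunks with String errors, and the result is collected into a Vec for brevity where a real adapter would return the boxed iterator. The `completion_tokens` count here is a placeholder that counts emitted chunks, not tokenizer tokens.

```rust
// Local stand-ins for the kb_core types; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,
    Error(String),
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct TokenUsage {
    pub completion_tokens: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done { finish_reason: FinishReason, usage: TokenUsage },
}

/// Holds bytes that do not yet form complete UTF-8.
#[derive(Default)]
pub struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Append raw bytes and return the longest valid UTF-8 prefix, if any.
    /// An incomplete trailing sequence stays buffered for the next call.
    pub fn push(&mut self, bytes: &[u8]) -> Result<Option<String>, String> {
        self.pending.extend_from_slice(bytes);
        let valid_len = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            // error_len() == None means the input ended mid-character.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            Err(e) => return Err(format!("invalid UTF-8 at byte {}", e.valid_up_to())),
        };
        if valid_len == 0 {
            return Ok(None);
        }
        let rest = self.pending.split_off(valid_len);
        let ready = std::mem::replace(&mut self.pending, rest);
        Ok(Some(String::from_utf8(ready).expect("prefix was validated above")))
    }
}

/// Convert backend byte chunks into TokenChunks; always ends with one Done.
pub fn to_chunks<I>(backend: I) -> Vec<TokenChunk>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
{
    let mut buf = Utf8Buffer::default();
    let mut out = Vec::new();
    let mut usage = TokenUsage::default();
    let mut finish = FinishReason::Stop;
    for item in backend {
        match item.and_then(|bytes| buf.push(&bytes)) {
            Ok(Some(text)) => {
                usage.completion_tokens += 1; // placeholder: counts chunks
                out.push(TokenChunk::Token(text));
            }
            Ok(None) => {} // only a partial character so far
            Err(msg) => {
                finish = FinishReason::Error(msg);
                break;
            }
        }
    }
    // Leftover bytes at end of stream mean a character was cut off.
    if finish == FinishReason::Stop && !buf.pending.is_empty() {
        finish = FinishReason::Error("stream ended mid-character".to_string());
    }
    out.push(TokenChunk::Done { finish_reason: finish, usage });
    out
}
```

For example, feeding the two chunks `[b'h', 0xC3]` and `[0xA9, b'!']` holds back the lone `0xC3` until its continuation byte arrives, so the consumer sees "h" and then "é!" rather than a broken string.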