phase, component, task_id, title, status, depends_on, unblocks, contract_source, contract_sections
phase
component
task_id
title
status
depends_on
unblocks
contract_source
contract_sections
P0
workspace + kb-core + kb-config + kb-app + kb-cli
p0-1
Workspace skeleton + frozen domain types/traits + ID recipe + facade
planned
p1-1
p1-2
p1-3
p1-4
p1-5
p1-6
p2-1
p2-2
p3-1
p3-2
p3-3
p3-4
p4-1
p4-2
p4-3
p5-1
p5-2
p6-1
p6-2
p6-3
p7-1
p7-2
p8-1
p8-2
p9-1
p9-2
p9-3
p9-4
p9-5
../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
§3 (all)
§4
§5.1 schema_meta+migrations
§6 (config + XDG)
§7 (all traits)
§8 module boundaries
§9 versioning
§10 errors+exit codes
§2.8 wire schema_version
p0-1 — Workspace skeleton + frozen contracts
Goal
Stand up the Cargo workspace (Rust 2024, resolver=3) with kb-core, kb-config, kb-app, kb-cli crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
Why now / why this size
Every other task imports kb-core. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
Allowed dependencies
workspace [workspace.dependencies]: anyhow = "1", thiserror = "2", serde = { version = "1", features = ["derive"] }, serde_json = "1", time = { version = "0.3", features = ["serde", "macros"] }, uuid = { version = "1", features = ["v7", "serde"] }, blake3 = "1", tracing = "0.1"
per crate:
kb-core: workspace deps + serde_json::Map, serde-json-canonicalizer, unicode-normalization
kb-config: workspace deps + toml = "0.8", dirs = "5" (XDG paths)
kb-app: workspace deps + kb-core, kb-config, tracing-subscriber, tracing-appender
kb-cli: workspace deps + kb-core, kb-config, kb-app, clap = { version = "4", features = ["derive"] }
Forbidden dependencies
kb-core MUST NOT depend on any other kb-* crate.
kb-config MUST NOT depend on kb-app, kb-cli, parsers, stores, embedders, search, llm, rag, tui, desktop.
kb-app MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return unimplemented!() or anyhow::bail!("not yet wired (Pn-i)")).
kb-cli MUST NOT call any non-kb-app crate directly.
Inputs
input
type
source
frozen design doc
Markdown
docs/superpowers/specs/2026-04-27-kb-final-form-design.md
user kb invocation
command-line args
end user
Outputs
output
type
downstream consumer
compiling workspace
Rust crates
every later task
kb-core types/traits
Rust API
every other crate
kb-core ID functions
Rust API
parsers, normalize, chunkers, embedders, search, rag
kb-config::Config
Rust struct
every other crate
kb-app facade methods (stubs)
Rust API
kb-cli, future TUI/desktop
kb binary
executable
end user
docs/wire-schema/v1/*.schema.json stubs
JSON Schema files
future wire emitters and consumers
docs/spec/*.md stubs (link to frozen design)
Markdown
future contributors
Public surface (signatures only — no new types)
All types/traits below are defined in kb-core exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
// ── kb-core ─────────────────────────────────────────────────────────────────
// Newtype IDs (design §3.1) — Display + FromStr implemented.
pub struct AssetId ( pub String );
pub struct DocumentId ( pub String );
pub struct BlockId ( pub String );
pub struct ChunkId ( pub String );
pub struct EmbeddingId ( pub String );
pub struct IndexId ( pub String );
// Versions / labels (§3.2)
pub struct ParserVersion ( pub String );
pub struct ChunkerVersion ( pub String );
pub struct EmbeddingModelId ( pub String );
pub struct EmbeddingVersion ( pub String );
pub struct IndexVersion ( pub String );
pub struct PromptTemplateVersion ( pub String );
pub struct SchemaVersion ( pub & 'static str );
// Forward-declared (§3.7a)
pub struct OcrText { /* per §3.7a */ }
pub struct OcrRegion { /* per §3.7a */ }
pub struct ModelCaption { /* per §3.7a */ }
pub struct Transcript { /* per §3.7a */ }
pub struct TranscriptSegment { /* per §3.7a */ }
pub struct Checksum ( pub String );
pub struct Lang ( pub String );
pub enum ImageType { Png , Jpeg , Webp , Gif , Tiff , Other ( String ) }
pub enum AudioType { M4a , Mp3 , Wav , Flac , Ogg , Other ( String ) }
// RawAsset (§3.3)
pub struct RawAsset { /* per §3.3 */ }
pub enum SourceUri { File ( std ::path ::PathBuf ), Kb ( String ) }
pub struct WorkspacePath ( pub String );
pub enum MediaType { Markdown , Pdf , Image ( ImageType ), Audio ( AudioType ), Other ( String ) }
pub enum AssetStorage { Copied { path : std ::path ::PathBuf }, Reference { path : std ::path ::PathBuf , sha : Checksum } }
// CanonicalDocument + Block + SourceSpan + Inline (§3.4)
pub struct CanonicalDocument { /* per §3.4 */ }
pub enum Block { /* per §3.4 */ }
pub struct CommonBlock { /* per §3.4 */ }
pub struct HeadingBlock { /* per §3.4 */ }
pub struct TextBlock { /* per §3.4 */ }
pub struct ListBlock { /* per §3.4 */ }
pub struct CodeBlock { /* per §3.4 */ }
pub struct TableBlock { /* per §3.4 */ }
pub struct ImageRefBlock { /* per §3.4 */ }
pub struct AudioRefBlock { /* per §3.4 */ }
pub enum Inline { /* per §3.4 */ }
pub enum SourceSpan { /* per §3.4 */ }
// ParsedBlock (intermediate, exposed via core for normalize — see p1-4 spec)
pub struct ParsedBlock { /* mirror of Block without BlockId */ }
// Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ }
pub enum Citation { /* 5 variants per §3.5 */ }
impl Citation {
pub fn path ( & self ) -> & WorkspacePath ;
pub fn to_uri ( & self ) -> String ; // W3C Media Fragments per §0 Q3
pub fn parse ( s : & str ) -> anyhow ::Result < Self > ;
}
// Metadata + Provenance (§3.6)
pub struct Metadata { /* per §3.6 */ }
pub enum SourceType { Markdown , Note , Paper , Reference , Inbox }
pub enum TrustLevel { Primary , Secondary , Generated }
pub struct Provenance { /* per §3.6 */ }
pub struct ProvenanceEvent { /* per §3.6 */ }
pub enum ProvenanceKind { Discovered , Parsed , Normalized , Chunked , OcrApplied , CaptionApplied , Transcribed , Embedded , Indexed , Warning , Error }
// Search types (§3.7)
pub enum SearchMode { Lexical , Vector , Hybrid }
pub struct SearchQuery { /* per §3.7 */ }
pub struct SearchFilters { /* per §3.7 */ }
pub struct SearchHit { /* per §3.7 */ }
pub struct RetrievalDetail { /* per §3.7 */ }
pub struct DocFilter { /* tags_any/lang/path_glob/trust_min */ }
pub struct DocSummary { /* per §2.5 wire — mirrored internally */ }
// Answer / RAG (§3.8)
pub struct Answer { /* per §3.8 */ }
pub struct AnswerCitation { /* per §3.8 */ }
pub enum RefusalReason { ScoreGate , LlmSelfJudge , NoIndex , NoChunks }
pub struct ModelRef { /* per §3.8 */ }
pub struct AnswerRetrievalSummary { /* per §3.8 */ }
pub struct TokenUsage { /* per §3.8 */ }
pub struct TraceId ( pub String );
// IngestReport (mirrored from wire §2.4 for facade return)
pub struct IngestReport { /* per §2.4 */ }
pub struct IngestItem { /* per §2.4 items */ }
// JobRepo support types (forward-declared; full shapes can land here)
pub enum JobKind { Ingest , Chunk , Embed , Ocr , Transcribe , Reindex , Doctor }
pub enum JobStatus { Pending , Running , Succeeded , Failed , Canceled }
pub struct JobId ( pub String );
pub struct JobFilter { /* status/kind */ }
pub struct JobRow { /* row mirror */ }
// Vector (forward-declared per §7.2)
pub struct VectorRecord { /* chunk_id, embedding_id, vector, doc_id, text, heading_path, model_id, model_version, dimensions */ }
pub struct VectorHit { /* chunk_id, score, payload */ }
// Errors (§10)
#[derive(Debug, thiserror::Error)]
pub enum CoreError {
#[error( "invalid id: {0}" )] InvalidId ( String ),
#[error( "invalid citation: {0}" )] InvalidCitation ( String ),
#[error( "invalid source span: {0}" )] InvalidSpan ( String ),
#[error( "malformed input: {0}" )] Malformed ( String ),
}
// ── Traits (§7.2) ───────────────────────────────────────────────────────────
pub trait SourceConnector { fn scan ( & self , scope : & SourceScope ) -> anyhow ::Result < Vec < RawAsset >> ; }
pub trait Extractor : Send + Sync {
fn supports ( & self , media_type : & MediaType ) -> bool ;
fn parser_version ( & self ) -> ParserVersion ;
fn extract ( & self , ctx : & ExtractContext , bytes : & [ u8 ]) -> anyhow ::Result < CanonicalDocument > ;
}
pub trait Chunker : Send + Sync {
fn chunker_version ( & self ) -> ChunkerVersion ;
fn policy_hash ( & self , policy : & ChunkPolicy ) -> String ;
fn chunk ( & self , doc : & CanonicalDocument , policy : & ChunkPolicy ) -> anyhow ::Result < Vec < Chunk >> ;
}
pub trait Embedder : Send + Sync {
fn model_id ( & self ) -> EmbeddingModelId ;
fn model_version ( & self ) -> EmbeddingVersion ;
fn dimensions ( & self ) -> usize ;
fn embed ( & self , inputs : & [ EmbeddingInput < '_ > ]) -> anyhow ::Result < Vec < Vec < f32 >>> ;
}
pub trait Retriever : Send + Sync {
fn search ( & self , query : & SearchQuery ) -> anyhow ::Result < Vec < SearchHit >> ;
fn index_version ( & self ) -> IndexVersion ;
}
pub trait LanguageModel : Send + Sync {
fn model_ref ( & self ) -> ModelRef ;
fn context_tokens ( & self ) -> usize ;
fn generate_stream ( & self , req : GenerateRequest )
-> anyhow ::Result < Box < dyn Iterator < Item = anyhow ::Result < TokenChunk >> + Send >> ;
}
pub trait DocumentStore { /* full set per §7.2 */ }
pub trait VectorStore { /* full set per §7.2 */ }
pub trait JobRepo { /* full set per §7.2 */ }
// Helper input types (§7.1)
pub struct SourceScope { pub root : std ::path ::PathBuf , pub include : Vec < String > , pub exclude : Vec < String > }
pub struct ExtractContext < 'a > { /* per §7.1 */ }
pub struct ExtractConfig { /* TBD by extractors; carry path-only for now */ }
pub struct ChunkPolicy { /* per §7.1 */ }
pub enum EmbeddingKind { Document , Query }
pub struct EmbeddingInput < 'a > { pub text : & 'a str , pub kind : EmbeddingKind }
pub struct GenerateRequest { /* per §7.1 */ }
pub enum TokenChunk { Token ( String ), Done { finish_reason : FinishReason , usage : TokenUsage } }
pub enum FinishReason { Stop , Length , Aborted , Error ( String ) }
// ── ID functions (§4.2) ─────────────────────────────────────────────────────
pub fn id_from < T : serde ::Serialize > ( tuple : T ) -> String ; // hex prefix 32
pub fn id_for_asset ( asset_blake3_full_hex : & str ) -> AssetId ;
pub fn id_for_doc ( workspace_path : & WorkspacePath , asset : & AssetId , parser_version : & ParserVersion ) -> DocumentId ;
pub fn id_for_block ( doc : & DocumentId , block_kind : & str , heading_path : & [ String ], ordinal : u32 , span : & SourceSpan ) -> BlockId ;
pub fn id_for_chunk ( doc : & DocumentId , chunker_version : & ChunkerVersion , block_ids : & [ BlockId ], policy_hash : & str ) -> ChunkId ;
pub fn id_for_embedding ( chunk : & ChunkId , model : & EmbeddingModelId , version : & EmbeddingVersion , dims : usize ) -> EmbeddingId ;
pub fn id_for_index ( collection : & str , model : & EmbeddingModelId , dims : usize , version : & IndexVersion , kind : & str , params_hash : & str ) -> IndexId ;
pub fn to_posix ( path : & std ::path ::Path ) -> WorkspacePath ; // §6.6
pub fn nfc ( input : & str ) -> String ; // §4.1
P0 facade implementations call anyhow::bail!("not yet wired (P<n>-<i>)"); later phases replace bodies but never change signatures.
Behavior contract
Workspace Cargo.toml sets resolver = "3", [workspace.package] edition = "2024", rust-version = "1.85".
Every newtype ID implements Display (returns inner) and FromStr (validates hex length 32).
id_from uses serde-json-canonicalizer exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
Citation::to_uri emits W3C Media Fragments URIs per §0 Q3 (#L<a>-L<b>, #p=<n>, #xywh=…, #caption, #t=hh:mm:ss,hh:mm:ss[&speaker=…]).
Citation::parse is the strict inverse (round-trip property).
kb-config resolves XDG paths via dirs crate; respects XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME, XDG_STATE_HOME if set.
Config layer order: defaults → file → env (KB_<SECTION>_<KEY>) → CLI flag (CLI override is applied by kb-cli after Config::load).
kb-cli global flags: --config <path>, --verbose, --debug, --json, --explain (where applicable). On --json, output conforms to wire schema v1.
kb-cli exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
All facade-returned wire objects emit schema_version per §2 (e.g., "answer.v1", "search_hit.v1").
Storage / wire effects
Filesystem: creates ~/.config/kb/, ~/.local/share/kb/, ~/KnowledgeBase/ only when kb init runs; never on Config::load.
Wire schemas: ships docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json as stubs declaring the top-level schema_version and required fields per §2. Full property validation can land later.
DB: workspace ships migrations/V001__init.sql containing only §5.1 schema_meta + migrations tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within kb-core/kb-config/kb-app/kb-cli scope).
Logging: tracing initialized in kb-cli; daily-rolling file in ~/.local/state/kb/logs/.
Test plan
kind
description
fixture / data
unit
id_from deterministic across 1000 runs for fixed inputs
inline
unit
each id_for_* recipe matches design §4.2 byte-for-byte (verify against fixed expected hex)
inline
unit
to_posix collapses ./a//b.md → a/b.md and NFC-normalizes Korean
inline
unit
Citation::to_uri and parse round-trip for all 5 variants
inline
unit
newtype Display/FromStr rejects invalid lengths/chars
inline
unit
Config::defaults + env override + CLI override produces expected merged config
inline
snapshot
Config::defaults JSON serde stable
inline (round-trip)
smoke
kb --help, kb init, kb doctor run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint)
tmp XDG_* env
build
cargo check --workspace and cargo test --workspace pass
repo
All tests must run with no network, no Ollama, no models.
Definition of Done
Cargo.toml workspace lists kb-core, kb-config, kb-app, kb-cli and resolver=3, edition 2024
cargo check --workspace passes
cargo test --workspace passes
kb --help prints subcommands
kb init creates XDG dirs idempotently and writes config.toml
kb doctor returns wire JSON conforming to doctor.v1 (in --json mode)
docs/wire-schema/v1/*.schema.json stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
docs/spec/ stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
No imports outside Allowed dependencies (CI deny check)
PR body links design §3, §4, §6, §7, §8, §9, §10
Out of scope
Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
Wire schema deep validation (only required fields + schema_version checked here).
Real kb-app business logic (functions stub with unimplemented!() or explicit bail!).
Risks / notes
ID recipe is the contract that every later record depends on. Any change after this task lands forces a parser_version / chunker_version / embedding_version cascade per §9. Treat changes as schema migrations and update the design doc first.
Newtype IDs use String (not [u8; 16]) to keep serde simple; tests must still enforce 32-char hex constraint on FromStr.
kb-app stubs must use bail! not panic! so the CLI exits with code 2 cleanly per §10.
clap v4 derive: subcommand inspect has nested doc / chunk variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
XDG path discovery on macOS: spec uses XDG (not Application Support) per §6.1 — dirs crate honors XDG env vars; tests must set them explicitly.