kebab/tasks/p0/p0-1-skeleton.md
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit. Updates every occurrence of the word `kb` across all .md files in one pass.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the Rust
  module path notation `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix is used
  when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- design doc / initial report / SMOKE / HOTFIXES / phase epics / task
  specs: all references unified.
- The `git -c user.name=kb` in task-decomposition.md is author
  information recording past git history, so it is kept as-is (the
  author in the actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` as
  well (left unreflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (every renamed file is updated to
  its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00


phase: P0
component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli
task_id: p0-1
title: Workspace skeleton + frozen domain types/traits + ID recipe + facade
status: completed
depends_on: (none)
unblocks: p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections:
  • §3 (all)
  • §4
  • §5.1 schema_meta+migrations
  • §6 (config + XDG)
  • §7 (all traits)
  • §8 module boundaries
  • §9 versioning
  • §10 errors+exit codes
  • §2.8 wire schema_version
p0-1 — Workspace skeleton + frozen contracts

Goal

Stand up the Cargo workspace (Rust 2024, resolver=3) with kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.

Why now / why this size

Every other task imports kebab-core. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.

Allowed dependencies

  • workspace [workspace.dependencies]: anyhow = "1", thiserror = "2", serde = { version = "1", features = ["derive"] }, serde_json = "1", time = { version = "0.3", features = ["serde", "macros"] }, uuid = { version = "1", features = ["v7", "serde"] }, blake3 = "1", tracing = "0.1"
  • per crate:
    • kebab-core: workspace deps + serde_json::Map, serde-json-canonicalizer, unicode-normalization
    • kebab-parse-types: workspace deps + kebab-core ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
    • kebab-config: workspace deps + toml = "0.8", dirs = "5" (XDG paths)
    • kebab-app: workspace deps + kebab-core, kebab-config, tracing-subscriber, tracing-appender
    • kebab-cli: workspace deps + kebab-core, kebab-config, kebab-app, clap = { version = "4", features = ["derive"] }

Forbidden dependencies

  • kebab-core MUST NOT depend on any other kebab-* crate.
  • kebab-parse-types MUST depend ONLY on kebab-core. No parser libraries (pulldown-cmark, pdf-extract, image, whisper-rs, …), no other kebab-* crate.
  • kebab-config MUST NOT depend on kebab-app, kebab-cli, parsers, stores, embedders, search, llm, rag, tui, desktop.
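  • kebab-app MUST NOT yet depend on parsers/stores/embedders/search/llm/rag. Those crates do not exist yet, so facade methods stub out with anyhow::bail!("not yet wired (P<n>-<i>)"). Do not use unimplemented!() here: it panics, whereas the stubs must return an error so the CLI can exit with code 2 per §10.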
  • kebab-cli MUST NOT call any non-kebab-app crate directly.

Inputs

| input | type | source |
|---|---|---|
| frozen design doc | Markdown | docs/superpowers/specs/2026-04-27-kebab-final-form-design.md |
| user kebab invocation | command-line args | end user |

Outputs

| output | type | downstream consumer |
|---|---|---|
| compiling workspace | Rust crates | every later task |
| kebab-core types/traits | Rust API | every other crate |
| kebab-core ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| kebab-config::Config | Rust struct | every other crate |
| kebab-app facade methods (stubs) | Rust API | kebab-cli, future TUI/desktop |
| kebab binary | executable | end user |
| docs/wire-schema/v1/*.schema.json stubs | JSON Schema files | future wire emitters and consumers |
| docs/spec/*.md stubs (link to frozen design) | Markdown | future contributors |

Public surface (signatures only — no new types)

All types/traits below are defined in kebab-core exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.

// ── kebab-core ─────────────────────────────────────────────────────────────────

// Newtype IDs (design §3.1) — Display + FromStr implemented.
pub struct AssetId(pub String);
pub struct DocumentId(pub String);
pub struct BlockId(pub String);
pub struct ChunkId(pub String);
pub struct EmbeddingId(pub String);
pub struct IndexId(pub String);

// Versions / labels (§3.2)
pub struct ParserVersion(pub String);
pub struct ChunkerVersion(pub String);
pub struct EmbeddingModelId(pub String);
pub struct EmbeddingVersion(pub String);
pub struct IndexVersion(pub String);
pub struct PromptTemplateVersion(pub String);
pub struct SchemaVersion(pub &'static str);

// Forward-declared (§3.7a)
pub struct OcrText { /* per §3.7a */ }
pub struct OcrRegion { /* per §3.7a */ }
pub struct ModelCaption { /* per §3.7a */ }
pub struct Transcript { /* per §3.7a */ }
pub struct TranscriptSegment { /* per §3.7a */ }
pub struct Checksum(pub String);
pub struct Lang(pub String);
pub enum   ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) }
pub enum   AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }

// RawAsset (§3.3)
pub struct RawAsset { /* per §3.3 */ }
pub enum   SourceUri { File(std::path::PathBuf), Kb(String) }
pub struct WorkspacePath(pub String);
pub enum   MediaType { Markdown, Pdf, Image(ImageType), Audio(AudioType), Other(String) }
pub enum   AssetStorage { Copied { path: std::path::PathBuf }, Reference { path: std::path::PathBuf, sha: Checksum } }

// CanonicalDocument + Block + SourceSpan + Inline (§3.4)
pub struct CanonicalDocument { /* per §3.4 */ }
pub enum   Block { /* per §3.4 */ }
pub struct CommonBlock { /* per §3.4 */ }
pub struct HeadingBlock { /* per §3.4 */ }
pub struct TextBlock { /* per §3.4 */ }
pub struct ListBlock { /* per §3.4 */ }
pub struct CodeBlock { /* per §3.4 */ }
pub struct TableBlock { /* per §3.4 */ }
pub struct ImageRefBlock { /* per §3.4 */ }
pub struct AudioRefBlock { /* per §3.4 */ }
pub enum   Inline { /* per §3.4 */ }
pub enum   SourceSpan { /* per §3.4 */ }

// (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.)

// Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ }
pub enum   Citation { /* 5 variants per §3.5 */ }
impl Citation {
    pub fn path(&self) -> &WorkspacePath;
    pub fn to_uri(&self) -> String;          // W3C Media Fragments per §0 Q3
    pub fn parse(s: &str) -> anyhow::Result<Self>;
}

// Metadata + Provenance (§3.6)
pub struct Metadata { /* per §3.6 */ }
pub enum   SourceType { Markdown, Note, Paper, Reference, Inbox }
pub enum   TrustLevel { Primary, Secondary, Generated }
pub struct Provenance { /* per §3.6 */ }
pub struct ProvenanceEvent { /* per §3.6 */ }
pub enum   ProvenanceKind { Discovered, Parsed, Normalized, Chunked, OcrApplied, CaptionApplied, Transcribed, Embedded, Indexed, Warning, Error }

// Search types (§3.7)
pub enum   SearchMode { Lexical, Vector, Hybrid }
pub struct SearchQuery { /* per §3.7 */ }
pub struct SearchFilters { /* per §3.7 */ }
pub struct SearchHit { /* per §3.7 */ }
pub struct RetrievalDetail { /* per §3.7 */ }
pub struct DocFilter { /* tags_any/lang/path_glob/trust_min */ }
pub struct DocSummary { /* per §2.5 wire — mirrored internally */ }

// Answer / RAG (§3.8)
pub struct Answer { /* per §3.8 */ }
pub struct AnswerCitation { /* per §3.8 */ }
pub enum   RefusalReason { ScoreGate, LlmSelfJudge, NoIndex, NoChunks }
pub struct ModelRef { /* per §3.8 */ }
pub struct AnswerRetrievalSummary { /* per §3.8 */ }
pub struct TokenUsage { /* per §3.8 */ }
pub struct TraceId(pub String);

// IngestReport (mirrored from wire §2.4 for facade return)
pub struct IngestReport { /* per §2.4 */ }
pub struct IngestItem { /* per §2.4 items */ }

// JobRepo support types (forward-declared; full shapes can land here)
pub enum   JobKind { Ingest, Chunk, Embed, Ocr, Transcribe, Reindex, Doctor }
pub enum   JobStatus { Pending, Running, Succeeded, Failed, Canceled }
pub struct JobId(pub String);
pub struct JobFilter { /* status/kind */ }
pub struct JobRow { /* row mirror */ }

// Vector (forward-declared per §7.2)
pub struct VectorRecord { /* chunk_id, embedding_id, vector, doc_id, text, heading_path, model_id, model_version, dimensions */ }
pub struct VectorHit { /* chunk_id, score, payload */ }

// Errors (§10)
#[derive(Debug, thiserror::Error)]
pub enum CoreError {
    #[error("invalid id: {0}")]      InvalidId(String),
    #[error("invalid citation: {0}")] InvalidCitation(String),
    #[error("invalid source span: {0}")] InvalidSpan(String),
    #[error("malformed input: {0}")]  Malformed(String),
}

// ── Traits (§7.2) ───────────────────────────────────────────────────────────
pub trait SourceConnector { fn scan(&self, scope: &SourceScope) -> anyhow::Result<Vec<RawAsset>>; }
pub trait Extractor: Send + Sync {
    fn supports(&self, media_type: &MediaType) -> bool;
    fn parser_version(&self) -> ParserVersion;
    fn extract(&self, ctx: &ExtractContext, bytes: &[u8]) -> anyhow::Result<CanonicalDocument>;
}
pub trait Chunker: Send + Sync {
    fn chunker_version(&self) -> ChunkerVersion;
    fn policy_hash(&self, policy: &ChunkPolicy) -> String;
    fn chunk(&self, doc: &CanonicalDocument, policy: &ChunkPolicy) -> anyhow::Result<Vec<Chunk>>;
}
pub trait Embedder: Send + Sync {
    fn model_id(&self) -> EmbeddingModelId;
    fn model_version(&self) -> EmbeddingVersion;
    fn dimensions(&self) -> usize;
    fn embed(&self, inputs: &[EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
}
pub trait Retriever: Send + Sync {
    fn search(&self, query: &SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
    fn index_version(&self) -> IndexVersion;
}
pub trait LanguageModel: Send + Sync {
    fn model_ref(&self) -> ModelRef;
    fn context_tokens(&self) -> usize;
    fn generate_stream(&self, req: GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<TokenChunk>> + Send>>;
}
pub trait DocumentStore { /* full set per §7.2 */ }
pub trait VectorStore { /* full set per §7.2 */ }
pub trait JobRepo { /* full set per §7.2 */ }

// Helper input types (§7.1)
pub struct SourceScope { pub root: std::path::PathBuf, pub include: Vec<String>, pub exclude: Vec<String> }
pub struct ExtractContext<'a> { /* per §7.1 */ }
pub struct ExtractConfig { /* TBD by extractors; carry path-only for now */ }
pub struct ChunkPolicy { /* per §7.1 */ }
pub enum   EmbeddingKind { Document, Query }
pub struct EmbeddingInput<'a> { pub text: &'a str, pub kind: EmbeddingKind }
pub struct GenerateRequest { /* per §7.1 */ }
pub enum   TokenChunk { Token(String), Done { finish_reason: FinishReason, usage: TokenUsage } }
pub enum   FinishReason { Stop, Length, Aborted, Error(String) }

// ── ID functions (§4.2) ─────────────────────────────────────────────────────
pub fn id_from<T: serde::Serialize>(tuple: T) -> String;        // hex prefix 32
pub fn id_for_asset(asset_blake3_full_hex: &str) -> AssetId;
pub fn id_for_doc(workspace_path: &WorkspacePath, asset: &AssetId, parser_version: &ParserVersion) -> DocumentId;
pub fn id_for_block(doc: &DocumentId, block_kind: &str, heading_path: &[String], ordinal: u32, span: &SourceSpan) -> BlockId;
pub fn id_for_chunk(doc: &DocumentId, chunker_version: &ChunkerVersion, block_ids: &[BlockId], policy_hash: &str) -> ChunkId;
pub fn id_for_embedding(chunk: &ChunkId, model: &EmbeddingModelId, version: &EmbeddingVersion, dims: usize) -> EmbeddingId;
pub fn id_for_index(collection: &str, model: &EmbeddingModelId, dims: usize, version: &IndexVersion, kind: &str, params_hash: &str) -> IndexId;

pub fn to_posix(path: &std::path::Path) -> WorkspacePath;       // §6.6
pub fn nfc(input: &str) -> String;                              // §4.1
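
The to_posix signature above could be realized roughly as follows. This is a sketch under assumptions, not the §6.6 definition: it drops `.` segments and duplicate separators, keeps `..` literally, discards root/prefix components (assuming workspace paths are relative), and leaves out the NFC step, which would need the unicode-normalization crate. WorkspacePath is redeclared locally so the snippet stands alone.

```rust
use std::path::{Component, Path};

/// Local stand-in for kebab-core's `WorkspacePath` newtype.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkspacePath(pub String);

/// Collapse a path into a forward-slash form.
/// ASSUMPTIONS: `..` is kept as a literal segment, root/prefix components
/// are dropped, and NFC normalization is intentionally omitted here.
pub fn to_posix(path: &Path) -> WorkspacePath {
    let mut segments: Vec<String> = Vec::new();
    for component in path.components() {
        match component {
            // Ordinary file or directory name.
            Component::Normal(name) => segments.push(name.to_string_lossy().into_owned()),
            // Kept verbatim; resolving it is out of scope for this sketch.
            Component::ParentDir => segments.push("..".to_string()),
            // `.`, `/` and Windows prefixes carry no workspace-relative meaning.
            Component::CurDir | Component::RootDir | Component::Prefix(_) => {}
        }
    }
    WorkspacePath(segments.join("/"))
}
```

`Path::components` already ignores repeated separators, so the function only has to filter and re-join the segments.
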
// ── kebab-parse-types ──────────────────────────────────────────────────────────
// Per design §3.7b. Defines parser intermediate representations consumed by
// kebab-normalize. Depends on kebab-core only — never on parser libraries.

pub struct ParsedBlock {
    pub kind: ParsedBlockKind,
    pub heading_path: Vec<String>,
    pub source_span: kebab_core::SourceSpan,
    pub payload: ParsedPayload,
}

pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRef, AudioRef }

pub enum ParsedPayload {
    Heading   { level: u8, text: String },
    Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
    List      { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
    Code      { lang: Option<String>, code: String },
    Table     { headers: Vec<String>, rows: Vec<Vec<String>> },
    Quote     { text: String, inlines: Vec<kebab_core::Inline> },
    ImageRef  { src: String, alt: String },
    AudioRef  { src: String },
}

// `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it.

pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }

// Forward-ref for P6/P7/P8 — defined when those phases land.
pub struct ParsedImageRegion;
pub struct ParsedPdfPage;
pub struct ParsedAudioSegment;
// ── kebab-config ───────────────────────────────────────────────────────────────
pub struct Config { /* full schema per §6.4 */ }
impl Config {
    pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>;
    pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>;
    pub fn defaults() -> Self;
    pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self;
    pub fn xdg_config_path() -> std::path::PathBuf;             // ~/.config/kebab/config.toml
    pub fn xdg_data_dir() -> std::path::PathBuf;                // ~/.local/share/kebab
    pub fn xdg_cache_dir() -> std::path::PathBuf;
    pub fn xdg_state_dir() -> std::path::PathBuf;
}
// ── kebab-app ──────────────────────────────────────────────────────────────────
pub fn init_workspace(force: bool) -> anyhow::Result<()>;
pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result<kebab_core::IngestReport>;
pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result<kebab_core::Chunk>;
pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> }
pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> }

P0 facade implementations call anyhow::bail!("not yet wired (P<n>-<i>)"); later phases replace bodies but never change signatures.

// ── kebab-cli ──────────────────────────────────────────────────────────────────
// clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder)
// Each maps 1:1 to a kebab_app function. Exit code mapping per §10.
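
A minimal sketch of how the CLI layer might turn a facade result into the exit codes listed in this spec (0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy). The `Outcome` enum and both function names are hypothetical, and a boxed std error stands in for anyhow so the snippet has no external dependencies.

```rust
use std::error::Error;

/// Hypothetical summary of what a facade call produced, as seen by the CLI.
pub enum Outcome {
    Success,
    NoHitOrRefusal,
    DoctorUnhealthy,
}

/// Stand-in for a P0 facade stub. The spec uses `anyhow::bail!`; returning
/// `Err` from a boxed std error has the same effect for this illustration.
pub fn facade_stub() -> Result<Outcome, Box<dyn Error>> {
    Err("not yet wired (P<n>-<i>)".into())
}

/// Map a facade result onto the process exit code.
pub fn exit_code(result: &Result<Outcome, Box<dyn Error>>) -> i32 {
    match result {
        Ok(Outcome::Success) => 0,
        Ok(Outcome::NoHitOrRefusal) => 1,
        Err(_) => 2,
        Ok(Outcome::DoctorUnhealthy) => 3,
    }
}
```

Because the stub returns an error instead of panicking, one wrapper of this shape can serve every subcommand, and an unwired command exits with code 2.
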

Behavior contract

  • Workspace Cargo.toml sets resolver = "3", [workspace.package] edition = "2024", rust-version = "1.85".
  • Every newtype ID implements Display (returns inner) and FromStr (validates hex length 32).
  • id_from uses serde-json-canonicalizer exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
  • Citation::to_uri emits W3C Media Fragments URIs per §0 Q3 (#L<a>-L<b>, #p=<n>, #xywh=…, #caption, #t=hh:mm:ss,hh:mm:ss[&speaker=…]).
  • Citation::parse is the strict inverse (round-trip property).
  • kebab-config resolves XDG paths via dirs crate; respects XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME, XDG_STATE_HOME if set.
  • Config layer order: defaults → file → env (KEBAB_<SECTION>_<KEY>) → CLI flag (CLI override is applied by kebab-cli after Config::load).
  • kebab-cli global flags: --config <path>, --verbose, --debug, --json, --explain (where applicable). On --json, output conforms to wire schema v1.
  • kebab-cli exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
  • All facade-returned wire objects emit schema_version per §2 (e.g., "answer.v1", "search_hit.v1").
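
The newtype ID rule above (Display returns the inner string, FromStr validates a 32-character hex value) might look like this for one ID type. It is a sketch: `InvalidId` is a local stand-in for CoreError::InvalidId, and accepting lowercase hex only is an assumption, not something this spec states.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DocumentId(pub String);

/// Local stand-in for `CoreError::InvalidId`.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidId(pub String);

impl fmt::Display for DocumentId {
    // Display prints the inner string unchanged.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl FromStr for DocumentId {
    type Err = InvalidId;

    // Accept exactly 32 hex characters.
    // ASSUMPTION: lowercase only; uppercase digits are rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let is_hex = s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if s.len() == 32 && is_hex {
            Ok(DocumentId(s.to_string()))
        } else {
            Err(InvalidId(s.to_string()))
        }
    }
}
```

The same pair of impls would repeat for each ID type, which makes a small declarative macro a natural next step.
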

Storage / wire effects

  • Filesystem: creates ~/.config/kebab/, ~/.local/share/kebab/, ~/KnowledgeBase/ only when kebab init runs; never on Config::load.
  • Wire schemas: ships docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json as stubs declaring the top-level schema_version and required fields per §2. Full property validation can land later.
  • DB: the workspace ships migrations/V001__init.sql containing only the §5.1 schema_meta + migrations tables. Of the two options, landing the full schema in p1-6's migration file or having p0-1 pre-stage an empty migrations directory, choose the former to keep this task within kebab-core/kebab-config/kebab-app/kebab-cli scope.
  • Logging: tracing initialized in kebab-cli; daily-rolling file in ~/.local/state/kebab/logs/.

Test plan

| kind | description | fixture / data |
|---|---|---|
| unit | id_from deterministic across 1000 runs for fixed inputs | inline |
| unit | each id_for_* recipe matches design §4.2 byte-for-byte (verify against fixed expected hex) | inline |
| unit | to_posix collapses ./a//b.md to a/b.md and NFC-normalizes Korean | inline |
| unit | Citation::to_uri and parse round-trip for all 5 variants | inline |
| unit | newtype Display/FromStr rejects invalid lengths/chars | inline |
| unit | Config::defaults + env override + CLI override produces expected merged config | inline |
| snapshot | Config::defaults JSON serde stable | inline (round-trip) |
| smoke | kebab --help, kebab init, kebab doctor run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp XDG_* env |
| build | cargo check --workspace and cargo test --workspace pass | repo |

All tests must run with no network, no Ollama, no models.
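
To illustrate the Citation round-trip property from the test plan, here is a sketch covering two of the fragment forms named in the behavior contract (#L<a>-L<b> and #p=<n>). The variant names, the String path field, and the String error type are hypothetical stand-ins; the real enum has 5 variants per §3.5 and parse returns anyhow::Result.

```rust
/// Hypothetical two-variant subset of `Citation`, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Citation {
    Lines { path: String, start: u32, end: u32 },
    Page { path: String, page: u32 },
}

impl Citation {
    /// Emit `<path>#<fragment>` using the fragment forms from the spec.
    pub fn to_uri(&self) -> String {
        match self {
            Citation::Lines { path, start, end } => format!("{path}#L{start}-L{end}"),
            Citation::Page { path, page } => format!("{path}#p={page}"),
        }
    }

    /// Parse what `to_uri` emits; anything else is an error.
    pub fn parse(s: &str) -> Result<Self, String> {
        // Split on the last '#', so the fragment is everything after it.
        let (path, frag) = s
            .rsplit_once('#')
            .ok_or_else(|| format!("missing fragment: {s}"))?;
        if let Some(n) = frag.strip_prefix("p=") {
            let page = n.parse::<u32>().map_err(|_| format!("bad page: {s}"))?;
            return Ok(Citation::Page { path: path.to_string(), page });
        }
        if let Some((a, b)) = frag.strip_prefix('L').and_then(|r| r.split_once("-L")) {
            let start = a.parse::<u32>().map_err(|_| format!("bad start line: {s}"))?;
            let end = b.parse::<u32>().map_err(|_| format!("bad end line: {s}"))?;
            return Ok(Citation::Lines { path: path.to_string(), start, end });
        }
        Err(format!("unsupported fragment: {s}"))
    }
}
```

A unit test would then assert `Citation::parse(&c.to_uri()) == Ok(c)` for one value of each variant.
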

Definition of Done

  • Cargo.toml workspace lists kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli and resolver=3, edition 2024
  • cargo check --workspace passes
  • cargo test --workspace passes
  • cargo tree -p kebab-parse-types lists ONLY kebab-core and workspace-level deps such as serde and thiserror (no parser libs, no other kebab-*)
  • kebab --help prints subcommands
  • kebab init creates XDG dirs idempotently and writes config.toml
  • kebab doctor returns wire JSON conforming to doctor.v1 (in --json mode)
  • docs/wire-schema/v1/*.schema.json stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
  • docs/spec/ stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
  • fixtures/ root directory created with all subdirectories that downstream tasks reference: fixtures/markdown/, fixtures/source-fs/, fixtures/search/lexical/, fixtures/search/hybrid/, fixtures/embed/, fixtures/vector/, fixtures/rag/, fixtures/eval/, fixtures/image/, fixtures/pdf/, fixtures/audio/. Each subdir gets a .gitkeep so it tracks. P1 ships at minimum fixtures/markdown/{simple-note,nested-headings,code-and-table}.md (per epic phase-0); other dirs stay empty until their phase lands.
  • No imports outside Allowed dependencies (CI deny check)
  • PR body links design §3, §3.7b, §4, §6, §7, §8, §9, §10

Out of scope

  • Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
  • Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
  • Wire schema deep validation (only required fields + schema_version checked here).
  • Real kebab-app business logic (functions stub with an explicit bail!; unimplemented!() would panic instead of returning an error).

Risks / notes

  • ID recipe is the contract that every later record depends on. Any change after this task lands forces a parser_version / chunker_version / embedding_version cascade per §9. Treat changes as schema migrations and update the design doc first.
  • Newtype IDs use String (not [u8; 16]) to keep serde simple; tests must still enforce 32-char hex constraint on FromStr.
  • kebab-app stubs must use bail! not panic! so the CLI exits with code 2 cleanly per §10.
  • clap v4 derive: subcommand inspect has nested doc / chunk variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
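  • XDG path discovery on macOS: spec uses XDG (not Application Support) per §6.1. The dirs crate consults the XDG_* env vars only on Linux; on macOS it returns paths under ~/Library regardless of them, so kebab-config has to check the XDG_* variables itself before falling back to the ~/.config-style defaults. Tests must set them explicitly.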
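
One way to keep XDG resolution identical across platforms is to read the variables directly and fall back to the home-relative default. The sketch below assumes a function that takes the environment as an explicit map (unlike the zero-argument Config::xdg_config_path in the public surface) so tests can control it. The empty-value fallback follows the XDG Base Directory convention.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Resolve `<config home>/kebab/config.toml` from an explicit environment map.
/// ASSUMPTION: taking the map as a parameter is a testing convenience; the
/// spec's `Config::xdg_config_path()` takes no arguments.
pub fn xdg_config_path(env: &HashMap<String, String>) -> Option<PathBuf> {
    // An unset or empty XDG_CONFIG_HOME falls back to $HOME/.config.
    let base = match env.get("XDG_CONFIG_HOME").filter(|v| !v.is_empty()) {
        Some(dir) => PathBuf::from(dir),
        None => PathBuf::from(env.get("HOME")?).join(".config"),
    };
    Some(base.join("kebab").join("config.toml"))
}
```

The data, cache, and state directories would follow the same pattern with XDG_DATA_HOME, XDG_CACHE_HOME, and XDG_STATE_HOME.
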