phase, component, task_id, title, status, depends_on, unblocks, contract_source, contract_sections
phase
component
task_id
title
status
depends_on
unblocks
contract_source
contract_sections
P1
kb-source-fs
p1-1
Local filesystem source connector
planned
../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
§3.3
§6.2
§6.6
§7.1
§7.2 SourceConnector
§8
p1-1 — Local filesystem source connector
Goal
Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums, and produce Vec<RawAsset>.
Why now / why this size
SourceConnector is the entry point of every ingest. Stable RawAsset output unblocks every downstream P1 task (parser, normalize, chunk, store). Small enough to deliver in one PR with full test coverage.
Allowed dependencies
kb-core
kb-config
ignore (gitignore semantics)
blake3
walkdir
time
serde
thiserror
tracing
Forbidden dependencies
kb-parse-*, kb-normalize, kb-chunk, kb-store-*, kb-embed*, kb-search, kb-llm*, kb-rag, kb-tui, kb-desktop
Inputs
input
type
source
SourceScope
kb_core::SourceScope
kb-app from config
filesystem
&Path
OS
.kbignore
text file
workspace root, optional
Outputs
output
type
downstream consumer
Vec<RawAsset>
kb_core::RawAsset
kb-parse-md, asset writer in kb-store-sqlite (via kb-app)
Public surface (signatures only — no new types)
Behavior contract
POSIX-normalize every emitted workspace_path (NFC, leading ./ stripped, single /).
asset_id derived per design §4.2 from blake3(raw bytes) full hex.
media_type selected from extension + libmagic-like sniff fallback (.md → Markdown, others fall through to MediaType::Other).
discovered_at = current OffsetDateTime::now_utc() at scan time.
Combine config.workspace.exclude ∪ .kbignore for filter (union; ordering does not matter).
Symbolic links: follow once, detect cycles via canonicalize + visited set.
Files larger than storage.copy_threshold_mb MB → emit AssetStorage::Reference { path, sha } (do not copy bytes here; copying is done by the asset writer task).
Idempotent: same input → same Vec<RawAsset> (sort by workspace_path).
Storage / wire effects
Reads: filesystem under config.workspace.root.
Writes: nothing. (Asset copy is handled by the asset writer in kb-store-sqlite.)
Test plan
kind
description
fixture / data
unit
POSIX path normalization
inline cases incl. ./a/b.md, a//b.md, a/b.md → identical
unit
blake3 of known bytes matches expected hex
inline
unit
gitignore filter (*.tmp, node_modules/**) excludes correctly
tmp tree built in test
unit
.kbignore ∪ config exclude works
tmp tree
unit
symlink cycle does not loop
tmp tree with a -> b -> a
snapshot
Vec<RawAsset> serialized JSON for fixture tree is stable
fixtures/source-fs/tree-1
determinism
re-running scan twice produces byte-identical JSON
fixtures/source-fs/tree-1
All tests run under cargo test -p kb-source-fs with no network and no model.
Definition of Done
Out of scope
File watching (P+).
Asset copy/reference storage on disk (kb-store-sqlite task p1-6).
Non-fs source connectors (HTTP, S3 — P+).
Risks / notes
BLAKE3 of large files (>1 GB) is fast but allocate streaming; do not load whole file in memory.
macOS resource forks / .DS_Store should be excluded by default.