Add the workspace member with the dep allow-list pinned by design §0 Q9
and the task spec. P1-2 will land the frontmatter submodule in the next
commit; P1-3 will add the block parser as a sibling.
Notable choice: serde_yaml (dtolnay) was archived as unmaintained in 2024
so we use serde_yaml_ng, the maintained fork. lingua's per-language
features are explicitly enabled (default-features=false) to keep build
time + binary size sane — only the languages we need at parse time.
Walk config.workspace.root, apply gitignore-style filters
(config.workspace.exclude ∪ .kbignore ∪ baked-in defaults for
.DS_Store / ._*), stream BLAKE3 over each file, and emit a
deterministic Vec<RawAsset> sorted by workspace_path.
Modules:
- hash: streaming blake3::Hasher + 64 KiB read buffer (no whole-file
loads); pinned digests for empty input and "hello world".
- media: extension → MediaType (markdown/pdf/image/audio/other).
- walker: ignore::OverrideBuilder for filter union; walkdir with
manual visited-set cycle protection on top of follow_links.
- connector: public FsSourceConnector::new(&Config) +
SourceConnector::scan(&SourceScope) impl. Uses
kb_core::to_posix for WorkspacePath construction (carries
P0-1 # rejection through unchanged) and kb_core::id_for_asset
for AssetId derivation. Storage variant signals intent only;
actual byte copy is P1-6's responsibility.
Per design §3.3, §6.2, §6.6, §7.1, §7.2, §8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>