Persist RawAsset, CanonicalDocument, Blocks, and Chunks into SQLite per design §5; copy raw asset bytes into data_dir/assets/<aa>/<asset_id> (or record a reference to the source file if the asset is larger than the copy threshold); record an ingest_runs row.
Why now / why this size
P1's terminal task. Closes the loop walk → parse → chunk → store. The FTS5 virtual table and triggers are intentionally deferred to p2-1 to keep this task focused on the relational schema and asset I/O.
Allowed dependencies
kebab-core
kebab-config
rusqlite (use the bundled feature; keep bundled-sqlcipher disabled)
refinery for migrations
serde_json
time
blake3 (asset copy verification)
tracing
thiserror
Forbidden dependencies
kebab-source-fs (only types via kebab-core)
kebab-parse-md
kebab-normalize
kebab-chunk (only types via kebab-core)
kebab-store-vector
kebab-embed*
kebab-search
kebab-llm*
kebab-rag
kebab-tui
kebab-desktop
DDL: migrations/V001__init.sql ships exactly the SQL in design §5.1, §5.2, §5.3, §5.4, §5.5 (chunks table only; the FTS table and triggers come in p2-1 as V002), §5.7 (jobs, ingest_runs, answers, eval_runs, eval_query_results), and §5.6 (embedding_records).
Pragmas at open: foreign_keys=ON, journal_mode=WAL, synchronous=NORMAL, temp_store=MEMORY.
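A minimal sketch of how the open-time pragmas above could be kept in one place and rendered as a single SQL batch. The names OPEN_PRAGMAS and open_pragma_batch are hypothetical and not part of the design. The resulting string is meant to be handed to something like rusqlite's Connection::execute_batch right after opening the connection.

```rust
/// Pragmas applied at open, in the order the spec lists them.
/// (Hypothetical constant name; the values come from the line above.)
const OPEN_PRAGMAS: [(&str, &str); 4] = [
    ("foreign_keys", "ON"),
    ("journal_mode", "WAL"),
    ("synchronous", "NORMAL"),
    ("temp_store", "MEMORY"),
];

/// Render the pragmas as one SQL batch, one statement per line.
/// Assumption: the caller runs this outside any transaction, since
/// SQLite ignores changes to foreign_keys while a transaction is open.
fn open_pragma_batch() -> String {
    OPEN_PRAGMAS
        .iter()
        .map(|(name, value)| format!("PRAGMA {} = {};", name, value))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping the list in a constant makes it easy for a test to assert that no pragma is dropped when the open path is refactored.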
One ingest of one document = one transaction (BEGIN..COMMIT). Partial failures roll back; warnings are not failures.
Bulk ingest commits per-document.
Asset writer:
if asset.byte_len <= storage.copy_threshold_mb * 1_048_576: write bytes to assets_dir/<asset_id[..2]>/<asset_id> (mode 0o644), record storage_kind='copied'.
else: do not copy; record storage_kind='reference' with storage_path = asset.source_uri's file path.
In either case, recompute blake3 of the source bytes once on write/verify and store in assets.checksum. Mismatch → return StoreError::Conflict.
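The copy-or-reference decision above can be sketched as a pure function, separate from the actual file I/O and hashing. This is an illustration under stated assumptions: StorageKind, AssetPlan, and plan_asset are hypothetical names, the source file path is passed in already resolved from asset.source_uri, and an asset_id shorter than two characters is treated as an error.

```rust
use std::path::{Path, PathBuf};

/// Mirrors the storage_kind values 'copied' and 'reference'.
#[derive(Debug, PartialEq)]
enum StorageKind {
    Copied,
    Reference,
}

/// Hypothetical planning result: what to record, and where the bytes live.
#[derive(Debug, PartialEq)]
struct AssetPlan {
    kind: StorageKind,
    storage_path: PathBuf,
}

const BYTES_PER_MB: u64 = 1_048_576;

/// Decide whether an asset is copied into assets_dir or referenced in place.
/// `source_path` is assumed to be the file path taken from asset.source_uri.
fn plan_asset(
    assets_dir: &Path,
    asset_id: &str,
    byte_len: u64,
    copy_threshold_mb: u64,
    source_path: &Path,
) -> Result<AssetPlan, String> {
    let threshold = copy_threshold_mb.saturating_mul(BYTES_PER_MB);
    if byte_len <= threshold {
        // Shard directory is the first two characters of the id: asset_id[..2].
        let shard = asset_id
            .get(..2)
            .ok_or_else(|| format!("asset_id too short to shard: {:?}", asset_id))?;
        Ok(AssetPlan {
            kind: StorageKind::Copied,
            storage_path: assets_dir.join(shard).join(asset_id),
        })
    } else {
        // Over the threshold: leave the bytes where they are.
        Ok(AssetPlan {
            kind: StorageKind::Reference,
            storage_path: source_path.to_path_buf(),
        })
    }
}
```

Note that the comparison is inclusive: an asset exactly at the threshold is copied. The blake3 checksum and the 0o644 file mode are applied by the writer that consumes this plan and are not shown here.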
Idempotency: re-ingesting the same (workspace_path, asset_id, parser_version) updates documents.updated_at, increments doc_version, and replaces blocks/chunks. No row duplication.
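The idempotency rule above can be modeled in memory to pin down the expected behavior before it is written against SQLite. This is only a sketch: DocKey, DocRow, and MemStore are hypothetical names, the timestamp is supplied by the caller, and starting doc_version at 1 on first ingest is an assumption the source does not state.

```rust
use std::collections::HashMap;

/// Identity of one ingested document, per the idempotency rule.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct DocKey {
    workspace_path: String,
    asset_id: String,
    parser_version: String,
}

/// Simplified stand-in for a documents row plus its chunks.
#[derive(Debug, Clone, PartialEq)]
struct DocRow {
    doc_version: u32,
    updated_at: u64,
    chunks: Vec<String>,
}

/// In-memory model of the store; the real one is backed by SQLite.
#[derive(Default)]
struct MemStore {
    docs: HashMap<DocKey, DocRow>,
}

impl MemStore {
    /// Insert or re-ingest a document and return its new doc_version.
    /// Assumption: the first ingest yields doc_version 1.
    fn put(&mut self, key: DocKey, chunks: Vec<String>, now: u64) -> u32 {
        let row = self.docs.entry(key).or_insert(DocRow {
            doc_version: 0,
            updated_at: now,
            chunks: Vec::new(),
        });
        row.doc_version += 1; // bump the version on every put
        row.updated_at = now; // refresh the timestamp
        row.chunks = chunks; // replace old chunks, never append
        row.doc_version
    }
}
```

In the SQLite implementation the same effect would typically come from an upsert on the documents row followed by deleting and reinserting blocks and chunks inside the per-document transaction.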
document_tags: re-derived from Metadata.tags on each put.
ingest_runs.items_json is null when caller passes summary_only=true.
All wire JSON returned (IngestReport) conforms to docs/wire-schema/v1/ingest_report.schema.json. Fail loudly if the schema file is not present (the caller must vendor it).