State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20
--config across subcommands, PR #24 --config in kb ask, PR #25 RRF
fusion_score normalization). README still framed the project as
"spec frozen, code 0 lines"; phase docs and task specs all carried
status: planned. Sweep:
- README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes
applied"; command table marks each subcommand's owning phase and
current status (kb ask = ✅ via P4-3, kb eval = ⏳ P5);
phase roadmap table grew a Status column (P0–P4 completed, P5
next, P6–P9 pending); component count bumped 30 → 31 to reflect
P3-5 (app-wiring, post-spec); core decisions table notes the
RRF [0,1] normalization invariant; build+실행 ("build + run") section drops the
"P0 미시작" ("P0 not started") caveat; new pointers to HOTFIXES.md and SMOKE.md.
- docs/SMOKE.md (new): ~80-line recipe for running the full
pipeline against an isolated /tmp/kb-smoke/ workspace via
--config, without polluting ~/.config/kb/ or
~/.local/share/kb/. Covers fixture seeding, sample config.toml
with the post-merge defaults, doctor → ingest → list →
search × 3 modes → inspect → ask sequence, verification
checklist, and known-behaviour notes (fastembed model
download, RAG response time, --config hard-fail on missing
path).
- tasks/phase-{0..4}-*.md: status frontmatter flipped planned →
completed.
- tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same
status flip across all 17 component task specs (1+6+2+5+3).
P5–P9 stay planned.
cargo test --workspace: 319 passed; clippy clean (no source
changes in this commit, just docs + frontmatter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Persist RawAsset, CanonicalDocument, Blocks, and Chunks into SQLite per design §5; copy raw asset bytes into data_dir/assets/<aa>/<asset_id> (or record a reference to the source file when the asset is larger than the copy threshold); record an ingest_runs row.
Why now / why this size
This is P1's terminal task: it closes the walk → parse → chunk → store loop. The FTS5 virtual table and triggers are intentionally deferred to p2-1 to keep this task focused on the relational schema and asset I/O.
Allowed dependencies
kb-core
kb-config
rusqlite (enable the bundled feature; leave bundled-sqlcipher disabled)
refinery for migrations
serde_json
time
blake3 (asset copy verification)
tracing
thiserror
Forbidden dependencies
kb-source-fs (only types via kb-core), kb-parse-md, kb-normalize, kb-chunk (only types via kb-core), kb-store-vector, kb-embed*, kb-search, kb-llm*, kb-rag, kb-tui, kb-desktop
DDL: migrations/V001__init.sql ships exactly the SQL in design §5.1, §5.2, §5.3, §5.4, §5.5 (chunks table only; the FTS table and triggers come in p2-1 as V002), §5.6 (embedding_records), and §5.7 (jobs, ingest_runs, answers, eval_runs, eval_query_results).
Pragmas at open: foreign_keys=ON, journal_mode=WAL, synchronous=NORMAL, temp_store=MEMORY.
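Each document is ingested in exactly one transaction (BEGIN..COMMIT). A failure partway through a document rolls back that document's writes; warnings are not failures and do not trigger a rollback.
Bulk ingest commits once per document, not once per batch.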
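The transaction rule above can be sketched without a database. This is a minimal in-memory model, not the kb-store-sqlite implementation: FakeDb, Step, DocOutcome, ingest_document, and bulk_ingest are hypothetical names introduced only for illustration, and "staging" stands in for what BEGIN..COMMIT would do in SQLite.

```rust
/// One write a document wants to perform; `Err` models a failure mid-document.
/// (Hypothetical type, for illustration only.)
pub type Step = Result<String, String>;

/// Result of ingesting a single document.
#[derive(Debug, PartialEq, Eq)]
pub enum DocOutcome {
    Committed { rows: usize, warnings: usize },
    RolledBack { error: String },
}

/// Hypothetical stand-in for the database: rows become visible only on commit.
#[derive(Default)]
pub struct FakeDb {
    pub committed: Vec<String>,
}

/// Ingest one document as one unit of work.
pub fn ingest_document(db: &mut FakeDb, steps: &[Step], warnings: &[String]) -> DocOutcome {
    // BEGIN: stage writes outside the committed state.
    let mut staged = Vec::new();
    for step in steps {
        match step {
            Ok(row) => staged.push(row.clone()),
            // ROLLBACK: drop everything staged so far; `committed` is untouched.
            Err(e) => return DocOutcome::RolledBack { error: e.clone() },
        }
    }
    // COMMIT: warnings are counted and reported but never block the commit.
    let rows = staged.len();
    db.committed.extend(staged);
    DocOutcome::Committed { rows, warnings: warnings.len() }
}

/// Bulk ingest: each document commits (or rolls back) on its own.
pub fn bulk_ingest(db: &mut FakeDb, docs: &[(Vec<Step>, Vec<String>)]) -> Vec<DocOutcome> {
    docs.iter()
        .map(|(steps, warnings)| ingest_document(db, steps, warnings))
        .collect()
}
```

In this model, a document that fails on its second write leaves no rows behind, while the documents before and after it in the same batch are still committed.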
Asset writer:
if asset.byte_len <= storage.copy_threshold_mb * 1_048_576: write bytes to assets_dir/<asset_id[..2]>/<asset_id> (mode 0o644), record storage_kind='copied'.
else: do not copy; record storage_kind='reference' with storage_path = asset.source_uri's file path.
In either case, recompute blake3 of the source bytes once on write/verify and store in assets.checksum. Mismatch → return StoreError::Conflict.
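The copy-versus-reference decision above might look roughly like the following. This is a sketch under stated assumptions: StoragePlan and plan_asset_storage are hypothetical names, the source file path is passed in directly rather than derived from asset.source_uri, and the actual byte copy, the 0o644 mode, and the blake3 checksum are deliberately left out.

```rust
use std::path::{Path, PathBuf};

/// Where an asset's bytes will live; mirrors the two storage_kind values.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum StoragePlan {
    /// Bytes are copied to `assets_dir/<asset_id[..2]>/<asset_id>`.
    Copied { dest: PathBuf },
    /// Bytes stay in place; only the source file path is recorded.
    Reference { storage_path: PathBuf },
}

impl StoragePlan {
    /// Value to record in the storage_kind column.
    pub fn storage_kind(&self) -> &'static str {
        match self {
            StoragePlan::Copied { .. } => "copied",
            StoragePlan::Reference { .. } => "reference",
        }
    }
}

/// Decide whether an asset is copied or referenced.
pub fn plan_asset_storage(
    asset_id: &str,
    byte_len: u64,
    copy_threshold_mb: u64,
    assets_dir: &Path,
    source_path: &Path,
) -> StoragePlan {
    // Threshold is configured in MiB; saturating_mul avoids overflow on huge values.
    let threshold_bytes = copy_threshold_mb.saturating_mul(1_048_576);
    if byte_len <= threshold_bytes {
        // Shard directory is the first two characters of the id.
        // Assumption: ids are ASCII; a shorter id falls back to the whole id.
        let shard = asset_id.get(..2).unwrap_or(asset_id);
        StoragePlan::Copied {
            dest: assets_dir.join(shard).join(asset_id),
        }
    } else {
        StoragePlan::Reference {
            storage_path: source_path.to_path_buf(),
        }
    }
}
```

Note that the comparison is inclusive: an asset of exactly copy_threshold_mb MiB is still copied, and one byte more switches it to a reference.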
Idempotency: re-ingesting the same (workspace_path, asset_id, parser_version) updates documents.updated_at, increments doc_version, replaces blocks/chunks. No row duplication.
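To make the idempotency rule concrete, here is a small in-memory model of the expected behaviour. It is a sketch, not the SQLite implementation: InMemoryStore, StoredDoc, and DocKey are hypothetical names, timestamps are plain integers, and starting doc_version at 1 on first ingest is an assumption the spec does not state.

```rust
use std::collections::HashMap;

/// Identity of a stored document: (workspace_path, asset_id, parser_version).
pub type DocKey = (String, String, String);

/// Hypothetical stand-in for a documents row plus its chunks.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredDoc {
    pub doc_version: u32,
    pub updated_at: u64,
    pub chunks: Vec<String>,
}

/// Hypothetical in-memory store keyed by document identity.
#[derive(Default)]
pub struct InMemoryStore {
    docs: HashMap<DocKey, StoredDoc>,
}

impl InMemoryStore {
    /// Insert or re-ingest a document and return its new doc_version.
    pub fn put(&mut self, key: DocKey, chunks: Vec<String>, now: u64) -> u32 {
        // Same key maps to the same entry, so no duplicate row can appear.
        let doc = self.docs.entry(key).or_insert(StoredDoc {
            doc_version: 0, // assumption: the first ingest yields version 1
            updated_at: now,
            chunks: Vec::new(),
        });
        doc.doc_version += 1;
        doc.updated_at = now;
        // Replace, never append: old chunks do not survive a re-ingest.
        doc.chunks = chunks;
        doc.doc_version
    }

    /// Number of stored documents.
    pub fn len(&self) -> usize {
        self.docs.len()
    }

    /// Look up a document by its identity.
    pub fn get(&self, key: &DocKey) -> Option<&StoredDoc> {
        self.docs.get(key)
    }
}
```

Calling put twice with the same key leaves one document whose doc_version has advanced by one and whose chunks are entirely those of the second call.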
document_tags: re-derived from Metadata.tags on each put.
ingest_runs.items_json is NULL when the caller passes summary_only=true.
All wire JSON returned (IngestReport) conforms to docs/wire-schema/v1/ingest_report.schema.json. Fail loudly if the schema file is not present; the caller must vendor it.