Commit Graph

5 Commits

Author SHA1 Message Date
15b4d80efc p1-6: rename StoreError::Sqlx -> Sqlite, drop dead assets_root helper
M1: `Sqlx` is a misleading leftover — this crate uses `rusqlite`, not
sqlx. Rename the variant (and the doc reference to it) to `Sqlite`. No
external pattern matches; the variant is reached only via `#[from]`.

M11: `assets_root` was an `#[allow(dead_code)]` helper introduced for a
test that never landed. Delete it so the dead-code allow goes with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:33:50 +00:00
e41279de96 p1-6: harden store boundary (atomic asset write, poison-tolerant mutex, AssetId validation)
Three Important review fixes on the kb-store-sqlite write path:

I1. Atomic asset write. put_asset_with_bytes now stages bytes to
    `<final>.tmp.<pid>.<n>`, fsyncs, UPSERTs the row, then `rename`s into
    place (atomic on POSIX same-fs). On any failure between staging and
    rename we best-effort `remove_file` the temp so the previous orphan
    risk on UPSERT failure is gone. Reference mode is unchanged (no I/O,
    no orphan risk).

I2. Poison-tolerant mutex. New `lock_conn` helper does
    `.lock().unwrap_or_else(|p| p.into_inner())`, so a single panic mid-
    transaction no longer poisons every subsequent store call. The
    rusqlite Transaction Drop already rolls back on panic, leaving the
    Connection state safe to reuse. All 13 prior `.expect("sqlite mutex
    poisoned")` sites in store.rs / documents.rs / jobs.rs now route
    through `lock_conn`.

I3. AssetId shape validation. `kb_core::AssetId(pub String)` lets a
    hand-construction bypass the `FromStr` 32-hex invariant. Added
    `validate_asset_id` (32 ASCII hex chars) at every store entry that
    turns an AssetId into a path: `put_asset_with_bytes` and
    `DocumentStore::put_asset`. This shuts a potential path-traversal via
    `assets_path_for`'s `&id[..2]` shard slice.

Tests:
- `put_asset_with_bytes_orphan_cleanup_on_upsert_failure` — pre-seeds a
  row that takes the same `workspace_path` (UNIQUE), so the UPSERT trips
  a constraint not covered by `ON CONFLICT(asset_id)`. Asserts no final
  file and no leaked `*.tmp.*`.
- `put_asset_with_bytes_rejects_invalid_asset_id` — passes
  `AssetId("../etc/passwd_padded_to_xx_xxxxx")` (32 chars, contains `/`).
  Asserts error and zero filesystem artifacts under `data_dir/assets/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:33:19 +00:00
efdb71b1c3 p1-6: list_documents filter coverage test
Round-trip three docs (en/ko, varied tags, varied trust) and exercise
each DocFilter axis: default (all), lang, path_glob (workspace_path
GLOB), tags_any (intersection via document_tags subquery + per-row tag
hydration), and trust_min (Primary > Secondary > Generated rank gate).
2026-04-30 17:14:17 +00:00
111f40ddf0 p1-6: kb-store-sqlite test suite (8 categories)
All 8 test categories from the task plan, plus a JobRepo subset:

  migration   — tests/migration.rs: fresh DB after run_migrations
                exposes every required §5 table + index.
  unit (copy) — tests/asset_writer.rs: copy mode writes file with
                mode 0o644 + correct bytes.
  unit (ref)  — tests/asset_writer.rs: reference mode does not write
                file; row records source path.
  unit (cs)   — tests/asset_writer.rs: tampered checksum returns a
                Conflict-flavoured anyhow error.
  unit (idem) — tests/idempotency.rs: same put_document twice → 1 row,
                doc_version 1→2; tags re-derived.
  unit (rb)   — tests/idempotency.rs: put_blocks with FK violation
                rolls back; pre-existing rows unchanged.
  contract    — tests/contract_roundtrip.rs: drives kb-parse-md +
                kb-normalize + kb-chunk on
                fixtures/markdown/code-and-table.md, persists, then
                reloads via DocumentStore::get_document /
                get_chunk and asserts byte-equal round-trip.
  snapshot    — tests/ingest_report_snapshot.rs +
                snapshots/ingest_report.snapshot.json: pin the wire
                JSON form of kb_core::IngestReport for an inline
                fixture run.
  jobs        — tests/jobs.rs: create → progress → finish flow;
                error message round-trip; list filters on status/kind.

Drops the unused `serde` direct dep from Cargo.toml; serde_json brings
its own. Dev-deps confirmed via `cargo tree -p kb-store-sqlite --depth 1`
to live only in the dev tree.
2026-04-30 17:13:03 +00:00
a3390d5171 p1-6: scaffold kb-store-sqlite crate + V001 full §5 DDL
New workspace member crate `kb-store-sqlite` (allowed deps only:
kb-core, kb-config, rusqlite[bundled], refinery, serde, serde_json,
time, blake3, tracing, anyhow, thiserror; dev-deps add kb-parse-md /
kb-normalize / kb-chunk for the contract round-trip test).

Migration V001 replaces the P0-1 stub with the full §5 DDL (assets,
documents, document_tags, blocks, chunks with policy_hash,
embedding_records, jobs, ingest_runs, answers, eval_runs,
eval_query_results) plus the §5 indexes. FTS5 virtual table + triggers
remain deferred to V002 (P2-1).

Public surface per task spec:
  SqliteStore::open / run_migrations / put_asset_with_bytes
  impl DocumentStore for SqliteStore (7 trait methods)
  impl JobRepo for SqliteStore (4 trait methods)
  StoreError { Sqlx, Migration, Conflict }

Behavior:
- Pragmas at open: foreign_keys=ON, journal_mode=WAL,
  synchronous=NORMAL, temp_store=MEMORY.
- Asset writer: byte_len ≤ copy_threshold_mb * 1MiB → copy to
  data_dir/assets/<aa>/<asset_id> (mode 0o644 on Unix), else
  reference. blake3(bytes) verified against asset.checksum; mismatch →
  Conflict.
- Idempotency: put_document UPSERTs and bumps doc_version + 1 on
  conflict; put_blocks / put_chunks DELETE-then-INSERT; document_tags
  re-derived per put_document.
- get_document rehydrates blocks via payload_json ordered by stream
  ordinal.
- list_documents builds dynamic WHERE from DocFilter (lang / trust_min
  / path_glob via GLOB / tags_any via document_tags subquery).
- JobRepo: jobs.kind/status are stored as lowercase enum tags; create
  mints a 32-hex JobId via blake3(kind || payload || nanos).

Tests follow in subsequent commits.
2026-04-30 17:08:36 +00:00