Commit Graph

3 Commits

Author SHA1 Message Date
94bfc50efd feat(p2-1): chunks_fts virtual table + sync triggers (V002 migration)
Adds FTS5 lexical index for chunks per design §5.5: chunks_fts virtual
table (unicode61 remove_diacritics 2 tokenizer, contentless w/ UNINDEXED
chunk_id+doc_id) plus chunks_ai/chunks_ad/chunks_au triggers that mirror
every chunks mutation into chunks_fts inside the host transaction.

V002 ships the verbatim §5.5 SQL block plus a one-shot backfill INSERT
so existing P1 databases gain searchability without re-ingest. Refinery
bookkeeping makes double-apply naturally idempotent.

Adds rebuild_chunks_fts(&Connection) escape hatch for kb index
--rebuild-fts (CLI wiring deferred to a later task). Uses SAVEPOINT
instead of Transaction so callers can invoke from inside an outer
transaction; WAL serializes writers so no DELETE/INSERT race vs.
concurrent chunks mutators is possible.

Tests (10): V001-only → V002 cold-upgrade backfill (literal path),
chunks_ai/ad/au trigger sync, MATCH-token verification, rebuild
idempotency, drift recovery, double-run no-op, V002 ↔ design §5.5
verbatim diff guard (anchored extraction from both files), WAL/SHM
release on store drop. All 185 workspace tests pass.

Allowed deps respected (kb-core, kb-config, rusqlite, refinery — no
new deps). FTS query implementation deferred to p2-2 (lexical-retriever).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:42:15 +00:00
a3390d5171 p1-6: scaffold kb-store-sqlite crate + V001 full §5 DDL
New workspace member crate `kb-store-sqlite` (allowed deps only:
kb-core, kb-config, rusqlite[bundled], refinery, serde, serde_json,
time, blake3, tracing, anyhow, thiserror; dev-deps add kb-parse-md /
kb-normalize / kb-chunk for the contract round-trip test).

Migration V001 replaces the P0-1 stub with the full §5 DDL (assets,
documents, document_tags, blocks, chunks with policy_hash,
embedding_records, jobs, ingest_runs, answers, eval_runs,
eval_query_results) plus the §5 indexes. FTS5 virtual table + triggers
remain deferred to V002 (P2-1).

Public surface per task spec:
  SqliteStore::open / run_migrations / put_asset_with_bytes
  impl DocumentStore for SqliteStore (7 trait methods)
  impl JobRepo for SqliteStore (4 trait methods)
  StoreError { Sqlx, Migration, Conflict }

Behavior:
- Pragmas at open: foreign_keys=ON, journal_mode=WAL,
  synchronous=NORMAL, temp_store=MEMORY.
- Asset writer: byte_len ≤ copy_threshold_mb * 1MiB → copy to
  data_dir/assets/<aa>/<asset_id> (mode 0o644 on Unix), else
  reference. blake3(bytes) verified against asset.checksum; mismatch →
  Conflict.
- Idempotency: put_document UPSERTs and bumps doc_version + 1 on
  conflict; put_blocks / put_chunks DELETE-then-INSERT; document_tags
  re-derived per put_document.
- get_document rehydrates blocks via payload_json ordered by stream
  ordinal.
- list_documents builds dynamic WHERE from DocFilter (lang / trust_min
  / path_glob via GLOB / tags_any via document_tags subquery).
- JobRepo: jobs.kind/status are stored as lowercase enum tags; create
  mints a 32-hex JobId via blake3(kind || payload || nanos).

Tests follow in subsequent commits.
2026-04-30 17:08:36 +00:00
a166b7051c p0-1: wire-schema stubs, doc/spec stubs, V001 migration, fixtures
- docs/wire-schema/v1/ ships 7 schema stubs (citation, search_hit,
  answer, ingest_report, doc_summary, chunk_inspection, doctor) that
  pin schema_version + required fields per design §2. Full property
  validation lands in later phases.
- docs/spec/ ships 7 markdown stubs each linking to the canonical
  frozen design (domain-model, ids, canonical-document, chunk-policy,
  citation-policy, module-boundaries, ai-generation-guidelines).
- migrations/V001__init.sql contains only schema_meta + migrations
  tables per design §5.1; data tables ship in P1-6/P2-1/P3-3.
- fixtures/ has the 11 subdirectories every downstream task references
  (markdown, source-fs, search/{lexical,hybrid}, embed, vector, rag,
  eval, image, pdf, audio). Empty subdirs use .gitkeep so they track.
  fixtures/markdown/ ships the 3 phase-0 fixtures: simple-note.md,
  nested-headings.md, code-and-table.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:17:32 +00:00