kebab/migrations/V002__fts.sql
altair823 94bfc50efd feat(p2-1): chunks_fts virtual table + sync triggers (V002 migration)
Adds an FTS5 lexical index for chunks per design §5.5: a chunks_fts virtual
table (unicode61 remove_diacritics 2 tokenizer, with UNINDEXED chunk_id and
doc_id columns) plus chunks_ai/chunks_ad/chunks_au triggers that mirror
every chunks mutation into chunks_fts inside the host transaction.

V002 ships the verbatim §5.5 SQL block plus a one-shot backfill INSERT
so existing P1 databases gain searchability without re-ingest. Refinery
bookkeeping makes double-apply naturally idempotent.

Adds rebuild_chunks_fts(&Connection) escape hatch for kb index
--rebuild-fts (CLI wiring deferred to a later task). Uses SAVEPOINT
instead of Transaction so callers can invoke from inside an outer
transaction; WAL serializes writers so no DELETE/INSERT race vs.
concurrent chunks mutators is possible.
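
A rough SQL sketch of what such a rebuild could amount to, assuming it
clears the index and re-seeds it the same way the V002 backfill does. The
savepoint name is hypothetical, and the actual body of rebuild_chunks_fts
is not shown in this view:

```sql
-- Hypothetical savepoint name. Unlike BEGIN, a SAVEPOINT can be opened
-- while an outer transaction is already active.
SAVEPOINT rebuild_chunks_fts;

-- Drop every indexed row, then re-seed the index from the source table.
DELETE FROM chunks_fts;
INSERT INTO chunks_fts(chunk_id, doc_id, heading_path, text)
SELECT chunk_id, doc_id, heading_path_json, text FROM chunks;

RELEASE SAVEPOINT rebuild_chunks_fts;
```

When this runs inside an outer transaction, RELEASE only folds the work
into that transaction; nothing becomes durable until the outer COMMIT.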

Tests (10): V001-only → V002 cold-upgrade backfill (literal path),
chunks_ai/ad/au trigger sync, MATCH-token verification, rebuild
idempotency, drift recovery, double-run no-op, V002 ↔ design §5.5
verbatim diff guard (anchored extraction from both files), WAL/SHM
release on store drop. All 185 workspace tests pass.
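
For illustration, drift between chunks and chunks_fts could be spotted by
hand with queries along these lines. This is a sketch against the V002
schema, not the check the test suite actually performs:

```sql
-- Row counts should match when the triggers have kept the index in sync.
SELECT (SELECT count(*) FROM chunks)     AS chunk_rows,
       (SELECT count(*) FROM chunks_fts) AS fts_rows;

-- Chunks that have no FTS row; an empty result means nothing is missing.
SELECT c.chunk_id
FROM chunks AS c
LEFT JOIN chunks_fts AS f ON f.chunk_id = c.chunk_id
WHERE f.chunk_id IS NULL;
```

Because chunk_id is UNINDEXED in chunks_fts, the join scans the FTS table,
so this suits an occasional diagnostic rather than a hot path.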

Allowed deps respected (kb-core, kb-config, rusqlite, refinery — no
new deps). FTS query implementation deferred to p2-2 (lexical-retriever).
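
Until p2-2 lands, the index can still be exercised directly. A minimal
sketch of a lexical lookup, where the search term is a placeholder and the
query shape is an assumption rather than the planned retriever:

```sql
-- 'sqlite' is a placeholder search term.
SELECT chunk_id, doc_id, heading_path
FROM chunks_fts
WHERE chunks_fts MATCH 'sqlite'
ORDER BY rank   -- FTS5's built-in relevance ordering, best match first
LIMIT 10;
```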

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:42:15 +00:00

-- V002__fts.sql — FTS5 virtual table + sync triggers.
--
-- Per design §5.5 (chunks_fts virtual table + chunks_ai/ad/au triggers).
-- The CREATE VIRTUAL TABLE / CREATE TRIGGER block below is reproduced
-- VERBATIM from `docs/superpowers/specs/2026-04-27-kb-final-form-design.md`
-- §5.5 lines 866–885; CI diff-checks this against the design doc.
--
-- Tokenizer choice: `unicode61 remove_diacritics 2` follows the design
-- default for P2-1 (Korean morphological tokenizer is a P+ note).
--
-- Backfill: V001 already shipped the `chunks` table without an FTS
-- shadow; on V002 apply we seed `chunks_fts` from the existing rows so
-- already-ingested workspaces become searchable without re-ingesting.
-- Per design §9 (versioning), V002 is additive: no destructive change
-- to V001 tables.
-- ── §5.5 verbatim block ────────────────────────────────────────────────
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_id UNINDEXED,
    doc_id UNINDEXED,
    heading_path,
    text,
    tokenize = 'unicode61 remove_diacritics 2'
);

CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(chunk_id, doc_id, heading_path, text)
    VALUES (new.chunk_id, new.doc_id, new.heading_path_json, new.text);
END;

CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    DELETE FROM chunks_fts WHERE chunk_id = old.chunk_id;
END;

CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    DELETE FROM chunks_fts WHERE chunk_id = old.chunk_id;
    INSERT INTO chunks_fts(chunk_id, doc_id, heading_path, text)
    VALUES (new.chunk_id, new.doc_id, new.heading_path_json, new.text);
END;
-- ── End §5.5 verbatim block ───────────────────────────────────────────
-- One-shot backfill for existing chunks. The triggers above only fire
-- on future mutations; V001 may have left `chunks` populated. Refinery
-- runs V002 exactly once via its bookkeeping table, so this INSERT is
-- naturally idempotent across restarts.
INSERT INTO chunks_fts(chunk_id, doc_id, heading_path, text)
SELECT chunk_id, doc_id, heading_path_json, text FROM chunks;