feat(p1-6): kb-store-sqlite — P1 terminal task #11
Reference in New Issue
Block a user
Delete Branch "feat/p1-6-store-sqlite"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
요약
P1의 마지막 task인
kb-store-sqlite를 구현했습니다. P1 ingest 파이프라인(walk → parse → chunk → store)을 종결하며, design §5 DDL 전체(13 tables + 15 indexes)와DocumentStore/JobRepo두 trait의 풀 구현, 자산 라이터, 그리고 P1-5에서 미뤄진kb_core::Chunk.policy_hashschema reconcile을 함께 포함합니다.기준 문서: tasks/p1/p1-6-store-sqlite.md, docs/superpowers/specs/2026-04-27-kb-final-form-design.md (§5 ALL DDL / §5.8 transactions / §6.3 data_dir / §7.2 traits / §10 errors).
구현
kb-core schema reconcile (P1-5 review에서 deferred된 항목)
kb_core::Chunk에policy_hash: String필드 추가.chunk_idrecipe(§4.2)는 이미 동일 값을 사용하고 있어 ID는 변하지 않고 wire form만 보강됨.kb-chunk::build_chunk가 새 필드를 채우도록 갱신.fixtures/markdown/long-section.chunks.snapshot.jsonbaseline 재생성 —policy_hash8줄만 추가,chunk_id는 변동 없음.crates/kb-store-sqliteCargo.toml— production deps만 사용 (kb-core, kb-config, anyhow, blake3, refinery, rusqlite[bundled], serde_json, thiserror, time, tracing).kb-parse-md/kb-normalize/kb-chunk는 dev-only.src/store.rs—SqliteStore::open+ pragmas (foreign_keys=ON, journal_mode=WAL, synchronous=NORMAL, temp_store=MEMORY) +run_migrations+ 자산 atomic writer +lock_connpoison-recovery helper +validate_asset_id(32 ASCII hex 강제).src/documents.rs—DocumentStore7개 메서드 (idempotent UPSERT, doc_version 자동 증분, blocks/chunks DELETE-then-INSERT, document_tags 재도출).src/jobs.rs—JobRepo4개 메서드 (pending → running 자동 전이, 에러 round-trip).src/error.rs—StoreError { Sqlite, Migration, Conflict }.migrations/V001__init.sql— §5.1-§5.7 DDL 전체. FTS5 가상 테이블/트리거는 P2-1 V002로 명시 연기.tests/— 15개 통합 테스트 (8 카테고리 + 추가 회귀).동작 contract 핵심
<final>.tmp.<pid>.<atomic_counter>에 쓴 뒤 fsync → row UPSERT → atomicrename. UPSERT 실패 시 temp 파일 best-effort cleanup으로 고아 파일 차단.put_asset_with_bytes/put_asset진입점에서 32 ASCII hex 검사. path-traversal 방어.lock_conn().unwrap_or_else(|p| p.into_inner())을 거쳐 한 번의 panic이 store 전체를 무력화하지 않도록 함.doc_version + 1, blocks/chunks/document_tags DELETE-then-INSERT.kb-app의 컴포지션 책임으로 명시.검증
cargo check / build (RUSTFLAGS=\"-D warnings\") / clippy --all-targets -D warningscleancargo test -p kb-store-sqlite→ 15 통과 (3 asset writer + 1 migration + 3 idempotency + 1 round-trip + 1 ingest_report snapshot + 3 jobs + 1 list_docs + 2 신규 회귀 테스트)cargo test --workspace전체 그린cargo tree -p kb-store-sqlite --depth 1→ 허용 deps만 (parser/normalize/chunk crate 누출 없음)Review trail
e41279d,15b4d80,b7367de) — 신규 회귀 테스트 2건 포함, 기존 13건 유지후속 자연 해소 (비차단, P2-1로 이연)
schema_meta테이블 활성화 (§5.9 권위와 refinery bookkeeping의 정합 결정 — V002에서 정리)IngestReportJSON Schema 검증 단계 추가 (docs/wire-schema/v1/ingest_report.schema.json검사)policy_hash필드 갱신 (doc-only)list_documentsN+1 tags re-query 최적화 (P2 데이터양 측정 후)kb_core::AssetId::try_new(s) -> Result<Self>추가하여 검증 단일 출처화다음
P1 마지막 PR입니다. 머지 후 P2-1(
kb-searchlexical / FTS5) 단계로 진입합니다. P2-1 시작 시 위 후속 항목들을 함께 정리할 예정입니다.New workspace member crate `kb-store-sqlite` (allowed deps only: kb-core, kb-config, rusqlite[bundled], refinery, serde, serde_json, time, blake3, tracing, anyhow, thiserror; dev-deps add kb-parse-md / kb-normalize / kb-chunk for the contract round-trip test). Migration V001 replaces the P0-1 stub with the full §5 DDL (assets, documents, document_tags, blocks, chunks with policy_hash, embedding_records, jobs, ingest_runs, answers, eval_runs, eval_query_results) plus the §5 indexes. FTS5 virtual table + triggers remain deferred to V002 (P2-1). Public surface per task spec: SqliteStore::open / run_migrations / put_asset_with_bytes impl DocumentStore for SqliteStore (7 trait methods) impl JobRepo for SqliteStore (4 trait methods) StoreError { Sqlx, Migration, Conflict } Behavior: - Pragmas at open: foreign_keys=ON, journal_mode=WAL, synchronous=NORMAL, temp_store=MEMORY. - Asset writer: byte_len ≤ copy_threshold_mb * 1MiB → copy to data_dir/assets/<aa>/<asset_id> (mode 0o644 on Unix), else reference. blake3(bytes) verified against asset.checksum; mismatch → Conflict. - Idempotency: put_document UPSERTs and bumps doc_version + 1 on conflict; put_blocks / put_chunks DELETE-then-INSERT; document_tags re-derived per put_document. - get_document rehydrates blocks via payload_json ordered by stream ordinal. - list_documents builds dynamic WHERE from DocFilter (lang / trust_min / path_glob via GLOB / tags_any via document_tags subquery). - JobRepo: jobs.kind/status are stored as lowercase enum tags; create mints a 32-hex JobId via blake3(kind || payload || nanos). Tests follow in subsequent commits.All 8 test categories from the task plan, plus a JobRepo subset: migration — tests/migration.rs: fresh DB after run_migrations exposes every required §5 table + index. unit (copy) — tests/asset_writer.rs: copy mode writes file with mode 0o644 + correct bytes. unit (ref) — tests/asset_writer.rs: reference mode does not write file; row records source path. unit (cs) — tests/asset_writer.rs: tampered checksum returns a Conflict-flavoured anyhow error. unit (idem) — tests/idempotency.rs: same put_document twice → 1 row, doc_version 1→2; tags re-derived. unit (rb) — tests/idempotency.rs: put_blocks with FK violation rolls back; pre-existing rows unchanged. contract — tests/contract_roundtrip.rs: drives kb-parse-md + kb-normalize + kb-chunk on fixtures/markdown/code-and-table.md, persists, then reloads via DocumentStore::get_document / get_chunk and asserts byte-equal round-trip. snapshot — tests/ingest_report_snapshot.rs + snapshots/ingest_report.snapshot.json: pin the wire JSON form of kb_core::IngestReport for an inline fixture run. jobs — tests/jobs.rs: create → progress → finish flow; error message round-trip; list filters on status/kind. Drops the unused `serde` direct dep from Cargo.toml; serde_json brings its own. Dev-deps confirmed via `cargo tree -p kb-store-sqlite --depth 1` to live only in the dev tree.Three Important review fixes on the kb-store-sqlite write path: I1. Atomic asset write. put_asset_with_bytes now stages bytes to `<final>.tmp.<pid>.<n>`, fsyncs, UPSERTs the row, then `rename`s into place (atomic on POSIX same-fs). On any failure between staging and rename we best-effort `remove_file` the temp so the previous orphan risk on UPSERT failure is gone. Reference mode is unchanged (no I/O, no orphan risk). I2. Poison-tolerant mutex. New `lock_conn` helper does `.lock().unwrap_or_else(|p| p.into_inner())`, so a single panic mid- transaction no longer poisons every subsequent store call. The rusqlite Transaction Drop already rolls back on panic, leaving the Connection state safe to reuse. All 13 prior `.expect("sqlite mutex poisoned")` sites in store.rs / documents.rs / jobs.rs now route through `lock_conn`. I3. AssetId shape validation. `kb_core::AssetId(pub String)` lets a hand-construction bypass the `FromStr` 32-hex invariant. Added `validate_asset_id` (32 ASCII hex chars) at every store entry that turns an AssetId into a path: `put_asset_with_bytes` and `DocumentStore::put_asset`. This shuts a potential path-traversal via `assets_path_for`'s `&id[..2]` shard slice. Tests: - `put_asset_with_bytes_orphan_cleanup_on_upsert_failure` — pre-seeds a row that takes the same `workspace_path` (UNIQUE), so the UPSERT trips a constraint not covered by `ON CONFLICT(asset_id)`. Asserts no final file and no leaked `*.tmp.*`. - `put_asset_with_bytes_rejects_invalid_asset_id` — passes `AssetId("../etc/passwd_padded_to_xx_xxxxx")` (32 chars, contains `/`). Asserts error and zero filesystem artifacts under `data_dir/assets/`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>리뷰: 모든 항목 해소 — APPROVE 권고 (gate 정책상 COMMENT 등록)
내부 review loop를 3회 완료했습니다. 마지막 리뷰어 verdict: APPROVED, must-fix-now 0건.
Review trail
e41279d,15b4d80,b7367de핵심 검증
DocumentStore7 trait method +JobRepo4 trait methodlock_connhelperput_asset_with_bytes/put_asset진입점 (I3)doc_version + 1, blocks/chunks DELETE-INSERT)put_blocks_transactional_rollback_on_fk_violation핀kb_core::Chunk.policy_hashschema reconcileGate 통과
cargo check / build (RUSTFLAGS=-D warnings) / clippy --all-targets -D warningscleancargo test -p kb-store-sqlite→ 15 통과 (was 13, +2 신규: orphan cleanup / invalid AssetId rejection)cargo test --workspace전체 그린cargo tree -p kb-store-sqlite --depth 1→ 허용 deps만 (kb-parse-md/kb-normalize/kb-chunk는 dev-only)후속 자연 해소 (비차단, P2-1로 이연)
schema_meta테이블 활성화 + refinery bookkeeping과의 정합 (§5.9)IngestReportJSON Schema 검증 단계kb_core::AssetId::try_new도입하여 store 측validate_asset_id중복 제거policy_hash추가 (doc-only)list_documentsN+1 prepare 최적화 (perf, 데이터양 측정 후)결론
P1의 마지막 빌딩블록이 안정적으로 마무리되었습니다. self-review gate 정책상 본 코멘트는 COMMENT — author 측 수동 APPROVE + merge 부탁드립니다.