Also fixes snapshot drift in code-and-table.canonical.snapshot.json
introduced by task 2 (CanonicalDocument gains last_chunker_version +
last_embedding_version fields).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the terminal (completed) and aborted branches of status_line
to include the unchanged counter alongside new/updated/skipped, so
users can see how many assets were skipped via the incremental-ingest
early-skip path.
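
A minimal sketch of what the final status line could look like once `unchanged` is reported next to the other counters. The `AggregateCounts` field types, the `RunState` enum, and the exact output format are assumptions made for illustration, not the actual `status_line` implementation.

```rust
/// Hypothetical mirror of the aggregate counters. The field names follow
/// the commit message; the types are assumed.
#[derive(Debug, Default, Clone, Copy)]
pub struct AggregateCounts {
    pub new: u32,
    pub updated: u32,
    pub skipped: u32,
    pub unchanged: u32,
}

/// Hypothetical terminal states of an ingest run.
#[derive(Debug, Clone, Copy)]
pub enum RunState {
    Completed,
    Aborted,
}

/// Render the final status line (assumed format), reporting `unchanged`
/// alongside new/updated/skipped for both terminal branches.
pub fn status_line(state: RunState, counts: &AggregateCounts) -> String {
    let label = match state {
        RunState::Completed => "completed",
        RunState::Aborted => "aborted",
    };
    format!(
        "{}: new={} updated={} skipped={} unchanged={}",
        label, counts.new, counts.updated, counts.skipped, counts.unchanged
    )
}
```

Keeping `unchanged` separate from `skipped` lets the line distinguish assets that were filtered out from assets that were recognised as already up to date.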
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `--force-reingest` to the `ingest` subcommand and wires it
through `IngestOpts` into `ingest_with_config_opts`, bypassing the
per-asset early-skip path when set.
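
The wiring described above can be sketched roughly as follows. Only the `--force-reingest` flag and the `IngestOpts::force_reingest` field come from this commit; the hand-rolled parser `parse_ingest_opts` and the helper `skip_path_enabled` are hypothetical stand-ins for whatever argument parser and call sites the CLI actually uses.

```rust
/// Hypothetical shape of the options struct; only `force_reingest` is
/// taken from the commit message.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct IngestOpts {
    pub force_reingest: bool,
}

/// Build `IngestOpts` from the arguments that follow the `ingest`
/// subcommand. Hand-rolled for illustration; unrelated arguments are
/// ignored rather than rejected.
pub fn parse_ingest_opts<'a, I>(args: I) -> IngestOpts
where
    I: IntoIterator<Item = &'a str>,
{
    let mut opts = IngestOpts::default();
    for arg in args {
        if arg == "--force-reingest" {
            opts.force_reingest = true;
        }
    }
    opts
}

/// The per-asset early-skip path may run only when the flag is not set.
pub fn skip_path_enabled(opts: &IngestOpts) -> bool {
    !opts.force_reingest
}
```

Defaulting `force_reingest` to `false` keeps the incremental behaviour as the normal case and makes a full re-ingest an explicit opt-in.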
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the per-asset incremental-ingest skip block to all three flows
(markdown / image / pdf). When `IngestOpts::force_reingest` is `false`
and the asset's blake3 checksum and parser/chunker/embedding versions
all match the existing DB record, ingest emits
`AssetFinished { result: Unchanged }`, bumps `aggregate.unchanged`,
and skips parse / chunk / embed / vector upsert entirely.
Shared `try_skip_unchanged` helper performs the four checks; per-flow
callers supply the active parser_version + chunker_version + optional
embedding_version. `force_reingest = true` bypasses the skip path so
`incremental_ingest::force_reingest_bypasses_skip` still sees `Updated`.
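
The four checks described above could look roughly like this. This is a sketch under stated assumptions: the struct shapes, the boolean return type, and the plain-string checksum are invented for illustration, and the real `try_skip_unchanged` signature may differ. It also assumes that a stored `None` chunker version never matches, and that a stored embedding version must equal the active one exactly (including both being absent).

```rust
/// Hypothetical view of the persisted record. Field names follow the
/// commit messages; the types are assumed.
pub struct StoredVersions {
    pub checksum: String,
    pub parser_version: String,
    pub last_chunker_version: Option<String>,
    pub last_embedding_version: Option<String>,
}

/// What the current run would produce for the same asset (assumed shape).
pub struct ActiveVersions<'a> {
    pub checksum: &'a str,
    pub parser_version: &'a str,
    pub chunker_version: &'a str,
    pub embedding_version: Option<&'a str>,
}

/// Returns true when parse / chunk / embed / vector upsert can be skipped.
pub fn try_skip_unchanged(
    force_reingest: bool,
    stored: Option<&StoredVersions>,
    active: &ActiveVersions<'_>,
) -> bool {
    // `force_reingest = true` bypasses the skip path entirely.
    if force_reingest {
        return false;
    }
    // No existing record: the asset is new and must be processed.
    let stored = match stored {
        Some(s) => s,
        None => return false,
    };
    // All four checks must pass: checksum, parser, chunker, embedding.
    stored.checksum == active.checksum
        && stored.parser_version == active.parser_version
        && stored.last_chunker_version.as_deref() == Some(active.chunker_version)
        && stored.last_embedding_version.as_deref() == active.embedding_version
}
```

Comparing the chunker version against `Some(..)` means a record written before the version columns existed fails the check and is reprocessed once.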
Tests:
- new `incremental_ingest.rs` covers both paths.
- existing `ingest_idempotent_on_second_run` /
  `re_ingest_image_produces_*` / `re_ingest_identical_pdf_produces_*`
  updated to assert `Unchanged` on identical-bytes re-ingest (the
  pre-task behaviour was `Updated`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three ingest flows (markdown, image, pdf) now set
last_chunker_version and last_embedding_version on the CanonicalDocument
before calling put_document, giving Task 7's skip detection the data it
needs on the second run. No skip path is added yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `DocumentStore::get_asset_by_workspace_path` trait method to
`kebab-core` and implement it on `SqliteStore` via a private
`asset_from_row` helper. Used by the incremental-ingest skip path to
compare a freshly-computed blake3 checksum against the persisted row
without a full round-trip through `put_asset_with_bytes`.
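
For illustration, the lookup-then-compare pattern might be sketched as below. The method name `get_asset_by_workspace_path` comes from this commit; the `Asset` fields, the `Option<Asset>` return type (the real method likely returns a fallible result), the in-memory `MemoryStore`, and the `checksum_matches` helper are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down asset record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Asset {
    pub workspace_path: String,
    pub checksum: String,
}

/// Sketch of the trait method; the real signature is an assumption.
pub trait DocumentStore {
    fn get_asset_by_workspace_path(&self, workspace_path: &str) -> Option<Asset>;
}

/// In-memory stand-in for `SqliteStore`, used only for this example.
#[derive(Default)]
pub struct MemoryStore {
    assets: HashMap<String, Asset>,
}

impl MemoryStore {
    pub fn insert(&mut self, asset: Asset) {
        self.assets.insert(asset.workspace_path.clone(), asset);
    }
}

impl DocumentStore for MemoryStore {
    fn get_asset_by_workspace_path(&self, workspace_path: &str) -> Option<Asset> {
        self.assets.get(workspace_path).cloned()
    }
}

/// Compare a freshly computed checksum against the persisted row, without
/// writing anything back to the store.
pub fn checksum_matches(store: &dyn DocumentStore, path: &str, fresh: &str) -> bool {
    store
        .get_asset_by_workspace_path(path)
        .map_or(false, |asset| asset.checksum == fresh)
}
```

A read-only lookup keeps the skip decision free of side effects: nothing is written until the ingest flow has decided the asset actually needs processing.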
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add V006__incremental_ingest.sql to persist last_chunker_version and
last_embedding_version on the documents table. Wire both columns into
upsert_document (INSERT + ON CONFLICT UPDATE) and get_document (SELECT +
row mapper), replacing the previous hardcoded None. Add two round-trip
tests in tests/incremental_ingest.rs covering the set and None cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer-flagged: aa2a6ea claimed a clean build but missed three files:
- crates/kebab-store-sqlite/tests/ingest_report_snapshot.rs (test fixture)
- crates/kebab-cli/src/wire.rs (test fixture)
- crates/kebab-store-sqlite/snapshots/ingest_report.snapshot.json (snapshot)
All three now add `unchanged: 0` (or `"unchanged": 0`) to match the new
`IngestReport.unchanged` field. `cargo clippy --workspace --all-targets
-- -D warnings` is now clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dogfooding feedback: ingest only changed or new docs; skip documents
that have not changed.
Design highlights:
- Four skip conditions (full version cascade): when the blake3 checksum,
parser_version, chunker_version, and embedding_version all match, skip
parse/chunk/embed/vector upsert. The dominant cost (fastembed) is then
paid only for changed or new docs.
- SQLite V006 migration: adds `last_chunker_version` and
`last_embedding_version` columns to `documents`. Existing rows are NULL,
which forces reprocessing on the first ingest (safe default).
- New `IngestItemKind::Unchanged` enum variant, kept semantically
separate from the existing `Skipped`: `Skipped` means filtered out by
media type, `Unchanged` means all versions match.
- Adds an `unchanged: u32` field to `IngestReport` and `AggregateCounts`.
The wire schema change is additive, so v1 compatibility is preserved.
- `--force-reingest` flag: ignores the skip path and forces reprocessing.
- The final TUI status_line exposes `unchanged=N` (cascades automatically
through the p9-fb-24 status bar dynamic slot).
Spec status: `planned`. Next step: write the implementation plan with
the writing-plans skill.
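
A rough sketch of how the `Unchanged` variant and the `unchanged: u32` counter could fit together. Only `Skipped`, `Unchanged`, and the `unchanged: u32` field are named in this spec; the `New` and `Updated` variants, the other counter fields, and the `record` / `tally` helpers are assumptions made for illustration.

```rust
/// Per-asset outcome. `Skipped` and `Unchanged` are kept distinct:
/// `Skipped` is the media-type filter, `Unchanged` means checksum and all
/// versions matched. `New` and `Updated` are assumed variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IngestItemKind {
    New,
    Updated,
    Skipped,
    Unchanged,
}

/// Hypothetical aggregate counters with the additive `unchanged` field.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AggregateCounts {
    pub new: u32,
    pub updated: u32,
    pub skipped: u32,
    pub unchanged: u32,
}

impl AggregateCounts {
    /// Bump the counter that corresponds to one per-asset outcome.
    pub fn record(&mut self, kind: IngestItemKind) {
        match kind {
            IngestItemKind::New => self.new += 1,
            IngestItemKind::Updated => self.updated += 1,
            IngestItemKind::Skipped => self.skipped += 1,
            IngestItemKind::Unchanged => self.unchanged += 1,
        }
    }
}

/// Fold a run's per-asset outcomes into aggregate counts.
pub fn tally(kinds: &[IngestItemKind]) -> AggregateCounts {
    let mut counts = AggregateCounts::default();
    for &kind in kinds {
        counts.record(kind);
    }
    counts
}
```

Because `unchanged` is a new field rather than a reinterpretation of `skipped`, existing consumers that read only the older counters keep seeing the same meaning.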
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>