feat(kebab-app): p9-fb-23 task 7 — early-skip Unchanged path in ingest

Adds the per-asset incremental-ingest skip block to all three flows
(markdown / image / pdf). When `IngestOpts::force_reingest = false`
AND the asset's blake3 checksum + parser/chunker/embedding versions
all match the existing DB record, ingest emits
`AssetFinished { result: Unchanged }`, bumps `aggregate.unchanged`,
and skips parse / chunk / embed / vector upsert entirely.

Shared `try_skip_unchanged` helper performs the four checks (checksum,
parser version, chunker version, embedding version); per-flow callers
supply the active parser_version + chunker_version + optional
embedding_version. `force_reingest = true` bypasses the skip path so
`incremental_ingest::force_reingest_bypasses_skip` still sees `Updated`.

Tests:
- new `incremental_ingest.rs` covers both the skip path and the
  `force_reingest` bypass path.
- existing `ingest_idempotent_on_second_run` /
  `re_ingest_image_produces_*` / `re_ingest_identical_pdf_produces_*`
  updated to assert `Unchanged` on identical-bytes re-ingest (the
  pre-task behaviour was `Updated`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
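
The skip decision described in the commit message could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from this commit: the helper's real signature is not visible in this diff, so the `StoredRecord` and `ActiveVersions` types, their field names, the plain `bool` return value, and the use of a hex string for the checksum are all hypothetical.

```rust
/// Pipeline versions active for the current ingest run.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ActiveVersions {
    pub parser_version: String,
    pub chunker_version: String,
    /// `None` for flows that do not produce embeddings.
    pub embedding_version: Option<String>,
}

/// What the DB is assumed to remember about a previously ingested asset.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredRecord {
    /// Content checksum (a blake3 hash per the commit message; a plain
    /// hex string stands in for it here).
    pub checksum: String,
    pub versions: ActiveVersions,
}

/// Returns `true` when the asset can be reported as `Unchanged` and the
/// parse / chunk / embed / upsert stages can be skipped.
pub fn try_skip_unchanged(
    force_reingest: bool,
    existing: Option<&StoredRecord>,
    checksum: &str,
    active: &ActiveVersions,
) -> bool {
    // `force_reingest = true` always bypasses the skip path.
    if force_reingest {
        return false;
    }
    match existing {
        // No prior record: the asset is new and must be ingested.
        None => false,
        // The four checks: checksum, parser, chunker, embedding version.
        Some(rec) => {
            rec.checksum == checksum
                && rec.versions.parser_version == active.parser_version
                && rec.versions.chunker_version == active.chunker_version
                && rec.versions.embedding_version == active.embedding_version
        }
    }
}
```

In this sketch a caller would emit `Unchanged` and bump its unchanged counter when the function returns `true`, and fall through to the normal ingest path otherwise. Comparing the optional embedding version with `==` means a record stored without embeddings only matches a flow that also runs without them.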
@@ -363,10 +363,14 @@ async fn garbage_png_increments_errors_counter_exactly_once() {
 
 // ── 6. Determinism: re-ingest produces identical doc_id / chunk_id ───────
 
-/// Idempotency contract — running the same ingest twice should mark
-/// the asset Updated on the second run with byte-identical IDs.
+/// Idempotency contract — running the same ingest twice keeps the
+/// doc_id stable. p9-fb-23 task 7 introduced the early-skip path for
+/// incremental ingest: when checksum + parser/chunker/embedding versions
+/// all match, the second run reports `Unchanged` rather than `Updated`.
+/// The pre-p9-fb-23 contract was `Updated` — that path is still exercised
+/// by `force_reingest = true` tests in `incremental_ingest.rs`.
 #[tokio::test]
-async fn re_ingest_image_produces_updated_with_same_doc_id() {
+async fn re_ingest_image_produces_unchanged_with_same_doc_id() {
     let server = MockServer::start().await;
     Mock::given(method("POST"))
         .and(path("/api/generate"))
@@ -416,6 +420,6 @@ async fn re_ingest_image_produces_updated_with_same_doc_id() {
         .iter()
         .find(|i| i.doc_path.0.ends_with("diagram.png"))
         .unwrap();
-    assert_eq!(img2.kind, kebab_core::IngestItemKind::Updated);
+    assert_eq!(img2.kind, kebab_core::IngestItemKind::Unchanged);
     assert_eq!(img2.doc_id.as_ref().unwrap(), &id1);
 }