feat(app): wire PDF OCR enrichment + cancel propagation into ingest_one_pdf_asset (H-5 eager init + post-extract hook + per-page cancel) + workspace lopdf dep (Step 4 M-4)
Step 6 (Group E) of the v0.20.0 sub-item 1 (scanned PDF OCR) plan, plus Step 7 spillover (IngestEvent variants + IngestItem fields for the compile boundary) and the Step 4 reviewer Minor M-4 fix.

E1: eager PDF OCR engine build at `ingest_with_config_opts` entry, mirroring the image OCR pattern (lib.rs:338-347). When `pdf.ocr.enabled || always_on`, call `OllamaVisionOcr::from_parts(endpoint, model, ...)` and fail fast with `?`. No App fields are added (local variable only, consistent with spec L-1 / Step 1 A1 cosmetic fix).

E2: `ingest_one_pdf_asset` signature extension, +3 params (`pdf_ocr_engine: Option<&OllamaVisionOcr>`, `progress: Option<&mpsc::Sender<IngestEvent>>`, `cancel: Option<&Arc<AtomicBool>>`). The `ingest_one_asset` dispatch wrapper and its caller (the dispatch loop) are updated to match.

E3: post-extract enrichment block immediately after `extract_for` (line 1779). When `pdf.ocr.enabled || always_on`, call `apply_ocr_to_pdf_pages`, emit PdfOcrProgress as IngestEvent (PdfOcrStarted / PdfOcrFinished with ocr_engine), and carry the summary's pages_ocrd/ms_total into IngestItem fields. The PR #187 registry dispatch invariant is preserved (`extract_for(&asset.media_type, ...)` is unchanged).

E4: cancel handle propagation chain: ingest_with_config_cancellable → IngestOpts.cancel → ingest_with_config_opts → ingest_one_asset → ingest_one_pdf_asset (new `cancel` param) → PdfOcrOpts.cancel. This is the production wiring for spec §4.8 line 1159.

Step 7 spillover (compile boundary):
- `kebab_app::ingest_progress::IngestEvent`: PdfOcrStarted { page } + PdfOcrFinished { page, ms, chars, ocr_engine }. serde discriminants `pdf_ocr_started` / `pdf_ocr_finished` (matching the Step 7 G3 wire schema).
- `kebab_core::IngestItem`: pdf_ocr_pages: Option<u32> + pdf_ocr_ms_total: Option<u64> (placed between warnings and error). The 11 non-PDF IngestItem construction sites fill in `None`.
- `kebab-cli/src/progress.rs` + `kebab-tui/src/ingest_progress.rs`: no-op handlers for the new variants (per-page progress is not surfaced in v1; it can be enabled in a future refinement).
- `kebab-store-sqlite/tests/ingest_report_snapshot.rs` + snapshot `ingest_report.snapshot.json`: the new fields are added to the 2 IngestItem fixtures.
- The Step 7 JSON Schema update, CLI printer activation, and snapshot regeneration go in a separate commit (G3/H1/H2 deliverables).

M-4 (Step 4 reviewer Minor): consolidate lopdf as a workspace dependency:
- workspace `Cargo.toml [workspace.dependencies] lopdf = "0.32"`.
- the direct deps in kebab-app + kebab-parse-pdf become `{ workspace = true }`.

Verifier evidence:
- workspace test (`cargo test --workspace --no-fail-fast -j 1`): 175 test result summary lines, 0 failures, 0 FAILED.
- workspace clippy (`-D warnings`): exit 0, 0 warnings.
- dep graph baseline (`.omc/state/pdf-ocr-{parse-pdf,app-parse}-deps.baseline.txt`): empty diff for both.

spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.4 + §4.6 + §4.8)
plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 6 E1-E4 + Step 7 partial G1+G2)
prior: 4672cba (Step 5 fix), fd918a6 (Step 5), 9f003ef (Step 4 helper)
contract: §9 (additive minor wire bump, once the Step 7 JSON Schema is complete)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
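
For readers unfamiliar with the E3/E4 pattern, the per-page cancel check and progress emission described above can be sketched roughly as follows. This is a minimal illustration and not the code in this commit: `ocr_pages`, `OcrSummary`, and the cut-down `IngestEvent` are hypothetical stand-ins, the "OCR" step just counts characters, and `mpsc` is assumed to be `std::sync::mpsc` (the real crate may use a different channel type).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;

// Hypothetical, reduced version of the two event variants named in the commit.
#[derive(Debug, PartialEq)]
enum IngestEvent {
    PdfOcrStarted { page: u32 },
    PdfOcrFinished { page: u32, chars: usize },
}

// Hypothetical summary; the commit only says pages_ocrd/ms_total are carried over.
#[derive(Debug, Default, PartialEq)]
struct OcrSummary {
    pages_ocrd: u32,
    cancelled: bool,
}

// Walks the pages, checking the shared cancel flag before each one and
// emitting Started/Finished events when a progress sender is supplied.
fn ocr_pages(
    pages: &[&str],
    progress: Option<&mpsc::Sender<IngestEvent>>,
    cancel: Option<&Arc<AtomicBool>>,
) -> OcrSummary {
    let mut summary = OcrSummary::default();
    for (idx, text) in pages.iter().enumerate() {
        // Per-page cancel check: stop before starting the next page.
        if cancel.map_or(false, |c| c.load(Ordering::Relaxed)) {
            summary.cancelled = true;
            break;
        }
        let page = idx as u32 + 1;
        if let Some(tx) = progress {
            // A dropped receiver must not abort ingest, so send errors are ignored.
            let _ = tx.send(IngestEvent::PdfOcrStarted { page });
        }
        // Placeholder for the real OCR call: just count characters.
        let chars = text.chars().count();
        if let Some(tx) = progress {
            let _ = tx.send(IngestEvent::PdfOcrFinished { page, chars });
        }
        summary.pages_ocrd += 1;
    }
    summary
}
```

Passing all three handles as `Option<&...>` keeps callers that have no progress channel or cancel flag unchanged, which appears to be the motivation for the optional parameters in E2.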
@@ -201,6 +201,9 @@ impl ProgressDisplay {
                     );
                 }
             }
+            // v0.20.0 sub-item 1: per-page PDF OCR events — not surfaced in
+            // human-readable progress output (no TTY bar update needed).
+            IngestEvent::PdfOcrStarted { .. } | IngestEvent::PdfOcrFinished { .. } => {}
         }
         Ok(())
     }
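
The hunk above adds a no-op arm because a Rust `match` over an enum must stay exhaustive once new variants exist. A rough sketch of that pattern, under stated assumptions: `render` and the `AssetFinished` variant are hypothetical and exist only for illustration, and the field types on `PdfOcrFinished` are guesses apart from the field names listed in the commit message.

```rust
// Hypothetical event enum; only the two PdfOcr* variant names and their
// field names come from the commit, the rest is illustrative.
#[allow(dead_code)]
enum IngestEvent {
    AssetFinished { path: String },
    PdfOcrStarted { page: u32 },
    PdfOcrFinished { page: u32, ms: u64, chars: usize, ocr_engine: String },
}

// Returns the line a human-readable printer would show, or None when the
// event is intentionally not surfaced.
fn render(event: &IngestEvent) -> Option<String> {
    match event {
        IngestEvent::AssetFinished { path } => Some(format!("done: {path}")),
        // New variants are acknowledged but produce no output; without this
        // arm the match would no longer compile.
        IngestEvent::PdfOcrStarted { .. } | IngestEvent::PdfOcrFinished { .. } => None,
    }
}
```

Listing the variants explicitly, rather than using a `_` wildcard, means any future variant again triggers a compile error at this site, so each printer has to decide deliberately whether to surface it.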