Files
kebab/crates/kebab-store-sqlite/snapshots/ingest_report.snapshot.json
altair823 b9ee09f176 feat(app): wire PDF OCR enrichment + cancel propagation into ingest_one_pdf_asset (H-5 eager init + post-extract hook + per-page cancel) + workspace lopdf dep (Step 4 M-4)
Step 6 (Group E) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 7 spillover (IngestEvent variant + IngestItem field for compile
boundary) + Step 4 reviewer Minor M-4 fix.

E1 — eager PDF OCR engine build at `ingest_with_config_opts` entry,
mirror of image OCR pattern (lib.rs:338-347). `pdf.ocr.enabled ||
always_on` 시 `OllamaVisionOcr::from_parts(endpoint, model, ...)` 호출
+ fail-fast `?`. App field 추가 0 (local var only, spec L-1 / Step 1
A1 cosmetic fix 정합).

E2 — `ingest_one_pdf_asset` signature extension: +3 param
(`pdf_ocr_engine: Option<&OllamaVisionOcr>`, `progress: Option<&
mpsc::Sender<IngestEvent>>`, `cancel: Option<&Arc<AtomicBool>>`).
`ingest_one_asset` dispatch wrapper + caller (dispatch loop) update.

E3 — post-extract enrichment block at `extract_for` 직후 (line 1779).
`pdf.ocr.enabled || always_on` 시 `apply_ocr_to_pdf_pages` 호출,
PdfOcrProgress → IngestEvent emit (PdfOcrStarted / PdfOcrFinished
with ocr_engine), summary 의 pages_ocrd/ms_total 을 IngestItem field
로 carry. PR #187 registry dispatch invariant 보존
(`extract_for(&asset.media_type, ...)` 그대로).

E4 — cancel handle propagation: ingest_with_config_cancellable →
IngestOpts.cancel → ingest_with_config_opts → ingest_one_asset →
ingest_one_pdf_asset (new `cancel` param) → PdfOcrOpts.cancel chain.
spec §4.8 line 1159 production wiring.

Step 7 spillover (compile boundary):
- `kebab_app::ingest_progress::IngestEvent`: PdfOcrStarted { page } +
  PdfOcrFinished { page, ms, chars, ocr_engine }. serde discriminant
  `pdf_ocr_started` / `pdf_ocr_finished` (Step 7 G3 wire schema 와 일치).
- `kebab_core::IngestItem`: pdf_ocr_pages: Option<u32> +
  pdf_ocr_ms_total: Option<u64> (warnings/error 사이). 11 non-PDF
  IngestItem construct site 가 `None` 채움.
- `kebab-cli/src/progress.rs` + `kebab-tui/src/ingest_progress.rs`:
  새 variant no-op handler (v1에서 per-page progress 미노출, future
  refinement 시 활성화 가능).
- `kebab-store-sqlite/tests/ingest_report_snapshot.rs` + snapshot
  `ingest_report.snapshot.json`: 2 IngestItem fixture 의 새 field 추가.
- Step 7 의 JSON Schema 갱신 + CLI printer activation + snapshot
  regenerate 는 별 commit (G3/H1/H2 deliverable).

M-4 (Step 4 reviewer Minor) — lopdf workspace dep 통합:
- workspace `Cargo.toml [workspace.dependencies] lopdf = "0.32"`.
- kebab-app + kebab-parse-pdf 의 direct dep → `{ workspace = true }`.

Verifier evidence:
- workspace test (`cargo test --workspace --no-fail-fast -j 1`):
  175 test result summary lines, 0 failures, 0 FAILED.
- workspace clippy (`-D warnings`): exit 0, 0 warning.
- dep graph baseline (`.omc/state/pdf-ocr-{parse-pdf,app-parse}-deps.baseline.txt`):
  empty diff for both.

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.4 + §4.6 + §4.8)
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 6 E1-E4 + Step 7 partial G1+G2)
prior: 4672cba (Step 5 fix) + fd918a6 (Step 5) + 9f003ef (Step 4 helper)
contract: §9 (additive minor wire bump — Step 7 JSON Schema 완료 시)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:18:34 +00:00

66 lines
1.5 KiB
JSON

{
"duration_ms": 187,
"errors": 0,
"items": [
{
"asset_id": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"block_count": 7,
"byte_len": 1234,
"chunk_count": 3,
"chunker_version": "md-heading-v1",
"doc_id": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"doc_path": "notes/alpha.md",
"error": null,
"kind": "new",
"parser_version": "md-frontmatter-v2",
"pdf_ocr_ms_total": null,
"pdf_ocr_pages": null,
"warnings": []
},
{
"asset_id": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
"block_count": 12,
"byte_len": 2048,
"chunk_count": 5,
"chunker_version": "md-heading-v1",
"doc_id": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
"doc_path": "notes/beta.md",
"error": null,
"kind": "updated",
"parser_version": "md-frontmatter-v2",
"pdf_ocr_ms_total": null,
"pdf_ocr_pages": null,
"warnings": [
"malformed frontmatter"
]
}
],
"new": 2,
"scanned": 3,
"scope": {
"exclude": [
".git/**"
],
"include": [
"**/*.md"
],
"root": "/home/u/KB"
},
"skip_examples": {
"builtin_blacklist": [],
"generated": [],
"gitignore": [],
"size_exceeded": []
},
"skipped": 0,
"skipped_builtin_blacklist": 0,
"skipped_by_extension": {},
"skipped_generated": 0,
"skipped_gitignore": 0,
"skipped_kebabignore": 0,
"skipped_size_exceeded": 0,
"unchanged": 0,
"purged_deleted_files": 0,
"updated": 1
}