feat(cli): activate per-page PDF OCR progress printer + test(app): ingest_progress emit verify + spec(pdf-ocr): align §4.6.1 literal with option_A (ms/chars)

Step 8 (Group H) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 7 reviewer concern fix (spec literal deviation).

H1 - kebab-cli/src/progress.rs printer activation:
- Turn the old no-op stub `IngestEvent::PdfOcr* { .. } => {}` (Step 6
  placeholder) into a human-friendly stderr line printer.
- Uses the wording from spec §4.6.1 lines 1085-1086 verbatim:
  - PdfOcrStarted → `  📷 OCR page {page}...`
  - PdfOcrFinished (skipped=false) → `  ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})`
  - PdfOcrFinished (skipped=true)  → `  ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)` (uses the `skipped` field carried over from M-4)
- Gated on `!quiet`, mirroring the AssetStarted/Finished pattern.
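
For reference, a minimal std-only sketch of the three line shapes above,
written against a generic writer so the wording can be checked without a
TTY. `OcrEvent`, `print_ocr_event` and `render` are hypothetical stand-ins,
not the real `IngestEvent` / `ProgressDisplay` API, and the field types are
assumptions; only the field names and the message wording come from this
commit.

```rust
use std::io::Write;

// Hypothetical stand-in for the two PDF OCR variants of `IngestEvent`.
// Field names follow this commit; the concrete types are assumptions.
enum OcrEvent {
    Started { page: u32 },
    Finished { page: u32, ms: u64, chars: usize, ocr_engine: String, skipped: bool },
}

// Writes one sub-progress line per event, or nothing when `quiet` is set.
fn print_ocr_event<W: Write>(out: &mut W, ev: &OcrEvent, quiet: bool) -> std::io::Result<()> {
    if quiet {
        return Ok(());
    }
    match ev {
        OcrEvent::Started { page } => writeln!(out, "  📷 OCR page {page}..."),
        // Skipped pages report only the elapsed time.
        OcrEvent::Finished { page, ms, skipped: true, .. } => {
            writeln!(out, "  ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)")
        }
        OcrEvent::Finished { page, ms, chars, ocr_engine, skipped: false } => {
            writeln!(out, "  ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})")
        }
    }
}

// Helper for checks: renders a single event into a String.
fn render(ev: &OcrEvent, quiet: bool) -> String {
    let mut buf = Vec::new();
    print_ocr_event(&mut buf, ev, quiet).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("output is valid UTF-8")
}
```

The actual printer in this commit writes to a locked stderr handle and
discards write errors with `let _ =`; the sketch returns the error instead
so a caller can observe it.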

H2 - new test in crates/kebab-app/tests/ingest_progress.rs:
- pdf_ocr_progress_emits_started_finished_events (depends on a real Ollama, `#[ignore]`).
- Verifies that ingesting the F1 fixture (scanned_page1.pdf) emits
  pdf_ocr_started + pdf_ocr_finished events. Invariant: Started count ==
  Finished count.
- Manual invoke: `KEBAB_PDF_OCR_ENABLED=true cargo test -p kebab-app --test
  ingest_progress -- --ignored`.
- There is no injection path for a mock OcrEngine (Step 6 builds the engine
  eagerly), so the test follows the same pattern as the Step 9 I5 ocr_e2e
  test (real Ollama + `#[ignore]`).
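
The invariant itself can be sketched without Ollama, assuming the test
collects each emitted event's kind as a string. `ocr_event_counts` and
`ocr_invariant_holds` are hypothetical helper names, not functions from
kebab-app; only the `pdf_ocr_started` / `pdf_ocr_finished` names come from
this commit.

```rust
// Counts the OCR start/finish events in a recorded list of event kinds.
// Any other kind is ignored.
fn ocr_event_counts(kinds: &[&str]) -> (usize, usize) {
    let started = kinds.iter().filter(|k| **k == "pdf_ocr_started").count();
    let finished = kinds.iter().filter(|k| **k == "pdf_ocr_finished").count();
    (started, finished)
}

// True when at least one OCR page was started and every start has a
// matching finish (Started count == Finished count).
fn ocr_invariant_holds(kinds: &[&str]) -> bool {
    let (started, finished) = ocr_event_counts(kinds);
    started > 0 && started == finished
}
```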

Step 7 reviewer concern fix - spec §4.6.1 literal:
- Update the `ocr_ms` / `ocr_chars` literals at lines 1076-1077 to the wire
  schema's actual field names `ms` / `chars` (option_A, consistent with Rust serde).
- Printer wording at line 1087 likewise: `{ocr_chars}` / `{ocr_ms}` → `{chars}` / `{ms}`.
- Rationale reference at line 1556: `pdf_ocr_finished.ocr_ms` → `.ms`.
- Also document the `skipped` field (outcome of Step 6 reviewer M-4).

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.6.1)
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 8 H1+H2)
prior: 4c5ccd5 (Step 7 wire schema); this commit fixes Step 7 reviewer concern 1
contract: §9 (additive minor wire bump, already done in the Step 7 commit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 09:18:49 +00:00
parent 4c5ccd5447
commit c9e05941c5
3 changed files with 89 additions and 7 deletions

@@ -201,9 +201,25 @@ impl ProgressDisplay {
                     );
                 }
             }
-            // v0.20.0 sub-item 1: per-page PDF OCR events - not surfaced in
-            // human-readable progress output (no TTY bar update needed).
-            IngestEvent::PdfOcrStarted { .. } | IngestEvent::PdfOcrFinished { .. } => {}
+            // v0.20.0 sub-item 1: per-page PDF OCR events - sub-progress lines
+            // under AssetStarted for scanned PDF. spec §4.6.1 line 1085-1086.
+            // When skipped=true (no DCTDecode or engine fail), print the skip line.
+            IngestEvent::PdfOcrStarted { page } => {
+                if !quiet {
+                    let mut err = std::io::stderr().lock();
+                    let _ = writeln!(err, "  📷 OCR page {page}...");
+                }
+            }
+            IngestEvent::PdfOcrFinished { page, ms, chars, ocr_engine, skipped } => {
+                if !quiet {
+                    let mut err = std::io::stderr().lock();
+                    if *skipped {
+                        let _ = writeln!(err, "  ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)");
+                    } else {
+                        let _ = writeln!(err, "  ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})");
+                    }
+                }
+            }
         }
         Ok(())
     }