From 5bba95fd71b10ac4c4caefe303380bbb0e800060 Mon Sep 17 00:00:00 2001
From: altair823
Date: Wed, 27 May 2026 23:16:18 +0000
Subject: [PATCH] docs(spec): HOTFIXES entry + parent spec cross-link for Bug #11 timeout deviation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Frozen-spec deviation handoff for Bug #11 (previous commit
`fix(config): pdf.ocr.request_timeout_secs default 600 → 60`).

- tasks/HOTFIXES.md: subsection dated 2026-05-27, in the 5-field format
  Discovered / Symptom / Root cause / Fix / Amends (matches the existing
  entries).
- docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md: two inline HTML
  comment lines added as cross-links, one at PDF OCR config block line 1000
  (default value) and one at OQ-1 line 1628. Zero prose changes: the parent
  spec's frozen contract is preserved, and HTML comments are invisible when
  the markdown is rendered.

The HOTFIXES entry is the live source of truth (CLAUDE.md "Spec contract" rule).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../specs/2026-05-27-pdf-scanned-ocr-spec.md |  2 ++
 tasks/HOTFIXES.md                            | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md b/docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
index 924e745..4378663 100644
--- a/docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
+++ b/docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
@@ -998,6 +998,7 @@ impl PdfOcrCfg {
 }
 
 fn default_pdf_ocr_request_timeout_secs() -> u64 { 600 } // 5x headroom over 105s in a CPU environment
+
 fn default_pdf_ocr_valid_ratio() -> f32 { 0.5 }
 fn default_pdf_ocr_min_char_count() -> u32 { 20 }
 fn default_pdf_ocr_lang_hint() -> Option<String> { Some("kor".to_string()) }
@@ -1626,6 +1627,7 @@ Scanned PDFs indexed as of v0.19 (= empty block + warning), after the v0.20 upgrade
 | OQ | description | resolution status |
 |---|---|---|
 | OQ-1 | `request_timeout_secs = 600` default is overprotective in GPU environments; a config tuning guide is needed | documentation update after dogfood |
+
 | OQ-2 | when to make the additive `ocr_origin: bool` addition to citation.v1 | separate sub-item once user feedback accumulates |
 | OQ-3 | ollama model availability check in doctor.v1 | separate sub-item (sub-item #N: doctor extension) |
 | OQ-4 | partial-PDF persistence (per-page commit) | separate sub-item (P+ after feedback on the user's book-ingest UX) |
diff --git a/tasks/HOTFIXES.md b/tasks/HOTFIXES.md
index 36648c4..75103c6 100644
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,6 +14,21 @@
 historical contract that was implemented; this file accumulates the deltas so
 phase 5+ readers can find the live behavior without diffing git history.
 
+## 2026-05-27 — PDF OCR `request_timeout_secs` default 600s → 60s (Bug #11)
+
+**Discovered**: v0.20.0 final dogfood (2026-05-27), pages 8 and 13 of metro-korea.pdf.
+
+**Symptom**: On both pages, `kebab ingest` ran to a full timeout at 600s (`ms: 600000, chars: 0, skipped: true`). The body text was not indexed, 10 minutes were wasted per page (20 minutes in total), and the user received no ingest-completion signal.
+
+**Root cause**: `default_pdf_ocr_request_timeout_secs() = 600` (from the "5x headroom over 105s in a CPU environment" assumption in spec `2026-05-27-pdf-scanned-ocr-spec.md` line 1000 + OQ-1 line 1628). Measured per-page latency on cloud GPU Ollama is 6-32s, so a request that only times out at 600s almost certainly means Ollama is down. 600s does not work as a fail-fast signal.
+
+**Fix** (v0.20.0 bugfix3 round 3, branch `feat/pdf-scanned-ocr`):
+- `crates/kebab-config/src/lib.rs`: `default_pdf_ocr_request_timeout_secs() = 60`.
+- Doc comment expanded: 6-32s is the normal per-page latency; exceeding 60s signals that Ollama is down or that the page is very dense or high-resolution.
+- User override path preserved: the timeout can be raised via `config.toml [pdf.ocr] request_timeout_secs = N`.
+
+**Amends**: `docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md` line 1000 / OQ-1 line 1628 (frozen: no text changes, only one inline HTML comment cross-link line added at each location). This entry is the live source of truth.
+
 ## 2026-05-27 — Identity-H mojibake marker bypassed OCR fallback (Bug #6)
 
 - **Symptom**: Ingest of `metro-korea.pdf` (Identity-H CID font without ToUnicode CMap) finished with `pdf_ocr_pages=0`. The entire text was the `?Identity-H Unimplemented?` marker repeated 1154 times (emitted by lopdf 0.32.0). text-detect ratio = 1.0 → the OCR fallback threshold of 0.5 was bypassed.