From fb3952d54f96315a2fa77405a81ac59ef1e10d30 Mon Sep 17 00:00:00 2001
From: altair823
Date: Wed, 27 May 2026 05:36:56 +0000
Subject: [PATCH] docs(pdf-ocr): correct F7 conversion engine record in PoC doc (gs, not ImageMagick)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The PoC doc append in aeeff36 (engine-comparison.md L134, L141) recorded
the conversion engine for F7 (`ccitt.pdf`) as
"ImageMagick `convert -compress Group4`", but
tests/fixtures/_synth/flate_ccittfax.sh:77-83 actually uses the flags
`gs -sDEVICE=pdfwrite -dMonoImageFilter=/CCITTFaxEncode -dEncodeMonoImages=true`
(zero calls to ImageMagick `convert`).

The fixture binary (`/Filter [ /CCITTFaxDecode ]`, 2060 bytes) satisfies
the invariant (verified in the Step 2 spec compliance and code quality
reviews). This is a factual correction of the historical record only.

review trail:
- impl result: .omc/reviews/2026-05-27-pdf-ocr-step-02-impl-result.md
- spec review: .omc/reviews/2026-05-27-pdf-ocr-step-02-spec-review-result.md
- code review: .omc/reviews/2026-05-27-pdf-ocr-step-02-code-review-result.md (I1)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/superpowers/poc/2026-05-27-pdf-ocr-engine-comparison.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/superpowers/poc/2026-05-27-pdf-ocr-engine-comparison.md b/docs/superpowers/poc/2026-05-27-pdf-ocr-engine-comparison.md
index 43aa8f4..102da82 100644
--- a/docs/superpowers/poc/2026-05-27-pdf-ocr-engine-comparison.md
+++ b/docs/superpowers/poc/2026-05-27-pdf-ocr-engine-comparison.md
@@ -131,11 +131,11 @@ Actual `/Filter` measured after B2 fixture synthesis (shell grep + Python re based pro
 - F6 (`flate_raw.pdf`): `/Filter /FlateDecode`, 872 bytes, no DCTDecode ✅.
   Confirmed that Pillow RGB→PDF uses DCTDecode → generated by writing the PDF manually (zlib.compress).
 - F7 (`ccitt.pdf`): `/Filter [ /CCITTFaxDecode ]`, 2060 bytes, no DCTDecode ✅.
-  Pillow bilevel('1') + TIFF group4 → ImageMagick `convert -compress Group4` path.
+  Pillow bilevel('1') + TIFF group4 → ghostscript `pdfwrite -dMonoImageFilter=/CCITTFaxEncode -dEncodeMonoImages=true` path.
   Baseline for `extract_dctdecode_page_image` in Step 3.
 
 Key observations:
 
 - The reportlab default (`useA85=1`) produces a `/Filter [ /ASCII85Decode /DCTDecode ]` chain → set `useA85=0` to obtain pure DCTDecode.
 - Pillow `Image.new('RGB').save('.pdf','PDF')` saves as DCTDecode (JPEG), so F6, which needs FlateDecode raw pixels, requires a manually written PDF.
-- ghostscript pdfwrite does not support direct TIFF input (v10.02.1 `/undefined in II*`); replaced with ImageMagick `convert -compress Group4`.
+- ghostscript pdfwrite fails on TIFF input by default with `/undefined in II*`; worked around with the `-dMonoImageFilter=/CCITTFaxEncode -dEncodeMonoImages=true` flags (no ImageMagick fallback needed).