|
|
375a0693e4
|
chore(ocr): T11/T12 — clippy clean + docs + v0.27.0 bump
T11: fix 12 clippy lints in paddle_onnx.rs/paddle_e2e.rs (doc overindent,
finish_non_exhaustive, map_or_else, RangeInclusive::contains, cast_lossless,
is_some_and, usize::from). Full-workspace clippy -D warnings = 0.
Smoke (paddle-onnx, real binary): clean_paragraph OCR is verbatim-correct, with real
per-region confidence (0.99/0.96/0.95); FTS5 lexical hits on Korean (검색) and
English (embedding); parser_version folds in |ocr:1:paddle-onnx:<ver>. Big page:
<4s inference (5.6s ingest incl. one-time session load).
T12: README [image.ocr].engine + ARCHITECTURE OCR row + SMOKE paddle-onnx config
+ HANDOFF + HOTFIXES dated entry. Workspace version 0.26.2 → 0.27.0 (minor:
new engine value + config keys). .gitattributes: onnx as plain blobs (no git-lfs).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
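
Note on T11: the lint names listed in this commit correspond to well-known std idioms. The sketch below shows what such rewrites typically look like; it is not the code from paddle_onnx.rs or paddle_e2e.rs. The `Region` type, its fields, and all function names are hypothetical and exist only for illustration.

```rust
use std::fmt;

// Hypothetical type, used only to illustrate the lint fixes.
struct Region {
    score: Option<f32>,
    label: Option<String>,
    raw: Vec<u8>, // deliberately left out of the Debug output
}

impl fmt::Debug for Region {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // finish_non_exhaustive: signals that some fields are omitted ("..").
        f.debug_struct("Region")
            .field("score", &self.score)
            .field("label", &self.label)
            .finish_non_exhaustive()
    }
}

// RangeInclusive::contains instead of `x >= 0.0 && x <= 1.0`.
fn in_unit_range(x: f32) -> bool {
    (0.0..=1.0).contains(&x)
}

// is_some_and instead of `map(..).unwrap_or(false)`.
fn is_confident(r: &Region, thresh: f32) -> bool {
    r.score.is_some_and(|s| s >= thresh)
}

// map_or_else instead of `map(..).unwrap_or_else(..)`.
fn label_or_default(r: &Region) -> String {
    r.label
        .as_deref()
        .map_or_else(|| String::from("unknown"), str::to_owned)
}

// usize::from / f64::from instead of lossless `as` casts (cast_lossless).
fn widen(channel: u8, score: f32) -> (usize, f64) {
    (usize::from(channel), f64::from(score))
}
```

Each helper keeps the behavior of the longer form it replaces; the change is purely in how the intent is spelled.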
|
2026-06-04 08:36:19 +00:00 |
|
|
|
901416d8e9
|
feat(ocr): T7-T9 — config overrides + engine factory + signature cascade
T7: OcrCfg gains det_model/rec_model/dict overrides + score_thresh/
unclip_ratio/max_boxes (serde default, KEBAB_IMAGE_OCR_* env). OnnxPaddleOcr::new
threads them via ModelPaths::from_config.
T8: build_image_ocr_engine / build_pdf_ocr_engine factories return
Box<dyn OcrEngine>; match on engine string (ollama-vision|paddle-onnx|err).
ImagePipeline.ocr_engine + pdf_ocr_engine signatures switched to &dyn OcrEngine.
OcrEngine gains model() for the progress label.
T9: ingest_config_signature image/pdf branches emit |ocr:1:{engine}:{engine_version}
(memoized blake3 per asset-triple, m3-safe). Unit tests (a)(b)(c) added.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
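
Note on T8/T9: a minimal sketch of the factory-plus-signature shape described in this commit, assuming a trimmed-down `OcrEngine` trait. Only `OcrEngine`, `model()`, `OnnxPaddleOcr`, `build_image_ocr_engine`, the engine strings, and the `|ocr:1:{engine}:{engine_version}` format come from the commit message. The `engine()` and `engine_version()` methods, the `OllamaVision` struct, the `String` error type, the placeholder field values, and `ocr_signature_suffix` are assumptions. The blake3 memoization and the config threading from T7 are left out.

```rust
// Trimmed stand-in for the trait named in the commit; the actual OCR
// methods are omitted. `engine` and `engine_version` are hypothetical.
trait OcrEngine {
    fn engine(&self) -> &str;
    fn model(&self) -> &str; // used for the progress label per the commit
    fn engine_version(&self) -> &str;
}

// Hypothetical stub for the ollama-vision engine.
struct OllamaVision {
    model: String,
}

// Name taken from the commit; the field is a placeholder.
struct OnnxPaddleOcr {
    version: String,
}

impl OcrEngine for OllamaVision {
    fn engine(&self) -> &str { "ollama-vision" }
    fn model(&self) -> &str { &self.model }
    fn engine_version(&self) -> &str { "0" }
}

impl OcrEngine for OnnxPaddleOcr {
    fn engine(&self) -> &str { "paddle-onnx" }
    fn model(&self) -> &str { "paddle-onnx-placeholder" }
    fn engine_version(&self) -> &str { &self.version }
}

// Factory: match on the configured engine string, error on anything else.
fn build_image_ocr_engine(engine: &str) -> Result<Box<dyn OcrEngine>, String> {
    match engine {
        "ollama-vision" => Ok(Box::new(OllamaVision {
            model: String::from("vision-placeholder"),
        })),
        "paddle-onnx" => Ok(Box::new(OnnxPaddleOcr {
            version: String::from("0"),
        })),
        other => Err(format!("unknown OCR engine: {other}")),
    }
}

// Signature fragment in the |ocr:1:{engine}:{engine_version} format.
fn ocr_signature_suffix(engine: &dyn OcrEngine) -> String {
    format!("|ocr:1:{}:{}", engine.engine(), engine.engine_version())
}
```

Returning `Box<dyn OcrEngine>` lets callers hold a `&dyn OcrEngine` without knowing which backend was selected, and folding the engine name and version into the signature means a change of engine invalidates previously ingested results.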
|
2026-06-04 08:15:30 +00:00 |
|