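feat(ocr): T0a/T0/T1: golden harness (derives CTC blank=0) + deps (ort rc.9) + dict/NOTICE
T0a: golden harness that drives onnxruntime directly; the CTC blank/dict mapping is confirmed empirically (CER 0.000 against ground truth).
T0: bundle the model dict + NOTICE (the .onnx files stay in the worktree until the T12 LFS decision).
T1: add ort (download-binaries) + imageproc; cargo tree confirms a single ort rc.9 version.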
33
crates/kebab-parse-image/assets/paddleocr-onnx/NOTICE
Normal file
@@ -0,0 +1,33 @@
PP-OCRv5 mobile ONNX models bundled with kebab (paddle-onnx OCR engine)
=======================================================================

These model weights and the recognition dictionary are derived from
PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR), licensed under the
Apache License, Version 2.0.

Copyright (c) PaddlePaddle Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use these files except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

Files
-----
ppocrv5_mobile_det.onnx         PP-OCRv5_mobile detection model (DBNet)
korean_ppocrv5_mobile_rec.onnx  korean_PP-OCRv5_mobile recognition model (CTC)
korean_dict.txt                 recognition dictionary (11,945 chars: KR + Latin + digits + symbols)

These were converted from the official PaddlePaddle inference models to ONNX
via paddle2onnx for in-process execution with onnxruntime (`ort`). No model
architecture or weights were modified; only the serialization format changed.

The recognition CTC class layout (empirically confirmed, see
tests/golden/ctc_rec_golden.json):
  index 0        = CTC blank
  index 1..11945 = korean_dict.txt line N -> class N (dict[N-1])
  index 11946    = space ' '
  total classes  = 11947 (= 11945 dict + blank + space)

If any post-processing source (min-area-rect / polygon unclip) is later
ported verbatim from oar-ocr (Apache-2.0), record the per-file provenance
here as required by the Apache-2.0 attribution clause.
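
The class layout recorded in the NOTICE (blank at index 0, dictionary line N at class N, space at the final index) can be illustrated with a minimal Rust sketch of greedy CTC decoding. This is an assumption-laden illustration, not code from the kebab crate: the function names `class_to_str` and `ctc_greedy_decode` are hypothetical, the input is assumed to be the per-timestep argmax class indices of the recognition output, and greedy decoding is the standard textbook technique rather than something this commit confirms is used.

```rust
/// Class index reserved for the CTC blank, as stated in the NOTICE.
const BLANK: usize = 0;

/// Map one class index to its text under the NOTICE layout:
///   0              -> blank (no output)
///   1..=dict.len() -> dictionary line N, i.e. dict[N - 1]
///   dict.len() + 1 -> space
/// Anything else is out of range and yields `None`.
/// (Hypothetical helper; not part of the kebab crate.)
fn class_to_str<'a>(dict: &[&'a str], class: usize) -> Option<&'a str> {
    if class == BLANK {
        None
    } else if class <= dict.len() {
        Some(dict[class - 1])
    } else if class == dict.len() + 1 {
        Some(" ")
    } else {
        None
    }
}

/// Greedy CTC decode: collapse consecutive repeated classes, then drop blanks.
/// `classes` is assumed to hold one argmax class index per timestep.
fn ctc_greedy_decode(dict: &[&str], classes: &[usize]) -> String {
    let mut out = String::new();
    let mut prev: Option<usize> = None;
    for &c in classes {
        // Emit only when the class changes; a blank between two equal
        // classes resets `prev`, so genuine double letters survive.
        if prev != Some(c) {
            if let Some(s) = class_to_str(dict, c) {
                out.push_str(s);
            }
        }
        prev = Some(c);
    }
    out
}
```

With a toy three-entry dictionary `["가", "나", "a"]` standing in for korean_dict.txt, the index sequence `[1, 1, 0, 1, 4, 3, 3]` decodes to `"가가 a"`: the repeated 1 collapses, the blank separates the two `가`, index 4 (dictionary length + 1) is the space, and the repeated 3 collapses to a single `a`.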