What's Changed
- 2025/05/24 1.3.12 Released
  - Added support for PP-OCRv5 models: the `ch_server` model is updated to `PP-OCRv5_rec_server` and the `ch_lite` model to `PP-OCRv5_rec_mobile` (model update required).
    - In testing, PP-OCRv5 (server) showed some improvement on handwritten documents but slightly lower accuracy than `v4_server_doc` on other document types. The default `ch` model therefore remains `PP-OCRv4_server_rec_doc`.
    - Because PP-OCRv5 improves recognition of handwritten text and special characters, you can manually select a PP-OCRv5 model for mixed Japanese and Traditional Chinese text and for handwritten documents.
    - Select a model with the `lang` parameter, e.g. `lang='ch_server'` (Python API) or `--lang ch_server` (command line):
      - `ch`: `PP-OCRv4_rec_server_doc` (default; mixed Chinese, English, Japanese, and Traditional Chinese; 15k dictionary)
      - `ch_server`: `PP-OCRv5_rec_server` (mixed Chinese, English, Japanese, and Traditional Chinese, plus handwriting; 18k dictionary)
      - `ch_lite`: `PP-OCRv5_rec_mobile` (mixed Chinese, English, Japanese, and Traditional Chinese, plus handwriting; 18k dictionary)
      - `ch_server_v4`: `PP-OCRv4_rec_server` (mixed Chinese and English; 6k dictionary)
      - `ch_lite_v4`: `PP-OCRv4_rec_mobile` (mixed Chinese and English; 6k dictionary)
  - Added support for handwritten documents: layout detection of handwritten text regions has been improved, so handwritten documents can now be parsed.
    - Enabled by default; no additional configuration is needed.
    - For better results on handwritten documents, you can manually select a PP-OCRv5 model as described above.
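
The `lang` options and selection guidance above can be condensed into a small lookup table. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the `lang` keys and model names come from these notes, while `OCR_LANG_MODELS`, `choose_lang`, `lang_args`, and the scenario flags are hypothetical names, not part of the magic_pdf API. It keeps the default `ch` model and switches to PP-OCRv5 (`ch_server`, or `ch_lite` for the mobile variant) only for handwritten or mixed Japanese/Traditional Chinese documents, then returns the chosen value both as a Python keyword argument and as command-line tokens.

```python
"""Minimal sketch: choosing a `lang` value for the 1.3.12 OCR models.

Assumption: OCR_LANG_MODELS, choose_lang and lang_args are hypothetical
helpers, not part of magic_pdf. Only the lang keys and model names are
taken from the release notes.
"""

# lang value -> (recognition model, what it covers), as listed in the notes
OCR_LANG_MODELS: dict[str, tuple[str, str]] = {
    "ch": ("PP-OCRv4_rec_server_doc", "zh/en/ja/zh-Hant mixed, 15k dictionary (default)"),
    "ch_server": ("PP-OCRv5_rec_server", "zh/en/ja/zh-Hant mixed + handwriting, 18k dictionary"),
    "ch_lite": ("PP-OCRv5_rec_mobile", "zh/en/ja/zh-Hant mixed + handwriting, 18k dictionary"),
    "ch_server_v4": ("PP-OCRv4_rec_server", "zh/en mixed, 6k dictionary"),
    "ch_lite_v4": ("PP-OCRv4_rec_mobile", "zh/en mixed, 6k dictionary"),
}


def choose_lang(handwritten: bool = False,
                ja_or_traditional: bool = False,
                prefer_mobile: bool = False) -> str:
    """Apply the release-note guidance (the flags are hypothetical).

    Keep the default `ch` model unless the document is handwritten or mixes
    Japanese / Traditional Chinese; in those cases pick a PP-OCRv5 model.
    """
    if handwritten or ja_or_traditional:
        return "ch_lite" if prefer_mobile else "ch_server"
    return "ch"


def lang_args(lang: str) -> tuple[dict[str, str], list[str]]:
    """Return the value as a Python keyword argument and as CLI tokens."""
    if lang not in OCR_LANG_MODELS:
        raise ValueError(f"unsupported lang {lang!r}; expected one of {sorted(OCR_LANG_MODELS)}")
    return {"lang": lang}, ["--lang", lang]


if __name__ == "__main__":
    lang = choose_lang(handwritten=True)
    kwargs, cli = lang_args(lang)
    print(lang, OCR_LANG_MODELS[lang][0], kwargs, cli)
    # ch_server PP-OCRv5_rec_server {'lang': 'ch_server'} ['--lang', 'ch_server']
```

The sketch deliberately stops short of calling magic_pdf: the notes name the `lang` parameter and the `--lang` option but not the exact call site, so where `kwargs` or `cli` are passed is left to the reader's setup.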
Full Changelog: magic_pdf-1.3.11-released...magic_pdf-1.3.12-released