What's Changed
Release of 1.3.0. This version includes many optimizations and improvements:
- Installation and compatibility optimization
  - Removed the use of layoutlmv3 in layout, resolving the compatibility issues caused by detectron2.
  - Extended Torch version compatibility to 2.2~2.6 (excluding 2.5).
  - CUDA 11.8/12.4/12.6 are supported (the CUDA version is determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
  - Expanded the supported Python versions to 3.10~3.12, fixing the automatic downgrade to 0.6.1 during installation in non-3.10 environments.
  - Optimized the offline deployment process; after a successful deployment, no internet connection is required to download any model files.
- Performance optimization
  - Added batch processing of multiple PDF files (script example), which improves parsing speed for batches of small files (compared to version 1.0.1, with peak gains of over 1400% in formula parsing speed and over 500% in overall parsing speed).
  - Optimized loading and usage of the mfr model, reducing GPU memory usage and improving parsing speed (requires re-running the model download process to obtain incremental updates of the model files).
  - Optimized GPU memory usage; the project now runs with as little as 6GB.
  - Improved running speed on MPS devices.
- Parsing effect optimization
  - Updated the mfr model to unimernet(2503), solving the issue of lost line breaks in multi-line formulas.
- Usability optimization
  - Replaced the paddle framework and paddleocr throughout the project with paddleocr2torch, resolving conflicts between paddle and torch as well as thread-safety issues caused by the paddle framework.
  - Added a real-time progress bar during parsing to track progress accurately, making the wait less painful.
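The supported version ranges listed above can be turned into a quick pre-install check. The following is a minimal sketch and not part of the project: the names `SUPPORTED_PYTHON`, `SUPPORTED_TORCH`, `SUPPORTED_CUDA`, `major_minor`, and `check_environment` are hypothetical, and treating a missing CUDA version as a CPU-only build is an assumption.

```python
from typing import List, Optional, Tuple

# Ranges copied from the release notes; the set names are hypothetical.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}
SUPPORTED_TORCH = {(2, 2), (2, 3), (2, 4), (2, 6)}  # 2.5 is excluded
SUPPORTED_CUDA = {"11.8", "12.4", "12.6"}


def major_minor(version: str) -> Tuple[int, int]:
    """Parse strings such as '3.11.4' or '2.6.0+cu124' into (major, minor)."""
    core = version.split("+", 1)[0]  # drop a local build tag like '+cu124'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def check_environment(
    python_version: str,
    torch_version: str,
    cuda_version: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the versions are in range.

    cuda_version=None is assumed to mean a CPU-only torch build and is not flagged.
    """
    problems = []
    if major_minor(python_version) not in SUPPORTED_PYTHON:
        problems.append(f"Python {python_version} is outside 3.10~3.12")
    if major_minor(torch_version) not in SUPPORTED_TORCH:
        problems.append(f"torch {torch_version} is outside 2.2~2.6 (2.5 is excluded)")
    if cuda_version is not None and cuda_version not in SUPPORTED_CUDA:
        problems.append(f"CUDA {cuda_version} is not one of 11.8/12.4/12.6")
    return problems


if __name__ == "__main__":
    print(check_environment("3.11.4", "2.6.0+cu124", "12.4"))
    print(check_environment("3.9.18", "2.5.1", "12.1"))
```

In a real environment the three arguments could come from `platform.python_version()`, `torch.__version__`, and `torch.version.cuda`.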
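The percentage figures quoted for parsing speed are easier to compare once converted to multipliers. The sketch below assumes the percentages are meant literally as increases in speed over version 1.0.1; the function names `speedup_multiplier` and `remaining_time_fraction` are hypothetical.

```python
def speedup_multiplier(percent_increase: float) -> float:
    """A speed increase of p% means the new speed is (1 + p/100) times the old one."""
    return 1.0 + percent_increase / 100.0


def remaining_time_fraction(percent_increase: float) -> float:
    """Fraction of the original time needed for the same workload at the new speed."""
    return 1.0 / speedup_multiplier(percent_increase)


if __name__ == "__main__":
    # Lower bounds taken from the release notes ("over 1400%", "over 500%").
    for label, pct in (("formula parsing", 1400), ("overall parsing", 500)):
        print(
            f"{label}: +{pct}% -> more than {speedup_multiplier(pct):.0f}x the speed, "
            f"less than {remaining_time_fraction(pct):.1%} of the original time"
        )
```

Read this way, an increase of over 1400% corresponds to more than 15 times the baseline speed, and over 500% to more than 6 times.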
 
New Contributors
- @JesseChen1031 made their first contribution in #1919
 
Full Changelog: magic_pdf-1.2.2-released...magic_pdf-1.3.0-released