What's Changed
- 2025/10/24 2.6.2 Release
  - `pipeline` backend optimizations
    - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may slightly reduce MFR (formula recognition) speed and cause recognition failures on some long formulas, so it is recommended to enable it only when you need to parse Chinese formulas. To disable it, set the environment variable to `0`.
    - `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by @cjsdurj.
    - `OCR` models optimized for better accuracy and coverage in Latin script recognition, and the Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) language models updated to the `ppocr-v5` version, with accuracy improved by over 40% compared to the previous models.
  - `vlm` backend optimizations
    - Optimized the `table_caption` and `table_footnote` matching logic to improve caption and footnote matching accuracy, and to produce a more sensible reading order, on pages with multiple consecutive tables.
    - Reduced CPU usage under high concurrency when using the `vllm` backend, lowering server load.
    - Adapted to `vllm` version 0.11.0.
  - General optimizations
    - Improved cross-page table merging and added support for merging continuation tables across pages, improving merge results in multi-column merge scenarios.
    - Added the `MINERU_TABLE_MERGE_ENABLE` environment variable for the table merging feature. Table merging is enabled by default and can be disabled by setting this variable to `0`.
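As a minimal sketch, assuming MinerU is launched as a child process from Python, the two switches described above can be set per run instead of exported globally. Only the variable names `MINERU_FORMULA_CH_SUPPORT` and `MINERU_TABLE_MERGE_ENABLE` and their documented values come from these notes; the helper `mineru_env` is hypothetical, and the command it would be used with is left to the caller.

```python
import os


def mineru_env(chinese_formulas: bool = False, table_merge: bool = True) -> dict:
    """Return a copy of the current environment with the 2.6.2 switches applied.

    Hypothetical helper: only the variable names and the 1/0 values are taken
    from the release notes.
    """
    env = dict(os.environ)  # copy, so the parent process environment is untouched

    # Experimental Chinese formula support: "1" enables it, "0" disables it.
    env["MINERU_FORMULA_CH_SUPPORT"] = "1" if chinese_formulas else "0"

    if table_merge:
        # Table merging is on by default, so drop any inherited override.
        env.pop("MINERU_TABLE_MERGE_ENABLE", None)
    else:
        # "0" disables cross-page table merging.
        env["MINERU_TABLE_MERGE_ENABLE"] = "0"
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` together with whatever MinerU command you normally run. Removing the table-merge variable rather than setting it to `1` relies only on the documented default (enabled when unset).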
New Contributors
- @wangbinDL made their first contribution in #3615
- @cjsdurj made their first contribution in #3672
- @yongtenglei made their first contribution in #3740
- @magicyuan876 made their first contribution in #3742
Full Changelog: mineru-2.5.4-released...mineru-2.6.2-released