opendatalab/MinerU mineru-2.2.0-released on GitHub

What's Changed

2025/09/05 2.2.0 Released
- Major Updates
  - In this version, we focused on improving table parsing accuracy by introducing a new wired table recognition model and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the pipeline backend.
  - We also added cross-page table merging, available in both the pipeline and vlm backends, further improving the completeness and accuracy of table parsing.
- Other Updates
  - The pipeline backend now parses tables rotated by 270 degrees, so table parsing is supported in the 0/90/270-degree orientations.
  - The pipeline backend now supports OCR for Thai and Greek, and the English OCR model has been updated to the latest version. English recognition accuracy improved by 11%; the Thai recognition model reaches 82.68% accuracy and the Greek recognition model 89.28% (by PPOCRv5).
  - Added a bbox field (coordinates mapped to the 0-1000 range) to the output content_list.json, so users can directly obtain the position of each content block.
  - Removed the pipeline_old_linux installation option. Legacy Linux systems such as CentOS 7 are no longer supported, which allows better support for uv's sync/run commands.
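
Because the bbox values in content_list.json are mapped to a 0-1000 range rather than given in pixels, they need to be rescaled before they can be drawn on a rendered page. The following is a minimal sketch, not MinerU's own API. It assumes each bbox is a four-element list `[x0, y0, x1, y1]` normalized against the page width and height, and that you know the page size in pixels. The helper `bbox_to_pixels`, the sample entry, and the 1200x1600 page size are all hypothetical and used only for illustration.

```python
import json


def bbox_to_pixels(bbox, page_width, page_height, scale=1000):
    """Rescale a bbox from the 0..scale range to pixel coordinates.

    Assumes bbox is [x0, y0, x1, y1]; this layout is an assumption,
    so check it against your own content_list.json.
    """
    x0, y0, x1, y1 = bbox
    return (
        x0 * page_width / scale,
        y0 * page_height / scale,
        x1 * page_width / scale,
        y1 * page_height / scale,
    )


# Hypothetical excerpt standing in for a real content_list.json file.
sample = json.loads(
    '[{"type": "text", "text": "Hello", "page_idx": 0,'
    ' "bbox": [100, 50, 900, 120]}]'
)

for block in sample:
    # Skip blocks without a bbox (for example, output from older versions).
    if "bbox" in block:
        print(block["type"], bbox_to_pixels(block["bbox"], 1200, 1600))
```

Keeping the scale as a parameter makes the same helper usable if you prefer to work in PDF points or another page unit instead of pixels.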

New Contributors

@yeahjack made their first contribution in #3269
@gary-Shen made their first contribution in #3339
@loveRhythm1990 made their first contribution in #3281

Full Changelog: mineru-2.1.11-released...mineru-2.2.0-released