opendatalab/MinerU mineru-3.0.0-released on GitHub

What's Changed

2026/03/29 3.0.0 Released
This release delivers a systematic upgrade centered on parsing capability, system architecture, and engineering usability. The main updates include:
- Native DOCX parsing
  - Official support for native DOCX parsing, delivering high-precision results without hallucinations.
  - Compared with the traditional workflow of converting DOCX to PDF before parsing, end-to-end parsing is tens of times faster, which suits scenarios that demand both accuracy and throughput.
- pipeline backend upgrade
  - The pipeline backend achieves a score of 86.2 on OmniDocBench (v1.5), surpassing the accuracy of the previous-generation mainstream VLM MinerU2.0-2505-0.9B.
  - Added parsing of images and formulas inside tables, seal text recognition, vertical text recognition, and recognition of interline formula numbers, improving parsing quality for complex documents.
  - While maintaining high accuracy, it keeps resource usage extremely low and continues to support inference in pure CPU environments.
- API / CLI / Router orchestration upgrade
  - mineru now runs as an orchestration client based on mineru-api; when --api-url is not provided, it will automatically start a local temporary service.
  - mineru-api adds a new asynchronous task endpoint POST /tasks, supporting task submission, status querying, and result retrieval; meanwhile, it retains the synchronous parsing endpoint POST /file_parse for compatibility with legacy plugins.
  - Added mineru-router, designed for unified entry deployment and task routing across multiple services and multiple GPUs; its interfaces are fully compatible with mineru-api and support automatic task load balancing.
- Deployment and usability improvements
  - Resolved compatibility issues with torch >= 2.8; the base image has been upgraded to vllm 0.11.2 + torch 2.9.0, unifying installation paths across different Compute Capabilities.
  - Optimized the parsing pipeline with a sliding-window mechanism, significantly reducing peak memory usage in long-document scenarios, so documents with tens of thousands of pages no longer need to be split manually.
  - Batch inference in the pipeline backend now supports streaming writes to disk, so completed parsing results are written out promptly, which improves the experience for long-running tasks.
  - Completed thread-safety optimization and now fully supports multi-threaded concurrent inference; together with mineru-router, this enables one-click multi-GPU deployment and makes it easy to build high-concurrency, high-throughput parsing systems.
  - Completely removed the use of two AGPLv3 models (doclayoutyolo and mfd_yolov8) and one CC-BY-NC-SA 4.0 model (layoutreader).
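The sliding-window and streaming-write items above can be illustrated with a minimal sketch. These notes do not describe MinerU's actual implementation, so this is a generic illustration only: `sliding_windows`, `parse_streaming`, the `parse_page` callback, and the JSON Lines output format are all hypothetical names and choices, not MinerU APIs. The idea is that only one window of pages is held in memory at a time, and each window's results are flushed to disk before the next window starts.

```python
import json
from typing import Any, Callable, Dict, Iterator


def sliding_windows(total_pages: int, window_size: int) -> Iterator[range]:
    """Yield consecutive page ranges of at most `window_size` pages."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, total_pages, window_size):
        yield range(start, min(start + window_size, total_pages))


def parse_streaming(
    total_pages: int,
    window_size: int,
    parse_page: Callable[[int], Dict[str, Any]],  # hypothetical per-page parser
    out_path: str,
) -> int:
    """Parse one window at a time and append results to disk as JSON Lines.

    Returns the largest number of page results held in memory at once,
    which is bounded by `window_size` rather than by `total_pages`.
    """
    peak_in_memory = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for window in sliding_windows(total_pages, window_size):
            # Only the current window's results are kept in memory.
            results = [parse_page(i) for i in window]
            peak_in_memory = max(peak_in_memory, len(results))
            for result in results:
                out.write(json.dumps(result, ensure_ascii=False) + "\n")
            # Flush so finished pages are on disk before the next window.
            out.flush()
    return peak_in_memory
```

With this shape, a document of any length keeps at most `window_size` page results in memory, and partial output is already on disk if a long task is interrupted.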
This update is more than a set of feature enhancements; it is a significant step forward in MinerU's overall system capabilities. We specifically addressed peak memory usage in long-document parsing: with sliding windows and streaming writes to disk, parsing ultra-long documents no longer requires manual splitting and careful handling, and is now stable, scalable, and ready for production workloads. We also completed thread-safety optimization and enabled multi-threaded concurrent inference, improving single-machine resource utilization and runtime stability under high concurrency. Building on this, mineru-router and the new API / CLI orchestration framework give MinerU one-click multi-GPU deployment, unified access across multiple services, and automatic task load balancing, which significantly lowers the difficulty of large-scale deployment. As a result, MinerU is evolving from a standalone data production tool into a large-scale document parsing foundation for high-concurrency, high-throughput scenarios, giving enterprise document data processing infrastructure that is more stable, more efficient, and easier to scale.
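The asynchronous task flow (submit, query status, retrieve result) can be sketched as a polling loop on the client side. These notes name the POST /tasks endpoint but do not give the status-query path or the response schema, so the sketch below deliberately leaves the HTTP call out: `wait_for_task`, the `get_status` callback, the `"state"` field, and the `"done"` / `"failed"` values are all hypothetical and would need to be adapted to the real mineru-api responses.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    get_status: Callable[[str], Dict[str, Any]],  # hypothetical: wraps the status query
    task_id: str,
    interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll a task until it reaches a terminal state or the timeout expires.

    The "state" key and its "done"/"failed" values are assumptions made
    for illustration; they are not taken from the mineru-api schema.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(task_id)
        if status.get("state") in ("done", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)
```

In real use, `get_status` would wrap an HTTP request to the service given by --api-url (or to mineru-router, whose interfaces the notes describe as compatible with mineru-api). Injecting it as a callback keeps the polling logic testable without a running service.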

New Contributors

@boshi91 made their first contribution in #4523
@Niujunbo2002 made their first contribution in #4662

Full Changelog: mineru-2.7.6-released...mineru-3.0.0-released