github opendatalab/MinerU magic_pdf-0.6.2b1-released

13 months ago

What's Changed

  • Optimized the model loading logic: models are now loaded only once during batch processing.
  • Command-line interface now supports batch input.
  • When an import fails, the complete error message is now printed to facilitate troubleshooting.
  • Fixed a bug where overlapping spans were incorrectly removed multiple times.
  • Improved OCR recognition areas, doubling the OCR speed.
  • Embedded language identification models within the whl package for easier offline deployment.
  • Replaced interline_equation_blocks with interline_equations to enhance interline formula recognition capabilities in non-academic paper scenarios.
  • Added page number indexing to the output results of content_list.
  • Locked some dependency versions and adjusted the dependency installation logic to reduce conflicts and redundant installations, cutting the number of packages by 30% and improving the first-time installation success rate.
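
The "single load during batch processing" item can be illustrated with a generic load-once pattern. The sketch below is not MinerU's implementation: `DummyModel`, `get_model`, `process_batch`, and the model name "layout" are hypothetical names chosen for illustration. It assumes only that model construction is expensive and that every file in a batch can share one instance.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the stand-in model is actually built


class DummyModel:
    """Hypothetical stand-in for an expensive layout/OCR model."""

    def predict(self, path: str) -> str:
        return f"parsed:{path}"


@lru_cache(maxsize=None)
def get_model(name: str) -> DummyModel:
    """Build the model on first request; later calls reuse the cached instance."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return DummyModel()


def process_batch(paths: list[str]) -> list[str]:
    """Run every file in the batch through the same cached model instance."""
    return [get_model("layout").predict(p) for p in paths]


results = process_batch(["a.pdf", "b.pdf", "c.pdf"])
```

Because `get_model` is memoized by model name, the constructor runs once regardless of how many files the batch contains, so the load cost is paid once per batch rather than once per file.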

New Contributors

Full Changelog: magic_pdf-0.6.1-released...magic_pdf-0.6.2b1-released
