github opendatalab/MinerU magic_pdf-0.8.0-released

12 months ago

What's Changed

feat:

  • Add RAG API
  • Integrate RAG into the llama_index project
  • Update Dockerfile
  • Fine-grained model singleton, reducing memory usage and speeding up initialization
  • Add page-range parameters to the CLI and API, allowing custom start and end pages for parsing
  • Support image footnotes
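
As a rough illustration of the "fine-grained model singleton" item above, the sketch below caches one model instance per (name, configuration) key, so a repeated request reuses the already-loaded object instead of initializing it again. This is not MinerU's actual code: `ModelRegistry`, `get`, and the `factory` argument are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ModelRegistry:
    """Process-wide cache that builds each model at most once per key.

    Hypothetical helper for illustration only; not part of magic_pdf.
    """

    _instance = None

    def __new__(cls):
        # Classic singleton: every ModelRegistry() call returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models = {}
        return cls._instance

    def get(self, name: str, factory: Callable[..., Any], **kwargs: Hashable) -> Any:
        # "Fine-grained": the cache key includes the configuration, so two
        # configurations of the same model are stored separately, while an
        # identical request is served from the cache.
        key: Tuple[str, Tuple] = (name, tuple(sorted(kwargs.items())))
        models: Dict[Tuple[str, Tuple], Any] = self._models
        if key not in models:
            models[key] = factory(**kwargs)
        return models[key]
```

With this shape, an expensive loader runs once per distinct configuration, which is the kind of saving in memory and start-up time the item describes.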

bugfix:

  • When removing the smaller overlapping block, retain the boundary information of that block
  • Lower the threshold for filling spans into blocks from 0.6 to 0.3
  • Fix loss of low-score lines in the OCR detection (DET) stage
  • Merge multiple spans of a single line in the OCR detection (DET) stage
  • Improve the segmentation logic for English words that are stuck together
  • Fix inaccurate layout boxes
  • Fix merging of words that were split across line breaks
  • Fix certain special characters appearing in the final output
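
To make the span threshold change above more concrete, here is a minimal sketch of one plausible reading: a span is assigned to a block when the fraction of the span's area that overlaps the block exceeds a threshold. The function names, the bounding-box layout `(x0, y0, x1, y1)`, and the exact overlap measure are assumptions for illustration, not taken from the MinerU source.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def overlap_ratio(span: BBox, block: BBox) -> float:
    """Fraction of the span's area that lies inside the block."""
    x0, y0 = max(span[0], block[0]), max(span[1], block[1])
    x1, y1 = min(span[2], block[2]), min(span[3], block[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    span_area = (span[2] - span[0]) * (span[3] - span[1])
    return inter / span_area if span_area > 0 else 0.0


def fill_spans(
    blocks: List[BBox], spans: List[BBox], threshold: float = 0.3
) -> Dict[int, List[BBox]]:
    """Assign each span to the first block it overlaps by more than threshold."""
    assigned: Dict[int, List[BBox]] = {i: [] for i in range(len(blocks))}
    for span in spans:
        for i, block in enumerate(blocks):
            if overlap_ratio(span, block) > threshold:
                assigned[i].append(span)
                break
    return assigned
```

Under this reading, a span that sits half inside a block (ratio 0.5) is dropped at a threshold of 0.6 but kept at 0.3, so the lower value retains more text near block edges.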

Full Changelog: magic_pdf-0.7.1-released...magic_pdf-0.8.0-released
