github opendatalab/MinerU magic_pdf-0.10.2-released

10 months ago

What's Changed

  • fix(pdf_parse): Move the logic for filling text content into spans before the discarded_block recognition to fix the issue of empty text blocks in discarded_block. by @myhloli in #1082
  • refactor(txt_spans_extract_v2): optimize span processing and OCR logic by @myhloli in #1086
  • feat(ocr): filter out low confidence ocr results by @myhloli in #1088
  • feat(pdf_parse): add OCR score to span data by @myhloli in #1089
  • fix: test_rag by @icecraft in #1105
  • perf(image_processing): reduce maximum image size for analysis by @myhloli in #1106
  • fix: test_tools unittest by @icecraft in #1104
  • refactor(libs): remove unused imports and functions by @myhloli in #1112
  • Feat/add s3 read write example by @icecraft in #1117
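
The two OCR entries above (#1088 and #1089) describe dropping low-confidence OCR results and storing the OCR score on each span. As a rough illustration only, the idea might look like the sketch below. It is not MinerU's actual code: the function name `filter_ocr_spans`, the `(bbox, text, score)` tuple layout, the span dictionary keys, and the `0.5` threshold are all assumptions made for this example, and the release notes do not state the real threshold.

```python
# Hypothetical threshold; the release notes do not say which value MinerU uses.
MIN_OCR_SCORE = 0.5


def filter_ocr_spans(ocr_results, min_score=MIN_OCR_SCORE):
    """Drop OCR results below min_score and keep the score on each surviving span.

    ocr_results is assumed to be an iterable of (bbox, text, score) tuples;
    this layout is illustrative, not MinerU's internal format.
    """
    spans = []
    for bbox, text, score in ocr_results:
        if score < min_score:
            # Low-confidence recognition: skip it rather than emit noisy text.
            continue
        # Carry the score along so later stages can inspect it.
        spans.append({"bbox": bbox, "content": text, "score": score})
    return spans


sample = [
    ([0, 0, 50, 10], "Invoice", 0.98),
    ([0, 12, 50, 22], "l1|I", 0.31),  # likely misread, below the threshold
]
spans = filter_ocr_spans(sample)
print(spans)
```

Keeping the score on the span, rather than discarding it after filtering, lets downstream steps apply their own stricter cutoffs without rerunning OCR.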

Full Changelog: magic_pdf-0.10.1-released...magic_pdf-0.10.2-released
