github opendatalab/MinerU magic_pdf-0.10.3-released

9 months ago

What's Changed

  • fix(Hybrid OCR): enable hybrid OCR for empty spans that contain a certain number of placeholders but no actual text by @myhloli in #1132
  • refactor(para): improve language detection and block splitting by @myhloli in #1134
  • feat(pdf_parse): filter out skewed text lines by @myhloli in #1135
  • refactor(ocr): improve text processing and span handling by @myhloli in #1136
  • refactor(pdf_check): improve character detection using PyMuPDF by @myhloli in #1137
  • feat(pdf_parse): add line start flag detection and optimize line stop flag logic by @myhloli in #1138
  • fix(ocr_mkcontent): handle empty paragraphs on pages by @myhloli in #1139
  • refactor(pdf_parse): adjust character-axis alignment algorithm by @myhloli in #1140
  • refactor(ocr): fix PaddleOCR failing to initialize in a multi-threaded environment by @myhloli in #1141
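The notes do not say how #1135 decides that a text line is skewed. As an illustration only, here is a minimal sketch of one way such a filter could work. It assumes PyMuPDF's `page.get_text("dict")` output, where each text line carries a `dir` entry holding its writing direction as a (cos, sin) unit vector. The function names and the 2° tolerance are hypothetical, and this is not MinerU's implementation.

```python
import math

# Hypothetical tolerance (not taken from MinerU): lines rotated more than
# this many degrees away from horizontal are treated as skewed.
MAX_SKEW_DEGREES = 2.0


def line_angle_degrees(line):
    """Angle of a text line relative to the x-axis, in degrees.

    Assumes PyMuPDF's get_text("dict") layout, where each line stores its
    writing direction as "dir": a (cos, sin) unit vector.
    """
    cos_a, sin_a = line["dir"]
    return math.degrees(math.atan2(sin_a, cos_a))


def is_skewed(line, max_skew=MAX_SKEW_DEGREES):
    """True if the line deviates from left-to-right horizontal by more than max_skew."""
    return abs(line_angle_degrees(line)) > max_skew


def filter_skewed_lines(page_dict, max_skew=MAX_SKEW_DEGREES):
    """Return a copy of a get_text("dict") result with skewed text lines removed.

    Non-text blocks (type != 0, e.g. images) are kept unchanged; text blocks
    left with no lines are dropped. The input dict is not modified.
    """
    kept_blocks = []
    for block in page_dict.get("blocks", []):
        if block.get("type", 0) != 0:
            kept_blocks.append(block)
            continue
        lines = [ln for ln in block.get("lines", []) if not is_skewed(ln, max_skew)]
        if lines:
            kept_blocks.append({**block, "lines": lines})
    return {**page_dict, "blocks": kept_blocks}


# Example usage with PyMuPDF (requires a PDF on disk):
#   import fitz
#   page = fitz.open("input.pdf")[0]
#   cleaned = filter_skewed_lines(page.get_text("dict"))
```

Because this sketch measures deviation from horizontal, vertical (90°) and upside-down (180°) lines are also dropped. A real parser would likely need to treat intentionally rotated text separately.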

Full Changelog: magic_pdf-0.10.2-released...magic_pdf-0.10.3-released
