github opendataloader-project/opendataloader-pdf v2.4.0
Release v2.4.0

What's Changed

  • feat(helium): Hancom AI hybrid backend + PDF/UA-2 remediation improvements by @bundolee in #446
  • security: bump lxml 6.0.2 -> 6.1.0 to resolve XXE advisory by @hyunhee-jo in #447
  • chore(deps): bump Python deps to resolve transformers RCE advisory by @bundolee in #453
  • Auto-tagging: update creation of Link and Aside structure elements by @MaximPlusov in #455
  • Create struct destination from each destination by @LonelyMidoriya in #457
  • fix(deps): pin veraPDF and jackson-databind to fixed versions in published POM by @bundolee in #461
  • feat(hybrid): allow disabling OCR and selecting non-EasyOCR engines by @hyunhee-jo in #460
  • feat(cli): make per-page parallelism opt-in via --threads (default 1) by @bundolee in #454
  • feat(cli): --hybrid-hancom-ai-* options + reusable CLIOptions API by @bundolee in #462
  • feat(api): add OutputWriter for two-phase extraction-then-output flow by @bundolee in #464
  • docs(readme): mark auto-tagging as shipped and stop overpromising compliance by @bundolee in #466

Full Changelog: v2.3.0...v2.4.0
