opendataloader-project/opendataloader-pdf v2.4.4
Release v2.4.4

What's Changed

  • fix(python): deduplicate CLI failure output by @hyunhee-jo in #495
  • Add check for PUA in Alt entry by @LonelyMidoriya in #491
  • chore(deps): bump urllib3 to 2.7.0 and python-multipart to 0.0.28 for security advisories by @hyunhee-jo in #498
  • fix(cli): emit error for non-PDF top-level input by @hyunhee-jo in #496
  • Tagged PDF - Add id to content items by @LonelyMidoriya in #501
  • fix(hybrid)!: fail fast when backend left pages unprocessed with fallback disabled by @hyunhee-jo in #499
  • fix(hybrid): forward --picture-description-prompt to docling VLM by @bundolee in #503
  • fix(cli): announce when a folder contains zero processable PDFs (PDFDLOSP-15) by @bundolee in #507
  • fix(cli): replace verapdf password exception stack trace with friendly error by @bundolee in #504
  • fix: clean error for .pdf-named non-PDF input (PDFDLOSP-14) by @bundolee in #506
  • fix(cli): separate markdown modifiers from --format values (PDFDLOSP-6) by @bundolee in #508
  • Tagged PDF - Set ID for artifacts by @LonelyMidoriya in #509
  • Add new rules for text sanitization by @LonelyMidoriya in #510
  • docs(use-struct-tree): clarify that output quality depends on tag quality by @bundolee in #512
  • fix(processors): log actual page count being processed, not document total by @bundolee in #513
  • fix(hybrid): activate --hybrid-fallback on server-absent path (PDFDLOSP-21) by @hnc-jglee in #511
  • fix(tagged-pdf): refuse encrypted documents with friendly message by @bundolee in #514
  • fix(generators): emit page separator only for pages selected by --pages by @hyunhee-jo in #516
  • Update verapdf version by @MaximPlusov in #517

Full Changelog: v2.4.3...v2.4.4
