github py-pdf/pypdf 2.2.0
Version 2.2.0, 2022-06-13


What's Changed

The 2.2.0 release improves text extraction (#969 - again by @pubpub-zz 🙏):

  • Improvements around /Encoding and /ToUnicode
  • Extraction of CMaps improved
  • Fallback when a font definition is missing
  • Support for /Identity-H and /Identity-V (decoded as UTF-16-BE)
  • Support for /GB-EUC-H, /GB-EUC-V, /GBpc-EUC-H and /GBpc-EUC-V (beta release for evaluation)
  • Support for Arabic (for evaluation)
  • Whitespace extraction improvements
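One way to picture the encoding items above: for each string in a content stream, use the font's /ToUnicode CMap when it is present, and otherwise fall back to a predefined CMap such as /Identity-H (read as UTF-16-BE) or /GB-EUC-H. The following is a minimal, self-contained sketch of that idea, not pypdf's actual code. `PREDEFINED_CMAP_CODECS`, `parse_to_unicode` and `decode_text` are hypothetical names, the Latin-1 last resort is an assumption, and the parser only handles the simple bfchar/bfrange forms with two-byte codes and single BMP characters as targets.

```python
import re
from typing import Dict, Optional

# Hypothetical lookup: predefined CMap names from the release notes mapped to
# Python codecs. Reading /Identity-H codes as UTF-16-BE only works when the
# font's CIDs happen to equal Unicode code points, so it is a fallback.
PREDEFINED_CMAP_CODECS = {
    "/Identity-H": "utf-16-be",
    "/Identity-V": "utf-16-be",
    "/GB-EUC-H": "gb2312",    # EUC-CN encoding of GB 2312
    "/GB-EUC-V": "gb2312",
    "/GBpc-EUC-H": "gb2312",  # assumption: close enough for a sketch
    "/GBpc-EUC-V": "gb2312",
}

# Matches "beginbfchar ... endbfchar" and "beginbfrange ... endbfrange".
BLOCK_RE = re.compile(r"begin(bfchar|bfrange)(.*?)end\1", re.S)
HEX_RE = re.compile(r"<([0-9A-Fa-f]+)>")


def parse_to_unicode(cmap_text: str) -> Dict[int, str]:
    """Parse the simple bfchar/bfrange forms of a ToUnicode CMap.

    The array form of bfrange (<lo> <hi> [<a> <b> ...]) is not handled.
    """
    mapping: Dict[int, str] = {}
    for kind, body in BLOCK_RE.findall(cmap_text):
        hexes = HEX_RE.findall(body)
        step = 2 if kind == "bfchar" else 3
        for i in range(0, len(hexes) - step + 1, step):
            if kind == "bfchar":
                # <src> <dst>: one code maps to a UTF-16-BE string.
                src, dst = hexes[i], hexes[i + 1]
                mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
            else:
                # <lo> <hi> <dst>: consecutive codes map to consecutive
                # characters, assuming dst is a single BMP character.
                lo, hi, dst = hexes[i:i + 3]
                first = ord(bytes.fromhex(dst).decode("utf-16-be"))
                for offset in range(int(hi, 16) - int(lo, 16) + 1):
                    mapping[int(lo, 16) + offset] = chr(first + offset)
    return mapping


def decode_text(raw: bytes, encoding_name: str,
                to_unicode: Optional[Dict[int, str]] = None) -> str:
    """Decode a string operand, preferring /ToUnicode over a predefined CMap."""
    if to_unicode:
        # Assumes 2-byte codes, as used by /Identity-H fonts.
        return "".join(
            to_unicode.get(int.from_bytes(raw[i:i + 2], "big"), "\ufffd")
            for i in range(0, len(raw) - 1, 2)
        )
    # Hypothetical last resort for unknown encodings: Latin-1.
    codec = PREDEFINED_CMAP_CODECS.get(encoding_name, "latin-1")
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    cmap = """
    2 beginbfchar
    <0003> <0020>
    <0024> <0041>
    endbfchar
    1 beginbfrange
    <0044> <0046> <0061>
    endbfrange
    """
    table = parse_to_unicode(cmap)
    print(decode_text(bytes.fromhex("002400030044"), "/Identity-H", table))  # A a
    print(decode_text("Привет".encode("utf-16-be"), "/Identity-H"))        # Привет
    print(decode_text("中文".encode("gb2312"), "/GB-EUC-H"))                # 中文
```

Checking /ToUnicode first reflects that it is the font's explicit statement of which Unicode text each code represents, whereas the predefined-CMap branch is only a guess that happens to work for many Cyrillic and CJK documents.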

These changes should mainly improve text extraction for non-ASCII scripts,
e.g. Russian, Chinese, Japanese, Korean and Arabic.

Full Changelog: 2.1.1...2.2.0
