py-pdf/pypdf 2.2.0
Version 2.2.0, 2022-06-13

What's Changed

The 2.2.0 release improves text extraction (#969 - again by @pubpub-zz 🙏):

Improvements around /Encoding / /ToUnicode
Extraction of CMaps improved
Fallback when a font definition is missing
Support for /Identity-H and /Identity-V: utf-16-be
Support for /GB-EUC-H / /GB-EUC-V / /GBpc-EUC-H / /GBpc-EUC-V (beta release for evaluation)
Arabic text support (for evaluation)
Whitespace extraction improvements

Those changes should mainly improve the text extraction for non-ASCII alphabets,
e.g. Russian / Chinese / Japanese / Korean / Arabic.
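
As a rough illustration of the /Identity-H and /Identity-V item above, the sketch below decodes a run of two-byte character codes as UTF-16-BE using only the Python standard library. The helper name `decode_identity_bytes` is hypothetical and is not part of the library's API. It is a simplified picture of the idea under the assumption that the two-byte codes coincide with Unicode code points, not the actual implementation from #969.

```python
def decode_identity_bytes(raw: bytes) -> str:
    """Decode two-byte character codes as UTF-16-BE.

    Hypothetical helper for illustration only; it assumes every
    two-byte code in `raw` is already a Unicode code point.
    """
    # errors="replace" substitutes U+FFFD for malformed input
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-16-be", errors="replace")


# Build sample input: Russian text as big-endian two-byte codes.
sample = "Привет".encode("utf-16-be")
print(decode_identity_bytes(sample))
```

In practice none of this is called directly: the extraction changes listed above are reached through a page's `extract_text()` method.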

Full Changelog: 2.1.1...2.2.0
