Added
- Add `x_tolerance_ratio` parameter to `extract_text` and similar functions, to account for text size when spacing characters (instead of a fixed number of pixels) (h/t @afriedman412). (#1041)
- Add support for PDF 1.3 logical structure via `Page.structure_tree` (h/t @dhdaines). (#963)
- Add "gswin64c" as another possible Ghostscript executable in `repair.py` (h/t @echedey-ls). (#1032)
- Re-add `Page.close()` method, have `PDF.close()` close all pages as well, and improve relevant documentation (h/t @luketudge). (#1042)
- Add `force_mediabox` parameter to `Page.to_image(...)`. (#1054)
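
To illustrate the idea behind `x_tolerance_ratio`, here is a minimal, self-contained sketch of how a size-relative tolerance differs from a fixed one. It is not pdfplumber's implementation: the helper names `effective_x_tolerance` and `starts_new_word`, the sample character dicts, and the ratio `0.15` are all assumptions made for illustration. Only the `x0`, `x1`, and `size` keys mirror the character attributes pdfplumber exposes.

```python
def effective_x_tolerance(prev_char, x_tolerance=3, x_tolerance_ratio=None):
    """Return the horizontal gap allowed before a word break is inferred.

    Hypothetical helper: with no ratio, fall back to the fixed tolerance;
    otherwise scale the tolerance by the previous character's font size.
    """
    if x_tolerance_ratio is None:
        return x_tolerance
    return x_tolerance_ratio * prev_char["size"]


def starts_new_word(prev_char, next_char, x_tolerance=3, x_tolerance_ratio=None):
    """Return True if the gap between two characters exceeds the tolerance."""
    gap = next_char["x0"] - prev_char["x1"]
    return gap > effective_x_tolerance(prev_char, x_tolerance, x_tolerance_ratio)


# Small text: a 2.5-unit gap is a real space at size 8.
small_a = {"x0": 10.0, "x1": 14.0, "size": 8.0}
small_b = {"x0": 16.5, "x1": 20.5, "size": 8.0}

# Large text: a 5-unit gap is ordinary letter spacing at size 60.
large_a = {"x0": 10.0, "x1": 40.0, "size": 60.0}
large_b = {"x0": 45.0, "x1": 75.0, "size": 60.0}

# Fixed tolerance of 3 misjudges both pairs.
fixed_small = starts_new_word(small_a, small_b)
fixed_large = starts_new_word(large_a, large_b)

# A size-relative tolerance adapts to each pair.
ratio_small = starts_new_word(small_a, small_b, x_tolerance_ratio=0.15)
ratio_large = starts_new_word(large_a, large_b, x_tolerance_ratio=0.15)
```

In pdfplumber itself, the equivalent would presumably be a call such as `page.extract_text(x_tolerance_ratio=0.15)`, where the ratio value is again only an example.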
Fixed
- Standardize handling of cropbox, fixing various issues with PageImage. (#1054)
- Fix `Page.get_textmap` caching to allow for `extra_attrs=[...]`, by preconverting list kwargs to tuples. (#1030)
- Explicitly close `pypdfium2.PdfDocument` in `get_page_image` (h/t @dhdaines). (#1090)
- In `PDFPageAggregatorWithMarkedContent.tag_cur_item`, check `self.cur_item._objs` length before trying to access `[-1]`. (4f39d03)