Changed
- Make word segmentation (via `WordExtractor.char_begins_new_word(...)`) more explicit and rigorous; this should help catch edge cases in the future. (6acd580 + ebb93ea + #840)
- Use `curve_edge` objects (instead of just `line` and `rect_edge` objects) in the default table-detection strategy. (6f6b465 + #858)
- By default, expand ligatures into their constituent letters (e.g., `ﬃ` to `ffi`), and add the `expand_ligatures` boolean parameter to text-extraction methods. (86e935d + #598)
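To illustrate what expanding ligatures means in practice, here is a minimal pure-Python sketch. The mapping table and the `expand_ligatures` helper function below are hypothetical stand-ins written for this example; they are not the library's internal implementation, only an approximation of the behavior the entry describes.

```python
# Illustrative mapping from single-glyph Unicode ligatures to their letters.
# Assumption: this table is written for the example, not taken from the library.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
    "\ufb06": "st",   # LATIN SMALL LIGATURE ST
}


def expand_ligatures(text: str, expand: bool = True) -> str:
    """Replace each ligature character with its constituent letters.

    The `expand` flag mirrors the idea of a boolean switch: when False,
    the text is returned untouched.
    """
    if not expand:
        return text
    return "".join(LIGATURES.get(ch, ch) for ch in text)


sample = "o\ufb03ce \ufb01le"  # "office file" written with two ligature glyphs
expanded = expand_ligatures(sample)
unchanged = expand_ligatures(sample, expand=False)
```

With expansion on, the extracted text becomes searchable with plain ASCII patterns such as `office`; with it off, the original glyphs are preserved.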
Added
- Add `Page.extract_text_lines(...)` method. (4b37397 + #852)
- Add `main_group`, `return_groups`, `return_chars` parameters to `Page.search(...)`. (4b37397)
- Add `.curve_edges` property to `PDF` and `Page`. (6f6b465)
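As a rough illustration of how parameters named `main_group` and `return_groups` could shape search results, the sketch below uses only the standard `re` module on a plain string. The `search_sketch` function, the result keys, and the `start`/`end` offsets are assumptions made for this example (offsets stand in for positional information); `return_chars` is not modeled because it depends on per-character PDF data.

```python
import re
from typing import Any, Dict, List


def search_sketch(
    text: str,
    pattern: str,
    main_group: int = 0,
    return_groups: bool = True,
) -> List[Dict[str, Any]]:
    """Find regex matches and report the chosen group as the match text.

    Hypothetical helper: `main_group` selects which capture group is
    reported as the match; `return_groups` toggles inclusion of all groups.
    """
    results = []
    for m in re.finditer(pattern, text):
        hit: Dict[str, Any] = {
            "text": m.group(main_group),
            "start": m.start(main_group),
            "end": m.end(main_group),
        }
        if return_groups:
            hit["groups"] = m.groups()
        results.append(hit)
    return results


text = "Invoice 1042 due 2023-04-01; Invoice 1043 due 2023-05-01"
pattern = r"Invoice (\d+) due (\d{4}-\d{2}-\d{2})"

# Report only the invoice number (group 1) as the match text.
hits = search_sketch(text, pattern, main_group=1)
# Same search without the tuple of captured groups.
bare_hits = search_sketch(text, pattern, main_group=1, return_groups=False)
```

Selecting a main group lets a broad pattern provide context (here, the word "Invoice" and the due date) while the reported match covers only the part of interest.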