pypi pdfplumber 0.3.0

8 years ago

A ton of improvements and new features:

  • Shifts to a lazy-loading paradigm, so that you don't have to process an entire PDF just to access one page.
  • Strips out pandas requirement and usage.
    • Results in a 3x-ish speedup for within_bbox and similar methods, thanks to short-circuiting & operators.
  • Moves from floats to Decimals to improve accuracy of equality comparisons.
  • Moves to a more modular architecture, adding Container, Page, and CroppedPage classes.
  • Adds Page.crop(...).
  • Adds Page.extract_table(...) for Tabula-like functionality.
  • Adds PDF.metadata property.
  • Adds derived properties Container.rect_edges and Container.edges, decomposing each rectangle into its constituent lines.
  • Renames collate_chars(...) to get_text(...) (while retaining a reference to the former).
  • Enriches the command-line tool's JSON output to include PDF metadata and page dimensions. [https://github.com//issues/3]
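The lazy-loading change means a page is only parsed when you ask for it. The sketch below illustrates that idea in plain Python. It is not pdfplumber's implementation: the `LazyPages` class and its `parse_page` callback are hypothetical names, assumed here only to show "parse on first access, then cache".

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class LazyPages(Generic[T]):
    """Hypothetical page container: parses a page only on first access."""

    def __init__(self, count: int, parse_page: Callable[[int], T]) -> None:
        self._count = count
        self._parse_page = parse_page  # assumed callback doing the real parsing
        self._cache: Dict[int, T] = {}

    def __len__(self) -> int:
        # The page count is known up front; no page has been parsed yet.
        return self._count

    def __getitem__(self, i: int) -> T:
        if not 0 <= i < self._count:
            raise IndexError(i)
        # Parse on first request, then reuse the cached result.
        if i not in self._cache:
            self._cache[i] = self._parse_page(i)
        return self._cache[i]
```

With this design, reading page 3 of a 100-page document triggers exactly one parse, and a second read of page 3 hits the cache.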
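The within_bbox speedup comes from dropping pandas, whose boolean masks evaluate every condition for every row. A short-circuiting filter can reject an object after its first failed comparison. The sketch below assumes objects are dicts with `x0`, `top`, `x1`, `bottom` keys and a bounding box ordered `(x0, top, x1, bottom)`. The helper is a hypothetical illustration, not pdfplumber's source.

```python
from typing import Dict, Iterable, List, Tuple

Obj = Dict[str, float]
BBox = Tuple[float, float, float, float]  # assumed order: (x0, top, x1, bottom)


def within_bbox(objs: Iterable[Obj], bbox: BBox) -> List[Obj]:
    """Hypothetical helper: keep objects lying entirely inside bbox."""
    x0, top, x1, bottom = bbox
    return [
        o
        for o in objs
        # Python's `and` stops at the first False comparison, so most
        # objects outside the box are rejected after one or two checks.
        if o["x0"] >= x0 and o["x1"] <= x1 and o["top"] >= top and o["bottom"] <= bottom
    ]
```

For example, with a box of `(0, 0, 100, 100)`, an object spanning `x0=10` to `x1=20` is kept, while one starting at `x0=90` and ending at `x1=120` is dropped on the second comparison.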
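To see why Decimals improve equality comparisons, consider two coordinates that should add up to a third. Binary floats cannot represent values like 0.1 exactly, so the sum misses. Decimals built from strings keep exact base-10 values. The example below uses only Python's standard `decimal` module. The function names and coordinate values are made up for illustration and do not come from pdfplumber.

```python
from decimal import Decimal


def edges_align_float(a: float, b: float, target: float) -> bool:
    # 0.1 + 0.2 evaluates to 0.30000000000000004 in binary floating point.
    return a + b == target


def edges_align_decimal(a: str, b: str, target: str) -> bool:
    # Decimals constructed from strings keep the exact decimal digits.
    return Decimal(a) + Decimal(b) == Decimal(target)
```

Here `edges_align_float(0.1, 0.2, 0.3)` is `False`, while `edges_align_decimal("0.1", "0.2", "0.3")` is `True`.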
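The rect_edges property decomposes each rectangle into the lines that bound it. A minimal sketch of that decomposition follows, assuming the same `x0`/`top`/`x1`/`bottom` dict keys as above. The `orientation` key with values `"h"` and `"v"` is an assumption made for this illustration. The `rect_to_edges` helper is hypothetical and only shows the geometry, not pdfplumber's exact output format.

```python
from typing import Dict, List, Union

Edge = Dict[str, Union[str, float]]


def rect_to_edges(rect: Dict[str, float]) -> List[Edge]:
    """Hypothetical helper: split one rectangle into its four edges."""
    x0, top, x1, bottom = rect["x0"], rect["top"], rect["x1"], rect["bottom"]
    return [
        # Horizontal edges have zero height (top == bottom).
        {"orientation": "h", "x0": x0, "top": top, "x1": x1, "bottom": top},
        {"orientation": "h", "x0": x0, "top": bottom, "x1": x1, "bottom": bottom},
        # Vertical edges have zero width (x0 == x1).
        {"orientation": "v", "x0": x0, "top": top, "x1": x0, "bottom": bottom},
        {"orientation": "v", "x0": x1, "top": top, "x1": x1, "bottom": bottom},
    ]
```

A table cell drawn as one rectangle thus contributes two horizontal and two vertical line segments, which is the form table-detection logic typically works with.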
