github docling-project/docling v2.37.0

latest releases: v2.51.0, v2.50.0, v2.49.0...
2 months ago

Feature

  • Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) (7d3302c)
  • Support xlsm files (#1520) (df14022)

Fix

  • Pptx line break and space handling (#1664) (f28d23c)
  • asciidoc: Set default size when missing in image directive (#1769) (b886e4d)
  • Handle NoneType error in MsPowerpointDocumentBackend (#1747) (7a275c7)
  • Prov for merged-elems (#1728) (6613b9e)
  • tesseract: Initialize df_osd to avoid uninitialized variable error (#1718) (e979750)
  • Allow custom torch_dtype in vlm models (#1735) (f7f3113)
  • Improve extraction from textboxes in Word docs (#1701) (9dbcb3d)
  • Add WEBP to the list of image file extensions (#1711) (a2b83fe)

Documentation

Don't miss a new docling release

NewReleases is sending notifications on new releases.