Feature
- Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) (7d3302c)
- Support xlsm files (#1520) (df14022)
Fix
- Pptx line break and space handling (#1664) (f28d23c)
- asciidoc: Set default size when missing in image directive (#1769) (b886e4d)
- Handle NoneType error in MsPowerpointDocumentBackend (#1747) (7a275c7)
- Prov for merged-elems (#1728) (6613b9e)
- tesseract: Initialize df_osd to avoid uninitialized variable error (#1718) (e979750)
- Allow custom torch_dtype in vlm models (#1735) (f7f3113)
- Improve extraction from textboxes in Word docs (#1701) (9dbcb3d)
- Add WEBP to the list of image file extensions (#1711) (a2b83fe)