What's Changed
- Migrate the HTML extractor and hOCR conversion to html-to-markdown 2.1 (Rust) bindings, removing the legacy BeautifulSoup-based path
- Automatically capture inline data URI images and inline SVG assets when extract_images is enabled
- Add Python 3.14 core support while documenting that EasyOCR, PaddleOCR, and spaCy-based entity extraction remain unavailable until upstream wheels support it
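
To illustrate what capturing inline data URI images and inline SVG assets can look like, here is a minimal standard-library sketch. It is not the project's implementation, which goes through the html-to-markdown bindings. The helper name `collect_inline_images` and the regex-based approach are assumptions made purely for illustration.

```python
import base64
import re
from urllib.parse import unquote_to_bytes

# Hypothetical helper for illustration only; not part of the library's API.
# Matches src="data:image/<subtype>[;params],<payload>" attributes.
DATA_URI_RE = re.compile(
    r"""src\s*=\s*["']data:(?P<mime>image/[\w.+-]+)"""
    r"""(?P<params>(?:;[\w=-]+)*),(?P<payload>[^"']*)["']""",
    re.IGNORECASE,
)
# Matches whole inline <svg>...</svg> elements (non-greedy, across lines).
SVG_RE = re.compile(r"<svg\b.*?</svg>", re.IGNORECASE | re.DOTALL)


def collect_inline_images(html: str) -> list[tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) pairs for inline images found in html."""
    images: list[tuple[str, bytes]] = []
    for match in DATA_URI_RE.finditer(html):
        params = match.group("params").lower().split(";")
        payload = match.group("payload")
        if "base64" in params:
            data = base64.b64decode(payload)
        else:
            # Non-base64 data URIs carry percent-encoded bytes.
            data = unquote_to_bytes(payload)
        images.append((match.group("mime").lower(), data))
    for match in SVG_RE.finditer(html):
        images.append(("image/svg+xml", match.group(0).encode("utf-8")))
    return images


sample = '<p><img src="data:image/png;base64,aGVsbG8="><svg width="1"><rect/></svg></p>'
found = collect_inline_images(sample)
```

A real extractor would parse the HTML properly rather than rely on regular expressions, which miss unquoted attributes and nested SVG elements.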