krrome/docling-hierarchical-pdf v0.1.0
Use also PDF-metadata ToC



4 months ago

New in this release:

Use pymupdf to read the ToC from the PDF, if one exists in the PDF metadata.
Correct header levels and hierarchy based on this ToC.
Make a best-effort attempt to:
- convert text and list items to headers if they were parsed incorrectly and appear in the ToC
- convert headers to text items if they were parsed incorrectly and do not appear in the ToC
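
The steps above can be illustrated with a minimal sketch. This is not the package's actual implementation: `Item`, `normalize`, `read_toc`, and `reconcile` are hypothetical names, and matching items to ToC entries by whitespace- and case-normalized title is an assumption. The only library call used is pymupdf's `Document.get_toc()`, which returns `[level, title, page]` entries.

```python
from dataclasses import dataclass


def read_toc(pdf_path):
    """Return the embedded ToC as (level, title, page) tuples, or [] if none."""
    import pymupdf  # PyMuPDF; older versions are imported as `fitz`

    with pymupdf.open(pdf_path) as doc:
        # get_toc() yields [level, title, page, ...] entries; levels start at 1.
        return [(lvl, title, page) for lvl, title, page, *_ in doc.get_toc()]


@dataclass
class Item:
    """Hypothetical stand-in for a parsed document item."""
    text: str
    kind: str       # "header", "text" or "list_item"
    level: int = 0  # 0 means "not a header"


def normalize(s):
    # Assumption: titles match after collapsing whitespace and ignoring case.
    return " ".join(s.split()).casefold()


def reconcile(items, toc):
    """Best-effort relabeling of parsed items against the ToC."""
    levels = {}
    for lvl, title, _page in toc:
        levels.setdefault(normalize(title), lvl)  # first occurrence wins
    if not levels:
        return list(items)  # no ToC in the metadata: leave items untouched

    out = []
    for item in items:
        lvl = levels.get(normalize(item.text))
        if lvl is not None:
            # Appears in the ToC: make it a header at the ToC level.
            out.append(Item(item.text, "header", lvl))
        elif item.kind == "header":
            # Parsed as a header but absent from the ToC: demote to text.
            out.append(Item(item.text, "text", 0))
        else:
            out.append(item)
    return out
```

With a real file, the two pieces would combine as `reconcile(items, read_toc("paper.pdf"))`, where `paper.pdf` is a placeholder path. Exact title matching is fragile when the parser splits or truncates headings, which is presumably why the release describes the conversion as best effort.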
