github krrome/docling-hierarchical-pdf v0.1.0
Also use the PDF-metadata ToC


New in this release:

  • Use pymupdf to read the table of contents (ToC) from the PDF, if one exists in the PDF metadata
  • Correct header levels and the header hierarchy based on that ToC
  • Best-effort attempt to:
    • convert text and list items to headers if they were misparsed and appear in the ToC
    • convert headers to text items if they were misparsed and do not appear in the ToC
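A minimal sketch of the ToC-based correction listed above. It assumes a simplified item model: the `Item` class, its label strings (loosely modeled on docling's `section_header`, `text` and `list_item`) and `apply_toc` are hypothetical and are not this package's API. Titles are matched by exact comparison after whitespace and case normalization, preferring a match on the same page; the actual release may match differently. `read_toc` relies on PyMuPDF's `Document.get_toc()`, which returns `[level, title, page]` entries with 1-based page numbers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

TocEntry = Tuple[int, str, int]  # (level, title, 1-based page), as returned by PyMuPDF


@dataclass
class Item:
    # Hypothetical stand-in for a parsed document item.
    label: str                   # "section_header", "text" or "list_item"
    text: str
    page: int                    # 1-based, to line up with ToC page numbers
    level: Optional[int] = None  # header level; None for non-headers


def read_toc(path: str) -> List[TocEntry]:
    """Read the outline (ToC) stored in the PDF metadata; empty list if none."""
    import pymupdf  # imported lazily so apply_toc works without PyMuPDF installed
    with pymupdf.open(path) as doc:
        return [(lvl, title, page) for lvl, title, page in doc.get_toc(simple=True)]


def normalize(s: str) -> str:
    # Collapse whitespace and ignore case so minor extraction noise still matches.
    return re.sub(r"\s+", " ", s).strip().lower()


def apply_toc(items: List[Item], toc: List[TocEntry]) -> List[Item]:
    """Relabel items and reset header levels using the ToC (hypothetical helper)."""
    by_title_page = {(normalize(t), p): lvl for lvl, t, p in toc}
    by_title = {}
    for lvl, t, _ in toc:
        by_title.setdefault(normalize(t), lvl)  # first occurrence wins

    out = []
    for it in items:
        key = normalize(it.text)
        # Prefer a same-page match, then fall back to a title-only match.
        level = by_title_page.get((key, it.page), by_title.get(key))
        if level is not None and it.label in ("section_header", "text", "list_item"):
            # In the ToC: make it a header and take the level from the ToC.
            out.append(Item("section_header", it.text, it.page, level))
        elif it.label == "section_header":
            # A header that is not in the ToC: demote it to plain text.
            out.append(Item("text", it.text, it.page, None))
        else:
            out.append(it)
    return out
```

With a ToC such as `[(1, "Introduction", 1), (2, "Background", 2)]`, a `text` item reading "Background" on page 2 becomes a level-2 header, while a `section_header` that does not appear in the ToC is demoted to `text`. Skipping the correction when `read_toc` returns an empty list mirrors the "if it exists in the pdf metadata" condition.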
