github Michaelliv/markit v0.5.1

9 hours ago

Fix: PDF table column splitting

Text boxes from mupdf sometimes span multiple table columns when fragments on the same line are close together. The grid code used the center point to assign the whole box to one cell, merging content that belongs in separate columns.
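To make the failure mode concrete, here is a minimal Python sketch of center-point assignment. It is not markit's code: the function name `cell_by_center`, the boundary values, and the box coordinates are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def cell_by_center(x0: float, x1: float, boundaries: list[float]) -> int:
    """Return the index of the column containing the box's horizontal center.

    `boundaries` holds the sorted x positions of the vertical column
    separators, so N boundaries define N + 1 columns.
    """
    center = (x0 + x1) / 2
    return bisect_right(boundaries, center)


# Hypothetical layout: separators at x=100 and x=200 give three columns.
boundaries = [100.0, 200.0]

# One text box holding "Alice 42" stretches from x=60 to x=130, so it
# crosses the first separator. Its center is at x=95, which is in column 0.
col = cell_by_center(60.0, 130.0, boundaries)
```

With these numbers the whole box, including the "42" that visually sits in the second column, is assigned to column 0. That is the kind of merge this release addresses.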

What changed

  • Cross-column splitting — text boxes that span vertical column boundaries are now split at word boundaries and placed in their correct cells
  • Header detection guard — wide paragraph text just above a table is no longer absorbed as a header row
  • Column layout fix — pages with tables no longer trigger false multi-column layout detection
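The cross-column splitting idea can be sketched roughly as follows. This is an illustrative Python approximation, not the project's implementation: `split_box` is a hypothetical helper, and it assumes uniform character width inside a box in order to estimate where each word sits. A real implementation would more likely use per-character positions from the PDF library.

```python
from bisect import bisect_right


def split_box(text: str, x0: float, x1: float,
              boundaries: list[float]) -> dict[int, str]:
    """Split a text box at word boundaries and group the words by column.

    Assumption: every character in the box has the same width, so a word's
    x-range is estimated from its character offsets. Each word is assigned
    to the column that contains the word's own center.
    """
    if not text:
        return {}
    char_w = (x1 - x0) / len(text)
    cells: dict[int, list[str]] = {}
    offset = 0  # character offset of the current word within the box
    for word in text.split(" "):
        if word:
            w0 = x0 + offset * char_w
            w1 = w0 + len(word) * char_w
            col = bisect_right(boundaries, (w0 + w1) / 2)
            cells.setdefault(col, []).append(word)
        offset += len(word) + 1  # +1 accounts for the separating space
    return {col: " ".join(words) for col, words in cells.items()}


# Hypothetical box "Alice 42" spanning x=60..130 with separators at 100, 200.
cells = split_box("Alice 42", 60.0, 130.0, [100.0, 200.0])
```

Here `cells` maps column 0 to "Alice" and column 1 to "42", while a box that stays within one column comes back unchanged under a single key. Assigning each word by its own center is one simple policy; splitting exactly at the separator position is another option.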

Tested on Anthropic's 244-page Claude Mythos Preview System Card.
