Michaelliv/markit v0.3.0 on GitHub

PDF converter rewrite

Rewrote the PDF converter from scratch with mupdf (native WASM).

What's new

Table detection — vector line extraction + raycasting places text into markdown tables
Diagram filtering — block diagrams (sparse grids, repeated labels) are excluded from table detection
Multi-column layout — two-column documents (legal docs, datasheets) read in correct order
Header/footer stripping — repeated running headers removed across pages
Image extraction — diagrams cropped and saved as PNGs when imageDir is provided
CTM tracking — content stream coordinate transforms applied correctly
Agent skill — npx skills add Michaelliv/markit
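
For readers unfamiliar with the CTM tracking item above, here is a minimal sketch of how a current transformation matrix is usually maintained while walking a PDF content stream, following the PDF specification's conventions for the `q`, `Q`, and `cm` operators. The `CtmTracker` class and its method names are hypothetical and for illustration only. They are not markit's or mupdf's API, and the release notes do not say how the converter implements this.

```typescript
// PDF affine matrix [a b c d e f], in the operand order used by `cm`.
type Matrix = [number, number, number, number, number, number];

const IDENTITY: Matrix = [1, 0, 0, 1, 0, 0];

// Returns m1 x m2 under the PDF row-vector convention:
// a point is transformed by m1 first, then by m2.
function multiply(m1: Matrix, m2: Matrix): Matrix {
  const [a1, b1, c1, d1, e1, f1] = m1;
  const [a2, b2, c2, d2, e2, f2] = m2;
  return [
    a1 * a2 + b1 * c2,
    a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,
    c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2,
    e1 * b2 + f1 * d2 + f2,
  ];
}

// Maps a point through a matrix: x' = a*x + c*y + e, y' = b*x + d*y + f.
function applyToPoint(m: Matrix, x: number, y: number): [number, number] {
  const [a, b, c, d, e, f] = m;
  return [a * x + c * y + e, b * x + d * y + f];
}

// Hypothetical tracker (assumption, not part of markit).
class CtmTracker {
  private stack: Matrix[] = [];
  current: Matrix = IDENTITY;

  // `q`: save the graphics state.
  save(): void {
    this.stack.push(this.current);
  }

  // `Q`: restore the most recently saved graphics state.
  restore(): void {
    const previous = this.stack.pop();
    if (previous) this.current = previous;
  }

  // `cm`: the new matrix is premultiplied onto the current one.
  concat(m: Matrix): void {
    this.current = multiply(m, this.current);
  }

  // Converts content-stream coordinates to page coordinates.
  toPage(x: number, y: number): [number, number] {
    return applyToPoint(this.current, x, y);
  }
}

const ctm = new CtmTracker();
ctm.save();
ctm.concat([2, 0, 0, 2, 0, 0]);   // scale by 2
ctm.concat([1, 0, 0, 1, 10, 20]); // then translate inside the scaled space
console.log(ctm.toPage(1, 1));    // [22, 42]
ctm.restore();
console.log(ctm.toPage(1, 1));    // [1, 1]
```

Premultiplying in `concat` means the most recently applied matrix acts on a point first. If the order is reversed, vector lines and text end up at mismatched positions, which would undermine anything that relies on line geometry, such as table detection.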

Performance

PDF	Pages	Time
Bitcoin whitepaper	9	26ms
US Constitution	16	56ms
Intel PCH datasheet	224	640ms
NXP S32K3xx datasheet	164	1.9s
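
Because the documents differ widely in page count, the timings are easier to compare per page. The sketch below only does arithmetic on the rows of the table above; the `rows` array and the `toMs` helper are illustrative names, not part of markit.

```typescript
// Rows copied from the Performance table above.
const rows: Array<{ pdf: string; pages: number; time: string }> = [
  { pdf: "Bitcoin whitepaper", pages: 9, time: "26ms" },
  { pdf: "US Constitution", pages: 16, time: "56ms" },
  { pdf: "Intel PCH datasheet", pages: 224, time: "640ms" },
  { pdf: "NXP S32K3xx datasheet", pages: 164, time: "1.9s" },
];

// Parses strings such as "26ms" or "1.9s" into milliseconds.
function toMs(time: string): number {
  const match = /^([\d.]+)(ms|s)$/.exec(time);
  if (!match) throw new Error(`unrecognized time: ${time}`);
  const value = parseFloat(match[1]);
  return match[2] === "s" ? value * 1000 : value;
}

const perPage = rows.map((r) => ({
  pdf: r.pdf,
  msPerPage: toMs(r.time) / r.pages,
}));

for (const r of perPage) {
  console.log(`${r.pdf}: ${r.msPerPage.toFixed(2)} ms/page`);
}
```

By this measure the first three documents come in at roughly 3 ms per page, while the NXP datasheet is closer to 12 ms per page. The release notes do not state the reason for the difference.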

Testing

58 tests across 4 test files covering grid detection, rendering, extraction, and column detection. Validated against Intel, NXP, Microchip, and Bitcoin whitepaper PDFs.
