Improved LaTeX OCR
We trained a new LaTeX OCR model that is more accurate overall. It reliably outputs KaTeX-compatible math and handles longer sequences than the previous model.
The rendered output is on the right, original document on the left:
Block visualization
You can now visualize blocks in the Streamlit app, thanks to @jazzido. Select JSON output and check "show blocks" to get a visualization of how marker parsed the page. Clicking a block shows its HTML.
Links and references
We fixed a bug with links and references; they now render as one block. You can see the extracted references here:
Misc bugfixes
- Fixed some bugs with tables and row splitting
- Escaped $ inside text and tables so we don't accidentally render things as equations
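To illustrate the dollar-sign fix, here is a minimal sketch of escaping `$` in plain text so a downstream Markdown/KaTeX renderer does not treat it as a math delimiter. This is not marker's actual implementation; the function name `escape_dollars` is hypothetical, and the sketch assumes that a `$` already preceded by a backslash should be left alone.

```python
import re

# Matches a dollar sign that is not already preceded by a backslash.
_UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def escape_dollars(text: str) -> str:
    """Escape bare `$` characters in non-math text (hypothetical helper)."""
    # A function replacement avoids any ambiguity in replacement-template escapes.
    return _UNESCAPED_DOLLAR.sub(lambda _match: "\\$", text)


if __name__ == "__main__":
    print(escape_dollars("Tickets cost $5 and $10."))
```

A helper like this would be applied only to text and table cells, not to equation blocks, where `$` delimiters are intentional.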
What's Changed
- [streamlit_app] Visualize extracted blocks by @jazzido in #502
- Texify by @VikParuchuri in #513
- Update texify by @VikParuchuri in #514
Full Changelog: v1.3.2...v1.3.3