Fixed
- E2E element type assertions: Fixed element type field name in E2E generator templates for Python, TypeScript, WASM Deno, Elixir, Ruby, PHP, and C#
- Ruby PDF annotation extraction: Fixed `PdfAnnotation` and `PdfAnnotationBoundingBox` autoload and a bounding box field name mismatch
- WASM OCR blocking event loop: OCR now runs in a worker thread, keeping the main thread responsive
- JPEG 2000 OCR decode failure: Shared `load_image_for_ocr()` helper with `hayro-jpeg2000`/`hayro-jbig2` decoders across all OCR backends
- WASM PDF empty content: PDFium initialization is now properly awaited during `initWasm()`
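A shared OCR image loader has to recognize JPEG 2000 and JBIG2 input before it can route the bytes to the right decoder. The sketch below shows only that detection step. It checks the JP2 signature box, the raw J2K codestream markers (SOC + SIZ), and the JBIG2 file header. `OcrImageKind` and `detect_ocr_image_kind` are hypothetical names for illustration. They are not the actual `load_image_for_ocr()` internals, and this sketch makes no calls into the `hayro-*` crates.

```rust
/// Hypothetical classification a shared OCR loader could use to pick a decoder.
#[derive(Debug, PartialEq, Eq)]
enum OcrImageKind {
    Jpeg2000,
    Jbig2,
    Other,
}

/// JP2 container signature box: length 12, type "jP  ", content 0x0D0A870A.
const JP2_SIGNATURE: [u8; 12] = [
    0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A,
];
/// Raw JPEG 2000 codestream: SOC marker (FF4F) followed by SIZ marker (FF51).
const J2K_CODESTREAM: [u8; 4] = [0xFF, 0x4F, 0xFF, 0x51];
/// Standalone JBIG2 file header ID string.
const JBIG2_FILE_HEADER: [u8; 8] = [0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A];

/// Classify image bytes by their leading signature (hypothetical helper).
fn detect_ocr_image_kind(bytes: &[u8]) -> OcrImageKind {
    if bytes.starts_with(&JP2_SIGNATURE) || bytes.starts_with(&J2K_CODESTREAM) {
        OcrImageKind::Jpeg2000
    } else if bytes.starts_with(&JBIG2_FILE_HEADER) {
        OcrImageKind::Jbig2
    } else {
        // Left to the regular image decoders (PNG, JPEG, TIFF, ...).
        OcrImageKind::Other
    }
}
```

JBIG2 streams embedded in PDFs usually omit the file header. A real loader would therefore also need the PDF filter name (`JBIG2Decode`, `JPXDecode`) rather than relying on magic bytes alone.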
Added
- OMML-to-LaTeX math conversion for DOCX: Mathematical equations converted to LaTeX notation
- Plain text output paths for all extractors: DOCX, PPTX, ODT, FB2, DocBook, RTF, and Jupyter extractors produce clean plain text when plain text output is requested
- `cells_to_text()` shared utility: Tab-separated plain text table formatter
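A tab-separated table formatter can be sketched as "join cells with tabs, join rows with newlines." The version below assumes a `&[Vec<String>]` input. That signature is hypothetical; the real `cells_to_text()` may take a different type. It also assumes that tabs and line breaks inside a cell are flattened to spaces so each row stays on one line. That escaping rule is also an assumption.

```rust
/// Hypothetical sketch of a tab-separated plain text table formatter.
/// The real `cells_to_text()` signature and escaping rules may differ.
fn cells_to_text(cells: &[Vec<String>]) -> String {
    cells
        .iter()
        .map(|row| {
            row.iter()
                // Assumption: replace embedded tabs/newlines so the TSV layout stays intact.
                .map(|cell| cell.replace(|c: char| matches!(c, '\t' | '\n' | '\r'), " "))
                .collect::<Vec<_>>()
                .join("\t")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

For example, a header row `["Name", "Qty"]` followed by `["a\tb", "2"]` becomes `"Name\tQty\na b\t2"`.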
Changed
- CLI includes all features: `kreuzberg-cli` now uses the `full` feature set, including archives
See CHANGELOG.md for full details.