github naptha/tesseract.js v6.0.0

2 days ago

What's Changed

  • Fixed memory leaks (#977)
    • This version fixed a long-standing issue where memory would rise over time, eventually leading to a crash.
  • Reduced runtime and memory usage for most users by updating default formats (#916).
  • Fixed compatibility with Electron main process (#925)
  • Fixed bug where user-provided parameters were overwritten by defaults (#975).

Breaking Changes

  1. All output formats other than text are now disabled by default.
    • To re-enable the hocr output (for example), set the following: worker.recognize(image, {}, { hocr: true })
      • See the tesseract.js documentation for a list of possible output formats.
  2. The JavaScript object output format (blocks) was tweaked.
    • Only the array of blocks (blocks) is returned.
      • Previous versions would automatically generate lists of every unit of text (words, symbols, etc.).
        • If needed, these should now be generated by the user.
    • Only text-based blocks are reported.
      • Previous versions reported non-text blocks when detected by Tesseract (e.g. line segments).
    • The shape of some objects was changed.
      • See the type declarations for reference on properties.
      • The main properties--text and bbox--are unchanged.
  3. Various functions and options previously marked as deprecated have been removed.
    1. This includes worker.initialize and worker.loadLanguage, along with several deprecated options from v2.
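
Since flat lists such as words are no longer generated automatically (breaking change 2), code that relied on them has to walk the blocks array itself. The following is a minimal sketch of one way to do that, not the library's own implementation. The helper name collectWords is hypothetical, the nesting blocks → paragraphs → lines → words is an assumption that should be checked against the type declarations, and the sample data is hand-written rather than real recognition output.

```javascript
// Minimal sketch: rebuild a flat list of words from the nested blocks output.
// ASSUMPTION: each block holds `paragraphs`, each paragraph holds `lines`,
// and each line holds `words`. Verify against the type declarations.
function collectWords(blocks) {
  const words = [];
  // Treat a missing value as empty, e.g. when the blocks output was not enabled.
  for (const block of blocks ?? []) {
    for (const paragraph of block.paragraphs ?? []) {
      for (const line of paragraph.lines ?? []) {
        for (const word of line.words ?? []) {
          words.push(word);
        }
      }
    }
  }
  return words;
}

// Hand-written sample standing in for real recognition output (hypothetical data).
const sampleBlocks = [
  {
    text: 'Hello world\n',
    bbox: { x0: 0, y0: 0, x1: 100, y1: 20 },
    paragraphs: [
      {
        lines: [
          {
            words: [
              { text: 'Hello', bbox: { x0: 0, y0: 0, x1: 45, y1: 20 } },
              { text: 'world', bbox: { x0: 55, y0: 0, x1: 100, y1: 20 } },
            ],
          },
        ],
      },
    ],
  },
];

// Print the text of every word in reading order.
console.log(collectWords(sampleBlocks).map((w) => w.text));
```

With a real worker, the input would presumably come from a call shaped like the hocr example above, with the blocks output enabled instead of hocr. The same traversal can be extended one level deeper if a flat list of symbols is needed.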

See #993 for additional discussion about this release.

Full Changelog: v5.1.1...v6.0.0
