github naptha/tesseract.js v4.0.0

23 months ago

Breaking Changes

  1. createWorker is now async
    1. In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
    2. Calling createWorker with an invalid workerPath or corePath now produces an error (a rejected promise) (#654)
  2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
  3. getPDF function replaced by pdf recognize option (#488)
    1. This allows PDFs to be created when using a scheduler
    2. See browser and node examples for usage
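The createWorker changes above can be sketched as follows. This is a minimal illustration rather than the project's official example: the helper name recognizeText is hypothetical, and the Tesseract module is passed in as an argument (an assumption made only to keep the sketch self-contained). The loadLanguage, initialize, recognize, and terminate calls are assumed to behave as in the v4 worker API.

```javascript
// Hypothetical helper showing the v4 worker lifecycle.
// `Tesseract` is the imported tesseract.js module, injected by the caller.
async function recognizeText(Tesseract, image, lang = 'eng') {
  // v4: createWorker returns a promise, so it must be awaited.
  // An invalid workerPath or corePath would surface here as a rejection.
  const worker = await Tesseract.createWorker();
  try {
    // v4: no worker.load() call; the worker is returned pre-loaded.
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

In a Node script this might be called as recognizeText(require('tesseract.js'), 'page.png'), where the image path is a placeholder.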

Major New Features

  1. Processed images created by Tesseract can be retrieved using imageColor, imageGrey, and imageBinary options (#588)
    1. See image-processing.html example for usage
  2. Image rotation options rotateAuto and rotateRadians have been added, which significantly improve accuracy on certain documents
    1. See issue #648 for an example of how auto-rotation improves accuracy
    2. See image-processing.html example for usage of rotateAuto option
  3. Tesseract parameters (usually set using worker.setParameters) can now be set for single jobs using worker.recognize options (#665)
    1. For example, a single job can be set to recognize only numbers using worker.recognize(image, {tessedit_char_whitelist: "0123456789"})
    2. As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
  4. Initialization parameters (e.g. load_system_dawg, load_number_dawg, and load_punc_dawg) can now be set (#613)
    1. The third argument to worker.initialize now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
    2. For example, both of these lines set load_number_dawg to 0:
      1. worker.initialize('eng', "0", {load_number_dawg: "0"});
      2. worker.initialize('eng', "0", "load_number_dawg 0");
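The per-job and initialization parameters described in items 3 and 4 can be combined roughly as shown here. The recognize and initialize calls are taken from the notes above. The two helper names are hypothetical, and the sketch assumes an already created worker is passed in.

```javascript
// Hypothetical helper: run one digits-only job, then a default job.
async function readDigitsThenText(worker, image) {
  // The whitelist applies to this job only.
  const digits = await worker.recognize(image, {
    tessedit_char_whitelist: '0123456789',
  });
  // Per the notes, job-level settings are reverted afterwards,
  // so this call runs with the worker's own parameters.
  const full = await worker.recognize(image);
  return { digits: digits.data.text, full: full.data.text };
}

// Hypothetical helper: pass an initialization parameter as an object.
// The second argument "0" is copied from the release-note example.
async function initializeWithoutNumberDawg(worker, lang = 'eng') {
  await worker.loadLanguage(lang);
  await worker.initialize(lang, '0', { load_number_dawg: '0' });
}
```

Because the whitelist never touches the worker's persistent parameters, the same pattern should carry over to jobs dispatched through a scheduler, where different jobs may need different settings.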

Other Changes

  1. loadLanguage now resolves without error when language is loaded but writing to cache fails
    1. This allows for running in Firefox incognito mode using default settings (#609)
  2. detect returns null values when OS detection fails rather than throwing an error (#526)
  3. Fixed a memory leak that caused crashes (#678)
  4. Cache corruption should now be much less common (#666)
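Since detect now reports failure through null values, callers that previously relied on a try/catch may want an explicit null check. The sketch below assumes the result exposes orientation_degrees and orientation_confidence under data. Which fields are actually null on failure is not stated in these notes, so the check is illustrative only, and the helper name is hypothetical.

```javascript
// Hypothetical helper: return orientation info, or null when detection failed.
async function detectOrientation(worker, image) {
  const { data } = await worker.detect(image);
  // Assumption: a failed OS detection leaves orientation_degrees null.
  // `== null` also covers undefined, in case the field is absent.
  if (data.orientation_degrees == null) {
    return null;
  }
  return {
    degrees: data.orientation_degrees,
    confidence: data.orientation_confidence,
  };
}
```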

New Contributors

Full Changelog: v3.0.3...v4.0.0
