github mozilla/DeepSpeech v0.5.1
DeepSpeech 0.5.1


General

This is the 0.5.1 release of Deep Speech, an open speech-to-text engine. This is a bug-fix release that is backwards compatible with models and checkpoints from 0.5.0. Thanks to Li Li for identifying and helping fix these bugs. This release includes source code

v0.5.1.tar.gz

and a trained model

deepspeech-0.5.1-models.tar.gz (identical to 0.5.0 models)

trained on American English, which achieves an 8.22% word error rate on the LibriSpeech clean test corpus. Models with the ".pbmm" extension are memory mapped, so they use much less memory and load faster. Models with the ".tflite" extension are converted for use with TFLite with post-training quantization enabled, and are better suited to resource-constrained environments.
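For readers unfamiliar with the metric, word error rate is the word-level edit distance between reference and hypothesis transcripts divided by the number of reference words. The following is a minimal sketch of that common definition, aggregated over a corpus by summing errors and reference words across all utterances. The function names are illustrative, and this is not taken from the project's evaluation scripts, which may differ in detail.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) string pairs.

    Sums word errors and reference words over every pair, so no
    utterance (or batch of utterances) is left out of the total.
    Assumes at least one reference contains a word.
    """
    errors = 0
    words = 0
    for ref, hyp in pairs:
        ref_tokens = ref.split()
        errors += edit_distance(ref_tokens, hyp.split())
        words += len(ref_tokens)
    return errors / words
```

Under this definition, a result of 0.0822 would correspond to the 8.22% figure quoted above.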

We also include example audio files:

audio-0.5.1.tar.gz

which can be used to test the engine; and checkpoint files

deepspeech-0.5.1-checkpoint.tar.gz

which can be used as the basis for further fine-tuning.

Notable changes from the previous release

  • Fixed a bug where evaluate_tflite.py would not correctly take into account all batches when computing the final WER report (#2168)
  • Added an option to the C++ binary to print intermediate transcripts during streaming (#2181)
  • Fixed a bug where calling DS_IntermediateDecode during streaming would negatively affect the final transcript for the stream (#2184)

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning. Thus, we document them here along with the hardware used: a server with 8 TitanX Pascal GPUs (12GB of VRAM).

  • train_files Fisher, LibriSpeech, and Switchboard training corpora
  • dev_files LibriSpeech clean dev corpus
  • test_files LibriSpeech clean test corpus
  • train_batch_size 24
  • dev_batch_size 48
  • test_batch_size 48
  • n_hidden 2048
  • learning_rate 0.0001
  • dropout_rate 0.15
  • epoch 75
  • lm_alpha 0.75
  • lm_beta 1.85

The weights with the best validation loss were selected at the end of the 75 epochs using --noearly_stop. The selected model was trained for 467356 steps.

Bindings

This release also includes a Python-based command line tool, deepspeech, installed through

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu
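Once installed, the command line tool is pointed at the files from the models archive and at an audio file. The sketch below only builds the argument list. It assumes the archive has been extracted to a directory named models containing output_graph.pbmm, alphabet.txt, lm.binary and trie, and that the tool accepts the --model, --alphabet, --lm, --trie and --audio flags. Both the file names and the flags are assumptions to check against the extracted archive and the README for this release.

```python
import posixpath


def build_deepspeech_command(audio_path, model_dir="models"):
    """Return an argv list for the deepspeech command line tool.

    File names and flags are assumptions (see the note above);
    adjust them to match the extracted models archive.
    """
    return [
        "deepspeech",
        "--model", posixpath.join(model_dir, "output_graph.pbmm"),
        "--alphabet", posixpath.join(model_dir, "alphabet.txt"),
        "--lm", posixpath.join(model_dir, "lm.binary"),
        "--trie", posixpath.join(model_dir, "trie"),
        "--audio", audio_path,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) to run the tool on one of the example audio files.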

The release also exposes bindings for the following languages:

  • Python (Versions 3.4, 3.5, 3.6 and 3.7) installed via

    pip install deepspeech

    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

    pip install deepspeech-gpu
  • NodeJS (Versions 4.x, 5.x, 6.x, 7.x, 8.x, 9.x, 10.x, 11.x, and 12.x) installed via

    npm install deepspeech

    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

    npm install deepspeech-gpu

  • ElectronJS (Versions 3.1, 4.0, 4.1 and 5.0) is also supported

  • C++, which requires that the appropriate shared objects be installed from native_client.tar.xz (See the section in the main README which describes native_client.tar.xz installation.)

  • .NET, which is installed by following the instructions on the NuGet package page.

In addition, there are third-party bindings supported by external developers, for example:

  • Rust, which is installed by following the instructions on the external Rust repo.
  • Go, which is installed by following the instructions on the external Go repo.

Supported Platforms

  • Windows 8.1, 10, and Server 2012 R2, 64-bit (Needs at least AVX support)
  • OS X 10.10, 10.11, 10.12, 10.13 and 10.14
  • Linux x86 64-bit with a modern CPU (Needs at least AVX/FMA)
  • Linux x86 64-bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
  • Raspbian Stretch on Raspberry Pi 3
  • ARM64 built against Debian/ARMbian Stretch and tested on LePotato boards
  • Java Android bindings / demo app. Early preview, tested only on a Pixel 2 device, TF Lite model only

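On Linux x86, one way to check the AVX/FMA requirement before installing is to look at the CPU flags reported in /proc/cpuinfo. The helper below is a small sketch with an illustrative name. It assumes the usual x86 layout of that file, where a line beginning with "flags" lists the supported instruction set extensions.

```python
def missing_cpu_flags(cpuinfo_text, required=("avx", "fma")):
    """Return the required CPU flags absent from /proc/cpuinfo text.

    Only the first "flags" line is inspected. If no such line exists
    (for example on ARM), every required flag is reported as missing.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    return list(required)
```

A typical call is missing_cpu_flags(open("/proc/cpuinfo").read()), where an empty list means both extensions were reported.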
Known Issues

  • Feature caching speeds training but increases memory usage
  • Current v2 TRIE handling still triggers ~600MB memory usage
  • Code is not yet thread safe; having multiple concurrent streams tied to the same model leads to bad transcriptions.
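Until the thread safety issue is resolved, one possible workaround in a multi-threaded application is to serialize access so that only one stream uses a given model at a time. The sketch below shows the idea with a lock. SerializedTranscriber and transcribe_fn are hypothetical names: transcribe_fn stands for whatever callable runs one complete stream against the shared model, and nothing here is part of the DeepSpeech API.

```python
import threading


class SerializedTranscriber:
    """Run at most one transcription at a time against a shared model."""

    def __init__(self, transcribe_fn):
        # transcribe_fn is a hypothetical callable that processes one
        # whole stream (setup, feeding audio, finishing) and returns text.
        self._transcribe_fn = transcribe_fn
        self._lock = threading.Lock()

    def transcribe(self, audio):
        # Hold the lock for the full lifetime of the stream so that
        # streams from different threads never overlap on the model.
        with self._lock:
            return self._transcribe_fn(audio)
```

This trades throughput for correctness. Loading one model per worker is an alternative when memory allows.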

Contact/Getting Help

  1. FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
  2. Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
  3. IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla IRC; people there can try to answer or help.
  4. Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.

Contributors to 0.5.1 release
