New Features:
- Bfloat16 is now supported on CPUs with the AVX512_BF16 extension. Users can expect up to a 30% performance improvement for sparse FP32 networks and up to a 75% performance improvement for dense FP32 networks. This feature is opt-in and is specified with the `default_precision` parameter in the configuration file; a sketch of such a file follows this list.
- Several options can now be specified using a configuration file.
- Max and Min operators are now supported for performance (run with DeepSparse's optimized kernels).
- SQuAD 2.0 support provided.
- Multi-label and evaluation support added for NLP pipelines.
- A fraction-of-supported-operations property has been added to the engine class; see the sketch after this list.
- New ML Ops logging capabilities implemented, including metrics logging, custom functions, and Prometheus support.
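
The BF16 opt-in above goes through the new configuration file. The snippet below is only a minimal sketch of what such a file might contain: these notes name only the `default_precision` parameter, so the file layout and the `bfloat16` value shown are assumptions.

```yaml
# Hypothetical configuration file; only the `default_precision` key is named
# in these notes, so the layout and the value shown are assumptions.
default_precision: bfloat16
```

The new fraction-of-supported-operations value can be inspected on a compiled model. The sketch below assumes the property is exposed as `fraction_of_supported_ops` on the engine object (the exact attribute name may differ) and uses a placeholder model path and input shape.

```python
# Minimal sketch: ./model.onnx and the [1, 3, 224, 224] FP32 input shape are
# placeholders; `fraction_of_supported_ops` is an assumed property name.
import numpy as np
from deepsparse import compile_model

engine = compile_model("./model.onnx", batch_size=1)

# Fraction of the model's operations that DeepSparse runs with its optimized kernels.
print(engine.fraction_of_supported_ops)

# Run one inference to confirm the compiled engine works end to end.
outputs = engine.run([np.random.rand(1, 3, 224, 224).astype(np.float32)])
print(outputs[0].shape)
```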
Changes:
- Minimum Python version set to 3.7.
- The default logging level has been changed to `warn`.
- Timing functions and a default no-op deallocator have been added to improve usability of the C++ API.
- DeepSparse now allows the `axes` parameter to be specified either as an input or as an attribute in several ONNX operators; see the sketch after this list.
- Model compilation times have been improved on machines with many cores.
- YOLOv5 pipelines upgraded to the latest state from Ultralytics.
- Transformers pipelines upgraded to the latest state from Hugging Face.
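
To illustrate the `axes` change above, the sketch below builds the same reduction both ways with the standard `onnx` helper API. ReduceSum is used only as an example operator and the shapes are arbitrary; the notes do not list which operators are affected.

```python
# Minimal sketch using the onnx helper API; ReduceSum and the shapes here are
# illustrative assumptions, not a list of the operators DeepSparse changed.
import onnx
from onnx import TensorProto, helper

# Older style: `axes` given as a node attribute (e.g. ReduceSum in opset 11).
node_axes_as_attribute = helper.make_node(
    "ReduceSum", inputs=["x"], outputs=["y"], axes=[1], keepdims=1
)

# Newer style: `axes` given as a second input tensor (e.g. ReduceSum in opset 13).
axes_initializer = helper.make_tensor("axes", TensorProto.INT64, dims=[1], vals=[1])
node_axes_as_input = helper.make_node(
    "ReduceSum", inputs=["x", "axes"], outputs=["y"], keepdims=1
)

graph = helper.make_graph(
    [node_axes_as_input],
    "reduce_sum_axes_as_input",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [2, 3])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [2, 1])],
    initializer=[axes_initializer],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
```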
Resolved Issues:
- DeepSparse no longer crashes with an assertion failure for softmax operators on dimensions with a single element.
- DeepSparse no longer crashes with an assertion failure on some unstructured sparse quantized BERT models.
- The image classification evaluation script no longer crashes for larger batch sizes.
Known Issues:
- None