neuralmagic/deepsparse v1.0.0 on GitHub

New Features:

Support added for running multiple models with the same engine when using the Elastic Scheduler.
When using the Elastic Scheduler, the caller can now use the num_streams argument to tune the number of requests that are processed in parallel.
Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.
Documentation added for transformers, yolov5, torchvision, and serving, focused on deploying models with each integration.
AWS SageMaker example created.

Changes:

Click added as a root dependency; it is now the preferred route for CLI invocation and argument management.

Performance:

Inference performance improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions, with gains of up to 20% on BERT and up to 45% on ResNet-50.
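To check whether a Linux host falls into the "AVX2 or AVX-512 without VNNI" category named above, the CPU feature flags in /proc/cpuinfo can be inspected. The sketch below is a hypothetical helper, not part of DeepSparse; it assumes the Linux kernel flag names avx2, avx512f, avx512_vnni, and avx_vnni, and the notes do not say whether the engine uses these exact flags to select its kernels.

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_path: str = "/proc/cpuinfo") -> set[str]:
    """Read CPU feature flags from Linux /proc/cpuinfo (empty set if unavailable)."""
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        # On x86 Linux the line looks like "flags\t\t: fpu vme ... avx2 ..."
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def vnni_category(flags: set[str]) -> str:
    """Hypothetical classifier mirroring the wording of the performance note."""
    # avx512_vnni: AVX-512 VNNI; avx_vnni: the VEX-encoded variant on some AVX2 CPUs
    if "avx512_vnni" in flags or "avx_vnni" in flags:
        return "VNNI"
    if "avx512f" in flags:
        return "AVX-512 without VNNI"
    if "avx2" in flags:
        return "AVX2 without VNNI"
    return "neither AVX2 nor AVX-512"


if __name__ == "__main__":
    print(vnni_category(cpu_flags()))
```

A host reported as "AVX2 without VNNI" or "AVX-512 without VNNI" is the kind of system the improvement above targets.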

Resolved Issues:

Potential crashes no longer occur when a layer operates on a dataset larger than 2GB.
Assertion error addressed for Reduce operations where the reduction axis is of length 1.
Rare assertion failure related to Tensor Columns addressed.
Model compilation now terminates properly when running the DeepSparse Engine on a machine with a non-uniform system topology.

Known Issues:

In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel and 2x2 strides; hotfix forthcoming.
The engine will crash with an assertion failure when the num_streams parameter is set lower than the number of NUMA nodes; hotfix forthcoming.
In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.
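Until the hotfix for the num_streams/NUMA assertion failure lands, one possible workaround is to never request fewer streams than the host has NUMA nodes. The sketch below assumes a Linux host that exposes NUMA nodes as /sys/devices/system/node/nodeN directories; count_numa_nodes and safe_num_streams are hypothetical helpers, not DeepSparse APIs, and the result would simply be passed wherever num_streams is set.

```python
import glob


def count_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count NUMA nodes exposed by Linux sysfs; fall back to 1 if none are found."""
    # Matches node0, node1, ... but not sibling files such as "online" or "possible"
    nodes = glob.glob(f"{sysfs_root}/node[0-9]*")
    return max(len(nodes), 1)


def safe_num_streams(requested: int, numa_nodes: int) -> int:
    """Raise a requested stream count to at least the NUMA node count.

    Hypothetical workaround for the known issue where num_streams below
    the number of NUMA nodes triggers an assertion failure.
    """
    if requested < 1:
        raise ValueError("num_streams must be a positive integer")
    return max(requested, numa_nodes)


if __name__ == "__main__":
    nodes = count_numa_nodes()
    print(f"NUMA nodes: {nodes}; num_streams=1 -> {safe_num_streams(1, nodes)}")
```

On a two-node machine, safe_num_streams(1, 2) returns 2, while larger requests such as safe_num_streams(4, 2) pass through unchanged.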