neuralmagic/deepsparse v1.5.0 on GitHub

New Features:

ONNX evaluation pipeline for OpenPifPaf (#915)
YOLOv8 segmentation pipelines and validation (#924)
deepsparse.benchmark_sweep CLI to enable sweeps of benchmarks across different settings such as cores and batch sizes (#860)
Engine.generate_random_inputs() API (#966)
Example data logging configurations for pipelines/server (#867)
Expanded built-in functions for NLP and CV pipeline logging to enable better monitoring (#865) (#862)
Product usage analytics tracking in DeepSparse Community edition (documentation)

Inference latency for unstructured sparse-quantized CNNs has been improved by up to 2x.
Inference throughput and latency for dense CNNs have been improved by up to 20%.
Inference throughput and latency for dense transformers have been improved by up to 30%.
The following operators are now supported for performance:
- Neg, Unsqueeze with non-constant inputs
- MatMulInteger with two non-constant inputs
- GEMM with constant weights and 4D or 5D inputs

Transformers and YOLOv5 integrations are now installed from PyPI packages instead of being auto-installed. Going forward, pip install deepsparse[transformers] and pip install deepsparse[yolov5] must be used.
DeepSparse now uses hwloc to determine CPU topology. This fixes a bug where DeepSparse could not run performantly inside a Kubernetes cluster with a static CPU manager policy.
Multi-stream and elastic scheduler behaviors have been improved for cases where users pass in a num_streams parameter that is smaller than the number of cores. Previously, DeepSparse would divide the system into num_streams chunks and fill each chunk until it ran out of threads. Now, each stream uses a number of threads equal to num_cores divided by num_streams, with the remainder distributed in a round-robin fashion.
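
The new thread allocation can be illustrated with a small sketch. This is not DeepSparse's scheduler code; the function name threads_per_stream is hypothetical, and it assumes the remainder is handed out one thread at a time starting from the first stream.

```python
from typing import List


def threads_per_stream(num_cores: int, num_streams: int) -> List[int]:
    """Split num_cores threads across num_streams streams.

    Illustrative only: assumes leftover threads go to the lowest-numbered
    streams first, one per stream (round-robin over a single pass).
    """
    if num_cores <= 0 or num_streams <= 0:
        raise ValueError("num_cores and num_streams must be positive")
    # Every stream gets the integer quotient; the remainder is spread out.
    base, remainder = divmod(num_cores, num_streams)
    return [base + (1 if i < remainder else 0) for i in range(num_streams)]


if __name__ == "__main__":
    # 10 cores over 4 streams: two streams get 3 threads, two get 2.
    print(threads_per_stream(10, 4))
```

Under this assumption every thread is assigned to exactly one stream, and no two streams differ by more than one thread.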

In networks with a Clip operator where min isn't equal to zero, performance bugs no longer occur.
Crashing eliminated:
- Pipeline conll eval using ignore_labels. (#903)
- YOLOv8 pipelines handling models with dynamic inputs. (#967)
- QA pipelines with sequence lengths equal to or less than 128. (#889)
- Image classification pipelines handling PNG images. (#870)
- ONNX overriding of shapes if a list was not passed in; this now automatically wraps in a list. (#914)
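
The automatic list wrapping in the last item can be pictured with a minimal sketch. The helper name wrap_shapes is hypothetical and this is not the DeepSparse implementation; it only assumes that a single shape is a flat sequence of integers while multiple shapes arrive as a sequence of such sequences.

```python
from typing import List, Sequence, Union

Shape = Sequence[int]


def wrap_shapes(shapes: Union[Shape, Sequence[Shape]]) -> List[List[int]]:
    """Return a list of shapes, wrapping a single flat shape in a list.

    Hypothetical helper for illustration only.
    """
    # A non-empty sequence made up entirely of ints is treated as one shape.
    if len(shapes) > 0 and all(isinstance(dim, int) for dim in shapes):
        return [list(shapes)]
    # Otherwise assume it is already a sequence of shapes.
    return [list(shape) for shape in shapes]


if __name__ == "__main__":
    print(wrap_shapes((1, 3, 224, 224)))
    print(wrap_shapes([[1, 3, 224, 224], [1, 128]]))
```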
Assertion errors/failures removed:
- Networks with both Convolutions and GEMM operations.
- YOLOv8 model compilation.
- Slice and Unsqueeze operators with a negative axis.
- OPT models involving a constant tensor that is broadcast in two different ways.