neuralmagic/deepsparse v0.10.0 on GitHub

New Features:

- Quantization support enabled on the AVX2 instruction set for GEMM and elementwise operations.
- NM_SPOOF_ARCH environment variable added for testing different architectural configurations.
- Elastic scheduler implemented as an alternative to the single-stream and multi-stream schedulers.
- deepsparse.benchmark application is now usable from the command line after installing deepsparse, simplifying benchmarking.
- deepsparse.server CLI and API added, with transformers support, making it easy to serve models such as BERT with pipelines.
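
For readers new to quantized GEMM, here is what such an operation computes in principle. This is a minimal pure-Python sketch that assumes symmetric per-tensor int8 quantization with int32-style accumulation. The function names are hypothetical, and the code is not DeepSparse's AVX2 kernel or API.

```python
# Hypothetical helper names; a conceptual sketch, not DeepSparse code.

def quantize_symmetric(values):
    """Quantize a float matrix to int8 values in [-127, 127] with one per-tensor scale."""
    max_abs = max(abs(v) for row in values for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest (Python's round() sends exact halves to the even integer), then clamp.
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in values]
    return q, scale


def int8_gemm(a_q, b_q):
    """Multiply int8 matrices, accumulating each dot product as an integer."""
    n, k, m = len(a_q), len(b_q), len(b_q[0])
    if any(len(row) != k for row in a_q):
        raise ValueError("inner dimensions do not match")
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                acc += a_q[i][p] * b_q[p][j]
            # Real kernels accumulate in int32; confirm the final sum fits in that range.
            if not -2**31 <= acc < 2**31:
                raise OverflowError("int32 accumulator overflow")
            out[i][j] = acc
    return out


def quantized_matmul(a, b):
    """Quantize both inputs, run the integer GEMM, then dequantize the result."""
    a_q, scale_a = quantize_symmetric(a)
    b_q, scale_b = quantize_symmetric(b)
    acc = int8_gemm(a_q, b_q)
    # Each integer product carries scale_a * scale_b, so one multiply dequantizes it.
    return [[v * scale_a * scale_b for v in row] for row in acc]


def float_matmul(a, b):
    """Reference float GEMM for comparison."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


if __name__ == "__main__":
    a = [[0.5, -1.0], [2.0, 0.25]]
    b = [[1.0, 0.0], [-0.5, 1.5]]
    print(float_matmul(a, b))      # exact float result
    print(quantized_matmul(a, b))  # close to it, within quantization error
```

Production kernels run the inner loop with SIMD instructions over packed 8-bit values. A single scale per tensor is the simplest scheme. Finer-grained scales, such as per-channel scales, usually reduce quantization error.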

Changes:

- More robust architecture detection added to help resolve the CPU topology, for example when running inside a virtual machine.
- Tensor columns improved, yielding speedups of 5 to 20% in pruned YOLO (larger batch sizes), BERT (smaller batch sizes), MobileNet, and ResNet models.
- Performance of sparse quantized networks improved on machines that do not support VNNI instructions.
- Performance improved for dense BERT with large batch sizes.

Possible crashes eliminated for:
- Pooling operations with small image sizes
- In rare cases, networks containing convolution or GEMM operations
- Some models with many residual connections