New Features:
- Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
- NM_SPOOF_ARCH environment variable added for testing different architectural configurations.
- Elastic scheduler implemented as an alternative to the single-stream or multi-stream schedulers.
- deepsparse.benchmark application is now usable from the command-line after installing deepsparse to simplify benchmarking.
- deepsparse.server CLI and API added with transformers support to make serving models like BERT with pipelines easy.
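
As a rough illustration of how the NM_SPOOF_ARCH variable and the deepsparse.benchmark command might be combined, the sketch below only assembles the command and environment and does not run anything. The helper name `build_benchmark_invocation`, the value `"avx2"`, the file name `model.onnx`, and passing the model path as the first positional argument are all assumptions made for illustration; these notes do not list the accepted NM_SPOOF_ARCH values or the benchmark arguments.

```python
import os
import shlex


def build_benchmark_invocation(model_path, spoof_arch=None, base_env=None):
    """Return (argv, env) for a deepsparse.benchmark run; nothing is executed.

    Hypothetical helper: the positional model path and the spoof_arch value
    are assumptions, not documented behavior.
    """
    # Copy the environment so the caller's process environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    if spoof_arch is not None:
        # NM_SPOOF_ARCH is named in these notes; valid values are not.
        env["NM_SPOOF_ARCH"] = spoof_arch
    argv = ["deepsparse.benchmark", model_path]
    return argv, env


argv, env = build_benchmark_invocation("model.onnx", spoof_arch="avx2", base_env={})
print(shlex.join(argv))
```

With deepsparse installed, the returned pair could be handed to `subprocess.run(argv, env=env, check=True)` to launch the benchmark under the spoofed architecture.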
Changes:
- More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
- Tensor columns improved, yielding speedups of 5% to 20% in pruned YOLO (larger batch sizes), BERT (smaller batch sizes), MobileNet, and ResNet models.
- Sparse quantized network performance improved on machines that do not support VNNI instructions.
- Performance improved for dense BERT with large batch sizes.
Resolved Issues:
- Possible crashes eliminated for:
  - Pooling operations with small image sizes
  - Rarely, networks containing convolution or GEMM operations
  - Some models with many residual connections
Known Issues:
- None