New Features:
- Python 3.10 support added.
- Zero-shot text classification pipeline implemented.
- Haystack Information Retrieval pipeline implemented.
- Native YOLACT pipeline integration for deployments is available.
- DeepSparse pipelines now support dynamic batch, dynamic shape through bucketing, and asynchronous execution.
- CustomTaskPipeline added to enable easier custom pipeline creation.
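As a rough illustration of what "dynamic shape through bucketing" generally means, the sketch below pads each input to the smallest of a few fixed sizes that fits it, so that only a handful of static shapes are ever needed. This is not DeepSparse's implementation: the bucket sizes, the padding id, and the helper names `pick_bucket` and `pad_to_bucket` are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical bucket sizes (sequence lengths); not taken from DeepSparse.
BUCKETS = [32, 64, 128]
PAD_ID = 0  # assumed padding token id


def pick_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket that fits `length`.

    Inputs longer than every bucket fall back to the largest bucket.
    `buckets` must be sorted in ascending order.
    """
    idx = bisect_left(buckets, length)
    return buckets[min(idx, len(buckets) - 1)]


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Pad token_ids up to its bucket size, truncating if it exceeds the largest bucket."""
    size = pick_bucket(len(token_ids), buckets)
    clipped = token_ids[:size]
    return clipped + [pad_id] * (size - len(clipped))
```

With these assumed buckets, a 40-token input would be padded to length 64, while a 200-token input would be truncated to 128.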
Changes:
- The behavior of the Multi-stream scheduler is now identical to the Elastic scheduler, and the old Multi-stream scheduler has been removed.
- NLP pipelines for question answering, text classification, and token classification upgraded to improve accuracy and better match the SparseML training pathways.
- Updates made across the repository for new SparseZoo Python APIs.
- Max torchvision version increased to 0.12.0 for computer vision deployment pathways.
Performance:
- Inference performance improvements for:
  - unstructured sparse quantized Transformer models.
  - slow activation functions (such as Gelu or Swish) when they follow a QuantizeLinear operator.
  - some sparse 1D convolutions, with speedups of up to 3x observed.
  - Squeeze, when operating on a single axis.
Resolved Issues:
- An assertion error no longer occurs when one node has multiple inputs that both come from the same node.
- An assertion error no longer occurs when a MatMul operator follows a Transpose or Reshape operator.
- Pipelines now support hyphenated versions of standard task names such as question-answering.
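One simple way to accept both spellings of a task name is to normalize it before lookup. The sketch below shows the idea only; the helper `normalize_task` is hypothetical, and treating the underscored form as canonical is an assumption rather than a description of how the pipelines resolve names internally.

```python
def normalize_task(task: str) -> str:
    """Map a hyphenated or mixed-case task name to an underscored, lower-case form.

    Assumption: the underscored spelling is the canonical one.
    """
    return task.strip().lower().replace("-", "_")
```

Under this assumption, "question-answering" and "question_answering" resolve to the same key.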
Known Issues:
- In the C++ interface, the engine will crash with a segmentation fault when the num_streams provided to the engine_context_t is greater than the number of physical CPU cores.
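Until this is resolved, a caller can guard against the crash by capping the requested stream count before it reaches the engine. The sketch below shows that guard in Python for brevity; the same check applies before constructing the C++ context. The helper `clamp_num_streams` is hypothetical, and the physical core count is passed in explicitly because Python's `os.cpu_count()` reports logical CPUs, which can exceed the number of physical cores.

```python
def clamp_num_streams(requested: int, physical_cores: int) -> int:
    """Cap the requested number of streams at the number of physical cores.

    Both arguments must be positive. `physical_cores` is supplied by the
    caller, since the standard library only reports logical CPUs.
    """
    if requested < 1 or physical_cores < 1:
        raise ValueError("requested and physical_cores must both be >= 1")
    return min(requested, physical_cores)
```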