DeepSparse v0.12.0


Changes:

Performance:

  • Speedup for large batch sizes when using sync mode on AMD EPYC processors (see the sketch following this list).
  • AVX2 instruction set improvements:
    • Up to 40% speedup out of the box for dense quantized models.
    • Up to 20% speedup for pruned quantized BERT, ResNet-50, and MobileNet.
  • Speedup from sparsity now realized for ConvInteger operators.
  • Model compilation time decreased on systems with many cores.
  • Multi-stream scheduler: certain computations that were previously executed at runtime are now precomputed.
  • Hugging Face Transformers integration updated to latest state from upstream main branch.

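As an illustration of where these gains apply, here is a minimal sketch of compiling and running a model in sync (single-stream) mode with the DeepSparse Python API. The model path, batch size, and input shape are placeholders, not values from this release.

```python
import numpy as np
from deepsparse import Scheduler, compile_model

# "model.onnx" and the input shape are placeholders for a real ONNX model.
engine = compile_model(
    "model.onnx",
    batch_size=64,                      # large batch sizes see the EPYC speedup
    scheduler=Scheduler.single_stream,  # sync mode; Scheduler.multi_stream for async
)

inputs = [np.random.rand(64, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
```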
Resolved Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations, which previously resulted in very poor performance (see the sketch following this list).
  • Users executing arch.bin now receive a correct architecture profile of their system.

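A sketch of the once-problematic case, assuming a quantized BERT ONNX export with the three standard integer inputs; the file name and the sequence length of 62 are illustrative placeholders.

```python
import numpy as np
from deepsparse import compile_model

# "bert-quantized.onnx" is a placeholder for a quantized BERT export whose
# sequence length (here 62) is not divisible by 4.
engine = compile_model("bert-quantized.onnx", batch_size=1)

seq_len = 62
ids = np.zeros((1, seq_len), dtype=np.int64)
# input_ids, attention_mask, token_type_ids, in the model's input order
outputs = engine.run([ids, ids.copy(), ids.copy()])
```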
Known Issues:

  • When running the DeepSparse Engine on a system with a nonuniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable NM_SERIAL_UNIT_GENERATION=1, as shown below.
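
A sketch of the workaround from a Python entry point; the variable must be set before the engine compiles the model. The model path is a placeholder.

```python
import os

# Apply the workaround before deepsparse compiles anything.
os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

from deepsparse import compile_model

engine = compile_model("model.onnx", batch_size=1)  # placeholder model path
```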
