DeepSpeed v0.4.0
- [Press release] DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression
- New inference API for setting up models for inference (a usage sketch follows this list)
- DeepSpeed Inference: Multi-GPU inference with customized inference kernels and quantization support
- Mixture-of-Quantization: A novel quantization approach for reducing model size with minimal accuracy impact
- See the MoQ tutorial for more details (a rough config sketch also follows this list).
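
A minimal sketch of the inference API, assuming a Hugging Face GPT-2 model and a GPU with fp16 support; the `deepspeed.init_inference` arguments shown (`mp_size`, `dtype`, `replace_method`) follow the inference tutorial and may differ across versions:

```python
# Minimal sketch: wrapping a pretrained model with DeepSpeed Inference.
# Assumes `transformers` is installed and a CUDA device is available.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# init_inference injects the customized inference kernels and, with mp_size > 1,
# partitions the model across GPUs for multi-GPU inference.
engine = deepspeed.init_inference(model,
                                  mp_size=1,            # number of GPUs for model parallelism
                                  dtype=torch.half,     # run the kernels in fp16
                                  replace_method="auto")

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(torch.cuda.current_device())
outputs = engine.module.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```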
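
And a rough sketch of what an MoQ section in a DeepSpeed training config could look like, written here as a Python dict; the key names (`quantize_training`, `quantize_bits`, `quantize_schedule`, `quantize_groups`) reflect my reading of the MoQ tutorial and are not verified against this release, so consult the tutorial for the authoritative schema:

```python
# Illustrative MoQ config sketch (key names are assumptions; verify against the
# MoQ tutorial for your DeepSpeed version before use).
moq_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
    "fp16": {"enabled": True},
    "quantize_training": {
        "enabled": True,
        # Start from 16-bit weights and anneal toward an 8-bit target precision.
        "quantize_bits": {"start_bits": 16, "target_bits": 8},
        # How often (in steps) the precision schedule advances, and when it starts.
        "quantize_schedule": {"quantize_period": 400, "schedule_offset": 0},
        "quantize_groups": 8,
    },
}

# This dict would normally be written out as the JSON config file that is passed
# to the DeepSpeed launcher / deepspeed.initialize for training with MoQ enabled.
```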