DeepSpeed 0.1.0 Release Notes
Features
- Distributed Training with Mixed Precision
  - 16-bit mixed precision
  - Single-GPU/Multi-GPU/Multi-Node
- Model Parallelism
  - Support for Custom Model Parallelism
  - Integration with Megatron-LM
- Memory and Bandwidth Optimizations
  - Zero Redundancy Optimizer (ZeRO) stage 1 with all-reduce
  - Constant Buffer Optimization (CBO)
  - Smart Gradient Accumulation
- Training Features
  - Simplified training API (see the initialization and training-loop sketch after this list)
  - Gradient Clipping
  - Automatic loss scaling with mixed precision
- Training Optimizers
  - Fused Adam optimizer and arbitrary torch.optim.Optimizer
  - Memory bandwidth optimized FP16 Optimizer
  - Large Batch Training with LAMB Optimizer
  - Memory efficient Training with ZeRO Optimizer
- Training Agnostic Checkpointing
- Advanced Parameter Search (see the scheduler config sketch after this list)
  - Learning Rate Range Test
  - 1Cycle Learning Rate Schedule
- Simplified Data Loader
- Performance Analysis and Debugging
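
The snippet below is a minimal, illustrative sketch of how the simplified training API ties several of these features together (FP16 with automatic loss scaling, gradient clipping, gradient accumulation, and ZeRO). It is not taken from this release; the toy model `SimpleNet`, the file name `ds_config.json`, and the exact config keys are assumptions based on the public DeepSpeed config schema and may differ in this version.

```python
# Minimal illustrative sketch; SimpleNet and ds_config.json are placeholders,
# and exact config keys may differ in this release.
import argparse
import json

import torch
import deepspeed


class SimpleNet(torch.nn.Module):
    """Toy model used only to illustrate the API."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 10)

    def forward(self, x):
        return self.linear(x)


# Example DeepSpeed config covering FP16 with dynamic (automatic) loss
# scaling, gradient clipping, gradient accumulation, and ZeRO.
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},
    "zero_optimization": True,  # newer releases use {"stage": 1}
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f)

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed_config etc.
args = parser.parse_args()

# deepspeed.initialize wraps the model and builds the configured optimizer.
model = SimpleNet()
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters())

for step in range(10):
    x = torch.randn(8, 128, dtype=torch.half, device=model_engine.device)
    y = torch.randint(0, 10, (8,), device=model_engine.device)
    loss = torch.nn.functional.cross_entropy(model_engine(x), y)
    model_engine.backward(loss)  # applies loss scaling, tracks accumulation
    model_engine.step()          # clips gradients and steps per the config
```

With a config like this, the same script is typically launched through the `deepspeed` launcher, e.g. `deepspeed train.py --deepspeed_config ds_config.json`, for single-GPU, multi-GPU, or multi-node runs.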
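
Large batch training with LAMB and the advanced parameter-search schedules are likewise driven from the JSON config. The fragments below are hypothetical examples following the public DeepSpeed config schema (`Lamb`, `OneCycle`, and `LRRangeTest` with their documented parameter names); the values are placeholders and the exact schema may differ in this release.

```python
# Hypothetical config fragments; keys and values are illustrative only.

# Large-batch training with the LAMB optimizer plus a 1Cycle schedule that
# ramps the learning rate up and back down over one long cycle.
lamb_large_batch_config = {
    "train_batch_size": 8192,
    "optimizer": {
        "type": "Lamb",
        "params": {"lr": 2e-3, "weight_decay": 0.01},
    },
    "scheduler": {
        "type": "OneCycle",
        "params": {
            "cycle_min_lr": 1e-4,
            "cycle_max_lr": 2e-3,
            "cycle_first_step_size": 1000,
        },
    },
}

# Learning Rate Range Test: sweep the learning rate upward during a short
# run to find a usable range before committing to a full training job.
lr_range_test_config = {
    "scheduler": {
        "type": "LRRangeTest",
        "params": {
            "lr_range_test_min_lr": 1e-5,
            "lr_range_test_step_size": 200,
            "lr_range_test_step_rate": 5,
            "lr_range_test_staircase": False,
        },
    },
}
```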