### Added
- Added process sets to concurrently run collective operations on subsets of Horovod processes in TensorFlow, PyTorch, and MXNet; see the sketch after this list. (#2839, #3042, #3043, #3054, #3083, #3090)
- Added XLA support for Allreduce via `tf.function(jit_compile=True)`. (#3053)
- Added fused buffer scaling and unpack/pack kernels on GPU. (#2973)
- Added support for NCCL on CUDA 11.4. (#3182)
- Added fp16 compression for MXNet. (#2987)
- Added `terminate_on_nan` flag to Spark Lightning estimator. (#3088)
- Added `barrier()` API to torch module to support simple synchronization among ranks and to achieve parity with PyTorch DDP and similar frameworks. (#3139)
- Added params for customizing TensorBoard callback. (#3153)
- Added `hvd.cross_rank()` for keras. (#3008)
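The following is a rough sketch of how the new process-set and barrier APIs might be used together from `horovod.torch`. Names such as `hvd.ProcessSet`, the `process_sets=` argument to `hvd.init()`, and the `process_set=` argument to `hvd.allreduce()` are assumptions based on the process-set PRs referenced above, not a definitive reference.

```python
# Sketch: collective ops on subsets of ranks plus a global barrier.
# Assumes 4 worker processes; set membership is fixed at init time.
import torch
import horovod.torch as hvd

even_set = hvd.ProcessSet([0, 2])  # assumed constructor taking a list of ranks
odd_set = hvd.ProcessSet([1, 3])

hvd.init(process_sets=[even_set, odd_set])

tensor = torch.ones(4)
if hvd.rank() % 2 == 0:
    # Allreduce runs only among the even ranks.
    reduced = hvd.allreduce(tensor, process_set=even_set)
else:
    reduced = hvd.allreduce(tensor, process_set=odd_set)

# The new barrier() call synchronizes all ranks before continuing.
hvd.barrier()
```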
### Changed
- Implemented more asynchronous dependency handling on GPU. (#2963)
- Ray: RayExecutor will now use the current placement group instead of always creating a new one. (#3134)
- Lightning: turned off shuffling for validation dataset. (#2974)
- Extended `hvd.join()` to return the last rank that joined; see the sketch below. (#3097)
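A minimal sketch of the extended `hvd.join()` return value with `horovod.torch`, assuming the call blocks until every rank has joined and then returns the index of the last rank to do so (per #3097); the uneven per-rank batch counts are purely illustrative.

```python
# Sketch: uneven per-rank workloads followed by a join that reports who finished last.
import torch
import horovod.torch as hvd

hvd.init()

# Each rank processes a different number of batches.
num_batches = 10 + hvd.rank()
for _ in range(num_batches):
    grad = torch.ones(3)
    hvd.allreduce(grad)

# join() blocks until all ranks have joined; the return value is assumed to be
# the rank that joined last, which can help spot stragglers.
last_rank = hvd.join()
if hvd.rank() == 0:
    print(f"Last rank to join: {last_rank}")
```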
### Removed
- Spark/Keras: remove bare Keras support. (#3191)
### Fixed
- Fixed Horovod develop/editable install mode and incremental builds. (#3074)
- Estimator/Lightning: use lightning datamodule. (#3084)
- Fixed Horovod Spark StringType and numpy type mapping issue. (#3146)
- Fixed error in Keras LearningRateScheduler. (#3135)
- Fixed bug in Lightning Profiler on Ray. (#3122)
- Fixed torch op lazy release to prevent OOM in elastic training. (#3110)
- Lightning: Fixed usage of the checkpoint callback. (#3186)
- Fixed MPICH support to use Intel MPI's implementation. (#3148)
- Fixed race condition in PyTorch async dataloader. (#3120)