github horovod/horovod v0.28.0
v0.28.0: Keras 2.11+ optimizers, faster reducescatter, fixes for latest TensorFlow, CUDA, NCCL


Added

  • TensorFlow: Added get_local_and_global_gradients to PartialDistributedGradientTape to retrieve local and non-local gradients separately. (#3859)
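
As we understand it, PartialDistributedGradientTape keeps gradients of designated "local" variables out of the allreduce, and the new method hands back the two groups separately. The sketch below illustrates that split without TensorFlow or Horovod. The ranks are simulated, gradients are plain lists of floats, and the function name split_local_and_global is hypothetical. It does not reproduce the signature of get_local_and_global_gradients.

```python
# Hypothetical, framework-free sketch: not Horovod's API.
# Each simulated rank holds a dict of variable name -> gradient (list of floats).

def split_local_and_global(per_rank_grads, local_names):
    """Return one (local, global) pair of gradient dicts per rank.

    Gradients for names in local_names stay as computed on each rank.
    All other gradients are averaged across ranks, mimicking an
    allreduce with averaging.
    """
    num_ranks = len(per_rank_grads)
    names = list(per_rank_grads[0].keys())
    global_names = [n for n in names if n not in local_names]

    # Simulated allreduce (average) over the non-local variables only.
    averaged = {}
    for n in global_names:
        length = len(per_rank_grads[0][n])
        averaged[n] = [
            sum(grads[n][i] for grads in per_rank_grads) / num_ranks
            for i in range(length)
        ]

    results = []
    for grads in per_rank_grads:
        local = {n: list(grads[n]) for n in names if n in local_names}
        global_ = {n: list(v) for n, v in averaged.items()}  # per-rank copy
        results.append((local, global_))
    return results


grads = [
    {"embed": [1.0, 2.0], "dense": [2.0, 4.0]},  # rank 0
    {"embed": [3.0, 4.0], "dense": [4.0, 8.0]},  # rank 1
]
per_rank = split_local_and_global(grads, {"embed"})
```

With the inputs above, each rank keeps its own "embed" gradient. Both ranks receive the same averaged "dense" gradient [3.0, 6.0].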

Changed

  • Improved reducescatter performance by allocating output tensors before enqueuing the operation. (#3824)
  • TensorFlow: Ensured that the tf.logical_and inside the allreduce's tf.cond runs on CPU. (#3885)
  • TensorFlow: Added support for Keras 2.11+ optimizers. (#3860)
  • CUDA_VISIBLE_DEVICES environment variable is no longer passed to remote nodes. (#3865)
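
Reducescatter reduces tensors elementwise across ranks, then gives each rank one slice of the result along the first dimension. The sketch below simulates that in pure Python to show the ordering described in #3824: every output buffer is sized and allocated first, and the reduction then only fills buffers that already exist. Several choices here are assumptions for illustration, not Horovod's code. The reduction is a sum for simplicity. Earlier ranks take the extra rows when the first dimension does not divide evenly. The names output_sizes and reducescatter are hypothetical.

```python
# Hypothetical pure-Python simulation of reducescatter; not Horovod's implementation.

def output_sizes(num_rows, num_ranks):
    """Rows each rank receives; earlier ranks take the remainder (assumed)."""
    base, extra = divmod(num_rows, num_ranks)
    return [base + (1 if r < extra else 0) for r in range(num_ranks)]


def reducescatter(inputs):
    """inputs: one 2-D list (rows of floats) per rank, all with the same shape."""
    num_rows = len(inputs[0])
    num_cols = len(inputs[0][0]) if num_rows else 0
    sizes = output_sizes(num_rows, len(inputs))

    # Step 1: allocate every output buffer before any reduction work starts.
    outputs = [[[0.0] * num_cols for _ in range(n)] for n in sizes]

    # Step 2: the "enqueued" operation only writes into preallocated buffers.
    start = 0
    for rank, n in enumerate(sizes):
        for i in range(n):
            row = start + i
            for c in range(num_cols):
                outputs[rank][i][c] = sum(inp[row][c] for inp in inputs)
        start += n
    return outputs


result = reducescatter([
    [[1.0], [2.0], [3.0]],     # rank 0 input
    [[10.0], [20.0], [30.0]],  # rank 1 input
])
```

Here three rows are split across two ranks. Rank 0 receives the first two summed rows, [[11.0], [22.0]], and rank 1 receives [[33.0]].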

Fixed

  • Fixed build with ROCm. (#3839, #3848)
  • Fixed build of Docker image horovod-nvtabular. (#3851)
  • Fixed linking against recent NCCL versions by making static linkage of the CUDA runtime library the default and ensuring that weak symbols are overridden. (#3867, #3846)
  • Fixed compatibility with TensorFlow 2.12 and recent nightly versions. (#3864, #3894, #3906, #3907)
  • Fixed missing arguments in the Keras allreduce function. (#3905)
  • Updated with_device functions in MXNet and PyTorch to skip unnecessary cudaSetDevice calls. (#3912)
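
The with_device change (#3912) follows a common pattern: switch the active device only when it differs from the requested one, and restore it afterwards only if it was actually switched. Below is a Python analogue of that pattern, not Horovod's implementation. FakeCuda is a hypothetical stand-in for the CUDA runtime that counts how often set_device (playing the role of cudaSetDevice) is called.

```python
from contextlib import contextmanager


class FakeCuda:
    """Hypothetical stand-in for a CUDA runtime; counts device switches."""

    def __init__(self, current=0):
        self.current = current
        self.set_calls = 0

    def get_device(self):
        return self.current

    def set_device(self, device):
        self.set_calls += 1
        self.current = device


@contextmanager
def with_device(cuda, device):
    """Make `device` current for the block, skipping redundant set_device calls."""
    previous = cuda.get_device()
    switched = previous != device
    if switched:
        cuda.set_device(device)
    try:
        yield
    finally:
        # Restore only if we changed the device on entry.
        if switched:
            cuda.set_device(previous)


cuda = FakeCuda(current=0)
with with_device(cuda, 0):
    pass  # already on device 0: no set_device call
calls_same_device = cuda.set_calls

with with_device(cuda, 1):
    inside_device = cuda.get_device()  # switched to 1
```

Entering the block for the device that is already current costs no set_device calls. Switching to another device costs two: one on entry and one to restore on exit.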
