horovod/horovod v0.27.0
Custom data loaders in Spark TorchEstimator, more model parallelism in Keras, improved allgather performance, fixes for latest PyTorch and TensorFlow versions

Added

  • Keras: Added PartialDistributedOptimizer API; see the first sketch after this list. (#3738)
  • Added HOROVOD_SPARK_USE_LOCAL_RANK_GPU_INDEX environment variable to ignore the GPU device indices assigned by Spark and always use the local-rank GPU device in Spark estimators; see the second sketch after this list. (#3737)
  • Added support for the reducescatter arguments prescale_factor and postscale_factor and moved averaging into the Horovod backend; see the third sketch after this list. (#3815)
  • Spark Estimator: Added support for custom data loaders in TorchEstimator. (#3787)
  • Spark Estimator: Added NVTabular data loader for TorchEstimator. (#3787)
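
First sketch: a minimal, hedged illustration of the new Keras API. The local_layers argument shown here, naming layers whose gradients stay local instead of being allreduced, mirrors the earlier PyTorch PartialDistributedOptimizer and TensorFlow PartialDistributedGradientTape APIs; it is an assumption for illustration, not something these notes spell out.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,), name="shared"),
    tf.keras.layers.Dense(10, name="local_head"),
])

opt = tf.keras.optimizers.SGD(0.01 * hvd.size())

# Assumed signature: layers passed via `local_layers` are excluded from
# gradient synchronization, while all other layers behave as with
# DistributedOptimizer. Check the #3738 docs before relying on this.
opt = hvd.PartialDistributedOptimizer(
    opt, local_layers=[model.get_layer("local_head")])

model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)
```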
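
Second sketch: the new environment variable only needs to be visible to the estimator's worker processes. Setting it via os.environ in the driver script, or through Spark's standard executor-env config, are two plausible routes.

```python
import os

# Ignore the GPU indices Spark assigns and instead pick the device that
# matches each worker's local rank.
os.environ["HOROVOD_SPARK_USE_LOCAL_RANK_GPU_INDEX"] = "1"

# Equivalent spark-submit form (standard Spark executor-env mechanism):
#   spark-submit --conf spark.executorEnv.HOROVOD_SPARK_USE_LOCAL_RANK_GPU_INDEX=1 ...
```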
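
Third sketch, shown with the PyTorch binding; the keyword names match those long available on hvd.allreduce, and with op=hvd.Average the division by world size now happens inside the Horovod backend rather than in framework code.

```python
import torch
import horovod.torch as hvd

hvd.init()

# Each rank contributes a tensor whose first dimension is split evenly
# across ranks by reducescatter.
tensor = torch.ones(4 * hvd.size())

# prescale_factor multiplies the inputs before the reduction;
# postscale_factor multiplies the reduced shards afterwards.
out = hvd.reducescatter(tensor, op=hvd.Sum,
                        prescale_factor=0.5, postscale_factor=2.0)
```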

Changed

  • Improved NCCL performance for fused allgather operations through padding for better memory alignment. (#3727)
  • Improved look-ahead tensor fusion buffer size estimates when allgather and other operations are mixed. (#3727)

Fixed

  • ROCm: Fixed support for GPU MPI operations in the build. (#3746)
  • PyTorch: Fixed linking order to avoid using Gloo from PyTorch dynamic libraries. (#3750)
  • Fixed memory leak in MPI_GPUAllgather. (#3727)
  • TensorFlow: Fixed deprecation warnings when building with TensorFlow 2.11. (#3767)
  • Keras: Added support for additional arguments to SyncBatchNormalization._moments(). (#3775)
  • Fixed version number parsing with pypa/packaging 22.0. (#3794)
  • TensorFlow: Fixed linking with nightly versions leading up to TensorFlow 2.12. (#3755)
  • TensorFlow: Fixed handling of tf.IndexedSlices types when scaling local gradients. (#3786)
  • Added missing MEMCPY_IN_FUSION_BUFFER timeline event for reducescatter. (#3808)
  • Fixed build of Docker image horovod-nvtabular. (#3817)
  • TensorFlow: Several fixes for allreduce and grouped allreduce handling of tf.IndexedSlices. (#3813)
  • Spark: Restricted PyArrow to versions < 11.0. (#3830)
  • TensorFlow: Resolved conflicts between multiple optimizer wrappers reusing the same gradient accumulation counter. (#3783)
  • TensorFlow/Keras: Fixed DistributedOptimizer with Keras 2.11+; see the sketch after this list. (#3822)
  • PyTorch, ROCm: Fixed allreduce average on process sets. (#3815)
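
For reference, a minimal sketch of the wrapper that the Keras 2.11+ fix (#3822) restores; this is the standard DistributedOptimizer usage from the Horovod docs, not new API.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])

# Scale the learning rate by world size, per the usual Horovod convention.
opt = tf.keras.optimizers.SGD(0.01 * hvd.size())

# #3822 makes this wrapper work again with the optimizer classes that
# became the default in Keras 2.11.
opt = hvd.DistributedOptimizer(opt)

model.compile(optimizer=opt, loss="mse")
```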
