### Added
- Keras: Added `PartialDistributedOptimizer` API. (#3738)
- Added `HOROVOD_SPARK_USE_LOCAL_RANK_GPU_INDEX` environment variable to ignore GPU device indices assigned by Spark and always use the local rank GPU device in Spark estimators. (#3737)
- Added support for reducescatter arguments `prescale_factor` and `postscale_factor` and moved averaging into the Horovod backend. (#3815)
- Spark Estimator: Added support for custom data loaders in TorchEstimator. (#3787)
- Spark Estimator: Added NVTabular data loader for TorchEstimator. (#3787)
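The new `prescale_factor` and `postscale_factor` arguments scale the input before the reduction and the result after it; with averaging moved into the backend, an average amounts to a postscale by `1 / world_size`. A minimal pure-Python sketch of these semantics (the `reducescatter` helper below is illustrative only, not Horovod's API):

```python
def reducescatter(rank_tensors, prescale_factor=1.0, postscale_factor=1.0):
    """Illustrative reduce-scatter over a list of per-rank tensors (lists).

    Each rank contributes a full tensor; the elementwise sum is split so
    that rank i receives chunk i, scaled before and after the reduction.
    """
    world_size = len(rank_tensors)
    length = len(rank_tensors[0])
    # Pre-scale each input, then sum elementwise across ranks.
    summed = [
        sum(t[i] * prescale_factor for t in rank_tensors)
        for i in range(length)
    ]
    # Post-scale and scatter: rank i gets the i-th equal chunk.
    chunk = length // world_size
    return [
        [v * postscale_factor for v in summed[r * chunk:(r + 1) * chunk]]
        for r in range(world_size)
    ]

# Averaging in the backend is a postscale by 1/world_size:
ranks = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 5.0, 0.0]]
out = reducescatter(ranks, postscale_factor=1.0 / len(ranks))
# rank 0 receives [2.0, 2.0], rank 1 receives [4.0, 2.0]
```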
### Changed
- Improved NCCL performance for fused allgather operations through padding for better memory alignment. (#3727)
- Improved look-ahead tensor fusion buffer size estimates when allgather and other operations are mixed. (#3727)
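Padding each entry of a fused buffer so the next tensor starts on an alignment boundary is a standard technique for better memory alignment; a short sketch of the idea (the 64-byte boundary is an assumption for illustration, not necessarily the value Horovod uses with NCCL):

```python
def padded_size(num_bytes, alignment=64):
    """Round a buffer size up to the next multiple of `alignment` bytes."""
    return -(-num_bytes // alignment) * alignment  # ceiling division

# Fusing three tensors of 100, 64, and 7 bytes into one buffer:
offsets = []
total = 0
for size in [100, 64, 7]:
    offsets.append(total)          # each entry starts on an aligned boundary
    total += padded_size(size)
# offsets == [0, 128, 192], total buffer size == 256
```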
### Fixed
- ROCm: Fixed GPU MPI operations support in build. (#3746)
- PyTorch: Fixed linking order to avoid using Gloo from PyTorch dynamic libraries. (#3750)
- Fixed memory leak in `MPI_GPUAllgather`. (#3727)
- TensorFlow: Fixed deprecation warnings when building with TensorFlow 2.11. (#3767)
- Keras: Added support for additional arguments to `SyncBatchNormalization._moments()`. (#3775)
- Fixed version number parsing with pypa/packaging 22.0. (#3794)
- TensorFlow: Fixed linking with nightly versions leading up to TensorFlow 2.12. (#3755)
- TensorFlow: Fixed handling of `tf.IndexedSlices` types when scaling local gradients. (#3786)
- Added missing `MEMCPY_IN_FUSION_BUFFER` timeline event for reducescatter. (#3808)
- Fixed build of Docker image `horovod-nvtabular`. (#3817)
- TensorFlow: Several fixes for allreduce and grouped allreduce handling of `tf.IndexedSlices`. (#3813)
- Spark: Restricted PyArrow to versions < 11.0. (#3830)
- TensorFlow: Resolved conflicts between multiple optimizer wrappers reusing the same gradient accumulation counter. (#3783)
- TensorFlow/Keras: Fixed `DistributedOptimizer` with Keras 2.11+. (#3822)
- PyTorch, ROCm: Fixed allreduce average on process sets. (#3815)