App
Changed
Fixed
- Refactored path to root to prevent a circular import (#18357)
Fabric
Changed
- On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
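The idea behind this change can be illustrated with a minimal sketch. It is not the library's implementation: `FakeRuntime`, `SketchStrategy`, and every method name below are hypothetical stand-ins, and the "launch" runs in the same process purely for demonstration. The point is only that querying the rank has a side effect (creating a client), so the query is skipped until after launch.

```python
class FakeRuntime:
    """Hypothetical stand-in for a runtime whose rank query creates a client."""

    clients_created = 0

    @classmethod
    def get_ordinal(cls):
        # Side effect: asking for the rank initializes a (fake) client.
        cls.clients_created += 1
        return 0


class SketchStrategy:
    """Hypothetical strategy that defers rank setup until after launch."""

    def __init__(self):
        self.launched = False
        self.global_rank = None

    def set_world_ranks(self):
        # Before launch, do nothing so the main process never creates a client.
        if not self.launched:
            return
        self.global_rank = FakeRuntime.get_ordinal()

    def launch(self):
        # In a real setup this would run inside each launched worker process.
        self.launched = True
        self.set_world_ranks()


strategy = SketchStrategy()
strategy.set_world_ranks()  # called early: skipped, no client is created
clients_before_launch = FakeRuntime.clients_created
strategy.launch()  # after launch the rank is resolved
print(clients_before_launch, FakeRuntime.clients_created, strategy.global_rank)
```

In this sketch the early call is a no-op, and the client is created exactly once, after launch.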
Fixed
- Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
- Removed false positive warning when using fabric.no_backward_sync with XLA strategies (#17761)
- Fixed issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
- Fixed FSDP full-precision param_dtype training (16-mixed, bf16-mixed and 32-true configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
PyTorch
Changed
- On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
- Fix inefficiency in rich progress bar (#18369)
Fixed
- Fixed FSDP full-precision param_dtype training (16-mixed and bf16-mixed configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
- Fixed an issue that prevented the use of custom logger classes without an experiment property defined (#18093)
- Fixed setting the tracking uri in MLFlowLogger for logging artifacts to the MLFlow server (#18395)
- Fixed redundant iter() call to dataloader when checking dataloading configuration (#18415)
- Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
- Properly manage fetcher.done with dataloader_iter (#18376)
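To illustrate why a redundant iter() call during a configuration check matters, here is a small sketch. It does not reproduce the library's code: `CountingLoader`, `check_config_eager`, `check_config_lazy`, and `run_epoch` are hypothetical names. It assumes that starting iteration on a dataloader can be expensive (for example, by starting workers), so a check that creates and discards an iterator doubles that cost.

```python
class CountingLoader:
    """Hypothetical stand-in for a dataloader; counts how often iteration starts."""

    def __init__(self, data):
        self.data = list(data)
        self.iter_calls = 0

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        # In a real dataloader this step may be expensive.
        self.iter_calls += 1
        return iter(self.data)


def check_config_eager(loader):
    """Check that creates a throwaway iterator just to inspect the loader."""
    iter(loader)  # redundant: the iterator is discarded immediately
    return len(loader) > 0


def check_config_lazy(loader):
    """Check that relies only on metadata and never starts iteration."""
    return len(loader) > 0


def run_epoch(loader, check):
    check(loader)
    return sum(loader)  # the loop's own single pass over the data


eager, lazy = CountingLoader(range(4)), CountingLoader(range(4))
eager_total = run_epoch(eager, check_config_eager)
lazy_total = run_epoch(lazy, check_config_lazy)
print(eager.iter_calls, lazy.iter_calls)  # prints "2 1"
```

Both variants compute the same result, but the eager check starts iteration twice per epoch instead of once.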
Contributors
@awaelchli, @Borda, @carmocca, @quintenroets, @rlizzo, @speediedan, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]