Key updates
- PyTorch 1.5 support
- Added Horovod distributed_backend option
- Enable forward compatibility with the native AMP (PyTorch 1.6)
- Support 8-core TPU on Kaggle
- Added ability to customize progress_bar via Callbacks
- Speed/memory optimizations
- Improved Argparse usability with Trainer
- Docs improvements
- Tons of bug fixes
Detailed changes
Added
- Added `replace_sampler_ddp` flag to manually disable sampler replacement in DDP (#1513)
- Added speed parity tests (max 1 sec difference per epoch) (#1482)
- Added `auto_select_gpus` flag to the Trainer that enables automatic selection of available GPUs on exclusive mode systems
- Added learning rate finder (#1347)
- Added support for DDP mode in clusters without SLURM (#1387)
- Added `test_dataloaders` parameter to `Trainer.test()` (#1434)
- Added `terminate_on_nan` flag to the Trainer that performs a NaN check with each training iteration when set to `True` (#1475)
- Added `ddp_cpu` backend for testing DDP without GPUs (#1158)
- Added Horovod support as a distributed backend, `Trainer(distributed_backend='horovod')` (#1529) (see the combined usage sketch after this list)
- Added support for 8-core distributed training on Kaggle TPUs (#1568)
- Added support for native AMP (#1561, #1580)
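To put several of these additions together, here is a rough usage sketch. `LitModel`, the random dataloaders, and the chosen flag values are illustrative placeholders rather than part of this release; check the Trainer docs for the exact argument names and defaults in your installed version.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    """Tiny placeholder module, only here to make the sketch self-contained."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {'loss': torch.nn.functional.cross_entropy(self(x), y)}

    def test_step(self, batch, batch_idx):
        x, y = batch
        return {'test_loss': torch.nn.functional.cross_entropy(self(x), y)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


def make_loader():
    # random toy data so the example runs end to end
    dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    return DataLoader(dataset, batch_size=16)


model = LitModel()

trainer = pl.Trainer(
    max_epochs=1,
    terminate_on_nan=True,      # NaN check on every training iteration (#1475)
    replace_sampler_ddp=False,  # keep your own sampler when running DDP (#1513)
    auto_select_gpus=True,      # pick free GPUs on exclusive-mode machines
    # distributed_backend='horovod',  # new Horovod backend (#1529); 'ddp_cpu' tests DDP without GPUs (#1158)
)
trainer.fit(model, train_dataloader=make_loader())

# test_dataloaders can now be passed directly to .test() instead of .fit() (#1434)
trainer.test(test_dataloaders=make_loader())
```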
Changed
- Changed the default behaviour to no longer include a NaN check with each training iteration. (#1475)
- Decoupled the progress bar from the trainer; it is now a callback and can be customized or even replaced entirely (#1450) (see the sketch after this list).
- Changed lr schedule step interval behavior to update every backwards pass instead of every forwards pass (#1477)
- Defined a shared process rank and removed rank from instances (e.g. loggers) (#1408)
- Updated semantic segmentation example with custom u-net and logging (#1371)
- Disabled val and test shuffling (#1600)
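Because the progress bar is now an ordinary callback (#1450), it can be subclassed and passed through the Trainer's `callbacks` argument. Below is a minimal sketch; it assumes the default callback is exposed as `pytorch_lightning.callbacks.ProgressBar` with per-stage tqdm init hooks, so verify the class and hook names against your installed version.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ProgressBar


class QuietValidationBar(ProgressBar):
    """Illustrative tweak: relabel the training bar and clear the validation bar on exit."""

    def init_train_tqdm(self):
        bar = super().init_train_tqdm()
        bar.set_description('training')  # rename the training bar
        return bar

    def init_validation_tqdm(self):
        bar = super().init_validation_tqdm()
        bar.leave = False  # remove the validation bar once its loop finishes
        return bar


# the customized bar is registered like any other callback
trainer = pl.Trainer(callbacks=[QuietValidationBar()])
```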
Deprecated
- Deprecated `training_tqdm_dict` in favor of `progress_bar_dict` (#1450).
Removed
- Removed `test_dataloaders` parameter from `Trainer.fit()` (#1434)
Fixed
- Added the possibility to pass nested metrics dictionaries to loggers (#1582)
- Fixed a memory leak from the optimizer return value (#1528)
- Fixed checkpoint saving so the new checkpoint is saved before old ones are deleted (#1453)
- Fixed loggers: flush the last logged metrics before continuing, e.g. `trainer.test()` results (#1459)
- Fixed optimizer configuration when `configure_optimizers` returns a dict without `lr_scheduler` (#1443) (see the sketch after this list)
- Fixed `LightningModule`: mixing hparams and arguments in `LightningModule.__init__()` no longer crashes `load_from_checkpoint()` (#1505)
- Added a missing call to the `on_before_zero_grad` model hook (#1493).
- Allowed use of sweeps with `WandbLogger` (#1512)
- Fixed a bug that caused the `callbacks` Trainer argument to reference a global variable (#1534).
- Fixed a bug that always set boolean CLI arguments from `Trainer.add_argparse_args` to `True` (#1571)
- Fixed unnecessary copying of the batch when training on a single GPU (#1576, #1579)
- Fixed soft checkpoint removal on DDP (#1408)
- Fixed automatic parser bug (#1585)
- Fixed bool conversion from string (#1606)
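As a quick illustration of the `configure_optimizers` fix (#1443): a dict return that omits the `lr_scheduler` key should now be handled. The module below is a hypothetical minimal sketch, not code from this release.

```python
import torch
import pytorch_lightning as pl


class DictOptimizerModel(pl.LightningModule):
    """Placeholder module; configure_optimizers is the only point of interest."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {'loss': torch.nn.functional.mse_loss(self.layer(x), y)}

    def configure_optimizers(self):
        # returning a dict with only an 'optimizer' entry (no 'lr_scheduler')
        # is handled after #1443; adding a scheduler remains optional
        return {'optimizer': torch.optim.Adam(self.parameters(), lr=1e-3)}
```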
Contributors
@alexeykarnachev, @areshytko, @awaelchli, @Borda, @borisdayma, @ethanwharris, @fschlatt, @HenryJia, @Ir1d, @justusschock, @karlinjf, @lezwon, @neggert, @rmrao, @rohitgr7, @SkafteNicki, @tgaddair, @williamFalcon
If we forgot someone due to not matching commit email with GitHub account, let us know :]