Overview
The highlights of this release are the brand-new Metrics package and new hooks and flags for customizing your workflow.
Major features:
- brand new Metrics package with built-in DDP support (by @justusschock and @SkafteNicki) (see the Metrics sketch below)
- `hparams` can now be anything! Call `self.save_hyperparameters()` in `__init__` to register anything (see the sketch below)
- many speed improvements (how we move data, adjusted some flags; PL now adds only ~300 ms of overhead per epoch!)
- much faster `ddp` implementation; the old one was renamed `ddp_spawn` (see the Trainer sketch below)
- better support for Hydra
- added the `overfit_batches` flag and fixed some bugs with the `limit_[train,val,test]_batches` flags (also in the Trainer sketch below)
- added conda support
- tons of bug fixes 😉
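A minimal sketch of the new Metrics package, assuming the class-based `Accuracy` metric is importable from `pytorch_lightning.metrics`; metrics are called like modules and handle device placement (and, under DDP, cross-process syncing) internally:

```python
import torch
from pytorch_lightning.metrics import Accuracy  # assumed import path for the new package

# Metric objects are callable modules: pass predictions and targets,
# get a tensor back; device handling and DDP reduction happen inside.
accuracy = Accuracy()

preds = torch.tensor([0, 1, 2, 2])
target = torch.tensor([0, 1, 1, 2])

print(accuracy(preds, target))  # tensor(0.7500)
```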
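A minimal sketch of the new `save_hyperparameters()` call; the module and its arguments here are illustrative, but the pattern is: call it in `__init__` and every constructor argument becomes available under `self.hparams` and is stored in checkpoints:

```python
import torch
from torch import nn
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self, hidden_dim=128, learning_rate=1e-3):
        super().__init__()
        # Registers hidden_dim and learning_rate under self.hparams
        # and includes them in saved checkpoints.
        self.save_hyperparameters()
        self.layer = nn.Linear(28 * 28, self.hparams.hidden_dim)

    def forward(self, x):
        return self.layer(x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)
```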
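A sketch of how the new Trainer flags fit together, using the 0.8-era `distributed_backend` argument; fractional values are read as a fraction of the dataset, integers as an absolute number of batches:

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    gpus=2,
    distributed_backend="ddp",          # new, faster ddp implementation
    # distributed_backend="ddp_spawn",  # the old spawn-based behaviour
    overfit_batches=10,                 # overfit on 10 batches of the training set
    limit_val_batches=0.25,             # validate on 25% of the validation set
)
```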
Detail changes
Added
- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses the training set for all three) (#2213)
- Added metrics
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that a list of dataloaders can also be passed in (#1723)
- Allow dataloaders without a sampler field present (#1907)
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (#1908) (see sketch below)
- Early stopping checks `on_validation_end` (#1458)
- Added the attribute `best_model_path` to `ModelCheckpoint` for storing and later retrieving the path to the best saved model file (#1799) (see sketch below)
- Sped up single-core TPU training by loading data using `ParallelLoader` (#2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (#1756) (see sketch below)
- Added the Black formatter for the code, with a code check on pull requests (#1610)
- Added back the slow spawn ddp implementation as `ddp_spawn` (#2115)
- Added loading checkpoints from URLs (#1667) (see sketch below)
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (#2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (#1905) (see sketch below)
- Added `ckpt_path` option to `Trainer.test(...)` to load a particular checkpoint (#2190) (see sketch below)
- Added `setup` and `teardown` hooks for the model (#2229) (see sketch below)
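A sketch of the new `ModelCheckpoint` options mentioned above (`save_last` and `best_model_path`); the monitored metric name is an assumption:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# save_last=True writes an extra checkpoint at the end of every epoch,
# in addition to the usual best/top-k checkpoints.
checkpoint_callback = ModelCheckpoint(monitor="val_loss", save_last=True)

trainer = Trainer(checkpoint_callback=checkpoint_callback)
# after trainer.fit(model):
#   checkpoint_callback.best_model_path  -> path to the best saved checkpoint
```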
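A sketch of the `transfer_batch_to_device` hook; `CustomBatch` is a made-up container that Lightning cannot move on its own:

```python
import pytorch_lightning as pl


class CustomBatch:
    """A batch wrapper Lightning does not know how to move by itself."""

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets


class LitModel(pl.LightningModule):
    def transfer_batch_to_device(self, batch, device):
        # Called by Lightning before each step; move custom containers here.
        if isinstance(batch, CustomBatch):
            batch.inputs = batch.inputs.to(device)
            batch.targets = batch.targets.to(device)
            return batch
        # Fall back to Lightning's default handling for everything else.
        return super().transfer_batch_to_device(batch, device)
```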
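A sketch of the `auto_move_data` decorator for inference, assuming it is importable from `pytorch_lightning.core.decorators` and that a GPU is available:

```python
import torch
from torch import nn
import pytorch_lightning as pl
from pytorch_lightning.core.decorators import auto_move_data  # assumed import path


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    @auto_move_data  # inputs are moved to the module's device before forward runs
    def forward(self, x):
        return self.layer(x)


model = LitModel().cuda()        # requires a GPU
out = model(torch.rand(4, 32))   # CPU tensor is moved to the GPU automatically
```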
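A sketch of the new `setup`/`teardown` model hooks; the dataset assignments are placeholders:

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def setup(self, stage):
        # Called on every process at the start of fit/test, e.g. to build
        # datasets once the (distributed) environment is ready.
        if stage == "fit":
            self.train_dataset = ...  # placeholder: build training data here
        elif stage == "test":
            self.test_dataset = ...   # placeholder: build test data here

    def teardown(self, stage):
        # Called at the end of fit/test to release resources.
        pass
```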
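A sketch of checkpoint loading from a URL and the new `ckpt_path` option on `Trainer.test`; `LitModel` stands for any LightningModule subclass (see the sketches above), and the URL and checkpoint path are placeholders:

```python
import pytorch_lightning as pl
from pytorch_lightning import Trainer


class LitModel(pl.LightningModule):
    ...  # any LightningModule, e.g. one of the sketches above


# Checkpoints can now be referenced by URL as well as by local path.
model = LitModel.load_from_checkpoint(
    "https://example.com/checkpoints/epoch=3.ckpt"  # placeholder URL
)

# After trainer.fit(model), a specific checkpoint can be evaluated
# instead of the default/best one:
trainer = Trainer()
# trainer.fit(model)
# trainer.test(ckpt_path="lightning_logs/version_0/checkpoints/epoch=3.ckpt")
```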
Changed
- Allow user to select individual TPU core to train on (#1729)
- Removed non-finite values from the loss in `LRFinder` (#1862)
- Allow passing model hyperparameters as a complete kwarg list (#1896)
- Renamed `ModelCheckpoint`'s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (#1799)
- Re-enabled Logger `ImportError`s (#1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (#2029)
- Raise an error when Lightning replaces an existing sampler (#2020)
- Enabled `prepare_data` from the correct processes; clarified local vs. global rank (#2166)
- Removed explicit flush from the TensorBoard logger (#2126)
- Changed epoch indexing to start from 1 instead of 0 (#2206)
Deprecated
- Deprecated flags (#2213):
  - `overfit_pct` in favour of `overfit_batches`
  - `val_percent_check` in favour of `limit_val_batches`
  - `test_percent_check` in favour of `limit_test_batches`
- Deprecated `ModelCheckpoint`'s attributes `best` and `kth_best_model` (#1799)
- Dropped official support/testing for older PyTorch versions <1.3 (#1917)
Removed
- Removed unintended Trainer argument `progress_bar_callback`; the callback should be passed in via `Trainer(callbacks=[...])` instead (#1855)
- Removed obsolete `self._device` in Trainer (#1849)
- Removed deprecated API (#2073)
  - Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
  - Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
  - Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
  - Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`
Fixed
- Run graceful training teardown on interpreter exit (#1631)
- Fixed user warning when apex was used together with learning rate schedulers (#1873)
- Fixed multiple calls of the `EarlyStopping` callback (#1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (#1932)
- Fixed a bug where the logger was not reset correctly for the model after running tuner algorithms (#1933)
- Fixed root node resolution for SLURM clusters with a dash in the hostname (#1954)
- Fixed `LearningRateLogger` in the multi-scheduler setting (#1944)
- Fixed test configuration check and testing (#1804)
- Fixed an issue with the Trainer constructor silently ignoring unknown/misspelt arguments (#1820)
- Fixed `save_weights_only` in `ModelCheckpoint` (#1780)
- Allow use of the same `WandbLogger` instance for multiple training loops (#2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures where the instance is not named `self` (#2048)
- Fixed a mistake in parameters' grad norm tracking (#2012)
- Fixed CPU and hanging GPU crash (#2118)
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (#1773)
- Fixed TPU logging (#2230)
- Fixed PID port and duplicate `rank_zero` logging (#2140, #2231)
Contributors
@awaelchli, @baldassarreFe, @Borda, @borisdayma, @cuent, @devashishshankar, @ivannz, @j-dsouza, @justusschock, @kepler, @kumuji, @lezwon, @lgvaz, @LoicGrobol, @mateuszpieniak, @maximsch2, @moi90, @rohitgr7, @SkafteNicki, @tullie, @williamFalcon, @yukw777, @ZhaofengWu
If we forgot someone due to not matching commit email with GitHub account, let us know :]