Core
- Added `State` repr and input batch as `engine.state.batch` (#641)
- Adapted core metrics only for use in distributed configurations (#635)
- Added fbeta metric as core metric (#653)
- Added event filtering feature, e.g. `every`/`once`/custom event filter logic (#656); see the event-filtering sketch after this list
- BC breaking change: refactored `ModelCheckpoint` into `Checkpoint` + `DiskSaver` / `ModelCheckpoint` (#673)
- Added option `n_saved=None` to store all checkpoints (#703); see the checkpoint sketch after this list
- Improved accumulation metrics (#681)
- Added `min_delta` option to early stopping (#685)
- Dropped Python 2.7 support (#699)
- Added feature: `Metric` can accept a dictionary as output (#689); see the dictionary-output sketch after this list
- Added Dice Coefficient metric (#680)
- Added helper method to simplify the setup of class loggers (#712)
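
A minimal sketch of the new event-filtering syntax (the no-op process function and the handler names are illustrative, not from the release):

```python
from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: None)  # illustrative no-op step

# Run a handler every 100 iterations.
@trainer.on(Events.ITERATION_COMPLETED(every=100))
def log_every_100(engine):
    print(f"Iteration: {engine.state.iteration}")

# Run a handler a single time, at iteration 25.
@trainer.on(Events.ITERATION_COMPLETED(once=25))
def run_once(engine):
    print("Iteration 25 reached")

# Custom filter: fire only on the listed iterations.
@trainer.on(Events.ITERATION_COMPLETED(event_filter=lambda engine, event: event in (1, 2, 3)))
def custom_filtered(engine):
    print(f"Filtered event at iteration {engine.state.iteration}")
```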
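A sketch of the refactored checkpointing with `n_saved=None`; the dummy model, optimizer and output directory are placeholders:

```python
import torch
import torch.nn as nn
from ignite.engine import Engine, Events
from ignite.handlers import Checkpoint, DiskSaver

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
trainer = Engine(lambda engine, batch: None)  # illustrative no-op step

# Checkpoint decides *what* and *when* to save; DiskSaver decides *where*.
to_save = {"model": model, "optimizer": optimizer, "trainer": trainer}
handler = Checkpoint(
    to_save,
    DiskSaver("/tmp/checkpoints", create_dir=True),
    n_saved=None,  # keep all checkpoints instead of rotating the last few
)
trainer.add_event_handler(Events.EPOCH_COMPLETED, handler)
```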
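And a sketch of a metric consuming a dictionary output, assuming the built-in metrics read the `"y_pred"`/`"y"` keys from the output dict:

```python
import torch
import torch.nn as nn
from ignite.engine import Engine
from ignite.metrics import Accuracy

model = nn.Linear(4, 2)

def eval_step(engine, batch):
    x, y = batch
    # Return a dict; the attached metric picks out the keys it needs
    # ("y_pred" and "y" here -- an assumption about the key convention).
    return {"y_pred": model(x), "y": y}

evaluator = Engine(eval_step)
Accuracy().attach(evaluator, "accuracy")
```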
Engine refactoring (BC breaking change)
Finally solved issue #62: training can now be resumed from a given epoch or iteration.
- Engine refactoring + features (#640)
  - Engine checkpointing (see the save/resume sketch after this list)
  - Variable epoch length defined by `epoch_length`
  - Two additional events: `GET_BATCH_STARTED` and `GET_BATCH_COMPLETED`
  - CIFAR10 example with save/resume in a distributed configuration
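
A minimal sketch of the save/resume flow; the no-op step and dummy data are placeholders, and bumping `max_epochs` in the captured state dict is shown as one way to continue a completed run:

```python
from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: None)  # illustrative no-op step

# The two new events fire around each batch fetch from the data iterable.
@trainer.on(Events.GET_BATCH_COMPLETED)
def on_batch_fetched(engine):
    pass

data = range(100)

# epoch_length decouples the notion of "epoch" from len(data).
trainer.run(data, max_epochs=1, epoch_length=50)

# Engine state (iteration, epoch_length, max_epochs, ...) can be captured
# and restored, so a later run continues instead of restarting.
saved = trainer.state_dict()
saved["max_epochs"] = 2  # assumption: raise the target before reloading

trainer.load_state_dict(saved)
trainer.run(data)  # resumes from iteration 50, not from scratch
```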
Contrib
- Improved `create_lr_scheduler_with_warmup` (#646); see the warmup sketch after this list
- Added a helper method to plot param scheduler values with matplotlib (#650)
- BC breaking change: param schedulers now handle optimizers with multiple param groups (#690)
- Added `state_dict`/`load_state_dict` to param schedulers (#690)
- BC breaking change: let the user specify `tqdm` parameters for `log_message` (#695)
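
A sketch of wrapping a PyTorch scheduler with a warmup phase; the model, optimizer and warmup values are placeholders:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ExponentialLR
from ignite.contrib.handlers.param_scheduler import create_lr_scheduler_with_warmup
from ignite.engine import Engine, Events

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Linearly warm the LR from 0.0 to 0.1 over 100 iterations, then hand
# over to the wrapped ExponentialLR schedule.
scheduler = create_lr_scheduler_with_warmup(
    ExponentialLR(optimizer, gamma=0.98),
    warmup_start_value=0.0,
    warmup_end_value=0.1,
    warmup_duration=100,
)

trainer = Engine(lambda engine, batch: None)  # illustrative no-op step
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)
```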
Examples
- Added an example of hyperparameter tuning with Ax on CIFAR10 (#652)
- Added CIFAR10 distributed example
Reproducible training runs as "References"
Inspired by torchvision/references, we provide several reproducible baselines for vision tasks:
Features:
- Distributed training with mixed precision by nvidia/apex (see the sketch below)
- Experiment tracking with MLflow or Polyaxon
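
A minimal sketch of the standard apex mixed-precision pattern inside an ignite process function; this is not the reference code itself, and the model/data setup is illustrative (assumes apex and a CUDA device are available):

```python
import torch
import torch.nn as nn
from apex import amp
from ignite.engine import Engine

model = nn.Linear(4, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# opt_level "O2" enables almost-pure fp16 mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

def train_step(engine, batch):
    model.train()
    x, y = batch[0].cuda(), batch[1].cuda()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # Scale the loss so fp16 gradients do not underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
    return loss.item()

trainer = Engine(train_step)
```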
Acknowledgments
🎉 Thanks to our community and all our contributors for the issues, PRs and 🌟 ⭐️ 🌟 !
💯 We really appreciate your involvement in the project (in alphabetical order):