GitHub: facebookresearch/fairseq v0.6.0


Changelog:

  • 4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
    • FP16Trainer is gone; FP16 training now uses an FP16Optimizer wrapper instead (see the mixed-precision sketch after this changelog)
    • most of the distributed code has been moved into a new wrapper class, DistributedFairseqModel, which behaves like both DistributedDataParallel and a FairseqModel at the same time (see the delegation sketch after this changelog)
    • Trainer now requires an extra dummy_batch argument at initialization. When workers end up with an uneven number of batches, we run a forward/backward pass on this dummy batch and hide its gradients by multiplying the loss by 0 (see the train_step sketch after this changelog)
    • Trainer.train_step now takes a list of samples, which allows cleaner handling of --update-freq (also covered by the train_step sketch)
  • 1c56b58: Parallelize preprocessing
  • Misc bug fixes and features
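
The FP16Optimizer mentioned above follows the usual mixed-precision pattern: the model holds FP16 parameters, a wrapper keeps an FP32 master copy for the real optimizer, scales the loss to avoid gradient underflow, and unscales gradients before each update. Below is a minimal sketch of that general pattern; the class name, constructor signature, and fixed loss scale are illustrative assumptions, not fairseq's actual API.

```python
import torch

class FP16OptimizerSketch:
    """Illustrative mixed-precision optimizer wrapper (not fairseq's
    actual FP16Optimizer): FP32 master weights + static loss scaling."""

    def __init__(self, fp16_params, inner_optim_cls, loss_scale=128.0, **kw):
        self.fp16_params = list(fp16_params)
        # FP32 master copy that the inner optimizer actually updates
        self.fp32_params = [p.detach().clone().float().requires_grad_(True)
                            for p in self.fp16_params]
        self.inner = inner_optim_cls(self.fp32_params, **kw)
        self.loss_scale = loss_scale

    def backward(self, loss):
        # scale the loss so small FP16 gradients do not underflow
        (loss * self.loss_scale).backward()

    def step(self):
        # move FP16 grads onto the FP32 master params, unscaling them
        for p16, p32 in zip(self.fp16_params, self.fp32_params):
            p32.grad = p16.grad.detach().float() / self.loss_scale
        self.inner.step()
        # copy the updated FP32 weights back into the FP16 model
        for p16, p32 in zip(self.fp16_params, self.fp32_params):
            p16.data.copy_(p32.data)

    def zero_grad(self):
        for p in self.fp16_params:
            p.grad = None
        self.inner.zero_grad()
```

A training loop would then call `wrapper.backward(loss)`, `wrapper.step()`, and `wrapper.zero_grad()` in place of the usual `loss.backward()` / `optimizer.step()` / `optimizer.zero_grad()`.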
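"Behaves like DistributedDataParallel and a FairseqModel at the same time" is essentially an attribute-delegation pattern: the wrapper synchronizes gradients like DDP, and anything it does not itself define is forwarded to the wrapped model. Here is a minimal sketch of that pattern, assuming an already-initialized process group; the class name is hypothetical and this is not fairseq's exact implementation.

```python
from torch.nn.parallel import DistributedDataParallel

class DDPWithDelegation(DistributedDataParallel):
    """Illustrative wrapper: synchronizes gradients like DDP, but also
    exposes the wrapped model's own methods and attributes."""

    def __getattr__(self, name):
        try:
            # normal nn.Module lookup (parameters, buffers, submodules)
            return super().__getattr__(name)
        except AttributeError:
            # fall back to the wrapped model, e.g. model-specific methods
            wrapped = super().__getattr__("module")
            return getattr(wrapped, name)

# usage (inside a distributed worker with torch.distributed initialized):
#   model = DDPWithDelegation(my_fairseq_model, device_ids=[local_rank])
#   model(...)              # DDP forward with gradient synchronization
#   model.max_positions()   # delegated to my_fairseq_model
```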
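Finally, the dummy-batch and list-of-samples changes fit together in a single training-step loop: gradients are accumulated over the samples (the idea behind --update-freq), and a worker that has run out of real data runs the dummy batch with its loss multiplied by 0, so every worker performs the same number of backward passes and stays in lockstep for gradient synchronization. A hedged sketch with illustrative function and batch-key names, not the actual Trainer code:

```python
import torch

def train_step(model, criterion, optimizer, samples, dummy_batch=None):
    """Accumulate gradients over `samples`, then do one optimizer update.
    If this worker has no real data left, run the dummy batch and zero
    its loss so gradient sync still happens without affecting updates."""
    optimizer.zero_grad()
    is_dummy = len(samples) == 0
    if is_dummy:
        samples = [dummy_batch]
    total_loss = 0.0
    for batch in samples:
        output = model(batch["input"])                     # illustrative keys
        loss = criterion(output, batch["target"]) / len(samples)
        if is_dummy:
            loss = loss * 0.0   # keeps the backward pass, kills the gradients
        loss.backward()         # gradients accumulate across samples
        total_loss += loss.item()
    optimizer.step()            # single parameter update per call
    return total_loss
```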
