Possibly breaking changes:
- Set global numpy seed (4a7cd58)
- Split in_proj_weight into separate k, v, q projections in MultiheadAttention (fdf4c3e)
- TransformerEncoder returns namedtuples instead of dict (27568a7)
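A minimal sketch of what the in_proj_weight split means for an old saved checkpoint. It is not fairseq's own upgrade code. It assumes three things: the packed weight has shape (3 * embed_dim, embed_dim) with rows ordered q, k, v; the packed bias (if present) has length 3 * embed_dim; and the new parameters are keyed q_proj / k_proj / v_proj. The helper name split_in_proj is hypothetical, and NumPy arrays stand in for tensors.

```python
import numpy as np


def split_in_proj(state_dict, prefix):
    """Hypothetical helper: split packed in_proj_* entries into q/k/v entries.

    Assumes rows are ordered q, k, v and that the new keys are
    '<prefix>q_proj.weight', '<prefix>k_proj.bias', and so on.
    """
    new_sd = dict(state_dict)  # leave the caller's dict untouched
    for suffix in ("weight", "bias"):
        key = prefix + "in_proj_" + suffix
        if key not in new_sd:
            continue
        packed = new_sd.pop(key)  # drop the packed entry
        dim = packed.shape[0] // 3  # embed_dim
        for i, name in enumerate(("q_proj", "k_proj", "v_proj")):
            # Slice the i-th block of rows for q, k or v.
            new_sd[prefix + name + "." + suffix] = packed[i * dim:(i + 1) * dim]
    return new_sd


sd = {
    "attn.in_proj_weight": np.arange(48).reshape(12, 4),  # embed_dim = 4
    "attn.in_proj_bias": np.arange(12),
}
out = split_in_proj(sd, "attn.")
print(sorted(out))
```

Concatenating the three slices back together recovers the packed matrix. This is why the split changes parameter names without changing what the layer computes, under the row-ordering assumption above.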
New features:
- Add --fast-stat-sync option (e1ba32a)
- Add --empty-cache-freq option (315c463)
- Support criterions with parameters (ba5f829)
New papers:
- Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
- Levenshtein Transformer (86857a5, ...)
- Cross+Self-Attention for Transformer Models (4ac2c5f)
- Jointly Learning to Align and Translate with Transformer Models (1c66792)
- Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
- Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
- CamemBERT: a French BERT (b31849a)
Speed improvements: