It's been a long time since our last release (0.9.0), nearly a year ago! There have been numerous changes and new features since then, which we've tried to summarize below. While this release carries the same major version as our previous release (0.x.x), if you have code that relies on 0.9.0, you will likely need to adapt it before updating to 0.10.0.
Looking forward, this will also be the last significant release with the 0.x.x numbering. The next release will be 1.0.0 and will include a major migration to the Hydra configuration system, with an eye towards modularizing fairseq to be more usable as a library.
Changelog:
New papers:
- Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)
- MBART: Multilingual Denoising Pre-training for Neural Machine Translation ({Liu*,Gu*,Goyal*} et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2019)
- Training with Quantization Noise for Extreme Model Compression ({Fan*,Stock*} et al., 2019)
- Monotonic Multihead Attention (Ma et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Lexically constrained decoding with dynamic beam allocation
- Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)
- Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- Deep Transformers with Latent Depth (Li et al., 2020)
- Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)
Major new features:
- TorchScript support for Transformer and SequenceGenerator (PyTorch 1.6+ only)
- Model parallel training support (see Megatron-11b)
- TPU support via `--tpu` and `--bf16` options (7751229)
- Added VizSeq (a visual analysis toolkit for evaluating fairseq models)
- Migrated to Python logging (fb76dac)
- Added “SlowMo” distributed training backend (0dac0ff)
- Added Optimizer State Sharding (ZeRO) (5d7ed6a)
- Added several features to improve speech recognition support in fairseq: CTC criterion, external ASR decoder support (currently only wav2letter decoder) with KenLM and fairseq language model fusion
Minor features:
- Added `--patience` for early stopping
- Added `--shorten-method=[none|truncate|random_crop]` to language modeling (and other) tasks
- Added `--eval-bleu` for computing BLEU scores during training (60fbf64)
- Added support for training huggingface models (e.g. `hf_gpt2`) (2728f9b)
- Added FusedLAMB optimizer (`--optimizer=lamb`) (f75411a)
- Added LSTM-based language model (`lstm_lm`) (9f4256e)
- Added dummy tasks and models for benchmarking (91f0534; a541b19)
- Added tutorial and pretrained models for paraphrasing (630701e)
- Support quantization for Transformer (6379573)
- Support multi-GPU validation in fairseq-validate (2f7e3f3)
- Support batched inference in hub interface (3b53962)
- Support for language model fusion in standard beam search (5379461)
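
As an illustration of the `--patience` option listed above, here is a minimal sketch of patience-based early stopping. The notes do not say how patience is counted, so this assumes it means "stop after N consecutive validation runs without improvement"; the `PatienceTracker` class and its method names are hypothetical and are not fairseq's API.

```python
class PatienceTracker:
    """Hypothetical helper: signal a stop after `patience` consecutive
    validation runs that fail to improve on the best metric seen so far."""

    def __init__(self, patience, maximize=False):
        self.patience = patience      # assumed: <= 0 disables early stopping
        self.maximize = maximize      # True for metrics like BLEU, False for loss
        self.best = None
        self.num_bad_runs = 0

    def should_stop(self, valid_metric):
        if self.patience <= 0:
            return False
        if self.best is None:
            improved = True
        elif self.maximize:
            improved = valid_metric > self.best
        else:
            improved = valid_metric < self.best
        if improved:
            # New best value: remember it and reset the counter.
            self.best = valid_metric
            self.num_bad_runs = 0
            return False
        self.num_bad_runs += 1
        return self.num_bad_runs >= self.patience


tracker = PatienceTracker(patience=2)
decisions = [tracker.should_stop(loss) for loss in [3.0, 2.5, 2.6, 2.7]]
```

With a patience of 2, the tracker in this sketch only signals a stop on the second consecutive validation loss that fails to beat the best one.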
Breaking changes:
- Updated requirements to Python 3.6+ and PyTorch 1.5+
- `--max-sentences` renamed to `--batch-size`
- Main entry point scripts (eval_lm.py, generate.py, etc.) moved from root directory into `fairseq_cli`
- Changed format for generation output; `H-` now corresponds to tokenized system outputs and newly added `D-` lines correspond to detokenized outputs (f353913)
- We now log the stats from the log-interval (displayed as `train_inner`) instead of a rolling average over each epoch.
- SequenceGenerator/Scorer does not print alignment by default, re-enable with `--print-alignment`
- Print base 2 scores in generation scripts (660d69f)
- Incremental decoding interface changed to use `FairseqIncrementalState` (4e48c4a; 88185fc)
- Refactor namespaces in Criterions to support library usage (introduce `LegacyFairseqCriterion` for BC) (46b773a)
- Deprecate `FairseqCriterion::aggregate_logging_outputs` interface, use `FairseqCriterion::reduce_metrics` instead (8679339)
- Moved `fairseq.meters` to `fairseq.logging.meters` and added new metrics aggregation module (`fairseq.logging.metrics`) (1e324a5; f8b795f)
- Reset mid-epoch stats every log-interval steps (244835d)
- Ignore duplicate entries in dictionary files (dict.txt) and support manual overwrite with `#fairseq:overwrite` option (dd1298e; 937535d)
- Use 1-based indexing for epochs everywhere (aa79bb9)
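
Scripts that post-process generation output may need updating for the new `H-`/`D-` lines and base 2 scores. The sketch below shows one way such a script might separate the two line types. It assumes each line is laid out as `<prefix>-<id>`, a tab, the score, a tab, and the text, which these notes do not spell out, and it assumes previously stored scores were natural-log values. The function names are hypothetical.

```python
import math


def parse_generation_output(lines):
    """Collect tokenized (H-) and detokenized (D-) hypotheses by sample id.

    Assumed line layout: "<prefix>-<id>\t<score>\t<text>".
    Lines with other prefixes or a different shape are skipped.
    """
    hyps = {}
    for line in lines:
        prefix, sep, rest = line.rstrip("\n").partition("-")
        if not sep or prefix not in ("H", "D"):
            continue
        fields = rest.split("\t", 2)
        if len(fields) != 3:
            continue
        sample_id, score, text = fields
        hyps.setdefault(int(sample_id), {})[prefix] = (float(score), text)
    return hyps


def nats_to_bits(score):
    """Convert a natural-log score to base 2 by dividing by ln(2)."""
    return score / math.log(2)


example = [
    "S-0\thello world",
    "H-0\t-0.5\thello @@world",
    "D-0\t-0.5\thello world",
]
parsed = parse_generation_output(example)
```

Here `parsed[0]["D"]` holds the detokenized hypothesis and `parsed[0]["H"]` the tokenized one; `nats_to_bits` is only needed when comparing against scores logged by an older version.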
Minor interface changes:
- Added `FairseqTask::begin_epoch` hook (122fc1d)
- `FairseqTask::build_generator` interface changed (cd2555a)
- Change `RobertaModel` base class to `FairseqEncoder` (307df56)
- Expose `FairseqOptimizer.param_groups` property (8340b2d)
- Deprecate `--fast-stat-sync` and replace with `FairseqCriterion::logging_outputs_can_be_summed` interface (fe6c2ed)
- `--raw-text` and `--lazy-load` are fully deprecated; use `--dataset-impl` instead
- Mixture of expert tasks moved to `examples/` (8845dcf)
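
For launch scripts written against 0.9.0, the flag changes above can be applied mechanically. The following is a rough sketch of a hypothetical `migrate_args` helper that rewrites an argument list. The mapping of `--raw-text` to `--dataset-impl raw` and `--lazy-load` to `--dataset-impl lazy` is an assumption about the accepted values and should be checked against the `--dataset-impl` help text.

```python
# Flag renamed in this release (from the breaking changes list).
RENAMED_FLAGS = {"--max-sentences": "--batch-size"}

# Assumed replacement values for the deprecated boolean flags.
DATASET_IMPL_VALUES = {"--raw-text": "raw", "--lazy-load": "lazy"}


def migrate_args(argv):
    """Return a copy of argv with 0.9.0-era flags rewritten (hypothetical helper)."""
    migrated = []
    for arg in argv:
        # Handle both "--flag value" and "--flag=value" spellings.
        name, eq, value = arg.partition("=")
        if name in RENAMED_FLAGS:
            migrated.append(RENAMED_FLAGS[name] + eq + value)
        elif arg in DATASET_IMPL_VALUES:
            migrated.extend(["--dataset-impl", DATASET_IMPL_VALUES[arg]])
        else:
            migrated.append(arg)
    return migrated


old_args = ["data-bin", "--max-sentences", "32", "--raw-text"]
new_args = migrate_args(old_args)
```

All other arguments pass through unchanged, so the helper can be placed in front of an existing launcher without touching the rest of the command line.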