
AllenNLP v2.0.0 Release Notes

The 2.0 release of AllenNLP represents a major engineering effort that brings several exciting new features to the library, as well as a focus on performance.

If you're upgrading from AllenNLP 1.x, we encourage you to read our comprehensive upgrade guide.

Main new features

AllenNLP gets eyes 👀

One of the most exciting areas in ML research is multimodal learning, and AllenNLP is now taking its first steps in this direction with support for 2 tasks and 3 datasets in the vision + text domain. Check out our ViLBERT for VQA and Visual Entailment models, along with the VQAv2, Visual Entailment, and GQA dataset readers in allennlp-models.

Transformer toolkit

The transformer toolkit offers a collection of modules for experimenting with various transformer architectures, such as SelfAttention, TransformerEmbeddings, TransformerLayer, etc. It also simplifies taking apart the weights of an existing pretrained transformer and recombining them in different ways. For instance, one can pull out the first 8 layers of bert-base-uncased to separately encode two text inputs, combine the representations in some way, and then use the last 4 layers on the combined representation. (More examples can be found in allennlp.modules.transformer.)
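
To make the layer surgery concrete, here is a minimal sketch in the spirit of the toolkit's docstrings. The TwoStreamEncoder class is hypothetical, and the exact from_pretrained_module signature, in particular the mapping argument used to pull out a specific layer range, is an assumption that may differ between versions:

```python
import torch

from allennlp.modules.transformer import TransformerEmbeddings, TransformerStack


class TwoStreamEncoder(torch.nn.Module):
    """Encode two inputs with the first 8 layers of BERT, then run the
    combined sequence through a stack built from the last 4 layers."""

    def __init__(self) -> None:
        super().__init__()
        self.embeddings = TransformerEmbeddings.from_pretrained_module("bert-base-uncased")
        # First 8 pretrained layers, shared by both inputs.
        self.lower = TransformerStack.from_pretrained_module(
            "bert-base-uncased", num_hidden_layers=8
        )
        # A 4-layer stack initialized from pretrained layers 8-11; the
        # `mapping` argument (assumed from the toolkit docs) re-maps those
        # pretrained layers onto this stack's layers 0-3.
        self.upper = TransformerStack.from_pretrained_module(
            "bert-base-uncased",
            num_hidden_layers=4,
            mapping={f"layer.{i + 8}": f"layer.{i}" for i in range(4)},
        )
```

Inside forward, the two lower-layer encodings can then be combined (concatenated, summed, etc.) before being passed through self.upper, just as with any other torch modules.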

The toolkit also contains modules for bimodal architectures such as ViLBERT. These include BiModalEncoder, which encodes two modalities separately and performs bi-directional attention (BiModalAttention) using a connection layer (BiModalConnectionLayer). The VisionTextModel class is an example of a model that uses these bimodal layers.

Multi-task learning

2.0 adds support for multi-task learning throughout the AllenNLP system. In multi-task learning, the model consists of a backbone that is shared across all tasks and tends to be the larger part of the model, plus multiple task-specific heads that use the backbone's output to make predictions for their particular task. This way, the backbone sees many more training examples than are available for any single task and can therefore produce better representations, which benefits every task. The canonical example is BERT: the backbone is the transformer stack, and multiple model heads do classification, tagging, masked-token prediction, etc.

AllenNLP 2.0 gives you these abstractions. The MultiTaskDatasetReader can read datasets for multiple tasks at once, the MultiTaskDataLoader loads the instances from the reader and makes batches, and the trainer feeds these batches to a MultiTaskModel, which consists of a Backbone and multiple Heads. If you want to look at the details of how this works, we have an example config available at https://github.com/allenai/allennlp-models/blob/main/training_config/vision/vilbert_multitask.jsonnet.
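
To make those pieces concrete, here is a minimal sketch of a task head and model assembly. The SentimentHead class, its dimensions, and the backbone output name encoded_text are illustrative assumptions; the linked config shows the real wiring, and mismatched argument names can be reconciled with MultiTaskModel's arg_name_mapping:

```python
from typing import Dict, Optional

import torch

from allennlp.data import Vocabulary
from allennlp.models import MultiTaskModel
from allennlp.models.heads import Head
from allennlp.modules.backbones import PretrainedTransformerBackbone


class SentimentHead(Head):
    """A toy task-specific head that classifies from the backbone's output."""

    def __init__(self, vocab: Vocabulary, input_dim: int, num_labels: int) -> None:
        super().__init__(vocab)
        self.classifier = torch.nn.Linear(input_dim, num_labels)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(
        self, encoded_text: torch.Tensor, label: Optional[torch.Tensor] = None
    ) -> Dict[str, torch.Tensor]:
        # `encoded_text` is assumed to be one of the backbone's output keys.
        logits = self.classifier(encoded_text[:, 0, :])  # pool the first token
        output = {"logits": logits}
        if label is not None:
            output["loss"] = self.loss(logits, label)
        return output


vocab = Vocabulary()
model = MultiTaskModel(
    vocab=vocab,
    backbone=PretrainedTransformerBackbone(vocab, model_name="bert-base-uncased"),
    heads={"sentiment": SentimentHead(vocab, input_dim=768, num_labels=2)},
)
```

On the data side, a MultiTaskDatasetReader wraps one dataset reader per task, and a MultiTaskDataLoader batches the results, keyed by the same task names ("sentiment" here).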

Changes since v2.0.0rc1

Added 🎉

  • The TrainerCallback constructor accepts the serialization_dir provided by the Trainer. This can be useful for logger callbacks that need to store files in the run directory.
  • TrainerCallback.on_start() is fired at the start of training.
  • The TrainerCallback event methods now accept **kwargs. This should make it easier to maintain backwards-compatibility of callbacks in the future: e.g., we may decide to pass the exception/traceback object to on_end() in case of failure, and older callbacks can simply ignore the new argument instead of raising a TypeError (see the sketch after this list).
  • Added a TensorBoardCallback which wraps the TensorBoardWriter.
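
Taken together, these additions make a custom callback look roughly like the sketch below. The registration name and printed messages are illustrative, and the method signatures and import path are assumptions based on the descriptions above; leaning on **kwargs keeps the callback compatible if the trainer passes extra arguments later:

```python
from typing import Any, Dict, Optional

from allennlp.training import TrainerCallback


@TrainerCallback.register("loss_printer")  # registration name is illustrative
class LossPrinterCallback(TrainerCallback):
    def __init__(self, serialization_dir: str) -> None:
        # The trainer now provides serialization_dir, so callbacks can
        # store files in the run directory.
        super().__init__(serialization_dir)

    def on_start(self, trainer, **kwargs) -> None:
        # New in this release: fired once at the start of training.
        print(f"Training started; run directory: {self.serialization_dir}")

    def on_epoch(self, trainer, metrics: Dict[str, Any], epoch: int, **kwargs) -> None:
        print(f"Epoch {epoch} done; training_loss={metrics.get('training_loss')}")

    def on_end(
        self,
        trainer,
        metrics: Optional[Dict[str, Any]] = None,
        epoch: Optional[int] = None,
        **kwargs,
    ) -> None:
        # **kwargs means a future trainer version could pass extra
        # information (e.g. a traceback) without breaking this callback.
        print("Training finished.")
```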

Changed ⚠️

  • TrainerCallback.on_epoch() no longer fires with epoch=-1 at the start of training.
    Use TrainerCallback.on_start() for those cases instead.
  • TensorBoardBatchMemoryUsage is converted from BatchCallback into TrainerCallback.
  • TrackEpochCallback is converted from EpochCallback into TrainerCallback.
  • Trainer now accepts callbacks under the argument name callbacks instead of trainer_callbacks.
  • TensorboardWriter was renamed to TensorBoardWriter and removed as an argument to GradientDescentTrainer.
    To enable TensorBoard logging during training, use the TensorBoardCallback instead (see the example after this list).
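
Under the new scheme, enabling TensorBoard logging looks roughly like the following. The model, optimizer, and data loader are placeholders built elsewhere, the run directory is hypothetical, and the TensorBoardCallback constructor is assumed to take the serialization directory like any other TrainerCallback:

```python
from allennlp.training import GradientDescentTrainer, TensorBoardCallback

serialization_dir = "runs/my_experiment"  # hypothetical run directory

trainer = GradientDescentTrainer(
    model=model,               # an allennlp Model, built elsewhere
    optimizer=optimizer,       # built elsewhere
    data_loader=train_loader,  # built elsewhere
    serialization_dir=serialization_dir,
    num_epochs=5,
    # `callbacks` replaces the old `trainer_callbacks` argument name, and
    # TensorBoardCallback replaces the removed `tensorboard_writer` argument.
    callbacks=[TensorBoardCallback(serialization_dir)],
)
trainer.train()
```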

Removed 👋

  • Removed EpochCallback and BatchCallback in favour of TrainerCallback.
    The metaclass-wrapping implementation was removed as well.
  • Removed the tensorboard_writer parameter to GradientDescentTrainer. You should use the TensorBoardCallback now instead.

Fixed ✅

  • Trainer now always fires TrainerCallback.on_end(), so all resources can be cleaned up properly.
  • Fixed a misspelling: TensoboardBatchMemoryUsage was renamed to TensorBoardBatchMemoryUsage.
  • We now set a value for epoch so that the variable is bound when TrainerCallback.on_end() fires.
    Previously this could lead to an error when trying to recover a run after it had finished training.

Commits since v2.0.0rc1

1530082 Log to TensorBoard through a TrainerCallback in GradientDescentTrainer (#4913)
8b95316 ci quick fix
fa1dc7b Add link to upgrade guide to README (#4934)
7364da0 Fix parameter name in the documentation
00e3ff2 tick version for nightly release
67fa291 Merging vision into main (#4800)
65e50b3 Bump mypy from 0.790 to 0.800 (#4927)
a744535 fix mkdocs config (#4923)
ed322eb A helper for distributed reductions (#4920)
9ab2bf0 add CUDA 10.1 Docker image (#4921)
d82287e Update transformers requirement from <4.1,>=4.0 to >=4.0,<4.2 (#4872)
4183a49 Update mkdocs-material requirement from <6.2.0,>=5.5.0 to >=5.5.0,<6.3.0 (#4880)
54e85ee disable codecov annotations (#4902)
2623c4b Making TrackEpochCallback an EpochCallback (#4893)
1d21c75 issue warning instead of failing when lock can't be acquired on a resource that exists in a read-only file system (#4867)
ec197c3 Create pull_request_template.md (#4891)
9cf41b2 fix navbar link
9635af8 rename 'master' -> 'main' (#4887)
d0a07fb docs: fix simple typo, multplication -> multiplication (#4883)
d1f032d Moving modelcard and taskcard abstractions to main repo (#4881)
1fff7ca Update docker torch version (#4873)
d2aea97 Fix typo in str (#4874)
6a8d425 add CombinedLearningRateScheduler (#4871)
a3732d0 Fix cache volume (#4869)
832901e Turn superfluous warning to info when extending the vocab in the embedding matrix (#4854)
