AllenNLP v2.0.0 Release Notes
The 2.0 release of AllenNLP represents a major engineering effort that brings several exciting new features to the library, as well as a focus on performance.
If you're upgrading from AllenNLP 1.x, we encourage you to read our comprehensive upgrade guide.
Main new features
AllenNLP gets eyes 👀
One of the most exciting areas in ML research is multimodal learning, and AllenNLP is now taking its first steps in this direction with support for 2 tasks and 3 datasets in the vision + text domain. Check out our ViLBERT for VQA and Visual Entailment models, along with the VQAv2, Visual Entailment, and GQA dataset readers in `allennlp-models`.
Transformer toolkit
The transformer toolkit offers a collection of modules to experiment with various transformer architectures, such as `SelfAttention`, `TransformerEmbeddings`, `TransformerLayer`, etc. It also simplifies the way one can take apart the pretrained transformer weights for an existing module, and combine them in different ways. For instance, one can pull out the first 8 layers of `bert-base-uncased` to separately encode two text inputs, combine the representations in some way, and then use the last 4 layers on the combined representation. (More examples can be found in `allennlp.modules.transformer`.)
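To make the layer-surgery idea concrete, here is a rough sketch of what such a model could look like. This is not code from the library: the exact keyword arguments to `from_pretrained_module` (such as `num_hidden_layers` and `mapping`) and the forward-pass details are assumptions, so consult the `allennlp.modules.transformer` documentation for the real interface.

```python
# A sketch of the "first 8 layers / last 4 layers" idea described above.
# Assumptions (not verified against the 2.0.0 API): the keyword arguments of
# from_pretrained_module and the exact return types of the toolkit modules.
import torch
from allennlp.modules.transformer import TransformerEmbeddings, TransformerStack


class CombinedEncoder(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.embeddings = TransformerEmbeddings.from_pretrained_module("bert-base-uncased")
        # The first 8 layers of bert-base-uncased, used to encode each input separately.
        self.separate_layers = TransformerStack.from_pretrained_module(
            "bert-base-uncased", num_hidden_layers=8
        )
        # The last 4 layers, applied to the combined representation. The mapping
        # pulls pretrained layers 8-11 into positions 0-3 of this smaller stack.
        self.combined_layers = TransformerStack.from_pretrained_module(
            "bert-base-uncased",
            num_hidden_layers=4,
            mapping={f"layer.{l}": f"layers.{i}" for i, l in enumerate(range(8, 12))},
        )

    def forward(self, token_ids_1: torch.LongTensor, token_ids_2: torch.LongTensor):
        # Encode the two inputs separately with the shared lower layers.
        # (Attention masks are omitted for brevity.)
        encoded_1 = self.separate_layers(self.embeddings(token_ids_1))
        encoded_2 = self.separate_layers(self.embeddings(token_ids_2))
        # Combine the two representations in some way; concatenating along the
        # sequence dimension is just one option.
        combined = torch.cat([encoded_1, encoded_2], dim=1)
        return self.combined_layers(combined)
```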
The toolkit also contains modules for bimodal architectures such as ViLBERT. These include `BiModalEncoder`, which encodes two modalities separately and performs bi-directional attention (`BiModalAttention`) using a connection layer (`BiModalConnectionLayer`). The `VisionTextModel` class is an example of a model that uses these bimodal layers.
Multi-task learning
2.0 adds support for multi-task learning throughout the AllenNLP system. In multi-task learning, the model consists of a backbone that is common to all the tasks, and which tends to be the larger part of the model, plus multiple task-specific heads that use the output of the backbone to make predictions for their particular task. This way, the backbone sees many more training examples than you might have available for a single task, and can thus produce better representations, from which all of the tasks benefit. The canonical example of this is BERT, where the backbone is the transformer stack, and multiple model heads handle classification, tagging, masked-token prediction, etc. AllenNLP 2.0 helps you build such models by giving you these abstractions: the `MultiTaskDatasetReader` can read datasets for multiple tasks at once, the `MultiTaskDataLoader` loads the instances from the reader and makes batches, and the trainer feeds these batches to a `MultiTaskModel`, which consists of a `Backbone` and multiple `Head`s. If you want to look at the details of how this works, we have an example config available at https://github.com/allenai/allennlp-models/blob/main/training_config/vision/vilbert_multitask.jsonnet.
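To make the shape of a task-specific head concrete, here is a minimal sketch. Only the `Head` base class (and the usual `Vocabulary`) comes from AllenNLP; the registered name "sentiment", the `encoded_text` input key, and the dimensions are hypothetical, and the import path is what we would expect in 2.0.

```python
# A minimal, hypothetical task head. Head is the real AllenNLP 2.0 abstraction;
# the name "sentiment", the encoded_text key, and the dimensions are made up.
from typing import Dict, Optional

import torch
from allennlp.data import Vocabulary
from allennlp.models.heads import Head


@Head.register("sentiment")
class SentimentHead(Head):
    def __init__(self, vocab: Vocabulary, input_dim: int, num_labels: int) -> None:
        super().__init__(vocab)
        self._classifier = torch.nn.Linear(input_dim, num_labels)
        self._loss = torch.nn.CrossEntropyLoss()

    def forward(  # type: ignore
        self,
        encoded_text: torch.Tensor,  # produced by the shared Backbone
        label: Optional[torch.Tensor] = None,
    ) -> Dict[str, torch.Tensor]:
        # Classify from the first ([CLS]-style) vector of the encoded sequence.
        logits = self._classifier(encoded_text[:, 0, :])
        output = {"logits": logits}
        if label is not None:
            output["loss"] = self._loss(logits, label)
        return output
```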
Changes since v2.0.0rc1
Added 🎉
- The `TrainerCallback` constructor now accepts the `serialization_dir` provided by the `Trainer`. This can be useful for `Logger` callbacks that need to store files in the run directory (see the sketch after this list).
- `TrainerCallback.on_start()` is fired at the start of training.
- The `TrainerCallback` event methods now accept `**kwargs`. This makes it easier to maintain backwards compatibility of callbacks in the future. E.g. we may decide to pass the exception/traceback object to `on_end()` in case of failure, and older callbacks can simply ignore the argument instead of raising a `TypeError`.
- Added a `TensorBoardCallback` which wraps the `TensorBoardWriter`.
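As a concrete illustration of these additions, here is a minimal, hypothetical callback that uses the `serialization_dir` constructor argument, the new `on_start()` hook, and catch-all `**kwargs`. The registered name, the files it writes, and the assumption that the base class stores the directory as `self.serialization_dir` are ours; the import path may also differ between versions.

```python
# A hypothetical callback illustrating the additions above. Assumptions:
# the base class stores the constructor argument as self.serialization_dir,
# and TrainerCallback is importable from allennlp.training.trainer.
import os

from allennlp.training.trainer import TrainerCallback


@TrainerCallback.register("write-marker")  # hypothetical registered name
class WriteMarkerCallback(TrainerCallback):
    def on_start(self, trainer, **kwargs) -> None:
        # on_start() now fires once, at the very beginning of training.
        with open(os.path.join(self.serialization_dir, "training_started.txt"), "w") as f:
            f.write("training started\n")

    def on_end(self, trainer, **kwargs) -> None:
        # **kwargs lets this callback ignore any extra arguments (e.g. a future
        # exception/traceback object) instead of raising a TypeError.
        with open(os.path.join(self.serialization_dir, "training_finished.txt"), "w") as f:
            f.write("training finished\n")
```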
Changed ⚠️
- `TrainerCallback.on_epoch()` no longer fires with `epoch=-1` at the start of training. Instead, `TrainerCallback.on_start()` should be used for these cases.
- `TensorBoardBatchMemoryUsage` is converted from a `BatchCallback` into a `TrainerCallback`.
- `TrackEpochCallback` is converted from an `EpochCallback` into a `TrainerCallback`.
- `Trainer` can now accept callbacks simply with the name `callbacks` instead of `trainer_callbacks`.
- `TensorboardWriter` renamed to `TensorBoardWriter`, and removed as an argument to the `GradientDescentTrainer`. In order to enable TensorBoard logging during training, you should use the `TensorBoardCallback` instead (see the sketch after this list).
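For example, where a trainer config previously configured TensorBoard logging through the trainer itself, it now goes through the `callbacks` list. Below is a minimal sketch of the relevant config fragment, written as a Python dict; the registered name `"tensorboard"` for `TensorBoardCallback` is an assumption, and the other trainer settings are placeholders.

```python
# Sketch of a trainer config fragment with TensorBoard logging enabled through
# the new callback. The registered name "tensorboard" is an assumption; the
# optimizer and epoch settings are placeholders.
trainer_config = {
    "trainer": {
        "optimizer": {"type": "adam", "lr": 1e-5},
        "num_epochs": 3,
        # Replaces the removed tensorboard_writer argument:
        "callbacks": [{"type": "tensorboard"}],
    }
}
```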
Removed 👋
- Removed `EpochCallback` and `BatchCallback` in favour of `TrainerCallback`. The metaclass-wrapping implementation is removed as well.
- Removed the `tensorboard_writer` parameter to `GradientDescentTrainer`. You should use the `TensorBoardCallback` now instead.
Fixed ✅
- The `Trainer` now always fires `TrainerCallback.on_end()` so that all resources can be cleaned up properly.
- Fixed the misspelling: changed `TensoboardBatchMemoryUsage` to `TensorBoardBatchMemoryUsage`.
- We now set a value for `epoch` so that the variable is bound when `TrainerCallback.on_end()` fires. Previously this could have led to an error when trying to recover a run after it had finished training.
Commits since v2.0.0rc1
1530082 Log to TensorBoard through a TrainerCallback in GradientDescentTrainer (#4913)
8b95316 ci quick fix
fa1dc7b Add link to upgrade guide to README (#4934)
7364da0 Fix parameter name in the documentation
00e3ff2 tick version for nightly release
67fa291 Merging vision into main (#4800)
65e50b3 Bump mypy from 0.790 to 0.800 (#4927)
a744535 fix mkdocs config (#4923)
ed322eb A helper for distributed reductions (#4920)
9ab2bf0 add CUDA 10.1 Docker image (#4921)
d82287e Update transformers requirement from <4.1,>=4.0 to >=4.0,<4.2 (#4872)
4183a49 Update mkdocs-material requirement from <6.2.0,>=5.5.0 to >=5.5.0,<6.3.0 (#4880)
54e85ee disable codecov annotations (#4902)
2623c4b Making TrackEpochCallback an EpochCallback (#4893)
1d21c75 issue warning instead of failing when lock can't be acquired on a resource that exists in a read-only file system (#4867)
ec197c3 Create pull_request_template.md (#4891)
9cf41b2 fix navbar link
9635af8 rename 'master' -> 'main' (#4887)
d0a07fb docs: fix simple typo, multplication -> multiplication (#4883)
d1f032d Moving modelcard and taskcard abstractions to main repo (#4881)
1fff7ca Update docker torch version (#4873)
d2aea97 Fix typo in str (#4874)
6a8d425 add CombinedLearningRateScheduler (#4871)
a3732d0 Fix cache volume (#4869)
832901e Turn superfluous warning to info when extending the vocab in the embedding matrix (#4854)