What's new
Added 🎉
- Added support to evaluate multiple datasets and produce corresponding output files in the `evaluate` command.
- Added more documentation to the learning rate schedulers to include a sample config object for how to use it.
- Moved the PyTorch learning rate scheduler wrappers to their own file called `pytorch_lr_schedulers.py` so that they will have their own documentation page.
- Added a module `allennlp.nn.parallel` with a new base class, `DdpAccelerator`, which generalizes PyTorch's `DistributedDataParallel` wrapper to support other implementations. Two implementations of this class are provided. The default is `TorchDdpAccelerator` (registered as "torch"), which is just a thin wrapper around `DistributedDataParallel`. The other is `FairScaleFsdpAccelerator`, which wraps FairScale's `FullyShardedDataParallel`. You can specify the `DdpAccelerator` in the "distributed" section of a configuration file under the key "ddp_accelerator" (see the config sketch after this list).
- Added a module `allennlp.nn.checkpoint` with a new base class, `CheckpointWrapper`, for implementations of activation/gradient checkpointing. Two implementations are provided. The default implementation is `TorchCheckpointWrapper` (registered as "torch"), which exposes PyTorch's checkpoint functionality. The other is `FairScaleCheckpointWrapper`, which exposes the more flexible checkpointing functionality from FairScale.
- The `Model` base class now takes a `ddp_accelerator` parameter (an instance of `DdpAccelerator`) which will be available as `self.ddp_accelerator` during distributed training. This is useful when, for example, instantiating submodules in your model's `__init__()` method by wrapping them with `self.ddp_accelerator.wrap_module()`. See the `allennlp.modules.transformer.t5` module for an example, and the sketch after this list.
- We now log batch metrics to tensorboard and wandb.
- Added Tango components, to be explored in detail in a later post.
- Added `ScaledDotProductMatrixAttention`, and converted the transformer toolkit to use it.
- Added tests to ensure that all `Attention` and `MatrixAttention` implementations are interchangeable.
- Added a way for AllenNLP Tango to read and write datasets lazily.
- Added a way to remix datasets flexibly.
- Added a `from_pretrained_transformer_and_instances` constructor to `Vocabulary`.
- `TransformerTextField` now supports `__len__`.
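As a rough illustration of the new `ddp_accelerator` key, here is a minimal sketch of what the "distributed" section of a config might contain, written as a Python dict rather than a full Jsonnet file; the `cuda_devices` value is an assumption for the example, and only the default "torch" implementation (the registration name given above) is shown:

```python
# A minimal sketch, not a complete config: only the "distributed" section is shown.
# "torch" selects TorchDdpAccelerator, the thin wrapper around DistributedDataParallel.
distributed_section = {
    "cuda_devices": [0, 1],                # assumed device layout for this example
    "ddp_accelerator": {"type": "torch"},  # the default; FairScale's FSDP wrapper is the other built-in option
}
```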
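And a hedged sketch of using `self.ddp_accelerator` from a model's `__init__()`; the `MyModel` class, its `Linear` submodule, and the `None` check are placeholders for this example, not code from the library:

```python
import torch
from allennlp.data import Vocabulary
from allennlp.models import Model


class MyModel(Model):
    """Toy model sketching the self.ddp_accelerator.wrap_module() pattern."""

    def __init__(self, vocab: Vocabulary, **kwargs) -> None:
        super().__init__(vocab, **kwargs)
        encoder = torch.nn.Linear(16, 16)  # placeholder submodule
        if self.ddp_accelerator is not None:
            # Wrap the submodule so the configured DdpAccelerator (e.g. FairScale's
            # FSDP) can wrap or shard it during distributed training.
            encoder = self.ddp_accelerator.wrap_module(encoder)
        self.encoder = encoder
```

See `allennlp.modules.transformer.t5` for the real usage this sketch is modeled on.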
Fixed ✅
- Fixed a bug in `ConditionalRandomField`: `transitions` and `tag_sequence` tensors were not initialized on the desired device, causing high CPU usage (see #2884).
- Fixed a misspelling: the parameter `contructor_extras` in `Lazy()` is now correctly called `constructor_extras`.
- Fixed broken links in the `allennlp.nn.initializers` docs.
- Fixed a bug in `BeamSearch` where `last_backpointers` was not being passed to any `Constraint`s.
- `TransformerTextField` can now take tensors of shape `(1, n)`, like the tensors produced by a HuggingFace tokenizer.
- The `tqdm` lock is now set inside `MultiProcessDataLoading` when new workers are spawned, to avoid contention when writing output.
- `ConfigurationError` is now pickleable.
- Checkpointer cleaning was fixed to work on Windows paths.
- Multitask models now support `TextFieldTensor` in heads, not just in the backbone.
- Fixed the signature of `ScaledDotProductAttention` to match the other `Attention` classes.
- `allennlp` commands will now catch `SIGTERM` signals and handle them similarly to `SIGINT` (keyboard interrupt).
- The `MultiProcessDataLoader` will properly shut down its workers when a `SIGTERM` is received.
- Fixed the way names are applied to Tango `Step` instances.
- Fixed a bug in calculating loss in the distributed setting.
- Fixed a bug when extending a sparse sequence by 0 items.
Changed ⚠️
- The type of the `grad_norm` parameter of `GradientDescentTrainer` is now `Union[float, bool]`, with a default value of `False`. `False` means gradients are not rescaled and the gradient norm is never even calculated. `True` means the gradients are still not rescaled but the gradient norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled (see the config sketch after this list).
- `TensorCache` now supports more concurrent readers and writers.
- We no longer log parameter statistics to tensorboard or wandb by default.
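To make the three `grad_norm` cases concrete, here is a small sketch of how the value might be set in a trainer config, written as Python dict fragments; the surrounding trainer options are omitted, and "gradient_descent" is the registered name of `GradientDescentTrainer`:

```python
# Fragments of a trainer config illustrating the new Union[float, bool] grad_norm.
no_norm  = {"type": "gradient_descent", "grad_norm": False}  # default: norm never computed, no rescaling
log_norm = {"type": "gradient_descent", "grad_norm": True}   # norm computed and passed to callbacks, no rescaling
rescale  = {"type": "gradient_descent", "grad_norm": 5.0}    # gradients rescaled so their norm is at most 5.0
```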
Commits
48af9d3 Multiple datasets and output files support for the evaluate command (#5340)
60213cd Tiny tango tweaks (#5383)
2895021 improve signal handling and worker cleanup (#5378)
b41cb3e Fix distributed loss (#5381)
6355f07 Fix Checkpointer cleaner regex on Windows (#5361)
27da04c Dataset remix (#5372)
75af38e Create Vocabulary from both pretrained transformers and instances (#5368)
5dc80a6 Adds a dataset that can be read and written lazily (#5344)
01e8a35 Improved Documentation For Learning Rate Schedulers (#5365)
8370cfa skip loading t5-base in CI (#5371)
13de38d Log batch metrics (#5362)
1f5c6e5 Use our own base images to build allennlp Docker images (#5366)
bffdbfd Bugfix: initializing all tensors and parameters of the ConditionalRandomField model on the proper device (#5335)
d45a2da Make sure that all attention works the same (#5360)
c1edaef Update google-cloud-storage requirement (#5357)
524244b Update wandb requirement from <0.12.0,>=0.10.0 to >=0.10.0,<0.13.0 (#5356)
90bf33b small fixes for tango (#5350)
2e11a15 tick version for nightly releases
311f110 Tango (#5162)
1df2e51 Bump fairscale from 0.3.8 to 0.3.9 (#5337)
b72bbfc fix constraint bug in beam search, clean up tests (#5328)
ec3e294 Create CITATION.cff (#5336)
8714aa0 This is a desperate attempt to make TensorCache a little more stable (#5334)
fd429b2 Update transformers requirement from <4.9,>=4.1 to >=4.1,<4.10 (#5326)
1b5ef3a Update spacy requirement from <3.1,>=2.1.0 to >=2.1.0,<3.2 (#5305)
1f20513 TextFieldTensor in multitask models (#5331)
76f2487 set tqdm lock when new workers are spawned (#5330)
67add9d Fix ConfigurationError deserialization (#5319)
42d8529 allow TransformerTextField to take input directly from HF tokenizer (#5329)
64043ac Bump black from 21.6b0 to 21.7b0 (#5320)
3275055 Update mkdocs-material requirement from <7.2.0,>=5.5.0 to >=5.5.0,<7.3.0 (#5327)
5b1da90 Update links in initializers documentation (#5317)
ca656fc FairScale integration (#5242)