What's new
Added 🎉
- Added support to evaluate multiple datasets and produce corresponding output files in the `evaluate` command.
- Added more documentation to the learning rate schedulers to include a sample config object for how to use it.
- Moved the PyTorch learning rate scheduler wrappers to their own file called `pytorch_lr_schedulers.py` so that they will have their own documentation page.
- Added a module `allennlp.nn.parallel` with a new base class, `DdpAccelerator`, which generalizes PyTorch's `DistributedDataParallel` wrapper to support other implementations. Two implementations of this class are provided. The default is `TorchDdpAccelerator` (registered as "torch"), which is just a thin wrapper around `DistributedDataParallel`. The other is `FairScaleFsdpAccelerator`, which wraps FairScale's `FullyShardedDataParallel`. You can specify the `DdpAccelerator` in the "distributed" section of a configuration file under the key "ddp_accelerator" (see the config sketch after this list).
- Added a module `allennlp.nn.checkpoint` with a new base class, `CheckpointWrapper`, for implementations of activation/gradient checkpointing. Two implementations are provided. The default implementation is `TorchCheckpointWrapper` (registered as "torch"), which exposes PyTorch's checkpoint functionality. The other is `FairScaleCheckpointWrapper`, which exposes the more flexible checkpointing functionality from FairScale.
- The `Model` base class now takes a `ddp_accelerator` parameter (an instance of `DdpAccelerator`) which will be available as `self.ddp_accelerator` during distributed training. This is useful when, for example, instantiating submodules in your model's `__init__()` method by wrapping them with `self.ddp_accelerator.wrap_module()`. See the `allennlp.modules.transformer.t5` module for an example, and the sketch after this list.
- We now log batch metrics to tensorboard and wandb.
- Added Tango components, to be explored in detail in a later post.
- Added `ScaledDotProductMatrixAttention`, and converted the transformer toolkit to use it.
- Added tests to ensure that all `Attention` and `MatrixAttention` implementations are interchangeable.
- Added a way for AllenNLP Tango to read and write datasets lazily.
- Added a way to remix datasets flexibly.
- Added a `from_pretrained_transformer_and_instances` constructor to `Vocabulary`.
- `TransformerTextField` now supports `__len__`.
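As a rough illustration of the new `ddp_accelerator` key, here is a minimal sketch of what the "distributed" section of a config might contain, written as a Python dict rather than a full Jsonnet file; the `cuda_devices` value is an assumption for the example, and only the default "torch" implementation (the registration name given above) is shown:

```python
# A minimal sketch, not a complete config: only the "distributed" section is shown.
# "torch" selects TorchDdpAccelerator, the thin wrapper around DistributedDataParallel.
distributed_section = {
    "cuda_devices": [0, 1],                # assumed device layout for this example
    "ddp_accelerator": {"type": "torch"},  # the default; FairScale's FSDP wrapper is the other built-in option
}
```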
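And a hedged sketch of using `self.ddp_accelerator` from a model's `__init__()`; the `MyModel` class, its `Linear` submodule, and the `None` check are placeholders for this example, not code from the library:

```python
import torch
from allennlp.data import Vocabulary
from allennlp.models import Model


class MyModel(Model):
    """Toy model sketching the self.ddp_accelerator.wrap_module() pattern."""

    def __init__(self, vocab: Vocabulary, **kwargs) -> None:
        super().__init__(vocab, **kwargs)
        encoder = torch.nn.Linear(16, 16)  # placeholder submodule
        if self.ddp_accelerator is not None:
            # Wrap the submodule so the configured DdpAccelerator (e.g. FairScale's
            # FSDP) can wrap or shard it during distributed training.
            encoder = self.ddp_accelerator.wrap_module(encoder)
        self.encoder = encoder
```

See `allennlp.modules.transformer.t5` for the real usage this sketch is modeled on.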
Fixed ✅
- Fixed a bug in `ConditionalRandomField`: `transitions` and `tag_sequence` tensors were not initialized on the desired device, causing high CPU usage (see #2884).
- Fixed a misspelling: the parameter `contructor_extras` in `Lazy()` is now correctly called `constructor_extras`.
- Fixed broken links in the `allennlp.nn.initializers` docs.
- Fixed a bug in `BeamSearch` where `last_backpointers` was not being passed to any `Constraint`s.
- `TransformerTextField` can now take tensors of shape `(1, n)`, like the tensors produced by a HuggingFace tokenizer.
- The `tqdm` lock is now set inside `MultiProcessDataLoading` when new workers are spawned, to avoid contention when writing output.
- `ConfigurationError` is now pickleable.
- Checkpointer cleaning was fixed to work on Windows paths.
- Multitask models now support `TextFieldTensor` in heads, not just in the backbone.
- Fixed the signature of `ScaledDotProductAttention` to match the other `Attention` classes.
- `allennlp` commands will now catch `SIGTERM` signals and handle them similarly to `SIGINT` (keyboard interrupt).
- The `MultiProcessDataLoader` will properly shut down its workers when a `SIGTERM` is received.
- Fixed the way names are applied to Tango `Step` instances.
- Fixed a bug in calculating loss in the distributed setting.
- Fixed a bug when extending a sparse sequence by 0 items.
Changed ⚠️
- The type of the `grad_norm` parameter of `GradientDescentTrainer` is now `Union[float, bool]`, with a default value of `False`. `False` means gradients are not rescaled and the gradient norm is never even calculated. `True` means the gradients are still not rescaled but the gradient norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled (see the config sketch after this list).
- `TensorCache` now supports more concurrent readers and writers.
- We no longer log parameter statistics to tensorboard or wandb by default.
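To make the three `grad_norm` cases concrete, here is a small sketch of how the value might be set in a trainer config, written as Python dict fragments; the surrounding trainer options are omitted, and "gradient_descent" is the registered name of `GradientDescentTrainer`:

```python
# Fragments of a trainer config illustrating the new Union[float, bool] grad_norm.
no_norm  = {"type": "gradient_descent", "grad_norm": False}  # default: norm never computed, no rescaling
log_norm = {"type": "gradient_descent", "grad_norm": True}   # norm computed and passed to callbacks, no rescaling
rescale  = {"type": "gradient_descent", "grad_norm": 5.0}    # gradients rescaled so their norm is at most 5.0
```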
Commits
48af9d3 Multiple datasets and output files support for the evaluate command (#5340)
60213cd Tiny tango tweaks (#5383)
2895021 improve signal handling and worker cleanup (#5378)
b41cb3e Fix distributed loss (#5381)
6355f07 Fix Checkpointer cleaner regex on Windows (#5361)
27da04c Dataset remix (#5372)
75af38e Create Vocabulary from both pretrained transformers and instances (#5368)
5dc80a6 Adds a dataset that can be read and written lazily (#5344)
01e8a35 Improved Documentation For Learning Rate Schedulers (#5365)
8370cfa skip loading t5-base in CI (#5371)
13de38d Log batch metrics (#5362)
1f5c6e5 Use our own base images to build allennlp Docker images (#5366)
bffdbfd Bugfix: initializing all tensors and parameters of the ConditionalRandomField model on the proper device (#5335)
d45a2da Make sure that all attention works the same (#5360)
c1edaef Update google-cloud-storage requirement (#5357)
524244b Update wandb requirement from <0.12.0,>=0.10.0 to >=0.10.0,<0.13.0 (#5356)
90bf33b small fixes for tango (#5350)
2e11a15 tick version for nightly releases
311f110 Tango (#5162)
1df2e51 Bump fairscale from 0.3.8 to 0.3.9 (#5337)
b72bbfc fix constraint bug in beam search, clean up tests (#5328)
ec3e294 Create CITATION.cff (#5336)
8714aa0 This is a desperate attempt to make TensorCache a little more stable (#5334)
fd429b2 Update transformers requirement from <4.9,>=4.1 to >=4.1,<4.10 (#5326)
1b5ef3a Update spacy requirement from <3.1,>=2.1.0 to >=2.1.0,<3.2 (#5305)
1f20513 TextFieldTensor in multitask models (#5331)
76f2487 set tqdm lock when new workers are spawned (#5330)
67add9d Fix ConfigurationError deserialization (#5319)
42d8529 allow TransformerTextField to take input directly from HF tokenizer (#5329)
64043ac Bump black from 21.6b0 to 21.7b0 (#5320)
3275055 Update mkdocs-material requirement from <7.2.0,>=5.5.0 to >=5.5.0,<7.3.0 (#5327)
5b1da90 Update links in initializers documentation (#5317)
ca656fc FairScale integration (#5242)