Highlights
Version 1.1 was mainly focused on bug fixes, but there are a few important new features such as gradient checkpointing with pretrained transformer embedders and official support for automatic mixed precision (AMP) training through the new `torch.amp` module.
Details
Added
- `Predictor.capture_model_internals()` now accepts a regex specifying which modules to capture.
- Added the option to specify `requires_grad: false` within an optimizer's parameter groups.
- Added the `file-friendly-logging` flag back to the `train` command. Also added this flag to the `predict`, `evaluate`, and `find-learning-rate` commands.
- Added an `EpochCallback` to track current epoch as a model class member.
- Added the option to enable or disable gradient checkpointing for transformer token embedders via boolean parameter `gradient_checkpointing`.
- Added a method to `ModelTestCase` for running basic model tests when you aren't using config files.
- Added some convenience methods for reading files.
- `cached_path()` can now automatically extract and read files inside of archives.
- Added the ability to pass an archive file instead of a local directory to `Vocab.from_files`.
- Added the ability to pass an archive file instead of a glob to `ShardedDatasetReader`.
- Added a new `"linear_with_warmup"` learning rate scheduler.
- Added a check in `ShardedDatasetReader` that ensures the base reader doesn't implement manual distributed sharding itself.
- Added an option to `PretrainedTransformerEmbedder` and `PretrainedTransformerMismatchedEmbedder` to use a scalar mix of all hidden layers from the transformer model instead of just the last layer. To utilize this, just set `last_layer_only` to `False`.
- Training metrics now include `batch_loss` and `batch_reg_loss` in addition to aggregate loss across number of batches.
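The `last_layer_only` option swaps "take the last hidden layer" for a learned weighted average of all layers. The block below is a minimal pure-Python sketch of that idea, assuming the usual scalar-mix formulation (softmax-normalized per-layer weights multiplied by a global scale). The names `scalar_mix`, `weights`, and `gamma` are illustrative only and are not AllenNLP's API; the library implementation works on tensors and learns the weights during training.

```python
import math
from typing import List, Sequence


def scalar_mix(
    layers: Sequence[Sequence[float]],
    weights: Sequence[float],
    gamma: float = 1.0,
) -> List[float]:
    """Combine per-layer vectors into one vector (illustrative helper, not library code)."""
    if not layers or len(layers) != len(weights):
        raise ValueError("need at least one layer and exactly one weight per layer")
    # Softmax over the raw weights, shifted by the max for numerical stability.
    max_weight = max(weights)
    exps = [math.exp(w - max_weight) for w in weights]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Weighted sum of the layers, element by element, scaled by gamma.
    dim = len(layers[0])
    return [
        gamma * sum(p * layer[i] for p, layer in zip(probs, layers))
        for i in range(dim)
    ]


# Two toy "hidden layers" for a single token, each of dimension 2.
hidden_layers = [[1.0, 2.0], [3.0, 4.0]]
mixed = scalar_mix(hidden_layers, weights=[0.0, 0.0])
last_only = hidden_layers[-1]
```

With equal weights the mix is the element-wise mean of the layers; as one weight grows relative to the others, the result approaches that single layer, which is how a trained mix can still fall back to last-layer behavior.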
Changed
- Upgraded PyTorch requirement to 1.6.
- Beam search now supports multi-layer decoders.
- Replaced the NVIDIA Apex AMP module with torch's native AMP module. The default trainer (`GradientDescentTrainer`) now takes a `use_amp: bool` parameter instead of the old `opt_level: str` parameter.
- Not specifying a `cuda_device` now automatically determines whether to use a GPU or not.
- Discovered plugins are logged so you can see what was loaded.
- `allennlp.data.DataLoader` is now an abstract registrable class. The default implementation remains the same, but was renamed to `allennlp.data.PyTorchDataLoader`.
- `BertPooler` can now unwrap and re-wrap extra dimensions if necessary.
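Existing trainer configs that still carry `opt_level` need to move to `use_amp`. The following is a rough sketch of how that migration could look when the trainer section is held as a plain Python dict. The helper `migrate_trainer_config` is hypothetical (it is not part of AllenNLP), and the mapping assumes that Apex's `"O0"` level meant pure FP32 while every other level requested some form of mixed precision.

```python
from typing import Any, Dict


def migrate_trainer_config(trainer: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a trainer config with `opt_level` replaced by `use_amp`.

    Hypothetical helper for illustration only.
    """
    migrated = dict(trainer)  # shallow copy, the caller's dict is left untouched
    opt_level = migrated.pop("opt_level", None)
    if opt_level is not None:
        # Assumption: "O0" was plain FP32 under Apex, so it maps to use_amp=False.
        migrated["use_amp"] = opt_level != "O0"
    return migrated


old_trainer = {"optimizer": {"type": "adam"}, "num_epochs": 3, "opt_level": "O1"}
new_trainer = migrate_trainer_config(old_trainer)
```

Configs that never set `opt_level` pass through unchanged, so the helper is safe to run over a mixed set of old and new configs.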
Removed
- Removed the `opt_level` parameter to `Model.load` and `load_archive`. In order to use AMP with a loaded model now, just run the model's forward pass within torch's `autocast` context.
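As a minimal sketch of running a forward pass under autocast: in PyTorch 1.6 the context manager is exposed as `torch.cuda.amp.autocast`. The `torch.nn.Linear` model below is only a stand-in for a model restored from an archive, and the input tensor is made up for the example.

```python
import torch

# Stand-in for a loaded model; any torch.nn.Module is wrapped the same way.
model = torch.nn.Linear(4, 2)
model.eval()

inputs = torch.randn(3, 4)
use_cuda = torch.cuda.is_available()
if use_cuda:
    model = model.cuda()
    inputs = inputs.cuda()

# Autocast only changes CUDA ops; with enabled=False the block runs in full precision.
with torch.no_grad(), torch.cuda.amp.autocast(enabled=use_cuda):
    outputs = model(inputs)
```

Only the forward pass needs to sit inside the context. For training, the trainer's `use_amp` parameter covers this, so wrapping by hand is mainly relevant for inference with a loaded model.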
Fixed
- Fixed handling of some edge cases when constructing classes with `FromParams` where the class accepts `**kwargs`.
- Fixed division by zero error when there are zero-length spans in the input to a `PretrainedTransformerMismatchedIndexer`.
- Improved robustness of `cached_path` when extracting archives so that the cache won't be corrupted if a failure occurs during extraction.
- Fixed a bug with the `average` and `evalb_bracketing_score` metrics in distributed training.
- Fixed a bug in distributed metrics that caused nan values due to repeated addition of an accumulated variable.
- Fixed how truncation was handled with `PretrainedTransformerTokenizer`. Previously, if `max_length` was set to `None`, the tokenizer would still do truncation if the transformer model had a default max length in its config. Also, when `max_length` was set to a non-`None` value, several warnings would appear for certain transformer models around the use of the `truncation` parameter.
- Fixed evaluation of all metrics when using distributed training.
- Added a `py.typed` marker. Fixed type annotations in `allennlp.training.util`.
- Fixed problem with automatically detecting whether tokenization is necessary. This affected primarily the Roberta SST model.
- Improved help text for using the --overrides command line flag.
- Removed unnecessary warning about deadlocks in `DataLoader`.
- Fixed testing models that only return a loss when they are in training mode.
- Fixed a bug in `FromParams` that caused silent failure in case of the parameter type being `Optional[Union[...]]`.
- Fixed a bug where the program crashes if `evaluation_data_loader` is an `AllennlpLazyDataset`.
- Reduced the amount of log messages produced by `allennlp.common.file_utils`.
- Fixed a bug where `PretrainedTransformerEmbedder` parameters appeared to be trainable in the log output even when `train_parameters` was set to `False`.
- Fixed a bug with the sharded dataset reader where it would only read a fraction of the instances in distributed training.
- Fixed checking equality of `ArrayField`s.
- Fixed a bug where `NamespaceSwappingField` did not work correctly with `.empty_field()`.
- Put more sensible defaults on the `huggingface_adamw` optimizer.
- Simplified logging so that all logging output always goes to one file.
- Fixed interaction with the python command line debugger.
- Log the grad norm properly even when we're not clipping it.
- Fixed a bug where `PretrainedModelInitializer` fails to initialize a model with a 0-dim tensor.
- Fixed a bug with the layer unfreezing schedule of the `SlantedTriangular` learning rate scheduler.
- Fixed a regression with logging in the distributed setting. Only the main worker should write log output to the terminal.
- Pinned the version of boto3 for package managers (e.g. poetry).
- Fixed issue #4330 by updating the `tokenizers` dependency.
- Fixed a bug in `TextClassificationPredictor` so that it passes tokenized inputs to the `DatasetReader` in case it does not have a tokenizer.
- `reg_loss` is now only returned for models that have some regularization penalty configured.
- Fixed a bug that prevented `cached_path` from downloading assets from GitHub releases.
- Fixed a bug that erroneously increased last label's false positive count in calculating fbeta metrics.
- `Tqdm` output now looks much better when the output is being piped or redirected.
- Small improvements to how the API documentation is rendered.
- Only show validation progress bar from main process in distributed training.
Commits
dcc9cdc Prepare for release v1.1.0
aa750be fix Average metric (#4624)
e1aa57c improve robustness of cached_path when extracting archives (#4622)
711afaa Fix division by zero when there are zero-length spans in MismatchedEmbedder. (#4615)
be97943 Improve handling of **kwargs in FromParams (#4616)
187b24e add more tutorial links to README (#4613)
e840a58 s/logging/logger/ (#4609)
dbc3c3f Added batched versions of scatter and fill to util.py (#4598)
2c54cf8 reformat for new version of black (#4605)
2dd335e batched_span_select now guarantees element order in each span (#4511)
62f554f specify module names by a regex in predictor.capture_model_internals() (#4585)
f464aa3 Bump markdown-include from 0.5.1 to 0.6.0 (#4586)
d01cdff Update RELEASE_PROCESS.md to include allennlp-models (#4587)
3aedac9 Prepare for release v1.1.0rc4
87a61ad Bug fix in distributed metrics (#4570)
71a9a90 upgrade actions to cache@v2 (#4573)
bd9ee6a Give better usage info for overrides parameter (#4575)
0a456a7 Fix boolean and categorical accuracy for distributed (#4568)
8511274 add actions workflow for closing stale issues (#4561)
de41306 Static type checking fixes (#4545)
5a07009 Fix RoBERTa SST (#4548)
351941f Only pin mkdocs-material to minor version, ignore specific patch version (#4556)
0ac13a4 fix CHANGELOG
3b86f58 Prepare for release v1.1.0rc3
44d2847 Metrics in distributed setting (#4525)
1d61965 Bump mkdocs-material from 5.5.3 to 5.5.5 (#4547)
5b97780 tick version for nightly releases
b32608e add gradient checkpointing for transformer token embedders (#4544)
f639336 Fix logger being created twice (#4538)
660fdaf Fix handling of max length with transformer tokenizers (#4534)
15e288f EpochCallBack for tracking epoch (#4540)
9209bc9 Bump mkdocs-material from 5.5.0 to 5.5.3 (#4533)
bfecdc3 Ensure len(self.evaluation_data_loader) is not called (#4531)
5bc3b73 Fix typo in warning in file_utils (#4527)
e80d768 pin torch >= 1.6
73220d7 Prepare for release v1.1.0rc2
9415350 Update torch requirement from <1.6.0,>=1.5.0 to >=1.5.0,<1.7.0 (#4519)
146bd9e Remove link to self-attention modules. (#4512)
2401282 add back file-friendly-logging flag (#4509)
54e5c83 closes #4494 (#4508)
fa39d49 ensure call methods are rendered in docs (#4522)
e53d185 Bug fix for case when param type is Optional[Union...] (#4510)
14f63b7 Make sure we have a bool tensor where we expect one (#4505)
18a4eb3 add a requires_grad option to param groups (#4502)
6c848df Bump mkdocs-material from 5.4.0 to 5.5.0 (#4507)
d73f8a9 More BART changes (#4500)
1cab3bf Update beam_search.py (#4462)
478bf46 remove deadlock warning in DataLoader (#4487)
714334a Fix reported loss: Bug fix in batch_loss (#4485)
db20b1f use longer tqdm intervals when output being redirected (#4488)
53eeec1 tick version for nightly releases
d693cf1 PathLike (#4479)
2f87832 only show validation progress bar from main process (#4476)
9144918 Fix reported loss (#4477)
5c97083 fix release link in CHANGELOG and formatting in README
4eb9795 Prepare for release v1.1.0rc1
f195440 update 'Models' links in README (#4475)
9c801a3 add CHANGELOG to API docs, point to license on GitHub, improve API doc formatting (#4472)
69d2f03 Clean up Tqdm bars when output is being piped or redirected (#4470)
7b188c9 fixed bug that erronously increased last label's false positive count (#4473)
64db027 Skip ETag check if OSError (#4469)
b9d011e More BART changes (#4468)
7a563a8 add option to use scalar mix of all transformer layers (#4460)
d00ad66 Minor tqdm and logging clean up (#4448)
6acf205 Fix regloss logging (#4449)
8c32ddf Fixing bug in TextClassificationPredictor so that it passes tokenized inputs to the DatasetReader (#4456)
b9a9164 Update transformers requirement from <2.12,>=2.10 to >=2.10,<3.1 (#4446)
181ef5d pin boto3 to resolve some dependency issues (#4453)
c75a1eb ensure base reader of ShardedDatasetReader doesn't implement sharding itself (#4454)
8a05ad4 Update CONTRIBUTING.md (#4447)
5b988d6 ensure only rank 0 worker writes to terminal (#4445)
8482f02 fix bug with SlantedTriangular LR scheduler (#4443)
e46a578 Update transformers requirement from <2.11,>=2.10 to >=2.10,<2.12 (#4411)
8229aca Fix pretrained model initialization (#4439)
60deece Fix type hint in text_field.py (#4434)
23e549e More multiple-choice changes (#4415)
6d0a4fd generalize DataLoader (#4416)
acd9995 Automatic file-friendly logging (#4383)
637dbb1 fix README, pin mkdocs, update mkdocs-material (#4412)
9c4dfa5 small fix to pretrained transformer tokenizer (#4417)
84988b8 Log plugins discovered and filter out transformers "PyTorch version ... available" log message (#4414)
54c41fc Adds the ability to automatically detect whether we have a GPU (#4400)
96ff585 Changes from my multiple-choice work (#4368)
eee15ca Assign an empty mapping array to empty fields of NamespaceSwappingField (#4403)
aa2943e Bump mkdocs-material from 5.3.2 to 5.3.3 (#4398)
7fa7531 fix eq method of ArrayField (#4401)
e104e44 Add test to ensure data loader yields all instances when batches_per_epoch is set (#4394)
b6fd697 fix sharded dataset reader (#4396)
30e5dbf Bump mypy from 0.781 to 0.782 (#4395)
b0ba2d4 update version
1d07cc7 Bump mkdocs-material from 5.3.0 to 5.3.2 (#4389)
ffc5184 ensure Vocab.from_files and ShardedDatasetReader can handle archives (#4371)
20afe6c Add Optuna integrated badge to README.md (#4361)
ba79f14 Bump mypy from 0.780 to 0.781 (#4390)
85e531c Update README.md (#4385)
c2ecb7a Add a method to ModelTestCase for use without config files (#4381)
6852def pin some doc building requirements (#4386)
bf422d5 Add github template for using your own python run script (#4380)
ebde6e8 Bump overrides from 3.0.0 to 3.1.0 (#4375)
e52b751 ensure transformer params are frozen at initialization when train_parameters is false (#4377)
3e8a9ef Add link to new template repo for config file development (#4372)
4f70bc9 tick version for nightly releases
63a5e15 Update spacy requirement from <2.3,>=2.1.0 to >=2.1.0,<2.4 (#4370)
ef7c75b reduce amount of log messages produced by file_utils (#4366)