allenai/allennlp v0.6.0

AllenNLP v0.6.0 has been upgraded to use PyTorch 0.4.1. Accordingly, it should now run on Python 3.7.

It contains a handful of breaking changes, most of which probably won't affect you.

Breaking changes:

1. HOCON -> Jsonnet for configuration files

Although our experiment configurations look like JSON, they were technically HOCON (a superset of JSON). In this release we changed the format to Jsonnet, which is a different superset of JSON.

If your configuration files are just "JSON with comments", this change should not affect you: they are already valid Jsonnet and will work fine as is. We believe this describes 99+% of people using allennlp.

If you are using advanced features of HOCON, these changes will be breaking for you. The two most common issues will probably be:

unquoted strings

JSON requires strings to be quoted; HOCON doesn't; Jsonnet does. So on the off chance that you have not been putting your strings in quotes, you'll need to start doing so.
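
For example, a value that HOCON would accept unquoted (the optimizer snippet here is just for illustration), like

    "optimizer": {"type": adam}

now has to be written with the string quoted:

    "optimizer": {"type": "adam"}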

environment variables

HOCON allows you to substitute in environment variables, like

    "root_directory": ${HOME}

Jsonnet only allows substitution of explicitly provided external variables, using a syntax like

    "root_directory": std.extVar("HOME")

These are in fact external variables fed to the Jsonnet parser (not environment variables); however, the allennlp code reads all of your environment variables and feeds them to the parser, so the example above still works.
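
To make that concrete, here is a minimal sketch of the idea using the jsonnet Python bindings; the actual loading logic lives inside allennlp's Params handling and may differ in detail:

    import json
    import os

    import _jsonnet  # the jsonnet Python bindings used to evaluate config files

    def load_config(config_path: str) -> dict:
        # Hand every environment variable to Jsonnet as an external variable,
        # which is what makes std.extVar("HOME") resolve at parse time.
        ext_vars = {key: str(value) for key, value in os.environ.items()}
        return json.loads(_jsonnet.evaluate_file(config_path, ext_vars=ext_vars))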

the elimination of ConfigTree

(you probably don't care about this)

Previously the AllenNLP Params object was a wrapper around a pyhocon ConfigTree, which is basically a fancy dict. After this change, Params.params is just a plain dict, so if you have code that relies on it being a ConfigTree, that code will break. This is very unlikely to affect you.

why did we make this change?

There is a bug in the Python HOCON parser that incorrectly handles backslashes in strings, which caused initializer regexes to be serialized and deserialized incorrectly. Once we determined that the bug was not going to be easy to fix, we chose this as the next best solution.

(In addition, Jsonnet has some nice features involving templates that you might find useful in your experiments.)
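
A tiny illustration of the kind of templating Jsonnet makes possible (the keys and values here are made up for the example):

    local embedding_dim = 128;
    {
      "model": {
        "embedding_dim": embedding_dim,
        "hidden_dim": embedding_dim * 2
      }
    }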

2. Change to the Predictor API

The API for the _json_to_instance method of Predictor used to be (json: JsonDict) -> Tuple[Instance, JsonDict], where the returned JsonDict contained information from the input that you wanted echoed in the predictor's output. This is no longer allowed: _json_to_instance now returns only an Instance, and any additional information must be routed through your model using MetadataFields. This change makes Predictors agnostic about where the Instances they process come from, which allows us to generate predictions from an original dataset by using a DatasetReader to create instances.

This means you can now run

    allennlp predict /path/to/original/dataset --use-dataset-reader

rather than having to format your data as .jsonl files.
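
If you were relying on the old second return value, here is a minimal sketch of the new pattern; the predictor name, field names, and the assumption that your reader's text_to_instance accepts a raw sentence are illustrative, not part of allennlp:

    from allennlp.common.util import JsonDict
    from allennlp.data import Instance
    from allennlp.data.fields import MetadataField
    from allennlp.service.predictors import Predictor  # import path may differ across versions

    @Predictor.register("my-tagger")  # illustrative name
    class MyTaggerPredictor(Predictor):
        def _json_to_instance(self, json_dict: JsonDict) -> Instance:
            sentence = json_dict["sentence"]
            instance = self._dataset_reader.text_to_instance(sentence)
            # Anything you want echoed back with the prediction now rides
            # along on the instance itself; your model's forward() must
            # accept (and pass through) a matching "metadata" argument.
            instance.add_field("metadata", MetadataField({"original": json_dict}))
            return instance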

3. Automatic implementation of from_params

It used to be the case that if you implemented your own Model or DatasetReader (or whatever), you were required to implement a from_params classmethod that unpacked a Params object and called the constructor with the relevant values. In most cases this method was just boilerplate that didn't do anything interesting -- it popped off strings and ints and so on. It also opened you up to a class of subtle bugs if your from_params popped parameters with a different default value than the constructor used.

In the latest version, any class that inherits from FromParams (which automatically includes all Registrable classes) gets a from_params method for free that does the right thing. If you need complex logic to instantiate your class from a JSON config, you'll still have to write your own method, but in most cases you won't need to.

There are some from_params methods that take additional parameters; for example, every Model constructor requires a Vocabulary, which needs to be supplied to its from_params method. To support this, the automatic from_params allows extra keyword-only arguments. That is, if you call the from_params method yourself (which you probably don't), you have to do

    YourModel.from_params(params, vocab=vocab)

If you try to supply the extra arguments positionally (which you could when all of the from_params methods were defined explicitly), you will get an error. This is the "breaking" component of the change.
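
As a hedged sketch of what "for free" means in practice (the model name and constructor arguments below are illustrative), a model like this no longer needs any from_params of its own; the generated one matches config keys against the constructor's type annotations:

    from allennlp.data import Vocabulary
    from allennlp.models import Model
    from allennlp.modules import TextFieldEmbedder

    @Model.register("my-simple-model")  # illustrative name
    class MySimpleModel(Model):
        def __init__(self,
                     vocab: Vocabulary,
                     text_field_embedder: TextFieldEmbedder,
                     dropout: float = 0.2) -> None:
            super().__init__(vocab)
            self._embedder = text_field_embedder
            self._dropout = dropout

        def forward(self, **inputs):  # placeholder forward, just for the sketch
            return {}

    # If you ever call it yourself, the extras are keyword-only:
    # model = MySimpleModel.from_params(params, vocab=vocab)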

4. Changes to TokenIndexers

Previously the interface for TokenIndexer was

    TokenIndexer.token_to_indices(self, token: Token, vocabulary: Vocabulary) -> TokenType:

This assumption of (one token) -> (one or more indices) turned out not to be general enough: there are cases where you want to generate indices that depend on multiple tokens, and cases where you want to generate multiple sets of (related) indices from one input text. Accordingly, we changed the API to

    TokenIndexer.tokens_to_indices(self, tokens: List[Token], vocabulary: Vocabulary, index_name: str) -> Dict[str, List[TokenType]]:

This is some real library-innards stuff, and it is unlikely to affect you or your code unless you have been writing your own TokenIndexer or Field subclasses (which most users haven't). If this does describe you, look at the changes to TextField to see how to update your code.
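
For a feel of the new call shape, here is a small sketch using the built-in single-id indexer; the tokens and vocabulary entries are made up, and the exact ids you get back depend on your vocabulary:

    from allennlp.data import Token, Vocabulary
    from allennlp.data.token_indexers import SingleIdTokenIndexer

    vocab = Vocabulary()
    vocab.add_token_to_namespace("the", namespace="tokens")
    vocab.add_token_to_namespace("cat", namespace="tokens")

    indexer = SingleIdTokenIndexer(namespace="tokens")
    tokens = [Token("the"), Token("cat")]

    # New API: a list of tokens goes in, and a dict of index lists comes out,
    # keyed by the index_name you pass.
    indices = indexer.tokens_to_indices(tokens, vocab, index_name="tokens")
    # e.g. {"tokens": [2, 3]}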


Other changes:

9540125 Tree decoding fix (#1606)
4eaeff7 Fix use of scalar tensors in ConllCorefScores (#1604)
982cedd Small typo fixes in tutorial (#1603)
45fff83 use empty list for no package not empty string (#1602)
e0e5f4a revert conllu changes (#1600)
49626cc filter out numpy 'size changed' warnings (#1601)
4fca028 (include-package-fix) Log number of parameters in optimizers (#1598)
b79d500 Add file friendly logging for elmo. (#1593)
152b590 Output details when running check-links. (#1569)
068407e make --include-package include all submodules (#1586)
12b74e5 Add some debugging echo commands to pip. (#1579)
9194b30 copy bin/ into image (#1587)
bf760b0 be friendlier to windows users (#1572)
4fa4dc2 fix and pin conllu dependency == 1.0 (#1581)
6b37dd2 Turn off shuffling during evaluation (#1578)
07bfc31 Demo features for the dependency parser (#1560)
025b5e7 Remove a step from verify. (#1565)
d2c0274 Don't use a hard-coded temp directory. (#1564)
15e3645 openai transformer LM embedder (#1525)
87b32bb Expose iterator shuffling through Trainer config (#1557)
52f44e2 Add parsimonious for SQL parsing to setup.py (#1558)
089d16d SQL action sequences and Atis World (#1524)
6f0fec1 make passing different feedforward modules more flexible (#1555)
dcba726 WIP: Skip tests that require Java in test-install (#1551)
8438f91 Remove the unused NltkWordSplitter and punkt model. (#1548)
09c2cc5 Dependency parser predictor (#1538)
c2e70ca upgrade to pytorch 0.4.1 + make work with python 3.7 (but still 3.6 also) (#1543)
c000ae2 made checklist updates more efficient (#1552)
2ec4c5c re-work dependency parser to use HEAD sentinel inside model (#1544)
10ac9ed Update install_requirements.sh
1c2a0de Remove requirements_test.txt (merge into requirements.txt) (#1541)
2154e72 Allow server start without field-names. (#1523)
e32f486 fix BasicTextFieldEmbedder.from_params to reflect the constructor (#1474)
dc1ff36 Fix the reported broken links. (#1533)
e049afc fix ud reader in case of implicit references (#1529)
ad265f8 Add output-file option in evaluate to save the computed metrics (#1512)
c37ff2c update config files to jsonnet format (#1479)
f3fce4c Move cache breaker to the end. (#1527)
c9385e7 Fixed broken link (#1508)
8cf893e Add a scirpt to report broken links in all markdowns. (#1522)
be69e52 Parser improvements (#1515)
e4b86b0 Show warning before ignoring key with unseparable batches from model.forward. (#1520)
2c9abf9 Minor change of a comment (#1500)
0722d7f DenseSparseAdam + CRF Feedforward layer (#1519)
1402b7c Add to_file method in Params and default preference ordering. (#1517)
e0581b6 Preserve best metrics (#1504)
de0d3f7 Dependency parser (#1465)
be0f0c2 Remove extra .params (#1513)
7df8275 Text field updates to support multiple arrays in TokenIndexer (#1506)
88c381a Make usage of make-vocab, dry-run consistent with train and allow 'extend' to be used by both (#1487)
34a92d0 Update using_as_a_library_pt1.md (#1509)
8e5ee65 fix dumb domain filtering (#1505)
8a20820 Ensure Contiguous Hidden State Tensors in Encoders (#1493)
66b2c1c Bio to bioul (#1497)
f4eef6e [for discussion] change token_to_indices -> tokens_to_indices (in preparation for byte pair encoding) (#1499)
9ec3aa6 Fix start of tqdm logging in training. (#1492)
ee003d2 Fix SpanBasedF1Measure allowed label encodings comment (#1501)
d307a25 Add IOB1 support to SpanBasedF1Metric (#1494)
9c21696 fixing a bug in trainer for histograms (#1498)
5f2f539 Add option to have tie breaking in Categorical Accuracy (#1485)
7457710 Update elmo.py (#1496)
5fc7a00 Fix SpanBasedF1Measure for tags without conll labels (#1491)
01ddd12 make tables nice in validation summary (#1490)
ba6f345 Crf ner tweaks (#1488)
f5bbe59 Move param import (#1484)
e50b102 fine grained ner reader (#1483)
d9e9861 don't call create_kwargs for a class that has no constructor (#1481)
52c0835 instantiate default activation functions in constructor (#1478)
580dc8b (mostly) remove from_params (#1191)
ff41dda Implementation of ESIM model (#1469)
e2edc9b unwrap tensors in avg metric (#1463)
77298a9 Fix logging of no-grad parameters. (#1448)
bef52ed Fix call to vocab.token_from_index -> self.label_namespace (#1459)
7cc3db1 fix Vocabulary.from_params to accept a dict for max_vocab_size (#1460)
59ecd3b Fix conll2003.from_params incorrect default (#1453)
f09ff87 Allow to use a different validation iterator from training iterator (#1455)
a56fa40 remove RegistrableVocabulary (#1454)
f136ae0 Fix a typo in embedding_tokens notebook. (#1449)
d4ee5db Make bucket iterator respect maximum_samples_per_batch (#1446)
f0ed1d4 Few feature additions (#1438)
74a30d0 update the look and feel of the config explorer (#1412)
fa34344 refactor iterators (#1157)
43fc89e Enables Predict to use dataset readers from models (#1434)
d2e3035 enable mypy on tests (#1437)
7664b12 Add support for selective finetune (freeze parameters by regex from config file) (#1427)
8855042 eliminate or make private most of the new Vocabulary methods (#1436)
a0c368a Fix an edge case for incompatible vocabulary extension. (#1435)
18d4fee remove adaptive iterator (#1433)
0312b16 Add support for configurable vocabulary extension (#1416)
5d38282 Avoid non-model state in predictors (#1422)
eaf5b7e Call before logging to tensorboard (#1423)
872acf9 Make evaluation tqdm description ignore metrics starting with _ (#1430)
36d91fd Make tqdm description ignore metrics starting with _ (#1425)
70d4d3c use sensible default for num_serialised_models_to_keep (#1420)
9dbba33 Fix chdir in ModelTestCase breaking downstream models (#1418)
1031815 duplicate config in Predictor.from_archive (#1413)
2bf1e28 fix a minor typo in docstring causing wrong api usage docs of vocabulary config. (#1415)
e16a6b5 Split off function to find latest checkpoint in Trainer (#1414)
70b4ffb (jonborchardt/master) replace hocon with jsonnet (#1409)
8a31494 (upstream/ratecalculus, jonborchardt/ratecalculus) Add --include-sentence-indices flag to ELMo command (#1404)
4bd8e7f Add support for prevention of parameter initialization which match the given regexes (#1405)
76deabb Remove frontend (#1407)
6800d76 fix elmo command to use line indices and disallow empty lines (#1397)
e903018 Fix multiple GPU training after upgrading to pytorch 0.4 (#1401)
db519af Update README.md with ./allennlp/run.py (#1395)
3dff9c7 remove demo (#1338)
6da17d6 Adds support for reading pretrained embeddings (text format) from uncompressed files and archives (#1364)
5e38a08 Update Dockerfile.pip
da429d6 Update Dockerfile.pip
2b32a86 Update Dockerfile.pip
bb08b06 Add a Dockerfile for downstream usage of AllenNLP. (#1389)
bab565a Update elmo.py (#1388)
3aa81e7 In get_from_cache(), allow redirections in head requests (#1387)
f4d8d07 Output answers in wikitables predictor when inputs are batched (#1384)
a807239 create ccgbank dataset reader (#1381)
