Major changes:
- Added bAbI dataset (#286)
- Added MultiNLI dataset (#326)
- PyTorch 0.4 compatibility + bugfixes (#299, #302)
- Batch iteration now returns a tuple of `(inputs), outputs` by default without having to index attributes from `Batch` (#288)
- [BREAKING] `Iterator` no longer repeats infinitely by default (now stops after epoch has completed) (#417)
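
The two iteration changes above affect how training loops are written. As a rough illustration only, the sketch below mimics the described behaviour with plain Python stand-ins; `ToyBatch` and `ToyIterator` are hypothetical names and are not the library's classes. It assumes a batch unpacks into inputs followed by outputs, and that an iterator stops after one pass unless a `repeat` flag is set.

```python
from itertools import islice


class ToyBatch:
    """Hypothetical stand-in for a batch; not the library's Batch class."""

    def __init__(self, text, label):
        self.text = text
        self.label = label

    def __iter__(self):
        # Unpacking a batch yields the inputs first, then the outputs.
        yield self.text
        yield self.label


class ToyIterator:
    """Hypothetical stand-in for an iterator with a `repeat` flag."""

    def __init__(self, batches, repeat=False):
        self.batches = batches
        self.repeat = repeat

    def __iter__(self):
        while True:
            for batch in self.batches:
                yield batch
            if not self.repeat:
                return  # stop after one full pass (one epoch)


batches = [ToyBatch([1, 2], 0), ToyBatch([3, 4], 1)]

# Default behaviour: one pass over the data, so epochs need an explicit loop.
seen = []
for epoch in range(2):
    for text, label in ToyIterator(batches):
        seen.append((epoch, text, label))

# Opting back in to endless iteration requires a manual stopping condition.
endless = list(islice(ToyIterator(batches, repeat=True), 5))
```

Code that previously relied on endless iteration would, under these assumptions, either wrap the iterator in an explicit epoch loop or request repetition and break out manually.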
Minor changes:
- Handle `moses` tokenizer being migrated from nltk (#361)
- Vector loading made more efficient and flexible (#353)
- Allow special tokens to be added to the end of the vocabulary (#400)
- Allow filtering unknown words from examples (#413)
Bugfixes:
- Documentation (#382, #383, #393, #395, #410)
- Create cache dir for pretrained embeddings if it doesn't exist (#301)
- Various typos (#293, #369, #373, #344, #401, #404, #405, #418)
- `Dataset.split()` not copying `sort_key` fixed (#279)
- Various python 2.* vs python 3.* issues (#280)
- Fix `OOV` token vector dimensionality (#308)
- Lowercased type of `TabularDataset` (#315)
- Fix `splits` method in various translation datasets (#377, #385, #392, #429)
- Fix `ParseTextField` postprocessing (#386)
- Fix SubwordVocab (#399)
- Make NestedField GPU compatible and fix frequency saving (#409, #403)
- Allow `CSVreader` params to be modified by user (#432)
- Use tqdm progressbar in downloads (#425)