pytorch/text 0.4.0 on GitHub

Highlights

torchtext 0.4.0 includes several example scripts that showcase how to create data, build vocabularies, train, test and run inference for common supervised learning baselines. We further provide a tutorial to explain these examples in more detail.

For an advanced application of these constructs see the iterable_train.py example.

We would like to thank the open source community, which continues to send pull
requests for new features and bug fixes.

ngrams_iterator: an iterator that yields n-grams from a given list or iterator of strings. (#567 #577)
build_vocab_from_iterator (#567)
extract_archive (#569)
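
As a rough illustration of what ngrams_iterator and build_vocab_from_iterator are for, the standalone sketch below reimplements the idea in plain Python. It does not import torchtext. The helper names (ngrams_sketch, build_vocab_sketch), the space-joined n-gram strings, and the frequency-based index order are assumptions made for this example, not the library's implementation.

```python
from collections import Counter


def ngrams_sketch(tokens, n_max):
    """Yield all unigrams, then all bigrams, ... up to n_max, joined by spaces."""
    tokens = list(tokens)  # accept any iterable of strings
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocab_sketch(iterator):
    """Count tokens from an iterator of token lists and index them by frequency."""
    counter = Counter()
    for tokens in iterator:
        counter.update(tokens)
    # Most frequent first; ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return {token: index for index, (token, _) in enumerate(ordered)}


# Hypothetical input: one tokenized sentence plus one extra token list.
grams = list(ngrams_sketch(["here", "we", "are"], 2))
vocab = build_vocab_sketch([grams, ["we", "are"]])
```

Feeding n-grams rather than single tokens into the vocabulary is what lets a bag-of-words style baseline pick up short phrases.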

Added logging to download_from_url (#569)
Added fast, basic English sentence normalization to get_tokenizer (#569 #568)
Updated docs theme to pytorch_sphinx_theme (#573)
Refined Example.fromJSON() to support nested keys when parsing nested JSON datasets. (#563)
Added __len__ and get_vecs_by_tokens to the Vectors class; get_vecs_by_tokens generates vectors from a list of tokens (#561)
Added templates for torchtext users to bring up issues (#553 #574)
Added a new argument specials in Field.build_vocab to save the user-defined special tokens (#495)
Added a new argument is_target in the RawField class to indicate whether the field is a target variable; it is False by default (#459). Set is_target to True in LabelField so that the argument takes effect (#450)
Added the option to serialize fields with torch.save or pickle.dump, and to allow tokenizers in different languages (#453)
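
To make the nested-key entry for Example.fromJSON() more concrete, here is a minimal sketch of how a nested key could be resolved against a parsed JSON record. It assumes nested keys are written in dotted form such as "text.title"; the helper name get_nested and the sample record are hypothetical, and the code does not call torchtext.

```python
import json
from functools import reduce


def get_nested(obj, dotted_key):
    """Follow a dotted key such as 'a.b.c' through nested dicts (and lists of dicts)."""
    def step(current, key):
        if isinstance(current, list):
            # Apply the key to every element when the current level is a list.
            return [item[key] for item in current]
        return current[key]

    return reduce(step, dotted_key.split("."), obj)


# Hypothetical nested record, as it might appear on one line of a JSON dataset.
record = json.loads(
    '{"text": {"title": "hello", "tokens": [{"word": "a"}, {"word": "b"}]}, "label": 1}'
)
title = get_nested(record, "text.title")
words = get_nested(record, "text.tokens.word")
```

A missing key raises KeyError in this sketch; how the library reports that case is not covered here.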

Allow caching from unverified SSL in CharNGram (#554)
Fix the wrong unk index by generating the unk_index according to the specials (#531)
Update Moses tokenizer link in README.rst file (#529)
Fix the URL to load wiki.simple.vec (#525) and the dead URL to load fastText vectors (#521)
Fix UnicodeDecodeError for loading sequence tagging dataset (#506)
Fix collisions between oov words and in-vocab words caused by Issue #447 (#482)
Fix a mistake in the progress bar of Vectors class (#480)
Add the dependency to six under 'install_requires' in the setup.py file (PR #475 for Issue #465)
Fix a bug in Field class which causes overwriting the stop_words attribute (PR #458 for Issue #457)
Transpose the text and target tensors if the text field in BPTTIterator has 'batch_first' set to True (#462)
Add <unk> to default specials (#567)
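
The unk_index fix (#531) is easier to follow with a small example. The sketch below shows, under simplifying assumptions, why the unknown-token index should be derived from the position of <unk> in the specials list rather than hard-coded to 0. The function names build_itos and make_lookup are hypothetical and the code is independent of torchtext.

```python
def build_itos(tokens, specials=("<unk>", "<pad>")):
    """Return an index-to-string list with the special tokens placed first."""
    itos = list(specials)
    seen = set(itos)
    for token in tokens:
        if token not in seen:
            seen.add(token)
            itos.append(token)
    return itos


def make_lookup(itos, unk_token="<unk>"):
    """Return a token -> index function that falls back to the index of unk_token."""
    stoi = {token: index for index, token in enumerate(itos)}
    unk_index = stoi.get(unk_token)  # None when no unk token was registered

    def lookup(token):
        if token in stoi:
            return stoi[token]
        if unk_index is None:
            raise KeyError(token)
        return unk_index

    return lookup


# With <pad> listed first, <unk> sits at index 1, not 0.
itos = build_itos(["cat", "dog"], specials=("<pad>", "<unk>"))
lookup = make_lookup(itos)
known = lookup("cat")
unknown = lookup("bird")
```

If the fallback were fixed at 0, every out-of-vocabulary word in this example would silently map to <pad>.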