github pytorch/text v0.2.1
0.2.1: Bugfixes and More Datasets

6 years ago

This is a minor release: it contains no breaking API changes, but it does add several backward-compatible features.

We have always intended to support lazy datasets (specifically, those implemented as Python generators), but this version includes a bugfix that makes that support more useful. See a demo of it in action here.
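To illustrate what "lazy" means here, the following is a minimal sketch in plain Python, not torchtext's actual implementation. The helper names `lazy_examples` and `filter_lazily` are hypothetical; only `filter_pred` is a name taken from these notes. The idea is that examples are produced one at a time by a generator, and a predicate is applied without first building the full list.

```python
from itertools import islice

consumed = []  # records which raw lines have actually been read


def lazy_examples(lines):
    """Yield one tokenized example at a time instead of building a list."""
    for line in lines:
        consumed.append(line)
        yield line.split()


def filter_lazily(examples, filter_pred):
    """Apply a predicate without materializing the underlying generator."""
    return (ex for ex in examples if filter_pred(ex))


raw = ["a b c", "d", "e f", "g h i j"]
kept = filter_lazily(lazy_examples(raw), lambda ex: len(ex) > 1)

# Pull only the first two examples that pass the predicate.
first_two = list(islice(kept, 2))
```

After this runs, `consumed` holds only the first three raw lines: the fourth is never read because nothing asked for it. An eager filter built on a list would have read all four.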

Datasets:

  • Added support for sequence tagging (e.g., NER/POS/chunking) datasets and wrapped the Universal Dependencies POS-tagged corpus (#157, thanks @sivareddyg!)
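As a rough illustration of the kind of data a sequence tagging dataset holds, the sketch below parses a column-formatted file in which each line carries a token and its tag, and blank lines separate sentences. The file layout, the tab separator, and the function name `read_tagged_sentences` are assumptions made for this example; the reader that torchtext ships may differ.

```python
def read_tagged_sentences(lines, separator="\t"):
    """Group 'token<sep>tag' lines into (words, tags) pairs, one per sentence."""
    sentences = []
    words, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if words:
                sentences.append((words, tags))
                words, tags = [], []
            continue
        word, tag = line.split(separator)
        words.append(word)
        tags.append(tag)
    if words:
        # Flush the last sentence when the input lacks a trailing blank line.
        sentences.append((words, tags))
    return sentences


sample = ["The\tDET", "dog\tNOUN", "barks\tVERB", "", "Hello\tINTJ"]
parsed = read_tagged_sentences(sample)
```

Each sentence becomes two parallel sequences of equal length, which is the property that distinguishes tagging tasks (NER, POS, chunking) from sentence-level classification.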

Features:

  • Added pad_first keyword argument to Field constructors, allowing left-padding in addition to right-padding (#161, thanks @GregorySenay!)
  • Added support for loading word vectors from a local folder (#168, thanks @ahhegazy!)
  • Added support for using list (character tokenization) in ReversibleField (#188)
  • Added hooks for Sphinx/RTD documentation (#179, thanks @keon and @EntilZha, whose preliminary version is available at torch-text.readthedocs.io)
  • Added support for torchtext.__version__ (#179, thanks @keon!)
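The difference between left-padding and right-padding can be sketched in plain Python as follows. This is not the Field implementation: the helpers `pad` and `pad_batch` and the `<pad>` token are assumptions for illustration, and only the `pad_first` keyword name comes from these notes.

```python
def pad(tokens, length, pad_token="<pad>", pad_first=False):
    """Pad a token list to `length`, on the left if pad_first is True."""
    padding = [pad_token] * max(0, length - len(tokens))
    return padding + tokens if pad_first else tokens + padding


def pad_batch(batch, pad_first=False):
    """Pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(tokens) for tokens in batch)
    return [pad(tokens, max_len, pad_first=pad_first) for tokens in batch]


batch = [["a", "b", "c"], ["d"]]
right_padded = pad_batch(batch)                 # padding follows the tokens
left_padded = pad_batch(batch, pad_first=True)  # padding precedes the tokens
```

Left-padding keeps the real tokens at the end of each sequence, which can be convenient when a model reads its final time step as the summary of the input.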

Bugfixes:

  • Fixed deprecated word vector usage in WT2 dataset (#166, thanks @keon!)
  • Fixed bug in word vector loading (#168, thanks @ahhegazy!)
  • Fixed bug in word vector aliases (#191, thanks @ryanleary!)
  • Fixed side effects of building a vocabulary (#193 + #181, thanks @donglixp!)
  • Fixed arithmetic mistake in language modeling dataset length calculation (#182, thanks @jihunchoi!)
  • Avoided materializing an otherwise-lazy dataset when using filter_pred (#194)
  • Fixed bug in raw float fields (#159)
  • Avoided providing a misleading len when using batch_size_fn (#192)
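To see why a length can be misleading once a custom batch size function is in play, consider the simplified sketch below. It is not torchtext's batching code: the `batches` helper and its single-argument `batch_size_fn` are assumptions for this example, and the library's own signature may differ. Here the function measures each example's cost in tokens, so the number of batches no longer follows from the example count alone.

```python
import math


def batches(examples, batch_size, batch_size_fn=None):
    """Group examples so the summed cost of each batch stays within batch_size."""
    if batch_size_fn is None:
        # Default: every example costs 1, so batch_size counts examples.
        batch_size_fn = lambda ex: 1
    current, size = [], 0
    for ex in examples:
        cost = batch_size_fn(ex)
        if current and size + cost > batch_size:
            yield current
            current, size = [], 0
        current.append(ex)
        size += cost
    if current:
        yield current


examples = [
    "a b c d e".split(),
    "f g h i".split(),
    "j k l".split(),
    "m n".split(),
    "o".split(),
]

# Estimate that assumes batch_size counts examples.
naive_estimate = math.ceil(len(examples) / 6)
# Actual number of batches when batch_size is a token budget.
actual = len(list(batches(examples, 6, batch_size_fn=len)))
```

With a budget of 6 tokens, the five examples form three batches, whereas the estimate based on example count predicts one. Reporting the estimate as the length would therefore mislead callers.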
