github huggingface/datasets 1.2.1

3 years ago

New Features

  • Fast start up (#1690): Importing datasets is now significantly faster.
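
One way to check the startup improvement on your own machine is to time a cold import in a fresh interpreter. The sketch below is only an illustration: `time_cold_import` is a hypothetical helper, not part of the datasets library, and the measured time includes interpreter startup, so it is best used to compare two installed versions rather than as an absolute number.

```python
import subprocess
import sys
import time


def time_cold_import(module_name: str, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds to import a module in a fresh interpreter.

    Hypothetical helper for illustration; the result includes interpreter startup time.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new interpreter process has an empty sys.modules cache,
        # so every run performs a cold import.
        subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Assumes the datasets package is installed in the current environment.
    print(f"import datasets: {time_cold_import('datasets'):.2f}s")
```

Running the script once with the previous release installed and once with 1.2.1 gives a rough before/after comparison.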

Datasets Changes

  • New: MNIST (#1730)
  • New: Korean intonation-aided intention identification dataset (#1715)
  • New: Switchboard Dialog Act Corpus (#1678)
  • Update: Wiki-Auto - Added unfiltered versions of the training data for the GEM simplification task. (#1722)
  • Update: Scientific papers - Mirror datasets zip (#1721)
  • Update: DBRD - dataset card and download URL (#1699)
  • Fix: Thainer - fix ner_tag bugs (#1695)
  • Fix: reuters21578 - metadata parsing errors (#1693)
  • Fix: ade_corpus_v2 - fix config names (#1689)
  • Fix: DaNE - fix last example (#1688)

Datasets tagging

  • Rename "part-of-speech-tagging" tag in some dataset cards (#1645)

Bug Fixes

  • Fix column list comparison in transmit format (#1719)
  • Fix windows path scheme in cached path (#1711)

Docs

  • Add information about caching and verifications in "Load a Dataset" docs (#1705)

Moreover, many dataset cards for datasets added during the sprint were updated! Thanks to all the contributors :)
