New Features
- Fast start up (#1690): Importing datasets is now significantly faster.
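To check the import speed-up on your own machine, one option is to time the import in a fresh interpreter. The following is a minimal sketch, not part of the library: the helper names `fresh_interpreter_seconds` and `cold_import_seconds` are hypothetical, and the measurement is rough (a single run, with interpreter start-up subtracted as a baseline).

```python
import subprocess
import sys
import time


def fresh_interpreter_seconds(code: str) -> float:
    """Run `code` in a new Python process and return the wall-clock time in seconds."""
    start = time.perf_counter()
    # A new process guarantees nothing is already cached in sys.modules.
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start


def cold_import_seconds(module_name: str) -> float:
    """Estimate the cold import time of `module_name` (hypothetical helper).

    Interpreter start-up is measured separately and subtracted, so the
    result can be slightly noisy (even negative) for very small modules.
    """
    baseline = fresh_interpreter_seconds("pass")
    total = fresh_interpreter_seconds(f"import {module_name}")
    return total - baseline
```

With the library installed, calling `cold_import_seconds("datasets")` before and after upgrading gives a rough comparison. Averaging several runs would reduce noise.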
Datasets Changes
- New: MNIST (#1730)
- New: Korean intonation-aided intention identification dataset (#1715)
- New: Switchboard Dialog Act Corpus (#1678)
- Update: Wiki-Auto - add unfiltered versions of the training data for the GEM simplification task (#1722)
- Update: Scientific papers - Mirror datasets zip (#1721)
- Update: DBRD - update dataset card and download URL (#1699)
- Fix: Thainer - fix ner_tag bugs (#1695)
- Fix: reuters21578 - metadata parsing errors (#1693)
- Fix: ade_corpus_v2 - fix config names (#1689)
- Fix: DaNE - fix last example (#1688)
Datasets tagging
- Rename the "part-of-speech-tagging" tag in some dataset cards (#1645)
Bug Fixes
- Fix column list comparison in transmit format (#1719)
- Fix Windows path scheme in cached path (#1711)
Docs
- Add information about caching and verifications in "Load a Dataset" docs (#1705)
Moreover, many dataset cards for datasets added during the sprint were updated! Thanks to all the contributors :)