pypi datasets 1.1.0
1.1.0: Windows support, Better Multiprocessing, New Datasets

Windows support

  • Add Windows support (#644):
    • add tests and CI for Windows
    • fix numerous Windows-specific issues
    • The library now fully supports Windows

Dataset changes

  • New: HotpotQA (#703)
  • New: OpenWebText (#660)
  • New: Winogrande - add debiased subset (#655)
  • Update: XNLI - update download link (#695)
  • Update: text - switch to pandas reader, better memory usage, fix delimiter issues (#689)
  • Update: csv - add features parameter to CSV (#685)
  • Fix: GAP - fix wrong computation of boolean features (#680)
  • Fix: C4 - fix manual instruction function (#681)
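The new `features` parameter for the csv loader (#685) lets you declare column types up front instead of relying on type inference. The sketch below is a stdlib-only analogy, not the `datasets` API. The hypothetical `read_csv_with_features` takes a plain dict that maps column names to Python converters, standing in for a real schema object. It shows why an explicit schema matters: a zero-padded ID such as `007` stays a string instead of being inferred as an integer.

```python
import csv
import io
from typing import Callable, Dict, List


def read_csv_with_features(
    text: str, features: Dict[str, Callable[[str], object]]
) -> List[dict]:
    """Parse CSV text and convert each declared column with its converter.

    Hypothetical helper: `features` maps column name -> converter and only
    stands in for an explicit schema. It is not the datasets `Features` class.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = set(features) - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in CSV header: {sorted(missing)}")
    rows = []
    for record in reader:
        # Apply the declared type to each column instead of guessing it.
        # In this sketch, columns that are not declared are dropped.
        rows.append({name: convert(record[name]) for name, convert in features.items()})
    return rows


data = "id,score,label\n007,0.5,true\n042,1.25,false\n"
rows = read_csv_with_features(
    data,
    {"id": str, "score": float, "label": lambda s: s.strip().lower() == "true"},
)
print(rows)
```

Declaring `id` as `str` keeps the leading zeros, which a numeric guess would silently discard.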

Metric changes

  • Update: ROUGE - add ROUGE-2 and ROUGE-Lsum to the default ROUGE metric outputs (#701, #702)
  • Fix: SQuAD - fix kwargs description (#670)
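ROUGE-2 is ROUGE-N with N = 2. It scores the overlap of bigrams between a prediction and a reference. The following is a minimal sketch that assumes simple lowercased whitespace tokenization with no stemming, so the scorer behind the ROUGE metric may tokenize differently and give slightly different numbers. ROUGE-Lsum, which is based on longest common subsequences computed over sentence-split text, is not shown.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(prediction: str, reference: str, n: int = 2) -> Dict[str, float]:
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap.

    Simplified sketch: lowercased whitespace tokenization, no stemming.
    """
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter & keeps the minimum count of each shared n-gram (clipped overlap).
    overlap = sum((pred & ref).values())
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": f1}


scores = rouge_n("the cat sat on the mat", "the cat sat on a mat", n=2)
print(scores)
```

Here 3 of the 5 bigrams on each side match ("the cat", "cat sat", "sat on"), so precision, recall and F1 are all 3/5.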

Dataset Features

  • Use multiprocess from pathos for multiprocessing (#656):
    • allow lambda functions in multiprocessed map
    • allow local functions in multiprocessed map
    • and more, as long as the functions can be serialized with dill
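Python's standard `multiprocessing` sends the mapped function to worker processes with `pickle`, which serializes functions by reference (module plus qualified name). As a result, lambdas and functions defined inside other functions cannot be sent. `multiprocess` is a fork of `multiprocessing` that uses `dill` instead. This stdlib-only sketch shows the limitation that the switch removes. The names `is_picklable` and `make_local_function` are illustrative only and are not part of `datasets`.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if the standard-library pickle can serialize `obj`."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type depends on the object and the Python version.
        return False
    return True


def make_local_function():
    # A function defined inside another function is a "local object".
    def add_one(x):
        return x + 1

    return add_one


print(is_picklable(len))                    # built-in: pickled by reference
print(is_picklable(lambda x: x + 1))        # lambda: no importable name
print(is_picklable(make_local_function()))  # local function: no importable name
```

With plain `multiprocessing`, the last two functions could not be shipped to workers. `dill` serializes such functions by value, provided everything they reference is itself serializable.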

Bug fixes

  • Datasets: fix possible program hanging with tokenizers - disable tokenizers parallelism in multiprocessed map (#688)
  • Datasets: fix cast with unordered features - fix column order issue in cast (#684)
  • Datasets: fix first-time creation of the cache directory - move creation of the cache directory root into the builder's init (#677)
  • Datasets: fix OverflowError when using negative ids - fix negative ids in slicing with an array (#679)
  • Datasets: fix empty dictionaries after multiprocessing - keep new columns in transmit format (#659)
  • Datasets: fix type inference for nested types - handle data alteration when trying type (#653)
  • Metrics: fix compute metric with empty input - pass metric features to the reader (#654)
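The OverflowError fix (#679) concerns negative ids passed as an array when selecting rows. The release notes don't say how it was resolved. One common approach, sketched here with the hypothetical helper `normalize_indices`, is to map each negative index i to n + i before the indices reach code that expects non-negative integers, and to reject anything still out of range, mirroring Python's own list indexing.

```python
from typing import Iterable, List


def normalize_indices(indices: Iterable[int], num_rows: int) -> List[int]:
    """Convert possibly-negative row indices to non-negative ones.

    Hypothetical helper following Python list semantics, where -1 is the last row.
    """
    normalized = []
    for i in indices:
        j = i + num_rows if i < 0 else i
        if not 0 <= j < num_rows:
            raise IndexError(f"index {i} is out of bounds for {num_rows} rows")
        normalized.append(j)
    return normalized


rows = ["a", "b", "c", "d", "e"]
idx = normalize_indices([-1, 0, -3], num_rows=len(rows))
print(idx, [rows[j] for j in idx])
```

Raising `IndexError` for out-of-range values keeps the behavior consistent with slicing a regular Python list.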

Documentation

  • Elasticsearch integration documentation (#696)

Tests

  • Use GitHub instead of AWS in remote dataset tests (#694)
