github huggingface/datasets 2.5.0

2 years ago

Important

  • Drop Python 3.6 support by @mariosasko in #4460
  • Deprecate metrics by @albertvillanova in #4739
    • Metrics are now deprecated and have been moved to evaluate:
      !pip install evaluate
      import evaluate
      metric = evaluate.load("accuracy")
  • Load GitHub datasets from Hub by @albertvillanova in #4059
  • Decode mp3 with librosa if torchaudio is > 0.12 as a temporary workaround by @polinaeterna in #4923
    • torchaudio 0.12 now requires ffmpeg (version 4) to read MP3 files. For now, either downgrade torchaudio to a version below 0.12 or use librosa.
  • Use HTTP requests to access data and metadata through the Datasets REST API (docs here)
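
The torchaudio workaround above amounts to a version check. The sketch below only illustrates that idea and is not the library's implementation: `pick_mp3_backend` and `installed_version` are hypothetical helpers, and the cutoff assumes that any torchaudio version at or above 0.12 needs ffmpeg for MP3 files.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.12.1+cu116'."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pick_mp3_backend(torchaudio_version: Optional[str]) -> str:
    """Hypothetical helper: choose which library should decode MP3 files.

    Assumption: torchaudio below 0.12 reads MP3 without ffmpeg, so it is
    used directly; anything else falls back to librosa.
    """
    if torchaudio_version is None:
        return "librosa"
    if parse_major_minor(torchaudio_version) < (0, 12):
        return "torchaudio"
    return "librosa"


if __name__ == "__main__":
    # Report the backend that would be chosen in the current environment.
    print(pick_mp3_backend(installed_version("torchaudio")))
```

Comparing `(major, minor)` tuples keeps the check free of third-party dependencies; a real project would more likely compare versions with a dedicated version-parsing library.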

Datasets features

No-code loaders

Dataset methods

Parquet support

  • Download and prepare as Parquet for cloud storage by @lhoestq in #4724
  • Shard parquet in download_and_prepare by @lhoestq in #4747
  • Embed image/audio data in dl_and_prepare parquet by @lhoestq in #4987

Datasets changes

Dataset cards

Documentation

General improvements and bug fixes

New Contributors

Full Changelog: 2.4.0...2.5.0
