github huggingface/datasets 2.0.0


🤗 Datasets 2.0.0

We're happy to announce that our new documentation is available at hf.co/docs/datasets!

Dataset Features

  • Load a folder of images using the imagefolder dataset loader:
  • Push your image and audio datasets on the Hugging Face Hub with push_to_hub:
    • Add support for Audio and Image feature in push_to_hub by @mariosasko in #3685
  • New processing methods for streaming datasets:
    • Add IterableDataset.filter by @lhoestq in #3826
    • Manipulate columns on IterableDataset (rename columns, cast, etc.) by @lhoestq in #3862
    • Add the new methods to IterableDatasetDict by @lhoestq in #3923
  • And more:
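
As a rough illustration of the new streaming methods listed above, the sketch below chains IterableDataset.filter with a column rename on a dataset loaded in streaming mode. The dataset name passed in, the "text" column, the "sentence" target name, and the 20-character threshold are all assumptions made for the example, not something these notes specify.

```python
def is_short(example):
    """Predicate for filter: keep examples whose (assumed) 'text' column is short."""
    return len(example["text"]) < 20


def stream_short_examples(dataset_name, split="train"):
    """Return a lazily evaluated stream of short examples.

    `dataset_name` is a placeholder for any Hub dataset that has a
    'text' column (an assumption for this sketch).
    """
    # Imported lazily so the helper above can be used without the library.
    from datasets import load_dataset

    # streaming=True yields an IterableDataset instead of downloading everything.
    stream = load_dataset(dataset_name, split=split, streaming=True)

    # Both steps are lazy: nothing is read until the stream is iterated.
    return stream.filter(is_short).rename_column("text", "sentence")
```

Iterating over the returned object (for example with a for loop) is what triggers the download and applies the filter and rename on the fly.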

Breaking changes

  • API changes for map and shuffle for datasets loaded in streaming mode:
    • Align map when streaming: update instead of overwrite + add missing parameters by @lhoestq in #3801
    • Align IterableDataset.shuffle with Dataset.shuffle by @lhoestq in #3842
  • Rename GenerateMode to DownloadMode by @albertvillanova in #3759
  • Remove deprecated methods/params (preparation for v2.0) by @mariosasko in #3803
  • Remove deprecated remove_columns param in filter by @mariosasko in #3827
  • Module namespace cleanup for v2.0 by @mariosasko in #3875
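
The "update instead of overwrite" change to map in streaming mode can be pictured with a small pure-Python sketch. This is only an illustration of the two semantics, not the library's implementation; the function names map_overwrite and map_update are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Example = Dict[str, Any]


def map_overwrite(examples: Iterable[Example],
                  fn: Callable[[Example], Example]) -> Iterator[Example]:
    """Overwrite semantics: the function's output replaces the example."""
    for example in examples:
        yield fn(example)


def map_update(examples: Iterable[Example],
               fn: Callable[[Example], Example]) -> Iterator[Example]:
    """Update semantics: the function's output is merged into the example."""
    for example in examples:
        # Existing columns are kept; returned keys are added or replaced.
        yield {**example, **fn(example)}


rows = [{"text": "hello", "label": 0}]


def add_length(example: Example) -> Example:
    return {"n_chars": len(example["text"])}


overwritten = list(map_overwrite(rows, add_length))
updated = list(map_update(rows, add_length))
```

With update semantics, a function that returns only new columns no longer drops the existing ones. Code that relied on the old behavior to discard columns would need to remove them explicitly.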

Dataset Changes

Dataset cards

Metric Changes

Metric cards

New documentation

General improvements and bug fixes

New Contributors

Full Changelog: 1.18.3...2.0.0