github huggingface/datasets 2.11.0

18 months ago

Important

  • Use soundfile for mp3 decoding instead of torchaudio by @polinaeterna in #5573
    • this removes the need for a PyTorch dependency when decoding audio files
    • this became possible with soundfile 0.12, which bundles a recent libsndfile binary with MP3 support
  • Deprecated batch_size on Dataset.to_dict()

Datasets Features

  • Add writer_batch_size for ArrowBasedBuilder by @lhoestq in #5565
    • allows you to specify the row group / record batch size when you download_and_prepare() a dataset
  • Experimental support of cloud storage in load_dataset():
  • Support PyArrow arrays as column values in from_dict by @mariosasko in #5643
  • Allow direct cast from binary to Audio/Image by @mariosasko in #5644
  • Add column_names to IterableDataset by @patrickloeber in #5582
  • Pass the dataset features to the IterableDataset.from_generator function by @Hubert-Bonisseur in #5569
  • Add Dataset.to_list by @kyoto7250 in #5611

General improvements and bug fixes

New Contributors

Full Changelog: 2.10.0...2.11.0
