github huggingface/datasets 1.13.0

2 years ago

Dataset changes

Metric changes

Dataset features

  • Use with TensorFlow:
  • Better support for ZIP files:
    • Support loading dataset from multiple zipped CSV data files #3021 (@albertvillanova)
    • Load private data files + use glob on ZIP archives for json/csv/etc. module inference #3041 (@lhoestq)
  • Streaming improvements:
    • Extend support for streaming datasets that use glob.glob #3015 (@albertvillanova)
    • Add remove_columns to IterableDataset #3030 (@cccntu)
    • All the above ZIP features also work in streaming mode
  • New utilities:
    • Add get_dataset_split_names() to get a dataset config's split names #2906 (@severo)
  • Replace script_version with revision #2933 (@albertvillanova)
    • The script_version parameter in load_dataset is now deprecated, in favor of revision
  • Experimental - Create Audio feature type #2324 (@albertvillanova):
    • It automatically decodes audio data (mp3, wav, flac, etc.) when examples are accessed
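
As a rough illustration of the ZIP and streaming items above, the sketch below packs two CSV files into one archive with the standard library and then hands the archive to load_dataset. The file names, the example rows, and the helpers write_zipped_csvs and load_zipped_csvs are hypothetical; passing the archive path as data_files to the "csv" loader is assumed from the description of #3021, and streaming=True is assumed to work on the same archive per the streaming note.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def write_zipped_csvs(zip_path, tables):
    """Write each {member_name: rows} entry as a CSV file inside one ZIP archive."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for member_name, rows in tables.items():
            buffer = io.StringIO()
            csv.writer(buffer).writerows(rows)
            archive.writestr(member_name, buffer.getvalue())
    return zip_path


def load_zipped_csvs(zip_path, streaming=False):
    """Load every CSV in the archive (assumed usage, not verified here)."""
    # Imported lazily so the archive helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("csv", data_files=str(zip_path), streaming=streaming)


# Build a small archive with two CSV members (hypothetical example data).
workdir = Path(tempfile.mkdtemp())
zip_path = write_zipped_csvs(
    workdir / "reviews.zip",
    {
        "part-0.csv": [["text", "label"], ["great", 1]],
        "part-1.csv": [["text", "label"], ["awful", 0]],
    },
)

# List the archive members to confirm both CSV files were written.
with zipfile.ZipFile(zip_path) as archive:
    members = sorted(archive.namelist())
```

With datasets installed, load_zipped_csvs(zip_path) would be expected to return the rows of both members, and load_zipped_csvs(zip_path, streaming=True) an iterable version of the same data.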

Dataset cards

Documentation

General improvements and bug fixes

Breaking changes:

  • Following the large refactoring in #2986, the prepare_module function no longer supports the return_resolved_file_path and return_associated_base_path parameters. Use dataset_module_factory instead.
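
One possible way to migrate code that unpacked the result of prepare_module is a small wrapper around dataset_module_factory, sketched below. The wrapper name resolve_dataset_module and its factory parameter are hypothetical. The import path datasets.load and the module_path and hash attributes on the returned object are assumptions about the refactored API and should be checked against the installed version. The stub at the bottom only stands in for the real factory so the sketch runs offline.

```python
from types import SimpleNamespace


def resolve_dataset_module(path, factory=None, **kwargs):
    """Return (module_path, hash) for a dataset, like the old two-value result."""
    if factory is None:
        # Assumed import location; imported lazily so the stub demo needs no install.
        from datasets.load import dataset_module_factory as factory

    module = factory(path, **kwargs)
    # Assumed attribute names on the object returned by the factory.
    return module.module_path, module.hash


def stub_factory(path, **kwargs):
    """Hypothetical stand-in for dataset_module_factory; no network access."""
    return SimpleNamespace(module_path=f"modules/{path}", hash="abc123")


module_path, module_hash = resolve_dataset_module("squad", factory=stub_factory)
```

Calling resolve_dataset_module("squad") without a factory argument would use the real function and needs datasets installed plus access to the dataset script.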
