github huggingface/datasets 1.10.0

2 years ago

Datasets Features

  • Support remote data files #2616 (@albertvillanova)
    This lets you pass URLs of remote data files to a dataset loader:
    load_dataset("csv", data_files={"train": [url_to_one_csv_file, url_to_another_csv_file...]})
    Remote data files are supported by the following dataset loaders:
    • text
    • csv
    • json
    • parquet
    • pandas
  • Streaming from remote text/json/csv/parquet/pandas files:
    When you pass URLs to a dataset loader, you can enable streaming mode with streaming=True.
  • Faster search_batch for ElasticsearchIndex due to threading #2581 (@mwrzalik)
  • Delete extracted files when loading dataset #2631 (@albertvillanova)
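
The remote data files and streaming features above can be sketched together as follows. This is a minimal illustration, not code from the release: the helper names build_data_files and load_remote_csv are hypothetical, and the example.com URLs are placeholders to be replaced with real CSV locations. The only library call used is load_dataset with the data_files and streaming arguments described in the notes.

```python
from typing import Dict, List, Optional

# Placeholder URLs (assumption): replace with real, reachable CSV files.
TRAIN_URLS = [
    "https://example.com/data/train_part1.csv",
    "https://example.com/data/train_part2.csv",
]
TEST_URLS = ["https://example.com/data/test.csv"]


def build_data_files(
    train_urls: List[str], test_urls: Optional[List[str]] = None
) -> Dict[str, List[str]]:
    """Hypothetical helper: map split names to lists of remote file URLs."""
    data_files = {"train": list(train_urls)}
    if test_urls:
        data_files["test"] = list(test_urls)
    return data_files


def load_remote_csv(data_files: Dict[str, List[str]], streaming: bool = False):
    """Hypothetical wrapper around load_dataset for remote CSV files.

    With streaming=True the files are read lazily while iterating
    instead of being downloaded up front.
    """
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    return load_dataset("csv", data_files=data_files, streaming=streaming)
```

Calling load_remote_csv(build_data_files(TRAIN_URLS, TEST_URLS), streaming=True) would then return the splits as iterable datasets, so the first rows can be inspected with a plain for loop before any file is fully downloaded. The same pattern should apply to the other listed loaders by changing the "csv" argument.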

Datasets Changes

Dataset Tasks

Metrics Changes

General improvements and bug fixes

Dataset Cards

Docs
