huggingface/datasets 2.17.0 on GitHub

What's Changed

Fix parallel downloads for datasets without scripts by @lhoestq in #6551
Fix imagefolder with one image by @lhoestq in #6556
Fix tests based on datasets that used to have scripts by @lhoestq in #6574
remove eli5 test by @lhoestq in #6583
[IterableDataset] Fix drop_last_batch in map after shuffling or sharding by @lhoestq in #6575
[WebDataset] Audio support and bug fixes by @lhoestq in #6573
Support standalone yaml by @lhoestq in #6557
Drop redundant None guard. by @xkszltl in #6596
Fix os.listdir returning an empty string as a name by @d710055071 in #6581
Fix CI: pyarrow 15, pandas 2.2 and sqlalchemy by @lhoestq in #6617
Dedicated RNG object for fingerprinting by @mariosasko in #6606
Add concurrent loading of shards to datasets.load_from_disk by @kkoutini in #6464
Migrate from setup.cfg to pyproject.toml by @mariosasko in #6619
keep more info in DatasetInfo.from_merge #6585 by @JochenSiegWork in #6586
Read GeoParquet files using parquet reader by @weiji14 in #6508
Use schema metadata only if it matches features by @lhoestq in #6616
Raise error on bad split name by @lhoestq in #6626
Disable tqdm bars in non-interactive environments by @mariosasko in #6627
Add with_rank param to Dataset.filter by @mariosasko in #6608
Bump max range of dill to 0.3.8 by @ringohoffman in #6630
Fix filelock: use current umask for filelock >= 3.10 by @lhoestq in #6631
Faster webdataset streaming by @lhoestq in #6578
Multi gpu docs by @lhoestq in #6550
dataset viewer requires no-script by @severo in #6633
Make split slicing consistent with list slicing by @mariosasko in #5891
Do not use Parquet exports if revision is passed by @albertvillanova in #6555
Make CLI test support multi-processing by @albertvillanova in #6628
Support data_dir parameter in push_to_hub by @albertvillanova in #6634
Support push_to_hub without org/user to default to logged-in user by @albertvillanova in #6629
Fix reload cache with data dir by @lhoestq in #6632
Fix array cast/embed with null values by @mariosasko in #6283
Faster column validation and reordering by @psmyth94 in #6636
Better multi-gpu example by @lhoestq in #6646
Fix missing info when loading some datasets from Parquet export by @lhoestq in #6635
Minor multi gpu doc improvement by @lhoestq in #6649
Document usage of hfh cli instead of git by @lhoestq in #6648
Allow concatenation of datasets with mixed structs by @Dref360 in #6587
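
For readers unsure what "consistent with list slicing" (#5891) refers to, here is a minimal sketch of Python's own list-slicing rules, using only the standard library. It assumes the change is about split slices following these rules (out-of-range bounds clamped, negative bounds counted from the end); the PR itself is the authority on the exact behavior. The helper `clamp_slice` is a hypothetical name for illustration, not part of `datasets`.

```python
def clamp_slice(start, stop, length):
    """Return the (start, stop) pair Python uses for seq[start:stop] on a sequence of `length` items."""
    # slice.indices() applies the same clamping and negative-index rules as list slicing.
    begin, end, _step = slice(start, stop).indices(length)
    return begin, end


rows = list(range(10))  # stand-in for a split with 10 examples

# A stop past the end is clamped to the length instead of raising.
assert rows[:20] == rows
assert clamp_slice(None, 20, len(rows)) == (0, 10)

# Negative bounds count from the end of the sequence.
assert rows[-3:] == [7, 8, 9]
assert clamp_slice(-3, None, len(rows)) == (7, 10)
```

Under that assumption, a split expression whose upper bound exceeds the number of examples would resolve to the whole split rather than fail, just as `rows[:20]` does here.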

New Contributors

@xkszltl made their first contribution in #6596
@kkoutini made their first contribution in #6464
@JochenSiegWork made their first contribution in #6586
@weiji14 made their first contribution in #6508
@ringohoffman made their first contribution in #6630
@psmyth94 made their first contribution in #6636

Full Changelog: 2.16.1...2.17.0