Features
- Support pyarrow large_list by @albertvillanova in #7019
- Support Polars round trip:

      import polars as pl
      from datasets import Dataset

      df1 = pl.from_dict({"col_1": [[1, 2], [3, 4]]})
      df2 = Dataset.from_polars(df1).to_polars()
      assert df1.equals(df2)
What's Changed
- Use HF_HUB_OFFLINE instead of HF_DATASETS_OFFLINE by @Wauplin in #6968
- packaging: Remove useless dependencies by @daskol in #6971
- Fix resuming arrow format by @lhoestq in #6964
- Fix webdataset pickling by @lhoestq in #6972
- Set temporary numpy upper version < 2.0.0 to fix CI by @albertvillanova in #6975
- Fix regression for pandas < 2.0.0 in JSON loader by @albertvillanova in #6978
- Ensure compatibility with numpy 2.0.0 by @KennethEnevoldsen in #6976
- Remove underlines between badges by @novialriptide in #6966
- Update docs on trust_remote_code defaults to False by @albertvillanova in #6981
- Improve skip take shuffling and distributed by @lhoestq in #6965
- Fix tests using hf-internal-testing/librispeech_asr_dummy by @albertvillanova in #6998
- Fix dump of bfloat16 torch tensor by @lhoestq in #7002
- minor fix for bfloat16 by @lhoestq in #7003
- Fix incorrect rank value in data splitting by @yzhangcs in #6994
- less script docs by @lhoestq in #6993
- Fix CI by temporarily pinning ruff < 0.5.0 by @albertvillanova in #7007
- Support ruff 0.5.0 in CI by @albertvillanova in #7009
- Fix WebDatasets KeyError for user-defined Features when a field is missing in an example by @ProGamerGov in #7004
- [Streaming] retry on requests errors by @lhoestq in #6963
- Re-enable raising error from huggingface-hub FutureWarning in CI by @albertvillanova in #7011
- Skip faiss tests on Windows to avoid running CI for 360 minutes by @albertvillanova in #7014
- Support fsspec 2024.6.1 by @albertvillanova in #7017
- Persist IterableDataset epoch in workers by @lhoestq in #6710
- Fix casting list array to fixed size list by @albertvillanova in #7021
- Remove dead code for pyarrow < 15.0.0 by @albertvillanova in #7023
- Fix check_library_imports by @lhoestq in #7026
- Missing line from previous pr by @lhoestq in #7027
- Fix ci by @lhoestq in #7028
- Add decorator as explicit test dependency by @albertvillanova in #7043
- Mark tests that require librosa by @albertvillanova in #7044
- Unblock NumPy 2.0 by @NeilGirdhar in #6991
- Fix tensorflow min version depending on Python version by @albertvillanova in #7045
- Support librosa and numpy 2.0 for Python 3.10 by @albertvillanova in #7046
- add checkpoint and resume title in docs by @lhoestq in #7050
- Update load_hub.mdx by @severo in #7057
- Add batching to IterableDataset by @lappemic in #7054
- Avoid calling http_head for non-HTTP URLs by @albertvillanova in #7062
- Fix load_dataset for data_files with protocols other than HF by @matstrand in #6862
- Add batch method to Dataset class by @lappemic in #7064
- Fix doc generation when NamedSplit is used as parameter default value by @albertvillanova in #7036
- Fix CI by temporarily marking test_convert_to_parquet as expected to fail by @albertvillanova in #7074
- add split argument to Generator by @piercus in #7015
- Update required soxr version from pre-release to release by @albertvillanova in #7075
- Fix CI test_convert_to_parquet by @albertvillanova in #7078
- Fix prepare_single_hop_path_and_storage_options by @albertvillanova in #7068
- Set load_from_disk path type as PathLike by @albertvillanova in #7081
- Fix push_to_hub by not calling create_branch if branch exists by @albertvillanova in #7069
- feat: support non streamable arrow file binary format by @kmehant in #7025
- Support HTTP authentication in non-streaming mode by @albertvillanova in #7082
- chore: fix typos in docs by @hattizai in #7034
- Fix CI for metrics by @albertvillanova in 83e5c05
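
The two batching entries above (#7054 for IterableDataset, #7064 for Dataset) group consecutive examples into batches. The snippet below is a minimal pure-Python sketch of that idea, not the library's implementation: `batch_examples` is a hypothetical helper, and the `batch_size` and `drop_last_batch` parameter names, as well as the assumption that each batch is a dict mapping column names to lists of values, are illustrative choices.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batch_examples(
    examples: Iterable[Dict[str, Any]],
    batch_size: int,
    drop_last_batch: bool = False,
) -> Iterator[Dict[str, List[Any]]]:
    """Hypothetical helper: group row dicts into column-wise batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(examples)
    while True:
        # Pull up to batch_size rows without materializing the whole input.
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Optionally skip a trailing batch that is smaller than batch_size.
        if drop_last_batch and len(chunk) < batch_size:
            return
        # Turn a list of rows into one dict of column lists.
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


rows = [{"id": i, "text": f"row {i}"} for i in range(5)]
batches = list(batch_examples(rows, batch_size=2))
for batch in batches:
    print(batch)
```

Because the helper consumes its input lazily, the same sketch works for an in-memory list and for a streamed iterator, which mirrors why batching is useful for both dataset types.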
New Contributors
- @novialriptide made their first contribution in #6966
- @yzhangcs made their first contribution in #6994
- @ProGamerGov made their first contribution in #7004
- @NeilGirdhar made their first contribution in #6991
- @matstrand made their first contribution in #6862
- @lappemic made their first contribution in #7054
- @piercus made their first contribution in #7015
- @kmehant made their first contribution in #7025
- @hattizai made their first contribution in #7034
Full Changelog: 2.20.0...2.21.0