Bug fixes
- Bump fsspec from 2021.11.1 to 2022.3.0 by @mariosasko in #6091
- Minor fix in `iter_files` for hidden files by @mariosasko in #6092
- Use yaml instead of get data patterns when possible by @lhoestq in #6154
- Fix Parquet loading with `columns` by @mariosasko in #6160
- Fix: Missing a MetadataConfigs init when the repo has a `datasets_info.json` but no README by @clefourrier in #6164
- PyArrow 13 CI fixes by @mariosasko in #6175
- Don't alter input in Features.from_dict by @lhoestq in #6189
- Fix multiprocessing with spawn in iterable datasets by @Hubert-Bonisseur in #6165
- Set minimal fsspec version requirement to 2023.1.0 by @mariosasko in #6192
- Temporarily pin pandas < 2.1.0 by @albertvillanova in #6200
- Preserve split order in DataFilesDict by @albertvillanova in #6198
- Add missing `revision` argument by @qgallouedec in #6191
- Temporarily pin fsspec < 2023.9.0 by @albertvillanova in #6210
- Do not filter out .zip extensions from no-script datasets by @albertvillanova in #6208
- Fix empty splitinfo json by @lhoestq in #6211
- Fix to_json ValueError and remove pandas pin by @albertvillanova in #6201
- Fix checking patterns to infer packaged builder by @polinaeterna in #6215
- Rename old push_to_hub configs to "default" in dataset_infos by @lhoestq in #6218
Other improvements
- Deprecate `Dataset.export` by @mariosasko in #6081
- Deprecate `download_custom` by @mariosasko in #6093
- Ignore CI lint rule violation in Pickler.memoize by @albertvillanova in #6138
- Remove unused allowed_extensions param by @albertvillanova in #6135
- Export to_iterable_dataset to document by @npuichigo in #6145
- [Docs] Add description of `select_columns` to guide by @unifyh in #6119
- Ignore parallel warning in map_nested by @lhoestq in #6148
- [docs] Complete `to_iterable_dataset` by @stevhliu in #6158
- Raise FileNotFoundError when passing data_files that don't exist by @lhoestq in #6155
- Fix typo in about_mapstyle_vs_iterable.mdx by @lhoestq in #6171
- Document BUILDER_CONFIG_CLASS by @lhoestq in #6166
- Fix import in `image_load` doc by @mariosasko in #6181
- Use object detection images from `huggingface/documentation-images` by @mariosasko in #6177
- Use `hf-internal-testing` repos for hosting test dataset repos by @mariosasko in #6180
New Contributors
- @npuichigo made their first contribution in #6145
- @unifyh made their first contribution in #6119
Full Changelog: 2.14.4...2.14.5