Dataset Features
Enable large-scale distributed dataset streaming:
- Keep hffs cache in workers when streaming by @lhoestq in #7820
- Retry open hf file by @lhoestq in #7822
These improvements require huggingface_hub>=0.36.0 to take full effect.
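To confirm that an environment meets the huggingface_hub requirement, a check along the following lines may help. This is a minimal sketch rather than part of the library: the helper names (`release_tuple`, `meets_minimum`, `hub_is_recent_enough`) are hypothetical, and the comparison looks only at the numeric release components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata

MIN_HUB_VERSION = "0.36.0"  # minimum version stated in these release notes


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.36.0.dev0' -> (0, 36, 0).

    Hypothetical helper: pre-release and dev suffixes are dropped, not ordered.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    # Pad short versions such as '0.36' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_HUB_VERSION) -> bool:
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    return release_tuple(installed) >= release_tuple(minimum)


def hub_is_recent_enough() -> bool:
    """Check the installed huggingface_hub, returning False if it is missing."""
    try:
        installed = metadata.version("huggingface_hub")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    print("huggingface_hub is recent enough:", hub_is_recent_enough())
```

If the check reports False, upgrading with `pip install -U "huggingface_hub>=0.36.0"` should satisfy the requirement.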
What's Changed
- fix conda deps by @lhoestq in #7810
- Add pyarrow's binary view to features by @delta003 in #7795
- Fix polars cast column image by @CloseChoice in #7800
- Allow streaming hdf5 files by @lhoestq in #7814
- Fix batch_size default description in to_polars docstrings by @albertvillanova in #7824
- docs: document_dataset PDFs & OCR by @ethanknights in #7812
- Add custom fingerprint support to from_generator by @simonreise in #7533
- picklable batch_fn by @lhoestq in #7826
New Contributors
- @delta003 made their first contribution in #7795
- @CloseChoice made their first contribution in #7800
- @ethanknights made their first contribution in #7812
- @simonreise made their first contribution in #7533
Full Changelog: 4.2.0...4.3.0
