Dataset Features
- Add lance format support by @eddyxu in #7913
  - Support for both Lance datasets (including metadata / manifests) and standalone .lance files
  - e.g. with lance-format/fineweb-edu:
    from datasets import load_dataset

    ds = load_dataset("lance-format/fineweb-edu", streaming=True)
    for example in ds["train"]:
        ...
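Since a streaming split is consumed as a plain Python iterable, one way to inspect only the first few examples is `itertools.islice`. The following is a minimal sketch, not part of the release itself: the helper names `take` and `peek_lance_dataset` are hypothetical, and it assumes the `datasets` library is installed and that the repository exposes a `train` split as in the example above.

```python
from itertools import islice


def take(stream, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(stream, n))


def peek_lance_dataset(repo_id="lance-format/fineweb-edu", n=3):
    """Hypothetical helper: stream a Lance-backed dataset and return n examples.

    Assumes network access and a "train" split; the import is kept local so
    that `take` stays usable without `datasets` installed.
    """
    from datasets import load_dataset

    ds = load_dataset(repo_id, streaming=True)
    return take(ds["train"], n)
```

Calling `peek_lance_dataset()` would fetch only as much data as is needed to yield the requested examples, which is usually what you want when checking the schema of a large dataset.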
What's Changed
- Raise early for invalid `revision` in `load_dataset` by @Scott-Simmons in #7929
- fix low but large example indexerror by @CloseChoice in #7912
- Fix method to retrieve attributes from file object by @lhoestq in #7938
- add _OverridableIOWrapper by @lhoestq in #7942
- Add _generate_shards by @lhoestq in #7943
New Contributors
- @eddyxu made their first contribution in #7913
- @Scott-Simmons made their first contribution in #7929
Full Changelog: 4.4.2...4.5.0