Dataset Features
- Add nifti support by @CloseChoice in #7815
  - Load medical imaging datasets from Hugging Face:

        from datasets import load_dataset

        ds = load_dataset("username/my_nifti_dataset")
        ds["train"][0]  # {"nifti": <nibabel.nifti1.Nifti1Image>}

  - Load medical imaging datasets from your disk:

        from datasets import Dataset, Nifti

        files = ["/path/to/scan_001.nii.gz", "/path/to/scan_002.nii.gz"]
        ds = Dataset.from_dict({"nifti": files}).cast_column("nifti", Nifti())
        ds[0]  # {"nifti": <nibabel.nifti1.Nifti1Image>}
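When loading from disk, the list of file paths usually comes from a directory scan rather than being typed by hand. The following is a minimal sketch of that step, not part of the library: `find_nifti_files` and `build_nifti_dataset` are hypothetical helper names, and the sketch assumes that NIfTI files are identified by a `.nii` or `.nii.gz` suffix and that `Nifti` can be imported from `datasets` alongside `Dataset`.

```python
from pathlib import Path


def find_nifti_files(root):
    """Return sorted paths under `root` whose names end in .nii or .nii.gz.

    Hypothetical helper for illustration; not part of `datasets`.
    """
    suffixes = (".nii", ".nii.gz")
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.name.lower().endswith(suffixes)
    )


def build_nifti_dataset(root):
    """Build a dataset from the NIfTI files found under `root`.

    Assumes `datasets` is installed and exports `Nifti` at the top level.
    """
    # Imported lazily so the file-discovery helper works without `datasets`.
    from datasets import Dataset, Nifti

    files = find_nifti_files(root)
    return Dataset.from_dict({"nifti": files}).cast_column("nifti", Nifti())
```

Calling `build_nifti_dataset("/path/to/scans")` would then give a dataset with one row per file found. Sorting the paths keeps the row order stable across runs.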
- Add num channels to audio by @CloseChoice in #7840

      # samples have shape (num_channels, num_samples)
      ds = ds.cast_column("audio", Audio())  # default, use all channels
      ds = ds.cast_column("audio", Audio(num_channels=2))  # use stereo
      ds = ds.cast_column("audio", Audio(num_channels=1))  # use mono

What's Changed
- Fix random seed on shuffle and interleave_datasets by @CloseChoice in #7823
- fix ci compressionfs by @lhoestq in #7830
- fix: better args passthrough for _batch_setitems() by @sghng in #7817
- Fix: Properly render [!TIP] block in stream.shuffle documentation by @art-test-stack in #7833
- resolves the ValueError: Unable to avoid copy while creating an array by @ArjunJagdale in #7831
- fix column with transform by @lhoestq in #7843
- support fsspec 2025.10.0 by @lhoestq in #7844
New Contributors
- @sghng made their first contribution in #7817
- @art-test-stack made their first contribution in #7833
Full Changelog: 4.3.0...4.4.0