pypi datasets 2.7.0


Dataset Features

  • Multiprocessed dataset builder by @TevenLeScao in #5107
    • Load big datasets faster by downloading and generating them with multiple processes (multiprocessing is disabled by default):
    from datasets import load_dataset
    ds = load_dataset("imagenet-1k", num_proc=4)
  • Make torch.Tensor and spacy models cacheable by @mariosasko in #5191
    • Functions passed to map or filter that use torch tensors or spaCy pipelines can now be cached
  • Drop labels in ImageFolder and AudioFolder datasets if files are at different directory levels or if there is only one label by @polinaeterna in #5192
  • TextConfig: added an "errors" parameter for handling text decoding errors by @NightMachinery in #5155
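The multiprocessed builder entry (#5107) does not say how the work is divided among processes. A common approach, and roughly what datasets is assumed to do here, is to split the dataset's input shards (for example, its data files) into `num_proc` contiguous groups and generate each group in its own process. That would also mean `num_proc` can only help when a dataset has more than one shard. The standard-library sketch below illustrates the idea; `distribute_shards`, `process_group` and `generate_in_parallel` are hypothetical names, not datasets APIs.

```python
import concurrent.futures
from typing import Any, Callable, List, Sequence


def distribute_shards(num_shards: int, num_proc: int) -> List[range]:
    """Split shard indices into at most num_proc contiguous, non-empty groups.

    The first (num_shards % num_jobs) groups receive one extra shard.
    """
    if num_shards == 0:
        return []
    num_jobs = min(num_shards, max(1, num_proc))
    groups: List[range] = []
    start = 0
    for job in range(num_jobs):
        size = num_shards // num_jobs + (1 if job < num_shards % num_jobs else 0)
        groups.append(range(start, start + size))
        start += size
    return groups


def process_group(worker: Callable[[Any], List[Any]], shards: Sequence[Any]) -> List[Any]:
    """Run the worker over every shard in one group and collect its examples."""
    examples: List[Any] = []
    for shard in shards:
        examples.extend(worker(shard))
    return examples


def generate_in_parallel(
    shards: Sequence[Any],
    worker: Callable[[Any], List[Any]],
    num_proc: int,
    executor_cls=concurrent.futures.ProcessPoolExecutor,
) -> List[Any]:
    """Hypothetical helper: one job per shard group, output kept in shard order."""
    groups = [[shards[i] for i in r] for r in distribute_shards(len(shards), num_proc)]
    if len(groups) <= 1:
        # One group (a single shard, or num_proc=1): nothing to parallelize, run in-process.
        return [ex for group in groups for ex in process_group(worker, group)]
    with executor_cls(max_workers=len(groups)) as pool:
        futures = [pool.submit(process_group, worker, group) for group in groups]
        # Collect in submission order so the result does not depend on timing.
        return [ex for future in futures for ex in future.result()]
```

With the default `ProcessPoolExecutor` the worker must be picklable (a module-level function, not a lambda). Passing `concurrent.futures.ThreadPoolExecutor` is convenient for quick tests but does not speed up CPU-bound generation.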
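The ImageFolder/AudioFolder entry (#5192) describes a rule for when directory-derived labels are dropped. The sketch below approximates that rule as stated in the changelog: each file's label is the name of its parent directory, and labels are kept only if all files sit at the same depth and there is more than one distinct label. `infer_labels` is a hypothetical helper that assumes POSIX-style relative paths; the library's actual check may differ in detail.

```python
import posixpath
from typing import Dict, List, Optional


def infer_labels(paths: List[str]) -> Optional[Dict[str, str]]:
    """Map each file to its parent directory name, or return None to drop labels.

    Labels are dropped when files sit at different directory depths or when
    every file would receive the same label.
    """
    # Label = name of the directory that directly contains the file.
    labels = {p: posixpath.basename(posixpath.dirname(p)) for p in paths}
    # Depth = number of "/" separators in the relative path.
    depths = {p.count("/") for p in paths}
    if len(set(labels.values())) > 1 and len(depths) == 1:
        return labels
    return None
```

For example, `train/cat/1.png` and `train/dog/2.png` keep the labels `cat` and `dog`, while a flat mix such as `train/cat/1.png` and `train/3.png` gets no labels.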
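The TextConfig entry (#5155) adds an `errors` option to the text builder. It presumably mirrors the `errors` argument of Python's built-in `open()` (`"strict"`, `"ignore"`, `"replace"`, ...); that mapping is an assumption. In datasets 2.7.0 it should be passed like other config keywords, e.g. `load_dataset("text", data_files="notes.txt", errors="ignore")` (the file name is hypothetical). The standard-library sketch below shows what each mode does with a line that is not valid UTF-8; `read_lines` and `make_sample_file` are hypothetical helpers.

```python
import os
import tempfile
from typing import List, Optional


def read_lines(path: str, encoding: str = "utf-8", errors: Optional[str] = None) -> List[str]:
    """Read a text file into lines; errors=None means Python's default, "strict"."""
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read().splitlines()


def make_sample_file() -> str:
    """Write a small file whose first line is Latin-1 encoded, i.e. not valid UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write("café\nok\n".encode("latin-1"))  # "é" becomes the single byte 0xE9
    return path
```

Reading the sample file with `errors="ignore"` drops the bad byte (`["caf", "ok"]`), `errors="replace"` substitutes U+FFFD, and the default strict mode raises `UnicodeDecodeError`.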

Audio setup

Docs

General improvements and bug fixes

New Contributors

Full Changelog: 2.6.1...2.7.0
