github huggingface/datasets 2.18.0

2 months ago

Dataset features

  • Make JSON builder support an array of strings by @albertvillanova in #6696
  • Base parquet batch_size on parquet row group size by @lhoestq in #6701
    • Faster cold start for streaming
  • Change default compression argument for JsonDatasetWriter by @Rexhaif in #6659
  • Automatic Conversion for uint16/uint32 to Compatible PyTorch Dtypes by @mohalisad in #6660
  • fsspec: support fsspec>=2023.12.0 glob changes by @pmrowla in #6687
    • Support latest fsspec up to 2024.2.0
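
To make the first item above concrete, here is a minimal, standard-library-only sketch of the input shape it refers to: a JSON file whose top level is an array of strings, mapped to one row per string. It does not reproduce the library's implementation. The helper `rows_from_string_array`, the file name `sentences.json`, and the column name `text` are hypothetical names chosen for illustration; these notes do not state which column name the builder uses.

```python
import json
import tempfile
from pathlib import Path


def rows_from_string_array(path, column="text"):
    """Read a JSON file holding a top-level array of strings.

    Returns one single-column row per string. `column` is a hypothetical
    name picked for this sketch, not something stated in the release notes.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Reject anything that is not a flat list of strings.
    if not (isinstance(data, list) and all(isinstance(x, str) for x in data)):
        raise ValueError("expected a top-level JSON array of strings")
    return [{column: s} for s in data]


with tempfile.TemporaryDirectory() as tmp:
    # Write a small example file: ["first line", "second line"]
    path = Path(tmp) / "sentences.json"
    path.write_text(json.dumps(["first line", "second line"]), encoding="utf-8")
    rows = rows_from_string_array(path)

print(rows)
```

With the library itself, a file of this shape would presumably be loaded through the usual JSON entry point, `load_dataset("json", data_files="sentences.json")`; check the resulting column names on your installed version rather than relying on this sketch.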

General improvements and bug fixes

New Contributors

Full Changelog: 2.17.1...2.18.0
