github huggingface/datasets 0.3.0

3 years ago

New methods to transform a dataset:

  • dataset.shuffle: create a shuffled dataset
  • dataset.train_test_split: create a train and a test split (similar to sklearn)
  • dataset.sort: create a dataset sorted according to a certain column
  • dataset.select: create a dataset with rows selected following the given list of indices

Other features:

  • Better instructions for datasets that require manual download

    Important: if you load datasets that require manual downloads with an older version of nlp, the instructions won't be shown and an error will be raised.

  • Better access to dataset information (for instance dataset.features['label'] or dataset.dataset_size)

Datasets:

  • New: cos_e v1.0
  • New: rotten_tomatoes
  • New: German and Italian Wikipedia

New docs:

  • documentation about splitting a dataset

Bug fixes:

  • fix metric.compute, which couldn't write to file
  • fix squad_v2 imports
