New methods to transform a dataset:
- `dataset.shuffle`: create a shuffled dataset
- `dataset.train_test_split`: create a train and a test split (similar to sklearn)
- `dataset.sort`: create a dataset sorted by a given column
- `dataset.select`: create a dataset containing only the rows at the given list of indices
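To make the row-level semantics of `shuffle`, `sort` and `select` concrete, here is a minimal plain-Python sketch on a table stored as a list of dicts. It does not use `nlp`, and the helper names (`shuffle_rows`, `sort_rows`, `select_rows`) and the `seed` parameter are hypothetical, chosen for illustration only; it shows what each transform produces, not how the library implements it.

```python
import random
from typing import Any, Dict, List, Sequence

# Hypothetical row type for this sketch: one dict per example.
Row = Dict[str, Any]


def shuffle_rows(rows: List[Row], seed: int = 0) -> List[Row]:
    """Return a new list with the rows in a random but reproducible order."""
    rng = random.Random(seed)  # private RNG so the global random state is untouched
    shuffled = list(rows)      # copy so the input table is left unchanged
    rng.shuffle(shuffled)
    return shuffled


def sort_rows(rows: List[Row], column: str) -> List[Row]:
    """Return a new list of rows ordered by the values in `column` (stable sort)."""
    return sorted(rows, key=lambda row: row[column])


def select_rows(rows: List[Row], indices: Sequence[int]) -> List[Row]:
    """Return the rows at `indices`, in the order the indices are given."""
    return [rows[i] for i in indices]


if __name__ == "__main__":
    table = [
        {"text": "great movie", "label": 1},
        {"text": "boring plot", "label": 0},
        {"text": "fine acting", "label": 1},
    ]
    print(sort_rows(table, "label"))       # "boring plot" first, then the two label-1 rows
    print(select_rows(table, [2, 0]))      # "fine acting", then "great movie"
    print(shuffle_rows(table, seed=42))    # same three rows, reordered
```

Each helper returns a new table instead of mutating its input, which mirrors the "create a dataset" wording of the notes.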
Other features:
- Better instructions for datasets that require manual download
  Important: if you load datasets that require manual downloads with an older version of `nlp`, instructions won't be shown and an error will be raised.
- Better access to dataset information (for instance `dataset.feature['label']` or `dataset.dataset_size`)
Datasets:
- New: cos_e v1.0
- New: rotten_tomatoes
- New: German and Italian Wikipedia
New docs:
- Documentation on splitting a dataset
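As a rough illustration of what a sklearn-style train/test split does, here is a minimal plain-Python sketch. It does not call `nlp`; the function name `split_rows` and its parameters are hypothetical. It follows scikit-learn's conventions of treating `test_size` as a fraction, rounding the test count up, and putting the test rows at the end when shuffling is disabled; whether `dataset.train_test_split` rounds or orders rows the same way is an assumption, not something these notes state.

```python
import math
import random
from typing import Any, Dict, List, Tuple

# Hypothetical row type for this sketch: one dict per example.
Row = Dict[str, Any]


def split_rows(rows: List[Row], test_size: float = 0.25, seed: int = 0,
               shuffle: bool = True) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, test); `test_size` is the fraction of rows held out."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    order = list(range(len(rows)))
    if shuffle:
        random.Random(seed).shuffle(order)  # reproducible permutation of row indices
    n_test = math.ceil(len(rows) * test_size)  # round the test share up, as sklearn does
    n_train = len(rows) - n_test
    train_idx, test_idx = order[:n_train], order[n_train:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


if __name__ == "__main__":
    rows = [{"id": k} for k in range(10)]
    train, test = split_rows(rows, test_size=0.25, seed=1)
    print(len(train), len(test))  # 7 3
```

Splitting a shuffled list of indices rather than the rows themselves keeps the two splits disjoint and together covering every row exactly once.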
Bug fixes:
- Fix `metric.compute`, which couldn't write to a file
- Fix `squad_v2` imports