huggingface/datasets 0.3.0 on GitHub

New methods to transform a dataset:

dataset.shuffle: create a shuffled dataset
dataset.train_test_split: create a train and a test split (similar to sklearn)
dataset.sort: create a dataset sorted according to a certain column
dataset.select: create a dataset with rows selected following the given list of indices

Other features:

Better instructions for datasets that require manual download

Important: if you load a dataset that requires a manual download with an older version of nlp, the instructions won't be shown and an error will be raised instead
Better access to dataset information (for instance dataset.features['label'] or dataset.dataset_size)

Datasets:

New docs:

Bug fixes: