1.0.0 Release: New name, Speed-ups, Multimodal, Serialization
Package Changes
- Rename: `nlp` -> `datasets`

Update now with `pip install datasets`
Dataset Features
- Keep the dataset format after dataset transforms (#607)
- Pickle support (#536)
- Save and load datasets to/from disk (#571)
- Multiprocessing in `map` and `filter` (#552)
- Multi-dimensional arrays support for multi-modal datasets (#533, #363)
- Speed up Tokenization by optimizing casting to python objects (#523)
- Speed up shuffle/shard/select methods - use indices mappings (#513)
- Add `input_column` parameter in `map` and `filter` (#475)
- Speed up download and processing (#563)
- Indexed datasets for hybrid models (REALM/RAG/MARGE) (#500)
Dataset Changes
- New: IWSLT 2017 (#470)
- New: CommonGen Dataset (#578)
- New: CLUE Benchmark (11 datasets) (#572)
- New: the KILT knowledge source and tasks (#559)
- New: DailyDialog (#556)
- New: DoQA dataset (ACL 2020) (#473)
- New: reuters21578 (#570)
- New: HANS (#551)
- New: MLSUM (#529)
- New: Guardian authorship (#452)
- New: web_questions (#401)
- New: MS MARCO (#364)
- Update: Germeval14 - update download url (#594)
- Update: LinCE - update download url (#550)
- Update: Hyperpartisan news detection - update download url, manual download no longer required (#504)
- Update: Rotten Tomatoes - update download url (#484)
- Update: Wiki DPR - Use HNSW faiss index (#500)
- Update: Text - Speed up using multi-threaded PyArrow loading (#548)
- Fix: GLUE, PAWS-X - skip header (#497)
[Breaking] Update Dataset and DatasetDict API (#459)
- Rename the `flatten`, `drop` and `dictionary_encode_column` methods to `flatten_`, `drop_` and `dictionary_encode_column_` to indicate that these methods have in-place effects
- Remove the `dataset.columns` property and `dataset.nbytes`
- Add a few more properties and methods to `DatasetDict`
Metric Features
- Disallow the use of positional arguments to avoid `predictions` vs `references` mistakes (#466)
- Allow directly feeding numpy/pytorch/tensorflow/pandas objects to metrics (#466)
Metric Changes
Loading script Features
- Pin the version of the scripts (reproducibility) (#603, #584)
- Specify a default `script_version` with the env variable `HF_SCRIPTS_VERSION` (#584)
- Save scripts in a modules cache directory that can be controlled with `HF_MODULES_CACHE` (#574)
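A small sketch of configuring these environment variables before importing the library (the variable names come from the notes above; the values are illustrative):

```python
import os

# Default script version used when loading datasets/metrics (#584).
os.environ["HF_SCRIPTS_VERSION"] = "1.0.0"
# Directory where downloaded loading scripts are cached as modules (#574).
os.environ["HF_MODULES_CACHE"] = os.path.expanduser("~/.cache/hf_modules")
```

Setting these before the first `import datasets` ensures the library picks them up when it initializes its module cache.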
Caching
- Better support for tokenizers when caching `map` results (#601)
- Faster caching for text datasets (#573, #502)
- Use dataset fingerprints, updated after each transform (#536)
- Refactor caching behavior, pickle/cloudpickle metrics and dataset, add tests on metrics (#518)
Documentation
- Metrics documentation (#579)
Miscellaneous
- Add centralized logging and bump cache-load messages up to the warning level (#538)
Bug fixes
- Datasets: [Breaking] fixed typo in the "formated_as" method: renamed `formated` to `formatted` (#516)
- Datasets: fixed the error message when loading text/csv/json without providing data files (#586)
- Datasets: fixed the `select` method for pyarrow < 1.0.0 (#585)
- Datasets: fixed elasticsearch result ids being returned as strings (#487)
- Datasets: fixed config used for slow test on real dataset (#527)
- Datasets: fixed tensorflow-formatted dataset outputs by using ragged tensors by default (#530)
- Datasets: fixed batched map for formatted dataset (#515)
- Datasets: fixed encodings issues on Windows - apply utf-8 encoding to all datasets (#481)
- Datasets: fixed dataset.map for function without outputs (#506)
- Datasets: fixed bad type in overflow check (#496)
- Datasets: fixed dataset info save - don't use the beam filesystem to save info for the local cache dir (#498)
- Datasets: fixed arrays outputs - stack vectors in numpy, pytorch and tensorflow (#495, #494)
- Metrics: fixed locking in distributed settings if one process finished before the other started writing (#564, #547)