huggingface/datasets 2.1.0 on GitHub

Datasets Changes

New: initial monash time series forecasting by @kashif in #3743
New: Roman Urdu Hate Speech dataset by @bp-high in #3972
New: Adversarial GLUE by @jxmorris12 in #3849
New: MetaShift by @dnaveenr in #3900
New: GSM8K by @jon-tow in #4103
New: SBU Captions Photo by @thomasw21 in #4130
Deprecated: Multilingual Librispeech - deprecate dataset in favor of facebook/multilingual_librispeech by @polinaeterna in #4060
Update (BREAKING): TIMIT - Redirect users to download data manually from LDC by @lhoestq in #4145
Update: Wikipedia by @albertvillanova in #3821 and #3989
Update: conll2012_ontonotesv5 - Support streaming by @albertvillanova in #4002
Update: daily_dialog - Support streaming by @albertvillanova in #4008
Update: id_clickbait - Support streaming by @albertvillanova in #4014
Update: blimp - Support streaming by @albertvillanova in #4016
Update: scan - Support streaming by @albertvillanova in #4017
Update: yelp_review_full - Replace data url by @lhoestq in #4018
Update: yelp_polarity - Support streaming by @lhoestq in #4019
Update: amazon_polarity - Replace data URL by @lhoestq in #4020
Update: dbpedia_14 - Replace data url by @lhoestq in #4022
Update: xtreme - Support streaming dataset for bucc18 config by @albertvillanova in #4026
Update: yahoo_answers_topics - Replace data url by @lhoestq in #4023
Update: ASSIN 2 dataset - replace broken Google Drive URLS by links on github by @ruanchaves in #4004
Update: xcopa - Support streaming by @albertvillanova in #4039
Update: medical_dialog - Add configs with processed data by @albertvillanova in #4127
Update: xtreme - Support streaming for udpos config by @albertvillanova in #4131
Update: xtreme - Support streaming for PAWS-X config by @albertvillanova in #4132
Update: xtreme - Support streaming for PAN-X config by @albertvillanova in #4135
Update: SQuAD v2 - Use a constant for the articles regex by @bryant1410 in #4030
Update: HANS - Support streaming by @mariosasko in #4155
Fix: cats_vs_dogs - fix checksum error dataset by @albertvillanova in #4033
Fix: xcopa - fix null checksum by @albertvillanova in #4034
Fix: amazon_us_reviews - fix metadata - 4/4/2022 by @trentonstrong in #4092

Dataset Cards

Updated annotations for nli_tr dataset by @e-budur in #4058
Add missing label for emotion description by @lijiazheng99 in #4151
Remove unnecessary 'pylint disable' message in ReadMe by @Datta0 in #3955
Improve RedCaps dataset card by @mariosasko in #4100
Fix duplicate key in multi_news by @lhoestq in #4164

Datasets Tags and Search on the Hugging Face Hub

Tasks alignment with models by @lhoestq in #4066
Update datasets task tags to align tags with models by @lhoestq in #4067

Metrics Changes

Xtreme-S Metrics by @patrickvonplaten in #3799
Fix xtreme s metrics by @patrickvonplaten in #3957
Avoid info log messages from transformers in FrugalScore metric by @albertvillanova in #3938
Add exact match metric by @emibaylor in #3899
Fix comet metric by @lhoestq in #3945
Add zero_division argument to precision and recall metrics by @albertvillanova in #4035
Support float data types in pearsonr/spearmanr metrics by @albertvillanova in #4054
Remove GLEU metric by @emibaylor in #3949
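
As an illustration of what a zero_division argument controls (#4035): precision is undefined when no sample is predicted positive, because the denominator TP + FP is zero. The sketch below is a minimal pure-Python stand-in, not the library's implementation. The function name `precision_sketch` and its parameters are hypothetical, and the "warn" / 0 / 1 semantics are assumed to mirror scikit-learn's `zero_division` parameter.

```python
import warnings
from typing import Sequence, Union


def precision_sketch(
    predictions: Sequence[int],
    references: Sequence[int],
    pos_label: int = 1,
    zero_division: Union[str, int] = "warn",
) -> float:
    """Hypothetical helper: binary precision with a configurable fallback.

    Assumption: "warn" behaves like 0 but also emits a warning, while
    0 or 1 silently returns that value when precision is undefined.
    """
    pairs = list(zip(predictions, references))
    # True positives: predicted positive and actually positive.
    true_pos = sum(1 for p, r in pairs if p == pos_label and r == pos_label)
    # False positives: predicted positive but actually something else.
    false_pos = sum(1 for p, r in pairs if p == pos_label and r != pos_label)

    predicted_pos = true_pos + false_pos
    if predicted_pos == 0:
        # No positive predictions, so TP / (TP + FP) is 0 / 0.
        if zero_division == "warn":
            warnings.warn("Precision is undefined; returning 0.0")
            return 0.0
        return float(zero_division)
    return true_pos / predicted_pos


# Normal case: one of the two positive predictions is correct.
print(precision_sketch([1, 1, 0], [1, 0, 0]))
# Undefined case: no positive predictions, so the fallback value is used.
print(precision_sketch([0, 0], [1, 0], zero_division=1))
```

Choosing 1 instead of 0 for the fallback matters mostly when scores are averaged over many labels or batches, where an undefined score would otherwise drag the mean down.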

Metric Cards

Perplexity Metric Card by @emibaylor in #3905
Create README.md by @sashavor in #3917
Create README.md for CER metric by @sashavor in #3911
Create README.md by @sashavor in #3944
Update README.md by @sashavor in #3933
Create SARI metric card by @sashavor in #3932
Create MAUVE metric card by @sashavor in #3934
Create CoVAL metric card by @sashavor in #3940
Google BLEU Metric Card by @emibaylor in #3948
Create metric card for BERTScore by @sashavor in #3966
Rename wer to cer by @pmgautam in #4012
Create metric card for XNLI by @sashavor in #4046
Create metric card for the Code Eval metric by @sashavor in #4049
Add TER metric card by @emibaylor in #3981
BLEU metric card by @emibaylor in #3947
Create metric card for CUAD by @sashavor in #4043
Create metric card for METEOR by @sashavor in #4065
Create a metric card for Competition MATH by @sashavor in #4073
Create metric card for seqeval by @sashavor in #4070
Create README.md by @sashavor in #3930
Create metric card for Frugal Score by @sashavor in #4089
Updating FrugalScore metric card by @sashavor in #4097
Proposing WikiSplit metric card by @sashavor in #4098
Fix formatting in BLEU metric card by @mariosasko in #4157

Documentation

Doc maintenance by @stevhliu in #3926
[Doc] Don't use v for version tags on GitHub by @sgugger in #3943
Use templates for doc-building jobs by @sgugger in #3914
Add align_labels_with_mapping docs by @stevhliu in #3931
Add tip on how to speed up loading with ImageFolder by @mariosasko in #3980
Fix main_classes docs index by @lhoestq in #3925
More consistent references in docs by @mariosasko in #3988
Docs maintenance by @stevhliu in #3999
Add ROUGE Metric Card by @emibaylor in #4076
Add chrF(++) Metric Card by @emibaylor in #4082
Add SacreBLEU Metric Card by @emibaylor in #4083

General improvements and bug fixes

Fix flatten of complex feature types by @mariosasko in #3723
Fix flatten of Sequence feature type by @lhoestq in #3962
Exclude Google Drive tests of the CI by @lhoestq in #3982
Close PIL.Image file handler in Image.decode_example by @mariosasko in #3995
Fix Faiss custom_index device by @albertvillanova in #3987
Fix None issue with Sequence of dict by @lhoestq in #4010
Update main readme by @lhoestq in #3927
Fix map remove_columns on empty dataset by @lhoestq in #4021
Fix Audio.encode_example() when writing an array by @polinaeterna in #3998
Use audio feature in ASR task template by @lhoestq in #4006
Improve out of bounds error message by @lhoestq in #4068
Increase max retries for GitHub metrics by @albertvillanova in #4063
Fix CLI dummy data generation by @albertvillanova in #4045
Fix docs on audio feature installation by @albertvillanova in #4028
Add installation instructions to image_process doc by @mariosasko in #4072
Fix GithubMetricModuleFactory instantiation with None download_config by @albertvillanova in #4078
Increase max retries for GitHub datasets by @albertvillanova in #4079
Close parquet writer properly in push_to_hub by @lhoestq in #4081
fix typo in rename_column error message by @hunterlang in #4095
Fix BeamWriter output Parquet file by @albertvillanova in #4087
Remove unused legacy Beam utils by @albertvillanova in #4088
Hotfix failing CI tests on Windows by @albertvillanova in #4119
Update security policy by @albertvillanova in #4111
Avoid writing empty license files by @albertvillanova in #4090
Support huggingface_hub 0.5 by @lhoestq in #4106
Pretty print dataset info files by @mariosasko in #4116
Add single dataset citations for TweetEval by @gchhablani in #4137
Adjust path to datasets tutorial in How-To by @NimaBoscarino in #4147
Applied index-filters on scores in search.py by @vishalsrao in #3971
More robust cast_to_python_objects in TypedSequence by @mariosasko in #4128
Sync Features dictionaries by @mariosasko in #3997
Avoid rate limit in update hub repositories by @lhoestq in #4167

New Contributors

@bp-high made their first contribution in #3972
@ruanchaves made their first contribution in #4004
@pmgautam made their first contribution in #4012
@hunterlang made their first contribution in #4095
@trentonstrong made their first contribution in #4092
@NimaBoscarino made their first contribution in #4147
@jon-tow made their first contribution in #4103
@lijiazheng99 made their first contribution in #4151
@Datta0 made their first contribution in #3955
@vishalsrao made their first contribution in #3971

Full Changelog: 2.0.0...2.1.0