Datasets Changes
- New: initial monash time series forecasting by @kashif in #3743
- New: Roman Urdu Hate Speech dataset by @bp-high in #3972
- New: Adversarial GLUE by @jxmorris12 in #3849
- New: MetaShift by @dnaveenr in #3900
- New: GSM8K by @jon-tow in #4103
- New: SBU Captions Photo by @thomasw21 in #4130
- Deprecated: Multilingual Librispeech - Deprecate dataset in favor of facebook/multilingual_librispeech by @polinaeterna in #4060
- Update (BREAKING): TIMIT - Redirect users to download data manually from LDC by @lhoestq in #4145
- Update: Wikipedia by @albertvillanova in #3821 and #3989
- Update: conll2012_ontonotesv5 - Support streaming by @albertvillanova in #4002
- Update: daily_dialog - Support streaming by @albertvillanova in #4008
- Update: id_clickbait - Support streaming by @albertvillanova in #4014
- Update: blimp - Support streaming by @albertvillanova in #4016
- Update: scan - Support streaming by @albertvillanova in #4017
- Update: yelp_review_full - Replace data URL by @lhoestq in #4018
- Update: yelp_polarity - Support streaming by @lhoestq in #4019
- Update: amazon_polarity - Replace data URL by @lhoestq in #4020
- Update: dbpedia_14 - Replace data URL by @lhoestq in #4022
- Update: xtreme - Support streaming dataset for bucc18 config by @albertvillanova in #4026
- Update: yahoo_answers_topics - Replace data URL by @lhoestq in #4023
- Update: ASSIN 2 - Replace broken Google Drive URLs with links on GitHub by @ruanchaves in #4004
- Update: xcopa - Support streaming by @albertvillanova in #4039
- Update: medical_dialog - Add configs with processed data by @albertvillanova in #4127
- Update: xtreme - Support streaming for udpos config by @albertvillanova in #4131
- Update: xtreme - Support streaming for PAWS-X config by @albertvillanova in #4132
- Update: xtreme - Support streaming for PAN-X config by @albertvillanova in #4135
- Update: SQuAD v2 - Use a constant for the articles regex by @bryant1410 in #4030
- Update: HANS - Support streaming by @mariosasko in #4155
- Fix: cats_vs_dogs - Fix checksum error by @albertvillanova in #4033
- Fix: xcopa - Fix null checksum by @albertvillanova in #4034
- Fix: amazon_us_reviews - Fix metadata (4/4/2022) by @trentonstrong in #4092
Dataset Cards
- Updated annotations for nli_tr dataset by @e-budur in #4058
- Add missing label for emotion description by @lijiazheng99 in #4151
- Remove unnecessary 'pylint disable' message in README by @Datta0 in #3955
- Improve RedCaps dataset card by @mariosasko in #4100
- Fix duplicate key in multi_news by @lhoestq in #4164
Datasets Tags and Search on the Hugging Face Hub
- Tasks alignment with models by @lhoestq in #4066
- Update datasets task tags to align tags with models by @lhoestq in #4067
Metrics Changes
- Xtreme-S Metrics by @patrickvonplaten in #3799
- Fix XTREME-S metrics by @patrickvonplaten in #3957
- Avoid info log messages from transformers in FrugalScore metric by @albertvillanova in #3938
- Add exact match metric by @emibaylor in #3899
- Fix comet metric by @lhoestq in #3945
- Add zero_division argument to precision and recall metrics by @albertvillanova in #4035
- Support float data types in pearsonr/spearmanr metrics by @albertvillanova in #4054
- Remove GLEU metric by @emibaylor in #3949
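The zero_division argument added to the precision and recall metrics in #4035 sets the value returned when the metric's denominator is zero (no positive predictions for precision, no positive references for recall). Below is a minimal pure-Python sketch of that behavior for the binary case, assuming the semantics of scikit-learn's zero_division parameter, which these metrics wrap. The helpers binary_precision and binary_recall are hypothetical and are not part of the datasets API. Also, scikit-learn's default, "warn", behaves like 0 but emits a warning, which the sketch omits.

```python
def binary_precision(predictions, references, pos_label=1, zero_division=0.0):
    """Precision = TP / (TP + FP); returns zero_division when nothing is predicted positive."""
    # Hypothetical helper: illustrates the zero_division semantics only.
    true_pos = sum(1 for p, r in zip(predictions, references) if p == pos_label and r == pos_label)
    predicted_pos = sum(1 for p in predictions if p == pos_label)
    if predicted_pos == 0:
        return float(zero_division)
    return true_pos / predicted_pos


def binary_recall(predictions, references, pos_label=1, zero_division=0.0):
    """Recall = TP / (TP + FN); returns zero_division when no reference is positive."""
    # Hypothetical helper: illustrates the zero_division semantics only.
    true_pos = sum(1 for p, r in zip(predictions, references) if p == pos_label and r == pos_label)
    actual_pos = sum(1 for r in references if r == pos_label)
    if actual_pos == 0:
        return float(zero_division)
    return true_pos / actual_pos


if __name__ == "__main__":
    # No positive predictions: precision is undefined, so zero_division is returned.
    print(binary_precision([0, 0, 0], [1, 0, 1], zero_division=1.0))  # 1.0
    # Recall is defined here (two positive references), so zero_division is not used.
    print(binary_recall([0, 0, 0], [1, 0, 1], zero_division=1.0))  # 0.0
```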
Metric Cards
- Perplexity Metric Card by @emibaylor in #3905
- Create README.md by @sashavor in #3917
- Create README.md for CER metric by @sashavor in #3911
- Create README.md by @sashavor in #3944
- Update README.md by @sashavor in #3933
- Create SARI metric card by @sashavor in #3932
- Create MAUVE metric card by @sashavor in #3934
- Create CoVAL metric card by @sashavor in #3940
- Google BLEU Metric Card by @emibaylor in #3948
- Create metric card for BERTScore by @sashavor in #3966
- Rename wer to cer by @pmgautam in #4012
- Create metric card for XNLI by @sashavor in #4046
- Create metric card for the Code Eval metric by @sashavor in #4049
- Add TER metric card by @emibaylor in #3981
- BLEU metric card by @emibaylor in #3947
- Create metric card for CUAD by @sashavor in #4043
- Create metric card for METEOR by @sashavor in #4065
- Create a metric card for Competition MATH by @sashavor in #4073
- Create metric card for seqeval by @sashavor in #4070
- Create README.md by @sashavor in #3930
- Create metric card for Frugal Score by @sashavor in #4089
- Updating FrugalScore metric card by @sashavor in #4097
- Proposing WikiSplit metric card by @sashavor in #4098
- Fix formatting in BLEU metric card by @mariosasko in #4157
Documentation
- Doc maintenance by @stevhliu in #3926
- [Doc] Don't use v for version tags on GitHub by @sgugger in #3943
- Use templates for doc-building jobs by @sgugger in #3914
- Add align_labels_with_mapping docs by @stevhliu in #3931
- Add tip on how to speed up loading with ImageFolder by @mariosasko in #3980
- Fix main_classes docs index by @lhoestq in #3925
- More consistent references in docs by @mariosasko in #3988
- Docs maintenance by @stevhliu in #3999
- Add ROUGE Metric Card by @emibaylor in #4076
- Add chrF(++) Metric Card by @emibaylor in #4082
- Add SacreBLEU Metric Card by @emibaylor in #4083
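Dataset.align_labels_with_mapping, documented in #3931, re-maps a dataset's integer class labels so they match the label ids a model expects (its label2id mapping). The pure-Python sketch below illustrates only the remapping idea on plain lists. align_label_ids is a hypothetical helper, and the label names and mappings are made-up examples. The real method operates on a Dataset and its ClassLabel feature, which the sketch does not model.

```python
def align_label_ids(label_ids, dataset_names, label2id):
    """Map integer labels indexed by dataset_names onto the ids used by label2id."""
    # Hypothetical helper illustrating the remapping; not the datasets implementation.
    missing = set(dataset_names) - set(label2id)
    if missing:
        raise ValueError(f"labels missing from label2id: {sorted(missing)}")
    # Old id -> label name -> new id.
    return [label2id[dataset_names[i]] for i in label_ids]


if __name__ == "__main__":
    dataset_names = ["contradiction", "entailment", "neutral"]  # dataset's label order
    model_label2id = {"entailment": 0, "neutral": 1, "contradiction": 2}  # model's label order
    print(align_label_ids([0, 1, 2, 1], dataset_names, model_label2id))  # [2, 0, 1, 0]
```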
General improvements and bug fixes
- Fix flatten of complex feature types by @mariosasko in #3723
- Fix flatten of Sequence feature type by @lhoestq in #3962
- Exclude Google Drive tests of the CI by @lhoestq in #3982
- Close PIL.Image file handler in Image.decode_example by @mariosasko in #3995
- Fix Faiss custom_index device by @albertvillanova in #3987
- Fix None issue with Sequence of dict by @lhoestq in #4010
- Update main readme by @lhoestq in #3927
- Fix map remove_columns on empty dataset by @lhoestq in #4021
- Fix Audio.encode_example() when writing an array by @polinaeterna in #3998
- Use audio feature in ASR task template by @lhoestq in #4006
- Improve out of bounds error message by @lhoestq in #4068
- Increase max retries for GitHub metrics by @albertvillanova in #4063
- Fix CLI dummy data generation by @albertvillanova in #4045
- Fix docs on audio feature installation by @albertvillanova in #4028
- Add installation instructions to image_process doc by @mariosasko in #4072
- Fix GithubMetricModuleFactory instantiation with None download_config by @albertvillanova in #4078
- Increase max retries for GitHub datasets by @albertvillanova in #4079
- Close Parquet writer properly in push_to_hub by @lhoestq in #4081
- Fix typo in rename_column error message by @hunterlang in #4095
- Fix BeamWriter output Parquet file by @albertvillanova in #4087
- Remove unused legacy Beam utils by @albertvillanova in #4088
- Hotfix failing CI tests on Windows by @albertvillanova in #4119
- Update security policy by @albertvillanova in #4111
- Avoid writing empty license files by @albertvillanova in #4090
- Support huggingface_hub 0.5 by @lhoestq in #4106
- Pretty print dataset info files by @mariosasko in #4116
- Add single dataset citations for TweetEval by @gchhablani in #4137
- Adjust path to datasets tutorial in How-To by @NimaBoscarino in #4147
- Apply index filters on scores in search.py by @vishalsrao in #3971
- More robust cast_to_python_objects in TypedSequence by @mariosasko in #4128
- Sync Features dictionaries by @mariosasko in #3997
- Avoid rate limit in update hub repositories by @lhoestq in #4167
New Contributors
- @bp-high made their first contribution in #3972
- @ruanchaves made their first contribution in #4004
- @pmgautam made their first contribution in #4012
- @hunterlang made their first contribution in #4095
- @trentonstrong made their first contribution in #4092
- @NimaBoscarino made their first contribution in #4147
- @jon-tow made their first contribution in #4103
- @lijiazheng99 made their first contribution in #4151
- @Datta0 made their first contribution in #3955
- @vishalsrao made their first contribution in #3971
Full Changelog: 2.0.0...2.1.0