Datasets fixes
- Fix: irc_disentangle - fix checksum and bug in dataset by @albertvillanova in #4377
- Fix: CC-Aligned - fix invalid url by @juntang-zhuang in #4231
- Fix: multi_news - don't strip preceding hyphen by @JohnGiorgi in #4353
Bug fixes
- Support lists of multi-dimensional numpy arrays by @albertvillanova in #4194
- Check if dataset features match before push in DatasetDict.push_to_hub by @mariosasko in #4372
- Pin dill by @albertvillanova in #4380
  - dill 0.3.5 has some issues in transformers
  - pinning the version to <0.3.5 for now
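The idea behind #4372 can be illustrated with a small sketch: before uploading, every split of a DatasetDict should share the same feature schema, and the push should stop early if one does not. This is not the library's implementation. `check_split_features` is a hypothetical helper, features are modelled as plain dicts instead of `datasets.Features`, and raising a `ValueError` on mismatch is an assumption.

```python
from typing import Any, Dict


def check_split_features(split_features: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError if the splits do not all share the same feature schema.

    Hypothetical helper: each split's features are a plain dict mapping
    column name -> type name, standing in for datasets.Features.
    """
    if not split_features:
        return  # no splits, nothing to compare
    items = iter(split_features.items())
    # Use the first split (dicts keep insertion order) as the reference schema.
    ref_name, ref_features = next(items)
    for name, features in items:
        if features != ref_features:
            raise ValueError(
                f"Features of split {name!r} ({features}) do not match "
                f"those of split {ref_name!r} ({ref_features})"
            )


if __name__ == "__main__":
    matching = {
        "train": {"text": "string", "label": "int64"},
        "test": {"text": "string", "label": "int64"},
    }
    check_split_features(matching)  # passes silently

    mismatched = {
        "train": {"text": "string", "label": "int64"},
        "test": {"text": "string"},
    }
    try:
        check_split_features(mismatched)
    except ValueError as err:
        print(f"refusing to push: {err}")
```

Checking all splits before any upload starts avoids leaving a repository with some splits pushed under one schema and the rest rejected.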
Dataset Cards
- Adding eval metadata for ade v2 by @sashavor in #4319
- Adding eval metadata for AG News by @sashavor in #4329
- Adding eval metadata to Allociné dataset by @sashavor in #4330
- Adding eval metadata to Amazon Polarity by @sashavor in #4331
- Adding eval metadata for arabic speech corpus by @sashavor in #4332
- Adding eval metadata for Banking 77 by @sashavor in #4333
- Eval metadata Batch 4: Tweet Eval, Tweets Hate Speech Detection, VCTK, Weibo NER, Wisesight Sentiment, XSum, Yahoo Answers Topics, Yelp Polarity, Yelp Review Full by @sashavor in #4338
- Eval metadata batch 3: Reddit, Rotten Tomatoes, SemEval 2010, Sentiment 140, SMS Spam, Snips, SQuAD, SQuAD v2, Timit ASR by @sashavor in #4337
- Eval metadata batch 1: BillSum, CoNLL2003, CoNLLPP, CUAD, Emotion, GigaWord, GLUE, Hate Speech 18, Hate Speech by @sashavor in #4335
- Eval metadata batch 2 : Health Fact, Jigsaw Toxicity, LIAR, LJ Speech, MSRA NER, Multi News, NCBI Disease, Poem Sentiment by @sashavor in #4336
Docs
- Add API code examples for Builder classes by @stevhliu in #4313
- Add redirect to dataset script in the repo structure page by @lhoestq in #4369
Other improvements and bug fixes
- Fix failing CI on Windows for sari and wiki_split metrics by @albertvillanova in #4342
- Fix never ending GH Action to build documentation by @albertvillanova in #4345
- Fix warning in upload_file by @albertvillanova in #4355
- Fix warning in push_to_hub by @albertvillanova in #4357
- Remove config names as yaml keys by @lhoestq in #4367
- Add missing language tags for udhr dataset by @albertvillanova in #4371
- Remove links in docs to old dataset viewer by @mariosasko in #4373
New Contributors
- @JohnGiorgi made their first contribution in #4353
- @juntang-zhuang made their first contribution in #4231
Full Changelog: 2.2.1...2.2.2