Datasets Changes
- New: Add Russian SuperGLUE #2668 (@slowwavesleep)
- New: Add Disfl-QA #2473 (@bhavitvyamalik)
- New: Add TimeDial #2476 (@bhavitvyamalik)
- Fix: Enumerate all ner_tags values in WNUT 17 dataset #2713 (@albertvillanova)
- Fix: Update WikiANN data URL #2710 (@albertvillanova)
- Fix: Update PAN-X data URL in XTREME dataset #2715 (@albertvillanova)
- Fix: C4 - en subset by modifying dataset_info with correct validation infos #2723 (@thomasw21)
General improvements and bug fixes
- fix: 🐛 change string format to allow copy/paste to work in bash #2694 (@severo)
- Update BibTeX entry #2706 (@albertvillanova)
- Print absolute local paths in load_dataset error messages #2684 (@mariosasko)
- Add support for disable_progress_bar on Windows #2696 (@mariosasko)
- Ignore empty batch when writing #2698 (@pcuenca)
- Fix shuffle on IterableDataset that disables batching in case any functions were mapped #2717 (@amankhandelia)
- fix: 🐛 fix two typos #2720 (@severo)
- Docs details #2690 (@severo)
- Deal with the bad check in test_load.py #2721 (@mariosasko)
- Pass use_auth_token to request_etags #2725 (@albertvillanova)
- Typo fix tokenize_exemple #2726 (@shabie)
- Fix IndexError while loading Arabic Billion Words dataset #2729 (@albertvillanova)
- Add missing parquet known extension #2733 (@lhoestq)