Windows support
- Add Windows support (#644):
  - add tests and CI for Windows
  - fix numerous Windows-specific issues
- The library now fully supports Windows
Dataset changes
- New: HotpotQA (#703)
- New: OpenWebText (#660)
- New: Winogrande - add debiased subset (#655)
- Update: XNLI - update download link (#695)
- Update: text - switch to pandas reader, better memory usage, fix delimiter issues (#689)
- Update: csv - add features parameter to the CSV loader (#685)
- Fix: GAP - fix wrong computation of boolean features (#680)
- Fix: C4 - fix manual instruction function (#681)
Metric changes
- Update: ROUGE - add ROUGE-2 and ROUGE-Lsum to the metric outputs by default (#701, #702)
- Fix: SQuAD - fix kwargs description (#670)
Dataset Features
- Use multiprocess from pathos for multiprocessing (#656):
  - allow lambda functions in multiprocessed map
  - allow local functions in multiprocessed map
  - and more, as long as the functions are compatible with dill
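
To illustrate why the dill-based serializer matters for lambdas, here is a minimal sketch that is not taken from the library itself. It compares the standard `pickle` module with `dill` on a lambda; the helper `can_pickle` and the variable names are hypothetical, and `dill` is assumed to be installed (the sketch degrades gracefully if it is not).

```python
import pickle

# Standard pickle serializes functions by qualified name. A lambda is named
# "<lambda>", which cannot be looked up again, so pickling it fails.
square = lambda x: x * x


def can_pickle(obj, dumps):
    """Hypothetical helper: True if `dumps(obj)` succeeds, False otherwise."""
    try:
        dumps(obj)
        return True
    except Exception:
        return False


stdlib_ok = can_pickle(square, pickle.dumps)
print("pickle can serialize the lambda:", stdlib_ok)

try:
    import dill  # third-party package; assumed to be installed

    # dill serializes the function body itself, so the lambda round-trips.
    restored = dill.loads(dill.dumps(square))
    restored_value = restored(4)
    print("dill round-trip result:", restored_value)
except ImportError:
    restored_value = None
    print("dill is not installed; skipping the round-trip")
```

Worker processes receive the mapped function in serialized form, which is why a function that only dill can serialize could not previously be used in a multiprocessed map.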
Bug fixes
- Datasets: fix possible program hanging with tokenizers - Disable tokenizers parallelism in multiprocessed map (#688)
- Datasets: fix cast with unordered features - fix column order issue in cast (#684)
- Datasets: fix first time creation of cache directory - move cache dir root creation in builder's init (#677)
- Datasets: fix OverflowError when using negative ids - fix negative ids in slicing with an array (#679)
- Datasets: fix empty dictionaries after multiprocessing - keep new columns in transmit format (#659)
- Datasets: fix type inference for nested types - handle data alteration when trying type (#653)
- Metrics: fix compute metric with empty input - pass metric features to the reader (#654)
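
The tokenizers fix in the list above (#688) disables tokenizers parallelism inside multiprocessed map. As a hedged sketch of the same idea applied manually in your own script, the `TOKENIZERS_PARALLELISM` environment variable read by the tokenizers library can be set before any tokenizer is used. Whether this matches exactly what the fix does internally is an assumption.

```python
import os

# Assumption: set this before the first tokenizer call and before worker
# processes are started, so that forked processes inherit the setting.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print("TOKENIZERS_PARALLELISM =", os.environ["TOKENIZERS_PARALLELISM"])
```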
Documentation
- Elasticsearch integration documentation (#696)
Tests
- Use GitHub instead of AWS in remote dataset tests (#694)