Datasets changes
- New: NLI-Tr (#787)
- New: Amazon Reviews (#791)(#844)(#845)(#799)
- New: ASNQ - answer sentence selection (#780)
- New: OpenBookCorpus (#856)
- New: ASLG-PC12 - sign language translation (#731)
- New: Quail - question answering dataset (#747)
- Update: SNLI - Create dataset card snli.md (#663)
- Update: csv - Use pandas reader in csv (#857)
  - Better memory management
  - Breaking: the previous read_options, parse_options and convert_options are replaced with plain parameters like those of pandas.read_csv
- Update: conll2000, conll2003, germeval_14, wnut_17, XTREME PAN-X - Create ClassLabel for labelling tasks datasets (#850)
  - Breaking: labels now use ClassLabel features instead of string features, and column names were updated for consistency
- Update: XNLI - Add XNLI train set (#781)
- Update: XSUM - Use full released xsum dataset (#754)
- Update: CompGuessWhat - New version of CompGuessWhat?! with refined annotations (#748)
- Update: CLUE - add OCNLI, a new CLUE dataset (#742)
- Fix: KOR-NLI - Fix csv reader (#855)
- Fix: Discofuse - fix discofuse urls (#793)
- Fix: Emotion - fix description (#745)
- Fix: TREC - update urls (#740)
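
For the ClassLabel change listed above (#850), the practical effect is that label columns hold integer ids rather than strings, with the label names stored alongside the feature. The sketch below is a plain-Python stand-in meant only to illustrate that mapping. `LabelCodec` is a hypothetical class, not the library's ClassLabel, and the tag names and their order are made up for the example rather than taken from any of the listed datasets.

```python
class LabelCodec:
    """Hypothetical stand-in for a ClassLabel-style feature (not the library class)."""

    def __init__(self, names):
        # Keep the ordered label names and a reverse lookup from name to id.
        self.names = list(names)
        self._index = {name: i for i, name in enumerate(self.names)}

    def str2int(self, name):
        # Map a label string to its integer id.
        return self._index[name]

    def int2str(self, idx):
        # Map an integer id back to its label string.
        return self.names[idx]


# Illustrative tag set; the real datasets define their own names and order.
ner_tags = LabelCodec(["O", "B-PER", "I-PER", "B-LOC", "I-LOC"])

old_style = ["B-PER", "I-PER", "O", "B-LOC"]           # string labels (before)
new_style = [ner_tags.str2int(t) for t in old_style]   # integer ids (after)
restored = [ner_tags.int2str(i) for i in new_style]    # back to strings
```

Code that compared labels against strings would, under this assumption, need to either compare against integer ids or convert the ids back to names first.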
Metrics changes
- New: accuracy, precision, recall and F1 metrics (#825)
- Fix: squad_v2 (#840)
- Fix: seqeval (#810)(#738)
- Fix: Rouge - fix description (#774)
- Fix: GLUE - fix description (#734)
- Fix: BertScore - fix custom baseline (#763)
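
As a reminder of what the new accuracy, precision, recall and F1 metrics (#825) measure, here is a minimal plain-Python sketch of the standard definitions for binary labels. The function name `binary_classification_scores` and its arguments are hypothetical and chosen for illustration. This is not the library's implementation, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def binary_classification_scores(references, predictions, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(references, predictions))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    correct = sum(1 for r, p in pairs if r == p)

    accuracy = correct / len(pairs)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


references = [0, 1, 1, 0, 1]
predictions = [0, 1, 0, 0, 1]
scores = binary_classification_scores(references, predictions)
```

In this example there are two true positives, no false positives and one false negative, so precision is 1.0 and recall is 2/3.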
Command line tools
- Add clear_cache parameter to the test command (#863)
Dependencies
- Integrate file_lock inside the lib for better logging control (#859)
Dataset features
- Add writer_batch_size attribute to GeneratorBasedBuilder (#828)
- Pretty print dataset objects (#725)
- Allow custom split names in text dataset (#776)
Tests
- Testing all configs is now a slow test