pypi datasets 1.1.3

3 years ago

Datasets changes

  • New: NLI-Tr (#787)
  • New: Amazon Reviews (#791)(#844)(#845)(#799)
  • New: ASNQ - answer sentence selection (#780)
  • New: OpenBookCorpus (#856)
  • New: ASLG-PC12 - sign language translation (#731)
  • New: Quail - question answering dataset (#747)
  • Update: SNLI - Create dataset card snli.md (#663)
  • Update: csv - Use the pandas reader in the csv loader (#857)
    • Better memory management
    • Breaking: the previous read_options, parse_options and convert_options arguments are replaced with plain keyword parameters, as in pandas.read_csv
  • Update: conll2000, conll2003, germeval_14, wnut_17, XTREME PAN-X - Create ClassLabel for labelling tasks datasets (#850)
    • Breaking: labels now use ClassLabel features instead of string features, and column names have been updated for consistency
  • Update: XNLI - Add XNLI train set (#781)
  • Update: XSUM - Use full released xsum dataset (#754)
  • Update: CompGuessWhat - New version of CompGuessWhat?! with refined annotations (#748)
  • Update: CLUE - Add OCNLI, a new CLUE dataset (#742)
  • Fix: KOR-NLI - Fix csv reader (#855)
  • Fix: Discofuse - Fix Discofuse URLs (#793)
  • Fix: Emotion - Fix description (#745)
  • Fix: TREC - Update URLs (#740)
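
The ClassLabel change above means that label columns in the affected datasets hold integer class ids rather than strings. The following is a minimal sketch of that idea in plain Python, so it runs without the library installed. The label names, the column names and the two helper functions are hypothetical and chosen only for illustration; the library's ClassLabel feature provides its own conversion helpers, and each dataset defines its own label list.

```python
# Hypothetical label names, for illustration only; each dataset defines its own list.
LABEL_NAMES = ["O", "B-PER", "I-PER"]


def encode_tags(tags, names):
    """Map string tags to integer class ids (their position in `names`)."""
    index = {name: i for i, name in enumerate(names)}
    return [index[tag] for tag in tags]


def decode_tags(ids, names):
    """Map integer class ids back to their string tags."""
    return [names[i] for i in ids]


# A row as older code might have seen it: labels stored as plain strings.
old_row = {"tokens": ["Ada", "Lovelace", "wrote"], "tags": ["B-PER", "I-PER", "O"]}

# The same row with integer class ids, the shape a ClassLabel-style column has.
new_row = {
    "tokens": old_row["tokens"],
    "tags": encode_tags(old_row["tags"], LABEL_NAMES),
}

# Code that still needs the strings can decode the ids again.
round_trip = decode_tags(new_row["tags"], LABEL_NAMES)
```

Code that previously compared labels against string literals would need a decoding step like the one above, or would need to compare against the integer ids directly.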

Metrics changes

  • New: accuracy, precision, recall and F1 metrics (#825)
  • Fix: squad_v2 (#840)
  • Fix: seqeval (#810)(#738)
  • Fix: Rouge - Fix description (#774)
  • Fix: GLUE - Fix description (#734)
  • Fix: BertScore - Fix custom baseline (#763)
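
For readers unfamiliar with the four new classification metrics, here is a minimal sketch of what accuracy, precision, recall and F1 compute in the binary case. It is written in plain Python and is not the library's implementation; the function names, the `positive` parameter and the sample data are assumptions made for illustration.

```python
def confusion_counts(predictions, references, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for pred, ref in zip(predictions, references):
        if pred == positive and ref == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif ref == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_scores(predictions, references, positive=1):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    tp, fp, fn, tn = confusion_counts(predictions, references, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # F1 written as 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Made-up sample data: 4 true positives, 1 false positive,
# 1 false negative, 2 true negatives.
references = [0, 1, 1, 0, 1, 0, 1, 1]
predictions = [0, 1, 0, 0, 1, 1, 1, 1]
scores = binary_scores(predictions, references)
```

With this sample, accuracy is 6/8 and precision, recall and F1 are each 4/5. Returning 0.0 when a denominator is zero is a design choice of this sketch; other implementations may warn or raise instead.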

Command line tools

  • Add a clear_cache parameter to the test command (#863)

Dependencies

  • Integrate file_lock into the library for better logging control (#859)

Dataset features

  • Add writer_batch_size attribute to GeneratorBasedBuilder (#828)
  • Pretty-print dataset objects (#725)
  • Allow custom split names in the text dataset (#776)

Tests

  • The all-configs test is now marked as a slow test

Bug fixes

  • Make the save function use a deterministic order for global variables (#819)
  • Fix pickling of type hints in Python 3.6 (#818)
  • Fix metric deletion when attributes are missing (#782)
  • Fix custom builder caching (#770)
  • Fix metric with cache dir (#772)
  • Fix train_test_split output format (#719)
