Stanza v1.4.0: Transformer integration into NER and conparse

Overview

In this release, we integrate transformer inputs into the NER and constituency parser (conparse) modules. We also now support several additional languages for NER and conparse.

Pipeline interface improvements

  • Download resources.json and models into temporary directories first, to avoid race conditions between multiple processes
    #213
    #1001

  • Download models for Pipelines automatically, without needing to call stanza.download(...)
    #486
    #943

  • Add the ability to turn off downloads
    68455d8

  • Add a new interface where both processors and package can be set
    #917
    f370429

  • When using pretokenized input, get character offsets from the text if it is available
    #967
    #975

  • If BERT or other transformers are used, cache the models rather than loading them multiple times
    #980

  • Allow for disabling processors on individual runs of a pipeline
    #945
    #947
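The temporary-directory download change prevents one process from reading a half-written resources.json or model file while another process is still downloading it. Below is a minimal sketch of the general write-then-rename pattern. It assumes a single file rather than whole directories. atomic_write and the demo_* names are hypothetical and are not part of Stanza's API, and Stanza's actual implementation may differ.

```python
import os
import tempfile


def atomic_write(dest_path: str, data: bytes) -> None:
    """Write data to dest_path so readers never observe a partial file.

    Hypothetical helper: the bytes (standing in for a finished download) go to
    a temp file in the destination directory, then os.replace moves it into
    place, which is atomic when both paths are on the same filesystem.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Remove the partial temp file so it never lingers next to real models.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo: write a stand-in resources.json into a fresh (hypothetical) model dir.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "models", "resources.json")
atomic_write(demo_path, b'{"en": {}}')
with open(demo_path, "rb") as f:
    demo_contents = f.read()
print(demo_contents)
```

Writing the temp file into the destination directory, rather than the system temp directory, keeps the final os.replace on one filesystem, where the rename is atomic.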
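Recovering character offsets for pretokenized input amounts to aligning each token against the original text. The sketch below does this with a simple left-to-right str.find scan. token_offsets is a hypothetical helper, not Stanza's code. The start/end pairs mirror the idea of per-token start and end character positions. The sketch assumes that tokens appear verbatim and in order in the text.

```python
from typing import List, Optional, Tuple


def token_offsets(text: str, tokens: List[str]) -> List[Optional[Tuple[int, int]]]:
    """Align tokens to text left to right, returning (start_char, end_char) pairs.

    Each token is searched for starting just after the previous match, so a
    repeated word maps to its next occurrence. A token that does not occur
    verbatim gets None and does not advance the search position.
    """
    offsets: List[Optional[Tuple[int, int]]] = []
    cursor = 0
    for tok in tokens:
        start = text.find(tok, cursor)
        if start == -1:
            offsets.append(None)
            continue
        end = start + len(tok)
        offsets.append((start, end))
        cursor = end
    return offsets


demo_offsets = token_offsets(
    "Hello, world! Hello again.",
    ["Hello", ",", "world", "!", "Hello", "again", "."],
)
print(demo_offsets)  # [(0, 5), (5, 6), (7, 12), (12, 13), (14, 19), (20, 25), (25, 26)]
```

A greedy scan like this can misalign a token whose characters are absent at the expected spot but appear later in the text, for example a short token matching inside a later word. A more robust alignment would need a fallback for that case.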
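Caching transformer models means that if several processors ask for the same model, it is loaded once and then shared. This would apply, for example, if both NER and conparse used BERT. Below is a minimal sketch of that idea using functools.lru_cache. load_transformer, LOAD_COUNT, and the model-name string are hypothetical placeholders, no real weights are loaded, and this is not how Stanza necessarily implements its cache.

```python
from functools import lru_cache
from typing import Any, Dict

# Counts how many times each (hypothetical) model was actually "loaded".
LOAD_COUNT: Dict[str, int] = {}


@lru_cache(maxsize=None)
def load_transformer(model_name: str) -> Any:
    """Load a model once per name; later calls return the cached object.

    Stands in for an expensive load such as reading BERT weights; here it
    only records the call and returns a placeholder dict.
    """
    LOAD_COUNT[model_name] = LOAD_COUNT.get(model_name, 0) + 1
    return {"name": model_name}


# Two processors requesting the same model share a single loaded instance.
ner_model = load_transformer("bert-base-cased")
parser_model = load_transformer("bert-base-cased")
print(ner_model is parser_model, LOAD_COUNT)  # True {'bert-base-cased': 1}
```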

Other general improvements

  • Add # text and # sent_id comments to the CoNLL output
    #918
    #983
    #995

  • Add NER tags to the token CoNLL output
    #993
    #996

  • Fix missing Slovak MWT model
    #971
    5aa19ec

  • Upgrades to EN, IT, and Indonesian models
    #1003
    #1008
    IT improvements with the help of @attardi and @msimi

  • Fix improper tokenization of Chinese text with leading whitespace
    #920
    #924

  • Check if a CoreNLP model exists before downloading it (thank you @Internull)
    #965

  • Convert the run_charlm script to Python
    #942

  • Typing and lint fixes (thank you @asears)
    #833
    #856

  • stanza-train examples are now compatible with the Python training scripts
    #896
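The # text and # sent_id lines are standard CoNLL-U sentence-level comments. The sketch below shows one sentence rendered with those comments plus a per-token NER tag. sentence_to_conllu is a hypothetical helper. Placing the tag in the MISC column as ner=... is an assumption of this sketch, not a description of Stanza's exact output. Columns this sketch does not fill are written as "_".

```python
from typing import Dict, List


def sentence_to_conllu(sent_id: str, text: str, tokens: List[Dict[str, str]]) -> str:
    """Render one sentence as CoNLL-U with '# sent_id' and '# text' comments.

    Each token dict may hold 'form', 'lemma', 'upos', and 'ner'; missing
    fields become '_'. Writing NER into MISC as 'ner=TAG' is an assumption.
    """
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for i, tok in enumerate(tokens, start=1):
        misc = f"ner={tok['ner']}" if tok.get("ner") else "_"
        columns = [
            str(i),                   # ID
            tok.get("form", "_"),     # FORM
            tok.get("lemma", "_"),    # LEMMA
            tok.get("upos", "_"),     # UPOS
            "_", "_", "_", "_", "_",  # XPOS, FEATS, HEAD, DEPREL, DEPS (not filled here)
            misc,                     # MISC
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


demo = sentence_to_conllu("1", "Chris lives here.", [
    {"form": "Chris", "lemma": "Chris", "upos": "PROPN", "ner": "S-PERSON"},
    {"form": "lives", "lemma": "live", "upos": "VERB", "ner": "O"},
    {"form": "here", "lemma": "here", "upos": "ADV", "ner": "O"},
    {"form": ".", "lemma": ".", "upos": "PUNCT", "ner": "O"},
])
print(demo, end="")
```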

NER features

Constituency parser