pypi stanza 1.8.2
Old English, MWT improvements, and better memory management of Peft

27 days ago

Add an Old English pipeline, improve the handling of MWT for cases that should be easy, and improve the memory management of our usage of transformers with adapters.

Old English

MWT improvements

  • Fix words ending in -nna being incorrectly split into MWTs stanfordnlp/handparsed-treebank@2c48d40 #1366

  • Fix the English MWT model splitting tokens into pieces that do not match the original text, by enforcing that the pieces add up to the whole token (which is always the case in the English treebanks) #1371 #1378

  • Mark start_char and end_char on an MWT if it is composed of exactly its subwords 2384089 #1361
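
The "pieces add up to the whole" rule and the character offsets that follow from it can be illustrated with a small standalone sketch. This is not Stanza's implementation: `pieces_match` and `piece_offsets` are hypothetical helpers, and assigning a span to each piece is only one plausible reading of the start_char / end_char change. The assumption is that a split is trusted only when its pieces concatenate to exactly the original token.

```python
from typing import List, Optional, Tuple


def pieces_match(token: str, pieces: List[str]) -> bool:
    """Return True when the pieces concatenate to exactly the original token."""
    return "".join(pieces) == token


def piece_offsets(token: str, start_char: int,
                  pieces: List[str]) -> Optional[List[Tuple[int, int]]]:
    """Give each piece a (start_char, end_char) span inside the token.

    Returns None when the pieces do not spell out the token, because the
    spans would not correspond to real text in that case.
    """
    if not pieces_match(token, pieces):
        return None
    spans = []
    cursor = start_char
    for piece in pieces:
        spans.append((cursor, cursor + len(piece)))
        cursor += len(piece)
    return spans


# "can't" starting at character 10, split as "ca" + "n't"
print(piece_offsets("can't", 10, ["ca", "n't"]))    # [(10, 12), (12, 15)]
# "can" + "not" does not add up to the surface form, so no offsets are given
print(piece_offsets("can't", 10, ["can", "not"]))   # None
```

A split that fails the check can be rejected outright (as for English) or kept without per-piece offsets, depending on whether the language's treebanks allow pieces that differ from the surface text.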

Peft memory management

  • Previous versions loaded multiple copies of the transformer in order to use adapters. To save memory, we now use Peft's ability to attach multiple adapters to the same transformer, as long as they have different names. This allows loading just one copy of the transformer when using a Pipeline with several finetuned models. huggingface/peft#1523 #1381 #1384

Other bugfixes and minor upgrades

  • Fix crash when trying to load a previously unknown language #1360 381736f

  • Check that sys.stderr has isatty before manipulating it with tqdm, in case sys.stderr was monkeypatched: d180ae0 #1367

  • Try to avoid OOM in the POS tagger when it runs in a Pipeline by reducing its max batch length 4271813

  • Fix usage of gradient checkpointing & a weird interaction with Peft (thanks to @Jemoka) 597d48f
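
The isatty guard mentioned in this list is a general defensive pattern and can be sketched with the standard library alone. `stream_is_tty` and `FakeStderr` are hypothetical names for this example and are not taken from the Stanza code; the assumption is that a monkeypatched `sys.stderr` may be any object with a `write` method and nothing else.

```python
import sys


def stream_is_tty(stream) -> bool:
    """Return True only if the stream has a callable isatty that reports a terminal."""
    isatty = getattr(stream, "isatty", None)
    return callable(isatty) and bool(isatty())


class FakeStderr:
    """A replacement stream that, like some monkeypatches, has no isatty."""

    def write(self, text):
        pass


# Calling FakeStderr().isatty() directly would raise AttributeError
print(stream_is_tty(FakeStderr()))  # False

# Progress-bar setup can then be gated on the real stream safely
use_progress_bar = stream_is_tty(sys.stderr)
```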

Other upgrades

  • Add * to the list of functional tags to drop in the constituency parser, helping Icelandic annotation 57bfa8b #1356 (comment)

  • Can train depparse without using any of the POS columns, especially useful if training a cross-lingual parser: 4048cae 15b136b

  • Add a constituency model for German 7a4f48c 86ddaab #1368
