github explosion/spaCy v3.6.0
v3.6.0: New span finder component and pipelines for Slovenian

latest releases: release-v3.8.2, release-v3.8.0, release-v3.7.7...
16 months ago

✨ New features and improvements

  • NEW: span_finder pipeline component to identify overlapping, unlabeled spans (#12507).
  • Language updates:
    • Add initial support for Malay (#12602).
    • Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
  • Add option to return scores separately keyed by component name with spacy evaluate --per-component, Language.evaluate(per_component=True) and Scorer.score(per_component=True) (#12540).
  • Support custom token/lexeme attribute for vectors (#12625).
  • Support spancat_singlelabel in spacy debug data CLI (#12749).
  • Typing updates for PhraseMatcher and SpanGroup (#12642, #12714).

🔴 Bug fixes

  • #12569: Require that all SpanGroup spans come from the current doc.

📦 Trained pipelines updates

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

Package UPOS Parser LAS NER F
sl_core_news_sm 96.9 82.1 62.9
sl_core_news_md 97.6 84.3 73.5
sl_core_news_lg 97.7 84.3 79.0
sl_core_news_trf 99.0 91.7 90.0
  • 🙏 Special thanks to @orglce for help with the new pipelines!

The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.

The Danish pipeline da_core_news_trf has been updated to use vesteinn/DanskBERT with performance improvements across the board.

⚠️ Backwards incompatibilities

  • SpanGroup spans are now required to be from the same doc. When initializing a SpanGroup, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr

Don't miss a new spaCy release

NewReleases is sending notifications on new releases.