github explosion/spaCy v3.1.0
v3.1.0: New pipelines for Catalan & Danish, SpanCategorizer for arbitrary overlapping spans, use predicted annotations during training, bug fixes & more

✨ New features and improvements

For more details, see the New in v3.1 usage guide.

📦 New trained pipelines

Package             Language   UPOS   Parser LAS   NER F
ca_core_news_sm     Catalan    98.2   87.4         79.8
ca_core_news_md     Catalan    98.3   88.2         84.0
ca_core_news_lg     Catalan    98.5   88.4         84.2
ca_core_news_trf    Catalan    98.9   93.0         91.2
da_core_news_trf    Danish     98.0   85.0         82.9

⚠️ Upgrading from v3.0

  • Because configs are extensively versioned, v3.0 pipelines should be compatible with v3.1, although you may see slight differences in performance. Test your v3.0 pipeline with v3.1 against your test suite; if the performance is identical, extend the spacy_version in your model package meta to ">=3.0.0,<3.2.0". If performance degrades, retrain your pipeline with v3.1.
  • Use spacy init fill-config to update a v3.0 config for v3.1.
  • When sourcing a pipeline component that requires static vectors, you must now include the source model's vectors in [initialize.vectors].
  • Logger warnings have been converted to Python warnings. Use warnings.filterwarnings or the new helper method spacy.errors.filter_warning(action, error_msg='') to manage warnings.
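
Since these are now ordinary Python warnings, the standard library's warnings.filterwarnings can silence them by message. The following is a minimal sketch using only the standard library; the message text "[W000] example pipeline warning" and the function emit_example_warning are hypothetical stand-ins, not actual spaCy output or API.

```python
import warnings

# Hypothetical message text, for illustration only; real spaCy warnings
# have their own codes and wording, which are not reproduced here.
EXAMPLE_MSG = "[W000] example pipeline warning"


def emit_example_warning():
    """Stand-in for library code that raises a Python warning."""
    warnings.warn(EXAMPLE_MSG, UserWarning)


# Without a message filter, the warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    emit_example_warning()
unfiltered_count = len(caught)

# With an "ignore" filter on the message prefix, it is suppressed.
# The message argument is a regular expression matched against the
# start of the warning text, so the brackets must be escaped.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", message=r"\[W000\]")
    emit_example_warning()
filtered_count = len(caught)

print(unfiltered_count, filtered_count)
```

Passing "error" instead of "ignore" as the action would turn the matching warning into an exception, which can be useful in a test suite.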

For more information, see Notes on upgrading from v3.0.
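
To illustrate what the ">=3.0.0,<3.2.0" range from the upgrade notes above admits, here is a simplified sketch in plain Python. The helpers parse_version and in_range are hypothetical and handle only plain "X.Y.Z" strings; real version specifiers also cover pre-releases and other forms that this sketch ignores.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower="3.0.0", upper="3.2.0"):
    """Return True if lower <= version < upper (upper bound excluded)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Check a few candidate spaCy versions against the extended range.
results = {v: in_range(v) for v in ["2.3.5", "3.0.6", "3.1.0", "3.2.0"]}
for version, ok in results.items():
    print(version, "accepted" if ok else "rejected")
```

Under these assumptions, any 3.0.x or 3.1.x version falls inside the range, while 3.2.0 and later are excluded.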

🔴 Bug fixes

  • Fix issue #7036: Use a context manager when reading model.
  • Fix issue #7629: Fix scoring normalization.
  • Fix issue #7799: Ensure spacy ray command works.
  • Fix issue #7807: Show warning if entity ruler runs without patterns.
  • Fix issue #7886: Fix unknown tokens percentage in debug data.
  • Fix issue #7930: Make EntityLinker robust for nO=None.
  • Fix issue #7925: Skip vector ngram backoff if minn is not set.
  • Fix issue #7973: Fix debug model for transformers.
  • Fix issue #7988: Preserve existing ENT_KB_ID in ner annotation.
  • Fix issue #8004: Handle errors while multiprocessing.
  • Fix issue #8009: Fix Doc.from_docs() for all empty docs.
  • Fix issue #8012: Fix ensemble textcat with listener.
  • Fix issue #8054: Add ENT_ID and NORM to DocBin strings.
  • Fix issue #8055: Handle partial entities in Span.as_doc.
  • Fix issue #8062: Make all Span attrs writable.
  • Fix issue #8066: Update debug data for textcat.
  • Fix issue #8069: Custom warning if DocBin is too large.
  • Fix issue #8099: Update Vietnamese tokenizer.
  • Fix issue #8113: Support to/from_bytes for KnowledgeBase and EntityLinker.
  • Fix issue #8116: Fix offsets in Span.get_lca_matrix.
  • Fix issue #8132: Remove unsupported attrs from attrs.IDS.
  • Fix issue #8158: Ensure tolerance is passed on in spacy.batch_by_words.v1.
  • Fix issue #8169: Fix bug from EntityRuler: ent_ids returns None for phrases.
  • Fix issue #8208: Address missing config overrides post load of models.
  • Fix issue #8212: Add all symbols in Unicode Currency Symbols to currency characters.
  • Fix issue #8216: Don't add duplicate patterns in EntityRuler.
  • Fix issue #8265: Address mypy errors.
  • Fix issue #8335: Raise error if deps not provided with heads in Doc.
  • Fix issue #8368: Preserve whitespace in Span.lemma_.
  • Fix issue #8388: Don't clobber vectors when loading components from source models.
  • Fix issue #8421: Fix non-deterministic deduplication in Greek lemmatizer.
  • Fix issue #8426: Fix setting empty entities in Example.from_dict.
  • Fix issue #8441: Add correct types for Language.pipe return values.
  • Fix issue #8487: Fix span offsets and keys in Doc.from_docs.
  • Fix issue #8559: Fix vectors check for sourced components.
  • Fix issue #8584: Raise an error for textcat with <2 labels.

👥 Contributors

@aajanki, @adrianeboyd, @bodak, @bryant1410, @dhruvrnaik, @explosion-bot, @fhopp, @frascuchon, @graue70, @gtoffoli, @honnibal, @ines, @jacopofar, @jenojp, @jhroy, @jklaise, @juliensalinas, @kevinlu1248, @ldorigo, @mathcass, @meghanabhange, @michael-k, @narayanacharya6, @NirantK, @nsorros, @polm, @sevdimali, @svlandeg, @themrmax, @xadrianzetx, @yohasebe, @ZeeD
