✨ New features and improvements
- NEW: Luganda language support (#10847).
- NEW: Latin language support (#11349).
- NEW: `spacy.ConsoleLogger.v2` optionally saves training logs to JSONL (#11214).
- NEW: Operators for the `DependencyMatcher` that match parents or children to the left or the right of the node (#10371).
- Prebuilt Python 3.11 wheels are now available for all spaCy dependencies distributed by @explosion.
- Support pydantic v1.10 and mypy 0.980+, drop mypy support for Python 3.6 (#11546, #11635).
- Support CuPy v11 and add extras for `cuda11x` and `cuda-autodetect` (using `cupy-wheel`) (#11279).
- Support custom attributes for tokens and spans in `Doc.to_json()` and `Doc.from_json()` (#11125).
- Make the `enable` and `disable` options for `spacy.load()` more consistent (#11459).
- Allow a single string argument for `disable`/`enable`/`exclude` for `spacy.load()` (#11406).
- New `--url` flag for `spacy info` to print the direct download URL for a pipeline (#11175).
- Add a check for missing requirements in the `spacy project` CLI (#11226).
- Add a Levenshtein distance function (#11418).
- Improvements to the `spacy debug data` CLI for spancat data (#11504).
- Allow overriding `spacy_version` in `spacy package` metadata (#11552).
- Improve the error message when using the wrong command for `spacy project assets` (#11458).
- Ensure parent directories are created when storing the results of the `spacy pretrain` command (#11210).
- Extend support to newer versions of `natto-py` for the `ko` extra (#11222).
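The new `DependencyMatcher` operators add a positional constraint on top of the head/child relation: a child (or parent) must sit to the left or to the right of the anchor token. The sketch below shows only those four relations as plain predicates over a hypothetical list of head indices; it does not use the `DependencyMatcher` API, and the helper names are made up for illustration.

```python
from typing import List

# Hypothetical parse of "She quickly ate red apples":
# HEADS[i] is the index of the head of token i; the root ("ate") points to itself.
WORDS = ["She", "quickly", "ate", "red", "apples"]
HEADS = [2, 2, 2, 4, 2]


def is_parent(a: int, b: int) -> bool:
    """True if token a is the immediate head of token b (the root has no parent)."""
    return a != b and HEADS[b] == a


def right_children(a: int) -> List[int]:
    """Children of a that appear after a in the sentence."""
    return [b for b in range(len(WORDS)) if is_parent(a, b) and a < b]


def left_children(a: int) -> List[int]:
    """Children of a that appear before a in the sentence."""
    return [b for b in range(len(WORDS)) if is_parent(a, b) and a > b]


def right_parent(a: int) -> List[int]:
    """The parent of a, if it appears after a (empty list otherwise)."""
    return [b for b in range(len(WORDS)) if is_parent(b, a) and a < b]


def left_parent(a: int) -> List[int]:
    """The parent of a, if it appears before a (empty list otherwise)."""
    return [b for b in range(len(WORDS)) if is_parent(b, a) and a > b]


print([WORDS[i] for i in left_children(2)])   # ['She', 'quickly']
print([WORDS[i] for i in right_children(2)])  # ['apples']
print([WORDS[i] for i in right_parent(3)])    # ['apples']
print([WORDS[i] for i in left_parent(4)])     # ['ate']
```

In the `DependencyMatcher` itself, relations like these are written as operators in a pattern's `REL_OP` field; check the `DependencyMatcher` API documentation for the exact operator symbols your spaCy version supports.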
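The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. As a rough illustration of what such a function computes, here is a minimal dynamic-programming sketch in plain Python; the name `levenshtein` is hypothetical and this is not spaCy's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # Keep only the previous row of the DP table, so memory is O(len(b)).
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```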
📦 Trained pipelines updates
This release includes updated English pipelines for spaCy v3.4 with improved NER performance. The updates in `en_core_web_*` v3.4.1 address issues related to training from data with partial named entity annotation, which led to lower NER recall in English pipeline versions v3.0.0–v3.4.0. In particular, the earlier pipeline versions did not consistently predict entities that appear in the sections of the OntoNotes training data without NER annotation, such as names and places that are frequent in the Biblical sections, e.g., "David" and "Egypt" (see #7493).
Use `spacy download` to update your English pipelines to the newest version. If you'd prefer to keep using an earlier version, you can specify the version directly, e.g. `spacy download -d en_core_web_sm-3.4.0`. You can check that you are using the new version (v3.4.1) with `spacy validate`:

    NAME             SPACY            VERSION
    en_core_web_md   >=3.4.0,<3.5.0   3.4.1     ✔
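The `SPACY` column shows the range of spaCy versions the pipeline declares compatibility with. To illustrate how a range like `>=3.4.0,<3.5.0` is read, here is a minimal sketch that handles only `>=` and `<` clauses on plain `X.Y.Z` versions. `satisfies` is a hypothetical helper, not part of spaCy; real tooling uses full version-specifier parsing (for example the `packaging` library).

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '3.4.1' into (3, 4, 1) so versions compare as tuples."""
    return tuple(int(part) for part in v.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a plain X.Y.Z version against comma-separated >= and < clauses only."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if v < parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if v >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("3.4.2", ">=3.4.0,<3.5.0"))  # True
print(satisfies("3.5.0", ">=3.4.0,<3.5.0"))  # False
```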
🔴 Bug fixes
- #11275: Fix Dutch noun chunks to skip overlapping spans.
- #11276: Fix regex invalid escape sequences.
- #11312: Better handling of unexpected types in `SetPredicate`.
- #11460: Fix config validation failures caused by NVTX pipeline wrappers.
- #11506: Avoid unwanted side effects in `Doc.__init__`.
- #11540: Preserve missing entity annotation in augmenters.
- #11592: Fix issues with DVC commands.
- #11631: Fix initialization for `pymorphy2_lookup` lemmatizer mode for Russian and Ukrainian.
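The Dutch noun chunk fix (#11275) is about not producing spans that overlap ones already produced. The general idea can be sketched in plain Python, assuming spans are `(start, end)` token offsets with an exclusive end; `skip_overlapping` is a hypothetical helper and not the code in spaCy's Dutch syntax iterators.

```python
from typing import Iterable, Iterator, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def skip_overlapping(spans: Iterable[Span]) -> Iterator[Span]:
    """Yield spans in document order, skipping any that overlap a span already yielded."""
    prev_end = 0
    for start, end in sorted(spans):
        # Spans are sorted by start and kept spans never overlap, so a new span
        # overlaps an earlier kept span exactly when it starts before the last kept one ends.
        if start >= prev_end:
            yield (start, end)
            prev_end = end


print(list(skip_overlapping([(0, 2), (1, 3), (3, 5), (4, 6)])))  # [(0, 2), (3, 5)]
```

This "first span wins" rule is one simple choice; spaCy's `spacy.util.filter_spans` utility instead prefers the longest span when spans overlap.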
⚠️ Backwards incompatibilities
- If you're using a custom component that does not return a `Doc` type, an error will now be raised (#11424).
- If you're using a dot in a factory name, an error is raised as this is not supported (#11336).
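A common way to run into the new error from #11424 is a custom component that modifies the `Doc` in place but forgets to `return doc`. The sketch below mimics that kind of check outside spaCy, using a stand-in `Doc` class and a hypothetical `run_pipeline` function; spaCy's actual error type and message differ.

```python
from typing import Callable, List


class Doc:
    """Stand-in for spacy.tokens.Doc, used only for this illustration."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.flags: List[str] = []


def good_component(doc: Doc) -> Doc:
    doc.flags.append("good")
    return doc  # hand the Doc on to the next component


def bad_component(doc: Doc):
    doc.flags.append("bad")
    # Bug: no `return doc`, so this function implicitly returns None.


def run_pipeline(text: str, components: List[Callable]) -> Doc:
    """Apply components in order, failing fast if one does not return a Doc."""
    doc = Doc(text)
    for component in components:
        result = component(doc)
        if not isinstance(result, Doc):
            raise TypeError(
                f"{component.__name__} returned {type(result).__name__}, expected Doc"
            )
        doc = result
    return doc


print(run_pipeline("Hello", [good_component]).flags)  # ['good']
try:
    run_pipeline("Hello", [good_component, bad_component])
except TypeError as err:
    print(err)  # bad_component returned NoneType, expected Doc
```

Failing at the component that returned the wrong type points directly at the bug, instead of surfacing later as a confusing `AttributeError` on `None`.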
📖 Documentation and examples
- Added documentation for the new experimental coref component.
- Added Ukrainian trained pipelines to the website.
- Added documentation for the `spacy.models_and_pipes_with_nvtx_range.v1` callback.
- Fix English pipeline names in v3.4 release notes.
- Various fixes to the `Example` API documentation.
- Extensions and improvements to the `displacy` docs.
- Fix the example command for `spacy project dvc`.
- Update example code for `spacy-wordnet`.
- Improve API documentation around the `initialize()` function for pipeline components.
- Fix various typos and inconsistencies.
- spaCy universe additions:
  - concepCy: A spaCy wrapper for ConceptNet.
  - spaCy partial tagger: Build a CRF tagger with a partially annotated dataset.
  - Zshot: Zero- and few-shot named entity and relationship recognition.
👥 Contributors
@adrianeboyd, @bdura, @danieldk, @diyclassics, @DSLituiev, @GabrielePicco, @honnibal, @ines, @JulesBelveze, @kadarakos, @ljvmiranda921, @ninjalu, @pmbaumgartner, @polm, @radandreicristian, @richardpaulhudson, @rmitsch, @shadeMe, @stefawolf, @svlandeg, @thomashacker, @tobiusaolo, @tzussman, @yasufumy