✨ New features and improvements
- Add the `SpanRuler` component. This component saves a list of matched spans to `Doc.spans[spans_key]`.
- Support for JSON serialization and deserialization of `Doc` objects.
- Add span analysis to `debug data`.
- Allow data assets to be made optional in a spaCy project.
- Prebuilt macOS ARM64 wheels are now available for all spaCy dependencies distributed by @explosion.
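Below is a minimal sketch of the first two additions, assuming spaCy v3.3.1 or later is installed. It uses a blank English pipeline, so no trained model is needed; the `"ORG"` pattern and the example sentence are made up for illustration. The code reads matches from `doc.spans["ruler"]` on the assumption that `"ruler"` is the default `spans_key`, then round-trips the `Doc` through a JSON string with `Doc.to_json` and `Doc.from_json`.

```python
import json

import spacy
from spacy.tokens import Doc

# Blank pipeline: only a tokenizer, no trained components required.
nlp = spacy.blank("en")

# SpanRuler writes its matches to doc.spans[spans_key]
# (spans_key is assumed to default to "ruler").
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])

doc = nlp("Explosion develops spaCy.")
matched = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(matched)

# JSON round trip: Doc -> dict -> JSON string -> dict -> new Doc.
payload = json.dumps(doc.to_json())
restored = Doc(nlp.vocab).from_json(json.loads(payload))
print(restored.text)
```

Because the matches live in a span group rather than in `doc.ents`, overlapping spans can be kept side by side.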
🔴 Bug fixes
- Fix issue #9575: Fix Entity Linker with tokenization mismatches between gold and predicted `Doc` objects.
- Fix issue #10685: Fix serialization of `SpanGroup` objects that share the same name within one `SpanGroups` container.
- Fix issue #10718: Remove debug print statements in `walk_head_nodes` to avoid acquiring the GIL.
- Fix issue #10741: Make the `StringStore.__getitem__` return type dependent on its parameter type.
- Fix issue #10734: Support removal of overlapping terms in `PhraseMatcher`.
- Fix issue #10772: Override `SpanGroups.setdefault` to also support `Iterable[SpanGroup]` as the default.
- Fix issue #10817: Ensure that the term `ROOT` is in the glossary.
- Fix issue #10830: Better errors for `Doc.has_annotation` and `Matcher`.
- Fix issue #10864: Avoid pickling `Doc` inputs passed to `Language.pipe()`.
- Fix issue #10898: Fix schemas import in `Doc`.
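The sketch below illustrates two of the fixed behaviors, assuming spaCy v3.3.1 or later. The match keys `CITY_SHORT` and `CITY_FULL` are hypothetical names chosen for the example: two `PhraseMatcher` patterns share the prefix "New York", and removing one key should leave the other pattern matching. The last part shows the `StringStore.__getitem__` contract whose return type now follows its argument: a string in gives an integer hash out, and a hash in gives the string back.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.strings import StringStore

nlp = spacy.blank("en")

# Two overlapping phrase patterns registered under different keys.
matcher = PhraseMatcher(nlp.vocab)
matcher.add("CITY_SHORT", [nlp.make_doc("New York")])
matcher.add("CITY_FULL", [nlp.make_doc("New York City")])

# Removing one key should leave the overlapping pattern intact.
matcher.remove("CITY_SHORT")
doc = nlp("I love New York City")
found = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(found)

# StringStore lookup: str -> int hash, int hash -> str.
strings = StringStore(["apple"])
key = strings["apple"]
print(type(key).__name__, strings[key])
```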
⚠️ Backward incompatibilities
- Before this release, a validation bug allowed the configuration of a pipeline component to override the component's own name in the pipeline through the `name` attribute. For example, the following pipeline component:

  ```ini
  [components.transformer]
  factory = "transformer"
  name = "custom_transformer_name"
  ```

  would be erroneously registered as `custom_transformer_name`. Such overrides are now ignored and a warning is emitted (#10779). From spaCy v3.3.1 onwards, this component is registered as `transformer`.
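If a pipeline relied on the old behavior, one way to keep the custom name is to use it as the component's section key and list it under `[nlp] pipeline`, since spaCy takes a component's name from its `[components.<name>]` section. This is a minimal config sketch that assumes the rest of the config stays unchanged; any other settings that refer to the component by name would need to use the same name.

```ini
[nlp]
pipeline = ["custom_transformer_name"]

[components.custom_transformer_name]
factory = "transformer"
```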
👥 Contributors
@adrianeboyd, @danieldk, @freddyheppell, @honnibal, @ines, @kadarakos, @ldorigo, @ljvmiranda921, @maxTarlov, @pmbaumgartner, @polm, @pypae, @richardpaulhudson, @rmitsch, @shadeMe, @single-fingal, @svlandeg