💥 We'd love to hear more about your experience with spaCy! Take our survey here.
✨ New features and improvements
- NEW: `spancat_singlelabel` pipeline component for multi-class and non-overlapping span classification. The `spancat_singlelabel` component predicts at most one label for each suggested span and adds a new setting `allow_overlap` to restrict the output to non-overlapping spans (#11365).
- Extend support to mypy v1.0 (#12245).
- Use `transformer` + CNN for efficient GPU `textcat` with `spacy init config` (#11900).
- Support trainable lemmatizer in `spacy debug data` (#11419).
- Add new operators to dependency matcher for left/right immediate child/parent nodes (`>+`, `>-`, `<+`, `<-`) (#12334).
- Add `spacy.PlainTextCorpusReader.v1` for plain text input (#12122).
- Add `alignment_mode` and `span_id` to `Span.char_span()` (#12145, #12196).
- Use string formatting types in logging calls (#12215).
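
As a rough illustration of the new dependency matcher operators, the sketch below builds a small hand-annotated `Doc` and uses `>+` to match a verb with a child that sits immediately to its right. The sentence, the heads and dependency labels, and the pattern name `VERB_OBJ` are made up for this example, and it assumes spaCy v3.5.1 or later is installed.

```python
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-built parse (illustrative only): "reads" is the root,
# "She" and "books" are its children.
vocab = Vocab()
words = ["She", "reads", "books"]
heads = [1, 1, 1]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "dobj"]
doc = Doc(vocab, words=words, heads=heads, deps=deps)

matcher = DependencyMatcher(vocab)
pattern = [
    # Anchor node: the verb.
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "reads"}},
    # ">+": "object" must be a child of "verb" located directly to its right.
    {
        "LEFT_ID": "verb",
        "REL_OP": ">+",
        "RIGHT_ID": "object",
        "RIGHT_ATTRS": {"DEP": "dobj"},
    },
]
matcher.add("VERB_OBJ", [pattern])

matches = matcher(doc)
for match_id, token_ids in matches:
    # token_ids follow the order of the nodes in the pattern.
    print(vocab.strings[match_id], [doc[i].text for i in token_ids])
```

Swapping `>+` for `>-` in this sketch would instead look for a child immediately to the left of the verb, and `<+` / `<-` apply the same adjacency constraint to the parent rather than the child.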
🔴 Bug fixes
- #12017: Improve speed for `top_k>1` in trainable lemmatizer.
- #12048: Make `test_cli_find_threshold()` test more robust.
- #12227: Fix return type of `registry.find()`.
- #12272: Fix speed regression for `Matcher` patterns with extension attributes.
- #12287: Add `grc` to languages with lexeme norms in `spacy-lookups-data`.
- #12320: Make generation of empty `KnowledgeBase` instances configurable.
- #12343: Fix error message for displacy `auto_select_port`.
- #12347: Fix length check for knowledge base in entity linker, add `InMemoryLookupKB.is_empty`.
- #12365: Fix types for `Lexeme.orth` and `Lexeme.lower`.
- #12366: Raise error for non-default vectors with `PretrainVectors`.
- #12368: Partially address pending deprecation of `pkg_resources`.
- Various improvements and fixes for the test suite (#12148, #12157, #12210, #12303, #12372).
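
For the knowledge base change in #12347, a minimal sketch of how `InMemoryLookupKB.is_empty` might be used is shown below. It assumes spaCy v3.5.1 or later and that `is_empty` is called as a method; the entity ID `Q42`, its frequency and the three-dimensional vector are placeholder values chosen only for the example.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

# A fresh knowledge base with no entities (vector length is arbitrary here).
kb = InMemoryLookupKB(Vocab(), entity_vector_length=3)
empty_before = kb.is_empty()

# Add one placeholder entity, then check again.
kb.add_entity(entity="Q42", freq=10, entity_vector=[0.1, 0.2, 0.3])
empty_after = kb.is_empty()

print(empty_before, empty_after)
```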
📖 Documentation and examples
- Many website updates to improve accessibility.
- Various documentation corrections and updates.
- New projects:
  - Span labeling datasets
  - Comparing embedding layers in spaCy from the technical report "Multi hash embeddings in spaCy"
👥 Contributors
@adrianeboyd, @andyjessen, @danieldk, @essenmitsosse, @honnibal, @ines, @itssimon, @kadarakos, @kwhumphreys, @ljvmiranda921, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @shadeMe, @svlandeg, @tanloong, @thomashacker, @victorialslocum