✨ New features and improvements
- Alpha tokenization support for Azerbaijani.
- Updates for French stop words.
🔴 Bug fixes
- Fix issue #7629: Fix scoring normalization.
- Fix issue #7886: Fix unknown tokens percentage in `debug data`.
- Fix issue #7907: Update `load_lookups` return type and docstring.
- Fix issue #7930: Make `EntityLinker` robust for `nO=None`.
- Fix issue #7925: Skip vector ngram backoff if `minn` is not set.
- Fix issue #7973: Fix `debug model` for transformers.
- Fix issue #7988: Preserve existing `ENT_KB_ID` in `ner` annotation.
- Fix issue #7992: Fix span offsets for `Matcher(as_spans)` on spans.
- Fix issue #8004: Handle errors while multiprocessing.
- Fix issue #8009: Fix `Doc.from_docs()` for all empty docs.
- Fix issue #8012: Fix ensemble `textcat` with listener.
- Fix issue #8054: Add `ENT_ID` and `NORM` to `DocBin` strings.
- Fix issue #8055: Handle partial entities in `Span.as_doc`.
- Fix issue #8062: Make all `Span` attrs writable.
- Fix issue #8066: Update `debug data` for `textcat`.
- Fix issue #8069: Custom warning if `DocBin` is too large.
- Fix issue #8113: Support `to/from_bytes` for `KnowledgeBase` and `EntityLinker`.
- Fix issue #8116: Fix offsets in `Span.get_lca_matrix`.
- Fix issue #8132: Remove unsupported attrs from `attrs.IDS`.
- Fix issue #8158: Ensure tolerance is passed on in `spacy.batch_by_words.v1`.
- Fix issue #8169: Fix bug from `EntityRuler`: `ent_ids` returns `None` for phrases.
- Fix issue #8208: Address missing config overrides post load of models.
- Fix issue #8212: Add all symbols in Unicode Currency Symbols to currency characters.
- Fix issue #8216: Don't add duplicate patterns in `EntityRuler`.
- Fix issue #8244: Use context manager when reading model file.
- Fix issue #8245: Fix other open calls without context managers.
- Fix issue #8265: Address mypy errors.
- Fix issue #8299: Restrict `pymorphy2` requirement to `pymorphy2` mode in Russian and Ukrainian lemmatizers.
- Fix issue #8335: Raise error if deps not provided with heads in `Doc`.
- Fix issue #8368: Preserve whitespace in `Span.lemma_`.
- Fix issue #8396: Make `JsonlReader` path optional.
- Fix issue #8421: Fix non-deterministic deduplication in Greek lemmatizer.
- Fix issue #8423: Update validate CLI to fix compat and ignore warnings.
- Fix issue #8426: Fix setting empty entities in `Example.from_dict`.
- Fix issue #8487: Fix span offsets and keys in `Doc.from_docs`.
- Fix issue #8584: Raise an error for `textcat` with <2 labels.
- Fix issue #8551: Fix duplicate spacy package CLI opts.
👥 Contributors
@adrianeboyd, @bodak, @bryant1410, @dhruvrnaik, @fhopp, @frascuchon, @graue70, @ines, @jenojp, @jhroy, @jklaise, @juliensalinas, @meghanabhange, @michael-k, @narayanacharya6, @polm, @sevdimali, @svlandeg, @ZeeD