✨ New features and improvements
- NEW: Provide scores for the `SpanCategorizer` predictions.
- NEW: Broader compatibility with type checkers thanks to `.pyi` stub files.
- NEW: Auto-detect package dependencies in `spacy package`.
- New `INTERSECTS` operator for the Matcher.
- More debugging info for `spacy project` `push` and `pull` commands.
- Allow passing in a precomputed array for speeding up multiple `Span.as_doc` calls.
- The default `da` transformer is now the same as the one from the trained pipelines (`Maltehb/danish-bert-botxo`).
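
The `INTERSECTS` operator sits alongside the Matcher's other set predicates, `IS_SUBSET` and `IS_SUPERSET`. As a rough illustration of the semantics only (plain Python, not spaCy's implementation; the `check` helper and the feature values are hypothetical), each predicate compares a token's list-valued attribute against the list given in the pattern:

```python
from typing import Callable, Dict, Iterable, Set

# Plain-Python model of the set predicates. This is NOT spaCy's code;
# it only mirrors the documented meaning of each operator.
SET_PREDICATES: Dict[str, Callable[[Set[str], Set[str]], bool]] = {
    # every token value appears in the pattern list
    "IS_SUBSET": lambda value, given: value <= given,
    # every pattern value appears among the token values
    "IS_SUPERSET": lambda value, given: value >= given,
    # at least one value is shared
    "INTERSECTS": lambda value, given: bool(value & given),
}


def check(operator: str, token_value: Iterable[str], pattern_list: Iterable[str]) -> bool:
    """Hypothetical helper: evaluate one set predicate for one token."""
    return SET_PREDICATES[operator](set(token_value), set(pattern_list))


# Hypothetical morphological features of a single token.
features = ["Number=Sing", "Person=3", "Tense=Pres"]

print(check("INTERSECTS", features, ["Number=Sing", "Number=Plur"]))  # True
print(check("IS_SUBSET", features, ["Number=Sing", "Number=Plur"]))   # False
print(check("IS_SUPERSET", features, ["Number=Sing", "Person=3"]))    # True
print(check("INTERSECTS", features, ["Tense=Past"]))                  # False
```

In an actual Matcher pattern the operator is written as a nested dict on a list-valued attribute such as `MORPH`, for example `{"MORPH": {"INTERSECTS": ["Number=Sing", "Number=Plur"]}}`.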
🔴 Bug fixes
- Fix issue #8767: Fix offsets of empty and out-of-bounds spans.
- Fix issue #8774: Ensure `debug data` runs correctly with a custom tokenizer.
- Fix issue #8784: Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.
- Fix issue #8796: Respect the `no_skip` value for `spacy project run`.
- Fix issue #8810: Make `ConsoleLogger` flush after each logging line.
- Fix issue #8819: Pass `exclude` when serializing the vocab.
- Fix issue #8830: Avoid adding sourced vectors hashes if not necessary.
- Fix issue #8970: Fix `allow_overlap` default for span categorizer scoring.
- Fix issue #8982: Add glossary entry for `_SP`.
- Fix issue #9007: Fix span categorizer training on nested entities.
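
Flushing after each logging line, as in the `ConsoleLogger` fix, matters mostly when output is piped or redirected, since buffered text streams can hold lines back until the buffer fills. A generic sketch of that effect using only the standard library (this is not spaCy's code; the stream setup, the `log_line` helper and the log text are hypothetical):

```python
import io


def log_line(stream: io.TextIOBase, text: str, flush: bool) -> None:
    """Hypothetical helper: write one line and optionally flush it."""
    stream.write(text + "\n")
    if flush:
        stream.flush()


# A buffered text stream on top of an in-memory byte sink, standing in
# for a redirected stdout. newline="\n" keeps line endings platform-independent.
raw = io.BytesIO()
stream = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf8", newline="\n")

log_line(stream, "step 1", flush=False)
unflushed = raw.getvalue()  # nothing has reached the sink yet

log_line(stream, "step 2", flush=True)
flushed = raw.getvalue()    # both lines arrive once the stream is flushed

print(unflushed)
print(flushed)
```

Without the explicit flush, the first line only becomes visible when a later write flushes the stream or the stream is closed.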
📖 Documentation and examples
- New developer documentation covering spaCy's internals and code conventions.
- Added a documentation section on preparing training data in spaCy's binary format.
- Updated some error/log messages to be more informative.
- Various updates to the documentation.
- A few new additions to the spaCy universe.
👥 Contributors
@adrianeboyd, @bbieniek, @DuyguA, @ezorita, @HLasse, @honnibal, @ines, @kabirkhan, @kevinlu1248, @ldorigo, @Ledenel, @nsorros, @polm, @svlandeg, @swfarnsworth, @themrmax, @thomashacker