✨ New features and improvements
- Improved `parser` and `ner` speeds on long documents (see technical details in #10019).
- Support for `spancat` components in `debug data`.
- Support for `ENT_IOB` as a `Matcher` token pattern key.
- Extended and improved types for many classes.
🔴 Bug fixes
- Fix issue #9735: Make floret murmurhash endian-neutral.
- Fix issue #9738: Support string IOB values for `ENT_IOB`.
- Fix issue #9746: Updates to avoid "dictionary size changed during iteration" runtime errors.
- Fix issue #9960: Warn about entities that cross sentence boundaries in `debug data`.
- Fix issue #9979: Fix type for `Lexeme.rank`.
- Fix issue #10026: Check for 0-size assets in `spacy project`.
- Fix issue #10051: Consistently return scalars from similarity methods.
- Fix issue #10052: Fix spaces in `Doc.from_docs()` for empty docs.
- Fix issue #10079: Fix label detection in `debug data` for components with custom names.
- Fix issue #10109: Add types to `Underscore` and `DependencyMatcher` and improve types in `Language`, `Matcher` and `PhraseMatcher`.
- Fix issue #10130: Fix `Tokenizer.explain` when infixes appear as prefixes.
- Fix issue #10143: Use simple suggester in `spancat` initialization.
- Fix issue #10164: Support `IS_SENT_END` in `Doc.has_annotation`.
- Fix issue #10192: Detect invalid package names in `spacy package`.
- Fix issue #10223: Support mixed case in package names.
- Fix issue #10234: Fix type in `PhraseMatcher`.
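`Tokenizer.explain` reports which tokenizer rule produced each token, and its output is meant to line up with the actual tokenization. The fix in #10130 addresses a case where the two diverged. The minimal check below is adapted from the tokenizer-debugging pattern in the spaCy documentation and compares the two on a sample string. The text is illustrative and does not itself exercise the infix-as-prefix case.

```python
from spacy.lang.en import English

nlp = English()  # tokenizer-only English pipeline, no model download needed
text = '"Let\'s go!"'

doc = nlp(text)
# explain() returns (rule_name, substring) pairs, one per token
explanation = nlp.tokenizer.explain(text)

tokens = [t.text for t in doc if not t.is_space]
explained = [substring for _, substring in explanation]
print(tokens == explained)

# Show which rule (prefix, suffix, special case, ...) produced each token
for rule, substring in explanation:
    print(f"{substring}\t{rule}")
```

If the two lists ever differ for your own text, the explanation output points to the rule that needs attention.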
📖 Documentation and examples
- Various documentation updates.
- New spaCy version tags in spaCy universe.
- New `Dockerfile` for repeatable website builds and easier local development.
- New additions to spaCy universe:
  - Augmenty: a text augmentation library
  - Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects
  - spacy-wrap: wrap fine-tuned transformers in spaCy pipelines
  - spacypdfreader: easy PDF to text to spaCy text extraction
  - textnets: text analysis with networks
👥 Contributors
@adrianeboyd, @antonpibm, @ColleterVi, @danieldk, @DuyguA, @ezorita, @HaakonME, @honnibal, @ines, @jboynyc, @KennethEnevoldsen, @ljvmiranda921, @mrshu, @pmbaumgartner, @polm, @ramonziai, @richardpaulhudson, @ryndaniels, @svlandeg, @thiippal, @thomashacker, @yoavxyoav