explosion/spaCy v3.0.6 on GitHub

✨ New features and improvements

New assemble CLI command for assembling a pipeline from a config without training.
Add support for match alignments in the Matcher to align matched tokens with matcher patterns.
Add support for training from streamed corpora.
Add support for W&B data and model checkpoint logging and versioning in spacy.WandbLogger.v2.
Extend Scorer.score_spans to support overlapping and unlabeled spans.
Update debug data for new v3 components.
Improve language data for Italian.
Various improvements to error handling and UX.
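
As an illustration of the match alignments listed above, here is a minimal sketch. It assumes the alignments are requested with the with_alignments=True argument when calling the Matcher, and that each match then carries a fourth element mapping every matched token to the index of the pattern entry that matched it. The pattern, the HELLO_WORLD key, and the example text are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Illustrative pattern: "hello", optional punctuation, "world".
pattern = [
    {"LOWER": "hello"},
    {"IS_PUNCT": True, "OP": "?"},
    {"LOWER": "world"},
]
matcher.add("HELLO_WORLD", [pattern])

doc = nlp("Hello, world")

# With with_alignments=True each match is (match_id, start, end, alignments).
matches = matcher(doc, with_alignments=True)

for match_id, start, end, alignments in matches:
    span = doc[start:end]
    # alignments[i] is the index of the pattern entry that matched span[i].
    for token, pattern_index in zip(span, alignments):
        print(token.text, "->", pattern[pattern_index])
```

The alignment list is most useful with operators such as ? or +, where the number of matched tokens differs from the number of pattern entries.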

🔴 Bug fixes

Fix issue #7408: Add vocab kwarg to spacy.load.
Fix issue #7419: Exclude user hooks in displacy conversion.
Fix issue #7421: Update --code usage in CLI commands.
Fix issue #7424: Preserve sent starts on retokenization without parse.
Fix issue #7440: Fix pymorphy2 lookup lemmatizer.
Fix issue #7471: Improve warnings related to listening components.
Fix issue #7488: Fix upstream check in pretraining.
Fix issue #7489: Support callbacks entry points.
Fix issue #7497: Merge doc.spans in Doc.from_docs().
Fix issue #7528: Preserve user data for DependencyMatcher on spans.
Fix issue #7557: Fix __add__ method for PRFScore.
Fix issue #7574: Fix conversion of custom extension data in Span.as_doc and Doc.from_docs.
Fix issue #7620: Fix replace_listeners in configs.
Fix issue #7626: Fix vectors data on GPU.
Fix issue #7630: Update NEL for entities crossing sentence boundaries.
Fix issue #7631: Fix parser sourcing in NER converter.
Fix issue #7642: Fix handling of hyphen string value in config files.
Fix issue #7655: Fix sent starts when converting from v2 JSON training format.
Fix issue #7674: Fix handling of unknown tokens in StaticVectors.
Fix issue #7690: Fix pickling of Lemmatizer.
Fix issue #7749: Update Tokenizer.explain for special cases in v3.
Fix issue #7755: Fix config parsing of ints/strings.
Fix issue #7836: Fix tokenizer cache flushing.
Fix issue #7847: Fix handling of boolean values in Example.from_dict for sent starts.
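
To make one of the fixes above concrete, the following sketch shows the behavior described for issue #7497: span groups stored in doc.spans are carried over when docs are concatenated with Doc.from_docs(). The "cities" key, the GPE label, and the example sentences are illustrative assumptions, not taken from the issue.

```python
import spacy
from spacy.tokens import Doc, Span

# Both docs must share the same vocab, so they come from one pipeline.
nlp = spacy.blank("en")
doc1 = nlp("Berlin is nice")
doc2 = nlp("I like Paris")

# Store one labeled span per doc under the same (hypothetical) key.
doc1.spans["cities"] = [Span(doc1, 0, 1, label="GPE")]
doc2.spans["cities"] = [Span(doc2, 2, 3, label="GPE")]

# Concatenate the docs; span groups with the same key are merged
# and their offsets are shifted to fit the combined doc.
merged = Doc.from_docs([doc1, doc2])

for span in merged.spans["cities"]:
    print(span.text, span.start, span.end, span.label_)
```

The spans in the merged doc refer to token positions in the combined text, so the span from the second doc no longer starts at its original index.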