github explosion/spaCy v3.0.6
v3.0.6: assemble CLI, Matcher alignments, training from streamed corpora and many bug fixes

3 years ago

✨ New features and improvements

  • New assemble CLI command for assembling a pipeline from a config without training.
  • Add support for match alignments in the Matcher to align matched tokens with matcher patterns.
  • Add support for training from streamed corpora.
  • Add support for W&B data and model checkpoint logging and versioning in spacy.WandbLogger.v2.
  • Extend Scorer.score_spans to support overlapping and unlabeled spans.
  • Update debug data for new v3 components.
  • Improve language data for Italian.
  • Various improvements to error handling and UX.
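As a rough illustration of the Matcher alignments item above, the following minimal sketch assumes the feature is exposed through the `with_alignments` keyword when calling the matcher. The pattern name `A_PLUS_B` and the toy tokens are made up for this example. Each match tuple then carries one extra list, with one entry per matched token giving the index of the pattern token it was matched against.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Build a Doc by hand so no trained pipeline or tokenizer is needed.
vocab = Vocab()
doc = Doc(vocab, words=["a", "a", "b"])

# Hypothetical pattern: one or more "a" tokens followed by a "b".
pattern = [{"ORTH": "a", "OP": "+"}, {"ORTH": "b"}]
matcher = Matcher(vocab)
matcher.add("A_PLUS_B", [pattern])

# With with_alignments=True each match is (match_id, start, end, alignments).
matches = matcher(doc, with_alignments=True)
for match_id, start, end, alignments in matches:
    # alignments[i] is the index in `pattern` that matched token start + i.
    print(doc[start:end].text, alignments)
```

This can help when a pattern uses operators such as `+` or `?` and you need to know which pattern token accounted for which part of the matched span.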

🔴 Bug fixes

  • Fix issue #7408: Add vocab kwarg to spacy.load.
  • Fix issue #7419: Exclude user hooks in displacy conversion.
  • Fix issue #7421: Update --code usage in CLI commands.
  • Fix issue #7424: Preserve sent starts on retokenization without parse.
  • Fix issue #7440: Fix pymorphy2 lookup lemmatizer.
  • Fix issue #7471: Improve warnings related to listening components.
  • Fix issue #7488: Fix upstream check in pretraining.
  • Fix issue #7489: Support callbacks entry points.
  • Fix issue #7497: Merge doc.spans in Doc.from_docs().
  • Fix issue #7528: Preserve user data for DependencyMatcher on spans.
  • Fix issue #7557: Fix __add__ method for PRFScore.
  • Fix issue #7574: Fix conversion of custom extension data in Span.as_doc and Doc.from_docs.
  • Fix issue #7620: Fix replace_listeners in configs.
  • Fix issue #7626: Fix vectors data on GPU.
  • Fix issue #7630: Update NEL for entities crossing sentence boundaries.
  • Fix issue #7631: Fix parser sourcing in NER converter.
  • Fix issue #7642: Fix handling of hyphen string value in config files.
  • Fix issue #7655: Fix sent starts when converting from v2 JSON training format.
  • Fix issue #7674: Fix handling of unknown tokens in StaticVectors.
  • Fix issue #7690: Fix pickling of Lemmatizer.
  • Fix issue #7749: Update Tokenizer.explain for special cases in v3.
  • Fix issue #7755: Fix config parsing of ints/strings.
  • Fix issue #7836: Fix tokenizer cache flushing.
  • Fix issue #7847: Fix handling of boolean values in Example.from_dict for sent starts.
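To make one of the fixes above more concrete, here is a small sketch of the behavior described for issue #7497, assuming `Doc.from_docs()` carries span groups from `doc.spans` over to the merged document. The group key `greeting` and the example words are invented for illustration.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()

# Two small docs, each with a span group stored under the same (made-up) key.
doc1 = Doc(vocab, words=["Hello", "world"])
doc1.spans["greeting"] = [doc1[0:1]]

doc2 = Doc(vocab, words=["Good", "morning"])
doc2.spans["greeting"] = [doc2[0:2]]

# Concatenate the docs; span groups are expected to be merged and re-offset
# so that they point at the right tokens in the combined doc.
merged = Doc.from_docs([doc1, doc2])
merged_texts = [span.text for span in merged.spans["greeting"]]
print(merged_texts)
```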

📖 Documentation and examples

  • Add documentation for legacy functions and architectures.
  • Add documentation for pretrained pipeline design.
  • Add more details about pipe and multiprocessing.
  • Fix various typos and inconsistencies.

👥 Contributors

Thanks to @alvaroabascar, @armsp, @AyushExel, @BramVanroy, @broaddeep, @bryant1410, @bsweileh, @dpalmasan, @Findus23, @graue70, @jaidevd, @koaning, @langdonholmes, @m0canu1, @meghanabhange, @paoloq, @plison, @richardpaulhudson, @SamEdwardes, @Stannislav for the pull requests and contributions!