pypi spacy 2.0.12
v2.0.12: Greek, Arabic, Urdu, Tatar, improved language data, better model downloads & various compatibility and bug fixes

We had to release another update to the v2.0.x branch of spaCy to resolve a dependency issue, so we decided to also include or backport a number of features and fixes that were originally intended for v2.1.0 (currently available as a nightly version).

✨ New features and improvements

  • NEW: Alpha tokenization and language data for Arabic, Urdu, Tatar and Greek.
  • NEW: Mecab-based Japanese tokenization and lemmatization.
  • NEW: Add Norwegian rule-based and lookup lemmatization.
  • NEW: Add Danish lookup lemmatization based on the Den store danske SprogTeknologiske Ordbase (STO) dataset, courtesy of the University of Copenhagen.
  • NEW: Romanian lookup lemmatization.
  • Improve language data for Polish, Turkish, French, Romanian, Swedish and Japanese.
  • Improve case-sensitive lookup lemmatization in German.
  • Add Token.sent property that returns the sentence Span the token is part of.
  • Add remove_extension method on Doc, Token and Span.
  • Add Doc.is_sentenced property that returns True if sentence boundaries have been applied.
  • Allow ignoring warnings by code via the SPACY_WARNING_IGNORE environment variable.
  • Add --silent option to info command.
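
As a rough illustration of the new remove_extension method, the sketch below registers a custom attribute on Doc and then removes it again. It assumes spaCy is installed; the attribute name "is_reviewed" is hypothetical and chosen only for this example. Token and Span expose the same class methods.

```python
from spacy.tokens import Doc

# "is_reviewed" is a hypothetical attribute name used only for illustration.
Doc.set_extension("is_reviewed", default=False)
registered_before = Doc.has_extension("is_reviewed")

# Unregister the attribute again so that the name is free for reuse.
Doc.remove_extension("is_reviewed")
registered_after = Doc.has_extension("is_reviewed")

print(registered_before, registered_after)
```

Removing an extension is mostly useful in tests and notebooks, where the same registration code may run more than once in a single process.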

🔴 Bug fixes

  • Fix issue #1456: Pass additional arguments of download command to pip and check if model is already installed before downloading it.
  • Fix issue #2191: Update README section on tests and dependencies.
  • Fix issue #2194: Ensure that Doc.noun_chunks_iterator isn't None before calling it.
  • Fix issue #2196: Return data in cli.info and add silent option.
  • Fix issue #2200: Correct typo in spacy package command message.
  • Fix issue #2210: Fix bug in Spanish noun chunks.
  • Fix issue #2211, #2320: Resolve problem in download command and use requests library again.
  • Fix issue #2219: Fix token similarity of single-letter tokens.
  • Fix issue #2222, #2223: Fix typos in documentation and docstrings.
  • Fix issue #2226: Use correct, non-deprecated merge syntax in merge_ents.
  • Fix issue #2228: Fix deserialization when using tensor=False or sentiment=False.
  • Fix issue #2238: Correct Swedish lookup lemmatization.
  • Fix issue #2242: Add remove_extension method on Doc, Token and Span.
  • Fix issue #2266: Add collapse_phrases option to displaCy visualizer.
  • Fix issue #2269: Fix KeyError by renaming SP to _SP.
  • Fix issue #2304: Don't require attrs argument in Doc.retokenize and allow ints/unicode.
  • Fix issue #2361: Escape HTML tags in displacy.render.
  • Fix issue #2376: Improve Matcher examples and add section on using pipeline components.
  • Fix issue #2385: Handle multi-word entities correctly in IOB to BILUO conversion.
  • Fix issue #2452: Fix bug that would cause displacy arrows to only point in one direction.
  • Fix issue #2477: Also allow Span objects in displacy.render.
  • Fix issue #2490: Update Thinc's dependencies for Python 3.7 compatibility.
  • Fix issue #2495: Fix loading tokenizer with custom prefix search.
  • Fix issue #2514: Switch from msgpack-python to msgpack to hopefully prevent conda from downloading a two-year-old spaCy version when installing with the latest Anaconda distribution.
  • Ensure that Doc.is_tagged is set correctly when using Language.pipe.
  • Fix bug in merge_noun_chunks factory that would return None if Doc wasn't parsed.
  • Explicitly require pathlib backport on Python 2 only.
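
To make the IOB to BILUO fix (#2385) more concrete, here is a minimal pure-Python sketch of the conversion. It is not spaCy's implementation: the function name iob_to_biluo is hypothetical here, and the sketch assumes IOB2 input in which every entity starts with a B- tag.

```python
def iob_to_biluo(tags):
    """Convert a list of IOB2 tags to BILUO tags (illustrative sketch)."""
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append("O")
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            # Multi-word entities keep B-, single-token entities become U-.
            biluo.append(("B-" if continues else "U-") + label)
        elif prefix == "I":
            # The last token of a multi-word entity becomes L-.
            biluo.append(("I-" if continues else "L-") + label)
        else:
            raise ValueError("Unexpected tag: " + tag)
    return biluo


example = ["B-ORG", "I-ORG", "I-ORG", "O", "B-PER", "B-LOC", "I-LOC"]
converted = iob_to_biluo(example)
print(converted)
# ['B-ORG', 'I-ORG', 'L-ORG', 'O', 'U-PER', 'B-LOC', 'L-LOC']
```

The three-token ORG span shows the multi-word case: only the final token is marked L-, and the tokens in between stay I-.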

📖 Documentation and examples

  • NEW: Edit and execute code examples in your browser – all across the documentation!
  • NEW: The spaCy Universe, a collection of plugins, extensions and other resources for spaCy.
  • NEW: Experimental rule-based Matcher Explorer demo – create token patterns interactively, test them against your text and copy-paste the Python pattern code.
  • NEW: Document Cython API.
  • Fix various typos and inconsistencies.

👥 Contributors

Thanks to @mollerhoj, @howl-anderson, @pktippa, @skrcode, @miroli, @ivyleavedtoadflax, @5hirish, @therealronnie, @alexvy86, @mn3mos, @polm, @knoxdw, @bellabie, @mauryaland, @LRAbbade, @janimo, @vishnumenon, @tzano, @cclauss, @armsp, @aristorinjuang, @BigstickCarpet, @idealley, @ansgar-t, @mpszumowski, @91ns, @msklvsk, @himkt, @DanielRuf, @nathanathan, @GolanLevy, @nipunsadvilkar, @cjhurst, @aliiae, @mirfan899, @ohenrik, @btrungchi, @kleinay, @DuyguA, @stefan-it, @Eleni170, @datascouting, @tjkemp, @x-ji, @giannisdaras, @kororo and @katarkor for the pull requests and contributions.
