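spacy 2.0.12 on Python PyPI
v2.0.12: Greek, Arabic, Urdu, Tatar, improved language data, better model downloads & various compatibility and bug fixes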

We had to release another update to the v2.0.x branch of spaCy to resolve a dependency issue, so we decided to also include and/or backport a bunch of features and fixes that were originally intended for v2.1.0 (which is available as a nightly version).

✨ New features and improvements

NEW: Alpha tokenization and language data for Arabic, Urdu, Tatar and Greek.
NEW: Mecab-based Japanese tokenization and lemmatization.
NEW: Add Norwegian rule-based and lookup lemmatization.
NEW: Add Danish lookup lemmatization based on the Den store danske SprogTeknologiske Ordbase (STO) dataset, courtesy of the University of Copenhagen.
NEW: Romanian lookup lemmatization.
Improve language data for Polish, Turkish, French, Romanian, Swedish and Japanese.
Improve case-sensitive lookup lemmatization in German.
Add Token.sent property that returns the sentence Span the token is part of.
Add remove_extension method on Doc, Token and Span.
Add Doc.is_sentenced property that returns True if sentence boundaries have been applied.
Allow ignoring warnings by code via the SPACY_WARNING_IGNORE environment variable.
Add --silent option to info command.
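
To illustrate the SPACY_WARNING_IGNORE entry above, here is a minimal sketch of how a list of warning codes can be read from an environment variable. It uses only the standard library. The helper name parse_ignored_codes, the comma-separated format and the example codes W006 and W008 are assumptions made for illustration, not spaCy's own implementation.

```python
import os


def parse_ignored_codes(value):
    """Split a comma-separated string of warning codes into a clean list.

    Hypothetical helper for illustration only; it is not part of spaCy.
    """
    if not value:
        return []
    return [code.strip() for code in value.split(",") if code.strip()]


# Assumption: the variable holds comma-separated warning codes and has to be
# set before the library is imported, because it is read at import time.
os.environ["SPACY_WARNING_IGNORE"] = "W006, W008"

ignored = parse_ignored_codes(os.environ.get("SPACY_WARNING_IGNORE"))
print(ignored)  # ['W006', 'W008']
```

In a shell, the same effect would typically be achieved by exporting the variable before starting Python.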

🔴 Bug fixes

Fix issue #1456: Pass additional arguments of download command to pip and check if model is already installed before downloading it.
Fix issue #2191: Update README section on tests and dependencies.
Fix issue #2194: Ensure that Doc.noun_chunks_iterator isn't None before calling it.
Fix issue #2196: Return data in cli.info and add silent option.
Fix issue #2200: Correct typo in spacy package command message.
Fix issue #2210: Fix bug in Spanish noun chunks.
Fix issue #2211, #2320: Resolve problem in download command and use requests library again.
Fix issue #2219: Fix token similarity of single-letter tokens.
Fix issue #2222, #2223: Fix typos in documentation and docstrings.
Fix issue #2226: Use correct, non-deprecated merge syntax in merge_ents.
Fix issue #2228: Fix deserialization when using tensor=False or sentiment=False.
Fix issue #2238: Correct Swedish lookup lemmatization.
Fix issue #2242: Add remove_extension method on Doc, Token and Span.
Fix issue #2266: Add collapse_phrases option to displaCy visualizer.
Fix issue #2269: Fix KeyError by renaming SP to _SP.
Fix issue #2304: Don't require attrs argument in Doc.retokenize and allow ints/unicode.
Fix issue #2361: Escape HTML tags in displacy.render.
Fix issue #2376: Improve Matcher examples and add section on using pipeline components.
Fix issue #2385: Handle multi-word entities correctly in IOB to BILUO conversion.
Fix issue #2452: Fix bug that would cause displacy arrows to only point in one direction.
Fix issue #2477: Also allow Span objects in displacy.render.
Fix issue #2490: Update Thinc's dependencies for Python 3.7 compatibility.
Fix issue #2495: Fix loading tokenizer with custom prefix search.
Fix issue #2514: Switch from msgpack-python to msgpack to hopefully prevent conda from downloading a two-year-old spaCy version when installing with the latest Anaconda distribution.
Ensure that Doc.is_tagged is set correctly when using Language.pipe.
Fix bug in merge_noun_chunks factory that would return None if Doc wasn't parsed.
Explicitly require pathlib backport on Python 2 only.
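
The fix for issue #2385 concerns converting IOB tags to the BILUO scheme when an entity spans several tokens. The sketch below shows the general idea in plain Python. It is an independent illustration, not spaCy's implementation. It assumes IOB2-style input, where every entity starts with a B- tag, and the function name and example labels are chosen for this example only.

```python
def iob_to_biluo(tags):
    """Convert a list of IOB2 tags to BILUO tags.

    Illustrative sketch only: assumes well-formed IOB2 input.
    """
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            # Multi-word entities begin with B-, single-token ones become U-.
            biluo.append(("B-" if continues else "U-") + label)
        else:
            # An inside token is either still inside (I-) or the last one (L-).
            biluo.append(("I-" if continues else "L-") + label)
    return biluo


example = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]
print(iob_to_biluo(example))  # ['B-ORG', 'I-ORG', 'L-ORG', 'O', 'U-LOC']
```

The three-token ORG entity keeps its B- and I- tags and gains a closing L- tag, while the single-token LOC entity becomes U-.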

📖 Documentation and examples

NEW: Edit and execute code examples in your browser – all across the documentation!
NEW: The spaCy Universe, a collection of plugins, extensions and other resources for spaCy.
NEW: Experimental rule-based Matcher Explorer demo – create token patterns interactively, test them against your text and copy-paste the Python pattern code.
NEW: Document Cython API.
Fix various typos and inconsistencies.

👥 Contributors

Thanks to @mollerhoj, @howl-anderson, @pktippa, @skrcode, @miroli, @ivyleavedtoadflax, @5hirish, @therealronnie, @alexvy86, @mn3mos, @polm, @knoxdw, @bellabie, @mauryaland, @LRAbbade, @janimo, @vishnumenon, @tzano, @cclauss, @armsp, @aristorinjuang, @BigstickCarpet, @idealley, @ansgar-t, @mpszumowski, @91ns, @msklvsk, @himkt, @DanielRuf, @nathanathan, @GolanLevy, @nipunsadvilkar, @cjhurst, @aliiae, @mirfan899, @ohenrik, @btrungchi, @kleinay, @DuyguA, @stefan-it, @Eleni170, @datascouting, @tjkemp, @x-ji, @giannisdaras, @kororo and @katarkor for the pull requests and contributions.
