spacy 1.4.0 on Python PyPI

✨ Major features and improvements

NEW: Alpha support for Dutch tokenization.
Reorganise language data and improve its format.
Add shared tag map, entity rules, emoticons and punctuation to language data.
Convert entity rules, morphological rules and lemmatization rules from JSON to Python.
Update language data for English, German, Spanish, French, Italian and Portuguese.
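The sketch below illustrates what writing language data as Python (rather than JSON) makes possible. It is a minimal example under assumed names: `SHARED_EMOTICONS`, `EN_STOP_WORDS`, `EN_TOKENIZER_EXCEPTIONS`, `build_exceptions` and the `ORTH`/`LEMMA` string keys are hypothetical stand-ins, not spaCy's actual modules or symbols. It shows shared data being merged into one language's tokenizer exceptions using ordinary Python code.

```python
# Hypothetical layout: all names and values are illustrative, not spaCy's real modules.
ORTH = "orth"    # stand-in key for a sub-token's surface text
LEMMA = "lemma"  # stand-in key for a sub-token's lemma

# "Shared" data reused across languages (illustrative subset).
SHARED_EMOTICONS = [":)", ":(", ":-)", "<3"]

# Language-specific data, written directly as Python instead of JSON.
EN_STOP_WORDS = set("a an the and or but of to in".split())
EN_TOKENIZER_EXCEPTIONS = {
    "don't": [{ORTH: "do", LEMMA: "do"}, {ORTH: "n't", LEMMA: "not"}],
}


def build_exceptions(language_exc, shared_emoticons):
    """Merge shared emoticons (each kept as a single token) into a language's exceptions."""
    merged = {emo: [{ORTH: emo}] for emo in shared_emoticons}
    merged.update(language_exc)  # language-specific entries win on conflict
    return merged


EN_EXC = build_exceptions(EN_TOKENIZER_EXCEPTIONS, SHARED_EMOTICONS)
```

Because the data is plain Python, shared and language-specific tables can be composed at import time without a separate loading step; in this sketch, language-specific entries take precedence over shared ones.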

🔴 Bug fixes

Fix issue #649: Update and reorganise stop lists.
Fix issue #672: Make token.ent_iob_ return unicode.
Fix issue #674: Add missing lemmas for contracted forms of "be" to TOKENIZER_EXCEPTIONS.
Fix issue #683: The Morphology class now supplies a tag map value for the special space tag if it's missing.
Fix issue #684: Ensure spacy.en.English() loads the GloVe vector data if available. Previously, this was inconsistent with the behaviour of spacy.load('en').
Fix issue #685: Expand TOKENIZER_EXCEPTIONS with unicode apostrophe (’).
Fix issue #689: Correct typo in STOP_WORDS.
Fix issue #691: Add tokenizer exceptions for "gonna" and "Gonna".
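Fixes #685 and #691 both amount to adding variants of existing tokenizer exceptions. The following is a minimal sketch of that idea, not spaCy's actual code: the helpers `expand_exceptions` and `add_title_case`, the `ORTH`/`LEMMA` keys, and the example entries (including splitting "gonna" into "gon" + "na") are assumptions for illustration. Each generated entry keeps the invariant that its sub-token texts concatenate back to the key.

```python
# Illustrative only: helper names, keys and entries are hypothetical.
ORTH = "orth"
LEMMA = "lemma"

EXCEPTIONS = {
    "gonna": [{ORTH: "gon", LEMMA: "go"}, {ORTH: "na", LEMMA: "to"}],
    "I'm": [{ORTH: "I"}, {ORTH: "'m", LEMMA: "be"}],
}


def expand_exceptions(exceptions, search="'", replace="\u2019"):
    """Return a copy with extra entries where `search` is replaced by `replace`
    (by default the ASCII apostrophe by the unicode one) in the key and in
    every sub-token's ORTH."""
    expanded = dict(exceptions)
    for key, tokens in exceptions.items():
        if search not in key:
            continue
        new_tokens = []
        for token in tokens:
            new_token = dict(token)  # copy so the original entry is untouched
            new_token[ORTH] = token[ORTH].replace(search, replace)
            new_tokens.append(new_token)
        expanded.setdefault(key.replace(search, replace), new_tokens)
    return expanded


def add_title_case(exceptions):
    """Return a copy with a capitalised variant (e.g. "Gonna") whose first
    sub-token is capitalised to match."""
    expanded = dict(exceptions)
    for key, tokens in exceptions.items():
        titled = key[:1].upper() + key[1:]
        if titled == key:
            continue  # already capitalised, e.g. "I'm"
        new_tokens = [dict(t) for t in tokens]
        first = new_tokens[0][ORTH]
        new_tokens[0][ORTH] = first[:1].upper() + first[1:]
        expanded.setdefault(titled, new_tokens)
    return expanded


ALL_EXCEPTIONS = add_title_case(expand_exceptions(EXCEPTIONS))
```

Using `setdefault` means a hand-written entry for a variant is never overwritten by a generated one.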

⚠️ Backwards incompatibilities

No changes to the public, documented API, but the previously undocumented language data and model initialisation processes have been refactored and reorganised. If you were relying on the bin/init_model.py script, see the new spaCy Developer Resources repo. Code that references internals of the spacy.en or spacy.de packages should also be reviewed before updating to this version.

📖 Documentation and examples

NEW: "Adding languages" workflow.
NEW: "Part-of-speech tagging" workflow.
NEW: spaCy Developer Resources repo – scripts, tools and resources for developing spaCy.
Fix various typos and inconsistencies.

👥 Contributors

Thanks to @dafnevk, @jvdzwaan, @RvanNieuwpoort, @wrvhage, @jaspb, @savvopoulos and @davedwards for the pull requests!

spacy 1.4.0 v1.4.0: Improved language data and alpha Dutch support on Python PyPI