Thanks to all of you for 5,000 stars on GitHub, for the valuable feedback in the user survey and for testing the spaCy v2.0 alpha. We're working hard on getting the new version ready and can't wait to release it. In the meantime, here's a new release for the 1.x branch that fixes a variety of outstanding bugs and adds capabilities for new languages.
💌 P.S.: If you haven't gotten your hands on a set of spaCy stickers yet, you can still do so – send us a DM with your address on Twitter or Gitter, and we'll mail you some!
✨ Major features and improvements
- NEW: The first official Spanish model (377 MB) including vocab, syntax, entities and word vectors. Thanks to the amazing folks at recogn.ai for the collaboration!

```bash
python -m spacy download es
```

```python
import spacy

nlp = spacy.load('es')  # load the Spanish model installed above
doc = nlp(u'Esto es una frase.')
```

- NEW: Alpha tokenization for Norwegian Bokmål and Japanese (via Janome).
- NEW: Allow dropout training for `Parser` and `EntityRecognizer`, using the `drop` keyword argument to the `update()` method.
- NEW: Glossary for POS, dependency and NER annotation schemes via `spacy.explain()`. For example, `spacy.explain('NORP')` will return "Nationalities or religious or political groups".
- Improve language data for Dutch, French and Spanish.
- Add `Language.parse_tree` method to generate a POS tree for all sentences in a `Doc`.
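Dropout training means randomly hiding part of the model's input on each update, so that it can't lean too heavily on any single signal. The snippet below is a generic, plain-Python illustration of feature dropout, not spaCy's internal implementation: the `drop_features` helper and the feature strings are hypothetical, and it assumes the value passed as `drop` is a probability between 0 and 1.

```python
import random


def drop_features(features, drop, rng=random):
    """Hypothetical helper: keep each feature with probability 1 - drop.

    Illustrates the general idea of feature dropout only; this is not
    how spaCy implements the `drop` argument internally.
    """
    if not 0.0 <= drop <= 1.0:
        raise ValueError("drop must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so drop=0.0 keeps everything
    # and drop=1.0 discards everything.
    return [f for f in features if rng.random() >= drop]


# Made-up feature strings; during a training update the model would only
# see the features that survive.
features = ["w=This", "p=DET", "w-1=<s>", "s0=is"]
print(drop_features(features, drop=0.25, rng=random.Random(0)))
```

Because a different random subset is dropped on every update, each feature has to be useful on its own rather than only in combination with the others.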
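Conceptually, `spacy.explain()` maps an annotation label to a short description. The sketch below assumes it behaves like a plain dictionary lookup that returns `None` for unknown labels; only the `NORP` description comes from these notes, and the other entries are illustrative stand-ins for spaCy's much larger built-in table.

```python
# Hypothetical, abbreviated glossary for illustration only.
GLOSSARY = {
    "NORP": "Nationalities or religious or political groups",
    "PROPN": "proper noun",
    "nsubj": "nominal subject",
}


def explain(term):
    """Return a human-readable description of a label, or None if unknown."""
    return GLOSSARY.get(term)


print(explain("NORP"))  # Nationalities or religious or political groups
print(explain("XYZ"))   # None
```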
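To picture what a per-sentence tree looks like, here is a plain-Python sketch that nests each token under its syntactic head. The `build_tree` helper, its `word`/`arc`/`modifiers` keys and the hand-written parse are assumptions for illustration; the structure returned by the new method may differ.

```python
def build_tree(words, heads, deps):
    """Hypothetical helper: nest tokens under their heads.

    words: token texts; heads: index of each token's head (the root points
    to itself); deps: dependency labels. Returns the root as a nested dict.
    """
    nodes = [{"word": w, "arc": d, "modifiers": []} for w, d in zip(words, deps)]
    root = None
    for i, head in enumerate(heads):
        if head == i:
            root = nodes[i]  # the root is its own head
        else:
            nodes[head]["modifiers"].append(nodes[i])
    return root


# Hand-written parse of "This is a sentence." (not actual model output)
tree = build_tree(
    words=["This", "is", "a", "sentence", "."],
    heads=[1, 1, 3, 1, 1],
    deps=["nsubj", "ROOT", "det", "attr", "punct"],
)
print(tree["word"], [m["word"] for m in tree["modifiers"]])  # is ['This', 'sentence', '.']
```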
🔴 Bug fixes
- Fix issue #1031: Close gaps in `Lexeme` API.
- Fix issue #1034: Add annotation scheme glossary and `spacy.explain()`.
- Fix issue #1051: Improve error messaging when trying to load a non-existent model.
- Fix issue #1052: Add missing `SP` symbol to tag map.
- Fix issue #1061: Add `flush_cache` method to tokenizer.
- Fix issue #1069: Fix `Doc.sents` iterator when customised with a generator.
- Fix issues #1099, #1143: Improve documentation on models in `requirements.txt`.
- Fix issue #1137: Use lower minimum version for `requests` dependency.
- Fix issue #1207: Fix `Span.noun_chunks`.
- Fix issue with `six` and its dependencies that occasionally caused spaCy to fail.
- Fix typo in `package` command that caused an error when printing error messages.
📖 Documentation and examples
- Fix various typos and inconsistencies.
- NEW: spaCy 101 guide for v2.0: all important concepts, explained with examples and illustrations. Note that some of the behaviour and examples are specific to v2.0+ – but the NLP basics are relevant independent of the spaCy version you're using.
👥 Contributors
Thanks to @kengz, @luvogels, @ferdous-al-imran, @uetchy, @akYoung, @pasupulaphani, @dvsrepo, @raphael0202, @yuvalpinter, @frascuchon, @kootenpv, @oroszgy, @bartbroere, @ianmobbs, @garfieldnate, @polm, @callumkift, @swierh, @val314159, @lgenerknol and @jsparedes for the contributions!