✨ Major features and improvements
- NEW: Alpha support for Swedish tokenization.
- NEW: Alpha support for Hungarian tokenization.
- Update language data for Spanish tokenization.
- Speed up tokenization when no data is preloaded by caching the first 10,000 vocabulary items seen.
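
The caching idea in the last item can be illustrated with a minimal sketch. This is not spaCy's implementation: the note above does not describe the mechanism, so the sketch assumes that results are stored for the first N distinct strings seen and that nothing is admitted or evicted once the cap is reached. `FirstSeenCache` and `toy_split` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


class FirstSeenCache:
    """Cache results for the first `max_size` distinct keys, then stop admitting new ones.

    Hypothetical helper for illustration only; not part of spaCy.
    """

    def __init__(self, compute: Callable[[str], List[str]], max_size: int = 10_000) -> None:
        self._compute = compute
        self._max_size = max_size
        self._cache: Dict[str, List[str]] = {}

    def __call__(self, chunk: str) -> List[str]:
        cached = self._cache.get(chunk)
        if cached is not None:
            return cached  # cache hit: skip the expensive computation
        result = self._compute(chunk)
        # Admit new entries only until the cap is reached; never evict.
        if len(self._cache) < self._max_size:
            self._cache[chunk] = result
        return result

    def __len__(self) -> int:
        return len(self._cache)


def toy_split(chunk: str) -> List[str]:
    """Toy stand-in for real tokenization rules: split off one trailing punctuation mark."""
    if len(chunk) > 1 and chunk[-1] in ".,!?":
        return [chunk[:-1], chunk[-1]]
    return [chunk]


# Assumed cap of 10,000, matching the figure in the release note.
tokenize_chunk = FirstSeenCache(toy_split, max_size=10_000)
tokens = [tok for chunk in "Hello world! Hello again.".split() for tok in tokenize_chunk(chunk)]
```

A fixed cap without eviction keeps the lookup path very cheap, which suits text where a small set of frequent strings accounts for most lookups.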
🔴 Bug fixes
- List the `language_data` package in the `setup.py`.
- Fix missing `vec_path` declaration that was failing if `add_vectors` was set.
- Allow `Vocab` to load without `serializer_freqs`.
📖 Documentation and examples
- NEW: spaCy Jupyter notebooks repo: ongoing collection of easy-to-run spaCy examples and tutorials.
- Fix issue #657: Generalise dependency parsing annotation specs beyond English.
- Fix various typos and inconsistencies.
👥 Contributors
Thanks to @oroszgy, @magnusburton, @jmizgajski, @aikramer2, @fnorf and @bhargavvader for the pull requests!