✨ New features and improvements
- Add alpha tokenization support for Persian.
- Add lookup lemmatizer for Turkish.
- Add lookup lemmatizer and tag map for Norwegian and improve tokenizer exceptions.
- Improve model downloading and linking and use proper exit codes in CLI commands.
🔴 Bug fixes
- Fix issue #1503: Fix `Matcher` bugs and behaviour of `*` and `+` operators.
- Fix issue #1539: Fix `Vectors.resize` on Python 3.
- Fix issue #1591: Fix compiler flags and remove `march=native`.
- Fix issue #1606, #1698: Ensure `LIKE_URL` doesn't return `True` for email addresses.
- Fix issue #1622: Use `nlp.to_disk` in `spacy train` command.
- Fix issue #1633: Add missing `Span.vocab` property.
- Fix issue #1640: Fix infinite recursion in `token.sent_start`.
- Fix issue #1663, #1721, #1761, #1780: Download models with `--no-deps` to avoid conda errors.
- Fix issue #1712, #1813: Don't raise deprecation warning in property.
- Fix issue #1714: Make sure `download` and `validate` commands exit correctly.
- Fix issue #1727: Don't overwrite `pretrained_dims` setting from cfg.
- Fix issue #1728: Correct `TextCategorizer` documentation.
- Fix issue #1750: Remove non-breaking spaces from Hindi examples.
- Fix issue #1757: Fix rich comparison against `None` objects.
- Fix issue #1758: Add English tokenizer exception for "would've".
- Fix issue #1769: Make `LIKE_NUM` case-insensitive.
- Fix issue #1774: Allow pickling of `Chinese` language class.
- Fix issue #1781: Add missing dev dependency.
- Fix issue #1799: Set `l_edge` and `r_edge` correctly for non-projective parses.
- Fix issue #1807: Make `set_vector` add word to vocab.
- Fix issue #1820: Correct documentation of `Matcher` operators.
- Fix issue #1831: Allow vector loading to work on 1d data files.
- Fix issue #1834: Fix sentence boundaries serialization.
- Fix issue #1838: Clarify hyperparameters and alias usage in `spacy train`.
- Fix issue #1851: Fix typo and use better serialization example.
- Fix issue #1868: Make `Vocab.__contains__` work with ints.
- Fix issue #1883: Fix unpickling of `Matcher`.
- Fix issue #1911: Improve error handling if pipeline component is not callable.
- Fix issues with `spacy init_model` command.
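
As a rough illustration of the behaviour the `LIKE_NUM` and `LIKE_URL` fixes above (#1769, #1606, #1698) aim for, here is a minimal pure-Python sketch. The helpers `like_num` and `like_url` and the `NUM_WORDS` list are hypothetical stand-ins written for this example and are not spaCy's implementation.

```python
# Illustrative only: hypothetical helpers, not spaCy's actual code.
# NUM_WORDS is a made-up, deliberately tiny word list.
NUM_WORDS = {"zero", "one", "two", "three", "ten", "hundred", "million"}


def like_num(text):
    """Return True if text looks like a number, ignoring case for number words."""
    # Strip common digit separators before checking for plain digits.
    cleaned = text.replace(",", "").replace(".", "")
    if cleaned.isdigit():
        return True
    # Lowercasing makes "Ten" and "TEN" match the same entry as "ten".
    return text.lower() in NUM_WORDS


def like_url(text):
    """Return True for URL-like strings, but never for email addresses."""
    # Anything containing "@" is treated as an email address, not a URL.
    if "@" in text:
        return False
    return text.startswith(("http://", "https://", "www."))


print(like_num("Ten"), like_num("cat"))
print(like_url("user@example.com"), like_url("https://spacy.io"))
```

In spaCy itself these checks are exposed as token attributes, so the sketch only shows the intended outcome: number words match regardless of case, and email addresses are not flagged as URLs.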
📖 Documentation and examples
- Update list of community plugins and extensions.
- Fix various typos and inconsistencies.
👥 Contributors
Thanks to @cbilgili, @melanuria, @mpuels, @IsaacHaze, @sorenlind, @Bri-Will, @d99kris, @mdda, @kimfalk, @benjaminp, @zqhZY, @avinashrubird, @nirdesh37, @kwhumphreys, @fucking-signup, @wrathagom, @pbnsilva, @savkov, @matatusko, @GregDubbin, @avadhpatel, @azarezade, @ohenrik, @thomasopsomer, @Kimahriman and @hassanshamim for the pull requests and contributions.