✨ Major features and improvements
- NEW: Support Chinese tokenization, via Jieba.
- NEW: Alpha support for French, Spanish, Italian and Portuguese tokenization.
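
As a rough illustration of what segmenter-backed tokenization involves, the sketch below maps the words returned by a segmenter back to character offsets in the original text. It is not spaCy's internal implementation: `tokens_with_offsets` and `jieba_segment` are hypothetical helper names, and the only third-party call assumed is `jieba.cut`, which yields the segmented words of a string.

```python
from typing import Callable, Iterable, List, Tuple


def tokens_with_offsets(
    text: str, segment: Callable[[str], Iterable[str]]
) -> List[Tuple[str, int]]:
    """Pair each segmented word with its start offset in `text`.

    `segment` is any callable that splits a string into words,
    for example a thin wrapper around Jieba (see `jieba_segment`).
    """
    tokens = []
    pos = 0
    for word in segment(text):
        if not word.strip():
            # Skip whitespace-only segments; they carry no token.
            continue
        # Search from the end of the previous token so that
        # repeated words map to the correct occurrence.
        start = text.index(word, pos)
        tokens.append((word, start))
        pos = start + len(word)
    return tokens


def jieba_segment(text: str) -> Iterable[str]:
    """Hypothetical wrapper: segment `text` with Jieba (third-party package)."""
    import jieba  # imported lazily so the sketch loads without Jieba installed

    return jieba.cut(text)
```

With Jieba installed, `tokens_with_offsets("我爱自然语言处理", jieba_segment)` would return the segmented words with their offsets; the exact split depends on Jieba's dictionary.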
🔴 Bug fixes
- Fix issue #376: POS tags for "and/or" are now correct.
- Fix issue #578: `--force` argument on download command now operates correctly.
- Fix issue #595: Lemmatization corrected for some base forms.
- Fix issue #588: `Matcher` now rejects empty patterns.
- Fix issue #592: Added exception rule for tokenization of "Ph.D."
- Fix issue #599: Empty documents now considered tagged and parsed.
- Fix issue #600: Add missing `token.tag` and `token.tag_` setters.
- Fix issue #596: Added unicode import that was missing when compiling regexes, which led to incorrect tokenization.
- Fix issue #587: Resolved bug that caused `Matcher` to sometimes segfault.
- Fix issue #429: Ensure missing entity types are added to the entity recognizer.