✨ New features and improvements
- NEW: Add alpha support for Nepali.
- Refactor Japanese tokenizer and include additional custom tokenizer features.
- Update Armenian language data.
- Include the spaCy git commit in the package and model meta for reference.
🔴 Bug fixes
- Fix issue #5620: Skip vocab in component config overrides.
- Fix issue #5634: Fix polarity of `Token.is_oov` and `Lexeme.is_oov`.
- Fix issue #5643: Add strings and `ENT_KB_ID` to `Doc` serialization.
- Fix issue #5648: Disregard special tag _SP in check for new tag map.
- Fix issue #5658: Move lemmatizer `is_base_form` to language settings.
👥 Contributors
Thanks to @myavrum, @mahnerak, @rameshhpathak, @hiroshi-matsuda-rit, @PluieElectrique, @hertelm and @alvaroabascar for the pull requests and contributions.