✨ New features and improvements
- Add base support for Amharic.
- Add noun chunk iterator for Danish.
- Updates to French, Portuguese and Romanian stop words.
🔴 Bug fixes
- Fix issue #6705: Fix deserialization of null `token_match` and `url_match` for the tokenizer.
- Fix issue #6712: Prevent overlapping noun chunks for Spanish.
- Fix issue #6745: Fix minibatch iterator when size iterator is finished.
- Fix issue #6759: Skip 0-length matches in the `Matcher`.
- Fix issue #6771: Support `IS_SENT_START` in the `PhraseMatcher`.
- Fix issue #6772: Fix `Span.text` for empty spans.
- Fix issue #6820: Improve `Doc.char_span` `alignment_mode` handling.
- Fix issue #6857: Remove `--no-cache-dir` when downloading models.
- Fix issue #8115: Fix offsets in `Span.get_lca_matrix`.
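
For readers unfamiliar with the `alignment_mode` option touched by the `Doc.char_span` fix, the sketch below illustrates the three documented modes (`"strict"`, `"contract"`, `"expand"`) in plain Python. It is only an approximation for illustration: `align_char_span` and `offsets` are hypothetical names, the function operates on a list of token character offsets rather than a `Doc`, and it is not the library's implementation.

```python
from typing import List, Optional, Tuple


def align_char_span(
    token_offsets: List[Tuple[int, int]],
    start_char: int,
    end_char: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: map a character span to a token span.

    Returns (start_token, end_token) with an exclusive end, or None if
    the character span cannot be aligned under the chosen mode.
    """
    if alignment_mode == "strict":
        # Both character offsets must fall exactly on token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start_char in starts and end_char in ends:
            first, last = starts[start_char], ends[end_char]
            if first <= last:
                return first, last + 1
        return None
    if alignment_mode == "contract":
        # Keep only tokens that lie completely inside the character span.
        covered = [
            i for i, (s, e) in enumerate(token_offsets)
            if s >= start_char and e <= end_char
        ]
    elif alignment_mode == "expand":
        # Keep every token that overlaps the character span at all.
        covered = [
            i for i, (s, e) in enumerate(token_offsets)
            if s < end_char and e > start_char
        ]
    else:
        raise ValueError(f"unknown alignment_mode: {alignment_mode!r}")
    if not covered:
        return None
    return covered[0], covered[-1] + 1


# Character offsets of the tokens in "I like New York" (assumed tokenization).
offsets = [(0, 1), (2, 6), (7, 10), (11, 15)]

print(align_char_span(offsets, 7, 15))              # aligned: "New York"
print(align_char_span(offsets, 8, 15))              # misaligned start: None
print(align_char_span(offsets, 8, 15, "contract"))  # shrinks to "York"
print(align_char_span(offsets, 8, 15, "expand"))    # grows to "New York"
```

In this sketch, a span that starts in the middle of "New" is rejected in strict mode, shrinks to the fully covered token under `"contract"`, and grows to include the partially covered token under `"expand"`.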
👥 Contributors
Thanks to @alexcombessie, @AMArostegui, @bryant1410, @Cristianasp, @garethsparks, @jenojp, @jganseman, @jumasheff, @lorenanda, @ophelielacroix, @thomasbird, @timgates42, @tupui and @yosiasz for the pull requests and contributions.