✨ New features and improvements
- NEW: Add alpha support for Macedonian and Sanskrit.
- Update language data for Croatian, Czech, English, Hebrew, Hindi, Indonesian, Swedish, Thai and Turkish.
- Add support for aarch64 and ppc64le on Linux, with binary packages available on conda-forge.
🔴 Bug fixes
- Fix issue #5610: Make sure `sys.argv` exists.
- Fix issue #5643: Add `ent_id_` to strings serialized with `Doc`.
- Fix issue #5727: Clarify warning for misaligned BILUO tags.
- Fix issue #5768: Improve tag map initialization and updating.
- Fix issue #5794: Improve warnings around normalization tables.
- Fix issue #5796: Update invalid tag maps.
- Fix issue #5799: Remove hard-coded GPU ID from `pretrain`.
- Fix issue #5802: Mark Japanese documents as tagged.
- Fix issue #5823: Fix typo in unit tests.
- Fix issue #5838: Fix `EntityRenderer` to support break lines (after last entity).
- Fix issue #5843: Prefer earlier spans in `EntityRuler`.
- Fix issue #5849: Allow `Doc.char_span` to snap to token boundaries.
- Fix issue #5853: Fix span boundary handling in Spanish noun chunks.
- Fix issue #5861: Add `Span` index boundary checks.
- Fix issue #5904: Fix typos in comments.
- Fix issue #5910: Update default sentencizer characters for Armenian, Greek and Arabic.
- Fix issue #6014: Fix off-by-one error for best iteration calculation.
- Fix issue #6112: Fix overlapping German noun chunks.
- Fix issue #6148: Identify final `Matcher` pattern node by quantifier.
- Fix issue #6164: Reorder so tag map is replaced only if a custom file is provided.
- Fix issue #6218: Reproducibility for `TextCategorizer` and `Tok2Vec`.
- Fix issue #6219: Add re-enabled pipe names back to the meta before serializing.
- Fix issue #6300: Fix `on_match` callback and exclude empty match lists from results for `DependencyMatcher`.
- Fix issue #6347: Memory leak issues with `beam_parse` (requires `thinc>=7.4.3`).
- Fix issue #6373: Bugfix textcat reproducibility on GPU (requires `thinc>=7.4.3`).
- Fix issue #6405: Add all vectors to vocab before pruning.
- Fix issue #6413: Use int8_t instead of char in `Matcher`.
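
To illustrate what "snapping to token boundaries" means for the `Doc.char_span` change (issue #5849) above, here is a minimal pure-Python sketch. It is not spaCy's implementation: the helper `snap_char_span`, its `mode` argument, and the mode names `"strict"`, `"inside"` and `"outside"` are hypothetical and chosen only for this example. Tokens are assumed to be given as `(start_char, end_char)` pairs.

```python
def snap_char_span(token_offsets, start, end, mode="strict"):
    """Map a character range to a token range (first, last_exclusive).

    Hypothetical helper for illustration only; returns None when no
    token range can be produced for the requested mode.
    """
    if mode == "strict":
        # Both ends must coincide exactly with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start in starts and end in ends and starts[start] <= ends[end]:
            return starts[start], ends[end] + 1
        return None
    if mode == "inside":
        # Keep only tokens that lie completely within the character range.
        covered = [i for i, (s, e) in enumerate(token_offsets)
                   if s >= start and e <= end]
    elif mode == "outside":
        # Keep every token that overlaps the character range at all.
        covered = [i for i, (s, e) in enumerate(token_offsets)
                   if s < end and e > start]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not covered:
        return None
    return covered[0], covered[-1] + 1


# Character offsets of the tokens in "I like New York".
offsets = [(0, 1), (2, 6), (7, 10), (11, 15)]

aligned = snap_char_span(offsets, 7, 15)                # "New York"
misaligned = snap_char_span(offsets, 8, 15)             # "ew York", strict
shrunk = snap_char_span(offsets, 8, 15, mode="inside")
grown = snap_char_span(offsets, 8, 15, mode="outside")
```

With a misaligned range such as characters 8 to 15, the strict mode yields no span, while the two snapping modes either shrink the range to the tokens fully inside it or grow it to every token it touches. Check the spaCy API documentation for the actual parameter name and accepted values in your installed version.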
👥 Contributors
Thanks to @abchapman93, @baranitharan2020, @bittlingmayer, @bjascob, @borijang, @BramVanroy, @chopeen, @danielvasic, @delzac, @DuyguA, @erip, @florijanstamenkovic, @graue70, @hiroshi-matsuda-rit, @holubvl3, @idoshr, @jgutix, @KKsharma99, @leyendecker, @lizhe2004, @MartinoMensio, @nipunsadvilkar, @Nuccy90, @oculusrepairo, @rahul1990gupta, @rasyidf, @robertsipek, @SamEdwardes, @snsten, @solarmist, @Stannislav, @tamuhey, @tilusnet, @vha14, @wannaphong, @zaibacu for the pull requests and contributions.