✨ New features and improvements
- Add `Token.tensor` and `Span.tensor` attributes.
- Support simple training format of `(text, annotations)` instead of only `(doc, gold)` for `nlp.evaluate`.
- Add support for `"lang_factory"` setting in model `meta.json` (see #4031).
- Also support `"requirements"` in `meta.json` to define packages for setup's `install_requires`.
- Improve `Pipe` base class methods and make them less presumptuous.
- Improve Danish and Korean tokenization.
- Improve error messages when deserializing model fails.
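
As a rough illustration of the two `meta.json` settings listed above, the sketch below builds a metadata dict in Python and serializes it. All values (`example_model`, `custom_en`, `example-package`) are hypothetical placeholders, and the comment on `"lang_factory"` reflects an assumed reading of the setting; see #4031 for the actual behavior.

```python
import json

# Hypothetical model metadata: every value here is an illustrative placeholder.
meta = {
    "lang": "en",
    "name": "example_model",
    "version": "0.0.1",
    "spacy_version": ">=2.1.7",
    # Assumption: names the language class used to create the pipeline,
    # while "lang" remains the plain language code.
    "lang_factory": "custom_en",
    # Extra packages intended to end up in the model package's install_requires.
    "requirements": ["example-package>=1.0.0"],
}

# Serialize to the text that would be written to meta.json.
meta_json = json.dumps(meta, indent=2)
print(meta_json)
```

Keeping `"requirements"` as a plain list of requirement strings mirrors how `install_requires` is normally written in a `setup.py`.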
🔴 Bug fixes
- Fix issue #3669, #3962: Fix dependency copy in `Span.as_doc` that could cause segfault.
- Fix issue #3968: Fix bug in per-entity scores.
- Fix issue #4000: Improve entity linking API.
- Fix issue #4022: Fix error when Korean text contains special characters.
- Fix issue #4030: Handle edge case when calling `TextCategorizer.predict` with empty `Doc`.
- Fix issue #4045: Correct `Span.sent` docs.
- Fix issue #4048: Fix `init-model` command if there's no vocab.
- Fix issue #4052: Improve per-type scoring of NER.
- Fix issue #4054: Ensure the `lang` of `nlp` and `nlp.vocab` stay consistent.
- Fix bugs in `Token.similarity` and `Span.similarity` when called via hook.
📖 Documentation and examples
- Add documentation for `gold.align` helper.
- Add more explicit section on processing text.
- Improve documentation on disabling pipeline components.
- Fix various typos and inconsistencies.
👥 Contributors
Thanks to @sorenlind, @pmbaumgartner, @svlandeg, @FallakAsad, @BreakBB, @adrianeboyd, @polm, @b1uec0in, @mdaudali and @ejarkm for the pull requests and contributions.