✨ New features and improvements
- Support for mypy 0.950+ and pydantic v1.9 (#10786).
- Prebuilt linux aarch64 wheels are now available for all spaCy dependencies distributed by @explosion.
- Min/max `{n,m}` operator for `Matcher` patterns (#10981).
- Language updates:
- Improved speed of vector lookups (#10992).
- For the parser, use C `saxpy`/`sgemm` provided by the `Ops` implementation in order to use Accelerate through `thinc-apple-ops` (#10773).
- Improved speed of `Example.get_aligned_parse` and `Example.get_aligned` (#10952).
- Improved speed of `StringStore` lookups (#10938).
- Updated `spacy project clone` to try both `main` and `master` branches by default (#10843).
- Added confidence threshold for named entity linker (#11016).
- Improved handling of Typer optional default values for `init_config_cli` (#10788).
- Added cycle detection in parser projectivization methods (#10877).
- Added counts for NER labels in `debug data` (#10960).
- Support for adding NVTX ranges to `TrainablePipe` components (#10965).
- Support env variable `SPACY_NUM_BUILD_JOBS` to specify the number of build jobs to run in parallel with `pip` (#11073).
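As a rough illustration of the min/max `{n,m}` operator listed above, the sketch below matches a token repeated between two and three times. It assumes spaCy v3.4 or later is installed; the pattern name `VERY_GOOD`, the example sentence and the variable names are made up for this example.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "very" repeated at least 2 and at most 3 times, followed by "good".
# "VERY_GOOD" is an arbitrary, illustrative pattern name.
pattern = [{"LOWER": "very", "OP": "{2,3}"}, {"LOWER": "good"}]
matcher.add("VERY_GOOD", [pattern])

doc = nlp("This is very very very good")
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)

# A single "very" is below the minimum, so nothing should match here.
short_doc = nlp("This is very good")
short_spans = [short_doc[start:end].text for _, start, end in matcher(short_doc)]
print(short_spans)
```

The matcher reports every span that satisfies the pattern, so the first document yields both the two-"very" and the three-"very" variants, not only the longest one.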
📦 Trained pipelines updates
We have added new pipelines for Croatian that use the trainable lemmatizer and floret vectors.
Package | UPOS | Parser LAS | NER F |
---|---|---|---|
`hr_core_news_sm` | 96.6 | 77.5 | 76.1 |
`hr_core_news_md` | 97.3 | 80.1 | 81.8 |
`hr_core_news_lg` | 97.5 | 80.4 | 83.0 |
🙏 Special thanks to @gtoffoli for help with the new pipelines!
The English pipelines have new word vectors:
Package | Model Version | TAG | Parser LAS | NER F |
---|---|---|---|---|
`en_core_web_md` | v3.3.0 | 97.3 | 90.1 | 84.6 |
`en_core_web_md` | v3.4.0 | 97.2 | 90.3 | 85.5 |
`en_core_web_lg` | v3.3.0 | 97.4 | 90.1 | 85.3 |
`en_core_web_lg` | v3.4.0 | 97.3 | 90.2 | 85.6 |
All CNN pipelines have been extended to add whitespace augmentation.
🔴 Bug fixes
- Fix issue #10960: Support hyphens in NER labels.
- Fix issue #10994: Fix horizontal spacing for spans in displaCy.
- Fix issue #11013: Check for any token with a vector in `Doc.has_vector`, distinguish 0-vectors and missing vectors in `similarity` warnings.
- Fix issue #11056: Don't use `get_array_module` in `textcat`.
- Fix issue #11092: Fix vertical alignment for spans in displaCy.
🚀 Notes about upgrading from v3.3
`Doc.has_vector` now matches `Token.has_vector` and `Span.has_vector`: it returns `True` if at least one token in the doc has a vector rather than checking only whether the vocab contains vectors.
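The behavior change described above can be checked with a small sketch. It assumes spaCy v3.4 or later and NumPy are installed; the 4-dimensional vector, the example words and the variable names are invented for illustration and do not come from any trained pipeline.

```python
import numpy
import spacy

nlp = spacy.blank("en")

# Assumption for this sketch: give exactly one word a (dummy) vector,
# so the vocab contains vectors but most words are not covered by them.
nlp.vocab.set_vector("hello", numpy.ones((4,), dtype="float32"))

doc_with = nlp("hello world")   # contains a token that has a vector
doc_without = nlp("foo bar")    # no token has a vector

print(doc_with.has_vector)
print(doc_without.has_vector)
```

Under v3.4 the second document should report `False`, because none of its tokens has a vector. Under v3.3 it would have reported `True`, because the vocab itself contains vectors. Code that used `doc.has_vector` as a proxy for "this pipeline ships vectors" may need to inspect `nlp.vocab.vectors` instead.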
📖 Documentation and examples
- spaCy universe additions:
  - Aim-spacy: An Aim-based spaCy experiment tracker.
  - Asent: Fast, flexible and transparent sentiment analysis.
  - spaCy fishing: Named entity disambiguation and linking on Wikidata in spaCy with Entity-Fishing.
  - spacy-report: Generates interactive reports for spaCy models.
👥 Contributors
@adrianeboyd, @danieldk, @ericholscher, @gorarakelyan, @honnibal, @ines, @jademlc, @kadarakos, @KennethEnevoldsen, @koaning, @Lucaterre, @maxTarlov, @philipvollet, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @sadovnychyi, @shadeMe, @shen-qin, @single-fingal, @svlandeg, @victorialslocum, @Zackere