explosion/spacy-models: de_core_news_sm-2.3.0

Downloads

Details: https://spacy.io/models/de#de_core_news_sm

File checksum: 87fe081677c54615b8f5b3e701b8279c929dc9b5ed2aed1545e2494b5cae8b01

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities.
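Once the package is installed (see Installation below), it loads like any other spaCy pipeline. The following minimal sketch is illustrative only; the example sentence and printed attributes are not part of the release notes.

import spacy

# Load the installed package by name.
nlp = spacy.load("de_core_news_sm")

doc = nlp("Angela Merkel besuchte gestern Berlin.")

# Per-token coarse POS tags, fine-grained tags and dependency labels.
for token in doc:
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text)

# Named entities, labelled LOC, MISC, ORG or PER.
for ent in doc.ents:
    print(ent.text, ent.label_)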

Feature Description
Name de_core_news_sm
Version 2.3.0
spaCy >=2.3.0,<2.4.0
Model size 14 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources TIGER Corpus, WikiNER
License MIT
Author Explosion

Label Scheme

Component Labels
tagger  $(, $,, $., ADJA, ADJD, ADV, APPO, APPR, APPRART, APZR, ART, CARD, FM, ITJ, KOKOM, KON, KOUI, KOUS, NE, NN, NNE, PDAT, PDS, PIAT, PIS, PPER, PPOSAT, PPOSS, PRELAT, PRELS, PRF, PROAV, PTKA, PTKANT, PTKNEG, PTKVZ, PTKZU, PWAT, PWAV, PWS, TRUNC, VAFIN, VAIMP, VAINF, VAPP, VMFIN, VMINF, VMPP, VVFIN, VVIMP, VVINF, VVIZU, VVPP, XY, _SP
parser  ROOT, ac, adc, ag, ams, app, avc, cc, cd, cj, cm, cp, cvc, da, dep, dm, ep, ju, mnr, mo, ng, nk, nmc, oa, oc, og, op, par, pd, pg, ph, pm, pnc, punct, rc, re, rs, sb, sbp, svp, uc, vo
ner  LOC, MISC, ORG, PER
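Label descriptions can also be looked up programmatically. The sketch below assumes the model is installed and uses spacy.explain for individual labels plus the label sets registered on each trained component; the example labels chosen here are arbitrary.

import spacy

nlp = spacy.load("de_core_news_sm")

# Human-readable descriptions, where spaCy's glossary provides them.
print(spacy.explain("PPER"))  # fine-grained tag from the STTS/TIGER tag set
print(spacy.explain("LOC"))   # NER label

# Label sets registered on each component of this pipeline.
print(nlp.get_pipe("tagger").labels)
print(nlp.get_pipe("parser").labels)
print(nlp.get_pipe("ner").labels)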

Accuracy

Type Score
LAS  90.13
UAS  92.32
TOKEN_ACC  99.92
TAGS_ACC  97.53
ENTS_F  83.35
ENTS_P  83.92
ENTS_R  82.78

Because the named entity annotations come from Wikipedia (the WikiNER corpus), the NER component may perform inconsistently across genres, for example on social media text. The reported NER accuracy refers to the automatically generated "silver standard" annotations in WikiNER; accuracy measured against these annotations tends to be higher than it would be against manually corrected, gold-standard human annotations.

Installation

pip install "spacy>=2.3.0,<2.4.0"
python -m spacy download de_core_news_sm
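Note that this release targets spaCy >=2.3.0,<2.4.0, as listed in the table above. As a quick sanity check after installation, the loaded pipeline's metadata can be inspected; this snippet is a sketch, not part of the official release notes.

import spacy

# Confirm the expected package version and pipeline components are available.
nlp = spacy.load("de_core_news_sm")
print(nlp.meta["name"], nlp.meta["version"])  # expected: de_core_news_sm 2.3.0
print(nlp.pipe_names)                         # expected: ['tagger', 'parser', 'ner']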
