de_core_news_md-2.3.0 (GitHub release, explosion/spacy-models)


Details: https://spacy.io/models/de#de_core_news_md

File checksum: 2636294e1e9351cca312a88659859d367684cca968a83b837525030078c8b14f
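The checksum above is 64 hexadecimal digits, which is consistent with SHA-256, but the release notes do not name the algorithm. The sketch below assumes SHA-256. It streams a downloaded archive through Python's hashlib and compares the digest with the published value. The helper names `sha256_of` and `matches_checksum` are hypothetical, and the path you pass in is whatever archive you downloaded.

```python
import hashlib

# Published value from the release notes (ASSUMED to be a SHA-256 digest).
EXPECTED = "2636294e1e9351cca312a88659859d367684cca968a83b837525030078c8b14f"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: `matches_checksum("de_core_news_md-2.3.0.tar.gz")` returns `True` only if the file's digest equals the published value; the filename here is illustrative. Reading in chunks keeps memory flat regardless of archive size.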

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns word vectors, POS tags, dependency parses and named entities. Word vectors trained using FastText CBOW on Wikipedia and OSCAR (Common Crawl).

Feature       Description
Name          de_core_news_md
Version       2.3.0
spaCy         >=2.3.0,<2.4.0
Model size    44 MB
Pipeline      tagger, parser, ner
Vectors       500000 keys, 20000 unique vectors (300 dimensions)
Sources       TIGER Corpus, WikiNER, OSCAR (Common Crawl), Wikipedia (20200201)
License       MIT
Author        Explosion
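The Vectors row says 500000 keys share 20000 unique vectors: many keys point to the same stored row, so only 20000 × 300 values need to be kept. The pure-Python sketch below illustrates that key-to-row indirection with a toy table and computes the storage needed for the unique rows, assuming 32-bit floats (a common dtype for word vectors, but an assumption here). The class `SharedVectorTable` and the helper `table_bytes` are hypothetical and are not spaCy's `Vectors` API.

```python
class SharedVectorTable:
    """Toy table in which many string keys can share one stored vector row."""

    def __init__(self, rows):
        self.rows = rows      # unique vectors (lists of floats)
        self.key2row = {}     # key -> index into self.rows

    def add(self, key, row):
        if not 0 <= row < len(self.rows):
            raise IndexError(f"row {row} out of range")
        self.key2row[key] = row

    def get(self, key):
        # Two keys mapped to the same row return the very same vector object.
        return self.rows[self.key2row[key]]

    @property
    def n_keys(self):
        return len(self.key2row)

    @property
    def n_unique(self):
        return len(self.rows)


def table_bytes(n_rows, dims, bytes_per_value=4):
    """Bytes for the unique vectors only, ASSUMING float32 (4-byte) values."""
    return n_rows * dims * bytes_per_value


# Toy example (illustrative values): three keys, two stored rows.
table = SharedVectorTable([[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]])
table.add("Hund", 0)
table.add("Hunde", 0)   # shares row 0 with "Hund"
table.add("Auto", 1)

# Figures from the release notes: 20000 unique vectors x 300 dimensions.
print(table_bytes(20000, 300))  # 24000000
```

Under the float32 assumption, the unique rows take about 24 MB, while the 500000 keys only add entries to the key-to-row mapping (an average of 25 keys per stored row).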

Label Scheme

Component Labels
tagger  $(, $,, $., ADJA, ADJD, ADV, APPO, APPR, APPRART, APZR, ART, CARD, FM, ITJ, KOKOM, KON, KOUI, KOUS, NE, NN, NNE, PDAT, PDS, PIAT, PIS, PPER, PPOSAT, PPOSS, PRELAT, PRELS, PRF, PROAV, PTKA, PTKANT, PTKNEG, PTKVZ, PTKZU, PWAT, PWAV, PWS, TRUNC, VAFIN, VAIMP, VAINF, VAPP, VMFIN, VMINF, VMPP, VVFIN, VVIMP, VVINF, VVIZU, VVPP, XY, _SP
parser  ROOT, ac, adc, ag, ams, app, avc, cc, cd, cj, cm, cp, cvc, da, dep, dm, ep, ju, mnr, mo, ng, nk, nmc, oa, oc, og, op, par, pd, pg, ph, pm, pnc, punct, rc, re, rs, sb, sbp, svp, uc, vo
ner  LOC, MISC, ORG, PER

Accuracy

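Metric       Score   Meaning
LAS          90.90   labeled attachment score (parser)
UAS          92.84   unlabeled attachment score (parser)
TOKEN_ACC    99.92   tokenization accuracy
TAGS_ACC     97.93   part-of-speech tag accuracy (tagger)
ENTS_F       84.96   named-entity F-score (ner)
ENTS_P       85.31   named-entity precision (ner)
ENTS_R       84.61   named-entity recall (ner)

All scores are percentages.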
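LAS and UAS are the standard dependency-parsing attachment scores: UAS is the share of tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The sketch below computes both from parallel lists of (head, label) pairs. It counts every token; whether the reported scores exclude punctuation is not stated here, so treat that as an open detail. The function name `attachment_scores` and the example parse are hypothetical.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold and pred are equal-length lists of (head_index, label) tuples,
    one per token, in the same token order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    both_ok = sum(g == p for g, p in zip(gold, pred))        # head and label
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * both_ok / n


# "Die Katze schläft ." with labels from this model's parser scheme;
# token 2 is the root and points to itself.
gold = [(1, "nk"), (2, "sb"), (2, "ROOT"), (2, "punct")]
pred = [(1, "nk"), (2, "oa"), (2, "ROOT"), (1, "punct")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 75.0 50.0
```

Because a token only counts toward LAS when it already counts toward UAS, LAS can never exceed UAS, which is consistent with 90.90 versus 92.84 in the table.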

Because the model is trained on Wikipedia, it may perform inconsistently on many other genres, such as social media text. The NER scores are measured against the "silver standard" annotations in the WikiNER corpus, and accuracy measured against these annotations tends to be higher than it would be against correct human annotations.
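ENTS_F is the F-score over entities, the harmonic mean of precision and recall. Recomputing it from the ENTS_P and ENTS_R values in the accuracy table is a quick sanity check; the function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units, e.g. %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R from the accuracy table.
print(round(f_score(85.31, 84.61), 2))  # 84.96, matching ENTS_F
```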

Installation

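pip install "spacy>=2.3.0,<2.4.0"
python -m spacy download de_core_news_md

The spaCy pin matches the compatibility range listed above; this model is only supported on spaCy versions within it.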
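The model declares compatibility with spaCy >=2.3.0,<2.4.0. Before loading it, you can check the installed spaCy version against that range; `importlib.metadata.version("spacy")` (Python 3.8+) or `spacy.__version__` gives the installed version string. The sketch below handles plain numeric versions only (pre-release suffixes such as `rc1` are rejected), and the function names `parse_version` and `is_compatible` are hypothetical.

```python
import re

# Bounds from the release notes: spaCy >=2.3.0,<2.4.0
LOWER, UPPER = (2, 3, 0), (2, 4, 0)


def parse_version(text):
    """Turn '2.3.5' into (2, 3, 5); raise ValueError on non-numeric versions."""
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"unsupported version string: {text!r}")
    parts = tuple(int(p) for p in text.split("."))
    return parts + (0,) * (3 - len(parts))  # pad '2.3' to (2, 3, 0)


def is_compatible(spacy_version):
    """True if the version lies in the half-open range [LOWER, UPPER)."""
    return LOWER <= parse_version(spacy_version) < UPPER


for candidate in ["2.3.0", "2.3.7", "2.4.0", "3.0.0", "2.2.4"]:
    print(candidate, is_compatible(candidate))
```

Tuple comparison gives the half-open range directly. For the full PEP 440 version syntax, the third-party `packaging` library's `SpecifierSet(">=2.3.0,<2.4.0")` is the more robust choice.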
