Details: https://spacy.io/models/de#de_core_news_md
File checksum: 2636294e1e9351cca312a88659859d367684cca968a83b837525030078c8b14f
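The checksum above can be used to confirm that a downloaded model archive is intact. The source does not name the hash algorithm; the sketch below assumes SHA-256 because the digest is 64 hexadecimal characters long. The helper names are illustrative, and the archive path is whatever file you downloaded.

```python
import hashlib

# Checksum published above. Treating it as SHA-256 is an assumption
# based on its length (64 hex characters = 256 bits).
EXPECTED_CHECKSUM = "2636294e1e9351cca312a88659859d367684cca968a83b837525030078c8b14f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_CHECKSUM):
    """Return True if the file's digest equals the expected checksum."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_checksum(path_to_archive)` returns `True` only when the file's digest equals the published value.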
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns word vectors, POS tags, dependency parses and named entities. Word vectors trained using FastText CBOW on Wikipedia and OSCAR (Common Crawl).
Feature | Description |
---|---|
Name | de_core_news_md |
Version | 2.3.0 |
spaCy | >=2.3.0,<2.4.0 |
Model size | 44 MB |
Pipeline | tagger, parser, ner |
Vectors | 500000 keys, 20000 unique vectors (300 dimensions) |
Sources | TIGER Corpus, WikiNER, OSCAR (Common Crawl), Wikipedia (20200201) |
License | MIT |
Author | Explosion |
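The spaCy requirement in the table means the model is intended for the 2.3.x series only. As a minimal sketch of how such a constraint can be checked, the helper below compares plain numeric version strings against the requirement. The function names are illustrative and not part of spaCy, and pre-release suffixes such as `2.3.0.dev0` are not handled.

```python
import operator

# Requirement string copied from the table above.
SPACY_REQUIREMENT = ">=2.3.0,<2.4.0"

_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(version):
    """Turn '2.3.0' into (2, 3, 0), padding short versions with zeros."""
    parts = tuple(int(part) for part in version.strip().split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies(version, requirement=SPACY_REQUIREMENT):
    """Return True if `version` meets every comma-separated clause."""
    installed = parse_version(version)
    for clause in requirement.split(","):
        clause = clause.strip()
        # Check two-character operators first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                required = parse_version(clause[len(symbol):])
                if not _OPERATORS[symbol](installed, required):
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True
```

With an installed spaCy, `satisfies(spacy.__version__)` would report whether this model release matches it. For real projects, `SpecifierSet` from the `packaging` library handles the full version grammar and is the more robust choice.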
Label Scheme
Component | Labels |
---|---|
tagger | `$(`, `$,`, `$.`, `ADJA`, `ADJD`, `ADV`, `APPO`, `APPR`, `APPRART`, `APZR`, `ART`, `CARD`, `FM`, `ITJ`, `KOKOM`, `KON`, `KOUI`, `KOUS`, `NE`, `NN`, `NNE`, `PDAT`, `PDS`, `PIAT`, `PIS`, `PPER`, `PPOSAT`, `PPOSS`, `PRELAT`, `PRELS`, `PRF`, `PROAV`, `PTKA`, `PTKANT`, `PTKNEG`, `PTKVZ`, `PTKZU`, `PWAT`, `PWAV`, `PWS`, `TRUNC`, `VAFIN`, `VAIMP`, `VAINF`, `VAPP`, `VMFIN`, `VMINF`, `VMPP`, `VVFIN`, `VVIMP`, `VVINF`, `VVIZU`, `VVPP`, `XY`, `_SP` |
parser | `ROOT`, `ac`, `adc`, `ag`, `ams`, `app`, `avc`, `cc`, `cd`, `cj`, `cm`, `cp`, `cvc`, `da`, `dep`, `dm`, `ep`, `ju`, `mnr`, `mo`, `ng`, `nk`, `nmc`, `oa`, `oc`, `og`, `op`, `par`, `pd`, `pg`, `ph`, `pm`, `pnc`, `punct`, `rc`, `re`, `rs`, `sb`, `sbp`, `svp`, `uc`, `vo` |
ner | `LOC`, `MISC`, `ORG`, `PER` |
Accuracy
Type | Score |
---|---|
LAS | 90.90 |
UAS | 92.84 |
TOKEN_ACC | 99.92 |
TAGS_ACC | 97.93 |
ENTS_F | 84.96 |
ENTS_P | 85.31 |
ENTS_R | 84.61 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
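The three NER scores are related: ENTS_F is the F1 score, the harmonic mean of precision (ENTS_P) and recall (ENTS_R). A short sketch reproducing the reported value from the other two, where the function name is illustrative:

```python
def f_score(precision, recall):
    """Return the F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R as reported in the accuracy table.
ents_f = f_score(85.31, 84.61)
print(round(ents_f, 2))  # 84.96, matching the reported ENTS_F
```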
Installation
pip install "spacy>=2.3.0,<2.4.0"
python -m spacy download de_core_news_md
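Once installed, the model can be loaded by name with `spacy.load`. The following is a minimal usage sketch, assuming a compatible spaCy 2.3.x installation and a downloaded model. The helper names `summarize` and `analyze` and the example sentence are illustrative, not part of spaCy.

```python
def summarize(doc):
    """Collect per-token and per-entity annotations from a processed Doc."""
    # tag_ holds the fine-grained tagger label and dep_ the parser label,
    # both taken from the label scheme listed above.
    tokens = [(token.text, token.tag_, token.dep_, token.head.text) for token in doc]
    # Entity labels come from the ner component: LOC, MISC, ORG or PER.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


def analyze(text, model_name="de_core_news_md"):
    """Load the model and return the annotations for `text`."""
    # Imported lazily so that summarize() can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    return summarize(nlp(text))
```

For example, `tokens, entities = analyze("Angela Merkel besuchte Berlin.")` returns one tuple per token and one per recognized entity. The exact labels depend on the model's predictions, so no expected output is shown here.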