Details: https://spacy.io/models/de#de_core_news_lg
File checksum: cd3c565cfb6dd3df535b109077466685986059b18387259cc1fa7cf2db833d66
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns word vectors, POS tags, dependency parses and named entities. Word vectors trained using FastText CBOW on Wikipedia and OSCAR (Common Crawl).
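A minimal sketch of how the annotations described above might be read back, assuming the model package is installed alongside a compatible spaCy 2.3.x. The example sentence and the helper names `model_available` and `summarize` are illustrative and not part of the model card.

```python
import importlib.util

MODEL = "de_core_news_lg"


def model_available(name: str = MODEL) -> bool:
    """Return True if a package with this name can be found in the environment."""
    return importlib.util.find_spec(name) is not None


def summarize(doc):
    """Collect per-token tags and parses plus entity spans from a processed Doc."""
    tokens = [(t.text, t.tag_, t.dep_, t.head.text) for t in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


if model_available():
    import spacy  # imported lazily so the sketch still runs without the model

    nlp = spacy.load(MODEL)
    # Illustrative sentence, not taken from the training corpora.
    doc = nlp("Angela Merkel besuchte im Juni die Siemens AG in München.")
    tokens, entities = summarize(doc)
    for row in tokens:
        print(row)  # (text, fine-grained tag, dependency label, head text)
    print(entities)  # entity labels come from LOC, MISC, ORG, PER
    print(doc[0].vector.shape)  # word vector of the first token
else:
    print(f"{MODEL} is not installed; see the Installation section.")
```

The guard around `spacy.load` is only there so the snippet degrades gracefully when the model is missing; in a real project the load call would usually sit at module level.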
Feature | Description |
---|---|
Name | de_core_news_lg |
Version | 2.3.0 |
spaCy | >=2.3.0,<2.4.0 |
Model size | 544 MB |
Pipeline | tagger, parser, ner |
Vectors | 500000 keys, 500000 unique vectors (300 dimensions) |
Sources | TIGER Corpus, WikiNER, OSCAR (Common Crawl), Wikipedia (20200201) |
License | MIT |
Author | Explosion |
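The spaCy row above restricts the model to the 2.3.x series. As a rough illustration of what that range means, the following sketch checks version strings against it. The helpers `parse_version` and `is_compatible` are hypothetical, and the parsing assumes plain three-part numeric versions (no pre-release suffixes).

```python
def parse_version(version: str) -> tuple:
    """Turn '2.3.5' into (2, 3, 5); only plain numeric releases are handled."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(spacy_version: str) -> bool:
    """Check a version against the '>=2.3.0,<2.4.0' range from the table."""
    return (2, 3, 0) <= parse_version(spacy_version) < (2, 4, 0)


for candidate in ("2.2.4", "2.3.0", "2.3.7", "2.4.0", "3.0.0"):
    print(candidate, is_compatible(candidate))
```

With spaCy installed, `is_compatible(spacy.__version__)` would apply the same check to the local installation.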
Label Scheme
Component | Labels |
---|---|
tagger | $( , $, , $. , ADJA , ADJD , ADV , APPO , APPR , APPRART , APZR , ART , CARD , FM , ITJ , KOKOM , KON , KOUI , KOUS , NE , NN , NNE , PDAT , PDS , PIAT , PIS , PPER , PPOSAT , PPOSS , PRELAT , PRELS , PRF , PROAV , PTKA , PTKANT , PTKNEG , PTKVZ , PTKZU , PWAT , PWAV , PWS , TRUNC , VAFIN , VAIMP , VAINF , VAPP , VMFIN , VMINF , VMPP , VVFIN , VVIMP , VVINF , VVIZU , VVPP , XY , _SP |
parser | ROOT , ac , adc , ag , ams , app , avc , cc , cd , cj , cm , cp , cvc , da , dep , dm , ep , ju , mnr , mo , ng , nk , nmc , oa , oc , og , op , par , pd , pg , ph , pm , pnc , punct , rc , re , rs , sb , sbp , svp , uc , vo |
ner | LOC , MISC , ORG , PER |
Accuracy
Type | Score |
---|---|
LAS | 91.15 |
UAS | 92.99 |
TOKEN_ACC | 99.92 |
TAGS_ACC | 98.12 |
ENTS_F | 86.11 |
ENTS_P | 86.28 |
ENTS_R | 85.94 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
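The three NER figures in the table are related: ENTS_F is the harmonic mean of ENTS_P (precision) and ENTS_R (recall). A short sketch that recomputes it from the reported values, where the function name `f_score` is illustrative:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ents_p, ents_r = 86.28, 85.94  # values from the Accuracy table
print(round(f_score(ents_p, ents_r), 2))  # 86.11, matching ENTS_F
```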
Installation
pip install "spacy>=2.3.0,<2.4.0"
python -m spacy download de_core_news_lg
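If the model archive is downloaded manually instead, the file checksum listed at the top can be compared against the local file. The sketch below assumes the checksum is a SHA-256 digest (its 64 hexadecimal characters are consistent with that, but the release notes do not name the algorithm). The helpers `sha256_of` and `matches_release` are hypothetical.

```python
import hashlib

# Checksum as published in the release notes; SHA-256 is an assumption.
EXPECTED = "cd3c565cfb6dd3df535b109077466685986059b18387259cc1fa7cf2db833d66"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path, expected=EXPECTED):
    """Return True if the file's digest equals the published checksum."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_release` with the path of the downloaded archive returns `True` only when the digests agree; a mismatch suggests a corrupted or different file.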