Details: https://spacy.io/models/pl#pl_core_news_lg
File checksum: 594225dad3afebb5ea78ee2780b83216324f203de0f7cd996104867dd1858ca9
Polish multi-task CNN trained on the National Corpus of Polish and UD Polish PDB. Assigns word vectors, POS tags, lemmas, dependency parses and named entities. The word vectors were trained using FastText CBOW on Wikipedia and OSCAR (Common Crawl).
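The file checksum listed above can be used to confirm that a downloaded archive is intact. The sketch below assumes the value is a SHA-256 hex digest (it has 64 hex characters, but the page does not name the algorithm); the helper names `sha256_of_file` and `matches_checksum` are hypothetical and not part of spaCy.

```python
import hashlib

# Checksum copied from this page. ASSUMPTION: it is a SHA-256 hex digest.
EXPECTED_SHA256 = "594225dad3afebb5ea78ee2780b83216324f203de0f7cd996104867dd1858ca9"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_checksum` with the path of the downloaded model archive returns `False` if the file is corrupted or if the assumption about the hash algorithm is wrong.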
Feature | Description |
---|---|
Name | pl_core_news_lg |
Version | 2.3.0 |
spaCy | >=2.3.0,<2.4.0 |
Model size | 576 MB |
Pipeline | tagger, parser, ner |
Vectors | 500000 keys, 500000 unique vectors (300 dimensions) |
Sources | National Corpus of Polish (Mirosław Bańko, Rafał L. Górski, Barbara Lewandowska-Tomaszczyk, Marek Łaziński, Piotr Pęzik, Adam Przepiórkowski); UD Polish SZ v2.3 (Alina Wróblewska, Daniel Zeman, Jan Mašek, Rudolf Rosa); Morfeusz 2 Lemmas from the Grammatical Dictionary of Polish (SGJP) (Marcin Woliński, Zbigniew Bronk, Włodzimierz Gruszczyński, Witold Kieraś, Zygmunt Saloni, Danuta Skowrońska, Robert Wołosz); OSCAR (Common Crawl); Wikipedia (20200301) |
License | GPL |
Author | Explosion and Ryszard Tuora |
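The spaCy requirement in the table, `>=2.3.0,<2.4.0`, means the model targets the 2.3.x release line only. As a rough illustration, the check can be sketched with plain tuple comparison. The helpers `parse_version` and `is_compatible` are hypothetical names, and the sketch only handles purely numeric versions such as `2.3.5`, not pre-release or development tags.

```python
def parse_version(text):
    """Turn '2.3.5' into (2, 3, 5), padding short versions like '2.3' with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_compatible(spacy_version, lower="2.3.0", upper="2.4.0"):
    """True if lower <= spacy_version < upper, mirroring '>=2.3.0,<2.4.0'."""
    return parse_version(lower) <= parse_version(spacy_version) < parse_version(upper)


print(is_compatible("2.3.5"))  # True: inside the 2.3.x line
print(is_compatible("2.4.0"))  # False: the upper bound is exclusive
print(is_compatible("3.0.0"))  # False: a different major version
```

In practice the installed version would come from `spacy.__version__`. A dedicated version-parsing library is a better fit if non-numeric version suffixes need to be handled.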
Label Scheme
Component | Labels |
---|---|
tagger | ADJ, ADJA, ADJC, ADJP, ADV, AGLT, BEDZIE, BREV, BURK, COMP, CONJ, DEPR, FIN, GER, IMPS, IMPT, INF, INTERJ, INTERP, NUM, NUMCOL, PACT, PANT, PCON, PPAS, PPRON12, PPRON3, PRAET, PRED, PREP, QUB, SIEBIE, SUBST, WINIEN, XXX, _SP |
parser | ROOT, acl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, conj, cop, dep, det, det:numgov, expl:pv, iobj, mark, nmod, nsubj, nsubj:pass, nummod, obj, obl, obl:arg, punct, xcomp |
ner | date, geogName, orgName, persName, placeName, time |
Accuracy
Type | Score |
---|---|
LAS | 85.52 |
UAS | 90.80 |
TOKEN_ACC | 99.83 |
TAGS_ACC | 98.45 |
ENTS_F | 85.67 |
ENTS_P | 85.61 |
ENTS_R | 85.72 |
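The three entity scores are related: ENTS_F is the F1 score, the harmonic mean of precision (ENTS_P) and recall (ENTS_R). The short check below recomputes it from the two-decimal values in the table. Because those inputs are already rounded, the result is only expected to agree with the reported 85.67 to within about 0.01. The function name `f_score` is chosen for illustration.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1), on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the accuracy table (percentages, rounded to two decimals).
ents_p, ents_r, ents_f_reported = 85.61, 85.72, 85.67

ents_f = f_score(ents_p, ents_r)
print(f"recomputed F1: {ents_f:.4f}")
print(f"difference from reported: {abs(ents_f - ents_f_reported):.4f}")
```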
Installation
pip install spacy
python -m spacy download pl_core_news_lg
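Once both commands have completed, the pipeline can be loaded by name. The following is a minimal sketch of reading tags, lemmas, dependency labels and named entities from a processed text, assuming a spaCy 2.3.x installation with the model downloaded. The helper names and the example sentence are illustrative, and the actual output depends on the model's predictions.

```python
def load_pipeline(name="pl_core_news_lg"):
    """Load the installed model; spaCy is imported lazily inside this function."""
    import spacy

    return spacy.load(name)


def token_rows(doc):
    """One tuple per token: text, lemma, fine-grained tag, dependency label, head text."""
    return [(t.text, t.lemma_, t.tag_, t.dep_, t.head.text) for t in doc]


def entity_rows(doc):
    """One tuple per named entity: text and label (e.g. persName, placeName)."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    nlp = load_pipeline()
    # Illustrative sentence, not taken from the model documentation.
    doc = nlp("Adam Mickiewicz urodził się w Nowogródku w 1798 roku.")
    for row in token_rows(doc):
        print(*row, sep="\t")
    for text, label in entity_rows(doc):
        print(text, label)
```

Calling `main()` prints one line per token followed by one line per entity. The tag, dependency and entity values are drawn from the label scheme listed above.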