Details: https://spacy.io/models/pl#pl_core_news_sm
File checksum:
513a5a5c8c0ee0166d5001352b8447c8a71cb5c9553c0e0d635fc7371dc5d598
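A downloaded archive can be checked against the checksum above before installing it. The sketch below assumes the 64-character hex value is a SHA-256 digest (the page does not name the algorithm); the helper names `sha256_of_file` and `matches_checksum` are hypothetical.

```python
import hashlib

# Checksum published for the model file. Assumption: it is a SHA-256 digest.
EXPECTED_SHA256 = "513a5a5c8c0ee0166d5001352b8447c8a71cb5c9553c0e0d635fc7371dc5d598"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_checksum` with the path of the downloaded archive returns `False` if the file is corrupted or is a different release.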
Polish multi-task CNN trained on the National Corpus of Polish and UD Polish PDB. Assigns context-specific token vectors, POS tags, lemmas, dependency parses and named entities.
Feature | Description |
---|---|
Name | pl_core_news_sm |
Version | 2.3.0 |
spaCy | >=2.3.0,<2.4.0 |
Model size | 46 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | National Corpus of Polish (Mirosław Bańko, Rafał L. Górski, Barbara Lewandowska-Tomaszczyk, Marek Łaziński, Piotr Pęzik, Adam Przepiórkowski) UD Polish SZ v2.3 (Wróblewska, Alina; Zeman, Daniel; Mašek, Jan; Rosa, Rudolf) Morfeusz 2 Lemmas from the Grammatical Dictionary of Polish (SGJP) (Marcin Woliński, Zbigniew Bronk, Włodzimierz Gruszczyński, Witold Kieraś, Zygmunt Saloni, Danuta Skowrońska, Robert Wołosz) |
License | GPL |
Author | Explosion and Ryszard Tuora |
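The spaCy constraint in the table, `>=2.3.0,<2.4.0`, means the model is meant for the 2.3.x series only. A minimal sketch of that range check using only the standard library follows; `parse_version` and `is_compatible` are hypothetical helpers, and as a simplification any pre-release or build suffix after the numeric part is ignored.

```python
import re


def parse_version(version):
    """Return the leading (major, minor, patch) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0, e.g. "2.3" -> (2, 3, 0).
    return int(major), int(minor), int(patch or 0)


def is_compatible(spacy_version, lower=(2, 3, 0), upper=(2, 4, 0)):
    """Check lower <= version < upper, mirroring '>=2.3.0,<2.4.0'."""
    return lower <= parse_version(spacy_version) < upper
```

With spaCy installed, the check could be run as `is_compatible(spacy.__version__)`.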
Label Scheme
Component | Labels |
---|---|
tagger | ADJ, ADJA, ADJC, ADJP, ADV, AGLT, BEDZIE, BREV, BURK, COMP, CONJ, DEPR, FIN, GER, IMPS, IMPT, INF, INTERJ, INTERP, NUM, NUMCOL, PACT, PANT, PCON, PPAS, PPRON12, PPRON3, PRAET, PRED, PREP, QUB, SIEBIE, SUBST, WINIEN, XXX, _SP |
parser | ROOT, acl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, conj, cop, dep, det, det:numgov, expl:pv, iobj, mark, nmod, nsubj, nsubj:pass, nummod, obj, obl, obl:arg, punct, xcomp |
ner | date, geogName, orgName, persName, placeName, time |
Accuracy
Type | Score |
---|---|
LAS | 78.09 |
UAS | 85.61 |
TOKEN_ACC | 99.83 |
TAGS_ACC | 98.03 |
ENTS_F | 81.32 |
ENTS_P | 81.90 |
ENTS_R | 80.75 |
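The three entity scores are related: assuming ENTS_F is the usual F1 score, it is the harmonic mean of ENTS_P (precision) and ENTS_R (recall). The short sketch below recomputes it from the two reported values; `f_score` is a hypothetical helper name.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the accuracy table (percentages).
ents_p = 81.90
ents_r = 80.75

ents_f = f_score(ents_p, ents_r)
print(f"ENTS_F = {ents_f:.2f}")  # ENTS_F = 81.32
```

The result agrees with the reported ENTS_F to two decimal places.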
Installation
pip install spacy
python -m spacy download pl_core_news_sm
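Once both commands have completed, the pipeline can be loaded by name. The following is a minimal usage sketch, not an official example: the sample sentence and the helpers `model_installed` and `describe` are made up for illustration, and it assumes a spaCy version in the supported 2.3.x range.

```python
import importlib.util

MODEL_NAME = "pl_core_news_sm"


def model_installed(name=MODEL_NAME):
    """Return True if the model package can be found by the import system."""
    return importlib.util.find_spec(name) is not None


def describe(text, model_name=MODEL_NAME):
    """Run the pipeline and collect tagger, parser and NER output."""
    import spacy  # imported here so the module loads without spaCy present

    nlp = spacy.load(model_name)
    doc = nlp(text)
    # One tuple per token: text, fine-grained tag, dependency label, head.
    tokens = [(t.text, t.tag_, t.dep_, t.head.text) for t in doc]
    # One tuple per named entity: text span and entity label.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


if __name__ == "__main__":
    if model_installed():
        tokens, entities = describe("Adam Mickiewicz urodził się w Nowogródku.")
        for row in tokens:
            print(row)
        print(entities)
    else:
        print(f"{MODEL_NAME} is not installed; run the download command first.")
```

The tags, dependency labels and entity labels printed here come from the sets listed under Label Scheme.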