github explosion/spacy-models pl_core_news_sm-2.3.0

Details: https://spacy.io/models/pl#pl_core_news_sm

File checksum: 513a5a5c8c0ee0166d5001352b8447c8a71cb5c9553c0e0d635fc7371dc5d598
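
The checksum above has 64 hexadecimal digits, which is consistent with SHA-256, although the release notes do not name the algorithm. Under that assumption, a downloaded archive could be verified with a sketch like the following. The function names are illustrative and not part of spaCy.

```python
import hashlib

# Checksum copied from the release notes; assumed (not confirmed) to be SHA-256.
EXPECTED_SHA256 = "513a5a5c8c0ee0166d5001352b8447c8a71cb5c9553c0e0d635fc7371dc5d598"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected value, ignoring letter case."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_checksum("pl_core_news_sm-2.3.0.tar.gz")` would then return `True` only for an intact download. The archive name here is an assumption about what the downloaded file is called.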

Polish multi-task CNN trained on the National Corpus of Polish and UD Polish PDB. Assigns context-specific token vectors, POS tags, lemmas, dependency parses and named entities.

| Feature | Description |
| --- | --- |
| Name | pl_core_news_sm |
| Version | 2.3.0 |
| spaCy | >=2.3.0,<2.4.0 |
| Model size | 46 MB |
| Pipeline | tagger, parser, ner |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | National Corpus of Polish (Mirosław Bańko, Rafał L. Górski, Barbara Lewandowska-Tomaszczyk, Marek Łaziński, Piotr Pęzik, Adam Przepiórkowski) |
| | UD Polish SZ v2.3 (Wróblewska, Alina; Zeman, Daniel; Mašek, Jan; Rosa, Rudolf) |
| | Morfeusz 2 Lemmas from the Grammatical Dictionary of Polish (SGJP) (Marcin Woliński, Zbigniew Bronk, Włodzimierz Gruszczyński, Witold Kieraś, Zygmunt Saloni, Danuta Skowrońska, Robert Wołosz) |
| License | GPL |
| Author | Explosion and Ryszard Tuora |
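
The spaCy requirement `>=2.3.0,<2.4.0` means the model is meant for any 2.3.x release and nothing outside that range. A minimal sketch of that range check, using plain tuple comparison, is shown here. The helper names are hypothetical, and only simple numeric versions such as `2.3.5` are handled; pre-release suffixes are not.

```python
def parse_version(text):
    """Turn '2.3.5' into (2, 3, 5), padding short versions such as '2.3' with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # no-op when there are already 3+ parts
    return tuple(parts)


def is_compatible(spacy_version, lower=(2, 3, 0), upper=(2, 4, 0)):
    """Check lower <= version < upper, mirroring '>=2.3.0,<2.4.0'."""
    return lower <= parse_version(spacy_version) < upper


print(is_compatible("2.3.2"))  # True
print(is_compatible("2.4.0"))  # False
```

In practice the argument would typically be `spacy.__version__`, and pip enforces the same constraint when the model package is installed.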

Label Scheme

| Component | Labels |
| --- | --- |
| tagger | ADJ, ADJA, ADJC, ADJP, ADV, AGLT, BEDZIE, BREV, BURK, COMP, CONJ, DEPR, FIN, GER, IMPS, IMPT, INF, INTERJ, INTERP, NUM, NUMCOL, PACT, PANT, PCON, PPAS, PPRON12, PPRON3, PRAET, PRED, PREP, QUB, SIEBIE, SUBST, WINIEN, XXX, _SP |
| parser | ROOT, acl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, conj, cop, dep, det, det:numgov, expl:pv, iobj, mark, nmod, nsubj, nsubj:pass, nummod, obj, obl, obl:arg, punct, xcomp |
| ner | date, geogName, orgName, persName, placeName, time |

Accuracy

| Type | Score |
| --- | --- |
| LAS | 78.09 |
| UAS | 85.61 |
| TOKEN_ACC | 99.83 |
| TAGS_ACC | 98.03 |
| ENTS_F | 81.32 |
| ENTS_P | 81.90 |
| ENTS_R | 80.75 |
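
The three entity scores are related: assuming ENTS_F is the usual balanced F-score (the harmonic mean of precision and recall), it can be recomputed from ENTS_P and ENTS_R. The short check below is an illustration, not part of the spaCy evaluation code.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R as reported in the accuracy table.
ents_f = f_score(81.90, 80.75)
print(round(ents_f, 2))  # 81.32, matching the reported ENTS_F
```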

Installation

pip install spacy
python -m spacy download pl_core_news_sm
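
Once both commands have succeeded, the model can be loaded by name. The sketch below prints the fine-grained tag, dependency label and head for each token, followed by the named entities. The helper functions and the sample sentence are illustrative assumptions, and the script exits with a message if spaCy or the model is not available.

```python
def token_rows(doc):
    """Return (text, tag, dependency label, head text) for each token."""
    return [(tok.text, tok.tag_, tok.dep_, tok.head.text) for tok in doc]


def entity_rows(doc):
    """Return (text, label) for each named entity found by the ner component."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    try:
        import spacy  # imported here so the helpers work without spaCy installed
        nlp = spacy.load("pl_core_news_sm")
    except (ImportError, OSError) as exc:
        # spacy.load raises OSError when the model package cannot be found.
        print(f"Model not available: {exc}")
        return

    # Made-up sample sentence, used only for illustration.
    doc = nlp("Adam Mickiewicz urodził się w Nowogródku.")
    for row in token_rows(doc):
        print(row)
    for row in entity_rows(doc):
        print(row)


if __name__ == "__main__":
    main()
```

The tags and labels printed should come from the label scheme listed above, for example `SUBST` for nouns and `persName` for person names.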
