pypi spacy 1.4.0
v1.4.0: Improved language data and alpha Dutch support

7 years ago

✨ Major features and improvements

  • NEW: Alpha support for Dutch tokenization.
  • Reorganise and improve format of language data.
  • Add shared tag map, entity rules, emoticons and punctuation to language data.
  • Convert entity rules, morphological rules and lemmatization rules from JSON to Python.
  • Update language data for English, German, Spanish, French, Italian and Portuguese.

🔴 Bug fixes

  • Fix issue #649: Update and reorganise stop lists.
  • Fix issue #672: Make token.ent_iob_ return unicode.
  • Fix issue #674: Add missing lemmas for contracted forms of "be" to TOKENIZER_EXCEPTIONS.
  • Fix issue #683: Morphology class now supplies tag map value for the special space tag if it's missing.
  • Fix issue #684: Ensure spacy.en.English() loads the GloVe vector data if available. Previously, this was inconsistent with the behaviour of spacy.load('en').
  • Fix issue #685: Expand TOKENIZER_EXCEPTIONS with unicode apostrophe (’).
  • Fix issue #689: Correct typo in STOP_WORDS.
  • Fix issue #691: Add tokenizer exceptions for "gonna" and "Gonna".
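
To make the tokenizer-exception fixes (#674, #685, #691) more concrete, here is a minimal, self-contained sketch of how a special-case table of this kind can work. It does not import spaCy and is not the library's implementation: the ORTH and LEMMA keys are plain strings standing in for spaCy's attribute IDs, and EXAMPLE_EXCEPTIONS, add_unicode_apostrophe_variants, tokenize and the exact "gon" + "na" split are assumptions chosen for illustration.

```python
# Illustrative stand-ins for spaCy's attribute IDs (assumption: plain strings).
ORTH = "orth"
LEMMA = "lemma"

# Hypothetical excerpt of a special-case table; the real one is far larger.
BASE_EXCEPTIONS = {
    "gonna": [{ORTH: "gon", LEMMA: "go"}, {ORTH: "na", LEMMA: "to"}],
    "Gonna": [{ORTH: "Gon", LEMMA: "go"}, {ORTH: "na", LEMMA: "to"}],
    "I'm": [{ORTH: "I"}, {ORTH: "'m", LEMMA: "be"}],
}


def add_unicode_apostrophe_variants(exceptions):
    """Return a copy with a U+2019 variant of every entry containing "'"."""
    expanded = dict(exceptions)
    for text, tokens in exceptions.items():
        if "'" not in text:
            continue
        variant = text.replace("'", "\u2019")
        expanded[variant] = [
            {
                key: value.replace("'", "\u2019") if key == ORTH else value
                for key, value in token.items()
            }
            for token in tokens
        ]
    return expanded


def tokenize(text, exceptions):
    """Split on whitespace, then expand chunks that have a special case."""
    tokens = []
    for chunk in text.split():
        if chunk in exceptions:
            tokens.extend(token[ORTH] for token in exceptions[chunk])
        else:
            tokens.append(chunk)
    return tokens


EXAMPLE_EXCEPTIONS = add_unicode_apostrophe_variants(BASE_EXCEPTIONS)
```

With this table, tokenize("I\u2019m gonna go", EXAMPLE_EXCEPTIONS) splits the contraction and "gonna" into separate tokens, and the "\u2019m" token keeps the lemma "be". Generating the curly-apostrophe variants from the ASCII entries keeps the two sets in sync, which is one plausible way to avoid the kind of gap reported in #685.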

⚠️ Backwards incompatibilities

No changes to the public, documented API, but the previously undocumented language data and model initialisation processes have been refactored and reorganised. If you were relying on the bin/init_model.py script, see the new spaCy Developer Resources repo. Code that references internals of the spacy.en or spacy.de packages should also be reviewed before updating to this version.

📖 Documentation and examples

👥 Contributors

Thanks to @dafnevk, @jvdzwaan, @RvanNieuwpoort, @wrvhage, @jaspb, @savvopoulos and @davedwards for the pull requests!
