pypi spacy 1.6.0
v1.6.0: Improvements to tokenizer and tests

7 years ago

✨ Major features and improvements

  • Update the token exception handling mechanism to allow arbitrary functions to be used as token exception matchers.
  • Improve how tokenizer exceptions for English contractions and punctuation are generated.
  • Update language data for Hungarian and Swedish tokenization.
  • Update to use Thinc v6 to prepare for spaCy v2.0.
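To illustrate the idea of a function acting as a token exception matcher, here is a minimal sketch of a toy whitespace tokenizer in plain Python. It is not spaCy's implementation: `tokenize`, `split_punctuation`, the `token_match` parameter and `URL_PATTERN` are hypothetical names chosen for this example. Any callable that returns a truthy value for a chunk protects that chunk from further splitting.

```python
import re

# Hypothetical pattern for this sketch only: treat any http(s) chunk as a URL.
URL_PATTERN = re.compile(r"^https?://\S+$")


def split_punctuation(chunk):
    """Split leading and trailing punctuation off a whitespace-delimited chunk."""
    prefix, suffix = [], []
    while chunk and chunk[0] in "([\"'":
        prefix.append(chunk[0])
        chunk = chunk[1:]
    while chunk and chunk[-1] in ".,;:!?)]\"'":
        suffix.insert(0, chunk[-1])
        chunk = chunk[:-1]
    return prefix + ([chunk] if chunk else []) + suffix


def tokenize(text, token_match=None):
    """Toy tokenizer: chunks accepted by token_match are kept whole."""
    tokens = []
    for chunk in text.split():
        if token_match is not None and token_match(chunk):
            # The matcher function claims this chunk, so no splitting is applied.
            tokens.append(chunk)
        else:
            tokens.extend(split_punctuation(chunk))
    return tokens


text = "Docs: https://example.com/wiki/Tokenizer_(NLP) ok"
default_tokens = tokenize(text)
url_tokens = tokenize(text, token_match=URL_PATTERN.match)
print(default_tokens)
print(url_tokens)
```

Without a matcher, the closing parenthesis is split off the end of the URL. Passing the bound method `URL_PATTERN.match` keeps the URL as a single token, and any other callable with the same shape would work equally well in this sketch.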

🔴 Bug fixes

  • Fix issue #326: Tokenizer is now more consistent and handles abbreviations correctly.
  • Fix issue #344: Tokenizer now handles URLs correctly.
  • Fix issue #483: Period after two or more uppercase letters is split off in tokenizer exceptions.
  • Fix issue #631: Add richcmp method to Token.
  • Fix issue #718: Contractions with She are now handled correctly.
  • Fix issue #736: Times are now tokenized with correct string values.
  • Fix issue #743: Token is now hashable.
  • Fix issue #744: were and Were are now excluded correctly from contractions.
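Issues #631 and #743 concern comparing and hashing Token objects. As a rough illustration of what those two properties enable in calling code, the sketch below uses a hypothetical `ToyToken` class in plain Python. Comparing and hashing by position in the document is an assumption made for this example, not something these notes confirm about spaCy's Token.

```python
from functools import total_ordering


@total_ordering
class ToyToken:
    """Hypothetical stand-in for a token: its text plus its index in a document."""

    def __init__(self, text, i):
        self.text = text
        self.i = i

    def __eq__(self, other):
        if not isinstance(other, ToyToken):
            return NotImplemented
        # Assumption for this sketch: tokens are equal when their positions match.
        return self.i == other.i

    def __lt__(self, other):
        if not isinstance(other, ToyToken):
            return NotImplemented
        return self.i < other.i

    def __hash__(self):
        # Hash must agree with __eq__, so it is derived from the same field.
        return hash(self.i)


tokens = [ToyToken("world", 1), ToyToken("Hello", 0)]

# Rich comparison makes tokens sortable by position.
ordered = [t.text for t in sorted(tokens)]

# Hashability lets tokens serve as dictionary keys and set members.
labels = {tokens[0]: "NOUN"}
unique = {ToyToken("Hello", 0), ToyToken("Hello", 0)}
```

With both methods in place, tokens can be sorted, deduplicated in sets, and used as dictionary keys.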

📋 Tests

  • Modernise and reorganise all tests and remove model dependencies where possible.
  • Reduce test run time to ~20s for basic tests (previously >80s) and ~100s including models (previously >200s).
  • Add fixtures for spaCy components and test utilities, e.g. to create Doc object manually.
  • Add documentation for tests to explain conventions and organisation.

👥 Contributors

Thanks to @oroszgy, @magnusburton, @guyrosin and @danielhers for the pull requests!
