⚠️ Important note: This is a bridge release that gets the current state of the v1.x branch published. Stay tuned for v2.0.
✨ Major features and improvements
- NEW: Alpha tokenization support for Thai and Russian.
- NEW: Alpha support for Japanese part-of-speech tagging.
- NEW: Dependency pattern-matching algorithm (see #1120).
- Add support for getting a lowest common ancestor matrix via `Doc.get_lca_matrix()`.
- Improve capturing of English noun chunks.
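
To illustrate what a lowest common ancestor matrix contains, here is a minimal pure-Python sketch. It is not spaCy's implementation: the `lca_matrix` helper and the `heads` list format are hypothetical, chosen for illustration. It assumes a well-formed dependency tree given as a list of head indices in which a root points to itself, treats a token as its own ancestor, and uses -1 when two tokens share no ancestor.

```python
def lca_matrix(heads):
    """Return an n x n matrix whose [i][j] entry is the index of the lowest
    common ancestor of tokens i and j, or -1 if they share none.

    Hypothetical helper for illustration only; `heads[i]` is the index of
    token i's head, and a root token has heads[i] == i (assumed, no cycles).
    """
    n = len(heads)

    def path_to_root(i):
        # Token i itself, then its head, and so on up to the root.
        path = [i]
        while heads[path[-1]] != path[-1]:
            path.append(heads[path[-1]])
        return path

    paths = [path_to_root(i) for i in range(n)]
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        ancestors_i = set(paths[i])
        for j in range(n):
            # The first node on j's upward path that also lies on i's path
            # is the lowest ancestor the two tokens have in common.
            for node in paths[j]:
                if node in ancestors_i:
                    matrix[i][j] = node
                    break
    return matrix


# "She ate the pizza": She -> ate, ate is the root, the -> pizza, pizza -> ate.
heads = [1, 1, 3, 1]
matrix = lca_matrix(heads)
for row in matrix:
    print(row)
```

In this example the entry for "She" and "pizza" is 1 (the index of "ate"), while the entry for "the" and "pizza" is 3, because "pizza" is the head of "the".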
🔴 Bug fixes
- Fix issue #1078: Simplify URL pattern.
- Fix issue #1174: Fix NER model loading bug and make sure JSON keys are loaded as strings.
- Fix issue #1291: Document correct JSON format for training.
- Fix issue #1292: Fix error when adding custom infix rules.
- Fix issue #1387: Ensure that lemmatizer respects exception rules.
- Fix issue #1410: Support single value for attribute list in `Doc.to_scalar` and `Doc.to_array`.
📖 Documentation and examples
- Document correct JSON format for training.
- Fix various typos and inconsistencies.
👥 Contributors
Thanks to @raphael0202, @gideonite, @delirious-lettuce, @polm, @kevinmarsh, @IamJeffG, @Vimos, @ericzhao28, @galaxyh, @hscspring, @wannaphongcom, @Wellan89, @kokes, @mdcclv, @ameyuuno, @ramananbalakrishnan, @Demfier, @johnhaley81, @mayukh18 and @jnothman for the pull requests and contributions.