✨ New features and improvements
- Allow customizing punctuation characters in the sentencizer and make it serializable.
- Add new `"bow"` architecture for `TextCategorizer`, for faster bag-of-words text classification.
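The sentencizer change above can be illustrated with a small, self-contained sketch of a punctuation-based sentence splitter whose punctuation set is configurable and can be round-tripped through bytes. The class `SimpleSentencizer` and its `to_bytes`/`from_bytes` methods are hypothetical stand-ins written for this illustration, not spaCy's implementation or API. The sketch produces one `is_sent_start`-style flag per token and treats a run of punctuation such as `?!` as a single sentence ending.

```python
import json
from typing import Iterable, List, Optional


class SimpleSentencizer:
    """Hypothetical rule-based sentencizer (not spaCy's class)."""

    default_punct_chars = [".", "!", "?"]

    def __init__(self, punct_chars: Optional[Iterable[str]] = None) -> None:
        # Use the caller's punctuation characters if given, else the defaults.
        if punct_chars is not None:
            self.punct_chars = list(punct_chars)
        else:
            self.punct_chars = list(self.default_punct_chars)

    def __call__(self, tokens: List[str]) -> List[bool]:
        """Return one is_sent_start flag per token."""
        flags = []
        after_punct = False
        for i, tok in enumerate(tokens):
            is_punct = tok in self.punct_chars
            # A sentence starts at the first token, or at the first
            # non-punctuation token after a run of sentence-final punctuation.
            flags.append(i == 0 or (after_punct and not is_punct))
            after_punct = is_punct
        return flags

    def to_bytes(self) -> bytes:
        # Serialize only the configuration: the punctuation characters.
        return json.dumps({"punct_chars": self.punct_chars}).encode("utf8")

    def from_bytes(self, data: bytes) -> "SimpleSentencizer":
        # Restore the configuration in place and return self for chaining.
        self.punct_chars = json.loads(data.decode("utf8"))["punct_chars"]
        return self


tokens = ["Hello", "world", "|", "How", "are", "you", "?", "!", "Fine"]
sentencizer = SimpleSentencizer(punct_chars=["|", "?", "!"])
flags = sentencizer(tokens)

# Round-trip the custom settings through bytes into a fresh instance.
restored = SimpleSentencizer().from_bytes(sentencizer.to_bytes())
```

Serializing only the punctuation list keeps the component's state small: because the splitting rule itself is fixed, the configuration is all that needs to be saved for a restored instance to behave identically.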
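To show what "bag-of-words text classification" means in general, here is a minimal sketch of a linear classifier over unigram counts. It is an assumption-laden illustration only: the names `featurize` and `BowPerceptron` are hypothetical, and a plain perceptron is swapped in for brevity. This is not how spaCy's `"bow"` architecture is implemented or trained. The key property it shares with any bag-of-words model is that word order is discarded, so scoring a text is just a sum of per-word weights.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def featurize(text: str) -> Counter:
    """Bag of words: lowercase unigram counts, word order discarded."""
    return Counter(text.lower().split())


class BowPerceptron:
    """Hypothetical binary bag-of-words classifier trained as a perceptron."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = defaultdict(float)
        self.bias = 0.0

    def score(self, feats: Counter) -> float:
        # Linear score: bias plus per-word weight times word count.
        # .get() avoids inserting unseen words into the weight table.
        return self.bias + sum(self.weights.get(w, 0.0) * c for w, c in feats.items())

    def predict(self, text: str) -> bool:
        return self.score(featurize(text)) > 0

    def update(self, text: str, label: bool) -> None:
        feats = featurize(text)
        if (self.score(feats) > 0) == label:
            return  # already correct, so no update
        sign = 1.0 if label else -1.0
        self.bias += sign
        for word, count in feats.items():
            self.weights[word] += sign * count

    def train(self, data: List[Tuple[str, bool]], epochs: int = 10) -> None:
        for _ in range(epochs):
            for text, label in data:
                self.update(text, label)


train_data = [
    ("great movie loved it", True),
    ("terrible movie hated it", False),
    ("loved the acting", True),
    ("hated the plot", False),
]
clf = BowPerceptron()
clf.train(train_data)
```

Because each prediction is a single pass over the word counts with no sequence modelling, bag-of-words models of this kind are typically much cheaper to run than convolutional or recurrent text classifiers, which is the speed trade-off the release note refers to.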
🔴 Bug fixes
- Fix issues #3433, #3458: Fix mismatch of classes in the parser after serialization.
- Fix issue #3464: Fix training loop in the `train_textcat.py` example.
- Fix issue #3468: Make the sentencizer set `Token.is_sent_start` correctly.
- Fix bug in the `"ensemble"` `TextCategorizer` architecture that prevented the unigram bag-of-words submodel from working properly.
👥 Contributors
Thanks to @chkoar for the pull request!