Major changes
- Python 3.11 support (wheel packages for Python 3.11 are available).
- Includes the full sources in the Python source package to reduce pip install problems.
- Improves the algorithm that initializes the unigram seed vocabulary, which improves coverage.
New features
- [ALL] Added a feature to train the model with pre-tokenization boundary constraints (--pretokenization_delimiter flag).
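
A minimal sketch of how a corpus might be prepared for the --pretokenization_delimiter flag, assuming the flag takes a delimiter string that marks boundaries no vocabulary piece may cross and that applies to unigram training. The delimiter value "||", the letter/digit boundary rule, and the helper names mark_boundaries and training_args are hypothetical and used only for illustration.

```python
import re

# Assumption: "||" never occurs in the raw corpus, so it is safe as a boundary marker.
DELIM = "||"

def mark_boundaries(sentence, delim=DELIM):
    """Insert the delimiter at every letter/digit boundary (a toy pre-tokenization rule)."""
    return re.sub(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])", delim, sentence)

def training_args(corpus_path, model_prefix, vocab_size, delim=DELIM):
    """Build a trainer argument string that includes the new flag."""
    return (
        f"--input={corpus_path} --model_prefix={model_prefix} "
        f"--vocab_size={vocab_size} --model_type=unigram "
        f"--pretokenization_delimiter={delim}"
    )

# Each training sentence is written to the corpus file with boundaries marked.
print(mark_boundaries("abc123def"))
print(training_args("corpus.txt", "m", 8000))
```

The resulting argument string could then be passed to the trainer, for example through sentencepiece.SentencePieceTrainer.train(...) in the Python package, once the marked corpus has been written to corpus.txt.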
Bug fixes & minor changes
- [ALL] Makes the error message more descriptive.
- [ALL] Fixes the crash when std::random_device fails.
- [ALL] Fixes the build error on Raspberry Pi related to atomic operations.
- [ALL] Fixes minor bugs in n-best enumeration.
- [ALL] Fixes the build error when using the external protobuf library.
- [ALL] Fixes the build error on a big-endian machine.
- [Windows] Uses the /MD build flag instead of /MT.