Fixed
- [#362]: Fix training deadlock with Python components.
- [#363]: Fix a crash when calling `.train` with some non-existent files
- [#355]: Remove a lot of possible crashes
- [#389]: Improve truncation (crash and consistency)
Added
- [#379]: Add the ability to call `encode`/`encode_batch` with numpy arrays
- [#292]: Support for the Unigram algorithm
- [#378], [#394], [#416], [#417]: Many new Normalizers and PreTokenizers
- [#403]: Add `TemplateProcessing` `PostProcessor`.
- [#420]: Ability to fuse the "unk" token in BPE.