huggingface/tokenizers python-v0.9.0
Python v0.9.0



4 years ago

Fixed

[#362]: Fix a deadlock during training when using Python components.
[#363]: Fix a crash when calling .train with non-existent files.
[#355]: Remove many potential crashes.
[#389]: Improve truncation (fix a crash and make its behavior more consistent).

Added

[#379]: Add the ability to call encode/encode_batch with numpy arrays.
[#292]: Add support for the Unigram algorithm.
[#378], [#394], [#416], [#417]: Add many new Normalizers and PreTokenizers.
[#403]: Add the TemplateProcessing PostProcessor.
[#420]: Add the ability to fuse consecutive "unk" tokens in BPE.
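The Unigram algorithm (#292) tokenizes text by choosing, among all ways to split it into vocabulary pieces, the split with the highest total probability under a unigram language model. The following is a textbook sketch of only that decoding step (a Viterbi search) in plain Python. It is not the library's implementation: the function name `unigram_segment` and the toy vocabulary are hypothetical, and vocabulary training is not shown.

```python
from __future__ import annotations

import math


def unigram_segment(text: str, log_probs: dict[str, float]) -> list[str]:
    """Hypothetical sketch: Viterbi segmentation under a unigram LM.

    Returns the split of `text` into vocabulary pieces whose summed
    log-probabilities is maximal.
    """
    n = len(text)
    # best[i] = (score of best segmentation of text[:i], start of its last piece)
    best: list[tuple[float, int]] = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    max_len = max(len(piece) for piece in log_probs)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in log_probs and best[start][0] > -math.inf:
                score = best[start][0] + log_probs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Follow the back-pointers from the end of the text to recover the pieces.
    pieces: list[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


# Toy vocabulary with made-up probabilities (illustration only).
vocab = {
    "hello": math.log(0.05), "he": math.log(0.1), "llo": math.log(0.1),
    "h": math.log(0.02), "e": math.log(0.02), "l": math.log(0.02), "o": math.log(0.02),
}
print(unigram_segment("hellohe", vocab))  # → ['hello', 'he']
```

Because the comparison uses a strict `>`, ties keep the first segmentation found. A production tokenizer also needs a fallback (such as an unknown piece) for characters missing from the vocabulary; this sketch raises an error instead.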
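With #420, BPE can collapse a run of consecutive unknown symbols into a single "unk" token instead of emitting one per symbol. The following is a minimal, hypothetical sketch of that behavior in plain Python. It assumes the word has already been split into symbols and uses `[UNK]` as the unknown token; `map_unknowns` is an illustrative name and is not part of the tokenizers API.

```python
from __future__ import annotations


def map_unknowns(symbols: list[str], vocab: set[str],
                 unk_token: str = "[UNK]", fuse_unk: bool = False) -> list[str]:
    """Hypothetical sketch: replace out-of-vocabulary symbols with unk_token.

    When fuse_unk is True, a run of consecutive unknown symbols produces a
    single unk_token instead of one per symbol.
    """
    out: list[str] = []
    prev_unknown = False
    for sym in symbols:
        if sym in vocab:
            out.append(sym)
            prev_unknown = False
        else:
            # Only emit a new unk token if we are not extending a fused run.
            if not (fuse_unk and prev_unknown):
                out.append(unk_token)
            prev_unknown = True
    return out


print(map_unknowns(["h", "é", "ü", "l", "o"], {"h", "l", "o"}))
# → ['h', '[UNK]', '[UNK]', 'l', 'o']
print(map_unknowns(["h", "é", "ü", "l", "o"], {"h", "l", "o"}, fuse_unk=True))
# → ['h', '[UNK]', 'l', 'o']
```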
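The TemplateProcessing post-processor (#403) describes how special tokens wrap one or two sequences using a template string such as `[CLS] $A [SEP] $B:1 [SEP]:1`. In this syntax, `$A` and `$B` stand for the sequences, and an optional `:N` suffix sets the type id, which defaults to 0. The hypothetical `apply_template` helper below is a simplified plain-Python sketch of that expansion. It is not the library's implementation and only tracks tokens and type ids, ignoring token ids and everything else an encoding carries.

```python
from __future__ import annotations


def apply_template(template: str, seq_a: list[str],
                   seq_b: list[str] | None = None) -> tuple[list[str], list[int]]:
    """Hypothetical sketch: expand a template into (tokens, type_ids)."""
    tokens: list[str] = []
    type_ids: list[int] = []
    for item in template.split():
        # "$B:1" -> ("$B", ":", "1"); "[CLS]" -> ("[CLS]", "", "")
        name, _, type_suffix = item.partition(":")
        type_id = int(type_suffix) if type_suffix else 0
        if name == "$A":
            piece = seq_a
        elif name == "$B":
            if seq_b is None:
                raise ValueError("template uses $B but no second sequence was given")
            piece = seq_b
        else:
            piece = [name]  # a special token such as [CLS] or [SEP]
        tokens.extend(piece)
        type_ids.extend([type_id] * len(piece))
    return tokens, type_ids


print(apply_template("[CLS] $A [SEP] $B:1 [SEP]:1", ["hello"], ["world", "!"]))
# → (['[CLS]', 'hello', '[SEP]', 'world', '!', '[SEP]'], [0, 0, 0, 1, 1, 1])
```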

Changed

[#360]: Many improvements to word and alignment tracking.
[#426]: Improved error messages, thanks to PyO3 0.12.
