github huggingface/tokenizers python-v0.9.3
Python v0.9.3

4 years ago

Fixed

  • [#470]: Fix hanging error when training with custom component
  • [#476]: TemplateProcessing serialization is now deterministic
  • [#481]: Fix SentencePieceBPETokenizer.from_files
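To make the #476 entry concrete, here is a minimal sketch of what "deterministic serialization" means in general: the same logical content always produces the same output, whatever order it was inserted in. This is not the library's implementation. The `serialize` helper, the token names, and the ids are hypothetical and use only the Python standard library.

```python
import json


def serialize(special_tokens):
    """Serialize a token -> id mapping with a fixed key order.

    Hypothetical helper for illustration only. sort_keys=True makes the
    output independent of the order in which the entries were inserted.
    """
    return json.dumps(special_tokens, sort_keys=True, separators=(",", ":"))


# Same content, different insertion order (ids are made up).
first = {"[CLS]": 1, "[SEP]": 2}
second = {"[SEP]": 2, "[CLS]": 1}

# Both mappings serialize to the same string.
assert serialize(first) == serialize(second)
```

A stable output like this keeps saved files byte-identical across runs, which makes diffs and hashes of saved files meaningful.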

Added

  • [#477]: UnicodeScripts PreTokenizer to avoid merges between various scripts
  • [#480]: Unigram now accepts an initial_alphabet and handles special_tokens correctly
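The idea behind the #477 entry can be sketched with the standard library alone: split the text wherever the writing script changes, so that no piece (and therefore no later merge) spans two scripts. This is a rough approximation, not the library's `UnicodeScripts` code. It assumes the first word of a character's Unicode name is a good enough script label, and `rough_script` and `split_on_script_change` are hypothetical names.

```python
import unicodedata


def rough_script(ch):
    """Return a crude script label for a letter, or None for anything else.

    Assumption: the first word of the Unicode character name (LATIN,
    CYRILLIC, GREEK, CJK, ...) stands in for the real Script property.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def split_on_script_change(text):
    """Split text into pieces so that no piece mixes two scripts.

    Non-letters (spaces, digits, punctuation) stay attached to the
    piece that is currently being built.
    """
    pieces, current, last_script = [], "", None
    for ch in text:
        script = rough_script(ch)
        # Start a new piece when a letter's script differs from the last one seen.
        if script is not None and last_script is not None and script != last_script:
            pieces.append(current)
            current = ""
        current += ch
        if script is not None:
            last_script = script
    if current:
        pieces.append(current)
    return pieces


print(split_on_script_change("hello привет"))  # ['hello ', 'привет']
```

Because the pieces are produced before any merge rules are learned, a subword model trained on them cannot create a token that joins, say, Latin and Cyrillic characters. The real pre-tokenizer may treat whitespace and related scripts differently from this sketch.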
