huggingface/tokenizers python-v0.11.0
Python v0.11.0


2 years ago

Fixed

[#585]: Conda version should now work on old CentOS
[#844]: Fixing interaction between is_pretokenized and trim_offsets
[#851]: Doc links

Added

[#657]: Add SplitDelimiterBehavior customization to Punctuation constructor
[#845]: Documentation for Decoders.
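
As an illustration of the Punctuation change in #657, here is a minimal sketch assuming the Python binding exposes the option as a `behavior` keyword on `tokenizers.pre_tokenizers.Punctuation`. The sample text and variable names are only for demonstration.

```python
from tokenizers.pre_tokenizers import Punctuation

text = "Hello, world!"

# "isolated" keeps every punctuation mark as its own piece.
isolated = Punctuation(behavior="isolated").pre_tokenize_str(text)

# "removed" drops the punctuation marks from the output.
removed = Punctuation(behavior="removed").pre_tokenize_str(text)

# pre_tokenize_str returns (piece, (start, end)) tuples; print only the pieces.
for name, pieces in (("isolated", isolated), ("removed", removed)):
    print(name, [piece for piece, _offsets in pieces])
```

Other behaviors, such as "merged_with_previous", "merged_with_next" and "contiguous", can be passed the same way.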

Changed

[#850]: Added a feature gate that allows disabling HTTP features
[#718]: Fix WordLevel tokenizer determinism during training
[#762]: Add a way to specify the unknown token in SentencePieceUnigramTokenizer
[#770]: Improved documentation for UnigramTrainer
[#780]: Add Tokenizer.from_pretrained to load tokenizers from the Hugging Face Hub
[#793]: Saving a pretty JSON file by default when saving a tokenizer
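
The saving and loading changes in #793 and #780 can be sketched as follows. This is a hedged example, not the project's own: the tiny WordLevel vocabulary and the helper `load_from_hub` are hypothetical, and the Hub call is only defined, never executed, because it needs network access.

```python
import json
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel

# Hypothetical toy vocabulary, used only to have something to save.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))

with tempfile.TemporaryDirectory() as tmp:
    pretty_path = os.path.join(tmp, "pretty.json")
    compact_path = os.path.join(tmp, "compact.json")

    # Since #793, save() writes indented JSON unless told otherwise.
    tokenizer.save(pretty_path)
    # Passing pretty=False asks for the compact form instead.
    tokenizer.save(compact_path, pretty=False)

    with open(pretty_path, encoding="utf-8") as f:
        pretty_text = f.read()
    with open(compact_path, encoding="utf-8") as f:
        compact_text = f.read()

    # Both files describe the same tokenizer and can be loaded back.
    reloaded = Tokenizer.from_file(pretty_path)

print(len(pretty_text), len(compact_text))


def load_from_hub(identifier: str) -> Tokenizer:
    """Hypothetical wrapper around Tokenizer.from_pretrained (#780).

    Needs network access and a Hub repository that ships a tokenizer.json,
    so it is not called in this sketch.
    """
    return Tokenizer.from_pretrained(identifier)
```

The pretty form is easier to read and diff, while `pretty=False` keeps the file smaller.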
