huggingface/tokenizers python-v0.10.0 on GitHub

Added

[#508]: Add a Visualizer for notebooks to help understand how the tokenizers work
[#519]: Add a WordLevelTrainer used to train a WordLevel model
[#533]: Add support for conda builds
[#542]: Add Split pre-tokenizer to easily split using a pattern
[#544]: Ability to train from memory. This also improves the integration with datasets
[#590]: Add getters/setters for components on BaseTokenizer
[#574]: Add fuse_unk option to SentencePieceBPETokenizer
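
As an illustration of how several of the additions above fit together, here is a minimal sketch that trains a WordLevel model from an in-memory list using the Split pre-tokenizer and WordLevelTrainer. The corpus, the "[UNK]" token, and the choice to split on a single space are assumptions made for the example, not part of the release.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Split
from tokenizers.trainers import WordLevelTrainer

# Hypothetical in-memory corpus; any iterator of strings should work.
corpus = ["hello world", "hello tokenizers"]

# WordLevel maps whole words to ids; unknown words fall back to "[UNK]".
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))

# Split on a literal space and drop the delimiter from the output.
tokenizer.pre_tokenizer = Split(pattern=" ", behavior="removed")

# Register "[UNK]" as a special token so it is part of the trained vocabulary.
trainer = WordLevelTrainer(special_tokens=["[UNK]"])

# Train directly from memory instead of from files on disk.
tokenizer.train_from_iterator(corpus, trainer=trainer)

known = tokenizer.encode("hello world").tokens
unknown = tokenizer.encode("hello unseen").tokens
print(known, unknown)
```

Because train_from_iterator accepts any iterator of strings, the same pattern should apply to text streamed out of a datasets object.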

Changed

[#509]: The .pyi stub files are now generated automatically
[#519]: Each Model can return its associated Trainer with get_trainer()
[#530]: The various attributes on each component can now be read and set (e.g. tokenizer.model.dropout = 0.1)
[#538]: The API Reference has been improved and is now up-to-date.
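
The two model-related changes above can be sketched as follows. This is a small example under stated assumptions: an empty BPE model is used only for illustration, and the trainer returned for it is expected (not guaranteed by these notes) to be a BpeTrainer.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

# An empty BPE model, used here only to demonstrate the accessors.
tokenizer = Tokenizer(BPE())

# Attributes on a component can be assigned after construction.
tokenizer.model.dropout = 0.1

# Reading the attribute back; the value may come back with
# single-precision rounding, so compare with a tolerance.
dropout = tokenizer.model.dropout

# Ask the model for its associated trainer instead of choosing one by hand.
trainer = tokenizer.model.get_trainer()
print(type(trainer).__name__, dropout)
```

Obtaining the trainer from the model avoids pairing a model with a mismatched trainer class.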

Fixed

[#519]: During training, the Model is now trained in-place. This fixes several bugs that
forced users to reload the Model after training.
[#539]: Fix BaseTokenizer enable_truncation docstring