Added
- [#508]: Add a Visualizer for notebooks to help understand how the tokenizers work
- [#519]: Add a `WordLevelTrainer` used to train a `WordLevel` model
- [#533]: Add support for conda builds
- [#542]: Add Split pre-tokenizer to easily split using a pattern
- [#544]: Ability to train from memory. This also improves the integration with `datasets`
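
A minimal sketch of how the `WordLevelTrainer`, the `Split` pre-tokenizer, and in-memory training might be combined. The tiny `corpus` list and the choice of a single space as the split pattern are assumptions made for illustration; any Python iterator of strings should work in place of the list.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Split
from tokenizers.trainers import WordLevelTrainer

# Start from an empty WordLevel model; unknown words map to "[UNK]".
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))

# Split on a literal space and drop the delimiter from the output.
tokenizer.pre_tokenizer = Split(pattern=" ", behavior="removed")

# Trainer for WordLevel models; the progress bar is disabled to keep output quiet.
trainer = WordLevelTrainer(special_tokens=["[UNK]"], show_progress=False)

# Hypothetical in-memory corpus: no files on disk are needed.
corpus = ["hello world", "hello tokenizers"]
tokenizer.train_from_iterator(corpus, trainer=trainer)

tokens = tokenizer.encode("hello world").tokens
print(tokens)
```

Because `train_from_iterator` accepts any iterator of strings, a column of a `datasets` dataset can presumably be streamed in the same way without first writing it to a file.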
Changed
- [#509]: Automatically stubbing the `.pyi` files
- [#519]: Each `Model` can return its associated `Trainer` with `get_trainer()`
- [#530]: The various attributes on each component can be read and set (e.g. `tokenizer.model.dropout = 0.1`)
- [#538]: The API Reference has been improved and is now up-to-date.
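
As a rough illustration of the `get_trainer()` and attribute changes above, the sketch below uses a `BPE` model. The choice of `BPE` and the `[UNK]` token are assumptions made for the example, not something the entries prescribe.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))

# Ask the model for a trainer of the matching type instead of picking one by hand.
trainer = tokenizer.model.get_trainer()
print(type(trainer).__name__)

# Attributes on a component can be set after construction...
tokenizer.model.dropout = 0.1

# ...and read back. Compare floats approximately rather than with ==.
dropout = tokenizer.model.dropout
print(dropout)
```

Getting the trainer from the model avoids pairing a model with a trainer of the wrong type.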