huggingface/tokenizers node-v0.8.0 on GitHub

Many improvements to the Trainer (#519).
The files must now be passed as the first argument when calling tokenizer.train(files, trainer).
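
The new argument order can be illustrated with a minimal sketch. Only the call shape `tokenizer.train(files, trainer)` comes from this release note. The `Trainer` and `TrainableTokenizer` interfaces, the `RecordingTokenizer` class, and the file paths are hypothetical stand-ins defined here for illustration; they are not the library's actual types.

```typescript
// Hypothetical stand-ins, NOT the real bindings: they only model the call shape.
interface Trainer {
  readonly name: string;
}

interface TrainableTokenizer {
  // Files come first, the trainer second, as described for node-v0.8.0.
  train(files: string[], trainer: Trainer): void;
}

// A toy tokenizer that records how train() was called.
class RecordingTokenizer implements TrainableTokenizer {
  lastCall: { files: string[]; trainer: Trainer } | null = null;

  train(files: string[], trainer: Trainer): void {
    if (!Array.isArray(files)) {
      throw new TypeError("files must be passed first, as an array of paths");
    }
    this.lastCall = { files, trainer };
  }
}

const tokenizer = new RecordingTokenizer();
const trainer: Trainer = { name: "example-trainer" }; // hypothetical trainer object
const files = ["./data/corpus-a.txt", "./data/corpus-b.txt"]; // hypothetical paths

// Files first, then the trainer.
tokenizer.train(files, trainer);
```

With the real bindings, the same ordering would apply to whatever tokenizer and trainer objects you construct; code that previously passed the trainer first would need its arguments swapped.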

Add TemplateProcessing
Add WordLevel and Unigram models (#490)
Add nmtNormalizer and precompiledNormalizer normalizers (#490)
Add templateProcessing post-processor (#490)
Add digitsPreTokenizer pre-tokenizer (#490)
Add support for mapping to sequences (#506)
Add splitPreTokenizer pre-tokenizer (#542)
Add behavior option to the punctuationPreTokenizer (#657)
Add the ability to load tokenizers from the Hugging Face Hub using fromPretrained (#780)

Fix a bug where long tokenizer.json files would be incorrectly deserialized (#459)
Fix RobertaProcessing deserialization in PostProcessorWrapper (#464)
