huggingface/tokenizers rust-v0.8.0
Rust v0.8.0

4 years ago

Changes:

Large speed improvements for BPE, in both training and tokenization (#165)

Fixes:

Training no longer opens all input files at once (#163)
Fixed a bug in the ByteLevel PreTokenizer that produced wrong offsets when a character was split into multiple bytes (cf. #156)
Fixed a bug in the LongestFirst truncation strategy (#174)
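
The #163 fix concerns how training reads its input files. As a rough illustration only, the sketch below processes a list of files one at a time, so at most one file handle is open at any moment. `count_words` is a hypothetical helper, not part of the tokenizers API, and plain whitespace splitting stands in for real pre-tokenization.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper (not the tokenizers API): counts whitespace-separated
/// words across many files while keeping at most one file open at a time.
fn count_words<P: AsRef<Path>>(paths: &[P]) -> io::Result<HashMap<String, u64>> {
    let mut counts = HashMap::new();
    for path in paths {
        // Each file is opened only when its turn comes...
        let reader = BufReader::new(File::open(path)?);
        for line in reader.lines() {
            let line = line?;
            for word in line.split_whitespace() {
                *counts.entry(word.to_string()).or_insert(0) += 1;
            }
        }
        // ...and closed here, when `reader` goes out of scope.
    }
    Ok(counts)
}
```

Opening files lazily like this keeps memory and file-descriptor usage flat no matter how many training files are passed in.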

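To see why multi-byte characters affect offsets (#156): a byte-level pre-tokenizer works on the UTF-8 bytes of the input, so a character such as `é` (2 bytes) or `🤗` (4 bytes) becomes several symbols, and each of those symbols must still report the span of the original character. The sketch below builds that byte-to-character mapping; `byte_offsets_to_char_spans` is a hypothetical helper, not the library's implementation.

```rust
/// Hypothetical helper (not the tokenizers implementation): for every byte of
/// `s`, return the byte range `(start, end)` of the character it belongs to.
/// A byte-level pre-tokenizer emits one symbol per byte, so each symbol needs
/// this mapping to report offsets into the original string.
fn byte_offsets_to_char_spans(s: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::with_capacity(s.len());
    for (start, ch) in s.char_indices() {
        let end = start + ch.len_utf8();
        // Every byte of a multi-byte character points back to the whole character.
        for _ in start..end {
            spans.push((start, end));
        }
    }
    spans
}
```

For the string `aé`, this returns `[(0, 1), (1, 3), (1, 3)]`: both bytes of `é` map to the same span instead of to two half-character offsets.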

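The LongestFirst strategy truncates a pair of sequences by taking tokens away from whichever sequence is currently longer. A minimal sketch, assuming token IDs are `u32`, that tokens are dropped from the end, and that ties go to the first sequence (all assumptions, not details from this release); `truncate_longest_first` is a hypothetical name, not the tokenizers API:

```rust
/// Hypothetical sketch of "longest first" truncation for a pair of sequences:
/// while the combined length exceeds `max_len`, drop the last token of
/// whichever sequence is currently longer (the first one on ties).
fn truncate_longest_first(a: &mut Vec<u32>, b: &mut Vec<u32>, max_len: usize) {
    while a.len() + b.len() > max_len {
        // If the total exceeds max_len, at least one sequence is non-empty,
        // so the chosen `pop` always removes a token and the loop terminates.
        if a.len() >= b.len() {
            a.pop();
        } else {
            b.pop();
        }
    }
}
```

For example, with `a = [1, 2, 3]`, `b = [4, 5, 6]` and `max_len = 4`, both sequences end up with two tokens. Removing one token per iteration keeps the logic easy to check; a closed-form split would avoid the loop at the cost of more edge cases.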