github huggingface/tokenizers rust-v0.8.0
Rust v0.8.0

4 years ago

Changes:

  • Big speed improvements for BPE, in both training and tokenization (#165)

Fixes:

  • Do not open all training files at once while training (#163)
  • Fixed a bug in the ByteLevel PreTokenizer that produced wrong offsets when a char was split
    into multiple bytes (cf. #156)
  • Fixed a bug in the LongestFirst truncation strategy (#174)
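
To illustrate the ByteLevel offsets issue: a char that needs several bytes in UTF-8 yields several byte-level pieces, and each piece should still map back to the same position in the original text. The sketch below is not the library's code. `byte_to_char_offsets` is a hypothetical helper using only the standard library, and it assumes offsets are counted in chars of the original string.

```rust
/// Hypothetical helper (not part of the tokenizers crate).
/// For every byte of `text`, return the (start, end) char offsets of the
/// character that byte belongs to. Assumes offsets are counted in chars.
fn byte_to_char_offsets(text: &str) -> Vec<(usize, usize)> {
    // One entry per byte, so the result has exactly `text.len()` items.
    let mut offsets = Vec::with_capacity(text.len());
    for (char_idx, ch) in text.chars().enumerate() {
        // A char encoded on n bytes contributes n entries, all pointing
        // at the same one-char span instead of advancing once per byte.
        for _ in 0..ch.len_utf8() {
            offsets.push((char_idx, char_idx + 1));
        }
    }
    offsets
}
```

For the input `"a\u{e9}"` ("a" followed by "é"), the second char takes two bytes, so both of its byte-level pieces share the span (1, 2). Advancing the offset once per byte instead would push every later offset past its real position.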
