Changes:

- Big improvements in speed for BPE, in both training and tokenization (#165)

Fixes:

- Some default tokens were missing from BertWordPieceTokenizer (cf #160)
- A bug in the ByteLevel PreTokenizer caused offsets to be wrong when a character was split into multiple bytes (cf #156)
- The longest_first truncation strategy had a bug (#174)
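To see why the ByteLevel offset bug (#156) could occur, note that a single character may span several UTF-8 bytes, so byte positions must be mapped back to character offsets. The following is a minimal standalone sketch of such a byte-to-character mapping, not the library's actual implementation:

```python
# Illustration of the byte-level offset problem: 'é' occupies two
# UTF-8 bytes, so byte indices and character indices diverge.
text = "café!"
encoded = text.encode("utf-8")

# Build a byte-index -> character-index map.
byte_to_char = []
for char_idx, ch in enumerate(text):
    byte_to_char.extend([char_idx] * len(ch.encode("utf-8")))

print(len(text), len(encoded))  # 5 6
print(byte_to_char)             # [0, 1, 2, 3, 3, 4]
```

Bytes 3 and 4 both map back to character 3 ('é'); an offset computation that treats each byte as one character would be off by one for everything after it.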