github huggingface/tokenizers node-v0.12.0
[YANKED] Node v0.12.0


[0.12.0]

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

Bumped the minor version because of a breaking change, using 0.12 to match the other bindings.

  • [#938] Breaking change: the Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; usage through a Tokenizer should remain error free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed being unable to save vocabularies with holes in them (ConvBert): a warning is now emitted instead of panicking.

  • [#961] Added link for Ruby port of tokenizers
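
To give a feel for what "composable" means in [#938], here is a minimal, self-contained sketch in plain JavaScript. It is not the tokenizers API: the object shapes, the `decodeChain` method name, and `sequenceDecode` are all illustrative stand-ins for the idea that each decoder transforms the list of token pieces and can therefore be chained before the final join.

```javascript
// Illustrative sketch (NOT the tokenizers API): a composable decoder
// exposes a chain step mapping token pieces to token pieces, so several
// decoders can run in sequence before the pieces are joined.

// Stand-in for a byte-level-style decoder: turns a leading "Ġ" marker
// into a space on each piece.
const byteLevelish = {
  decodeChain: (pieces) => pieces.map((p) => p.replace(/^Ġ/, " ")),
};

// Stand-in for a WordPiece-style decoder: strips a "##" continuation prefix.
const stripHashes = {
  decodeChain: (pieces) => pieces.map((p) => p.replace(/^##/, "")),
};

// A sequence decoder composes others by threading the pieces through
// each decodeChain in order, then joining the result into a string.
function sequenceDecode(decoders, pieces) {
  const out = decoders.reduce((acc, d) => d.decodeChain(acc), pieces);
  return out.join("");
}

console.log(sequenceDecode([byteLevelish, stripHashes], ["Hello", "Ġwor", "##ld"]));
// -> "Hello world"
```

The point of the change is that a decoder used on its own now participates in this chaining protocol, which is why standalone decoder usage is the breaking surface while Tokenizer-driven decoding is unaffected.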
