Added
- Support for training a tokenizer from scratch. See Tokenizers.Tokenizer.train_from_files/3 and Tokenizers.Model for available models.
- Support for changing tokenizer configuration, such as Tokenizers.Tokenizer.set_padding/2 and Tokenizers.Tokenizer.set_truncation/2. See the "Configuration" functions group in Tokenizers.Tokenizer.
- Support for applying multiple encoding transformations without additional data copies, see Tokenizers.Encoding.Transformation. Transformations can be passed to Tokenizers.Tokenizer.encode/3 via :encoding_transformations or applied via Tokenizers.Encoding.transform/2.
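
The two ways of applying transformations could look roughly like the sketch below. Several details are assumptions rather than something these notes state: that the tokenizer is loaded with Tokenizers.Tokenizer.from_pretrained/1 (the "bert-base-cased" identifier is only an illustration), that encode/3 returns an {:ok, encoding} tuple, that Tokenizers.Encoding.Transformation provides truncate and pad constructors taking a target length, and that Tokenizers.Encoding.get_ids/1 returns the token ids. Check the module documentation for the exact names and options.

```elixir
# Assumption: run as a standalone script; Mix.install/1 fetches the package.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Encoding, Tokenizer}
alias Tokenizers.Encoding.Transformation

# Assumption: from_pretrained/1 downloads the tokenizer and returns {:ok, tokenizer}.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# Assumption: truncate and pad build transformation values from a target length,
# and the list is applied in order.
transformations = [
  Transformation.truncate(8),
  Transformation.pad(8)
]

# Option 1: pass the transformations to encode/3 via :encoding_transformations.
{:ok, encoding} =
  Tokenizer.encode(tokenizer, "Hello there!", encoding_transformations: transformations)

# Option 2: encode first, then apply the same list via transform/2.
# Assumption: transform/2 returns the encoding directly, not an {:ok, _} tuple.
{:ok, plain} = Tokenizer.encode(tokenizer, "Hello there!")
transformed = Encoding.transform(plain, transformations)

ids = Encoding.get_ids(encoding)
IO.inspect(ids, label: "ids")
IO.inspect(Encoding.get_ids(transformed) == ids, label: "same result")
```

Both options should produce the same ids; the first one avoids building an intermediate encoding in Elixir.
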
Changed
- (Breaking) Tokenizers.Tokenizer.encode/3 no longer accepts a batch of inputs; to encode a batch, use Tokenizers.Tokenizer.encode_batch/3 instead.
- (Breaking) Tokenizers.Tokenizer.decode/3 no longer accepts a batch of inputs; to decode a batch, use Tokenizers.Tokenizer.decode_batch/3 instead.
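
For code written against v0.3.x, the migration might look like the sketch below. It assumes a tokenizer loaded with Tokenizers.Tokenizer.from_pretrained/1 (the identifier is illustrative), {:ok, value} return tuples, and Tokenizers.Encoding.get_ids/1 for extracting ids; none of these are spelled out in these notes.

```elixir
# Assumption: run as a standalone script; Mix.install/1 fetches the package.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Encoding, Tokenizer}

# Assumption: from_pretrained/1 returns {:ok, tokenizer}.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# Single input: encode/3 and decode/3 now take exactly one input.
{:ok, encoding} = Tokenizer.encode(tokenizer, "Hello there!")
{:ok, text} = Tokenizer.decode(tokenizer, Encoding.get_ids(encoding))

# Batch of inputs: use the dedicated *_batch functions.
{:ok, encodings} = Tokenizer.encode_batch(tokenizer, ["Hello there!", "How are you?"])
batch_ids = Enum.map(encodings, &Encoding.get_ids/1)
{:ok, texts} = Tokenizer.decode_batch(tokenizer, batch_ids)

IO.inspect(text, label: "decoded")
IO.inspect(texts, label: "decoded batch")
```
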
Full Changelog: v0.3.2...v0.4.0
Checksums
SHA256 list:
2d04e0b62b8d23b515ad33bf8a6fec4c8a01aa87f79516fee1b57ac39e96ec20 ex_tokenizers-v0.4.0-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
dee3196c4908b8f56cb3080af38a7ed955873c685a2d8cbddde8fbc96e466220 ex_tokenizers-v0.4.0-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
3615b939766fe439ff0d3fa939ad37b6a6059a8347f06a05657185eb2e0e5b51 ex_tokenizers-v0.4.0-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
310c032a19d088520bd4d4644ae9f19039099a2d4143523771a7a5571f2f7117 ex_tokenizers-v0.4.0-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
bbf8f8324804c40346cb5cd7a5addc31a931fc710ca4b42e01dd07d23b8eca10 libex_tokenizers-v0.4.0-nif-2.15-aarch64-apple-darwin.so.tar.gz
3bbf7a63ed4bda9a4390b5acba2d60e2ebd744f1d3f1d754f8ec4cc4f5eca6ff libex_tokenizers-v0.4.0-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
7dd6b547c95482518a15fb2a4bacdd751537a75ba7d8285ff5145904d55c4449 libex_tokenizers-v0.4.0-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
e1014519eb978cbc20649bb122a6c1f834629579ea0650a79bf8e816f58da6dd libex_tokenizers-v0.4.0-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
a6456c8719c5b5914068378198a43d080a87221dd37ff2ccf1da05371ff028d7 libex_tokenizers-v0.4.0-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
9007181e446c00a9113993b43b6522b6b28a098ae1154f3f964ba8fa3cbe0b8c libex_tokenizers-v0.4.0-nif-2.15-x86_64-apple-darwin.so.tar.gz
061cef149b7e91f4556090c1df2672827c4e9d59e30befacd5167e4fd28c7098 libex_tokenizers-v0.4.0-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
b2d9e65ccb6aeaaa4bbcf43d034c7af13e257770eaef55a0eb70f58b5e28ebc1 libex_tokenizers-v0.4.0-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
dfe7755fbad8a3409f3b5258e810220f2b0f179d8569dc7abcc489e65804d26e libex_tokenizers-v0.4.0-nif-2.16-aarch64-apple-darwin.so.tar.gz
3bc9e5e23afaa2ed15680afd8f8e843434de804e8e708a5ca00c6a413a3f6be2 libex_tokenizers-v0.4.0-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
987a81d6babb13b2baa9bfb8c602691766a6319548f4f129e0c99d56c6a155ae libex_tokenizers-v0.4.0-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
b973a15035c495c27aa74e525342b2aff3be66c8f1f75f23776ea5f8dce59ba3 libex_tokenizers-v0.4.0-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
3a1cef66172ab7c82a255933a4d957d21d6b5fb26f6deea968de2598fd477f48 libex_tokenizers-v0.4.0-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
e4a57f3e7a1dd29933fedab12b7c57d9a9cb1454d8fd6415bd82938d8cf64e3b libex_tokenizers-v0.4.0-nif-2.16-x86_64-apple-darwin.so.tar.gz
6578bcaf43c24c449997354ac0439b0be5e4a27bf18de13e3c33929d514fe129 libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
043a3b36e463ea354cae656347b01b1988cae416fa46014c7c39b118fa9b36d0 libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
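
A downloaded artifact can be checked against the list above with a few lines of Elixir. This is a minimal sketch: ChecksumCheck is a hypothetical helper module, not part of the package, and the local path is assumed to point at a file you downloaded yourself.

```elixir
defmodule ChecksumCheck do
  # Hypothetical helper for illustration; not part of the Tokenizers package.

  # Returns the lowercase hex SHA256 digest of a binary.
  def sha256_hex(data) when is_binary(data) do
    :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)
  end

  # Reads the file at `path` and compares its digest with `expected`.
  # Returns false if the file cannot be read.
  def verify_file(path, expected) do
    case File.read(path) do
      {:ok, data} -> sha256_hex(data) == String.downcase(expected)
      {:error, _reason} -> false
    end
  end
end

# Assumption: the artifact was saved to the current directory under its release name.
path = "libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz"
expected = "6578bcaf43c24c449997354ac0439b0be5e4a27bf18de13e3c33929d514fe129"

IO.puts("checksum ok? #{ChecksumCheck.verify_file(path, expected)}")
```

The file is read fully into memory, which is fine for archives of this size; a streaming hash would be preferable for very large files.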