- Add `Tokenizer.save_vocab` and `Tokenizer.load_vocab` methods to save/load the vocabulary to a JSON file called `vocab.tokenizer.json` by default
- Add `Tokenizer.save_stopwords` and `Tokenizer.load_stopwords` methods to save/load stopwords to a JSON file called `stopwords.tokenizer.json` by default
- Add `TokenizerHF` class to allow saving/loading from the Hugging Face Hub
  - New functions: `load_vocab_from_hub`, `save_vocab_to_hub`, `load_stopwords_from_hub`, `save_stopwords_to_hub`
- New tests and examples were added (see `examples/index_to_hf.py` and `examples/tokenizer_class.py`)
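
The save/load round trip described above can be sketched roughly as follows. This is a minimal illustration only: `MiniTokenizer` is a hypothetical stand-in written for this note, not the library's `Tokenizer`, and its parameter names (`save_dir`, `vocab_name`, `stopwords_name`) and internals are assumptions. Only the method names and the default file names come from the notes above; see `examples/tokenizer_class.py` for actual usage.

```python
import json
import tempfile
from pathlib import Path


class MiniTokenizer:
    """Hypothetical stand-in used for illustration; NOT the library's Tokenizer."""

    def __init__(self, stopwords=()):
        self.stopwords = list(stopwords)
        self.word_to_id = {}

    def tokenize(self, texts):
        # Lowercase, split on whitespace, skip stopwords, and grow the vocabulary.
        ids = []
        for text in texts:
            row = []
            for word in text.lower().split():
                if word in self.stopwords:
                    continue
                row.append(self.word_to_id.setdefault(word, len(self.word_to_id)))
            ids.append(row)
        return ids

    # Default file names follow the release notes; parameter names are assumed.
    def save_vocab(self, save_dir, vocab_name="vocab.tokenizer.json"):
        path = Path(save_dir)
        path.mkdir(parents=True, exist_ok=True)
        (path / vocab_name).write_text(json.dumps(self.word_to_id), encoding="utf-8")

    def load_vocab(self, save_dir, vocab_name="vocab.tokenizer.json"):
        text = (Path(save_dir) / vocab_name).read_text(encoding="utf-8")
        self.word_to_id = json.loads(text)

    def save_stopwords(self, save_dir, stopwords_name="stopwords.tokenizer.json"):
        path = Path(save_dir)
        path.mkdir(parents=True, exist_ok=True)
        (path / stopwords_name).write_text(json.dumps(self.stopwords), encoding="utf-8")

    def load_stopwords(self, save_dir, stopwords_name="stopwords.tokenizer.json"):
        text = (Path(save_dir) / stopwords_name).read_text(encoding="utf-8")
        self.stopwords = json.loads(text)


def roundtrip_demo():
    """Save vocabulary and stopwords to a temp directory, then restore them."""
    with tempfile.TemporaryDirectory() as tmp:
        tok = MiniTokenizer(stopwords=["the", "a"])
        tok.tokenize(["the cat sat", "a dog sat"])
        tok.save_vocab(tmp)
        tok.save_stopwords(tmp)

        restored = MiniTokenizer()
        restored.load_vocab(tmp)
        restored.load_stopwords(tmp)
        files = sorted(p.name for p in Path(tmp).iterdir())
        return restored.word_to_id, restored.stopwords, files
```

Keeping the vocabulary and the stopwords in two separate JSON files, as the notes describe, means either one can be inspected or replaced on its own.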