@pythonlessons
Created September 4, 2023 15:04
transformers_training
# Prepare the Spanish tokenizer; Spanish is the input (source) language
tokenizer = CustomTokenizer(char_level=True)
tokenizer.fit_on_texts(es_training_data)
tokenizer.save(configs.model_path + "/tokenizer.json")
# Prepare the English tokenizer (named detokenizer here); English is the output (target) language
detokenizer = CustomTokenizer(char_level=True)
detokenizer.fit_on_texts(en_training_data)
detokenizer.save(configs.model_path + "/detokenizer.json")
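
The snippet above fits one character-level tokenizer per language and saves each to JSON. As a rough illustration of what that kind of fit/save step could involve, here is a minimal, self-contained sketch. `SimpleCharTokenizer`, its `load` method, and the sample sentences are hypothetical names invented for this example; this is not the `CustomTokenizer` implementation, whose internals and file format may differ.

```python
import json
import os
import tempfile


class SimpleCharTokenizer:
    """Hypothetical stand-in for a character-level tokenizer (illustration only)."""

    def __init__(self):
        # Maps each character to an integer index; index 0 is left free for padding.
        self.char_to_index = {}

    def fit_on_texts(self, texts):
        # Collect every distinct character and sort for a deterministic vocabulary.
        chars = sorted(set("".join(texts)))
        self.char_to_index = {ch: i + 1 for i, ch in enumerate(chars)}

    def texts_to_sequences(self, texts):
        # Characters that were never seen during fitting are skipped.
        return [
            [self.char_to_index[ch] for ch in text if ch in self.char_to_index]
            for text in texts
        ]

    def sequences_to_texts(self, sequences):
        # Invert the vocabulary to map indices back to characters.
        index_to_char = {i: ch for ch, i in self.char_to_index.items()}
        return [
            "".join(index_to_char[i] for i in seq if i in index_to_char)
            for seq in sequences
        ]

    def save(self, path):
        # Persist the vocabulary so inference can reuse the exact same mapping.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.char_to_index, f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        tok = cls()
        with open(path, encoding="utf-8") as f:
            tok.char_to_index = json.load(f)
        return tok


# Made-up sample data standing in for es_training_data.
es_sample = ["hola", "adiós"]
tokenizer = SimpleCharTokenizer()
tokenizer.fit_on_texts(es_sample)

# A temporary directory stands in for configs.model_path.
with tempfile.TemporaryDirectory() as model_path:
    path = os.path.join(model_path, "tokenizer.json")
    tokenizer.save(path)
    restored = SimpleCharTokenizer.load(path)

encoded = restored.texts_to_sequences(["hola"])
decoded = restored.sequences_to_texts(encoded)
```

Saving the vocabulary next to the model matters because the character-to-index mapping must be identical at training and inference time; refitting on different data would likely produce different indices.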