Created April 30, 2021 21:51
avidale/44cd35bfcdaf8bedf51d97c468cc8001
create_rut5-base.ipynb
@Nehc
As far as I can judge from the HF documentation, XLMRobertaTokenizer is based on SentencePiece, just like T5Tokenizer. So in principle the approach should work; I don't see any fundamental reason why it wouldn't.
Nevertheless, specific details such as model parameter names, tokenizer parameter names, and special tokens may differ between T5 and XLMRoberta, so my code will surely need some adaptation to work with E5.
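To make the kind of adaptation more concrete, here is a rough, untested sketch (not code from the notebook) of the architecture-specific part of shrinking a vocabulary. The `LAYOUTS` table, `rows_to_keep`, and `shrink_rows` are hypothetical names. The special-token layouts and parameter names are my assumptions based on the usual Hugging Face defaults, so please check them against `tokenizer.all_special_tokens` and `model.state_dict().keys()` for your checkpoint. One notable difference: XLMRobertaTokenizer puts `<s>`, `<pad>`, `</s>`, `<unk>` at ids 0-3 and shifts ordinary SentencePiece ids by +1, while T5Tokenizer uses the SentencePiece ids directly.

```python
# Hypothetical per-architecture layout, assumed from Hugging Face defaults;
# verify against tokenizer.all_special_tokens and model.state_dict().keys().
LAYOUTS = {
    # T5/mT5 (assumed): <pad>=0, </s>=1, <unk>=2. The encoder and decoder
    # embed_tokens reuse `shared`, so the shared matrix and the LM head are
    # the vocabulary-sized weights. Set n_trailing to keep <extra_id_*> tokens.
    "t5": {"n_special": 3, "n_trailing": 0,
           "vocab_params": ["shared.weight", "lm_head.weight"]},
    # XLM-R, which E5 is built on (assumed): <s>=0, <pad>=1, </s>=2, <unk>=3,
    # <mask> is the last id; ordinary SentencePiece ids are shifted by +1.
    "xlm-roberta": {"n_special": 4, "n_trailing": 1,
                    "vocab_params": ["embeddings.word_embeddings.weight"]},
}


def rows_to_keep(selected_ids, vocab_size, n_special, n_trailing=0):
    """Return old row indices to keep, in the order of the new vocabulary.

    Leading special tokens and trailing ones (e.g. XLM-R's <mask>) are always
    kept; the selected ordinary tokens go in between, sorted and de-duplicated.
    """
    head = list(range(n_special))
    tail = list(range(vocab_size - n_trailing, vocab_size))
    middle = sorted({i for i in selected_ids
                     if n_special <= i < vocab_size - n_trailing})
    return head + middle + tail


def shrink_rows(matrix, rows):
    """Select rows of a (vocab_size x dim) matrix stored as a list of lists."""
    return [matrix[r] for r in rows]


if __name__ == "__main__":
    layout = LAYOUTS["xlm-roberta"]
    toy_matrix = [[float(i)] for i in range(10)]  # toy 10-token vocabulary
    rows = rows_to_keep([7, 5, 5, 1], vocab_size=10,
                        n_special=layout["n_special"],
                        n_trailing=layout["n_trailing"])
    old_to_new = {old: new for new, old in enumerate(rows)}
    print(rows)                           # [0, 1, 2, 3, 5, 7, 9]
    print(shrink_rows(toy_matrix, rows))  # rows 0, 1, 2, 3, 5, 7, 9 of the toy matrix
    print(old_to_new[7])                  # 5
```

With a real model, the same `rows` list can be used to index each tensor listed in `vocab_params` (PyTorch accepts a list of ints for row indexing). If the selected ids come straight from the SentencePiece model rather than from the HF tokenizer, they would need the +1 shift for XLM-R before being passed in.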