Created
February 6, 2019 09:44
Sequence preparation
# Keras tokenizer; deu_eng is assumed to be a 2-D array of
# [English, German] sentence pairs defined before this snippet
from keras.preprocessing.text import Tokenizer

# function to build a tokenizer
def tokenization(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# prepare English tokenizer (column 0 holds the English sentences)
eng_tokenizer = tokenization(deu_eng[:, 0])
# +1 because word indices start at 1; index 0 is left unassigned
eng_vocab_size = len(eng_tokenizer.word_index) + 1
eng_length = 8
# print('English Vocabulary Size: %d' % eng_vocab_size)

# prepare German tokenizer (column 1 holds the German sentences)
deu_tokenizer = tokenization(deu_eng[:, 1])
deu_vocab_size = len(deu_tokenizer.word_index) + 1
deu_length = 8
# print('German Vocabulary Size: %d' % deu_vocab_size)
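To make the `+ 1` in the vocabulary size and the fixed length of 8 more concrete, here is a minimal pure-Python sketch of the idea. It is not the Keras implementation: the helper names `build_word_index` and `encode_and_pad` are hypothetical, the sketch only lowercases and splits on whitespace, and it assumes the length values are meant as a fixed sequence length for later zero-padding, which the snippet above does not itself show.

```python
from collections import Counter

def build_word_index(lines):
    """Map each word to an integer id, most frequent word first, starting at 1."""
    counts = Counter(word for line in lines for word in line.lower().split())
    # sorted() is stable, so words with equal counts keep first-seen order
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: i for i, word in enumerate(ranked, start=1)}

def encode_and_pad(lines, word_index, length):
    """Convert each line to integer ids, then truncate or right-pad with 0 to `length`."""
    encoded = []
    for line in lines:
        ids = [word_index[w] for w in line.lower().split() if w in word_index]
        ids = ids[:length]
        encoded.append(ids + [0] * (length - len(ids)))
    return encoded

# Hypothetical sample sentences, for illustration only
sample = ["go on", "go home now"]
word_index = build_word_index(sample)
vocab_size = len(word_index) + 1  # ids run from 1, so 0 stays free for padding
padded = encode_and_pad(sample, word_index, length=8)

print(word_index)
print(vocab_size)
print(padded)
```

Because ids start at 1, a model that reads these sequences needs room for one more id than there are words, which is why the vocabulary size is `len(word_index) + 1`.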