@prateekjoshi565 · Created February 6, 2019
Sequence preparation
from keras.preprocessing.text import Tokenizer

# function to build a tokenizer
def tokenization(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# prepare English tokenizer (deu_eng is the array of [English, German]
# sentence pairs loaded earlier in the tutorial)
eng_tokenizer = tokenization(deu_eng[:, 0])
eng_vocab_size = len(eng_tokenizer.word_index) + 1
eng_length = 8  # maximum English sequence length (in tokens)
# print('English Vocabulary Size: %d' % eng_vocab_size)

# prepare German (Deutsch) tokenizer
deu_tokenizer = tokenization(deu_eng[:, 1])
deu_vocab_size = len(deu_tokenizer.word_index) + 1
deu_length = 8  # maximum German sequence length (in tokens)
# print('German Vocabulary Size: %d' % deu_vocab_size)
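
The tokenizers above only build the vocabularies; before training, each sentence still has to be integer-encoded and zero-padded to the fixed lengths chosen above. A minimal sketch of that step, assuming the same Keras preprocessing API (the helper name encode_sequences and the commented trainX usage are illustrative, not part of this gist):

from keras.preprocessing.sequence import pad_sequences

# integer-encode and zero-pad a list of sentences to a fixed length
def encode_sequences(tokenizer, length, lines):
    # map each word to its integer index from the fitted tokenizer
    seq = tokenizer.texts_to_sequences(lines)
    # pad every sequence with trailing zeros up to `length` tokens
    seq = pad_sequences(seq, maxlen=length, padding='post')
    return seq

# e.g. encode the English side with the tokenizer built above
# trainX = encode_sequences(eng_tokenizer, eng_length, deu_eng[:, 0])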