@edumunozsala
Created October 11, 2020 16:05
Create the vocabularies for the seq2seq model
# NOTE: tokenizer_inputs and tokenizer_outputs are not defined in this snippet;
# they are assumed to be tokenizers already fitted on the input and output texts.
# get the word-to-index mapping for the input language
word2idx_inputs = tokenizer_inputs.word_index
print('Found %s unique input tokens.' % len(word2idx_inputs))
# get the word-to-index mapping for the output language
word2idx_outputs = tokenizer_outputs.word_index
print('Found %s unique output tokens.' % len(word2idx_outputs))
# store the number of output and input words for later;
# add 1 because word indexes start at 1 (index 0 is typically left for padding)
num_words_output = len(word2idx_outputs) + 1
num_words_inputs = len(word2idx_inputs) + 1
# map indexes back to words so we can view the results
idx2word_inputs = {v: k for k, v in word2idx_inputs.items()}
idx2word_outputs = {v: k for k, v in word2idx_outputs.items()}
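The snippet above depends on two fitted tokenizers that are not shown. As a rough, self-contained sketch of what their `word_index` attributes provide, the example below builds comparable mappings in plain Python. The helper `build_word_index` and the toy sentences are hypothetical and only stand in for the real tokenizers; the sketch assumes, as the `+ 1` in the snippet suggests, that word indexes start at 1 so that index 0 stays free (for example, for padding).

```python
from collections import Counter

def build_word_index(texts):
    """Hypothetical stand-in for a fitted tokenizer's word_index:
    maps each word to an integer starting at 1, most frequent first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency; break ties alphabetically for determinism
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}

# Toy sentences, purely illustrative (not from the original data)
input_texts = ["i am cold", "i am happy"]
output_texts = ["<sos> tengo frio <eos>", "<sos> estoy feliz <eos>"]

word2idx_inputs = build_word_index(input_texts)
word2idx_outputs = build_word_index(output_texts)

# Vocabulary sizes: one extra slot because index 0 is never assigned to a word
num_words_inputs = len(word2idx_inputs) + 1
num_words_output = len(word2idx_outputs) + 1

# Reverse mappings, used to turn predicted indexes back into words
idx2word_inputs = {v: k for k, v in word2idx_inputs.items()}
idx2word_outputs = {v: k for k, v in word2idx_outputs.items()}

print(num_words_inputs, num_words_output)
```

With the real tokenizers in place, only the two `build_word_index` calls would change; the size and reverse-mapping lines match the snippet as written.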
@AliHaiderAhmad001

You never declare tokenizer_outputs!
