from keras.layers import Input
from keras.models import Model


def extract_decoder_model(model):
"""
Extract the decoder from the original model.
Inputs:
------
model: keras model object
Returns:
-------
A Keras model object with the following inputs and outputs:
Inputs of Keras Model That Is Returned:
1: the embedding index for the last predicted word or the <Start> indicator
2: the last hidden state, or in the case of the first word the hidden state
from the encoder
Outputs of Keras Model That Is Returned:
1. Prediction (class probabilities) for the next word
2. The hidden state of the decoder, to be fed back into the decoder at the
next time step
Implementation Notes:
----------------------
Must extract relevant layers and reconstruct part of the computation graph
to allow for different inputs as we are not going to use teacher forcing at
inference time.
"""
    # The latent dimension is the same throughout the architecture, so we cheat
    # and read it off the embedding layer: its output dimension matches what
    # the decoder GRU emits.
    latent_dim = model.get_layer('Decoder-Word-Embedding').output_shape[-1]
    # Reconstruct the input into the decoder
    decoder_inputs = model.get_layer('Decoder-Input').input
    dec_emb = model.get_layer('Decoder-Word-Embedding')(decoder_inputs)
    dec_bn = model.get_layer('Decoder-Batchnorm-1')(dec_emb)
    # During training, the decoder's initial state comes from the encoder and is
    # then forgotten. At inference time there is no teacher forcing, so we need a
    # feedback loop that carries the hidden state from one prediction into the
    # next. This input layer exposes the state so we can add that capability.
    gru_inference_state_input = Input(shape=(latent_dim,), name='hidden_state_input')
    # Reuse the trained GRU layer so we keep its weights.
    # If you inspect the decoder GRU that we created for training, it takes two
    # tensors as input:
    #   (1) the embedding layer output -- used for teacher forcing during
    #       training, but at inference this will be the last step's prediction
    #       (or _start_ on the first time step), and
    #   (2) the state, which we initialize with the encoder on the first time
    #       step, then grab after each prediction and feed back in again.
    gru_out, gru_state_out = model.get_layer('Decoder-GRU')([dec_bn,
                                                             gru_inference_state_input])
    # Reconstruct dense layers
    dec_bn2 = model.get_layer('Decoder-Batchnorm-2')(gru_out)
    dense_out = model.get_layer('Final-Output-Dense')(dec_bn2)
    decoder_model = Model([decoder_inputs, gru_inference_state_input],
                          [dense_out, gru_state_out])
    return decoder_model
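
# ---------------------------------------------------------------------------
# Usage sketch (not part of the original gist): a minimal greedy decoding loop
# driven by the decoder model returned above. It assumes a companion
# `extract_encoder_model(model)` whose output is the encoder's final hidden
# state, and hypothetical `start_token_id` / `end_token_id` / `max_len` values
# taken from your own tokenizer -- adapt these names to your pipeline.
import numpy as np

def greedy_decode(encoder_model, decoder_model, encoder_input,
                  start_token_id, end_token_id, max_len=20):
    # Encode the source sequence once; its final hidden state seeds the decoder.
    state = encoder_model.predict(encoder_input)
    # The decoder consumes one token index per step, shaped (batch=1, steps=1).
    token = np.array([[start_token_id]])
    decoded_ids = []
    for _ in range(max_len):
        # One decoder step: class probabilities plus the updated hidden state.
        probs, state = decoder_model.predict([token, state])
        # Greedy choice: take the highest-probability word at the last time step.
        next_id = int(np.argmax(probs, axis=-1).ravel()[-1])
        if next_id == end_token_id:
            break
        decoded_ids.append(next_id)
        # Feed the prediction back in as the next step's input token.
        token = np.array([[next_id]])
    return decoded_ids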