@a-rodin
Last active Jul 6, 2020
Example of Recurrent Highway Networks (https://arxiv.org/abs/1607.03474) with PyTorch
@emanjavacas commented Aug 23, 2017

Hi, thanks for sharing your code. I have a question. As far as I can understand, the paper states that the input x^t at each timestep "is directly transformed only by the first highway layer", but in your code it is being passed to all of the hidden highway layers, right?

gate_value = F.sigmoid(gate(x))
x = F.tanh(transform(x)) * gate_value + x * (1 - gate_value)

I was wondering whether there is some motivation for this (have you got better performance, for instance?).

@a-rodin (Owner) commented Jan 4, 2018

@emanjavacas, here x is reassigned after each highway layer. So the input is "directly transformed" only by the first layer; in the following layers it is the result of the previous layer that gets transformed instead.
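
For reference, a minimal sketch of that per-layer loop as a standalone cell (the class name, the transforms/gates lists, and equal input/hidden sizes are my assumptions, and the recurrent-state wiring is omitted; this is not the gist's actual code):

import torch
import torch.nn as nn

class RHNCellSketch(nn.Module):
    # One RHN timestep: `depth` stacked highway layers, input and hidden sizes assumed equal.
    def __init__(self, size, depth):
        super().__init__()
        self.transforms = nn.ModuleList([nn.Linear(size, size) for _ in range(depth)])
        self.gates = nn.ModuleList([nn.Linear(size, size) for _ in range(depth)])

    def forward(self, x):
        # x holds the timestep input only for the first layer; each pass through
        # the loop rebinds x to the layer's output, so deeper layers transform
        # the previous layer's result rather than the original input.
        for transform, gate in zip(self.transforms, self.gates):
            gate_value = torch.sigmoid(gate(x))
            x = torch.tanh(transform(x)) * gate_value + x * (1 - gate_value)
        return x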

@mourinhoxyz commented Jan 14, 2018

@a-rodin
Hi, thanks for the code. Inside the RHN forward function, def forward(self, seq, h), you iterate over a tensor of shape [seq_len, batch_size, emb_size]:

for x in seq: # here x is [batch_size, emb_size] and is passed to the RHN layer

So this isn't the same as iterating over the elements of the sequence, right?

I can't understand how the d x T recurrence depth is valid in this code, since you are not iterating over individual vectors.

The recurrence then just looks like a feed-forward stack, where the previous layer's output goes into the next layer, so the code probably isn't correct.
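
To be concrete about the shapes I mean (the sizes below are just an illustration, not taken from the gist):

import torch

seq_len, batch_size, emb_size = 5, 3, 8
seq = torch.randn(seq_len, batch_size, emb_size)

for t, x in enumerate(seq):  # iteration over a tensor goes along the first dimension
    print(t, x.shape)        # each x is [batch_size, emb_size]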

@cosmozhang commented Feb 22, 2018

It seems the variational dropout is not yet applied here.
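
For example, one way to add it would be a "locked" dropout mask sampled once per sequence and reused at every timestep (a sketch with an assumed module name, not part of the gist):

import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    # Variational dropout: one mask per sequence, shared across all timesteps.
    def forward(self, x, p=0.5):
        # x: [seq_len, batch_size, hidden_size]
        if not self.training or p == 0:
            return x
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - p) / (1 - p)
        return x * mask  # the same mask broadcasts over the time dimension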
