Markdown content for my slides on RNNs and LSTMs. To present as a slideshow, refer to: http://vishalgupta.me/md-slides/

Can Neural Networks remember?

Resources: http://bit.ly/SeqModelsResources

A gunshot.

"This is it", she thought

She took a deep breath and prayed.

9 seconds later...

she was World Sprint Champion.

Can't decide sentiment without considering all sentences.

Can't classify activity with just one frame of a video.

Can't generate/classify a second of music without considering the previous seconds.

Traditional neural networks can’t remember.

Recurrent Neural Networks can.

Conventional Neural Networks

    import torch.nn as nn

    class FeedForwardNet(nn.Module):  # illustrative class name added for completeness
        def __init__(self, input_size, hidden_size, output_size):
            super().__init__()
            self.in2out = nn.Linear(input_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, input):
            output = self.in2out(input)
            output = self.softmax(output)
            return output  # no hidden state: each input is processed independently

Recurrent Neural Networks

    import torch.nn as nn

    class VanillaRNN(nn.Module):  # illustrative class name added for completeness
        def __init__(self, input_size, hidden_size, output_size):
            super().__init__()
            self.hid2hid = nn.Linear(hidden_size, hidden_size)
            self.in2hid = nn.Linear(input_size, hidden_size)
            self.tanh = nn.Tanh()

        def forward(self, input, hidden):
            # The new hidden state combines the current input with the previous hidden state
            hidden = self.tanh(self.in2hid(input) + self.hid2hid(hidden))
            return hidden


The hidden state is representative of the entire sequence seen so far.
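
As a quick illustration (not part of the original slides), the `VanillaRNN` sketch above can be unrolled over a sequence by feeding the hidden state back in at every timestep; the sizes below are arbitrary:

    import torch

    rnn = VanillaRNN(input_size=27, hidden_size=64, output_size=27)
    hidden = torch.zeros(1, 64)         # initial hidden state (batch of 1)
    sequence = torch.randn(10, 1, 27)   # 10 timesteps, batch of 1, 27 features each
    for x_t in sequence:
        hidden = rnn(x_t, hidden)       # hidden now summarizes all inputs seen so far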

Types of Sequential Models

1 : One-to-One


Vanilla Neural Network
*Image Classification*

2 : One-to-Many


Sequence output
*Image Captioning*

3 : Many-to-One


Sequence input
*Sentiment Analysis*

4a : Many-to-Many


Sequence input and sequence output
*Encoder Decoder. Translation.*

4b : Many-to-Many


Synced sequence input and output
*Wakeword detection.*

Long- and Short-Term Dependencies

Consider the sentence...

I grew up in Chennai... I speak X.


Where X is the word we are trying to predict.

I grew up in Chennai... I speak X.


speak --> X

  • X must be a language.
  • Close to X, hence a short-term dependency.

Chennai --> X

  • X must be relevant to Chennai.
  • Far from X, hence a long-term dependency.

The problem with (vanilla) RNNs

  • In theory, RNNs are capable of handling “long-term dependencies.”
  • In practice, RNNs don’t seem to be able to learn them: the gradient signal from distant inputs vanishes (or explodes) as it is propagated back through many timesteps.

Solution?

  • We need to decide what information to keep and what to remove at every timestep

  • We need gates

  • If the gate is a function...

  • Discard information if the gate value is 0

  • Allow information through if the gate value is 1

How do you define a gate?

  • A gate is a pointwise multiplication operation

  • To decide if X should be allowed through the gate:

    • gate(X) = W*X + B
  • But the gate value needs to be between 0 and 1...

  • So apply the sigmoid function to the gate value (sketched in code below):

    • gate(X) = sigmoid(W*X + B)
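
A minimal runnable sketch of this idea in PyTorch (the `Gate` class name and sizes are illustrative assumptions, not from the slides):

    import torch
    import torch.nn as nn

    class Gate(nn.Module):
        def __init__(self, size):
            super().__init__()
            self.linear = nn.Linear(size, size)   # computes W*X + B

        def forward(self, x):
            g = torch.sigmoid(self.linear(x))     # squash gate values into (0, 1)
            return g * x                          # pointwise multiply: 0 blocks, 1 passes through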

Elements of LSTM

  • Input
  • Hidden State
  • Cell State

Gates of LSTM

Forget gate (f_t)

  • Decide what information to get rid of
  • Inputs : the previous hidden state and the current input
  • 1 : Keep data
  • 0 : Discard data completely

Forget gate (f_t)

  • For example:
    • To determine the gender of a pronoun
    • We need to remember the gender of the last subject
    • So, if we come across a new subject, forget the old subject's gender (see the equation below)
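
In the standard notation these slides follow, the forget gate is a sigmoid over the previous hidden state and the current input:

    f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)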

Input gate (i_t)

  • Decide what new information to store in cell state
  • Inputs : the previous hidden state and the current input
  • 1 : Input is important
  • 0 : Input doesn't matter

Input gate (i_t)

  • This layer has 2 parts:
    • A sigmoid layer (the “input gate layer”) which decides which values we’ll update
    • A tanh layer that creates a vector of new candidate values, C̃_t

Input gate (i_t)

  • For example:
    • To determine the gender of a pronoun
    • We need to remember the gender of the last subject
    • So, if we come across a new subject, store the new subject's gender (equations below)
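
In the same notation, the two parts of the input gate layer are:

    i_t  = sigmoid(W_i · [h_{t-1}, x_t] + b_i)    # which values to update
    C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)       # vector of new candidate values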

Update/Output gate

We know

  • How much to update (i_t)
  • How much to forget (f_t)

Next step?
Update the cell state

Update/Output gate

  • `C_{t-1}` : Previous cell state
  • `C_t` : Current cell state

Determine output using hidden and cell state.
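
Putting the gates together: the new cell state is C_t = f_t * C_{t-1} + i_t * C̃_t, and the output is h_t = o_t * tanh(C_t), where o_t is one more sigmoid gate. A from-scratch PyTorch sketch of a single LSTM step (illustrative; `nn.LSTMCell` is the built-in equivalent):

    import torch
    import torch.nn as nn

    class LSTMCellSketch(nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            concat = input_size + hidden_size     # gates read [h_{t-1}, x_t]
            self.forget_gate = nn.Linear(concat, hidden_size)   # f_t
            self.input_gate = nn.Linear(concat, hidden_size)    # i_t
            self.candidate = nn.Linear(concat, hidden_size)     # C̃_t
            self.output_gate = nn.Linear(concat, hidden_size)   # o_t

        def forward(self, x, hidden, cell):
            z = torch.cat([x, hidden], dim=1)
            f = torch.sigmoid(self.forget_gate(z))    # how much of the old cell state to keep
            i = torch.sigmoid(self.input_gate(z))     # how much new information to write
            c_tilde = torch.tanh(self.candidate(z))   # candidate values
            cell = f * cell + i * c_tilde              # C_t = f_t * C_{t-1} + i_t * C̃_t
            o = torch.sigmoid(self.output_gate(z))    # how much of the cell state to expose
            hidden = o * torch.tanh(cell)              # h_t = o_t * tanh(C_t)
            return hidden, cell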

Character-Level Models

  • Encoding characters
    • 27 possible chars (A-Z + \n)
    • Character --> [27 x 1] one-hot vector (sketched below)
  • Input
    • Sequence of characters
    • Input for each timestep --> [27 x 1] char-vector
  • Output
    • Single character for each timestep
    • Character that will follow the sequence of input chars
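
A sketch of the encoding described above (the vocabulary string and helper name are assumptions for illustration):

    import numpy as np

    CHARS = "abcdefghijklmnopqrstuvwxyz\n"   # 26 letters + '\n' = 27 possible chars
    CHAR_TO_IDX = {c: i for i, c in enumerate(CHARS)}

    def one_hot(char):
        # Character --> [27 x 1] vector with a single 1 at the character's index
        vec = np.zeros((len(CHARS), 1))
        vec[CHAR_TO_IDX[char]] = 1.0
        return vec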

Colab Notebook: bit.ly/ColabCharLSTM

  • Character-Level LSTM (Keras) to generate names
  • Bonus : Using the LSTM to join names
    • Brad + Angelina = Brangelina
    • Char + Lizard = Charizard
    • Britain + Exit = Brexit

📋 Slides Content as Markdown ➕ Follow me on Github 👨‍💼 Connect on LinkedIn
