Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Simple implementation of LSTM in Tensorflow in 50 lines (+ 130 lines of data generation and comments)
"""Short and sweet LSTM implementation in Tensorflow.
Motivation:
When Tensorflow was released, adding RNNs was a bit of a hack - it required
building separate graphs for every number of timesteps and was a bit obscure
to use. Since then TF devs added things like `dynamic_rnn`, `scan` and `map_fn`.
Currently the APIs are decent, but all the tutorials that I am aware of are not
making the best use of the new APIs.
Advantages of this implementation:
- No need to specify number of timesteps ahead of time. Number of timesteps is
infered from shape of input tensor. Can use the same graph for multiple
different numbers of timesteps.
- No need to specify batch size ahead of time. Batch size is infered from shape
of input tensor. Can use the same graph for multiple different batch sizes.
- Easy to swap out different recurrent gadgets (RNN, LSTM, GRU, your new
creative idea)
"""
import numpy as np
import random
import tensorflow as tf
import tensorflow.contrib.layers as layers
map_fn = tf.python.functional_ops.map_fn
################################################################################
## DATASET GENERATION ##
## ##
## The problem we are trying to solve is adding two binary numbers. The ##
## numbers are reversed, so that the state of RNN can add the numbers ##
## perfectly provided it can learn to store carry in the state. Timestep t ##
## corresponds to bit len(number) - t. ##
################################################################################
def as_bytes(num, final_size):
res = []
for _ in range(final_size):
res.append(num % 2)
num //= 2
return res
def generate_example(num_bits):
a = random.randint(0, 2**(num_bits - 1) - 1)
b = random.randint(0, 2**(num_bits - 1) - 1)
res = a + b
return (as_bytes(a, num_bits),
as_bytes(b, num_bits),
as_bytes(res,num_bits))
def generate_batch(num_bits, batch_size):
"""Generates instance of a problem.
Returns
-------
x: np.array
two numbers to be added represented by bits.
shape: b, i, n
where:
b is bit index from the end
i is example idx in batch
n is one of [0,1] depending for first and
second summand respectively
y: np.array
the result of the addition
shape: b, i, n
where:
b is bit index from the end
i is example idx in batch
n is always 0
"""
x = np.empty((num_bits, batch_size, 2))
y = np.empty((num_bits, batch_size, 1))
for i in range(batch_size):
a, b, r = generate_example(num_bits)
x[:, i, 0] = a
x[:, i, 1] = b
y[:, i, 0] = r
return x, y
################################################################################
## GRAPH DEFINITION ##
################################################################################
INPUT_SIZE = 2 # 2 bits per timestep
RNN_HIDDEN = 20
OUTPUT_SIZE = 1 # 1 bit per timestep
TINY = 1e-6 # to avoid NaNs in logs
LEARNING_RATE = 0.01
USE_LSTM = True
inputs = tf.placeholder(tf.float32, (None, None, INPUT_SIZE)) # (time, batch, in)
outputs = tf.placeholder(tf.float32, (None, None, OUTPUT_SIZE)) # (time, batch, out)
## Here cell can be any function you want, provided it has two attributes:
# - cell.zero_state(batch_size, dtype)- tensor which is an initial value
# for state in __call__
# - cell.__call__(input, state) - function that given input and previous
# state returns tuple (output, state) where
# state is the state passed to the next
# timestep and output is the tensor used
# for infering the output at timestep. For
# example for LSTM, output is just hidden,
# but state is memory + hidden
# Example LSTM cell with learnable zero_state can be found here:
# https://gist.github.com/nivwusquorum/160d5cf7e1e82c21fad3ebf04f039317
if USE_LSTM:
cell = tf.nn.rnn_cell.BasicLSTMCell(RNN_HIDDEN, state_is_tuple=True)
else:
cell = tf.nn.rnn_cell.BasicRNNCell(RNN_HIDDEN)
# Create initial state. Here it is just a constant tensor filled with zeros,
# but in principle it could be a learnable parameter. This is a bit tricky
# to do for LSTM's tuple state, but can be achieved by creating two vector
# Variables, which are then tiled along batch dimension and grouped into tuple.
batch_size = tf.shape(inputs)[1]
initial_state = cell.zero_state(batch_size, tf.float32)
# Given inputs (time, batch, input_size) outputs a tuple
# - outputs: (time, batch, output_size) [do not mistake with OUTPUT_SIZE]
# - states: (time, batch, hidden_size)
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state, time_major=True)
# project output from rnn output size to OUTPUT_SIZE. Sometimes it is worth adding
# an extra layer here.
final_projection = lambda x: layers.linear(x, num_outputs=OUTPUT_SIZE, activation_fn=tf.nn.sigmoid)
# apply projection to every timestep.
predicted_outputs = map_fn(final_projection, rnn_outputs)
# compute elementwise cross entropy.
error = -(outputs * tf.log(predicted_outputs + TINY) + (1.0 - outputs) * tf.log(1.0 - predicted_outputs + TINY))
error = tf.reduce_mean(error)
# optimize
train_fn = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(error)
# assuming that absolute difference between output and correct answer is 0.5
# or less we can round it to the correct output.
accuracy = tf.reduce_mean(tf.cast(tf.abs(outputs - predicted_outputs) < 0.5, tf.float32))
################################################################################
## TRAINING LOOP ##
################################################################################
NUM_BITS = 10
ITERATIONS_PER_EPOCH = 100
BATCH_SIZE = 16
valid_x, valid_y = generate_batch(num_bits=NUM_BITS, batch_size=100)
session = tf.Session()
# For some reason it is our job to do this:
session.run(tf.initialize_all_variables())
for epoch in range(1000):
epoch_error = 0
for _ in range(ITERATIONS_PER_EPOCH):
# here train_fn is what triggers backprop. error and accuracy on their
# own do not trigger the backprop.
x, y = generate_batch(num_bits=NUM_BITS, batch_size=BATCH_SIZE)
epoch_error += session.run([error, train_fn], {
inputs: x,
outputs: y,
})[0]
epoch_error /= ITERATIONS_PER_EPOCH
valid_accuracy = session.run(accuracy, {
inputs: valid_x,
outputs: valid_y,
})
print "Epoch %d, train error: %.2f, valid accuracy: %.1f %%" % (epoch, epoch_error, valid_accuracy * 100.0)
@ramseydsilva

This comment has been minimized.

Copy link

commented Oct 3, 2016

Neat, thanks!

@seovchinnikov

This comment has been minimized.

Copy link

commented Dec 31, 2016

After first reading I didn't get the usage of RNN_HIDDEN = 20 and OUTPUT_SIZE=1 (that was treated by me as a LSTM's output instead of extra layer's output) with INPUT_SIZE=2 together (because of extra layer and because rnn_hidden should be 2 (based on rnn_hidden=input_size+output_size )) so I think you should make an accent on this difference.

@seovchinnikov

This comment has been minimized.

Copy link

commented Dec 31, 2016

Oh, sorry, I meant LSTM's rnn_hidden's size should be equal to output_size, but input_size+output_size is a weights' 'width' (W_C's 'width')

@declanzane

This comment has been minimized.

Copy link

commented Mar 9, 2017

Is there a way to save and restore lstm cell ?

@NHQ

This comment has been minimized.

Copy link

commented Apr 3, 2017

tf.nn.rnn_cell is now tf.contrib.rnn
and tf.python.functional_ops.map_fn is now tf.map_fn`

when I ran the program, I got the following for every epoch, until I killed it:

train error: 0.00, valid accuracy: 100.0 %
@petrLorenc

This comment has been minimized.

Copy link

commented Apr 5, 2017

How can I print something like "input" -> "output"?

@griver

This comment has been minimized.

Copy link

commented Apr 15, 2017

I'm not sure about stacking up the final layer via map_fn:
predicted_outputs = map_fn(final_projection, rnn_outputs)

Doesn't this create separate weights and biases for each timestep?

@chrisplyn

This comment has been minimized.

Copy link

commented May 3, 2017

I have similar concerns as @griver, can you comment on this?

@HappyCatGo

This comment has been minimized.

Copy link

commented May 10, 2017

It was very helpful for me to understand LSTM, better than official Tensorflow tutorial mixing it with language processing. I appreciate it.

@giver yes, it creates another weight and bias that are necessary. rnn_outputs have 20 (=RNN_HIDDEN) nodes at the end (full dimension of rnn_outputs is [10, 25, 20]) and it is needed to be converted to 1 output node, so we need one more layer to convert.

@pbcquoc

This comment has been minimized.

Copy link

commented May 17, 2017

@griver, i think it is wrong if we are not share weights of the final layer

@mcemilg

This comment has been minimized.

Copy link

commented Nov 16, 2017

As @NHQ said at line 26 is old. Now it needs to be used this way map_fn = tf.map_fn.

@mrmrn

This comment has been minimized.

Copy link

commented Mar 4, 2018

I have dataest that contains 3 valuse for example (x,y) and z that is a function from (x,y) ==> z=function(x,y)
and a test set contains (x,y)
how can I use my data in this model?

@issa-s-ayoub

This comment has been minimized.

Copy link

commented Jul 13, 2018

What if the batch size in the training and development dataset is different?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.