@nervanazoo
Last active February 9, 2016 22:08
neon LSTM image caption implementation

## Information

name: LSTM image captioning model based on the CVPR 2015 paper "Show and tell: A neural image caption generator" and on code from Karpathy's NeuralTalk.

model_script: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/imagecaption/lstm/image_caption.py

model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/imagecaption/lstm/image_caption_flickr8k.p

neon_commit: e7ab2c2e2

## Description

The LSTM model is trained on the Flickr8k dataset using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following CVPR 2015 paper:

Show and tell: A neural image caption generator. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. CVPR, 2015 (arXiv:1411.4555).

The model was trained for 15 epochs, where one epoch is one pass over all five captions of every training image. The training data was reshuffled at the start of each epoch. To evaluate on the test set, download the model script and weights, then run:

    python image_caption.py --model_file [path_to_weights]

To train the model from scratch for 15 epochs, run:

    python image_caption.py -i 1 -e 15 -s image_caption_flickr8k.p
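Under this definition each image is visited five times per epoch, once per caption. The following is a minimal sketch of that iteration order, assuming captions are held as a dict mapping an image id to its list of five captions. `epoch_pairs` and the toy data are hypothetical and are not taken from `image_caption.py` or neon's data loader.

```python
import random
from typing import Dict, Iterator, List, Tuple


def epoch_pairs(captions: Dict[str, List[str]], seed: int) -> Iterator[Tuple[str, str]]:
    """Yield every (image_id, caption) pair exactly once, in shuffled order.

    Hypothetical helper illustrating the epoch definition above;
    it is not neon's data loader.
    """
    pairs = [(img, cap) for img, caps in captions.items() for cap in caps]
    random.Random(seed).shuffle(pairs)  # a fresh permutation for each seed
    yield from pairs


# Toy data (hypothetical): 2 images with 5 captions each.
captions = {f"img{i}": [f"caption {j} of img{i}" for j in range(5)] for i in range(2)}

for epoch in range(15):
    for image_id, caption in epoch_pairs(captions, seed=epoch):
        pass  # one training step per (image, caption) pair would go here
```

Seeding the shuffle with the epoch index gives a different but reproducible order each epoch; whether the original script seeds its shuffle this way is not stated here.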

## Performance

At test time, the model is given only the image and predicts one word at a time until it emits the stop token. Decoding currently uses greedy search: at each step, the single most probable word is taken. The scores below were computed with the BLEU evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/, scoring each generated caption against its 5 reference sentences.

| Metric | Score |
|--------|-------|
| B-1    | 52.0  |
| B-2    | 34.0  |
| B-3    | 21.5  |
| B-4    | 13.9  |
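To make the metric concrete, here is a minimal sketch of standard corpus-level BLEU: clipped n-gram precision, a geometric mean with uniform weights, and a brevity penalty against the reference closest in length to each hypothesis. It assumes B-n means cumulative BLEU over n-gram orders 1 to n. It is not a copy of the NeuralTalk script, which may differ in tokenization, brevity-penalty details, or whether B-n is cumulative, so numbers from this sketch are not guaranteed to reproduce the table.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[List[str]]], max_n: int = 4) -> float:
    """Cumulative corpus BLEU-max_n with uniform weights, on a 0-100 scale."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref_set in zip(hyps, refs):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in ref_set)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref: Counter = Counter()
            for r in ref_set:
                max_ref |= ngrams(r, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_prec)
```

Hypotheses and references are expected to be pre-tokenized (for example lowercased and whitespace-split), with one list of references per hypothesis.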

Beam search, L2 regularization, and model ensembling were not implemented; adding them would likely improve these scores somewhat.
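Beam search generalizes the greedy decoding used here: instead of keeping only the single most probable word at each step, it keeps the `beam_width` highest-scoring partial captions. The sketch below is a generic illustration, not part of `image_caption.py` or the neon API. `step` is a hypothetical stand-in for the trained LSTM's next-word distribution, and the toy probabilities are invented so that greedy search and a beam of width 2 disagree. Scores are summed raw log-probabilities with no length normalization, which tends to favor shorter captions.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: caption prefix -> {next word: log-probability}.
StepFn = Callable[[List[str]], Dict[str, float]]


def beam_search(step: StepFn, beam_width: int, max_len: int = 20, end: str = "<end>") -> List[str]:
    """Return the highest-scoring caption; beam_width=1 reduces to greedy search."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for word, logp in step(prefix).items():
                candidates.append((score + logp, prefix + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((score, seq[:-1]))  # drop the stop token
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # captions that reached max_len without a stop token
    return max(finished, key=lambda c: c[0])[1]


# Invented toy distribution; any prefix not listed ends with probability 1.
TOY: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.4, "cat": 0.3, "<end>": 0.3},
    ("the",): {"dog": 0.9, "<end>": 0.1},
}


def toy_step(prefix: List[str]) -> Dict[str, float]:
    probs = TOY.get(tuple(prefix), {"<end>": 1.0})
    return {w: math.log(p) for w, p in probs.items()}
```

With `beam_width=1` the function behaves like greedy search and returns `["a", "dog"]` (probability 0.24); with `beam_width=2` it finds `["the", "dog"]` (probability 0.36), which greedy search misses because its first word is less probable.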
