Last active January 21, 2018 10:41
Image Captioning LSTM


name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's NeuralTalk.



neon_version: v1.0.rc1

neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

gist_id: 7e76e90664f935c6f65d

##Description

The LSTM model is trained on the Flickr8k dataset using precomputed VGG features from [...]. Model details can be found in the following CVPR 2015 paper:

Show and tell: A neural image caption generator.
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  
CVPR, 2015 (arXiv:1411.4555)

The model was trained for 15 epochs, where one epoch is one pass over all 5 captions of every image. The training data was reshuffled each epoch. To evaluate on the test set, download the model and weights, and run:

    python --model_file [path_to_weights]
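
The epoch definition used for training can be made concrete with a small Python sketch. This is not the neon data loader: `captions` and `epoch_pairs` are hypothetical names, and the toy data only mimics the 5-captions-per-image layout of Flickr8k. Each epoch visits every (image, caption) pair once, in a freshly shuffled order.

```python
import random

def epoch_pairs(captions, rng):
    """Yield every (image_id, caption) pair exactly once, in shuffled order."""
    pairs = [(img, cap) for img, caps in captions.items() for cap in caps]
    rng.shuffle(pairs)
    yield from pairs

# Hypothetical toy data: 2 images with 5 captions each.
captions = {
    "img_a": ["caption a%d" % i for i in range(5)],
    "img_b": ["caption b%d" % i for i in range(5)],
}

rng = random.Random(0)
for epoch in range(15):
    for image_id, caption in epoch_pairs(captions, rng):
        pass  # one training step on (image features, caption) would go here
```

With 2 images and 5 captions each, one epoch yields 10 training pairs.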

##Performance

At test time, the model is given only the image and must predict one word at a time until it predicts a stop token. Decoding currently uses greedy search: the most probable word is taken at each step. BLEU scores were computed with the evaluation script from [...], against 5 reference sentences per image; the results are below.

| BLEU | Score |
|------|-------|
| B-1  | 54.2  |
| B-2  | 32.6  |
| B-3  | 19.3  |
| B-4  | 12.3  |
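
For readers unfamiliar with the B-1 to B-4 notation, the following is a simplified sketch of corpus-level BLEU-N in Python. It is not the evaluation script used for the numbers above, so it should not be expected to reproduce them exactly. It assumes whitespace-tokenized input, applies no smoothing, and uses the closest reference length for the brevity penalty; `corpus_bleu` is a name chosen for this illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(candidates, references, max_n):
    """BLEU-max_n on a 0-100 scale.

    candidates: list of token lists, one per image.
    references: list of lists of token lists (several references per image).
    """
    matched = [0] * max_n
    total = [0] * max_n
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)      # element-wise maximum over references
            clipped = cand_counts & max_ref  # clip candidate counts to that maximum
            matched[n - 1] += sum(clipped.values())
            total[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * brevity * math.exp(log_precision)

candidates = ["a dog runs on grass".split()]
references = [["a dog runs on the grass".split(), "the dog is running".split()]]
b1 = corpus_bleu(candidates, references, 1)
b2 = corpus_bleu(candidates, references, 2)
```

Because B-N is a geometric mean of the 1-gram through N-gram precisions, the score can only stay level or drop as N grows, which matches the falling pattern in the table.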

Beam search, L2 regularization, and ensembling were not implemented. Adding them would likely improve performance somewhat.
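
To illustrate why beam search can help over greedy decoding, here is a minimal Python sketch on a toy word model. It is not the neon implementation: `next_log_probs`, `TABLE`, and the probabilities in it are hypothetical stand-ins for the LSTM's next-word distribution given the image and the words generated so far.

```python
import math

STOP = "<stop>"

def greedy_decode(next_log_probs, max_len=20):
    """Take the single most probable word at each step until STOP."""
    words = []
    while len(words) < max_len:
        dist = next_log_probs(tuple(words))
        word = max(dist, key=dist.get)
        if word == STOP:
            break
        words.append(word)
    return words

def beam_search(next_log_probs, beam_size=3, max_len=20):
    """Keep the beam_size best partial captions by total log probability."""
    beams = [(0.0, ())]  # (total log probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            for word, lp in next_log_probs(words).items():
                if word == STOP:
                    finished.append((score + lp, words))
                else:
                    candidates.append((score + lp, words + (word,)))
        beams = sorted(candidates, reverse=True)[:beam_size]
        if not beams:
            break
    # Captions cut off at max_len compete with the finished ones.
    best_score, best_words = max(finished + beams)
    return list(best_words)

# Hypothetical toy model: the best first word does not start the best caption.
TABLE = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
}

def toy_next_log_probs(words):
    probs = TABLE.get(tuple(words), {STOP: 1.0})
    return {w: math.log(p) for w, p in probs.items()}

greedy_caption = greedy_decode(toy_next_log_probs)
beam_caption = beam_search(toy_next_log_probs, beam_size=2)
```

In this toy setup greedy search commits to "a" and ends with "a dog" (probability 0.33), while a beam of size 2 keeps "the" alive and finds "the dog" (probability 0.36). With `beam_size=1` the beam search reduces to greedy decoding.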

Access Denied on the model file