- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958: http://jmlr.org/papers/v15/srivastava14a.html
- He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification: https://arxiv.org/abs/1502.01852
- Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks: http://proceedings.mlr.press/v9/glorot10a.html
- RMSprop: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization: https://arxiv.org/abs/1412.6980
- Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization: https://arxiv.org/abs/1406.2572
- Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305: http://www.jmlr.org/papers/v13/bergstra12a.html
- Bergstra, J. et al. (2011). Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems (pp. 2546–2554): http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
- Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift: https://arxiv.org/abs/1502.03167
- LeNet-5: LeCun et al., "Gradient-Based Learning Applied to Document Recognition": http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
- AlexNet: Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks": https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
- VGG-16: Simonyan and Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition": https://arxiv.org/pdf/1409.1556.pdf
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Deep Residual Learning for Image Recognition": https://arxiv.org/abs/1512.03385
- Min Lin, Qiang Chen, Shuicheng Yan, "Network In Network": https://arxiv.org/abs/1312.4400
- Christian Szegedy et al., "Going Deeper with Convolutions": https://arxiv.org/abs/1409.4842
- Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun, "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks": https://arxiv.org/abs/1312.6229
- Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look Once: Unified, Real-Time Object Detection": https://arxiv.org/abs/1506.02640
- Joseph Redmon, Ali Farhadi, "YOLO9000: Better, Faster, Stronger": https://arxiv.org/abs/1612.08242
- Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation": https://arxiv.org/abs/1311.2524
- Ross Girshick, "Fast R-CNN": https://arxiv.org/abs/1504.08083
- Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks": https://arxiv.org/abs/1506.01497
- Taigman et al, "DeepFace: Closing the Gap to Human-Level Performance in Face Verification": https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf
- Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering": https://arxiv.org/abs/1503.03832
- Matthew D Zeiler, Rob Fergus, "Visualizing and Understanding Convolutional Networks": https://arxiv.org/abs/1311.2901
- Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, "A Neural Algorithm of Artistic Style": https://arxiv.org/abs/1508.06576
- Log0, TensorFlow Implementation of "A Neural Algorithm of Artistic Style": http://www.chioka.in/tensorflow-implementation-neural-algorithm-of-artistic-style
- Harish Narayanan, "Convolutional neural networks for artistic style transfer": https://harishnarayanan.org/writing/artistic-style-transfer/
- Chung J, Gulcehre C, Cho K, Bengio Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555. December 2014: http://arxiv.org/abs/1412.3555
- Cho K, van Merrienboer B, Bahdanau D, Bengio Y. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv:1409.1259. September 2014: http://arxiv.org/abs/1409.1259
- Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780: http://www.bioinf.jku.at/publications/older/2604.pdf
- Andrej Karpathy. The Unreasonable Effectiveness of Recurrent Neural Networks. Published 2015: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics; 2014:1532–1543: https://nlp.stanford.edu/pubs/glove.pdf
- Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546. October 2013: http://arxiv.org/abs/1310.4546
- Mikolov T, Chen K, Corrado G, Dean J. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781. January 2013: http://arxiv.org/abs/1301.3781
- Mikolov T, Yih W, Zweig G. Linguistic Regularities in Continuous Space Word Representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics; 2013:746–751: http://aclweb.org/anthology/N13-1090
- van der Maaten L, Hinton GE. Visualizing Data using t-SNE. Journal of Machine Learning Research. 2008;9:2579–2605.
- Bengio Y, Ducharme R, Vincent P, Jauvin C. A Neural Probabilistic Language Model. Journal of Machine Learning Research. 2003;3:1137–1155.
- Bolukbasi T, Chang K-W, Zou J, Saligrama V, Kalai A. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. arXiv:1607.06520. July 2016: https://arxiv.org/abs/1607.06520v1
- Cho K, van Merrienboer B, Gulcehre C, et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078. June 2014: http://arxiv.org/abs/1406.1078
- Sutskever I, Vinyals O, Le QV. Sequence to Sequence Learning with Neural Networks. In: Proc. NIPS. Montreal, Canada; 2014: http://arxiv.org/abs/1409.3215
- Papineni K, Roukos S, Ward T, Zhu W-J. Bleu: a Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics; 2002:311–318. doi:10.3115/1073083.1073135: https://www.aclweb.org/anthology/P02-1040.pdf
- Mao J, Xu W, Yang Y, Wang J, Huang Z, Yuille A. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). arXiv:1412.6632. December 2014: http://arxiv.org/abs/1412.6632
- Karpathy A, Fei-Fei L. Deep Visual-Semantic Alignments for Generating Image Descriptions. arXiv:1412.2306. December 2014: https://arxiv.org/abs/1412.2306v2
- Vinyals O, Toshev A, Bengio S, Erhan D. Show and Tell: A Neural Image Caption Generator. arXiv:1411.4555. November 2014: http://arxiv.org/abs/1411.4555
- Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473. September 2014: http://arxiv.org/abs/1409.0473
- Xu K, Ba J, Kiros R, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv:1502.03044. February 2015: http://arxiv.org/abs/1502.03044
- Graves A, Fernandez S, Gomez F, Schmidhuber J. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In: Proceedings of ICML 2006: ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf