Summary of paper "Convolutional Neural Network For Sentence Classification"

Convolutional Neural Network For Sentence Classification

Introduction

Architecture

  • Pad input sentences so that they are of the same length.
  • Map each word in the padded sentence to its embedding (either randomly initialized or initialized from pre-trained word2vec vectors) to obtain a matrix representing the sentence.
  • Apply a convolution layer with multiple filter widths and multiple feature maps per width.
  • Apply max-over-time pooling to each feature map, keeping only its largest value.
  • Concatenate the pooled values from all filters and feed them to a fully-connected layer with softmax activation.
  • The softmax outputs a probability distribution over the labels.
  • Use dropout for regularisation.
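
The steps above can be sketched in plain Python as a forward pass for a single sentence. This is a minimal illustration, not the paper's implementation: the function names (`pad`, `conv_feature_map`, `forward`), the `<pad>` token, and the toy sizes (embedding dimension 4, two feature maps per width) are assumptions chosen for readability, and dropout is left out because it only applies during training.

```python
import math
import random

def pad(tokens, length, pad_token="<pad>"):
    """Right-pad a token list to a fixed length (no truncation is done)."""
    return tokens + [pad_token] * (length - len(tokens))

def conv_feature_map(sent_matrix, filt, bias):
    """Slide one filter of width h = len(filt) over the sentence matrix.

    Returns one ReLU-activated value per window position.
    """
    h = len(filt)
    fmap = []
    for i in range(len(sent_matrix) - h + 1):
        s = bias
        for j in range(h):
            s += sum(w * x for w, x in zip(filt[j], sent_matrix[i + j]))
        fmap.append(max(0.0, s))  # ReLU
    return fmap

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def forward(tokens, embeddings, filters, W, b, max_len):
    """Pad, embed, convolve, max-pool over time, concatenate, softmax."""
    sent = [embeddings[t] for t in pad(tokens, max_len)]
    # One pooled value per filter; the list is the concatenated feature vector.
    pooled = [max(conv_feature_map(sent, f, fb)) for f, fb in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)

# Toy setup (all sizes are illustrative assumptions).
rng = random.Random(0)
dim, max_len, n_classes = 4, 7, 2
vocab = ["<pad>", "a", "very", "good", "movie"]
embeddings = {w: [rng.uniform(-0.25, 0.25) for _ in range(dim)] for w in vocab}
embeddings["<pad>"] = [0.0] * dim
# Filter widths 3, 4, 5 with two feature maps each (the paper uses 100).
filters = [([[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(h)], 0.0)
           for h in (3, 4, 5) for _ in range(2)]
W = [[rng.uniform(-0.1, 0.1) for _ in range(len(filters))] for _ in range(n_classes)]
b = [0.0] * n_classes

probs = forward(["a", "very", "good", "movie"], embeddings, filters, W, b, max_len)
print(probs)  # a probability distribution over the two toy labels
```

Because max-over-time pooling keeps one value per filter, the length of the concatenated feature vector depends only on the number of filters, not on the sentence length.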

Hyperparameters

  • ReLU activation for the convolution layers
  • Filter windows of width 3, 4, and 5, with 100 feature maps each.
  • Dropout - 0.5
  • L2-norm constraint of 3 on the weight vectors (weights are rescaled when their norm exceeds 3)
  • Batch size - 50
  • Adadelta update rule.
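
The Adadelta rule and the norm threshold of 3 from the list above can be sketched as follows. The sketch assumes the threshold is enforced by rescaling a weight vector whose l2 norm exceeds 3 after a gradient step, which is how the paper describes its l2 constraint. The function names are hypothetical, and the decay `rho=0.95` and `eps=1e-6` are assumed defaults, not values given in this summary.

```python
import math

def adadelta_step(x, grad, acc_grad, acc_delta, rho=0.95, eps=1e-6):
    """One Adadelta update on a list of parameters.

    acc_grad and acc_delta are running averages of squared gradients and
    squared updates; returns the new (x, acc_grad, acc_delta).
    """
    new_x, new_acc_grad, new_acc_delta = [], [], []
    for xi, g, eg, ed in zip(x, grad, acc_grad, acc_delta):
        eg = rho * eg + (1 - rho) * g * g
        delta = -math.sqrt(ed + eps) / math.sqrt(eg + eps) * g
        ed = rho * ed + (1 - rho) * delta * delta
        new_x.append(xi + delta)
        new_acc_grad.append(eg)
        new_acc_delta.append(ed)
    return new_x, new_acc_grad, new_acc_delta

def rescale_to_max_norm(w, s=3.0):
    """Rescale w to have l2 norm s if its norm exceeds s; otherwise copy it."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm > s:
        return [v * s / norm for v in w]
    return list(w)

# Usage: one step on f(w) = sum(w_i^2), whose gradient is 2 * w.
w = [3.0, 4.0]
acc_g, acc_d = [0.0, 0.0], [0.0, 0.0]
w, acc_g, acc_d = adadelta_step(w, [2 * v for v in w], acc_g, acc_d)
w = rescale_to_max_norm(w, s=3.0)  # norm was about 5, so w is scaled down
```

Adadelta needs no hand-tuned learning rate, which fits the paper's approach of using one set of hyperparameters across all datasets.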

Variants

  • CNN-rand
    • Randomly initialized word vectors.
  • CNN-static
    • Uses pre-trained vectors from word2vec and does not update the word vectors.
  • CNN-non-static
    • Same as CNN-static but updates word vectors during training.
  • CNN-multichannel
    • Uses two sets of word vectors (channels), both initialized from word2vec.
    • One set is updated during training and the other is kept fixed.
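
The four variants differ only in how each embedding channel is initialized and whether it is updated during training. A small sketch of that difference as configuration follows; the `Channel` class, the `VARIANTS` table, and `update_vector` are hypothetical names introduced for illustration, and the plain gradient step stands in for whatever optimizer is actually used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Channel:
    init: str        # "random" or "word2vec"
    trainable: bool  # whether gradients update this set of word vectors

VARIANTS = {
    "CNN-rand": [Channel("random", True)],
    "CNN-static": [Channel("word2vec", False)],
    "CNN-non-static": [Channel("word2vec", True)],
    "CNN-multichannel": [Channel("word2vec", False), Channel("word2vec", True)],
}

def update_vector(vector, grad, channel, lr=0.1):
    """Apply a gradient step to a word vector only if its channel is trainable."""
    if not channel.trainable:
        return list(vector)
    return [v - lr * g for v, g in zip(vector, grad)]

# Usage: the static channel ignores the gradient, the non-static one moves.
static, non_static = VARIANTS["CNN-multichannel"]
print(update_vector([1.0, 2.0], [1.0, 1.0], static, lr=0.5))
print(update_vector([1.0, 2.0], [1.0, 1.0], non_static, lr=0.5))
```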

Datasets

  • Sentiment analysis datasets (movie reviews, customer reviews, etc.).
  • Question classification data.
  • The largest number of classes in any dataset is 6.

Strengths

  • Good results on benchmarks despite being a simple architecture.
  • Word vectors fine-tuned in the non-static channel become more meaningful for the task at hand.

Weaknesses

  • The datasets are small and have few labels.
  • The reported results are not very detailed or exhaustive.