Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Summary of paper "Convolutional Neural Network For Sentence Classification"

Convolutional Neural Network For Sentence Classification

Introduction

Architecture

  • Pad input sentences so that they are of the same length.
  • Map words in the padded sentence using word embeddings (which may be either initialized as zero vectors or initialized as word2vec embeddings) to obtain a matrix corresponding to the sentence.
  • Apply convolution layer with multiple filter widths and feature maps.
  • Apply max-over-time pooling operation over the feature map.
  • Concatenate the pooling results from different layers and feed to a fully-connected layer with softmax activation.
  • Softmax outputs probabilistic distribution over the labels.
  • Use dropout for regularisation.

Hyperparameters

  • RELU activation for convolution layers
  • Filter window of 3, 4, 5 with 100 feature maps each.
  • Dropout - 0.5
  • Gradient clipping at 3
  • Batch size - 50
  • Adadelta update rule.

Variants

  • CNN-rand
    • Randomly initialized word vectors.
  • CNN-static
    • Uses pre-trained vectors from word2vec and does not update the word vectors.
  • CNN-non-static
    • Same as CNN-static but updates word vectors during training.
  • CNN-multichannel
    • Uses two set of word vectors (channels).
    • One set is updated and other is not updated.

Datasets

  • Sentiment analysis datasets for Movie Reviews, Customer Reviews etc.
  • Classification data for questions.
  • Maximum number of classes for any dataset - 6

Strengths

  • Good results on benchmarks despite being a simple architecture.
  • Word vectors obtained by non-static channel have more meaningful representation.

Weakness

  • Small data with few labels.
  • Results are not very detailed or exhaustive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.