Summary of "Bag of Tricks for Efficient Text Classification" paper

Bag of Tricks for Efficient Text Classification


  • Introduces fastText, a simple and highly efficient approach for text classification.
  • On par with deep learning classifiers in accuracy, while being many orders of magnitude faster for training and evaluation.
  • Link to the paper
  • Link to code


  • Built on top of linear models with a rank constraint and a fast loss approximation.
  • Word representations are averaged into a text representation, which is then fed to a linear classifier.
  • The text representation can be thought of as a hidden state that is shared among features and classes.
  • A softmax layer produces a probability distribution over the predefined classes.
  • With a plain softmax, the classifier is expensive when there are many classes: the cost is O(kh), where k is the number of classes and h is the dimension of the text representation.
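
The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the fastText implementation: the matrix names `A` (word embeddings) and `B` (classifier weights), the sizes, and the random initialisation are all assumptions made for the example.

```python
import numpy as np

def predict_proba(token_ids, A, B):
    """A: (vocab_size, h) word embeddings; B: (k, h) classifier weights (both hypothetical)."""
    hidden = A[token_ids].mean(axis=0)   # average word vectors into one text vector of size h
    scores = B @ hidden                  # one score per class; this product costs O(kh)
    scores -= scores.max()               # shift for numerical stability before exponentiating
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum() # softmax over the k classes

rng = np.random.default_rng(0)
vocab_size, h, k = 1000, 10, 4           # illustrative sizes only
A = rng.normal(scale=0.1, size=(vocab_size, h))
B = rng.normal(scale=0.1, size=(k, h))
probs = predict_proba(np.array([3, 17, 256]), A, B)
```

The `B @ hidden` product is the O(kh) step that becomes the bottleneck as the number of classes grows.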

Hierarchical Softmax

  • Based on the Huffman coding tree.
  • Reduces the training-time complexity from O(kh) to O(h log2(k)).
  • At test time, the top T classes can be retrieved from the tree at a cost of O(log(T)) using a binary heap.
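
A rough sketch of the idea follows, assuming made-up class counts and random node vectors. A Huffman tree is built over the classes, and the probability of a class is the product of sigmoid branch decisions along its root-to-leaf path, so frequent classes get short paths. The `top_t` helper is a simplification: it scans every class and keeps the best T in a size-T binary heap, rather than pruning a search over the tree as the paper describes. All function names here are hypothetical.

```python
import heapq
import itertools
import numpy as np

def build_huffman_paths(class_counts):
    """For each class, return its root-to-leaf path as (internal_node_index, sign) pairs."""
    k = len(class_counts)
    tiebreak = itertools.count()         # keeps heap entries comparable when counts tie
    heap = [(c, next(tiebreak), i) for i, c in enumerate(class_counts)]
    heapq.heapify(heap)
    parent = {}                          # node_id -> (parent_id, sign); leaves are 0..k-1
    next_id = k
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)  # merge the two least frequent nodes
        c2, _, n2 = heapq.heappop(heap)
        parent[n1] = (next_id, +1)
        parent[n2] = (next_id, -1)
        heapq.heappush(heap, (c1 + c2, next(tiebreak), next_id))
        next_id += 1
    paths = []
    for leaf in range(k):
        path, node = [], leaf
        while node in parent:
            node, sign = parent[node]
            path.append((node - k, sign))  # internal nodes are indexed 0..k-2
        paths.append(path[::-1])
    return paths

def class_probability(hidden, paths, node_vectors, label):
    """Product of sigmoid branch probabilities along the path to `label`."""
    p = 1.0
    for node, sign in paths[label]:
        p *= 1.0 / (1.0 + np.exp(-sign * (node_vectors[node] @ hidden)))
    return p

def top_t(hidden, paths, node_vectors, T):
    """Keep the T most probable classes in a min-heap; each update costs O(log T)."""
    heap = []
    for label in range(len(paths)):
        p = class_probability(hidden, paths, node_vectors, label)
        if len(heap) < T:
            heapq.heappush(heap, (p, label))
        elif p > heap[0][0]:
            heapq.heapreplace(heap, (p, label))
    return sorted(heap, reverse=True)

class_counts = [50, 30, 10, 5, 5]        # made-up class frequencies
paths = build_huffman_paths(class_counts)
rng = np.random.default_rng(0)
h = 8
node_vectors = rng.normal(size=(len(class_counts) - 1, h))
hidden = rng.normal(size=h)
all_probs = [class_probability(hidden, paths, node_vectors, c) for c in range(len(class_counts))]
best_two = top_t(hidden, paths, node_vectors, T=2)
```

Because each internal node splits its probability mass between its two children, the leaf probabilities sum to one without ever computing all k scores for a single training example.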

N-gram Features

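  • A bag of words ignores word order, and modelling the order explicitly is computationally expensive. fastText instead adds a bag of n-grams as extra features, which captures partial information about local word order while staying efficient.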
  • Uses hashing trick to maintain fast and memory efficient mapping of the n-grams.
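
As an illustration of the hashing trick, the sketch below maps each word n-gram to one of a fixed number of buckets, so no n-gram dictionary needs to be stored. The hash function (`zlib.crc32`), the bucket count, and the function name are assumptions chosen for the example; they are not the paper's exact choices.

```python
import zlib

def ngram_feature_ids(tokens, n=2, num_buckets=2_000_000):
    """Hash every word n-gram in `tokens` to a bucket id in [0, num_buckets)."""
    ids = []
    for i in range(len(tokens) - n + 1):
        ngram = " ".join(tokens[i:i + n])
        # crc32 is a stand-in hash; distinct n-grams may collide in the same bucket
        ids.append(zlib.crc32(ngram.encode("utf-8")) % num_buckets)
    return ids

bigram_ids = ngram_feature_ids("the movie was not good".split())
```

The bucket ids can then index rows of an embedding table just like word ids do. Memory is bounded by `num_buckets` regardless of how many distinct n-grams occur in the corpus, at the price of occasional collisions.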


Sentiment Analysis

  • Adding bigram features improves fastText's accuracy.
  • Slightly outperforms char-CNN and char-CRNN and performs a bit worse than VDCNN.
  • Orders of magnitude faster in terms of training time.
  • Note: fastText does not use pre-trained word embeddings.

Tag Prediction

  • fastText with bigrams outperforms Tagspace.
  • fastText is up to 600 times faster than Tagspace at test time.
