Summary of "Bag of Tricks for Efficient Text Classification" paper

Bag of Tricks for Efficient Text Classification


  • Introduces fastText, a simple and highly efficient approach for text classification.
  • On par with deep learning models in accuracy, while being orders of magnitude faster to train and evaluate.
  • Link to the paper
  • Link to code


  • Built on top of linear models with a rank constraint and a fast loss approximation.
  • Word representations are averaged into a text representation, which is then fed to a linear classifier.
  • Think of the text representation as a hidden state that can be shared among features and classes.
  • A softmax layer produces a probability distribution over the predefined classes.
  • Computing the softmax is expensive when there are many classes: O(kh), where k is the number of classes and h is the dimension of the text representation.
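
The averaging-plus-linear-classifier pipeline above can be sketched in a few lines. This is a minimal illustration, not the fastText implementation: the function names, the toy sizes and the random weights are all assumptions made for the example, and no training is shown.

```python
import math
import random

def text_representation(token_ids, embeddings):
    """Average the embedding vectors of the tokens in one document."""
    dim = len(embeddings[0])
    hidden = [0.0] * dim
    for t in token_ids:
        for j in range(dim):
            hidden[j] += embeddings[t][j]
    return [v / len(token_ids) for v in hidden]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(token_ids, embeddings, class_weights):
    """Linear classifier on the averaged representation."""
    hidden = text_representation(token_ids, embeddings)
    # k dot products of length h each: this is the O(kh) cost.
    scores = [sum(w * x for w, x in zip(row, hidden)) for row in class_weights]
    return softmax(scores)

# Toy setup (hypothetical sizes, untrained random weights).
rng = random.Random(0)
vocab_size, h, k = 50, 8, 4
embeddings = [[rng.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(vocab_size)]
class_weights = [[rng.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(k)]

probs = predict_proba([3, 17, 17, 42], embeddings, class_weights)
```

Here `probs` holds one probability per class. The scoring step touches every one of the k weight rows, which is what motivates the hierarchical softmax in the next section.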

Hierarchical Softmax

  • Based on Huffman Coding Tree
  • Reduces the cost of computing a class probability from O(kh) to O(h log(k)).
  • The top T classes can also be retrieved efficiently, at a cost of O(log(T)), by maintaining a binary heap.
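
As a rough sketch of the ideas above: classes become leaves of a Huffman tree built from class frequencies, each internal node holds one weight vector, and a class probability is the product of binary decisions along its root-to-leaf path. The tree encoding, the function names, the toy class counts and the random weights below are assumptions for illustration; the top-T search is one possible depth-first variant with a size-T heap, not necessarily the exact procedure used in the paper's code.

```python
import heapq
import itertools
import math
import random

def build_huffman_tree(class_counts):
    """Merge the two rarest nodes repeatedly; frequent classes end up near the root."""
    tie = itertools.count()  # unique tie-breaker so the heap never compares nodes
    heap = [(c, next(tie), ("leaf", label)) for label, c in class_counts.items()]
    heapq.heapify(heap)
    node_id = 0
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tie), ("inner", node_id, n1, n2)))
        node_id += 1
    return heap[0][2]

def collect_paths(node, prefix=()):
    """Map each class to its root-to-leaf path of (internal node id, branch bit)."""
    if node[0] == "leaf":
        return {node[1]: list(prefix)}
    _, node_id, left, right = node
    paths = collect_paths(left, prefix + ((node_id, 0),))
    paths.update(collect_paths(right, prefix + ((node_id, 1),)))
    return paths

def p_right(node_id, hidden, node_weights):
    """Probability of taking the right branch at an internal node (sigmoid of a dot product)."""
    score = sum(w * x for w, x in zip(node_weights[node_id], hidden))
    return 1.0 / (1.0 + math.exp(-score))

def class_probability(path, hidden, node_weights):
    """Product of branch probabilities: O(h) work per node on the path."""
    prob = 1.0
    for node_id, bit in path:
        pr = p_right(node_id, hidden, node_weights)
        prob *= pr if bit == 1 else 1.0 - pr
    return prob

def top_t(root, hidden, node_weights, t):
    """Depth-first search that prunes subtrees unable to beat the current t-th best."""
    best = []  # min-heap of (probability, label), at most t entries
    def dfs(node, prob):
        if t <= 0 or (len(best) == t and prob <= best[0][0]):
            return
        if node[0] == "leaf":
            heapq.heappush(best, (prob, node[1]))
            if len(best) > t:
                heapq.heappop(best)
            return
        _, node_id, left, right = node
        pr = p_right(node_id, hidden, node_weights)
        dfs(left, prob * (1.0 - pr))
        dfs(right, prob * pr)
    dfs(root, 1.0)
    return sorted(best, reverse=True)

# Toy example (hypothetical class counts, untrained random weights).
counts = {"sports": 50, "politics": 30, "tech": 15, "cooking": 5}
rng = random.Random(1)
h = 6
node_weights = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(len(counts) - 1)]
hidden = [rng.uniform(-1, 1) for _ in range(h)]

root = build_huffman_tree(counts)
paths = collect_paths(root)
probs = {c: class_probability(p, hidden, node_weights) for c, p in paths.items()}
top2 = top_t(root, hidden, node_weights, 2)
```

Because the two branch probabilities at each node sum to one, the leaf probabilities form a valid distribution without ever normalising over all k classes. The pruning in `top_t` is valid because a path probability can only shrink as the path gets longer.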

N-gram Features

  • A bag of words ignores word order, and modelling order explicitly is expensive. Instead, a bag of n-grams is added as extra features to capture some local word order while staying efficient.
  • Uses the hashing trick to keep the mapping of n-grams fast and memory efficient.
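
A minimal sketch of bag-of-n-grams features with the hashing trick follows. The helper names and the bucket count are hypothetical, and CRC32 is used only as a convenient deterministic stand-in; it is not claimed to be the hash function fastText uses.

```python
import zlib

def word_ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def hashed_feature_ids(tokens, vocab, num_buckets, max_n=2):
    """Unigrams use vocabulary ids; longer n-grams are hashed into a fixed number of buckets."""
    ids = [vocab[t] for t in tokens if t in vocab]
    for n in range(2, max_n + 1):
        for gram in word_ngrams(tokens, n):
            # No n-gram dictionary is stored: the hash decides the row directly.
            bucket = zlib.crc32(gram.encode("utf-8")) % num_buckets
            ids.append(len(vocab) + bucket)  # offset past the unigram rows
    return ids

tokens = "the movie was not good".split()
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
num_buckets = 1000  # hypothetical; real setups use far more buckets
ids = hashed_feature_ids(tokens, vocab, num_buckets)
```

The resulting ids index rows of a single embedding table of size `len(vocab) + num_buckets`, so memory is bounded regardless of how many distinct n-grams appear. Different n-grams can collide in the same bucket, which is the price paid for the fixed size.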


Sentiment Analysis

  • Adding bigrams improves fastText's accuracy.
  • Outperforms char-CNN and char-CRNN and performs slightly worse than VDCNN.
  • Orders of magnitude faster in terms of training time.
  • Note: fastText does not use pre-trained word embeddings.

Tag Prediction

  • fastText with bigrams outperforms Tagspace.
  • fastText is up to 600 times faster at test time.