shagunsodhani/Bag of Tricks for Efficient Text Classification.md

## Bag of Tricks for Efficient Text Classification.md

      
Bag of Tricks for Efficient Text Classification

Introduction


Introduces fastText, a simple and highly efficient approach for text classification.
On par with deep learning models in terms of accuracy, while being at least an order of magnitude faster to train and evaluate.
Link to the paper
Link to code

Architecture


Built on top of linear models with a rank constraint and a fast loss approximation.
Word representations are averaged into a text representation, which is then fed to a linear classifier.
Think of the text representation as a hidden state that can be shared among features and classes.
A softmax layer produces a probability distribution over the predefined classes.
The full softmax is computationally expensive: O(kh), where k is the number of classes and h is the dimension of the text representation.
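
The forward pass described above can be sketched in a few lines. This is a minimal illustration, not the fastText implementation: the function names, the toy sizes and the random weights are all assumptions made for the example, and no training is shown.

```python
import math
import random

def text_representation(token_ids, embeddings):
    """Average the word vectors of a text into one h-dimensional vector."""
    h = len(embeddings[0])
    avg = [0.0] * h
    for t in token_ids:
        for j in range(h):
            avg[j] += embeddings[t][j]
    return [v / len(token_ids) for v in avg]

def softmax(scores):
    """Turn k raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(token_ids, embeddings, class_weights):
    """Linear classifier on the averaged representation: k dot products of size h."""
    hidden = text_representation(token_ids, embeddings)
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, hidden)) for w in class_weights]
    return softmax(scores)

# Toy sizes (assumptions): vocabulary of 10 words, h = 4, k = 3 classes.
rng = random.Random(0)
vocab_size, h, k = 10, 4, 3
embeddings = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(vocab_size)]
class_weights = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(k)]

probs = predict_proba([1, 5, 7], embeddings, class_weights)
```

The scoring step loops over all k classes and does h multiplications for each, which is where the O(kh) cost comes from.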

Hierarchical Softmax


Based on a Huffman coding tree.
Reduces the complexity from O(kh) to O(h log2(k)).
The top T results (from the tree) can be computed efficiently, in O(log T), using a binary heap.
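
As a rough illustration of the two ideas above, the sketch below builds a Huffman tree over class counts and reports the depth of each class, then keeps the T best candidates with a fixed-size binary heap. The helper names and the example counts are hypothetical. The heap is applied here to a flat list of scored classes for simplicity, whereas the paper uses it while searching the tree.

```python
import heapq
import itertools

def huffman_depths(class_counts):
    """Return {class: depth of its leaf} in a Huffman tree built from class counts."""
    tie = itertools.count()  # tie-breaker so the heap never compares the dicts
    heap = [(count, next(tie), {label: 0}) for label, count in class_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every leaf below moves one level down.
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {label: depth + 1 for label, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]

def top_t(scored_classes, t):
    """Keep the t best (score, label) pairs using a min-heap of size t."""
    heap = []
    for score, label in scored_classes:
        if len(heap) < t:
            heapq.heappush(heap, (score, label))
        elif score > heap[0][0]:
            # Replace the current worst of the kept candidates: O(log t).
            heapq.heapreplace(heap, (score, label))
    return sorted(heap, reverse=True)

# Hypothetical class frequencies: frequent classes end up closer to the root.
counts = {"sports": 50, "politics": 25, "tech": 15, "travel": 7, "food": 3}
depths = huffman_depths(counts)

best = top_t([(0.1, "food"), (0.5, "sports"), (0.3, "tech"), (0.05, "travel")], t=2)
```

Because frequent classes get short paths, the number of binary decisions needed for a typical example is small, which is the source of the speed-up over the full softmax.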

N-gram Features


Rather than modelling word order explicitly, fastText uses a bag of n-grams as additional features, which keeps the model efficient without sacrificing accuracy.
The hashing trick is used to keep the mapping of n-grams fast and memory efficient.
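
The hashing trick can be sketched as follows. This is only an illustration under stated assumptions: `zlib.crc32` stands in for whatever hash function fastText actually uses, the bucket count is arbitrary, and the helper names are made up for the example.

```python
import zlib

def bigrams(tokens):
    """Return adjacent word pairs, e.g. ["not", "good"] -> ["not good"]."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def hashed_feature_ids(tokens, num_buckets):
    """Map each unigram and bigram to a bucket id in [0, num_buckets)."""
    features = tokens + bigrams(tokens)
    # crc32 is an assumption, chosen only because it is deterministic across runs.
    return [zlib.crc32(f.encode("utf-8")) % num_buckets for f in features]

tokens = "the movie was not good".split()
ids = hashed_feature_ids(tokens, num_buckets=1000)
```

No dictionary of n-grams is stored: each n-gram is mapped directly to a row of a fixed-size embedding table, so memory stays bounded at the cost of occasional collisions.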

Experiments

Sentiment Analysis


fastText benefits from using bigrams.
It outperforms char-CNN and char-CRNN and performs slightly worse than VDCNN.
Orders of magnitude faster in terms of training time.
Note: fastText does not use pre-trained word embeddings.

Tag Prediction


fastText with bigrams outperforms Tagspace.
fastText is up to 600 times faster at test time.