Limitations:
- applies only to binary classification, or to multiclass problems decomposed into one-vs-rest (OVR) or one-vs-one (OVO) binary tasks
- suitable for documents that are not too long
Advantages:
- takes class labels into account, correcting the inappropriate scaling that IDF applies without regard to how a feature is distributed across classes
- outperforms TF-IDF on most benchmarks
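A minimal Python sketch of where the class labels enter. It assumes the standard BNS definition |F^-1(tpr) - F^-1(fpr)|, where F^-1 is the inverse standard normal CDF, and clips both rates to [0.0005, 0.9995] (an assumed bound) so that F^-1 stays finite. The names `bns_score` and `idf_score` are hypothetical. Two features with the same document frequency receive identical IDF weights, while BNS separates the one over-represented in the positive class from the one that occurs at the same rate in both classes.

```python
import math
from statistics import NormalDist

# Inverse CDF of the standard normal distribution (the z-score function F^-1).
_inv_cdf = NormalDist().inv_cdf


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation weight of one feature for a binary task.

    tp / fp: positive / negative documents that contain the feature
    pos / neg: total positive / negative documents
    eps: assumed clipping bound that keeps F^-1 away from 0 and 1
    """
    tpr = min(max(tp / pos, eps), 1 - eps)   # true positive rate
    fpr = min(max(fp / neg, eps), 1 - eps)   # false positive rate
    return abs(_inv_cdf(tpr) - _inv_cdf(fpr))


def idf_score(df, n_docs):
    """Plain IDF for comparison: it sees document frequency, never labels."""
    return math.log(n_docs / df)


# 100 positive and 900 negative documents; features A and B each occur in 100.
print("IDF (A and B):", idf_score(100, 1000))
print("BNS A:", bns_score(50, 50, 100, 900))  # over-represented in positives
print("BNS B:", bns_score(10, 90, 100, 900))  # same rate in both classes
```

For a multiclass problem the same function would be applied once per binary OVR or OVO task, giving each task its own weight vector.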
Practical advice:
- a lookup table for the inverse normal CDF can improve efficiency considerably: with 100k dimensions, a lookup table with 1e-4 precision can cut running time by nearly 50%
- when the classifier is an SVM, storing the features as booleans and encoding the feature weights in the kernel can be more efficient in some SVM implementations
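A sketch of the lookup-table idea, assuming the table stores F^-1 (the inverse standard normal CDF, the only costly step in BNS) on a grid of probabilities with step 1e-4. The names `inv_cdf_lookup` and `bns_lookup`, and the choice to reuse the nearest interior grid value at p = 0 and p = 1, are illustrative assumptions.

```python
from statistics import NormalDist

N_BINS = 10_000                  # grid step 1 / N_BINS = 1e-4 precision
_exact_inv_cdf = NormalDist().inv_cdf

# Precompute F^-1(i / N_BINS) once. F^-1(0) and F^-1(1) are infinite, so the
# two endpoints reuse the nearest interior grid value (assumed convention).
_TABLE = [_exact_inv_cdf(min(max(i, 1), N_BINS - 1) / N_BINS)
          for i in range(N_BINS + 1)]


def inv_cdf_lookup(p):
    """Approximate F^-1(p) for p in [0, 1] by the nearest grid entry."""
    i = round(p * N_BINS)
    return _TABLE[min(max(i, 0), N_BINS)]


def bns_lookup(tp, fp, pos, neg):
    """BNS weight using the table instead of evaluating F^-1 per feature."""
    return abs(inv_cdf_lookup(tp / pos) - inv_cdf_lookup(fp / neg))


print(bns_lookup(50, 50, 100, 900))
```

The table is built once, so each feature then costs two roundings and two list lookups. The approximation error grows near p = 0 and p = 1, where F^-1 is steepest. How much time this saves in practice depends on the surrounding implementation and is not measured here.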
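The kernel idea can be sketched for a linear kernel, assuming each document is stored as a set of present feature indices (boolean features). If the scaled vector is x_i = w_i * [i in doc], then the dot product of two documents is the sum of w_i ** 2 over their shared features, so the kernel can work on the boolean data directly and the weights never have to be multiplied into the stored vectors. The names `weighted_linear_kernel` and `dense_dot` are hypothetical, and the note does not say which SVM implementations benefit.

```python
def weighted_linear_kernel(a, b, w):
    """Linear kernel on BNS-scaled binary vectors, computed from boolean data.

    a, b: sets of feature indices present in each document
    w: per-feature BNS weights, indexable by feature index
    """
    if len(a) > len(b):
        a, b = b, a                      # iterate over the smaller set
    return sum(w[i] ** 2 for i in a if i in b)


def dense_dot(a, b, w):
    """Reference: materialize the scaled float vectors and take their dot product."""
    x = [w[i] if i in a else 0.0 for i in range(len(w))]
    z = [w[i] if i in b else 0.0 for i in range(len(w))]
    return sum(xi * zi for xi, zi in zip(x, z))


w = [0.5, 1.2, 0.0, 2.0]                 # hypothetical BNS weights for 4 features
doc_a, doc_b = {0, 1, 3}, {1, 2, 3}
print(weighted_linear_kernel(doc_a, doc_b, w), dense_dot(doc_a, doc_b, w))
```

Both functions compute the same value; the first touches only the shared features. In practice the squared weights would be precomputed once.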
Reference:
- BNS Feature Scaling: An Improved Representation over TF·IDF for SVM Text Classification