@EmergentOrder
Last active September 16, 2015 19:17
BIDMach LR (SPPMI vs TF-IDF)
k = 3 for all SPPMI runs except where noted. Training data is 16k tweets sampled from the Twitter140 corpus.
SPPMI, Vector size: 500, additive
Train set accuracy: 0.58786
SPPMI, Vector size: 2500, additive
Train set accuracy: 0.67468
SPPMI, Vector size: 7500, additive
Train set accuracy: 0.76
SPPMI, Vector size: 7500, additive [exception, k = 10]
Train set accuracy: 0.79109
SPPMI, Vector size: 500, appending
Train set accuracy: 0.81155
SPPMI, Vector size: 2000, appending [exception, k = 10]
Train set accuracy: 0.91094
SPPMI, Vector size: 5000, appending [exception, k = 10]
Train set accuracy: 0.92674
SPPMI, Vector size: 7500, appending [exception, k = 10]
Train set accuracy: 0.93356
[Higher vector sizes not possible for appending due to memory limitations]
TF-IDF, Vector size: 7500
Train set accuracy: 0.87812
TF-IDF, Vector size: 12500
Train set accuracy: 0.91451
TF-IDF, Vector size: default
Train set accuracy: 0.93623
For comparison:
VW, raw text
Train set accuracy: 0.9351875
VW, SPPMI, Vector size: 2500, additive k=3
Train set accuracy: 0.7213125
VW, SPPMI, Vector size: 500, appending, k=10, plus original text
Train set accuracy: 0.9434375
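A note on the k parameter in the SPPMI runs: shifted positive PMI is usually defined as max(PMI(w, c) - log k, 0), so a larger k zeroes out more of the weakly associated word/context pairs. The sketch below is a minimal pure-Python illustration of that definition on made-up co-occurrence counts. The function name `sppmi` and the toy data are hypothetical, and this is not the code that produced the numbers above.

```python
import math
from collections import Counter


def sppmi(pair_counts, k):
    """Return {(word, context): max(PMI - log(k), 0)}, keeping only nonzero cells.

    pair_counts maps (word, context) pairs to co-occurrence counts.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    shift = math.log(k)
    result = {}
    for (word, context), n in pair_counts.items():
        # PMI = log( P(w, c) / (P(w) * P(c)) ), estimated from raw counts
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        shifted = pmi - shift
        if shifted > 0:  # negative or zero cells are dropped (sparse storage)
            result[(word, context)] = shifted
    return result


# Toy counts, invented for illustration only
counts = {
    ("good", "day"): 8,
    ("good", "bad"): 1,
    ("bad", "day"): 1,
    ("bad", "bad"): 6,
}

for k in (1, 3, 10):
    print(k, sppmi(counts, k))
```

With k = 1 this reduces to plain positive PMI; raising k can only shrink or remove entries, which is one reason different k values give different feature sets.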