@espeed
Created September 30, 2015 02:13
NLTK Trainer NaiveBayes classifier example with most-informative features
python ./nltk-trainer/train_classifier.py ./data/googleNews --instances files --fraction 0.75 --min_score 2 --ngrams 1 2 3 --show-most-informative 10 --classifier NaiveBayes
loading ./data/googleNews
2 labels: ['neg', 'pos']
calculating word scores
using bag of words from known set feature extraction
1682507 words meet min_score and/or max_feats
37116 training feats, 12371 testing feats
training NaiveBayes classifier
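The command passes `--ngrams 1 2 3` and `--min_score 2`, and the log reports "bag of words from known set" feature extraction. The stdlib-only sketch below assumes this works as follows. Each candidate feature gets a score, and only features scoring at least `min_score` are kept. Every unigram, bigram and trigram of a document that falls in the kept set then becomes a `True` feature. The helper names (`ngrams`, `known_set`, `bag_of_ngrams_in_set`) and the `scores` input are hypothetical, and nltk-trainer's actual scoring function is not shown in this log. Unigrams are keyed by the bare token and longer n-grams by tuples. This is how feature names such as `12445` and `(u'9', u'google')` appear in the most-informative list.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Consecutive n-token tuples (same idea as nltk.util.ngrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def known_set(scores: Dict[Hashable, float], min_score: float) -> Set[Hashable]:
    """Hypothetical filter: keep features whose score is >= min_score."""
    return {feat for feat, score in scores.items() if score >= min_score}


def bag_of_ngrams_in_set(tokens: List[str], known: Set[Hashable],
                         orders: Iterable[int] = (1, 2, 3)) -> Dict[Hashable, bool]:
    """Boolean feature dict containing only n-grams found in `known`."""
    feats: Dict[Hashable, bool] = {}
    for n in orders:
        for gram in ngrams(tokens, n):
            # Unigrams use the bare token; bigrams/trigrams stay as tuples.
            key = gram[0] if n == 1 else gram
            if key in known:
                feats[key] = True
    return feats


if __name__ == "__main__":
    # Made-up scores for illustration only.
    scores = {"google": 5.0, "inc": 1.0, ("9", "google"): 3.0,
              ("9", "google", "inc"): 2.5}
    known = known_set(scores, min_score=2)
    print(bag_of_ngrams_in_set(["sell", "9", "google", "inc"], known))
    # {'google': True, ('9', 'google'): True, ('9', 'google', 'inc'): True}
```

The reported split is consistent with `--fraction 0.75`: 37116 / (37116 + 12371) ≈ 0.750.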
accuracy: 0.582815
neg precision: 0.659028
neg recall: 0.466366
neg f-measure: 0.546206
pos precision: 0.535910
pos recall: 0.718613
pos f-measure: 0.613958
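The per-label scores above follow the usual definitions. Precision is the share of instances predicted as a label that truly carry it. Recall is the share of instances carrying a label that were predicted as it. The F-measure is the harmonic mean of the two. Here is a minimal stdlib sketch that assumes gold and predicted labels are available as parallel lists. The names `label_sets`, `precision`, `recall` and `f_measure` are illustrative and loosely modeled on the set-based functions in `nltk.metrics`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def label_sets(gold: List[str],
               predicted: List[str]) -> Tuple[Dict[str, Set[int]], Dict[str, Set[int]]]:
    """Map each label to the instance indices that have it as gold / as prediction."""
    refsets: Dict[str, Set[int]] = defaultdict(set)
    testsets: Dict[str, Set[int]] = defaultdict(set)
    for i, (g, p) in enumerate(zip(gold, predicted)):
        refsets[g].add(i)
        testsets[p].add(i)
    return refsets, testsets


def precision(ref: Set[int], test: Set[int]) -> Optional[float]:
    """Fraction of instances predicted as the label that truly have it."""
    return len(ref & test) / len(test) if test else None


def recall(ref: Set[int], test: Set[int]) -> Optional[float]:
    """Fraction of instances with the label that were predicted as it."""
    return len(ref & test) / len(ref) if ref else None


def f_measure(p: Optional[float], r: Optional[float]) -> Optional[float]:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if p is None or r is None:
        return None
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    gold = ["neg", "neg", "pos", "pos", "pos"]
    pred = ["neg", "pos", "pos", "pos", "neg"]
    refsets, testsets = label_sets(gold, pred)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    print("accuracy:", accuracy)  # 3 of 5 correct -> 0.6
    for label in ("neg", "pos"):
        p = precision(refsets[label], testsets[label])
        r = recall(refsets[label], testsets[label])
        print(label, "precision:", p, "recall:", r, "f-measure:", f_measure(p, r))
```

Plugging the logged precision and recall into `f_measure` reproduces the logged F-measures to within rounding: about 0.546206 for `neg` and 0.613958 for `pos`. The three per-label numbers are therefore mutually consistent.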
10 most informative features
Most Informative Features
(u'9', u'google', u'inc') = True neg : pos = 50.9 : 1.0
(u'sell', u'9', u'eurusd') = True neg : pos = 50.9 : 1.0
(u'9', u'eurusd') = True neg : pos = 50.9 : 1.0
(u'sell', u'9', u'google') = True neg : pos = 50.9 : 1.0
(u'9', u'google') = True neg : pos = 50.9 : 1.0
(u'4862', u'003') = True neg : pos = 34.6 : 1.0
(u'oil', u'4862', u'003') = True neg : pos = 34.6 : 1.0
12445 = True neg : pos = 31.4 : 1.0
(u'1253', u'other', u'news') = True neg : pos = 31.2 : 1.0
(u'1253', u'other') = True neg : pos = 31.2 : 1.0
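Each line of the most-informative list compares how likely a feature value is under the two labels. For example, `neg : pos = 50.9 : 1.0` means the feature is about 50.9 times more probable given `neg` than given `pos`. NLTK's `NaiveBayesClassifier` derives these ratios from its smoothed per-label feature distributions, which use expected-likelihood estimates by default. The sketch below is a simplified stand-in that assumes add-0.5 smoothing over a fixed number of value bins. `feature_ratio` and the counts it is given are hypothetical. The sketch illustrates the shape of the calculation and does not reproduce the exact numbers above.

```python
from typing import Dict, Tuple


def feature_ratio(count_by_label: Dict[str, int],
                  total_by_label: Dict[str, int],
                  gamma: float = 0.5,
                  bins: int = 2) -> Tuple[str, str, float]:
    """Return (most likely label, least likely label, probability ratio).

    P(feature=True | label) is estimated with add-gamma smoothing:
    (count + gamma) / (total + gamma * bins). gamma=0.5 mimics an
    expected-likelihood estimate; bins=2 assumes the values True / absent.
    """
    probs = {
        label: (count_by_label.get(label, 0) + gamma) / (total + gamma * bins)
        for label, total in total_by_label.items()
    }
    top = max(probs, key=probs.get)
    bottom = min(probs, key=probs.get)
    return top, bottom, probs[top] / probs[bottom]


if __name__ == "__main__":
    # Hypothetical counts: feature seen in 10 of 100 neg docs, 0 of 100 pos docs.
    top, bottom, ratio = feature_ratio({"neg": 10}, {"neg": 100, "pos": 100})
    print(f"{top} : {bottom} = {ratio:.1f} : 1.0")  # neg : pos = 21.0 : 1.0
```

Smoothing gives a value never seen with `pos` a small nonzero probability there. As a result, features that occur only in `neg` documents produce large but finite ratios. This is one plausible reading of the strongly `neg`-leaning ratios in the list above.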