@DhavalThkkar
Created May 14, 2017 19:39
import pandas as pd
import numpy as np
import sys
from datetime import datetime
# sklearn for easy label encoding, textblob for sentiment reference
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold
# train_test_split moved to model_selection in 0.18; sklearn.cross_validation was removed in 0.20
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_curve, auc
# Keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
# ML Models
from sklearn.utils import shuffle
# NLP toolkits
import nltk, re
from nltk.stem import WordNetLemmatizer, PorterStemmer
#nltk.download()
import string
/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Using TensorFlow backend.
Importing Data...
Cleaning Tweets...
Training Models...
__________________________________________________________________
k-1: Resampling Accuracy: 73.8% Original Accuracy: 72.1%
Resampling AUC Score: 0.819 Original AUC Score: 0.807
__________________________________________________________________
k-2: Resampling Accuracy: 73.1% Original Accuracy: 72.8%
Resampling AUC Score: 0.804 Original AUC Score: 0.809
__________________________________________________________________
k-3: Resampling Accuracy: 75.6% Original Accuracy: 73.8%
Resampling AUC Score: 0.811 Original AUC Score: 0.813
__________________________________________________________________
k-4: Resampling Accuracy: 74.8% Original Accuracy: 72.8%
Resampling AUC Score: 0.816 Original AUC Score: 0.809
__________________________________________________________________
k-5: Resampling Accuracy: 73.2% Original Accuracy: 72.6%
Resampling AUC Score: 0.793 Original AUC Score: 0.809
__________________________________________________________________
k-6: Resampling Accuracy: 73.6% Original Accuracy: 73.3%
Resampling AUC Score: 0.805 Original AUC Score: 0.810
__________________________________________________________________
k-7: Resampling Accuracy: 72.2% Original Accuracy: 72.1%
Resampling AUC Score: 0.793 Original AUC Score: 0.806
__________________________________________________________________
k-8: Resampling Accuracy: 75.3% Original Accuracy: 72.7%
Resampling AUC Score: 0.825 Original AUC Score: 0.809
__________________________________________________________________
k-9: Resampling Accuracy: 73.7% Original Accuracy: 72.8%
Resampling AUC Score: 0.812 Original AUC Score: 0.806
__________________________________________________________________
k-10: Resampling Accuracy: 74.2% Original Accuracy: 72.9%
Resampling AUC Score: 0.810 Original AUC Score: 0.811
Total Elapsed Time: 0:00:14.850715
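
The per-fold figures above are easier to compare once averaged. The following is a minimal sketch that does only that arithmetic: the numbers are copied from the log, while the `folds` list and the `summarize` helper are hypothetical names introduced here for illustration and are not part of the original script.

```python
from statistics import mean

# Per-fold values copied from the log above, one tuple per fold:
# (resampling accuracy %, original accuracy %, resampling AUC, original AUC)
folds = [
    (73.8, 72.1, 0.819, 0.807),  # k-1
    (73.1, 72.8, 0.804, 0.809),  # k-2
    (75.6, 73.8, 0.811, 0.813),  # k-3
    (74.8, 72.8, 0.816, 0.809),  # k-4
    (73.2, 72.6, 0.793, 0.809),  # k-5
    (73.6, 73.3, 0.805, 0.810),  # k-6
    (72.2, 72.1, 0.793, 0.806),  # k-7
    (75.3, 72.7, 0.825, 0.809),  # k-8
    (73.7, 72.8, 0.812, 0.806),  # k-9
    (74.2, 72.9, 0.810, 0.811),  # k-10
]


def summarize(rows):
    """Return the mean of each logged metric across all folds (hypothetical helper)."""
    res_acc, orig_acc, res_auc, orig_auc = zip(*rows)
    return {
        "resampling_accuracy": mean(res_acc),
        "original_accuracy": mean(orig_acc),
        "resampling_auc": mean(res_auc),
        "original_auc": mean(orig_auc),
    }


summary = summarize(folds)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Averaged this way, resampling gives roughly 73.95% accuracy against 72.79% for the original data, while the mean AUC scores are nearly identical (about 0.8088 and 0.8089), so the gain in this run appears to be in accuracy rather than in ranking quality.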