@toxdes
Last active March 16, 2019 20:47
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "project-heart-diseases.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": false
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"metadata": {
"id": "x0wI-3UDyxjo",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Import all the necessary modules."
]
},
{
"metadata": {
"id": "bkTWewrgXtCQ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import pickle\n",
"import numpy as np\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import accuracy_score\n",
"from sklearn.naive_bayes import GaussianNB"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Q7-EDRYyzYpg",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"**Make sure the `data.data` file is in the same directory as this notebook.**"
]
},
{
"metadata": {
"id": "Jd1e0a-U0eTZ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"A quick check that the dataset file can be opened and read."
]
},
{
"metadata": {
"id": "qFu2iAsOnnGV",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"with open('data.data', 'r') as file:\n",
"    print(file.readline())"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "7_l5hWtN0ow-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The model is trained with a [Gaussian Naive Bayes classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier). Training only needs to be done once: afterwards the model is saved to a local file with `pickle`, and whenever we need to classify a new set of features we load the saved model and use it directly, so the training work is not repeated."
]
},
{
"metadata": {
"id": "ciWq0ReVYQ-Y",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# load the data: columns 0-12 are the features, column 13 is the label\n",
"data = np.genfromtxt('./data.data', delimiter=',', dtype=float)\n",
"X = data[:, :13]\n",
"Y = data[:, 13]\n",
"\n",
"# split off 15% of the rows as a test set, so the model is evaluated on rows it has not seen\n",
"X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=100)\n",
"\n",
"# replace missing values (NaN) with the column median; SimpleImputer replaces the deprecated Imputer class\n",
"# the medians are learned from the training rows only, so the test rows do not influence training\n",
"imp = SimpleImputer(missing_values=np.nan, strategy='median')\n",
"X_train = imp.fit_transform(X_train)\n",
"X_test = imp.transform(X_test)\n",
"\n",
"# create the Gaussian Naive Bayes classifier and train it\n",
"clf = GaussianNB()\n",
"clf.fit(X_train, y_train)\n",
"\n",
"# predict the labels of the test set\n",
"result2 = clf.predict(X_test)\n",
"\n",
"# calculate the accuracy of the trained model as a percentage\n",
"score = accuracy_score(y_test, result2) * 100\n",
"print('accuracy: {} %'.format(round(score, 3)))\n",
"\n",
"# note: improving the accuracy, the hardest and most important part, is not attempted here"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "bUpB14l91zVK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Save the trained model to a file, so it can be reused later without retraining it every time it is needed."
]
},
{
"metadata": {
"id": "MNqqfMgk1Jcp",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# write the trained classifier to a file; the with block closes the file automatically\n",
"with open('trained_clf.pkl', 'wb') as file:\n",
"    pickle.dump(clf, file)\n",
"\n",
"print('model saved to trained_clf.pkl')"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "1e8O_gWn2HLW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Test the saved model on a single record. `list_of_ints` holds that record's 13 feature values (the same values the web form submits as `comment1` to `comment13`)."
]
},
{
"metadata": {
"id": "DgCoEEPfpabQ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# load the saved model\n",
"with open('trained_clf.pkl', 'rb') as file:\n",
"    loaded_clf = pickle.load(file)\n",
"\n",
"# one record: 13 feature values in the same column order as the training data\n",
"list_of_ints = [53, 0, 4, 140, 250, 0, 2, 157, 0, 2.6, 2, 2, 7]\n",
"\n",
"# scikit-learn expects a 2-D array of shape (n_samples, n_features), here (1, 13)\n",
"sample = np.array(list_of_ints, dtype=float).reshape(1, -1)\n",
"# fill any missing values with the medians learned during training\n",
"sample = imp.transform(sample)\n",
"prediction = loaded_clf.predict(sample)\n",
"\n",
"print(prediction[0])"
],
"execution_count": 0,
"outputs": []
}
]
}
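The notebook trains scikit-learn's `GaussianNB` and links to the general Naive Bayes article. As a rough sketch of what that classifier computes: for each class it estimates a prior (the class frequency) plus a per-feature mean and variance, then scores a row by the log prior plus the sum of per-feature Gaussian log densities, treating the features as independent. The function name `gaussian_nb_predict` and the random toy data below are hypothetical stand-ins, not part of the project; the small variance-smoothing term follows scikit-learn's documented `var_smoothing` parameter, and the final check compares the hand-written predictions with `GaussianNB` on the same data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def gaussian_nb_predict(X_train, y_train, X_new, var_smoothing=1e-9):
    """Classify the rows of X_new with a hand-written Gaussian Naive Bayes model."""
    classes = np.unique(y_train)
    # a small constant added to every variance for numerical stability:
    # a fraction of the largest per-feature variance of the training data
    epsilon = var_smoothing * np.var(X_train, axis=0).max()
    log_posteriors = []
    for c in classes:
        X_c = X_train[y_train == c]
        prior = X_c.shape[0] / X_train.shape[0]
        mean = X_c.mean(axis=0)
        var = X_c.var(axis=0) + epsilon
        # sum over the features of log N(x_j | mean_j, var_j), one value per row
        log_likelihood = (-0.5 * np.sum(np.log(2.0 * np.pi * var))
                          - 0.5 * np.sum((X_new - mean) ** 2 / var, axis=1))
        log_posteriors.append(np.log(prior) + log_likelihood)
    # for each row, pick the class with the largest unnormalised log posterior
    return classes[np.argmax(np.column_stack(log_posteriors), axis=1)]


# hypothetical toy data with 13 features, standing in for data.data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 13)), rng.normal(1.0, 1.5, (50, 13))])
y = np.array([0] * 50 + [1] * 50)

ours = gaussian_nb_predict(X, y, X)
sklearn_pred = GaussianNB().fit(X, y).predict(X)
print((ours == sklearn_pred).all())
```

Working in log space avoids the underflow that multiplying 13 small densities together could cause; the argmax is unaffected because the logarithm is monotonic.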
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# check that the dataset file exists and can be read
with open('data.data', 'r') as file:
    file.readline()
# train the model
# load the data: columns 0-12 are the features, column 13 is the label
data = np.genfromtxt('./data.data', delimiter=',', dtype=float)
X = data[:, :13]
Y = data[:, 13]
# split off 15% of the rows as a test set, so the model is evaluated on rows it has not seen
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=100)
# replace missing values (NaN) with the column median; SimpleImputer replaces the deprecated Imputer class
# the medians are learned from the training rows only, so the test rows do not influence training
imp = SimpleImputer(missing_values=np.nan, strategy='median')
X_train = imp.fit_transform(X_train)
X_test = imp.transform(X_test)
# create the Gaussian Naive Bayes classifier and train it
clf = GaussianNB()
clf.fit(X_train, y_train)
# predict the labels of the test set
result2 = clf.predict(X_test)
# calculate the accuracy of the trained model as a percentage
score = accuracy_score(y_test, result2) * 100
print('accuracy: {} %'.format(round(score, 3)))
# note: improving the accuracy, the hardest and most important part, is not attempted here
# save the trained model; the with block closes the file automatically
with open('trained_clf.pkl', 'wb') as file:
    pickle.dump(clf, file)
print('model trained and saved to trained_clf.pkl')
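Only `clf` is pickled, so the medians learned by the fitted `SimpleImputer` are lost once training finishes, and a separate process that loads just `trained_clf.pkl` has no fitted imputer to reuse. One way to keep the two together, sketched here as an alternative rather than the project's method, is scikit-learn's `make_pipeline`, which chains the imputer and the classifier into a single estimator that is fitted and pickled as one object. The random toy data (with roughly 5% of its entries blanked out) is an assumption standing in for `data.data`.

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# hypothetical toy data standing in for data.data: 120 rows, 13 features
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 13))
y = (X[:, 0] > 0).astype(int)           # toy labels derived from the first feature
X[rng.random(X.shape) < 0.05] = np.nan  # then blank out roughly 5% of the entries

# the imputer and the classifier are fitted together and stored as one object
model = make_pipeline(SimpleImputer(missing_values=np.nan, strategy='median'),
                      GaussianNB())
model.fit(X, y)

# an in-memory round trip; pickle.dump / pickle.load with a file work the same way
restored = pickle.loads(pickle.dumps(model))

# a record with a missing value: the restored pipeline fills it with the medians
# learned during fit and then classifies it, with no separate imputer object
sample = X[:1].copy()
sample[0, 4] = np.nan
print(restored.predict(sample))
```

In this project that would mean pickling the pipeline instead of the bare classifier, after which the loaded object could be called on raw rows, missing values included.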
import pickle

import numpy as np
from flask import Flask, render_template, request

# load the trained model saved by the training script
with open('trained_clf.pkl', 'rb') as file:
    clf = pickle.load(file)

# the form sends 13 values, named comment1 ... comment13
N_FEATURES = 13

# initiate server
app = Flask(__name__)


@app.route('/')
def home():
    return render_template('home.html')


@app.route('/predict', methods=['POST'])
def predict():
    try:
        # float() rather than int(), because some features (e.g. 2.6) are not whole numbers
        data = [float(request.form['comment{}'.format(i)])
                for i in range(1, N_FEATURES + 1)]
    except (KeyError, ValueError) as e:
        # a field is missing or does not contain a number
        print('Invalid values.')
        print(e)
        return 'Invalid values.', 400

    # one sample with 13 features: the 2-D shape scikit-learn expects
    vect = np.array(data).reshape(1, -1)
    # every field is required, numeric and finite, so there is nothing to impute here
    if not np.isfinite(vect).all():
        return 'Invalid values.', 400
    my_prediction = clf.predict(vect)
    return render_template('results.html', prediction=my_prediction)


if __name__ == '__main__':
    app.run(debug=True)
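The `predict` view reads thirteen form fields named `comment1` to `comment13` and turns them into a single row. A hypothetical helper, `parse_form`, sketches that step in isolation so it can be checked without a running server. It assumes only a mapping of field names to strings (Flask's `request.form` behaves as one, and a plain `dict` is used here), parses each value with `float()` so entries such as `2.6` are accepted, and rejects missing, non-numeric, or non-finite values, since `float()` also parses `'nan'` and `'inf'`, which scikit-learn refuses at prediction time.

```python
import math
from typing import Mapping

import numpy as np

N_FEATURES = 13  # the form fields are named comment1 ... comment13


def parse_form(form: Mapping[str, str]) -> np.ndarray:
    """Turn the submitted form fields into a (1, 13) float array for clf.predict."""
    values = []
    for i in range(1, N_FEATURES + 1):
        key = 'comment{}'.format(i)
        if key not in form:
            raise ValueError('missing field: ' + key)
        value = float(form[key])  # raises ValueError for text such as 'abc'
        if not math.isfinite(value):
            raise ValueError('{} must be a finite number'.format(key))
        values.append(value)
    # one sample with 13 features: the 2-D shape scikit-learn estimators expect
    return np.array(values).reshape(1, -1)


raw = ['53', '0', '4', '140', '250', '0', '2', '157', '0', '2.6', '2', '2', '7']
form = {'comment{}'.format(i): v for i, v in enumerate(raw, start=1)}
features = parse_form(form)
print(features.shape)  # (1, 13)
```

Keeping the parsing separate from the view means the validation rules can be tested with ordinary dictionaries, while the view itself only has to turn a `ValueError` into a 400 response.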