toxdes/ml-model-heart-diseases.ipynb

## ml-model-heart-diseases.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "project-heart-diseases.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": false
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "metadata": {
        "id": "x0wI-3UDyxjo",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Import all necessary modules."
      ]
    },
    {
      "metadata": {
        "id": "bkTWewrgXtCQ",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import pickle\n",
        "from sklearn.impute import SimpleImputer\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.metrics import accuracy_score\n",
        "from sklearn.naive_bayes import GaussianNB\n",
        "import numpy as np"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "Q7-EDRYyzYpg",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "**Make sure you have `data.data` file in the same directory.**"
      ]
    },
    {
      "metadata": {
        "id": "Jd1e0a-U0eTZ",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "A quick check that the dataset file can be opened and read."
      ]
    },
    {
      "metadata": {
        "id": "qFu2iAsOnnGV",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# read the first line; the with block closes the file afterwards\n",
        "with open('data.data', 'r') as file:\n",
        "    print(file.readline())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "7_l5hWtN0ow-",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "The model is trained with a [Gaussian Naive Bayes classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier). Training needs to be done only once; afterwards the model is saved to a local file with pickle, so later predictions can load the saved model instead of retraining. \n",
        "\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "ciWq0ReVYQ-Y",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# load the data: 13 feature columns followed by the result column\n",
        "data = np.genfromtxt('./data.data', delimiter=',', dtype=float)\n",
        "X = data[:, :13]\n",
        "Y = data[:, 13]\n",
        "\n",
        "# Imputer was deprecated, so SimpleImputer is used instead, as the warning suggests\n",
        "imp = SimpleImputer(missing_values=np.nan, strategy='median')\n",
        "\n",
        "# replace missing (NaN) values with the median of their column\n",
        "X = imp.fit_transform(X)\n",
        "\n",
        "# split into training and test sets so the model is scored on data it has not seen\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=100)\n",
        "\n",
        "# Y is a single column and already one-dimensional, so ravel() changes nothing here;\n",
        "# it only guarantees the 1-D shape that fit() expects\n",
        "y_train = y_train.ravel()\n",
        "y_test = y_test.ravel()\n",
        "\n",
        "# get the classifier from sklearn\n",
        "clf = GaussianNB()\n",
        "\n",
        "# train the classifier\n",
        "clf.fit(X_train, y_train)\n",
        "\n",
        "# predict on the held-out test set\n",
        "result2 = clf.predict(X_test)\n",
        "\n",
        "\n",
        "# calculate the accuracy of the trained model, as a percentage\n",
        "score = accuracy_score(y_test, result2) * 100\n",
        "\n",
        "print('accuracy: {} %'.format(round(score, 3)))\n",
        "\n",
        "# The hardest and most important part, improving the accuracy, is not attempted here."
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "bUpB14l91zVK",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Save the trained model so it can be reused later without retraining it every time. "
      ]
    },
    {
      "metadata": {
        "id": "MNqqfMgk1Jcp",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# save the clf object to the file; the with block closes it afterwards\n",
        "with open('trained_clf.pkl', 'wb') as file:\n",
        "    pickle.dump(clf, file)\n",
        "\n",
        "print('whew works lol')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "1e8O_gWn2HLW",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Test the saved model. `list_of_ints` stands in for the array of 13 form values (`comment1` to `comment13`) that the server receives."
      ]
    },
    {
      "metadata": {
        "id": "DgCoEEPfpabQ",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# load the saved model\n",
        "with open('trained_clf.pkl', 'rb') as file:\n",
        "    loaded_clf = pickle.load(file)\n",
        "\n",
        "# one sample in the same shape as the form input: 13 feature values,\n",
        "# where a missing value would be np.nan\n",
        "list_of_ints = [53,0,4,140,250,0,2,157,0,2.6,2,2,7]\n",
        "\n",
        "# reshape the flat list into a single row of shape (1, 13)\n",
        "wow = np.array(list_of_ints).reshape(1, -1)\n",
        "vect=imp.transform(wow)\n",
        "wow2 = loaded_clf.predict(vect)\n",
        "\n",
        "print(wow2[0])"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}

## model.py
import pickle
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
import numpy as np

# check that the dataset file exists and can be read
with open('data.data', 'r') as file:
    file.readline()


#training the model

# load the data: 13 feature columns followed by the result column
data = np.genfromtxt('./data.data', delimiter=',', dtype=float)
X = data[:, :13]
Y = data[:, 13]

# Imputer was deprecated, so SimpleImputer is used instead, as the warning suggests
imp = SimpleImputer(missing_values=np.nan, strategy='median')

# replace missing (NaN) values with the median of their column
X = imp.fit_transform(X)

# split into training and test sets so the model is scored on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=100)

# Y is a single column and already one-dimensional, so ravel() changes nothing here;
# it only guarantees the 1-D shape that fit() expects
y_train = y_train.ravel()
y_test = y_test.ravel()

# get the classifier from sklearn
clf = GaussianNB()

# train the classifier
clf.fit(X_train, y_train)

# predict on the held-out test set
result2 = clf.predict(X_test)


# calculate the accuracy of the trained model, as a percentage
score = accuracy_score(y_test, result2) * 100

print('accuracy: {} %'.format(round(score, 3)))

# The hardest and most important part, improving the accuracy, is not attempted here.
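
# A minimal sketch, not part of the original script: one hedged way to start on the
# accuracy question above. A single 15% test split can give a noisy score on a small
# dataset, so this hypothetical helper (its name and default fold count are
# assumptions) averages the accuracy over k-fold cross-validation instead.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def cross_validated_accuracy(features, labels, folds=5):
    """Return the mean k-fold accuracy of GaussianNB, as a percentage."""
    scores = cross_val_score(GaussianNB(), features, labels, cv=folds, scoring='accuracy')
    return float(np.mean(scores) * 100)


# Possible usage with the arrays defined above:
# print('cv accuracy: {} %'.format(round(cross_validated_accuracy(X, Y), 3)))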

# save the clf object to the file; the with block closes it afterwards
with open('trained_clf.pkl', 'wb') as file:
    pickle.dump(clf, file)

print('successfully trained the model lol')
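
# A minimal sketch, not part of the original script: the pickle written by this script
# stores only the classifier, so the medians learned by the imputer are not saved with
# it. One option, assuming scikit-learn's Pipeline suits this project, is to bundle
# both steps into a single object and pickle that instead. build_pipeline is a
# hypothetical helper name.
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline


def build_pipeline():
    """Return an unfitted pipeline: median imputation followed by GaussianNB."""
    return make_pipeline(
        SimpleImputer(missing_values=np.nan, strategy='median'),
        GaussianNB(),
    )


# Possible usage (the file name is an assumption):
# model = build_pipeline().fit(X_train, y_train)
# with open('trained_pipeline.pkl', 'wb') as f:
#     pickle.dump(model, f)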

## server.py
import pickle

import numpy as np
from flask import Flask, render_template, request

# load the trained model once at startup
with open('trained_clf.pkl', 'rb') as file:
    clf = pickle.load(file)

# initiate server
app = Flask(__name__)

# the form sends 13 fields named comment1 .. comment13
FIELD_NAMES = ['comment{}'.format(i) for i in range(1, 14)]


@app.route('/')
def home():
    return render_template('home.html')


@app.route('/predict', methods=['POST'])
def predict():
    try:
        # float() accepts decimal inputs such as 2.6 as well as whole numbers
        values = [float(request.form[name]) for name in FIELD_NAMES]
        data = np.array(values).reshape(1, -1)
        if not np.all(np.isfinite(data)):
            raise ValueError('values must be finite numbers')
    except (KeyError, ValueError) as e:
        print('Invalid values.')
        print(e)
        return 'Invalid values.', 400

    # every value is a finite number at this point, so nothing needs imputing
    my_prediction = clf.predict(data)

    return render_template('results.html', prediction=my_prediction)


if __name__ == '__main__':
    app.run(debug=True)