{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "\n<h1 align=center><font size = 5> LOGISTIC REGRESSION WITH TENSORFLOW </font></h1>"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Table of Contents\n\nLogistic Regression is one of most important techniques in data science. It is usually used to solve the classic classification problem.\n\n<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n<font size = 3><strong>This lesson covers the following concepts of Logistics Regression:</strong></font>\n<br>\n- <p><a href=\"#ref1\">Linear Regression vs Logistic Regression</a></p>\n- <p><a href=\"#ref2\">Utilizing Logistic Regression in TensorFlow</a></p>\n- <p><a href=\"#ref3\">Training</a></p>\n<p></p>\n</div>\n----------------"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "<a id=\"ref1\"></a>\n## What is different between Linear and Logistic Regression?\n\nWhile Linear Regression is suited for estimating continuous values (e.g. estimating house price), it isn’t the best tool for predicting the class of an observed data point. In order to estimate a classification, we need some sort of guidance on what would be the **most probable class** for that data point. For this, we use **Logistic Regression**.\n\n<div class=\"alert alert-success alertsuccess\" style=\"margin-top: 20px\">\n<font size = 3><strong>Recall linear regression:</strong></font>\n<br>\n<br>\nLinear regression finds a function that relates a continuous dependent variable, _y_, to some predictors (independent variables _x1_, _x2_, etc.). Simple linear regression assumes a function of the form:\n<br><br>\n$$\ny = w0 + w1 * x1 + w2 * x2 +...\n$$\n<br>\nand finds the values of _w0_, _w1_, _w2_, etc. The term _w0_ is the \"intercept\" or \"constant term\" (it's shown as _b_ in the formula below):\n<br><br>\n$$\nY = WX + b\n$$\n<p></p>\n\n</div>\n\nLogistic Regression is a variation of Linear Regression, useful when the observed dependent variable, _y_, is categorical. It produces a formula that predicts the probability of the class label as a function of the independent variables.\n\nDespite the name logistic _regression_, it is actually a __probabilistic classification__ model. Logistic regression fits a special s-shaped curve by taking the linear regression and transforming the numeric estimate into a probability with the following function:\n\n$$\nProbabilityOfaClass = \\theta(y) = \\frac{e^y}{1+e^y} = exp(y) / (1+exp(y)) = p \n$$\n\nwhich produces p-values between 0 (as y approaches minus infinity) and 1 (as y approaches plus infinity). This now becomes a special kind of non-linear regression.\n\nIn this equation, _y_ is the regression result (the sum of the variables weighted by the coefficients), `exp` is the exponential function and $\\theta(y)$ is the [logistic function](http://en.wikipedia.org/wiki/Logistic_function), also called logistic curve. It is a common \"S\" shape (sigmoid curve), and was first developed for modelling population growth.\n\nYou might also have seen this function before, in another configuration:\n\n$$\nProbabilityOfaClass = \\theta(y) = \\frac{1}{1+e^{-x}}\n$$\n\nSo, briefly, Logistic Regression passes the input through the logistic/sigmoid but then treats the result as a probability:\n\n<img\nsrc=\"https://ibm.box.com/shared/static/kgv9alcghmjcv97op4d6onkyxevk23b1.png\", width = \"400\", align = \"center\">\n"
},
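{
"metadata": {},
"cell_type": "markdown",
"source": "As a quick sanity check (an illustrative aside, not part of the original lesson), we can evaluate the logistic function at a few values of _y_ in plain numpy and confirm that both forms above agree and always produce values between 0 and 1:"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\ndef logistic(y):\n    # theta(y) = 1 / (1 + e^(-y)), algebraically identical to e^y / (1 + e^y)\n    return 1.0 / (1.0 + np.exp(-y))\n\nfor y in [-10.0, -1.0, 0.0, 1.0, 10.0]:\n    print(y, logistic(y), np.exp(y) / (1.0 + np.exp(y)))\n# The outputs approach 0 as y decreases, equal 0.5 at y = 0, and approach 1 as y grows",
"execution_count": null,
"outputs": []
},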
{
"metadata": {},
"cell_type": "markdown",
"source": "-------------------------------"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "<a id=\"ref2\"></a>\n# Utilizing Logistic Regression in TensorFlow\n\nFor us to utilize Logistic Regression in TensorFlow, we first need to import whatever libraries we are going to use. To do so, you can run the code cell below."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "import tensorflow as tf\nimport pandas as pd\nimport numpy as np\nimport time\nfrom sklearn.datasets import load_iris\nfrom sklearn.cross_validation import train_test_split\nimport matplotlib.pyplot as plt",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Next, we will load the dataset we are going to use. In this case, we are utilizing the `iris` dataset, which is inbuilt -- so there's no need to do any preprocessing and we can jump right into manipulating it. We separate the dataset into _xs_ and _ys_, and then into training _xs_ and _ys_ and testing _xs_ and _ys_, (pseudo-)randomly."
},
{
"metadata": {
"collapsed": true,
"trusted": true
},
"cell_type": "code",
"source": "iris = load_iris()\niris_X, iris_y = iris.data[:-1,:], iris.target[:-1]\niris_y= pd.get_dummies(iris_y).values\ntrainX, testX, trainY, testY = train_test_split(iris_X, iris_y, test_size=0.33, random_state=42)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Now we define x and y. These placeholders will hold our iris data (both the features and label matrices), and help pass them along to different parts of the algorithm. You can consider placeholders as empty shells into which we insert our data. We also need to give them shapes which correspond to the shape of our data. Later, we will insert data into these placeholders by “feeding” the placeholders the data via a “feed_dict” (Feed Dictionary).\n\n### Why use Placeholders? \n1) This feature of TensorFlow allows us to create an algorithm which accepts data and knows something about the shape of the data without knowing the amount of data going in. <br><br>\n2) When we insert “batches” of data in training, we can easily adjust how many examples we train on in a single step without changing the entire algorithm."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# numFeatures is the number of features in our input data.\n# In the iris dataset, this number is '4'.\nnumFeatures = trainX.shape[1]\n\n# numLabels is the number of classes our data points can be in.\n# In the iris dataset, this number is '3'.\nnumLabels = trainY.shape[1]\n\n\n# Placeholders\n# 'None' means TensorFlow shouldn't expect a fixed number in that dimension\nX = tf.placeholder(tf.float32, [None, numFeatures]) # Iris has 4 features, so X is a tensor to hold our data.\nyGold = tf.placeholder(tf.float32, [None, numLabels]) # This will be our correct answers matrix for 3 classes.",
"execution_count": 2,
"outputs": [
{
"ename": "NameError",
"evalue": "name 'trainX' is not defined",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-2-99af16bf15ae>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# numFeatures is the number of features in our input data.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m# In the iris dataset, this number is '4'.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0mnumFeatures\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtrainX\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 4\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[1;31m# numLabels is the number of classes our data points can be in.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mNameError\u001b[0m: name 'trainX' is not defined"
],
"output_type": "error"
}
]
},
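{
"metadata": {},
"cell_type": "markdown",
"source": "To make the “feeding” idea concrete, here is a tiny standalone sketch (an illustrative aside, not part of the original lesson). The `demo` placeholder below is purely illustrative and independent of `X` and `yGold`:"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# A toy placeholder with an unspecified number of rows and 2 columns\ndemo = tf.placeholder(tf.float32, [None, 2])\ndoubled = demo * 2\n\n# The feed_dict maps placeholders to actual data at run time;\n# here we feed 2 rows, but any number of rows would work\nwith tf.Session() as demo_sess:\n    print(demo_sess.run(doubled, feed_dict={demo: [[1.0, 2.0], [3.0, 4.0]]}))",
"execution_count": null,
"outputs": []
},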
{
"metadata": {},
"cell_type": "markdown",
"source": "### Set model weights and bias\n\nMuch like Linear Regression, we need a shared variable weight matrix for Logistic Regression. We initialize both `W` and `b` as tensors full of zeros. Since we are going to learn `W` and `b`, their initial value doesn't matter too much. These variables are the objects which define the structure of our regression model, and we can save them after they’ve been trained so we can reuse them later.\n\nWe define two TensorFlow variables as our parameters. These variables will hold the weights and biases of our logistic regression and they will be continually updated during training. \n\nNotice that `W` has a shape of [4, 3] because we want to multiply the 4-dimensional input vectors by it to produce 3-dimensional vectors of evidence for the difference classes. `b` has a shape of [3] so we can add it to the output. Moreover, unlike our placeholders above which are essentially empty shells waiting to be fed data, TensorFlow variables need to be initialized with values, e.g. with zeros."
},
{
"metadata": {
"collapsed": true,
"trusted": true
},
"cell_type": "code",
"source": "W = tf.Variable(tf.zeros([4, 3])) # 4-dimensional input and 3 classes\nb = tf.Variable(tf.zeros([3])) # 3-dimensional output [0,0,1],[0,1,0],[1,0,0]",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"collapsed": true,
"trusted": true
},
"cell_type": "code",
"source": "#Randomly sample from a normal distribution with standard deviation .01\n\nweights = tf.Variable(tf.random_normal([numFeatures,numLabels],\n mean=0,\n stddev=0.01,\n name=\"weights\"))\n\nbias = tf.Variable(tf.random_normal([1,numLabels],\n mean=0,\n stddev=0.01,\n name=\"bias\"))",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Logistic Regression model\n\nWe now define our operations in order to properly run the Logistic Regression. Logistic regression is typically thought of as a single equation:\n\n$$\nŷ =sigmoid(WX+b)\n$$\n\nHowever, for the sake of clarity, we can have it broken into its three main components: \n- a weight times features matrix multiplication operation, \n- a summation of the weighted features and a bias term, \n- and finally the application of a sigmoid function. \n\nAs such, you will find these components defined as three separate operations below.\n"
},
{
"metadata": {
"collapsed": true,
"trusted": true
},
"cell_type": "code",
"source": "# Three-component breakdown of the Logistic Regression equation.\n# Note that these feed into each other.\napply_weights_OP = tf.matmul(X, weights, name=\"apply_weights\")\nadd_bias_OP = tf.add(apply_weights_OP, bias, name=\"add_bias\") \nactivation_OP = tf.nn.sigmoid(add_bias_OP, name=\"activation\")",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "As we have seen before, the function we are going to use is the _logistic function_ $(\\frac{1}{1+e^{-x}})$, which is fed the input data after applying weights and bias. In TensorFlow, this function is implemented as the `nn.sigmoid` function. Effectively, this fits the weighted input with bias into a 0-100 percent curve, which is the probability function we want."
},
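{
"metadata": {},
"cell_type": "markdown",
"source": "As a quick standalone check (an illustrative aside, using constants only, so no variable initialization is required yet), we can confirm that `tf.nn.sigmoid` maps any real input into the interval (0, 1):"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# tf.nn.sigmoid on a few sample values: large negative inputs map near 0,\n# zero maps to exactly 0.5, and large positive inputs map near 1\nwith tf.Session() as tmp_sess:\n    print(tmp_sess.run(tf.nn.sigmoid(tf.constant([-5.0, 0.0, 5.0]))))",
"execution_count": null,
"outputs": []
},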
{
"metadata": {},
"cell_type": "markdown",
"source": "-------------------------------------"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "<a id=\"ref3\"></a>\n# Training\n\nThe learning algorithm is how we search for the best weight vector (${\\bf w}$). This search is an optimization problem looking for the hypothesis that optimizes an error/cost measure.\n\n__What tell us our model is bad?__ \nThe Cost or Loss of the model, so what we want is to minimize that. \n\n__What is the cost function in our model?__ \nThe cost function we are going to utilize is the Squared Mean Error loss function.\n\n__How to minimize the cost function?__ \nWe can't use __least-squares linear regression__ here, so we will use [gradient descent](http://en.wikipedia.org/wiki/Gradient_descent) instead. Specifically, we will use batch gradient descent which calculates the gradient from all data points in the data set.\n\n### Cost function\nBefore defining our cost function, we need to define how long we are going to train and how should we define the learning rate."
},
{
"metadata": {
"collapsed": true,
"trusted": true
},
"cell_type": "code",
"source": "# Number of Epochs in our training\nnumEpochs = 700\n\n# Defining our learning rate iterations (decay)\nlearningRate = tf.train.exponential_decay(learning_rate=0.0008,\n global_step= 1,\n decay_steps=trainX.shape[0],\n decay_rate= 0.95,\n staircase=True)",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"collapsed": true,
"trusted": true
},
"cell_type": "code",
"source": "#Defining our cost function - Squared Mean Error\ncost_OP = tf.nn.l2_loss(activation_OP-yGold, name=\"squared_error_cost\")\n\n#Defining our Gradient Descent\ntraining_OP = tf.train.GradientDescentOptimizer(learningRate).minimize(cost_OP)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Now we move on to actually running our operations. We will start with the operations involved in the prediction phase (i.e. the logistic regression itself).\n\nFirst, we need to initialize our weights and biases with zeros or random values via the inbuilt Initialization Op, __tf.initialize_all_variables()__. This Initialization Op will become a node in our computational graph, and when we put the graph into a session, then the Op will run and create the variables."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# Create a tensorflow session\nsess = tf.Session()\n\n# Initialize our weights and biases variables.\ninit_OP = tf.global_variables_initializer()\n\n# Initialize all tensorflow variables\nsess.run(init_OP)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "We also want some additional operations to keep track of our model's efficiency over time. We can do this like so:"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# argmax(activation_OP, 1) returns the label with the most probability\n# argmax(yGold, 1) is the correct label\ncorrect_predictions_OP = tf.equal(tf.argmax(activation_OP,1),tf.argmax(yGold,1))\n\n# If every false prediction is 0 and every true prediction is 1, the average returns us the accuracy\naccuracy_OP = tf.reduce_mean(tf.cast(correct_predictions_OP, \"float\"))\n\n# Summary op for regression output\nactivation_summary_OP = tf.summary.histogram(\"output\", activation_OP)\n\n# Summary op for accuracy\naccuracy_summary_OP = tf.summary.scalar(\"accuracy\", accuracy_OP)\n\n# Summary op for cost\ncost_summary_OP = tf.summary.scalar(\"cost\", cost_OP)\n\n# Summary ops to check how variables (W, b) are updating after each iteration\nweightSummary = tf.summary.histogram(\"weights\", weights.eval(session=sess))\nbiasSummary = tf.summary.histogram(\"biases\", bias.eval(session=sess))\n\n# Merge all summaries\nmerged = tf.summary.merge([activation_summary_OP, accuracy_summary_OP, cost_summary_OP, weightSummary, biasSummary])\n\n# Summary writer\nwriter = tf.summary.FileWriter(\"summary_logs\", sess.graph)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Now we can define and run the actual training loop, like this:"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# Initialize reporting variables\ncost = 0\ndiff = 1\nepoch_values = []\naccuracy_values = []\ncost_values = []\n\n# Training epochs\nfor i in range(numEpochs):\n if i > 1 and diff < .0001:\n print(\"change in cost %g; convergence.\"%diff)\n break\n else:\n # Run training step\n step = sess.run(training_OP, feed_dict={X: trainX, yGold: trainY})\n # Report occasional stats\n if i % 10 == 0:\n # Add epoch to epoch_values\n epoch_values.append(i)\n # Generate accuracy stats on test data\n train_accuracy, newCost = sess.run([accuracy_OP, cost_OP], feed_dict={X: trainX, yGold: trainY})\n # Add accuracy to live graphing variable\n accuracy_values.append(train_accuracy)\n # Add cost to live graphing variable\n cost_values.append(newCost)\n # Re-assign values for variables\n diff = abs(newCost - cost)\n cost = newCost\n\n #generate print statements\n print(\"step %d, training accuracy %g, cost %g, change in cost %g\"%(i, train_accuracy, newCost, diff))\n\n\n# How well do we perform on held-out test data?\nprint(\"final accuracy on test set: %s\" %str(sess.run(accuracy_OP, \n feed_dict={X: testX, \n yGold: testY})))",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "__Why don't we plot the cost to see how it behaves?__"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "%matplotlib inline\nimport numpy as np\nimport matplotlib.pyplot as plt\nplt.plot([np.mean(cost_values[i-50:i]) for i in range(len(cost_values))])\nplt.show()",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"collapsed": true
},
"cell_type": "markdown",
"source": "Assuming no parameters were changed, you should reach a peak accuracy of 90% at the end of training, which is commendable. Try changing the parameters such as the length of training, and maybe some operations to see how the model behaves. Does it take much longer? How is the performance?"
},
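{
"metadata": {},
"cell_type": "markdown",
"source": "For example (an optional sketch, assuming the training loop above has been run), you can plot the recorded training accuracy against the epochs at which it was sampled, to see how quickly the model converges:"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# epoch_values and accuracy_values were collected every 10 epochs in the training loop\nplt.plot(epoch_values, accuracy_values)\nplt.xlabel(\"epoch\")\nplt.ylabel(\"training accuracy\")\nplt.show()",
"execution_count": null,
"outputs": []
},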
{
"metadata": {},
"cell_type": "markdown",
"source": "------------------------------------"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Thanks for completing this lesson!\n\nThis is the end of **Logistic Regression with TensorFlow** notebook. Hopefully, now you have a deeper understanding of Logistic Regression and how its structure and flow work. Thank you for reading this notebook and good luck on your studies."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Created by: <a href = \"https://br.linkedin.com/in/walter-gomes-de-amorim-junior-624726121\">Walter Gomes de Amorim Junior</a> , <a href = \"https://br.linkedin.com/in/walter-gomes-de-amorim-junior-624726121\">Saeed Aghabozorgi</a> , <a href = \"https://br.linkedin.com/in/victor-barros-2446a390\">Victor Barros Costa</a>\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "<hr>\nCopyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
}
],
"metadata": {
"kernelspec": {
"name": "python2",
"display_name": "Python 2",
"language": "python"
},
"widgets": {
"state": {},
"version": "1.1.2"
},
"language_info": {
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"name": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11",
"file_extension": ".py",
"codemirror_mode": {
"version": 2,
"name": "ipython"
}
},
"gist": {
"id": "",
"data": {
"description": "Logistic Regression with TensorFlow.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}