{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 94-775/95-865: Image Analysis with Convolutional Neural Nets (CNN's, also called convnets)\n",
"\n",
"Author: George H. Chen (georgechen [at symbol] cmu.edu)\n",
"\n",
"This demo draws heavily from the handwritten digit example in Chapter 2 of Francois Chollet's \"Deep Learning with Python\" book. I've added a simpler single-layer example first before moving to the 2-layer example. I then proceed to two CNN examples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start with loading in the data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"np.set_printoptions(precision=5, suppress=True)\n",
"\n",
"from tensorflow.python import keras\n",
"from keras.datasets import mnist\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense\n",
"\n",
"(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n",
"\n",
"flattened_train_images = train_images.reshape(len(train_images), -1) # flattens out each training image\n",
"flattened_train_images = flattened_train_images.astype(np.float32) / 255 # rescale to be between 0 and 1\n",
"flattened_test_images = test_images.reshape(len(test_images), -1) # flattens out each test image\n",
"flattened_test_images = flattened_test_images.astype(np.float32) / 255 # rescale to be between 0 and 1\n",
"\n",
"from keras.utils import to_categorical\n",
"train_labels_categorical = to_categorical(train_labels)\n",
"test_labels_categorical = to_categorical(test_labels)"
]
},
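{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the data, we can plot the first training image along with its label (this uses the matplotlib import from above; each MNIST image is 28-by-28 grayscale pixels)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# display the first training image along with its label\n",
"plt.imshow(train_images[0], cmap='gray')\n",
"plt.title('Label: %d' % train_labels[0])\n",
"plt.show()"
]
},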
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_labels[0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_labels_categorical[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-layer neural net\n",
"\n",
"Making a neural net with a single Dense layer (also called a fully-connected layer) is fairly simple. Note that we need to specify the input shape for the initial layer.\n",
"\n",
"Make sure you understand where the number of parameters comes from! To do this, count how many numbers are in the weight matrix and the bias vector."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential_1\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"dense_1 (Dense) (None, 10) 7850 \n",
"=================================================================\n",
"Total params: 7,850\n",
"Trainable params: 7,850\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"single_layer_model = Sequential() # this is Keras's way of specifying a model that is a single sequence of layers\n",
"single_layer_model.add(Dense(10, activation='softmax', input_shape=(784,)))\n",
"single_layer_model.summary()"
]
},
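{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify the parameter count by hand: the weight matrix has one entry per (input, output) pair, and the bias vector has one entry per output, which should reproduce the 7,850 reported by `summary()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# weight matrix (784 inputs x 10 outputs) plus bias vector (10 entries)\n",
"print(784 * 10 + 10)  # should match the 7,850 from summary()"
]
},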
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"single_layer_model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 48000 samples, validate on 12000 samples\n",
"Epoch 1/5\n",
"48000/48000 [==============================] - 2s 37us/step - loss: 0.7575 - accuracy: 0.8126 - val_loss: 0.4124 - val_accuracy: 0.8958\n",
"Epoch 2/5\n",
"48000/48000 [==============================] - 2s 32us/step - loss: 0.3928 - accuracy: 0.8945 - val_loss: 0.3359 - val_accuracy: 0.9092\n",
"Epoch 3/5\n",
"48000/48000 [==============================] - 2s 34us/step - loss: 0.3399 - accuracy: 0.9071 - val_loss: 0.3090 - val_accuracy: 0.9125\n",
"Epoch 4/5\n",
"48000/48000 [==============================] - 1s 31us/step - loss: 0.3158 - accuracy: 0.9124 - val_loss: 0.2944 - val_accuracy: 0.9183\n",
"Epoch 5/5\n",
"48000/48000 [==============================] - 2s 32us/step - loss: 0.3010 - accuracy: 0.9161 - val_loss: 0.2855 - val_accuracy: 0.9225\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.callbacks.History at 0x6461a0e10>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"single_layer_model.fit(flattened_train_images,\n",
" train_labels_categorical,\n",
" validation_split=0.2,\n",
" epochs=5,\n",
" batch_size=128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Two-layer neural net\n",
"\n",
"Going from 1 Dense layer to 2 Dense layers is straightforward. Importantly, we only need to specify the input shape for the first layer added; the input shape is automatically determined by Keras for the second layer.\n",
"\n",
"Once again, make sure you know where the number of parameters come from."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential_2\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"dense_2 (Dense) (None, 512) 401920 \n",
"_________________________________________________________________\n",
"dense_3 (Dense) (None, 10) 5130 \n",
"=================================================================\n",
"Total params: 407,050\n",
"Trainable params: 407,050\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"two_layer_model = Sequential() # this is Keras's way of specifying a model that is a single sequence of layers\n",
"two_layer_model.add(Dense(512, activation='relu', input_shape=(784,)))\n",
"two_layer_model.add(Dense(10, activation='softmax'))\n",
"two_layer_model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
"two_layer_model.summary()"
]
},
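{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same counting argument works here, one Dense layer at a time:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# first Dense layer: 784 inputs -> 512 outputs, plus 512 biases\n",
"print(784 * 512 + 512)  # should match 401,920\n",
"# second Dense layer: 512 inputs -> 10 outputs, plus 10 biases\n",
"print(512 * 10 + 10)  # should match 5,130"
]
},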
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 48000 samples, validate on 12000 samples\n",
"Epoch 1/5\n",
"48000/48000 [==============================] - 6s 117us/step - loss: 0.2942 - accuracy: 0.9171 - val_loss: 0.1531 - val_accuracy: 0.9554\n",
"Epoch 2/5\n",
"48000/48000 [==============================] - 5s 110us/step - loss: 0.1200 - accuracy: 0.9654 - val_loss: 0.1218 - val_accuracy: 0.9639\n",
"Epoch 3/5\n",
"48000/48000 [==============================] - 5s 114us/step - loss: 0.0794 - accuracy: 0.9772 - val_loss: 0.0933 - val_accuracy: 0.9711\n",
"Epoch 4/5\n",
"48000/48000 [==============================] - 5s 99us/step - loss: 0.0573 - accuracy: 0.9830 - val_loss: 0.0854 - val_accuracy: 0.9747\n",
"Epoch 5/5\n",
"48000/48000 [==============================] - 4s 88us/step - loss: 0.0414 - accuracy: 0.9886 - val_loss: 0.0811 - val_accuracy: 0.9750\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.callbacks.History at 0x646958f90>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"two_layer_model.fit(flattened_train_images,\n",
" train_labels_categorical,\n",
" validation_split=0.2,\n",
" epochs=5,\n",
" batch_size=128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A simple CNN\n",
"\n",
"To work with CNN's, we do not need to flatten the images. However, we still reshape them to include depth information (since we're looking at grayscale images, the depth is just 1)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"from keras.layers import Conv2D, MaxPooling2D, Flatten"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# reshape images to have an additional dimension for color (even though there's no color)\n",
"scaled_train_images = train_images.reshape(len(train_images), train_images.shape[1], train_images.shape[2], -1)\n",
"scaled_test_images = test_images.reshape(len(test_images), test_images.shape[1], test_images.shape[2], -1)\n",
"\n",
"# rescale to be between 0 and 1\n",
"scaled_train_images = scaled_train_images.astype(np.float32) / 255\n",
"scaled_test_images = scaled_test_images.astype(np.float32) / 255"
]
},
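{
"cell_type": "markdown",
"metadata": {},
"source": [
"An equivalent way to add the trailing depth dimension is with `np.newaxis` (a minimal sketch; the reshape above achieves the same thing):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# indexing with np.newaxis appends a depth dimension of size 1\n",
"alt_train_images = train_images[:, :, :, np.newaxis].astype(np.float32) / 255\n",
"print(np.array_equal(alt_train_images, scaled_train_images))  # should print True"
]
},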
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(60000, 28, 28, 1)\n"
]
}
],
"source": [
"print(scaled_train_images.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now create a simple CNN.\n",
"\n",
"Make sure you understand why the output shape after each layer is what it is, and where the number of parameters come from for the convolutional and dense layers."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential_3\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"conv2d_1 (Conv2D) (None, 26, 26, 32) 320 \n",
"_________________________________________________________________\n",
"max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0 \n",
"_________________________________________________________________\n",
"flatten_1 (Flatten) (None, 5408) 0 \n",
"_________________________________________________________________\n",
"dense_4 (Dense) (None, 10) 54090 \n",
"=================================================================\n",
"Total params: 54,410\n",
"Trainable params: 54,410\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"simple_convnet_model = Sequential()\n",
"simple_convnet_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))\n",
"simple_convnet_model.add(MaxPooling2D((2, 2)))\n",
"simple_convnet_model.add(Flatten())\n",
"simple_convnet_model.add(Dense(10, activation='softmax'))\n",
"simple_convnet_model.summary()\n",
"\n",
"simple_convnet_model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])"
]
},
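{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see where these numbers come from: a 3-by-3 filter applied to a 28-by-28 image produces a 26-by-26 output (28 - 3 + 1 = 26), which 2-by-2 max pooling halves to 13-by-13. The parameter counts then follow from the same weight-plus-bias counting as before:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# conv layer: 32 filters, each 3x3 (depth 1 for grayscale), plus 32 biases\n",
"print(3 * 3 * 1 * 32 + 32)  # should match 320\n",
"# flattened length of the 13x13x32 pooled output\n",
"print(13 * 13 * 32)  # should match 5408\n",
"# dense layer: 5408 inputs -> 10 outputs, plus 10 biases\n",
"print(5408 * 10 + 10)  # should match 54,090"
]
},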
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 48000 samples, validate on 12000 samples\n",
"Epoch 1/5\n",
"48000/48000 [==============================] - 11s 230us/step - loss: 0.3890 - accuracy: 0.8942 - val_loss: 0.1728 - val_accuracy: 0.9517\n",
"Epoch 2/5\n",
"48000/48000 [==============================] - 10s 208us/step - loss: 0.1379 - accuracy: 0.9610 - val_loss: 0.1169 - val_accuracy: 0.9668\n",
"Epoch 3/5\n",
"48000/48000 [==============================] - 10s 199us/step - loss: 0.0932 - accuracy: 0.9734 - val_loss: 0.0867 - val_accuracy: 0.9769\n",
"Epoch 4/5\n",
"48000/48000 [==============================] - 9s 198us/step - loss: 0.0733 - accuracy: 0.9790 - val_loss: 0.0742 - val_accuracy: 0.9806\n",
"Epoch 5/5\n",
"48000/48000 [==============================] - 9s 197us/step - loss: 0.0621 - accuracy: 0.9819 - val_loss: 0.0698 - val_accuracy: 0.9806\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.callbacks.History at 0x6475f0ad0>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"simple_convnet_model.fit(scaled_train_images,\n",
" train_labels_categorical,\n",
" validation_split=0.2,\n",
" epochs=5,\n",
" batch_size=128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A deeper CNN\n",
"\n",
"We next create a deeper CNN. Note that despite this CNN being deeper, it has _fewer_ parameters!\n",
"\n",
"For the second convolutional layer: remember that we treat the input as a 13-by-13 image that has a depth of 32 (i.e., you can think of this as a stack of 32 images each of size 13-by-13 pixels). Keras will automatically make the filter size to be 3-by-3-**by-32** in this case. Make sure you understand how this leads to the total parameter count of 9248."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential_4\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"conv2d_2 (Conv2D) (None, 26, 26, 32) 320 \n",
"_________________________________________________________________\n",
"max_pooling2d_2 (MaxPooling2 (None, 13, 13, 32) 0 \n",
"_________________________________________________________________\n",
"conv2d_3 (Conv2D) (None, 11, 11, 32) 9248 \n",
"_________________________________________________________________\n",
"max_pooling2d_3 (MaxPooling2 (None, 5, 5, 32) 0 \n",
"_________________________________________________________________\n",
"flatten_2 (Flatten) (None, 800) 0 \n",
"_________________________________________________________________\n",
"dense_5 (Dense) (None, 10) 8010 \n",
"=================================================================\n",
"Total params: 17,578\n",
"Trainable params: 17,578\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"deeper_convnet_model = Sequential()\n",
"deeper_convnet_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))\n",
"deeper_convnet_model.add(MaxPooling2D((2, 2)))\n",
"deeper_convnet_model.add(Conv2D(32, (3, 3), activation='relu'))\n",
"deeper_convnet_model.add(MaxPooling2D((2, 2)))\n",
"deeper_convnet_model.add(Flatten())\n",
"deeper_convnet_model.add(Dense(10, activation='softmax'))\n",
"deeper_convnet_model.summary()\n",
"\n",
"deeper_convnet_model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])"
]
},
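{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concretely, each of the 32 filters in the second convolutional layer spans 3-by-3 spatially and all 32 input channels, plus one bias per filter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# second conv layer: 32 filters, each 3x3x32, plus 32 biases\n",
"print(3 * 3 * 32 * 32 + 32)  # should match 9,248"
]
},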
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 48000 samples, validate on 12000 samples\n",
"Epoch 1/5\n",
"48000/48000 [==============================] - 16s 334us/step - loss: 0.3896 - accuracy: 0.8871 - val_loss: 0.1244 - val_accuracy: 0.9651\n",
"Epoch 2/5\n",
"48000/48000 [==============================] - 16s 329us/step - loss: 0.1048 - accuracy: 0.9686 - val_loss: 0.0835 - val_accuracy: 0.9753\n",
"Epoch 3/5\n",
"48000/48000 [==============================] - 16s 343us/step - loss: 0.0767 - accuracy: 0.9758 - val_loss: 0.0661 - val_accuracy: 0.9807\n",
"Epoch 4/5\n",
"48000/48000 [==============================] - 16s 329us/step - loss: 0.0622 - accuracy: 0.9806 - val_loss: 0.0620 - val_accuracy: 0.9804\n",
"Epoch 5/5\n",
"48000/48000 [==============================] - 16s 329us/step - loss: 0.0520 - accuracy: 0.9840 - val_loss: 0.0598 - val_accuracy: 0.9810\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.callbacks.History at 0x64750b890>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"deeper_convnet_model.fit(scaled_train_images,\n",
" train_labels_categorical,\n",
" validation_split=0.2,\n",
" epochs=5,\n",
" batch_size=128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finally evaluate on test data\n",
"\n",
"Finally, we look at the test set accuracy. Here, note that the deeper CNN has the best test set accuracy despite having fewer parameters than the two-Dense-layer neural net and the simpler CNN! In machine learning, between models that have the same accuracy, often we favor one with fewer parameters."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10000/10000 [==============================] - 0s 23us/step\n",
"Test accuracy: 0.9208999872207642\n"
]
}
],
"source": [
"test_loss, test_acc = single_layer_model.evaluate(flattened_test_images, test_labels_categorical)\n",
"print('Test accuracy:', test_acc)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10000/10000 [==============================] - 0s 46us/step\n",
"Test accuracy: 0.9765999913215637\n"
]
}
],
"source": [
"test_loss, test_acc = two_layer_model.evaluate(flattened_test_images, test_labels_categorical)\n",
"print('Test accuracy:', test_acc)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10000/10000 [==============================] - 1s 90us/step\n",
"Test accuracy: 0.9811999797821045\n"
]
}
],
"source": [
"test_loss, test_acc = simple_convnet_model.evaluate(scaled_test_images, test_labels_categorical)\n",
"print('Test accuracy:', test_acc)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10000/10000 [==============================] - 1s 122us/step\n",
"Test accuracy: 0.984000027179718\n"
]
}
],
"source": [
"test_loss, test_acc = deeper_convnet_model.evaluate(scaled_test_images, test_labels_categorical)\n",
"print('Test accuracy:', test_acc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get the actual predicted labels for any of these models, we can use the `predict_classes` function; we can check that the raw accuracy agrees with the accuracy found above via the `evaluate` function."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"predicted_labels = deeper_convnet_model.predict_classes(scaled_test_images)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([7, 2, 1, ..., 4, 5, 6])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predicted_labels"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.984"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.mean(predicted_labels == test_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the `predict` function produces the raw output of the neural net, which for each test image corresponds to the probabilities of the different digits 0, 1, ..., 9."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"predicted_outputs = deeper_convnet_model.predict(scaled_test_images)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(10000, 10)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predicted_outputs.shape"
]
},
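{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the output layer is a softmax, the predicted label for each test image is just the digit with the highest probability; taking the argmax of `predict`'s output should therefore recover the same labels as `predict_classes`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# the predicted label is the index of the largest probability\n",
"print(np.array_equal(predicted_outputs.argmax(axis=1), predicted_labels))  # should print True"
]
},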
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, we can see what the predicted class probabilities are for the 0-th test example:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0. , 0. , 0.00001, 0.00001, 0. , 0. , 0. ,\n",
" 0.99997, 0. , 0. ], dtype=float32)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predicted_outputs[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}