MNIST_Examples.ipynb
@Raukk · Created July 3, 2019 20:52
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "MNIST_Examples.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/Raukk/f26f9ad18add3468ddf2d01846b2d479/mnist_examples.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EfooSPNq0DPp",
"colab_type": "text"
},
"source": [
"#This is a very simple set of examples that use the MNIST Dataset\n",
"\n",
"Most of the examples compare the tradeoffs between Accuracy and Speed\n",
"\n",
"\n",
"\n",
"There are a lot of little bits of code taken from examples all across the internet, but in general, this is my code and is licensed as such:\n",
"\"This is free and unencumbered software released into the public domain.\"\n",
"\n",
"For details See: https://unlicense.org/\n",
"\n",
"This license only applies to the code in this specific file and not any libraries or other code, etc. (duh)\n",
"\n",
"Personal Note: I may not have updated all comments from earlier versions, and becasue random numbers, you may get diffrent results.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MQQK4I0i0Bzx",
"colab_type": "code",
"colab": {}
},
"source": [
"# Imports\n",
"\n",
"import time\n",
"\n",
"import numpy as np \n",
"\n",
"import tensorflow as tf\n",
"\n",
"import keras\n",
"\n",
"import matplotlib.pyplot as plt \n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VfWq-i5q0gsR",
"colab_type": "code",
"colab": {}
},
"source": [
"# Constants\n",
"\n",
"batch_size=128\n",
"\n",
"epochs = 10\n",
"\n",
"nb_classes = 10\n",
"\n",
"keras_verbosity = 1\n",
"\n",
"input_shape = (28, 28, 1)\n",
"\n",
"layers = tf.keras.layers\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "7zq9w8As0gwk",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 119
},
"outputId": "73fcb064-0230-4535-d693-5d11e93bdcbb"
},
"source": [
"# Get the MNIST Dataset\n",
"\n",
"# Load the Dataset, they provided a nice helper that does all the network and downloading for you\n",
"(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()\n",
"# This is an leterantive to the MNIST numbers dataset that is a computationlally harder problem\n",
"#(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"\n",
"# we need to make sure that the images are normalized and in the right format\n",
"X_train = X_train.astype('float32')\n",
"X_test = X_test.astype('float32')\n",
"X_train /= 255\n",
"X_test /= 255\n",
"\n",
"# expand the dimensions to get the shape to (samples, height, width, channels) where greyscale has 1 channel\n",
"X_train = np.expand_dims(X_train, axis=-1)\n",
"X_test = np.expand_dims(X_test, axis=-1)\n",
"\n",
"# one-hot encoding, this way, each digit has a probability output\n",
"Y_train = keras.utils.np_utils.to_categorical(Y_train, nb_classes)\n",
"Y_test = keras.utils.np_utils.to_categorical(Y_test, nb_classes)\n",
"\n",
"# log some basic details to be sure things loaded\n",
"print()\n",
"print('MNIST data loaded: train:',len(X_train),'test:',len(X_test))\n",
"print('X_train:', X_train.shape)\n",
"print('Y_train:', Y_train.shape)\n",
"print('X_test:', X_test.shape)\n",
"print('Y_test:', Y_test.shape)\n"
],
"execution_count": 33,
"outputs": [
{
"output_type": "stream",
"text": [
"\n",
"MNIST data loaded: train: 60000 test: 10000\n",
"X_train: (60000, 28, 28, 1)\n",
"Y_train: (60000, 10)\n",
"X_test: (10000, 28, 28, 1)\n",
"Y_test: (10000, 10)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D-SstfPYYQ3_",
"colab_type": "text"
},
"source": [
"Note: I'm primarilly going to talk about pixels when defining how things work, because it's easier to understand and works the same way regardless if it is running on a Greyscale pixel, an RGB pixel, or a 100 filter deep 'Pixel'\n",
"\n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"Also Note:\n",
"The majority of the trainable parmeters are in the Dense layer (500) but the Convolutions still have a huge number of computations because their math is multiplied by the number of input windows. They just reuse their Weights for every window.\n"
]
},
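{
"cell_type": "markdown",
"metadata": {
"id": "added_madds_note",
"colab_type": "text"
},
"source": [
"A rough sketch, added purely for illustration (it is not part of the original gist): the conv layers dominate the compute even though the Dense layer holds most of the weights, because a conv layer's multiply-adds are its per-window weights times the number of output windows, while a dense layer uses each weight only once per sample. The shapes below are taken from the baseline model summary further down.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "added_madds_code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Illustration only: rough multiply-add counts for the baseline model below,\n",
"# using the output shapes reported by model.summary().\n",
"\n",
"def conv_madds(out_h, out_w, kernel_h, kernel_w, in_ch, filters):\n",
"    # per-window cost, repeated for every output position\n",
"    return out_h * out_w * kernel_h * kernel_w * in_ch * filters\n",
"\n",
"def dense_madds(in_units, out_units):\n",
"    # each weight is used exactly once per sample\n",
"    return in_units * out_units\n",
"\n",
"print('conv_1 :', conv_madds(26, 26, 3, 3, 1, 20))   # 180 weights, ~122k multiply-adds\n",
"print('conv_2 :', conv_madds(11, 11, 3, 3, 20, 50))  # 9,000 weights, ~1.09M multiply-adds\n",
"print('dense_1:', dense_madds(1250, 500))            # 625,000 weights, 625k multiply-adds\n",
"print('dense_2:', dense_madds(500, 10))              # 5,000 weights, 5k multiply-adds\n"
],
"execution_count": 0,
"outputs": []
},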
{
"cell_type": "code",
"metadata": {
"id": "UydgG1j40gym",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 303
},
"outputId": "140624af-b5ed-4ebf-95fe-2caa223a66b5"
},
"source": [
"# Lets print of the first image just to be sure everything loaded\n",
"\n",
"print(np.argmax(Y_test[0]))\n",
"\n",
"plt.imshow(np.squeeze(X_test[0], axis=-1), cmap='gray', interpolation='none')\n"
],
"execution_count": 34,
"outputs": [
{
"output_type": "stream",
"text": [
"7\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7f8dee9e5668>"
]
},
"metadata": {
"tags": []
},
"execution_count": 34
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAADO5JREFUeJzt3V2IXfW5x/Hf76QpiOlFYjUMNpqe\nogerSKKjCMYS9VhyYiEWg9SLkkLJ9CJKCyVU7EVzWaQv1JvAlIbGkmMrpNUoYmNjMQ1qcSJqEmNi\nElIzMW9lhCaCtNGnF7Nsp3H2f+/st7XH5/uBYfZez3p52Mxv1lp77bX/jggByOe/6m4AQD0IP5AU\n4QeSIvxAUoQfSIrwA0kRfiApwg8kRfiBpD7Vz43Z5uOEQI9FhFuZr6M9v+1ltvfZPmD7gU7WBaC/\n3O5n+23PkrRf0h2SxiW9LOneiHijsAx7fqDH+rHnv1HSgYg4FBF/l/RrSSs6WB+APuok/JdKOjLl\n+Xg17T/YHrE9Znusg20B6LKev+EXEaOSRiUO+4FB0sme/6ikBVOef66aBmAG6CT8L0u6wvbnbX9a\n0tckbelOWwB6re3D/og4a/s+Sb+XNEvShojY07XOAPRU25f62toY5/xAz/XlQz4AZi7CDyRF+IGk\nCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIrwA0kRfiApwg8kRfiB\npAg/kBThB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkmp7iG5Jsn1Y0mlJH0g6GxHD3WgK\nQO91FP7KrRHx1y6sB0AfcdgPJNVp+EPSVts7bY90oyEA/dHpYf+SiDhq+xJJz9p+MyK2T52h+qfA\nPwZgwDgiurMie52kMxHxo8I83dkYgIYiwq3M1/Zhv+0LbX/mo8eSvixpd7vrA9BfnRz2z5f0O9sf\nref/I+KZrnQFoOe6dtjf0sY47Ad6rueH/QBmNsIPJEX4gaQIP5AU4QeSIvxAUt24qy+FlStXNqyt\nXr26uOw777xTrL///vvF+qZNm4r148ePN6wdOHCguCzyYs8PJEX4gaQIP5AU4QeSIvxAUoQfSIrw\nA0lxS2+LDh061LC2cOHC/jUyjdOnTzes7dmzp4+dDJbx8fGGtYceeqi47NjYWLfb6Rtu6QVQRPiB\npAg/kBThB5Ii/EBShB9IivADSXE/f4tK9+xfe+21xWX37t1brF911VXF+nXXXVesL126tGHtpptu\nKi575MiRYn3BggXFeifOnj1brJ86dapYHxoaanvbb7/9drE+k6/zt4o9P5AU4QeSIvxAUoQfSIrw\nA0kRfiApwg8k1fR+ftsbJH1F0smIuKaaNk/SbyQtlHRY0j0R8W7Tjc3g+/kH2dy5cxvWFi1aVFx2\n586dxfoNN9zQVk+taDZewf79+4v1Zp+fmDdvXsPamjVrisuuX7++WB9k3byf/5eSlp0z7QFJ2yLi\nCknbqucAZpCm4Y+I7ZImzpm8QtLG6vFGSXd1uS8APdbuOf/8iDhWPT4uaX6X+gHQJx1/tj8ionQu\nb3tE0kin2wHQXe3u+U/YHpKk6vfJRjNGxGhEDEfEcJvbAtAD7YZ/i6RV1eNVkp7oTjsA+qVp+G0/\nKulFSf9je9z2NyX9UNIdtt+S9L/VcwAzCN/bj4F19913F+uPPfZYsb579+6GtVtvvbW47MTEuRe4\nZg6+tx9AEeEHkiL8QFKEH0iK8ANJEX4gKS71oTaXXHJJsb5r166Oll+5cmXD2ubNm4vLzmRc6gNQ\nRPiBpAg/kBThB5Ii/EBShB9IivADSTFEN2rT7OuzL7744mL93XfL3xa/b9++8+4pE/b8QFKEH0iK\n8ANJEX4gKcIPJEX4gaQIP5AU9/Ojp26++eaGteeee6647OzZs4v1pUuXFuvbt28v1j+puJ8fQBHh\nB5Ii/EBShB9IivADSRF+ICnCDyTV9H5+2xskfUXSyYi4ppq2TtJqSaeq2R6MiKd71SRmruXLlzes\nNbuOv23btmL9xRdfbKsnTGplz/9LScummf7TiFhU/RB8YIZpGv6I2C5pog+9AOijTs7577P9uu0N\ntud2rSMAfdFu+NdL+oKkRZKOSfpxoxltj9gesz3W5rYA9EBb4Y+IExHxQUR8KOnnkm4szDsaEcMR\nMdxukwC6r63w2x6a8vSrknZ3px0A/dLKpb5HJS2V9Fnb45J+IGmp7UWSQtJhSd/qYY8AeoD7+dGR\nCy64oFjfsWNHw9rVV19dXPa2224r1l944YViPSvu5wdQRPiBpAg/kBThB5Ii/EBShB9IiiG60ZG1\na9cW64sXL25Ye+aZZ4rLcimvt9jzA0kRfiApwg8kRfiBpAg/kBThB5Ii/EBS3NKLojvvvLNYf/zx\nx4v19957r2Ft2bLpvhT631566aViHdPjll4ARYQfSIrwA0kRfiApwg8kRfiBpAg/kBT38yd30UUX\nFesPP/xwsT5r1qxi/emnGw/gzHX8erHnB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkmt7Pb3uBpEck\nzZcUkkYj4me250n6jaSFkg5Luici3m2yLu7n77Nm1+GbXWu//vrri/WDBw8W66V79psti/Z0837+\ns5K+GxFflHSTpDW2vyjpAUnbIuIKSduq5wBmiKbhj4hjEfFK9fi0pL2SLpW0QtLGaraNku7qVZMA\nuu+8zvltL5S0WNKfJc2PiGNV6bgmTwsAzBAtf7bf9hxJmyV9JyL+Zv/7tCIiotH5vO0RSSOdNgqg\nu1ra89uercngb4qI31aTT9gequpDkk5Ot2xEjEbEcEQMd6NhAN3RNPye3MX/QtLeiPjJlNIWSauq\nx6skPdH99gD0SiuX+pZI+pOkXZI+rCY/qMnz/sckXSbpL5q81DfRZF1c6uuzK6+8slh/8803O1r/\nihUrivUnn3yyo/Xj/LV6qa/pOX9E7JDUaGW3n09TAAYHn/ADkiL8QFKEH0iK8ANJEX4gKcIPJMVX\nd38CXH755Q1rW7du7Wjda9euLdafeuqpjtaP+rDnB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkuM7/\nCTAy0vhb0i677LKO1v38888X682+DwKDiz0/kBThB5Ii/EBShB9IivADSRF+ICnCDyTFdf4ZYMmS\nJcX6/fff36dO8EnCnh9IivADSRF+ICnCDyRF+IGkCD+QFOEHkmp6nd/2AkmPSJovKSSNRsTPbK+T\ntFrSqWrWByPi6V41mtktt9xSrM+ZM6ftdR88eLBYP3PmTNvrxmBr5UM+ZyV9NyJesf0ZSTttP1vV\nfhoRP+pdewB6pWn4I+KYpGPV49O290q6tNeNAeit8zrnt71Q0mJJf64m3Wf7ddsbbM9tsMyI7THb\nYx11CqCrWg6/7TmSNkv6TkT8TdJ6SV+
QtEiTRwY/nm65iBiNiOGIGO5CvwC6pKXw256tyeBviojf\nSlJEnIiIDyLiQ0k/l3Rj79oE0G1Nw2/bkn4haW9E/GTK9KEps31V0u7utwegV1p5t/9mSV+XtMv2\nq9W0ByXda3uRJi//HZb0rZ50iI689tprxfrtt99erE9MTHSzHQyQVt7t3yHJ05S4pg/MYHzCD0iK\n8ANJEX4gKcIPJEX4gaQIP5CU+znEsm3GcwZ6LCKmuzT/Mez5gaQIP5AU4QeSIvxAUoQfSIrwA0kR\nfiCpfg/R/VdJf5ny/LPVtEE0qL0Nal8SvbWrm71d3uqMff2Qz8c2bo8N6nf7DWpvg9qXRG/tqqs3\nDvuBpAg/kFTd4R+tefslg9rboPYl0Vu7aumt1nN+APWpe88PoCa1hN/2Mtv7bB+w/UAdPTRi+7Dt\nXbZfrXuIsWoYtJO2d0+ZNs/2s7bfqn5PO0xaTb2ts320eu1etb28pt4W2P6j7Tds77H97Wp6ra9d\noa9aXre+H/bbniVpv6Q7JI1LelnSvRHxRl8bacD2YUnDEVH7NWHbX5J0RtIjEXFNNe0hSRMR8cPq\nH+fciPjegPS2TtKZukdurgaUGZo6srSkuyR9QzW+doW+7lENr1sde/4bJR2IiEMR8XdJv5a0ooY+\nBl5EbJd07qgZKyRtrB5v1OQfT9816G0gRMSxiHilenxa0kcjS9f62hX6qkUd4b9U0pEpz8c1WEN+\nh6SttnfaHqm7mWnMr4ZNl6TjkubX2cw0mo7c3E/njCw9MK9dOyNedxtv+H3ckoi4TtL/SVpTHd4O\npJg8ZxukyzUtjdzcL9OMLP0vdb527Y543W11hP+opAVTnn+umjYQIuJo9fukpN9p8EYfPvHRIKnV\n75M19/MvgzRy83QjS2sAXrtBGvG6jvC/LOkK25+3/WlJX5O0pYY+Psb2hdUbMbJ9oaQva/BGH94i\naVX1eJWkJ2rs5T8MysjNjUaWVs2v3cCNeB0Rff+RtFyT7/gflPT9Onpo0Nd/S3qt+tlTd2+SHtXk\nYeA/NPneyDclXSRpm6S3JP1B0rwB6u1XknZJel2TQRuqqbclmjykf13Sq9XP8rpfu0JftbxufMIP\nSIo3/ICkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJPVP82g/p9/JjhUAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v-asv3FwAji9",
"colab_type": "text"
},
"source": [
"MNIST is uniformly structured data where the relative possition actually contains more information than the value itself.\n",
"\n",
"Images are the most common form of this type of data, but time series data is also a good example. \n",
"\n",
"For these items we use Convolutional Layers to capture that extra data as well as generalize the processing of it.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qRIvye019lOT",
"colab_type": "text"
},
"source": [
"One Major thing to consider for all Deep learning is \"Information Density\". \n",
"\n",
"As a rough Idea, it's the amount of \"Information\" in a sample divided by the amount of \"Data\" in the sample.\n",
"\n",
"I can't provide a perfect definition, but I can give a few examples.\n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
"First Example is MNIST, a 28 by 28 pixel grey scale image of a single digit.\n",
"\n",
"This means that the input is 784 data points each containing an 8 bit int value (0-255).\n",
"\n",
"But all that data, it only represents a single digit from 0 to 9. \n",
"\n",
"That's alot of input data for very little \"Information\" (I wrote down a '7'), so it's \"Density\" is low.\n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
"As a contrasting example:\n",
"\n",
"If we take just the dolar amount from my last 10 purchases, it would only be 1 Integer value each.\n",
"\n",
"But this could have all kinds of information in it, way more than 1 digit. (Most of the Information is indirect information)\n",
"\n",
"Therefore this example has a very high \"Density\". \n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
"As a general rule, if the Information Density is high, then you want to expand it as much as you can.\n",
"\n",
"This can be done by using a dense layer with many more hidden units than inputs, or a convolutional layer with a large number of filters.\n",
"\n",
"\n",
"Similarly, in reverse, if the data density is low, then you want to compress it as much as possible (while loosing as little information as possible)\n",
"\n",
"This is done in a lot of ways, but a convolution layer is most* common. There also exist a lot of non-NN ways to do it.\n",
"\n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
"Since we are using MNIST which is has a very low information density, we will be focusing on different ways to reduce the number of computations, and what their cost to accuracy is (which indicates the amount of information lost during compression).\n",
"\n",
"\n",
"I will only do one quick examle that focuses on a non-convolutional approach.\n",
"\n",
"Note: Since MNIST is so simple computationally, you normally see people only focusing on Accuracy, this will focus on computational Efficency' basically; Accuracy divided by Computations. \n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "-VEaFvzN0guf",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 425
},
"outputId": "f32c598f-ccbe-4e8d-8fa0-87aff98968e8"
},
"source": [
"# Build a standard baseline model (LeNet)\n",
"\n",
"model = tf.keras.models.Sequential()\n",
"\n",
"model.add(layers.Conv2D(20,\n",
" (3, 3),\n",
" input_shape=input_shape,\n",
" activation='relu',\n",
" name='conv_1'))\n",
"model.add(layers.MaxPool2D())\n",
"model.add(layers.Conv2D(50, (3, 3), activation='relu', name='conv_2'))\n",
"model.add(layers.MaxPool2D())\n",
"model.add(layers.Permute((2, 1, 3)))\n",
"model.add(layers.Flatten())\n",
"model.add(layers.Dense(500, activation='relu', name='dense_1'))\n",
"model.add(layers.Dense(10, activation='softmax', name='dense_2'))\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
"\n",
"model.summary()"
],
"execution_count": 35,
"outputs": [
{
"output_type": "stream",
"text": [
"Model: \"sequential_8\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"conv_1 (Conv2D) (None, 26, 26, 20) 200 \n",
"_________________________________________________________________\n",
"max_pooling2d_13 (MaxPooling (None, 13, 13, 20) 0 \n",
"_________________________________________________________________\n",
"conv_2 (Conv2D) (None, 11, 11, 50) 9050 \n",
"_________________________________________________________________\n",
"max_pooling2d_14 (MaxPooling (None, 5, 5, 50) 0 \n",
"_________________________________________________________________\n",
"permute_5 (Permute) (None, 5, 5, 50) 0 \n",
"_________________________________________________________________\n",
"flatten_6 (Flatten) (None, 1250) 0 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 500) 625500 \n",
"_________________________________________________________________\n",
"dense_2 (Dense) (None, 10) 5010 \n",
"=================================================================\n",
"Total params: 639,760\n",
"Trainable params: 639,760\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8_YtgigmF1sc",
"colab_type": "text"
},
"source": [
"This first model is a clasic, and I will use it as a baseline. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "nSc0xKn11Deq",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 459
},
"outputId": "377b496e-909a-49e3-80e8-dcccc6bed4b6"
},
"source": [
"# Run the training\n",
"\n",
"model.fit(\n",
" X_train,\n",
" Y_train,\n",
" epochs=epochs,\n",
" batch_size=batch_size,\n",
" verbose=keras_verbosity,\n",
" validation_data=(X_test, Y_test)\n",
" )\n",
"\n",
"\n",
"# Then Print the Training Results\n",
"score = model.evaluate(X_test, Y_test)\n",
"print('Test score:', score[0])\n",
"print('Test accuracy:', score[1])\n",
"\n",
"\n",
"# Test the total time to predict the whole Validation set\n",
"start_time = time.time()\n",
"model.predict(X_test)\n",
"print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
"\n",
"\n",
"# Print our 'Efficency' as the Accuracy / Total Time\n",
"print(score[1]/(time.time() - start_time))\n"
],
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"text": [
"Train on 60000 samples, validate on 10000 samples\n",
"Epoch 1/10\n",
"60000/60000 [==============================] - 43s 711us/sample - loss: 0.1845 - acc: 0.9466 - val_loss: 0.0668 - val_acc: 0.9784\n",
"Epoch 2/10\n",
"60000/60000 [==============================] - 43s 711us/sample - loss: 0.0517 - acc: 0.9838 - val_loss: 0.0357 - val_acc: 0.9880\n",
"Epoch 3/10\n",
"60000/60000 [==============================] - 43s 709us/sample - loss: 0.0338 - acc: 0.9890 - val_loss: 0.0277 - val_acc: 0.9900\n",
"Epoch 4/10\n",
"60000/60000 [==============================] - 43s 715us/sample - loss: 0.0248 - acc: 0.9920 - val_loss: 0.0271 - val_acc: 0.9913\n",
"Epoch 5/10\n",
"60000/60000 [==============================] - 43s 712us/sample - loss: 0.0182 - acc: 0.9942 - val_loss: 0.0258 - val_acc: 0.9916\n",
"Epoch 6/10\n",
"60000/60000 [==============================] - 43s 712us/sample - loss: 0.0152 - acc: 0.9954 - val_loss: 0.0289 - val_acc: 0.9908\n",
"Epoch 7/10\n",
"60000/60000 [==============================] - 43s 712us/sample - loss: 0.0106 - acc: 0.9969 - val_loss: 0.0294 - val_acc: 0.9915\n",
"Epoch 8/10\n",
"60000/60000 [==============================] - 43s 718us/sample - loss: 0.0092 - acc: 0.9970 - val_loss: 0.0351 - val_acc: 0.9901\n",
"Epoch 9/10\n",
"60000/60000 [==============================] - 43s 715us/sample - loss: 0.0079 - acc: 0.9975 - val_loss: 0.0341 - val_acc: 0.9904\n",
"Epoch 10/10\n",
"60000/60000 [==============================] - 43s 713us/sample - loss: 0.0068 - acc: 0.9975 - val_loss: 0.0374 - val_acc: 0.9898\n",
"10000/10000 [==============================] - 3s 314us/sample - loss: 0.0374 - acc: 0.9898\n",
"Test score: 0.037350132793631474\n",
"Test accuracy: 0.9898\n",
"--- 2.7315876483917236 seconds ---\n",
"0.3623257884613354\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nbBgz957REm_",
"colab_type": "text"
},
"source": [
"I'm not really going to talk about this much, it's mostly there to demonstrate the diffences in performance and archetecture."
]
},
{
"cell_type": "code",
"metadata": {
"id": "9GAQD2gC1DjK",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 255
},
"outputId": "7f74eca4-fbd1-479a-d38d-b3d8ebda77c0"
},
"source": [
"# Build a very small Dense net as an example\n",
"\n",
"model = tf.keras.models.Sequential()\n",
"\n",
"model.add(layers.Flatten(input_shape=input_shape))\n",
"model.add(layers.Dense(500, activation='relu', name='dense_1'))\n",
"model.add(layers.Dense(10, activation='softmax', name='dense_2'))\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
"\n",
"model.summary()"
],
"execution_count": 45,
"outputs": [
{
"output_type": "stream",
"text": [
"Model: \"sequential_13\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"flatten_11 (Flatten) (None, 784) 0 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 500) 392500 \n",
"_________________________________________________________________\n",
"dense_2 (Dense) (None, 10) 5010 \n",
"=================================================================\n",
"Total params: 397,510\n",
"Trainable params: 397,510\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "N8t5FTmj1DlN",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 459
},
"outputId": "96d16692-b5f0-4be5-f89c-3840081d83c2"
},
"source": [
"# Run the training\n",
"\n",
"model.fit(\n",
" X_train,\n",
" Y_train,\n",
" epochs=epochs,\n",
" batch_size=batch_size,\n",
" verbose=keras_verbosity,\n",
" validation_data=(X_test, Y_test)\n",
" )\n",
"\n",
"\n",
"# Then Print the Training Results\n",
"score = model.evaluate(X_test, Y_test)\n",
"print('Test score:', score[0])\n",
"print('Test accuracy:', score[1])\n",
"\n",
"\n",
"# Test the total time to predict the whole Validation set\n",
"start_time = time.time()\n",
"model.predict(X_test)\n",
"print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
"\n",
"\n",
"# Print our 'Efficency' as the Accuracy / Total Time\n",
"print(score[1]/(time.time() - start_time))\n"
],
"execution_count": 46,
"outputs": [
{
"output_type": "stream",
"text": [
"Train on 60000 samples, validate on 10000 samples\n",
"Epoch 1/10\n",
"60000/60000 [==============================] - 4s 74us/sample - loss: 0.2650 - acc: 0.9255 - val_loss: 0.1363 - val_acc: 0.9605\n",
"Epoch 2/10\n",
"60000/60000 [==============================] - 5s 79us/sample - loss: 0.1080 - acc: 0.9687 - val_loss: 0.1035 - val_acc: 0.9679\n",
"Epoch 3/10\n",
"60000/60000 [==============================] - 5s 79us/sample - loss: 0.0720 - acc: 0.9790 - val_loss: 0.0798 - val_acc: 0.9752\n",
"Epoch 4/10\n",
"60000/60000 [==============================] - 5s 78us/sample - loss: 0.0511 - acc: 0.9850 - val_loss: 0.0666 - val_acc: 0.9791\n",
"Epoch 5/10\n",
"60000/60000 [==============================] - 5s 79us/sample - loss: 0.0368 - acc: 0.9898 - val_loss: 0.0683 - val_acc: 0.9788\n",
"Epoch 6/10\n",
"60000/60000 [==============================] - 5s 77us/sample - loss: 0.0289 - acc: 0.9917 - val_loss: 0.0607 - val_acc: 0.9808\n",
"Epoch 7/10\n",
"60000/60000 [==============================] - 5s 79us/sample - loss: 0.0213 - acc: 0.9942 - val_loss: 0.0620 - val_acc: 0.9807\n",
"Epoch 8/10\n",
"60000/60000 [==============================] - 5s 82us/sample - loss: 0.0173 - acc: 0.9952 - val_loss: 0.0734 - val_acc: 0.9786\n",
"Epoch 9/10\n",
"60000/60000 [==============================] - 5s 82us/sample - loss: 0.0127 - acc: 0.9968 - val_loss: 0.0648 - val_acc: 0.9797\n",
"Epoch 10/10\n",
"60000/60000 [==============================] - 5s 81us/sample - loss: 0.0101 - acc: 0.9975 - val_loss: 0.0672 - val_acc: 0.9791\n",
"10000/10000 [==============================] - 1s 75us/sample - loss: 0.0672 - acc: 0.9791\n",
"Test score: 0.06724224201618345\n",
"Test accuracy: 0.9791\n",
"--- 0.7240893840789795 seconds ---\n",
"1.351808114121352\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b10iaB3HROIA",
"colab_type": "text"
},
"source": [
"First I'm going to show how max pooling works.\n",
"At it's simplest, max pooling just looks at a patch of inputs (usualy 4 inputs; 2x2) and picks the biggest only.\n",
"This way it can reduce the number of outputs by a factor of however many are pooled together (again, usually a 2x2 patch; for a 4 times reduction)\n",
"\n",
"If we run a 2x2 MaxPooling on the raw MNIST input (pixels) it will group the pixels by 4s and return the highest value (whitest) of any of the 4 pixels. This means the output will only be 14x14x1, leaving it as 196 pixels, but if any of those 4 was white or grey, that will be the value for the output. Note: when done on input, it's basically the same as downsampling or decreasing the resolution of the image.\n",
"\n",
"When done on the outputs of a Convolutional NN, it works similarly, but there are many more channels (the # of filters) it must compare.\n"
]
},
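{
"cell_type": "markdown",
"metadata": {
"id": "added_maxpool_note",
"colab_type": "text"
},
"source": [
"A minimal sketch, added purely for illustration (not part of the original gist): 2x2 max pooling done by hand in NumPy on one MNIST test image, so the 28x28 input visibly shrinks to 14x14 while the bright pixels survive. It reuses the X_test and plt variables from the cells above.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "added_maxpool_code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Illustration only: manual 2x2 max pooling of a single test image in NumPy.\n",
"img = np.squeeze(X_test[0], axis=-1)                  # (28, 28) greyscale image\n",
"pooled = img.reshape(14, 2, 14, 2).max(axis=(1, 3))   # max of each 2x2 patch -> (14, 14)\n",
"\n",
"print('original:', img.shape, ' pooled:', pooled.shape)\n",
"plt.imshow(pooled, cmap='gray', interpolation='none')\n"
],
"execution_count": 0,
"outputs": []
},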
{
"cell_type": "code",
"metadata": {
"id": "5MJ0UK7z1DhM",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 459
},
"outputId": "7fd01faf-6340-47e1-e086-471b3089ef4f"
},
"source": [
"# Build a model that heavily uses MaxPooling\n",
"\n",
"model = tf.keras.models.Sequential()\n",
"\n",
"model.add(layers.MaxPool2D(pool_size=(2,2), input_shape=input_shape))\n",
"model.add(layers.Conv2D(20, (3, 3), activation='relu', name='conv_1'))\n",
"model.add(layers.MaxPool2D(pool_size=(2,2)))\n",
"model.add(layers.Conv2D(50, (3, 3), activation='relu', name='conv_2'))\n",
"model.add(layers.MaxPool2D(pool_size=(2,2)))\n",
"model.add(layers.Permute((2, 1, 3)))\n",
"\n",
"model.add(layers.Flatten())\n",
"model.add(layers.Dense(500, activation='relu', name='dense_1'))\n",
"model.add(layers.Dense(10, activation='softmax', name='dense_2'))\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
"\n",
"model.summary()"
],
"execution_count": 39,
"outputs": [
{
"output_type": "stream",
"text": [
"Model: \"sequential_10\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"max_pooling2d_15 (MaxPooling (None, 14, 14, 1) 0 \n",
"_________________________________________________________________\n",
"conv_1 (Conv2D) (None, 12, 12, 20) 200 \n",
"_________________________________________________________________\n",
"max_pooling2d_16 (MaxPooling (None, 6, 6, 20) 0 \n",
"_________________________________________________________________\n",
"conv_2 (Conv2D) (None, 4, 4, 50) 9050 \n",
"_________________________________________________________________\n",
"max_pooling2d_17 (MaxPooling (None, 2, 2, 50) 0 \n",
"_________________________________________________________________\n",
"permute_6 (Permute) (None, 2, 2, 50) 0 \n",
"_________________________________________________________________\n",
"flatten_8 (Flatten) (None, 200) 0 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 500) 100500 \n",
"_________________________________________________________________\n",
"dense_2 (Dense) (None, 10) 5010 \n",
"=================================================================\n",
"Total params: 114,760\n",
"Trainable params: 114,760\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "0ugXQ-o8Qn1k",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 459
},
"outputId": "40fb210a-2a3e-4c42-f43f-87f33bad3038"
},
"source": [
"# Run the training\n",
"\n",
"model.fit(\n",
" X_train,\n",
" Y_train,\n",
" epochs=epochs,\n",
" batch_size=batch_size,\n",
" verbose=keras_verbosity,\n",
" validation_data=(X_test, Y_test)\n",
" )\n",
"\n",
"\n",
"# Then Print the Training Results\n",
"score = model.evaluate(X_test, Y_test)\n",
"print('Test score:', score[0])\n",
"print('Test accuracy:', score[1])\n",
"\n",
"\n",
"# Test the total time to predict the whole Validation set\n",
"start_time = time.time()\n",
"model.predict(X_test)\n",
"print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
"\n",
"\n",
"# Print our 'Efficency' as the Accuracy / Total Time\n",
"print(score[1]/(time.time() - start_time))\n"
],
"execution_count": 40,
"outputs": [
{
"output_type": "stream",
"text": [
"Train on 60000 samples, validate on 10000 samples\n",
"Epoch 1/10\n",
"60000/60000 [==============================] - 12s 207us/sample - loss: 0.3347 - acc: 0.9036 - val_loss: 0.1104 - val_acc: 0.9671\n",
"Epoch 2/10\n",
"60000/60000 [==============================] - 12s 208us/sample - loss: 0.0981 - acc: 0.9690 - val_loss: 0.0734 - val_acc: 0.9774\n",
"Epoch 3/10\n",
"60000/60000 [==============================] - 13s 217us/sample - loss: 0.0734 - acc: 0.9763 - val_loss: 0.0632 - val_acc: 0.9790\n",
"Epoch 4/10\n",
"60000/60000 [==============================] - 14s 226us/sample - loss: 0.0619 - acc: 0.9802 - val_loss: 0.0595 - val_acc: 0.9805\n",
"Epoch 5/10\n",
"60000/60000 [==============================] - 13s 224us/sample - loss: 0.0525 - acc: 0.9826 - val_loss: 0.0481 - val_acc: 0.9838\n",
"Epoch 6/10\n",
"60000/60000 [==============================] - 13s 216us/sample - loss: 0.0466 - acc: 0.9850 - val_loss: 0.0493 - val_acc: 0.9828\n",
"Epoch 7/10\n",
"60000/60000 [==============================] - 13s 219us/sample - loss: 0.0401 - acc: 0.9871 - val_loss: 0.0438 - val_acc: 0.9845\n",
"Epoch 8/10\n",
"60000/60000 [==============================] - 13s 223us/sample - loss: 0.0354 - acc: 0.9883 - val_loss: 0.0430 - val_acc: 0.9848\n",
"Epoch 9/10\n",
"60000/60000 [==============================] - 13s 224us/sample - loss: 0.0330 - acc: 0.9890 - val_loss: 0.0439 - val_acc: 0.9847\n",
"Epoch 10/10\n",
"60000/60000 [==============================] - 13s 220us/sample - loss: 0.0292 - acc: 0.9905 - val_loss: 0.0455 - val_acc: 0.9856\n",
"10000/10000 [==============================] - 1s 122us/sample - loss: 0.0455 - acc: 0.9856\n",
"Test score: 0.04553150092173019\n",
"Test accuracy: 0.9856\n",
"--- 1.0810096263885498 seconds ---\n",
"0.9112940647924092\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SJRMaYUIRjjy",
"colab_type": "text"
},
"source": [
"Here we change the the stride value. \n",
"\n",
"A Convolutional layer with a 3x3 kernel will look at 9 pixels at a time, but if it uses the default stride of (1,1) then it will evaluate each pixel 9 times as the 'sliding window' only moves one pixel per slot. \n",
"\n",
"Stride defines how far to move after each window has been procesed, such that using stride (3,3) with kernel (3,3) means that each pixel will only be processed once. \n",
"\n",
"You'll notice that I had to remove the MaxPooling layers otherwise the size would shrink to 0x0"
]
},
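{
"cell_type": "markdown",
"metadata": {
"id": "added_stride_note",
"colab_type": "text"
},
"source": [
"A small helper, added purely for illustration (not part of the original gist), for the output-size arithmetic behind the model below: with 'valid' padding a conv layer produces floor((input - kernel) / stride) + 1 positions per dimension, which is why two 3x3, stride-(3,3) convolutions take 28 -> 9 -> 3 and there is no room left for MaxPooling.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "added_stride_code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Illustration only: spatial output size of a 'valid'-padding convolution.\n",
"def conv_output_size(input_size, kernel, stride):\n",
"    return (input_size - kernel) // stride + 1\n",
"\n",
"after_conv_1 = conv_output_size(28, 3, 3)             # 28 -> 9\n",
"after_conv_2 = conv_output_size(after_conv_1, 3, 3)   # 9 -> 3\n",
"print(after_conv_1, after_conv_2)\n"
],
"execution_count": 0,
"outputs": []
},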
{
"cell_type": "code",
"metadata": {
"id": "aIfXj6xVQoTI",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 357
},
"outputId": "6bb8dd07-694b-4ff1-82b7-baa4e034bc92"
},
"source": [
"# Build a Model demonstrating Strides\n",
"\n",
"model = tf.keras.models.Sequential()\n",
"\n",
"model.add(layers.Conv2D(20,\n",
" (3, 3), strides=(3,3),\n",
" input_shape=input_shape,\n",
" activation='relu',\n",
" name='conv_1'))\n",
"model.add(layers.Conv2D(50, (3, 3), strides=(3,3), activation='relu', name='conv_2'))\n",
"model.add(layers.Permute((2, 1, 3)))\n",
"model.add(layers.Flatten())\n",
"model.add(layers.Dense(500, activation='relu', name='dense_1'))\n",
"model.add(layers.Dense(10, activation='softmax', name='dense_2'))\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
"\n",
"model.summary()"
],
"execution_count": 41,
"outputs": [
{
"output_type": "stream",
"text": [
"Model: \"sequential_11\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"conv_1 (Conv2D) (None, 9, 9, 20) 200 \n",
"_________________________________________________________________\n",
"conv_2 (Conv2D) (None, 3, 3, 50) 9050 \n",
"_________________________________________________________________\n",
"permute_7 (Permute) (None, 3, 3, 50) 0 \n",
"_________________________________________________________________\n",
"flatten_9 (Flatten) (None, 450) 0 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 500) 225500 \n",
"_________________________________________________________________\n",
"dense_2 (Dense) (None, 10) 5010 \n",
"=================================================================\n",
"Total params: 239,760\n",
"Trainable params: 239,760\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "StpB0QoxQoVm",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 459
},
"outputId": "aec7ff7d-b3f9-440a-ad42-cf11e6172570"
},
"source": [
"# Run the training\n",
"\n",
"model.fit(\n",
" X_train,\n",
" Y_train,\n",
" epochs=epochs,\n",
" batch_size=batch_size,\n",
" verbose=keras_verbosity,\n",
" validation_data=(X_test, Y_test)\n",
" )\n",
"\n",
"\n",
"# Then Print the Training Results\n",
"score = model.evaluate(X_test, Y_test)\n",
"print('Test score:', score[0])\n",
"print('Test accuracy:', score[1])\n",
"\n",
"\n",
"# Test the total time to predict the whole Validation set\n",
"start_time = time.time()\n",
"model.predict(X_test)\n",
"print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
"\n",
"\n",
"# Print our 'Efficency' as the Accuracy / Total Time\n",
"print(score[1]/(time.time() - start_time))\n"
],
"execution_count": 42,
"outputs": [
{
"output_type": "stream",
"text": [
"Train on 60000 samples, validate on 10000 samples\n",
"Epoch 1/10\n",
"60000/60000 [==============================] - 9s 145us/sample - loss: 0.3416 - acc: 0.9065 - val_loss: 0.1275 - val_acc: 0.9606\n",
"Epoch 2/10\n",
"60000/60000 [==============================] - 9s 145us/sample - loss: 0.1065 - acc: 0.9678 - val_loss: 0.0764 - val_acc: 0.9754\n",
"Epoch 3/10\n",
"60000/60000 [==============================] - 9s 144us/sample - loss: 0.0699 - acc: 0.9788 - val_loss: 0.0614 - val_acc: 0.9801\n",
"Epoch 4/10\n",
"60000/60000 [==============================] - 8s 139us/sample - loss: 0.0523 - acc: 0.9841 - val_loss: 0.0608 - val_acc: 0.9808\n",
"Epoch 5/10\n",
"60000/60000 [==============================] - 9s 142us/sample - loss: 0.0400 - acc: 0.9875 - val_loss: 0.0600 - val_acc: 0.9825\n",
"Epoch 6/10\n",
"60000/60000 [==============================] - 8s 140us/sample - loss: 0.0324 - acc: 0.9900 - val_loss: 0.0527 - val_acc: 0.9845\n",
"Epoch 7/10\n",
"60000/60000 [==============================] - 8s 140us/sample - loss: 0.0262 - acc: 0.9918 - val_loss: 0.0612 - val_acc: 0.9822\n",
"Epoch 8/10\n",
"60000/60000 [==============================] - 8s 141us/sample - loss: 0.0219 - acc: 0.9930 - val_loss: 0.0518 - val_acc: 0.9856\n",
"Epoch 9/10\n",
"60000/60000 [==============================] - 9s 143us/sample - loss: 0.0186 - acc: 0.9941 - val_loss: 0.0514 - val_acc: 0.9856\n",
"Epoch 10/10\n",
"60000/60000 [==============================] - 9s 146us/sample - loss: 0.0145 - acc: 0.9953 - val_loss: 0.0518 - val_acc: 0.9851\n",
"10000/10000 [==============================] - 1s 98us/sample - loss: 0.0518 - acc: 0.9851\n",
"Test score: 0.051835346615960586\n",
"Test accuracy: 0.9851\n",
"--- 0.9094095230102539 seconds ---\n",
"1.0829889861317523\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hdDfrdC6XIZa",
"colab_type": "text"
},
"source": [
"Kernel size is the number of pixels that are read at the same time.\n",
"A 1x1 kernel looks at only 1 pixel at a time, this is actually very usefull for expanding or compressing the channels of an item. (but makes very little sense on a greyscale image, or even on an RGB)\n",
"\n",
"A 3x3 Kernel is standard because it will have the fewest weights and if you stack them, they can aproximate a bigger kernel\n",
"\n",
"Some systems have had great success with larger kernels.\n",
"\n",
"This will use a larger kernel, but will run much worse\n"
]
},
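{
"cell_type": "markdown",
"metadata": {
"id": "added_kernel_note",
"colab_type": "text"
},
"source": [
"A quick sketch, added purely for illustration (not part of the original gist), of how kernel size drives the weight count: a Conv2D layer has kernel_h * kernel_w * in_channels * filters weights plus one bias per filter, so moving from 3x3 to 7x7 kernels multiplies the conv weights by roughly 49/9, which matches the parameter counts in the summary below.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "added_kernel_code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Illustration only: Conv2D parameter count as a function of kernel size.\n",
"def conv2d_params(kernel_h, kernel_w, in_ch, filters):\n",
"    return kernel_h * kernel_w * in_ch * filters + filters  # weights + biases\n",
"\n",
"print('conv_1  3x3:', conv2d_params(3, 3, 1, 20), '  7x7:', conv2d_params(7, 7, 1, 20))\n",
"print('conv_2  3x3:', conv2d_params(3, 3, 20, 50), '  7x7:', conv2d_params(7, 7, 20, 50))\n"
],
"execution_count": 0,
"outputs": []
},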
{
"cell_type": "code",
"metadata": {
"id": "tNJNx0GdQo3y",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 425
},
"outputId": "a2497655-faf6-4d3d-a047-2bbf1fbe4987"
},
"source": [
"# Build a standard baseline model (LeNet)\n",
"\n",
"model = tf.keras.models.Sequential()\n",
"\n",
"model.add(layers.Conv2D(20,\n",
" (7, 7),\n",
" input_shape=input_shape,\n",
" activation='relu',\n",
" name='conv_1'))\n",
"model.add(layers.MaxPool2D())\n",
"model.add(layers.Conv2D(50, (7, 7), activation='relu', name='conv_2'))\n",
"model.add(layers.MaxPool2D())\n",
"model.add(layers.Permute((2, 1, 3)))\n",
"model.add(layers.Flatten())\n",
"model.add(layers.Dense(500, activation='relu', name='dense_1'))\n",
"model.add(layers.Dense(10, activation='softmax', name='dense_2'))\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
"\n",
"model.summary()"
],
"execution_count": 43,
"outputs": [
{
"output_type": "stream",
"text": [
"Model: \"sequential_12\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"conv_1 (Conv2D) (None, 22, 22, 20) 1000 \n",
"_________________________________________________________________\n",
"max_pooling2d_18 (MaxPooling (None, 11, 11, 20) 0 \n",
"_________________________________________________________________\n",
"conv_2 (Conv2D) (None, 5, 5, 50) 49050 \n",
"_________________________________________________________________\n",
"max_pooling2d_19 (MaxPooling (None, 2, 2, 50) 0 \n",
"_________________________________________________________________\n",
"permute_8 (Permute) (None, 2, 2, 50) 0 \n",
"_________________________________________________________________\n",
"flatten_10 (Flatten) (None, 200) 0 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 500) 100500 \n",
"_________________________________________________________________\n",
"dense_2 (Dense) (None, 10) 5010 \n",
"=================================================================\n",
"Total params: 155,560\n",
"Trainable params: 155,560\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "wXCfCfUaQo6D",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 459
},
"outputId": "1729df01-6e8f-41c8-bade-90043bd86bd7"
},
"source": [
"# Run the training\n",
"\n",
"model.fit(\n",
" X_train,\n",
" Y_train,\n",
" epochs=epochs,\n",
" batch_size=batch_size,\n",
" verbose=keras_verbosity,\n",
" validation_data=(X_test, Y_test)\n",
" )\n",
"\n",
"\n",
"# Then Print the Training Results\n",
"score = model.evaluate(X_test, Y_test)\n",
"print('Test score:', score[0])\n",
"print('Test accuracy:', score[1])\n",
"\n",
"\n",
"# Test the total time to predict the whole Validation set\n",
"start_time = time.time()\n",
"model.predict(X_test)\n",
"print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
"\n",
"\n",
"# Print our 'Efficency' as the Accuracy / Total Time\n",
"print(score[1]/(time.time() - start_time))\n"
],
"execution_count": 44,
"outputs": [
{
"output_type": "stream",
"text": [
"Train on 60000 samples, validate on 10000 samples\n",
"Epoch 1/10\n",
"60000/60000 [==============================] - 44s 738us/sample - loss: 0.2321 - acc: 0.9300 - val_loss: 0.0739 - val_acc: 0.9771\n",
"Epoch 2/10\n",
"60000/60000 [==============================] - 43s 723us/sample - loss: 0.0647 - acc: 0.9797 - val_loss: 0.0425 - val_acc: 0.9873\n",
"Epoch 3/10\n",
"60000/60000 [==============================] - 43s 723us/sample - loss: 0.0449 - acc: 0.9860 - val_loss: 0.0383 - val_acc: 0.9883\n",
"Epoch 4/10\n",
"60000/60000 [==============================] - 43s 722us/sample - loss: 0.0353 - acc: 0.9890 - val_loss: 0.0324 - val_acc: 0.9891\n",
"Epoch 5/10\n",
"60000/60000 [==============================] - 43s 722us/sample - loss: 0.0271 - acc: 0.9917 - val_loss: 0.0389 - val_acc: 0.9880\n",
"Epoch 6/10\n",
"60000/60000 [==============================] - 44s 726us/sample - loss: 0.0235 - acc: 0.9925 - val_loss: 0.0365 - val_acc: 0.9880\n",
"Epoch 7/10\n",
"60000/60000 [==============================] - 43s 723us/sample - loss: 0.0190 - acc: 0.9939 - val_loss: 0.0275 - val_acc: 0.9916\n",
"Epoch 8/10\n",
"60000/60000 [==============================] - 43s 724us/sample - loss: 0.0152 - acc: 0.9950 - val_loss: 0.0255 - val_acc: 0.9918\n",
"Epoch 9/10\n",
"60000/60000 [==============================] - 44s 727us/sample - loss: 0.0140 - acc: 0.9953 - val_loss: 0.0315 - val_acc: 0.9900\n",
"Epoch 10/10\n",
"60000/60000 [==============================] - 44s 726us/sample - loss: 0.0135 - acc: 0.9958 - val_loss: 0.0273 - val_acc: 0.9920\n",
"10000/10000 [==============================] - 2s 248us/sample - loss: 0.0273 - acc: 0.9920\n",
"Test score: 0.027321851327227704\n",
"Test accuracy: 0.992\n",
"--- 2.2850234508514404 seconds ---\n",
"0.4340927057080107\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5YU9bdsAZ7CV",
"colab_type": "text"
},
"source": [
"I'm going to skip changing the number of filters per layer and leave that as an activity for the reader. \n",
"\n",
"As a simple rule, more Filters means more math and better accuracy, and fewer filters means less math and worse accuracy."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oE7Z6t6jXWLI",
"colab_type": "text"
},
"source": [
"# If I get some more Time, I'll Create a similar example that utilizes advanced performance improvement options, like pruning, Seperable Convolutions, maybe dialation, and some other ideas, maybe.\n"
]
}
]
}