anjuls/KF_Fashion_MNIST.ipynb

## KF_Fashion_MNIST.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "azKzys_66G42"
   },
   "source": [
    "# From Notebook to Kubeflow Pipeline using Fashion MNIST\n",
    "\n",
    "In this notebook, we will walk you through the steps of converting a machine learning model, which you may already have on a jupyter notebook, into a Kubeflow pipeline. As an example, we will make use of the fashion we will make use of the fashion MNIST dataset and the [Basic classification with Tensorflow](https://www.tensorflow.org/tutorials/keras/classification) example.\n",
    "\n",
    "In this example we use:\n",
    "\n",
    "* **Kubeflow pipelines** - [Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/) is a machine learning workflow platform that is helping data scientists and ML engineers tackle experimentation and productionization of ML workloads. It allows users to easily orchestrate scalable workloads using an SDK right from the comfort of a Jupyter Notebook.\n",
    "\n",
    "* **Microk8s** - [Microk8s](https://microk8s.io/docs) is a service that gives you the ability to spin up a lightweight Kubernetes cluster right on your local machine. It comes with Kubeflow built right in. \n",
    "\n",
    "**Note:** This notebook is to be run on a notebook server inside the Kubeflow environment. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kN5XA9ybEtwU"
   },
   "source": [
    "## Section 1: Data exploration (as in [here](https://www.tensorflow.org/tutorials/keras/classification))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KJ4EgzcMEUko"
   },
   "source": [
    "The [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist)  dataset contains 70,000 grayscale images in 10 clothing categories. Each image is 28x28 pixels in size. We chose this dataset to demonstrate the funtionality of Kubeflow Pipelines without introducing too much complexity in the implementation of the ML model.\n",
    "\n",
    "To familiarize you with the dataset we will do a short exploration. It is always a good idea to understand your data before you begin any kind of analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "xMIxLNEiGR3x"
   },
   "source": [
    "<table>\n",
    "  <tr><td>\n",
    "    <img src=\"https://tensorflow.org/images/fashion-mnist-sprite.png\"\n",
    "         alt=\"Fashion MNIST sprite\"  width=\"600\">\n",
    "  </td></tr>\n",
    "  <tr><td align=\"center\">\n",
    "    <b>Figure 1.</b> <a href=\"https://github.com/zalandoresearch/fashion-mnist\">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>&nbsp;\n",
    "  </td></tr>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 Install packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m pip install --user --upgrade pip\n",
    "!pip install --user --upgrade pandas matplotlib numpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the installation, we need to restart kernel for changes to take effect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "y4DB_1u5H1fT",
    "outputId": "58a8d65c-e978-4af2-aa85-114d4a7f3f29"
   },
   "outputs": [],
   "source": [
    "# TensorFlow and tf.keras\n",
    "import tensorflow as tf\n",
    "from tensorflow import keras\n",
    "\n",
    "# Data exploration libraries\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "OW5-_PC3H3YB"
   },
   "source": [
    "### 1.2 Import the Fashion MNIST dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "HrDlsMh4LoXz"
   },
   "outputs": [],
   "source": [
    "fashion_mnist = keras.datasets.fashion_mnist\n",
    "\n",
    "(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "t9FDsUlxCaWW"
   },
   "source": [
    "Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Vec35x3cMQfo"
   },
   "outputs": [],
   "source": [
    "class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n",
    "               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "1Y_88GQbMX1G"
   },
   "source": [
    "Let's look at the format of each dataset split. The training set contains 60,000 images and the test set contains 10,000 images which are each 28x28 pixels.\n",
    "\n",
    "---\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 Explore the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "e2-Zbu0BMpt1",
    "outputId": "51310a2d-ad13-4127-f1eb-db3237524044"
   },
   "outputs": [],
   "source": [
    "print(f'Number of training images: {train_images.shape[0]}\\n')\n",
    "print(f'Number of test images: {test_images.shape[0]}\\n')\n",
    "\n",
    "print(f'Image size: {train_images.shape[1:]}')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tHF6g2dQNQVc"
   },
   "source": [
    "There are logically 60,000 training labels and 10,000 test labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "GUsWcVaUNVev",
    "outputId": "c13c5423-b779-4b3d-e0a7-54b03b8fc067"
   },
   "outputs": [],
   "source": [
    "print(f'Number of labels: {len(train_labels)}\\n')\n",
    "print(f'Number of test labels: {len(test_labels)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ZSk_ogerNlq4"
   },
   "source": [
    "Each label is an integer between 0 and 9 corresponding to the 10 class names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 191
    },
    "colab_type": "code",
    "id": "jkDXYdJ5N5zk",
    "outputId": "55b5a3c4-aae3-46b0-a5b0-a4a31e0c610e"
   },
   "outputs": [],
   "source": [
    "unique_train_labels = np.unique(train_labels)\n",
    "\n",
    "for label in zip(class_names, train_labels):\n",
    "  label_name, label_num = label\n",
    "  print(f'{label_name}: {label_num}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Preprocess the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DVOPKTgdP-nS"
   },
   "source": [
    "To properly train the model, the data must be normalized so each value will fall between 0 and 1. Later on this step will be done inside of the training script but we will show what that process looks like here.\n",
    "\n",
    "The first image shows that the values fall in a range between 0 and 255."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 265
    },
    "colab_type": "code",
    "id": "m4VEw8Ud9Quh",
    "outputId": "9942926b-6ee4-4f84-e427-56e67a86d6f8"
   },
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "plt.imshow(train_images[0])\n",
    "plt.colorbar()\n",
    "plt.grid(False)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Dn88kMUcR5G5"
   },
   "source": [
    "To scale the data we divide the training and test values by 255."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4VREJVpLRqeP"
   },
   "outputs": [],
   "source": [
    "train_images = train_images / 255.0\n",
    "\n",
    "test_images = test_images / 255.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Ocp5nOyjSBEe"
   },
   "source": [
    "We plot the first 25 images from the training set to show that the data is in fact in the form we expect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 589
    },
    "colab_type": "code",
    "id": "-KzmYh0kSLSG",
    "outputId": "8d7c3525-39a0-40c4-a5a7-e63593f9fc88"
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,10))\n",
    "for i in range(25):\n",
    "    plt.subplot(5,5,i+1)\n",
    "    plt.xticks([])\n",
    "    plt.yticks([])\n",
    "    plt.grid(False)\n",
    "    plt.imshow(train_images[i], cmap=plt.cm.binary)\n",
    "    plt.xlabel(class_names[train_labels[i]])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "igI1dvp7SWWV"
   },
   "source": [
    "# Section 2: Kubeflow pipeline building"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Up until this point, all our steps are similar to what you can find in the [Basic classification with Tensorflow](https://https://www.tensorflow.org/tutorials/keras/classification) example. The next step on that example is to build the model. However, we will make use of the containerized approach provided by Kubeflow to allow our model to be run using Kubernetes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Install Kubeflow pipelines SDK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_ctN1VrL_qnA"
   },
   "source": [
    " The first step is to install the Kubeflow Pipelines SDK package.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "kWutHs306En6",
    "outputId": "f87b05b5-0555-43ba-8dae-04056821fe1b",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!pip install --user --upgrade kfp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "eBwaov51AFex"
   },
   "source": [
    "After the installation, we need to restart kernel for changes to take effect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 17
    },
    "colab_type": "code",
    "id": "iUJZqAuN6EoK",
    "outputId": "8d520295-afdd-456f-a6f4-d9a17fda8ce2"
   },
   "outputs": [],
   "source": [
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RMWQNog7AZFP"
   },
   "source": [
    "Check if the install was successful:\n",
    "\n",
    "**/usr/local/bin/dsl-compile**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "LmHw4UGN6EoX"
   },
   "outputs": [],
   "source": [
    "!which dsl-compile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see **/usr/local/bin/dsl-compile** above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "u8hCvyHloOK6"
   },
   "source": [
    "### 2.2 Build Container Components"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "abg3ycFVFhBC"
   },
   "source": [
    "The following cells define functions that will be transformed into lightweight container components. It is recommended to look at the corresponding Fashion MNIST notebook to match what you see here to the original code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table>\n",
    "  <tr><td>\n",
    "    <img src=\"https://www.kubeflow.org/docs/images/pipelines-sdk-lightweight.svg\"\n",
    "         alt=\"Fashion MNIST sprite\"  width=\"600\">\n",
    "  </td></tr>\n",
    "  <tr><td align=\"center\">\n",
    "  </td></tr>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import Kubeflow SDK\n",
    "import kfp\n",
    "import kfp.dsl as dsl\n",
    "import kfp.components as comp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "VpeW1k1o6Eo5"
   },
   "outputs": [],
   "source": [
    "def train(data_path, model_file):\n",
    "    \n",
    "    # func_to_container_op requires packages to be imported inside of the function.\n",
    "    import pickle\n",
    "    \n",
    "    import tensorflow as tf\n",
    "    from tensorflow.python import keras\n",
    "    \n",
    "    # Download the dataset and split into training and test data. \n",
    "    fashion_mnist = keras.datasets.fashion_mnist\n",
    "\n",
    "    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()\n",
    "\n",
    "    # Normalize the data so that the values all fall between 0 and 1.\n",
    "    train_images = train_images / 255.0\n",
    "    test_images = test_images / 255.0\n",
    "\n",
    "    # Define the model using Keras.\n",
    "    model = keras.Sequential([\n",
    "    keras.layers.Flatten(input_shape=(28, 28)),\n",
    "    keras.layers.Dense(128, activation='relu'),\n",
    "    keras.layers.Dense(10)\n",
    "    ])\n",
    "\n",
    "    model.compile(optimizer='adam',\n",
    "                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
    "                  metrics=['accuracy'])\n",
    "\n",
    "    # Run a training job with specified number of epochs\n",
    "    model.fit(train_images, train_labels, epochs=10)\n",
    "\n",
    "    # Evaluate the model and print the results\n",
    "    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)\n",
    "    print('Test accuracy:', test_acc)\n",
    "\n",
    "    # Save the model to the designated \n",
    "    model.save(f'{data_path}/{model_file}')\n",
    "\n",
    "    # Save the test_data as a pickle file to be used by the predict component.\n",
    "    with open(f'{data_path}/test_data', 'wb') as f:\n",
    "        pickle.dump((test_images,test_labels), f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "_pxGKEae6Eo_"
   },
   "outputs": [],
   "source": [
    "def predict(data_path, model_file, image_number):\n",
    "    \n",
    "    # func_to_container_op requires packages to be imported inside of the function.\n",
    "    import pickle\n",
    "\n",
    "    import tensorflow as tf\n",
    "    from tensorflow import keras\n",
    "\n",
    "    import numpy as np\n",
    "    \n",
    "    # Load the saved Keras model\n",
    "    model = keras.models.load_model(f'{data_path}/{model_file}')\n",
    "\n",
    "    # Load and unpack the test_data\n",
    "    with open(f'{data_path}/test_data','rb') as f:\n",
    "        test_data = pickle.load(f)\n",
    "    # Separate the test_images from the test_labels.\n",
    "    test_images, test_labels = test_data\n",
    "    # Define the class names.\n",
    "    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n",
    "                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\n",
    "\n",
    "    # Define a Softmax layer to define outputs as probabilities\n",
    "    probability_model = tf.keras.Sequential([model, \n",
    "                                            tf.keras.layers.Softmax()])\n",
    "\n",
    "    # See https://github.com/kubeflow/pipelines/issues/2320 for explanation on this line.\n",
    "    image_number = int(image_number)\n",
    "\n",
    "    # Grab an image from the test dataset.\n",
    "    img = test_images[image_number]\n",
    "\n",
    "    # Add the image to a batch where it is the only member.\n",
    "    img = (np.expand_dims(img,0))\n",
    "\n",
    "    # Predict the label of the image.\n",
    "    predictions = probability_model.predict(img)\n",
    "\n",
    "    # Take the prediction with the highest probability\n",
    "    prediction = np.argmax(predictions[0])\n",
    "\n",
    "    # Retrieve the true label of the image from the test labels.\n",
    "    true_label = test_labels[image_number]\n",
    "    \n",
    "    class_prediction = class_names[prediction]\n",
    "    confidence = 100*np.max(predictions)\n",
    "    actual = class_names[true_label]\n",
    "    \n",
    "    \n",
    "    with open(f'{data_path}/result.txt', 'w') as result:\n",
    "        result.write(\" Prediction: {} | Confidence: {:2.0f}% | Actual: {}\".format(class_prediction,\n",
    "                                                                        confidence,\n",
    "                                                                        actual))\n",
    "    \n",
    "    print('Prediction has be saved successfully!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create train and predict lightweight components.\n",
    "train_op = comp.func_to_container_op(train, base_image='tensorflow/tensorflow:latest-gpu-py3')\n",
    "predict_op = comp.func_to_container_op(predict, base_image='tensorflow/tensorflow:latest-gpu-py3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Build Kubeflow Pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a client to enable communication with the Pipelines API server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = kfp.Client(host='pipelines-api.kubeflow.svc.cluster.local:8888')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CqwBi1Cro49W"
   },
   "source": [
    "Our next step will be to create the various components that will make up the pipeline. Define the pipeline using the *@dsl.pipeline* decorator.\n",
    "\n",
    "The pipeline function is defined and includes a number of paramters that will be fed into our various components throughout execution. Kubeflow Pipelines are created decalaratively. This means that the code is not run until the pipeline is compiled. \n",
    "\n",
    "A [Persistent Volume Claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) can be quickly created using the [VolumeOp](https://) method to save and persist data between the components. Note that while this is a great method to use locally, you could also use a cloud bucket for your persistent storage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 237
    },
    "colab_type": "code",
    "id": "Langpu8q6Ept",
    "outputId": "30769082-0be7-4db4-cee7-20f298c3b0c3"
   },
   "outputs": [],
   "source": [
    "# Define the pipeline\n",
    "@dsl.pipeline(\n",
    "   name='MNIST Pipeline',\n",
    "   description='A toy pipeline that performs mnist model training and prediction.'\n",
    ")\n",
    "\n",
    "# Define parameters to be fed into pipeline\n",
    "def mnist_container_pipeline(\n",
    "    data_path: str,\n",
    "    model_file: str, \n",
    "    image_number: int\n",
    "):\n",
    "    \n",
    "    # Define volume to share data between components.\n",
    "    vop = dsl.VolumeOp(\n",
    "    name=\"create_volume\",\n",
    "    resource_name=\"data-volume\", \n",
    "    size=\"1Gi\", \n",
    "    modes=dsl.VOLUME_MODE_RWM)\n",
    "    \n",
    "    # Create MNIST training component.\n",
    "    mnist_training_container = train_op(data_path, model_file) \\\n",
    "                                    .add_pvolumes({data_path: vop.volume})\n",
    "\n",
    "    # Create MNIST prediction component.\n",
    "    mnist_predict_container = predict_op(data_path, model_file, image_number) \\\n",
    "                                    .add_pvolumes({data_path: mnist_training_container.pvolume})\n",
    "    \n",
    "    # Print the result of the prediction\n",
    "    mnist_result_container = dsl.ContainerOp(\n",
    "        name=\"print_prediction\",\n",
    "        image='library/bash:4.4.23',\n",
    "        pvolumes={data_path: mnist_predict_container.pvolume},\n",
    "        arguments=['cat', f'{data_path}/result.txt']\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 Run pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "b1pqw3d0tnth"
   },
   "source": [
    "Finally we feed our pipeline definition into the compiler and run it as an experiment. This will give us 2 links at the bottom that we can follow to the [Kubeflow Pipelines UI](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/) where you can check logs, artifacts, inputs/outputs, and visually see the progress of your pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define some environment variables which are to be used as inputs at various points in the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_PATH = '/mnt'\n",
    "MODEL_PATH='mnist_model.h5'\n",
    "# An integer representing an image from the test set that the model will attempt to predict the label for.\n",
    "IMAGE_NUMBER = 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 167
    },
    "colab_type": "code",
    "id": "E9NphN8F6Epz",
    "outputId": "4e743723-bfb4-4026-d349-0a4a91c4e034"
   },
   "outputs": [],
   "source": [
    "pipeline_func = mnist_container_pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "8cFLEOyq6Ep5",
    "outputId": "ac3009e4-4beb-4486-bc0a-ad30dfd2780f"
   },
   "outputs": [],
   "source": [
    "experiment_name = 'fashion_mnist_kubeflow'\n",
    "run_name = pipeline_func.__name__ + ' run'\n",
    "\n",
    "arguments = {\"data_path\":DATA_PATH,\n",
    "             \"model_file\":MODEL_PATH,\n",
    "             \"image_number\": IMAGE_NUMBER}\n",
    "\n",
    "# Compile pipeline to generate compressed YAML definition of the pipeline.\n",
    "kfp.compiler.Compiler().compile(pipeline_func,  \n",
    "  '{}.zip'.format(experiment_name))\n",
    "\n",
    "# Submit pipeline directly from pipeline function\n",
    "run_result = client.create_run_from_pipeline_func(pipeline_func, \n",
    "                                                  experiment_name=experiment_name, \n",
    "                                                  run_name=run_name, \n",
    "                                                  arguments=arguments, \n"
    "                                                  namespace='anonymous')"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "KF Fashion MNIST",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}

## kfp_rbac_fix.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: anonymous
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: anonymous@kubeflow.org
  workloadSelector:
    labels:
      notebook-name: kf-demo

---

apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: bind-ml-pipeline-nb-anonymous
  namespace: kubeflow
spec:
  roleRef:
    kind: ServiceRole
    name: ml-pipeline-services
  subjects:
  - properties:
      source.principal: cluster.local/ns/anonymous/sa/default-editor
	apiVersion: networking.istio.io/v1alpha3
	kind: EnvoyFilter
	metadata:
	name: add-header
	namespace: anonymous
	spec:
	configPatches:
	- applyTo: VIRTUAL_HOST
	match:
	context: SIDECAR_OUTBOUND
	routeConfiguration:
	vhost:
	name: ml-pipeline.kubeflow.svc.cluster.local:8888
	route:
	name: default
	patch:
	operation: MERGE
	value:
	request_headers_to_add:
	- append: true
	header:
	key: kubeflow-userid
	value: anonymous@kubeflow.org
	workloadSelector:
	labels:
	notebook-name: kf-demo

	---

	apiVersion: rbac.istio.io/v1alpha1
	kind: ServiceRoleBinding
	metadata:
	name: bind-ml-pipeline-nb-anonymous
	namespace: kubeflow
	spec:
	roleRef:
	kind: ServiceRole
	name: ml-pipeline-services
	subjects:
	- properties:
	source.principal: cluster.local/ns/anonymous/sa/default-editor