Lewington-pitsos/mscoco.ipynb

## mscoco.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image classification multi-label classification\n",
    "\n",
    "1. [Introduction](#Introduction)\n",
    "2. [Prerequisites](#Prequisites)\n",
    "3. [Data Preparation](#Data-Preparation)\n",
    "3. [Multi-label Training](#Multi-label-Training)\n",
    "4. [Inference](#Inference)\n",
    "5. [Clean-up](#Clean-up)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "Welcome to our end-to-end example of multi-label classification using the Sagemaker 1P image classification algorithm. In this demo, we will use the Amazon sagemaker image classification algorithm in transfer learning mode to fine-tune a pre-trained model (trained on imagenet data) to learn to classify a new multi-label dataset. In particular, the pre-trained model will be fine-tuned using [MS-COCO](http://cocodataset.org/#overview) dataset. \n",
    "\n",
    "To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prequisites\n",
    "\n",
    "### Permissions and environment variables\n",
    "\n",
    "Here we set up the linkage and authentication to AWS services. There are three parts to this:\n",
    "\n",
    "* The roles used to give learning and hosting access to your data. This will automatically be obtained from the role used to start the notebook\n",
    "* The S3 bucket that you want to use for training and model data\n",
    "* The Amazon sagemaker image classification docker image which need not be changed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install tools\n",
    "\n",
    "We need pycocotools to parse the annotations for the MSCOCO dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reading package lists... Done\n",
      "Building dependency tree       \n",
      "Reading state information... Done\n",
      "build-essential is already the newest version (12.6).\n",
      "0 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.\n",
      "Reading package lists... Done\n",
      "Building dependency tree       \n",
      "Reading state information... Done\n",
      "manpages-dev is already the newest version (4.16-2).\n",
      "0 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.\n"
     ]
    }
   ],
   "source": [
    "! apt install build-essential -y\n",
    "! apt-get install manpages-dev -y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/secretstorage/dhcrypto.py:16: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n",
      "  from cryptography.utils import int_from_bytes\n",
      "/opt/conda/lib/python3.7/site-packages/secretstorage/util.py:25: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n",
      "  from cryptography.utils import int_from_bytes\n",
      "Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (21.3.1)\n",
      "Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (60.5.0)\n",
      "Requirement already satisfied: wheel in /opt/conda/lib/python3.7/site-packages (0.37.1)\n",
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n",
      "/opt/conda/lib/python3.7/site-packages/secretstorage/dhcrypto.py:16: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n",
      "  from cryptography.utils import int_from_bytes\n",
      "/opt/conda/lib/python3.7/site-packages/secretstorage/util.py:25: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n",
      "  from cryptography.utils import int_from_bytes\n",
      "Collecting pycocotools\n",
      "  Using cached pycocotools-2.0.4.tar.gz (106 kB)\n",
      "  Installing build dependencies ... \u001b[?25ldone\n",
      "\u001b[?25h  Getting requirements to build wheel ... \u001b[?25ldone\n",
      "\u001b[?25h  Preparing metadata (pyproject.toml) ... \u001b[?25ldone\n",
      "\u001b[?25hRequirement already satisfied: matplotlib>=2.1.0 in /opt/conda/lib/python3.7/site-packages (from pycocotools) (3.1.3)\n",
      "Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from pycocotools) (1.20.3)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.1.0->pycocotools) (1.1.0)\n",
      "Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.1.0->pycocotools) (2.8.1)\n",
      "Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.1.0->pycocotools) (0.10.0)\n",
      "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.1.0->pycocotools) (2.4.6)\n",
      "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from cycler>=0.10->matplotlib>=2.1.0->pycocotools) (1.14.0)\n",
      "Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.1.0->pycocotools) (60.5.0)\n",
      "Building wheels for collected packages: pycocotools\n",
      "  Building wheel for pycocotools (pyproject.toml) ... \u001b[?25ldone\n",
      "\u001b[?25h  Created wheel for pycocotools: filename=pycocotools-2.0.4-cp37-cp37m-linux_x86_64.whl size=326825 sha256=33eb1305f24ee74f2f739df0ea352a108fa50288f9e8eb2a0656f4b198a3079f\n",
      "  Stored in directory: /root/.cache/pip/wheels/a3/5f/fa/f011e578cc76e1fc5be8dce30b3eb9fd00f337e744b3bba59b\n",
      "Successfully built pycocotools\n",
      "Installing collected packages: pycocotools\n",
      "Successfully installed pycocotools-2.0.4\n",
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "! pip install --upgrade pip setuptools wheel\n",
    "! pip install pycocotools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "arn:aws:iam::950765595897:role/service-role/AmazonSageMaker-ExecutionRole-20210403T100038\n",
      "using bucket sagemaker-ap-southeast-2-950765595897\n"
     ]
    }
   ],
   "source": [
    "import sagemaker\n",
    "from sagemaker import get_execution_role\n",
    "\n",
    "role = get_execution_role()\n",
    "print(role)\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "prefix = \"ic-multilabel\"\n",
    "\n",
    "print(\"using bucket %s\" % bucket)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Defaulting to the only supported framework/algorithm version: 1. Ignoring framework/algorithm version: latest.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "544295431143.dkr.ecr.ap-southeast-2.amazonaws.com/image-classification:1\n"
     ]
    }
   ],
   "source": [
    "training_image = sagemaker.image_uris.retrieve(\n",
    "    region=sess.boto_region_name, framework=\"image-classification\", version=\"latest\"\n",
    ")\n",
    "print(training_image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Preparation\n",
    "MS COCO is a large-scale dataset for multiple computer vision tasks, including object detection, segmentation, and captioning. In this notebook, we will use the object detection dataset to construct the multi-label dataset for classification. We will use the 2017 validation set from MS-COCO dataset to train multi-label classifier. MS-COCO dataset consist of images from 80 categories. We will choose 5 categories out of 80 and train the model to learn to classify these 5 categories. These are: \n",
    "\n",
    "1. Person\n",
    "2. Bicycle\n",
    "3. Car\n",
    "4. Motorcycle\n",
    "5. Airplane\n",
    "\n",
    "An image can contain objects of multiple categories. We first create a dataset with these 5 categories. COCO is a very large dataset, and the purpose of this notebook is to show how multi-label classification works. So, instead we’ll take what COCO calls their validation dataset from 2017, and use this as our only data.  We then split this dataset into a train and holdout dataset for fine tuning the model and testing our final accuracy\n",
    "\n",
    "The image classification algorithm can take two types of input formats. The first is a [recordio format](https://mxnet.apache.org/versions/1.7.0/api/faq/recordio) and the other is a lst format. We will use the lst file format for training. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dataset License\n",
    "\n",
    "The annotations in this dataset belong to the COCO Consortium and are licensed under a Creative Commons Attribution 4.0 License. The COCO Consortium does not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. The users of the images accept full responsibility for the use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset. Before you use this data for any other purpose than this example, you should understand the data license, described at http://cocodataset.org/#termsofuse\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import urllib.request\n",
    "\n",
    "\n",
    "def download(url):\n",
    "    filename = url.split(\"/\")[-1]\n",
    "    if not os.path.exists(filename):\n",
    "        urllib.request.urlretrieve(url, filename)\n",
    "\n",
    "\n",
    "# MSCOCO validation image files\n",
    "download(\"http://images.cocodataset.org/zips/val2017.zip\")\n",
    "download(\"http://images.cocodataset.org/annotations/annotations_trainval2017.zip\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "!unzip -qo val2017.zip\n",
    "!unzip -qo annotations_trainval2017.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parse the annotation to create lst file\n",
    "Use pycocotools to parse the annotation and create the lst file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loading annotations into memory...\n",
      "Done (t=0.80s)\n",
      "creating index...\n",
      "index created!\n",
      "[2694  149  536  159   97]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:24: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.\n",
      "Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n"
     ]
    }
   ],
   "source": [
    "from pycocotools.coco import COCO\n",
    "import numpy as np\n",
    "import os\n",
    "\n",
    "annFile = \"./annotations/instances_val2017.json\"\n",
    "coco = COCO(annFile)\n",
    "\n",
    "catIds = coco.getCatIds()\n",
    "image_ids_of_cats = []\n",
    "for cat in catIds:\n",
    "    image_ids_of_cats.append(coco.getImgIds(catIds=cat))\n",
    "\n",
    "image_ids = []\n",
    "labels = []\n",
    "# use only the first 5 classes\n",
    "# obtain image ids and labels for images with these 5 classes\n",
    "cats = [1, 2, 3, 4, 5]\n",
    "for ind_cat in cats:\n",
    "    for image_id in image_ids_of_cats[ind_cat - 1]:\n",
    "        if image_id in image_ids:\n",
    "            labels[image_ids.index(image_id)][ind_cat - 1] = 1\n",
    "        else:\n",
    "            image_ids.append(image_id)\n",
    "            labels.append(np.zeros(len(cats), dtype=np.int))\n",
    "            labels[-1][ind_cat - 1] = 1\n",
    "# Construct the lst file from the image ids and labels\n",
    "# The first column is the image index, the last is the image filename\n",
    "# and the second to last but one are the labels\n",
    "with open(\"image.lst\", \"w\") as fp:\n",
    "    sum_labels = labels[0]\n",
    "    for ind, image_id in enumerate(image_ids):\n",
    "        coco_img = coco.loadImgs(image_id)\n",
    "        image_path = os.path.join(coco_img[0][\"file_name\"])\n",
    "        label_h = labels[ind]\n",
    "        sum_labels += label_h\n",
    "        fp.write(str(ind) + \"\\t\")\n",
    "        for j in label_h:\n",
    "            fp.write(str(j) + \"\\t\")\n",
    "        fp.write(image_path)\n",
    "        fp.write(\"\\n\")\n",
    "    fp.close()\n",
    "print(sum_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create training and validation set\n",
    "Create training and validation set by splitting the lst file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "930\t1\t0\t0\t0\t0\t000000264968.jpg\n",
      "2522\t1\t0\t0\t0\t0\t000000458255.jpg\n",
      "2075\t1\t0\t0\t1\t0\t000000145620.jpg\n",
      "2186\t1\t0\t1\t0\t0\t000000334371.jpg\n",
      "1766\t1\t0\t0\t0\t0\t000000177489.jpg\n",
      "2232\t1\t0\t0\t0\t0\t000000563882.jpg\n",
      "587\t1\t0\t0\t0\t0\t000000190140.jpg\n",
      "2525\t1\t0\t0\t0\t0\t000000130579.jpg\n",
      "2720\t0\t0\t1\t0\t0\t000000454661.jpg\n",
      "1904\t1\t0\t0\t0\t0\t000000268000.jpg\n",
      "2500 mscocoval2017train.lst\n",
      "469 mscocoval2017val.lst\n"
     ]
    }
   ],
   "source": [
    "!shuf image.lst > im.lst\n",
    "!head -n 2500 im.lst > mscocoval2017train.lst\n",
    "!tail -n +2501 im.lst > mscocoval2017val.lst\n",
    "!head mscocoval2017train.lst\n",
    "!wc -l mscocoval2017train.lst\n",
    "!wc -l mscocoval2017val.lst"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Upload the data\n",
    "Upload the data onto the s3 bucket. The images are uploaded onto train and validation bucket. The lst files are uploaded to train_lst and validation_lst folders. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Four channels: train, validation, train_lst, and validation_lst\n",
    "s3train = \"s3://{}/{}/train/\".format(bucket, prefix)\n",
    "s3validation = \"s3://{}/{}/validation/\".format(bucket, prefix)\n",
    "s3train_lst = \"s3://{}/{}/train_lst/\".format(bucket, prefix)\n",
    "s3validation_lst = \"s3://{}/{}/validation_lst/\".format(bucket, prefix)\n",
    "\n",
    "# upload the image files to train and validation channels\n",
    "!aws s3 cp val2017 $s3train --recursive --quiet\n",
    "!aws s3 cp val2017 $s3validation --recursive --quiet\n",
    "\n",
    "# upload the lst files to train_lst and validation_lst channels\n",
    "!aws s3 cp mscocoval2017train.lst $s3train_lst --quiet\n",
    "!aws s3 cp mscocoval2017val.lst $s3validation_lst --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-label Training\n",
    "Now that we are done with all the setup that is needed, we are ready to train our object detector. To begin, let us create a ``sageMaker.estimator.Estimator`` object. This estimator will launch the training job.\n",
    "\n",
    "### Training parameters\n",
    "There are two kinds of parameters that need to be set for training. The first one are the parameters for the training job. These include:\n",
    "\n",
    "* **Training instance count**: This is the number of instances on which to run the training. When the number of instances is greater than one, then the image classification algorithm will run in distributed settings. \n",
    "* **Training instance type**: This indicates the type of machine on which to run the training. Typically, we use GPU instances for these training \n",
    "* **Output path**: This the s3 folder in which the training output is stored"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "from constants import *\n",
    "from sagemaker.debugger import DebuggerHookConfig, ProfilerRule, CollectionConfig, Rule, rule_configs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_output_location = \"s3://{}/{}/output\".format(bucket, prefix)\n",
    "multilabel_ic = sagemaker.estimator.Estimator(\n",
    "    training_image,\n",
    "    role,\n",
    "    instance_count=1,\n",
    "    instance_type=\"ml.p2.xlarge\",\n",
    "    volume_size=50,\n",
    "    max_run=360000,\n",
    "    input_mode=\"File\",\n",
    "    output_path=s3_output_location,\n",
    "    sagemaker_session=sess,\n",
    "    rules=[\n",
    "            Rule.sagemaker(rule_configs.loss_not_decreasing()),\n",
    "            Rule.sagemaker(rule_configs.overfit()),\n",
    "            Rule.sagemaker(rule_configs.overtraining()),\n",
    "            Rule.sagemaker(rule_configs.stalled_training_rule())\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Algorithm parameters\n",
    "\n",
    "Apart from the above set of parameters, there are hyperparameters that are specific to the algorithm. These are:\n",
    "\n",
    "* **num_layers**: The number of layers (depth) for the network. We use 18 in this samples but other values such as 50, 152 can be used.\n",
    "* **use_pretrained_model**: Set to 1 to use pretrained model for transfer learning.\n",
    "* **image_shape**: The input image dimensions,'num_channels, height, width', for the network. It should be no larger than the actual image size. The number of channels should be same as the actual image.\n",
    "* **num_classes**: This is the number of output classes for the dataset. We use 5 classes from MSCOCO and hence it is set to 5\n",
    "* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size where N is the number of hosts on which training is run\n",
    "* **resize**: Resize the image before using it for training. The images are resized so that the shortest side is of this parameter. If the parameter is not set, then the training data is used as such without resizing.\n",
    "* **epochs**: Number of training epochs\n",
    "* **learning_rate**: Learning rate for training\n",
    "* **num_training_samples**: This is the total number of training samples. It is set to 2500 for COCO dataset with the current split\n",
    "* **use_weighted_loss**: This parameter is used to balance the influence of the positive and negative samples within the dataset.\n",
    "* **augmentation_type**: This parameter determines the type of augmentation used for training. It can take on three values, 'crop', 'crop_color' and 'crop_color_transform'\n",
    "* **precision_dtype**: The data type precision used during training. Using ``float16`` can lead to faster training with minimal drop in accuracy, paritcularly on P3 machines. By default, the parameter is set to ``float32``\n",
    "* **multi_label**: Set multi_label to 1 for multi-label processing\n",
    "\n",
    "You can find a detailed description of all the algorithm parameters at https://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "multilabel_ic.set_hyperparameters(\n",
    "    num_layers=18,\n",
    "    use_pretrained_model=1,\n",
    "    image_shape=\"3,224,224\",\n",
    "    num_classes=5,\n",
    "    mini_batch_size=128,\n",
    "    resize=256,\n",
    "    epochs=5,\n",
    "    learning_rate=0.001,\n",
    "    num_training_samples=2500,\n",
    "    use_weighted_loss=1,\n",
    "    augmentation_type=\"crop_color_transform\",\n",
    "    precision_dtype=\"float32\",\n",
    "    multi_label=1,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Input data specification\n",
    "Set the data type and channels used for training. In this training, we use application/x-image content type that require individual images and lst file for data input. In addition, Sagemaker image classification algorithm supports application/x-recordio format which can be used for larger datasets. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data = sagemaker.inputs.TrainingInput(\n",
    "    s3train,\n",
    "    distribution=\"FullyReplicated\",\n",
    "    content_type=\"application/x-image\",\n",
    "    s3_data_type=\"S3Prefix\",\n",
    ")\n",
    "validation_data = sagemaker.inputs.TrainingInput(\n",
    "    s3validation,\n",
    "    distribution=\"FullyReplicated\",\n",
    "    content_type=\"application/x-image\",\n",
    "    s3_data_type=\"S3Prefix\",\n",
    ")\n",
    "train_data_lst = sagemaker.inputs.TrainingInput(\n",
    "    s3train_lst,\n",
    "    distribution=\"FullyReplicated\",\n",
    "    content_type=\"application/x-image\",\n",
    "    s3_data_type=\"S3Prefix\",\n",
    ")\n",
    "validation_data_lst = sagemaker.inputs.TrainingInput(\n",
    "    s3validation_lst,\n",
    "    distribution=\"FullyReplicated\",\n",
    "    content_type=\"application/x-image\",\n",
    "    s3_data_type=\"S3Prefix\",\n",
    ")\n",
    "data_channels = {\n",
    "    \"train\": train_data,\n",
    "    \"validation\": validation_data,\n",
    "    \"train_lst\": train_data_lst,\n",
    "    \"validation_lst\": validation_data_lst,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start the training\n",
    "Start training by calling the fit method in the estimator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2022-01-11 04:13:14 Starting - Starting the training job...\n",
      "2022-01-11 04:13:22 Starting - Launching requested ML instancesLossNotDecreasing: InProgress\n",
      "Overfit: InProgress\n",
      "Overtraining: InProgress\n",
      "StalledTrainingRule: InProgress\n",
      "ProfilerReport-1641874394: InProgress\n",
      "......\n",
      "2022-01-11 04:14:38 Starting - Preparing the instances for training......."
     ]
    }
   ],
   "source": [
    "multilabel_ic.fit(inputs=data_channels, logs=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inference\n",
    "\n",
    "***\n",
    "\n",
    "A trained model does nothing on its own. We now want to use the model to perform inference. For this example, that means predicting the class of the image. You can deploy the created model by using the deploy method in the estimator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'TrainingJobName': 'image-classification-2022-01-11-03-34-50-149',\n",
       " 'TrainingJobArn': 'arn:aws:sagemaker:ap-southeast-2:950765595897:training-job/image-classification-2022-01-11-03-34-50-149',\n",
       " 'ModelArtifacts': {'S3ModelArtifacts': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/output/image-classification-2022-01-11-03-34-50-149/output/model.tar.gz'},\n",
       " 'TrainingJobStatus': 'Completed',\n",
       " 'SecondaryStatus': 'Completed',\n",
       " 'HyperParameters': {'augmentation_type': 'crop_color_transform',\n",
       "  'epochs': '5',\n",
       "  'image_shape': '3,224,224',\n",
       "  'learning_rate': '0.001',\n",
       "  'mini_batch_size': '128',\n",
       "  'multi_label': '1',\n",
       "  'num_classes': '5',\n",
       "  'num_layers': '18',\n",
       "  'num_training_samples': '2500',\n",
       "  'precision_dtype': 'float32',\n",
       "  'resize': '256',\n",
       "  'use_pretrained_model': '1',\n",
       "  'use_weighted_loss': '1'},\n",
       " 'AlgorithmSpecification': {'TrainingImage': '544295431143.dkr.ecr.ap-southeast-2.amazonaws.com/image-classification:1',\n",
       "  'TrainingInputMode': 'File',\n",
       "  'MetricDefinitions': [{'Name': 'train:accuracy',\n",
       "    'Regex': 'Epoch\\\\S* Train-accuracy=(\\\\S*)'},\n",
       "   {'Name': 'validation:accuracy',\n",
       "    'Regex': 'Epoch\\\\S* Validation-accuracy=(\\\\S*)'},\n",
       "   {'Name': 'train:accuracy:epoch',\n",
       "    'Regex': 'Epoch\\\\S* Train-accuracy=(\\\\S*)'},\n",
       "   {'Name': 'validation:accuracy:epoch',\n",
       "    'Regex': 'Epoch\\\\S* Validation-accuracy=(\\\\S*)'}],\n",
       "  'EnableSageMakerMetricsTimeSeries': False},\n",
       " 'RoleArn': 'arn:aws:iam::950765595897:role/service-role/AmazonSageMaker-ExecutionRole-20210403T100038',\n",
       " 'InputDataConfig': [{'ChannelName': 'train',\n",
       "   'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n",
       "     'S3Uri': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/train/',\n",
       "     'S3DataDistributionType': 'FullyReplicated'}},\n",
       "   'ContentType': 'application/x-image',\n",
       "   'CompressionType': 'None',\n",
       "   'RecordWrapperType': 'None'},\n",
       "  {'ChannelName': 'validation',\n",
       "   'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n",
       "     'S3Uri': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/validation/',\n",
       "     'S3DataDistributionType': 'FullyReplicated'}},\n",
       "   'ContentType': 'application/x-image',\n",
       "   'CompressionType': 'None',\n",
       "   'RecordWrapperType': 'None'},\n",
       "  {'ChannelName': 'train_lst',\n",
       "   'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n",
       "     'S3Uri': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/train_lst/',\n",
       "     'S3DataDistributionType': 'FullyReplicated'}},\n",
       "   'ContentType': 'application/x-image',\n",
       "   'CompressionType': 'None',\n",
       "   'RecordWrapperType': 'None'},\n",
       "  {'ChannelName': 'validation_lst',\n",
       "   'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n",
       "     'S3Uri': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/validation_lst/',\n",
       "     'S3DataDistributionType': 'FullyReplicated'}},\n",
       "   'ContentType': 'application/x-image',\n",
       "   'CompressionType': 'None',\n",
       "   'RecordWrapperType': 'None'}],\n",
       " 'OutputDataConfig': {'KmsKeyId': '',\n",
       "  'S3OutputPath': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/output'},\n",
       " 'ResourceConfig': {'InstanceType': 'ml.p2.xlarge',\n",
       "  'InstanceCount': 1,\n",
       "  'VolumeSizeInGB': 50},\n",
       " 'StoppingCondition': {'MaxRuntimeInSeconds': 360000},\n",
       " 'CreationTime': datetime.datetime(2022, 1, 11, 3, 34, 50, 408000, tzinfo=tzlocal()),\n",
       " 'TrainingStartTime': datetime.datetime(2022, 1, 11, 3, 37, 52, 423000, tzinfo=tzlocal()),\n",
       " 'TrainingEndTime': datetime.datetime(2022, 1, 11, 3, 42, 10, 673000, tzinfo=tzlocal()),\n",
       " 'LastModifiedTime': datetime.datetime(2022, 1, 11, 3, 42, 16, 140000, tzinfo=tzlocal()),\n",
       " 'SecondaryStatusTransitions': [{'Status': 'Starting',\n",
       "   'StartTime': datetime.datetime(2022, 1, 11, 3, 34, 50, 408000, tzinfo=tzlocal()),\n",
       "   'EndTime': datetime.datetime(2022, 1, 11, 3, 37, 52, 423000, tzinfo=tzlocal()),\n",
       "   'StatusMessage': 'Preparing the instances for training'},\n",
       "  {'Status': 'Downloading',\n",
       "   'StartTime': datetime.datetime(2022, 1, 11, 3, 37, 52, 423000, tzinfo=tzlocal()),\n",
       "   'EndTime': datetime.datetime(2022, 1, 11, 3, 39, 26, 576000, tzinfo=tzlocal()),\n",
       "   'StatusMessage': 'Downloading input data'},\n",
       "  {'Status': 'Training',\n",
       "   'StartTime': datetime.datetime(2022, 1, 11, 3, 39, 26, 576000, tzinfo=tzlocal()),\n",
       "   'EndTime': datetime.datetime(2022, 1, 11, 3, 41, 57, 288000, tzinfo=tzlocal()),\n",
       "   'StatusMessage': 'Training image download completed. Training in progress.'},\n",
       "  {'Status': 'Uploading',\n",
       "   'StartTime': datetime.datetime(2022, 1, 11, 3, 41, 57, 288000, tzinfo=tzlocal()),\n",
       "   'EndTime': datetime.datetime(2022, 1, 11, 3, 42, 10, 673000, tzinfo=tzlocal()),\n",
       "   'StatusMessage': 'Uploading generated training model'},\n",
       "  {'Status': 'Completed',\n",
       "   'StartTime': datetime.datetime(2022, 1, 11, 3, 42, 10, 673000, tzinfo=tzlocal()),\n",
       "   'EndTime': datetime.datetime(2022, 1, 11, 3, 42, 10, 673000, tzinfo=tzlocal()),\n",
       "   'StatusMessage': 'Training job completed'}],\n",
       " 'FinalMetricDataList': [{'MetricName': 'train:accuracy',\n",
       "   'Value': 0.8905429840087891,\n",
       "   'Timestamp': datetime.datetime(2022, 1, 11, 3, 41, 49, tzinfo=tzlocal())},\n",
       "  {'MetricName': 'validation:accuracy',\n",
       "   'Value': 0.8843749761581421,\n",
       "   'Timestamp': datetime.datetime(2022, 1, 11, 3, 41, 51, tzinfo=tzlocal())},\n",
       "  {'MetricName': 'train:accuracy:epoch',\n",
       "   'Value': 0.8905429840087891,\n",
       "   'Timestamp': datetime.datetime(2022, 1, 11, 3, 41, 49, tzinfo=tzlocal())},\n",
       "  {'MetricName': 'validation:accuracy:epoch',\n",
       "   'Value': 0.8843749761581421,\n",
       "   'Timestamp': datetime.datetime(2022, 1, 11, 3, 41, 51, tzinfo=tzlocal())}],\n",
       " 'EnableNetworkIsolation': False,\n",
       " 'EnableInterContainerTrafficEncryption': False,\n",
       " 'EnableManagedSpotTraining': False,\n",
       " 'TrainingTimeInSeconds': 258,\n",
       " 'BillableTimeInSeconds': 258,\n",
       " 'DebugHookConfig': {'S3OutputPath': 's3://frontlinedatasystems-ml-data/debugger/',\n",
       "  'CollectionConfigurations': [{'CollectionName': 'all',\n",
       "    'CollectionParameters': {'save_interval': '1'}}]},\n",
       " 'ProfilerConfig': {'S3OutputPath': 's3://sagemaker-ap-southeast-2-950765595897/ic-multilabel/output',\n",
       "  'ProfilingIntervalInMilliseconds': 500},\n",
       " 'ProfilerRuleConfigurations': [{'RuleConfigurationName': 'ProfilerReport-1641872090',\n",
       "   'RuleEvaluatorImage': '184798709955.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-debugger-rules:latest',\n",
       "   'VolumeSizeInGB': 0,\n",
       "   'RuleParameters': {'rule_to_invoke': 'ProfilerReport'}}],\n",
       " 'ProfilerRuleEvaluationStatuses': [{'RuleConfigurationName': 'ProfilerReport-1641872090',\n",
       "   'RuleEvaluationJobArn': 'arn:aws:sagemaker:ap-southeast-2:950765595897:processing-job/image-classification-2022--profilerreport-1641872090-6c0f0213',\n",
       "   'RuleEvaluationStatus': 'NoIssuesFound',\n",
       "   'LastModifiedTime': datetime.datetime(2022, 1, 11, 3, 42, 16, 134000, tzinfo=tzlocal())}],\n",
       " 'ProfilingStatus': 'Enabled',\n",
       " 'ResponseMetadata': {'RequestId': 'f56b2ae9-01bd-4035-afd4-2d8d7223878c',\n",
       "  'HTTPStatusCode': 200,\n",
       "  'HTTPHeaders': {'x-amzn-requestid': 'f56b2ae9-01bd-4035-afd4-2d8d7223878c',\n",
       "   'content-type': 'application/x-amz-json-1.1',\n",
       "   'content-length': '5595',\n",
       "   'date': 'Tue, 11 Jan 2022 04:08:06 GMT'},\n",
       "  'RetryAttempts': 0}}"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "job_name = multilabel_ic.latest_training_job.name\n",
    "sclient = multilabel_ic.sagemaker_session.sagemaker_client\n",
    "\n",
    "description = sclient.describe_training_job(TrainingJobName=job_name)\n",
    "description"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.serializers import IdentitySerializer\n",
    "\n",
    "ic_classifier = multilabel_ic.deploy(\n",
    "    initial_instance_count=1,\n",
    "    instance_type=\"ml.m4.xlarge\",\n",
    "    serializer=IdentitySerializer(content_type=\"application/x-image\"),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluation\n",
    "\n",
    "Evaluate an image through the network for inference. The network outputs class probabilities for all the classes. As can be seen from this example, the network output is pretty good even with training for only 5 epochs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# This image was used in training. In practice, test another image that is 600x400\n",
    "file_name = \"val2017/000000000139.jpg\"\n",
    "with open(file_name, \"rb\") as image:\n",
    "    f = image.read()\n",
    "    b = bytearray(f)\n",
    "\n",
    "results = ic_classifier.predict(b)\n",
    "prob = json.loads(results)\n",
    "classes = [\"Person\", \"Bicycle\", \"Car\", \"Motorcycle\", \"Airplane\"]\n",
    "for idx, val in enumerate(classes):\n",
    "    print(\"%s:%f \" % (classes[idx], prob[idx]), end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up\n",
    "You can use the following command to delete the endpoint. The endpoint that is created above is persistent and would consume resources till it is deleted. It is good to delete the endpoint when it is not used"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ic_classifier.delete_endpoint()"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:ap-southeast-2:452832661640:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.  Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}