{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Segment satellite images.ipynb",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true,
"machine_shape": "hm",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/max-kuk/a0824ff5134d490abab9cf04b8da9000/notebook2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "_tjPcPg1kStD"
},
"source": [
"#Segment satellite images"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gESUAbHh9hFs",
"colab_type": "text"
},
"source": [
"#Overview\n",
"\n",
"In this notebook, we train the model with different parameters (batch_size, backbone, etc.)\n",
"\n",
"\n",
"**Goal:**\n",
"\n",
"Under Google Colab restrictions to find the best model and hyperparameters"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JhsIoHwmeeKA",
"colab_type": "text"
},
"source": [
"# Model overview\n",
"\n",
"There are bunch of different segmentation models such as Unet [1], Linknet [2], PSPNet [3]. All of them can be used with different encoders like resnet34 [4] or efficientNetb0[5]\n",
"\n",
"![alt text](https://heise.cloudimg.io/width/610/q85.png-lossy-85.webp-lossy-85.foil1/_www-heise-de_/imgs/18/2/6/8/8/5/3/2/EfficientNet_Google-682afd88706299f5.png)"
]
},
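{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch (not part of the original training pipeline) showing that all three architectures share the same `segmentation_models` API, so swapping the model or the encoder is a one-line change."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Minimal sketch: the three architectures share the same segmentation_models API,\n",
"# so swapping the model or the encoder is a one-line change.\n",
"# (Illustration only; the actual model used for training is built further below.)\n",
"import segmentation_models as sm\n",
"\n",
"unet = sm.Unet('resnet34', encoder_weights='imagenet', classes=4, activation='sigmoid')\n",
"linknet = sm.Linknet('efficientnetb0', encoder_weights='imagenet', classes=4, activation='sigmoid')\n",
"pspnet = sm.PSPNet('resnet34', encoder_weights='imagenet', classes=4, activation='sigmoid')"
],
"execution_count": 0,
"outputs": []
},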
{
"cell_type": "markdown",
"metadata": {
"id": "svoKlAMNeimb",
"colab_type": "text"
},
"source": [
"# Architecture suggestion for cloud task\n",
"\n",
"As Mask R-CNN is better for object detection, and unet for segmentation. [6], [7] As result we can use Mask R-CNN to find clouds firstly (bounding boxes) and then unet for find exact cloud borders. \n",
"\n",
"![alt text](https://drive.google.com/uc?id=1xFtjhb54-rwU61TXWprqXoZySxIXqy-p)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Vf3RS8xykP5c"
},
"source": [
"# Update for 4th Week 4️⃣\n",
"1) The whole dataset was used for training\n",
"\n",
"2) Using of post-processing\n",
"\n",
"3) Trying to work with different backbones like efficientnetb2, efficientnetb3, efficientnetb4; models like unet, or linknet; different batch sizes and augmentation techniques\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TxGg6rSshTEa",
"colab_type": "text"
},
"source": [
"## Augmentation\n",
"\n",
"used methods:\n",
"1. horizontal flip\n",
"2. vertical flip\n",
"3. rotation\n",
"4. grid distortion\n",
"\n"
]
},
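{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the four augmentations listed above, using the `albumentations` package from this notebook. The probabilities and the rotation limit are illustrative assumptions; the actual training pipeline lives inside the `DataGenerator` defined further below."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sketch of the four augmentation methods listed above (albumentations).\n",
"# p=0.5 and limit=30 are assumptions for illustration; the pipeline used for\n",
"# the reported results is in DataGenerator.__random_transform below.\n",
"import albumentations as albu\n",
"\n",
"augmentations = albu.Compose([\n",
"    albu.HorizontalFlip(p=0.5),    # 1. horizontal flip\n",
"    albu.VerticalFlip(p=0.5),      # 2. vertical flip\n",
"    albu.Rotate(limit=30, p=0.5),  # 3. rotation by up to +/- 30 degrees\n",
"    albu.GridDistortion(p=0.5),    # 4. grid distortion\n",
"])\n",
"\n",
"# usage: augmented = augmentations(image=img, mask=masks)"
],
"execution_count": 0,
"outputs": []
},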
{
"cell_type": "markdown",
"metadata": {
"id": "ShMcfr3ihYUA",
"colab_type": "text"
},
"source": [
"## Postprocessing\n",
"\n",
"For one successful trained model we're able to run it on submission data with post processing and different threshold values\n",
"\n",
"Postprocessing is based on connectivity of pixels in predicted masks. If too small mask (min_seize) was predicted, it would be not recognised as a cloud (see submit_result notebook).\n",
"\n",
"Net | Backbone | Dice-Coef Val | Treshold (see submit nb) | Kaggle-Score\n",
"--- | --- | --- | --- | --- \n",
"Linknet | efficientnetb2 | 0.5759 | 0.45 | 0.637\n",
"Linknet | efficientnetb2 | 0.5759 | 0.50 | 0.641\n",
"Linknet | efficientnetb2 | 0.5759 | 0.40 | 0.636\n",
"Linknet | efficientnetb2 | 0.5759 | 0.55 | 0.640\n",
"Linknet | efficientnetb2 | 0.5759 | 0.52 | 0.639\n",
"\n",
"\n",
"Also some augmentations on test images were made by flipping or shifting the image. Then the average of the prediction masks of augmented image was taken as a final mask for submission\n",
"\n",
"\n",
"\n"
]
},
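{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the two steps described above: connectivity-based filtering of small masks and averaging predictions over flipped test images. The actual implementation lives in the submit_result notebook (not part of this gist); `threshold` and `min_size` are illustrative values from the table."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sketch of the postprocessing and test-time augmentation described above.\n",
"# The real implementation is in the submit_result notebook; threshold and\n",
"# min_size here are illustrative values.\n",
"import cv2\n",
"import numpy as np\n",
"\n",
"def post_process(prob_mask, threshold=0.5, min_size=15000):\n",
"    # binarise the probability mask, then drop connected components\n",
"    # that are smaller than min_size pixels\n",
"    binary = (prob_mask > threshold).astype(np.uint8)\n",
"    num_components, components = cv2.connectedComponents(binary)\n",
"    cleaned = np.zeros(prob_mask.shape, np.uint8)\n",
"    for c in range(1, num_components):  # component 0 is the background\n",
"        component = components == c\n",
"        if component.sum() >= min_size:\n",
"            cleaned[component] = 1\n",
"    return cleaned\n",
"\n",
"def tta_predict(model, img):\n",
"    # average the predictions on the image and its horizontal flip\n",
"    pred = model.predict(img[None, ...])[0]\n",
"    pred_flip = model.predict(img[:, ::-1][None, ...])[0]\n",
"    return (pred + pred_flip[:, ::-1]) / 2.0"
],
"execution_count": 0,
"outputs": []
},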
{
"cell_type": "markdown",
"metadata": {
"id": "ciXPsBNphdHB",
"colab_type": "text"
},
"source": [
"## Results\n",
"\n",
"Using the whole dataset with efficientNet2 model and larger batch size have improved result by almost 4%\n",
"\n",
"Augmentation brought another 3%\n",
"\n",
"The using of postprocessing skyrocket the whole result of 7% to 64,1%\n",
"\n",
"To sum up, all above-mentioned techniques helped to achieve 10% higher score on Kaggle. However, due to instability of the Kaggle notebook platform and colab restriction, it was unable to train better models such as efficientnetb3 or efficientnetb4 with large batch sizes\n",
"\n",
"Netz | Backbone | Optimizer | Batch size | Epochen | Train size | Val size | Dice-Coef Train | Dice-Coef Val\n",
"--- | --- | --- | --- | --- | --- | --- | --- | --- \n",
"Linknet | efficientnetb2 | Nadam | 10 | 20 | 90% | 10% | 0.8530 | 0.5728\n",
"Linknet | efficientnetb2 | Nadam | 10 | 20 | 90% | 10% | 0.8530 | 0.5759\n",
"Unet | resnet18 | Nadam | 25 | 20 | 90% | 10% | 0.8530 | 0.5619\n",
"Linknet | efficientnetb3 | Nadam | 15 | 20 | 99% | 1% | 0.8530 | 0.5791\n",
"Unet | resnet18 | Nadam | 6 | 20 | 99% | 1% | 0.5263 | 0.5267\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "FSL7fwHokPtc"
},
"source": [
"## Todos for next week\n",
"1) try to train the same models but with Rectified Adam (RAdam) optimizer (as RAdam outperforms Adam [8]) ✅\n",
"\n",
"2) play with other augmentation techniques\n",
"\n",
"3) use larger/deeper models"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xOOv0mDkkPhB"
},
"source": [
"#Update for 5th Week 5️⃣\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "wPKyHkJdkPSS"
},
"source": [
"##RAdam\n",
"\n",
"Last week I tried to improve mean dice score using RAdam optimizer. Unfortenatly it didn't give any significant accuracy improvements. In comparison to NAdam, RAdam doesn't tend so strong to overfitting."
]
},
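{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the RAdam setup, assuming the third-party `keras-rectified-adam` package; `model`, `binary_crossentropy`, and `dice_coef` are the ones defined in the training cells further below."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sketch: swapping NAdam for RAdam, assuming the third-party\n",
"# keras-rectified-adam package (pip install keras-rectified-adam).\n",
"# model, binary_crossentropy and dice_coef are defined in the cells below.\n",
"from keras_radam import RAdam\n",
"\n",
"model.compile(\n",
"    optimizer=RAdam(),  # default hyperparameters; tune as needed\n",
"    loss=binary_crossentropy,\n",
"    metrics=[dice_coef],\n",
")"
],
"execution_count": 0,
"outputs": []
},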
{
"cell_type": "markdown",
"metadata": {
"id": "_wmv5dy4mDvi",
"colab_type": "text"
},
"source": [
"##Postprocessing\n",
"\n",
"Unet or Linknet give a matrix of probabilistic masks. For each value, we have a value from 0 to 1. So we can set a threshold to which the probability value we will have a cloud. Then we filter the masks by their size. If the mask is too small, it will be ignored. As we can see in the table below, the final results are very sensitive to any thresholds changes in postprocessing.\n",
"\n",
"For each cloud type, we set own mask and label thresholds\n",
"\n",
"L | Fish | Flower | Gravel | Sugar | Kaggle-Score\n",
"--- |--- | --- | --- | --- | ---\n",
"Label Threshold | 0.5 | 0.5 | 0.5 | 0.35 | 0.6504\n",
"Mask Threshold | 25000 | 20000 | 22500 | 15000 |\n",
"Label Threshold | 0.6 | 0.6 | 0.6 | 0.6 | 0.6429\n",
"Mask Threshold | 20000 | 20000 | 22500 | 10000 |\n",
"Label Threshold | 0.5 | 0.5 | 0.5 | 0.5 | 0.6410\n",
"Mask Threshold | 15000 | 15000 | 15000 | 15000 |\n",
"\n",
"This technique allowed to improve the model by 1%. The Kaggle-Score now is 65.04%"
]
},
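{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the per-class thresholding described above. The dictionaries reuse the best values from the first table row (Kaggle score 0.6504); the helper name `filter_prediction` is illustrative, not from the original pipeline."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sketch of the per-cloud-type postprocessing described above. The values\n",
"# reuse the best row of the table; filter_prediction is an illustrative name.\n",
"import numpy as np\n",
"\n",
"LABEL_THRESHOLDS = {'Fish': 0.5, 'Flower': 0.5, 'Gravel': 0.5, 'Sugar': 0.35}\n",
"MASK_THRESHOLDS = {'Fish': 25000, 'Flower': 20000, 'Gravel': 22500, 'Sugar': 15000}\n",
"\n",
"def filter_prediction(prob_mask, cloud_type):\n",
"    # binarise with the per-class label threshold ...\n",
"    binary = (prob_mask > LABEL_THRESHOLDS[cloud_type]).astype(np.uint8)\n",
"    # ... and ignore the mask entirely if it is too small\n",
"    if binary.sum() < MASK_THRESHOLDS[cloud_type]:\n",
"        return None\n",
"    return binary"
],
"execution_count": 0,
"outputs": []
},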
{
"cell_type": "markdown",
"metadata": {
"id": "b24dXwlyoqz1",
"colab_type": "text"
},
"source": [
"## Training von FPN-Model @with Marie-Sophie\n",
"\n",
"Also we tried to train Feature Pyramid Network model (FPN) with the following parameters:\n",
"\n",
"Netz | Backbone | Optimizer | Batch size | Epochen | Train size | Val size | Dice-Coef Train | Dice-Coef Val | Kaggle-Score\n",
"--- | --- | --- | --- | --- | --- | --- | --- | --- | --- \n",
"FPN | efficientnetb4 | NAdam | 15 | 20 | 99% | 1% | | | 0.6387 |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vIZJ9V_GidzW",
"colab_type": "text"
},
"source": [
"## Results\n",
"\n",
"The summary of all model, trained last week\n",
"\n",
"Netz | Backbone | Optimizer | Batch size | Epochen | Train size | Val size | Dice-Coef Train | Dice-Coef Val\n",
"--- | --- | --- | --- | --- | --- | --- | --- | --- \n",
"Linknet | efficientnetb2 | RAdam | 10 | 20 | 99% | 1% | 0.5582 | 0.5589\n",
"FPN | efficientnetb2 | RAdam | 5 | 20 | 99% | 1% | 0.5597 | 0.5597\n",
"FPN | efficientnetb3 | Nadam | 10 | 20 | 99% | 1% | 0.7066 | 0.5671\n",
"FPN | efficientnetb4 | NAdam | 15 | 20 | 99% | 1% | |\n",
"Linknet | resnet34 | NAdam | 15 | 20 | 99% | 1% | 0.5609 | 0.5592\n",
"Linknet | efficientnetb2 | NAdam | 15 | 20 | 99% | 1% | 0.5609 | 0.5856\n",
"Linknet | efficientnetb3 | NAdam | 10 | 20 | 99% | 1% | |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZzV4hPEg0g3x",
"colab_type": "text"
},
"source": [
"## Todo for next day\n",
"\n",
"1) Train the model with gamma correction\n",
"\n",
"2) Use Unet++"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RzjTCnS-enyT",
"colab_type": "text"
},
"source": [
"# Literature\n",
"\n",
"[1] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. \"U-net: Convolutional networks for biomedical image segmentation.\" International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.\n",
"\n",
"[2] Chaurasia, Abhishek, and Eugenio Culurciello. \"Linknet: Exploiting encoder representations for efficient semantic segmentation.\" 2017 IEEE Visual Communications and Image Processing (VCIP). IEEE, 2017.\n",
"\n",
"[3] Zhao, Hengshuang, et al. \"Pyramid scene parsing network.\" Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.\n",
"\n",
"[4] He, Kaiming, et al. \"Deep residual learning for image recognition.\" Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.\n",
"\n",
"[5] Tan, Mingxing, and Quoc V. Le. \"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.\" arXiv preprint arXiv:1905.11946 (2019).\n",
"\n",
"[6] Vuola, Aarno Oskar, Saad Ullah Akram, and Juho Kannala. \"Mask-RCNN and U-net Ensembled for Nuclei Segmentation.\" arXiv preprint arXiv:1901.10170 (2019).\n",
"\n",
"[7] Microsoft Developer Blog, von 5 Juli 2018, https://www.microsoft.com/developerblog/2018/07/05/satellite-images-segmentation-sustainable-farming/, abgerufen am 28.10.2019\n",
"\n",
"[8] Liu, Liyuan, et al. \"On the variance of the adaptive learning rate and beyond.\" arXiv preprint arXiv:1908.03265 (2019).\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "qB9paK51dK67",
"colab_type": "code",
"colab": {}
},
"source": [
"!pip uninstall keras -y\n",
"!pip install tensorflow-gpu==1.15.0 --quiet\n",
"!pip install keras==2.2.5 --quiet\n",
"!pip install segmentation-models --quiet\n",
"!pip install pandas --quiet\n",
"!pip install albumentations --quiet\n",
"!pip install numpy --quiet\n",
"!pip install pathlib --quiet\n",
"!pip install opencv-python --quiet\n",
"!pip install scikit-learn"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "AB-z0WH4L7Is",
"colab": {}
},
"source": [
"from google.colab import drive # comment if not using google drive\n",
"drive.mount('/content/drive') # comment if not using google drive\n",
"#import matplotlib.pyplot as plt\n",
"#import matplotlib.image as mpimg\n",
"import segmentation_models as sm\n",
"import pandas as pd \n",
"import numpy as np\n",
"#from PIL import Image\n",
"import keras\n",
"from keras.optimizers import Adam, Nadam\n",
"from keras.losses import binary_crossentropy\n",
"from pathlib import Path\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from keras import backend as K\n",
"import cv2\n",
"from keras.callbacks import Callback, ModelCheckpoint\n",
"import albumentations as albu\n",
"from skimage.exposure import adjust_gamma"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "O9cIBfdvf_tH",
"colab_type": "code",
"outputId": "2b4ea68a-b90d-491a-b54f-37e9a42fc108",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 217
}
},
"source": [
"path=Path('/content/drive/My Drive/understanding_cloud_organization/') #change path\n",
"### Reading files\n",
"train=pd.read_csv(path/ 'train.csv')\n",
"\n",
"train['ImageId']=train['Image_Label'].apply(lambda x : x.split('_')[0])\n",
"train['cat']=train['Image_Label'].apply(lambda x : x.split('_')[1])"
],
"execution_count": 0,
"outputs": [
{
"output_type": "error",
"ename": "NameError",
"evalue": "ignored",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-fe5834ac648b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpath\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mPath\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'/content/drive/My Drive/understanding_cloud_organization/'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#change path\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;31m### Reading files\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mlinknet_df\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m/\u001b[0m \u001b[0;34m'submission_linknet.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'Path' is not defined"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "rlPD9w8p1bAk",
"colab_type": "code",
"colab": {}
},
"source": [
"# create column \"has_mask\" by checking if \"EncodedPixels\" column has values\n",
"train['has_mask']= ~pd.isna(train['EncodedPixels'])\n",
"train['missing']= pd.isna(train['EncodedPixels'])\n",
"train_nan=train.groupby('ImageId').agg('sum')\n",
"\n",
"mask_count_df=pd.DataFrame(train_nan)\n",
"#mask_count_df = mask_count_df.iloc[0:1500]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "_HXA5J_02owj",
"colab_type": "code",
"colab": {}
},
"source": [
"mask_count_df = mask_count_df.groupby('ImageId').agg(np.sum).reset_index()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Fap6I7r325i7",
"colab_type": "code",
"colab": {}
},
"source": [
"### @todo increase the size of batch_size\n",
"BATCH_SIZE = 15 # try to increase batch size to the maximum"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "uLmBftNfVfSA",
"colab_type": "code",
"colab": {}
},
"source": [
"# some utility functions, e.g. for image resizing or mask building\n",
"# was taken from https://www.kaggle.com/paulorzp/rle-functions-run-lenght-encode-decode\n",
"def np_resize(img, input_shape):\n",
" \"\"\"\n",
" Reshape a numpy array, which is input_shape=(height, width), \n",
" as opposed to input_shape=(width, height) for cv2\n",
" \"\"\"\n",
" height, width = input_shape\n",
" return cv2.resize(img, (width, height))\n",
" \n",
"def mask2rle(img):\n",
" '''\n",
" img: numpy array, 1 - mask, 0 - background\n",
" Returns run length as string formated\n",
" '''\n",
" pixels= img.T.flatten()\n",
" pixels = np.concatenate([[0], pixels, [0]])\n",
" runs = np.where(pixels[1:] != pixels[:-1])[0] + 1\n",
" runs[1::2] -= runs[::2]\n",
" return ' '.join(str(x) for x in runs)\n",
"\n",
"def rle2mask(rle, input_shape):\n",
" width, height = input_shape[:2]\n",
" \n",
" mask= np.zeros( width*height ).astype(np.uint8)\n",
" \n",
" array = np.asarray([int(x) for x in rle.split()])\n",
" starts = array[0::2]\n",
" lengths = array[1::2]\n",
"\n",
" current_position = 0\n",
" for index, start in enumerate(starts):\n",
" mask[int(start):int(start+lengths[index])] = 1\n",
" current_position += lengths[index]\n",
" \n",
" return mask.reshape(height, width).T\n",
"\n",
"def build_masks(rles, input_shape, reshape=None):\n",
" depth = len(rles)\n",
" if reshape is None:\n",
" masks = np.zeros((*input_shape, depth))\n",
" else:\n",
" masks = np.zeros((*reshape, depth))\n",
" \n",
" for i, rle in enumerate(rles):\n",
" if type(rle) is str:\n",
" if reshape is None:\n",
" masks[:, :, i] = rle2mask(rle, input_shape)\n",
" else:\n",
" mask = rle2mask(rle, input_shape)\n",
" reshaped_mask = np_resize(mask, reshape)\n",
" masks[:, :, i] = reshaped_mask\n",
" \n",
" return masks\n",
"\n",
"def build_rles(masks, reshape=None):\n",
" width, height, depth = masks.shape\n",
" \n",
" rles = []\n",
" \n",
" for i in range(depth):\n",
" mask = masks[:, :, i]\n",
" \n",
" if reshape:\n",
" mask = mask.astype(np.float32)\n",
" mask = np_resize(mask, reshape).astype(np.int64)\n",
" \n",
" rle = mask2rle(mask)\n",
" rles.append(rle)\n",
" \n",
" return rles"
],
"execution_count": 0,
"outputs": []
},
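{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick round-trip check of the RLE helpers above on a tiny toy mask (illustrative only; real masks are 1400x2100)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Round-trip check of mask2rle / rle2mask on a tiny toy mask.\n",
"# Real masks are 1400x2100; 4x4 keeps the runs easy to inspect.\n",
"import numpy as np\n",
"\n",
"toy = np.zeros((4, 4), dtype=np.uint8)\n",
"toy[1:3, 1:3] = 1                    # a 2x2 'cloud' in the middle\n",
"rle = mask2rle(toy)                  # '6 2 10 2' (column-major, 1-based starts)\n",
"restored = rle2mask(rle, (4, 4))\n",
"assert (restored == toy).all()"
],
"execution_count": 0,
"outputs": []
},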
{
"cell_type": "code",
"metadata": {
"id": "C0GZ09tZrfJ2",
"colab_type": "code",
"colab": {}
},
"source": [
"def dice_coef(y_true, y_pred, smooth=1):\n",
" y_true_f = K.flatten(y_true)\n",
" y_pred_f = K.flatten(y_pred)\n",
" intersection = K.sum(y_true_f * y_pred_f)\n",
" return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)\n",
"\n",
"def dice_loss(y_true, y_pred):\n",
" smooth = 1.\n",
" y_true_f = K.flatten(y_true)\n",
" y_pred_f = K.flatten(y_pred)\n",
" intersection = y_true_f * y_pred_f\n",
" score = (2. * K.sum(intersection) + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)\n",
" return 1. - score\n",
"\n",
"def bce_dice_loss(y_true, y_pred):\n",
" return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "bjAHzEZVIw8w",
"colab_type": "code",
"colab": {}
},
"source": [
"# Custom DataGenerator\n",
"# https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly\n",
"class DataGenerator(keras.utils.Sequence):\n",
" 'Generates data for Keras'\n",
" def __init__(self, list_IDs, df, target_df=None, mode='fit',\n",
" base_path=path/'train_images',\n",
" batch_size=BATCH_SIZE, dim=(1400, 2100), n_channels=3, reshape=None, gamma=None,\n",
" augment=False, n_classes=4, random_state=2019, shuffle=True):\n",
" self.dim = dim\n",
" self.batch_size = batch_size\n",
" self.df = df\n",
" self.mode = mode\n",
" self.base_path = base_path\n",
" self.target_df = target_df\n",
" self.list_IDs = list_IDs\n",
" self.reshape = reshape\n",
" self.gamma = gamma\n",
" self.n_channels = n_channels\n",
" self.augment = augment\n",
" self.n_classes = n_classes\n",
" self.shuffle = shuffle\n",
" self.random_state = random_state\n",
" \n",
" self.on_epoch_end()\n",
" np.random.seed(self.random_state)\n",
"\n",
" def __len__(self):\n",
" 'Denotes the number of batches per epoch'\n",
" return int(np.floor(len(self.list_IDs) / self.batch_size))\n",
"\n",
" def __getitem__(self, index):\n",
" 'Generate one batch of data'\n",
" # Generate indexes of the batch\n",
" indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]\n",
"\n",
" # Find list of IDs\n",
" list_IDs_batch = [self.list_IDs[k] for k in indexes]\n",
" \n",
" X = self.__generate_X(list_IDs_batch)\n",
" \n",
" if self.mode == 'fit':\n",
" y = self.__generate_y(list_IDs_batch)\n",
" \n",
" if self.augment:\n",
" X, y = self.__augment_batch(X, y)\n",
" \n",
" return X, y\n",
" \n",
" elif self.mode == 'predict':\n",
" return X\n",
"\n",
" else:\n",
" raise AttributeError('The mode parameter should be set to \"fit\" or \"predict\".')\n",
" \n",
" def on_epoch_end(self):\n",
" 'Updates indexes after each epoch'\n",
" self.indexes = np.arange(len(self.list_IDs))\n",
" if self.shuffle == True:\n",
" np.random.seed(self.random_state)\n",
" np.random.shuffle(self.indexes)\n",
" \n",
" def __generate_X(self, list_IDs_batch):\n",
" 'Generates data containing batch_size samples'\n",
" # Initialization\n",
" if self.reshape is None:\n",
" X = np.empty((self.batch_size, *self.dim, self.n_channels))\n",
" else:\n",
" X = np.empty((self.batch_size, *self.reshape, self.n_channels))\n",
" \n",
" \n",
" # Generate data\n",
" for i, ID in enumerate(list_IDs_batch):\n",
" im_name = self.df['ImageId'].iloc[ID]\n",
" img_path = f\"{self.base_path}/{im_name}\"\n",
" img = self.__load_rgb(img_path)\n",
" \n",
" if self.reshape is not None:\n",
" img = np_resize(img, self.reshape)\n",
"\n",
" if self.gamma is not None:\n",
" img = adjust_gamma(img, gamma=self.gamma)\n",
" \n",
" # Store samples\n",
" X[i,] = img\n",
"\n",
" return X\n",
" \n",
" def __generate_y(self, list_IDs_batch):\n",
" if self.reshape is None:\n",
" y = np.empty((self.batch_size, *self.dim, self.n_classes), dtype=int)\n",
" else:\n",
" y = np.empty((self.batch_size, *self.reshape, self.n_classes), dtype=int)\n",
" \n",
" for i, ID in enumerate(list_IDs_batch):\n",
" im_name = self.df['ImageId'].iloc[ID]\n",
" image_df = self.target_df[self.target_df['ImageId'] == im_name]\n",
" \n",
" rles = image_df['EncodedPixels'].values\n",
" \n",
" if self.reshape is not None:\n",
" masks = build_masks(rles, input_shape=self.dim, reshape=self.reshape)\n",
" else:\n",
" masks = build_masks(rles, input_shape=self.dim)\n",
" \n",
" y[i, ] = masks\n",
"\n",
" return y\n",
" \n",
" def __load_grayscale(self, img_path):\n",
" img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)\n",
" img = img.astype(np.float32) / 255.\n",
" img = np.expand_dims(img, axis=-1)\n",
"\n",
" return img\n",
" \n",
" def __load_rgb(self, img_path):\n",
" img = cv2.imread(img_path)\n",
" img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
" img = img.astype(np.float32) / 255.\n",
"\n",
" return img\n",
" \n",
" def __random_transform(self, img, masks):\n",
" composition = albu.Compose([albu.HorizontalFlip(p=0.5), \n",
" albu.ShiftScaleRotate(scale_limit=0.5, rotate_limit=0, shift_limit=0.1, p=0.5, border_mode=0),\n",
" albu.GridDistortion(p=0.5),\n",
" albu.OpticalDistortion(p=0.5, distort_limit=2, shift_limit=0.5),\n",
" albu.Resize((320, 480)))]\n",
" )\n",
" \n",
" composed = composition(image=img, mask=masks)\n",
" aug_img = composed['image']\n",
" aug_masks = composed['mask']\n",
" \n",
" return aug_img, aug_masks\n",
" \n",
" def __augment_batch(self, img_batch, masks_batch):\n",
" for i in range(img_batch.shape[0]):\n",
" img_batch[i, ], masks_batch[i, ] = self.__random_transform(\n",
" img_batch[i, ], masks_batch[i, ])\n",
" \n",
" return img_batch, masks_batch\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "AObOSOs1F-X_",
"colab_type": "code",
"colab": {}
},
"source": [
"train_idx, val_idx = train_test_split(\n",
" mask_count_df.index, random_state=2019, test_size=0.2\n",
")\n",
"\n",
"train_generator = DataGenerator(\n",
" train_idx, \n",
" df=mask_count_df,\n",
" target_df=train,\n",
" batch_size=BATCH_SIZE,\n",
" reshape=(320, 480),\n",
" reshape=None,\n",
" gamma=None,\n",
" augment=True,\n",
" n_channels=3,\n",
" n_classes=4\n",
")\n",
"\n",
"val_generator = DataGenerator(\n",
" val_idx, \n",
" df=mask_count_df,\n",
" target_df=train,\n",
" batch_size=BATCH_SIZE, \n",
" reshape=(320, 480),\n",
" gamma=None,\n",
" augment=False,\n",
" n_channels=3,\n",
" n_classes=4\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "u92TQSSVIHY0",
"colab_type": "code",
"colab": {}
},
"source": [
"# List of all available nets and backbones https://github.com/qubvel/segmentation_models#models-and-backbones\n",
"\n",
"# use efficientnetbX, X = higher -> better (max. efficientnetb7)\n",
"BACKBONE = 'efficientnetb2'\n",
"\n",
"# use default preprocessing from segmentation_models package\n",
"preprocess_input = sm.get_preprocessing(BACKBONE)\n",
"# use FPN as FPN is better than Unet and Linknet\n",
"model = sm.FPN(BACKBONE, encoder_weights='imagenet', activation='sigmoid', classes=4, input_shape=(320, 480, 3))\n",
"\n",
"\n",
"# use Dice-Coef-Metric as a metric (the higher the better)\n",
"model.compile(\n",
" optimizer=Nadam(lr=0.0002),\n",
" loss=binary_crossentropy, \n",
" metrics=[dice_coef],\n",
")"
],
"execution_count": 0,
"outputs": []
},
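{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative compile using the `bce_dice_loss` helper defined above (it is not used in the reported runs); a sketch with the same optimizer settings."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Alternative: combine binary crossentropy with dice loss via the\n",
"# bce_dice_loss helper defined above (not used in the reported runs).\n",
"model.compile(\n",
"    optimizer=Nadam(lr=0.0002),\n",
"    loss=bce_dice_loss,\n",
"    metrics=[dice_coef],\n",
")"
],
"execution_count": 0,
"outputs": []
},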
{
"cell_type": "code",
"metadata": {
"id": "u2HLfmt2K4hC",
"colab_type": "code",
"colab": {}
},
"source": [
"#filepath = \"saved-model-{epoch:02d}-{val_acc:.2f}.h5\"\n",
"checkpoint = ModelCheckpoint('model.h5', save_best_only=True)\n",
"\n",
"model.fit_generator(\n",
" generator=train_generator,\n",
" validation_data = val_generator,\n",
" epochs=20,\n",
" callbacks=[checkpoint]\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "FAVE54S5nhL5",
"colab_type": "text"
},
"source": [
"# Result's summary\n",
"Here we will compare the results of models of different neuronal network architectures for the whole time\n",
"\n",
"Netz | Backbone | Optimizer | Batch size | Epochen | Train size | Val size | Dice-Coef Train | Dice-Coef Val| Kaggle-Score\n",
"--- | --- | --- | --- | --- | --- | --- | --- | --- | ---\n",
"Unet | resnet34 | Nadam | 20 | 20 | 1000 | 500 | 0.6251 | 0.5188\n",
"Linknet | resnet34 |Nadam | 20 | 20 | 1000 | 500 | 0.7047 | 0.5183\n",
"PSPNet | resnet34 | Nadam | 20 | 20 | 1000 | 500 | 0.5574 | 0.5068\n",
"Linknet | efficientnetb4 | Nadam | 5 | 20 | 1000 | 500 | 0.6581 | 0.5390\n",
"Linknet | efficientnetb4 | Nadam | 6 | 40 | 1000 | 500 | 0.6581 | 0.5388\n",
"Linknet | efficientnetb4 | Adam | 5 | 20 | 1000 | 500 | 0.7469 | 0.5118\n",
"Linknet | efficientnetb2 | Nadam | 10 | 20 | 90% | 10% | 0.8530 | 0.5728\n",
"Linknet | efficientnetb2 | Nadam | 10 | 20 | 90% | 10% | 0.8530 | 0.5759 | 0.6504\n",
"Unet | resnet18 | Nadam | 25 | 20 | 90% | 10% | 0.8530 | 0.5619\n",
"Linknet | efficientnetb3 | Nadam | 15 | 20 | 99% | 1% | 0.8530 | 0.5619\n",
"Unet | resnet18 | Nadam | 6 | 20 | 99% | 1% | 0.5263 | 0.5267\n",
"Linknet | efficientnetb2 | RAdam | 10 | 20 | 99% | 1% | 0.5582 | 0.5589\n",
"FPN | efficientnetb2 | RAdam | 5 | 20 | 99% | 1% | 0.5597 | 0.5597\n",
"FPN | efficientnetb3 | Nadam | 10 | 20 | 99% | 1% | 0.7066 | 0.5671\n",
"FPN | efficientnetb4 | NAdam | 15 | 20 | 99% | 1% | ||0.6387\n",
"Linknet | resnet34 | NAdam | 15 | 20 | 99% | 1% | 0.5609 | 0.5592\n",
"\n",
"Unfortunately because of Google Colaboratory restrictions, it was unavailable to test Linknet and Unit net with batch size higher than 5 images.\n",
"\n",
"Even better results can be achieved by using a larger batch size, more accurate model like efficientnetb7 [5] (for now is efficientnetb7 state-of-the-art model) and more sophisticated augmentation techniques."
]
}
]
}