{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "silvacarbon-gee-tensorflow.ipynb",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KMarkert/32200f444ac8032e3e429cc8d6e3bd6d/silvacarbon-gee-tensorflow.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "avyV6lS_BZC1",
"colab_type": "text"
},
"source": [
"# Setting up our environment\n",
"\n",
"Within this notebook we will use a combination of packages within Python for our workflow. To use these packages we either have to import the package if it is already installed or install the package then import. Additionally, *you* will need to be authorized to run certain portions of the workflow so we will do some authenticating."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BXlJJ6gMEOlm",
"colab_type": "text"
},
"source": [
"The following blocks will either import Python packages that will be used in this notebook, install and import packages, or prompt authentication."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Cm4sV9JQVxTv",
"colab_type": "code",
"colab": {}
},
"source": [
"# import basin installed packages\n",
"import math\n",
"import json\n",
"import subprocess\n",
"import numpy as np\n",
"from pprint import pprint\n",
"from google.colab import auth\n",
"from google.colab import drive\n",
"\n",
"# interactive visualization inline\n",
"%pylab inline"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "3UQEUy-BEVwJ",
"colab_type": "code",
"colab": {}
},
"source": [
"# Authenticate with our Google account so Colab can access data\n",
"auth.authenticate_user()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "wx0a9VlMWAM9",
"colab_type": "code",
"colab": {}
},
"source": [
"# check if the earth engine python api is install\n",
"# if not, install EarthEngine API and authenticate\n",
"try:\n",
" import ee\n",
" ee.Initialize()\n",
"\n",
"except ImportError:\n",
" import getpass\n",
" !pip install earthengine-api --quiet\n",
" !earthengine --no-use_cloud_api authenticate --quiet\n",
"\n",
" authCode = getpass.getpass(\"Authorization code:\")\n",
" \n",
" !earthengine --no-use_cloud_api authenticate --authorization-code=$authCode --quiet\n",
"\n",
" import ee\n",
" ee.Initialize()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "cxC_SyYNYK01",
"colab_type": "code",
"colab": {}
},
"source": [
"# install the *new* TensorFlow 2.0 API\n",
"!pip install tensorflow==2.0.0-beta1 --quiet\n",
"\n",
"# import tf specific modules for building the net\n",
"import tensorflow as tf\n",
"from tensorflow.python.keras import models\n",
"from tensorflow.python.keras import layers\n",
"from tensorflow.python.keras import backend as K\n",
"from tensorflow.python.keras.layers import Lambda\n",
"\n",
"print('\\nUsing TensorFlow version {}'.format(tf.__version__))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "9sHlXSLLYRpk",
"colab_type": "code",
"colab": {}
},
"source": [
"# import visualizaition pacakge and define a helper for later on\n",
"import folium\n",
"from IPython.display import display\n",
"\n",
"# helper class to create an interactive map with ee.Objects\n",
"# somewhat mimics the EE code editor's Map (poor man's version)\n",
"class eeMap:\n",
" def __init__(self,center=[0, 0],zoom=3,width=1200,height=500,basemap='cartodbpositron',**kwargs):\n",
" self.fig = folium.Figure(width=width, height=height)\n",
" self._map = folium.Map(location=center,zoom_start=zoom,tiles=basemap,**kwargs)\n",
" return\n",
"\n",
" def show(self):\n",
" self._map.add_child(folium.LayerControl())\n",
" self.fig.add_child(self._map)\n",
" return display(self.fig)\n",
"\n",
" def addLayer(self,eeObj,vizParams=None,name=None):\n",
" map_id = eeObj.getMapId(vizParams)\n",
" tile_url_template = \"https://earthengine.googleapis.com/map/{mapid}/{{z}}/{{x}}/{{y}}?token={token}\"\n",
" mapurl = tile_url_template.format(**map_id)\n",
" self._map.add_child(folium.WmsTileLayer(mapurl,name=name,attr='Google Earth Engine'))\n",
"\n",
" return"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7xn8OMP9ISRb",
"colab_type": "text"
},
"source": [
"# Working with Earth Engine to get data\n",
"\n",
"Now that we have our environment set up, we can start calling Earth Engine data to process and export what we need for our TensorFlow model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QeLO8Ip7IhtK",
"colab_type": "text"
},
"source": [
"First thing, we need a region to process."
]
},
{
"cell_type": "code",
"metadata": {
"id": "WORKg-fSa4n2",
"colab_type": "code",
"colab": {}
},
"source": [
"seed = 7 # random seed for replicability throughout the workflow\n",
"\n",
"# predefined region\n",
"region = ee.Geometry.Rectangle([-75.0, 0.0, -70.0, 5.0])\n",
"\n",
"empty = ee.Image().byte(); \n",
"bbox = empty.paint(\n",
" featureCollection= ee.FeatureCollection([region]),\n",
" color= 1,\n",
" width= 2\n",
")\n",
"\n",
"# visualize our region\n",
"Map = eeMap(center=[2.5,-72.5],zoom=6)\n",
"Map.addLayer(bbox,{},'Study Area')\n",
"Map.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Sn7rIeaScBI5",
"colab_type": "text"
},
"source": [
"Next, we are going to get the imagery to process. There is an image asset already created so we will not have to wait too long to process it. This is a median composite from BRDF corrected data for 2017-2019. This is what we will use as features for our TF model.\n",
"\n",
"If you want to create your own image composite from Landsat 8 surface reflectance data, check out [this script](https://code.earthengine.google.com/6405ede6762cfe03171ad9c5a5fb4fd7)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "3SbpG16jZkoX",
"colab_type": "code",
"colab": {}
},
"source": [
"img = ee.Image('users/kelmarkert/public/silvacarbon-tf-composite')\n",
"\n",
"# play around with the visualization parameters to explore the data\n",
"display_bands = ['swir2','nir','green']\n",
"vmin = 50\n",
"vmax = 5500\n",
"vgamma = 1.5\n",
"\n",
"Map = eeMap(center=[2.5,-72.5],zoom=6)\n",
"Map.addLayer(img,{'bands':display_bands,'min':vmin,'max':vmax,'gamma':vgamma},'Composite Image')\n",
"Map.addLayer(bbox,{},'AoI')\n",
"Map.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "FRVgds1ckU2x",
"colab_type": "text"
},
"source": [
"Now we want to select the features (bands) that will be used to train and ultimately predict within our model. We do this by selecting the bands that we want from the image."
]
},
{
"cell_type": "code",
"metadata": {
"id": "h6yzb_0oiPAq",
"colab_type": "code",
"colab": {}
},
"source": [
"# band names of the predictors we are going to use\n",
"# we can change these (try different ones!!!)\n",
"predictors = ['red','nir','swir1','swir2','ndbi','ndwi','evi2','mndwi']\n",
"label = ['label']\n",
"\n",
"# cast the predictor list as an ee.List\n",
"eePredictors = ee.List(predictors)\n",
"\n",
"featureImg = img.select(eePredictors)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "O0VaJZMVi6cR",
"colab_type": "text"
},
"source": [
"Usually with ML we want our [data to be normalized](https://medium.com/@urvashilluniya/why-data-normalization-is-necessary-for-machine-learning-models-681b65a05029) some how if the features have different ranges (such as surface reflectance [0-1] vs index [-1-1]). Since that is our case for this example, we are going to do a minimum/maximum normalization so all of our data is from 0-1. If you are feeling adventurous, you should try to normalize your data by standard deviation!!!\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "srYP6Wm9fQvo",
"colab_type": "code",
"colab": {}
},
"source": [
"# function to scale an image between the min and max value\n",
"def minMaxScalar(image,region,bandList,scale=1000):\n",
" def applyScaling(band):\n",
" band = ee.String(band)\n",
" imin = ee.Number(minMax.get(band.cat('_min')))\n",
" imax = ee.Number(minMax.get(band.cat('_max')))\n",
" return image.select(band).unitScale(imin,imax)\n",
"\n",
" # force cast of bandList\n",
" bandList = ee.List(bandList)\n",
"\n",
" # get the min/max values for each band over the region\n",
" minMax = image.reduceRegion(\n",
" reducer= ee.Reducer.minMax(),\n",
" geometry= region,\n",
" scale= scale,\n",
" bestEffort= True\n",
" )\n",
"\n",
" # map over each band and scale between the min and max\n",
" # returns an image with the original predictor names\n",
" normalized = ee.ImageCollection(bandList.map(applyScaling)).toBands()\\\n",
" .select(ee.List.sequence(0,bandList.size().subtract(1)),bandList)\n",
"\n",
" return normalized\n",
"\n",
"# apply the scaling on an image for a region and specific bands\n",
"normalized = minMaxScalar(img,region,eePredictors)"
],
"execution_count": null,
"outputs": []
},
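{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to try the standard-deviation normalization mentioned above, here is a minimal sketch. `stdDevScaler` is a hypothetical helper mirroring `minMaxScalar`; it standardizes each band to zero mean and unit variance over the region."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# a sketch of z-score normalization as an alternative to min/max scaling\n",
"def stdDevScaler(image, region, bandList, scale=1000):\n",
"    def applyScaling(band):\n",
"        band = ee.String(band)\n",
"        mu = ee.Number(stats.get(band.cat('_mean')))\n",
"        sigma = ee.Number(stats.get(band.cat('_stdDev')))\n",
"        return image.select(band).subtract(mu).divide(sigma)\n",
"\n",
"    # force cast of bandList\n",
"    bandList = ee.List(bandList)\n",
"\n",
"    # get the mean and standard deviation for each band over the region\n",
"    stats = image.reduceRegion(\n",
"        reducer=ee.Reducer.mean().combine(ee.Reducer.stdDev(), sharedInputs=True),\n",
"        geometry=region,\n",
"        scale=scale,\n",
"        bestEffort=True\n",
"    )\n",
"\n",
"    # map over each band and standardize it, keeping the original band names\n",
"    standardized = ee.ImageCollection(bandList.map(applyScaling)).toBands()\\\n",
"        .select(ee.List.sequence(0, bandList.size().subtract(1)), bandList)\n",
"\n",
"    return standardized\n",
"\n",
"# e.g. standardized = stdDevScaler(img, region, eePredictors)"
],
"execution_count": null,
"outputs": []
},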
{
"cell_type": "markdown",
"metadata": {
"id": "m83cLb1SUqb8",
"colab_type": "text"
},
"source": [
"We have our features for training the model, now we need label data. Because deep learning models need a lot of training data, we are going to use the Hansen forest cover dataset to create a \"truth\" dataset.\n",
"\n",
"For semantic segmentation, we need an image of segmentations to classify hence we use the image here versus the traditional forest/non-forest."
]
},
{
"cell_type": "code",
"metadata": {
"id": "QCj26em5cfzV",
"colab_type": "code",
"colab": {}
},
"source": [
"hansenData = ee.Image('UMD/hansen/global_forest_change_2018_v1_6')\n",
"\n",
"forestThreshold = 90\n",
"\n",
"forest = hansenData.select('treecover2000').gte(forestThreshold)\\\n",
" .And(hansenData.select('loss').eq(0))\n",
"\n",
"# Define a kernel.\n",
"kernel = ee.Kernel.circle(radius= 1);\n",
"\n",
"#Perform an erosion followed by a dilation, display.\n",
"labels = forest.clip(region)\\\n",
" .focal_min(kernel= kernel, iterations= 2)\\\n",
" .focal_max(kernel= kernel, iterations= 2)\\\n",
" .focal_mode()\\\n",
" .rename(label)\n",
"\n",
"Map = eeMap(center=[2.5,-72.5],zoom=6)\n",
"Map.addLayer(labels.updateMask(labels),{'palette':'green'},'Forest Label Image')\n",
"Map.addLayer(bbox,{},'AoI')\n",
"Map.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "242wampYDMo4",
"colab_type": "text"
},
"source": [
"Here we define our window size/block size that we will use in the model for training and prediction. In Earth Engine, we convert an image to neighborhoods."
]
},
{
"cell_type": "code",
"metadata": {
"id": "fyKSTUCOcfHy",
"colab_type": "code",
"colab": {}
},
"source": [
"# Create a 256x256 patch per pixel\n",
"# first we build a kernel\n",
"kernelSize = 64\n",
"list_ = ee.List.repeat(1, kernelSize)\n",
"lists = ee.List.repeat(list_, kernelSize)\n",
"kernel = ee.Kernel.fixed(kernelSize, kernelSize, lists)\n",
"\n",
"# then we convert the feature image with labels to 256x256 neighborhoods\n",
"with_neighborhood = normalized.addBands(labels.unmask(0)).neighborhoodToArray(kernel);"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "WYKlfdH3Dd6G",
"colab_type": "text"
},
"source": [
"We have to specify where our samples for training/testing to export from EE. In this case we use a simple random sample, for a much more robust training/testing dataset you would need to perfom a better sampling."
]
},
{
"cell_type": "code",
"metadata": {
"id": "pY44rcFZYDoc",
"colab_type": "code",
"colab": {}
},
"source": [
"# helper function to convert FeatureCollection \n",
"# of points to a FeatureCollection of boxes\n",
"def pt_to_region(feature):\n",
" # we buffer points by 7680 (=256x30) and return the bounding box\n",
" return feature.buffer(30*kernelSize).bounds();\n",
"\n",
"nPts = 300 # change number of points if desired\n",
"\n",
"# create a feature collection with a random sample\n",
"pts = ee.FeatureCollection.randomPoints(region,nPts,seed)\n",
"\n",
"exampleRegions = pts.map(pt_to_region)\n",
"\n",
"# visualization \n",
"outlines = empty.paint(\n",
" featureCollection= exampleRegions,\n",
" color= 1,\n",
" width= 2\n",
")\n",
"\n",
"Map = eeMap(center=[2.5,-72.5],zoom=6)\n",
"Map.addLayer(pts,{'color':'red'},'Sample Points')\n",
"Map.addLayer(outlines,{},'Example Regions')\n",
"Map.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "M0fKg5HED0AP",
"colab_type": "text"
},
"source": [
"Now we sample the neighborhoods from the image for the center points of our sample."
]
},
{
"cell_type": "code",
"metadata": {
"id": "XFHS0o5ZjlYR",
"colab_type": "code",
"colab": {}
},
"source": [
"train_size = 0.7 # percent of samples to use for training vs testing\n",
"\n",
"# sample the neighborhoods over the sampling points \n",
"samples = with_neighborhood.sampleRegions(\n",
" collection= pts,\n",
" tileScale= 16\n",
").randomColumn('random',seed);\n",
"\n",
"training = samples.filter(ee.Filter.lte('random',train_size))\n",
"testing = samples.filter(ee.Filter.gt('random',train_size))\n",
"\n",
"feature_names = predictors+['label']"
],
"execution_count": null,
"outputs": []
},
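{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, we can count how many samples landed in each split (note that `size().getInfo()` forces the computation on EE, so this may take a moment)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional sanity check: count the samples in each split\n",
"print('training examples:', training.size().getInfo())\n",
"print('testing examples:', testing.size().getInfo())"
],
"execution_count": null,
"outputs": []
},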
{
"cell_type": "markdown",
"metadata": {
"id": "evXJW3ysdgsi",
"colab_type": "text"
},
"source": [
"All of the data resides on Earth Engine right now so we will need to move the data to a Google Cloud Storage bucket to run TensorFlow with it. Here we tee up export tasks and kick it off programmatically.\n",
"\n",
"You have been given read/write permissions to the folder so we can export the data together. ***Important***: change the `myfolder` variable to your name so that the exports and further processing are within \"your\" area. It can get very confusing if we all write to the same directory...also, please don't delete data to play around with your friend"
]
},
{
"cell_type": "code",
"metadata": {
"id": "rMty8SQhdMod",
"colab_type": "code",
"colab": {}
},
"source": [
"# *** change this variable to your name so we keep the storage bucket organized\n",
"myfolder = <YOUR_NAME>\n",
"\n",
"# GCS bucket to write data to, we will create a folder on there specifically for you\n",
"silvacarbonBucket = 'silvacarbon-gee-tf'\n",
"\n",
"# export our data in TFRecords to GCP Storage for TensorFlow!\n",
"trainingExport = ee.batch.Export.table.toCloudStorage(\n",
" collection= training,\n",
" bucket= silvacarbonBucket,\n",
" fileNamePrefix= myfolder+'/silvacarbon_tf_forest_training_',\n",
" fileFormat= 'TFRecord',\n",
" description= 'silvacarbon_tf_forest_training',\n",
" selectors= feature_names\n",
")\n",
"trainingExport.start()\n",
"\n",
"testingExport = ee.batch.Export.table.toCloudStorage(\n",
" collection= testing,\n",
" bucket= silvacarbonBucket,\n",
" fileNamePrefix= myfolder+'/silvacarbon_tf_forest_testing_',\n",
" fileFormat= 'TFRecord',\n",
" description= 'silvacarbon_tf_forest_testing',\n",
" selectors= feature_names\n",
")\n",
"testingExport.start()\n",
"\n",
"# get serializable geometry for image export\n",
"exportRegion = region.bounds().getInfo()['coordinates']\n",
"\n",
"imageExport = ee.batch.Export.image.toCloudStorage(\n",
" image= normalized.toFloat(),\n",
" region= exportRegion,\n",
" bucket = silvacarbonBucket,\n",
" description= 'silvacarbon_tf_forest_image',\n",
" fileNamePrefix= myfolder+'/silvacarbon_tf_forest_image_',\n",
" scale= 30,\n",
" maxPixels= 1e13,\n",
" fileFormat= 'TFRecord',\n",
" formatOptions= {\n",
" 'patchDimensions': [256,256],\n",
" 'kernelSize': [kernelSize//2, kernelSize//2],\n",
" 'compressed': True,\n",
" 'maxFileSize': 104857600\n",
" }\n",
")\n",
"# imageExport.start()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JztYdoQja2o5",
"colab_type": "code",
"colab": {}
},
"source": [
"pprint(ee.batch.Task.list()[:3])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "CpnSi2eni_rc",
"colab_type": "text"
},
"source": [
"![Wait meme](data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxMSEhUSERMVFhUWFxYXFhcYFxcXFxgZFRgfFxsVGBcYHSggGBolHhgaITEiJSkrLi4uFx8zODMtNygtLisBCgoKBQUFDgUFDisZExkrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrK//AABEIAMEBBQMBIgACEQEDEQH/xAAcAAEAAgMBAQEAAAAAAAAAAAAAAQYCBAUHAwj/xABQEAABAwIDAwQLDAcHAwUAAAABAAIDBBEFEiEGEzEHQVFhFBUiMjNxgYKDkbMjQlJTVJKTobHD0dIWVXKUosHiQ0RiY3PC4RckNCU2tNPw/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhEDEQA/ALpg+HU5i1pqY93Nxgj+Nf8A4ehb3a2n+S030EX5V8cF8F583tXrfQa3a2n+S030EX5VxtoZI4TBFT0FJLPUSGONro42NGVpe5zjkOgA+tWJV2dgkxqgZ8TDUzfOyxj7D60HPxbsykhfU1OE4buYxmkyFhflvYloMdidVq0WJz1FO+upcJoH0rN6RmyCVzYSbkN3dgSG6DXyq+bRvFXR4jANcjZIrf4tw2QfW8Lh8hQvg8IPw5vaFBWTjU/YXbLtRh/YuXPa7N5lzZfirXv1LrU9JXuY2QYPhrmvaHNs6MGzhcXvF0La2wwnsTZ6ppx3sbXBvP3G+u3X9khdHaitkhwmCSJ5Y4dhi442LmAjxEaIONQ18YqG0lbhcFLNIHGL3OGSOUM77I8MHdDjlI4W6V83Syz1M8FDhlA8U5Y2R0rY2EukbmGUCM6adK6nKybOwsjj2xhF+pwcCPKsuTz/AM/Fv9Wn9m5BWcIxKpq2zvgwjDw2ne+KUPLM2eMXda0XDXRZYHjTsRYHYbhNI6zRvZJmRsiZIQCYmkR3kIBvfTiOlX+nwfscYg4d7O50w6iYGscPnMJ8qqGwd4dl3vjJY8U1bIHN0cHDeFrrjnFh6gg5+OYpPQtaKvBqO8jmsjkjEZhzuNsshMeZnTz3WWM1VXSOhbPhGGk1ErYIshZ4R/e5rxd6u5t8++z+8fdzmx00lzqcwew3ueddvHaTsl2GSH3tQyXyiB7x9YQUjGqurpDC2fCMOJqJWwRZCzwj+9zXi71bL6iWnqIYK3DKBon3mQxCN5BiYZDdpjGlhbyq4bZYdvpMPPHd1jH/ADGPd/tXC2vAlxqij54qWpl+lIiH2FBX2YlUuoziLcJw402Qy8WbzI3jpurZtCvrJU1baXs44Rh24EW+IuzebvLm+Ktmt1Lo7Iw59nqqmbxjbiEA59c0hb9TgrZjlFfCZoG6E0cjAPHCWoPPanEKplIK92E4aKdzY3gXZntMWtYCN1xu9t11m4XiJAIwnCtdeLf/AK19eVAiHBYYm8C+jj8jXNd/sVj2zpa6Sja3DZBHPmYcxIAyAHMNQermQUKPaFr4YsmGUfZElU+kyFkYiEkfF2fd3IPNos6yqq4amCjkwjDd9UiQw2LMlom535vcujguJhodHh9NM65mjxQOmcSCDIZjG9zbAWB0XrOJ4aH4lRzEeCiqrHrdu2/Y4oKDW1dXDUQUcmEYdvqnOYS0syDdNzPz+5dHWvnV4lUxVcVA/CcONRO1zo7FmQNYHOdm9yvwaVfcaw7eYlQTfFNqj86MM/3qsBom2qJ+TUX8TvsOWZBp4zLVUURqazCsOEDCwSGPIXgPeGXAMeti4aL44dXVNTTGtp8Kw40x3jml+QSFsTnNJI3dge5KuG2bhV4biUQF8gkYOt0UbZB/EuLyY/8Attn+lV+1kQa1HR4hLGyVmE4Xle1r23Lb2cLi/ufGxUb2WCpp4K7DKBjahz2sdEI3kOY0v1aYxppZfTkPxeomimjqJnSiJtOI7hoytfCH5RYa2vbXoWrhdNNWYhPUVFS9zKKsqI4IcrA1vFly4C50PDqGqC19raf5LTfQRflTtbT/ACWm+gi/KtlEGt2tp/ktN9BF+VO1tP8AJab6CL8q2UQeb8q1HC3sbJBCy++vliY29snGw1/5Rfblb4U3pvu0QXPBfBefN7V631z8FPuXnze1et+6CVw9nBvMbqHHUQ0kLB1GR7nfyK7d1WqjZycVM1TS10lOZxGJGiNjwd2CG2LuA7o6daC44IaUzVjIHF0m9DqlpuQJHRtaLX0sWNbw6FS9jnuocBnLT3VM+rt1mGZ3H5q16HZOrhkllixSZr5yHTO3MZzubcA2vpobLF2x9UYZKc4nLupTIZG7mPujKS5+t9Lkn1oLHyo1LZcEqZGd6+Fjh4nOaR9qz2iwyWpwqCKBuZ57Dda4GjXMc43JGgAJ8iq82xlQ6m7DdicxpsoZu91H3jTo3Ne/MNVvQYJiDGtY3FpQ1oDQNxFoGiwHFBu8rLbuwsDj2xhNupocSfIFPJ//AOdi3+pT+zctbD9mSJ21VXUy1U0YLYjJlayMO4ljGjRx6brCo2cnE809JXSU+/yGRojY8FzBlBBdqNDwQWbAMW7LwzfXuTFK137ceZjvraVUuT1rp9mHxRd3IYKyINBF87t4Gs6iQ5vrC18M2NqaaJ0NPiczI3lznN3UbrueLONydAehZYdsU+jA7XVktO7K0SAhskcrm6bxzHd6489ugddw6HKUd1s86OTuXmKnjDTxz5m9z49D6lZ9iJRPh9FK7U7mN3idkyn7SqmzZmSaVk+I1Tqp0ZvEzK2OFjvhhg75w5ieC+VBs1WU8Yhp8UljiZcMZuo3ZW3vlzE3PFBatjcQbWMldxMFbUsHUQXafNlXAndvMdqT8RSQRjz3Ok/3Lm4TsdU0oeKbE5oxI8ySXijdmkd3z9eF7DRdjAcDdA+aaWd9RNMW55HNa3RjcrWhrdOCDPkxYP8A1KmPBtZIfJLG0/yK64xIOxR9ETp2Ex4H7Ur2u+oN9arc+zs7Z5p6Sukpt+WukYI2PBcxuUOBdqNBwWi3Y6pFQawYnN2SWbsybqO2707jL4wDdBly9zZKeiiB0NSzTqYP+VbduqKtmomMw55ZNnjJIeGdwAcwzEHq5lTcY2GlrMvZmITSllzEd3GzI4kHPYd9wtbrXSbhGIjTtvLp/kRfigqTmEYJVxZHNlppyZMzxIXSRzMle8ODWix15tOvivaoXtMbJjraO4PU5oJ+wKi4dsw1lNUU8sr5TUmR0shAaSZRlNgNBotZmz1c2MQjFpgwNyAbmLvbWte/QgtWwVe2roqaq58kjB5JMh9mFXdhm73G8YnI4OhhB/YblI/gatHCNkqqkjEFLic0cLSS1hiidbMbnU8dbnyqMO2Sqqd0r4cUla6Z+8lO5jOd54u1OniQXjB20z21kdM8uJmlE4JJyzPFnN14Dhw0Va5PYN3s/uz7xta35s0o/kuZh2ydXTmR0GKStMzzJKdzGc0h4u1OniU02ytZHEYGYpKI3bwlu5
j/ALVxe/n53OcfKg+PIHwqf2aP/wCO1amH1VdDJiktPTRSwtralznOlLHAtcbgNym66GH7GS0pPYVfLAHMia8buN+YxMEYdc2tcNvbpJWEWx1U1kzG4nKGzve+UbmPunSd8eOl0G52wxbfCHsGnzmMyD/uTbKHBvHJxu4Lf2Uxo1lPvXxiNwfJG5odmAdG7KbOsLhc79Hq7eCXttLnDDGDuIu9JDiOPSAupszgoo4BCHukOZ73PcAC5zzmJsNAg6yKLqLoPP8Alb4U3pvu0UcrXCm9N92iC6YKPcvPm9q9by0sF8F583tXreQQilEEIpRBCKUQQilEEIpRBCKUQQilEEIpRBCKUQQilEEIpRBCKUQQilEEIpRBCWUog885W/7t6b7tFPK3/dvTfdogueC+C8+b2r1vrQwXwXnze1et9AREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERB57yt8Kb033aJyt8Kb033aILngvgvPm9q9b60MF8F583tXrfQEREBERAREQEREBERAREQEWN0uglLqHH/APFaM+M07DZ80bTz3cEG9dLrhv2woRe9THp13+xSza6hNrVMevDW32oO4i1KTEIpfBysf+y4FbV0GSKFKAiIgIiICIiAiIg895W+FN6b7tE5W+FN6b7tEFzwXwXnze1et9aGC+C8+b2r1vIJRQiCUREBERARQiCUUBSgKEUFBjI8NBc4gAC5J4ADiVWKfEKzEHEYe0QwAkGplFw+xt7m3n8anGmvr6puGxEiIBslY8adxfuYR1utr1L0eipWRMbHG0NYwANaBYADmAQUmHk63js1bVzzm1socWMHkB1XZothsPiFm0kR63NDj6yrLZLIOazAqUCwpoQB/ls/BQ/AaUixpoSOuNn4LqWUWQVHEeTyglu5sO5f8OEmMj5qr9Zh1fhndte+tph3wPhox0j4QC9OssXNQUzB8ViqYxLC7M08elp52kcxC3wVWdrKDtdVNr4W2p5XNjqmDQNc7RswA04kAqyNNxccOZBmihSgIiICIiAiIg895W+FN6b7tE5W+FN6b7tEFzwbwXnze1et1aWDeC9JN7V65G31fNDSF1ObSPfHED0b1wZcHmOqDYxXaymp5N08vc8C5bFG6UtB4ZsgOXyrV/Tyl+BVfu0v4KNmKPEaBhZBhkRc7WSR1Rd8jvhOOVdrt1jH6uh/eP6UHG/Tul+BVfu0v4J+ndL8Cq/dpfwXZ7dYx+rof3j+lO3WMfq6H94/pQcb9O6X4FV+7S/gn6d0vwKr92m/Bdnt1jH6uh/eP6U7dYx+rof3j+lBxf06pfg1X7tL+Cyi24pC5rTvmZiGgyQyMbcmwBcRYXK+2Pbb4jRRGeqoYmxggOInudegZdSupt6zsrCZns47oTs6iwCQfYg2wslpYPU72CKQe/jY71tC3UEFYSyBrS48Ggk+IC6zXH2vqN3Q1L+Nonj5wt/NBnyV05fDNWu7+rmc8E/Fs7hluqwJ8qvS4+yVNuqKmZbvYWD+FdKoqGssXOAvoLniegdJQfe6XXEG01Pka8yANc5rWki1y82AHWuwEH0UXXxkqGN75zR4yB9q+UOIxPdkZKxzjrYOBNhz2BQbigoEKDk7T4aKmlmgcLh8bh5RqPrCqOxFaZqKFzjdzWmN/wC1ESw/W1ehELzbYlga2qY3vW1c4aOi7r/zQWUKVAUoCIiAiIgIiIPPeVvhTem+7ROVvhTem+7RBcsG8F583tXrjcojstHm4Bk1O8nqZK11/qXawYe5efN7V6+mI0glikiIuHsc3hfvhYILHHicJAImjsde/b+KyOJQ/HR/Pb+K845Ndk6Gpw+J81OwyszRSHXvo3ZTdWg8n+HfJmfWgsgqGng9vzgp3zfhD1hVH/pjhvxLvpH/AIqHcmOG/FP+kf8AiguQd1qCetU5vJvSgWEtSAOHuz1B5OKX42q+ncg4+I0wxislYdaSkY5g52yVDhx68ui7XJ5Jv8KZFIczmtkp335yy7Dda3JYwRRVNIP7vUyN174td3TS7pOvFZbBt3NXiNLzNnErB0NmaCfJcE+VBzOTl57CbG491BJNAb/5b9PqIVnCreBxiGvxGnGgMkdQ0dUrMpI8rVZAgLg7c0AnoahhJFmOeCOlguAerRd5fKoiD2uYeDmlp8ThZBS9ncEfI6KKCproQ+mjm3gIfDmOhYM3P1LbgwieXFGUlVWOmjpo2VQsA0l+Yta158l1pcn+BVMkMsUWIzwGCWSJ8VmuAA1YW5tWgtP1L0LZrZmKia4sL5JH+ElkOZ7z4+jqQZ7TYJFU0r4ZGAjKXMtoWvaLgtI4EFeWbG0MdT2PBUzV8kkjHah5EUZjJBYXDxc69lFVG45Q9pPQHAnr0CpsmydXTSyyYbUsjZI4vMMjMzQ86nKR3t0G2zk3oCO7Y+S/xkjncObiq/tHgFHR1VAKGNsdS6pboy9zFY58w+Dbp511+w8amPdz01O21u4YXuPSbk6Lp7NbKMpXGaSR09S/v5pNXW+C0e9b1ILM1SUUOKDCaQNBcdAAST0WXmvJ+8vgkmP9tUTvGlrgyEAjqIF12OU3FnNgbRwn3erO6Zbi1nv5PILr7YZRNghjhZ3sbWtHmiyDaClQFKAiIgIiICIiDz3lb4U3pvu0Tlb4U3pvu0QXPBfBefN7V63lo4L4Lz5vavW8g4vJs7dVWIUl9BK2dg6pgcx8WYfWrFtbtGKGJspifLme2NrGWzEu4WuVVo5NxjNO+1hUwSRE8xeyz2g+TMt7bx2eqwyAcXVO8It72MXv60GX6dzfqjEPmM/MjOUJouJqCvjd8EwOdp03ZcK6sWSCmw7fwucGilrRcgXNNIBr0my3NmtsIK6WaOEOG5IBLtL6kEgc1iF2sVmyQyO+Cx59TSvKuS6Dseppjw7MozI7W95GyFxPqKCzUFoMcnj4CpgZKOa7mGzrfUpqCIMdjNtKqlLerPC77bOUbdncVmH1ltGzbl557TaDyXWXKQd06hrPiKpgd1smO7OvRrfyINbH2iLGYXcOyKV7D0F0L8zR47OPqXcC4/KgMhoarmhqow49DJhuz5NV17IJKgqVCCqVuIjC681kl+xqlgZMR7yRl8j7DmcNFp4jT4xiJbW0VR2PBIQIo767u/hn9Z428S3OUvuqLcWuaiWGIdIzPFyOu119tltoTQuZhte3K0dxTVFrRyN96x54NfxCCvP2Vr2HNFjNOZddLsDi7oGvFdjC8cxfDmMlxVrZqY6SPbrLDf3zrcWq2SbBYa7ujRw3JzZg2xvxvf8Amu/LSMfGYnNBYWlpaeBba1kE0VUyWNskbg5jwC1wNwQedfey89wa+E1goXE9h1BJpXON90/i6AnmB4heghyDK65m0GNQ0cLp53ZWt9bieDWjnceFlobSbX0tGy8kgc8g5ImEOkf1NaLleebK4r24qZKmq07HdaGm1yM079wPfP4+JB2dnqSWed+I1bS2WQZYYzwhi5h+0eJ8aszUCmyCUREBERAREQEREHnvK3wpvTfdonK3wpvTfdogueC+C
8+b2r1urSwXwXnze1et4oKvt2XRMp6pvGmnjef2SQ131ErrYi7eY1Sga7umlf4s5ABWO01Dv6SeLndG63jAuCuRyXV3ZdVJUE3LKSni8R1uOo6IPUGqSVDUKCvbfVe6w6rfziF4HWSLAfWqvgtPavw6Ef2FCXEft2Av9a7PKnJ/2D288j4ox57wFr7Pxg4vVu47qCCIdV7khBu8p1EZMOmc3vosszfHGc38l8NrWGswaVzO+dTiVv7TRn/krXX0wkjfGeD2uafOFlUeTGbeYfuHG5gfLTOB49wSBfxtIPlQfDaV/ZmBOlZqTTsmbbXuowH/AMlt4RViWCKUcHsa71hafJozNh01G7jBLU0xHOG3u248T1o8nkh7CbE7voJJID6N2h9RQWZQpUIKtth3U+HR/Cqs30bC667+J4dFURuimaHsdxB+0HmXB2gGbEcObzB07/KI7fzVoBQVilw/EKI2oqgTRfE1BJLR0Nk42W9FtnXNFpMMeXDiWSMLT4r6rsWUoKltTi9RXU7oHYXKCSDG8yMGR4PcvDgdLFVhu0+MVD24cZGxTAiOYNZd4YGi8znnSxHC3OVeNpseFM0MYN5USdzFEOJJ0zO6GjpVXiwyTD6mmrJ5S99S90NW4963eC8dugNIDUFl2d2TpqLWNpfJzySEuefFfvR4lWaapZRY3JAA0Mq2tdp71+W+vRey9FXmtfhfZUmJzN8JHIwwu6HQtDrDqNvrQelBZLm7P1/ZFPFN8NgJ8fAj1hdFBKIiAiIgIiICIiDz3lb4U3pvu0Tlb4U3pvu0QXPBfBefN7V63itHBfBefN7V63iUGJH4KvcjWGtiFc5g0dVOa0/4Wjh6yVYJXhoLjzAn1BaPJFGe17ZDxmkklHDVr3XHDhogu4CgrJYlBSuUdoe6ghP9pWwkjpEZ3h+oLHk8BfNiM51z1Za0/wCGNobb13Xz2xlzYlQMF7xtqajmsMkZYHa8Td3Beb7My4nud5BidPCyV75cjt2XXcdSbtJB6kH6CdwVH2V9wxPEabhvDFUsHTnbkdbysVK7Ixr9dU3qi/ItGbB6uaXfVWIU00uQMa8TGFzQCTxiAB1POEHoWBzNpcTxGN5DGPZDV5ibC1jG49Xe6rk7EyZ31ssYd2NLUGSBxFswLQHOAOuW48qrGH7NbybPiGIslY1uVsbZgczb5t3I82LmX5vGvQGYtStFhNCGt0AD2gADoAPBB0bouPS7U0UjzHHUxF45s418V+PkXXBQVfaUlldh0vNvJYz1F8en2K02VW28dkFJKeEdZCSegOOX+atLuPBAXD2l2gFMGxxt3lRJpFEOJPwndDR0rWxvaYtk7Fo2b6qPEe8iB99I/gLdHFfbZ3Z7cF08799Uv7+U83+Bg960IMdncAdG51TVOElVJbM62jGjhGwcwHTzrZ2vw/sijniAu4sJZ1Pb3TT47hdiygi+nk9aDl7LV+/o4JrWLoxcX529yfsXJ5P2Z4qiY2tPUSm3NZpyfXZcnDMS7DoK6Nxv2PNJHH6bVjfW8K07I4eaejhiPfBgLukuccxv60HL2A9zFVS38BUPAHQ2TuwP4lbAqxhVmYnWM0vIyGXjr3oafsVnBQSiKEGSKApQEREBERB57yt8Kb033aJyt8Kb033aIO5h8WKFrtxT07o95Nkc6YtcRvX6kZdCtnc4z8mpfpz+Rd7ZrGIBTtBfrml5nfGu6l0zjUHxg+a78EHme0mK4nCGwPp6feVF442slL3a6F1so7kc5XbwLAsYpoI4I6iiyxtDRdkl/KQvjjOEGStNbBX7txYGNBiL8g58txpdfQRVv63H7v8A8IOj2Ljfyih+jkTsXG/lFD9HIuW+iq3kZ8akaB8XCGk+O7VPaib9eVX0bfyIOHtrgVdvIaytqGWZaAGmGVwFQ8NcSJBYixW9DyJUTRbfzetn4L71Wze+GSpxepliJBcwsaAbG/EMuPGFtfolh3yiq+mnQaX/AEWovj5/Wz8F8KjZ2iwp8TZ6aOalkdlfUSgOfE92jcwA0jPT0rq/olhvyiq+mnUs2TwzM0ulneGkOyvkmewkai7ToUH2FNgPOyh9TF86mnwIMcWsob5XWsGXvZdbsHC+eCn+g/pUHD8K+T030A/Kgo2ytfgr8OhirDTFzcwILe6HdG1iBcL4bP1G7ralmFwy1MGSPL7obNNtQDIV6ZTy0LGhrGxtA4ARnTxdytmPFaZveuA8TXD/AGoPOtqosQqaV8Qw57T3LmneRmxjcHcL9S+eJ1OMVEQZBQyQ3sJHl8YfawvuwTpfpK9N7cwfGfwu/BO3MHxn8LvwQed4HRVVIwsiwuUkm73uljc956XOLrldLs/EP1ZJ9LH+ZXLt3B8Z/C78F85doKZou6UAdYd+CCo9n4h+rJPpY/zJ2diH6sk+lj/MutiHKDQxf2j5D0RxPefqC5E3KYXG1PQVb7++kjdEz1kIKFLhuITVs7DRSiN80dRJHdguI2gNAeTY90AeKus1fil7R4YbdcrAfFoVn+kmJzHuI6WnbwvIZHvHmhlj61LqSrl8Pi5YPgwQlnkuRdBx6WgxPs19a+hYwmERBrp2N4Hjcr71+01XDfew0jbcxrI7+pdNmyWHuOaoqamoPRJJLl+aFuNwbDILOp4Ig/U5t0XnQXGrmniQgpTuUSpJAjot91xOLx6w2yuGy2NmsgErozG4OcxzDxBabLrx4w1paGuAGZwNoyNOYmzVV9nK2NjZQ3M0Gec6tfrd5Idq3n0QWoKVy4cRBAJJtb4LuYc+ni4dKwbiredx4/AdqPm+JB10XKGKNue6PFtu4d5fe8FgcUbp3TuJ94/6+5QdhFy2YqzS7nWtfvHa9Xe8fwWJxVtu+N9PeO8tu56UFT5W+FN6b7tFo8qFa13Y9nEn3W/cuHEM4XClB6fhneH/AFJ/bvW0iIJCKEQEKIgBFCIJREQEREBERAKhEQSvnPwREGlzrfjREGSIiAiIgFfJv8yiIIPELNEQEREBERBQuVfvaX9qf7IkREH/2Q==)\n",
"\n",
"...about 15min"
]
},
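{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than re-running the task list by hand, we can poll the table export tasks until they finish. This is a simple sketch using the task handles defined above; it checks once a minute."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import time\n",
"\n",
"# poll the table export tasks until both have finished\n",
"while any(task.active() for task in [trainingExport, testingExport]):\n",
"    print('exports still running...')\n",
"    time.sleep(60)\n",
"\n",
"print('table exports complete!')"
],
"execution_count": null,
"outputs": []
},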
{
"cell_type": "markdown",
"metadata": {
"id": "XCumkJvDzzyo",
"colab_type": "text"
},
"source": [
"Last a little setting of variables...we want to be able to call the file names we just exported so we are going to query the storage folder where our exports are and parse out which file is for training, testing, etc."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Je9gu2veWLJU",
"colab_type": "code",
"colab": {}
},
"source": [
"cmd = 'gsutil ls gs://{}/{}'.format(silvacarbonBucket,myfolder)\n",
"proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n",
"out, err = proc.communicate()\n",
"flist = out.decode()[:-1].split('\\n')\n",
"\n",
"mixerFile = [flist.pop(i) for i,f in enumerate(flist) if 'mixer' in f][0]\n",
"testFile = [flist.pop(i) for i,f in enumerate(flist) if 'testing' in f][0]\n",
"trainFile = [flist.pop(i) for i,f in enumerate(flist) if 'training' in f][0]\n",
"predictFiles = flist[:]\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "sLgHrB5dOhHX",
"colab_type": "text"
},
"source": [
"Yay! Our data is exported. We get to start playing around with TensorFlow now. \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PxV7uU7ruMm9",
"colab_type": "text"
},
"source": [
"# Setting up our TensorFlow model and training\n",
"\n",
"In this section we will focus on creating a deep learning model to map forest, read in our traning/testing data, and train the model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-MBScwKf3-fI",
"colab_type": "text"
},
"source": [
"\n",
"There is a popular model architechure that is the basic concept for semantic segmentation, the [U-Net](https://arxiv.org/abs/1505.04597), that we will implement to predict forested areas. Although the U-Net was originally developed for biomedical imaging, it works really well on satellite imagery. Below is a schematic of the model.\n",
"\n",
"![U-Net Architecture](https://i.stack.imgur.com/EtyQs.png)\n",
"\n",
"Now, let's implement the code.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uX08V6lNW1pf",
"colab_type": "code",
"colab": {}
},
"source": [
"# top-level function that is the network\n",
"def unet():\n",
" # A set of helper functions for defining a convolutional layer. We use batch\n",
" # normalization to speed up training given our limited training data, therefore\n",
" # we can't use vanilla conv2d(activation='relu', ...)\n",
" def conv_block(input_tensor, num_filters):\n",
" encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)\n",
" encoder = layers.BatchNormalization()(encoder)\n",
" encoder = layers.Activation('relu')(encoder)\n",
" encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)\n",
" encoder = layers.BatchNormalization()(encoder)\n",
" encoder = layers.Activation('relu')(encoder)\n",
" return encoder\n",
"\n",
" def encoder_block(input_tensor, num_filters):\n",
" encoder = conv_block(input_tensor, num_filters)\n",
" encoder_pool = layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)\n",
" return encoder_pool, encoder\n",
"\n",
" def decoder_block(input_tensor, concat_tensor, num_filters):\n",
" decoder = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)\n",
" decoder = layers.concatenate([concat_tensor, decoder], axis=-1)\n",
" decoder = layers.BatchNormalization()(decoder)\n",
" decoder = layers.Activation('relu')(decoder)\n",
" decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n",
" decoder = layers.BatchNormalization()(decoder)\n",
" decoder = layers.Activation('relu')(decoder)\n",
" decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n",
" decoder = layers.BatchNormalization()(decoder)\n",
" decoder = layers.Activation('relu')(decoder)\n",
" return decoder\n",
"\n",
" inputs = layers.Input(shape=[None, None, len(predictors)])\n",
" # start encoding the image\n",
" encoder0_pool, encoder0 = encoder_block(inputs, 16)\n",
" encoder1_pool, encoder1 = encoder_block(encoder0_pool, 32) \n",
" encoder2_pool, encoder2 = encoder_block(encoder1_pool, 64)\n",
" # fully convolutional center\n",
" center = conv_block(encoder2_pool, 128) # center\n",
" # decode/upsample the convoluted image\n",
" decoder2 = decoder_block(center, encoder2, 64) \n",
" decoder1 = decoder_block(decoder2, encoder1, 32)\n",
" decoder0 = decoder_block(decoder1, encoder0, 16) \n",
" # predict final output using a softmax activation\n",
" # This is suitable for a [0,1] response. Use linear otherwise.\n",
" outputs = layers.Conv2D(1, (1, 1),activation='sigmoid')(decoder0)\n",
" return inputs,outputs\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "DLK0hjek5j3z",
"colab_type": "text"
},
"source": [
"Here we define a custom loss function, the [soft dice loss](https://arxiv.org/abs/1606.04797), to guide the optimization during training. This loss function has proven to perform well for semantic segmentation models.\n",
"\n",
"<!-- $loss = 1- \\frac{2 \\cdot \\sum Y_{true} \\cdot Y_{pred}}{\\sum Y_{true}^2 + \\sum Y_{pred}^2}$ -->"
]
},
{
"cell_type": "code",
"metadata": {
"id": "KyWqYenZ5Ave",
"colab_type": "code",
"colab": {}
},
"source": [
"def dice_loss(y_true, y_pred, smooth=1):\n",
" intersection = K.sum(K.abs(y_true * y_pred), axis=-1)\n",
" true_sum = K.sum(K.square(y_true),-1) \n",
" pred_sum = K.sum(K.square(y_pred),-1)\n",
" return 1 - ((2. * intersection + smooth) / (true_sum + pred_sum + smooth))"
],
"execution_count": null,
"outputs": []
},
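{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the loss, a perfect prediction should give a loss near 0 and a fully inverted prediction a loss near 1 (the smoothing term keeps it slightly below 1). Here is a toy example:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sanity check the dice loss on a toy example\n",
"y = tf.constant([[0., 1., 1., 0.]])\n",
"print(dice_loss(y, y).numpy())      # perfect prediction -> 0.0\n",
"print(dice_loss(y, 1. - y).numpy()) # inverted prediction -> 0.8"
],
"execution_count": null,
"outputs": []
},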
{
"cell_type": "markdown",
"metadata": {
"id": "NfMIVyIxw2yT",
"colab_type": "text"
},
"source": [
"Here we compile the model using an [Adam optimizer](https://arxiv.org/abs/1412.6980v8) and the custom loss function we wrote."
]
},
{
"cell_type": "code",
"metadata": {
"id": "GU_rMA4Yd_Cc",
"colab_type": "code",
"colab": {}
},
"source": [
"# get the input and output from the unet function\n",
"ins,outs = unet()\n",
"# and define a keras.model from the inputs/outputs\n",
"model = models.Model(inputs=[ins], outputs=[outs])\n",
"\n",
"# compile model with Adam optimizer and soft dice loss\n",
"model.compile(optimizer='adam', loss=dice_loss, metrics=['accuracy'])\n",
"\n",
"model.summary()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "nU7pJZk4xwWX",
"colab_type": "text"
},
"source": [
" We now have to write a custom function to parse our inputs so we can use the data for training. We do this be extracting the parsing out individual feature information and creating [Tensor objects](https://www.tensorflow.org/guide/tensors) from the features."
]
},
{
"cell_type": "code",
"metadata": {
"id": "NzENQqJB0hi7",
"colab_type": "code",
"colab": {}
},
"source": [
"# Our input function will return n features, each a 'side' x 'side' tensor\n",
"# representing the 'label' image \n",
"# Since out data is in a TFRecord format we will need to use TF for parsing\n",
"def input_fn(fileNames,predictors,label, shuffle=True, training=True, batchSize=100, kernelShape=[64,64]):\n",
" \n",
" features = list(predictors)\n",
" features = features + label\n",
" columns = [\n",
" tf.io.FixedLenFeature(shape=kernelShape, dtype=tf.float32) for k in features\n",
" ] \n",
" feature_columns = dict(zip(features, columns))\n",
" \n",
" ds = tf.data.TFRecordDataset(fileNames, compression_type='GZIP')\n",
"\n",
" def parse(example_proto):\n",
" parsed_features = tf.io.parse_single_example(example_proto, feature_columns)\n",
" # Separate the class labels from the training features\n",
" labels = {l:parsed_features.pop(l) for l in label}\n",
" return parsed_features, labels\n",
"\n",
" ds = ds.map(parse, num_parallel_calls=5)\n",
" \n",
" def toTuple(inputs, labels):\n",
" inputsList = [inputs.get(key) for key in predictors]\n",
" stacked = tf.stack(inputsList, axis=0)\n",
" # Convert from CHW to HWC\n",
" stacked = tf.transpose(stacked, [1, 2, 0])\n",
"\n",
" # expand the dimensions of th label tensor to be CHW\n",
" labelsList = [labels.get(key) for key in label]\n",
" lstacked = tf.stack(labelsList, axis=0)\n",
" # Convert from CHW to HWC\n",
" lstacked = tf.transpose(lstacked, [1, 2, 0])\n",
"\n",
" return stacked, lstacked\n",
" \n",
" def udFlipInputs(inputs,labels):\n",
" return tf.image.flip_up_down(inputs), tf.image.flip_up_down(labels)\n",
" \n",
" def lrFlipInputs(inputs,labels):\n",
" return tf.image.flip_left_right(inputs), tf.image.flip_left_right(labels)\n",
" \n",
" def transpInputs(inputs,labels):\n",
" ud = tf.image.flip_up_down(inputs)\n",
" trans = tf.image.flip_left_right(ud)\n",
" udl = tf.image.flip_up_down(labels)\n",
" transl = tf.image.flip_left_right(udl)\n",
" return trans, transl\n",
" \n",
" def rotateInputs(inputs,labels): \n",
" return tf.image.rot90(inputs), tf.image.rot90(labels)\n",
" \n",
" ds = ds.map(toTuple)\n",
" \n",
" # because we have limited examples for training, we will do\n",
" # some rotating and flipping of the images to synthetically\n",
" # increase our training datasets\n",
" if training:\n",
" ds = ds.concatenate(ds.map(udFlipInputs))\\\n",
" .concatenate(ds.map(lrFlipInputs))\\\n",
" .concatenate(ds.map(transpInputs))\\\n",
" .concatenate(ds.map(rotateInputs))\n",
" \n",
" if shuffle:\n",
" # We choose 10 since, with a batch size of 100, we'll keep 1000 (approx.\n",
" # the size of the training data) examples in memory for the shuffle\n",
" ds = ds.shuffle(buffer_size=batchSize * 10)\n",
" \n",
" ds = ds.batch(batchSize).repeat()\n",
"\n",
" iterator = tf.compat.v1.data.make_one_shot_iterator(ds)\n",
" features, labels = iterator.get_next()\n",
" return features,labels"
],
"execution_count": null,
"outputs": []
},
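{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the table exports have finished, we can pull a single batch to confirm the parsed tensors have the expected `(batch, height, width, channels)` shapes. This sketch assumes `trainFile` from the export step above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sanity check: parse a small batch and inspect the tensor shapes\n",
"feats, labs = input_fn(trainFile, predictors, label, batchSize=4, kernelShape=[kernelSize, kernelSize])\n",
"print(feats.shape) # expect (4, 64, 64, 8) with the default predictors\n",
"print(labs.shape)  # expect (4, 64, 64, 1)"
],
"execution_count": null,
"outputs": []
},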
{
"cell_type": "markdown",
"metadata": {
"id": "kWewAGE0B0a4",
"colab_type": "text"
},
"source": [
"Cool, we have out model and we have our data so let's train the model!!!"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XyuiQ8Vr7iq_",
"colab_type": "code",
"colab": {}
},
"source": [
"nEpochs = 10 # number of epochs to run\n",
"batchSize = 100 # number of examples to run before batch u\n",
"nSteps = batchSize//nEpochs # number of steps per epoch (nExamples = nSteps * batchSize)\n",
"kernelShape = [kernelSize,kernelSize] # H x W size of tensors to parse\n",
"\n",
"training_features,training_label = input_fn(trainFile,predictors,label,batchSize=batchSize,kernelShape=kernelShape)\n",
"\n",
"training = model.fit(x=training_features,y=training_label,epochs=nEpochs,steps_per_epoch=nSteps)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "9rPCqJCVBUU-",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"#@title Plot training results\n",
"\n",
"fig,ax = plt.subplots(ncols=2,sharex=True,figsize=(15,5))\n",
"ax[0].plot(range(1,nEpochs+1),training.history['loss'],marker='o',label='Training Loss')\n",
"ax[0].set_title('Training Loss')\n",
"ax[0].set_xlabel('Epoch')\n",
"\n",
"ax[1].plot(range(1,nEpochs+1),training.history['accuracy'],marker='o',label='Training Accuracy')\n",
"ax[1].set_title('Training Accuracy')\n",
"ax[1].set_xticks(range(1,nEpochs+1))\n",
"ax[1].set_xlabel('Epoch')\n",
"plt.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ZNxQ8QQZ6LGh",
"colab_type": "code",
"colab": {}
},
"source": [
"# parse testing dataset\n",
"testing_features, testing_label = input_fn(\n",
" fileNames=testFile,\n",
" predictors=predictors,\n",
" label=label,\n",
" shuffle=False,\n",
" training=False,\n",
" batchSize=1,\n",
" kernelShape=kernelShape)\n",
"\n",
"# and evaluate the model\n",
"model.evaluate(x=testing_features,y=testing_label,steps=100)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "MC9C28OWDLYo",
"colab_type": "text"
},
"source": [
"Wow! Our model is over 99% accurate compared to a testing dataset...\n",
"\n",
"![overfitting](https://miro.medium.com/max/800/1*uEL2SwXLdrEZpAgzXQ_qBA.png)\n",
"\n",
"Let's apply the model on some actual imagery and see how the predicted image looks."
]
},
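{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we do, an aside: one way to catch overfitting is to monitor a validation set *during* training instead of only evaluating afterwards. Below is a minimal sketch that reuses the testing batch parsed above as validation data in `model.fit`."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# a sketch of monitoring validation loss while training to catch overfitting\n",
"# (reuses the testing_features/testing_label tensors parsed above)\n",
"history = model.fit(\n",
"    x=training_features, y=training_label,\n",
"    epochs=nEpochs, steps_per_epoch=nSteps,\n",
"    validation_data=(testing_features, testing_label)\n",
")"
],
"execution_count": null,
"outputs": []
},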
{
"cell_type": "markdown",
"metadata": {
"id": "VZBXB6cEOlfO",
"colab_type": "text"
},
"source": [
"# Running inference and uploading to Earth Engine\n",
"\n",
"We can now run predictions (or inference) on the full exported image to predict which pixels are forest."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "m96wyJgUukE3",
"colab_type": "text"
},
"source": [
"Earth Engine exports the image, again, in a TFRecord file format, which makes the image loose the spatial information. We need the spatial information to reconstruct the image after is this done so EE gives us a \"mixer file\" which describes the how the TFRecords are tiled. Here we get the mixer file because we will use the information to parse the TFRecords."
]
},
{
"cell_type": "code",
"metadata": {
"id": "giBr5LFL_BHl",
"colab_type": "code",
"colab": {}
},
"source": [
"# Load the contents of the mixer file to a JSON object.\n",
"jsonText = !gsutil cat {mixerFile}\n",
"\n",
"# Get a single string w/ newlines from the IPython.utils.text.SList\n",
"mixer = json.loads(jsonText.nlstr)\n",
"pprint(mixer)\n",
"\n",
"PATCH_WIDTH = mixer['patchDimensions'][0]\n",
"PATCH_HEIGHT = mixer['patchDimensions'][1]\n",
"PATCHES = mixer['totalPatches']"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "eTUS6Eb7viAl",
"colab_type": "text"
},
"source": [
"Again, the data is in a TFRecord file format so we need to parse the file and convert to Tensors. We have a similar function as before but we use the mixer file to derive some of the tensor shape information."
]
},
{
"cell_type": "code",
"metadata": {
"id": "dR8ue9m_Dg8y",
"colab_type": "code",
"colab": {}
},
"source": [
"# The default value of side is now 316, as our intent is to create predictions\n",
"# for 256x256 image patches with a 30 pixel wide border.\n",
"def infer_input_fn(fileNames, predictors, batchSize=100):\n",
" ds = tf.data.TFRecordDataset(fileNames, compression_type='GZIP')\n",
" \n",
" y = int(PATCH_HEIGHT)+(kernelSize/2)\n",
" x = int(PATCH_WIDTH)+(kernelSize/2)\n",
"\n",
" # Note that the tensors are in the shape of a patch, one patch for each band.\n",
" imageColumns = [\n",
" tf.io.FixedLenFeature(shape=[y,x], dtype=tf.float32) \n",
" for k in predictors\n",
" ]\n",
"\n",
" # Parsing dictionary.\n",
" imageFeaturesDict = dict(zip(predictors, imageColumns))\n",
"\n",
" def parse(example_proto):\n",
" parsed_features = tf.io.parse_single_example(example_proto, imageFeaturesDict)\n",
" parsed_features = {\n",
" k:v for (k,v) in parsed_features.items()}\n",
" return parsed_features\n",
" \n",
" ds = ds.map(parse, num_parallel_calls=5)\n",
" \n",
" # Return a tuple from a dict when there is no label (target). (i.e. inference)\n",
" def toTupleImage(dict):\n",
" inputsList = [dict.get(key) for key in predictors]\n",
" stacked = tf.stack(inputsList, axis=0)\n",
" # Convert from CHW to HWC\n",
" stacked = tf.transpose(stacked, [1, 2, 0])\n",
" return stacked\n",
"\n",
" ds = ds.map(toTupleImage).batch(batchSize)\n",
"\n",
" return ds"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "a7vrPUXwFZ-Y",
"colab_type": "code",
"colab": {}
},
"source": [
"# run predictions\n",
"predictions = [i[16:-16,16:-16] for i in model.predict(infer_input_fn(predictFiles,predictors,batchSize=1), steps=PATCHES, verbose=1)]\n",
"# outputs are a list of np.arrays"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Af36m68Yg58J",
"colab_type": "text"
},
"source": [
"![slacking off](data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxESEhMSExMWFRUXFh4VGBgWFx0bIBcdGhsfHRodIR8eHighHx4lHh8dITEiJykrLy4uFx8zODMtNyotLisBCgoKDQ0NDg0NDisZFRkrNy0tLS0rLTcrKzc3NystLS0tNy03Ny0tLS0rNzcrLSs3Nys3Nzc3LTcrKy03NzcrN//AABEIANIA8AMBIgACEQEDEQH/xAAbAAACAwEBAQAAAAAAAAAAAAAABAMFBgIBB//EAE0QAAIBAwICBAoGBQoFAwUAAAECAwAEERIhBTEGEyJBFTI0UVRhc5Oy0RQjQnGBoTNSYpGxBxYkQ1NygoOSomPBwtLwRLPTJWR0lMP/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/8QAFBEBAAAAAAAAAAAAAAAAAAAAAP/aAAwDAQACEQMRAD8A+rcE4RbG2gJgiJMSZ+rX9UeqnfA1r6PD7tflRwLya39inwCnqBHwNa+jw+7X5UeBrX0eH3a/KnqiubhY1LtkKOeAT+QBNAt4GtfR4fdr8qPA1r6PD7tflSZ6WWIbQbhFbUF0tlTlsaRggHfIx58irO4vI0yHdVwjSHJxhUxqb7hkZProIPA1r6PD7tflR4GtfR4fdr8qWtOk9lK6pHcRuzHACtnJ/CrK6uFjR5HOFRSzHzADJ/KgW8DWvo8Pu1+VHga19Hh92vyrqTisCxmVpUWMEqWLAAMDgrv352xzzXXDOIR3EYliJZCWUEqV3VirbMAeYI/Cgj8DWvo8Pu1+VHga19Hh92vyrmz41bSyNFHKjSLnKg79k4bHnAOxxyrh+kNoBMTOmISFl38Qk4Gfx2+8EUEvga19Hh92vyo8DWvo8Pu1+VLWXSS1ljkmV2EUY1GR43RSPOpZQGHrXPd56sbW7jkRZUYMjDUGB2IPfQL+BrX0eH3a/KjwNa+jw+7X5VHb8ftJI5JUnRo4vHYHIXbP5jl5+6u7TjNvJGJFkAXBY6+wVCnS2oNgrg7HIGDQe+BrX0eH3a/KjwNa+jw+7X5VFZdIbOYgRXEUmX6oaXBy+kvpGO/SCceYGl7zpZZxYDy9ouyKgRmZmRirBVVSWwwI2HcaB3wNa+jw+7X5UeBrX0eH3a/KuX43brF1zyBE7OdYKlSwBUFWAYMQR2SM78qUu+lllGI8zBjKoeNY1aRpFbOCqoCSNj+40Dvga19Hh92vyo8DWvo8Pu1+Ve3PFYY40kdiqvjSCraiSNWNONWcAkjGRg5rnh/GbedmWKQMyjJG4IB84IoPfA1r6PD7tflR4GtfR4fdr8qkvOIwxKWllRFBCku4UAncDc88b4qeOZWAKsCCAwIOcg8j9x89Ap4GtfR4fdr8qPA1r6PD7tflUd7x22hfq5JVVsAkHOwblk4wM4OM+Y17Nx+zQlWuYFZSQQZUBBHMEZ2IoO/A1r6PD7tflR4GtfR4fdr8qdVgQCDkHcEd9e0CPga19Hh92vypPjXCLYW85EEQIif+rX9U+qrqkuOeTT+yf4TQecC8mt/Yp8Ap6keBeTW/sU+AU9QFFFVfSa7eK1ldGCOAArEA6SzAA4O3f30FDxUExX2CQZb2GFTjIGr6PGGx3gEkn+6ag6TQ3KLdvLJ1gXh04VlhKhS2nVnDHJwAcbHY0/c3t1EImjuYboSP1MadWAZHwT+kR9KqgVnbsE6UYAZxRccbugTB9Qk4mSIuQzR6XiaQMF1KcjSRp1b6ee+wPdHOKwuBELuG4ccuqAXCgAY0hm/fnvpTp31rRRwIpPXSLGdMoRseM43Uggxh/u2qx6NXkssbmRo3KyugkjQorhTjIBZuRyvjHxa7t+Il5bkYAjg0rqIOS+jW+/LSEZPxLDuoMt0duGteFy3kuTqR7pQ8uvUX1Sg40qFYs2CBnlWr6PRIltCiOsgVApdSDqYeOcjvLZJ9ZrOC+623jvbm2tZAI10IvbdXl0hIwWXC5YqCfuNWvBriVZHjkls8IupoYFKtCdjvlzqBB56U7tjmgorK8SVIViBeS3nlnkfSfqQGkJXVjx3B0aAc6WJPKueithdwyW4mdJFexc9iJkKtrjc62LsGYs7EbKd22q1k45dsti6JCou9K4fUTGxhkmJ2xqGEx3c68vuk08GpTCs7rMkLCIlCSYjNIVU6skIBhc9onGRQKSQzPYcLgiIR36gkuhYL1UXW7qGUntIBjI/5V3wexk8G3cExYnrLoFooypOuR2JRGJ2yxwMnO25qRumLhLp2g6sRrF1PWPpMhnJWMOCv1WW0nByQrAnB2Htx0guTbNPF9HzHJ1MqnU4LdYqhlZWHZKsHwQTuBQU305Q6A3hubRXgklldYlWI/WFF1RoqhdSxEqd1yudmo6Z3UV0NUUgCN1KiQgaTpvrbLLqGll357g476uFu7z6RJaK1lGwiSXJjZtbSPKDhOsUnAjBO/wBrnU9zxRjBO8sMckcE/VSjTkGNQhkcKc+KxLad9o+84oKjjESXKWEUN0s0kd2fr009iRbed42cR4UgMFBXYMCRtmmuEpN1dkZYjFIb+d3Tnp1fSTzwMruMHvBFW3DeNdZbzXEUOYEVjAE8acIDkquMBWIwnnG/eKQ4bx6cwPctPbTgLvFCCpjdiNKGQu2eekkopzvgcqBPi8KfSJme4e2xdoRKqxkKWtVQZMisq53UNjmcd9M8MmkkHD3ZjJi5nCyaQOsjCTrE50gL2k0nIABzkAZxVpx/jAhdIzCJEcZlJYDQhdYwcFTq7TjYkbKx3xgqXfHbgdfLDHGba31I2olWdo1JkKkZARGAjwRkkPy0jUHXSu7Kz2aq8KPmWUGdsKAsegnbmfrBtkd+4o6SXa3Ft1dvIrzu4WFo22SWPthyRnCpjLA8wdO+oAxfzkiuIzOkKzW6yRxLISDrd5VjfQCpyqah2vtEEDYZpSy6VS/R1d0RGWcKQgOGiZTICATkEoCvf2lNBzwKaATxtORGy2ySqJ+yzSzluvc6sAuNCrkDsgkDAOK54tNpPEYUAM9wixWqgbPG0QRSMc0SRpGb9UHJ5irbg97LMyLdfRD1sXXJEpOuMHHZIbPWDHOQBNxjT31Bxvid6sk30RYjHaorSRspLTZBZo0YMAhEYBGQclxyFAvxNRatcPb3WJgI2a3ZUbrXEYSMbgP2woXsnmPvBk6ElmE5ZoOqa5uSqqMvk3Mgyx1YIPPGkcxSEXEVurqXF+kUsZzbxJHBqeN4UlVsyKzsGDHOkqOz6q56IyfSI47otY3UggWUxQQIsqSMuoDrDKQDqyu6r58jGKDfiiuIHJVSVKkgEqcZUkbjbbblXdAUlxzyaf2T/CadpLjnk0/sn+E0HnAvJrf2KfAKepHgXk1v7FPgFPUBVX0msHnt2iQKSzx5DHYqsqM/cc9kHbv5VaUUFTxXg+tYTCVhkgk6yI6MrkqyMrKCMqVdhsQQcHuqvm6JrMf6UUmDl3mBTAZmjEaaQSdIVNQG5OTnOa01RyzoukMyqWOFyQNRALE
DznAJ+4GgR6O20sVvHFMQWjzGGXHbVSQjYAGGKYJHcc42qln4Pfpb3MMb28nW9ewLK6MGmLMMnUwIXUF5DZRjzVqo5AwDKQQRkEHIIPIg94rqgy3gG4MJiC2kOGjlRYkYAvE6Ouo7ZB06SdOd892Kbj4PLKJ5Jysc00RgXqjqECEHxWZV1MWOokqM6UGOzk31FBVzcMJltCAvVwByM8w2jq0xt+o0gP30qej7GVJTL4t210Rp8YGBoVTOdsAg5/Zxir6igzE/AJHupS2kwSSxXB1ANkxx6OqIzsNSpIG3+0NtjXc3R9gLpIhGkcrwyKBsAyaRJsBgZVF5cz5q0lFBm7vhVz9LkuI0tWDKgVpdWtNAbOMLsO15/PXMHBLoJcQtJC0M3Wsey6uDMCSM6iCoYnfnjHmydAt0hkMQYdYqhyveFYkKfuJVh/hNTUCNnaPFbJChUOkIRTjshlXAOBjs57ttqzl3we8uXMrxRQOsIQDXkSv1qSNqKrkRjRhTufrGOByOxooMdedGZ7v6S9zojlaJY4OqlkZE06mBYYTV225EclHKu5eCXfU3FmogMMrSyCVnbUOvkaRkMYTBwzsNWvxcbZrXUUGZ4nwCUysYCiRSGKSQHbTJBKjBgoG/WINDbjHVod96gHQ467YtICqRlJgF/S6RIIsbnTp62TPPOR5q1uaKDP8ABuBSxypLPIkhhh+jwlU0nSSCzNkntkKg2wBpJ+1gR8S4TedbObeSJEuUVXdtWuFlUqXRQNLkrpxqKgFM9rlWkzSXFOKw24VpWKh20LhWYs2CQAFB3IBx5zgcyKBHhfABbi4WNgBIV6vs/ogkEcKjn2saM93PHrrvh3BTHZRWwk0SLbpAZowAQVQLqXUD3jIzmrO0uUlRJI2DI6h1YcmVhkEeoipaDxBgAZJwOZ769oooCkuOeTT+yf4TTtJcc8mn9k/wmg84F5Nb+xT4BT1I8C8mt/Yp8Ap6gKKKKAqu4rwaG4x1gPIocHmjEF09QbABxgkZGcGrGigxkS3ztedXOq2/WvHuO3AEVR9VgYOe14x7Jwd9xS3Dbu6trWyuQHuOvtbeFo2k3E5QaHyx5MWIc7nCqcHetXwjh5iEwYg9ZM8ox5nOwPrpS06OhIIbfrWKQyq8eQMqkbZjj5bhRhc88CgW6QPPa8LuX64tPHbySdbj7eC2QDyAPIHOAAN6pOKWd8PotxczDrEubeKJYGYIytKomkZTgF3jJGk6ggBwSSTWz43w1bm3mt2JVZY2iJHMB1IyP31HfcJSUQBifqZFlUjvZAQM+o5NBjhLN9DXiwuJRK7LIItRMTRvIFSDqztqKkKGGGLnO42rlUJs7viHXyQzwzXTK/WMUIhmdURo2bQykKFxjPmINaG06JqkiZnkeCKQyxW5ChI3JJBJA1MFJJVScLt5hgtOhtuj6maWVRK0yRSPmNHdzIWCAAMQ5JBbURnbFBR3c91cXjROl31Iig1JbSxxCJpQ3WayWWU47J7J7j91XvCXZL6a1Du0UNpAV1sWOp3nBJY5JJCLufNXlx0dnNxPNHeyQrNpLKkaFhoQIMM4bA2z4vMmvbjowxk61LyeNmiSGVlEZaURlihJKEK3bbJUDOe7FBz0n4dMonubXWbh4ooMKUBEaSszFNe3WaZHxqOMhaV6Kv1M5tninieSMzKJro3AYRsqvuWOhsuhIGxz6jV7ccHjktzbSNI6keOXOsEHKsHG4ZTgg9xApbg/AWilM8s8lzLo6pXkCroTIJUBABliAWbmdK+YCgpv5R+Go62zs0nldvEVWV1VkknUOCqsFbIJGSM1VdNHtuHvaxCCRrZkmZ4oWKgvmJVkkYsuEXJBZm0jUCe6t5xPhsdwqLICQkiTDBx2o3Dr+GQNqiuuEJJcRXDE5jiki07aWEpQtnb9gfvNBiuKCW0s7e2frWNzcMXS16yVoocF2jjIy5XAVCwx+kYjG2K2Hjc3VvFG86LEt9Gqyh0k0rDHLDqD4fKLJgE74wa3nCujUcBhId2EHWrCDjCJKynR6wgUKvmG1eXXRW3eRpe0C5kL4bxutiWFufLsouMd4oMLPxuUrawTvi7ginfUD+lH0RzDcL94yCe5ww82Z4+kFxBLcyTSE2rwQxlifJ5WtgyOfMkjEqT3No85I3dz0et5FjVky0cTQI58ZVdNDb+sVxadHIESRGBkWWJIpA+4dY00DbzledBkOCjr4UDNcdVbWlqEitXZGZ5Ywxc6WUtgYAUnAwxIPdruFdTd2keeslTbeYaX1RPsTgDtB1zkbZGRUV70ViYq0Uk1uRGICYH06418VGyDyycMMMNRwRmrews44Y0iiUJGihFUcgAMAUHtpapEgSNQqjOFHIZJJx5hk8qmoooCiiigKS455NP7J/hNO0lxzyaf2T/CaDzgXk1v7FPgFPUjwLya39inwCnqAoqql4qzsUto+tIOlpCdMaEHBGrBLMDkYUHBGCRR9Cu28e60H/gxKP8A3NefyoLWiq7wY/fczn8Yx/CMY/CgcHTvknP+fIPhYCgsaKQHCk/Xm9/L/wB1C8KXvkmP3zOMfuIoH6KS8Gr3PKPX1rn+LEV79DfunlHuz/FDQOUUj9EmHK4JP7aIR/tCn86hku7iLJkjEiDctDnUPvjJJP8AhJPqoLSio7edJFDowZWGQynIIqSgKKKKAooooCiiigKKKKAorwsBzqM3KDm6/vFBLRUazoeTKfxFdg0HtFFFAUlxzyaf2T/CadpLjfk0/sn+E0FXwXgUJtrckzfok/8AUTbdkcvrNvwrnjHCVCJGkk4aWQRhzczEoMFmI+s56VOM9+M5q14F5Nb+xT4BUfGNmtn7lnGf8atGP9zCgh43ILa1HV9hUaJBpGdK9YikAbk9knbnVBxvpAWeQwTTIqi3Q6IWLDrZXDFY5Izk6VxnSfyrXcRsxKgQkjDo+R/w3VwPxK4/GkOJdH1lkaUSyRO3VdpNHZ6lnKkB0Yb9YwOQdgOVBR3JuQyWvXTTJdxjTJIiRtCqkmfIRE0kxsqr2chgM1e8DmbROzFm0zy4yc4CscAeodwpZvo9s8X0i9LSpqKmd4lZhJgY0qiDHZGMAbjvp/hV5E5cJcCYk9ZpyuY1cAqMKAQveC2TvzoM5BeMLeG6kupFuLiBpVQbxZMRcIBpKgJzBJDNo3J3FNQJcCze4S+mkYxdYDLFEApUEnCrEh3O3azsBijwLbi2IFyxt40kijU6SkOzREdlQzaMsgBbb796kEsJhnWa/wCsXBtn0dWuhiMEKFUtrwdgS33UHvEpJ4bVWa6fU0seZOrj1IjEagFCFTtn7Od6pbnpTMscTLKzgNJNqEBzJDAVVlkUA9UWLHfC40DYbitJfTLcxD6PPGrRskup0LKFBJ3AZCM6SDvtg1DwCxhbrnMvXPIqpKwXSjal15Rd8KVdR4x2RckkEkF+K8TulS5uI3QQw610lCTiNNRcEZydfZ048XJ54FXHBLt5o+uOAkmGiUcwhHZLHO7HnjuyBvjJSuOjzaJoY5ikUyMrKy6mVmTRrRtQwc4YghskHlmnLH
hhhkZo3xE4JaLGwkJzrTfsht9S7gnBGDq1BXXPDSt1hZpYo5wW0xlQBKvjHdTu67/fGx5k04eCyd17cj8Yj8URqbixGq3H2jMNP4Kxb/bqqxoKYcEk7726PqzCMf6YRXQ4Ge+6uT/mKP4KKt6KCo8Aqec9yf8APZfhwK9HR+PIJluTj/7qYfC4zVtRQVI6PQfrXB++6uCP3GXFengEPc06/wB25nH8JKtaKCp8ARf2lz/+1N/314OjsPe1wfUbq4I/d1mKt6KCqXo3ZbE20TEci6Bz+9smmF4RbDlBEPujX5U7RQIvwa1PO3hP3xr8qhPRyy9Fg/CJR/AVaUUFWOjtn6NF/oFH83rT+wT91WlFBUno3a/2ZH3O4/INSnFuj1otvOepUkROQWyxHZPIsTitDSXHPJp/ZP8ACaDzgXk1v7FPgFMXlssiNG3JhjbmPMR5iDuD6qX4F5Nb+xT4BT1BX8NvScxS4EyDtDlrHdIv7J832Tkd2TYUtfWEcwAcbjdWBIZD51Ybg/dSi2l2uy3CMo5dZDlj97LIoz6wooOOksIMcZ05Iubc8t/06fwzmqjhvVl7dEwLqOaQzAeMA2oylu/Q5KlfOTGRyq5N1dpnVbpIB/ZS9pv8MgUA+rX+NcjjSKcvBcITz+oZ/uyYww/Ogzt1LNrNgtvJpa66zrgDoGqU3BBPm05GeWeydzSl3au6WwtgYZokhhlZoy2hxIioDkDWVAl380mdwwzrv5x2/nlH3wTD/orl+k9oBkyED1xyDnsOa+egysNnL1vW9U+iKGMT25yeuZZrjW4bA6wq2qQDGHEgyAdONhw2RDNcaRJklHYshVd4wAFJAzsoJ8xbHqqGDpRZuyKsuTJjR2XAbPLBK43++n+I38cEbSytpUfiSTsAANySdgBuaBmoL28jhUvIwVeW/eTyAA3JPcBuaqFvbqZAyGGFWGpTvMcHcciig48xYes1QXfALgyxu95rkLFVkaJwU7JYgBLhFAwMbKPXmg0/D4nllNxIpQAFIUbGVU+M7AcmfA25hQM4JYC2rDiK4juI0edpDmNwUaVf64IwZXlcMCpP+nvrcUBRRRQFFFFAUUUUBRRRQFFFFAUUUUBRRRQFJcc8mn9k/wAJp2kuOeTT+yf4TQecC8mt/Yp8Ap6keBeTW/sU+AU9QFFFFAUUUUBVb0i/QN/eT/3Fqyqu6Q+TSnzLn9xzQZLgjKq2+tFZ0jSSK3hwzsxQgSytgBFxqC6iF57scBbxOHmV1lutMjjxI8Zjizz0gjtPjbWRnngKCcqdHEaO3iEcCBSitlWCliVGSRp8bPfk589WYuJBuYie/sOpx9+dP5ZoOeFH6mM+cZ/eSakmXLx+rU3+3T/1VXcG4gHgiEUckvYXJUAKDjftsQpweekmnfo12zBtMUYAI7TM53I5gBR3frHnQV1+ubtPMFgH4mZ2HwfuNa2s3JwG5MvXdfFq7G3Utj6vrCP63P8AWH9wqy6u8/tIPdP/APJQWVFV2q8GOzA/n7Tp/wBL0G+mXx7dsd5idXx+B0sfuANBY0UtZX8coOhslThlIIZT5mUgMp9RApmgKKKKAoqim6SAyNFbQS3TIdMhj0Kkbfqs7soLDvVdRHeKlsOOhpBDNDJbStkosugiULz0MjMpI56SQ2N8YzQXFFFFAUUUUBRRRQFJcc8mn9k/wmnaS455NP7J/hNB5wLya39inwCnqR4F5Nb+xT4BT1AUVy4JBAODjn5vXWajaTq4mM7DUu+p8HWuz4AjJO/IZoNPRWe4VLJ1y5lZgVcMpPJhoK5BjQqcE924Oa0NAVWdJU1Ws6/rIU/1dn/nWW+h8UeQsXuQnWNlQ0CgqJG04YNkKU0Y7JPPPquepmS0uDNq8cugZ9ZCDSQCQPOD5+fOg66OkG1t8HP1SDb+6KktbMXJZ5O1CrFUj7nKnDO4+0NQIVTttq3JGK/hPB7aS2gEkEMg6tQdcatnGPOPUDVt0adFhW3GA8I0MmwIA2BA/VIwQRtv+FAv4WuPpE0XVRLHCiSFusYsytrxhdACnsHbJ50nD0wzG8pSNlCx6WjkLKZJWCpEWKgassMkZ0g71dzcMBedwxDSwrD6l0dZgj1/WH9wqtt+ikaBE1fUqySdVpGnrEBBYA5ChshioHjLnmTkF36ZBYesaEllB1rG2vdJ1hbScZYZbUuwJGNgTXtx0wCi6cR9YkWooUb9Kq28M+R3drrdI/u003RaLVqRig1FtIUYBMscu3cBmPGP2jUc3Q6DTKsZaMStKzAb4MsaxnTnxQNIIHLNB7/OcM2Ik161i6oZ0lml60sGyOyEWMk7EjSwxnan4eKFrd59K5VXOlJA6koDsHA78ebI81LTdGIdcksZaOV5Fl1DcK6BhkKdsMHfUO/Wx2JzU1vwkxQTRoyvJJrZmkXss7DAyq47AAVdI7l553oPbq1E8aTx9iYIGjfvGRnQ2PGQ8iv4jBAIji6SQEAsJkbAJVoJeye8Hsd3nro3EjBYbYx5UaXk06kj0jGkKGGWJ+znsjc9wIOBB/KJXuP2Hwse/cY1AVh/f1UDXCuKwXKdZBIJEyV1LyyOY/Cm5c6Tjng4r1VAAAGANgB3V7QUHQLHg6z8/Urr9pj63P7XWas+vNR9N90tQP0hvbfq/PkSBpMf5Qkz+zqrPT9J0sb2SCBo54ZGM0i69AtZHOCpkI6vEkmW0sVIYvzyBWj4fYTzzx3V0I1EYPUQxtrCFxhpGfA1OV7IAGFDPu2rIDQ1mp+kckNzNHPGghTSwkVjlY2GOsZSMaFfKsyns7EgA5GlrK9MoCTGzSFUzgMIlcxtgjsYYOzPnT1emQPyK4JoLi345A8hiDYcHSQdt9TLjz5OhiP2RnlVlXx22hMhETOIpXPViSJmTqttCiSdD1krjSUSMNpBAR2ZgM/Uejp/o0I0MmEC6X15GkYwTIA55eMwBPPvoLGqi56S2iO0Zk1Mpw4jR5NBxnDlFIU437WKg6UyyMbe1jdo/pEhV3Q4ZY0RnfSe5jhUzzGskbgVb2NlHDGsUSKiKMKqjAH/AJ56Du3nSRVdGDIwDKynIIPIgjmKW435NP7J/hNVvD4hFfzRR7RtCszoDssjOw1AfZ1gEnHMrnmSTZcc8mn9k/wmg84F5Nb+xT4BT1I8C8mt/Yp8Ap6gKzlvPmFRHJ2uskYhHQHQXck4dWyBt3A+vmK0dRxQIviqq/cAP4UFJwuTVOuHZwI31Bjq6s5QDcKM6hn7qv6KKAqr6UNizuT5om9fcfNVpVR0vbFjdnzQOf8AaaDPcE4rm3QR9c7KmAv0dwM8wuspjHdnekrVuIyykT2gddZKMSEMS4xhXDK2T5x5z91aXo+P6Lbexj+AVYJzFBR8IvJnj6xZHiXuW4KSjbY7hhJzH2m+6m141PrWNVt5SVZgyylfFKgjGh9+2DzqHgvD4epicwx6igJbq1ySdzvjNPXNhDIAskUbgbgMgOPWMjagSbj919JW36mFS2ncys2NSyt/Zj+yP+oVbFLv9eEf5bn/APoKysNrHFxSFY1CgopwP7lzvit1QV30a
6I3njH9yHHxSMPyrk8GDfpZZZh+q7BV/FY1UMPU2as6KDiGJUUKqhVAwAowAPUByrukb7iSJFcOrITCrFsnZWVNeGPdsQfuNUvQzjEsgEc8glZ4UuYpQoXrI5ANQwNso+23cyd5NBY3309nZYvo8SZ2kfXKxGN/qxoAOdvHP/Klm6OPIT9JvJ5ozzi7ESH7+rVXI/ZZyDncGtBVP0ou3WLqYj9fPmGL9kkdqQ/souWP3AcyKBPojw6IxSTdWgWc4RAoCrAhKwqBjGkrmTH60zV2ejrQb2Mpg/4D9uA+oKe1Hty6tlA5lTTHRO5DW6xEBZLfFvIg20sgAGBnZWXS6/suKuaD5500/lDksIwkkSRXJZdIdi8UqagHZXTcYB+2FI54bGDL0YgPFVF5LK5iJKqqnTnGzrt4kec+KS7gdpyp0Vq5+j9o84uXhRpgujURnYcsjkSN8MRkAkA7mn5ZEjXUxCqO87AUGVi6CwpB1IKgCMBcRjGs564lc4KS7ak/ZGCCARh+jfT6WO9tre4nkSJ1IInA0xrhgiGRlWRpAwCl2YgHsEM2Xr6tx/iJtreSYLrKDZSdIYkgAE4OBkjfBrM2XSKO4do7ywl+k20pGFgaVcnISRH08mXJGcEZP30Fz0jt5Cba6gXrGgkLlARmSN0KOFJ21DIYb4OjHfUE3TS27KRrNJM5KpCIXRmYAnB1qqrsCSWIGATVXJYOT/RbK5t88v6StvF95RHcrz7o8nv5DHEXA7mK84e015LM3WzSGM6erReodcrkayQXRdRbfUdhmg0fR3hskYkmnKm4nbXJpJKoAMJGpPNUG2dsks2BqxTfHPJp/ZP8Jp2kuOeTT+yf4TQecC8mt/Yp8Ap6keBeTW/sU+AU9QFFeE1WycetgSFcyEcxCjy4+/q1bH40FnRVUeN53W3uGHsiv5OVP5V4OKTnxbOUDzu8S/kJGP5UFtVT0r8juds/VNt59uVefTrwc7QEfszqT+5lUfnSHHpbq4gkhS0lUuAMtJDgbgnlITyz3UCvR6C7W1tmjdLhDBGQr4RsaR4si5U4GwBQZxu1MN0ntEBMsohZSA6S9loy3IMN+eDggkHGQSKj4BaXMDwotu0cZAEwMkZQEJ+kQBiwYsBkAYOonY87njHChLiRCEmTxHxz/YcfaQ94/EYIBAKcFH9Hi7uwBuMbd2x9WKdquTi6gDro5YSRv1iMVU941qCnP1709FIGUMpDKRkEHII84I5igoZQPCcJxvhB+HVXRP8AyrZVkUUHiSepc/7GA+Jq11AVBFdo0jxA5eMKWGDtrzp35Z2zju284pSCdzdzIW7CwQsq4GzM8wY557hVH+GouDD6++Pf16/u6iLH/OgoOJ8La4tL1EUuxvDI0YOOtEbpqjOSB2kXTucZIzWc6H3jRW8EcgZJ+G3X0aVWxkwTv1YJwSCo1I5IPOGvofAxpku081xq/B442/iTWb430dil4iQzyRCeDrCY8DWYmUSKcg81MR2wR1eQRvkNw7gAknAAySe4CqPgCm4ka+dSA66LcHmsOc68dzSntY56RGDggiqi7ur3iUUZt4IktJCHLTykNPFzA0ojaUfYnLZK5BC52uBJxNR+is3/AGRLIn59U38KCIH/AOqERejA3Pm8Y/R8ft/pv8IGfs1oqy/Q13WS7inQC56zr5mRtaHrMiJQ2A3YjRVwVHIHvrUUEN5cpFG8sjBURS7Mfsqoyx/ACqfpc4a1Ujk09sN8jZrmLY/gcYrm8Ms1pfowy310aLywAmEH+Lxv8VddKnD2EkyjVoVLpQPtdSyzAD79GPxoMb/LnJcRRWlxG8gjjlzIqDIJ5qWwM4wHHmywrYIccQZkP6ezBB5jMMhwcfdMPyp7j/D1u7SeDIxNEyA/3l2P8DWb6CWkvVw3EzqIorYww9o6grMGYSatg8ehIju2TGxyM4oIrLjF+LWO6mvLNAeyyNayZ1qxVo1K3BLNqBUYUk42FWvR2C6mm+mXcSxFYzDCgJJ0s4Z5GH2S+mPC7ldByd8D55wHpBHZXMwaOO903MqJcJMmuMP9aQIz9XGrFyNasqlgQTkAV9L6OdMLK+H1Eo1g4aJ+y6n1qef3jI9dBfUlxzyaf2T/AAmnaS455NP7J/hNB5wLya39inwivOKXjJojjAMshIUNyUDxnbG+ldthzLKNs5FbwPo/a/R4CIVBMSE6crnsjzEVJBbRW92v2RLFojyxI1IxZlGonBYEHA5iM+agL2whjQy3bvPjGzjKkkgKFiUaSSxAGxO43pjh/G4XbqtLxyLkGN0wV0qrHllcYddwcb45g1D0nYr9GkIJjjuFeUKjOdOhwpwoJ2kKHl3VSLw2e4uTcETRJIkxXB6twAIFiVge0pfTI+NiNgcHIoNfZXSyxpKmdLqHXIwcMMjY8tqRvuOLHIYxFLKVAaQxLqEQbOCdwWO3ioGbGDjBFd8ASRYY43jCBI41XtZJxGurIAwMNkczyzSMN08EtyvUSuXkMysi5Vh1SjBbkCCpXTz5YBoGZekMOljGHlIm+j6UGCZPMC5VcftZx66hHSiLIBjmG2XJQYi7bRnXvyDKd1yMYbOneqODoxchUjMsvbETvMpQNHIol1lQVxjUV2IPM1LJwK4KatBXq0CtErbXWiSUuGyScSahICWzlsNnfIa1L2I6MOp1khMEHVjJOPPjB/dSE/SK3SR42ZhoVmZtDFfq1DOoYDBcKQ2kb4z5jhfgHDNE95KysNUuIg3JUaON30Du1TFy3nK+oVRcUt5mha1EMhkWWaYvoOkq/W6Sj8ix6xV08wNWdhuGwteJQyFQjhtcYlXGcFCcA55fhzqt4vadQGuYRgLl5o15SLzZlXulA3GPGxg8wQtYWM8F1MwjLQYURaWXP1smqXIJGFjOp/ufABIxWgvLlI0aSQhUUZYnzf8AndQY+O6Xwsm4Km3yGBBGe3jf+6D++td9Ph5dan+ofOqngPR63W2t1e1iDrEgYNGmQQoBB251ZDhdvy6mLHs1+VBC8tuspnMyLlBG2XXBAYld892pv9VIRcUhS5ZxKjRTIMsrqQkkec6iDsGQjBO31WCclQbZOF245QxD7kX5V6eGwZz1UefPoX5UFJFx60S7lzcwBZIUcfWpjKMyvvnnho/3VDxziVjOI2jv4I5on1xyLIj4JGGVlDdpWXII27iMEA0/x+zVBHcJGCYH1MFUZaNhpkGAMkgHWANyYwO+ri3mR1V0ZWRgGVlIIYHkQRsQaD5Txa3WOGaWPiMcbKryAW0MsWptzsFm0ZZvVzNOeBZra4gju+IzzRSxsxEuvBZCupB1brjKsTkhvFNafphxCE6LV5UQErLOzsFWOFGDHUTsOsZdAHeC5+yap+k/Tbhh6qWO8gaW3lEwCyDtJgrKoPIsUZsLncgUGjsuI2UC6IUKLzxHbyYJ8/ZTc1KektqPGdkHnkikjH73UCrG2uo5ACjqwKhxg5yreK33HfB9VTUFJ/O/ho/9daDz/wBIjH/VVdwXpFZqj27XETrGxRGVw4eM7oMrndQdB/u5761JiXOdIz91digy
nRTj1ulutu0uWgJg5MSVj2jY9nm0ehv8VY7jHGIIQIZSHEN1LcIssUrwzpKzSKrGNHKvGXwNaHePONwR9RWxUTGZSQWUK4HJseKxH6w3GRzBwc4GGZHCgsxAAGSTyAHMmg+YwfyqLpdYbIv1cYc9SJioGSAPJwV2UnLADHf3U7wzpLa3klvdXFzZQpA7OidYyszlCgJM6RsFCu2wU5ODns4M/A+klm73Fy7mRp8IscMckxjgTV1QcRK2kvqeXtYOJQPs010M6QwFjw5pAZIlHVa1ZGlh+x2XAbWoGlh+zq5HYLr+dXD+f0218/6eP/u/8xVfxjpfw1oJlF/aEmJwALiPJJU4HjVojaRnmi/6RSXGrSMW8/YT9E/2R+qfVQS8C8mt/Yp8Apm6tklUpIodTzDDIpbgXk1v7FPgFPUFT9EuYv0UglQcknJyPUJRk/6gx9ddHiFwvjWjkeeOSNviZT+VWlFBVtxxAcGK4B//AB5W/NVIoXjsR+xcD77ab/sq0ooKzw7D3CVvOFglOP3JR4YyMrb3Df5en8nKmrOigrBxSU8rSf8AEwj+Mtc/S7tvEt1T1zSgflGH/iKtaKCq+i3jeNcInsodx+Lsw/Ku4uDpqV5XeZlOVMpGFPcQigIGHc2Mjz1ZUUBRRRQFFFFAVUX3BmkuIJesIjiOTF9liA+GxyyGZT/gq3ooPn3Cv5O9V5d3l65kkkc9SY5ZF6tMnAByCuxA0gkDBx3Y0E/Q+3ZSpkuTtjtXdww/EGTetDRQYPoh0GntHtme4VhCrqwVTmQEuUXJOyqXZsb7tju33lFFAUUUUC3EUkaJ1iYLIVIVjvpJ78d+OePVWC41wa9urWCNBNpunE92TIFZUOnEBDnsqEODgHJi3Ulya+jUUGetuh1uihddwQBgD6RIiqPMEjZEX8FFVfH/AOT2CeW1ljxGYZRISdbM2llbAYvkbKwxy7efsitrRQVp4SDcGcscdlgoYga1WRCxGcHKOBj9hTzAxLxzyaf2T/CadpLjnk0/sn+E0HyTh/F7kRRgTygaF/rG/VHrqfwzdekTe8b50UUB4ZuvSJveN86PDN16RN7xvnRRQHhm69Im943zo8M3XpE3vG+dFFAeGbr0ib3jfOjwzdekTe8b50UUB4ZuvSJveN86PDN16RN7xvnRRQHhm69Im943zoPGbr0ib3jfOiigPDN16RN7xvnQeM3XpE3vG+dFFAeGbr0ib3jfOjwzdekTe8b50UUAOM3XpE3vG+dHhm69Im943zoooAcZuvSJveN86PDN16RN7xvnRRQHhm69Im943zo8M3XpE3vG+dFFAeGbr0ib3jfOjwzdekTe8b50UUB4ZuvSJveN86PDN16RN7xvnRRQHhm69Im943zo8M3XpE3vG+dFFAeGbr0ib3jfOoOIcXuTFKDPLjQ39Y36p9dFFB//2Q==)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vLjAlrlvEyLn",
"colab_type": "text"
},
"source": [
"We ran our predictions on the image but how do we get the data back into Earth Engine? We do this by writing the data to storage and then uploading the data. We have to write a little code to write the data and create a TF example from the numpy arrays."
]
},
{
"cell_type": "code",
"metadata": {
"id": "49c35TCiFisP",
"colab_type": "code",
"colab": {}
},
"source": [
"def make_example(pred):\n",
" probs = pred.flatten()\n",
" example = tf.train.Example(\n",
" features=tf.train.Features(\n",
" feature={\n",
" 'probability': tf.train.Feature(\n",
" float_list=tf.train.FloatList(\n",
" value=probs))\n",
" }\n",
" )\n",
" )\n",
" return example\n",
"\n",
"# set this number to prevent massive file sizes\n",
"MAX_RECORDS_PER_FILE = 50\n",
"output_path = 'gs://{0}/{1}/silvacarbon_tf_forest_prediction_{2:05}.tfrecord'\n",
"\n",
"# Create the records we'll ingest into EE\n",
"file_number = 0\n",
"still_writing = True\n",
"total_patches = 0\n",
"while still_writing:\n",
" file_path = output_path.format(silvacarbonBucket,myfolder,file_number)\n",
" writer = tf.io.TFRecordWriter(file_path)\n",
" print(\"Writing file: {}\".format(file_path))\n",
" try:\n",
" written_records = 0\n",
" while True:\n",
" pred_arr = predictions[total_patches]\n",
"\n",
" writer.write(make_example(pred_arr).SerializeToString())\n",
"\n",
" written_records += 1 \n",
" total_patches += 1\n",
"\n",
" if written_records % 25 == 0:\n",
" print(\" Writing patch: {}\".format(written_records))\n",
"\n",
" if written_records == MAX_RECORDS_PER_FILE:\n",
" break\n",
" except Exception as e: \n",
" print(e)\n",
" # Stop writing for any exception. Note that reaching the end of the prediction\n",
" # dataset throws an exception.\n",
" still_writing=False\n",
" finally:\n",
" file_number += 1\n",
" writer.close()\n",
" \n",
"print('Wrote: {0} patches to {1} files'.format(total_patches,file_number))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7BxrbOsjFP46",
"colab_type": "text"
},
"source": [
"Data has been written and we just need to kick off an upload task. We define what the asset ID should be and which files to upload (the data we just wrote)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "pvLhNTyJKTwK",
"colab_type": "code",
"colab": {}
},
"source": [
"# need to change the `toAssetId` variable to an asset folder you can write to\n",
"toAssetId = 'users/kelmarkert/public/silvacarbon_tf_forest_prediction'\n",
"\n",
"# get a formated string of all the tfrecords to upload\n",
"tfPredictions = ' '.join(['gs://{0}/{1}/silvacarbon_tf_forest_prediction_{2:05}.tfrecord'.format(silvacarbonBucket,myfolder,i) for i in range(file_number)])\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "YH25evqmNDQI",
"colab_type": "code",
"colab": {}
},
"source": [
"# kick off an upload process\n",
"!earthengine --no-use_cloud_api upload image --asset_id={toAssetId} {tfPredictions} {mixerFile}"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "QqI2IuH-NJ4H",
"colab_type": "code",
"colab": {}
},
"source": [
"pprint(ee.batch.Task.list()[0])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "N32QfEoYdUyB",
"colab_type": "text"
},
"source": [
"You should now have an image in your asset folder that is the probablity of forest as predicted by our model! [Here](https://code.earthengine.google.com/7ab04f2c0f42105114afa33f4c7a0eac) is an example of the final output on Earth Engine"
]
},
{
"cell_type": "code",
"metadata": {
"id": "LPtCvizCh6PI",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"#@title Congratulations, you used EE + TF!!!\n",
"\n",
"%%html\n",
"<div style=\"width:100%;\">\n",
" <p style=\"font-size:30px;text-align:center\"> How you feel after successfully running TensorFlow with Earth Engine </p>\n",
" <img style=\"width:50%;margin-left:auto;margin-right:auto;display:block;\" src=\"https://img.buzzfeed.com/buzzfeed-static/static/2019-06/29/17/asset/f36f57e98c55/sub-buzz-691-1561828556-1.jpg\"></img>\n",
"</div>"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "zp75G5nhOG4I",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}