{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "smap_l2_9km_gridding.ipynb",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"toc_visible": true,
"authorship_tag": "ABX9TyN3lorlzs3LpQwl35Nenbie",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KMarkert/16902163c52e587436e473587b587f52/smap_l2_9km_gridding.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ASo2MzXPgJsn",
"colab_type": "text"
},
"source": [
"## Gridding SMAP level 2 Soil Moisture data\n",
"\n",
"This notebook example shows how one might access the 9km SMAP level 2 data, read the dataset, grid the data to a common projection, and save the data as a time series dataset.\n",
"\n",
"Total time ~10 minutes"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ioqeIlbzzwLE",
"colab_type": "code",
"colab": {}
},
"source": [
"# install a package used to handle projections\n",
"!pip install pyproj h5netcdf --quiet"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jvJ4RZoqq40h",
"colab_type": "code",
"colab": {}
},
"source": [
"# import packages\n",
"\n",
"%pylab inline\n",
"\n",
"import os\n",
"import glob\n",
"import h5py\n",
"import datetime\n",
"import xarray as xr\n",
"from pyproj import Transformer\n",
"from scipy.interpolate import griddata"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2KzAu9sGLZUP",
"colab_type": "code",
"colab": {}
},
"source": [
"# list of urls to access data\n",
"# this list can be created from search.earthdata.nasa.gov\n",
"\n",
"urls = [\n",
" 'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01056_D_20150413T184423_R13080_001.h5',\n",
" 'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01057_D_20150413T202249_R13080_001.h5',\n",
" 'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01058_D_20150413T220114_R13080_001.h5'\n",
"]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "23zUA5OcPU_K",
"colab_type": "code",
"colab": {}
},
"source": [
"# download each url locally ... very simple fetching\n",
"# change the username and password to your NASA EarthData credentials\n",
"for url in urls:\n",
" os.system(f'wget --http-user=<EARTHDATA_USER> --http-password=<EARTHDATA_PASS> {url}')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "vOucoQhcPecZ",
"colab_type": "code",
"colab": {}
},
"source": [
"# list the files downloaded\n",
"h5Files = glob.glob('./*.h5')"
],
"execution_count": null,
"outputs": []
},
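{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick optional sanity check (added as an illustration, not part of the original workflow): print the names and sizes of the files that `glob` found, so a failed or empty download is caught before the gridding starts."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional check: confirm the expected files exist and are not empty\n",
"for f in h5Files:\n",
"    print(f, os.path.getsize(f), 'bytes')"
],
"execution_count": null,
"outputs": []
},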
{
"cell_type": "code",
"metadata": {
"id": "uCl4N1VJQlLJ",
"colab_type": "code",
"colab": {}
},
"source": [
"# open up one dataset and see what is going on insided\n",
"ds = h5py.File(h5Files[0],'r')\n",
"list(ds.keys())"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "nvyXZ96CQzGB",
"colab_type": "code",
"colab": {}
},
"source": [
"# looks like there are a few groups so we will see what is in the 'Soil_Moisture_Retrieval_Data'\n",
"sub = ds['Soil_Moisture_Retrieval_Data']\n",
"list(sub.keys())\n"
],
"execution_count": null,
"outputs": []
},
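{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, peek at the metadata attached to the `soil_moisture` variable itself. Which attributes are present (units, valid range, fill value, etc.) depends on the product version, so this simply dumps whatever is there."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: dump the HDF5 attributes stored on the soil_moisture variable\n",
"# (the exact attribute names depend on the product version)\n",
"dict(sub['soil_moisture'].attrs)"
],
"execution_count": null,
"outputs": []
},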
{
"cell_type": "code",
"metadata": {
"id": "xSILptjumCsC",
"colab_type": "code",
"colab": {}
},
"source": [
"ds.close() # close file...don't want to leave it open"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "5RYMkp6whWd5",
"colab_type": "text"
},
"source": [
"Lots of variables...we will focus on the geographic variables and `soil_moisture`. The following workflow will work for all variables but will need to be reworked to grid all variables and save as a dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eOfIFhoshtkE",
"colab_type": "text"
},
"source": [
"### Gridding process\n",
"\n",
"The SMAP data is a 1-d array of values with an associated geographic coordinate. This format isn't very friendly for GIS applications so we are going to take the 1-d structure and convert it into gridded raster dataset. Here is the overall workflow for the gridding:\n",
"1. Define projection information\n",
"2. Extract the data for each file\n",
"3. Grid the data to the native projection\n",
"4. Merge multiple timesteps to one 3-d dataset (y,x,t)\n",
"5. Save the data to a more friendly format"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uhu80dH1anFl",
"colab_type": "code",
"colab": {}
},
"source": [
"# here we define the projection information\n",
"# the SMAP data's default projection is EASE2.0 (EPSG:6993)\n",
"# Luckily this is a nice projection to work with and the projection col,row info is defined in the SMAP data\n",
"\n",
"\n",
"# we will define the under lying grid to project the 1-d data to\n",
"# information taken from https://nsidc.org/ease/ease-grid-projection-gt\n",
"projOrigin = (-17367530.45,7314540.83) # x,y\n",
"projDims = (3856,1624) # cols, rows\n",
"projRes = (9008.05, 9008.05) # xResolution, yResolution\n",
"\n",
"# create arrays of the row,col information\n",
"col = np.arange(0,projDims[0],1).astype(np.int16)\n",
"row = np.arange(0,projDims[1],1).astype(np.int16)[::-1]\n",
"\n",
"# make a 2-d array of the column and row indices\n",
"xx,yy = np.meshgrid(col,row)"
],
"execution_count": null,
"outputs": []
},
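{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustrative check (not in the original notebook), here is how a single EASE-Grid column/row index relates to a projected x/y coordinate: x increases eastward from the grid origin by one cell size per column, and y decreases southward by one cell size per row. This is the basic idea behind the projected coordinate arrays built a couple of cells below; the helper name `ease_index_to_xy` is just for illustration."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# illustration only: map an EASE-Grid 2.0 column/row index to a projected x/y coordinate\n",
"# (treating the grid origin as the coordinate of column 0, row 0)\n",
"def ease_index_to_xy(col_idx, row_idx):\n",
"    x = projOrigin[0] + col_idx * projRes[0]\n",
"    y = projOrigin[1] - row_idx * projRes[1]\n",
"    return x, y\n",
"\n",
"ease_index_to_xy(0, 0), ease_index_to_xy(projDims[0] - 1, projDims[1] - 1)"
],
"execution_count": null,
"outputs": []
},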
{
"cell_type": "code",
"metadata": {
"id": "PEDLkG3RZRkF",
"colab_type": "code",
"colab": {}
},
"source": [
"# here is where we loop over the files read the data and perform the gridding\n",
"\n",
"# define empty lists to dump the gridded data and time info to\n",
"data = []\n",
"times = []\n",
"refTime = datetime.datetime(2000,1,1,0,0,0)\n",
"# loop over the files\n",
"for f in h5Files:\n",
" # open dataset and access the specific data we want\n",
" ds = h5py.File(f,'r')\n",
" sub = ds['Soil_Moisture_Retrieval_Data']\n",
" yidx = np.array(sub['EASE_row_index']) # get the row info\n",
" xidx = np.array(sub['EASE_column_index']) # get the col info\n",
" sm = np.array(sub['soil_moisture']) # get the soil moisture array\n",
" # change the sm variable access to a different variable from the earlier list if you would like\n",
" ds.close() # close file\n",
"\n",
" # perfrom the gridding based on the EASE col and row\n",
" smGridded = griddata(np.stack([yidx,xidx],axis=-1), sm, (yy,xx),fill_value=-9999.0,method='linear')\n",
"\n",
" # get the time infomation as string from the filename\n",
" t = datetime.datetime.strptime(f.split('_')[-3], '%Y%m%dT%H%M%S' )# convert string to datetime info while appending\n",
" \n",
" data.append(smGridded)\n",
" times.append((t-refTime).total_seconds()) "
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JNT37rcpbiq7",
"colab_type": "code",
"colab": {}
},
"source": [
"# the previous geographic information we used was a column or row index from the EASE2.0 projection\n",
"# we actually want the lat/lon coordinates from the EASE2.0 projection in the final file\n",
"# here we create the EASE coordinates as numpy arrays and grids\n",
"projX = np.arange(projOrigin[0],projOrigin[0]+((projDims[0])*projRes[0]),projRes[0]).astype(np.float32)\n",
"projY = np.arange(projOrigin[1]-((projDims[1])*projRes[1]),projOrigin[1],projRes[1]).astype(np.float32)\n",
"\n",
"xxProj,yyProj = np.meshgrid(projX,projY)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "BSoWRaqjbms7",
"colab_type": "code",
"colab": {}
},
"source": [
"# convert the projection grid to lat/lon coordinates\n",
"# using proj to reproject EASE coordinates to lat/lon\n",
"transformer = Transformer.from_crs(\"epsg:6933\", \"epsg:4326\",always_xy=True)\n",
"projLon,projLat = transformer.transform(xxProj,yyProj)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "KCXD6aZaAlOW",
"colab_type": "code",
"colab": {}
},
"source": [
"imshow(projLat)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "X4FerpqEAnh2",
"colab_type": "code",
"colab": {}
},
"source": [
"all(projLon[0,:] == projLon[700,:])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ixDRtlCEEWA-",
"colab_type": "code",
"colab": {}
},
"source": [
"# here we prep our data a little further before creating a 3-d dataset\n",
"\n",
"# we stack the 2-d soil moisture data along a third axis here using np.dstack (third dim is time)\n",
"data3D = np.dstack(data)\n",
"\n",
"# we extract out the gridded lat/long data to a 1-d array\n",
"# we can do this because the lat/long values are the same across the globe based on the projection\n",
"# this is a simple step that can save valuable storage space\n",
"xcoord = projLon[0,:].astype(np.float32)\n",
"ycoord = projLat[:,0].astype(np.float32)\n",
"times = np.array(times,dtype=np.int32)"
],
"execution_count": null,
"outputs": []
},
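{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another optional check before assembling the dataset: the stacked array should have a `(latitude, longitude, time)` layout whose first two dimensions match the lengths of the 1-d coordinate arrays."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: the 3-d array is (rows, cols, timesteps) and should line up with the coordinate lengths\n",
"print(data3D.shape, ycoord.size, xcoord.size, times.size)"
],
"execution_count": null,
"outputs": []
},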
{
"cell_type": "code",
"metadata": {
"id": "OReG8W3TbuO_",
"colab_type": "code",
"colab": {}
},
"source": [
"# now that we have all of the data, we just need to combine it\n",
"# we are going to use xarray (my personal favorite for handling this kind of data)\n",
"# we specify the 3-d data and the coordinates\n",
"\n",
"ds = xr.Dataset({'soil_moisture': (['latitude', 'longitude', 'time'], data3D )},\n",
" coords={'latitude': ycoord,\n",
" 'longitude': xcoord,\n",
" 'time': times})\n",
"\n",
"# # mask no data information - valid range of soil moisture data from SMAP is 0-1\n",
"ds = ds.where((ds >= 0) & (ds <= 1)).astype('float32')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-3ZcxAKHKsuW",
"colab_type": "code",
"colab": {}
},
"source": [
"ds.soil_moisture.attrs = {'long_name':'Representative soil moisture measurement for the Earth based grid cell','units':'cm**3/cm**3'}\n",
"ds.latitude.attrs = {'long_name':'Latitude','units':'degrees_north'}\n",
"ds.longitude.attrs = {'long_name':'Longitude','units':'degrees_east'}\n",
"ds.time.attrs = {'calendar':'proleptic_gregorian','units':'seconds since 2000-01-01'}"
],
"execution_count": null,
"outputs": []
},
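{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `time` is stored as integer seconds with CF-style `units` and `calendar` attributes, it can optionally be decoded into `datetime64` values for readability. This is an added convenience step, not part of the original workflow; `xr.decode_cf` should interpret the attributes set above, and the decoded copy is only used for display here."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: decode the CF time attributes into datetime64 values for display\n",
"# (the integer-seconds encoding is kept on ds itself for saving)\n",
"ds_decoded = xr.decode_cf(ds)\n",
"ds_decoded.time.values"
],
"execution_count": null,
"outputs": []
},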
{
"cell_type": "code",
"metadata": {
"id": "paPGdzeSdJTI",
"colab_type": "code",
"colab": {}
},
"source": [
"# check what is going on in the xarray dataset\n",
"ds"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2uxyE1kKdUCn",
"colab_type": "code",
"colab": {}
},
"source": [
"# plot the three different time steps\n",
"ds.soil_moisture.plot(col='time',figsize=(15,5))\n",
"plt.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "SObWh92pC7Dt",
"colab_type": "code",
"colab": {}
},
"source": [
"# before we save out the file we are going to \"chunk\" the dataset\n",
"# chunking breaks the data along a dimension into smaller pieces\n",
"# this increases read time but can *significantly* reduce file sizes\n",
"# we will chunk along lat/long so that internally we have 256x256 tiles\n",
"ds = ds.chunk({'latitude': 256, 'longitude': 256,'time':1})"
],
"execution_count": null,
"outputs": []
},
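{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick look at the chunk layout that results, mainly to confirm the 256x256 spatial tiles (the edge chunks along each dimension will be smaller)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# inspect the chunk sizes per dimension after chunking\n",
"ds.chunks"
],
"execution_count": null,
"outputs": []
},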
{
"cell_type": "code",
"metadata": {
"id": "C6ppHrr4Lst9",
"colab_type": "code",
"colab": {}
},
"source": [
"# the final thing we do (I know there has been a lot of data wrangling...)\n",
"# is to sort by time (sometimes it becomes unsorted when saving to disk)\n",
"ds = ds.sortby('time')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "God5m-ahdbpi",
"colab_type": "code",
"colab": {}
},
"source": [
"# save the dataset as a gridded netcdf file\n",
"# we provide some encoding information to help with compression too\n",
"ds.to_netcdf('gridded_smap.nc',engine='h5netcdf',\n",
" encoding={'soil_moisture':{'_FillValue':-9999, # data encoding\n",
" 'zlib':True,'complevel':9,'shuffle':True} # compression encoding\n",
" })"
],
"execution_count": null,
"outputs": []
},
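{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an added verification step (not in the original notebook), reopen the file that was just written to make sure it round-trips cleanly; on read, xarray should decode the CF time units set earlier back into datetimes."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: reopen the saved netCDF to verify it round-trips\n",
"check = xr.open_dataset('gridded_smap.nc', engine='h5netcdf')\n",
"check"
],
"execution_count": null,
"outputs": []
},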
{
"cell_type": "markdown",
"metadata": {
"id": "KoV63z4mkQM3",
"colab_type": "text"
},
"source": [
"So we accessed the SMAP data, performed some gridding and saved the file to a more GIS friendly format. Ideally, we would carry over the metadata from the HDF5 file, and use more variables from the original file but this example is a start for converting the data."
]
},
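{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of what the metadata carry-over could look like, assuming the attributes of interest are simple scalars or strings attached to the HDF5 root group. SMAP products store much of their metadata in a dedicated `Metadata` group instead, so the group to read from (and which attributes to keep) would need to be adjusted for a real conversion."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch only: copy simple root-level HDF5 attributes onto the xarray dataset\n",
"# (SMAP files keep much of their metadata in a 'Metadata' group, so adjust the source group as needed)\n",
"with h5py.File(h5Files[0], 'r') as src:\n",
"    for key, value in src.attrs.items():\n",
"        if isinstance(value, bytes):\n",
"            value = value.decode()\n",
"        if np.isscalar(value):\n",
"            ds.attrs[key] = value"
],
"execution_count": null,
"outputs": []
},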
{
"cell_type": "code",
"metadata": {
"id": "4BzTeoQMOG-x",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}