{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "smap_l2_9km_gridding.ipynb",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"toc_visible": true,
"authorship_tag": "ABX9TyN3lorlzs3LpQwl35Nenbie",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KMarkert/16902163c52e587436e473587b587f52/smap_l2_9km_gridding.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ASo2MzXPgJsn",
"colab_type": "text"
},
"source": [
"## Gridding SMAP level 2 Soil Moisture data\n",
"\n",
"This notebook example shows how one might access the 9km SMAP level 2 data, read the dataset, grid the data to a common projection, and save the data as a time series dataset.\n",
"\n",
"Total time ~10 minutes"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ioqeIlbzzwLE",
"colab_type": "code",
"colab": {}
},
"source": [
"# install a package used to handle projections\n",
"!pip install pyproj h5netcdf --quiet"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jvJ4RZoqq40h",
"colab_type": "code",
"colab": {}
},
"source": [
"# import packages\n",
"\n",
"%pylab inline\n",
"\n",
"import os\n",
"import glob\n",
"import h5py\n",
"import datetime\n",
"import xarray as xr\n",
"from pyproj import Transformer\n",
"from scipy.interpolate import griddata"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2KzAu9sGLZUP",
"colab_type": "code",
"colab": {}
},
"source": [
"# list of urls to access data\n",
"# this list can be created from search.earthdata.nasa.gov\n",
"\n",
"urls = [\n",
" 'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01056_D_20150413T184423_R13080_001.h5',\n",
" 'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01057_D_20150413T202249_R13080_001.h5',\n",
" 'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01058_D_20150413T220114_R13080_001.h5'\n",
"]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "23zUA5OcPU_K",
"colab_type": "code",
"colab": {}
},
"source": [
"# download each url locally ... very simple fetching\n",
"# change the username and password to your NASA EarthData credentials\n",
"for url in urls:\n",
" os.system(f'wget --http-user=<EARTHDATA_USER> --http-password=<EARTHDATA_PASS> {url}')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "vOucoQhcPecZ",
"colab_type": "code",
"colab": {}
},
"source": [
"# list the files downloaded\n",
"h5Files = glob.glob('./*.h5')"
],
"execution_count": null,
"outputs": []
},
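{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick optional sanity check (added as an illustration, not part of the original workflow): print the names and sizes of the files that `glob` found, so a failed or empty download is caught before the gridding starts."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional check: confirm the expected files exist and are not empty\n",
"for f in h5Files:\n",
"    print(f, os.path.getsize(f), 'bytes')"
],
"execution_count": null,
"outputs": []
},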
{
"cell_type": "code",
"metadata": {
"id": "uCl4N1VJQlLJ",
"colab_type": "code",
"colab": {}
},
"source": [
"# open up one dataset and see what is going on insided\n",
"ds = h5py.File(h5Files[0],'r')\n",
"list(ds.keys())"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "nvyXZ96CQzGB",
"colab_type": "code",
"colab": {}
},
"source": [
"# looks like there are a few groups so we will see what is in the 'Soil_Moisture_Retrieval_Data'\n",
"sub = ds['Soil_Moisture_Retrieval_Data']\n",
"list(sub.keys())\n"
],
"execution_count": null,
"outputs": []
},
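{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, peek at the metadata attached to the `soil_moisture` variable itself. Which attributes are present (units, valid range, fill value, etc.) depends on the product version, so this simply dumps whatever is there."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: dump the HDF5 attributes stored on the soil_moisture variable\n",
"# (the exact attribute names depend on the product version)\n",
"dict(sub['soil_moisture'].attrs)"
],
"execution_count": null,
"outputs": []
},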
{
"cell_type": "code",
"metadata": {
"id": "xSILptjumCsC",
"colab_type": "code",
"colab": {}
},
"source": [
"ds.close() # close file...don't want to leave it open"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "5RYMkp6whWd5",
"colab_type": "text"
},
"source": [
"Lots of variables...we will focus on the geographic variables and `soil_moisture`. The following workflow will work for all variables but will need to be reworked to grid all variables and save as a dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eOfIFhoshtkE",
"colab_type": "text"
},
"source": [
"### Gridding process\n",
"\n",
"The SMAP data is a 1-d array of values with an associated geographic coordinate. This format isn't very friendly for GIS applications so we are going to take the 1-d structure and convert it into gridded raster dataset. Here is the overall workflow for the gridding:\n",
"1. Define projection information\n",
"2. Extract the data for each file\n",
"3. Grid the data to the native projection\n",
"4. Merge multiple timesteps to one 3-d dataset (y,x,t)\n",
"5. Save the data to a more friendly format"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uhu80dH1anFl",
"colab_type": "code",
"colab": {}
},
"source": [
"# here we define the projection information\n",
"# the SMAP data's default projection is EASE2.0 (EPSG:6993)\n",
"# Luckily this is a nice projection to work with and the projection col,row info is defined in the SMAP data\n",
"\n",
"\n",
"# we will define the under lying grid to project the 1-d data to\n",
"# information taken from https://nsidc.org/ease/ease-grid-projection-gt\n",
"projOrigin = (-17367530.45,7314540.83) # x,y\n",
"projDims = (3856,1624) # cols, rows\n",
"projRes = (9008.05, 9008.05) # xResolution, yResolution\n",
"\n",
"# create arrays of the row,col information\n",
"col = np.arange(0,projDims[0],1).astype(np.int16)\n",
"row = np.arange(0,projDims[1],1).astype(np.int16)[::-1]\n",
"\n",
"# make a 2-d array of the column and row indices\n",
"xx,yy = np.meshgrid(col,row)"
],
"execution_count": null,
"outputs": []
},
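{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustrative check (not in the original notebook), here is how a single EASE-Grid column/row index relates to a projected x/y coordinate: x increases eastward from the grid origin by one cell size per column, and y decreases southward by one cell size per row. This is the basic idea behind the projected coordinate arrays built a couple of cells below; the helper name `ease_index_to_xy` is just for illustration."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# illustration only: map an EASE-Grid 2.0 column/row index to a projected x/y coordinate\n",
"# (treating the grid origin as the coordinate of column 0, row 0)\n",
"def ease_index_to_xy(col_idx, row_idx):\n",
"    x = projOrigin[0] + col_idx * projRes[0]\n",
"    y = projOrigin[1] - row_idx * projRes[1]\n",
"    return x, y\n",
"\n",
"ease_index_to_xy(0, 0), ease_index_to_xy(projDims[0] - 1, projDims[1] - 1)"
],
"execution_count": null,
"outputs": []
},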
{
"cell_type": "code",
"metadata": {
"id": "PEDLkG3RZRkF",
"colab_type": "code",
"colab": {}
},
"source": [
"# here is where we loop over the files read the data and perform the gridding\n",
"\n",
"# define empty lists to dump the gridded data and time info to\n",
"data = []\n",
"times = []\n",
"refTime = datetime.datetime(2000,1,1,0,0,0)\n",
"# loop over the files\n",
"for f in h5Files:\n",
" # open dataset and access the specific data we want\n",
" ds = h5py.File(f,'r')\n",
" sub = ds['Soil_Moisture_Retrieval_Data']\n",
" yidx = np.array(sub['EASE_row_index']) # get the row info\n",
" xidx = np.array(sub['EASE_column_index']) # get the col info\n",
" sm = np.array(sub['soil_moisture']) # get the soil moisture array\n",
" # change the sm variable access to a different variable from the earlier list if you would like\n",
" ds.close() # close file\n",
"\n",
" # perfrom the gridding based on the EASE col and row\n",
" smGridded = griddata(np.stack([yidx,xidx],axis=-1), sm, (yy,xx),fill_value=-9999.0,method='linear')\n",
"\n",
" # get the time infomation as string from the filename\n",
" t = datetime.datetime.strptime(f.split('_')[-3], '%Y%m%dT%H%M%S' )# convert string to datetime info while appending\n",
" \n",
" data.append(smGridded)\n",
" times.append((t-refTime).total_seconds()) "
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JNT37rcpbiq7",
"colab_type": "code",
"colab": {}
},
"source": [
"# the previous geographic information we used was a column or row index from the EASE2.0 projection\n",
"# we actually want the lat/lon coordinates from the EASE2.0 projection in the final file\n",
"# here we create the EASE coordinates as numpy arrays and grids\n",
"projX = np.arange(projOrigin[0],projOrigin[0]+((projDims[0])*projRes[0]),projRes[0]).astype(np.float32)\n",
"projY = np.arange(projOrigin[1]-((projDims[1])*projRes[1]),projOrigin[1],projRes[1]).astype(np.float32)\n",
"\n",
"xxProj,yyProj = np.meshgrid(projX,projY)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "BSoWRaqjbms7",
"colab_type": "code",
"colab": {}
},
"source": [
"# convert the projection grid to lat/lon coordinates\n",
"# using proj to reproject EASE coordinates to lat/lon\n",
"transformer = Transformer.from_crs(\"epsg:6933\", \"epsg:4326\",always_xy=True)\n",
"projLon,projLat = transformer.transform(xxProj,yyProj)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "KCXD6aZaAlOW",
"colab_type": "code",
"colab": {}
},
"source": [
"imshow(projLat)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "X4FerpqEAnh2",
"colab_type": "code",
"colab": {}
},
"source": [
"all(projLon[0,:] == projLon[700,:])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ixDRtlCEEWA-",
"colab_type": "code",
"colab": {}
},
"source": [
"# here we prep our data a little further before creating a 3-d dataset\n",
"\n",
"# we stack the 2-d soil moisture data along a third axis here using np.dstack (third dim is time)\n",
"data3D = np.dstack(data)\n",
"\n",
"# we extract out the gridded lat/long data to a 1-d array\n",
"# we can do this because the lat/long values are the same across the globe based on the projection\n",
"# this is a simple step that can save valuable storage space\n",
"xcoord = projLon[0,:].astype(np.float32)\n",
"ycoord = projLat[:,0].astype(np.float32)\n",
"times = np.array(times,dtype=np.int32)"
],
"execution_count": null,
"outputs": []
},
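{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another optional check before assembling the dataset: the stacked array should have a `(latitude, longitude, time)` layout whose first two dimensions match the lengths of the 1-d coordinate arrays."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: the 3-d array is (rows, cols, timesteps) and should line up with the coordinate lengths\n",
"print(data3D.shape, ycoord.size, xcoord.size, times.size)"
],
"execution_count": null,
"outputs": []
},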
{
"cell_type": "code",
"metadata": {
"id": "OReG8W3TbuO_",
"colab_type": "code",
"colab": {}
},
"source": [
"# now that we have all of the data, we just need to combine it\n",
"# we are going to use xarray (my personal favorite for handling this kind of data)\n",
"# we specify the 3-d data and the coordinates\n",
"\n",
"ds = xr.Dataset({'soil_moisture': (['latitude', 'longitude', 'time'], data3D )},\n",
" coords={'latitude': ycoord,\n",
" 'longitude': xcoord,\n",
" 'time': times})\n",
"\n",
"# # mask no data information - valid range of soil moisture data from SMAP is 0-1\n",
"ds = ds.where((ds >= 0) & (ds <= 1)).astype('float32')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-3ZcxAKHKsuW",
"colab_type": "code",
"colab": {}
},
"source": [
"ds.soil_moisture.attrs = {'long_name':'Representative soil moisture measurement for the Earth based grid cell','units':'cm**3/cm**3'}\n",
"ds.latitude.attrs = {'long_name':'Latitude','units':'degrees_north'}\n",
"ds.longitude.attrs = {'long_name':'Longitude','units':'degrees_east'}\n",
"ds.time.attrs = {'calendar':'proleptic_gregorian','units':'seconds since 2000-01-01'}"
],
"execution_count": null,
"outputs": []
},
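{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `time` is stored as integer seconds with CF-style `units` and `calendar` attributes, it can optionally be decoded into `datetime64` values for readability. This is an added convenience step, not part of the original workflow; `xr.decode_cf` should interpret the attributes set above, and the decoded copy is only used for display here."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: decode the CF time attributes into datetime64 values for display\n",
"# (the integer-seconds encoding is kept on ds itself for saving)\n",
"ds_decoded = xr.decode_cf(ds)\n",
"ds_decoded.time.values"
],
"execution_count": null,
"outputs": []
},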
{
"cell_type": "code",
"metadata": {
"id": "paPGdzeSdJTI",
"colab_type": "code",
"colab": {}
},
"source": [
"# check what is going on in the xarray dataset\n",
"ds"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2uxyE1kKdUCn",
"colab_type": "code",
"colab": {}
},
"source": [
"# plot the three different time steps\n",
"ds.soil_moisture.plot(col='time',figsize=(15,5))\n",
"plt.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "SObWh92pC7Dt",
"colab_type": "code",
"colab": {}
},
"source": [
"# before we save out the file we are going to \"chunk\" the dataset\n",
"# chunking breaks the data along a dimension into smaller pieces\n",
"# this increases read time but can *significantly* reduce file sizes\n",
"# we will chunk along lat/long so that internally we have 256x256 tiles\n",
"ds = ds.chunk({'latitude': 256, 'longitude': 256,'time':1})"
],
"execution_count": null,
"outputs": []
},
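{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick look at the chunk layout that results, mainly to confirm the 256x256 spatial tiles (the edge chunks along each dimension will be smaller)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# inspect the chunk sizes per dimension after chunking\n",
"ds.chunks"
],
"execution_count": null,
"outputs": []
},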
{
"cell_type": "code",
"metadata": {
"id": "C6ppHrr4Lst9",
"colab_type": "code",
"colab": {}
},
"source": [
"# the final thing we do (I know there has been a lot of data wrangling...)\n",
"# is to sort by time (sometimes it becomes unsorted when saving to disk)\n",
"ds = ds.sortby('time')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "God5m-ahdbpi",
"colab_type": "code",
"colab": {}
},
"source": [
"# save the dataset as a gridded netcdf file\n",
"# we provide some encoding information to help with compression too\n",
"ds.to_netcdf('gridded_smap.nc',engine='h5netcdf',\n",
" encoding={'soil_moisture':{'_FillValue':-9999, # data encoding\n",
" 'zlib':True,'complevel':9,'shuffle':True} # compression encoding\n",
" })"
],
"execution_count": null,
"outputs": []
},
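{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an added verification step (not in the original notebook), reopen the file that was just written to make sure it round-trips cleanly; on read, xarray should decode the CF time units set earlier back into datetimes."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# optional: reopen the saved netCDF to verify it round-trips\n",
"check = xr.open_dataset('gridded_smap.nc', engine='h5netcdf')\n",
"check"
],
"execution_count": null,
"outputs": []
},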
{
"cell_type": "markdown",
"metadata": {
"id": "KoV63z4mkQM3",
"colab_type": "text"
},
"source": [
"So we accessed the SMAP data, performed some gridding and saved the file to a more GIS friendly format. Ideally, we would carry over the metadata from the HDF5 file, and use more variables from the original file but this example is a start for converting the data."
]
},
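{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of what the metadata carry-over could look like, assuming the attributes of interest are simple scalars or strings attached to the HDF5 root group. SMAP products store much of their metadata in a dedicated `Metadata` group instead, so the group to read from (and which attributes to keep) would need to be adjusted for a real conversion."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch only: copy simple root-level HDF5 attributes onto the xarray dataset\n",
"# (SMAP files keep much of their metadata in a 'Metadata' group, so adjust the source group as needed)\n",
"with h5py.File(h5Files[0], 'r') as src:\n",
"    for key, value in src.attrs.items():\n",
"        if isinstance(value, bytes):\n",
"            value = value.decode()\n",
"        if np.isscalar(value):\n",
"            ds.attrs[key] = value"
],
"execution_count": null,
"outputs": []
},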
{
"cell_type": "code",
"metadata": {
"id": "4BzTeoQMOG-x",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}