{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/jdbcode/1923d6ff45e3885c5a2b986e424266ec/copy-of-earth-engine-data-converters.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"source": [
"#@title Copyright 2023 The Earth Engine Community Authors { display-mode: \"form\" }\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
],
"metadata": {
"id": "3Ep_IfWgIDHc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Earth Engine Data Converters AGU 2023\n",
"\n",
"Author: jdbcode\n",
"\n",
"Data converters are client-side conversion capabilities built into [`getPixels`](https://developers.google.com/earth-engine/apidocs/ee-data-getpixels), [`computePixels`](https://developers.google.com/earth-engine/apidocs/ee-data-computepixels), [`listFeatures`](https://developers.google.com/earth-engine/apidocs/ee-data-listfeatures), and [`computeFeatures`](https://developers.google.com/earth-engine/apidocs/ee-data-computefeatures). By specifying a compatible `fileFormat`, these methods can return data in Python-native formats like structured NumPy arrays for rasters and Pandas DataFrames or GeoPandas GeoDataFrames for vectors. In the case of vectors, the `listFeatures` and `computeFeatures` methods will make several network requests to fetch all the pages of the table before returning the Python object.\n",
"\n",
"All of these methods transfer data from Earth Engine servers to a client machine using the [interactive processing environment](https://developers.google.com/earth-engine/guides/processing_environments#interactive_environment), which is optimized for answering small requests quickly. As such, it enforces limits on request size and compute time. You'll need to keep this in mind as you're coding your analysis and decide whether exporting data using the [batch processing environment](https://developers.google.com/earth-engine/guides/processing_environments#batch_environment) would be better. For example, see `ee.date.computePixel` limits in the [reference docs](https://developers.google.com/earth-engine/reference/rest/v1/projects.image/computePixels).\n",
"\n",
"Some common use cases for data converters are fetching many small image tiles in parallel (e.g., training ML models or automated serial workflows) and for visualization and data exploration with your favorite Python libraries. This notebook focuses on data exploration and visualization; if you're interested in learning about fetching data in parallel, see the Medium blog post \"[Pixels to the people!](https://medium.com/google-earth/pixels-to-the-people-2d3c14a46da6)\".\n",
"\n",
"In this demo we'll use `computeFeatures` and `computePixels`. Also, at the end you'll see an example of using the 3rd party library `Xee` for requesting small chunks of data in the Xarray Dataset format."
],
"metadata": {
"id": "zH_duK7No78T"
}
},
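{
"cell_type": "markdown",
"source": [
"The interactive converters demonstrated below are best for small requests. When a request exceeds the interactive limits, the batch environment is the alternative; the next cell is a minimal sketch (not used elsewhere in this notebook) of starting a batch export to Drive with `ee.batch.Export`. The function name and parameter values are illustrative placeholders."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Illustrative sketch of the batch alternative to interactive requests.\n",
"# Assumes ee has been imported and initialized (see the Setup section below);\n",
"# the function name and parameter values are placeholders.\n",
"def export_image_to_drive(image, region):\n",
"    task = ee.batch.Export.image.toDrive(\n",
"        image=image,\n",
"        description='example_export',\n",
"        region=region,\n",
"        scale=1000,\n",
"        maxPixels=1e9,\n",
"    )\n",
"    task.start()\n",
"    return task"
],
"metadata": {},
"execution_count": null,
"outputs": []
},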
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Import libraries and authenticate to Earth Engine and initialize the API."
],
"metadata": {
"id": "rUSOJ7iBwQQq"
}
},
{
"cell_type": "code",
"source": [
"import altair as alt\n",
"import ee\n",
"import eerepr\n",
"import geopandas as gpd\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from mpl_toolkits.axes_grid1 import ImageGrid\n",
"import geemap\n",
"import geemap.colormaps as cm"
],
"metadata": {
"id": "n3MPcx_CwlbM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3-F8V3Ku7mHm"
},
"outputs": [],
"source": [
"ee.Authenticate()\n",
"ee.Initialize(project='YOUR-PROJECT-ID')"
]
},
{
"cell_type": "markdown",
"source": [
"# Data\n",
"\n",
"In this notebook we'll be looking at watersheds in Washington state (USA) and long-term climate averages."
],
"metadata": {
"id": "8i4YqQctWcmL"
}
},
{
"cell_type": "markdown",
"source": [
"Define asset paths for [basins](https://developers.google.com/earth-engine/datasets/catalog/WWF_HydroSHEDS_v1_Basins_hybas_6), [state boundaries](https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level1), and [climate averages](https://developers.google.com/earth-engine/datasets/catalog/WORLDCLIM_V1_MONTHLY)."
],
"metadata": {
"id": "o29zCdggSOCv"
}
},
{
"cell_type": "code",
"source": [
"BASINS_ID = 'WWF/HydroSHEDS/v1/Basins/hybas_6'\n",
"BOUNDARIES_ID = 'FAO/GAUL/2015/level1'\n",
"CLIMATE_ID = 'WORLDCLIM/V1/MONTHLY'"
],
"metadata": {
"id": "P8XV8eVDSLLw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Import the basins asset and subset watersheds that intersect Washington state. The result is a `ee.FeatureCollection`."
],
"metadata": {
"id": "_IbFYce5SZtX"
}
},
{
"cell_type": "code",
"source": [
"basins = ee.FeatureCollection(BASINS_ID)\n",
"wa = ee.FeatureCollection(BOUNDARIES_ID).filter(\n",
" 'ADM0_NAME == \"United States of America\" && '\n",
" 'ADM1_NAME == \"Washington\"'\n",
")\n",
"\n",
"wa_basins = basins.filterBounds(wa)"
],
"metadata": {
"id": "gpqewRSgCTMG"
},
"execution_count": null,
"outputs": []
},
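{
"cell_type": "markdown",
"source": [
"As a quick, optional sanity check, count the basins that intersect the state. The result is a server-side `ee.Number` that is fetched and displayed when the cell runs."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Number of basins intersecting Washington.\n",
"wa_basins.size()"
],
"metadata": {},
"execution_count": null,
"outputs": []
},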
{
"cell_type": "markdown",
"source": [
"Import the WorldClim climate image collection (each image is the average historical climate for a month), subset the precipitation band and stack the individual images into a single image (each band represents a historical monthly mean). Inspect the resulting `ee.Image` band names to see that bands are named like `prec_month_01` and `prec_month_02`, indicating mean precipitation for January and February, respectively."
],
"metadata": {
"id": "iau9Sav7SgTS"
}
},
{
"cell_type": "code",
"source": [
"precip = ee.ImageCollection(CLIMATE_ID).select('prec')\n",
"\n",
"months = precip.aggregate_array('month').getInfo()\n",
"\n",
"band_names = [f'prec_month_{str(m).zfill(2)}' for m in months]\n",
"\n",
"monthly_precip = ee.Image(precip.toBands().rename(band_names))\n",
"\n",
"monthly_precip.bandNames()"
],
"metadata": {
"id": "SV5DtihmE81A"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Calculate historical mean monthly precipitation for each Washington watershed. These zonal statistics are added as attributes to the `wa_basins` feature collection."
],
"metadata": {
"id": "-wjsFxcPU5Jr"
}
},
{
"cell_type": "code",
"source": [
"wa_basins = monthly_precip.reduceRegions(\n",
" collection=wa_basins,\n",
" reducer=ee.Reducer.mean(),\n",
" scale=1e3\n",
")"
],
"metadata": {
"id": "VJlSKrWoJGJR"
},
"execution_count": null,
"outputs": []
},
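{
"cell_type": "markdown",
"source": [
"As a quick check that the zonal statistics were added, list the property names of the first feature; the `prec_month_*` columns should now appear alongside the original HydroBASINS attributes."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Property names of the first basin feature (should include prec_month_01-12).\n",
"wa_basins.first().propertyNames()"
],
"metadata": {},
"execution_count": null,
"outputs": []
},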
{
"cell_type": "markdown",
"source": [
"Display the data in geemap for interactive exploring."
],
"metadata": {
"id": "LYXNc8pG-ld9"
}
},
{
"cell_type": "code",
"source": [
"m = geemap.Map()\n",
"m.center_object(wa, 7)\n",
"precip_vis = {'min': 0, 'max': 500, 'palette': 'YlGnBu'}\n",
"m.add_layer(monthly_precip.select('prec_month_01'), #\n",
" precip_vis, 'Jan precip (mm)')\n",
"m.add_layer(wa, {'color': 'blue'}, 'WA boundary')\n",
"m.add_layer(wa_basins, {'color': 'white'}, 'WA basins')\n",
"m.add_colorbar(precip_vis, position='bottomright', label=\"Precip (mm)\", orientation=\"horizontal\")\n",
"m"
],
"metadata": {
"id": "bonRPQM057vL"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Converters"
],
"metadata": {
"id": "6kEOrBCkWvPQ"
}
},
{
"cell_type": "markdown",
"source": [
"## `FeatureCollection` to Pandas `DataFrame`\n",
"\n",
"An `ee.FeatureCollection` is Earth Engine's table data type. Each `ee.Feature` in the collection can be thought of as a row and each of its properties as a column - one column stores the geometry. The EE API has a rich set of methods for working with feature collections, but feature collections are difficult to view as a table and you may prefer to use [Pandas](https://pandas.pydata.org/) for analysis. We can transfer the data client-side as a Pandas `DataFrame`.\n"
],
"metadata": {
"id": "JP5PYyTE49EU"
}
},
{
"cell_type": "markdown",
"source": [
"We can print the head of the Washington watersheds feature collection to preview the table, but the JSON structure makes it hard to interpret and conceptualize as a table (even with the help of `eerep` for rich object representation)."
],
"metadata": {
"id": "gv67sk9-qo3z"
}
},
{
"cell_type": "code",
"source": [
"wa_basins.limit(5)"
],
"metadata": {
"id": "f-5szE9oJd0A"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Use the `ee.data.computeFeatures` with the `fileFormat` parameter set to `'PANDAS_DATAFRAME'` to get the data as a Pandas `DataFrame`.\n",
"\n"
],
"metadata": {
"id": "elwIdkRSq7bm"
}
},
{
"cell_type": "code",
"source": [
"wa_basins_df = ee.data.computeFeatures({\n",
" 'expression': wa_basins,\n",
" 'fileFormat': 'PANDAS_DATAFRAME'\n",
"})"
],
"metadata": {
"id": "ht9YrKy6KyDl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Print the object's type and see that it is `pandas.core.frame.DataFrame\n",
"`. Printing the head of the object shows the nicely formatted table."
],
"metadata": {
"id": "2xxl6CWzrhWj"
}
},
{
"cell_type": "code",
"source": [
"display(type(wa_basins_df))\n",
"wa_basins_df.head()"
],
"metadata": {
"id": "mgInawXQrg7x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now we can use Pandas syntax to convert the wide table into a long table so that month is a factor and precipitation is a variable."
],
"metadata": {
"id": "hoJPYdHqsN52"
}
},
{
"cell_type": "code",
"source": [
"wa_basins_df = wa_basins_df.melt(\n",
" id_vars=[\"HYBAS_ID\"],\n",
" value_vars=band_names,\n",
" var_name=\"Month\",\n",
" value_name=\"Precipitation\",\n",
")\n",
"wa_basins_df"
],
"metadata": {
"id": "TvsP6iR5MyMq"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can use Pandas' [`groupby`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html) method and built-in [matplotlib charting wrappers](https://pandas.pydata.org/docs/user_guide/visualization.html#basic-plotting-plot) for a quick look at mean total annual precipition for each watershed."
],
"metadata": {
"id": "ZmoUAUwMn_mE"
}
},
{
"cell_type": "code",
"source": [
"wa_basins_df.groupby(['HYBAS_ID'])['Precipitation'].sum().plot.bar()"
],
"metadata": {
"id": "zFIQ1WV1gOu-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"There are lots of charting libraries for visualizing `DataFrame` objects. Here, we plot the data with [Altair](https://altair-viz.github.io/gallery/index.html#) using a stacked column chart to show the total mean annual precipitation (by monthly contribution) for each watershed intersecting Washington state. We can see quite a range in total precipitation and that summer months are generally drier than other months."
],
"metadata": {
"id": "nAvHPlG_s3oA"
}
},
{
"cell_type": "code",
"source": [
"alt.Chart(wa_basins_df).mark_bar().encode(\n",
" x=alt.X('HYBAS_ID:O'),\n",
" y=alt.Y('Precipitation:Q', title='Precipitation (mm)'),\n",
" color=alt.Color('Month', scale=alt.Scale(scheme='rainbow')),\n",
" tooltip=alt.Tooltip(['HYBAS_ID', 'Precipitation', 'Month'])\n",
").interactive()"
],
"metadata": {
"id": "AYDwssWoSgMR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## `FeatureCollection` to GeoPandas `GeoDataFrame`\n",
"\n",
"Use the `ee.data.computeFeatures` with the `fileFormat` parameter set to `'GEOPANDAS_GEODATAFRAME'` to get the Washington basins climate data as a [GeoPandas](https://geopandas.org/en/v0.14.0/index.html) `GeoDataFrame`. It allows you to use the table manipulation and querying functions of Pandas, in addition to geospatial operations and visualizations."
],
"metadata": {
"id": "Wq4Puu53aALt"
}
},
{
"cell_type": "code",
"source": [
"wa_basins_gdf = ee.data.computeFeatures({\n",
" 'expression': wa_basins,\n",
" 'fileFormat': 'GEOPANDAS_GEODATAFRAME'\n",
"})\n",
"\n",
"# Need to set the CRS.\n",
"# Make sure it matches the CRS of FeatureCollection geometries.\n",
"wa_basins_gdf.crs = 'EPSG:4326'\n",
"\n",
"display(type(wa_basins_gdf))\n",
"wa_basins_gdf.head()"
],
"metadata": {
"id": "6gkcmEYRCkXq"
},
"execution_count": null,
"outputs": []
},
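{
"cell_type": "markdown",
"source": [
"GeoPandas geometry operations are available directly on the `GeoDataFrame`. As a quick, illustrative check, project the basins to a metric CRS (EPSG:2856, also used later in this notebook) and compute each basin's area in square kilometers."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Quick look at basin areas (km^2) using GeoPandas geometry operations.\n",
"# EPSG:2856 is a projected CRS with units of meters.\n",
"basin_area_km2 = wa_basins_gdf.to_crs(2856).area / 1e6\n",
"basin_area_km2.head()"
],
"metadata": {},
"execution_count": null,
"outputs": []
},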
{
"cell_type": "markdown",
"source": [
"Fetch the Washington state boundary as `GeoDataFrame` too."
],
"metadata": {
"id": "kRb6E8Kqn5Az"
}
},
{
"cell_type": "code",
"source": [
"wa_gdf = ee.data.computeFeatures({\n",
" 'expression': wa,\n",
" 'fileFormat': 'GEOPANDAS_GEODATAFRAME'\n",
"})\n",
"\n",
"wa_gdf.crs = 'EPSG:4326'"
],
"metadata": {
"id": "VRsjLDgLlccc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Clip the watershed geometries by the Washington state boundary using the `GeoDataFrame.clip` method and convert the original mercator projection to a Washington-specific projection for better visualization."
],
"metadata": {
"id": "PUG3kUwvoJRO"
}
},
{
"cell_type": "code",
"source": [
"wa_basins_gdf = wa_basins_gdf.clip(wa_gdf).to_crs(2856)"
],
"metadata": {
"id": "UVxnlYs40Tsl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Sum the 12 months of mean precipitation for each watershed and append the values in a new column called `prec_total`."
],
"metadata": {
"id": "3OGJefNPoz0X"
}
},
{
"cell_type": "code",
"source": [
"wa_basins_gdf['prec_total'] = wa_basins_gdf[band_names].sum(axis=1)\n",
"wa_basins_gdf.head()"
],
"metadata": {
"id": "pqMHjXI6yrca"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Use [matplotlib](https://matplotlib.org/) to plot a choropleth map of mean annual precipitation by watershed and highlight the watershed with the minimum precipitation using a red border."
],
"metadata": {
"id": "B8spnY3Mpe_1"
}
},
{
"cell_type": "code",
"source": [
"# Define the choropleth map.\n",
"ax = wa_basins_gdf.plot(\n",
" column='prec_total',\n",
" cmap='viridis_r',\n",
" vmin=wa_basins_gdf['prec_total'].min(),\n",
" vmax=wa_basins_gdf['prec_total'].max(),\n",
" legend=False,\n",
" edgecolor='grey', linewidth=0.5\n",
")\n",
"\n",
"# Highlight the basin with the minimum annual precipitation: subset the geometry\n",
"# with the minimum precipitation total and then add it to the basin\n",
"# precipitation plot.\n",
"min_prec_gdf = wa_basins_gdf.loc[[wa_basins_gdf['prec_total'].idxmin()]]\n",
"min_prec_gdf.plot(ax=ax, color='none', edgecolor='red', linewidth=2)\n",
"\n",
"# Add axis labels, a colorbar, and rotate x axis ticks.\n",
"ax.set_xlabel('Eastings [m]')\n",
"ax.set_ylabel('Northings [m]')\n",
"colorbar = plt.colorbar(ax.get_children()[0], fraction=0.03)\n",
"colorbar.set_label('Precipitation (mm)')\n",
"plt.xticks(rotation=45)\n",
"\n",
"plt.show()"
],
"metadata": {
"id": "A_9g3plzEmci"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## `Image` to NumPy structured array\n",
"\n",
"Here we use `ee.data.computePixels` to request the `monthly_precip` computed Earth Engine image (each band is mean precipitation for a given month) as a [NumPy structured array](https://numpy.org/doc/stable/user/basics.rec.html). It is a global dataset, so we'll request only the Washington state basins bounding region at a resolution of 1500 meters. We can use the [`ee.Image.clipToBoundsAndScale`](https://developers.google.com/earth-engine/apidocs/ee-image-cliptoboundsandscale#colab-python) function to do this, which is a convenient alternative to supplying the `grid` argument."
],
"metadata": {
"id": "6fYvJO6o2vd0"
}
},
{
"cell_type": "code",
"source": [
"monthly_precip_washington = monthly_precip.clipToBoundsAndScale(\n",
" geometry=wa_basins, scale=1500\n",
")\n",
"\n",
"monthly_precip_npy = ee.data.computePixels({\n",
" 'expression': monthly_precip_washington,\n",
" 'fileFormat': 'NUMPY_NDARRAY'\n",
"})\n",
"\n",
"monthly_precip_npy"
],
"metadata": {
"id": "Zs3DLlxF1V-v"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"NumPy structured arrays work well for multiband image data. You can think of them as a table of arrays where each band is a column accessible from a field (band) name. It also permits each band to have a different data type.\n",
"\n",
"For example, get the list of field (band) names and then subset an array by name and print its shape and display a preview of it."
],
"metadata": {
"id": "Q-O7ZjcZ5EHQ"
}
},
{
"cell_type": "code",
"source": [
"names = monthly_precip_npy.dtype.names\n",
"print('field names:', names)\n",
"\n",
"prec_month_10_arr = monthly_precip_npy['prec_month_10']\n",
"print('Selected array (band) shape:', prec_month_10_arr.shape)\n",
"display(prec_month_10_arr)\n",
"plt.imshow(prec_month_10_arr, vmin=0, vmax=320)"
],
"metadata": {
"id": "gyPBhfG-Jh2I"
},
"execution_count": null,
"outputs": []
},
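{
"cell_type": "markdown",
"source": [
"As noted above, each field can carry its own data type; a quick look at the structured array's `dtype` shows the per-band types."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Per-field (per-band) data types of the structured array.\n",
"monthly_precip_npy.dtype"
],
"metadata": {},
"execution_count": null,
"outputs": []
},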
{
"cell_type": "markdown",
"source": [
"Since we have all months of mean precipitation, we can use the matplotlib [`ImageGrid`](https://matplotlib.org/stable/gallery/axes_grid1/simple_axesgrid.html) function to show a time series grid for simple visual interpolation of intra-annual precipitation patterns."
],
"metadata": {
"id": "Zr-idF22ALDA"
}
},
{
"cell_type": "code",
"source": [
"# Set up the figure and grid.\n",
"fig = plt.figure(figsize=(20.0, 20.0))\n",
"grid = ImageGrid(\n",
" fig,\n",
" 111,\n",
" nrows_ncols=(4, 3),\n",
" axes_pad=0.4,\n",
" cbar_mode=\"single\",\n",
" cbar_location=\"right\",\n",
" cbar_pad=0.4,\n",
" cbar_size=\"2%\",\n",
")\n",
"\n",
"# Display each band to a grid cell.\n",
"for ax, name in zip(grid, names):\n",
" ax.imshow(monthly_precip_npy[name], vmin=0, vmax=500)\n",
" ax.set_title(name)\n",
"\n",
"# Add colorbar.\n",
"colorbar = plt.colorbar(ax.get_children()[0], cax=grid[0].cax)\n",
"colorbar.set_label(\"Precipitation (mm)\")\n",
"\n",
"plt.show()"
],
"metadata": {
"id": "IY_TDTJhq7vT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## `Image` to Xarray Dataset\n",
"\n",
"Use the 3rd party library [`Xee`](https://github.com/google/Xee) to work with Earth Engine's data catalog data from Xarray."
],
"metadata": {
"id": "COUWMliEG7Sj"
}
},
{
"cell_type": "code",
"source": [
"!pip install --upgrade -q xee"
],
"metadata": {
"id": "79gJh_u1G_Oq"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import xarray as xr"
],
"metadata": {
"id": "gUkrO8XlHT9n"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ic = ee.ImageCollection('ECMWF/ERA5_LAND/HOURLY').filterDate('2020-06-01', '2020-07-01')\n",
"aoi = ee.Geometry.BBox(-125, 32, -113, 43)\n",
"ds = xr.open_dataset(\n",
" ic,\n",
" engine='ee',\n",
" crs='EPSG:4326',\n",
" scale=0.25,\n",
" geometry=aoi\n",
")"
],
"metadata": {
"id": "d4OCkylkHiX1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ds"
],
"metadata": {
"id": "FGYemRPjHjjn"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"air = ds.temperature_2m - 273.15\n",
"air.attrs = ds.temperature_2m.attrs\n",
"air.attrs['units'] = \"deg C\"\n",
"air"
],
"metadata": {
"id": "Sgnj5E_H-kgA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"air1d = air.sel(lat=37.77, lon=-122.42, method='nearest')\n",
"air1d.plot()"
],
"metadata": {
"id": "d1Qq8yup_dTC"
},
"execution_count": null,
"outputs": []
}
]
}