{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Grids Gist Demo.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true,
"authorship_tag": "ABX9TyPF7X5/5NN2YLhxpvUxoIFm",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/rileyhales/79761303df16127e0195e11425fc2d9d/grids-gist-demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HC_Qvdx98qfr"
},
"source": [
"# Grids Python Package Demonstration\n",
"This notebook interactively demonstrates the capabilities of the `grids` python package which lets you query subsets from time series of gridded datasets. For more links and documentation, refer to the [documentation website](https://tsgrids.readthedocs.io/en/latest/).\n",
"\n",
"Note: these demonstrations all use data services rather than downloaded copies of the data so that the demonstrations can happen in a portable notebook and without downloads. The data services are generally reliable but I cannot promise they will be functioning when you run the code.\n",
"\n",
"Begin by running the cell below to install and import the necessary dependencies."
]
},
{
"cell_type": "code",
"metadata": {
"id": "dQCvcDsHQqDH"
},
"source": [
"# install latest version of grids\n",
"! python -m pip install grids -q\n",
"\n",
"# installing a development version\n",
"# ! python -m pip install git+git://github.com/rileyhales/grids.git@<branch-name> -q"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "kgEUL96eidO3"
},
"source": [
"import grids\n",
"\n",
"# some other dependencies you need to get polygon data from an Esri service\n",
"import datetime\n",
"import tempfile\n",
"import os\n",
"import json\n",
"import requests\n",
"import numpy as np\n",
"import pandas as pd\n",
"import geopandas as gpd"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "t1BqS3dRLIfV"
},
"source": [
"grids.__version__"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "2w7FYr4gkGXh"
},
"source": [
"# Create the Time Series object\n",
"\n",
"In order to retrieve a time series, you need 3 things\n",
"1. You need local file acces to data file(s) or URL(s) to opendap data services. The `files` variable accepts either the absolute file paths or the URLs. It should always be a list, even if specifying only 1 dataset.\n",
"2. Variables have several names in each of the various file formats. You'll need to inspect your data to know how the variable is named. This is the air temperature variable for the GFS data example.\n",
"3. You need to know the dimensions in the file **and** the order they are applied to the variable you're querying. You'll need to inspect your data to know this. If your files conform to common conventions then the order should be (t, z, y, x). Substitute t,z,y,x with the names of the dimensions in your file.\n",
"\n",
"```\n",
"grids.TimeSeries(files, var, dim_order)\n",
"```"
]
},
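{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to do that inspection is with `xarray`, which `grids` already uses internally to open datasets. This is a minimal sketch, not part of the `grids` API:\n",
"```\n",
"import xarray as xr\n",
"\n",
"# open the dataset (a local path works the same as an OPeNDAP URL)\n",
"ds = xr.open_dataset('https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_onedeg/Best')\n",
"print(list(ds.data_vars))              # names of all variables in the dataset\n",
"print(ds['Temperature_surface'].dims)  # dimension order for one variable\n",
"```"
]
},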
{
"cell_type": "markdown",
"metadata": {
"id": "b5xZL7dFbGZM"
},
"source": [
"# Example Dataset: Global Forecast System (GFS)\n",
"## 3D - time, lat, lon\n",
"\n",
"The GFS dataset is a meteorology product produced by NOAA every 6 hours. The GFS dataset is made available through a THREDDS server hosted by UCAR. This example uses the most recent 1-degree spatial resolution product [View all data on this THREDDS Instance](https://thredds.ucar.edu/).\n",
"\n",
"### Note on GFS Data Service Availability\n",
"The GFS data are available most times of the day through a UCAR thredds server. When the data are being updated, the old data are temporarily organized under a time variable and dimension named `time1` instead of the usual `time`. Once the update is completed, the variable and dimension are renamed to `time`. If you get errors in this notebook on any of the cells that query GFS data, try to replace:\n",
"```\n",
"dim_order = ('time', 'lat', 'lon', )\n",
"```\n",
"with the temporarily correct\n",
"```\n",
"dim_order = ('time1', 'lat', 'lon', )\n",
"```\n",
"and then rerun all the necessary cells. This only applies to the GFS dataset. If you get errors in any of the other dataset examples provided, the data services are unavailable for unforseen reasons."
]
},
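{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you would rather not edit the cells by hand, here is a hedged sketch of picking the correct time dimension name at runtime, using the same `xarray` inspection shown above:\n",
"```\n",
"import xarray as xr\n",
"\n",
"ds = xr.open_dataset('https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_onedeg/Best')\n",
"# use 'time' when it exists, otherwise fall back to the temporary 'time1'\n",
"time_dim = 'time' if 'time' in ds['Temperature_surface'].dims else 'time1'\n",
"dim_order = (time_dim, 'lat', 'lon', )\n",
"```"
]
},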
{
"cell_type": "code",
"metadata": {
"id": "mkE0w6Ixk4S8"
},
"source": [
"# Create an instance of grids.TimeSeries\n",
"files = [\"https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_onedeg/Best\", ]\n",
"var = 'Temperature_surface'\n",
"dim_order = ('time', 'lat', 'lon', )\n",
"gfs_data_series = grids.TimeSeries(files=files, var=var, dim_order=dim_order)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "vGfwIxD_o1Qz"
},
"source": [
"## Point Time Series\n",
"Request a time series for a point by specifying the coordinates of the location that you want. Specify the coordinates in the order of the dimensions in the file. Use the `None` keyword to specify that you want all timesteps rather than a specific time step. This is the approximate Latitude and Longitude of Rome."
]
},
{
"cell_type": "code",
"metadata": {
"id": "xO2hGlzvpB8P"
},
"source": [
"point_timeseries = gfs_data_series.point(None, 41.9, 12.5)\n",
"point_timeseries"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "EEtCr-MMtRa1"
},
"source": [
"## Multiple Points Time Series\n",
"You can also query multiple points. This method uses the same file/remote-data-service connection for each of the queries to prevent duplicate file operations. Provide a list/tuple of points using the same syntax for coordinates as with a single point. Grids will not limit your number of points but each additional point will increase the computation time and you might\n",
"\n",
"You can provide a label for each of these points using the `labels` keyword argument or they will be numbered sequentially.\n",
"\n",
"***Note:*** gridded data file formats are not optimized for querying disparate points in one operation. So the time to query a dataset will scale linearly with the number of points -> querying 2 points takes 2x as long, 4 points takes 4x as long, etc"
]
},
{
"cell_type": "code",
"metadata": {
"id": "k-t9FaNUtws8"
},
"source": [
"columns = ['name', 'lat', 'lon']\n",
"cities = [['Rio de Janeiro', -22.912160, -43.175010],\n",
" ['Cairo', 30.060726, 31.254516],\n",
" ['Hong Kong', 22.336563, 114.197803],\n",
" ['London', 51.502785, -0.106096],\n",
" ['Ottowa', 45.404427, -75.696683], ]\n",
"cities = pd.DataFrame(cities, columns=columns)\n",
"# convert these points from (-180,180) to (0,360) which is the GFS datum\n",
"cities.lon = np.where(cities.lon < 0, cities.lon + 360, cities.lon)\n",
"# add a time dimension coordinate so that we can easily export the coordinates\n",
"cities['time'] = None\n",
"coordinates_list = [\n",
" tuple(i) for i in cities[['time', 'lat', 'lon']].values\n",
"]\n",
"labels = cities.name.to_list()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "r9NrYmP4-GZt"
},
"source": [
"# query each of the points\n",
"cap_cities_forecast = gfs_data_series.multipoint(*coordinates_list, labels=labels)\n",
"cap_cities_forecast"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "OTldDkx08VFt"
},
"source": [
"## Bounding Box Time Series\n",
"Request a time series for a bounding box by specifying the minimum and maximum coordinates of the bounds in the same order as with a point."
]
},
{
"cell_type": "code",
"metadata": {
"id": "uue7Afq28UZw"
},
"source": [
"boundbox_timeseries = gfs_data_series.bound((None, 40, 12), (None, 41, 13))\n",
"boundbox_timeseries"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zGqKTsvUnIUw"
},
"source": [
"## Polygon Time Series\n",
"To get a timeseries within a complex shape, such as a polygon, your data must have exactly 2 spatial dimensions and the time dimension ***and*** be in a valid projection ***and*** you must have your polygon data in the same projection.\n",
"\n",
"There are 3 ways this can happen and you can alter how grids performs queries using the `behavior` parameter as shown in the following examples\n"
]
},
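{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check before querying, sketched here with `geopandas` (the file path is a hypothetical placeholder):\n",
"```\n",
"gdf = gpd.read_file('/path/to/your/polygons.geojson')  # hypothetical path\n",
"print(gdf.crs)  # should match the grid's coordinate system, e.g. EPSG:4326 for this GFS example\n",
"```\n",
"Note that matching the CRS can also mean matching the longitude convention, as in the state boundaries example below where the geometries are shifted by 360 degrees."
]
},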
{
"cell_type": "markdown",
"metadata": {
"id": "k-XmlSMc6ZF6"
},
"source": [
"### Option 1: `behavior=\"dissolve\"` (default)\n",
"\n",
"This works when your provided vector data file:\n",
"- contains 1 polygon/feature\n",
"- has many polygons/features and you want to treat them all as 1 dissolved feature\n",
"\n",
"This example gets a geojson of the country boundaries of Italy in EPSG:4326 from an ArcGIS Living Atlas service and uses it to query the GFS data.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wbCeAOT_bI9X"
},
"source": [
"italy_url = 'https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/World__Countries_Generalized_analysis_trim/FeatureServer/0/query?f=pgeojson&outSR=4326&where=NAME+%3D+%27Italy%27'\n",
"italy_json = requests.get(url=italy_url).json()\n",
"italy_path = os.path.join(tempfile.gettempdir(), 'italy.json')\n",
"with open(italy_path, 'w') as file:\n",
" file.write(json.dumps(italy_json))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JvOhLL5WnveC"
},
"source": [
"italy_timeseries = gfs_data_series.shape(italy_path, behavior='dissolve')\n",
"italy_timeseries"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ElNyCR1Q5f5g"
},
"source": [
"### Option 2: `behvaior=\"feature\"`\n",
"\n",
"This works when your provided vector data file:\n",
"- has multiple features/polygons and you only want to query a group of them\n",
"\n",
"You must provided the `label_attr` and `feature` kwargs. The vector data file will be read and then limited to only features which have the provided `label_attr` attribute and whose value is the provided `feature` value.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "cjDf-oO48xBD"
},
"source": [
"# download a zip file with the USA state boundaries\n",
"us_state_boundaries_shapefile = 'https://www2.census.gov/geo/tiger/TIGER2017//STATE/tl_2017_us_state.zip'\n",
"states_zip_path = os.path.join(tempfile.gettempdir(), 'tl_2017_us_state.zip')\n",
"states_shp_path = os.path.join(tempfile.gettempdir(), 'states.shp')\n",
"with requests.get(us_state_boundaries_shapefile, stream=True) as r:\n",
" r.raise_for_status()\n",
" with open(states_zip_path, 'wb') as f:\n",
" for chunk in r.iter_content(chunk_size=1024):\n",
" if chunk: # filter out keep-alive new chunks\n",
" f.write(chunk)\n",
"# perform some conversions to get the coordinates to match the GFS' CRS\n",
"states_gdf = gpd.read_file(states_zip_path)\n",
"states_gdf['geometry'] = states_gdf.translate(xoff=360, yoff=0)\n",
"states_gdf.to_file(states_shp_path)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "omSJ7RxE8n81"
},
"source": [
"state_timeseries = gfs_data_series.shape(states_shp_path, behavior='feature', label_attr=\"NAME\", feature=\"Florida\")\n",
"state_timeseries"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "B-6zmov4uzSf"
},
"source": [
"### Option 3: `behavior='features'`\n",
"\n",
"This works when your provided vector data file:\n",
"- has multiple features/polygons and you want to query each individually\n",
"\n",
"If you have multiple polygons contained in your vector data file, you can get a timeseries for each polygon in the file if you choose `behavior=\"features\"`. You must specify the `label_attr` parameter. The `label_attr` should provide a unqie value for each feature which can be used to label the columns of the dataframe returned by this function. If multiple polygons have the same value in the `label_attr` attribute, each new polygon queried will overwrite the previous polygon queried with the same value.\n",
"\n",
"Warning: This operation will take longer than masking with a single polygon.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "u3Gmlk_xzoxV"
},
"source": [
"states_timeseries = gfs_data_series.shape(states_shp_path, behavior='features', label_attr=\"NAME\")\n",
"states_timeseries"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "1z8N8ef_uZA6"
},
"source": [
"## Query A Time Range\n",
"\n",
"Querying specific time ranges can be tricky since the source multidimensional datasets can store their time coordinates in many ways. In order for you to query specific values along the time dimension, you'll need to know how they chose to store those values in advance and provide time coordinates in the same format/units. Example: if your time stamps are stored in the multidimensional datasets with units of seconds since Jan 1 1970 UTC, you should provide a timestamp such as `1583971200` to correspond to `2020-03-12T00:00:00+00:00`.\n",
"\n",
"If you do not want to query a specific time range, You can use the Python `None` value to indicate you want to query all values along any dimension instead of providing a coordinate value.\n",
"\n",
"Because `grids` uses `xarray` to automate connecting to remote datasets through opendap, the time values coming from remote datasets are typically interpretted as numpy datetime64 values. You can easily create those using the `datetime.datetime()` and `numpy.datetime64()` functions. In order to query a range of time values, you'll need to use the TimeSeries.bound method so you can provide a minimum and maximum coordinate. This example queries the next 2.5 days of GFS data for roughly the boundaries of Mongolia.\n",
"\n",
"If you are querying a series of files which each contain a 1 time step, you should use `None` as the time coordinate and limit the time range by controlling which files get queried."
]
},
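{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, here is a hedged sketch of both situations (the file paths are hypothetical placeholders):\n",
"```\n",
"# time stored as seconds since Jan 1 1970 UTC: convert a datetime to that unit\n",
"ts = datetime.datetime(2020, 3, 12, tzinfo=datetime.timezone.utc).timestamp()  # 1583971200.0\n",
"\n",
"# one time step per file: limit the time range by limiting the file list\n",
"import glob\n",
"all_files = sorted(glob.glob('/path/to/data/*.nc'))  # hypothetical files\n",
"first_day = all_files[:24]  # e.g. keep only the first day of hourly files\n",
"```"
]
},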
{
"cell_type": "code",
"metadata": {
"id": "3A_ssGMdvmyR"
},
"source": [
"now = datetime.datetime.now()\n",
"later = now + datetime.timedelta(days=2, hours=12)\n",
"now = np.datetime64(now)\n",
"later = np.datetime64(later)\n",
"\n",
"time_filtered_series = gfs_data_series.bound(\n",
" (now, 42, 89),\n",
" (later, 51, 117),\n",
" stats='median'\n",
")\n",
"time_filtered_series"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ojqF4ECYEAtJ"
},
"source": [
"You can also provide a tuple of minimum/maximum time stamps with the `time_range` argument when querying with the `.shape` function."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ckX0R9S0D_5U"
},
"source": [
"timed_shape_ts = gfs_data_series.shape(states_shp_path,\n",
" time_range=(now, later),\n",
" behavior='feature',\n",
" label_attr=\"NAME\",\n",
" feature=\"Pennsylvania\")\n",
"timed_shape_ts"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "12Z3I_PK-MP_"
},
"source": [
"## Querying with Statistics/Reducers\n",
"If your query involves a range of values (e.g. querying a bounding box or polygon from a shapefile) you can provide a series of summary statistics, often called ***reducers***, to compute for the range of array values. Specifically, this applies to the following kinds of queries:\n",
"- `TimeSeries.bound()`\n",
"- `TimeSeries.range()` (alias for bound)\n",
"- `TimeSeries.shape()`"
]
},
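{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `TimeSeries.range()` is an alias for `TimeSeries.bound()`, these two calls should be interchangeable (a minimal sketch based on the alias noted above):\n",
"```\n",
"gfs_data_series.bound((None, 22, 25), (None, 30, 33.6), stats='mean')\n",
"gfs_data_series.range((None, 22, 25), (None, 30, 33.6), stats='mean')\n",
"```"
]
},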
{
"cell_type": "markdown",
"metadata": {
"id": "AQq2JMV4_Klx"
},
"source": [
"### Query Many Stats at Once\n",
"\n",
"Specify the `stats` parameter in your `TimeSeries` object or as a keyword argument when calleding each functions. You can specify most summary stats available through `numpy`, namely: max, median, mean, min, and various percentiles. Refer to the documentation for full specifications.\n",
"```\n",
"stats_list = 'all'\n",
"# grids interprets as: ('mean', 'median', 'max', 'min', 'sum', 'std', )\n",
"\n",
"# list of stats and with percentiles as a string:\n",
"stats_list = ('max', 'median', 'mean', '75%', '25%', )\n",
"\n",
"stats_list = 'box'\n",
"# grids interprets as: ('max', '75', 'median', 'mean', '25%', 'min', )\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"id": "O7Vri27kBld7"
},
"source": [
"stats_ts = gfs_data_series.bound((None, 22, 25), (None, 30, 33.6), stats='box')\n",
"stats_ts"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "3WwXNPKEEobF"
},
"source": [
"### Query All Values\n",
"If you want to do a more complicated statistical analysis of the exctracted data, generate plots, or otherwise need all data, you can also choose not to compute statistics on the many extracted cells within your query range. Instead, the array of many cells which are extracted from the dataset are turned into a 1D/flattened list and stored in a single entry in the Pandas DataFrame. Due to the potential for infinite complexity in organizing/numbering each cell individually, cells will only be returned as a flattened list and no in separate columns.\n",
"\n",
"To return all values, specify the keyword argument (kwarg) `stats='values'` On the `TimeSeries.bound` method or the `TimeSeries.shape` method. You can also include values in the the list of other stats you compute. For instance: `stats=('values', 'mean', 'median')`. If you need change the stats you want to compute after creating the TimeSeries object, be sure you provide a list of stats, even if the length of that list is 1."
]
},
{
"cell_type": "code",
"metadata": {
"id": "eOLmoRqzB86J"
},
"source": [
"vals_ts = gfs_data_series.bound((None, 22, 25), (None, 30, 33.6), stats='values')\n",
"vals_ts"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6uDw7-KsO2K1"
},
"source": [
"You can then use that column of all values to compute statistics or perform any other necessary steps. You will find `pd.Series.apply()` useful to reduce each list in the Series."
]
},
{
"cell_type": "code",
"metadata": {
"id": "wiCcsxhvCJx6"
},
"source": [
"vals_ts['mean'] = vals_ts['Temperature_surface_values'].apply(np.nanmean)\n",
"vals_ts['median'] = vals_ts['Temperature_surface_values'].apply(np.nanmedian)\n",
"vals_ts"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "g732QcmEPV-d"
},
"source": [
"## Query multiple variables at once\n",
"\n",
"While these multidimensional datasets are intended to be queried one variable at a time, you may provide a list of variables to query. Grids will internally reuse opened files and data server connections to reduce the overhead that would accumulate if you used grids in a loop.\n",
"\n",
"***Note:*** You can only use this feature if ***all*** the variables share the same dimension order ***and*** the same time coordinate\n",
"\n",
"This example uses the same GFS dataset but queries multiple variables for a point."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Fm1z_OvjQBeA"
},
"source": [
"files = [\"https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_onedeg/Best\", ]\n",
"var = ('Temperature_surface', 'Precipitation_rate_surface', 'Wind_speed_gust_surface', 'Pressure_surface')\n",
"dim_order = ('time', 'lat', 'lon', )\n",
"gfs_data_series = grids.TimeSeries(files=files, var=var, dim_order=dim_order)\n",
"\n",
"# this point corresponds to Brigham Young University in Provo, Utah\n",
"# Add 360 to the longitude to convert to GFS's 0-360 datum\n",
"multiple_variable_data = gfs_data_series.point(None, 40.25, -111.65 + 360)\n",
"multiple_variable_data"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "JsMJYrLo4HTp"
},
"source": [
"## Query more than 3 dimensions\n",
"### 4D - time, depth, lat, lon\n",
"Data frequently only have 3 dimensions or less: usually time , x/longitude, y/latitude. However, some variables have additional dimensions such as z/altitude/depth/etc, ensemble number, pressure levels, etc. Querying data with more dimensions works just like querying data with 3 dimensions.\n",
"\n",
"This example uses the same GFS dataset but queries one of the variables with more than 3 dimensions."
]
},
{
"cell_type": "code",
"metadata": {
"id": "gkr4yGce4GnB",
"collapsed": true
},
"source": [
"data = ['https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_onedeg/Best',]\n",
"var = 'Liquid_Volumetric_Soil_Moisture_non_Frozen_depth_below_surface_layer'\n",
"dim_order = ('time', 'depth_below_surface_layer', 'lat', 'lon')\n",
"a = grids.TimeSeries(data, var, dim_order)\n",
"\n",
"# Specify 1 coordinate value for each dimension, in the same order as dim_order\n",
"# None indicates querying all time steps\n",
"# 1 indicates a depth of 1 (meter, determined from the units of the variable)\n",
"# 8 indicates a latitude of 8\n",
"# 295 indicates a longitude of 295 (GFS measures longitude from 0 to 360)\n",
"a.point(None, 1, 8, 295)"
],
"execution_count": null,
"outputs": []
},
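{
"cell_type": "markdown",
"metadata": {},
"source": [
"The other query methods accept the extra dimension the same way: supply one more coordinate in each tuple. For instance, a hedged sketch of a bounding box query on this 4D variable (not run here):\n",
"```\n",
"# (time, depth, lat, lon) minimums and maximums, matching dim_order\n",
"a.bound((None, 0, 8, 295), (None, 1, 12, 300))\n",
"```"
]
},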
{
"cell_type": "markdown",
"metadata": {
"id": "D-JyjT2oOLHs"
},
"source": [
"# Example Dataset: Palmer Drought Severity Index (PDSI)"
]
},
{
"cell_type": "markdown",
"source": [
"## 3D - time, lat, lon\n",
"\n",
"The PDSI dataset is an index for monitoring drought conditions. It is a 3-dimensional dataset (time, lat, lon) and is available on a NOAA THREDDS server which provides OPeNDAP data services (https://psl.noaa.gov/thredds). This example shows how to query various values of PDSI for Egypt."
],
"metadata": {
"id": "MJuf4GjPqA4q"
}
},
{
"cell_type": "code",
"metadata": {
"id": "0Oxzgd6lNoZz"
},
"source": [
"pdsi_url = ['https://psl.noaa.gov/thredds/dodsC/Datasets/dai_pdsi/pdsi.mon.mean.selfcalibrated.nc', ]\n",
"var = 'pdsi'\n",
"dim_order = ('time', 'lat', 'lon')\n",
"stats_list = ('max', 'median', 'mean', 'min', '25%', '75%')\n",
"pdsi_ts = grids.TimeSeries(pdsi_url, var, dim_order, stats=stats_list)\n",
"\n",
"# retrieve 6 timeseries of statistical values for the range of cells in the lat/lon bounding box of Egypt\n",
"pdsi_ts.bound((None, 22, 25), (None, 30, 33.6))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "c7w4svoOEn0H"
},
"source": [
"pdsi_df = pdsi_ts.bound((None, 22, 25), (None, 30, 33.6), stats='values')\n",
"pdsi_df"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Uu7oE34aNhYi"
},
"source": [
"pdsi_df['mean'] = pdsi_df['pdsi_values'].apply(np.nanmean)\n",
"pdsi_df['median'] = pdsi_df['pdsi_values'].apply(np.nanmedian)\n",
"pdsi_df"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "btSnCoCIlW--"
},
"source": [
"# Example Dataset: National Water Model (NWM)"
]
},
{
"cell_type": "markdown",
"source": [
"## 2D - time, feature_id\n",
"Hydroshare.org keeps a record of some old National Water Model (NWM) Forecasts. The NWM is a hydrologic model which outputs NetCDF file time series organized by a \"feature_id\"- the numeric identifier for the stream segments which are reporting points of the model. (see https://water.noaa.gov/map for help picking the ID for a certain river segment)\n",
"\n",
"Some feature_id's worth trying (substitute these ID's in nwm_series.point()):\n",
"- South Santee (South Carolina): 166741050\n",
"- Cape Fear River (North Carolina): 8832652\n",
"- Mississippi Outlet: 933160088"
],
"metadata": {
"id": "kc4pig-RqEAG"
}
},
{
"cell_type": "code",
"metadata": {
"id": "N3WeCviPlTx1"
},
"source": [
"files = [\"http://thredds.hydroshare.org/thredds/dodsC/nwm/florence/nwm.20180901/short_range/nwm.t00z.short_range.channel_rt.f001.conus.nc\",\n",
" \"http://thredds.hydroshare.org/thredds/dodsC/nwm/florence/nwm.20180901/short_range/nwm.t00z.short_range.channel_rt.f002.conus.nc\",\n",
" \"http://thredds.hydroshare.org/thredds/dodsC/nwm/florence/nwm.20180901/short_range/nwm.t00z.short_range.channel_rt.f003.conus.nc\",\n",
" \"http://thredds.hydroshare.org/thredds/dodsC/nwm/florence/nwm.20180901/short_range/nwm.t00z.short_range.channel_rt.f004.conus.nc\",\n",
" \"http://thredds.hydroshare.org/thredds/dodsC/nwm/florence/nwm.20180901/short_range/nwm.t00z.short_range.channel_rt.f005.conus.nc\",]\n",
"variable = \"streamflow\"\n",
"dim_order = (\"feature_id\", )\n",
"nwm_series = grids.TimeSeries(files, variable, dim_order)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "MTFPOZVLlyHr"
},
"source": [
"# the \"point\" in this 1 dimensional variable is the ID of a stream reported by the model\n",
"nwm_series.point(166741050)"
],
"execution_count": null,
"outputs": []
},
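{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare several reaches at once, the IDs listed above could also be queried in one pass with `multipoint`, a hedged sketch assuming 1-dimensional coordinates work the same way they do for `point`:\n",
"```\n",
"nwm_series.multipoint((166741050, ), (8832652, ), (933160088, ),\n",
"                      labels=['South Santee', 'Cape Fear River', 'Mississippi Outlet'])\n",
"```"
]
},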
{
"cell_type": "markdown",
"metadata": {
"id": "2shLwwigc8Uo"
},
"source": [
"# Example Dataset: Global Land Data Assimilation System (GLDAS)\n"
]
},
{
"cell_type": "markdown",
"source": [
"## 3D - time, lat, lon\n",
"***NASA OPeNDAP Server - GESDISC login required***\n",
"\n",
"A huge number of datasets produced by NASA are available for download and also for querying through OPeNDAP providing servers such Hyrax, GraDS, and THREDDS. For example, this is one way to query the GLDAS product. You will need to provide a GES Disc username and password.\n",
"\n",
"Note- NASA data services are always slow and the authentication process slows it down further. While `grids` can successfully query these datasets, NASA managed data services are generally difficult to use. This example may time out or take a very long time."
],
"metadata": {
"id": "2VGNGGmFqG3Q"
}
},
{
"cell_type": "code",
"metadata": {
"id": "OBKjztD0xdDW"
},
"source": [
"gesdisc_username = \"your_username\"\n",
"gesdisc_password = \"your_password\""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ROBddJMleAaO"
},
"source": [
"gldas_url = [\"https://hydro1.gesdisc.eosdis.nasa.gov/thredds/dodsC/GLDAS_aggregation/GLDAS_NOAH025_M.2.1/GLDAS_NOAH025_M.2.1_Aggregation.ncml\", ]\n",
"var = \"Tair_f_inst\"\n",
"dim_order = ('time', 'lat', 'lon', )\n",
"gldas_data_series = grids.TimeSeries(gldas_url, var, dim_order, user=gesdisc_username, pswd=gesdisc_password)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "945lJHGreNIH",
"collapsed": true
},
"source": [
"# these coordinates are roughly the bounds of Egypt\n",
"gldas_data_series.bound((None, 22.25, 25), (None, 31.75, 35))"
],
"execution_count": null,
"outputs": []
}
]
}