Skip to content

Instantly share code, notes, and snippets.

@rileyhales
Created April 7, 2021 16:35
Show Gist options
  • Save rileyhales/d5290e12b5858d59960d0898fbd0ed69 to your computer and use it in GitHub Desktop.
Save rileyhales/d5290e12b5858d59960d0898fbd0ed69 to your computer and use it in GitHub Desktop.
geoglows_bias_correction
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "geoglows_bias_correction",
"private_outputs": true,
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/rileyhales/d5290e12b5858d59960d0898fbd0ed69/geoglows_bias_correction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tOwg7IFhUZTh"
},
"source": [
"# Bias Correction\n",
"\n",
"In order to use the bias correction tools in the `geoglows` package, you need 3 things. \n",
"\n",
"1. Observed streamflow data\n",
"2. Simulated historical streamflow data from the `geoglows` model\n",
"3. Data to correct: either the historical data, or any other timeseries of simulated flows from the `geoglows` model\n",
"\n",
"The simulated historical and predicted streamflow are available through the GEOGloWS ECMWF Streamflow model via the geoglows python package.\n",
"\n",
"Methods for recording streamflow and formats to save them in vary by country and not all streamflow is publically available online. As such, there is not a generic tool for retrieving observed streamflow through the geoglows package. You will need to provide it yourself.\n",
"\n",
"Lets start by installing the geoglows tools in this notebook environment and importing some other dependencies we'll need."
]
},
{
"cell_type": "code",
"metadata": {
"id": "uSd2bp8N4Vt-"
},
"source": [
"# Start by installing the package and importing it to your code. Run this cell to do that.\n",
"!pip install geoglows\n",
"import geoglows\n",
"from IPython.core.display import display, HTML\n",
"import pandas as pd\n",
"from google.colab import files"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "O8xfont_XaOX"
},
"source": [
"## Step 1: Upload a csv on your computer for Bias Correction\n",
"\n",
"If you have a copy of a file on your machine you would like to use for bias correction, you can upload it to this notebook environment with the Google Collaboratory `files.upload()` feature. Please note, this file is not saved once you leave this page. If you revisit this page, you will need to upload again.\n",
"\n",
"Your csv should have 2 columns and both should have names as the first item. The first one should be titled `datetime` and contain dates in a standard format. The other may have any title but ***must*** contain streamflow values in cubic meters per second (m^3/s)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HrwuUswSgZWs"
},
"source": [
"OPTIONAL IF YOU NEED DEMO DATA, DOWNLOAD THIS CSV\n",
"\n",
"https://www.hydroshare.org/resource/d222676fbd984a81911761ca1ba936bf/data/contents/Discharge_Data/23187280.csv"
]
},
{
"cell_type": "code",
"metadata": {
"id": "CD7GYMRRUEXU"
},
"source": [
"uploaded = files.upload()\n",
"for fn in uploaded.keys():\n",
" uploaded_file_name = fn\n",
" print(f'User uploaded file \"{fn}\"')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1CzFGm3UZxuK"
},
"source": [
"observed_data = pd.read_csv(uploaded_file_name, index_col=0)\n",
"observed_data.index = pd.to_datetime(observed_data.index).tz_localize('UTC')\n",
"print('Here is a preview of your data')\n",
"print(observed_data.head(10))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Pylhpr_ra8wV"
},
"source": [
"## Step 2: Get the historical data for correction\n",
"\n",
"Hydrologic models number the streams they simulate so that the results can be stored and organized. The GEOGloWS ECMWF model refers to these as `reach_id's` and has an interface for finding them programatically by providing a latitude and longitude.\n",
"\n",
"Use the latitude and longitude of your stream gauge to find the model's reporting point closest to your gauge for comparison."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4DrT_zCQgm_d"
},
"source": [
"OPTIONAL: IF YOU DOWNLOADED THE DEMO DATA, USE THIS LAT/LON PAIR \n",
"latitude = 7.81179264 \n",
"longitude = -73.8105294"
]
},
{
"cell_type": "code",
"metadata": {
"id": "cvYBtymraiV5"
},
"source": [
"# Edit this cell with the latitude and longitude of your reporting point\n",
"latitude = 7.81179264\n",
"longitude = -73.8105294"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "BynQ1JM-b7Ot"
},
"source": [
"# This function performs some geoprocessing and may take a few seconds to complete\n",
"reach_id = geoglows.streamflow.latlon_to_reach(latitude, longitude)['reach_id']\n",
"print(reach_id)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "IgEHBLU2cIKH"
},
"source": [
"historical_data = geoglows.streamflow.historic_simulation(reach_id)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "sU5u2LplXFVP"
},
"source": [
"## Step 3: Get other Forecasted Data for correction\n",
"\n",
"We will use the same reach_id as for the historical data to retrieve forecasted streamflow"
]
},
{
"cell_type": "code",
"metadata": {
"id": "OzS-52kJXTuq"
},
"source": [
"stats = geoglows.streamflow.forecast_stats(reach_id)\n",
"ensembles = geoglows.streamflow.forecast_ensembles(reach_id)\n",
"#records = geoglows.streamflow.forecast_records(reach_id)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "n2a5H_IgcREp"
},
"source": [
"## Step 4: Perform the bias correction\n",
"\n",
"Use the `geoglows.bias` tools to correct the bias using your observed data"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XeI66PDMceX2"
},
"source": [
"corrected_historical = geoglows.bias.correct_historical(historical_data, observed_data)\n",
"corrected_stats = geoglows.bias.correct_forecast(stats, historical_data, observed_data)\n",
"corrected_ensembles = geoglows.bias.correct_forecast(ensembles, historical_data, observed_data)\n",
"#corrected_records = geoglows.bias.correct_forecast(records, historical_data, observed_data, use_month=-1)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "esDzS8GLcfWu"
},
"source": [
"## Step 5: Plot the results"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Wr8Hrf23ZWcI"
},
"source": [
"# You can add more entries to the dicionary and they will appear in the title of the graph\n",
"titles = {'Reach ID': reach_id, 'bias_corrected': True}"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "0E8a0ThWYm9A"
},
"source": [
"### Historical Data\n",
"Use the legend on the right of the plot to toggle on/off different layers"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sIv8AZyrchku"
},
"source": [
"# This is a plot of the Original Simulated, Corrected Simulated, and Observed data\n",
"geoglows.plots.corrected_historical(corrected_historical, historical_data, observed_data, titles=titles).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RU4ZolI2YvJG"
},
"source": [
"### Forecasted Data\n",
"\n",
"Since there are many lines on the forecast plots, we recommend plotting the adjusted and original forecasts side by side rather than overlaying them together."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Jot99hHVY9Er"
},
"source": [
"# corrected data\n",
"geoglows.plots.forecast_stats(corrected_stats, titles=titles).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VcXPFFmwbziV"
},
"source": [
"# original data\n",
"geoglows.plots.forecast_stats(stats).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "U6TO0LSUsrP4"
},
"source": [
"## Step 6: Statistics, Summaries, Averages, etc\n",
"\n",
"There are many tools in the geoglows package to analyze how much the bias correction improved the streamflow simulations. These are based on the statistical analysis performed by the `hydrostats` and `HydroErr` python packages"
]
},
{
"cell_type": "code",
"metadata": {
"id": "nsia8i_xY-1T"
},
"source": [
"# This is a scatter plot of the original vs simulated data\n",
"geoglows.plots.corrected_scatterplots(corrected_historical, historical_data, observed_data, titles=titles).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NV46DW-fhkt_"
},
"source": [
"# This is a plot of the monthly averages\n",
"geoglows.plots.corrected_month_average(corrected_historical, historical_data, observed_data, titles=titles).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "t9JwQoM8hpf5"
},
"source": [
"# This is a plot of the daily averages\n",
"geoglows.plots.corrected_day_average(corrected_historical, historical_data, observed_data, titles=titles).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NePedKrshwG3"
},
"source": [
"# This is a plot of the cumulative annual volumes\n",
"geoglows.plots.corrected_volume_compare(corrected_historical, historical_data, observed_data, titles=titles).show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WAVHCjljs0WN"
},
"source": [
"# This is a table of a few important statistics \n",
"display(HTML(geoglows.bias.statistics_tables(corrected_historical, historical_data, observed_data)))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "MmHsNOMGib9x"
},
"source": [
"## Download your corrected results as CSV"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZqUqce6mjsMU"
},
"source": [
"corrected_historical.to_csv('corrected_historical_streamflow.csv')\n",
"files.download('corrected_historical_streamflow.csv')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Kx3TNdB1K46H"
},
"source": [
"corrected_stats.to_csv('corrected_forecast_stats.csv')\n",
"files.download('corrected_forecast_stats.csv')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "eeLP-_zMK3S6"
},
"source": [
"corrected_ensembles.to_csv('corrected_forecasted_ensembles.csv')\n",
"files.download('corrected_forecasted_ensembles.csv')"
],
"execution_count": null,
"outputs": []
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment