@romer8
Last active September 26, 2022 04:23
pywaterml_template.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "pywaterml_template.ipynb",
"provenance": [],
"collapsed_sections": [
"GdsIhCI1kW42"
],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/romer8/89c851014afb276b0f20cb61c9c731f6/pywaterml_template.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PgIgUWTOwCqL"
},
"source": [
"# Service_Name\n",
"Replace this heading with the name of your WOF service."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L1jSFY-jfnqs"
},
"source": [
"### Contents\n",
"1. [Variables for the Analysis](#intro)\n",
"2. [Install Pywaterml and Additional Dependencies](#install)\n",
"3. [Add WOF Web Service](#addService)\n",
"4. [Sites](#GetSites)\n",
"5. [Variables](#GetVariables)\n",
"6. [Site Information](#GetSiteInfo)\n",
"7. [Time Series Values for Variables](#GetValues)\n",
"8. [Sites by Bounding Box](#GetSitesByBoxObject)\n",
"9. [Sites Filtered by Variables](#GetSitesByVariable)\n",
"10. [Interpolation](#GetInterpolation)\n",
"11. [Monthly Averages for Variables](#GetMonthlyAverage)\n",
"12. [Time Series Clusters from Monthly Averages](#GetClustersMonthlyAvg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iCZsmAefqm-D"
},
"source": [
"<a name=\"intro\"></a>\n",
"## Variables for the Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c3n1n3MVPUTs"
},
"source": [
"The variables below configure this notebook. Change them to point at your own service, and you should be able to run the notebook and see the output.\n",
"\n",
"*Note that VARIABLE and SITE are zero-based indexes into the responses of GetVariables and GetSites, respectively. For example, VARIABLE = 1 selects the second variable returned by the GetVariables method. The SITE variable works the same way for the GetSites response.*"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ioWosThiPbvg"
},
"source": [
"WOF_URL = \"http://128.187.106.131/app/index.php/dr/services/cuahsi_1_1.asmx?WSDL\" \n",
"VARIABLE = 0 \n",
"SITE = 0 \n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "GdsIhCI1kW42"
},
"source": [
"<a name=\"install\"></a>\n",
"### Install Pywaterml and Additional Dependencies"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MtWfYP3TgfR_"
},
"source": [
"!pip install pywaterml\n",
"!pip install folium\n",
"!pip install plotly\n",
"!pip install pyproj"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "axKiaVU1pjQe"
},
"source": [
"import pywaterml.waterML as pwml\n",
"import folium\n",
"import plotly.graph_objects as go\n",
"import plotly.express as px\n",
"import pandas as pd\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "0YN3fO0dsmFC"
},
"source": [
"<a name=\"addService\"></a>\n",
"### WOF Web Service"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rqGI2HnKslup"
},
"source": [
"Any WOF web service can be passed to the wrapper class WaterMLOperations from the pywaterml package. To use a different service, change the <em>WOF_URL</em> variable defined at the beginning of the notebook.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "hfZV_E02lEfs"
},
"source": [
"try:\n",
"    water = pwml.WaterMLOperations(url=WOF_URL)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "z8skY4N5tq7G"
},
"source": [
"<a name=\"GetSites\"></a>\n",
"\n",
"### Sites"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pG7eT4WqzIpb"
},
"source": [
"The GetSites method retrieves metadata for each site in the WOF web service. Data can be retrieved in three different formats: JSON, CSV, and WaterML."
]
},
{
"cell_type": "code",
"metadata": {
"id": "SZNMaxpwrc2D"
},
"source": [
"try:\n",
"    sites = water.GetSites()\n",
"    df = pd.DataFrame.from_dict(sites)\n",
"    print(df)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wGdgtYByuvb0"
},
"source": [
"The following code snippet plots the stations on a folium map."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ld6QSYA2j_Wk"
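},
"source": [
"from IPython.display import display\n",
"\n",
"try:\n",
"    # 'Stamen Terrain' tiles may not be available in recent folium releases,\n",
"    # so the default OpenStreetMap tiles are used here.\n",
"    m = folium.Map(location=[40.0150, -105.2705],\n",
"                   tiles='OpenStreetMap')\n",
"    sitesLocations = []\n",
"    for site in sites:\n",
"        location = [float(site['latitude']), float(site['longitude'])]\n",
"        sitesLocations.append(location)\n",
"        folium.Marker(location=location, icon=folium.Icon()).add_to(m)\n",
"    # Zoom the map to the extent of all sites\n",
"    m.fit_bounds(sitesLocations)\n",
"    # A bare `m` inside a try block is not rendered, so display it explicitly\n",
"    display(m)\n",
"except Exception as e:\n",
"    print(e)"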
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "0UKt9Ty7xQfP"
},
"source": [
"<a name=\"GetVariables\"></a>\n",
"### Variables"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z-Ntb2bWxVR-"
},
"source": [
"The GetVariables method retrieves metadata for the variables available in the WOF web service."
]
},
{
"cell_type": "code",
"metadata": {
"id": "v3DOPKYXsJ7i"
},
"source": [
"pd.set_option('display.max_columns', None)\n",
"try:\n",
"    variables = water.GetVariables()\n",
"    df = pd.DataFrame.from_dict(variables['variables'])\n",
"\n",
"    print(df.head())\n",
"except Exception as e:\n",
"    print(e)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "MoeHbfokzByc"
},
"source": [
"<a name=\"GetSiteInfo\"></a>\n",
"### Site Information"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JeVHdm8gzTx1"
},
"source": [
"The GetSiteInfo method retrieves the information for a given site. It is similar to the GetSiteInfo() WaterML function.\n",
"\n",
"<em>Please note that this example uses the site selected by the SITE variable defined at the beginning of the notebook. To use a different site, change the SITE variable there.</em>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "v4J-ZB9esVlp"
},
"source": [
"try:\n",
"    site_full_code = sites[SITE]['fullSiteCode']\n",
"    siteInfo = water.GetSiteInfo(site_full_code)\n",
"    df = pd.DataFrame.from_dict(siteInfo['siteInfo'])\n",
"    print(df)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "UyYmqDND2pDj"
},
"source": [
"<a name=\"GetValues\"></a>\n",
"### Time Series Values for Variables"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l3I2khI92hFu"
},
"source": [
"The GetValues method retrieves the values for a specific variable at a site. It is similar to the GetValues() WaterML function.\n",
"\n",
"<em>Please note that this example uses the variable selected by the VARIABLE variable defined at the beginning of the notebook. To use a different variable, change the VARIABLE variable there.</em>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "I4mlk7O1mn_s"
},
"source": [
"try:\n",
"    variable_full_code = siteInfo['siteInfo'][VARIABLE]['fullVariableCode']\n",
"    start_date = siteInfo['siteInfo'][VARIABLE]['beginDateTime'].split('T')[0]\n",
"    end_date = siteInfo['siteInfo'][VARIABLE]['endDateTime'].split('T')[0]\n",
"    variableResponse = water.GetValues(site_full_code, variable_full_code, start_date, end_date)\n",
"    df = pd.DataFrame.from_dict(variableResponse['values'])\n",
"    print(df)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "yETx0fg33IoJ"
},
"source": [
"The following code snippet plots the results of the GetValues function using the Plotly library."
]
},
{
"cell_type": "code",
"metadata": {
"id": "diZKW5aB5qQv"
},
"source": [
"try:\n",
"    # Plot the time series\n",
"    timeStamps = []\n",
"    valuesTimeSeries = []\n",
"    for value in variableResponse['values']:\n",
"        timeStamps.append(value['dateTimeUTC'])\n",
"        valuesTimeSeries.append(value['dataValue'])\n",
"\n",
"    fig = go.Figure(data=go.Scatter(x=timeStamps, y=valuesTimeSeries))\n",
"    # Edit the layout. The labels are read from the first returned value;\n",
"    # VARIABLE indexes variables, not time steps, so it is not used here.\n",
"    firstValue = variableResponse['values'][0]\n",
"    fig.update_layout(title=firstValue['variableName'],\n",
"                      xaxis_title=firstValue['timeUnitAbbreviation'],\n",
"                      yaxis_title=firstValue['unitAbbreviation'])\n",
"    fig.show()\n",
"\n",
"    # Box plot of the same values\n",
"    df = pd.DataFrame(dict(data=valuesTimeSeries))\n",
"    fig = px.box(df, y=\"data\", points=\"all\")\n",
"    fig.show()\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7_TeQpmbMEgx"
},
"source": [
"## Extra Functionality Methods"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0vjseuQkMOWe"
},
"source": [
"The package also provides the following extra methods:\n",
"\n",
"1. GetSitesByBoxObject: Gets all the sites within a bounding box from an endpoint that complies with the SOAP protocol. The GetSitesByBoxObject() function is similar to the GetSitesByBoxObject() WaterML function.\n",
"\n",
"2. GetSitesByVariable: Gets the sites that match an array of variables from an endpoint that complies with the SOAP protocol. GetSitesByVariable() is an addition to the WaterML functions because it lets the user retrieve only the sites that contain the specified variable(s).\n",
"\n",
"3. GetInterpolation: Interpolates the data returned by the GetValues function in order to fix datasets with missing values. Three options for interpolation are offered: mean, backward, and forward. The default is mean interpolation.\n",
"\n",
"4. GetMonthlyAverage: Gets the monthly averages for a given variable, or from the response returned by the GetValues function for a given site.\n",
"\n",
"5. GetClustersMonthlyAvg: Gets “n” clusters for a given variable using DTW (dynamic time warping) time series clustering.\n",
"\n",
"Please note that some functions, such as GetSitesByVariable, GetMonthlyAverage, and GetClustersMonthlyAvg, can take a long time to run, depending on the number of sites in the WOF service.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HZ4vWfkLwonY"
},
"source": [
"<a name=\"GetSitesByBoxObject\"></a>\n",
"\n",
"### Sites by Bounding Box"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6rORLbDkwvj7"
},
"source": [
"The GetSitesByBoxObject method retrieves metadata for each site within a given lat/lon bounding box in the WOF web service. Data can be retrieved in three different formats: JSON, CSV, and WaterML."
]
},
{
"cell_type": "code",
"metadata": {
"id": "bgdyHVK7m_wY"
},
"source": [
"try:\n",
"    # folium returns [[south, west], [north, east]];\n",
"    # reorder the bounds to [west, south, east, north]\n",
"    Bounds = m.get_bounds()\n",
"    BoundsRearranged = [Bounds[0][1], Bounds[0][0], Bounds[1][1], Bounds[1][0]]\n",
"    SitesByBoundingBox = water.GetSitesByBoxObject(BoundsRearranged, 'epsg:4326')\n",
"    df = pd.DataFrame.from_dict(SitesByBoundingBox)\n",
"    print(df)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jYrhXF_5P9Fn"
},
"source": [
"<a name=\"GetSitesByVariable\"></a>\n",
"### Sites Filtered by Variables"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Psyx8nz3P9Fp"
},
"source": [
"The GetSitesByVariable method retrieves metadata for the sites that contain certain variables in the WOF web service.\n",
"\n",
"<em>Please note that this example uses the variable selected by the VARIABLE variable defined at the beginning of the notebook. To use a different variable, change the VARIABLE variable there.</em>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Kvi1i7o6P9Ft"
},
"source": [
"try:\n",
"    # Filter by the variable selected with VARIABLE at the beginning of the notebook\n",
"    variablesTest = [variables['variables'][VARIABLE]['variableCode']]\n",
"\n",
"    sitesFiltered = water.GetSitesByVariable(variablesTest, sites)\n",
"    df = pd.DataFrame.from_dict(sitesFiltered['sites'])\n",
"    print(df)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7tayhz1L4ShT"
},
"source": [
"<a name=\"GetInterpolation\"></a>\n",
"### Interpolations"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "V0xGv07qmeAP"
},
"source": [
"The GetInterpolation method interpolates the data returned by the GetValues function in order to fix datasets with missing values. Three options for interpolation are offered: mean, backward, and forward. The default is mean interpolation."
]
},
{
"cell_type": "code",
"metadata": {
"id": "jW9hBijwoqUn"
},
"source": [
"try:\n",
"    mean_interpolation = water.GetInterpolation(variableResponse, 'mean')\n",
"    backward_interpolation = water.GetInterpolation(variableResponse, 'backward')\n",
"    forward_interpolation = water.GetInterpolation(variableResponse, 'forward')\n",
"    df_mean = pd.DataFrame.from_dict(mean_interpolation)\n",
"    df_back = pd.DataFrame.from_dict(backward_interpolation)\n",
"    df_ford = pd.DataFrame.from_dict(forward_interpolation)\n",
"\n",
"    print(\"INTERPOLATIONS\")\n",
"    # print(df_back)\n",
"    print(df_mean)\n",
"    # print(df_ford)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "cXUBcAuomsNP"
},
"source": [
"The following code snippet plots the results of the GetInterpolation method using the Plotly library."
]
},
{
"cell_type": "code",
"metadata": {
"id": "DOxyWtwyo0sl"
},
"source": [
"try:\n",
"    # Plot the time series for the three interpolations.\n",
"    # Each interpolation entry is read as a [timestamp, value] pair.\n",
"    timeStamps = [entry[0] for entry in mean_interpolation]\n",
"    valuesTimeSeries = [entry[1] for entry in mean_interpolation]\n",
"    timeStampsBack = [entry[0] for entry in backward_interpolation]\n",
"    valuesTimeSeriesBackward = [entry[1] for entry in backward_interpolation]\n",
"    timeStampsFor = [entry[0] for entry in forward_interpolation]\n",
"    valuesTimeSeriesForward = [entry[1] for entry in forward_interpolation]\n",
"\n",
"    fig = go.Figure(data=go.Scatter(x=timeStamps, y=valuesTimeSeries,\n",
"                                    mode='lines',\n",
"                                    name='Mean Interpolation'))\n",
"    fig.add_trace(go.Scatter(x=timeStampsBack, y=valuesTimeSeriesBackward,\n",
"                             mode='lines',\n",
"                             name='Backward Interpolation'))\n",
"    fig.add_trace(go.Scatter(x=timeStampsFor, y=valuesTimeSeriesForward,\n",
"                             mode='lines',\n",
"                             name='Forward Interpolation'))\n",
"\n",
"    # Edit the layout\n",
"    fig.update_layout(title=variableResponse['values'][0]['variableName'],\n",
"                      xaxis_title=variableResponse['values'][0]['timeUnitAbbreviation'],\n",
"                      yaxis_title=variableResponse['values'][0]['unitAbbreviation'])\n",
"    fig.show()\n",
"\n",
"    # Box-and-whisker plot for the mean interpolation only\n",
"    df = pd.DataFrame(dict(data=valuesTimeSeries))\n",
"    fig = px.box(df, y=\"data\", points=\"all\")\n",
"    fig.show()\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "VP69FShLm4wE"
},
"source": [
"<a name=\"GetMonthlyAverage\"></a>\n",
"### Monthly Averages for Variables"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BkGKWVn9m4ik"
},
"source": [
"Gets the monthly averages for a given variable, or from the response given by the GetValues function for a given site."
]
},
{
"cell_type": "code",
"metadata": {
"id": "3hu3bUo-uGtO"
},
"source": [
"try:\n",
"    # Calculate the monthly averages\n",
"    m_avg = water.GetMonthlyAverage(None, site_full_code, variable_full_code, start_date, end_date)\n",
"\n",
"    data = {\n",
"        'Months': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],\n",
"        'Monthly Average': m_avg,\n",
"    }\n",
"    df = pd.DataFrame(data, columns=['Months', 'Monthly Average'])\n",
"\n",
"    print(df)\n",
"except Exception as e:\n",
"    print(e)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "a6iV1Zl5oE45"
},
"source": [
"<a name=\"GetClustersMonthlyAvg\"></a>\n",
"\n",
"### Time Series Clusters from Monthly Averages"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nlBxm9wroHy9"
},
"source": [
"The GetClustersMonthlyAvg method gets “n” clusters for a given variable using DTW (dynamic time warping) time series clustering."
]
},
{
"cell_type": "code",
"metadata": {
"id": "KVT75Uk1_OfC"
},
"source": [
"try:\n",
"    # Calculate the clusters\n",
"    Clusters = water.GetClustersMonthlyAvg(sites, siteInfo['siteInfo'][VARIABLE]['variableCode'])\n",
"    print(Clusters)\n",
"\n",
"except Exception as e:\n",
"    print(e)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zc_NJkEzoNgi"
},
"source": [
"The following is extra code to create a list of colors and plot the different clusters on a folium map."
]
},
{
"cell_type": "code",
"metadata": {
"id": "GbDmMvBPrRD7"
},
"source": [
"from IPython.display import display\n",
"\n",
"try:\n",
"    # Select the marker color for a cluster label\n",
"    def colorToCluster(cluster):\n",
"        colors = ['red', 'blue', 'green', 'pink', 'purple', 'orange']\n",
"        # Wrap around so labels beyond the palette still get a color\n",
"        return colors[cluster % len(colors)]\n",
"\n",
"    # Show the clusters on the map. Each entry of Clusters is read as\n",
"    # [monthly averages, cluster label].\n",
"    # 'Stamen Terrain' tiles may not be available in recent folium releases,\n",
"    # so the default OpenStreetMap tiles are used here.\n",
"    m = folium.Map(location=[40.0150, -105.2705],\n",
"                   tiles='OpenStreetMap')\n",
"    sitesLocations = []\n",
"    for site, singlecluster in zip(sitesFiltered['sites'], Clusters):\n",
"        location = [float(site['latitude']), float(site['longitude'])]\n",
"        sitesLocations.append(location)\n",
"        folium.Marker(\n",
"            location=location,\n",
"            # popup= sites[\"name\"], # pop-up label for the marker\n",
"            icon=folium.Icon(color=colorToCluster(singlecluster[1]))\n",
"        ).add_to(m)\n",
"    # Zoom to the sites, then display the map explicitly because a bare `m`\n",
"    # inside a try block is not rendered\n",
"    m.fit_bounds(sitesLocations)\n",
"    display(m)\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kFTcLsWcoZXR"
},
"source": [
"The following is extra code to plot the monthly-average time series for the first three clusters in the WOF web service."
]
},
{
"cell_type": "code",
"metadata": {
"id": "inFVIaDnNipE"
},
"source": [
"try:\n",
"    m_arr = [\"Jan\", \"Feb\", \"Mar\", \"Apr\", \"May\", \"Jun\", \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\"]\n",
"    cluster_titles = [\"First\", \"Second\", \"Third\"]\n",
"\n",
"    # Draw one figure per cluster label. Each entry of Clusters is read as\n",
"    # [monthly averages, cluster label].\n",
"    for label, title in enumerate(cluster_titles):\n",
"        fig = go.Figure()\n",
"        for index in range(len(Clusters)):\n",
"            # Only plot the series that belong to this cluster\n",
"            if Clusters[index][1] == label:\n",
"                fig.add_trace(go.Scatter(x=m_arr, y=Clusters[index][0],\n",
"                                         mode='lines',\n",
"                                         name=str(index)))\n",
"\n",
"        # Edit the layout. The unit label is read from the first returned value.\n",
"        fig.update_layout(title=\"Graph for \" + title + \" Cluster\",\n",
"                          xaxis_title='Months',\n",
"                          yaxis_title=variableResponse['values'][0]['unitAbbreviation'])\n",
"        fig.show()\n",
"except Exception as e:\n",
"    print(e)"
],
"execution_count": null,
"outputs": []
}
]
}