Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rutgerhofste/a75b588c36632d5c2913b4fd541434de to your computer and use it in GitHub Desktop.
Save rutgerhofste/a75b588c36632d5c2913b4fd541434de to your computer and use it in GitHub Desktop.
{
"cells": [
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from sqlalchemy import *\n",
"import datetime\n",
"\n",
"\n",
"DATABASE_ENDPOINT = \"aqueduct30v05.cgpnumwmfcqc.eu-central-1.rds.amazonaws.com\"\n",
"DATABASE_NAME = \"database01\"\n",
"TABLE_NAME = \"y2018m05d29_rh_total_demand_postgis_30spfaf06_v01_v01\"\n",
"\n",
"F = open(\"/.password\",\"r\")\n",
"password = F.read().splitlines()[0]\n",
"F.close()\n",
"\n",
"engine = create_engine(\"postgresql://rutgerhofste:{}@{}:5432/{}\".format(password,DATABASE_ENDPOINT,DATABASE_NAME))\n",
"connection = engine.connect()\n",
"\n",
"sql = \"SELECT * FROM {} \\\n",
" LIMIT 10\".format(TABLE_NAME)\n",
"\n",
"df = pd.read_sql(sql, connection)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Some Background\n",
"\n",
"the database is the result of running zonal statistics on a climate model. I am relatively new to postgreSQL and haven't set any indexes yet. The database is on AWS RDS on an x.large instance. None of the columns is unique. \"pfafid_30spfaf06\" is a zonal code for water basins. There are in total appr. 16000 unique pfaf_ids. year [1960-2014] month [1-12], temporal_resolution [\"year\",\"month\"]\n",
"\n",
"Result of (PgAdmin) : \n",
"`SELECT pg_size_pretty(pg_total_relation_size('\"public\".\"y2018m05d29_rh_total_demand_postgis_30spfaf06_v01_v01\"'));`\n",
"\n",
"Successfully run. Total query runtime: 425 msec.\n",
"1 rows affected:\n",
"\n",
"11 GB\n",
"\n",
"\n",
"Result of (PgAdmin) :\n",
"`SELECT count(*) FROM y2018m05d29_rh_total_demand_postgis_30spfaf06_v01_v01`\n",
"\n",
"Successfully run. Total query runtime: 52 secs.\n",
"1 rows affected.\n",
"\n",
"11715275 i.e. 11,715,275 rows\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pfafid_30spfaf06</th>\n",
" <th>area_count_30spfaf06</th>\n",
" <th>area_m2_30spfaf06</th>\n",
" <th>month</th>\n",
" <th>pdomwn_count_30spfaf06</th>\n",
" <th>pdomwn_m_30spfaf06</th>\n",
" <th>pdomww_count_30spfaf06</th>\n",
" <th>pdomww_m_30spfaf06</th>\n",
" <th>pindwn_count_30spfaf06</th>\n",
" <th>pindwn_m_30spfaf06</th>\n",
" <th>...</th>\n",
" <th>plivww_count_30spfaf06</th>\n",
" <th>plivww_m_30spfaf06</th>\n",
" <th>riverdischarge_count_30spfaf06</th>\n",
" <th>riverdischarge_m_30spfaf06</th>\n",
" <th>temporal_resolution</th>\n",
" <th>year</th>\n",
" <th>input_file_name</th>\n",
" <th>datetime_stamp</th>\n",
" <th>ptotwn_m_30spfaf06</th>\n",
" <th>ptotww_m_30spfaf06</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>482012</td>\n",
" <td>15572</td>\n",
" <td>1.027443e+10</td>\n",
" <td>12</td>\n",
" <td>15572</td>\n",
" <td>0.000007</td>\n",
" <td>15572</td>\n",
" <td>0.000007</td>\n",
" <td>15572</td>\n",
" <td>1.803221e-07</td>\n",
" <td>...</td>\n",
" <td>15572</td>\n",
" <td>0.000004</td>\n",
" <td>15462.0</td>\n",
" <td>0.078879</td>\n",
" <td>year</td>\n",
" <td>1967</td>\n",
" <td>global_historical_merged_year_m_30sPfaf06_1960...</td>\n",
" <td>2018-05-31 16:42:33.339340</td>\n",
" <td>0.000089</td>\n",
" <td>0.000218</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>482013</td>\n",
" <td>2665</td>\n",
" <td>1.749258e+09</td>\n",
" <td>12</td>\n",
" <td>2665</td>\n",
" <td>0.000018</td>\n",
" <td>2665</td>\n",
" <td>0.000018</td>\n",
" <td>2665</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>2665</td>\n",
" <td>0.000005</td>\n",
" <td>2665.0</td>\n",
" <td>0.463177</td>\n",
" <td>year</td>\n",
" <td>1967</td>\n",
" <td>global_historical_merged_year_m_30sPfaf06_1960...</td>\n",
" <td>2018-05-31 16:42:33.339340</td>\n",
" <td>0.000058</td>\n",
" <td>0.000115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>482014</td>\n",
" <td>7927</td>\n",
" <td>5.247636e+09</td>\n",
" <td>12</td>\n",
" <td>7927</td>\n",
" <td>0.000020</td>\n",
" <td>7927</td>\n",
" <td>0.000020</td>\n",
" <td>7927</td>\n",
" <td>7.687544e-06</td>\n",
" <td>...</td>\n",
" <td>7927</td>\n",
" <td>0.000016</td>\n",
" <td>7872.0</td>\n",
" <td>0.000123</td>\n",
" <td>year</td>\n",
" <td>1967</td>\n",
" <td>global_historical_merged_year_m_30sPfaf06_1960...</td>\n",
" <td>2018-05-31 16:42:33.339340</td>\n",
" <td>0.000115</td>\n",
" <td>0.000265</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>482015</td>\n",
" <td>6268</td>\n",
" <td>4.100086e+09</td>\n",
" <td>12</td>\n",
" <td>6268</td>\n",
" <td>0.000020</td>\n",
" <td>6268</td>\n",
" <td>0.000020</td>\n",
" <td>6268</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>6268</td>\n",
" <td>0.000004</td>\n",
" <td>6268.0</td>\n",
" <td>0.197547</td>\n",
" <td>year</td>\n",
" <td>1967</td>\n",
" <td>global_historical_merged_year_m_30sPfaf06_1960...</td>\n",
" <td>2018-05-31 16:42:33.339340</td>\n",
" <td>0.000023</td>\n",
" <td>0.000023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>482016</td>\n",
" <td>51248</td>\n",
" <td>3.393310e+10</td>\n",
" <td>12</td>\n",
" <td>51248</td>\n",
" <td>0.000033</td>\n",
" <td>51248</td>\n",
" <td>0.000033</td>\n",
" <td>51248</td>\n",
" <td>2.805116e-05</td>\n",
" <td>...</td>\n",
" <td>51248</td>\n",
" <td>0.000027</td>\n",
" <td>51248.0</td>\n",
" <td>0.023862</td>\n",
" <td>year</td>\n",
" <td>1967</td>\n",
" <td>global_historical_merged_year_m_30sPfaf06_1960...</td>\n",
" <td>2018-05-31 16:42:33.339340</td>\n",
" <td>0.001410</td>\n",
" <td>0.003697</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 28 columns</p>\n",
"</div>"
],
"text/plain": [
" pfafid_30spfaf06 area_count_30spfaf06 area_m2_30spfaf06 month \\\n",
"0 482012 15572 1.027443e+10 12 \n",
"1 482013 2665 1.749258e+09 12 \n",
"2 482014 7927 5.247636e+09 12 \n",
"3 482015 6268 4.100086e+09 12 \n",
"4 482016 51248 3.393310e+10 12 \n",
"\n",
" pdomwn_count_30spfaf06 pdomwn_m_30spfaf06 pdomww_count_30spfaf06 \\\n",
"0 15572 0.000007 15572 \n",
"1 2665 0.000018 2665 \n",
"2 7927 0.000020 7927 \n",
"3 6268 0.000020 6268 \n",
"4 51248 0.000033 51248 \n",
"\n",
" pdomww_m_30spfaf06 pindwn_count_30spfaf06 pindwn_m_30spfaf06 \\\n",
"0 0.000007 15572 1.803221e-07 \n",
"1 0.000018 2665 0.000000e+00 \n",
"2 0.000020 7927 7.687544e-06 \n",
"3 0.000020 6268 0.000000e+00 \n",
"4 0.000033 51248 2.805116e-05 \n",
"\n",
" ... plivww_count_30spfaf06 plivww_m_30spfaf06 \\\n",
"0 ... 15572 0.000004 \n",
"1 ... 2665 0.000005 \n",
"2 ... 7927 0.000016 \n",
"3 ... 6268 0.000004 \n",
"4 ... 51248 0.000027 \n",
"\n",
" riverdischarge_count_30spfaf06 riverdischarge_m_30spfaf06 \\\n",
"0 15462.0 0.078879 \n",
"1 2665.0 0.463177 \n",
"2 7872.0 0.000123 \n",
"3 6268.0 0.197547 \n",
"4 51248.0 0.023862 \n",
"\n",
" temporal_resolution year \\\n",
"0 year 1967 \n",
"1 year 1967 \n",
"2 year 1967 \n",
"3 year 1967 \n",
"4 year 1967 \n",
"\n",
" input_file_name \\\n",
"0 global_historical_merged_year_m_30sPfaf06_1960... \n",
"1 global_historical_merged_year_m_30sPfaf06_1960... \n",
"2 global_historical_merged_year_m_30sPfaf06_1960... \n",
"3 global_historical_merged_year_m_30sPfaf06_1960... \n",
"4 global_historical_merged_year_m_30sPfaf06_1960... \n",
"\n",
" datetime_stamp ptotwn_m_30spfaf06 ptotww_m_30spfaf06 \n",
"0 2018-05-31 16:42:33.339340 0.000089 0.000218 \n",
"1 2018-05-31 16:42:33.339340 0.000058 0.000115 \n",
"2 2018-05-31 16:42:33.339340 0.000115 0.000265 \n",
"3 2018-05-31 16:42:33.339340 0.000023 0.000023 \n",
"4 2018-05-31 16:42:33.339340 0.001410 0.003697 \n",
"\n",
"[5 rows x 28 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I know combining month and year in one datetime column is best practice but for future use, keeping them separate is easier."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"The query I like to run calculates a 10 year moving average for three columns: \n",
"1. ptotwn_m_30spfaf06\n",
"1. ptotww_m_30spfaf06\n",
"1. riverdischarge_m_30spfaf06\n",
"\n",
"For axample the 10y annual moving average of 1969 is the average of 1960 - 1969. For a monthly moving average the average is filtered by month: average of jan 1960 jan 1961 ... jan 1969. \n",
"\n",
"The query I have so far:\n",
"\n",
"`\n",
"SELECT year, ptotww_m_30spfaf06, temporal_resolution,\n",
" SUM(ptotww_m_30spfaf06)\n",
" OVER(ORDER BY year ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as ptotwwma_m_30spfaf06 \n",
"FROM y2018m05d29_rh_total_demand_postgis_30spfaf06_v01_v01\n",
"WHERE temporal_resolution = 'year'\n",
"LIMIT 200`\n",
"\n",
"However this is slow (need to set index? Which columns? year, month?) and does not work for the monthly scores. \n",
"\n",
"Successfully run. Total query runtime: 52 secs.\n",
"200 rows affected.\n",
"\n",
"Which is quite slow. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 35",
"language": "python",
"name": "python35"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment