Skip to content

Instantly share code, notes, and snippets.

@avcaliani
Last active July 24, 2020 18:51
Show Gist options
  • Save avcaliani/5a5793ac4b0da431481c03b524f396e2 to your computer and use it in GitHub Desktop.
Save avcaliani/5a5793ac4b0da431481c03b524f396e2 to your computer and use it in GitHub Desktop.
wwii-weather.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "wwii-weather.ipynb",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true,
"authorship_tag": "ABX9TyNtnhZawS518fXZsDcOodar",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/avcaliani/5a5793ac4b0da431481c03b524f396e2/wwii-weather.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bE8Uwi4UqwH3",
"colab_type": "text"
},
"source": [
"<img src=\"https://user-images.githubusercontent.com/15377830/87998356-2aa60180-cae7-11ea-9387-4e78495ad34f.png\" align=\"right\" height=\"64\" width=\"64\"/>\n",
"\n",
"# Weather Conditions in World War II\n",
"\n",
"This notebook analyses [\"Weather Conditions in World War Two\"](https://www.kaggle.com/smid80/weatherww2) data.\n",
"\n",
"What are we going to do here? \n",
"First, we are going to **explore the dataset** then let's try to calculate the **average temperature** for each **french station** for each **year** available in **Farenheit** degrees.\n",
"\n",
"\n",
"Before I forget, thanks to [Shane Smith](https://www.kaggle.com/smid80) for publishing the dataset.\n",
"\n",
"**NOW... LET'S ROCK 🤘**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZC-hmvk3x8yO",
"colab_type": "text"
},
"source": [
"### Uploading the dataset\n",
"**Download** the dataset from [Kaggle](https://www.kaggle.com/smid80/weatherww2). \n",
"Then **upload** `Summary of Weather.csv` and `Weather Station Locations.csv` files. \n",
"**After uploading** you will see these files on your left (📁 folder icon).\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "mK9JYOXCy8qp",
"colab_type": "code",
"colab": {
"resources": {
"http://localhost:8080/nbextensions/google.colab/files.js": {
"data": "Ly8gQ29weXJpZ2h0IDIwMTcgR29vZ2xlIExMQwovLwovLyBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNpb24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKLy8geW91IG1heSBub3QgdXNlIHRoaXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNlLgovLyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQKLy8KLy8gICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKLy8KLy8gVW5sZXNzIHJlcXVpcmVkIGJ5IGFwcGxpY2FibGUgbGF3IG9yIGFncmVlZCB0byBpbiB3cml0aW5nLCBzb2Z0d2FyZQovLyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRlZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAovLyBXSVRIT1VUIFdBUlJBTlRJRVMgT1IgQ09ORElUSU9OUyBPRiBBTlkgS0lORCwgZWl0aGVyIGV4cHJlc3Mgb3IgaW1wbGllZC4KLy8gU2VlIHRoZSBMaWNlbnNlIGZvciB0aGUgc3BlY2lmaWMgbGFuZ3VhZ2UgZ292ZXJuaW5nIHBlcm1pc3Npb25zIGFuZAovLyBsaW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCi8qKgogKiBAZmlsZW92ZXJ2aWV3IEhlbHBlcnMgZm9yIGdvb2dsZS5jb2xhYiBQeXRob24gbW9kdWxlLgogKi8KKGZ1bmN0aW9uKHNjb3BlKSB7CmZ1bmN0aW9uIHNwYW4odGV4dCwgc3R5bGVBdHRyaWJ1dGVzID0ge30pIHsKICBjb25zdCBlbGVtZW50ID0gZG9jdW1lbnQuY3JlYXRlRWxlbWVudCgnc3BhbicpOwogIGVsZW1lbnQudGV4dENvbnRlbnQgPSB0ZXh0OwogIGZvciAoY29uc3Qga2V5IG9mIE9iamVjdC5rZXlzKHN0eWxlQXR0cmlidXRlcykpIHsKICAgIGVsZW1lbnQuc3R5bGVba2V5XSA9IHN0eWxlQXR0cmlidXRlc1trZXldOwogIH0KICByZXR1cm4gZWxlbWVudDsKfQoKLy8gTWF4IG51bWJlciBvZiBieXRlcyB3aGljaCB3aWxsIGJlIHVwbG9hZGVkIGF0IGEgdGltZS4KY29uc3QgTUFYX1BBWUxPQURfU0laRSA9IDEwMCAqIDEwMjQ7CgpmdW5jdGlvbiBfdXBsb2FkRmlsZXMoaW5wdXRJZCwgb3V0cHV0SWQpIHsKICBjb25zdCBzdGVwcyA9IHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCk7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICAvLyBDYWNoZSBzdGVwcyBvbiB0aGUgb3V0cHV0RWxlbWVudCB0byBtYWtlIGl0IGF2YWlsYWJsZSBmb3IgdGhlIG5leHQgY2FsbAogIC8vIHRvIHVwbG9hZEZpbGVzQ29udGludWUgZnJvbSBQeXRob24uCiAgb3V0cHV0RWxlbWVudC5zdGVwcyA9IHN0ZXBzOwoKICByZXR1cm4gX3VwbG9hZEZpbGVzQ29udGludWUob3V0cHV0SWQpOwp9CgovLyBUaGlzIGlzIHJvdWdobHkgYW4gYXN5bmMgZ2VuZXJhdG9yIChub3Qgc3VwcG9ydGVkIGluIHRoZSBicm93c2VyIHlldCksCi8vIHdoZXJlIHRoZXJlIGFyZSBtdWx0aXBsZSBhc3luY2hyb25vdXMgc3RlcHMgYW5kIHRoZSBQeXRob24gc2lkZSBpcyBnb2luZwovLyB0byBwb2xsIGZvciBjb21wbGV0aW9uIG9mIGVhY2ggc3RlcC4KLy8gVGhpcyB1c2VzIGEgUHJvbWlzZSB0byBibG9jayB0aGUgcHl0aG9uIHNpZGUgb24gY29tcGxldGlvbiBvZiBlYWNoIHN0ZXAsCi8vIHRoZW4gcGFzc2VzIHRoZSByZXN1bHQgb2YgdGhlIHByZXZpb3VzIHN0ZXAgYXMgdGhlIGlucHV0IHRvIHRoZSBuZXh0IHN0ZXAuCmZ1bmN0aW9uIF91cGxvYWRGaWxlc0NvbnRpbnVlKG91dHB1dElkKSB7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICBjb25zdCBzdGVwcyA9IG91dHB1dEVsZW1lbnQuc3RlcHM7CgogIGNvbnN0IG5leHQgPSBzdGVwcy5uZXh0KG91dHB1dEVsZW1lbnQubGFzdFByb21pc2VWYWx1ZSk7CiAgcmV0dXJuIFByb21pc2UucmVzb2x2ZShuZXh0LnZhbHVlLnByb21pc2UpLnRoZW4oKHZhbHVlKSA9PiB7CiAgICAvLyBDYWNoZSB0aGUgbGFzdCBwcm9taXNlIHZhbHVlIHRvIG1ha2UgaXQgYXZhaWxhYmxlIHRvIHRoZSBuZXh0CiAgICAvLyBzdGVwIG9mIHRoZSBnZW5lcmF0b3IuCiAgICBvdXRwdXRFbGVtZW50Lmxhc3RQcm9taXNlVmFsdWUgPSB2YWx1ZTsKICAgIHJldHVybiBuZXh0LnZhbHVlLnJlc3BvbnNlOwogIH0pOwp9CgovKioKICogR2VuZXJhdG9yIGZ1bmN0aW9uIHdoaWNoIGlzIGNhbGxlZCBiZXR3ZWVuIGVhY2ggYXN5bmMgc3RlcCBvZiB0aGUgdXBsb2FkCiAqIHByb2Nlc3MuCiAqIEBwYXJhbSB7c3RyaW5nfSBpbnB1dElkIEVsZW1lbnQgSUQgb2YgdGhlIGlucHV0IGZpbGUgcGlja2VyIGVsZW1lbnQuCiAqIEBwYXJhbSB7c3RyaW5nfSBvdXRwdXRJZCBFbGVtZW50IElEIG9mIHRoZSBvdXRwdXQgZGlzcGxheS4KICogQHJldHVybiB7IUl0ZXJhYmxlPCFPYmplY3Q+fSBJdGVyYWJsZSBvZiBuZXh0IHN0ZXBzLgogKi8KZnVuY3Rpb24qIHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCkgewogIGNvbnN0IGlucHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKGlucHV0SWQpOwogIGlucHV0RWxlbWVudC5kaXNhYmxlZCA9IGZhbHNlOwoKICBjb25zdCBvdXRwdXRFbGVtZW50ID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQob3V0cHV0SWQpOwogIG91dHB1dEVsZW1lbnQuaW5uZXJIVE1MID0gJyc7CgogIGNvbnN0IHBpY2tlZFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgaW5wdXRFbGVtZW50LmFkZEV2ZW50TGlzdGVuZXIoJ2NoYW5nZScsIChlKSA9PiB7CiAgICAgIHJlc29sdmUoZS50YXJnZXQuZmlsZXMpOwogICAgfSk7CiAgfSk7CgogIGNvbnN0IGNhbmNlbCA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2J1dHRvbicpOwogIGlucHV0RWxlbWVudC5wYXJlbnRFbGVtZW50LmFwcGVuZENoaWxkKGNhbmNlbCk7CiAgY2FuY2VsLnRleHRDb250ZW50ID0gJ0NhbmNlbCB1cGxvYWQnOwogIGNvbnN0IGNhbmNlbFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgY2FuY2VsLm9uY2xpY2sgPSAoKSA9PiB7CiAgICAgIHJlc29sdmUobnVsbCk7CiAgICB9OwogIH0pOwoKICAvLyBXYWl0IGZvciB0aGUgdXNlciB0byBwaWNrIHRoZSBmaWxlcy4KICBjb25zdCBmaWxlcyA9IHlpZWxkIHsKICAgIHByb21pc2U6IFByb21pc2UucmFjZShbcGlja2VkUHJvbWlzZSwgY2FuY2VsUHJvbWlzZV0pLAogICAgcmVzcG9uc2U6IHsKICAgICAgYWN0aW9uOiAnc3RhcnRpbmcnLAogICAgfQogIH07CgogIGNhbmNlbC5yZW1vdmUoKTsKCiAgLy8gRGlzYWJsZSB0aGUgaW5wdXQgZWxlbWVudCBzaW5jZSBmdXJ0aGVyIHBpY2tzIGFyZSBub3QgYWxsb3dlZC4KICBpbnB1dEVsZW1lbnQuZGlzYWJsZWQgPSB0cnVlOwoKICBpZiAoIWZpbGVzKSB7CiAgICByZXR1cm4gewogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbXBsZXRlJywKICAgICAgfQogICAgfTsKICB9CgogIGZvciAoY29uc3QgZmlsZSBvZiBmaWxlcykgewogICAgY29uc3QgbGkgPSBkb2N1bWVudC5jcmVhdGVFbGVtZW50KCdsaScpOwogICAgbGkuYXBwZW5kKHNwYW4oZmlsZS5uYW1lLCB7Zm9udFdlaWdodDogJ2JvbGQnfSkpOwogICAgbGkuYXBwZW5kKHNwYW4oCiAgICAgICAgYCgke2ZpbGUudHlwZSB8fCAnbi9hJ30pIC0gJHtmaWxlLnNpemV9IGJ5dGVzLCBgICsKICAgICAgICBgbGFzdCBtb2RpZmllZDogJHsKICAgICAgICAgICAgZmlsZS5sYXN0TW9kaWZpZWREYXRlID8gZmlsZS5sYXN0TW9kaWZpZWREYXRlLnRvTG9jYWxlRGF0ZVN0cmluZygpIDoKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJ24vYSd9IC0gYCkpOwogICAgY29uc3QgcGVyY2VudCA9IHNwYW4oJzAlIGRvbmUnKTsKICAgIGxpLmFwcGVuZENoaWxkKHBlcmNlbnQpOwoKICAgIG91dHB1dEVsZW1lbnQuYXBwZW5kQ2hpbGQobGkpOwoKICAgIGNvbnN0IGZpbGVEYXRhUHJvbWlzZSA9IG5ldyBQcm9taXNlKChyZXNvbHZlKSA9PiB7CiAgICAgIGNvbnN0IHJlYWRlciA9IG5ldyBGaWxlUmVhZGVyKCk7CiAgICAgIHJlYWRlci5vbmxvYWQgPSAoZSkgPT4gewogICAgICAgIHJlc29sdmUoZS50YXJnZXQucmVzdWx0KTsKICAgICAgfTsKICAgICAgcmVhZGVyLnJlYWRBc0FycmF5QnVmZmVyKGZpbGUpOwogICAgfSk7CiAgICAvLyBXYWl0IGZvciB0aGUgZGF0YSB0byBiZSByZWFkeS4KICAgIGxldCBmaWxlRGF0YSA9IHlpZWxkIHsKICAgICAgcHJvbWlzZTogZmlsZURhdGFQcm9taXNlLAogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbnRpbnVlJywKICAgICAgfQogICAgfTsKCiAgICAvLyBVc2UgYSBjaHVua2VkIHNlbmRpbmcgdG8gYXZvaWQgbWVzc2FnZSBzaXplIGxpbWl0cy4gU2VlIGIvNjIxMTU2NjAuCiAgICBsZXQgcG9zaXRpb24gPSAwOwogICAgd2hpbGUgKHBvc2l0aW9uIDwgZmlsZURhdGEuYnl0ZUxlbmd0aCkgewogICAgICBjb25zdCBsZW5ndGggPSBNYXRoLm1pbihmaWxlRGF0YS5ieXRlTGVuZ3RoIC0gcG9zaXRpb24sIE1BWF9QQVlMT0FEX1NJWkUpOwogICAgICBjb25zdCBjaHVuayA9IG5ldyBVaW50OEFycmF5KGZpbGVEYXRhLCBwb3NpdGlvbiwgbGVuZ3RoKTsKICAgICAgcG9zaXRpb24gKz0gbGVuZ3RoOwoKICAgICAgY29uc3QgYmFzZTY0ID0gYnRvYShTdHJpbmcuZnJvbUNoYXJDb2RlLmFwcGx5KG51bGwsIGNodW5rKSk7CiAgICAgIHlpZWxkIHsKICAgICAgICByZXNwb25zZTogewogICAgICAgICAgYWN0aW9uOiAnYXBwZW5kJywKICAgICAgICAgIGZpbGU6IGZpbGUubmFtZSwKICAgICAgICAgIGRhdGE6IGJhc2U2NCwKICAgICAgICB9LAogICAgICB9OwogICAgICBwZXJjZW50LnRleHRDb250ZW50ID0KICAgICAgICAgIGAke01hdGgucm91bmQoKHBvc2l0aW9uIC8gZmlsZURhdGEuYnl0ZUxlbmd0aCkgKiAxMDApfSUgZG9uZWA7CiAgICB9CiAgfQoKICAvLyBBbGwgZG9uZS4KICB5aWVsZCB7CiAgICByZXNwb25zZTogewogICAgICBhY3Rpb246ICdjb21wbGV0ZScsCiAgICB9CiAgfTsKfQoKc2NvcGUuZ29vZ2xlID0gc2NvcGUuZ29vZ2xlIHx8IHt9OwpzY29wZS5nb29nbGUuY29sYWIgPSBzY29wZS5nb29nbGUuY29sYWIgfHwge307CnNjb3BlLmdvb2dsZS5jb2xhYi5fZmlsZXMgPSB7CiAgX3VwbG9hZEZpbGVzLAogIF91cGxvYWRGaWxlc0NvbnRpbnVlLAp9Owp9KShzZWxmKTsK",
"ok": true,
"headers": [
[
"content-type",
"application/javascript"
]
],
"status": 200,
"status_text": ""
}
},
"base_uri": "https://localhost:8080/",
"height": 106
},
"outputId": "30ebe0f2-1710-4955-e1e1-323b1eb256a5"
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <input type=\"file\" id=\"files-89174982-8f7f-4cab-a7ee-5133d23aeec7\" name=\"files[]\" multiple disabled\n",
" style=\"border:none\" />\n",
" <output id=\"result-89174982-8f7f-4cab-a7ee-5133d23aeec7\">\n",
" Upload widget is only available when the cell has been executed in the\n",
" current browser session. Please rerun this cell to enable.\n",
" </output>\n",
" <script src=\"/nbextensions/google.colab/files.js\"></script> "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"Saving Summary of Weather.csv to Summary of Weather.csv\n",
"Saving Weather Station Locations.csv to Weather Station Locations.csv\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oQldSWF4zSzc",
"colab_type": "text"
},
"source": [
"### Reading files using Pandas\n",
"Now, **read and show** the first 5 lines of each file using Pandas.\n",
"\n",
"[`pandas.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "W8rOWJuJrPuP",
"colab_type": "code",
"colab": {}
},
"source": [
"import pandas as pd"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "L2UpKezkzZHe",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 326
},
"outputId": "92589ca3-e436-493d-df7e-59d41615f51a"
},
"source": [
"# Summary of Weather\n",
"weather_summary = pd.read_csv('Summary of Weather.csv', low_memory=False)\n",
"print(f'There are {weather_summary.shape[0]} weather records.')\n",
"weather_summary.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"There are 119040 weather records.\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>STA</th>\n",
" <th>Date</th>\n",
" <th>Precip</th>\n",
" <th>WindGustSpd</th>\n",
" <th>MaxTemp</th>\n",
" <th>MinTemp</th>\n",
" <th>MeanTemp</th>\n",
" <th>Snowfall</th>\n",
" <th>PoorWeather</th>\n",
" <th>YR</th>\n",
" <th>MO</th>\n",
" <th>DA</th>\n",
" <th>PRCP</th>\n",
" <th>DR</th>\n",
" <th>SPD</th>\n",
" <th>MAX</th>\n",
" <th>MIN</th>\n",
" <th>MEA</th>\n",
" <th>SNF</th>\n",
" <th>SND</th>\n",
" <th>FT</th>\n",
" <th>FB</th>\n",
" <th>FTI</th>\n",
" <th>ITH</th>\n",
" <th>PGT</th>\n",
" <th>TSHDSBRSGF</th>\n",
" <th>SD3</th>\n",
" <th>RHX</th>\n",
" <th>RHN</th>\n",
" <th>RVG</th>\n",
" <th>WTE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>10001</td>\n",
" <td>1942-7-1</td>\n",
" <td>1.016</td>\n",
" <td>NaN</td>\n",
" <td>25.555556</td>\n",
" <td>22.222222</td>\n",
" <td>23.888889</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>42</td>\n",
" <td>7</td>\n",
" <td>1</td>\n",
" <td>0.04</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>78.0</td>\n",
" <td>72.0</td>\n",
" <td>75.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10001</td>\n",
" <td>1942-7-2</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>28.888889</td>\n",
" <td>21.666667</td>\n",
" <td>25.555556</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>42</td>\n",
" <td>7</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>84.0</td>\n",
" <td>71.0</td>\n",
" <td>78.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>10001</td>\n",
" <td>1942-7-3</td>\n",
" <td>2.54</td>\n",
" <td>NaN</td>\n",
" <td>26.111111</td>\n",
" <td>22.222222</td>\n",
" <td>24.444444</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>42</td>\n",
" <td>7</td>\n",
" <td>3</td>\n",
" <td>0.1</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>79.0</td>\n",
" <td>72.0</td>\n",
" <td>76.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10001</td>\n",
" <td>1942-7-4</td>\n",
" <td>2.54</td>\n",
" <td>NaN</td>\n",
" <td>26.666667</td>\n",
" <td>22.222222</td>\n",
" <td>24.444444</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>42</td>\n",
" <td>7</td>\n",
" <td>4</td>\n",
" <td>0.1</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>80.0</td>\n",
" <td>72.0</td>\n",
" <td>76.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>10001</td>\n",
" <td>1942-7-5</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>26.666667</td>\n",
" <td>21.666667</td>\n",
" <td>24.444444</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>42</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>80.0</td>\n",
" <td>71.0</td>\n",
" <td>76.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" STA Date Precip WindGustSpd MaxTemp ... SD3 RHX RHN RVG WTE\n",
"0 10001 1942-7-1 1.016 NaN 25.555556 ... NaN NaN NaN NaN NaN\n",
"1 10001 1942-7-2 0 NaN 28.888889 ... NaN NaN NaN NaN NaN\n",
"2 10001 1942-7-3 2.54 NaN 26.111111 ... NaN NaN NaN NaN NaN\n",
"3 10001 1942-7-4 2.54 NaN 26.666667 ... NaN NaN NaN NaN NaN\n",
"4 10001 1942-7-5 0 NaN 26.666667 ... NaN NaN NaN NaN NaN\n",
"\n",
"[5 rows x 31 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "mypsme5mzZJz",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 221
},
"outputId": "4b3df077-1d21-4e90-8bd5-ab61171da6b7"
},
"source": [
"# Weather Station Locations\n",
"weather_stations = pd.read_csv('Weather Station Locations.csv')\n",
"print(f'There are {weather_stations.shape[0]} stations available.')\n",
"weather_stations.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"There are 161 stations available.\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>WBAN</th>\n",
" <th>NAME</th>\n",
" <th>STATE/COUNTRY ID</th>\n",
" <th>LAT</th>\n",
" <th>LON</th>\n",
" <th>ELEV</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>33013</td>\n",
" <td>AIN EL</td>\n",
" <td>AL</td>\n",
" <td>3623N</td>\n",
" <td>00637E</td>\n",
" <td>611</td>\n",
" <td>36.383333</td>\n",
" <td>6.650000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33031</td>\n",
" <td>LA SENIA</td>\n",
" <td>AL</td>\n",
" <td>3537N</td>\n",
" <td>00037E</td>\n",
" <td>88</td>\n",
" <td>35.616667</td>\n",
" <td>0.583333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>33023</td>\n",
" <td>MAISON BLANCHE</td>\n",
" <td>AL</td>\n",
" <td>3643N</td>\n",
" <td>00314E</td>\n",
" <td>23</td>\n",
" <td>36.716667</td>\n",
" <td>3.216667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33044</td>\n",
" <td>TELERGMA</td>\n",
" <td>AL</td>\n",
" <td>3607N</td>\n",
" <td>00621E</td>\n",
" <td>754</td>\n",
" <td>36.116667</td>\n",
" <td>6.416667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>12001</td>\n",
" <td>TINDOUF</td>\n",
" <td>AL</td>\n",
" <td>2741N</td>\n",
" <td>00809W</td>\n",
" <td>443</td>\n",
" <td>27.683333</td>\n",
" <td>-8.083333</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" WBAN NAME STATE/COUNTRY ID ... ELEV Latitude Longitude\n",
"0 33013 AIN EL AL ... 611 36.383333 6.650000\n",
"1 33031 LA SENIA AL ... 88 35.616667 0.583333\n",
"2 33023 MAISON BLANCHE AL ... 23 36.716667 3.216667\n",
"3 33044 TELERGMA AL ... 754 36.116667 6.416667\n",
"4 12001 TINDOUF AL ... 443 27.683333 -8.083333\n",
"\n",
"[5 rows x 8 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FxOuQCIArKvC",
"colab_type": "text"
},
"source": [
"### Filtering stations\n",
"Now, I'm going to **filter** the weather stations Data Frame to keep only the french `FR` stations.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "QzMj5FTHs87J",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 266
},
"outputId": "1dd6afad-01f4-40cc-94b1-16da6f3e7e1d"
},
"source": [
"french_stations = weather_stations[ weather_stations['STATE/COUNTRY ID'] == 'FR']\n",
"french_stations"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>WBAN</th>\n",
" <th>NAME</th>\n",
" <th>STATE/COUNTRY ID</th>\n",
" <th>LAT</th>\n",
" <th>LON</th>\n",
" <th>ELEV</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>34017</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4233N</td>\n",
" <td>00929E</td>\n",
" <td>16</td>\n",
" <td>42.550000</td>\n",
" <td>9.433333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>34002</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4242N</td>\n",
" <td>00927E</td>\n",
" <td>105</td>\n",
" <td>42.700000</td>\n",
" <td>9.366667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>34003</td>\n",
" <td>C DELORO/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4155N</td>\n",
" <td>00848E</td>\n",
" <td>3</td>\n",
" <td>41.916667</td>\n",
" <td>8.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>34012</td>\n",
" <td>MARSEILLE</td>\n",
" <td>FR</td>\n",
" <td>4326N</td>\n",
" <td>00513E</td>\n",
" <td>12</td>\n",
" <td>43.433333</td>\n",
" <td>5.266667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>34013</td>\n",
" <td>MIGLIACCIARO/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4200N</td>\n",
" <td>00924E</td>\n",
" <td>18</td>\n",
" <td>42.000000</td>\n",
" <td>9.450000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>34022</td>\n",
" <td>PARIS/ORLY</td>\n",
" <td>FR</td>\n",
" <td>4844N</td>\n",
" <td>00222E</td>\n",
" <td>100</td>\n",
" <td>48.733333</td>\n",
" <td>2.366667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>34005</td>\n",
" <td>PORTETTA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4234N</td>\n",
" <td>00927E</td>\n",
" <td>49</td>\n",
" <td>42.566667</td>\n",
" <td>9.333333</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" WBAN NAME STATE/COUNTRY ID ... ELEV Latitude Longitude\n",
"41 34017 BASTIA/CORSICA FR ... 16 42.550000 9.433333\n",
"42 34002 BASTIA/CORSICA FR ... 105 42.700000 9.366667\n",
"43 34003 C DELORO/CORSICA FR ... 3 41.916667 8.750000\n",
"44 34012 MARSEILLE FR ... 12 43.433333 5.266667\n",
"45 34013 MIGLIACCIARO/CORSICA FR ... 18 42.000000 9.450000\n",
"46 34022 PARIS/ORLY FR ... 100 48.733333 2.366667\n",
"47 34005 PORTETTA/CORSICA FR ... 49 42.566667 9.333333\n",
"\n",
"[7 rows x 8 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qx9KGTxruXho",
"colab_type": "text"
},
"source": [
"### Joining different data frames\n",
"Now, let's **join** the `weather_summary` data frame and the `french_stations` data frame using `inner` strategy.\n",
"\n",
"> **💡 Hint!**<br/>We know that `weather_summary` data has an column named as `STA` that represents the station Id and we also know that `french_stations` data has a column named as `WBAN` that also represents the station Id.\n",
"\n",
"[`pandas.merge()`](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging)\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "7LoF4BGkuYDk",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 343
},
"outputId": "af1ff749-c464-4641-85b3-e5668063d993"
},
"source": [
"df = pd.merge(\n",
" weather_summary,\n",
" french_stations,\n",
" how='inner',\n",
" left_on='STA',\n",
" right_on='WBAN'\n",
")\n",
"print(f'There are {df.shape[0]} weather records.')\n",
"df.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"There are 1529 weather records.\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>STA</th>\n",
" <th>Date</th>\n",
" <th>Precip</th>\n",
" <th>WindGustSpd</th>\n",
" <th>MaxTemp</th>\n",
" <th>MinTemp</th>\n",
" <th>MeanTemp</th>\n",
" <th>Snowfall</th>\n",
" <th>PoorWeather</th>\n",
" <th>YR</th>\n",
" <th>MO</th>\n",
" <th>DA</th>\n",
" <th>PRCP</th>\n",
" <th>DR</th>\n",
" <th>SPD</th>\n",
" <th>MAX</th>\n",
" <th>MIN</th>\n",
" <th>MEA</th>\n",
" <th>SNF</th>\n",
" <th>SND</th>\n",
" <th>FT</th>\n",
" <th>FB</th>\n",
" <th>FTI</th>\n",
" <th>ITH</th>\n",
" <th>PGT</th>\n",
" <th>TSHDSBRSGF</th>\n",
" <th>SD3</th>\n",
" <th>RHX</th>\n",
" <th>RHN</th>\n",
" <th>RVG</th>\n",
" <th>WTE</th>\n",
" <th>WBAN</th>\n",
" <th>NAME</th>\n",
" <th>STATE/COUNTRY ID</th>\n",
" <th>LAT</th>\n",
" <th>LON</th>\n",
" <th>ELEV</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>34002</td>\n",
" <td>1943-12-11</td>\n",
" <td>30.48</td>\n",
" <td>NaN</td>\n",
" <td>11.666667</td>\n",
" <td>7.777778</td>\n",
" <td>10.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>43</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>1.2</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>53.0</td>\n",
" <td>46.0</td>\n",
" <td>50.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>34002</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4242N</td>\n",
" <td>00927E</td>\n",
" <td>105</td>\n",
" <td>42.7</td>\n",
" <td>9.366667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>34002</td>\n",
" <td>1943-12-12</td>\n",
" <td>27.178</td>\n",
" <td>NaN</td>\n",
" <td>12.222222</td>\n",
" <td>7.777778</td>\n",
" <td>10.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>43</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>1.07</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>54.0</td>\n",
" <td>46.0</td>\n",
" <td>50.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>34002</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4242N</td>\n",
" <td>00927E</td>\n",
" <td>105</td>\n",
" <td>42.7</td>\n",
" <td>9.366667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>34002</td>\n",
" <td>1943-12-13</td>\n",
" <td>68.072</td>\n",
" <td>NaN</td>\n",
" <td>12.222222</td>\n",
" <td>9.444444</td>\n",
" <td>11.111111</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>43</td>\n",
" <td>12</td>\n",
" <td>13</td>\n",
" <td>2.68</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>54.0</td>\n",
" <td>49.0</td>\n",
" <td>52.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>34002</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4242N</td>\n",
" <td>00927E</td>\n",
" <td>105</td>\n",
" <td>42.7</td>\n",
" <td>9.366667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>34002</td>\n",
" <td>1943-12-14</td>\n",
" <td>0.508</td>\n",
" <td>NaN</td>\n",
" <td>14.444444</td>\n",
" <td>12.222222</td>\n",
" <td>13.333333</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>43</td>\n",
" <td>12</td>\n",
" <td>14</td>\n",
" <td>0.02</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>58.0</td>\n",
" <td>54.0</td>\n",
" <td>56.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>34002</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4242N</td>\n",
" <td>00927E</td>\n",
" <td>105</td>\n",
" <td>42.7</td>\n",
" <td>9.366667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>34002</td>\n",
" <td>1943-12-15</td>\n",
" <td>12.446</td>\n",
" <td>NaN</td>\n",
" <td>13.333333</td>\n",
" <td>11.111111</td>\n",
" <td>12.222222</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>43</td>\n",
" <td>12</td>\n",
" <td>15</td>\n",
" <td>0.49</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>56.0</td>\n",
" <td>52.0</td>\n",
" <td>54.0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>34002</td>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>FR</td>\n",
" <td>4242N</td>\n",
" <td>00927E</td>\n",
" <td>105</td>\n",
" <td>42.7</td>\n",
" <td>9.366667</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" STA Date Precip WindGustSpd ... LON ELEV Latitude Longitude\n",
"0 34002 1943-12-11 30.48 NaN ... 00927E 105 42.7 9.366667\n",
"1 34002 1943-12-12 27.178 NaN ... 00927E 105 42.7 9.366667\n",
"2 34002 1943-12-13 68.072 NaN ... 00927E 105 42.7 9.366667\n",
"3 34002 1943-12-14 0.508 NaN ... 00927E 105 42.7 9.366667\n",
"4 34002 1943-12-15 12.446 NaN ... 00927E 105 42.7 9.366667\n",
"\n",
"[5 rows x 39 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cGlNRFpfvxxw",
"colab_type": "text"
},
"source": [
"### Removing unused columns\n",
"Let's **remove** some columns that we are not going to use. In this way, data manipulation will be easier and faster.\n",
"\n",
"[`pandas.merge()`](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "IpRw8T00zLCK",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"outputId": "5cf24153-1212-4b81-c41a-f49a1d67d677"
},
"source": [
"df = df[['NAME', 'YR', 'MeanTemp']]\n",
"df.head(1)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NAME</th>\n",
" <th>YR</th>\n",
" <th>MeanTemp</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>43</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NAME YR MeanTemp\n",
"0 BASTIA/CORSICA 43 10.0"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z04px7EbBzBj",
"colab_type": "text"
},
"source": [
"### Grouping our data\n",
"Let's **group** our data by the station name (`NAME`) and the year (`YR`). That's not everything, we are also going to calculate the mean temperature for each station each year.\n",
"\n",
"[`df.groupby()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html)\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "0bkhbcLyvzDs",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 421
},
"outputId": "d6ad27ac-15df-41b7-efc1-230a230d8b44"
},
"source": [
"final_df = df.groupby(['NAME', 'YR'], as_index=False).mean()\n",
"final_df"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NAME</th>\n",
" <th>YR</th>\n",
" <th>MeanTemp</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>43</td>\n",
" <td>10.952381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>44</td>\n",
" <td>15.120988</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>45</td>\n",
" <td>22.296296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>C DELORO/CORSICA</td>\n",
" <td>44</td>\n",
" <td>20.354267</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>MARSEILLE</td>\n",
" <td>44</td>\n",
" <td>9.295393</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>MARSEILLE</td>\n",
" <td>45</td>\n",
" <td>11.928662</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>MIGLIACCIARO/CORSICA</td>\n",
" <td>44</td>\n",
" <td>10.424383</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>MIGLIACCIARO/CORSICA</td>\n",
" <td>45</td>\n",
" <td>8.356974</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>PARIS/ORLY</td>\n",
" <td>44</td>\n",
" <td>6.950483</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>PARIS/ORLY</td>\n",
" <td>45</td>\n",
" <td>11.907352</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>PORTETTA/CORSICA</td>\n",
" <td>44</td>\n",
" <td>17.885906</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>PORTETTA/CORSICA</td>\n",
" <td>45</td>\n",
" <td>12.133956</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NAME YR MeanTemp\n",
"0 BASTIA/CORSICA 43 10.952381\n",
"1 BASTIA/CORSICA 44 15.120988\n",
"2 BASTIA/CORSICA 45 22.296296\n",
"3 C DELORO/CORSICA 44 20.354267\n",
"4 MARSEILLE 44 9.295393\n",
"5 MARSEILLE 45 11.928662\n",
"6 MIGLIACCIARO/CORSICA 44 10.424383\n",
"7 MIGLIACCIARO/CORSICA 45 8.356974\n",
"8 PARIS/ORLY 44 6.950483\n",
"9 PARIS/ORLY 45 11.907352\n",
"10 PORTETTA/CORSICA 44 17.885906\n",
"11 PORTETTA/CORSICA 45 12.133956"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uJnsFRfB0RVG",
"colab_type": "text"
},
"source": [
"# Converting Celcius to Farenheit\n",
"Now I'm going to create a python **function to convert** Celsius degrees to Fahrenheit degrees. \n",
"Then I'm going to create a new column that will contain the Farenheit temperature and I'll name it as `MeanTempF`. This new column will be based on the values from the `MeanTemp` column that contains the Celsius temperatures.\n",
"\n",
"> **💡 Hint!**<br/>This is Celsius to Farenheit conversion formula.\n",
"```\n",
"°F = (0 °C × 9/5) + 32\n",
"```\n",
"\n",
"[`df.apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4prvtK5n1Liu",
"colab_type": "code",
"colab": {}
},
"source": [
"def celsius_to_fahrenheit(c_temp):\n",
" return (c_temp * 9/5) + 32"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "TW4koKbq0Rg_",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 421
},
"outputId": "f666b00a-48ac-41f5-f931-45db40f852b0"
},
"source": [
"final_df['MeanTempF'] = final_df['MeanTemp'].apply(celsius_to_fahrenheit)\n",
"final_df.sort_values(by=['NAME', 'YR'])"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NAME</th>\n",
" <th>YR</th>\n",
" <th>MeanTemp</th>\n",
" <th>MeanTempF</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>43</td>\n",
" <td>10.952381</td>\n",
" <td>51.714286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>44</td>\n",
" <td>15.120988</td>\n",
" <td>59.217778</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BASTIA/CORSICA</td>\n",
" <td>45</td>\n",
" <td>22.296296</td>\n",
" <td>72.133333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>C DELORO/CORSICA</td>\n",
" <td>44</td>\n",
" <td>20.354267</td>\n",
" <td>68.637681</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>MARSEILLE</td>\n",
" <td>44</td>\n",
" <td>9.295393</td>\n",
" <td>48.731707</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>MARSEILLE</td>\n",
" <td>45</td>\n",
" <td>11.928662</td>\n",
" <td>53.471591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>MIGLIACCIARO/CORSICA</td>\n",
" <td>44</td>\n",
" <td>10.424383</td>\n",
" <td>50.763889</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>MIGLIACCIARO/CORSICA</td>\n",
" <td>45</td>\n",
" <td>8.356974</td>\n",
" <td>47.042553</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>PARIS/ORLY</td>\n",
" <td>44</td>\n",
" <td>6.950483</td>\n",
" <td>44.510870</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>PARIS/ORLY</td>\n",
" <td>45</td>\n",
" <td>11.907352</td>\n",
" <td>53.433234</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>PORTETTA/CORSICA</td>\n",
" <td>44</td>\n",
" <td>17.885906</td>\n",
" <td>64.194631</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>PORTETTA/CORSICA</td>\n",
" <td>45</td>\n",
" <td>12.133956</td>\n",
" <td>53.841121</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NAME YR MeanTemp MeanTempF\n",
"0 BASTIA/CORSICA 43 10.952381 51.714286\n",
"1 BASTIA/CORSICA 44 15.120988 59.217778\n",
"2 BASTIA/CORSICA 45 22.296296 72.133333\n",
"3 C DELORO/CORSICA 44 20.354267 68.637681\n",
"4 MARSEILLE 44 9.295393 48.731707\n",
"5 MARSEILLE 45 11.928662 53.471591\n",
"6 MIGLIACCIARO/CORSICA 44 10.424383 50.763889\n",
"7 MIGLIACCIARO/CORSICA 45 8.356974 47.042553\n",
"8 PARIS/ORLY 44 6.950483 44.510870\n",
"9 PARIS/ORLY 45 11.907352 53.433234\n",
"10 PORTETTA/CORSICA 44 17.885906 64.194631\n",
"11 PORTETTA/CORSICA 45 12.133956 53.841121"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o7VRNHFi91Gc",
"colab_type": "text"
},
"source": [
"# Finally!\n",
"To make a long story short, let's download our processed data in CSV format. \n",
"I will also help you in this task creating a function to download the CSV, you just have to call `download_csv()` function passing your data frame as a parameter.\n",
"\n",
"[`df.to_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TTLv2vNM9076",
"colab_type": "code",
"colab": {}
},
"source": [
"from google.colab import files\n",
"\n",
"def download_csv(df):\n",
" file_name = 'french-weather-average-per-year.csv'\n",
" df.to_csv(file_name, index=False)\n",
" files.download(file_name)\n",
" print(f'Downloading file \"{file_name}\"')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "t_iNM4BZ-uaD",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "5db18777-06d4-446d-ed51-e3e79c9e831b"
},
"source": [
"download_csv(final_df)"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"application/javascript": [
"\n",
" async function download(id, filename, size) {\n",
" if (!google.colab.kernel.accessAllowed) {\n",
" return;\n",
" }\n",
" const div = document.createElement('div');\n",
" const label = document.createElement('label');\n",
" label.textContent = `Downloading \"${filename}\": `;\n",
" div.appendChild(label);\n",
" const progress = document.createElement('progress');\n",
" progress.max = size;\n",
" div.appendChild(progress);\n",
" document.body.appendChild(div);\n",
"\n",
" const buffers = [];\n",
" let downloaded = 0;\n",
"\n",
" const channel = await google.colab.kernel.comms.open(id);\n",
" // Send a message to notify the kernel that we're ready.\n",
" channel.send({})\n",
"\n",
" for await (const message of channel.messages) {\n",
" // Send a message to notify the kernel that we're ready.\n",
" channel.send({})\n",
" if (message.buffers) {\n",
" for (const buffer of message.buffers) {\n",
" buffers.push(buffer);\n",
" downloaded += buffer.byteLength;\n",
" progress.value = downloaded;\n",
" }\n",
" }\n",
" }\n",
" const blob = new Blob(buffers, {type: 'application/binary'});\n",
" const a = document.createElement('a');\n",
" a.href = window.URL.createObjectURL(blob);\n",
" a.download = filename;\n",
" div.appendChild(a);\n",
" a.click();\n",
" div.remove();\n",
" }\n",
" "
],
"text/plain": [
"<IPython.core.display.Javascript object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "display_data",
"data": {
"application/javascript": [
"download(\"download_16d70791-331b-43dd-bea2-1fea01a3cb9c\", \"french-weather-average-per-year.csv\", 681)"
],
"text/plain": [
"<IPython.core.display.Javascript object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"Downloading file \"french-weather-average-per-year.csv\"\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3-2ytuWw1lKK",
"colab_type": "text"
},
"source": [
"Follow my code on [GitHub](https://github.com/avcaliani)<br>\n",
"Icons made by [smalllikeart](https://www.flaticon.com/authors/smalllikeart) from [Flaticon](https://www.flaticon.com)"
]
}
]
}
@avcaliani
Copy link
Author

tank
Icons made by smalllikeart from Flaticon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment