Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save andres-abrigo/2f5e5f5fd92af3f20d0c3512753c2cd2 to your computer and use it in GitHub Desktop.
Save andres-abrigo/2f5e5f5fd92af3f20d0c3512753c2cd2 to your computer and use it in GitHub Desktop.
Created on Skills Network Labs
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://cognitiveclass.ai\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
"\n",
"<h1 align=center><font size = 5>Lab: Analyzing a real world data-set with SQL and Python</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"This notebook shows how to store a dataset into a database using and analyze data using SQL and Python. In this lab you will:\n",
"1. Understand a dataset of selected socioeconomic indicators in Chicago\n",
"1. Learn how to store data in an Db2 database on IBM Cloud instance\n",
"1. Solve example problems to practice your SQL skills "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selected Socioeconomic Indicators in Chicago\n",
"\n",
"The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.\n",
"This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.\n",
"\n",
"Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.\n",
"\n",
"A detailed description of the dataset can be found on [the city of Chicago's website](\n",
"https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2), but to summarize, the dataset has the following variables:\n",
"\n",
"* **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset\n",
"\n",
"* **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago \n",
"\n",
"* **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room\n",
"\n",
"* **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line\n",
"\n",
"* **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed\n",
"\n",
"* **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education\n",
"\n",
"* **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)\n",
"\n",
"* **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population\n",
"\n",
"* **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators\n",
"\n",
"In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to the database\n",
"Let us first load the SQL extension and establish a connection with the database"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Connected: sfz51159@BLUDB'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remember the connection string is of the format:\n",
"# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name\n",
"# Enter the connection string for your Db2 on Cloud database instance below\n",
"# i.e. copy after db2:// from the URI string in Service Credentials of your Db2 instance. Remove the double quotes at the end.\n",
"%sql ibm_db_sa://sfz51159:r172j9g-gm6hgk1q@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Store the dataset in a Table\n",
"##### In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.\n",
"\n",
"##### We will first read the dataset source .CSV from the internet into pandas dataframe\n",
"\n",
"##### Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL \"magic\" simplifies the process of table creation and writing the data from a `pandas` dataframe into the table"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n"
]
},
{
"data": {
"text/plain": [
"'Persisted chicago_socioeconomic_data'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas\n",
"chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')\n",
"%sql PERSIST chicago_socioeconomic_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### You can verify that the table creation was successful by making a basic query like:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>index</th>\n",
" <th>ca</th>\n",
" <th>community_area_name</th>\n",
" <th>percent_of_housing_crowded</th>\n",
" <th>percent_households_below_poverty</th>\n",
" <th>percent_aged_16_unemployed</th>\n",
" <th>percent_aged_25_without_high_school_diploma</th>\n",
" <th>percent_aged_under_18_or_over_64</th>\n",
" <th>per_capita_income_</th>\n",
" <th>hardship_index</th>\n",
" </tr>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>Rogers Park</td>\n",
" <td>7.7</td>\n",
" <td>23.6</td>\n",
" <td>8.7</td>\n",
" <td>18.2</td>\n",
" <td>27.5</td>\n",
" <td>23939</td>\n",
" <td>39.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>West Ridge</td>\n",
" <td>7.8</td>\n",
" <td>17.2</td>\n",
" <td>8.8</td>\n",
" <td>20.8</td>\n",
" <td>38.5</td>\n",
" <td>23040</td>\n",
" <td>46.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>3.0</td>\n",
" <td>Uptown</td>\n",
" <td>3.8</td>\n",
" <td>24.0</td>\n",
" <td>8.9</td>\n",
" <td>11.8</td>\n",
" <td>22.2</td>\n",
" <td>35787</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>4.0</td>\n",
" <td>Lincoln Square</td>\n",
" <td>3.4</td>\n",
" <td>10.9</td>\n",
" <td>8.2</td>\n",
" <td>13.4</td>\n",
" <td>25.5</td>\n",
" <td>37524</td>\n",
" <td>17.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>5.0</td>\n",
" <td>North Center</td>\n",
" <td>0.3</td>\n",
" <td>7.5</td>\n",
" <td>5.2</td>\n",
" <td>4.5</td>\n",
" <td>26.2</td>\n",
" <td>57123</td>\n",
" <td>6.0</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(0, 1.0, 'Rogers Park', 7.7, 23.6, 8.7, 18.2, 27.5, 23939, 39.0),\n",
" (1, 2.0, 'West Ridge', 7.8, 17.2, 8.8, 20.8, 38.5, 23040, 46.0),\n",
" (2, 3.0, 'Uptown', 3.8, 24.0, 8.9, 11.8, 22.2, 35787, 20.0),\n",
" (3, 4.0, 'Lincoln Square', 3.4, 10.9, 8.2, 13.4, 25.5, 37524, 17.0),\n",
" (4, 5.0, 'North Center', 0.3, 7.5, 5.2, 4.5, 26.2, 57123, 6.0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT * FROM chicago_socioeconomic_data limit 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problems\n",
"\n",
"### Problem 1\n",
"\n",
"##### How many rows are in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <td>78</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(Decimal('78'),)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;\n",
"\n",
"Correct answer: 78\n",
"\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2\n",
"\n",
"##### How many community areas in Chicago have a hardship index greater than 50.0?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <td>38</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(Decimal('38'),)]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;\n",
"Correct answer: 38\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 3\n",
"\n",
"##### What is the maximum value of hardship index in this dataset?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <td>98.0</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(98.0,)]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT MAX(chicago_socioeconomic_data.hardship_index) FROM chicago_socioeconomic_data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT MAX(hardship_index) FROM chicago_socioeconomic_data;\n",
"\n",
"Correct answer: 98.0\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 4\n",
"\n",
"##### Which community area which has the highest hardship index?\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>community_area_name</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Riverdale</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Riverdale',)]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT chicago_socioeconomic_data.community_area_name FROM chicago_socioeconomic_data WHERE chicago_socioeconomic_data.hardship_index = (SELECT MAX(chicago_socioeconomic_data.hardship_index) FROM chicago_socioeconomic_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"## We can use the result of the last query to as an input to this query:\n",
"%sql SELECT community_area_name FROM chicago_socioeconomic_data where hardship_index=98.0\n",
"\n",
"## or another option:\n",
"%sql SELECT community_area_name FROM chicago_socioeconomic_data ORDER BY hardship_index DESC NULLS LAST FETCH FIRST ROW ONLY;\n",
"\n",
"## or you can use a sub-query to determine the max hardship index:\n",
"%sql select community_area_name from chicago_socioeconomic_data where hardship_index = ( select max(hardship_index) from chicago_socioeconomic_data ) \n",
"\n",
"Correct answer: 'Riverdale'\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 5\n",
"\n",
"##### Which Chicago community areas have per-capita incomes greater than $60,000?"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://sfz51159:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>community_area_name</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Lake View</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Lincoln Park</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Near North Side</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Loop</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Lake View',), ('Lincoln Park',), ('Near North Side',), ('Loop',)]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT chicago_socioeconomic_data.community_area_name FROM chicago_socioeconomic_data WHERE chicago_socioeconomic_data.per_capita_income_ > 60000 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT community_area_name FROM chicago_socioeconomic_data WHERE per_capita_income_ > 60000;\n",
"\n",
"Correct answer:Lake View,Lincoln Park, Near North Side, Loop\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 6\n",
"\n",
"##### Create a scatter plot using the variables `per_capita_income_` and `hardship_index`. Explain the correlation between the two variables."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEWCAYAAACXGLsWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3de5Qc5Xnn8e+j0YBGIDySJbA0ICTWHBEuMYKxIZaXtVEc2TiBCQQDJ6zBh1i767vDYo8cxyaJfaQEJ3Z2s46XBRtsfJOBCMV4I7MSxDFOwCMGzEVSAMuARgLJgHwBGXR59o96G9X0VFVXz0x3VXf/Puf0me7qru5HPZp++r0872vujoiISJYpRQcgIiLlp2QhIiI1KVmIiEhNShYiIlKTkoWIiNSkZCEiIjUpWUjbMzM3s9cWHUc1M3uzmW0rOg6RPJQsBAAze5OZ/dDMfm5mz5nZ3Wb2+gk+5+Vm9oOqYzeY2acnFm1jJMU7yc9fyqSVl5ldbWY3FR2HFGNq0QFI8czsCOA7wH8DVgOHAP8ReKnIuJKY2VR331d0HCIdx9116fAL0A/srvGY9wCbgF8CjwCnheODwOOx478fjv8G8GtgP/ArYDewHNgLvByO/WN47DzgFmAXsBX4YOx1rwZuBm4CfgH8UUJsNwBfBO4IcfwzcGzsfgdeG66/CvhKeK0ngE8QtbDHxJvyPtwFrATuBX4O3AbMCvfdDnyg6vE/BgaA74c4XgjPfxHwZmAbcCWwE9gBvDt2bmKs4b7LgR8AnwWeD+/b22PnXg78JLwfW4E/jB2/G/ifIf7NwNLYefOAtcBzwGPAe8Lxt4Xf294Q/wMp789J4ffwHPAM8PFw/FDg88D2cPk8cGi4r/I+fDT2PgwA5wD/Hp7r40X/nXT6pfAAdCn+AhwBPAvcCLwdmFl1/4XACPB6wIDXVj6Mw33zwgfuReHDcG6473LgB1XPdQPw6djtKcBG4JNELZrjwofcsnD/1eEDaiA8tich/hvCh+JZ4UPpb+Ovy+hk8RWiD/gZwILwYXRFWrwJr3VXeC9OBg4jSnI3hfveCdwTe+zrwvt6SHUc4fabgX3AnwPd4cPxxcr7nyPWvURJvIuoVbg9/H4OI0qsi8Jj5wInxc7bB3wkvOZFREmjkvD+GfgCMA04lShRLY39Lm7KeG9mEH3QXxnOnwGcEe77c+DfgCOBOcAPgb+oeh8+GWJ6T3jdr4fnOIkokR9X9N9KJ18KD0CXclyIvlnfQPQNbx/Rt8ujwn3rgA/lfJ77gfPC9TEfvoxNFmcAT1Y9ZgXw5XD9auD7NV7zBuCbsduHE7UQjgm3nSjBdRF1rZ0Ye+x/Ae5Kizfhte4CVsVun0j0jbuLKFE9Bxwf7vss8IXYY5OSxR5gauzYTuDMnLE+Frtvenj+1xAli93ABVQl13DedsBix+4F/jNwTHjfZsTuWwncEPtdZCWLS4DhlPseB86J3V4G/LTqfegKt2eEf8sZscdvBAaK/jvp5IsGuAUAd9/k7pe7+9FE35rnEXUVQPQh8njSeWb2LjO738x2m9nucO7sOl76WGBe5fzwHB8Hjoo95qkcz/PKY9z9V0Qf2vOqHjObqPXyROzYE0BfHfFWx/ME0bfh2e7+EtGYz6VmNoXow/OrNZ7rWR89BvMiUbLLE+vTlSvu/mK4eri7v0DUYvivwA4zu93MToidN+LhEzj2vPPC5Tl3/2XGa2ZJ/X8Snrv63xL//Tzr7vvD9T3h5zOx+/cQvS9SECULGcPdNxN9Wz85HHoK+A/VjzOzY4H/A7wfeLW79wIPEXWFQPTtcMzTV91+Ctjq7r2xywx3PyfjnCTHxOI6HJhF9A067mdEXTfHxo7NJ+pWyvs6o14rnL83PDdEXXl/CCwFXnT3f835nNVqxZrJ3de5+1uJuqA2E/2eKvrMzGK353NwLGGWmc1Iec1a70/i/5NgO2P/LdW/HykxJQvBzE4wsyvN7Ohw+xiib8X/Fh5yHfDfzex0i7w2JIrDiD5AdoXz3s3BBAPRN8OjzeyQqmPHxW7fC/zCzD5mZj1m1mVmJ49j2u45YfrvIcBfEI0djGqRhG+uq4HPmNmM8G/4Y6LB87R4k1xqZiea2XSivvibK9+KQ3I4APw1Y1sV1f/2VDliTWVmR5nZuWZ2GFFX1q+IupcqjgQ+aGbdZnYhURfkd8P79UNgpZlNM7PfBK4AvhaLf0FoNSX5DvAaM/uwmR0a4j4j3PcN4BNmNsfMZhONT2gabgtRshCIBofPAO4xsxeIksRDRAOVuPu3gc8QDTj+ElhDNCD6CNGH4r8SfZCcQjTTpmID8DDwtJlVvnlfD5wYupzWhA/F3yMaTN1K9I36OqKZQPX4OvApou6n04m+3Sf5ANEg/E+IZhN9HfhSRrxJvkrU8nqaaCD3g1X3f4Xovaj+MLwauDH8299Z81+UHWuWKUS/u+1E78d/At4bu/8e4Hii9/ozwB+4+7PhvkuIBtO3A/8AfMrd7wj3fTv8fNbM7gMwsy+a2RcBQvfVW4l+n08DjwJvCed8Ghgimh32IHBfOCYtwkZ3XYq0HjO7Adjm7p9owmvdRTTIe13GY94FLHf3NzU6nnqZ2eVE049LF5uUm1oWIpModE29F7i26FhEJpOShcgkMbNlROM3zxB1GYm0DXVDiYhITWpZiIhITS29kODs2bN9wYIFRYchItJSNm7c+DN3n1PPOS2dLBYsWMDQ0FDRYYiItBQze6L2o0ZTN5SIiNTUsGRhZl8ys51m9lDs2Cwzu8PMHg0/Z8buW2Fmj5nZljCrRERESqKRLYsbiNbAjxsE1rv78cD6cBszOxG4mGgp4rcBXzCzrgbGJiIidWhYsnD37xMtNRB3HtFCa4SfA7Hj33T3l9x9K9GmK29oVGwiIlKfZo9ZHOXuOwDCzyPD8T5GL/u8jZRlkc1suZkNmdnQrl27GhqsiIhEyjIbyhKOJVYLuvu1hKUU+vv7x1VRuGZ4hGvWbWH77j3M6+3hqmWLGFhc75YGIiKdo9nJ4hkzm+vuO8xsLtGuYBC1JOJ7BBxNg9a6XzM8wopbH2TP3mjF5pHde1hx64MAShgiIima3Q21FrgsXL+MaH/hyvGLwxr4C4mWT763EQFcs27LK4miYs/e/Vyzbssrt9cMj7Bk1QYWDt7OklUbWDOca78ZEZG21bCWhZl9g2hv3dlmto1or4FVwGozuwJ4ErgQwN0fNrPVwCNE+z+/L7bF4qTavntP5nG1PERExmpYsnD3S1LuWpry+M8QbcTSUPN6exhJSBjzenuA7JaHkoWIdKqOq+C+atkierpHl3D0dHdx1bJFAImJJOu4iEgnKMtsqKaptA7SZkN1mbE/Ydn2LkuasCUi0hk6LllAlDDSupSSEkXWcRGRTtCRySJLX8qYRm9PN0tWbVBthoh0pI4bs6glaUyje4rxwsv7GNm9B+fgDClNqRWRTqGWRZWkMY0XX97H8y/uHfW4PXv3c+XqB0adIyLSrpQsElSPaSwcvD3xcfvdVYMhIh1B3VAZKpXcWUPb1dXfIiLtSC2LFNWV3FnSqsJFRNqFWhYpkiq501Sqv0VE2pWSRYp6WgsvvLRPM6NEpK0pWaRIay309nQzc3r3qGO79+zVVFoRaWtKFinS1pC6+tyTmH7I2KEeDXSLSDtTskgxsLiPleefQl9vD0ZU2b3y/FMYWNxXc5lzEZF2o9lQGaoL9Coth1rLnIuItBu1LDJUps9WL/PxlhPmZC5zLiLSbpQsMqRthHTn5l1juqguOL2Pa9Zt0VasItKW1A2VIWtsIr4kiLZiFZF2p5ZFhrQxCIdRrYesrVhFRNqBkkWGpOmzFfFlyjU7SkTanZJFhvj02SSV1kNaC0Szo0SkXShZ1DCwuI+7B88mbQfu7bv3pBbwaXaUiLQLJYucsloPA4v7uOD0ProsSildZlxwevo+3yIirUbJIqes1sOa4RFu2TjCfo92vtjvzi0bRzR9VkTahqbO5pS03epbTpjDNeu2JFZzV8Yz1LoQkXagZFGHrNqKJJoNJSLtQt1Q45RncyTNhhKRdqFkMU5JXU9xmg0lIu1E3VDj1GX2yoB2tb7eHq5atmjMeMWa4ZFRYx5JjxERKSMli3FKSxQAdw+ePeaY1o8SkVambqhxSqvqTjuu9aNEpJUpWYxTvVXbWj9KRFpZIcnCzD5iZg+b2UNm9g0zm2Zms8zsDjN7NPycWURseVXWjert6X7l2LTu9LdT60eJSCtrerIwsz7gg0C/u58MdAEXA4PAenc/HlgfbpfeS/sOvHL9+Rf3vrISbTWtHyUirayoAe6pQI+Z7QWmA9uBFcCbw/03AncBHysiuFoqs5rqqdxOqgDXbCgRaRVNTxbuPmJmnwWeBPYA33P375nZUe6+Izxmh5kdmXS+mS0HlgPMnz+/WWG/YiKV2/EKcBGRVtL0ZBHGIs4DFgK7gW+b2aV5z3f3a4FrAfr7+9PnrzZIsyq3VZMhImVSRDfUbwNb3X0XgJndCrwReMbM5oZWxVxgZwGx1VRr9tJkjEOoJkNEyqaI2VBPAmea2XQzM2ApsAlYC1wWHnMZcFsBsdWU1Wro6+1h5fmnTPgDXTUZIlI2RYxZ3GNmNwP3AfuAYaJupcOB1WZ2BVFCubDZseVx1bJFY8Yserq7JiVJVKgmQ0TKppDZUO7+KeBTVYdfImpllNpEZzXlGYuY19uTONNKNRkiUhStDTUO453VlHcsIq31opoMESmKlvtoorxjEZXq8L7eHozJGwsRERkvtSyaKG0PjKTjqskQkTJRsmiw+BhFmi6zJkYkIlI/JYsGylPtDdl7Y4iIlIHGLBooT7U3pO+BISJSFkoWDZSnLkKznESkFShZNFCtuoguM81yEpGWoGTRQEl7WMQdcFeiEJGWoGTRYIdOrX/3PBGRstFsqAapNRNKYxUi0kqULBokayZUn/anEJEWo26oBsmaCaVEISKtRsmiQbLGI65e+3ATIxERmTgliwbJGo/YvWcva4ZHmhiNiMjEKFk0yMDiPnp7ulPv1653ItJKlCwa6OpzT0q9T7veiUgrUbJooIHFfcycnty6mGKmrigRaRlKFg32qd87KbGKe787K259UAlDRFqCeQsvj93f3+9DQ0NFh1HTmuERrlz9QOJS5F1mHHCvey9vEZHxMrON7t5fzzlqWTTBwOI+DqQk5f3uOAf341ZLQ0TKSMmiSfKsA5W0H7eISBlouY8GiG+lWuleumrZoly75mmWlIiUkVoWk6yygODI7j2jupcAVp5/Cn29PRjp+25rJVoRKSMli0mWtIBgpXtpYHEfdw+ezdZV7+CSM46hOl1oJVoRKSsli0mW1o0UP75meIRbNo4QH/I24ILT+zQbSkRKSclikqV1I8WPJ7U+HLhz865GhiYiMm5KFpMsaSvV6u6lPK0PEZEy0WyoSVbpRqqeDVU5vmZ4hClmiQV6GtwWkbJSsmiAgcXJYw+VmVJJiUKD2yJSZuqGaqKsrVY1uC0iZaZk0URZYxK3bBzRUh8iUlqFJAsz6zWzm81ss5ltMrPfMrNZZnaHmT0afs4sIrZGyhqTiC/1sWZ4hCWrNrBw8HaWrNqgJCIihSuqZfG3wD+5+wnA64BNwCCw3t2PB9aH220laaZU3Pbde1IrwJUwRKRITU8WZnYEcBZwPYC7v+zuu4HzgBvDw24EBpodW6MNLO5j5fmnZC71kVUBLiJSlCJaFscBu4Avm9mwmV1nZocBR7n7DoDw88ikk81suZkNmdnQrl2tV8Q2sLiPv37n61JrMVSDISJlVESymAqcBvy9uy8GXqCOLid3v9bd+929f86cOY2KsaEqLYzKooJ9vT2sPP8UBhb30ZuyDatqMESkSEXUWWwDtrn7PeH2zUTJ4hkzm+vuO8xsLrCzgNiaJqkWY83wCL/69b4xj+3uMtVgiEihmp4s3P1pM3vKzBa5+xZgKfBIuFwGrAo/b2t2bEW7Zt0W9h4YW7B32CFT667BSNpTQ3UcIjJeRVVwfwD4mpkdAvwEeDdRl9hqM7sCeBK4sKDYCpM2LvHzPXvrep7KjKrKQHl8Tw0lDBEZj0KShbvfDyRtFr602bEUIe1b/7zeHkYSEka94xW19tQQEamXKribLKuOIs+KtXloRpWITDYliyar9a0/Pkuqt6ebad1T+Mi37q+rkjvPnhoiIvVQsmiyWt/6K1uvfu6iU3lp3wGef3Fv3ZXck9VCERGpULJosrzf+idSyZ1VxyEiMh7az6LJrlq2aNRMJUj+1j/RcYe0PTVERMZDyaLJau2kV1FrZpTqKESkmcwTdm1rFf39/T40NFR0GA1RXSsBUQtk5fmnAKTep4QhIrWY2UZ3TypfSKWWRQlktRKSji9ZtUF1FCLSVEoWBatVbZ304a86ChFpNs2GKlg9s54qO+ildRyqjkJEGkUti4LlbSUkjWHEqY5CRBpJLYuCTaTuomK8ld4iInkpWRQsb7V11njEeCu9RUTyUrIoWN5q67QWSJeZ9uwWkYarOWZhZu8Hvubuzzchno6Up9o6rfI7rWtKM6NEZDLlaVm8BviRma02s7eZmTU6KDmoMgPqI9+6n2ndU+jt6R7VAunTCrMi0gQ1Wxbu/gkz+1Pgd4h2tPs7M1sNXO/ujzc6wE5WPQPq+Rf30tPdxecuOnVUSyTPWlMiIhORa8zCozVBng6XfcBM4GYz+6sGxtbx8tRg1BrzqLRMFg7erplSIjJuecYsPghcBvwMuA64yt33mtkU4FHgo40NsXPlrcFIG/PQXtwiMlnyFOXNBs539yfiB939gJn9bmPCEqi98myaylpTSedqDSkRGY+a3VDu/snqRBG7b9PkhyQV49nxLr7HdxrNlBKRemm5jxLLu/dFXFald4VmSolIvZQsSq7eHe9qtRo0U0pExkPJoo2sGR5hihn7Uza06tOOeiIyTkoWbaIyVpGUKLSLnohMlNaGahNpYxVdZkoUIjJhShZtIm320353JQoRmTB1Q7WArD26K7pSxiq6tJSXiEwCJYuSy1uFnTaonXZcRKQe6oYqubx7dKetPls5rjWiRGQilCxKLu/6UFnV3vGqbu2mJyLjUViyMLMuMxs2s++E27PM7A4zezT8nFlUbGWSd4/urNVn87ZORETSFDlm8SFgE3BEuD0IrHf3VWY2GG5/rKjgyiJth7ykKuy0au+8rRMRkTSFtCzM7GjgHURLnlecB9wYrt8IDDQ7rjLKu0d3mkpVdxKtESUieRXVsvg80T4YM2LHjnL3HQDuvsPMjkw60cyWA8sB5s+f3+g4S6He9aEqalV1a40oEcmr6S2LsAfGTnffOJ7z3f1ad+939/45c+ZMcnTtRVXdIjJZimhZLAHONbNzgGnAEWZ2E/CMmc0NrYq5wM4CYmsraWMSB8ZZ1Z2nOFBE2lPTWxbuvsLdj3b3BcDFwAZ3vxRYS7R9K+Hnbc2Ord3knUmVh6bfinS2MtVZrALeamaPAm8Nt2Wc1gyP8OLL+8YcH+9YhabfinS2Qpf7cPe7gLvC9WeBpUXG0y6qlwip6O3p5upzTxpX15Gm34p0tjK1LGSSpA1sH3bo1HGPMUxml5aItB4lizbUiFZA1nIiItL+lCzaUNq3/Slm415IcKLFgSLS2rREeRtKWiIEDi5XnrbMeS3jLQ4UkdanZNGGKh/olZqIKQkbI1VmMiV9+KueQkSqKVm0qXgrYOHg7YmPSRrDyLvZkoh0FiWLDjCvtydxj+742EalNZH0uKxWiIh0Bg1wd4BaM5ni1dlpVE8h0tnUsugA1WMY1eMQV699OLEuI071FCKdTcmiQ6TNZFozPMLuPXszz1U9hYgoWXS4Wms79Wk2lIigZNHxssYiPn/RqUoSIgJogLvjZY1FXLNui5YgFxFAyaLjJc2UqtCeFSJSoW6oDlRdoX3B6X3cuXmXaixEJJVaFh0mace7WzaOcNWyRVjKOaqxEBEliw6TtuPdlasfwFPOUY2FiChZdJi0VkL1QoMVqrEQEVCy6Dj1tBK0Z4WIVChZdJis2U9xBtw9eLYShYgAmg3VcfLsdQHFjlNoPw2R8lGy6EDxdaKq96+AYscptJ+GSDkpWXS4tBVpAZas2tD0b/dps7VU6yFSLCULGbMibZHf7tNma6nWQ6RYGuDuYGuGR1iyagMLB29nyaoNryzrkfXtvtHSxkpU6yFSLCWLDpVUyV1ZB6rIb/e1dvUTkWKoG6pDZbUe8uzZXct4ZzTV2tVPRIqhZNGhsloPn7vo1AnNkJromEfarn4iUhx1Q3WorLGBgcV9rDz/FPp6ezDqr+QucsxDRBpDLYsOddWyRZmth4l8u9eMJpH2o5ZFh5po6yFLWqtlitmYmVci0hrUsuhgjRobSGq1wMGVbVWVLdJ6mt6yMLNjzOxOM9tkZg+b2YfC8VlmdoeZPRp+zmx2bDI5qlstXTZ2W6WsMYy0+g8RKU4R3VD7gCvd/TeAM4H3mdmJwCCw3t2PB9aH29KiBhb3cffg2Wxd9Q4OpOyVkTSGkVX/ISLFaXqycPcd7n5fuP5LYBPQB5wH3BgediMw0OzYpDGyxjCqk4BmUomUU6ED3Ga2AFgM3AMc5e47IEoowJEp5yw3syEzG9q1a1ezQpUJSNtDY7/7mFaDZlKJlFNhycLMDgduAT7s7r/Ie567X+vu/e7eP2fOnMYFKJOmMoaRZ+xCa0OJlFMhycLMuokSxdfc/dZw+BkzmxvunwvsLCI2aYyBxX25xi60NpRIOTV96qyZGXA9sMnd/yZ211rgMmBV+Hlbs2OTxsqz5lTW2lDaQU+kOOYp3/Ya9oJmbwL+BXgQOBAOf5xo3GI1MB94ErjQ3Z/Leq7+/n4fGhpqYLQymdJ25ctTDDiRc0VkNDPb6O79dZ3T7GQxmZQsWk916+AtJ8zhzs27arYWlqzakNgq6evt4e7Bs5sRukjbGE+yUAW3NFXW/t9Zld2aJSVSLK0NJYWpp6aiiFlSqiQXOUjJQgpTT2uh2bOkVEkuMpqShRSmntZC1iq5jWgBqJJcZDSNWUhhau2pUS1pldyJ7sqXRmMkIqOpZSGFmYw9NRrVAlAluchoallIoSa6p0ajWgD1tnpE2p1aFtLS6lnRth6N3ElQpBWpZSEtLWtXvomOXTRqJ0GRVqSWhbS0ela0FZHxU7KQlpd3Rdu8VIwnMpaShbSFyZq9pGI8kWQas5C2MJHZS/HFDaeYsb+qlVLpztL4hXQyJQtpC1n7YGSpLuqrThQVKsaTTqdkIW1jPLOXkor6kqgYTzqdxiyko+VpMagYT0QtC2lztbZiTdvqtcuMA+7avlUkULKQtpVnkcG0gXFVa4uMpmQhbStrkcFKIhjvwHgrq9XaEkmiZCFtK+8ig520rEejlnSX9qcBbmlbnbzMeFoVujZ1kvFSspC21eytWMsiqwpdmzrJeClZSNvq1GXGs1oPWUu6ay0syaIxC2lrnTQeUZHVevjcRaemLukOGsOQdGpZiLSZrLGa6taWlnaXvJQsRNpMrbGagcV93D14NltXvWNSl3aX9qZuKJE2U0/tSFoF+0RmjKmOoz0pWYi0obxjNRNZ2j2J6jjal5KFSAeb7Ar2PFXzE6FWS3GULEQ63GTOGGtkHYdaLcXSALeITJpGVs2r+jxS1B7xShYiMmkaWTWv6vNi94gvXbIws7eZ2RYze8zMBouOR0Tya2TVfCev9VVRZOuqVGMWZtYF/C/grcA24EdmttbdHyk2MhHJq1FV85M9c6sVFdm6KlvL4g3AY+7+E3d/GfgmcF7BMYlICXTqWl9xRbauStWyAPqAp2K3twFnxB9gZsuB5QDz589vXmQiUrhOXOsrrsjWVdlaFmMXqoFR6xG4+7Xu3u/u/XPmzGlSWCIixSuydVW2lsU24JjY7aOB7QXFIiJSOkW1rsrWsvgRcLyZLTSzQ4CLgbUFxyQi0vFK1bJw931m9n5gHdAFfMndHy44LBGRjleqZAHg7t8Fvlt0HCIiclDZuqFERKSElCxERKQm85SdslqBme0CnmjAU88GftaA550oxVUfxZVfGWMCxVWvvHEd6+511R60dLJoFDMbcvf+ouOoprjqo7jyK2NMoLjq1ci41A0lIiI1KVmIiEhNShbJri06gBSKqz6KK78yxgSKq14Ni0tjFiIiUpNaFiIiUpOShYiI1ObubXsBvgTsBB6KHZsF3AE8Gn7OjN23AngM2AIsix0/HXgw3Pc/ONh9dyjwrXD8HmBBjpiOAe4ENgEPAx8qSVzTgHuBB0Jcf1aGuMJ5XcAw8J2yxBTO/Wl4zvuBoTLEBvQCNwObw/+x3ypBTIvCe1S5/AL4cNFxhfM+QvT//SHgG0R/B2WI60MhpoeBD5fi/1beP4xWvABnAacxOln8FTAYrg8Cfxmun0j0QXkosBB4HOgK990b/ugM+L/A28Px9wJfDNcvBr6VI6a5wGnh+gzg38NrFx2XAYeH693hP9CZRccVHvvHwNc5mCwKjyk8/qfA7KpjRf8ebwT+KFw/hCh5lOL9Cud0AU8DxxYdF9Fma1uBnnB7NXB5CeI6mShRTCdav+//AccXHlc9v+hWvAALGJ0stgBzw/W5wJZwfQWwIva4deFNngtsjh2/BPjf8ceE61OJKietzvhuI9pzvDRxhf+k9xHtUlhoXER7mqwHzuZgsijFe0VysigsNuAIog8/K0tMCTH+DnB3GeLi4M6cs8I53wnxFR3XhcB1sdt/Cny06Lg6ccziKHffARB+HhmOJ23p2hcu2xKOjzrH3fcBPwdenTcQM1sALCb6Fl94XGbWZWb3E3Xd3eHuZYjr80R/KAdix4qOqcKB75nZxrDdb9GxHQfsAr5sZsNmdp2ZHVZwTNUuJuruoei43H0E+CzwJLAD+Lm7f6/ouIhaFWeZ2avNbDpwDlH3daFxdWKySJO2pWvWVq81t4FNfTGzw4FbiPojf1GGuNx9v7ufSvRt/g1mdnKRcZnZ7wI73X1jRhxNjanKEnc/DXg78D4zO6vg2KYSdbv+vbsvBl4g6q4oMqaDLxZtaHYu8O1aD21GXGY2EziPqOtmHnCYmV1adFzuvgn4SyH3AdsAAAK3SURBVKJxiX8i6mLaV3RcnZgsnjGzuQDh585wPG1L123hevXxUeeY2VTgVcBztQIws26iRPE1d7+1LHFVuPtu4C7gbQXHtQQ418x+CnwTONvMbio4ple4+/bwcyfwD8AbCo5tG7AttAghGug+reCY4t4O3Ofuz4TbRcf128BWd9/l7nuBW4E3liAu3P16dz/N3c8Kj3+06Lg6MVmsBS4L1y8jGjOoHL/YzA41s4VEA0r3hubeL83sTDMz4F1V51Se6w+ADR46AdOE57ge2OTuf1OiuOaYWW+43kP0h7S5yLjcfYW7H+3uC4i6Lza4+6VFv1fhPTrMzGZUrhP1dT9U8Pv1NPCUmS0Kh5YCj5Th/Qou4WAXVPVzFRHXk8CZZjY9PN9SohlkRceFmR0Zfs4Hzid634qNK2tAo9Uv4Q3eAewlyqRXEPXLrSfK1OuBWbHH/wnRTIIthFkD4Xg/0QfB48DfcXD62TSiJvVjRLMOjssR05uImns/5uBUwnNKENdvEk1P/XF4zk+G44XGFXvON3NwgLvwmIjGBx7g4FTjPylDbMCpwFD4Pa4BZhYdUzhvOvAs8KrYsTLE9WdEX4oeAr5KNKOoDHH9C1GifwBYWob3S8t9iIhITZ3YDSUiInVSshARkZqULEREpCYlCxERqUnJQkREalKyEBGRmpQsRESkJiULkUlkZq83sx+b2bRQ5f1wjTW2RFqCivJEJpmZfZqoQraHaK2mlQWHJDJhShYikyysrvoj4NfAG919f8EhiUyYuqFEJt8s4HCinRCnFRyLyKRQy0JkkpnZWqIl1RcS7Wz2/oJDEpmwqUUHINJOzOxdwD53/7qZdQE/NLOz3X1D0bGJTIRaFiIiUpPGLEREpCYlCxERqUnJQkREalKyEBGRmpQsRESkJiULERGpSclCRERq+v9LxNES1n4chQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.scatter(chicago_socioeconomic_data['per_capita_income_'], chicago_socioeconomic_data['hardship_index'])\n",
"plt.title('Scatter plot pythonspot.com')\n",
"plt.xlabel('x')\n",
"plt.ylabel('y')\n",
"plt.show()\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Hint:\n",
"# if the import command gives ModuleNotFoundError: No module named 'seaborn'\n",
"# then uncomment the following line i.e. delete the # to install the seaborn package \n",
"# !pip install seaborn\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import seaborn as sns\n",
"\n",
"income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n",
"plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())\n",
"\n",
"Correct answer:You can see that as Per Capita Income rises as the Hardship Index decreases. We see that the points on the scatter plot are somewhat closer to a straight line in the negative direction, so we have a negative correlation between the two variables. \n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusion\n",
"\n",
"##### Now that you know how to do basic exploratory data analysis using SQL and python visualization tools, you can further explore this dataset to see how the variable `per_capita_income_` is related to `percent_households_below_poverty` and `percent_aged_16_unemployed`. Try to create interesting visualizations!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"##### In this lab you learned how to store a real world data set from the internet in a database (Db2 on IBM Cloud), gain insights into data using SQL queries. You also visualized a portion of the data in the database to see what story it tells."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright &copy; 2018 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment