Skip to content

Instantly share code, notes, and snippets.

@dmalc
Created October 6, 2020 00:00
Show Gist options
  • Save dmalc/66ab77e3a4ecfdbb1630c003fc1dbb23 to your computer and use it in GitHub Desktop.
Save dmalc/66ab77e3a4ecfdbb1630c003fc1dbb23 to your computer and use it in GitHub Desktop.
Created on Skills Network Labs
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/Logos/organization_logo/organization_logo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n",
"</center>\n",
"\n",
"# Analyzing a real world data-set with SQL and Python\n",
"\n",
"Estaimted time needed: **15** minutes\n",
"\n",
"## Objectives\n",
"\n",
"After complting this lab you will be able to:\n",
"\n",
"- Understand a dataset of selected socioeconomic indicators in Chicago\n",
"- Learn how to store data in an Db2 database on IBM Cloud instance\n",
"- Solve example problems to practice your SQL skills\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selected Socioeconomic Indicators in Chicago\n",
"\n",
"The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.\n",
"This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.\n",
"\n",
"Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.\n",
"\n",
"A detailed description of the dataset can be found on [the city of Chicago's website](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ), but to summarize, the dataset has the following variables:\n",
"\n",
"- **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset\n",
"\n",
"- **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago \n",
"\n",
"- **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room\n",
"\n",
"- **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line\n",
"\n",
"- **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed\n",
"\n",
"- **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education\n",
"\n",
"- **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)\n",
"\n",
"- **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population\n",
"\n",
"- **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators\n",
"\n",
"In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to the database\n",
"\n",
"Let us first load the SQL extension and establish a connection with the database\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The sql extension is already loaded. To reload it, use:\n",
" %reload_ext sql\n"
]
}
],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Connected: xwg41624@BLUDB'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remember the connection string is of the format:\n",
"# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name\n",
"# Enter the connection string for your Db2 on Cloud database instance below\n",
"# i.e. copy after db2:// from the URI string in Service Credentials of your Db2 instance. Remove the double quotes at the end.\n",
"%sql ibm_db_sa://xwg41624:85p%40mjk8c6qkr9vk@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Store the dataset in a Table\n",
"\n",
"##### In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.\n",
"\n",
"##### We will first read the dataset source .CSV from the internet into pandas dataframe\n",
"\n",
"##### Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL \"magic\" simplifies the process of table creation and writing the data from a `pandas` dataframe into the table\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n"
]
},
{
"ename": "ValueError",
"evalue": "Table 'chicago_socioeconomic_data' already exists.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-4-8233fac43948>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mchicago_socioeconomic_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'https://data.cityofchicago.org/resource/jcxq-k9xf.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'PERSIST chicago_socioeconomic_data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/interactiveshell.py\u001b[0m in \u001b[0;36mrun_line_magic\u001b[0;34m(self, magic_name, line, _stack_depth)\u001b[0m\n\u001b[1;32m 2324\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'local_ns'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstack_depth\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf_locals\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2325\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuiltin_trap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2326\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2327\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2328\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m<decorator-gen-127>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m<decorator-gen-126>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/sql/magic.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n\u001b[1;32m 90\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 91\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'persist'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 92\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_persist_dataframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muser_ns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 93\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 94\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/sql/magic.py\u001b[0m in \u001b[0;36m_persist_dataframe\u001b[0;34m(self, raw, conn, user_ns)\u001b[0m\n\u001b[1;32m 149\u001b[0m \u001b[0mtable_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlegal_sql_identifier\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msearch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 150\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 151\u001b[0;31m \u001b[0mframe\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_sql\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 152\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m'Persisted %s'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mtable_name\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 153\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m 2610\u001b[0m \u001b[0mchunksize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchunksize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2611\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2612\u001b[0;31m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2613\u001b[0m )\n\u001b[1;32m 2614\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m 596\u001b[0m \u001b[0mchunksize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchunksize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 597\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 598\u001b[0;31m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 599\u001b[0m )\n\u001b[1;32m 600\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1392\u001b[0m )\n\u001b[0;32m-> 1393\u001b[0;31m \u001b[0mtable\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1394\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1395\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0msqlalchemy\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mcreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 721\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 722\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mif_exists\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"fail\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 723\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Table '{self.name}' already exists.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 724\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mif_exists\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"replace\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 725\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpd_sql\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdrop_table\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mschema\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Table 'chicago_socioeconomic_data' already exists."
]
}
],
"source": [
"import pandas\n",
"chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')\n",
"%sql PERSIST chicago_socioeconomic_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### You can verify that the table creation was successful by making a basic query like:\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>index</th>\n",
" <th>ca</th>\n",
" <th>community_area_name</th>\n",
" <th>percent_of_housing_crowded</th>\n",
" <th>percent_households_below_poverty</th>\n",
" <th>percent_aged_16_unemployed</th>\n",
" <th>percent_aged_25_without_high_school_diploma</th>\n",
" <th>percent_aged_under_18_or_over_64</th>\n",
" <th>per_capita_income_</th>\n",
" <th>hardship_index</th>\n",
" </tr>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>Rogers Park</td>\n",
" <td>7.7</td>\n",
" <td>23.6</td>\n",
" <td>8.7</td>\n",
" <td>18.2</td>\n",
" <td>27.5</td>\n",
" <td>23939</td>\n",
" <td>39.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>West Ridge</td>\n",
" <td>7.8</td>\n",
" <td>17.2</td>\n",
" <td>8.8</td>\n",
" <td>20.8</td>\n",
" <td>38.5</td>\n",
" <td>23040</td>\n",
" <td>46.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>3.0</td>\n",
" <td>Uptown</td>\n",
" <td>3.8</td>\n",
" <td>24.0</td>\n",
" <td>8.9</td>\n",
" <td>11.8</td>\n",
" <td>22.2</td>\n",
" <td>35787</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>4.0</td>\n",
" <td>Lincoln Square</td>\n",
" <td>3.4</td>\n",
" <td>10.9</td>\n",
" <td>8.2</td>\n",
" <td>13.4</td>\n",
" <td>25.5</td>\n",
" <td>37524</td>\n",
" <td>17.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>5.0</td>\n",
" <td>North Center</td>\n",
" <td>0.3</td>\n",
" <td>7.5</td>\n",
" <td>5.2</td>\n",
" <td>4.5</td>\n",
" <td>26.2</td>\n",
" <td>57123</td>\n",
" <td>6.0</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(0, 1.0, 'Rogers Park', 7.7, 23.6, 8.7, 18.2, 27.5, 23939, 39.0),\n",
" (1, 2.0, 'West Ridge', 7.8, 17.2, 8.8, 20.8, 38.5, 23040, 46.0),\n",
" (2, 3.0, 'Uptown', 3.8, 24.0, 8.9, 11.8, 22.2, 35787, 20.0),\n",
" (3, 4.0, 'Lincoln Square', 3.4, 10.9, 8.2, 13.4, 25.5, 37524, 17.0),\n",
" (4, 5.0, 'North Center', 0.3, 7.5, 5.2, 4.5, 26.2, 57123, 6.0)]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT * FROM chicago_socioeconomic_data limit 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problems\n",
"\n",
"### Problem 1\n",
"\n",
"##### How many rows are in the dataset?\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <td>78</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(Decimal('78'),)]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;\n",
"\n",
"Correct answer: 78\n",
"\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2\n",
"\n",
"##### How many community areas in Chicago have a hardship index greater than 50.0?\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <td>38</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(Decimal('38'),)]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT COUNT(*) from chicago_socioeconomic_data where hardship_index > 50.0;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;\n",
"Correct answer: 38\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 3\n",
"\n",
"##### What is the maximum value of hardship index in this dataset?\n"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <td>98.0</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(98.0,)]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT MAX(hardship_index) from chicago_socioeconomic_data;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT MAX(hardship_index) FROM chicago_socioeconomic_data;\n",
"\n",
"Correct answer: 98.0\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 4\n",
"\n",
"##### Which community area which has the highest hardship index?\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>community_area_name</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Riverdale</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Riverdale',)]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT community_area_name from chicago_socioeconomic_data where hardship_index = 98.0;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"## We can use the result of the last query to as an input to this query:\n",
"%sql SELECT community_area_name FROM chicago_socioeconomic_data where hardship_index=98.0\n",
"\n",
"## or another option:\n",
"%sql SELECT community_area_name FROM chicago_socioeconomic_data ORDER BY hardship_index DESC NULLS LAST FETCH FIRST ROW ONLY;\n",
"\n",
"## or you can use a sub-query to determine the max hardship index:\n",
"%sql select community_area_name from chicago_socioeconomic_data where hardship_index = ( select max(hardship_index) from chicago_socioeconomic_data ) \n",
"\n",
"Correct answer: 'Riverdale'\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 5\n",
"\n",
"##### Which Chicago community areas have per-capita incomes greater than $60,000?\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>community_area_name</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Lake View</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Lincoln Park</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Near North Side</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Loop</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Lake View',), ('Lincoln Park',), ('Near North Side',), ('Loop',)]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql SELECT community_area_name from chicago_socioeconomic_data where per_capita_income_ > 60000;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- Hint:\n",
"\n",
"%sql SELECT community_area_name FROM chicago_socioeconomic_data WHERE per_capita_income_ > 60000;\n",
"\n",
"Correct answer:Lake View,Lincoln Park, Near North Side, Loop\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 6\n",
"\n",
"##### Create a scatter plot using the variables `per_capita_income_` and `hardship_index`. Explain the correlation between the two variables.\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
"Done.\n"
]
},
{
"ename": "AttributeError",
"evalue": "'ResultSet' object has no attribute 'get'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-19-e95d24052aa0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mincome_vs_hardship\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mplot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjointplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"per_capita_income_\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"hardship_index\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mincome_vs_hardship\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\u001b[0m in \u001b[0;36mjointplot\u001b[0;34m(x, y, data, kind, stat_func, color, height, ratio, space, dropna, xlim, ylim, joint_kws, marginal_kws, annot_kws, **kwargs)\u001b[0m\n\u001b[1;32m 2282\u001b[0m grid = JointGrid(x, y, data, dropna=dropna,\n\u001b[1;32m 2283\u001b[0m \u001b[0mheight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mheight\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mratio\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mratio\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mspace\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2284\u001b[0;31m xlim=xlim, ylim=ylim)\n\u001b[0m\u001b[1;32m 2285\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2286\u001b[0m \u001b[0;31m# Plot the data using the grid\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, x, y, data, height, ratio, space, dropna, xlim, ylim, size)\u001b[0m\n\u001b[1;32m 1701\u001b[0m \u001b[0;31m# Possibly extract the variables from a DataFrame\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1702\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1703\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1704\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1705\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: 'ResultSet' object has no attribute 'get'"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAFlCAYAAAD76RNtAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAPsElEQVR4nO3cX4ild33H8fenuwb8VyPuKHY3292WNXEvTNExStE2Vlp3c7MVcpEohgZlCTXiZUIv9MKbihRETFyWsARv3IsadC2roVA0hRibXYhJ1rBhutZkukKyKrZVaNjk24tzWo7T2TnPzJyZcb77fsHAPM/5cc73xyxvnn1mzklVIUna/n5nqweQJM2GQZekJgy6JDVh0CWpCYMuSU0YdElqYudWvfCuXbtq3759W/Xykho6e/bspaqa2+o5tsqWBX3fvn2cOXNmq15eUkNJfrLVM2wlb7lIUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJlJVW/PCyX8C57fkxTfHLuDSVg+xQTrvDXrvr/PeAK6vqtdv9RBbZecWvvb5qprfwtffUEnOdN1f571B7/113huM9rfVM2wlb7lIUhMGXZKa2MqgH9/C194MnffXeW/Qe3+d9wb997eiLfulqCRptqZeoSc5keSFJE9f4fEk+VKShSRPJnnn7MeUJE0z5JbLg8ChFR4/DBwYfx0FvrL+sSRJqzU16FX1CPDzFZYcAb5aI48B1yZ566wGlCQNM4tfiu4Gnp84XhyfkyRtolm8sSjLnFv2N61JjjK6LcNrX/vad91www0zeHlJGjl79uylqppber5Te660R5hN0BeB6yaO9wAXl1tYVccZ/1nR/Px8nTlzVb+pS9KMJfnJcuc7tedKe4TZ3HI5Bdwx/muX9wK/rKqfzuB5JUmrMPUKPcnXgJuBXUkWgc8CrwKoqmPAaeAWYAH4NXDnRg0rSbqyqUGvqtunPF7AJ2c2kSRpTfwsF0lqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgYFPcmhJOeTLCS5d5nH35DkW0l+mORckjtnP6okaSVTg55kB3AfcBg4CNye5OCSZZ8EflRVNwI3A3+X5JoZzypJWsGQK/SbgIWqulBVLwEngSNL1hTw+iQBXgf8HLg800klSSsaEvTdwPMTx4vjc5O+DLwduAg8BXy6ql6ZyYSSpEGGBD3LnKslxx8CngB+D/gj4MtJfvf/PVFyNMmZJGdefPHFVY4qSWtztbRnSNAXgesmjvcwuhKfdCfwUI0sAD8Gblj6RFV1vKrmq2p+bm5urTNL0qpcLe0ZEvTHgQNJ9o9/0XkbcGrJmueADwIkeQtwPXBhloNKkla2c9qCqrqc5G7gYWAHcKKqziW5a/z4MeBzwINJnmJ0i+aeqrq0gXNLkpaYGnSAqjoNnF5y7tjE9xeBv5jtaJKk1fCdopLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTQwKepJDSc4nWUhy7xXW3JzkiSTnknxvtmNKkqbZOW1Bkh3AfcCfA4vA40lOVdWPJtZcC9wPHKqq55K8eYPmlSRdwZAr9JuAhaq6UFUvASeBI0vWfAR4qKqeA6iqF2Y7piRpmiFB3w08P3G8OD436W3AG5N8N8nZJHfMakBJ0jBTb7kAWeZcLfM87wI+CLwa+H6Sx6rq2d94ouQocBRg7969q59WktbgamnPkCv0ReC6ieM9wMVl1nynqn5VVZeAR4Ablz5RVR2vqvmqmp+bm1vrzJK0KldLe4YE/XHgQJL9Sa4BbgNOLVnzTeD9SXYmeQ3wHuCZ2Y4qSVrJ1FsuVXU5yd3Aw8AO4ERVnUty1/jxY1X1TJLvAE8CrwAPVNXTGzm4JOk3DbmHTlWdBk4vOXdsyfEXgC/MbjRJ0mr4TlFJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYGBT3JoSTnkywkuXeFde9O8nKSW2c3oiRpiKlBT7IDuA84DBwEbk9y8ArrPg88POshJUnTDblCvwlYqKoLVfUScBI4ssy6TwFfB16Y4XySpIGGBH038PzE8eL43P9Jshv4MHBsdqNJklZjSNCzzLlacvxF4J6qennFJ0qOJjmT5MyLL744cERJWp+rpT1Dgr4IXDdxvAe4uGTNPHAyyb8BtwL3J/nLpU9UVcerar6q5ufm5tY2sSSt0tXSnp0D1jwOHEiyH/h34DbgI5MLqmr//36f5EHgH6rqG7MbU5I0zdSgV9XlJHcz+uuVHcCJqjqX5K7x4943l6TfAkOu0Kmq08DpJeeWDXlV/dX6x5IkrZbvFJWkJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJamJQ0JMcSnI+yUKSe5d5/KNJnhx/PZrkxtmPKklaydSgJ9kB3AccBg4Ctyc5uGTZj4E/rap3AJ8Djs96UEnSyoZcod8ELFTVhap6CTgJHJlcUFWPVtUvxoePAXtmO6YkaZohQd8NPD9xvDg+dyUfB769nqEkSau3c8CaLHOull2YfIBR0N93hcePAkcB9u7dO3BESVqfq6U9Q67QF4HrJo73ABeXLkryDuAB4EhV/Wy5J6qq41U1X1Xzc3Nza5lXklbtamnPkKA/DhxIsj/JNcBtwKnJBUn2Ag8BH6uqZ2c/piRpmqm3XKrqcpK7gYeBHcCJqjqX5K7x48eAzwBvAu5PAnC5quY3bmxJ0lJD7qFTVaeB00vOHZv4/hPAJ2Y7miRpNXynqCQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSE4OCnuRQkvNJFpLcu8zjSfKl8eNPJnnn7EeVJK1katCT7ADuAw4DB4HbkxxcsuwwcGD8dRT4yoznlCRNMeQK/SZgoaouVNVLwEngyJI1R4Cv1shjwLVJ3jrjWSVJKxgS9N3A8xPHi+Nzq10jSdpAOwesyTLnag1rSHKU0S0ZgP9O8vSA19+udgGXtnqIDdJ5b9B7f533BnD9cieXtOe/kpzfvJFm7vev9MCQoC8C100c7wEurmENVXUcOA6Q5ExVzQ94/W2p8/467w1676/z3mC0v+XOT7ansyG3XB4HDiTZn+Qa4Dbg1JI1p4A7xn/t8l7gl1X10xnPKklawdQr9Kq6nORu4GFgB3Ciqs4luWv8+DHgNHALsAD8Grhz40aWJC1nyC0Xquo0o2hPnjs28X0Bn1zla3f/70/n/XXeG/TeX+e9Qf/9rSijFkuStjvf+i9JTWx40Dt/bMCAvX10vKcnkzya5MatmHOtpu1vYt27k7yc5NbNnG89huwtyc1JnkhyLsn3NnvG9Rjwb/MNSb6V5Ifj/W2b33slOZHkhSv92fN2bsq6VdWGfTH6Jeq/An8AXAP8EDi4ZM0twLcZ/S37e4EfbORMm7y3PwbeOP7+8HbZ29D9Taz7J0a/Y7l1q+ee4c/uWuBHwN7x8Zu3eu4Z7+9vgM+Pv58Dfg5cs9WzD9zfnwDvBJ6+wuPbsimz+NroK/TOHxswdW9V9WhV/WJ8+Bijv8/fLob87AA+BXwdeGEzh1unIXv7CPBQVT0HUFXd9lfA65MEeB2joF/e3DHXpqoeYTTvlWzXpqzbRge988cGrHbujzO6atgupu4vyW7gw8AxtpchP7u3AW9M8t0kZ5PcsWnTrd+Q/X0ZeDujNwA+BXy6ql7ZnPE23HZtyroN+rPFdZjZxwb8Fho8d5IPMAr6+zZ0otkasr8vAvdU1cujC71tY8jedgLvAj4IvBr4fpLHqurZjR5uBobs70PAE8CfAX8I/GOSf66q/9jg2TbDdm3Kum100Gf2sQG/hQbNneQdwAPA4ar62SbNNgtD9jcPnBzHfBdwS5LLVfWNTZlw7Yb+u7xUVb8CfpXkEeBGYDsEfcj+7gT+tkY3nReS/Bi4AfiXzRlxQ23XpqzbRt9y6fyxAVP3lmQv8BDwsW1yZTdp6v6qan9V7auqfcDfA3+9DWIOw/5dfhN4f5KdSV4DvAd4ZpPnXKsh+3uO0f8+SPIWRh9qdWFTp9w427Up67ahV+jV+GMDBu7tM8CbgPvHV7GXa5t8MNLA/W1LQ/ZWVc8k+Q7wJPAK8EBVbYtPBx34s/sc8GCSpxjdorinqrbFpzAm+RpwM7ArySLwWeBVsL2bMgu+U1SSmvCdopLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2Smvgf+kfmythyJFEAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x432 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plot\n",
"%matplotlib inline\n",
"import seaborn as sns\n",
"\n",
"income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data\n",
"plot = sns.jointplot(x = \"per_capita_income_\", y = \"hardship_index\", data=income_vs_hardship.h)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- Hint:\n",
"# if the import command gives ModuleNotFoundError: No module named 'seaborn'\n",
"# then uncomment the following line i.e. delete the # to install the seaborn package \n",
"# !pip install seaborn\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import seaborn as sns\n",
"\n",
"income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n",
"plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())\n",
"\n",
"Correct answer:You can see that as Per Capita Income rises as the Hardship Index decreases. We see that the points on the scatter plot are somewhat closer to a straight line in the negative direction, so we have a negative correlation between the two variables. \n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusion\n",
"\n",
"##### Now that you know how to do basic exploratory data analysis using SQL and python visualization tools, you can further explore this dataset to see how the variable `per_capita_income_` is related to `percent_households_below_poverty` and `percent_aged_16_unemployed`. Try to create interesting visualizations!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"##### In this lab you learned how to store a real world data set from the internet in a database (Db2 on IBM Cloud), gain insights into data using SQL queries. You also visualized a portion of the data in the database to see what story it tells.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Author\n",
"\n",
"<a href=\"https://www.linkedin.com/in/ravahuja/\" target=\"_blank\">Rav Ahuja</a>\n",
"\n",
"## Change Log\n",
"\n",
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n",
"| ----------------- | ------- | ---------- | ---------------------------------- |\n",
"| 2020-08-28 | 2.0 | Lavanya | Moved lab to course repo in GitLab |\n",
"\n",
"<hr>\n",
"\n",
"## <h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. <h3/>\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.11"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment