Created
October 6, 2020 00:00
-
-
Save dmalc/66ab77e3a4ecfdbb1630c003fc1dbb23 to your computer and use it in GitHub Desktop.
Created on Skills Network Labs
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<center>\n", | |
" <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/Logos/organization_logo/organization_logo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n", | |
"</center>\n", | |
"\n", | |
"# Analyzing a real world data-set with SQL and Python\n", | |
"\n", | |
"Estaimted time needed: **15** minutes\n", | |
"\n", | |
"## Objectives\n", | |
"\n", | |
"After complting this lab you will be able to:\n", | |
"\n", | |
"- Understand a dataset of selected socioeconomic indicators in Chicago\n", | |
"- Learn how to store data in an Db2 database on IBM Cloud instance\n", | |
"- Solve example problems to practice your SQL skills\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Selected Socioeconomic Indicators in Chicago\n", | |
"\n", | |
"The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.\n", | |
"This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.\n", | |
"\n", | |
"Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.\n", | |
"\n", | |
"A detailed description of the dataset can be found on [the city of Chicago's website](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ), but to summarize, the dataset has the following variables:\n", | |
"\n", | |
"- **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset\n", | |
"\n", | |
"- **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago \n", | |
"\n", | |
"- **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room\n", | |
"\n", | |
"- **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line\n", | |
"\n", | |
"- **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed\n", | |
"\n", | |
"- **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education\n", | |
"\n", | |
"- **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)\n", | |
"\n", | |
"- **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population\n", | |
"\n", | |
"- **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators\n", | |
"\n", | |
"In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Connect to the database\n", | |
"\n", | |
"Let us first load the SQL extension and establish a connection with the database\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"The sql extension is already loaded. To reload it, use:\n", | |
" %reload_ext sql\n" | |
] | |
} | |
], | |
"source": [ | |
"%load_ext sql" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'Connected: xwg41624@BLUDB'" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Remember the connection string is of the format:\n", | |
"# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name\n", | |
"# Enter the connection string for your Db2 on Cloud database instance below\n", | |
"# i.e. copy after db2:// from the URI string in Service Credentials of your Db2 instance. Remove the double quotes at the end.\n", | |
"%sql ibm_db_sa://xwg41624:85p%40mjk8c6qkr9vk@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Store the dataset in a Table\n", | |
"\n", | |
"##### In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.\n", | |
"\n", | |
"##### We will first read the dataset source .CSV from the internet into pandas dataframe\n", | |
"\n", | |
"##### Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL \"magic\" simplifies the process of table creation and writing the data from a `pandas` dataframe into the table\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n" | |
] | |
}, | |
{ | |
"ename": "ValueError", | |
"evalue": "Table 'chicago_socioeconomic_data' already exists.", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-4-8233fac43948>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mchicago_socioeconomic_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'https://data.cityofchicago.org/resource/jcxq-k9xf.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'PERSIST chicago_socioeconomic_data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/interactiveshell.py\u001b[0m in \u001b[0;36mrun_line_magic\u001b[0;34m(self, magic_name, line, _stack_depth)\u001b[0m\n\u001b[1;32m 2324\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'local_ns'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstack_depth\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf_locals\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2325\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuiltin_trap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2326\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2327\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2328\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m<decorator-gen-127>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m<decorator-gen-126>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/sql/magic.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n\u001b[1;32m 90\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 91\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'persist'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 92\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_persist_dataframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muser_ns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 93\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 94\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/sql/magic.py\u001b[0m in \u001b[0;36m_persist_dataframe\u001b[0;34m(self, raw, conn, user_ns)\u001b[0m\n\u001b[1;32m 149\u001b[0m \u001b[0mtable_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlegal_sql_identifier\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msearch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 150\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 151\u001b[0;31m \u001b[0mframe\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_sql\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 152\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m'Persisted %s'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mtable_name\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 153\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m 2610\u001b[0m \u001b[0mchunksize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchunksize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2611\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2612\u001b[0;31m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2613\u001b[0m )\n\u001b[1;32m 2614\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m 596\u001b[0m \u001b[0mchunksize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchunksize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 597\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 598\u001b[0;31m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 599\u001b[0m )\n\u001b[1;32m 600\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1392\u001b[0m )\n\u001b[0;32m-> 1393\u001b[0;31m \u001b[0mtable\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1394\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1395\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0msqlalchemy\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mcreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 721\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 722\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mif_exists\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"fail\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 723\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Table '{self.name}' already exists.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 724\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mif_exists\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"replace\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 725\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpd_sql\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdrop_table\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mschema\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;31mValueError\u001b[0m: Table 'chicago_socioeconomic_data' already exists." | |
] | |
} | |
], | |
"source": [ | |
"import pandas\n", | |
"chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')\n", | |
"%sql PERSIST chicago_socioeconomic_data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### You can verify that the table creation was successful by making a basic query like:\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>index</th>\n", | |
" <th>ca</th>\n", | |
" <th>community_area_name</th>\n", | |
" <th>percent_of_housing_crowded</th>\n", | |
" <th>percent_households_below_poverty</th>\n", | |
" <th>percent_aged_16_unemployed</th>\n", | |
" <th>percent_aged_25_without_high_school_diploma</th>\n", | |
" <th>percent_aged_under_18_or_over_64</th>\n", | |
" <th>per_capita_income_</th>\n", | |
" <th>hardship_index</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>Rogers Park</td>\n", | |
" <td>7.7</td>\n", | |
" <td>23.6</td>\n", | |
" <td>8.7</td>\n", | |
" <td>18.2</td>\n", | |
" <td>27.5</td>\n", | |
" <td>23939</td>\n", | |
" <td>39.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>West Ridge</td>\n", | |
" <td>7.8</td>\n", | |
" <td>17.2</td>\n", | |
" <td>8.8</td>\n", | |
" <td>20.8</td>\n", | |
" <td>38.5</td>\n", | |
" <td>23040</td>\n", | |
" <td>46.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>2</td>\n", | |
" <td>3.0</td>\n", | |
" <td>Uptown</td>\n", | |
" <td>3.8</td>\n", | |
" <td>24.0</td>\n", | |
" <td>8.9</td>\n", | |
" <td>11.8</td>\n", | |
" <td>22.2</td>\n", | |
" <td>35787</td>\n", | |
" <td>20.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>3</td>\n", | |
" <td>4.0</td>\n", | |
" <td>Lincoln Square</td>\n", | |
" <td>3.4</td>\n", | |
" <td>10.9</td>\n", | |
" <td>8.2</td>\n", | |
" <td>13.4</td>\n", | |
" <td>25.5</td>\n", | |
" <td>37524</td>\n", | |
" <td>17.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>4</td>\n", | |
" <td>5.0</td>\n", | |
" <td>North Center</td>\n", | |
" <td>0.3</td>\n", | |
" <td>7.5</td>\n", | |
" <td>5.2</td>\n", | |
" <td>4.5</td>\n", | |
" <td>26.2</td>\n", | |
" <td>57123</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(0, 1.0, 'Rogers Park', 7.7, 23.6, 8.7, 18.2, 27.5, 23939, 39.0),\n", | |
" (1, 2.0, 'West Ridge', 7.8, 17.2, 8.8, 20.8, 38.5, 23040, 46.0),\n", | |
" (2, 3.0, 'Uptown', 3.8, 24.0, 8.9, 11.8, 22.2, 35787, 20.0),\n", | |
" (3, 4.0, 'Lincoln Square', 3.4, 10.9, 8.2, 13.4, 25.5, 37524, 17.0),\n", | |
" (4, 5.0, 'North Center', 0.3, 7.5, 5.2, 4.5, 26.2, 57123, 6.0)]" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT * FROM chicago_socioeconomic_data limit 5;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Problems\n", | |
"\n", | |
"### Problem 1\n", | |
"\n", | |
"##### How many rows are in the dataset?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>78</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(Decimal('78'),)]" | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- Hint:\n", | |
"\n", | |
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;\n", | |
"\n", | |
"Correct answer: 78\n", | |
"\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem 2\n", | |
"\n", | |
"##### How many community areas in Chicago have a hardship index greater than 50.0?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>38</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(Decimal('38'),)]" | |
] | |
}, | |
"execution_count": 28, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT COUNT(*) from chicago_socioeconomic_data where hardship_index > 50.0;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- Hint:\n", | |
"\n", | |
"%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;\n", | |
"Correct answer: 38\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem 3\n", | |
"\n", | |
"##### What is the maximum value of hardship index in this dataset?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>98.0</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(98.0,)]" | |
] | |
}, | |
"execution_count": 29, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT MAX(hardship_index) from chicago_socioeconomic_data;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- Hint:\n", | |
"\n", | |
"%sql SELECT MAX(hardship_index) FROM chicago_socioeconomic_data;\n", | |
"\n", | |
"Correct answer: 98.0\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem 4\n", | |
"\n", | |
"##### Which community area which has the highest hardship index?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>community_area_name</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Riverdale</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('Riverdale',)]" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT community_area_name from chicago_socioeconomic_data where hardship_index = 98.0;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- Hint:\n", | |
"\n", | |
"## We can use the result of the last query to as an input to this query:\n", | |
"%sql SELECT community_area_name FROM chicago_socioeconomic_data where hardship_index=98.0\n", | |
"\n", | |
"## or another option:\n", | |
"%sql SELECT community_area_name FROM chicago_socioeconomic_data ORDER BY hardship_index DESC NULLS LAST FETCH FIRST ROW ONLY;\n", | |
"\n", | |
"## or you can use a sub-query to determine the max hardship index:\n", | |
"%sql select community_area_name from chicago_socioeconomic_data where hardship_index = ( select max(hardship_index) from chicago_socioeconomic_data ) \n", | |
"\n", | |
"Correct answer: 'Riverdale'\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem 5\n", | |
"\n", | |
"##### Which Chicago community areas have per-capita incomes greater than $60,000?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>community_area_name</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Lake View</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Lincoln Park</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Near North Side</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Loop</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('Lake View',), ('Lincoln Park',), ('Near North Side',), ('Loop',)]" | |
] | |
}, | |
"execution_count": 31, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT community_area_name from chicago_socioeconomic_data where per_capita_income_ > 60000;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- Hint:\n", | |
"\n", | |
"%sql SELECT community_area_name FROM chicago_socioeconomic_data WHERE per_capita_income_ > 60000;\n", | |
"\n", | |
"Correct answer:Lake View,Lincoln Park, Near North Side, Loop\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem 6\n", | |
"\n", | |
"##### Create a scatter plot using the variables `per_capita_income_` and `hardship_index`. Explain the correlation between the two variables.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"ename": "AttributeError", | |
"evalue": "'ResultSet' object has no attribute 'get'", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-19-e95d24052aa0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mincome_vs_hardship\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mplot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjointplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"per_capita_income_\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"hardship_index\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mincome_vs_hardship\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\u001b[0m in \u001b[0;36mjointplot\u001b[0;34m(x, y, data, kind, stat_func, color, height, ratio, space, dropna, xlim, ylim, joint_kws, marginal_kws, annot_kws, **kwargs)\u001b[0m\n\u001b[1;32m 2282\u001b[0m grid = JointGrid(x, y, data, dropna=dropna,\n\u001b[1;32m 2283\u001b[0m \u001b[0mheight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mheight\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mratio\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mratio\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mspace\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2284\u001b[0;31m xlim=xlim, ylim=ylim)\n\u001b[0m\u001b[1;32m 2285\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2286\u001b[0m \u001b[0;31m# Plot the data using the grid\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, x, y, data, height, ratio, space, dropna, xlim, ylim, size)\u001b[0m\n\u001b[1;32m 1701\u001b[0m \u001b[0;31m# Possibly extract the variables from a DataFrame\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1702\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1703\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1704\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1705\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;31mAttributeError\u001b[0m: 'ResultSet' object has no attribute 'get'" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAFlCAYAAAD76RNtAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAPsElEQVR4nO3cX4ild33H8fenuwb8VyPuKHY3292WNXEvTNExStE2Vlp3c7MVcpEohgZlCTXiZUIv9MKbihRETFyWsARv3IsadC2roVA0hRibXYhJ1rBhutZkukKyKrZVaNjk24tzWo7T2TnPzJyZcb77fsHAPM/5cc73xyxvnn1mzklVIUna/n5nqweQJM2GQZekJgy6JDVh0CWpCYMuSU0YdElqYudWvfCuXbtq3759W/Xykho6e/bspaqa2+o5tsqWBX3fvn2cOXNmq15eUkNJfrLVM2wlb7lIUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJlJVW/PCyX8C57fkxTfHLuDSVg+xQTrvDXrvr/PeAK6vqtdv9RBbZecWvvb5qprfwtffUEnOdN1f571B7/113huM9rfVM2wlb7lIUhMGXZKa2MqgH9/C194MnffXeW/Qe3+d9wb997eiLfulqCRptqZeoSc5keSFJE9f4fEk+VKShSRPJnnn7MeUJE0z5JbLg8ChFR4/DBwYfx0FvrL+sSRJqzU16FX1CPDzFZYcAb5aI48B1yZ566wGlCQNM4tfiu4Gnp84XhyfkyRtolm8sSjLnFv2N61JjjK6LcNrX/vad91www0zeHlJGjl79uylqppber5Te660R5hN0BeB6yaO9wAXl1tYVccZ/1nR/Px8nTlzVb+pS9KMJfnJcuc7tedKe4TZ3HI5Bdwx/muX9wK/rKqfzuB5JUmrMPUKPcnXgJuBXUkWgc8CrwKoqmPAaeAWYAH4NXDnRg0rSbqyqUGvqtunPF7AJ2c2kSRpTfwsF0lqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgYFPcmhJOeTLCS5d5nH35DkW0l+mORckjtnP6okaSVTg55kB3AfcBg4CNye5OCSZZ8EflRVNwI3A3+X5JoZzypJWsGQK/SbgIWqulBVLwEngSNL1hTw+iQBXgf8HLg800klSSsaEvTdwPMTx4vjc5O+DLwduAg8BXy6ql6ZyYSSpEGGBD3LnKslxx8CngB+D/gj4MtJfvf/PVFyNMmZJGdefPHFVY4qSWtztbRnSNAXgesmjvcwuhKfdCfwUI0sAD8Gblj6RFV1vKrmq2p+bm5urTNL0qpcLe0ZEvTHgQNJ9o9/0XkbcGrJmueADwIkeQtwPXBhloNKkla2c9qCqrqc5G7gYWAHcKKqziW5a/z4MeBzwINJnmJ0i+aeqrq0gXNLkpaYGnSAqjoNnF5y7tjE9xeBv5jtaJKk1fCdopLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTQwKepJDSc4nWUhy7xXW3JzkiSTnknxvtmNKkqbZOW1Bkh3AfcCfA4vA40lOVdWPJtZcC9wPHKqq55K8eYPmlSRdwZAr9JuAhaq6UFUvASeBI0vWfAR4qKqeA6iqF2Y7piRpmiFB3w08P3G8OD436W3AG5N8N8nZJHfMakBJ0jBTb7kAWeZcLfM87wI+CLwa+H6Sx6rq2d94ouQocBRg7969q59WktbgamnPkCv0ReC6ieM9wMVl1nynqn5VVZeAR4Ablz5RVR2vqvmqmp+bm1vrzJK0KldLe4YE/XHgQJL9Sa4BbgNOLVnzTeD9SXYmeQ3wHuCZ2Y4qSVrJ1FsuVXU5yd3Aw8AO4ERVnUty1/jxY1X1TJLvAE8CrwAPVNXTGzm4JOk3DbmHTlWdBk4vOXdsyfEXgC/MbjRJ0mr4TlFJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYGBT3JoSTnkywkuXeFde9O8nKSW2c3oiRpiKlBT7IDuA84DBwEbk9y8ArrPg88POshJUnTDblCvwlYqKoLVfUScBI4ssy6TwFfB16Y4XySpIGGBH038PzE8eL43P9Jshv4MHBsdqNJklZjSNCzzLlacvxF4J6qennFJ0qOJjmT5MyLL744cERJWp+rpT1Dgr4IXDdxvAe4uGTNPHAyyb8BtwL3J/nLpU9UVcerar6q5ufm5tY2sSSt0tXSnp0D1jwOHEiyH/h34DbgI5MLqmr//36f5EHgH6rqG7MbU5I0zdSgV9XlJHcz+uuVHcCJqjqX5K7x4943l6TfAkOu0Kmq08DpJeeWDXlV/dX6x5IkrZbvFJWkJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJamJQ0JMcSnI+yUKSe5d5/KNJnhx/PZrkxtmPKklaydSgJ9kB3AccBg4Ctyc5uGTZj4E/rap3AJ8Djs96UEnSyoZcod8ELFTVhap6CTgJHJlcUFWPVtUvxoePAXtmO6YkaZohQd8NPD9xvDg+dyUfB769nqEkSau3c8CaLHOull2YfIBR0N93hcePAkcB9u7dO3BESVqfq6U9Q67QF4HrJo73ABeXLkryDuAB4EhV/Wy5J6qq41U1X1Xzc3Nza5lXklbtamnPkKA/DhxIsj/JNcBtwKnJBUn2Ag8BH6uqZ2c/piRpmqm3XKrqcpK7gYeBHcCJqjqX5K7x48eAzwBvAu5PAnC5quY3bmxJ0lJD7qFTVaeB00vOHZv4/hPAJ2Y7miRpNXynqCQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSE4OCnuRQkvNJFpLcu8zjSfKl8eNPJnnn7EeVJK1katCT7ADuAw4DB4HbkxxcsuwwcGD8dRT4yoznlCRNMeQK/SZgoaouVNVLwEngyJI1R4Cv1shjwLVJ3jrjWSVJKxgS9N3A8xPHi+Nzq10jSdpAOwesyTLnag1rSHKU0S0ZgP9O8vSA19+udgGXtnqIDdJ5b9B7f533BnD9cieXtOe/kpzfvJFm7vev9MCQoC8C100c7wEurmENVXUcOA6Q5ExVzQ94/W2p8/467w1676/z3mC0v+XOT7ansyG3XB4HDiTZn+Qa4Dbg1JI1p4A7xn/t8l7gl1X10xnPKklawdQr9Kq6nORu4GFgB3Ciqs4luWv8+DHgNHALsAD8Grhz40aWJC1nyC0Xquo0o2hPnjs28X0Bn1zla3f/70/n/XXeG/TeX+e9Qf/9rSijFkuStjvf+i9JTWx40Dt/bMCAvX10vKcnkzya5MatmHOtpu1vYt27k7yc5NbNnG89huwtyc1JnkhyLsn3NnvG9Rjwb/MNSb6V5Ifj/W2b33slOZHkhSv92fN2bsq6VdWGfTH6Jeq/An8AXAP8EDi4ZM0twLcZ/S37e4EfbORMm7y3PwbeOP7+8HbZ29D9Taz7J0a/Y7l1q+ee4c/uWuBHwN7x8Zu3eu4Z7+9vgM+Pv58Dfg5cs9WzD9zfnwDvBJ6+wuPbsimz+NroK/TOHxswdW9V9WhV/WJ8+Bijv8/fLob87AA+BXwdeGEzh1unIXv7CPBQVT0HUFXd9lfA65MEeB2joF/e3DHXpqoeYTTvlWzXpqzbRge988cGrHbujzO6atgupu4vyW7gw8AxtpchP7u3AW9M8t0kZ5PcsWnTrd+Q/X0ZeDujNwA+BXy6ql7ZnPE23HZtyroN+rPFdZjZxwb8Fho8d5IPMAr6+zZ0otkasr8vAvdU1cujC71tY8jedgLvAj4IvBr4fpLHqurZjR5uBobs70PAE8CfAX8I/GOSf66q/9jg2TbDdm3Kum100Gf2sQG/hQbNneQdwAPA4ar62SbNNgtD9jcPnBzHfBdwS5LLVfWNTZlw7Yb+u7xUVb8CfpXkEeBGYDsEfcj+7gT+tkY3nReS/Bi4AfiXzRlxQ23XpqzbRt9y6fyxAVP3lmQv8BDwsW1yZTdp6v6qan9V7auqfcDfA3+9DWIOw/5dfhN4f5KdSV4DvAd4ZpPnXKsh+3uO0f8+SPIWRh9qdWFTp9w427Up67ahV+jV+GMDBu7tM8CbgPvHV7GXa5t8MNLA/W1LQ/ZWVc8k+Q7wJPAK8EBVbYtPBx34s/sc8GCSpxjdorinqrbFpzAm+RpwM7ArySLwWeBVsL2bMgu+U1SSmvCdopLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2Smvgf+kfmythyJFEAAAAASUVORK5CYII=\n", | |
"text/plain": [ | |
"<Figure size 432x432 with 3 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"import matplotlib.pyplot as plot\n", | |
"%matplotlib inline\n", | |
"import seaborn as sns\n", | |
"\n", | |
"income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data\n", | |
"plot = sns.jointplot(x = \"per_capita_income_\", y = \"hardship_index\", data=income_vs_hardship.h)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- Hint:\n", | |
"# if the import command gives ModuleNotFoundError: No module named 'seaborn'\n", | |
"# then uncomment the following line i.e. delete the # to install the seaborn package \n", | |
"# !pip install seaborn\n", | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n", | |
"import seaborn as sns\n", | |
"\n", | |
"income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n", | |
"plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())\n", | |
"\n", | |
"Correct answer:You can see that as Per Capita Income rises as the Hardship Index decreases. We see that the points on the scatter plot are somewhat closer to a straight line in the negative direction, so we have a negative correlation between the two variables. \n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Conclusion\n", | |
"\n", | |
"##### Now that you know how to do basic exploratory data analysis using SQL and python visualization tools, you can further explore this dataset to see how the variable `per_capita_income_` is related to `percent_households_below_poverty` and `percent_aged_16_unemployed`. Try to create interesting visualizations!\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Summary\n", | |
"\n", | |
"##### In this lab you learned how to store a real world data set from the internet in a database (Db2 on IBM Cloud), gain insights into data using SQL queries. You also visualized a portion of the data in the database to see what story it tells.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Author\n", | |
"\n", | |
"<a href=\"https://www.linkedin.com/in/ravahuja/\" target=\"_blank\">Rav Ahuja</a>\n", | |
"\n", | |
"## Change Log\n", | |
"\n", | |
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n", | |
"| ----------------- | ------- | ---------- | ---------------------------------- |\n", | |
"| 2020-08-28 | 2.0 | Lavanya | Moved lab to course repo in GitLab |\n", | |
"\n", | |
"<hr>\n", | |
"\n", | |
"## <h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. <h3/>\n" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python", | |
"language": "python", | |
"name": "conda-env-python-py" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.11" | |
}, | |
"widgets": { | |
"state": {}, | |
"version": "1.1.2" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment