dmalc/DB0201EN-Week3-1-4-Analyzing-v5-py.ipynb

## DB0201EN-Week3-1-4-Analyzing-v5-py.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/Logos/organization_logo/organization_logo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\"  />\n",
    "</center>\n",
    "\n",
    "# Analyzing a real world data-set with SQL and Python\n",
    "\n",
    "Estaimted time needed: **15** minutes\n",
    "\n",
    "## Objectives\n",
    "\n",
    "After complting this lab you will be able to:\n",
    "\n",
    "-   Understand a dataset of selected socioeconomic indicators in Chicago\n",
    "-   Learn how to store data in an Db2 database on IBM Cloud instance\n",
    "-   Solve example problems to practice your SQL skills\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selected Socioeconomic Indicators in Chicago\n",
    "\n",
    "The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.\n",
    "This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.\n",
    "\n",
    "Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.\n",
    "\n",
    "A detailed description of the dataset can be found on [the city of Chicago's website](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ), but to summarize, the dataset has the following variables:\n",
    "\n",
    "-   **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset\n",
    "\n",
    "-   **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago \n",
    "\n",
    "-   **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room\n",
    "\n",
    "-   **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line\n",
    "\n",
    "-   **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed\n",
    "\n",
    "-   **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education\n",
    "\n",
    "-   **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)\n",
    "\n",
    "-   **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population\n",
    "\n",
    "-   **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators\n",
    "\n",
    "In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connect to the database\n",
    "\n",
    "Let us first load the SQL extension and establish a connection with the database\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The sql extension is already loaded. To reload it, use:\n",
      "  %reload_ext sql\n"
     ]
    }
   ],
   "source": [
    "%load_ext sql"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Connected: xwg41624@BLUDB'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remember the connection string is of the format:\n",
    "# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name\n",
    "# Enter the connection string for your Db2 on Cloud database instance below\n",
    "# i.e. copy after db2:// from the URI string in Service Credentials of your Db2 instance. Remove the double quotes at the end.\n",
    "%sql ibm_db_sa://xwg41624:85p%40mjk8c6qkr9vk@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Store the dataset in a Table\n",
    "\n",
    "##### In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.\n",
    "\n",
    "##### We will first read the dataset source .CSV from the internet into pandas dataframe\n",
    "\n",
    "##### Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL \"magic\" simplifies the process of table creation and writing the data from a `pandas` dataframe into the table\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "Table 'chicago_socioeconomic_data' already exists.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-4-8233fac43948>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0mchicago_socioeconomic_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'https://data.cityofchicago.org/resource/jcxq-k9xf.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'PERSIST chicago_socioeconomic_data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/interactiveshell.py\u001b[0m in \u001b[0;36mrun_line_magic\u001b[0;34m(self, magic_name, line, _stack_depth)\u001b[0m\n\u001b[1;32m   2324\u001b[0m                 \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'local_ns'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstack_depth\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf_locals\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2325\u001b[0m             \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuiltin_trap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2326\u001b[0;31m                 \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2327\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2328\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<decorator-gen-127>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m    185\u001b[0m     \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    186\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m         \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    189\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<decorator-gen-126>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m    185\u001b[0m     \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    186\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m         \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    189\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/sql/magic.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n\u001b[1;32m     90\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     91\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'persist'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 92\u001b[0;31m             \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_persist_dataframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muser_ns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     93\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     94\u001b[0m         \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/sql/magic.py\u001b[0m in \u001b[0;36m_persist_dataframe\u001b[0;34m(self, raw, conn, user_ns)\u001b[0m\n\u001b[1;32m    149\u001b[0m         \u001b[0mtable_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlegal_sql_identifier\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msearch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    150\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 151\u001b[0;31m         \u001b[0mframe\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_sql\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    152\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0;34m'Persisted %s'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mtable_name\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    153\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m   2610\u001b[0m             \u001b[0mchunksize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchunksize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2611\u001b[0m             \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2612\u001b[0;31m             \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2613\u001b[0m         )\n\u001b[1;32m   2614\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m    596\u001b[0m         \u001b[0mchunksize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchunksize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    597\u001b[0m         \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 598\u001b[0;31m         \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    599\u001b[0m     )\n\u001b[1;32m    600\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mto_sql\u001b[0;34m(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method)\u001b[0m\n\u001b[1;32m   1391\u001b[0m             \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1392\u001b[0m         )\n\u001b[0;32m-> 1393\u001b[0;31m         \u001b[0mtable\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1394\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1395\u001b[0m         \u001b[0;32mfrom\u001b[0m \u001b[0msqlalchemy\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/io/sql.py\u001b[0m in \u001b[0;36mcreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    721\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    722\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mif_exists\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"fail\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 723\u001b[0;31m                 \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Table '{self.name}' already exists.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    724\u001b[0m             \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mif_exists\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"replace\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    725\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpd_sql\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdrop_table\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mschema\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: Table 'chicago_socioeconomic_data' already exists."
     ]
    }
   ],
   "source": [
    "import pandas\n",
    "chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')\n",
    "%sql PERSIST chicago_socioeconomic_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### You can verify that the table creation was successful by making a basic query like:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>index</th>\n",
       "        <th>ca</th>\n",
       "        <th>community_area_name</th>\n",
       "        <th>percent_of_housing_crowded</th>\n",
       "        <th>percent_households_below_poverty</th>\n",
       "        <th>percent_aged_16_unemployed</th>\n",
       "        <th>percent_aged_25_without_high_school_diploma</th>\n",
       "        <th>percent_aged_under_18_or_over_64</th>\n",
       "        <th>per_capita_income_</th>\n",
       "        <th>hardship_index</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>0</td>\n",
       "        <td>1.0</td>\n",
       "        <td>Rogers Park</td>\n",
       "        <td>7.7</td>\n",
       "        <td>23.6</td>\n",
       "        <td>8.7</td>\n",
       "        <td>18.2</td>\n",
       "        <td>27.5</td>\n",
       "        <td>23939</td>\n",
       "        <td>39.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>1</td>\n",
       "        <td>2.0</td>\n",
       "        <td>West Ridge</td>\n",
       "        <td>7.8</td>\n",
       "        <td>17.2</td>\n",
       "        <td>8.8</td>\n",
       "        <td>20.8</td>\n",
       "        <td>38.5</td>\n",
       "        <td>23040</td>\n",
       "        <td>46.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>2</td>\n",
       "        <td>3.0</td>\n",
       "        <td>Uptown</td>\n",
       "        <td>3.8</td>\n",
       "        <td>24.0</td>\n",
       "        <td>8.9</td>\n",
       "        <td>11.8</td>\n",
       "        <td>22.2</td>\n",
       "        <td>35787</td>\n",
       "        <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>3</td>\n",
       "        <td>4.0</td>\n",
       "        <td>Lincoln Square</td>\n",
       "        <td>3.4</td>\n",
       "        <td>10.9</td>\n",
       "        <td>8.2</td>\n",
       "        <td>13.4</td>\n",
       "        <td>25.5</td>\n",
       "        <td>37524</td>\n",
       "        <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>4</td>\n",
       "        <td>5.0</td>\n",
       "        <td>North Center</td>\n",
       "        <td>0.3</td>\n",
       "        <td>7.5</td>\n",
       "        <td>5.2</td>\n",
       "        <td>4.5</td>\n",
       "        <td>26.2</td>\n",
       "        <td>57123</td>\n",
       "        <td>6.0</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(0, 1.0, 'Rogers Park', 7.7, 23.6, 8.7, 18.2, 27.5, 23939, 39.0),\n",
       " (1, 2.0, 'West Ridge', 7.8, 17.2, 8.8, 20.8, 38.5, 23040, 46.0),\n",
       " (2, 3.0, 'Uptown', 3.8, 24.0, 8.9, 11.8, 22.2, 35787, 20.0),\n",
       " (3, 4.0, 'Lincoln Square', 3.4, 10.9, 8.2, 13.4, 25.5, 37524, 17.0),\n",
       " (4, 5.0, 'North Center', 0.3, 7.5, 5.2, 4.5, 26.2, 57123, 6.0)]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM chicago_socioeconomic_data limit 5;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems\n",
    "\n",
    "### Problem 1\n",
    "\n",
    "##### How many rows are in the dataset?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>78</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(Decimal('78'),)]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click **here** for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;\n",
    "\n",
    "Correct answer: 78\n",
    "\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 2\n",
    "\n",
    "##### How many community areas in Chicago have a hardship index greater than 50.0?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>38</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(Decimal('38'),)]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT COUNT(*) from chicago_socioeconomic_data where hardship_index > 50.0;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click **here** for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;\n",
    "Correct answer: 38\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 3\n",
    "\n",
    "##### What is the maximum value of hardship index in this dataset?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>98.0</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(98.0,)]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT MAX(hardship_index) from chicago_socioeconomic_data;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click **here** for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT MAX(hardship_index) FROM chicago_socioeconomic_data;\n",
    "\n",
    "Correct answer: 98.0\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 4\n",
    "\n",
    "##### Which community area which has the highest hardship index?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>community_area_name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Riverdale</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Riverdale',)]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT community_area_name from chicago_socioeconomic_data where hardship_index = 98.0;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click **here** for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "## We can use the result of the last query to as an input to this query:\n",
    "%sql SELECT community_area_name FROM chicago_socioeconomic_data where hardship_index=98.0\n",
    "\n",
    "## or another option:\n",
    "%sql SELECT community_area_name FROM chicago_socioeconomic_data ORDER BY hardship_index DESC NULLS LAST FETCH FIRST ROW ONLY;\n",
    "\n",
    "## or you can use a sub-query to determine the max hardship index:\n",
    "%sql select community_area_name from chicago_socioeconomic_data where hardship_index = ( select max(hardship_index) from chicago_socioeconomic_data ) \n",
    "\n",
    "Correct answer: 'Riverdale'\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 5\n",
    "\n",
    "##### Which Chicago community areas have per-capita incomes greater than $60,000?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>community_area_name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Lake View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Lincoln Park</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Near North Side</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Loop</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Lake View',), ('Lincoln Park',), ('Near North Side',), ('Loop',)]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT community_area_name from chicago_socioeconomic_data where per_capita_income_ > 60000;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click **here** for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT community_area_name FROM chicago_socioeconomic_data WHERE per_capita_income_ > 60000;\n",
    "\n",
    "Correct answer:Lake View,Lincoln Park, Near North Side, Loop\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 6\n",
    "\n",
    "##### Create a scatter plot using the variables `per_capita_income_` and `hardship_index`. Explain the correlation between the two variables.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://xwg41624:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "ename": "AttributeError",
     "evalue": "'ResultSet' object has no attribute 'get'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-19-e95d24052aa0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0mincome_vs_hardship\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mplot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjointplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"per_capita_income_\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"hardship_index\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mincome_vs_hardship\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\u001b[0m in \u001b[0;36mjointplot\u001b[0;34m(x, y, data, kind, stat_func, color, height, ratio, space, dropna, xlim, ylim, joint_kws, marginal_kws, annot_kws, **kwargs)\u001b[0m\n\u001b[1;32m   2282\u001b[0m     grid = JointGrid(x, y, data, dropna=dropna,\n\u001b[1;32m   2283\u001b[0m                      \u001b[0mheight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mheight\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mratio\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mratio\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mspace\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2284\u001b[0;31m                      xlim=xlim, ylim=ylim)\n\u001b[0m\u001b[1;32m   2285\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2286\u001b[0m     \u001b[0;31m# Plot the data using the grid\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, x, y, data, height, ratio, space, dropna, xlim, ylim, size)\u001b[0m\n\u001b[1;32m   1701\u001b[0m         \u001b[0;31m# Possibly extract the variables from a DataFrame\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1702\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1703\u001b[0;31m             \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1704\u001b[0m             \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1705\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mAttributeError\u001b[0m: 'ResultSet' object has no attribute 'get'"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAFlCAYAAAD76RNtAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAPsElEQVR4nO3cX4ild33H8fenuwb8VyPuKHY3292WNXEvTNExStE2Vlp3c7MVcpEohgZlCTXiZUIv9MKbihRETFyWsARv3IsadC2roVA0hRibXYhJ1rBhutZkukKyKrZVaNjk24tzWo7T2TnPzJyZcb77fsHAPM/5cc73xyxvnn1mzklVIUna/n5nqweQJM2GQZekJgy6JDVh0CWpCYMuSU0YdElqYudWvfCuXbtq3759W/Xykho6e/bspaqa2+o5tsqWBX3fvn2cOXNmq15eUkNJfrLVM2wlb7lIUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJlJVW/PCyX8C57fkxTfHLuDSVg+xQTrvDXrvr/PeAK6vqtdv9RBbZecWvvb5qprfwtffUEnOdN1f571B7/113huM9rfVM2wlb7lIUhMGXZKa2MqgH9/C194MnffXeW/Qe3+d9wb997eiLfulqCRptqZeoSc5keSFJE9f4fEk+VKShSRPJnnn7MeUJE0z5JbLg8ChFR4/DBwYfx0FvrL+sSRJqzU16FX1CPDzFZYcAb5aI48B1yZ566wGlCQNM4tfiu4Gnp84XhyfkyRtolm8sSjLnFv2N61JjjK6LcNrX/vad91www0zeHlJGjl79uylqppber5Te660R5hN0BeB6yaO9wAXl1tYVccZ/1nR/Px8nTlzVb+pS9KMJfnJcuc7tedKe4TZ3HI5Bdwx/muX9wK/rKqfzuB5JUmrMPUKPcnXgJuBXUkWgc8CrwKoqmPAaeAWYAH4NXDnRg0rSbqyqUGvqtunPF7AJ2c2kSRpTfwsF0lqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgYFPcmhJOeTLCS5d5nH35DkW0l+mORckjtnP6okaSVTg55kB3AfcBg4CNye5OCSZZ8EflRVNwI3A3+X5JoZzypJWsGQK/SbgIWqulBVLwEngSNL1hTw+iQBXgf8HLg800klSSsaEvTdwPMTx4vjc5O+DLwduAg8BXy6ql6ZyYSSpEGGBD3LnKslxx8CngB+D/gj4MtJfvf/PVFyNMmZJGdefPHFVY4qSWtztbRnSNAXgesmjvcwuhKfdCfwUI0sAD8Gblj6RFV1vKrmq2p+bm5urTNL0qpcLe0ZEvTHgQNJ9o9/0XkbcGrJmueADwIkeQtwPXBhloNKkla2c9qCqrqc5G7gYWAHcKKqziW5a/z4MeBzwINJnmJ0i+aeqrq0gXNLkpaYGnSAqjoNnF5y7tjE9xeBv5jtaJKk1fCdopLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTQwKepJDSc4nWUhy7xXW3JzkiSTnknxvtmNKkqbZOW1Bkh3AfcCfA4vA40lOVdWPJtZcC9wPHKqq55K8eYPmlSRdwZAr9JuAhaq6UFUvASeBI0vWfAR4qKqeA6iqF2Y7piRpmiFB3w08P3G8OD436W3AG5N8N8nZJHfMakBJ0jBTb7kAWeZcLfM87wI+CLwa+H6Sx6rq2d94ouQocBRg7969q59WktbgamnPkCv0ReC6ieM9wMVl1nynqn5VVZeAR4Ablz5RVR2vqvmqmp+bm1vrzJK0KldLe4YE/XHgQJL9Sa4BbgNOLVnzTeD9SXYmeQ3wHuCZ2Y4qSVrJ1FsuVXU5yd3Aw8AO4ERVnUty1/jxY1X1TJLvAE8CrwAPVNXTGzm4JOk3DbmHTlWdBk4vOXdsyfEXgC/MbjRJ0mr4TlFJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYGBT3JoSTnkywkuXeFde9O8nKSW2c3oiRpiKlBT7IDuA84DBwEbk9y8ArrPg88POshJUnTDblCvwlYqKoLVfUScBI4ssy6TwFfB16Y4XySpIGGBH038PzE8eL43P9Jshv4MHBsdqNJklZjSNCzzLlacvxF4J6qennFJ0qOJjmT5MyLL744cERJWp+rpT1Dgr4IXDdxvAe4uGTNPHAyyb8BtwL3J/nLpU9UVcerar6q5ufm5tY2sSSt0tXSnp0D1jwOHEiyH/h34DbgI5MLqmr//36f5EHgH6rqG7MbU5I0zdSgV9XlJHcz+uuVHcCJqjqX5K7x4943l6TfAkOu0Kmq08DpJeeWDXlV/dX6x5IkrZbvFJWkJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJamJQ0JMcSnI+yUKSe5d5/KNJnhx/PZrkxtmPKklaydSgJ9kB3AccBg4Ctyc5uGTZj4E/rap3AJ8Djs96UEnSyoZcod8ELFTVhap6CTgJHJlcUFWPVtUvxoePAXtmO6YkaZohQd8NPD9xvDg+dyUfB769nqEkSau3c8CaLHOull2YfIBR0N93hcePAkcB9u7dO3BESVqfq6U9Q67QF4HrJo73ABeXLkryDuAB4EhV/Wy5J6qq41U1X1Xzc3Nza5lXklbtamnPkKA/DhxIsj/JNcBtwKnJBUn2Ag8BH6uqZ2c/piRpmqm3XKrqcpK7gYeBHcCJqjqX5K7x48eAzwBvAu5PAnC5quY3bmxJ0lJD7qFTVaeB00vOHZv4/hPAJ2Y7miRpNXynqCQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSE4OCnuRQkvNJFpLcu8zjSfKl8eNPJnnn7EeVJK1katCT7ADuAw4DB4HbkxxcsuwwcGD8dRT4yoznlCRNMeQK/SZgoaouVNVLwEngyJI1R4Cv1shjwLVJ3jrjWSVJKxgS9N3A8xPHi+Nzq10jSdpAOwesyTLnag1rSHKU0S0ZgP9O8vSA19+udgGXtnqIDdJ5b9B7f533BnD9cieXtOe/kpzfvJFm7vev9MCQoC8C100c7wEurmENVXUcOA6Q5ExVzQ94/W2p8/467w1676/z3mC0v+XOT7ansyG3XB4HDiTZn+Qa4Dbg1JI1p4A7xn/t8l7gl1X10xnPKklawdQr9Kq6nORu4GFgB3Ciqs4luWv8+DHgNHALsAD8Grhz40aWJC1nyC0Xquo0o2hPnjs28X0Bn1zla3f/70/n/XXeG/TeX+e9Qf/9rSijFkuStjvf+i9JTWx40Dt/bMCAvX10vKcnkzya5MatmHOtpu1vYt27k7yc5NbNnG89huwtyc1JnkhyLsn3NnvG9Rjwb/MNSb6V5Ifj/W2b33slOZHkhSv92fN2bsq6VdWGfTH6Jeq/An8AXAP8EDi4ZM0twLcZ/S37e4EfbORMm7y3PwbeOP7+8HbZ29D9Taz7J0a/Y7l1q+ee4c/uWuBHwN7x8Zu3eu4Z7+9vgM+Pv58Dfg5cs9WzD9zfnwDvBJ6+wuPbsimz+NroK/TOHxswdW9V9WhV/WJ8+Bijv8/fLob87AA+BXwdeGEzh1unIXv7CPBQVT0HUFXd9lfA65MEeB2joF/e3DHXpqoeYTTvlWzXpqzbRge988cGrHbujzO6atgupu4vyW7gw8AxtpchP7u3AW9M8t0kZ5PcsWnTrd+Q/X0ZeDujNwA+BXy6ql7ZnPE23HZtyroN+rPFdZjZxwb8Fho8d5IPMAr6+zZ0otkasr8vAvdU1cujC71tY8jedgLvAj4IvBr4fpLHqurZjR5uBobs70PAE8CfAX8I/GOSf66q/9jg2TbDdm3Kum100Gf2sQG/hQbNneQdwAPA4ar62SbNNgtD9jcPnBzHfBdwS5LLVfWNTZlw7Yb+u7xUVb8CfpXkEeBGYDsEfcj+7gT+tkY3nReS/Bi4AfiXzRlxQ23XpqzbRt9y6fyxAVP3lmQv8BDwsW1yZTdp6v6qan9V7auqfcDfA3+9DWIOw/5dfhN4f5KdSV4DvAd4ZpPnXKsh+3uO0f8+SPIWRh9qdWFTp9w427Up67ahV+jV+GMDBu7tM8CbgPvHV7GXa5t8MNLA/W1LQ/ZWVc8k+Q7wJPAK8EBVbYtPBx34s/sc8GCSpxjdorinqrbFpzAm+RpwM7ArySLwWeBVsL2bMgu+U1SSmvCdopLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2Smvgf+kfmythyJFEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x432 with 3 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plot\n",
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "\n",
    "income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data\n",
    "plot = sns.jointplot(x = \"per_capita_income_\", y = \"hardship_index\", data=income_vs_hardship.h)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click **here** for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "# if the import command gives ModuleNotFoundError: No module named 'seaborn'\n",
    "# then uncomment the following line i.e. delete the # to install the seaborn package \n",
    "# !pip install seaborn\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "\n",
    "income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n",
    "plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())\n",
    "\n",
    "Correct answer:You can see that as Per Capita Income rises as the Hardship Index decreases. We see that the points on the scatter plot are somewhat closer to a straight line in the negative direction, so we have a negative correlation between the two variables. \n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "##### Now that you know how to do basic exploratory data analysis using SQL and python visualization tools, you can further explore this dataset to see how the variable `per_capita_income_` is related to `percent_households_below_poverty` and `percent_aged_16_unemployed`. Try to create interesting visualizations!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "##### In this lab you learned how to store a real world data set from the internet in a database (Db2 on IBM Cloud), gain insights into data using SQL queries. You also visualized a portion of the data in the database to see what story it tells.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Author\n",
    "\n",
    "<a href=\"https://www.linkedin.com/in/ravahuja/\" target=\"_blank\">Rav Ahuja</a>\n",
    "\n",
    "## Change Log\n",
    "\n",
    "| Date (YYYY-MM-DD) | Version | Changed By | Change Description                 |\n",
    "| ----------------- | ------- | ---------- | ---------------------------------- |\n",
    "| 2020-08-28        | 2.0     | Lavanya    | Moved lab to course repo in GitLab |\n",
    "\n",
    "<hr>\n",
    "\n",
    "## <h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. <h3/>\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python",
   "language": "python",
   "name": "conda-env-python-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.11"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}