Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<a href=\"https://cognitiveclass.ai\"><img src = \"https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png\" width = 400> </a>\n",
"\n",
"<h1 align=center><font size = 5>Pie Charts, Box Plots, Scatter Plots, and Bubble Plots</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Introduction\n",
"\n",
"In this lab session, we continue exploring the Matplotlib library. More specifically, we will learn how to create pie charts, box plots, scatter plots, and bubble plots."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Table of Contents\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"\n",
"1. [Exploring Datasets with *pandas*](#0)<br>\n",
"2. [Downloading and Prepping Data](#2)<br>\n",
"3. [Visualizing Data using Matplotlib](#4) <br>\n",
"4. [Pie Charts](#6) <br>\n",
"5. [Box Plots](#8) <br>\n",
"6. [Scatter Plots](#10) <br>\n",
"7. [Bubble Plots](#12) <br> \n",
"</div>\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Exploring Datasets with *pandas* and Matplotlib<a id=\"0\"></a>\n",
"\n",
"Toolkits: The course relies heavily on [*pandas*](http://pandas.pydata.org/) and [**NumPy**](http://www.numpy.org/) for data wrangling, analysis, and visualization. The primary plotting library we will explore in the course is [Matplotlib](http://matplotlib.org/).\n",
"\n",
"Dataset: Immigration to Canada from 1980 to 2013 - [International migration flows to and from selected countries - The 2015 revision](http://www.un.org/en/development/desa/population/migration/data/empirical2/migrationflows.shtml) from the United Nations' website.\n",
"\n",
"The dataset contains annual data on the flows of international migrants as recorded by the countries of destination. The data presents both inflows and outflows according to the place of birth, citizenship or place of previous / next residence both for foreigners and nationals. In this lab, we will focus on the Canadian Immigration data."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Downloading and Prepping Data <a id=\"2\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Import primary modules."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"import numpy as np # useful for scientific computing in Python\n",
"import pandas as pd # primary data structure library"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's download and import our primary Canadian Immigration dataset using the *pandas* `read_excel()` method. Normally, before we can do that, we need to install a module that *pandas* requires to read Excel files. This module is **xlrd**. For your convenience, we have pre-installed this module, so you do not have to worry about that. Otherwise, you would need to run the following line of code to install the **xlrd** module:\n",
"```\n",
"!conda install -c anaconda xlrd --yes\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Download the dataset and read it into a *pandas* dataframe."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.5.11\n",
" latest version: 4.7.12\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base -c defaults conda\n",
"\n",
"\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: /home/jupyterlab/conda/envs/python\n",
"\n",
" added / updated specs: \n",
" - xlrd\n",
"\n",
"\n",
"The following packages will be downloaded:\n",
"\n",
" package | build\n",
" ---------------------------|-----------------\n",
" openssl-1.1.1 | h7b6447c_0 5.0 MB anaconda\n",
" certifi-2019.9.11 | py36_0 154 KB anaconda\n",
" xlrd-1.2.0 | py36_0 188 KB anaconda\n",
" ------------------------------------------------------------\n",
" Total: 5.4 MB\n",
"\n",
"The following packages will be UPDATED:\n",
"\n",
" certifi: 2019.9.11-py36_0 conda-forge --> 2019.9.11-py36_0 anaconda\n",
" openssl: 1.1.1c-h516909a_0 conda-forge --> 1.1.1-h7b6447c_0 anaconda\n",
" xlrd: 1.1.0-py37_1 --> 1.2.0-py36_0 anaconda\n",
"\n",
"\n",
"Downloading and Extracting Packages\n",
"openssl-1.1.1 | 5.0 MB | ##################################### | 100% \n",
"certifi-2019.9.11 | 154 KB | ##################################### | 100% \n",
"xlrd-1.2.0 | 188 KB | ##################################### | 100% \n",
"Preparing transaction: done\n",
"Verifying transaction: done\n",
"Executing transaction: done\n",
"Data downloaded and read into a dataframe!\n"
]
}
],
"source": [
"!conda install -c anaconda xlrd --yes\n",
"df_can = pd.read_excel('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Canada.xlsx',\n",
" sheet_name='Canada by Citizenship',\n",
" skiprows=range(20),\n",
" skipfooter=2\n",
" )\n",
"\n",
"print('Data downloaded and read into a dataframe!')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's take a look at the first five items in our dataset."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Type</th>\n",
" <th>Coverage</th>\n",
" <th>OdName</th>\n",
" <th>AREA</th>\n",
" <th>AreaName</th>\n",
" <th>REG</th>\n",
" <th>RegName</th>\n",
" <th>DEV</th>\n",
" <th>DevName</th>\n",
" <th>1980</th>\n",
" <th>...</th>\n",
" <th>2004</th>\n",
" <th>2005</th>\n",
" <th>2006</th>\n",
" <th>2007</th>\n",
" <th>2008</th>\n",
" <th>2009</th>\n",
" <th>2010</th>\n",
" <th>2011</th>\n",
" <th>2012</th>\n",
" <th>2013</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Immigrants</td>\n",
" <td>Foreigners</td>\n",
" <td>Afghanistan</td>\n",
" <td>935</td>\n",
" <td>Asia</td>\n",
" <td>5501</td>\n",
" <td>Southern Asia</td>\n",
" <td>902</td>\n",
" <td>Developing regions</td>\n",
" <td>16</td>\n",
" <td>...</td>\n",
" <td>2978</td>\n",
" <td>3436</td>\n",
" <td>3009</td>\n",
" <td>2652</td>\n",
" <td>2111</td>\n",
" <td>1746</td>\n",
" <td>1758</td>\n",
" <td>2203</td>\n",
" <td>2635</td>\n",
" <td>2004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Immigrants</td>\n",
" <td>Foreigners</td>\n",
" <td>Albania</td>\n",
" <td>908</td>\n",
" <td>Europe</td>\n",
" <td>925</td>\n",
" <td>Southern Europe</td>\n",
" <td>901</td>\n",
" <td>Developed regions</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1450</td>\n",
" <td>1223</td>\n",
" <td>856</td>\n",
" <td>702</td>\n",
" <td>560</td>\n",
" <td>716</td>\n",
" <td>561</td>\n",
" <td>539</td>\n",
" <td>620</td>\n",
" <td>603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Immigrants</td>\n",
" <td>Foreigners</td>\n",
" <td>Algeria</td>\n",
" <td>903</td>\n",
" <td>Africa</td>\n",
" <td>912</td>\n",
" <td>Northern Africa</td>\n",
" <td>902</td>\n",
" <td>Developing regions</td>\n",
" <td>80</td>\n",
" <td>...</td>\n",
" <td>3616</td>\n",
" <td>3626</td>\n",
" <td>4807</td>\n",
" <td>3623</td>\n",
" <td>4005</td>\n",
" <td>5393</td>\n",
" <td>4752</td>\n",
" <td>4325</td>\n",
" <td>3774</td>\n",
" <td>4331</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Immigrants</td>\n",
" <td>Foreigners</td>\n",
" <td>American Samoa</td>\n",
" <td>909</td>\n",
" <td>Oceania</td>\n",
" <td>957</td>\n",
" <td>Polynesia</td>\n",
" <td>902</td>\n",
" <td>Developing regions</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Immigrants</td>\n",
" <td>Foreigners</td>\n",
" <td>Andorra</td>\n",
" <td>908</td>\n",
" <td>Europe</td>\n",
" <td>925</td>\n",
" <td>Southern Europe</td>\n",
" <td>901</td>\n",
" <td>Developed regions</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 43 columns</p>\n",
"</div>"
],
"text/plain": [
" Type Coverage OdName AREA AreaName REG \\\n",
"0 Immigrants Foreigners Afghanistan 935 Asia 5501 \n",
"1 Immigrants Foreigners Albania 908 Europe 925 \n",
"2 Immigrants Foreigners Algeria 903 Africa 912 \n",
"3 Immigrants Foreigners American Samoa 909 Oceania 957 \n",
"4 Immigrants Foreigners Andorra 908 Europe 925 \n",
"\n",
" RegName DEV DevName 1980 ... 2004 2005 2006 \\\n",
"0 Southern Asia 902 Developing regions 16 ... 2978 3436 3009 \n",
"1 Southern Europe 901 Developed regions 1 ... 1450 1223 856 \n",
"2 Northern Africa 902 Developing regions 80 ... 3616 3626 4807 \n",
"3 Polynesia 902 Developing regions 0 ... 0 0 1 \n",
"4 Southern Europe 901 Developed regions 0 ... 0 0 1 \n",
"\n",
" 2007 2008 2009 2010 2011 2012 2013 \n",
"0 2652 2111 1746 1758 2203 2635 2004 \n",
"1 702 560 716 561 539 620 603 \n",
"2 3623 4005 5393 4752 4325 3774 4331 \n",
"3 0 0 0 0 0 0 0 \n",
"4 1 0 0 0 0 1 1 \n",
"\n",
"[5 rows x 43 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_can.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's find out how many entries there are in our dataset."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(195, 43)\n"
]
}
],
"source": [
"# print the dimensions of the dataframe\n",
"print(df_can.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Clean up data. We will make some modifications to the original dataset to make it easier to create our visualizations. Refer to *Introduction to Matplotlib and Line Plots* and *Area Plots, Histograms, and Bar Plots* for a detailed description of this preprocessing."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data dimensions: (195, 38)\n"
]
}
],
"source": [
"# clean up the dataset to remove unnecessary columns (e.g. REG) \n",
"df_can.drop(['AREA', 'REG', 'DEV', 'Type', 'Coverage'], axis=1, inplace=True)\n",
"\n",
"# let's rename the columns so that they make sense\n",
"df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent','RegName':'Region'}, inplace=True)\n",
"\n",
"# for the sake of consistency, let's also make all column labels of type string\n",
"df_can.columns = list(map(str, df_can.columns))\n",
"\n",
"# set the country name as index - useful for quickly looking up countries using .loc method\n",
"df_can.set_index('Country', inplace=True)\n",
"\n",
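"# add total column (sum only the numeric year columns; newer pandas versions do not skip text columns silently)\n",
"df_can['Total'] = df_can.sum(axis=1, numeric_only=True)\n",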
"\n",
"# years that we will be using in this lesson - useful for plotting later on\n",
"years = list(map(str, range(1980, 2014)))\n",
"print('data dimensions:', df_can.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Visualizing Data using Matplotlib<a id=\"4\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Import `Matplotlib`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Matplotlib version: 3.1.1\n"
]
}
],
"source": [
"%matplotlib inline\n",
"\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"\n",
"mpl.style.use('ggplot') # optional: for ggplot-like style\n",
"\n",
"# check for latest version of Matplotlib\n",
"print('Matplotlib version: ', mpl.__version__) # >= 2.0.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Pie Charts <a id=\"6\"></a>\n",
"\n",
"A `pie chart` is a circular graphic that displays numeric proportions by dividing a circle (or pie) into proportional slices. You are most likely already familiar with pie charts, as they are widely used in business and media. We can create pie charts with the *pandas* `plot` method, which draws with Matplotlib, by passing in the `kind='pie'` keyword.\n",
"\n",
"Let's use a pie chart to explore the proportion (percentage) of new immigrants grouped by continents for the entire time period from 1980 to 2013. "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Gather data. \n",
"\n",
"We will use the *pandas* `groupby` method to summarize the immigration data by `Continent`. The general process of `groupby` involves the following steps:\n",
"\n",
"1. **Split:** Splitting the data into groups based on some criteria.\n",
"2. **Apply:** Applying a function to each group independently, for example:\n",
"   - `.sum()`\n",
"   - `.count()`\n",
"   - `.mean()`\n",
"   - `.std()`\n",
"   - `.aggregate()`\n",
"   - `.apply()`\n",
"3. **Combine:** Combining the results into a data structure."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Images/Mod3Fig4SplitApplyCombine.png\" height=400 align=\"center\">"
]
},
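{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three steps can be sketched on a small table before applying them to the immigration data. This is a minimal illustration, assuming a hypothetical toy dataframe `toy` with made-up country labels and values; only `groupby` and `sum` are the *pandas* methods described above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data (not the immigration dataset): three countries on two continents.\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        'Continent': ['Asia', 'Asia', 'Europe'],\n",
"        '1980': [10, 20, 5],\n",
"        '1981': [30, 40, 15],\n",
"    },\n",
"    index=['A', 'B', 'C'],\n",
")\n",
"\n",
"# Split: one group per distinct value of 'Continent'.\n",
"grouped = toy.groupby('Continent')\n",
"\n",
"# Apply + combine: sum each group's year columns into one row per continent.\n",
"totals = grouped.sum()\n",
"print(totals)\n",
"```\n",
"\n",
"Each distinct `Continent` value becomes one row of the result, which is the shape of output we want for the real data."
]
},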
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.groupby.generic.DataFrameGroupBy'>\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1980</th>\n",
" <th>1981</th>\n",
" <th>1982</th>\n",
" <th>1983</th>\n",
" <th>1984</th>\n",
" <th>1985</th>\n",
" <th>1986</th>\n",
" <th>1987</th>\n",
" <th>1988</th>\n",
" <th>1989</th>\n",
" <th>...</th>\n",
" <th>2005</th>\n",
" <th>2006</th>\n",
" <th>2007</th>\n",
" <th>2008</th>\n",
" <th>2009</th>\n",
" <th>2010</th>\n",
" <th>2011</th>\n",
" <th>2012</th>\n",
" <th>2013</th>\n",
" <th>Total</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Continent</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Africa</th>\n",
" <td>3951</td>\n",
" <td>4363</td>\n",
" <td>3819</td>\n",
" <td>2671</td>\n",
" <td>2639</td>\n",
" <td>2650</td>\n",
" <td>3782</td>\n",
" <td>7494</td>\n",
" <td>7552</td>\n",
" <td>9894</td>\n",
" <td>...</td>\n",
" <td>27523</td>\n",
" <td>29188</td>\n",
" <td>28284</td>\n",
" <td>29890</td>\n",
" <td>34534</td>\n",
" <td>40892</td>\n",
" <td>35441</td>\n",
" <td>38083</td>\n",
" <td>38543</td>\n",
" <td>618948</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Asia</th>\n",
" <td>31025</td>\n",
" <td>34314</td>\n",
" <td>30214</td>\n",
" <td>24696</td>\n",
" <td>27274</td>\n",
" <td>23850</td>\n",
" <td>28739</td>\n",
" <td>43203</td>\n",
" <td>47454</td>\n",
" <td>60256</td>\n",
" <td>...</td>\n",
" <td>159253</td>\n",
" <td>149054</td>\n",
" <td>133459</td>\n",
" <td>139894</td>\n",
" <td>141434</td>\n",
" <td>163845</td>\n",
" <td>146894</td>\n",
" <td>152218</td>\n",
" <td>155075</td>\n",
" <td>3317794</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Europe</th>\n",
" <td>39760</td>\n",
" <td>44802</td>\n",
" <td>42720</td>\n",
" <td>24638</td>\n",
" <td>22287</td>\n",
" <td>20844</td>\n",
" <td>24370</td>\n",
" <td>46698</td>\n",
" <td>54726</td>\n",
" <td>60893</td>\n",
" <td>...</td>\n",
" <td>35955</td>\n",
" <td>33053</td>\n",
" <td>33495</td>\n",
" <td>34692</td>\n",
" <td>35078</td>\n",
" <td>33425</td>\n",
" <td>26778</td>\n",
" <td>29177</td>\n",
" <td>28691</td>\n",
" <td>1410947</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Latin America and the Caribbean</th>\n",
" <td>13081</td>\n",
" <td>15215</td>\n",
" <td>16769</td>\n",
" <td>15427</td>\n",
" <td>13678</td>\n",
" <td>15171</td>\n",
" <td>21179</td>\n",
" <td>28471</td>\n",
" <td>21924</td>\n",
" <td>25060</td>\n",
" <td>...</td>\n",
" <td>24747</td>\n",
" <td>24676</td>\n",
" <td>26011</td>\n",
" <td>26547</td>\n",
" <td>26867</td>\n",
" <td>28818</td>\n",
" <td>27856</td>\n",
" <td>27173</td>\n",
" <td>24950</td>\n",
" <td>765148</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Northern America</th>\n",
" <td>9378</td>\n",
" <td>10030</td>\n",
" <td>9074</td>\n",
" <td>7100</td>\n",
" <td>6661</td>\n",
" <td>6543</td>\n",
" <td>7074</td>\n",
" <td>7705</td>\n",
" <td>6469</td>\n",
" <td>6790</td>\n",
" <td>...</td>\n",
" <td>8394</td>\n",
" <td>9613</td>\n",
" <td>9463</td>\n",
" <td>10190</td>\n",
" <td>8995</td>\n",
" <td>8142</td>\n",
" <td>7677</td>\n",
" <td>7892</td>\n",
" <td>8503</td>\n",
" <td>241142</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 35 columns</p>\n",
"</div>"
],
"text/plain": [
" 1980 1981 1982 1983 1984 1985 \\\n",
"Continent \n",
"Africa 3951 4363 3819 2671 2639 2650 \n",
"Asia 31025 34314 30214 24696 27274 23850 \n",
"Europe 39760 44802 42720 24638 22287 20844 \n",
"Latin America and the Caribbean 13081 15215 16769 15427 13678 15171 \n",
"Northern America 9378 10030 9074 7100 6661 6543 \n",
"\n",
" 1986 1987 1988 1989 ... 2005 \\\n",
"Continent ... \n",
"Africa 3782 7494 7552 9894 ... 27523 \n",
"Asia 28739 43203 47454 60256 ... 159253 \n",
"Europe 24370 46698 54726 60893 ... 35955 \n",
"Latin America and the Caribbean 21179 28471 21924 25060 ... 24747 \n",
"Northern America 7074 7705 6469 6790 ... 8394 \n",
"\n",
" 2006 2007 2008 2009 2010 \\\n",
"Continent \n",
"Africa 29188 28284 29890 34534 40892 \n",
"Asia 149054 133459 139894 141434 163845 \n",
"Europe 33053 33495 34692 35078 33425 \n",
"Latin America and the Caribbean 24676 26011 26547 26867 28818 \n",
"Northern America 9613 9463 10190 8995 8142 \n",
"\n",
" 2011 2012 2013 Total \n",
"Continent \n",
"Africa 35441 38083 38543 618948 \n",
"Asia 146894 152218 155075 3317794 \n",
"Europe 26778 29177 28691 1410947 \n",
"Latin America and the Caribbean 27856 27173 24950 765148 \n",
"Northern America 7677 7892 8503 241142 \n",
"\n",
"[5 rows x 35 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# group countries by continents and apply sum() function \n",
"df_continents = df_can.groupby('Continent').sum(numeric_only=True)\n",
"\n",
"# note: the output of the groupby method is a `groupby` object. \n",
"# we cannot use it further until we apply a function (e.g. .sum())\n",
"print(type(df_can.groupby('Continent')))\n",
"\n",
"df_continents.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Plot the data. We will pass in the `kind='pie'` keyword, along with the following additional parameters:\n",
"- `autopct` - a string or function used to label the wedges with their numeric value. The label will be placed inside the wedge. If it is a format string, the label will be `fmt%pct`.\n",
"- `startangle` - rotates the start of the pie chart by the given angle in degrees, counterclockwise from the x-axis.\n",
"- `shadow` - draws a shadow beneath the pie (to give a 3D feel)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x432 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# autopct adds percentage labels; startangle sets where the first wedge starts\n",
"df_continents['Total'].plot(kind='pie',\n",
" figsize=(5, 6),\n",
" autopct='%1.1f%%', # add in percentages\n",
" startangle=90, # start angle 90° (Africa)\n",
" shadow=True, # add shadow \n",
" )\n",
"\n",
"plt.title('Immigration to Canada by Continent [1980 - 2013]')\n",
"plt.axis('equal') # Sets the pie chart to look like a circle.\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The above visual is not very clear: the numbers and text overlap in some instances. Let's make a few modifications to improve the visuals:\n",
"\n",
"* Remove the text labels on the pie chart by passing in `labels=None`, and add them as a separate legend using `plt.legend()`.\n",
"* Push out the percentages to sit just outside the pie chart by passing in the `pctdistance` parameter.\n",
"* Pass in a custom set of colors for continents by passing in the `colors` parameter.\n",
"* **Explode** the pie chart to emphasize the lowest three continents (Africa, Northern America, and Oceania) by passing in the `explode` parameter.\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"colors_list = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']\n",
"explode_list = [0.1, 0, 0, 0, 0.1, 0.1] # ratio for each continent with which to offset each wedge.\n",
"\n",
"df_continents['Total'].plot(kind='pie',\n",
" figsize=(15, 6),\n",
" autopct='%1.1f%%', \n",
" startangle=90, \n",
" shadow=True, \n",
" labels=None, # turn off labels on pie chart\n",
" pctdistance=1.12, # the ratio between the center of each pie slice and the start of the text generated by autopct \n",
" colors=colors_list, # add custom colors\n",
" explode=explode_list # 'explode' lowest 3 continents\n",
" )\n",
"\n",
"# scale the title up by 12% to match pctdistance\n",
"plt.title('Immigration to Canada by Continent [1980 - 2013]', y=1.12) \n",
"\n",
"plt.axis('equal') \n",
"\n",
"# add legend\n",
"plt.legend(labels=df_continents.index, loc='upper left') \n",
"\n",
"plt.show()"
]
},
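{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted earlier, `autopct` accepts a function as well as a format string. The sketch below is a hedged illustration of the function form, assuming hypothetical `values` and a hypothetical helper named `pct_and_count`; it calls Matplotlib's `Axes.pie` directly rather than the *pandas* wrapper used in this lab.\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Hypothetical wedge sizes (not taken from the immigration dataset).\n",
"values = [60, 30, 10]\n",
"\n",
"def pct_and_count(pct):\n",
"    # Matplotlib calls this with each wedge's percentage of the total.\n",
"    count = round(pct * sum(values) / 100)\n",
"    return f'{pct:.1f}% ({count})'\n",
"\n",
"fig, ax = plt.subplots(figsize=(5, 5))\n",
"ax.pie(values, autopct=pct_and_count, startangle=90)\n",
"ax.axis('equal')  # keep the pie circular\n",
"plt.show()\n",
"```\n",
"\n",
"A function is useful when the label should show more than the percentage, for example the underlying count."
]
},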
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"**Question:** Using a pie chart, explore the proportion (percentage) of new immigrants grouped by continents in the year 2013.\n",
"\n",
"**Note**: You might need to play with the explode values in order to fix any overlapping slice values."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"### type your answer here\n",
"colors_list = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']\n",
"explode_list = [0, 0, 0, 0.1, 0.1, 0.1] # ratio for each continent with which to offset each wedge.\n",
"\n",
"df_continents[\"2013\"].plot(kind='pie', figsize=(15, 6), explode=explode_list, colors=colors_list, autopct='%1.1f%%', pctdistance=1.12, startangle=90, \n",
" shadow=True, \n",
" labels=None )\n",
"plt.title('Immigration to Canada by Continent in 2013', y=1.12) \n",
"\n",
"plt.axis('equal') \n",
"plt.legend(labels=df_continents.index, loc='upper left') \n",
"plt.show()\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"explode_list = [0.1, 0, 0, 0, 0.1, 0.2] # ratio for each continent with which to offset each wedge.\n",
"-->\n",
"\n",
"<!--\n",
"df_continents['2013'].plot(kind='pie',\n",
" figsize=(15, 6),\n",
" autopct='%1.1f%%', \n",
" startangle=90, \n",
" shadow=True, \n",
" labels=None, # turn off labels on pie chart\n",
" pctdistance=1.12, # the ratio between the pie center and start of text label\n",
" explode=explode_list # 'explode' lowest 3 continents\n",
" )\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # scale the title up by 12% to match pctdistance\n",
"plt.title('Immigration to Canada by Continent in 2013', y=1.12) \n",
"plt.axis('equal') \n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # add legend\n",
"plt.legend(labels=df_continents.index, loc='upper left') \n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # show plot\n",
"plt.show()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Box Plots <a id=\"8\"></a>\n",
"\n",
"A `box plot` is a way of statistically representing the *distribution* of the data through five main dimensions: \n",
"\n",
"- **Minimum:** Smallest number in the dataset.\n",
"- **First quartile:** Middle number between the `minimum` and the `median`.\n",
"- **Second quartile (Median):** Middle number of the (sorted) dataset.\n",
"- **Third quartile:** Middle number between `median` and `maximum`.\n",
"- **Maximum:** Highest number in the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Images/boxplot_complete.png\" width=440, align=\"center\">"
]
},
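{
"cell_type": "markdown",
"metadata": {},
"source": [
"The five values a box plot summarizes can also be computed directly. This is a minimal sketch, assuming a hypothetical array `data`; note that `np.percentile` interpolates linearly between data points by default, so its quartiles can differ slightly from the \"middle number\" definitions above.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical sample (not taken from the immigration dataset).\n",
"data = np.array([2, 4, 4, 5, 7, 9, 10, 12])\n",
"\n",
"minimum = data.min()\n",
"# 25th, 50th, and 75th percentiles: first quartile, median, third quartile.\n",
"q1, median, q3 = np.percentile(data, [25, 50, 75])\n",
"maximum = data.max()\n",
"\n",
"print(minimum, q1, median, q3, maximum)\n",
"```\n",
"\n",
"The box in a box plot spans `q1` to `q3`, with a line drawn at `median`."
]
},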
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"To make a `box plot`, we can use `kind='box'` in the `plot` method invoked on a *pandas* series or dataframe.\n",
"\n",
"Let's draw a box plot for Japanese immigrants between 1980 and 2013."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Get the dataset. Even though we are extracting the data for just one country, we will obtain it as a dataframe. This will help us with calling the `dataframe.describe()` method to view the percentiles."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Country</th>\n",
" <th>Japan</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1980</th>\n",
" <td>701</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1981</th>\n",
" <td>756</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1982</th>\n",
" <td>598</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1983</th>\n",
" <td>309</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1984</th>\n",
" <td>246</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Country Japan\n",
"1980 701\n",
"1981 756\n",
"1982 598\n",
"1983 309\n",
"1984 246"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# to get a dataframe, place extra square brackets around 'Japan'.\n",
"df_japan = df_can.loc[['Japan'], years].transpose()\n",
"df_japan.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Plot by passing in `kind='box'`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAf4AAAF2CAYAAACPjPqQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3debwkZXno8d/rjCKIqHgEmRkMKKAC7oqYi4ghKnEBY+ILVwSiBMSAuN0YiF4l5Oo1V8MN1wQNuAAKwuPKJKIoKGERIWiiBlDEgDCLAxMWZZFlqPvH+x5pevqcqeF0nz5n6vf9fPrT3W9tT3VX11PvW29Vp6ZpkCRJ3fCwcQcgSZJmj4lfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATv9ZLSum6lNL7xrj8rVNK56WU7kgpeS3qPJJSOjmldO644xiFlNLClNKnU0r/lVJqUkp7jDsmaSom/jmk7hibnsdtKaVLUkqvHHdsD1VKacmQd4R/CWwBPBvYaopl7lGXuWRIy9wgpJSOSSldM8YQ3g68fozLX0tK6dyU0slDmNUfAW8AXkPZLr87hHnOWEppp5TSF1JKP0sp3Z9S+uQU4/1pSunHKaU7U0rX123lYX3j/F5K6cK6X7o5pfStlNLz+8Z5dErppHoAdEdK6esppacMaV3elFL6TkrpppTSr1NK308p7T9gvB1SSufUdVmdUvpESulRPcMfmVL6TErp31JK9wz6TaSUNk4p/XP9LH6TUlqVUvpqSmnHYazLuJn4554LKTuOrYBdgR8AXx3Wj2cDsD1wWdM0P2ua5pfjDkbtNU1zW9M0t8xkHimlh6WUFgwrpiHaHljeNM13m6b5ZdM09/SPkFJ6xBji2gS4HjgW+OGgEVJKhwAfAz4K7AwcARwG/HXPOE8C/hn4d+AFwO7AbcA5vUkV+CywJ/DHwG5AAr6VUtp4COuyJ7AUeCXwHOAM4LMppX174twUOA+4D/hdIAN7AZ/qmc8C4B7gxDqPQRrgW3X6pwKvAhYC56WUHjmEdRmvpml8zJEHcDJwbl/Zoykb4R/2lf0jcBPwG+By4OU9wzNlw96lp+zAOu5zpll+Q6mVfQm4A1gBvKtvnOuA961HLE3f47pplr++8zp5ivnsUYcvqe8TcBLwc+Au4D+BDwEb9UxzDHANpdb2n3X55wLb9oyzLfDl+rncCfwYOKBv2ecDnwT+J/BL4Ob6vT6qb7z9KDvR39TP9LjecSg7zYuBX9fHD4FX9Azfss73pjr8YmD3dWxfxwDXDFjnDPysrtNXgc2A1wE/rfP+IvCY/u0UeBuwDLi9rvPDKQnjF8AtlB3rI6bavikVjw/VdbidshN+B3DfgBj3BX5C2aHvDDwX+DpwY532X4G9BmyrxwLH1+9hFSW5LeiJp3+b2qMO+8u6Hdxd4zsH2HiKz/V8BmzjtfxTlAS6Erip5Xa+TZ3PG+py76zr/hJgMXA25fd5JfDi9di/nA98ckD5RcAn+sreWZfxqPr+tTWmR/eM84xa9qz6fof6vnddHlc/wz8Z0T7zn4Av9bw/lPIb791eX1Xj2nbA9MfQ85tYx7Ke1bu+8/kx9gB89HwZa+8YHwG8q+4cfqen/At1p/YK4Ol1x3YP8LSecSYT3Wb1B/lr4G3rWH5Td5Bvq9O8nbKjfV3PONfx4MQ/bSyUI/OGkkieCDxhmuWva15PpDShnlZfP2aK+ezBgxP/w4D/BbyQslPdm7Ij/queaY6pO7qLKDWaFwCXUhJuquM8AzgceCbwlPo53Qe8tGc+5wO3Av8XeBqltnFr37L+hJIYDwCeTKk9/Qj4bB2+oH4Px1FqktsDf0jdyQMbU3b6XwKeD2wHvJeyg336NJ/vMayd+O8AvlbX6SWUZPRNSnJ5FvBiSsL8m77t9DbglPo97U3ZRs8GTgV2BF5N2QG/dZrt+12UpH1AXcd31fXuT/x3Av
9CaQHbgZI49wAOqsvaoX6/9wA79G2rtwBH1fnvW7+vN9XhjwEuAM6kbE9PpPzmXgf8itJs/yTKaaV3MHXi35xyQHEtPdt43RZ+DXyixvmMltv5NpTt9+eUhLsD8BXKAee5dVvYoX7/NwAPb7l/OZ/Bif9y4O/6yg6rMbykvt+6fg/vphzgbQz8HeWgbKM6zpvqeizom9eFg5Y7pH3mBcA/9rw/Bfh23zgPB9YAb1zXb2Ka5Tya0iqyDNhkFOsym4+xB+Cj58soO8b7KDvD24H763PuGWe7+oN8Zd+0PwA+3fN+Y+AKIIB/A77aYvkNNfn0lJ0OXNTz/jpq4m8TC7CEnprUNMtuu14Dd1590+xBT+KfYpx3Aj/reX9MnWa7nrLJGszvTzOfs4CT+uL7Ud84nwAu6fsMD+sbZ/e6rMfVx5SfGeXAYRmwsK/82/TtwPuGP2gnV9/fB0z0lP1D3Uk+oafseODyvu30Rh5cm/8asJoHt6KcBXyxb7rexL8c+Ou+GM9g7cR/P/CkFtvvD4H39n3OS/vG+Qbw+Z7359LXclS3jatpmVAHfbY928LVwMPWZzvngcT/jp7hL6hl7+4pmzyo3rlljOczOPH/NeWA679RWseeTmntaYD/3jPerpTTBvfV7+QnwJN7hv8lsGLA/L8AfK3tZ7ken/kbKQcaz+0p+yZw+oBxbwL+vM331jf8byj74Aa4ip79w3x+eI5/7rmUUsN4NqU581jglJTSK+rwyc4lF/RNdwGw0+SbpmnuotRwXkfpDPfmlsu/pO/9xT3L7NcqlpaGOa+1pJQOSSldWjvp3A78b+B3+ka7qWma33b0aZrmakoy27HOY5OU0odTSlfUzk23U8439s/n3/veL6c0zZNSekId/7iU0u2TD0qzNZQdyy2UpvNzaueoo1JKT+2Z3wsoNctb++bxYkrNdn0sb5pmdc/7XwK/bJrmpr6yLfqmu6p58HnsXwI/bZrm7nVMB0BKaTNgEfC9vkH92x/AqqZpru+b/gkppRNSSj9JKd1a138n1uO7mEZQaom/qB1uD0gpPXod00zl+03T3N/zfn22895z8pP9WX40oGzgZ7we/hdlnb8D3EupoX+uDlsDkFLaAvgM5Rz7CykHCVcBZ7f8bJqpBvRuwymlr081Xt80+1BaNQ9umuYHbaaZLoZpfIRygPVSSgvMV2awLcwZC8cdgNZyV2/yAf49pbQnpSn3nGmmS6y9Ye9Wnx9L2Tnc/BDiSQ9xmofyIxvJvFJKr6fUZI+iNBn/itK7/IMtlz/pI8A+lObOn1Cayf+W0mTcq79jV8MDHWknn99O2dH2WwbQNM0hKaXjgZcDLwP+OqV0RNM0/1jncRWlybffnS3Wqde9A2IdVNZfSXio001KPeOsyx0Dyk6mNMO/h9LEfheltaC/A91038VATdMsTyk9jbKz/z1Kf42/SSm9sGmaG1rEu67YBxm0nfd+ns00ZTOqwNWDtcNSSkdQDihXUbY5KMkOSoc/mqY54rcBp7Qf5VTKvpQD1ZXAREppQdM0a3oWsSWl5WMqz+55fde64q3LPRk4pGmaz/YNXkk5LdE7/sMpp2PWuzNwPSheDfwspfRd4L+A/SmtePOWNf754T5K71wozfdQmoZ7vbhnGCmlnSjniN9CqU2ekVLaqMWydu17/yJKkhmkTSyTO9519cRutV4P0e7AvzVNc1zTNN9vmuZnlObUfk/ovXoipbQD8HgeWP/dgdOapjmzaZofUjp/7bA+gTRNs4pyXvapTdNcM+Dxm55x/6PG/AeUTmKH1kGXU/oG/GrA9CvWJ55xaZrmNso56xf1Derf/qayO3BC0zRLm6b5MWWH/+SHEMo9DNg2m6a5u2mabzRN8x5K345NKOfbZ2qU2/mMNE1zX9M0y5qmuZfSsfBaymlCgEdRmvd73V8fkwdxF1NaSn5vcoSU0mMpLQQXTbPc3u13+XQx1isQTgYOGpD0J2N4UW1RmvQySq67eL
p5t5SANvvROc0a/9zziJTSE+vrR1E6AL0C+ABA0zQ/Tyl9ATghpfQWSg/qt1J6Or8BynWqlNrP0qZpPpVS+jKlyfOjlA5p03l1PfI/h9IxbV9KD/S1tImFcrR8O/DylNIVwN3NgEu6Ws7rofopcHBtHvwPSsez1w0Y707gMymld1J+4B+j9Nw/t2c++6SUvlTX6V2U5upV6xnPe4FPpZRupfSiv5dyXvUPmqZ5S0ppO+AQSo/lG+oyXkw5Dwylc+M7ga+llN5LqU1tSdnhXtU0zVfXM55x+Vvgr1JKPwEuo/S+fjntWgF+CuyfUrqIkriPZd0Hl4NcC7y0HvDdVh8HUhLFZZSOmXtSOndd+RDm/yAj3s7XUi8hnDy9sCmweUrp2cA9TdNcWcfZjtJ0fwllPQ+m/O5f03OaYinwzpTShylN/o+gtKA1lMveaJrm6pTSWcDHU0oHUz7LD1FOr5w5hHV5J6XV7XDgX3r2k/c0TTPZmnk6pYXm9Prb2JzS2ndm0zTX9sxrx7oOT6TscydbHa5smuaeVO478nRKZ+JbKK0If0E50PnKTNdl7MbdycDHAw/WvrzoTkot4H/w4A5Cm/HA5UB3s/blQB+n1EZ7L2nZjZJg9p5m+Q2l9/JX67JX0tchhrV79U8bSx3nQMoO9l6mv5yvzbzOZ92d+36vrssT6/uH1/neTGnmP53SdNn0THMMpYfyG+s63k3pLPeUnnG2phwQ3VE/m7+i1MTPny4+4H39602pPV5SP+dfUQ7M3l+HbUW5bHBZjWMF5Xxm7/f5+Po9L6fUWpdTdkjTXa55DAMu52sR61HAsr7ttP+y00/2fg617BM8uGPog6ajJNf/zQMHh2dQOoj9eroYa/kzKDvlu+r39Wf0ddSjb1sdFCelleACHujAtQfloHByh38n5WDx4HVsc4M+y7W2hZa/321qLLv1lK3VSZaStNbV+XRyXv2P63rG2aHGcEf9HL7DgEtDKTcpuoyyvd5M+X3s1jfOo+tnfHP97L7BkDrE1e9z0Lr0b3dPpXTyu5PSNP+PrH057VTz2qYOfyHltOB/1e/oF5R+DzsOY13G/Zi8TEkilVvgHtA0zefWOfIcllJ6A+Wyso2aB59rnG6aYyiX+2w3ytg0vZTSpynXST9v3LFIGyqb+rXBqHcHm7y+/tttk77GI6W0iNJB8TuU3uOvobQOHTHddJJmxs592pDsS2mi/RXlBiSa29ZQrq64iNKJ7EDKDX/mdY9paa6zqV+SpA6xxi9JUoeY+CVJ6pCudO7zfIYkqWsG3nm1K4mfFSvmxQ3NpA3KxMQEq1evXveIkoZq0aJFUw6zqV+SpA4x8UuS1CEmfkmSOsTEL0lSh5j4JUnqEBO/JEkdYuKXJKlDTPySJHWIiV+SpA4x8UuS1CEmfkmSOsTEL0lSh3TmT3okrZ/FixePO4TfWr58+bhDkDYYJn5JAw0j2a45ZG8WnLR0CNFIGhab+iVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHbJwNhaSc/408GrgxojYuZZ9BHgNcA/wc+BNEXFrHXY0cDCwBjgyIs6p5c8DTgY2Bs4G3h4RzWysgyRJG4LZqvGfDOzVV/YtYOeIeCZwNXA0QM55R2A/YKc6zQk55wV1mo8DhwLb10f/PCVJ0jRmJfFHxAXAzX1l34yI++rb7wFL6ut9gDMi4u6IuBa4Btgl57wVsFlEXFJr+acCr52N+CVJ2lDMSlN/C28GzqyvF1MOBCYtq2X31tf95QPlnA+ltA4QEUxMTAwzXkktrAJ/e9IcM/bEn3N+L3AfcFotSgNGa6YpHygiTgROnBxv9erVMwlT0kPkb0+afYsWLZpy2FgTf875IEqnvz17OuktA7buGW0JsKKWLxlQLkmSWh
pb4s857wX8BfCSiLizZ9BS4PSc83HAIkonvssiYk3O+dc5512BS4EDgY/NdtySJM1ns3U53+eBPYCJnPMy4AOUXvwbAd/KOQN8LyIOi4grcs4BXEk5BXB4RKyps3orD1zO9/X6kCRJLaWm6cRl8M2KFZ4VkGbbmkP2ZsFJS8cdhtQ59Rz/oL5x3rlPkqQuMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjokNU0z7hhmQ7NixYpxxyDNmjVvfwPcefu4w5gbNtmUBcefPu4opFm1aNEigDRo2MLZDUXSrLjzdhactHTcUTAxMcHq1avHGsOaQ/Ye6/KlucamfkmSOsTEL0lSh5j4JUnqEBO/JEkdYuKXJKlDTPySJHWIiV+SpA4x8UuS1CEmfkmSOsTEL0lSh5j4JUnqEBO/JEkdYuKXJKlDTPySJHWIiV+SpA4x8UuS1CEmfkmSOmThbCwk5/xp4NXAjRGxcy3bHDgT2Aa4DsgRcUsddjRwMLAGODIizqnlzwNOBjYGzgbeHhHNbKyDJEkbgtmq8Z8M7NVXdhRwXkRsD5xX35Nz3hHYD9ipTnNCznlBnebjwKHA9vXRP09JkjSNWUn8EXEBcHNf8T7AKfX1KcBre8rPiIi7I+Ja4Bpgl5zzVsBmEXFJreWf2jONJElqYVaa+qewZUSsBIiIlTnnLWr5YuB7PeMtq2X31tf95QPlnA+ltA4QEUxMTAwxdGluWwVzYptfuHDh2OOYK5+FNFeMM/FPJQ0oa6YpHygiTgROnBxv9erVQwhNmj/mwjY/MTExJ+KYCzFIs2nRokVTDhtnr/5Vtfme+nxjLV8GbN0z3hJgRS1fMqBckiS1NM7EvxQ4qL4+CDirp3y/nPNGOedtKZ34LqunBX6dc94155yAA3umkSRJLTykpv6c88bAmoi4p+X4nwf2ACZyzsuADwAfBiLnfDBwPfB6gIi4IuccwJXAfcDhEbGmzuqtPHA539frQ5IktZSaZt2XweecPwpERFyWc34V8EXK+fV9I+KfRhzjMDQrVnhWQN2x5pC9WXDS0nGHMSfO8c+Vz0KaTfUc/6C+ca2b+vcH/qO+fj/wRmBv4EMzDU6SJM2etk39m0TEnTnnxwNPjogvAeScf2d0oUmSpGFrm/ivzjnvD2wHfAsg5zwB3DWqwCRJ0vC1Tfx/BhwP3EO5hz7AK4BvjiIoSZI0Gm0T/w0R8bu9BRFxWs75vBHEJEmSRqRt576rpyi/cliBSJKk0Wub+Ne6JCDnvBlw/3DDkSRJozRtU3/O+QbK9fob55yv7xv8eODzowpMkiQN37rO8b+RUts/Gzigp7wBVkXET0cVmCRJGr5pE39E/AuUS/ci4s7ZCUmSJI1K217999X/t382sGnvgIg4cOhRSZKkkWib+E8Fngn8E7BqdOFIkqRRapv4XwFsGxG3jjIYSZI0Wm0v57se2GiUgUiSpNFbn6b+s3LOx9PX1B8R3x56VJIkaSTaJv4j6nP/3/A2wJOHF44kSRqlVok/IrYddSCSJGn02p7jlyRJG4BWNf56X/5jgJcAE/Tcuz8injSSyCRJ0tC1rfGfADwXOBbYHHgbpaf//x1RXJIkaQTaJv6XA38UEWcBa+rzvjz4/v2SJGmOa9ur/2HAbfX17TnnxwIrge1GEpWkGTn790+FM+fC/bbmQAy/fyqvGXcM0hzSNvH/kHJ+/zzgQuAfgNuBq0cUl6QZeOW5B7LgpKXjDoOJiQlWr1491hjWHLI37Dv+z0KaK9o29R8CXFdfHwncBTwW8A96JEmaR9ZZ4885LwD+BPggQETcBPzpaMOSJEmjsM4af0SsAQ4H7h
19OJIkaZTaNvWfAhw2ykAkSdLote3ctwvwtpzze4AbKPfoByAidh9FYJIkafjaJv6T6kOSJM1jbf+k55RRByJJkkav7b363zzFoLuBZcD3IuLuoUUlSZJGom1T/4HAi4BVlES/BNgSuBzYBiDnvE9EXD6CGCVJ0pC0TfxXAF+OiP83WZBzPgJ4GrAb8F7gY5SDA0mSNEe1vZzvDcDf95V9HNg/IhrgI8COwwxMkiQNX9vEvwrW+p+LVwE31tePxBv8SJI057Vt6j8S+ELO+T8o1/FvDewMvL4OfyGlqV+SJM1hbS/n+2bO+SnAHwCLgLOBr0XEf00OB745siglSdJQtK3xExGrgc+OMBZJkjRiUyb+nPM3ImKv+vpCem7T28tb9kqSNH9MV+M/tef1J0cdiCRJGr0pE39EnN7z2lv2SpK0AWh9jj/n/GLgOcCmveUR8aFhByVJkkaj7b36PwZk4ELgrp5BA8/7S5KkualtjX9/YOeIWDHKYCRJ0mi1vXPfDZR/4pMkSfNY2xr/wcBJOefPU27f+1sRccFMAsg5vxP4U8ppgx8DbwI2Ac6k/PPfdUCOiFvq+EfXeNYAR0bEOTNZviRJXdK2xv88yl37Pg6c1vP43EwWnnNeTLkd8PMjYmdgAbAfcBRwXkRsD5xX35Nz3rEO3wnYCzgh57xgJjFIktQlbWv8HwJeExHnjiiGjXPO91Jq+iuAo4E96vBTgPOBvwD2Ac6IiLuBa3PO1wC7AJeMIC5JkjY4bWv8dwAzatIfJCKWAx8FrgdWArfV+/5vGREr6zgrgS3qJIsp/Q0mLatlkiSphbY1/vcDf5dzPpYH/ooXgIi4/6EuPOf8OEotflvgVso/AL5xmknSgLKBlxTmnA8FDq0xMjEx8VDDlOadVTAntvmFCxeOPY658llIc0XbxP/p+vyWnrJESbozOcf++8C1EXETQM75y8DvAqtyzltFxMqc81Y8cLCxjPKXwJOWUE4NrCUiTgROrG+b1atXzyBMaf6ZC9v8xMTEnIhjLsQgzaZFixZNOaxt4t92OKGs5Xpg15zzJpQbA+0JXE45tXAQ8OH6fFYdfylwes75OMrfA28PXDai2CRJ2uC0SvwR8YtRLDwiLs05fxH4AXAf8G+UWvqmQOScD6YcHLy+jn9FzjmAK+v4h0fEmlHEJknShig1zbrvuptzfgzlsrtB9+p/+WhCG6pmxQpvOqjuWHPI3iw4aem4w5gTTf1z5bOQZlNt6h/UL651U/8XKOfyv8KD79UvSZLmkbaJf1fg8RFx7yiDkSRJo9X2Ov6LgKePMhBJkjR6bWv8fwKcnXO+lLXv1X/ssIOSJEmj0Tbxf5By/fx1wGY95evuGShJkuaMtol/P2CHydvoSpKk+antOf7/BOzYJ0nSPNe2xv9ZYGnO+WOsfY7/20OPSpIkjUTbxH94ff5QX3kDPHl44UiSpFFqe8veUd2rX5IkzaK25/glSdIGYNoaf875QtZxyV5E7D7UiCRJ0sisq6n/k7MShSRJmhXTJv6IOGW2ApEkSaPnOX5JkjrExC9JUoeY+CVJ6pApE3/O+Xs9rz8wO+FIkqRRmq7Gv0PO+ZH19btnIxhJkjRa0/XqPwu4Oud8HbBxzvmCQSN5Hb8kSfPHlIk/It6Uc94N2AZ4AfCp2QpKkiSNxrqu478IuCjn/Aiv6Zckaf5r+yc9n845vxQ4AFgMLAc+51/ySpI0v7S6nC/n/KfAmcAvgS8DK4HTc86HjDA2SZI0ZK1q/MB7gJdFxA8nC3LOZwJfAk4aRWCSJGn42t7A5/HAlX1lPwU2H244kiRplNom/ouA43LOmwDknB8FfAT47qgCkyRJw9c28R8GPBO4Lee8CrgVeBbwllEFJkmShq9tr/6VwEtyzkuARcCKiFg20sgkSdLQte3cB0BN9iZ8SZLmKf+dT5KkDjHxS5LUIets6s85PwzYA7goIu4ZeUSSJGlk1lnjj4j7gbNM+pIkzX9tm/
ovyDnvOtJIJEnSyLXt1f8L4Os557OAG4BmckBEvH8UgUmSpOFrm/g3Br5aXy8ZUSySJGnE2t7A502jDkSSJI1e6xv45JyfDvwxsGVEHJFzfiqwUUT8aGTRSZKkoWrVuS/n/HrgAmAxcGAtfjRw3IjikiRJI9C2V/+xwMsi4jBgTS37IeWPeiRJ0jzRNvFvQUn08ECP/qbntSRJmgfaJv7vAwf0le0HXDbccCRJ0ii17dx3JPDNnPPBwKNyzucAOwAvH1lkkiRp6FrV+CPiJ8DTgH8A3gd8BnhGRPxshLFJkqQha/3vfBFxJ3AxcD5wYUTcPqqgJEnSaLRq6s85Pwk4DdgVuAV4XM75UmD/iPjFTALIOT8W+CSwM6Wz4JuBnwJnAtsA1wE5Im6p4x8NHEy5uuDIiDhnJsuXJKlL2tb4T6F08HtsRGwBPA7411o+U8cD34iIp1EuD7wKOAo4LyK2B86r78k570jpVLgTsBdwQs55wRBikCSpE9om/ucBfx4RdwDUZv6/qOUPWc55M2B34FN1vvdExK3APjxwUHEK8Nr6eh/gjIi4OyKuBa4BdplJDJIkdUnbXv3foyTYi3vKng9cMsPlPxm4CfhMzvlZlFaFt1NuC7wSICJW5py3qOMvrrFMWlbL1pJzPhQ4tM6DiYmJGYYqzR+rYE5s8wsXLhx7HHPls5DmiikTf8752J63PwfOzjl/jfK3vFsDrwROH8Lynwu8LSIuzTkfT23Wn0IaUDbwJkIRcSJw4uQ4q1evnlGg0nwzF7b5iYmJORHHXIhBmk2LFi2acth0Tf1b9zweCXwZuJtyF7+7ga/U8plYBiyLiEvr+y9SDgRW5Zy3AqjPN/aMv3XP9EuAFTOMQZKkzpiyxj8bf8UbEb/MOd+Qc35qRPwU2BO4sj4OAj5cn8+qkywFTs85HwcsArbHuwdKktTa+vwt7ybAdsCmveUR8d0ZxvA24LSc8yOA/wTeRGmJiHqnwOuB19dlXZFzDsqBwX3A4RGxZvBsJUlSv7bX8R8I/D1wD3BXz6AGeNJMAoiIf6d0FOy35xTjfxD44EyWKUlSV7Wt8f8f4I8i4lujDEaSJI1W2+v476HcqleSJM1jbRP//wSOyzl7MawkSckCK20AAAiaSURBVPNY26b+q4FjgT/LOU+WJaCJCG+ZK0nSPNE28X8WOJXyxzl3rWNcSZI0R7VN/I8H3h8RA++SJ0mS5oe25/g/AxwwykAkSdLota3x7wIckXN+L+U/L34rInYfelSSJGkk2ib+k+pDkiTNY60Sf0ScMupAJEnS6LW9Ze+bpxoWEZ8eXjiSJGmU2jb193fseyLwFOBiwMQvSdI80bap/6X9ZbUV4OlDj0iSJI1M28v5BjkZOHhIcUiSpFnQ9hx//wHCJsAbgVuHHpGkoVhzyN7jDuHB1/6OyyabjjsCaU5pe47/PqD/rn3LgUOGG46kYVhw0tJxhwCUg4+5Eoukom3i37bv/R0RsXrYwUiSpNFq27nvF6MORJIkjd60iT/n/B3WbuLv1UTEnsMNSZIkjcq6avyfm6J8MXAkpZOfJEmaJ6ZN/BHxqd73OefHA0dTOvWdCRw7utAkSdKwtb2cbzPgz4EjgH8GnhsRPx9lYJIkafjWdY5/Y+AdwLuB84HdIuKKWYhLkiSNwLpq/NcCC4D/A1wObJlz3rJ3hIj49ohikyRJQ7auxP8bSq/+t04xvAGePNSIJEnSyKyrc982sxSHJEmaBTP5kx5JkjTPmPglSeoQE78kSR1i4pckqUNM/JIkdYiJX5KkDjHxS5LUISZ+SZI6xMQvSVKHmPglSeoQE78kSR1i4pckqUNM/JIkdYiJX5KkDjHxS5LUISZ+SZI6xMQvSVKHmPglSeoQE78kSR2ycNwBAOScFwCXA8sj4tU5582BM4FtgOuAHBG31HGPBg4G1gBHRsQ5YwlakqR5aK7U+N8OXNXz/ijgvIjYHjivvifnvCOwH7ATsBdwQj1okC
RJLYw98eeclwCvAj7ZU7wPcEp9fQrw2p7yMyLi7oi4FrgG2GW2YpUkab4be+IH/g54D3B/T9mWEbESoD5vUcsXAzf0jLeslkmSpBbGeo4/5/xq4MaI+H7OeY8Wk6QBZc0U8z4UOBQgIpiYmHjIcUp6aFaBvz1pjhl3577/Buydc34l8Ehgs5zz54BVOeetImJlznkr4MY6/jJg657plwArBs04Ik4ETqxvm9WrV49kBSRNz9+eNPsWLVo05bCxNvVHxNERsSQitqF02vt2RLwRWAocVEc7CDirvl4K7Jdz3ijnvC2wPXDZLIctSdK8NRfO8Q/yYeBlOeefAS+r74mIK4AArgS+ARweEWvGFqUkSfNMapqBp8g3NM2KFQPPCEgaoTWH7M2Ck5aOOwypc2pT/6B+cXO2xi9JkkbAxC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1iIlfkqQOMfFLktQhJn5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1yMJxLjznvDVwKvBE4H7gxIg4Pue8OXAmsA1wHZAj4pY6zdHAwcAa4MiIOGcMoUuSNC+Nu8Z/H/DuiHg6sCtweM55R+Ao4LyI2B44r76nDtsP2AnYCzgh57xgLJFLkjQPjTXxR8TKiPhBff1r4CpgMbAPcEod7RTgtfX1PsAZEXF3RFwLXAPsMrtRS5I0f421qb9Xznkb4DnApcCWEbESysFBznmLOtpi4Hs9ky2rZYPmdyhwaJ0HExMTI4pc0lRWgb89aY6ZE4k/57wp8CXgHRHxq5zzVKOmAWXNoBEj4kTgxMlxVq9ePeM4pS5ZvHjgMfX622ijGc9i+fLlQwhE6o5FixZNOWzsiT/n/HBK0j8tIr5ci1flnLeqtf2tgBtr+TJg657JlwArZi9aqTuGkWwnJibwoFuaW8bdqz8BnwKuiojjegYtBQ4CPlyfz+opPz3nfBywCNgeuGz2IpYkaX5LTTOwpXxW5Jx3Ay4Efky5nA/gLynn+QN4EnA98PqIuLlO817gzZQrAt4REV9vsahmxQobBqTZZo1fGo/a1D/o9Ph4E/8sMvFLY2Dil8ZjusQ/7uv4JUnSLDLxS5LUISZ+SZI6xMQvSVKHmPglSeoQE78kSR1i4pckqUNM/JIkdYiJX5KkDjHxS5LUIZ25Ze+4A5AkaZYNvGXv2P+Wd5YMXHlJo5Vzvjwinj/uOCQ9wKZ+SZI6xMQvSVKHmPgljdKJ4w5A0oN1pXOfJEnCGr8kSZ1i4pfUWs759nHHIGlmTPySJHVIV67jlzQkOedNgbOAxwEPB94XEWflnLcBvgFcCjwHuBo4MCLuzDm/H3gNsDHwXeAtEdHknM+v478UeCxwcERcOMurJHWKNX5J6+s3wB9GxHMpCftvc86TN8l6KnBiRDwT+BXwZ7X87yPiBRGxMyX5v7pnfgsjYhfgHcAHZmUNpA6zxi9pfSXgQznn3YH7gcXAlnXYDRFxcX39OeBI4KPAS3PO7wE2ATYHrgD+qY735fr8fWCbkUcvdZw1fknra3/gCcDzIuLZwCrgkXVY//XBTc75kcAJwB9HxDOAk3rGB7i7Pq/Byog0ciZ+SevrMcCNEXFvzvmlwO/0DHtSzvlF9fV/By7igSS/uvYP+OPZC1VSPxO/pFZyzgsptfPTgOfnnC+n1P5/0jPaVcBBOecfUZr0Px4Rt1Jq+T8Gvgr866wGLulBvHOfpFZyzs8CTqod8QYN3wb459qBT9IcZY
1f0jrlnA8DPg+8b9yxSJoZa/ySJHWINX5JkjrExC9JUoeY+CVJ6hATvyRJHWLilySpQ0z8kiR1yP8HfyfxlxhvfC0AAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 576x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df_japan.plot(kind='box', figsize=(8, 6))\n",
"\n",
"plt.title('Box plot of Japanese Immigrants from 1980 - 2013')\n",
"plt.ylabel('Number of Immigrants')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"We can immediately make a few key observations from the plot above:\n",
"1. The minimum number of immigrants is around 200 (min), maximum number is around 1300 (max), and median number of immigrants is around 900 (median).\n",
"2. 25% of the years for period 1980 - 2013 had an annual immigrant count of ~500 or fewer (First quartile).\n",
"3. 75% of the years for period 1980 - 2013 had an annual immigrant count of ~1100 or fewer (Third quartile).\n",
"\n",
"We can view the actual numbers by calling the `describe()` method on the dataframe."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Country</th>\n",
" <th>Japan</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>34.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>814.911765</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>337.219771</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>198.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>529.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>902.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>1079.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1284.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Country Japan\n",
"count 34.000000\n",
"mean 814.911765\n",
"std 337.219771\n",
"min 198.000000\n",
"25% 529.000000\n",
"50% 902.000000\n",
"75% 1079.000000\n",
"max 1284.000000"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_japan.describe()"
]
},
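{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The quartiles reported by `describe()` also determine how far the whiskers of the box plot can reach. The sketch below is an illustration rather than part of the original lab: it hard-codes the Japan figures from the output above and assumes matplotlib's default whisker setting (`whis=1.5`), under which points more than 1.5 times the interquartile range (IQR) away from the box are drawn as outliers.\n",
"\n",
"```python\n",
"# Values copied from the describe() output for Japan\n",
"q1, q3 = 529.0, 1079.0\n",
"data_min, data_max = 198.0, 1284.0\n",
"\n",
"iqr = q3 - q1                   # interquartile range (height of the box)\n",
"lower_fence = q1 - 1.5 * iqr    # lowest value a whisker may reach (assumes whis=1.5)\n",
"upper_fence = q3 + 1.5 * iqr    # highest value a whisker may reach (assumes whis=1.5)\n",
"\n",
"print('IQR:', iqr)\n",
"print('Fences:', lower_fence, upper_fence)\n",
"\n",
"# True when every observation lies inside the fences, i.e. no outliers\n",
"no_outliers = lower_fence <= data_min and data_max <= upper_fence\n",
"print('No outliers expected:', no_outliers)\n",
"```\n",
"\n",
"Because the observed minimum and maximum both fall inside the fences, no year would be flagged as an outlier under that default."
]
},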
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"One of the key benefits of box plots is comparing the distribution of multiple datasets. In one of the previous labs, we observed that China and India had very similar immigration trends. Let's analyze these two countries further using box plots.\n",
"\n",
"**Question:** Compare the distribution of the number of new immigrants from India and China for the period 1980 - 2013."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Get the dataset for China and India and call the dataframe **df_CI**."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Country</th>\n",
" <th>India</th>\n",
" <th>China</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1980</th>\n",
" <td>8880</td>\n",
" <td>5123</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1981</th>\n",
" <td>8670</td>\n",
" <td>6682</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1982</th>\n",
" <td>8147</td>\n",
" <td>3308</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1983</th>\n",
" <td>7338</td>\n",
" <td>1863</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1984</th>\n",
" <td>5704</td>\n",
" <td>1527</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Country India China\n",
"1980 8880 5123\n",
"1981 8670 6682\n",
"1982 8147 3308\n",
"1983 7338 1863\n",
"1984 5704 1527"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### type your answer here\n",
"df_CI = df_can.loc[['India', 'China'], years].transpose()\n",
"df_CI.head()\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"df_CI= df_can.loc[['China', 'India'], years].transpose()\n",
"df_CI.head()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's view the percentiles associated with both countries using the `describe()` method."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Country</th>\n",
" <th>India</th>\n",
" <th>China</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>34.000000</td>\n",
" <td>34.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>20350.117647</td>\n",
" <td>19410.647059</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>10007.342579</td>\n",
" <td>13568.230790</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4211.000000</td>\n",
" <td>1527.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>10637.750000</td>\n",
" <td>5512.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>20235.000000</td>\n",
" <td>19945.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>28699.500000</td>\n",
" <td>31568.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>36210.000000</td>\n",
" <td>42584.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Country India China\n",
"count 34.000000 34.000000\n",
"mean 20350.117647 19410.647059\n",
"std 10007.342579 13568.230790\n",
"min 4211.000000 1527.000000\n",
"25% 10637.750000 5512.750000\n",
"50% 20235.000000 19945.000000\n",
"75% 28699.500000 31568.500000\n",
"max 36210.000000 42584.000000"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### type your answer here\n",
"df_CI.describe()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"df_CI.describe()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Plot data."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f1965088a20>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAD4CAYAAADsKpHdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAQcUlEQVR4nO3db4xc1XnH8e/WmxpSChFMId41kmlxpRqqJHJDrUZVaQDJTYhNJPLgpClu5doVIoIorRKoKoU3keBN/qhJqAy02ElbeEqbYqUmqDGNKCqEhCpqZYgUp7jBXstowaGgNA7eTl/Ms8p6Pbs7sx7PDjvfj3S1d8495865o6P53X97Z6TZbCJJ0s8sdQckSYPBQJAkAQaCJKkYCJIkwECQJJXRpe7AafD2KElanJF2hW/kQGBiYmKpu7BsNBoNJicnl7ob0ikcm701NjY25zJPGUmSAANBklQMBEkSYCBIkoqBIEkCDARJUjEQJEmAgSBJKm/of0yTtHyMj4933ebw4cNnoCfDy0CQNBDm+nKf2r6JFffs6XNvhpOnjCRJgIEgSSoGgiQJMBAkScVAkCQBBoIkqRgIkiTAQJAklY7/MS0iVgDfBg5n5rURcT7wILAGOAhEZh6rurcD24Ap4JbMfLTK1wP3A2cDe4FbM7MZESuB3cB64CXghsw82IPtkyR1qJsjhFuB52a8vg3Yl5lrgX31mohYB2wBLgM2Al+sMAG4G9gBrK1pY5VvA45l5qXAZ4C7FrU1kqRF6ygQImI18F7g3hnFm4FdNb8LuG5G+QOZeTwznwcOAFdExCrg3Mx8MjObtI4IrmuzroeAqyJiZJHbJElahE5PGX0W+Djw8zPKLsrMIwCZeSQiLqzyceCpGfUOVdnrNT+7fLrNC7WuExHxCnABMDmzExGxg9YRBplJo9HosPtayOjoqJ+nBtJRcGz2yYKBEBHXAi9m5jMRcWUH62y3Z9+cp3y+NifJzJ3Azunlk5OTs6tokRqNBn6eGlSOzd4ZGxubc1knp4zeBWyKiIPAA8C7I+LLwNE6DUT9fbHqHwIuntF+NTBR5avblJ/UJiJGgfOAlzvomySpRxYMhMy8PTNXZ+YaWheLH8vMDwN7gK1VbSvwcM3vAbZExMqIuITWxeOn6/TSqxGxoa4P3DirzfS6rq/3OOUIQZJ05pzO/yHcCVwTEd8DrqnXZOZ+IIFnga8BN2fmVLW5idaF6QPA94FHqvw+4IKIOAB8jLpjSZLUPyPN5ht2R7w5MTGxcC11xGsIGlT+QE5v1TWEtndx+p/KkiTAQJAkFQNBkgQYCJKkYiBIkgADQZJUDARJEmAgSJKKgSBJAgwESVIxECRJgIEgSSqd/mKaloHx8fGFK7Vx+PDhHvdE0iAyEIbIfF/sPlFSkqeMJEmAgSBJKgaCJAkwECRJxUCQJAEGgiSpGAiSJMBAkCQVA0GSBBgIkqRiIEiSAANBklQMBEkSYCBIkoqBIEkCDARJUjEQJEmAgSBJKgaCJAkwECRJZXSpOyBpuEzd+iH40Wvdtdm+qbs3efM5rPjc33TXRgaCpD770WusuGdPx9UbjQaTk5NdvUXXASLAU0aSpGIgSJIAA0GSVBa8hhARZwGPAyur/kOZ+cmIOB94EFgDHAQiM49Vm9uBbcAUcEtmPlrl64H7gbOBvcCtmdmMiJXAbmA98BJwQ2Ye7NlWSpIW1MkRwnHg3Zn5NuDtwMaI2ADcBuzLzLXAvnpNRKwDtgCXARuBL0bEilrX3cAOYG1NG6t8G3AsMy8FPgPc1YNtkyR1YcFAyMxmZk7fI/ammprAZmBXle8Crqv5zcADmXk8M58HDgBXRMQq4NzMfDIzm7SOCGa2mV7XQ8BVETFyepsmSepGR7ed1h7+M8ClwBcy85sRcVFmHgHIzCMRcWFVHweemtH8UJW9XvOzy6fbvFDrOhERrwAXACfdaxYRO2gdYZ
CZNBqNTrdTCzgKfp7qi27H2ujoaNdj0/G8OB0FQmZOAW+PiLcAX4mIy+ep3m7PvjlP+XxtZvdjJ7Bzenm39yZrfn6e6pduxtpi/g+h2/cYJmNjY3Mu6+ouo8z8IfANWuf+j9ZpIOrvi1XtEHDxjGargYkqX92m/KQ2ETEKnAe83E3fJEmnZ8FAiIhfqCMDIuJs4Grgu8AeYGtV2wo8XPN7gC0RsTIiLqF18fjpOr30akRsqOsDN85qM72u64HH6jqDJKlPOjlCWAX8S0T8B/At4J8z86vAncA1EfE94Jp6TWbuBxJ4FvgacHOdcgK4CbiX1oXm7wOPVPl9wAURcQD4GHXHkiSpf0aazTfsjnhzYmJi4VrqyNT2TV09X0ZarG7H2mKfZeR4bq+uIbS9i9OH2y1Di3maJHT5QDCfJiktOwbCctTl0ySh+70wnyYpLT8+y0iSBBgIkqRiIEiSAANBklQMBEkSYCBIkoqBIEkCDARJUjEQJEmAgSBJKgaCJAkwECRJxUCQJAEGgiSpGAiSJMBAkCQVA0GSBBgIkqRiIEiSAANBklQMBEkSYCBIkoqBIEkCDARJUjEQJEmAgSBJKgaCJAkwECRJxUCQJAEGgiSpGAiSJMBAkCQVA0GSBBgIkqRiIEiSAANBklQMBEkSAKMLVYiIi4HdwFuB/wN2ZubnIuJ84EFgDXAQiMw8Vm1uB7YBU8Atmflola8H7gfOBvYCt2ZmMyJW1nusB14CbsjMgz3bSknSgjo5QjgB/HFm/gqwAbg5ItYBtwH7MnMtsK9eU8u2AJcBG4EvRsSKWtfdwA5gbU0bq3wbcCwzLwU+A9zVg22TJHVhwSOEzDwCHKn5VyPiOWAc2AxcWdV2Ad8APlHlD2TmceD5iDgAXBERB4FzM/NJgIjYDVwHPFJt7qh1PQR8PiJGMrN5+ps4fPZevRse/GGXrbqsf/Vu3tflO0gabAsGwkwRsQZ4B/BN4KIKCzLzSERcWNXGgadmNDtUZa/X/Ozy6TYv1LpORMQrwAXA5Kz330HrCIPMpNFodNP9ofGer2/ioq/8W1dtRkdHOXHiRMf1j77/N2jc3N17SAB/1fUOS7c7N8DVu/kDvx+61nEgRMQ5wN8DH83M/4mIuaqOtClrzlM+X5uTZOZOYOf08snJydlVVLr9bBqNRtdt/Py1GO/5+o2suGdPx/UXMzantm9i8obO32OYjI2Nzbmso7uMIuJNtMLgrzPzH6r4aESsquWrgBer/BBw8Yzmq4GJKl/dpvykNhExCpwHvNxJ3yRJvbFgIETECHAf8FxmfnrGoj3A1prfCjw8o3xLRKyMiEtoXTx+uk4vvRoRG2qdN85qM72u64HHvH4gSf3VySmjdwG/B/xnRHynyv4UuBPIiNgG/AD4AEBm7o+IBJ6ldYfSzZk5Ve1u4qe3nT5SE7QC50t1AfplWncpSZL6qJO7jJ6g/Tl+gKvmaPMp4FNtyr8NXN6m/MdUoEiSlob/qSxJAgwESVIxECRJgIEgSSoGgiQJMBAkScVAkCQBBoIkqRgIkiTAQJAkFQNBkgQYCJKkYiBIkgADQZJUDARJEmAgSJKKgSBJAgwESVIxECRJgIEgSSoGgiQJMBAkScVAkCQBBoIkqRgIkiTAQJAkFQNBkgQYCJKkYiBIkgADQZJUDARJEmAgSJKKgSBJAgwESVIxECRJgIEgSSoGgiQJMBAkScVAkCQBMLpQhYj4S+Ba4MXMvLzKzgceBNYAB4HIzGO17HZgGzAF3JKZj1b5euB+4GxgL3BrZjYjYiWwG1gPvATckJkHe7aFQ2pq+6au6h/t9g3efE63LSQNuAUDgdaX+OdpfWlPuw3Yl5l3RsRt9foTEbEO2AJcBowBX4+IX87MKeBuYAfwFK1A2Ag8Qis8jmXmpRGxBbgLuKEXGzesVtyzp+s2U9s3LaqdpOVjwVNGmfk48PKs4s3ArprfBVw3o/yBzDyemc8DB4ArImIVcG5mPpmZTV
rhcl2bdT0EXBURI4vdIEnS4nRyhNDORZl5BCAzj0TEhVU+TusIYNqhKnu95meXT7d5odZ1IiJeAS4AJme/aUTsoHWUQWbSaDQW2X3NdhT8PNUX3Y610dHRrsem43lxFhsIc2m3Z9+cp3y+NqfIzJ3Azuk6k5OnZIZOg5+n+qWbsdZoNBY1Nh3P7Y2Njc25bLF3GR2t00DU3xer/BBw8Yx6q4GJKl/dpvykNhExCpzHqaeoJEln2GIDYQ+wtea3Ag/PKN8SESsj4hJgLfB0nV56NSI21PWBG2e1mV7X9cBjdZ1BktRHndx2+rfAlUAjIg4BnwTuBDIitgE/AD4AkJn7IyKBZ4ETwM11hxHATfz0ttNHagK4D/hSRBygdWSwpSdbJknqykiz+YbdGW9OTEwsXEsd8bZT9Uu3Y20x1xAcz3Orawht7+T0P5UlSYCBIEkqBoIkCTAQJEnFQJAkAQaCJKkYCJIkwECQJBUDQZIEGAiSpGIgSJIAA0GSVAwESRJgIEiSioEgSQIMBElSWfAX0ySp16a2b+q47tHFvMGbz1lMq6FnIEjqq25/ycxfP+sfTxlJkgADQZJUDARJEmAgSJKKF5WHyPj4+EIV2hYfPnz4DPRG0qAxEIbIfF/sjUaDycnJPvZG0qDxlJEkCTAQJEnFQJAkAQaCJKkYCJIkwECQJBUDQZIEGAiSpGIgSJIAA0GSVAwESRLgs4wkDYh5H77ogxf7wkCQNBDm+nL3wYv94ykjSRJgIEiSysCcMoqIjcDngBXAvZl55xJ3SZKGykAcIUTECuALwO8A64APRsS6pe2VJA2XgQgE4ArgQGb+V2b+BHgA2LzEfZKkoTIop4zGgRdmvD4E/PrsShGxA9gBkJk0Go3+9G4IjI6O+nlqIDk2+2dQAmGkTVlzdkFm7gR2Ti/3VrTe8dY+DSrHZm+NjY3NuWxQThkdAi6e8Xo1MLFEfZGkoTQoRwjfAtZGxCXAYWAL8KGFGs2XdOqen6cGlWOzPwbiCCEzTwAfAR4FnmsV5f4Fmo049W6KiGeWug9OTu0mx+YZmdoalCMEMnMvsHep+yFJw2ogjhAkSUvPQNC0nQtXkZaEY7NPRprNU+7ulCQNIY8QJEmAgSBJKgNzl5F6KyJey8xzuqh/JfAnmXltRGwC1vnEWfVaRLwV+CzwTuA4cBD4R2BTZl7bpv69wKcz89l+9nNYGQg6RWbuAfYsdT+0vETECPAVYFdmbqmytwPvm6tNZv5hn7onDIRlr/b87wAmgcuBZ4APZ2azfoPis7Xs32e0+X3g1zLzIxHxPuDPgJ8FXgJ+NzOP9nMbtGz8NvB6Zv7FdEFmfici3gJcFREPceoY/QatI9dvR8RrtH4z5Vrgf4HNmXnUMdo7XkMYDu8APkrrtyZ+EXhXRJwF3ENr7+w3gbfO0fYJYENmvoPWY8k/fua7q2Vq+su+nVPGaJs6Pwc8lZlvAx4Htle5Y7RHPEIYDk9n5iGAiPgOsAZ4DXg+M79X5V+mHi0+y2rgwYhYRWsP7Pm+9FjDpt0YfWJWnZ8AX635Z4Brat4x2iMeIQyH4zPmp/jpjkAn/4Ty58DnM/NXgT8Czupx3zQ89gPr51g21xid6fXMbLap4xjtEQNheH0XuCQifqlef3COeufRegItwNYz3istZ48BKyNi+lQPEfFO4LdOc72O0R4xEIZUZv6Y1imif4qIJ4D/nqPqHcDfRcS/0rr4LC1K7d2/H7gmIr4fEftpja/T/e2TO3CM9oSPrpAkAR4hSJKKgSBJAgwESVIxECRJgIEgSSoGgiQJMBAkSeX/AVG0pXLb+J5LAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"### type your answer here\n",
"df_CI.plot(kind='box')\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"df_CI.plot(kind='box', figsize=(10, 7))\n",
"-->\n",
"\n",
"<!--\n",
"plt.title('Box plots of Immigrants from China and India (1980 - 2013)')\n",
"plt.ylabel('Number of Immigrants')\n",
"-->\n",
"\n",
"<!--\n",
"plt.show()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"We can observe that, while both countries have around the same median immigrant population (~20,000), China's immigrant population range is more spread out than India's. The maximum population from India for any year (36,210) is around 15% lower than the maximum population from China (42,584).\n"
]
},
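{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The comparison above can be checked with a little arithmetic on the `describe()` output. The sketch below hard-codes the quartiles and maxima printed earlier and uses the interquartile range as one possible measure of spread; that choice is an assumption, since the text does not say which measure it has in mind.\n",
"\n",
"```python\n",
"# Values copied from the df_CI.describe() output\n",
"india = {'q1': 10637.75, 'q3': 28699.5, 'max': 36210.0}\n",
"china = {'q1': 5512.75, 'q3': 31568.5, 'max': 42584.0}\n",
"\n",
"# Interquartile range: the width of the middle 50% of yearly counts\n",
"iqr_india = india['q3'] - india['q1']\n",
"iqr_china = china['q3'] - china['q1']\n",
"\n",
"# How much lower India's maximum is, as a percentage of China's maximum\n",
"pct_lower = (china['max'] - india['max']) / china['max'] * 100\n",
"\n",
"print('IQR India:', iqr_india)\n",
"print('IQR China:', iqr_china)\n",
"print('India max is lower by about {:.0f}%'.format(pct_lower))\n",
"```\n",
"\n",
"China's larger IQR matches the taller box in the plot, and the percentage agrees with the roughly 15% figure quoted above."
]
},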
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"If you prefer to create horizontal box plots, you can pass the `vert` parameter in the **plot** function and assign it to *False*. You can also specify a different color in case you are not a big fan of the default red color."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# horizontal box plots\n",
"df_CI.plot(kind='box', figsize=(10, 7), color='blue', vert=False)\n",
"\n",
"plt.title('Box plots of Immigrants from China and India (1980 - 2013)')\n",
"plt.xlabel('Number of Immigrants')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"**Subplots**\n",
"\n",
"Often we might want to plot multiple plots within the same figure. For example, we might want to perform a side-by-side comparison of the box plot with the line plot of China and India's immigration.\n",
"\n",
"To visualize multiple plots together, we can create a **`figure`** (overall canvas) and divide it into **`subplots`**, each containing a plot. With **subplots**, we usually work with the **artist layer** instead of the **scripting layer**. \n",
"\n",
"Typical syntax is : <br>\n",
"```python\n",
" fig = plt.figure() # create figure\n",
" ax = fig.add_subplot(nrows, ncols, plot_number) # create subplots\n",
"```\n",
"Where\n",
"- `nrows` and `ncols` are used to notionally split the figure into (`nrows` \\* `ncols`) sub-axes, \n",
"- `plot_number` is used to identify the particular subplot that this function is to create within the notional grid. `plot_number` starts at 1, increments across rows first and has a maximum of `nrows` * `ncols` as shown below.\n",
"\n",
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Images/Mod3Fig5Subplots_V2.png\" width=500 align=\"center\">"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"We can then specify which subplot to place each plot by passing in the `ax` paramemter in `plot()` method as follows:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1440x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure() # create figure\n",
"\n",
"ax0 = fig.add_subplot(1, 2, 1) # add subplot 1 (1 row, 2 columns, first plot)\n",
"ax1 = fig.add_subplot(1, 2, 2) # add subplot 2 (1 row, 2 columns, second plot). See tip below**\n",
"\n",
"# Subplot 1: Box plot\n",
"df_CI.plot(kind='box', color='blue', vert=False, figsize=(20, 6), ax=ax0) # add to subplot 1\n",
"ax0.set_title('Box Plots of Immigrants from China and India (1980 - 2013)')\n",
"ax0.set_xlabel('Number of Immigrants')\n",
"ax0.set_ylabel('Countries')\n",
"\n",
"# Subplot 2: Line plot\n",
"df_CI.plot(kind='line', figsize=(20, 6), ax=ax1) # add to subplot 2\n",
"ax1.set_title ('Line Plots of Immigrants from China and India (1980 - 2013)')\n",
"ax1.set_ylabel('Number of Immigrants')\n",
"ax1.set_xlabel('Years')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"** * Tip regarding subplot convention **\n",
"\n",
"In the case when `nrows`, `ncols`, and `plot_number` are all less than 10, a convenience exists such that the a 3 digit number can be given instead, where the hundreds represent `nrows`, the tens represent `ncols` and the units represent `plot_number`. For instance,\n",
"```python\n",
" subplot(211) == subplot(2, 1, 1) \n",
"```\n",
"produces a subaxes in a figure which represents the top plot (i.e. the first) in a 2 rows by 1 column notional grid (no grid actually exists, but conceptually this is how the returned subplot has been positioned)."
]
},
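{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The equivalence of the two forms can be checked with a minimal sketch. It assumes Matplotlib is installed and builds two throwaway `Figure` objects directly, so nothing is drawn on screen; the names `fig_a`, `fig_b`, `ax_a`, `ax_b`, `geom_a`, and `geom_b` are illustrative and not part of the lab.\n",
"\n",
"```python\n",
"from matplotlib.figure import Figure\n",
"\n",
"# two separate figures, so each call creates a fresh Axes\n",
"fig_a = Figure()\n",
"fig_b = Figure()\n",
"\n",
"ax_a = fig_a.add_subplot(211)      # 3-digit shorthand\n",
"ax_b = fig_b.add_subplot(2, 1, 1)  # explicit nrows, ncols, plot_number\n",
"\n",
"# get_geometry() returns (nrows, ncols, first_cell, last_cell), with cells counted from 0\n",
"geom_a = ax_a.get_subplotspec().get_geometry()\n",
"geom_b = ax_b.get_subplotspec().get_geometry()\n",
"print(geom_a, geom_b, geom_a == geom_b)\n",
"```\n",
"\n",
"Both calls should report the same cell of the notional 2 x 1 grid. The shorthand only works while every value is a single digit, so the explicit three-argument form is the safer habit for larger grids."
]
},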
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's try something a little more advanced. \n",
"\n",
"Previously we identified the top 15 countries based on total immigration from 1980 - 2013.\n",
"\n",
"**Question:** Create a box plot to visualize the distribution of the top 15 countries (based on total immigration) grouped by the *decades* `1980s`, `1990s`, and `2000s`."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Get the dataset. Get the top 15 countries based on Total immigrant population. Name the dataframe **df_top15**."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Continent</th>\n",
" <th>Region</th>\n",
" <th>DevName</th>\n",
" <th>1980</th>\n",
" <th>1981</th>\n",
" <th>1982</th>\n",
" <th>1983</th>\n",
" <th>1984</th>\n",
" <th>1985</th>\n",
" <th>1986</th>\n",
" <th>...</th>\n",
" <th>2005</th>\n",
" <th>2006</th>\n",
" <th>2007</th>\n",
" <th>2008</th>\n",
" <th>2009</th>\n",
" <th>2010</th>\n",
" <th>2011</th>\n",
" <th>2012</th>\n",
" <th>2013</th>\n",
" <th>Total</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>India</th>\n",
" <td>Asia</td>\n",
" <td>Southern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>8880</td>\n",
" <td>8670</td>\n",
" <td>8147</td>\n",
" <td>7338</td>\n",
" <td>5704</td>\n",
" <td>4211</td>\n",
" <td>7150</td>\n",
" <td>...</td>\n",
" <td>36210</td>\n",
" <td>33848</td>\n",
" <td>28742</td>\n",
" <td>28261</td>\n",
" <td>29456</td>\n",
" <td>34235</td>\n",
" <td>27509</td>\n",
" <td>30933</td>\n",
" <td>33087</td>\n",
" <td>691904</td>\n",
" </tr>\n",
" <tr>\n",
" <th>China</th>\n",
" <td>Asia</td>\n",
" <td>Eastern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>5123</td>\n",
" <td>6682</td>\n",
" <td>3308</td>\n",
" <td>1863</td>\n",
" <td>1527</td>\n",
" <td>1816</td>\n",
" <td>1960</td>\n",
" <td>...</td>\n",
" <td>42584</td>\n",
" <td>33518</td>\n",
" <td>27642</td>\n",
" <td>30037</td>\n",
" <td>29622</td>\n",
" <td>30391</td>\n",
" <td>28502</td>\n",
" <td>33024</td>\n",
" <td>34129</td>\n",
" <td>659962</td>\n",
" </tr>\n",
" <tr>\n",
" <th>United Kingdom of Great Britain and Northern Ireland</th>\n",
" <td>Europe</td>\n",
" <td>Northern Europe</td>\n",
" <td>Developed regions</td>\n",
" <td>22045</td>\n",
" <td>24796</td>\n",
" <td>20620</td>\n",
" <td>10015</td>\n",
" <td>10170</td>\n",
" <td>9564</td>\n",
" <td>9470</td>\n",
" <td>...</td>\n",
" <td>7258</td>\n",
" <td>7140</td>\n",
" <td>8216</td>\n",
" <td>8979</td>\n",
" <td>8876</td>\n",
" <td>8724</td>\n",
" <td>6204</td>\n",
" <td>6195</td>\n",
" <td>5827</td>\n",
" <td>551500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Philippines</th>\n",
" <td>Asia</td>\n",
" <td>South-Eastern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>6051</td>\n",
" <td>5921</td>\n",
" <td>5249</td>\n",
" <td>4562</td>\n",
" <td>3801</td>\n",
" <td>3150</td>\n",
" <td>4166</td>\n",
" <td>...</td>\n",
" <td>18139</td>\n",
" <td>18400</td>\n",
" <td>19837</td>\n",
" <td>24887</td>\n",
" <td>28573</td>\n",
" <td>38617</td>\n",
" <td>36765</td>\n",
" <td>34315</td>\n",
" <td>29544</td>\n",
" <td>511391</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pakistan</th>\n",
" <td>Asia</td>\n",
" <td>Southern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>978</td>\n",
" <td>972</td>\n",
" <td>1201</td>\n",
" <td>900</td>\n",
" <td>668</td>\n",
" <td>514</td>\n",
" <td>691</td>\n",
" <td>...</td>\n",
" <td>14314</td>\n",
" <td>13127</td>\n",
" <td>10124</td>\n",
" <td>8994</td>\n",
" <td>7217</td>\n",
" <td>6811</td>\n",
" <td>7468</td>\n",
" <td>11227</td>\n",
" <td>12603</td>\n",
" <td>241600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>United States of America</th>\n",
" <td>Northern America</td>\n",
" <td>Northern America</td>\n",
" <td>Developed regions</td>\n",
" <td>9378</td>\n",
" <td>10030</td>\n",
" <td>9074</td>\n",
" <td>7100</td>\n",
" <td>6661</td>\n",
" <td>6543</td>\n",
" <td>7074</td>\n",
" <td>...</td>\n",
" <td>8394</td>\n",
" <td>9613</td>\n",
" <td>9463</td>\n",
" <td>10190</td>\n",
" <td>8995</td>\n",
" <td>8142</td>\n",
" <td>7676</td>\n",
" <td>7891</td>\n",
" <td>8501</td>\n",
" <td>241122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Iran (Islamic Republic of)</th>\n",
" <td>Asia</td>\n",
" <td>Southern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>1172</td>\n",
" <td>1429</td>\n",
" <td>1822</td>\n",
" <td>1592</td>\n",
" <td>1977</td>\n",
" <td>1648</td>\n",
" <td>1794</td>\n",
" <td>...</td>\n",
" <td>5837</td>\n",
" <td>7480</td>\n",
" <td>6974</td>\n",
" <td>6475</td>\n",
" <td>6580</td>\n",
" <td>7477</td>\n",
" <td>7479</td>\n",
" <td>7534</td>\n",
" <td>11291</td>\n",
" <td>175923</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sri Lanka</th>\n",
" <td>Asia</td>\n",
" <td>Southern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>185</td>\n",
" <td>371</td>\n",
" <td>290</td>\n",
" <td>197</td>\n",
" <td>1086</td>\n",
" <td>845</td>\n",
" <td>1838</td>\n",
" <td>...</td>\n",
" <td>4930</td>\n",
" <td>4714</td>\n",
" <td>4123</td>\n",
" <td>4756</td>\n",
" <td>4547</td>\n",
" <td>4422</td>\n",
" <td>3309</td>\n",
" <td>3338</td>\n",
" <td>2394</td>\n",
" <td>148358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Republic of Korea</th>\n",
" <td>Asia</td>\n",
" <td>Eastern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>1011</td>\n",
" <td>1456</td>\n",
" <td>1572</td>\n",
" <td>1081</td>\n",
" <td>847</td>\n",
" <td>962</td>\n",
" <td>1208</td>\n",
" <td>...</td>\n",
" <td>5832</td>\n",
" <td>6215</td>\n",
" <td>5920</td>\n",
" <td>7294</td>\n",
" <td>5874</td>\n",
" <td>5537</td>\n",
" <td>4588</td>\n",
" <td>5316</td>\n",
" <td>4509</td>\n",
" <td>142581</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Poland</th>\n",
" <td>Europe</td>\n",
" <td>Eastern Europe</td>\n",
" <td>Developed regions</td>\n",
" <td>863</td>\n",
" <td>2930</td>\n",
" <td>5881</td>\n",
" <td>4546</td>\n",
" <td>3588</td>\n",
" <td>2819</td>\n",
" <td>4808</td>\n",
" <td>...</td>\n",
" <td>1405</td>\n",
" <td>1263</td>\n",
" <td>1235</td>\n",
" <td>1267</td>\n",
" <td>1013</td>\n",
" <td>795</td>\n",
" <td>720</td>\n",
" <td>779</td>\n",
" <td>852</td>\n",
" <td>139241</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Lebanon</th>\n",
" <td>Asia</td>\n",
" <td>Western Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>1409</td>\n",
" <td>1119</td>\n",
" <td>1159</td>\n",
" <td>789</td>\n",
" <td>1253</td>\n",
" <td>1683</td>\n",
" <td>2576</td>\n",
" <td>...</td>\n",
" <td>3709</td>\n",
" <td>3802</td>\n",
" <td>3467</td>\n",
" <td>3566</td>\n",
" <td>3077</td>\n",
" <td>3432</td>\n",
" <td>3072</td>\n",
" <td>1614</td>\n",
" <td>2172</td>\n",
" <td>115359</td>\n",
" </tr>\n",
" <tr>\n",
" <th>France</th>\n",
" <td>Europe</td>\n",
" <td>Western Europe</td>\n",
" <td>Developed regions</td>\n",
" <td>1729</td>\n",
" <td>2027</td>\n",
" <td>2219</td>\n",
" <td>1490</td>\n",
" <td>1169</td>\n",
" <td>1177</td>\n",
" <td>1298</td>\n",
" <td>...</td>\n",
" <td>4429</td>\n",
" <td>4002</td>\n",
" <td>4290</td>\n",
" <td>4532</td>\n",
" <td>5051</td>\n",
" <td>4646</td>\n",
" <td>4080</td>\n",
" <td>6280</td>\n",
" <td>5623</td>\n",
" <td>109091</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jamaica</th>\n",
" <td>Latin America and the Caribbean</td>\n",
" <td>Caribbean</td>\n",
" <td>Developing regions</td>\n",
" <td>3198</td>\n",
" <td>2634</td>\n",
" <td>2661</td>\n",
" <td>2455</td>\n",
" <td>2508</td>\n",
" <td>2938</td>\n",
" <td>4649</td>\n",
" <td>...</td>\n",
" <td>1945</td>\n",
" <td>1722</td>\n",
" <td>2141</td>\n",
" <td>2334</td>\n",
" <td>2456</td>\n",
" <td>2321</td>\n",
" <td>2059</td>\n",
" <td>2182</td>\n",
" <td>2479</td>\n",
" <td>106431</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Viet Nam</th>\n",
" <td>Asia</td>\n",
" <td>South-Eastern Asia</td>\n",
" <td>Developing regions</td>\n",
" <td>1191</td>\n",
" <td>1829</td>\n",
" <td>2162</td>\n",
" <td>3404</td>\n",
" <td>7583</td>\n",
" <td>5907</td>\n",
" <td>2741</td>\n",
" <td>...</td>\n",
" <td>1852</td>\n",
" <td>3153</td>\n",
" <td>2574</td>\n",
" <td>1784</td>\n",
" <td>2171</td>\n",
" <td>1942</td>\n",
" <td>1723</td>\n",
" <td>1731</td>\n",
" <td>2112</td>\n",
" <td>97146</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Romania</th>\n",
" <td>Europe</td>\n",
" <td>Eastern Europe</td>\n",
" <td>Developed regions</td>\n",
" <td>375</td>\n",
" <td>438</td>\n",
" <td>583</td>\n",
" <td>543</td>\n",
" <td>524</td>\n",
" <td>604</td>\n",
" <td>656</td>\n",
" <td>...</td>\n",
" <td>5048</td>\n",
" <td>4468</td>\n",
" <td>3834</td>\n",
" <td>2837</td>\n",
" <td>2076</td>\n",
" <td>1922</td>\n",
" <td>1776</td>\n",
" <td>1588</td>\n",
" <td>1512</td>\n",
" <td>93585</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>15 rows × 38 columns</p>\n",
"</div>"
],
"text/plain": [
" Continent \\\n",
"Country \n",
"India Asia \n",
"China Asia \n",
"United Kingdom of Great Britain and Northern Ir... Europe \n",
"Philippines Asia \n",
"Pakistan Asia \n",
"United States of America Northern America \n",
"Iran (Islamic Republic of) Asia \n",
"Sri Lanka Asia \n",
"Republic of Korea Asia \n",
"Poland Europe \n",
"Lebanon Asia \n",
"France Europe \n",
"Jamaica Latin America and the Caribbean \n",
"Viet Nam Asia \n",
"Romania Europe \n",
"\n",
" Region \\\n",
"Country \n",
"India Southern Asia \n",
"China Eastern Asia \n",
"United Kingdom of Great Britain and Northern Ir... Northern Europe \n",
"Philippines South-Eastern Asia \n",
"Pakistan Southern Asia \n",
"United States of America Northern America \n",
"Iran (Islamic Republic of) Southern Asia \n",
"Sri Lanka Southern Asia \n",
"Republic of Korea Eastern Asia \n",
"Poland Eastern Europe \n",
"Lebanon Western Asia \n",
"France Western Europe \n",
"Jamaica Caribbean \n",
"Viet Nam South-Eastern Asia \n",
"Romania Eastern Europe \n",
"\n",
" DevName 1980 \\\n",
"Country \n",
"India Developing regions 8880 \n",
"China Developing regions 5123 \n",
"United Kingdom of Great Britain and Northern Ir... Developed regions 22045 \n",
"Philippines Developing regions 6051 \n",
"Pakistan Developing regions 978 \n",
"United States of America Developed regions 9378 \n",
"Iran (Islamic Republic of) Developing regions 1172 \n",
"Sri Lanka Developing regions 185 \n",
"Republic of Korea Developing regions 1011 \n",
"Poland Developed regions 863 \n",
"Lebanon Developing regions 1409 \n",
"France Developed regions 1729 \n",
"Jamaica Developing regions 3198 \n",
"Viet Nam Developing regions 1191 \n",
"Romania Developed regions 375 \n",
"\n",
" 1981 1982 1983 \\\n",
"Country \n",
"India 8670 8147 7338 \n",
"China 6682 3308 1863 \n",
"United Kingdom of Great Britain and Northern Ir... 24796 20620 10015 \n",
"Philippines 5921 5249 4562 \n",
"Pakistan 972 1201 900 \n",
"United States of America 10030 9074 7100 \n",
"Iran (Islamic Republic of) 1429 1822 1592 \n",
"Sri Lanka 371 290 197 \n",
"Republic of Korea 1456 1572 1081 \n",
"Poland 2930 5881 4546 \n",
"Lebanon 1119 1159 789 \n",
"France 2027 2219 1490 \n",
"Jamaica 2634 2661 2455 \n",
"Viet Nam 1829 2162 3404 \n",
"Romania 438 583 543 \n",
"\n",
" 1984 1985 1986 ... \\\n",
"Country ... \n",
"India 5704 4211 7150 ... \n",
"China 1527 1816 1960 ... \n",
"United Kingdom of Great Britain and Northern Ir... 10170 9564 9470 ... \n",
"Philippines 3801 3150 4166 ... \n",
"Pakistan 668 514 691 ... \n",
"United States of America 6661 6543 7074 ... \n",
"Iran (Islamic Republic of) 1977 1648 1794 ... \n",
"Sri Lanka 1086 845 1838 ... \n",
"Republic of Korea 847 962 1208 ... \n",
"Poland 3588 2819 4808 ... \n",
"Lebanon 1253 1683 2576 ... \n",
"France 1169 1177 1298 ... \n",
"Jamaica 2508 2938 4649 ... \n",
"Viet Nam 7583 5907 2741 ... \n",
"Romania 524 604 656 ... \n",
"\n",
" 2005 2006 2007 \\\n",
"Country \n",
"India 36210 33848 28742 \n",
"China 42584 33518 27642 \n",
"United Kingdom of Great Britain and Northern Ir... 7258 7140 8216 \n",
"Philippines 18139 18400 19837 \n",
"Pakistan 14314 13127 10124 \n",
"United States of America 8394 9613 9463 \n",
"Iran (Islamic Republic of) 5837 7480 6974 \n",
"Sri Lanka 4930 4714 4123 \n",
"Republic of Korea 5832 6215 5920 \n",
"Poland 1405 1263 1235 \n",
"Lebanon 3709 3802 3467 \n",
"France 4429 4002 4290 \n",
"Jamaica 1945 1722 2141 \n",
"Viet Nam 1852 3153 2574 \n",
"Romania 5048 4468 3834 \n",
"\n",
" 2008 2009 2010 \\\n",
"Country \n",
"India 28261 29456 34235 \n",
"China 30037 29622 30391 \n",
"United Kingdom of Great Britain and Northern Ir... 8979 8876 8724 \n",
"Philippines 24887 28573 38617 \n",
"Pakistan 8994 7217 6811 \n",
"United States of America 10190 8995 8142 \n",
"Iran (Islamic Republic of) 6475 6580 7477 \n",
"Sri Lanka 4756 4547 4422 \n",
"Republic of Korea 7294 5874 5537 \n",
"Poland 1267 1013 795 \n",
"Lebanon 3566 3077 3432 \n",
"France 4532 5051 4646 \n",
"Jamaica 2334 2456 2321 \n",
"Viet Nam 1784 2171 1942 \n",
"Romania 2837 2076 1922 \n",
"\n",
" 2011 2012 2013 \\\n",
"Country \n",
"India 27509 30933 33087 \n",
"China 28502 33024 34129 \n",
"United Kingdom of Great Britain and Northern Ir... 6204 6195 5827 \n",
"Philippines 36765 34315 29544 \n",
"Pakistan 7468 11227 12603 \n",
"United States of America 7676 7891 8501 \n",
"Iran (Islamic Republic of) 7479 7534 11291 \n",
"Sri Lanka 3309 3338 2394 \n",
"Republic of Korea 4588 5316 4509 \n",
"Poland 720 779 852 \n",
"Lebanon 3072 1614 2172 \n",
"France 4080 6280 5623 \n",
"Jamaica 2059 2182 2479 \n",
"Viet Nam 1723 1731 2112 \n",
"Romania 1776 1588 1512 \n",
"\n",
" Total \n",
"Country \n",
"India 691904 \n",
"China 659962 \n",
"United Kingdom of Great Britain and Northern Ir... 551500 \n",
"Philippines 511391 \n",
"Pakistan 241600 \n",
"United States of America 241122 \n",
"Iran (Islamic Republic of) 175923 \n",
"Sri Lanka 148358 \n",
"Republic of Korea 142581 \n",
"Poland 139241 \n",
"Lebanon 115359 \n",
"France 109091 \n",
"Jamaica 106431 \n",
"Viet Nam 97146 \n",
"Romania 93585 \n",
"\n",
"[15 rows x 38 columns]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### type your answer here\n",
"df_top15 = df_can.sort_values(['Total'], ascending=False, axis=0).head(15)\n",
"df_top15\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"df_top15 = df_can.sort_values(['Total'], ascending=False, axis=0).head(15)\n",
"df_top15\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Create a new dataframe which contains the aggregate for each decade. One way to do that:\n",
" 1. Create a list of all years in decades 80's, 90's, and 00's.\n",
" 2. Slice the original dataframe df_can to create a series for each decade and sum across all years for each country.\n",
" 3. Merge the three series into a new data frame. Call your dataframe **new_df**."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1980s</th>\n",
" <th>1990s</th>\n",
" <th>2000s</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>India</th>\n",
" <td>82154</td>\n",
" <td>180395</td>\n",
" <td>303591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>China</th>\n",
" <td>32003</td>\n",
" <td>161528</td>\n",
" <td>340385</td>\n",
" </tr>\n",
" <tr>\n",
" <th>United Kingdom of Great Britain and Northern Ireland</th>\n",
" <td>179171</td>\n",
" <td>261966</td>\n",
" <td>83413</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Philippines</th>\n",
" <td>60764</td>\n",
" <td>138482</td>\n",
" <td>172904</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pakistan</th>\n",
" <td>10591</td>\n",
" <td>65302</td>\n",
" <td>127598</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1980s 1990s 2000s\n",
"Country \n",
"India 82154 180395 303591\n",
"China 32003 161528 340385\n",
"United Kingdom of Great Britain and Northern Ir... 179171 261966 83413\n",
"Philippines 60764 138482 172904\n",
"Pakistan 10591 65302 127598"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### type your answer here\n",
"years_80s = list(map(str, range(1980, 1990))) \n",
"years_90s = list(map(str, range(1990, 2000))) \n",
"years_00s = list(map(str, range(2000, 2010))) \n",
"\n",
"df_80s = df_top15.loc[:, years_80s].sum(axis=1) \n",
"df_90s = df_top15.loc[:, years_90s].sum(axis=1) \n",
"df_00s = df_top15.loc[:, years_00s].sum(axis=1)\n",
"\n",
"\n",
"new_df = pd.DataFrame({'1980s': df_80s, '1990s': df_90s, '2000s':df_00s}) \n",
"new_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"\\\\ # create a list of all years in decades 80's, 90's, and 00's\n",
"years_80s = list(map(str, range(1980, 1990))) \n",
"years_90s = list(map(str, range(1990, 2000))) \n",
"years_00s = list(map(str, range(2000, 2010))) \n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # slice the original dataframe df_can to create a series for each decade\n",
"df_80s = df_top15.loc[:, years_80s].sum(axis=1) \n",
"df_90s = df_top15.loc[:, years_90s].sum(axis=1) \n",
"df_00s = df_top15.loc[:, years_00s].sum(axis=1)\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # merge the three series into a new data frame\n",
"new_df = pd.DataFrame({'1980s': df_80s, '1990s': df_90s, '2000s':df_00s}) \n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # display dataframe\n",
"new_df.head()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's learn more about the statistics associated with the dataframe using the `describe()` method."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1980s</th>\n",
" <th>1990s</th>\n",
" <th>2000s</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>44418.333333</td>\n",
" <td>85594.666667</td>\n",
" <td>97471.533333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>44190.676455</td>\n",
" <td>68237.560246</td>\n",
" <td>100583.204205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>7613.000000</td>\n",
" <td>30028.000000</td>\n",
" <td>13629.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>16698.000000</td>\n",
" <td>39259.000000</td>\n",
" <td>36101.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>30638.000000</td>\n",
" <td>56915.000000</td>\n",
" <td>65794.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>59183.000000</td>\n",
" <td>104451.500000</td>\n",
" <td>105505.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>179171.000000</td>\n",
" <td>261966.000000</td>\n",
" <td>340385.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1980s 1990s 2000s\n",
"count 15.000000 15.000000 15.000000\n",
"mean 44418.333333 85594.666667 97471.533333\n",
"std 44190.676455 68237.560246 100583.204205\n",
"min 7613.000000 30028.000000 13629.000000\n",
"25% 16698.000000 39259.000000 36101.500000\n",
"50% 30638.000000 56915.000000 65794.000000\n",
"75% 59183.000000 104451.500000 105505.500000\n",
"max 179171.000000 261966.000000 340385.000000"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### type your answer here\n",
"new_df.describe()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"new_df.describe()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 3: Plot the box plots."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f196510bfd0>"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"### type your answer here\n",
"new_df.plot(kind='box')\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"new_df.plot(kind='box', figsize=(10, 6))\n",
"-->\n",
"\n",
"<!--\n",
"plt.title('Immigration from top 15 countries for decades 80s, 90s and 2000s')\n",
"-->\n",
"\n",
"<!--\n",
"plt.show()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Note how the box plot differs from the summary table created. The box plot scans the data and identifies the outliers. In order to be an outlier, the data value must be:<br>\n",
"* larger than Q3 by at least 1.5 times the interquartile range (IQR), or,\n",
"* smaller than Q1 by at least 1.5 times the IQR.\n",
"\n",
"Let's look at decade 2000s as an example: <br>\n",
"* Q1 (25%) = 36,101.5 <br>\n",
"* Q3 (75%) = 105,505.5 <br>\n",
"* IQR = Q3 - Q1 = 69,404 <br>\n",
"\n",
"Using the definition of outlier, any value that is greater than Q3 by 1.5 times IQR will be flagged as outlier.\n",
"\n",
"Outlier > 105,505.5 + (1.5 * 69,404) <br>\n",
"Outlier > 209,611.5"
]
},
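{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The same arithmetic can be done programmatically. The sketch below is one possible way to do it, assuming pandas is installed; the helper `iqr_fences` and the toy series `sample` are hypothetical names introduced for illustration and are not part of the lab or of any library.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def iqr_fences(values, whis=1.5):\n",
"    # return the (lower, upper) fences: Q1 - whis * IQR and Q3 + whis * IQR\n",
"    q1 = values.quantile(0.25)\n",
"    q3 = values.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    return q1 - whis * iqr, q3 + whis * iqr\n",
"\n",
"# hypothetical toy data, not the immigration dataset\n",
"sample = pd.Series([1, 2, 3, 4, 100])\n",
"lower, upper = iqr_fences(sample)\n",
"\n",
"# values strictly outside the fences are the ones a box plot would flag\n",
"outliers = sample[(sample < lower) | (sample > upper)]\n",
"print(lower, upper, outliers.tolist())\n",
"```\n",
"\n",
"Applied to the lab's data, `iqr_fences(new_df['2000s'])` would be expected to return the same upper fence as the hand calculation, since `describe()` reports the same quartiles."
]
},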
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1980s</th>\n",
" <th>1990s</th>\n",
" <th>2000s</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>India</th>\n",
" <td>82154</td>\n",
" <td>180395</td>\n",
" <td>303591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>China</th>\n",
" <td>32003</td>\n",
" <td>161528</td>\n",
" <td>340385</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1980s 1990s 2000s\n",
"Country \n",
"India 82154 180395 303591\n",
"China 32003 161528 340385"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# let's check how many entries fall above the outlier threshold \n",
"new_df[new_df['2000s']> 209611.5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"China and India are both considered as outliers since their population for the decade exceeds 209,611.5. \n",
"\n",
"The box plot is an advanced visualizaiton tool, and there are many options and customizations that exceed the scope of this lab. Please refer to [Matplotlib documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot) on box plots for more information."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Scatter Plots <a id=\"10\"></a>\n",
"\n",
"A `scatter plot` (2D) is a useful method of comparing variables against each other. `Scatter` plots look similar to `line plots` in that they both map independent and dependent variables on a 2D graph. While the datapoints are connected together by a line in a line plot, they are not connected in a scatter plot. The data in a scatter plot is considered to express a trend. With further analysis using tools like regression, we can mathematically calculate this relationship and use it to predict trends outside the dataset.\n",
"\n",
"Let's start by exploring the following:\n",
"\n",
"Using a `scatter plot`, let's visualize the trend of total immigrantion to Canada (all countries combined) for the years 1980 - 2013."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Get the dataset. Since we are expecting to use the relationship betewen `years` and `total population`, we will convert `years` to `int` type."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>total</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1980</td>\n",
" <td>99137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1981</td>\n",
" <td>110563</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1982</td>\n",
" <td>104271</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1983</td>\n",
" <td>75550</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1984</td>\n",
" <td>73417</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year total\n",
"0 1980 99137\n",
"1 1981 110563\n",
"2 1982 104271\n",
"3 1983 75550\n",
"4 1984 73417"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can use the sum() method to get the total population per year\n",
"df_tot = pd.DataFrame(df_can[years].sum(axis=0))\n",
"\n",
"# change the years to type int (useful for regression later on)\n",
"df_tot.index = map(int, df_tot.index)\n",
"\n",
"# reset the index to put in back in as a column in the df_tot dataframe\n",
"df_tot.reset_index(inplace = True)\n",
"\n",
"# rename columns\n",
"df_tot.columns = ['year', 'total']\n",
"\n",
"# view the final dataframe\n",
"df_tot.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Plot the data. In `Matplotlib`, we can create a `scatter` plot by passing in `kind='scatter'` as a plot argument. We will also need to pass in the `x` and `y` keywords to specify the columns that go on the x- and the y-axis."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df_tot.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')\n",
"\n",
"plt.title('Total Immigration to Canada from 1980 - 2013')\n",
"plt.xlabel('Year')\n",
"plt.ylabel('Number of Immigrants')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Notice how the scatter plot does not connect the datapoints together. We can clearly observe an upward trend in the data: as the years go by, the total number of immigrants increases. We can mathematically analyze this upward trend using a regression line (line of best fit). "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"So let's try to plot a linear line of best fit, and use it to predict the number of immigrants in 2015.\n",
"\n",
"Step 1: Get the equation of the line of best fit. We will use **NumPy**'s `polyfit()` function by passing in the following:\n",
"- `x`: x-coordinates of the data. \n",
"- `y`: y-coordinates of the data. \n",
"- `deg`: Degree of fitting polynomial. 1 = linear, 2 = quadratic, and so on."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 5.56709228e+03, -1.09261952e+07])"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = df_tot['year'] # year on x-axis\n",
"y = df_tot['total'] # total on y-axis\n",
"fit = np.polyfit(x, y, deg=1)\n",
"\n",
"fit"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The output is an array with the polynomial coefficients, highest powers first. Since we are fitting a linear regression `y = a*x + b`, our output has 2 elements `[5.56709228e+03, -1.09261952e+07]`, with the slope in position 0 and the intercept in position 1. \n",
"\n",
"Step 2: Plot the regression line on the `scatter plot`."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"'No. Immigrants = 5567 * Year + -10926195'"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tot.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')\n",
"\n",
"plt.title('Total Immigration to Canada from 1980 - 2013')\n",
"plt.xlabel('Year')\n",
"plt.ylabel('Number of Immigrants')\n",
"\n",
"# plot line of best fit\n",
"plt.plot(x, fit[0] * x + fit[1], color='red') # recall that x is the Years\n",
"plt.annotate('y={0:.0f} x + {1:.0f}'.format(fit[0], fit[1]), xy=(2000, 150000))\n",
"\n",
"plt.show()\n",
"\n",
"# print out the line of best fit\n",
"'No. Immigrants = {0:.0f} * Year + {1:.0f}'.format(fit[0], fit[1]) "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Using the equation of line of best fit, we can estimate the number of immigrants in 2015:\n",
"```python\n",
"No. Immigrants = 5567 * Year - 10926195\n",
"No. Immigrants = 5567 * 2015 - 10926195\n",
"No. Immigrants = 291,310\n",
"```\n",
"When compared to the actuals from Citizenship and Immigration Canada's (CIC) [2016 Annual Report](http://www.cic.gc.ca/english/resources/publications/annual-report-2016/index.asp), we see that Canada accepted 271,845 immigrants in 2015. Our estimated value of 291,310 is within about 7% of the actual number, which is pretty good considering our original data came from the United Nations (and might differ slightly from CIC data).\n",
"\n",
"As a side note, we can observe that immigration took a dip around 1993 - 1997. Further analysis into the topic revealed that in 1993 Canada introduced Bill C-86, which made mostly restrictive revisions to the refugee determination system. Further amendments to the Immigration Regulations cancelled the sponsorship required for \"assisted relatives\" and reduced the points awarded to them, making it more difficult for family members (other than nuclear family) to immigrate to Canada. These restrictive measures had a direct impact on the immigration numbers for the next several years."
]
},
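{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"As a quick check of the comparison above, the relative difference between the rounded estimate and the reported CIC figure can be worked out directly. This is only the arithmetic on the two numbers quoted in the text, not a new result:\n",
"\n",
"$$\\frac{291{,}310 - 271{,}845}{271{,}845} = \\frac{19{,}465}{271{,}845} \\approx 0.072$$\n",
"\n",
"That is, the estimate overshoots the reported value by roughly 7%."
]
},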
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"**Question**: Create a scatter plot of the total immigration from Denmark, Norway, and Sweden to Canada from 1980 to 2013."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Get the data:\n",
" 1. Create a dataframe that consists of the numbers associated with Denmark, Norway, and Sweden only. Name it **df_countries**.\n",
" 2. Sum the immigration numbers across all three countries for each year and turn the result into a dataframe. Name this new dataframe **df_total**.\n",
" 3. Reset the index in place.\n",
" 4. Rename the columns to **year** and **total**.\n",
" 5. Display the resulting dataframe."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>total</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1980</td>\n",
" <td>669</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1981</td>\n",
" <td>678</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1982</td>\n",
" <td>627</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1983</td>\n",
" <td>333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1984</td>\n",
" <td>252</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year total\n",
"0 1980 669\n",
"1 1981 678\n",
"2 1982 627\n",
"3 1983 333\n",
"4 1984 252"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### type your answer here\n",
"df_countries = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()\n",
"\n",
"df_total = pd.DataFrame(df_countries.sum(axis=1))\n",
"\n",
"df_total.reset_index(inplace=True)\n",
"\n",
"df_total.columns = ['year', 'total']\n",
"\n",
"df_total['year'] = df_total['year'].astype(int)\n",
"\n",
"df_total.head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"\\\\ # create df_countries dataframe\n",
"df_countries = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # create df_total by summing across three countries for each year\n",
"df_total = pd.DataFrame(df_countries.sum(axis=1))\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # reset index in place\n",
"df_total.reset_index(inplace=True)\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # rename columns\n",
"df_total.columns = ['year', 'total']\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # change column year from string to int to create scatter plot\n",
"df_total['year'] = df_total['year'].astype(int)\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # show resulting dataframe\n",
"df_total.head()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Generate the scatter plot by plotting the total versus year in **df_total**."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Number of Immigrants')"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"### type your answer here\n",
"df_total.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')\n",
"\n",
"plt.title('Immigration from Denmark, Norway, and Sweden to Canada from 1980 - 2013')\n",
"plt.xlabel('Year')\n",
"plt.ylabel('Number of Immigrants')\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"\\\\ # generate scatter plot\n",
"df_total.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # add title and label to axes\n",
"plt.title('Immigration from Denmark, Norway, and Sweden to Canada from 1980 - 2013')\n",
"plt.xlabel('Year')\n",
"plt.ylabel('Number of Immigrants')\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # show plot\n",
"plt.show()\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Bubble Plots <a id=\"12\"></a>\n",
"\n",
"A `bubble plot` is a variation of the `scatter plot` that displays three dimensions of data (x, y, z). The datapoints are replaced with bubbles, and the size of each bubble is determined by the third variable 'z', also known as the weight. In `matplotlib`, we can pass an array or a scalar containing the weight of each point to the keyword `s` of `plot()`.\n",
"\n",
"**Let's start by analyzing the effect of Argentina's great depression**.\n",
"\n",
"Argentina suffered a great depression from 1998 - 2002, which caused widespread unemployment, riots, the fall of the government, and a default on the country's foreign debt. In terms of income, over 50% of Argentines were poor, and seven out of ten Argentine children were poor at the depth of the crisis in 2002. \n",
"\n",
"Let's analyze the effect of this crisis, and compare Argentina's immigration to that of its neighbour Brazil. Let's do that using a `bubble plot` of immigration from Brazil and Argentina for the years 1980 - 2013. We will set the weights for the bubbles as the *normalized* value of the population for each year."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Get the data for Brazil and Argentina. As in the previous example, we will convert the `Years` to type int and bring them into the dataframe as a column."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Country</th>\n",
" <th>Year</th>\n",
" <th>Afghanistan</th>\n",
" <th>Albania</th>\n",
" <th>Algeria</th>\n",
" <th>American Samoa</th>\n",
" <th>Andorra</th>\n",
" <th>Angola</th>\n",
" <th>Antigua and Barbuda</th>\n",
" <th>Argentina</th>\n",
" <th>Armenia</th>\n",
" <th>...</th>\n",
" <th>United States of America</th>\n",
" <th>Uruguay</th>\n",
" <th>Uzbekistan</th>\n",
" <th>Vanuatu</th>\n",
" <th>Venezuela (Bolivarian Republic of)</th>\n",
" <th>Viet Nam</th>\n",
" <th>Western Sahara</th>\n",
" <th>Yemen</th>\n",
" <th>Zambia</th>\n",
" <th>Zimbabwe</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1980</td>\n",
" <td>16</td>\n",
" <td>1</td>\n",
" <td>80</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>368</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>9378</td>\n",
" <td>128</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>103</td>\n",
" <td>1191</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1981</td>\n",
" <td>39</td>\n",
" <td>0</td>\n",
" <td>67</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>426</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>10030</td>\n",
" <td>132</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>117</td>\n",
" <td>1829</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>17</td>\n",
" <td>114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1982</td>\n",
" <td>39</td>\n",
" <td>0</td>\n",
" <td>71</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>626</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>9074</td>\n",
" <td>146</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>174</td>\n",
" <td>2162</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>102</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1983</td>\n",
" <td>47</td>\n",
" <td>0</td>\n",
" <td>69</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>241</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7100</td>\n",
" <td>105</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>124</td>\n",
" <td>3404</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>7</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1984</td>\n",
" <td>71</td>\n",
" <td>0</td>\n",
" <td>63</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>42</td>\n",
" <td>237</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>6661</td>\n",
" <td>90</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>142</td>\n",
" <td>7583</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>16</td>\n",
" <td>32</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 196 columns</p>\n",
"</div>"
],
"text/plain": [
"Country Year Afghanistan Albania Algeria American Samoa Andorra Angola \\\n",
"0 1980 16 1 80 0 0 1 \n",
"1 1981 39 0 67 1 0 3 \n",
"2 1982 39 0 71 0 0 6 \n",
"3 1983 47 0 69 0 0 6 \n",
"4 1984 71 0 63 0 0 4 \n",
"\n",
"Country Antigua and Barbuda Argentina Armenia ... \\\n",
"0 0 368 0 ... \n",
"1 0 426 0 ... \n",
"2 0 626 0 ... \n",
"3 0 241 0 ... \n",
"4 42 237 0 ... \n",
"\n",
"Country United States of America Uruguay Uzbekistan Vanuatu \\\n",
"0 9378 128 0 0 \n",
"1 10030 132 0 0 \n",
"2 9074 146 0 0 \n",
"3 7100 105 0 0 \n",
"4 6661 90 0 0 \n",
"\n",
"Country Venezuela (Bolivarian Republic of) Viet Nam Western Sahara Yemen \\\n",
"0 103 1191 0 1 \n",
"1 117 1829 0 2 \n",
"2 174 2162 0 1 \n",
"3 124 3404 0 6 \n",
"4 142 7583 0 0 \n",
"\n",
"Country Zambia Zimbabwe \n",
"0 11 72 \n",
"1 17 114 \n",
"2 11 102 \n",
"3 7 44 \n",
"4 16 32 \n",
"\n",
"[5 rows x 196 columns]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_can_t = df_can[years].transpose() # transposed dataframe\n",
"\n",
"# cast the Years (the index) to type int\n",
"df_can_t.index = map(int, df_can_t.index)\n",
"\n",
"# let's label the index. This will automatically be the column name when we reset the index\n",
"df_can_t.index.name = 'Year'\n",
"\n",
"# reset index to bring the Year in as a column\n",
"df_can_t.reset_index(inplace=True)\n",
"\n",
"# view the changes\n",
"df_can_t.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Create the normalized weights. \n",
"\n",
"There are several normalization methods in statistics, each with its own use. In this case, we will use [feature scaling](https://en.wikipedia.org/wiki/Feature_scaling) to bring all values into the range [0,1]. The general formula is:\n",
"\n",
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Images/Mod3Fig3FeatureScaling.png\" align=\"center\">\n",
"\n",
"where *`X`* is an original value, *`X'`* is the normalized value. The formula sets the max value in the dataset to 1, and sets the min value to 0. The rest of the datapoints are scaled to a value between 0-1 accordingly.\n"
]
},
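{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"For reference, the min-max form of feature scaling described above can be written out as follows. The symbols $X_{\\min}$ and $X_{\\max}$ are notation introduced here for the smallest and largest values in the column being normalized:\n",
"\n",
"$$X' = \\frac{X - X_{\\min}}{X_{\\max} - X_{\\min}}$$\n",
"\n",
"Substituting $X = X_{\\min}$ gives $X' = 0$, and substituting $X = X_{\\max}$ gives $X' = 1$, which matches the behaviour stated in the text."
]
},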
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"scrolled": true
},
"outputs": [],
"source": [
"# normalize Brazil data\n",
"norm_brazil = (df_can_t['Brazil'] - df_can_t['Brazil'].min()) / (df_can_t['Brazil'].max() - df_can_t['Brazil'].min())\n",
"\n",
"# normalize Argentina data\n",
"norm_argentina = (df_can_t['Argentina'] - df_can_t['Argentina'].min()) / (df_can_t['Argentina'].max() - df_can_t['Argentina'].min())"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 3: Plot the data. \n",
"- To plot two different scatter plots in one plot, we can draw the second plot on the axes of the first by passing those axes via the `ax` parameter. \n",
"- We will also pass in the weights using the `s` parameter. Given that the normalized weights are between 0 and 1, the bubbles would be too small to see on the plot. Therefore we will:\n",
" - multiply the weights by 2000 to scale them up on the graph, and,\n",
" - add 10 to compensate for the min value (which has a weight of 0 and would therefore remain 0 after multiplying by 2000)."
]
},
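{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Put as a formula, the two adjustments above map a normalized weight $X' \\in [0, 1]$ to a marker size $s$ (the symbol $s$ here simply mirrors the `s` parameter):\n",
"\n",
"$$s = 2000 \\cdot X' + 10$$\n",
"\n",
"So the year with the fewest immigrants gets $s = 10$ and the year with the most gets $s = 2010$. Note that Matplotlib interprets `s` as the marker area in points squared, so the bubble's area, not its diameter, grows linearly with the weight."
]
},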
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"editable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f196542bb00>"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Brazil\n",
"ax0 = df_can_t.plot(kind='scatter',\n",
" x='Year',\n",
" y='Brazil',\n",
" figsize=(14, 8),\n",
" alpha=0.5, # transparency\n",
" color='green',\n",
" s=norm_brazil * 2000 + 10, # pass in weights \n",
" xlim=(1975, 2015)\n",
" )\n",
"\n",
"# Argentina\n",
"ax1 = df_can_t.plot(kind='scatter',\n",
" x='Year',\n",
" y='Argentina',\n",
" alpha=0.5,\n",
" color=\"blue\",\n",
" s=norm_argentina * 2000 + 10,\n",
" ax = ax0\n",
" )\n",
"\n",
"ax0.set_ylabel('Number of Immigrants')\n",
"ax0.set_title('Immigration from Brazil and Argentina from 1980 - 2013')\n",
"ax0.legend(['Brazil', 'Argentina'], loc='upper left', fontsize='x-large')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The size of the bubble corresponds to the magnitude of immigrating population for that year, compared to the 1980 - 2013 data. The larger the bubble, the more immigrants in that year.\n",
"\n",
"From the plot above, we can see a corresponding increase in immigration from Argentina during the 1998 - 2002 great depression. We can also observe a similar spike around 1985 to 1993. In fact, Argentina had suffered an earlier great depression from 1974 - 1990, before the onset of the 1998 - 2002 great depression. \n",
"\n",
"On a similar note, Brazil suffered the *Samba Effect* where the Brazilian real (currency) dropped nearly 35% in 1999. There was a fear of a South American financial crisis as many South American countries were heavily dependent on industrial exports from Brazil. The Brazilian government subsequently adopted an austerity program, and the economy slowly recovered over the years, culminating in a surge in 2010. The immigration data reflect these events."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"**Question**: Previously in this lab, we created box plots to compare immigration from China and India to Canada. Create bubble plots of immigration from China and India to visualize any differences with time from 1980 to 2013. You can use **df_can_t** that we defined and used in the previous example."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 1: Normalize the data pertaining to China and India."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"### type your answer here\n",
"\n",
"norm_china = (df_can_t['China'] - df_can_t['China'].min()) / (df_can_t['China'].max() - df_can_t['China'].min())\n",
"\n",
"# normalize India data\n",
"norm_india = (df_can_t['India'] - df_can_t['India'].min()) / (df_can_t['India'].max() - df_can_t['India'].min())\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"\\\\ # normalize China data\n",
"norm_china = (df_can_t['China'] - df_can_t['China'].min()) / (df_can_t['China'].max() - df_can_t['China'].min())\n",
"-->\n",
"\n",
"<!--\n",
"# normalize India data\n",
"norm_india = (df_can_t['India'] - df_can_t['India'].min()) / (df_can_t['India'].max() - df_can_t['India'].min())\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Step 2: Generate the bubble plots."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"jupyter": {
"outputs_hidden": false
},
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f1965e53550>"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"### type your answer here\n",
"# China\n",
"ax0 = df_can_t.plot(kind='scatter',\n",
" x='Year',\n",
" y='China',\n",
" figsize=(14, 8),\n",
" alpha=0.5, # transparency\n",
" color='green',\n",
" s=norm_china * 2000 + 10, # pass in weights \n",
" xlim=(1975, 2015)\n",
" )\n",
"\n",
"# India\n",
"ax1 = df_can_t.plot(kind='scatter',\n",
" x='Year',\n",
" y='India',\n",
" alpha=0.5,\n",
" color=\"blue\",\n",
" s=norm_india * 2000 + 10,\n",
" ax = ax0\n",
" )\n",
"\n",
"ax0.set_ylabel('Number of Immigrants')\n",
"ax0.set_title('Immigration from China and India from 1980 - 2013')\n",
"ax0.legend(['China', 'India'], loc='upper left', fontsize='x-large')\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Double-click __here__ for the solution.\n",
"<!-- The correct answer is:\n",
"\\\\ # China\n",
"ax0 = df_can_t.plot(kind='scatter',\n",
" x='Year',\n",
" y='China',\n",
" figsize=(14, 8),\n",
" alpha=0.5, # transparency\n",
" color='green',\n",
" s=norm_china * 2000 + 10, # pass in weights \n",
" xlim=(1975, 2015)\n",
" )\n",
"-->\n",
"\n",
"<!--\n",
"\\\\ # India\n",
"ax1 = df_can_t.plot(kind='scatter',\n",
" x='Year',\n",
" y='India',\n",
" alpha=0.5,\n",
" color=\"blue\",\n",
" s=norm_india * 2000 + 10,\n",
" ax = ax0\n",
" )\n",
"-->\n",
"\n",
"<!--\n",
"ax0.set_ylabel('Number of Immigrants')\n",
"ax0.set_title('Immigration from China and India from 1980 - 2013')\n",
"ax0.legend(['China', 'India'], loc='upper left', fontsize='x-large')\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Thank you for completing this lab!\n",
"\n",
"This notebook was created by [Jay Rajasekharan](https://www.linkedin.com/in/jayrajasekharan) with contributions from [Ehsan M. Kermani](https://www.linkedin.com/in/ehsanmkermani), and [Slobodan Markovic](https://www.linkedin.com/in/slobodan-markovic).\n",
"\n",
"This notebook was recently revamped by [Alex Aklson](https://www.linkedin.com/in/aklson/). I hope you found this lab session interesting. Feel free to contact me if you have any questions!"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"This notebook is part of a course on **edX** called *Visualizing Data with Python*. If you accessed this notebook outside the course, you can take this course online by clicking [here](http://cocl.us/DV0101EN_edX_LAB3)."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"editable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<hr>\n",
"\n",
"Copyright &copy; 2019 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}