@map0logo
Created September 24, 2015 22:14
Extract tabular data from Wikipedia
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extract tabular data from Wikipedia\n",
"\n",
"Using Python 3 and standard library functions\n",
"\n",
"**Francisco Palm**\n",
"\n",
"> ## Learning Objectives\n",
">\n",
"> * Explain what web scraping is and which libraries are used for it.\n",
"> * Extract some tabular data from Wikipedia.\n",
"\n",
"Information on the web is structured as HTML. In particular, some Wikipedia pages have very useful tables with country data and indicators.\n",
"\n",
"There are two basic operations to get data from web pages:\n",
"1. Request web pages from a given URL address.\n",
"2. Extract information from the web pages.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get pages source with `urllib.request`\n",
"\n",
"The first operation is done with the standard library package `urllib`.\n",
"\n",
"We are interested in `urllib.request`, which opens URLs and gets information from them."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import urllib.request"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know that Wikipedia has a page with Human Development Index values by country, so we'll take the data from there."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"url = 'https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index'\n",
"req = urllib.request.Request(url)\n",
"resp = urllib.request.urlopen(req)\n",
"respData = resp.read()\n",
"respData = respData.decode('utf-8')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The response data contains the HTML source of the selected page."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'<!DOCTYPE html>\\n<html lang=\"en\" dir=\"ltr\" class=\"client-nojs\">\\n<head>\\n<meta charset=\"UTF-8\" />\\n<title>List of countries by Human Development Index - Wikipedia, the free encyclopedia</title>\\n<script>document.documentElement.className = document.documentElem'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"respData[:256]"
]
},
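{
"cell_type": "markdown",
"metadata": {},
"source": [
"The request above can be made a little more defensive. The following is a minimal sketch, not part of the original workflow: the helper names `build_request` and `fetch_html`, the `User-Agent` string and the timeout value are all assumptions chosen for illustration. It sends an explicit `User-Agent` header, closes the connection with a `with` block, and decodes using the charset declared by the server, falling back to UTF-8 when none is given."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import urllib.request\n",
"\n",
"def build_request(url, user_agent='hdi-notebook-example/0.1'):\n",
"    # Hypothetical helper: attach an explicit User-Agent (the string is an assumption).\n",
"    return urllib.request.Request(url, headers={'User-Agent': user_agent})\n",
"\n",
"def fetch_html(url, timeout=10):\n",
"    # Hypothetical helper: download a page and return its text.\n",
"    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:\n",
"        # Use the charset from the Content-Type header, or UTF-8 if none is given.\n",
"        charset = resp.headers.get_content_charset() or 'utf-8'\n",
"        return resp.read().decode(charset)\n",
"\n",
"# Building the request does not touch the network; calling fetch_html(url) would.\n",
"req = build_request('https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index')\n",
"# urllib stores header names capitalized, hence 'User-agent'.\n",
"req.get_header('User-agent')"
]
},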
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parse HTML source with `html.parser`\n",
"\n",
"The `html.parser` module from the Python standard library implements the `HTMLParser` class, which reacts to the different tags it encounters. I use `html_table_parser` by *Josua Schmid*, which converts tables into Python lists of lists: https://github.com/schmijos/html-table-parser-python3."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from html_table_parser import HTMLTableParser\n",
"p = HTMLTableParser()\n",
"p.feed(respData)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `pprint()` function from the `pprint` module formats our tables nicely."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[['Rank', 'Country', 'HDI'],\n",
" ['2014 estimates for 2013 [ 1 ]',\n",
" 'Change in rank from previous year [ 1 ]',\n",
" '2014 estimates for 2013 [ 1 ]',\n",
" 'Change from previous year [ 1 ]'],\n",
" ['1', '', 'Norway', '0.944', '0.001'],\n",
" ['2', '', 'Australia', '0.933', '0.002'],\n",
" ['3', '', 'Switzerland', '0.917', '0.001'],\n",
" ['4', '', 'Netherlands', '0.915', ''],\n",
" ['5', '', 'United States', '0.914', '0.002'],\n",
" ['6', '', 'Germany', '0.911', ''],\n",
" ['7', '', 'New Zealand', '0.910', '0.002'],\n",
" ['8', '', 'Canada', '0.902', '0.001'],\n",
" ['9', '(3)', 'Singapore', '0.901', '0.003'],\n",
" ['10', '', 'Denmark', '0.900', ''],\n",
" ['11', '(3)', 'Ireland', '0.899', '0.002'],\n",
" ['12', '(1)', 'Sweden', '0.898', '0.001'],\n",
" ['13', '', 'Iceland', '0.895', '0.002'],\n",
" ['14', '', 'United Kingdom', '0.892', '0.002'],\n",
" ['15', '', 'Hong Kong', '0.891', '0.002'],\n",
" ['15', '(1)', 'Korea, South', '0.891', '0.003'],\n",
" ['17', '(1)', 'Japan', '0.890', '0.002'],\n",
" ['18', '(2)', 'Liechtenstein', '0.889', '0.001'],\n",
" ['19', '', 'Israel', '0.888', '0.002'],\n",
" ['20', '', 'France', '0.884', ''],\n",
" ['21', '', 'Austria', '0.881', '0.001'],\n",
" ['21', '', 'Belgium', '0.881', '0.001'],\n",
" ['21', '', 'Luxembourg', '0.881', '0.001'],\n",
" ['24', '', 'Finland', '0.879', ''],\n",
" ['25', '', 'Slovenia', '0.874', '']]\n"
]
}
],
"source": [
"from pprint import pprint\n",
"pprint(p.tables[2])"
]
},
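{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate how an `HTMLParser` subclass reacts to tags, here is a minimal sketch of a table parser built only on `html.parser`. It is not the implementation of `html_table_parser`; the class name `MiniTableParser` and the sample HTML are made up for illustration, and nested tables, `rowspan` and `colspan` are ignored."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from html.parser import HTMLParser\n",
"\n",
"class MiniTableParser(HTMLParser):\n",
"    # Hypothetical example class: collects each table as a list of rows.\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.tables = []   # one list of rows per <table>\n",
"        self._row = None   # row being built, or None outside <tr>\n",
"        self._cell = None  # text pieces of the current cell, or None\n",
"\n",
"    def handle_starttag(self, tag, attrs):\n",
"        if tag == 'table':\n",
"            self.tables.append([])\n",
"        elif tag == 'tr' and self.tables:\n",
"            self._row = []\n",
"        elif tag in ('td', 'th') and self._row is not None:\n",
"            self._cell = []\n",
"\n",
"    def handle_data(self, data):\n",
"        # Text is kept only while inside a cell.\n",
"        if self._cell is not None:\n",
"            self._cell.append(data)\n",
"\n",
"    def handle_endtag(self, tag):\n",
"        if tag in ('td', 'th') and self._cell is not None:\n",
"            self._row.append(''.join(self._cell).strip())\n",
"            self._cell = None\n",
"        elif tag == 'tr' and self._row is not None:\n",
"            self.tables[-1].append(self._row)\n",
"            self._row = None\n",
"\n",
"sample = '<table><tr><th>Country</th><th>HDI</th></tr><tr><td>Norway</td><td>0.944</td></tr></table>'\n",
"mini = MiniTableParser()\n",
"mini.feed(sample)\n",
"mini.tables"
]
},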
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Manipulate data with `pandas`\n",
"\n",
"Finally we can use `pandas` data frames to manipulate our data."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"headers = ['','','Country','HDI','']\n",
"df = []\n",
"for i in range(2,13):\n",
"    df.append(pd.DataFrame(p.tables[i][2:], columns=headers))"
]
},
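{
"cell_type": "markdown",
"metadata": {},
"source": [
"The indices 2 to 12 used above depend on the page layout at the time it was fetched. As a hedged alternative, tables can be selected by their header row instead. The helper `tables_with_headers` and the toy data below are hypothetical; the header labels follow the first row printed for `p.tables[2]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def tables_with_headers(tables, required):\n",
"    # Hypothetical helper: keep the tables whose first row contains every required label.\n",
"    wanted = set(required)\n",
"    return [t for t in tables if t and wanted.issubset(t[0])]\n",
"\n",
"# Toy stand-in for p.tables: a navigation table, an HDI-like table and an empty table.\n",
"toy_tables = [\n",
"    [['Navigation'], ['Home']],\n",
"    [['Rank', 'Country', 'HDI'], ['1', '', 'Norway', '0.944', '0.001']],\n",
"    [],\n",
"]\n",
"matches = tables_with_headers(toy_tables, ['Country', 'HDI'])\n",
"len(matches)"
]
},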
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>HDI</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td></td>\n",
" <td>Norway</td>\n",
" <td>0.944</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td></td>\n",
" <td>Australia</td>\n",
" <td>0.933</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td></td>\n",
" <td>Switzerland</td>\n",
" <td>0.917</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td></td>\n",
" <td>Netherlands</td>\n",
" <td>0.915</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td></td>\n",
" <td>United States</td>\n",
" <td>0.914</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td></td>\n",
" <td>Germany</td>\n",
" <td>0.911</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td></td>\n",
" <td>New Zealand</td>\n",
" <td>0.910</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td></td>\n",
" <td>Canada</td>\n",
" <td>0.902</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>(3)</td>\n",
" <td>Singapore</td>\n",
" <td>0.901</td>\n",
" <td>0.003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td></td>\n",
" <td>Denmark</td>\n",
" <td>0.900</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>11</td>\n",
" <td>(3)</td>\n",
" <td>Ireland</td>\n",
" <td>0.899</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>12</td>\n",
" <td>(1)</td>\n",
" <td>Sweden</td>\n",
" <td>0.898</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>13</td>\n",
" <td></td>\n",
" <td>Iceland</td>\n",
" <td>0.895</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>14</td>\n",
" <td></td>\n",
" <td>United Kingdom</td>\n",
" <td>0.892</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>15</td>\n",
" <td></td>\n",
" <td>Hong Kong</td>\n",
" <td>0.891</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>15</td>\n",
" <td>(1)</td>\n",
" <td>Korea, South</td>\n",
" <td>0.891</td>\n",
" <td>0.003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>(1)</td>\n",
" <td>Japan</td>\n",
" <td>0.890</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>18</td>\n",
" <td>(2)</td>\n",
" <td>Liechtenstein</td>\n",
" <td>0.889</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>19</td>\n",
" <td></td>\n",
" <td>Israel</td>\n",
" <td>0.888</td>\n",
" <td>0.002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>20</td>\n",
" <td></td>\n",
" <td>France</td>\n",
" <td>0.884</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>21</td>\n",
" <td></td>\n",
" <td>Austria</td>\n",
" <td>0.881</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>21</td>\n",
" <td></td>\n",
" <td>Belgium</td>\n",
" <td>0.881</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>21</td>\n",
" <td></td>\n",
" <td>Luxembourg</td>\n",
" <td>0.881</td>\n",
" <td>0.001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>24</td>\n",
" <td></td>\n",
" <td>Finland</td>\n",
" <td>0.879</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>25</td>\n",
" <td></td>\n",
" <td>Slovenia</td>\n",
" <td>0.874</td>\n",
" <td></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country HDI \n",
"0 1 Norway 0.944 0.001\n",
"1 2 Australia 0.933 0.002\n",
"2 3 Switzerland 0.917 0.001\n",
"3 4 Netherlands 0.915 \n",
"4 5 United States 0.914 0.002\n",
"5 6 Germany 0.911 \n",
"6 7 New Zealand 0.910 0.002\n",
"7 8 Canada 0.902 0.001\n",
"8 9 (3) Singapore 0.901 0.003\n",
"9 10 Denmark 0.900 \n",
"10 11 (3) Ireland 0.899 0.002\n",
"11 12 (1) Sweden 0.898 0.001\n",
"12 13 Iceland 0.895 0.002\n",
"13 14 United Kingdom 0.892 0.002\n",
"14 15 Hong Kong 0.891 0.002\n",
"15 15 (1) Korea, South 0.891 0.003\n",
"16 17 (1) Japan 0.890 0.002\n",
"17 18 (2) Liechtenstein 0.889 0.001\n",
"18 19 Israel 0.888 0.002\n",
"19 20 France 0.884 \n",
"20 21 Austria 0.881 0.001\n",
"21 21 Belgium 0.881 0.001\n",
"22 21 Luxembourg 0.881 0.001\n",
"23 24 Finland 0.879 \n",
"24 25 Slovenia 0.874 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once in data frame format, our data can be easily filtered."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>HDI</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Norway</td>\n",
" <td>0.944</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Australia</td>\n",
" <td>0.933</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Switzerland</td>\n",
" <td>0.917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Netherlands</td>\n",
" <td>0.915</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>0.914</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Germany</td>\n",
" <td>0.911</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>New Zealand</td>\n",
" <td>0.910</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Canada</td>\n",
" <td>0.902</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Singapore</td>\n",
" <td>0.901</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Denmark</td>\n",
" <td>0.900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Ireland</td>\n",
" <td>0.899</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Sweden</td>\n",
" <td>0.898</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Iceland</td>\n",
" <td>0.895</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>United Kingdom</td>\n",
" <td>0.892</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Hong Kong</td>\n",
" <td>0.891</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Korea, South</td>\n",
" <td>0.891</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Japan</td>\n",
" <td>0.890</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Liechtenstein</td>\n",
" <td>0.889</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Israel</td>\n",
" <td>0.888</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>France</td>\n",
" <td>0.884</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Austria</td>\n",
" <td>0.881</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Belgium</td>\n",
" <td>0.881</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>Luxembourg</td>\n",
" <td>0.881</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Finland</td>\n",
" <td>0.879</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>Slovenia</td>\n",
" <td>0.874</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Italy</td>\n",
" <td>0.872</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>0.869</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Czech Republic</td>\n",
" <td>0.861</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Greece</td>\n",
" <td>0.853</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brunei</td>\n",
" <td>0.852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Solomon Islands</td>\n",
" <td>0.491</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Comoros</td>\n",
" <td>0.488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Tanzania</td>\n",
" <td>0.488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Mauritania</td>\n",
" <td>0.487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Lesotho</td>\n",
" <td>0.486</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Senegal</td>\n",
" <td>0.485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Uganda</td>\n",
" <td>0.484</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Benin</td>\n",
" <td>0.476</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Sudan</td>\n",
" <td>0.473</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Togo</td>\n",
" <td>0.473</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Haiti</td>\n",
" <td>0.471</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Afghanistan</td>\n",
" <td>0.468</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Djibouti</td>\n",
" <td>0.467</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Ivory Coast</td>\n",
" <td>0.452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Gambia</td>\n",
" <td>0.441</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Ethiopia</td>\n",
" <td>0.435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Malawi</td>\n",
" <td>0.414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Liberia</td>\n",
" <td>0.412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Mali</td>\n",
" <td>0.407</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Guinea-Bissau</td>\n",
" <td>0.396</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Mozambique</td>\n",
" <td>0.393</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Guinea</td>\n",
" <td>0.392</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Burundi</td>\n",
" <td>0.389</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Burkina Faso</td>\n",
" <td>0.388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Eritrea</td>\n",
" <td>0.381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Sierra Leone</td>\n",
" <td>0.374</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Chad</td>\n",
" <td>0.372</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Central African Republic</td>\n",
" <td>0.341</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Congo, Democratic Republic of the</td>\n",
" <td>0.338</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Niger</td>\n",
" <td>0.337</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>188 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" Country HDI\n",
"0 Norway 0.944\n",
"1 Australia 0.933\n",
"2 Switzerland 0.917\n",
"3 Netherlands 0.915\n",
"4 United States 0.914\n",
"5 Germany 0.911\n",
"6 New Zealand 0.910\n",
"7 Canada 0.902\n",
"8 Singapore 0.901\n",
"9 Denmark 0.900\n",
"10 Ireland 0.899\n",
"11 Sweden 0.898\n",
"12 Iceland 0.895\n",
"13 United Kingdom 0.892\n",
"14 Hong Kong 0.891\n",
"15 Korea, South 0.891\n",
"16 Japan 0.890\n",
"17 Liechtenstein 0.889\n",
"18 Israel 0.888\n",
"19 France 0.884\n",
"20 Austria 0.881\n",
"21 Belgium 0.881\n",
"22 Luxembourg 0.881\n",
"23 Finland 0.879\n",
"24 Slovenia 0.874\n",
"0 Italy 0.872\n",
"1 Spain 0.869\n",
"2 Czech Republic 0.861\n",
"3 Greece 0.853\n",
"4 Brunei 0.852\n",
".. ... ...\n",
"13 Solomon Islands 0.491\n",
"14 Comoros 0.488\n",
"15 Tanzania 0.488\n",
"16 Mauritania 0.487\n",
"17 Lesotho 0.486\n",
"18 Senegal 0.485\n",
"19 Uganda 0.484\n",
"20 Benin 0.476\n",
"21 Sudan 0.473\n",
"0 Togo 0.473\n",
"1 Haiti 0.471\n",
"2 Afghanistan 0.468\n",
"3 Djibouti 0.467\n",
"4 Ivory Coast 0.452\n",
"5 Gambia 0.441\n",
"6 Ethiopia 0.435\n",
"7 Malawi 0.414\n",
"8 Liberia 0.412\n",
"9 Mali 0.407\n",
"10 Guinea-Bissau 0.396\n",
"11 Mozambique 0.393\n",
"12 Guinea 0.392\n",
"13 Burundi 0.389\n",
"14 Burkina Faso 0.388\n",
"15 Eritrea 0.381\n",
"16 Sierra Leone 0.374\n",
"17 Chad 0.372\n",
"18 Central African Republic 0.341\n",
"19 Congo, Democratic Republic of the 0.338\n",
"20 Niger 0.337\n",
"\n",
"[188 rows x 2 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"HDIframe = pd.concat(df)\n",
"HDIframe[['Country', 'HDI']]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"23 0.808\n",
"Name: HDI, dtype: object"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"HDIframe.HDI[HDIframe.Country == 'Argentina']"
]
}
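,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup above reports `dtype: object` because the parsed cells are still strings. A minimal sketch of converting them to numbers with `pd.to_numeric` follows. The rows are copied from the output shown earlier; the column names `Rank`, `RankChange` and `HDIChange` are labels assumed here, since the notebook leaves those columns unnamed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# A few rows as produced by the table parser (all values are strings).\n",
"rows = [\n",
"    ['1', '', 'Norway', '0.944', '0.001'],\n",
"    ['4', '', 'Netherlands', '0.915', ''],\n",
"    ['25', '', 'Slovenia', '0.874', ''],\n",
"]\n",
"# Assumed, unique column labels (not taken from the Wikipedia page).\n",
"columns = ['Rank', 'RankChange', 'Country', 'HDI', 'HDIChange']\n",
"frame = pd.DataFrame(rows, columns=columns)\n",
"\n",
"# Convert numeric columns; empty strings become NaN with errors='coerce'.\n",
"for name in ('Rank', 'HDI', 'HDIChange'):\n",
"    frame[name] = pd.to_numeric(frame[name], errors='coerce')\n",
"\n",
"# Numeric comparison now works on the HDI column.\n",
"high = frame.loc[frame['HDI'] > 0.9, ['Country', 'HDI']]\n",
"high"
]
}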
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}