Skip to content

Instantly share code, notes, and snippets.

@davified
Last active May 16, 2016 03:44
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save davified/a1d98467e03e3813c705f7691e6bd836 to your computer and use it in GitHub Desktop.
Save davified/a1d98467e03e3813c705f7691e6bd836 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Understanding the Panama Papers using pandas and plotly\n",
"\n",
"In this notebook, I use the pandas and plotly libraries to clean, wrangle and visualise the Panama Papers dataset. It is in part to help me learn python, pandas and plotly, and in part to help me satisfy my curiosity about the Panama Papers. Specifically, I hope to answer the following questions: (i) which countries do tax evading companies originate from, (ii) which countries are popular tax havens, and (iii) zooming into Singapore, where do tax evaders live?\n",
"\n",
"Everything I learnt about pandas are from two amazing tutorials by Brandon Rhodes (PyCon 2015) https://www.youtube.com/watch?v=5JnMutdy6Fw&list=PLO9pkowc_99YYMFnW9GZkIE93jbip1jZb&index=4 and Jonathan Rocher (SciPy 2015) https://www.youtube.com/watch?v=0CFFTJUZ2dc.\n",
"\n",
"\n",
"In the process, I learnt how to use pandas to:\n",
"1. Load data\n",
"2. Clean and format data\n",
"3. Transform datasets\n",
"4. Aggregate and summarise data\n",
"\n",
"I also learnt how to use the plotly library for data visualisation and how to import and embed HTML files\n",
"\n",
"Note:\n",
"- This is an exercise in pandas (for data cleaning) and plotly (for visualisation). I haven't learnt about data validation at this point, so please take my conclusions with a pinch of salt.\n",
"- Source: https://offshoreleaks.icij.org/pages/database ICIJ Offshore Leaks Database is licensed under the Open Database License and contents under Creative Commons Attribution-ShareAlike license. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('ggplot')\n",
"import cufflinks as cf\n",
"import plotly.tools as tls\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"from pandas import set_option\n",
"set_option(\"display.max_rows\", 25)\n",
"from IPython.core.display import HTML\n",
"\n",
"\n",
"LARGE_FIGSIZE = (16, 12)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Describing the data\n",
"The data is taken from 5 csv files, which contain data on the (i) entities, (ii) intermediaries, (iii) officers, (iv) addresses of officers, and the fifth file appears to be a file which allows Neo4j to map data across csv files using node_id as a unique identifier.\n",
"\n",
"More details here: https://offshoreleaks.icij.org/pages/about\n",
"Note that this contains only a fraction of the Panama Papers leaks from the Panama-based offices of Mossack Fonseca. Those leaks are made up of 2.6 terabytes of data, a total of 11.5 million records, which are not included en masse in the public database. The raw documents have not been published. The Mossack Fonseca leaks include vast amounts of email correspondence, bank account details, identity documents and financial transactions, and not all the names on those files appear in the database. Much of the information is buried in emails, power-of-attorney letters and internal notes of Mossack Fonseca employees and cannot easily be extracted in a systematic manner, according to the International Consortium of Investigative Journalists (ICIJ).\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"addresses = pd.read_csv(\"data/Addresses.csv\")\n",
"all_edges = pd.read_csv(\"data/all_edges.csv\")\n",
"entities = pd.read_csv(\"data/Entities.csv\")\n",
"intermediaries = pd.read_csv(\"data/Intermediaries.csv\")\n",
"officers = pd.read_csv(\"data/Officers.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1. entities</th>\n",
" <th>2. intermediaries</th>\n",
" <th>3. officers</th>\n",
" <th>4. addresses</th>\n",
" <th>5. all_edges</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>name</td>\n",
" <td>name</td>\n",
" <td>name</td>\n",
" <td>address</td>\n",
" <td>node_1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>original_name</td>\n",
" <td>internal_id</td>\n",
" <td>icij_id</td>\n",
" <td>icij_id</td>\n",
" <td>rel_type</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>former_name</td>\n",
" <td>address</td>\n",
" <td>valid_until</td>\n",
" <td>valid_until</td>\n",
" <td>node_2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>jurisdiction</td>\n",
" <td>valid_until</td>\n",
" <td>country_codes</td>\n",
" <td>country_codes</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>jurisdiction_description</td>\n",
" <td>country_codes</td>\n",
" <td>countries</td>\n",
" <td>countries</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>company_type</td>\n",
" <td>countries</td>\n",
" <td>node_id</td>\n",
" <td>node_id</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>address</td>\n",
" <td>status</td>\n",
" <td>sourceID</td>\n",
" <td>sourceID</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>internal_id</td>\n",
" <td>node_id</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>incorporation_date</td>\n",
" <td>sourceID</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>inactivation_date</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>struck_off_date</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>dorm_date</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>status</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>service_provider</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>ibcRUC</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>country_codes</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>countries</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>note</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>valid_until</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>node_id</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1. entities 2. intermediaries 3. officers 4. addresses \\\n",
"0 name name name address \n",
"1 original_name internal_id icij_id icij_id \n",
"2 former_name address valid_until valid_until \n",
"3 jurisdiction valid_until country_codes country_codes \n",
"4 jurisdiction_description country_codes countries countries \n",
"5 company_type countries node_id node_id \n",
"6 address status sourceID sourceID \n",
"7 internal_id node_id NaN NaN \n",
"8 incorporation_date sourceID NaN NaN \n",
"9 inactivation_date NaN NaN NaN \n",
"10 struck_off_date NaN NaN NaN \n",
"11 dorm_date NaN NaN NaN \n",
"12 status NaN NaN NaN \n",
"13 service_provider NaN NaN NaN \n",
"14 ibcRUC NaN NaN NaN \n",
"15 country_codes NaN NaN NaN \n",
"16 countries NaN NaN NaN \n",
"17 note NaN NaN NaN \n",
"18 valid_until NaN NaN NaN \n",
"19 node_id NaN NaN NaN \n",
"\n",
" 5. all_edges \n",
"0 node_1 \n",
"1 rel_type \n",
"2 node_2 \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"10 NaN \n",
"11 NaN \n",
"12 NaN \n",
"13 NaN \n",
"14 NaN \n",
"15 NaN \n",
"16 NaN \n",
"17 NaN \n",
"18 NaN \n",
"19 NaN "
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_data = {'1. entities' : pd.Series(entities.columns.values),\n",
" '2. intermediaries' : pd.Series(intermediaries.columns.values),\n",
" '3. officers': pd.Series(officers.columns.values),\n",
" '4. addresses' : pd.Series(addresses.columns.values),\n",
" '5. all_edges': pd.Series(all_edges.columns.values)}\n",
"\n",
"data_overview = pd.DataFrame(all_data)\n",
"data_overview.head(20)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Exploring the entities dataframe\n"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(319150, 21)"
]
},
"execution_count": 146,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entities.shape"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 319150 entries, 0 to 319149\n",
"Data columns (total 21 columns):\n",
"name 319146 non-null object\n",
"original_name 213634 non-null object\n",
"former_name 6454 non-null object\n",
"jurisdiction 319150 non-null object\n",
"jurisdiction_description 319150 non-null object\n",
"company_type 103227 non-null object\n",
"address 299319 non-null object\n",
"internal_id 213634 non-null float64\n",
"incorporation_date 309736 non-null object\n",
"inactivation_date 144760 non-null object\n",
"struck_off_date 157872 non-null object\n",
"dorm_date 20207 non-null object\n",
"status 310106 non-null object\n",
"service_provider 319150 non-null object\n",
"ibcRUC 255375 non-null object\n",
"country_codes 318360 non-null object\n",
"countries 318360 non-null object\n",
"note 8984 non-null object\n",
"valid_until 319150 non-null object\n",
"node_id 319150 non-null int64\n",
"sourceID 319150 non-null object\n",
"dtypes: float64(1), int64(1), object(19)\n",
"memory usage: 51.1+ MB\n"
]
}
],
"source": [
"entities.info()"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index([u'name', u'original_name', u'former_name', u'jurisdiction',\n",
" u'jurisdiction_description', u'company_type', u'address',\n",
" u'internal_id', u'incorporation_date', u'inactivation_date',\n",
" u'struck_off_date', u'dorm_date', u'status', u'service_provider',\n",
" u'ibcRUC', u'country_codes', u'countries', u'note', u'valid_until',\n",
" u'node_id', u'sourceID'],\n",
" dtype='object')"
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entities.columns"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>address</th>\n",
" <th>jurisdiction_description</th>\n",
" <th>countries</th>\n",
" <th>node_id</th>\n",
" <th>internal_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ZODIAK LTD</td>\n",
" <td>Christabel Corporate Services Limited Christab...</td>\n",
" <td>Undetermined</td>\n",
" <td>Virgin Islands, British;Cyprus</td>\n",
" <td>67028</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zven Ltd.</td>\n",
" <td>Alesta Consulting, S.A. INVOICE BY EMAIL ONLY</td>\n",
" <td>Undetermined</td>\n",
" <td>Not identified;Virgin Islands, British</td>\n",
" <td>67243</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Anson 11A Ltd.</td>\n",
" <td>Express Co Registration Pte Ltd 138 Cecil Stre...</td>\n",
" <td>Undetermined</td>\n",
" <td>Singapore;Virgin Islands, British</td>\n",
" <td>67258</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>GIADA LTD</td>\n",
" <td>Christabel Corporate Services Limited Christab...</td>\n",
" <td>Undetermined</td>\n",
" <td>Cyprus;Virgin Islands, British</td>\n",
" <td>67266</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Scott D. Howard</td>\n",
" <td>Scott D. Howard P.O. Box 811 Brunswick ME 0401...</td>\n",
" <td>Undetermined</td>\n",
" <td>United States;Not identified</td>\n",
" <td>108050</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name address \\\n",
"0 ZODIAK LTD Christabel Corporate Services Limited Christab... \n",
"1 Zven Ltd. Alesta Consulting, S.A. INVOICE BY EMAIL ONLY \n",
"2 Anson 11A Ltd. Express Co Registration Pte Ltd 138 Cecil Stre... \n",
"3 GIADA LTD Christabel Corporate Services Limited Christab... \n",
"4 Scott D. Howard Scott D. Howard P.O. Box 811 Brunswick ME 0401... \n",
"\n",
" jurisdiction_description countries node_id \\\n",
"0 Undetermined Virgin Islands, British;Cyprus 67028 \n",
"1 Undetermined Not identified;Virgin Islands, British 67243 \n",
"2 Undetermined Singapore;Virgin Islands, British 67258 \n",
"3 Undetermined Cyprus;Virgin Islands, British 67266 \n",
"4 Undetermined United States;Not identified 108050 \n",
"\n",
" internal_id \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN "
]
},
"execution_count": 150,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#from the above, the columns of interest are: name, jurisdiction_description, countries, \n",
"#node_id and internal_id. The last 3 columns may be useful later on as an index to reference other csv files.\n",
"\n",
"entities_subset = entities[[u'name', u'address', u'jurisdiction_description', u'countries', u'node_id',\n",
" u'internal_id']]\n",
"entities_subset.head()"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 True\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 True\n",
"10 False\n",
"11 False\n",
" ... \n",
"319138 False\n",
"319139 False\n",
"319140 False\n",
"319141 False\n",
"319142 False\n",
"319143 False\n",
"319144 False\n",
"319145 False\n",
"319146 False\n",
"319147 False\n",
"319148 False\n",
"319149 False\n",
"dtype: bool"
]
},
"execution_count": 151,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#I'm not sure what jurisdiction and countries mean - which is the source and which is the destination? Let's find\n",
"#out if they are identical.\n",
"\n",
"entities.jurisdiction == entities.country_codes\n",
"\n",
"# The output shows that they are not the same. Following some exploratory data analyses (which I've excluded from\n",
"#here for brevity), I found out that (i) jurisdiction refer to the tax havens, and (ii) the distinction between \n",
"#source country and tax havens is not so clear cut. For instance, based on the second chart below, you'll see that\n",
"#several thousands of companies investing into the British Virgin Islands are from British Virgin islands."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Country1 Country2 Country3 Country4\n",
"0 Cyprus Virgin Islands, British NaN NaN\n",
"1 Virgin Islands, British Not identified NaN NaN\n",
"2 Virgin Islands, British Singapore NaN NaN\n",
"3 Virgin Islands, British Cyprus NaN NaN\n",
"4 Not identified United States NaN NaN\n",
"5 Not identified NaN NaN NaN\n",
"6 Virgin Islands, British Russian Federation NaN NaN\n",
"7 Netherlands Not identified NaN NaN\n",
"... ... ... ... ...\n",
"319142 Andorra NaN NaN NaN\n",
"319143 Isle of Man NaN NaN NaN\n",
"319144 Hong Kong NaN NaN NaN\n",
"319145 Hong Kong NaN NaN NaN\n",
"319146 Hong Kong NaN NaN NaN\n",
"319147 Hong Kong NaN NaN NaN\n",
"319148 United Arab Emirates NaN NaN NaN\n",
"319149 United Arab Emirates NaN NaN NaN\n",
"\n",
"[318360 rows x 4 columns]\n"
]
}
],
"source": [
"#The country_codes and countries column sometimes contain more than 1 element. This may lead to undercounting. Let's\n",
"#fix this.\n",
"\n",
"split = lambda x: pd.Series([i for i in reversed(x.split(';'))])\n",
"countries_split = entities_subset['countries'].dropna().apply(split)\n",
"\n",
"countries_split.rename(columns={0:'Country1',1:'Country2',2:'Country3', 3: 'Country4'},inplace=True)\n",
"countries_split = countries_split[['Country1', 'Country2', 'Country3', 'Country4']]\n",
"print countries_split"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Virgin Islands, British 4\n",
"Cayman Islands 3\n",
"Singapore 2\n",
"Hong Kong 2\n",
"Not identified 2\n",
"Seychelles 1\n",
"United Kingdom 1\n",
"Curaçao 1\n",
"Name: Country3, dtype: int64\n",
"Virgin Islands, British 1\n",
"Name: Country4, dtype: int64\n"
]
}
],
"source": [
"print countries_split.Country3.value_counts()\n",
"print countries_split.Country4.value_counts()\n",
"\n",
"# For the sake of simplicity, I am going to drop Country3 and Country4, since countries only appear for a total of\n",
"#20 times (less than 0.00001% of our dataset) in these 2 columns."
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Dropping two columns and merging this back to the main dataframe, and then dropping the country_codes column\n",
"countries_split.drop(['Country3', 'Country4'], axis = 1)\n",
"entities_subset[['Country1', 'Country2']] = countries_split[['Country1', 'Country2']]"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>address</th>\n",
" <th>jurisdiction_description</th>\n",
" <th>node_id</th>\n",
" <th>internal_id</th>\n",
" <th>Country2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ZODIAK LTD</td>\n",
" <td>Christabel Corporate Services Limited Christab...</td>\n",
" <td>Undetermined</td>\n",
" <td>67028</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zven Ltd.</td>\n",
" <td>Alesta Consulting, S.A. INVOICE BY EMAIL ONLY</td>\n",
" <td>Undetermined</td>\n",
" <td>67243</td>\n",
" <td>NaN</td>\n",
" <td>Not identified</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Anson 11A Ltd.</td>\n",
" <td>Express Co Registration Pte Ltd 138 Cecil Stre...</td>\n",
" <td>Undetermined</td>\n",
" <td>67258</td>\n",
" <td>NaN</td>\n",
" <td>Singapore</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>GIADA LTD</td>\n",
" <td>Christabel Corporate Services Limited Christab...</td>\n",
" <td>Undetermined</td>\n",
" <td>67266</td>\n",
" <td>NaN</td>\n",
" <td>Cyprus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Scott D. Howard</td>\n",
" <td>Scott D. Howard P.O. Box 811 Brunswick ME 0401...</td>\n",
" <td>Undetermined</td>\n",
" <td>108050</td>\n",
" <td>NaN</td>\n",
" <td>United States</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name address \\\n",
"0 ZODIAK LTD Christabel Corporate Services Limited Christab... \n",
"1 Zven Ltd. Alesta Consulting, S.A. INVOICE BY EMAIL ONLY \n",
"2 Anson 11A Ltd. Express Co Registration Pte Ltd 138 Cecil Stre... \n",
"3 GIADA LTD Christabel Corporate Services Limited Christab... \n",
"4 Scott D. Howard Scott D. Howard P.O. Box 811 Brunswick ME 0401... \n",
"\n",
" jurisdiction_description node_id internal_id Country2 \n",
"0 Undetermined 67028 NaN British Virgin Islands \n",
"1 Undetermined 67243 NaN Not identified \n",
"2 Undetermined 67258 NaN Singapore \n",
"3 Undetermined 67266 NaN Cyprus \n",
"4 Undetermined 108050 NaN United States "
]
},
"execution_count": 142,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entities_subset = entities_subset.drop('countries', axis = 1)\n",
"entities_subset.head()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>address</th>\n",
" <th>jurisdiction_description</th>\n",
" <th>node_id</th>\n",
" <th>internal_id</th>\n",
" <th>Country1</th>\n",
" <th>Country2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ZODIAK LTD</td>\n",
" <td>Christabel Corporate Services Limited Christab...</td>\n",
" <td>Undetermined</td>\n",
" <td>67028</td>\n",
" <td>NaN</td>\n",
" <td>Cyprus</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zven Ltd.</td>\n",
" <td>Alesta Consulting, S.A. INVOICE BY EMAIL ONLY</td>\n",
" <td>Undetermined</td>\n",
" <td>67243</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>Not identified</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Anson 11A Ltd.</td>\n",
" <td>Express Co Registration Pte Ltd 138 Cecil Stre...</td>\n",
" <td>Undetermined</td>\n",
" <td>67258</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>Singapore</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>GIADA LTD</td>\n",
" <td>Christabel Corporate Services Limited Christab...</td>\n",
" <td>Undetermined</td>\n",
" <td>67266</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>Cyprus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Scott D. Howard</td>\n",
" <td>Scott D. Howard P.O. Box 811 Brunswick ME 0401...</td>\n",
" <td>Undetermined</td>\n",
" <td>108050</td>\n",
" <td>NaN</td>\n",
" <td>Not identified</td>\n",
" <td>United States</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name address \\\n",
"0 ZODIAK LTD Christabel Corporate Services Limited Christab... \n",
"1 Zven Ltd. Alesta Consulting, S.A. INVOICE BY EMAIL ONLY \n",
"2 Anson 11A Ltd. Express Co Registration Pte Ltd 138 Cecil Stre... \n",
"3 GIADA LTD Christabel Corporate Services Limited Christab... \n",
"4 Scott D. Howard Scott D. Howard P.O. Box 811 Brunswick ME 0401... \n",
"\n",
" jurisdiction_description node_id internal_id Country1 \\\n",
"0 Undetermined 67028 NaN Cyprus \n",
"1 Undetermined 67243 NaN British Virgin Islands \n",
"2 Undetermined 67258 NaN British Virgin Islands \n",
"3 Undetermined 67266 NaN British Virgin Islands \n",
"4 Undetermined 108050 NaN Not identified \n",
"\n",
" Country2 \n",
"0 British Virgin Islands \n",
"1 Not identified \n",
"2 Singapore \n",
"3 Cyprus \n",
"4 United States "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Let's move on to harmonise the references to British Virgin Islands (in country1 they are referred to as\n",
"#Virgin Islands, British).\n",
"\n",
"entities_subset = entities_subset.replace(to_replace='Virgin Islands, British', value='British Virgin Islands')\n",
"entities_subset.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The data is cleaner now, so let's prepare it for visualisation!\n",
"\n",
"Based on what's available in the entities_subset dataframe, I think we can visualise the following: (i) top countries with companies found in the Panama Papers, (ii) breakdown of these top countries by destination countries, (iii) the breakdown in source and destination countries (using a heatmap), and (iv) zooming into Singapore, we can use the address column to visualise where these companies are located in Singapore (I will enlist the help of Google Maps for this last visualisation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Top 20 countries with companies found in the Panama Papers"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"British Virgin Islands 69087.0\n",
"Hong Kong 51293.0\n",
"Switzerland 38077.0\n",
"Not identified 25698.0\n",
"Panama 18122.0\n",
"United Kingdom 17972.0\n",
"Jersey 14562.0\n",
"Russian Federation 11516.0\n",
" ... \n",
"Cyprus 6374.0\n",
"United States 6254.0\n",
"Singapore 5867.0\n",
"Bahamas 5021.0\n",
"Uruguay 4906.0\n",
"Isle of Man 4893.0\n",
"China 4188.0\n",
"Monaco 3168.0\n",
"Name: total, dtype: float64"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#First, let's build a dataframe for visualisation\n",
"\n",
"source_country = entities_subset[['Country1', 'Country2']]\n",
"rank = source_country.apply(pd.Series.value_counts).fillna(0)\n",
"rank['total'] = rank.sum(axis = 1)\n",
"rank = rank.sort_values(by = 'total', ascending = False).total\n",
"\n",
"top_20 = rank[:20]\n",
"top_20"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https://plot.ly/~davified/18.embed\" height=\"587px\" width=\"1116px\"></iframe>"
],
"text/plain": [
"<plotly.tools.PlotlyDisplay object>"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Let's visualise the data! I'll the plotly cufflinks library (why not? it's interactive! try clicking on the chart!)\n",
"\n",
"top_20.iplot(kind ='bar', yTitle = 'Number of Companies', title = 'Top 20 Countries (Source) Listed in Panama Papers')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Breakdown of top countries by destination countries (i.e. jurisdictions)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:4: SettingWithCopyWarning:\n",
"\n",
"\n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
"\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country1</th>\n",
" <th>Country2</th>\n",
" <th>jurisdiction</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Cyprus</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>British Virgin Islands</td>\n",
" <td>Not identified</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>British Virgin Islands</td>\n",
" <td>Singapore</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>British Virgin Islands</td>\n",
" <td>Cyprus</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Not identified</td>\n",
" <td>United States</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Not identified</td>\n",
" <td>NaN</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>British Virgin Islands</td>\n",
" <td>Russian Federation</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Netherlands</td>\n",
" <td>Not identified</td>\n",
" <td>Undetermined</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319142</th>\n",
" <td>Andorra</td>\n",
" <td>NaN</td>\n",
" <td>Hong Kong</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319143</th>\n",
" <td>Isle of Man</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319144</th>\n",
" <td>Hong Kong</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319145</th>\n",
" <td>Hong Kong</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319146</th>\n",
" <td>Hong Kong</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319147</th>\n",
" <td>Hong Kong</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319148</th>\n",
" <td>United Arab Emirates</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319149</th>\n",
" <td>United Arab Emirates</td>\n",
" <td>NaN</td>\n",
" <td>British Virgin Islands</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>319150 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Country1 Country2 jurisdiction\n",
"0 Cyprus British Virgin Islands Undetermined\n",
"1 British Virgin Islands Not identified Undetermined\n",
"2 British Virgin Islands Singapore Undetermined\n",
"3 British Virgin Islands Cyprus Undetermined\n",
"4 Not identified United States Undetermined\n",
"5 Not identified NaN Undetermined\n",
"6 British Virgin Islands Russian Federation Undetermined\n",
"7 Netherlands Not identified Undetermined\n",
"... ... ... ...\n",
"319142 Andorra NaN Hong Kong\n",
"319143 Isle of Man NaN British Virgin Islands\n",
"319144 Hong Kong NaN British Virgin Islands\n",
"319145 Hong Kong NaN British Virgin Islands\n",
"319146 Hong Kong NaN British Virgin Islands\n",
"319147 Hong Kong NaN British Virgin Islands\n",
"319148 United Arab Emirates NaN British Virgin Islands\n",
"319149 United Arab Emirates NaN British Virgin Islands\n",
"\n",
"[319150 rows x 3 columns]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#First, let's structure our data into a 2-level dataframe, so that it can be used to create a stacked bar chart.\n",
"#Caveat - This section is slightly more messy and I wish there were less trial and error in my approach!\n",
"\n",
"source_country['jurisdiction'] = entities_subset['jurisdiction_description']\n",
"source_country"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country1</th>\n",
" <th>jurisdiction</th>\n",
" <th>total_companies</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Albania</td>\n",
" <td>Bahamas</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>American Samoa</td>\n",
" <td>United Kingdom</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Andorra</td>\n",
" <td>Bahamas</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Andorra</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Andorra</td>\n",
" <td>Costa Rica</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Andorra</td>\n",
" <td>Hong Kong</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Andorra</td>\n",
" <td>Nevada</td>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Andorra</td>\n",
" <td>Niue</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>947</th>\n",
" <td>Virgin Islands, U.S.</td>\n",
" <td>Bahamas</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>948</th>\n",
" <td>Virgin Islands, U.S.</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>949</th>\n",
" <td>Virgin Islands, U.S.</td>\n",
" <td>Panama</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>950</th>\n",
" <td>Virgin Islands, U.S.</td>\n",
" <td>Undetermined</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>951</th>\n",
" <td>Yemen</td>\n",
" <td>Panama</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>952</th>\n",
" <td>Zambia</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>953</th>\n",
" <td>Zimbabwe</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>954</th>\n",
" <td>Zimbabwe</td>\n",
" <td>Panama</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>955 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Country1 jurisdiction total_companies\n",
"0 Albania Bahamas 2\n",
"1 American Samoa United Kingdom 1\n",
"2 Andorra Bahamas 26\n",
"3 Andorra British Virgin Islands 39\n",
"4 Andorra Costa Rica 3\n",
"5 Andorra Hong Kong 9\n",
"6 Andorra Nevada 24\n",
"7 Andorra Niue 12\n",
".. ... ... ...\n",
"947 Virgin Islands, U.S. Bahamas 2\n",
"948 Virgin Islands, U.S. British Virgin Islands 1\n",
"949 Virgin Islands, U.S. Panama 1\n",
"950 Virgin Islands, U.S. Undetermined 7\n",
"951 Yemen Panama 1\n",
"952 Zambia British Virgin Islands 2\n",
"953 Zimbabwe British Virgin Islands 5\n",
"954 Zimbabwe Panama 3\n",
"\n",
"[955 rows x 3 columns]"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source_country_2_level_counts = source_country.groupby(['Country1', 'jurisdiction']).size().reset_index()\n",
"source_country_2_level_counts.rename(columns={0:'total_companies'},inplace=True)\n",
"source_country_2_level_counts"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country1</th>\n",
" <th>jurisdiction</th>\n",
" <th>total_companies</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>117</th>\n",
" <td>British Virgin Islands</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>37843</td>\n",
" </tr>\n",
" <tr>\n",
" <th>358</th>\n",
" <td>Hong Kong</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>27300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>805</th>\n",
" <td>Switzerland</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>19650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>634</th>\n",
" <td>Not identified</td>\n",
" <td>Undetermined</td>\n",
" <td>18193</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124</th>\n",
" <td>British Virgin Islands</td>\n",
" <td>Undetermined</td>\n",
" <td>14817</td>\n",
" </tr>\n",
" <tr>\n",
" <th>811</th>\n",
" <td>Switzerland</td>\n",
" <td>Panama</td>\n",
" <td>11590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>448</th>\n",
" <td>Jersey</td>\n",
" <td>British Virgin Islands</td>\n",
" <td>10467</td>\n",
" </tr>\n",
" <tr>\n",
" <th>735</th>\n",
" <td>Samoa</td>\n",
" <td>Samoa</td>\n",
" <td>7677</td>\n",
" </tr>\n",
" <tr>\n",
" <th>370</th>\n",
" <td>Hong Kong</td>\n",
" <td>Undetermined</td>\n",
" <td>6110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>651</th>\n",
" <td>Panama</td>\n",
" <td>Panama</td>\n",
" <td>6090</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country1 jurisdiction total_companies\n",
"117 British Virgin Islands British Virgin Islands 37843\n",
"358 Hong Kong British Virgin Islands 27300\n",
"805 Switzerland British Virgin Islands 19650\n",
"634 Not identified Undetermined 18193\n",
"124 British Virgin Islands Undetermined 14817\n",
"811 Switzerland Panama 11590\n",
"448 Jersey British Virgin Islands 10467\n",
"735 Samoa Samoa 7677\n",
"370 Hong Kong Undetermined 6110\n",
"651 Panama Panama 6090"
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source_country_2_level_counts_sorted = source_country_2_level_counts.sort_values(by=['total_companies'], ascending = False)\n",
"source_country_2_level_counts_sorted.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#In the interest of time, I will export the data for visualisation in plotly. \n",
"source_country_2_level_counts_sorted.to_csv(\"data/country_jurisdiction_sorted.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https://plot.ly/~davified/17.embed\" height=\"400\" width=\"100%\"></iframe>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#I've visualised the data in plotly (it was a breeze!), and it even allows me to embed an interactive chart in\n",
"#Jupyter notebook! WOW!)\n",
"HTML('<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https://plot.ly/~davified/17.embed\" height=\"400\" width=\"100%\"></iframe>')\n",
"\n",
"#Note: You can zoom in by dragging and selecting the section of interest. Double click to zoom out"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. A heatmap representing the number of companies found in each country, broken down by destination jurisdictions"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<iframe src=\"https://docs.google.com/spreadsheets/d/1kqYZrG7GPBPDWsJWBWlBHBV7DCt9kFaWNi83VkVOZac/pubhtml?gid=1600001876&amp;single=true&amp;widget=true&amp;headers=false\" width=1000 height=500></iframe>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 133,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#I exported the source_country_2_level_counts_sorted dataframe to create a heatmap on Google Sheets.\n",
"\n",
"HTML('<iframe src=\"https://docs.google.com/spreadsheets/d/1kqYZrG7GPBPDWsJWBWlBHBV7DCt9kFaWNi83VkVOZac/pubhtml?gid=1600001876&amp;single=true&amp;widget=true&amp;headers=false\" width=1000 height=500></iframe>')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Visualising the locations of the Singapore-based companies found in the Panama Papers (with the help of Google Maps)"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<iframe src=\"https://www.google.com/maps/d/embed?mid=1nrJFPuC2TWjS3vVHQbPCnU2jmFQ\" width=\"1000\" height=\"500\"></iframe>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 139,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"HTML('<iframe src=\"https://www.google.com/maps/d/embed?mid=1nrJFPuC2TWjS3vVHQbPCnU2jmFQ\" width=\"1000\" height=\"500\"></iframe>')\n",
"\n",
"#Hover over the placemarkers to see details of the company"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"###About Python\n",
"\n",
"1. Pandas is really powerful and fast\n",
"2. Plotly is really intuitive and it syncs well with Python and Jupyter Notebook as well\n",
"\n",
"###About the Panama Papers\n",
"\n",
"Based on the exploratory data analyses above, I've found out that:\n",
"1. The top 5 source countries of such shady dealings are: British Virgin Islands, Hong Kong, Switzerland, \"Not Identified\" and Panama. I guess the lawyers and accountants have learnt to hide their trail well. Apart from the clear suspects (BVI, Panama), there is a big chunk which is \"Not Identified\" - and this makes it quite hard to conclusively identify tax havens.\n",
"2. Even though Singapore is often touted as a possible tax haven, the figures show that the degree to which it houses companies related to the Panama Papers is much lesser as compared to Hong Kong, Switzerland, British Virgin islands. That said, as a source country, it is ranked 15th (5,867 companies), right after the United States (6,254 companies). As a jurisdiction, it is ranked 11th (668 companies), right after Nevada (1,260 companies) and Hong Kong (1,331).\n",
"\n",
"Do explore the charts above yourself and let me know what else you can conclude from them :-)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment