Skip to content

Instantly share code, notes, and snippets.

@psychemedia
Last active December 5, 2020 14:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save psychemedia/30a9353b6c0bcf2dc866 to your computer and use it in GitHub Desktop.
Save psychemedia/30a9353b6c0bcf2dc866 to your computer and use it in GitHub Desktop.
A quick conversation with data downloaded from 360 Giving http://lin-360giving.aptivate.org/
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "",
"signature": "sha256:f08d5ad4dcbf131e6fbbbf77195d1d27cdbade203f7aa3c23a4dca175549b133"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
" 360 Giving Grant Navigator Data Conversation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Earlier today, via @OwenBoswarva, I noticed this post announcing [A big day for charity data](http://guest.thirdsector.co.uk/2014/08/14/a-big-day-for-charity-data/) and the release of \"details of more than a quarter of a million different grants covering the past 20 years\" by [360 Giving](http://lin-360giving.aptivate.org/).\n",
"\n",
"The site aggrgegates grant funding donations from a variety of sources and makes them available via an online database and a downloadable CSV document containing about 230,000 rows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!wget http://lin-360giving.aptivate.org/static/uploads/all_grants.csv.gz -P data\n",
"!gunzip \"data/all_grants.csv.gz\" -d data"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"--2014-08-14 18:35:54-- http://lin-360giving.aptivate.org/static/uploads/all_grants.csv.gz\r\n",
"Resolving lin-360giving.aptivate.org (lin-360giving.aptivate.org)... "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"85.159.210.79\r\n",
"Connecting to lin-360giving.aptivate.org (lin-360giving.aptivate.org)|85.159.210.79|:80... connected.\r\n",
"HTTP request sent, awaiting response... 200 OK\r\n",
"Length: 22082694 (21M) [application/x-gzip]\r\n",
"Saving to: 'data/all_grants.csv.gz'\r\n",
"\r\n",
"\r",
" 0% [ ] 0 --.-K/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 1% [ ] 407,072 1.92MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 3% [> ] 855,072 2.02MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 5% [=> ] 1,210,672 1.84MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 7% [=> ] 1,672,672 1.93MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"10% [==> ] 2,239,672 2.06MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"12% [====> ] 2,836,072 2.19MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"15% [=====> ] 3,508,072 2.32MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"18% [======> ] 3,986,304 2.32MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"20% [=======> ] 4,589,704 2.38MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"22% [=======> ] 4,945,304 2.30MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"25% [=========> ] 5,691,504 2.41MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"29% [==========> ] 6,437,704 2.50MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"33% [===========> ] 7,360,304 2.63MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"37% [=============> ] 8,245,104 2.71MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"40% [==============> ] 8,873,704 2.73MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"43% [===============> ] 9,579,304 2.74MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"49% [==================> ] 10,983,504 3.09MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"54% [====================> ] 12,123,104 3.25MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"58% [=====================> ] 12,821,704 3.34MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"61% [======================> ] 13,513,096 3.34MB/s eta 3s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"65% [========================> ] 14,365,696 3.42MB/s eta 3s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"68% [=========================> ] 15,074,304 3.45MB/s eta 3s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"72% [===========================> ] 15,918,504 3.48MB/s eta 3s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"75% [============================> ] 16,729,104 3.56MB/s eta 3s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"79% [=============================> ] 17,536,904 3.61MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"82% [===============================> ] 18,276,104 3.72MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"85% [================================> ] 18,798,304 3.68MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"87% [=================================> ] 19,317,704 3.60MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"89% [=================================> ] 19,788,104 3.47MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"91% [==================================> ] 20,278,104 3.43MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"94% [===================================> ] 20,798,904 3.38MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"97% [====================================> ] 21,447,104 3.38MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"99% [=====================================> ] 21,906,304 3.19MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"100%[======================================>] 22,082,694 3.06MB/s in 6.9s \r\n",
"\r\n",
"2014-08-14 18:36:06 (3.04 MB/s) - 'data/all_grants.csv.gz' saved [22082694/22082694]\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"gzip: data is a directory -- ignored\r\n"
]
}
],
"prompt_number": 184
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!ls data"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"README.md all_grants.csv\r\n"
]
}
],
"prompt_number": 185
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df=pd.read_csv('data/all_grants.csv', dtype={'title':object},\n",
" parse_dates=['applicationdate_startdate', 'awarddate_startdate', 'planneddates_startdate','planneddates_enddate'] )"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 231
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[:3]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>title</th>\n",
" <th>description</th>\n",
" <th>fundingorganization_id</th>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_id</th>\n",
" <th>recipientorganization_name</th>\n",
" <th>currency</th>\n",
" <th>totalamountappliedfor</th>\n",
" <th>totalamountawarded</th>\n",
" <th>recipientorganization_charitynumber</th>\n",
" <th>recipientorganization_companynumber</th>\n",
" <th>applicationdate_startdate</th>\n",
" <th>awarddate_startdate</th>\n",
" <th>planneddates_startdate</th>\n",
" <th>planneddates_enddate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 360G-biglottery:f282eb6b-4200-44d9-96a4-9d09aa...</td>\n",
" <td> NaN</td>\n",
" <td> This tennis club will work with six local scho...</td>\n",
" <td> 360G-biglottery:Joint Pot</td>\n",
" <td> Big Lottery</td>\n",
" <td> 360G-biglottery:Acklam Tennis Club</td>\n",
" <td> Acklam Tennis Club</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td> 4203</td>\n",
" <td> NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaT</td>\n",
" <td>2008-02-20</td>\n",
" <td>NaT</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 360G-biglottery:f282f0e4-77fc-4b21-bff9-d0519d...</td>\n",
" <td> NaN</td>\n",
" <td> This school provides educational activities fo...</td>\n",
" <td> 360G-biglottery:Big Lottery Fund</td>\n",
" <td> Big Lottery</td>\n",
" <td> 360G-biglottery:Holy Cross RC Primary School</td>\n",
" <td> Holy Cross RC Primary School</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td> 5972</td>\n",
" <td> NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaT</td>\n",
" <td>2007-03-15</td>\n",
" <td>NaT</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 360G-biglottery:f2828b86-14a3-4557-a85b-c69263...</td>\n",
" <td> NaN</td>\n",
" <td> This group in Croydon will provide training, h...</td>\n",
" <td> 360G-biglottery:Joint Pot</td>\n",
" <td> Big Lottery</td>\n",
" <td> 360G-biglottery:African Youth Development Asso...</td>\n",
" <td> African Youth Development Association</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td> 10000</td>\n",
" <td> NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaT</td>\n",
" <td>2007-12-10</td>\n",
" <td>NaT</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 232,
"text": [
" id title \\\n",
"0 360G-biglottery:f282eb6b-4200-44d9-96a4-9d09aa... NaN \n",
"1 360G-biglottery:f282f0e4-77fc-4b21-bff9-d0519d... NaN \n",
"2 360G-biglottery:f2828b86-14a3-4557-a85b-c69263... NaN \n",
"\n",
" description \\\n",
"0 This tennis club will work with six local scho... \n",
"1 This school provides educational activities fo... \n",
"2 This group in Croydon will provide training, h... \n",
"\n",
" fundingorganization_id fundingorganization_name \\\n",
"0 360G-biglottery:Joint Pot Big Lottery \n",
"1 360G-biglottery:Big Lottery Fund Big Lottery \n",
"2 360G-biglottery:Joint Pot Big Lottery \n",
"\n",
" recipientorganization_id \\\n",
"0 360G-biglottery:Acklam Tennis Club \n",
"1 360G-biglottery:Holy Cross RC Primary School \n",
"2 360G-biglottery:African Youth Development Asso... \n",
"\n",
" recipientorganization_name currency totalamountappliedfor \\\n",
"0 Acklam Tennis Club NaN NaN \n",
"1 Holy Cross RC Primary School NaN NaN \n",
"2 African Youth Development Association NaN NaN \n",
"\n",
" totalamountawarded recipientorganization_charitynumber \\\n",
"0 4203 NaN \n",
"1 5972 NaN \n",
"2 10000 NaN \n",
"\n",
" recipientorganization_companynumber applicationdate_startdate \\\n",
"0 NaN NaT \n",
"1 NaN NaT \n",
"2 NaN NaT \n",
"\n",
" awarddate_startdate planneddates_startdate planneddates_enddate \n",
"0 2008-02-20 NaT NaT \n",
"1 2007-03-15 NaT NaT \n",
"2 2007-12-10 NaT NaT "
]
}
],
"prompt_number": 232
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df['title'].unique()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 233,
"text": [
"array([nan], dtype=object)"
]
}
],
"prompt_number": 233
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Who are the funders?\n",
"funders=df['fundingorganization_name'].unique().tolist()\n",
"funders"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 191,
"text": [
"['Big Lottery',\n",
" 'Arts Council England',\n",
" 'Sport England',\n",
" 'Nesta',\n",
" 'TSB',\n",
" 'Wellcome Trust',\n",
" 'Arts Council Wales',\n",
" 'Sport Northern Ireland',\n",
" 'DSDNI',\n",
" 'Sport Wales',\n",
" 'Creative Scotland',\n",
" 'Nominet Trust']"
]
}
],
"prompt_number": 191
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[df['fundingorganization_name'].str.contains('Rock')]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>title</th>\n",
" <th>description</th>\n",
" <th>fundingorganization_id</th>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_id</th>\n",
" <th>recipientorganization_name</th>\n",
" <th>currency</th>\n",
" <th>totalamountappliedfor</th>\n",
" <th>totalamountawarded</th>\n",
" <th>recipientorganization_charitynumber</th>\n",
" <th>recipientorganization_companynumber</th>\n",
" <th>applicationdate_startdate</th>\n",
" <th>awarddate_startdate</th>\n",
" <th>planneddates_startdate</th>\n",
" <th>planneddates_enddate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 238,
"text": [
"Empty DataFrame\n",
"Columns: [id, title, description, fundingorganization_id, fundingorganization_name, recipientorganization_id, recipientorganization_name, currency, totalamountappliedfor, totalamountawarded, recipientorganization_charitynumber, recipientorganization_companynumber, applicationdate_startdate, awarddate_startdate, planneddates_startdate, planneddates_enddate]\n",
"Index: []"
]
}
],
"prompt_number": 238
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So this makes me think that the data from the [Gateway to Research](http://gtr.rcuk.ac.uk/) is missing and could be added in to the mix, if they make bulk downloads available (or if we scrape it..)? Also, the TSB data at least - which is already included in the 360 Giving data - would have to be deduped."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Which organisations are the most 20 successful recipents in terms of number of grants awarded?\n",
"df.groupby('recipientorganization_name').size().order(ascending=False)[:20]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 198,
"text": [
"recipientorganization_name\n",
"Early Years Team 2476\n",
"The Ulster Scots Agency 2281\n",
"VCU 2036\n",
"Arts & creativity Branch 1546\n",
"NIHE 1513\n",
"Sport NI 1068\n",
"UNIVERSITY OF OXFORD 805\n",
"UNIVERSITY OF CAMBRIDGE 634\n",
"Arts Council 599\n",
"UNIVERSITY COLLEGE LONDON 553\n",
"University of Oxford 514\n",
"NWDO 449\n",
"UNIVERSITY OF EDINBURGH 429\n",
"Policing and Community Safety Partnership 402\n",
"University of Cambridge 390\n",
"Foras na Gaeilge 374\n",
"Youth Justice Agency 363\n",
"University College London 362\n",
"BRO West Team 355\n",
"UNIVERSITY OF MANCHESTER 313\n",
"dtype: int64"
]
}
],
"prompt_number": 198
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Straight away we see a data cleanliness issue in terms of the different capitalisation of the same recipients. I'd also hazard a guess that there may be whitespace issues. Let's do a bit of tidying:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Strip any whitespace around the name\n",
"df['recipientorganization_name'] = df['recipientorganization_name'].str.strip()\n",
"#Generate a new column with names in uppercase\n",
"df['recipientorganization_name_upper'] = df['recipientorganization_name'].str.upper()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 199
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#So which organisations are actually the most 20 successful recipents in terms of number of grants awarded?\n",
"df.groupby('recipientorganization_name_upper').size().order(ascending=False)[:20]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 214,
"text": [
"recipientorganization_name_upper\n",
"EARLY YEARS TEAM 2476\n",
"THE ULSTER SCOTS AGENCY 2281\n",
"VCU 2036\n",
"ARTS & CREATIVITY BRANCH 1546\n",
"NIHE 1513\n",
"UNIVERSITY OF OXFORD 1436\n",
"UNIVERSITY OF CAMBRIDGE 1142\n",
"SPORT NI 1068\n",
"UNIVERSITY COLLEGE LONDON 1051\n",
"UNIVERSITY OF EDINBURGH 717\n",
"ARTS COUNCIL 599\n",
"UNIVERSITY OF MANCHESTER 585\n",
"UNIVERSITY OF GLASGOW 468\n",
"NWDO 449\n",
"UNIVERSITY OF BRISTOL 440\n",
"POLICING AND COMMUNITY SAFETY PARTNERSHIP 402\n",
"UNIVERSITY OF LIVERPOOL 399\n",
"IMPERIAL COLLEGE LONDON 386\n",
"FORAS NA GAEILGE 374\n",
"YOUTH JUSTICE AGENCY 363\n",
"dtype: int64"
]
}
],
"prompt_number": 214
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Now let's see which are the most 20 successful organisations in terms of the total amounts awarded?\n",
"orgs=df[['recipientorganization_name_upper','totalamountawarded']].groupby('recipientorganization_name_upper')\n",
"orgs.sum().sort('totalamountawarded',ascending=False)[:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>totalamountawarded</th>\n",
" </tr>\n",
" <tr>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>NIHE</th>\n",
" <td> 7.133095e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF OXFORD</th>\n",
" <td> 6.382125e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF CAMBRIDGE</th>\n",
" <td> 4.605349e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY COLLEGE LONDON</th>\n",
" <td> 4.135489e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF EDINBURGH</th>\n",
" <td> 2.429283e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>IMPERIAL COLLEGE LONDON</th>\n",
" <td> 1.588882e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF MANCHESTER</th>\n",
" <td> 1.527562e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF DUNDEE</th>\n",
" <td> 1.501241e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>KING'S COLLEGE LONDON</th>\n",
" <td> 1.310204e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF BRISTOL</th>\n",
" <td> 1.160328e+08</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 208,
"text": [
" totalamountawarded\n",
"recipientorganization_name_upper \n",
"NIHE 7.133095e+08\n",
"UNIVERSITY OF OXFORD 6.382125e+08\n",
"UNIVERSITY OF CAMBRIDGE 4.605349e+08\n",
"UNIVERSITY COLLEGE LONDON 4.135489e+08\n",
"UNIVERSITY OF EDINBURGH 2.429283e+08\n",
"IMPERIAL COLLEGE LONDON 1.588882e+08\n",
"UNIVERSITY OF MANCHESTER 1.527562e+08\n",
"UNIVERSITY OF DUNDEE 1.501241e+08\n",
"KING'S COLLEGE LONDON 1.310204e+08\n",
"UNIVERSITY OF BRISTOL 1.160328e+08"
]
}
],
"prompt_number": 208
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#So the universities are getting lots of grants and lots of income. How about if we try to separate out \"winners\" by funder?\n",
"#First in terms of number of grants received\n",
"df.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:30]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 212,
"text": [
"fundingorganization_name recipientorganization_name_upper \n",
"DSDNI EARLY YEARS TEAM 2476\n",
" THE ULSTER SCOTS AGENCY 2281\n",
" VCU 2036\n",
" ARTS & CREATIVITY BRANCH 1546\n",
" NIHE 1513\n",
"Wellcome Trust UNIVERSITY OF OXFORD 1401\n",
"DSDNI SPORT NI 1068\n",
"Wellcome Trust UNIVERSITY OF CAMBRIDGE 1063\n",
" UNIVERSITY COLLEGE LONDON 957\n",
" UNIVERSITY OF EDINBURGH 684\n",
"DSDNI ARTS COUNCIL 599\n",
"Wellcome Trust UNIVERSITY OF MANCHESTER 499\n",
"DSDNI NWDO 449\n",
"Wellcome Trust UNIVERSITY OF GLASGOW 434\n",
"DSDNI POLICING AND COMMUNITY SAFETY PARTNERSHIP 402\n",
"Wellcome Trust UNIVERSITY OF BRISTOL 384\n",
"DSDNI FORAS NA GAEILGE 374\n",
" YOUTH JUSTICE AGENCY 363\n",
" BRO WEST TEAM 355\n",
"Wellcome Trust MISCELLANEOUS ENGLISH PROVINCES 355\n",
" UNIVERSITY OF LIVERPOOL 352\n",
" KING'S COLLEGE LONDON 339\n",
" IMPERIAL COLLEGE LONDON 312\n",
" UNIVERSITY OF DUNDEE 304\n",
"DSDNI DERRY CITY COUNCIL 293\n",
" OMAGH COUNCIL 287\n",
" EQUALITY & STRATEGY DIRECTORATE 276\n",
" BRO NORTH TEAM 268\n",
"Wellcome Trust IMPERIAL COLLEGE OF SCI TECH & MED 262\n",
" MISCELLANEOUS LONDON 255\n",
"dtype: int64"
]
}
],
"prompt_number": 212
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Which universities have received the most awards?\n",
"unis=df[df['recipientorganization_name_upper'].str.contains('UNIVERSITY')]\n",
"unis.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 215,
"text": [
"fundingorganization_name recipientorganization_name_upper\n",
"Wellcome Trust UNIVERSITY OF OXFORD 1401\n",
" UNIVERSITY OF CAMBRIDGE 1063\n",
" UNIVERSITY COLLEGE LONDON 957\n",
" UNIVERSITY OF EDINBURGH 684\n",
" UNIVERSITY OF MANCHESTER 499\n",
" UNIVERSITY OF GLASGOW 434\n",
" UNIVERSITY OF BRISTOL 384\n",
" UNIVERSITY OF LIVERPOOL 352\n",
" UNIVERSITY OF DUNDEE 304\n",
" UNIVERSITY OF LEEDS 238\n",
" UNIVERSITY OF BIRMINGHAM 212\n",
" CARDIFF UNIVERSITY 170\n",
" UNIVERSITY OF LEICESTER 169\n",
" UNIVERSITY OF NOTTINGHAM 161\n",
" UNIVERSITY OF WARWICK 147\n",
" UNIVERSITY OF SHEFFIELD 139\n",
" UNIVERSITY OF NEWCASTLE 129\n",
" QUEEN MARY, UNIVERSITY OF LONDON 125\n",
" UNIVERSITY OF SOUTHAMPTON 121\n",
" UNIVERSITY OF YORK 119\n",
"dtype: int64"
]
}
],
"prompt_number": 215
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#How much cash have these universities received?\n",
"uniFunds=unis[['fundingorganization_name','recipientorganization_name_upper','totalamountawarded']]\n",
"uniFunds.groupby(['fundingorganization_name','recipientorganization_name_upper']).sum().sort('totalamountawarded',ascending=False)[:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>totalamountawarded</th>\n",
" </tr>\n",
" <tr>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"10\" valign=\"top\">Wellcome Trust</th>\n",
" <th>UNIVERSITY OF OXFORD</th>\n",
" <td> 6.277400e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF CAMBRIDGE</th>\n",
" <td> 4.432572e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY COLLEGE LONDON</th>\n",
" <td> 3.987253e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF EDINBURGH</th>\n",
" <td> 2.390989e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF DUNDEE</th>\n",
" <td> 1.482222e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF MANCHESTER</th>\n",
" <td> 1.389828e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF GLASGOW</th>\n",
" <td> 1.031193e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF BRISTOL</th>\n",
" <td> 1.026566e+08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF LEEDS</th>\n",
" <td> 6.190601e+07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UNIVERSITY OF SHEFFIELD</th>\n",
" <td> 6.070813e+07</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 219,
"text": [
" totalamountawarded\n",
"fundingorganization_name recipientorganization_name_upper \n",
"Wellcome Trust UNIVERSITY OF OXFORD 6.277400e+08\n",
" UNIVERSITY OF CAMBRIDGE 4.432572e+08\n",
" UNIVERSITY COLLEGE LONDON 3.987253e+08\n",
" UNIVERSITY OF EDINBURGH 2.390989e+08\n",
" UNIVERSITY OF DUNDEE 1.482222e+08\n",
" UNIVERSITY OF MANCHESTER 1.389828e+08\n",
" UNIVERSITY OF GLASGOW 1.031193e+08\n",
" UNIVERSITY OF BRISTOL 1.026566e+08\n",
" UNIVERSITY OF LEEDS 6.190601e+07\n",
" UNIVERSITY OF SHEFFIELD 6.070813e+07"
]
}
],
"prompt_number": 219
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Bored of the universities... how about limited companies?\n",
"ltd=df[df['recipientorganization_name_upper'].str.contains('LTD') | df['recipientorganization_name_upper'].str.contains('LIMITED')]\n",
"ltd.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 223,
"text": [
"fundingorganization_name recipientorganization_name_upper \n",
"TSB TWI LIMITED 66\n",
" C-TECH INNOVATION LIMITED 39\n",
" CENTRE FOR PROCESS INNOVATION LIMITED 38\n",
" NPL MANAGEMENT LIMITED 37\n",
"Arts Council Wales CHAPTER (CARDIFF) LTD. 32\n",
"Creative Scotland HIGHLANDS AND ISLANDS ARTS LTD 30\n",
"TSB RICARDO UK LIMITED 27\n",
" AIRBUS UK LIMITED 26\n",
"Creative Scotland DANCE BASE LIMITED 25\n",
"TSB BUILDING RESEARCH ESTABLISHMENT LIMITED 25\n",
"Arts Council Wales EARTHFALL DANCE LTD. 24\n",
"TSB QINETIQ LIMITED 23\n",
"Creative Scotland CITIZENS\u2019 THEATRE LIMITED 23\n",
"TSB BAE SYSTEMS (OPERATIONS) LIMITED 22\n",
"Creative Scotland HORSECROSS ARTS LIMITED 21\n",
"Arts Council Wales DAWNS TAN TAN DANCE LTD. 21\n",
"Sport England BLACK COUNTRY CONSORTIUM LTD 21\n",
"TSB AIRBUS OPERATIONS LIMITED 21\n",
" GKN AEROSPACE SERVICES LIMITED 20\n",
"Creative Scotland TRAVERSE THEATRE (SCOTLAND) LIMITED 20\n",
"dtype: int64"
]
}
],
"prompt_number": 223
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#And in terms of the cash received?\n",
"ltdFunds=ltd[['fundingorganization_name','recipientorganization_name_upper','totalamountawarded']]\n",
"ltdFunds.groupby(['fundingorganization_name','recipientorganization_name_upper']).sum().sort('totalamountawarded',ascending=False)[:15]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>totalamountawarded</th>\n",
" </tr>\n",
" <tr>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>NATIONAL RENEWABLE ENERGY CENTRE LIMITED</th>\n",
" <td> 93760524</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Big Lottery</th>\n",
" <th>SUSTRANS LIMITED</th>\n",
" <td> 76104184</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>CENTRE FOR PROCESS INNOVATION LIMITED</th>\n",
" <td> 75132460</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Big Lottery</th>\n",
" <th>LIFE CHANGES (TRUSTEE) LIMITED</th>\n",
" <td> 49338186</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>AIRBUS UK LIMITED</th>\n",
" <td> 37976726</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Big Lottery</th>\n",
" <th>FORCES IN MIND TRUSTEE LIMITED</th>\n",
" <td> 34808423</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>GKN AEROSPACE SERVICES LIMITED</th>\n",
" <td> 30163228</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">Big Lottery</th>\n",
" <th>LEGACY TRUST UK LIMITED</th>\n",
" <td> 28850000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CATCH 22 CHARITY LIMITED</th>\n",
" <td> 27636814</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sport England</th>\n",
" <th>ECORYS UK LIMITED</th>\n",
" <td> 27630000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>CELL THERAPY CATAPULT LIMITED</th>\n",
" <td> 23388900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Big Lottery</th>\n",
" <th>THE SOIL ASSOCIATION LIMITED</th>\n",
" <td> 21549732</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>AGUSTAWESTLAND LIMITED</th>\n",
" <td> 18523006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wellcome Trust</th>\n",
" <th>UK BIOBANK LTD</th>\n",
" <td> 18297438</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>AIRBUS OPERATIONS LIMITED</th>\n",
" <td> 17536265</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 230,
"text": [
" totalamountawarded\n",
"fundingorganization_name recipientorganization_name_upper \n",
"TSB NATIONAL RENEWABLE ENERGY CENTRE LIMITED 93760524\n",
"Big Lottery SUSTRANS LIMITED 76104184\n",
"TSB CENTRE FOR PROCESS INNOVATION LIMITED 75132460\n",
"Big Lottery LIFE CHANGES (TRUSTEE) LIMITED 49338186\n",
"TSB AIRBUS UK LIMITED 37976726\n",
"Big Lottery FORCES IN MIND TRUSTEE LIMITED 34808423\n",
"TSB GKN AEROSPACE SERVICES LIMITED 30163228\n",
"Big Lottery LEGACY TRUST UK LIMITED 28850000\n",
" CATCH 22 CHARITY LIMITED 27636814\n",
"Sport England ECORYS UK LIMITED 27630000\n",
"TSB CELL THERAPY CATAPULT LIMITED 23388900\n",
"Big Lottery THE SOIL ASSOCIATION LIMITED 21549732\n",
"TSB AGUSTAWESTLAND LIMITED 18523006\n",
"Wellcome Trust UK BIOBANK LTD 18297438\n",
"TSB AIRBUS OPERATIONS LIMITED 17536265"
]
}
],
"prompt_number": 230
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Who's most successful in terms of total number of grants received apart from the universities?\n",
"nunis=df[~df['recipientorganization_name_upper'].str.contains('UNIVERSITY')]\n",
"nunis.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 224,
"text": [
"fundingorganization_name recipientorganization_name_upper \n",
"DSDNI EARLY YEARS TEAM 2476\n",
" THE ULSTER SCOTS AGENCY 2281\n",
" VCU 2036\n",
" ARTS & CREATIVITY BRANCH 1546\n",
" NIHE 1513\n",
" SPORT NI 1068\n",
" ARTS COUNCIL 599\n",
" NWDO 449\n",
" POLICING AND COMMUNITY SAFETY PARTNERSHIP 402\n",
" FORAS NA GAEILGE 374\n",
" YOUTH JUSTICE AGENCY 363\n",
" BRO WEST TEAM 355\n",
"Wellcome Trust MISCELLANEOUS ENGLISH PROVINCES 355\n",
" KING'S COLLEGE LONDON 339\n",
" IMPERIAL COLLEGE LONDON 312\n",
"DSDNI DERRY CITY COUNCIL 293\n",
" OMAGH COUNCIL 287\n",
" EQUALITY & STRATEGY DIRECTORATE 276\n",
" BRO NORTH TEAM 268\n",
"Wellcome Trust IMPERIAL COLLEGE OF SCI TECH & MED 262\n",
"dtype: int64"
]
}
],
"prompt_number": 224
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Hmmm, councils... which have had most grants?\n",
"ccls=df[df['recipientorganization_name_upper'].str.contains('COUNCIL')]\n",
"ccls.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 226,
"text": [
"fundingorganization_name recipientorganization_name_upper \n",
"DSDNI ARTS COUNCIL 599\n",
" DERRY CITY COUNCIL 293\n",
" OMAGH COUNCIL 287\n",
" BALLYMONEY COUNCIL 192\n",
" NORTHERN IRELAND MUSEUM COUNCIL 107\n",
" BELFAST CITY COUNCIL 97\n",
" STRABANE DISTRICT COUNCIL 93\n",
" DOWN DISTRICT COUNCIL 91\n",
" LISBURN CITY COUNCIL 88\n",
"Sport Wales FLINTSHIRE COUNTY COUNCIL 70\n",
"DSDNI MOYLE COUNCIL 69\n",
" COLERAINE COUNCIL 64\n",
"Wellcome Trust MEDICAL RESEARCH COUNCIL 63\n",
"DSDNI FERMANAGH DISTRICT COUNCIL 48\n",
" ANTRIM BOROUGH COUNCIL 46\n",
"Sport Wales CONWY COUNTY BOROUGH COUNCIL 43\n",
" THE COUNCIL OF THE CITY AND COUNTY OF CARDIFF 43\n",
"Creative Scotland CITY OF EDINBURGH COUNCIL 43\n",
"DSDNI COOKSTOWN BOROUGH COUNCIL 43\n",
"Sport Wales GWYNEDD COUNCIL 42\n",
"dtype: int64"
]
}
],
"prompt_number": 226
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#How about council success in terms of cash...?\n",
"cclFunds=ccls[['fundingorganization_name','recipientorganization_name_upper','totalamountawarded']]\n",
"cclFunds.groupby(['fundingorganization_name','recipientorganization_name_upper']).sum().sort('totalamountawarded',ascending=False)[:15]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>totalamountawarded</th>\n",
" </tr>\n",
" <tr>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>DSDNI</th>\n",
" <th>ARTS COUNCIL</th>\n",
" <td> 48512098.35</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">Big Lottery</th>\n",
" <th>CORNWALL COUNCIL</th>\n",
" <td> 27033958.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FALKIRK COUNCIL</th>\n",
" <td> 25744798.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>BELFAST CITY COUNCIL</th>\n",
" <td> 25416214.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TSB</th>\n",
" <th>GLASGOW CITY COUNCIL</th>\n",
" <td> 24053259.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Big Lottery</th>\n",
" <th>BIRMINGHAM VOLUNTARY SERVICE COUNCIL</th>\n",
" <td> 18541998.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wellcome Trust</th>\n",
" <th>MEDICAL RESEARCH COUNCIL</th>\n",
" <td> 18126555.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sport England</th>\n",
" <th>SPORTS COUNCIL TRUST COMPANY</th>\n",
" <td> 16000000.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Big Lottery</th>\n",
" <th>BIRMINGHAM CITY COUNCIL</th>\n",
" <td> 15172512.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sport Northern Ireland</th>\n",
" <th>NORTH DOWN BOROUGH COUNCIL</th>\n",
" <td> 14916784.19</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"5\" valign=\"top\">Big Lottery</th>\n",
" <th>STOCKPORT METROPOLITAN BOROUGH COUNCIL</th>\n",
" <td> 12159922.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>WOLVERHAMPTON VOLUNTARY SECTOR COUNCIL</th>\n",
" <td> 10422400.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LIVERPOOL CITY COUNCIL</th>\n",
" <td> 10116854.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SCOTTISH COUNCIL FOR VOLUNTARY ORGANISATIONS</th>\n",
" <td> 9826005.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>THE NATIONAL COUNCIL FOR VOLUNTARY ORGANISATIONS</th>\n",
" <td> 9674505.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 228,
"text": [
" totalamountawarded\n",
"fundingorganization_name recipientorganization_name_upper \n",
"DSDNI ARTS COUNCIL 48512098.35\n",
"Big Lottery CORNWALL COUNCIL 27033958.00\n",
" FALKIRK COUNCIL 25744798.00\n",
" BELFAST CITY COUNCIL 25416214.00\n",
"TSB GLASGOW CITY COUNCIL 24053259.00\n",
"Big Lottery BIRMINGHAM VOLUNTARY SERVICE COUNCIL 18541998.00\n",
"Wellcome Trust MEDICAL RESEARCH COUNCIL 18126555.00\n",
"Sport England SPORTS COUNCIL TRUST COMPANY 16000000.00\n",
"Big Lottery BIRMINGHAM CITY COUNCIL 15172512.00\n",
"Sport Northern Ireland NORTH DOWN BOROUGH COUNCIL 14916784.19\n",
"Big Lottery STOCKPORT METROPOLITAN BOROUGH COUNCIL 12159922.00\n",
" WOLVERHAMPTON VOLUNTARY SECTOR COUNCIL 10422400.00\n",
" LIVERPOOL CITY COUNCIL 10116854.00\n",
" SCOTTISH COUNCIL FOR VOLUNTARY ORGANISATIONS 9826005.00\n",
" THE NATIONAL COUNCIL FOR VOLUNTARY ORGANISATIONS 9674505.00"
]
}
],
"prompt_number": 228
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Putting the Stuff into a Database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following example shows one way of getting the data into a database that's already up and running..."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#psql lets us return tables as dataframes\n",
"from pandas import read_sql_query as psql\n",
"#psqlx is a convenience fucntion for executing raw queries where we aren't so bothered about the response...\n",
"#Errors should still be thrown...\n",
"from pandas.io.sql import execute as psqlx\n",
"\n",
"import pandas as pd\n",
"from sqlalchemy import create_engine"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Create the database connection for pandas dataframe operations\n",
"engine = create_engine('postgresql://test:test@localhost:5432/tm351test')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#This ia a helper routine to help find the maximum length of each field - cast as a string - in the datafile\n",
"def colMaxLen(df,maxlens):\n",
" for col in df.columns:\n",
" cml=df[col].astype(str).map(len).max()\n",
" if cml > maxlens[col]: maxlens[col]=cml"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Let's detect the maximum field lengths in the dataset so we can use it to help configue our database table\n",
"fname='data/all_grants.csv'\n",
"maxlens={}\n",
"for name in df.columns: maxlens[name]=0\n",
" \n",
"chunks=pd.read_csv(fname,chunksize=10000)\n",
"for chunk in chunks:\n",
" colMaxLen(chunk,maxlens)\n",
"maxlens"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
"{'planneddates_enddate': 3,\n",
" 'awarddate_startdate': 25,\n",
" 'currency': 3,\n",
" 'applicationdate_startdate': 3,\n",
" 'recipientorganization_charitynumber': 8,\n",
" 'recipientorganization_companynumber': 3,\n",
" 'fundingorganization_id': 38,\n",
" 'id': 62,\n",
" 'totalamountawarded': 11,\n",
" 'fundingorganization_name': 22,\n",
" 'recipientorganization_id': 96,\n",
" 'description': 512,\n",
" 'recipientorganization_name': 113,\n",
" 'planneddates_startdate': 3,\n",
" 'totalamountappliedfor': 3,\n",
" 'title': 3}"
]
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#If the table we're going to use already exists, get rid of it - we're going for a fresh start...\n",
"psqlx(\"DROP TABLE IF EXISTS charities360data;\",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 141,
"text": [
"<sqlalchemy.engine.result.ResultProxy at 0x7fce63dc2b00>"
]
}
],
"prompt_number": 141
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Here's where I create the table - gonna be lazy and make lots of things varchar for a quick start...\n",
"psqlx(\"CREATE TABLE charities360data ( planneddates_enddate VARCHAR (30), awarddate_startdate VARCHAR (30), currency VARCHAR (10), \\\n",
" applicationdate_startdate VARCHAR (30), recipientorganization_charitynumber VARCHAR (10), \\\n",
" recipientorganization_companynumber VARCHAR (10), fundingorganization_id VARCHAR (50), \\\n",
" id VARCHAR (75) PRIMARY KEY, totalamountawarded FLOAT, fundingorganization_name VARCHAR (30), \\\n",
" recipientorganization_id VARCHAR(100), description VARCHAR(1000), recipientorganization_name VARCHAR (150), \\\n",
" recipientorganization_name_upper VARCHAR (150), \\\n",
" planneddates_startdate VARCHAR (30), totalamountappliedfor FLOAT, title VARCHAR (100) );\",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 142,
"text": [
"<sqlalchemy.engine.result.ResultProxy at 0x7fce63dc2438>"
]
}
],
"prompt_number": 142
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Test the table\n",
"psql(\"SELECT * FROM charities360data LIMIT 1;\", engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>planneddates_enddate</th>\n",
" <th>awarddate_startdate</th>\n",
" <th>currency</th>\n",
" <th>applicationdate_startdate</th>\n",
" <th>recipientorganization_charitynumber</th>\n",
" <th>recipientorganization_companynumber</th>\n",
" <th>fundingorganization_id</th>\n",
" <th>id</th>\n",
" <th>totalamountawarded</th>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_id</th>\n",
" <th>description</th>\n",
" <th>recipientorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>planneddates_startdate</th>\n",
" <th>totalamountappliedfor</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 143,
"text": [
"Empty DataFrame\n",
"Columns: [planneddates_enddate, awarddate_startdate, currency, applicationdate_startdate, recipientorganization_charitynumber, recipientorganization_companynumber, fundingorganization_id, id, totalamountawarded, fundingorganization_name, recipientorganization_id, description, recipientorganization_name, recipientorganization_name_upper, planneddates_startdate, totalamountappliedfor, title]\n",
"Index: []"
]
}
],
"prompt_number": 143
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#This routine will load data in from a pandas dataframe, itself created from 10k rows at a time read in from the source CSV file\n",
"dateFields=[]#['applicationdate_startdate', 'awarddate_startdate', 'planneddates_startdate','planneddates_enddate']\n",
"chunks=pd.read_csv(fname,parse_dates=dateFields,chunksize=10000)\n",
"for chunk in chunks:\n",
" #Just in case, strip of any whotespace around names\n",
" for col in ['recipientorganization_name','fundingorganization_name']:\n",
" chunk[col]=chunk[col].str.strip()\n",
" #Another normalisation step: introduce a new column corresponding to organisation name in all upper case characters\n",
" chunk['recipientorganization_name_upper']=chunk['recipientorganization_name'].str.upper()\n",
" chunk.to_sql('charities360data', engine, index=False, if_exists='append')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 144
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Quick query to see which organisation names appear the most\n",
"psql(\"SELECT recipientorganization_name, COUNT(*) AS tally FROM charities360data \\\n",
" WHERE recipientorganization_name ILIKE '%%UNIVERSITY%%' \\\n",
" GROUP BY recipientorganization_name HAVING COUNT(*)>3 \\\n",
" ORDER BY tally DESC LIMIT 20;\", engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>recipientorganization_name</th>\n",
" <th>tally</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 809</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> UNIVERSITY OF CAMBRIDGE</td>\n",
" <td> 643</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> University of Oxford</td>\n",
" <td> 592</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> UNIVERSITY COLLEGE LONDON</td>\n",
" <td> 556</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> University of Cambridge</td>\n",
" <td> 499</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> University College London</td>\n",
" <td> 495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> UNIVERSITY OF EDINBURGH</td>\n",
" <td> 432</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> UNIVERSITY OF MANCHESTER</td>\n",
" <td> 316</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> University of Edinburgh</td>\n",
" <td> 285</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> University of Manchester</td>\n",
" <td> 269</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> UNIVERSITY OF GLASGOW</td>\n",
" <td> 256</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> UNIVERSITY OF BRISTOL</td>\n",
" <td> 222</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td> University of Bristol</td>\n",
" <td> 218</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td> UNIVERSITY OF LIVERPOOL</td>\n",
" <td> 217</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td> University of Glasgow</td>\n",
" <td> 211</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> UNIVERSITY OF DUNDEE</td>\n",
" <td> 187</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td> University of Liverpool</td>\n",
" <td> 181</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td> University of Leeds</td>\n",
" <td> 155</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td> University of Nottingham</td>\n",
" <td> 151</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td> UNIVERSITY OF LEEDS</td>\n",
" <td> 148</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 145,
"text": [
" recipientorganization_name tally\n",
"0 UNIVERSITY OF OXFORD 809\n",
"1 UNIVERSITY OF CAMBRIDGE 643\n",
"2 University of Oxford 592\n",
"3 UNIVERSITY COLLEGE LONDON 556\n",
"4 University of Cambridge 499\n",
"5 University College London 495\n",
"6 UNIVERSITY OF EDINBURGH 432\n",
"7 UNIVERSITY OF MANCHESTER 316\n",
"8 University of Edinburgh 285\n",
"9 University of Manchester 269\n",
"10 UNIVERSITY OF GLASGOW 256\n",
"11 UNIVERSITY OF BRISTOL 222\n",
"12 University of Bristol 218\n",
"13 UNIVERSITY OF LIVERPOOL 217\n",
"14 University of Glasgow 211\n",
"15 UNIVERSITY OF DUNDEE 187\n",
"16 University of Liverpool 181\n",
"17 University of Leeds 155\n",
"18 University of Nottingham 151\n",
"19 UNIVERSITY OF LEEDS 148"
]
}
],
"prompt_number": 145
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We note that the data appears to have the same organisation names in a variety of capitalisations...\n",
"#That's why we normalise them...\n",
"psql(\"SELECT recipientorganization_name_upper, COUNT(*) AS tally, to_char(SUM(totalamountawarded),'99999999999D99') FROM charities360data \\\n",
" WHERE recipientorganization_name_upper ILIKE '%%UNIVERSITY%%' AND totalamountawarded != 'NaN' AND totalamountawarded>10000000 \\\n",
" GROUP BY recipientorganization_name_upper HAVING COUNT(*)>3 \\\n",
" ORDER BY tally DESC;\", engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>tally</th>\n",
" <th>to_char</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 7</td>\n",
" <td> 82317012.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 154,
"text": [
" recipientorganization_name_upper tally to_char\n",
"0 UNIVERSITY OF OXFORD 7 82317012.00"
]
}
],
"prompt_number": 154
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Quick test query around the total amounts awarded to orgs by normalised name\n",
"psql(\"SELECT recipientorganization_name_upper ,totalamountawarded FROM charities360data \\\n",
" WHERE totalamountawarded >10000000 AND totalamountawarded != 'NaN' \\\n",
" AND recipientorganization_name_upper ILIKE '%%UNIVERSITY%%' ORDER BY totalamountawarded DESC; \",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>totalamountawarded</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> UNIVERSITY COLLEGE LONDON</td>\n",
" <td> 45000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> UNIVERSITY OF BRISTOL- NATIONAL COMPOSITES CENTRE</td>\n",
" <td> 28000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> UNIVERSITY OF SHEFFIELD</td>\n",
" <td> 16294595</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> UNIVERSITY OF MANCHESTER</td>\n",
" <td> 15000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 14288026</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> UNIVERSITY OF MANCHESTER</td>\n",
" <td> 14151671</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> UNIVERSITY OF KWAZULU NATAL</td>\n",
" <td> 13526338</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 12548429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 12000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> UNIVERSITY OF EDINBURGH</td>\n",
" <td> 11200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 11111368</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 11046122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 11000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td> UNIVERSITY OF LEEDS</td>\n",
" <td> 10400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td> UNIVERSITY OF OXFORD</td>\n",
" <td> 10323067</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> UNIVERSITY OF CAMBRIDGE</td>\n",
" <td> 10188580</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 160,
"text": [
" recipientorganization_name_upper totalamountawarded\n",
"0 UNIVERSITY COLLEGE LONDON 45000000\n",
"1 UNIVERSITY OF BRISTOL- NATIONAL COMPOSITES CENTRE 28000000\n",
"2 UNIVERSITY OF SHEFFIELD 16294595\n",
"3 UNIVERSITY OF MANCHESTER 15000000\n",
"4 UNIVERSITY OF OXFORD 14288026\n",
"5 UNIVERSITY OF MANCHESTER 14151671\n",
"6 UNIVERSITY OF KWAZULU NATAL 13526338\n",
"7 UNIVERSITY OF OXFORD 12548429\n",
"8 UNIVERSITY OF OXFORD 12000000\n",
"9 UNIVERSITY OF EDINBURGH 11200000\n",
"10 UNIVERSITY OF OXFORD 11111368\n",
"11 UNIVERSITY OF OXFORD 11046122\n",
"12 UNIVERSITY OF OXFORD 11000000\n",
"13 UNIVERSITY OF LEEDS 10400000\n",
"14 UNIVERSITY OF OXFORD 10323067\n",
"15 UNIVERSITY OF CAMBRIDGE 10188580"
]
}
],
"prompt_number": 160
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#By the by, how many unique upper case names are there?\n",
"psql(\"SELECT COUNT(*) FROM (SELECT DISTINCT recipientorganization_name_upper FROM charities360data) tmp\",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 122421</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 114,
"text": [
" count\n",
"0 122421"
]
}
],
"prompt_number": 114
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#One of the other tables I have in the same database is a list of company names and company numbers from Companies House\n",
"#Here's what each row looks like in that table...\n",
"psql(\"SELECT * from companydata limit 1\",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>companyname</th>\n",
" <th>companynumber</th>\n",
" <th>regaddress_careof</th>\n",
" <th>regaddress_pobox</th>\n",
" <th>regaddress_addressline1</th>\n",
" <th>regaddress_addressline2</th>\n",
" <th>regaddress_posttown</th>\n",
" <th>regaddress_county</th>\n",
" <th>regaddress_country</th>\n",
" <th>regaddress_postcode</th>\n",
" <th>companycategory</th>\n",
" <th>companystatus</th>\n",
" <th>countryoforigin</th>\n",
" <th>incorporationdate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> ! LTD</td>\n",
" <td> 08209948</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> METROHOUSE 57 PEPPER ROAD</td>\n",
" <td> HUNSLET</td>\n",
" <td> LEEDS</td>\n",
" <td> YORKSHIRE</td>\n",
" <td> UNITED KINGDOM</td>\n",
" <td> LS10 2RU</td>\n",
" <td> Private Limited Company</td>\n",
" <td> Active</td>\n",
" <td> United Kingdom</td>\n",
" <td> 2012-09-11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 241,
"text": [
" companyname companynumber regaddress_careof regaddress_pobox \\\n",
"0 ! LTD 08209948 NaN NaN \n",
"\n",
" regaddress_addressline1 regaddress_addressline2 regaddress_posttown \\\n",
"0 METROHOUSE 57 PEPPER ROAD HUNSLET LEEDS \n",
"\n",
" regaddress_county regaddress_country regaddress_postcode \\\n",
"0 YORKSHIRE UNITED KINGDOM LS10 2RU \n",
"\n",
" companycategory companystatus countryoforigin incorporationdate \n",
"0 Private Limited Company Active United Kingdom 2012-09-11 "
]
}
],
"prompt_number": 241
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Let's see how many names we an trivially join on between the 360Charities data and the Companies House data...\n",
"cnum=psql(\"SELECT DISTINCT ch.recipientorganization_name_upper, co.companynumber FROM charities360data ch JOIN companydata co \\\n",
" ON ch.recipientorganization_name_upper = co.companyname ;\",engine)\n",
"cnum[:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>companynumber</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 100 BLACK MEN OF LONDON</td>\n",
" <td> 04200429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 100 BLACK WOMEN OF LONDON</td>\n",
" <td> 07923522</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 10:10</td>\n",
" <td> 06958799</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 101 PLAYGROUP</td>\n",
" <td> 04945702</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 10RADIO CIC</td>\n",
" <td> 06004254</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 11 TECH 18</td>\n",
" <td> 05480166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 170 COMMUNITY PROJECT</td>\n",
" <td> 03416785</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> 198 CONTEMPORARY ARTS AND LEARNING LTD</td>\n",
" <td> 02369267</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> 1ST 4 KIDZ COMMUNITY INTEREST COMPANY</td>\n",
" <td> 07175119</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> 1ST CALL LOCKOUTS LIMITED</td>\n",
" <td> 04807947</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 100,
"text": [
" recipientorganization_name_upper companynumber\n",
"0 100 BLACK MEN OF LONDON 04200429\n",
"1 100 BLACK WOMEN OF LONDON 07923522\n",
"2 10:10 06958799\n",
"3 101 PLAYGROUP 04945702\n",
"4 10RADIO CIC 06004254\n",
"5 11 TECH 18 05480166\n",
"6 170 COMMUNITY PROJECT 03416785\n",
"7 198 CONTEMPORARY ARTS AND LEARNING LTD 02369267\n",
"8 1ST 4 KIDZ COMMUNITY INTEREST COMPANY 07175119\n",
"9 1ST CALL LOCKOUTS LIMITED 04807947"
]
}
],
"prompt_number": 100
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#How many matches have we got?\n",
"len(cnum)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 103,
"text": [
"15462"
]
}
],
"prompt_number": 103
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We can bring in additional data about a company from it's company number using the OpenCorporates API\n",
"\n",
"import json, requests\n",
" \n",
"def openCorporatesLookupCompanyNumber(cnum):\n",
" ''' Grab company data from OpenCorporates '''\n",
" url='https://api.opencorporates.com/v0.3/companies/gb/{0}?format=json'.format(cnum)\n",
" jsondata = json.loads(requests.get(url).text)\n",
" return jsondata['results']\n",
"\n",
" \n",
"corpdata=openCorporatesLookupCompanyNumber('04200429')\n",
"\n",
"#We can then do things like look the the list of officers\n",
"officers = pd.DataFrame( [ d['officer'] for d in corpdata['company']['officers'] ] )\n",
"officers"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>end_date</th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>opencorporates_url</th>\n",
" <th>position</th>\n",
" <th>start_date</th>\n",
" <th>uid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> None</td>\n",
" <td> 38555511</td>\n",
" <td> OLUKAYODELE ALAKE</td>\n",
" <td> https://opencorporates.com/officers/38555511</td>\n",
" <td> director</td>\n",
" <td> 2010-01-01</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> None</td>\n",
" <td> 38555524</td>\n",
" <td> JONATHAN MASHE THOMAS</td>\n",
" <td> https://opencorporates.com/officers/38555524</td>\n",
" <td> director</td>\n",
" <td> 2007-07-09</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> 2014-07-08</td>\n",
" <td> 38555530</td>\n",
" <td> LENNY LAWRENCE</td>\n",
" <td> https://opencorporates.com/officers/38555530</td>\n",
" <td> director</td>\n",
" <td> 2010-03-01</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> 2012-10-13</td>\n",
" <td> 38555533</td>\n",
" <td> EDWARD ROYER</td>\n",
" <td> https://opencorporates.com/officers/38555533</td>\n",
" <td> director</td>\n",
" <td> 2007-07-09</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> None</td>\n",
" <td> 38555536</td>\n",
" <td> KOLARELE SONAIKE</td>\n",
" <td> https://opencorporates.com/officers/38555536</td>\n",
" <td> director</td>\n",
" <td> 2005-10-24</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> 2001-04-17</td>\n",
" <td> 38555540</td>\n",
" <td> KEVIN BREWER</td>\n",
" <td> https://opencorporates.com/officers/38555540</td>\n",
" <td> nominated secretary</td>\n",
" <td> 2001-04-17</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> 2004-01-01</td>\n",
" <td> 38555541</td>\n",
" <td> LLOYD DIXON</td>\n",
" <td> https://opencorporates.com/officers/38555541</td>\n",
" <td> secretary</td>\n",
" <td> 2001-04-17</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> 2007-07-09</td>\n",
" <td> 38555543</td>\n",
" <td> KENNETH BARNES</td>\n",
" <td> https://opencorporates.com/officers/38555543</td>\n",
" <td> director</td>\n",
" <td> 2001-04-17</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> 2005-10-24</td>\n",
" <td> 38555545</td>\n",
" <td> GAIRY DAVIS</td>\n",
" <td> https://opencorporates.com/officers/38555545</td>\n",
" <td> director</td>\n",
" <td> 2004-07-01</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> 2006-10-31</td>\n",
" <td> 38555546</td>\n",
" <td> ANIETIE EKA</td>\n",
" <td> https://opencorporates.com/officers/38555546</td>\n",
" <td> director</td>\n",
" <td> 2005-10-24</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> 2005-10-24</td>\n",
" <td> 38555547</td>\n",
" <td> DAVID PETER OKORO</td>\n",
" <td> https://opencorporates.com/officers/38555547</td>\n",
" <td> director</td>\n",
" <td> 2004-07-01</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> 2009-01-01</td>\n",
" <td> 38555548</td>\n",
" <td> ANTHONY ROBERT WARNER</td>\n",
" <td> https://opencorporates.com/officers/38555548</td>\n",
" <td> director</td>\n",
" <td> 2007-07-09</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td> 2014-07-08</td>\n",
" <td> 39746456</td>\n",
" <td> OLUKAYODELE ALAKE</td>\n",
" <td> https://opencorporates.com/officers/39746456</td>\n",
" <td> secretary</td>\n",
" <td> 2004-01-01</td>\n",
" <td> None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td> None</td>\n",
" <td> 39746460</td>\n",
" <td> JONATHAN MASHE THOMAS</td>\n",
" <td> https://opencorporates.com/officers/39746460</td>\n",
" <td> secretary</td>\n",
" <td> 2007-07-09</td>\n",
" <td> None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 121,
"text": [
" end_date id name \\\n",
"0 None 38555511 OLUKAYODELE ALAKE \n",
"1 None 38555524 JONATHAN MASHE THOMAS \n",
"2 2014-07-08 38555530 LENNY LAWRENCE \n",
"3 2012-10-13 38555533 EDWARD ROYER \n",
"4 None 38555536 KOLARELE SONAIKE \n",
"5 2001-04-17 38555540 KEVIN BREWER \n",
"6 2004-01-01 38555541 LLOYD DIXON \n",
"7 2007-07-09 38555543 KENNETH BARNES \n",
"8 2005-10-24 38555545 GAIRY DAVIS \n",
"9 2006-10-31 38555546 ANIETIE EKA \n",
"10 2005-10-24 38555547 DAVID PETER OKORO \n",
"11 2009-01-01 38555548 ANTHONY ROBERT WARNER \n",
"12 2014-07-08 39746456 OLUKAYODELE ALAKE \n",
"13 None 39746460 JONATHAN MASHE THOMAS \n",
"\n",
" opencorporates_url position \\\n",
"0 https://opencorporates.com/officers/38555511 director \n",
"1 https://opencorporates.com/officers/38555524 director \n",
"2 https://opencorporates.com/officers/38555530 director \n",
"3 https://opencorporates.com/officers/38555533 director \n",
"4 https://opencorporates.com/officers/38555536 director \n",
"5 https://opencorporates.com/officers/38555540 nominated secretary \n",
"6 https://opencorporates.com/officers/38555541 secretary \n",
"7 https://opencorporates.com/officers/38555543 director \n",
"8 https://opencorporates.com/officers/38555545 director \n",
"9 https://opencorporates.com/officers/38555546 director \n",
"10 https://opencorporates.com/officers/38555547 director \n",
"11 https://opencorporates.com/officers/38555548 director \n",
"12 https://opencorporates.com/officers/39746456 secretary \n",
"13 https://opencorporates.com/officers/39746460 secretary \n",
"\n",
" start_date uid \n",
"0 2010-01-01 None \n",
"1 2007-07-09 None \n",
"2 2010-03-01 None \n",
"3 2007-07-09 None \n",
"4 2005-10-24 None \n",
"5 2001-04-17 None \n",
"6 2001-04-17 None \n",
"7 2001-04-17 None \n",
"8 2004-07-01 None \n",
"9 2005-10-24 None \n",
"10 2004-07-01 None \n",
"11 2007-07-09 None \n",
"12 2004-01-01 None \n",
"13 2007-07-09 None "
]
}
],
"prompt_number": 121
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We can also search OpenCorporates officers by name\n",
"def openCorporatesOfficerSearch(officername):\n",
" ''' Search for officers by exact match name '''\n",
" url='https://api.opencorporates.com/v0.3/officers/search'\n",
" params={'q':'\"'+officername+'\"'}\n",
" jsondata = json.loads(requests.get(url,params=params).text)\n",
" data=[]\n",
" for d in jsondata['results']['officers']:\n",
" tmp=d['officer']\n",
" for i in ['jurisdiction_code','company_number']:\n",
" tmp[i]=tmp['company'][i]\n",
" tmp['company_name']=tmp['company']['name']\n",
" tmp.pop('company',None)\n",
" data.append(tmp)\n",
" df= pd.DataFrame( data )\n",
" return df\n",
"\n",
"\n",
"openCorporatesOfficerSearch(\"JONATHAN MASHE THOMAS\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>company_name</th>\n",
" <th>company_number</th>\n",
" <th>id</th>\n",
" <th>jurisdiction_code</th>\n",
" <th>name</th>\n",
" <th>opencorporates_url</th>\n",
" <th>position</th>\n",
" <th>retrieved_at</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> MARSH ANALYTICS LTD</td>\n",
" <td> 07057484</td>\n",
" <td> 58417258</td>\n",
" <td> gb</td>\n",
" <td> JONATHAN MASHE THOMAS</td>\n",
" <td> https://opencorporates.com/officers/58417258</td>\n",
" <td> director</td>\n",
" <td> 2014-06-25T14:08:25+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> AFRICAN &amp; CARIBBEAN DIVERSITY</td>\n",
" <td> 06513287</td>\n",
" <td> 52442487</td>\n",
" <td> gb</td>\n",
" <td> JONATHAN MASHE THOMAS</td>\n",
" <td> https://opencorporates.com/officers/52442487</td>\n",
" <td> director</td>\n",
" <td> 2014-08-12T19:10:55+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 100 BLACK MEN OF LONDON</td>\n",
" <td> 04200429</td>\n",
" <td> 39746460</td>\n",
" <td> gb</td>\n",
" <td> JONATHAN MASHE THOMAS</td>\n",
" <td> https://opencorporates.com/officers/39746460</td>\n",
" <td> secretary</td>\n",
" <td> 2014-08-13T17:37:31+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 100 BLACK MEN OF LONDON</td>\n",
" <td> 04200429</td>\n",
" <td> 38555524</td>\n",
" <td> gb</td>\n",
" <td> JONATHAN MASHE THOMAS</td>\n",
" <td> https://opencorporates.com/officers/38555524</td>\n",
" <td> director</td>\n",
" <td> 2014-08-13T17:37:31+00:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 134,
"text": [
" company_name company_number id jurisdiction_code \\\n",
"0 MARSH ANALYTICS LTD 07057484 58417258 gb \n",
"1 AFRICAN & CARIBBEAN DIVERSITY 06513287 52442487 gb \n",
"2 100 BLACK MEN OF LONDON 04200429 39746460 gb \n",
"3 100 BLACK MEN OF LONDON 04200429 38555524 gb \n",
"\n",
" name opencorporates_url \\\n",
"0 JONATHAN MASHE THOMAS https://opencorporates.com/officers/58417258 \n",
"1 JONATHAN MASHE THOMAS https://opencorporates.com/officers/52442487 \n",
"2 JONATHAN MASHE THOMAS https://opencorporates.com/officers/39746460 \n",
"3 JONATHAN MASHE THOMAS https://opencorporates.com/officers/38555524 \n",
"\n",
" position retrieved_at \n",
"0 director 2014-06-25T14:08:25+00:00 \n",
"1 director 2014-08-12T19:10:55+00:00 \n",
"2 secretary 2014-08-13T17:37:31+00:00 \n",
"3 director 2014-08-13T17:37:31+00:00 "
]
}
],
"prompt_number": 134
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#If we use the join to get a list of company numbers for companies in the 360 charities data,\n",
"#we can then look these up in the Companies House datatable to see what postcodes they correspond to,\n",
"#and then group and count on this basis.\n",
"#That is, for compnay matched charities, what are the most popular postcodes?\n",
"pcc=psql(\"SELECT regaddress_postcode, COUNT(*) FROM companydata \\\n",
" WHERE companynumber IN (SELECT DISTINCT co.companynumber \\\n",
" FROM charities360data ch JOIN companydata co \\\n",
" ON ch.recipientorganization_name_upper = co.companyname) \\\n",
" AND regaddress_postcode !='NaN' \\\n",
" GROUP BY regaddress_postcode HAVING COUNT(*)> 5 ORDER BY COUNT(*) DESC;\",engine)\n",
"pcc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>regaddress_postcode</th>\n",
" <th>count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> EC4M 6YH</td>\n",
" <td> 12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> EC1V 4PW</td>\n",
" <td> 12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> IV1 1YN</td>\n",
" <td> 10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> N1 6AH</td>\n",
" <td> 10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> L2 2AH</td>\n",
" <td> 9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> CB1 2LA</td>\n",
" <td> 9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> E1 6AB</td>\n",
" <td> 8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> EC1V 4PY</td>\n",
" <td> 8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> N7 6PA</td>\n",
" <td> 8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> N9 0PZ</td>\n",
" <td> 8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> E8 3DL</td>\n",
" <td> 8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> CB4 0WS</td>\n",
" <td> 8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td> BN1 3XG</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td> EH7 5QY</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td> SE8 4AG</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> TS10 4RF</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td> NW1 0NH</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td> L3 5TF</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td> WC1H 9NA</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td> BN44 3TN</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td> BS15 8DB</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td> E2 6HG</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td> EC1R 3GA</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td> EH2 2PR</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td> G1 5HD</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td> G1 5HZ</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td> G2 1BA</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td> HU19 2DL</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td> KT15 2NX</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td> L8 1XE</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td> LE1 1RE</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td> LS1 4ND</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td> OX4 2HN</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td> PH16 5BU</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td> SE1 0EH</td>\n",
" <td> 6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 104,
"text": [
" regaddress_postcode count\n",
"0 EC4M 6YH 12\n",
"1 EC1V 4PW 12\n",
"2 IV1 1YN 10\n",
"3 N1 6AH 10\n",
"4 L2 2AH 9\n",
"5 CB1 2LA 9\n",
"6 E1 6AB 8\n",
"7 EC1V 4PY 8\n",
"8 N7 6PA 8\n",
"9 N9 0PZ 8\n",
"10 E8 3DL 8\n",
"11 CB4 0WS 8\n",
"12 BN1 3XG 7\n",
"13 EH7 5QY 7\n",
"14 SE8 4AG 7\n",
"15 TS10 4RF 7\n",
"16 NW1 0NH 7\n",
"17 L3 5TF 7\n",
"18 WC1H 9NA 6\n",
"19 BN44 3TN 6\n",
"20 BS15 8DB 6\n",
"21 E2 6HG 6\n",
"22 EC1R 3GA 6\n",
"23 EH2 2PR 6\n",
"24 G1 5HD 6\n",
"25 G1 5HZ 6\n",
"26 G2 1BA 6\n",
"27 HU19 2DL 6\n",
"28 KT15 2NX 6\n",
"29 L8 1XE 6\n",
"30 LE1 1RE 6\n",
"31 LS1 4ND 6\n",
"32 OX4 2HN 6\n",
"33 PH16 5BU 6\n",
"34 SE1 0EH 6"
]
}
],
"prompt_number": 104
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There may be several reasons why the same postcode appears to be associated with the several different organisations:\n",
"\n",
"- each postcode is associated with multiple postal addresses, so the organisations may actually have different addresses;\n",
"- the companies may have been set up using a corporate services company that handles things like registered company address;\n",
"- the companies may be part of a corporate group with the same registered address;\n",
"- the companies may have registered addresses associated with a community centre or other form of civic hub."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We can tunnel in on a particular postcode\n",
"mpc=psql(\"SELECT DISTINCT ch.fundingorganization_name , ch.recipientorganization_name_upper, co.companynumber, regaddress_addressline1 \\\n",
" FROM charities360data ch JOIN companydata co \\\n",
" ON ch.recipientorganization_name_upper = co.companyname \\\n",
" WHERE co.regaddress_postcode ='HU19 2DL';\",engine)\n",
"mpc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>companynumber</th>\n",
" <th>regaddress_addressline1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> Big Lottery</td>\n",
" <td> CHILDREN AND FAMILY ACTION</td>\n",
" <td> 07612029</td>\n",
" <td> 29-31 SEASIDE ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> Big Lottery</td>\n",
" <td> SHORES TEAM LIMITED</td>\n",
" <td> 06288967</td>\n",
" <td> THE SHORES RESOURCE CENTRE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> Big Lottery</td>\n",
" <td> SHORES COMMUNITY DINER LIMITED</td>\n",
" <td> 06459499</td>\n",
" <td> 29-31 SEASIDE ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> Sport England</td>\n",
" <td> CHILDREN AND FAMILY ACTION</td>\n",
" <td> 07612029</td>\n",
" <td> 29-31 SEASIDE ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> Big Lottery</td>\n",
" <td> SEASIDE RADIO LIMITED</td>\n",
" <td> 05031695</td>\n",
" <td> 29 \u00db 31 SEASIDE ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> Big Lottery</td>\n",
" <td> SOUTHERN HOLDERNESS RESOURCE CENTRE</td>\n",
" <td> 07271654</td>\n",
" <td> 29-31 SEASIDE ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> Big Lottery</td>\n",
" <td> SHORES HOMECARE LIMITED</td>\n",
" <td> 06459496</td>\n",
" <td> 29-31 SEASIDE ROAD</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 106,
"text": [
" fundingorganization_name recipientorganization_name_upper companynumber \\\n",
"0 Big Lottery CHILDREN AND FAMILY ACTION 07612029 \n",
"1 Big Lottery SHORES TEAM LIMITED 06288967 \n",
"2 Big Lottery SHORES COMMUNITY DINER LIMITED 06459499 \n",
"3 Sport England CHILDREN AND FAMILY ACTION 07612029 \n",
"4 Big Lottery SEASIDE RADIO LIMITED 05031695 \n",
"5 Big Lottery SOUTHERN HOLDERNESS RESOURCE CENTRE 07271654 \n",
"6 Big Lottery SHORES HOMECARE LIMITED 06459496 \n",
"\n",
" regaddress_addressline1 \n",
"0 29-31 SEASIDE ROAD \n",
"1 THE SHORES RESOURCE CENTRE \n",
"2 29-31 SEASIDE ROAD \n",
"3 29-31 SEASIDE ROAD \n",
"4 29 \u00db 31 SEASIDE ROAD \n",
"5 29-31 SEASIDE ROAD \n",
"6 29-31 SEASIDE ROAD "
]
}
],
"prompt_number": 106
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Let's make it easier to run this sort of query...\n",
"\n",
"def searchByCompanyPostcode(pc):\n",
" mpc=psql(\"SELECT DISTINCT ch.fundingorganization_name, ch.recipientorganization_name_upper, co.companynumber, regaddress_addressline1 \\\n",
" FROM charities360data ch JOIN companydata co \\\n",
" ON ch.recipientorganization_name_upper = co.companyname \\\n",
" WHERE co.regaddress_postcode ='\"+pc+\"';\",engine)\n",
" return mpc"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 109
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"searchByCompanyPostcode('KT15 2NX')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>companynumber</th>\n",
" <th>regaddress_addressline1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> TSB</td>\n",
" <td> THALES TRANSPORT AND SECURITY LIMITED</td>\n",
" <td> 03132438</td>\n",
" <td> 2 DASHWOOD LANG ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> TSB</td>\n",
" <td> QUINTEC ASSOCIATES LIMITED</td>\n",
" <td> 02375963</td>\n",
" <td> 2 DASHWOOD LANG ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> TSB</td>\n",
" <td> THALES RESEARCH &amp; TECHNOLOGY (UK) LIMITED</td>\n",
" <td> 00774298</td>\n",
" <td> 2 DASHWOOD LANG ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> TSB</td>\n",
" <td> THALES ATM LIMITED</td>\n",
" <td> 03229755</td>\n",
" <td> 2 DASHWOOD LANG ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> TSB</td>\n",
" <td> THALES UK LIMITED</td>\n",
" <td> 00868273</td>\n",
" <td> 2 DASHWOOD LANG ROAD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> TSB</td>\n",
" <td> THALES AVIONICS LIMITED</td>\n",
" <td> 00523160</td>\n",
" <td> 2 DASHWOOD LANG ROAD</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 110,
"text": [
" fundingorganization_name recipientorganization_name_upper \\\n",
"0 TSB THALES TRANSPORT AND SECURITY LIMITED \n",
"1 TSB QUINTEC ASSOCIATES LIMITED \n",
"2 TSB THALES RESEARCH & TECHNOLOGY (UK) LIMITED \n",
"3 TSB THALES ATM LIMITED \n",
"4 TSB THALES UK LIMITED \n",
"5 TSB THALES AVIONICS LIMITED \n",
"\n",
" companynumber regaddress_addressline1 \n",
"0 03132438 2 DASHWOOD LANG ROAD \n",
"1 02375963 2 DASHWOOD LANG ROAD \n",
"2 00774298 2 DASHWOOD LANG ROAD \n",
"3 03229755 2 DASHWOOD LANG ROAD \n",
"4 00868273 2 DASHWOOD LANG ROAD \n",
"5 00523160 2 DASHWOOD LANG ROAD "
]
}
],
"prompt_number": 110
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"searchByCompanyPostcode('N9 0PZ')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fundingorganization_name</th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>companynumber</th>\n",
" <th>regaddress_addressline1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> Arts Council England</td>\n",
" <td> ONE-TO-ONE (ENFIELD)</td>\n",
" <td> 03324923</td>\n",
" <td> COMMUNITY HOUSE 311 FORE STREET</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> Big Lottery</td>\n",
" <td> ENFIELD TURKISH CYPRIOT ASSOCIATION</td>\n",
" <td> 03258314</td>\n",
" <td> COMMUNITY HOUSE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> Sport England</td>\n",
" <td> ONE-TO-ONE (ENFIELD)</td>\n",
" <td> 03324923</td>\n",
" <td> COMMUNITY HOUSE 311 FORE STREET</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> Big Lottery</td>\n",
" <td> ONE-TO-ONE (ENFIELD)</td>\n",
" <td> 03324923</td>\n",
" <td> COMMUNITY HOUSE 311 FORE STREET</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> Big Lottery</td>\n",
" <td> THE ENFIELD CLUBHOUSE LIMITED</td>\n",
" <td> 04568108</td>\n",
" <td> COMMUNITY HOUSE (ROOM 10)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> Big Lottery</td>\n",
" <td> ENFIELD RACIAL EQUALITY COUNCIL</td>\n",
" <td> 04527878</td>\n",
" <td> COMMUNITY HOUSE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> Big Lottery</td>\n",
" <td> ENFIELD SAHELI</td>\n",
" <td> 06663604</td>\n",
" <td> COMMUNITY HOUSE 311 FORE STREET</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Big Lottery</td>\n",
" <td> ENFIELD DISABILITY ACTION</td>\n",
" <td> 03937507</td>\n",
" <td> COMMUNITY HOUSE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Big Lottery</td>\n",
" <td> GARGAAR SOMALI WELFARE ASSOCIATION</td>\n",
" <td> 05100011</td>\n",
" <td> COMMUNITY HOUSE 311 FORE STREET</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Big Lottery</td>\n",
" <td> ENFIELD VOLUNTARY ACTION</td>\n",
" <td> 03755382</td>\n",
" <td> COMMUNITY HOUSE</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 111,
"text": [
" fundingorganization_name recipientorganization_name_upper companynumber \\\n",
"0 Arts Council England ONE-TO-ONE (ENFIELD) 03324923 \n",
"1 Big Lottery ENFIELD TURKISH CYPRIOT ASSOCIATION 03258314 \n",
"2 Sport England ONE-TO-ONE (ENFIELD) 03324923 \n",
"3 Big Lottery ONE-TO-ONE (ENFIELD) 03324923 \n",
"4 Big Lottery THE ENFIELD CLUBHOUSE LIMITED 04568108 \n",
"5 Big Lottery ENFIELD RACIAL EQUALITY COUNCIL 04527878 \n",
"6 Big Lottery ENFIELD SAHELI 06663604 \n",
"7 Big Lottery ENFIELD DISABILITY ACTION 03937507 \n",
"8 Big Lottery GARGAAR SOMALI WELFARE ASSOCIATION 05100011 \n",
"9 Big Lottery ENFIELD VOLUNTARY ACTION 03755382 \n",
"\n",
" regaddress_addressline1 \n",
"0 COMMUNITY HOUSE 311 FORE STREET \n",
"1 COMMUNITY HOUSE \n",
"2 COMMUNITY HOUSE 311 FORE STREET \n",
"3 COMMUNITY HOUSE 311 FORE STREET \n",
"4 COMMUNITY HOUSE (ROOM 10) \n",
"5 COMMUNITY HOUSE \n",
"6 COMMUNITY HOUSE 311 FORE STREET \n",
"7 COMMUNITY HOUSE \n",
"8 COMMUNITY HOUSE 311 FORE STREET \n",
"9 COMMUNITY HOUSE "
]
}
],
"prompt_number": 111
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Bringing in Charities Commission Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Charities Commission don't currently publish details of their register in as a bulk download dataset, which means that to get the data we either need to scrape the site ourselves, or find an alternative source. One such source is OpenCharities, who make a scrape of the data available at [Opencharities.org](http://opencharities.org/info/about) [I'm not sure how current the scrape is?]."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Grab the data file\n",
"!wget http://OpenCharities.org/charities.csv.zip -P data\n",
" \n",
"#Uncompress it\n",
"!unzip data/charities.csv.zip -d data"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"--2014-08-18 10:57:31-- http://opencharities.org/charities.csv.zip\r\n",
"Resolving opencharities.org (opencharities.org)... "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"46.43.37.20\r\n",
"Connecting to opencharities.org (opencharities.org)|46.43.37.20|:80... "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"connected.\r\n",
"HTTP request sent, awaiting response... 200 OK\r\n",
"Length: 35713392 (34M) [application/zip]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Saving to: 'data/charities.csv.zip'\r\n",
"\r\n",
"\r",
" 0% [ ] 0 --.-K/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 0% [ ] 197,400 964KB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 2% [ ] 910,000 2.17MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 4% [> ] 1,628,200 2.59MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 6% [=> ] 2,450,000 2.92MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
" 9% [==> ] 3,386,600 3.23MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"11% [===> ] 4,258,800 3.38MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"15% [====> ] 5,373,200 3.65MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"17% [=====> ] 6,298,392 3.72MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"20% [=======> ] 7,420,000 3.89MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"23% [========> ] 8,453,200 3.99MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"26% [=========> ] 9,564,800 4.10MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"28% [==========> ] 10,080,000 3.94MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"31% [===========> ] 11,142,600 4.02MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"34% [============> ] 12,256,792 4.09MB/s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"37% [=============> ] 13,405,000 4.18MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"40% [==============> ] 14,379,400 4.38MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"43% [===============> ] 15,561,000 4.55MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"46% [=================> ] 16,585,800 4.63MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"49% [==================> ] 17,777,200 4.78MB/s eta 5s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"52% [===================> ] 18,867,592 4.82MB/s eta 4s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"56% [=====================> ] 20,217,400 4.94MB/s eta 4s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"60% [======================> ] 21,537,600 5.05MB/s eta 4s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"63% [=======================> ] 22,743,000 5.14MB/s eta 4s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"66% [=========================> ] 23,853,560 5.15MB/s eta 4s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"70% [==========================> ] 25,020,800 5.17MB/s eta 2s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"73% [===========================> ] 26,115,600 5.17MB/s eta 2s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"76% [============================> ] 27,183,384 5.31MB/s eta 2s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"79% [=============================> ] 28,344,400 5.40MB/s eta 2s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"82% [===============================> ] 29,417,992 4.70MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"86% [================================> ] 30,973,392 4.78MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"89% [==================================> ] 32,089,192 4.84MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"93% [===================================> ] 33,306,000 4.79MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"95% [====================================> ] 34,183,800 4.81MB/s eta 1s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"99% [=====================================> ] 35,389,200 4.79MB/s eta 0s "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r",
"100%[======================================>] 35,713,392 4.78MB/s in 7.4s \r\n",
"\r\n",
"2014-08-18 10:57:43 (4.59 MB/s) - 'data/charities.csv.zip' saved [35713392/35713392]\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Archive: data/charities.csv.zip\r\n",
" inflating: data/charities.csv "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n 3 data/charities.csv"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"title,charity_number,activities,contact_name,address,website,telephone,date_registered,date_removed,accounts_date,spending,income,company_number,openlylocal_url,twitter_account_name,facebook_account_name,youtube_account_name,feed_url,charity_classification_uids,signed_up_for_1010,last_checked,created_at,updated_at\r\n",
"ASSOCIATION FOR INDUSTRIAL ARCHAEOLOGY,277511,\"THE AIA IS THE NATIONAL ORGANISATION FOR PEOPLE WHO SHARE AN INTEREST IN BRITAIN'S INDUSTRIAL PAST. IT BRINGS TOGETHER PEOPLE WHO ARE RESEARCHING, RECORDING, PRESERVING AND PRESENTING THE GREAT VARIETY OF THIS COUNTRY'S INDUSTRIAL HERITAGE.\",MR DAVID DE HAAN MSC DIC FMA,\"7 ST. MICHAELS CLOSE, MADELEY, TELFORD, TF7 5SD\",http://www.industrial-archaeology.org,01952 435934,1979-08-22,,2010-12-31,,,1326854,http://OpenlyLocal.com/charities/1-ASSOCIATION-FOR-INDUSTRIAL-ARCHAEOLOGY,,,,,\"102,307,207,308,112\",false,2012-01-19T14:10:19+00:00,2010-08-22T20:38:01+01:00,2012-04-15T10:31:36+01:00\r\n",
"INTERNATIONAL SOCIETY OF CHEMOTHERAPY FOR INFECTION AND CANCER,1029653,\"OBJECTIVES ARE TO ADVANCE THE EDUCATION AND SCIENCE OF CHEMOTHERAPY. INTERNATIONAL CONGRESSES PROMOTING THE DEVELOPMENT OF CHEMOTHERAPY ARE HELD REGULARLY. ISC ENCOURAGES, SUPPORTS AND SPONSORS THE FORMATION OF INTERNATIONAL WORKING GROUPS FOR SPECIAL RESEARCH PROJECTS, COMMISSIONS, TRAINING PROJECTS AND THE USE OF CHEMOTHERAPEUTICS IN THE FIELD OF ANTIMICROBIAL AND ANTINEOPLASTIC CHEMOTHERAPY.\",FIONA MACKENZIE,\"MEDICAL MICROBIOLOGY, ABERDEEN ROYAL INFIRMARY, CORNHILL ROAD, ABERDEEN, AB25 2ZN\",http://www.ischemo.org,01224 552127,1993-12-03,,2008-12-31,,,,http://OpenlyLocal.com/charities/2-INTERNATIONAL-SOCIETY-OF-CHEMOTHERAPY-FOR-INFECTION-AND-CANCER,,,,,\"304,302,306,301,308,206,102,207,103\",false,2010-09-17T15:39:54+01:00,2010-08-22T20:38:01+01:00,2012-04-15T10:56:52+01:00\r\n"
]
}
],
"prompt_number": 157
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Find out how wide each daata field is to help us set up the database table\n",
"fname='data/charities.csv'\n",
"\n",
"df=pd.read_csv(fname)\n",
"\n",
"maxlens={}\n",
"for name in df.columns: maxlens[name]=0\n",
" \n",
"chunks=pd.read_csv(fname,chunksize=10000)\n",
"for chunk in chunks:\n",
" colMaxLen(chunk,maxlens)\n",
"maxlens"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"{'charity_classification_uids': 135,\n",
" 'openlylocal_url': 189,\n",
" 'charity_number': 11,\n",
" 'income': 11,\n",
" 'created_at': 25,\n",
" 'contact_name': 146,\n",
" 'address': 194,\n",
" 'youtube_account_name': 255,\n",
" 'company_number': 20,\n",
" 'twitter_account_name': 255,\n",
" 'title': 150,\n",
" 'spending': 11,\n",
" 'accounts_date': 10,\n",
" 'updated_at': 25,\n",
" 'website': 129,\n",
" 'date_removed': 10,\n",
" 'last_checked': 25,\n",
" 'activities': 445,\n",
" 'feed_url': 163,\n",
" 'date_registered': 10,\n",
" 'telephone': 35,\n",
" 'signed_up_for_1010': 5,\n",
" 'facebook_account_name': 136}"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#If the table we're going to use already exists, get rid of it - we're going for a fresh start...\n",
"psqlx(\"DROP TABLE IF EXISTS charitiescommission\",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 74,
"text": [
"<sqlalchemy.engine.result.ResultProxy at 0x7f7b02e18320>"
]
}
],
"prompt_number": 74
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Postgresql defaults to lower caps - we can force capitalisation on a table name by double quoting it\n",
"#psqlx('DROP TABLE IF EXISTS \"charitiesCommission\"',engine)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 75
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Create the table using fields a bit bigger than the max width\n",
"sql=\"CREATE TABLE charitiescommission (\"\n",
"colstr='charity_name_upper'+\" VARCHAR (\"+str(maxlens['title']+5)+\"), pcode_guess VARCHAR (10)\"\n",
"for i in maxlens:\n",
" tmp=i+\" VARCHAR (\"+str(maxlens[i]+5)+\")\"\n",
" colstr=\", \".join([colstr,tmp])\n",
" \n",
"psqlx( sql +colstr +\");\",engine)\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 76,
"text": [
"<sqlalchemy.engine.result.ResultProxy at 0x7f7afa46c320>"
]
}
],
"prompt_number": 76
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Get the data into the postgresql database, via a pandas dataframe\n",
"chunks=pd.read_csv(fname,chunksize=10000, dtype ={'charity_number':str})\n",
"for chunk in chunks:\n",
" for col in ['title']:\n",
" chunk[col]=chunk[col].str.strip()\n",
" #I'm going to add another column which is a guess at the postcode extracted from the address\n",
" chunk['pcode_guess']=chunk['address'].str.extract('([A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2})')\n",
" #Let's also get a normalised, upper case variant of the charity name\n",
" chunk['charity_name_upper']=chunk['title'].str.upper()\n",
" chunk.to_sql('charitiescommission', engine, index=False, if_exists='append')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 77
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#A quick test query to see what we can join with the 360 charities data using charity name...\n",
"chcnum=psql(\"SELECT DISTINCT ch.recipientorganization_name_upper, cc.charity_number, cc.company_number, cc.pcode_guess \\\n",
" FROM charities360data ch JOIN charitiescommission cc \\\n",
" ON ch.recipientorganization_name_upper = cc.charity_name_upper ;\",engine)\n",
"chcnum[:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>recipientorganization_name_upper</th>\n",
" <th>charity_number</th>\n",
" <th>company_number</th>\n",
" <th>pcode_guess</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 100 BLACK MEN OF LONDON</td>\n",
" <td> 1105904</td>\n",
" <td> 4200429</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 101 PLAYGROUP</td>\n",
" <td> 1101406</td>\n",
" <td> 4945702.0</td>\n",
" <td> N8 9NH</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 101 PLAYGROUP</td>\n",
" <td> 803462</td>\n",
" <td> NaN</td>\n",
" <td> N8 9BG</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" <td> NaN</td>\n",
" <td> S35 4FP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 1066 CHILDMINDING ASSOCIATION</td>\n",
" <td> 1029817</td>\n",
" <td> NaN</td>\n",
" <td> TN37 7RG</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 10TH BATH (COMBE DOWN) SCOUT GROUP</td>\n",
" <td> 305613</td>\n",
" <td> NaN</td>\n",
" <td> BA2 5EQ</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 10TH BIRKENHEAD SCOUT GROUP</td>\n",
" <td> 520165</td>\n",
" <td> NaN</td>\n",
" <td> CH63 8NZ</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> 10TH OXFORD BOY SCOUT GROUP</td>\n",
" <td> 304435</td>\n",
" <td> NaN</td>\n",
" <td> OX29 8JZ</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> 10TH ROYSTON SCOUT GROUP</td>\n",
" <td> 1017455</td>\n",
" <td> NaN</td>\n",
" <td> SG8 7XE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> 10TH SEVENOAKS BOY SCOUT GROUP</td>\n",
" <td> 303442</td>\n",
" <td> NaN</td>\n",
" <td> TN11 9BG</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 95,
"text": [
" recipientorganization_name_upper charity_number company_number \\\n",
"0 100 BLACK MEN OF LONDON 1105904 4200429 \n",
"1 101 PLAYGROUP 1101406 4945702.0 \n",
"2 101 PLAYGROUP 803462 NaN \n",
"3 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163 NaN \n",
"4 1066 CHILDMINDING ASSOCIATION 1029817 NaN \n",
"5 10TH BATH (COMBE DOWN) SCOUT GROUP 305613 NaN \n",
"6 10TH BIRKENHEAD SCOUT GROUP 520165 NaN \n",
"7 10TH OXFORD BOY SCOUT GROUP 304435 NaN \n",
"8 10TH ROYSTON SCOUT GROUP 1017455 NaN \n",
"9 10TH SEVENOAKS BOY SCOUT GROUP 303442 NaN \n",
"\n",
" pcode_guess \n",
"0 NaN \n",
"1 N8 9NH \n",
"2 N8 9BG \n",
"3 S35 4FP \n",
"4 TN37 7RG \n",
"5 BA2 5EQ \n",
"6 CH63 8NZ \n",
"7 OX29 8JZ \n",
"8 SG8 7XE \n",
"9 TN11 9BG "
]
}
],
"prompt_number": 95
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#How many matches were there?\n",
"len(chcnum)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 113,
"text": [
"20649"
]
}
],
"prompt_number": 113
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Are there any popular postcodes?\n",
"chcnum[['pcode_guess']].groupby(['pcode_guess']).size().order(ascending=False)[:20]\n",
"\n",
"#As with populat postcodes for company registered addresses, there may be several reasons for popular charity address postcodes"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 96,
"text": [
"pcode_guess\n",
"NaN 1075\n",
"W10 5XL 11\n",
"N9 0PZ 10\n",
"E2 6HG 10\n",
"SE1 9BG 9\n",
"W6 9LP 9\n",
"BN1 3XG 9\n",
"CR4 3UD 9\n",
"N7 6LA 8\n",
"BS15 8DB 8\n",
"NE1 8XS 8\n",
"SE1 0EH 8\n",
"ME14 1HH 8\n",
"W13 9LA 7\n",
"W2 5ES 7\n",
"S1 4FW 7\n",
"E1 6AB 7\n",
"NE46 3NP 7\n",
"LA9 4PU 7\n",
"E2 9PL 6\n",
"dtype: int64"
]
}
],
"prompt_number": 96
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Let's have a quick look at charities registered to one of those postcodes\n",
"psql(\"SELECT DISTINCT charity_name_upper,address FROM charitiescommission WHERE pcode_guess='NE46 3NP'\",engine)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>charity_name_upper</th>\n",
" <th>address</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> TYNEDALE WOMEN'S TRAINING GROUP</td>\n",
" <td> TYNEDALE WOMEN'S TRAINING GROUP, COMMUNITY CEN...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> HEXHAM COMMUNITY CENTRE</td>\n",
" <td> HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> TYNEDALE CITIZENS' ADVICE BUREAU</td>\n",
" <td> THE COMMUNITY CENTRE, GILESGATE, HEXHAM, NORTH...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> WEST NORTHUMBERLAND CITIZENS ADVICE BUREAU</td>\n",
" <td> COMMUNITY CENTRE, GILESGATE, HEXHAM, NE46 3NP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> SIXTYEIGHTYTHIRTY</td>\n",
" <td> COMMUNITY CENTRE, GILESGATE, HEXHAM, NORTHUMBE...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> TYNEDALE VOLUNTARY ACTION</td>\n",
" <td> TYNEDALE VOLUNTARY ACTION, HEXHAM COMMUNITY CE...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> GATEWAY INTO THE COMMUNITY</td>\n",
" <td> HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> HEXHAM AND TYNEDALE COMMUNITY TRUST</td>\n",
" <td> HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> HEXHAM YOUTH INITIATIVE</td>\n",
" <td> HEXHAM YOUTH INITIATIVE, THE COMMUNITY CENTRE,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> SIXTYEIGHTYTHIRTY</td>\n",
" <td> COMMUNITY CENTRE, GILESGATE, HEXHAM, NE46 3NP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> THE SAMLING FOUNDATION</td>\n",
" <td> SAMLING, HEXHAM COMMUNITY CENTRE, GILESGATE, H...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> THE SAMLING FOUNDATION</td>\n",
" <td> HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 98,
"text": [
" charity_name_upper \\\n",
"0 TYNEDALE WOMEN'S TRAINING GROUP \n",
"1 HEXHAM COMMUNITY CENTRE \n",
"2 TYNEDALE CITIZENS' ADVICE BUREAU \n",
"3 WEST NORTHUMBERLAND CITIZENS ADVICE BUREAU \n",
"4 SIXTYEIGHTYTHIRTY \n",
"5 TYNEDALE VOLUNTARY ACTION \n",
"6 GATEWAY INTO THE COMMUNITY \n",
"7 HEXHAM AND TYNEDALE COMMUNITY TRUST \n",
"8 HEXHAM YOUTH INITIATIVE \n",
"9 SIXTYEIGHTYTHIRTY \n",
"10 THE SAMLING FOUNDATION \n",
"11 THE SAMLING FOUNDATION \n",
"\n",
" address \n",
"0 TYNEDALE WOMEN'S TRAINING GROUP, COMMUNITY CEN... \n",
"1 HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO... \n",
"2 THE COMMUNITY CENTRE, GILESGATE, HEXHAM, NORTH... \n",
"3 COMMUNITY CENTRE, GILESGATE, HEXHAM, NE46 3NP \n",
"4 COMMUNITY CENTRE, GILESGATE, HEXHAM, NORTHUMBE... \n",
"5 TYNEDALE VOLUNTARY ACTION, HEXHAM COMMUNITY CE... \n",
"6 HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO... \n",
"7 HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO... \n",
"8 HEXHAM YOUTH INITIATIVE, THE COMMUNITY CENTRE,... \n",
"9 COMMUNITY CENTRE, GILESGATE, HEXHAM, NE46 3NP \n",
"10 SAMLING, HEXHAM COMMUNITY CENTRE, GILESGATE, H... \n",
"11 HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO... "
]
}
],
"prompt_number": 98
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Although the Charity Commission doesn't publish an API, Open Charities does... \n",
"import json, requests\n",
"\n",
"def opencharitiesLookup(id):\n",
" url = 'http://opencharities.org/charities/{0}.json'.format(id)\n",
" jsondata = json.loads(requests.get(url).text)\n",
" return jsondata\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 158
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Lets see what we can back...\n",
"charityNum='202918'\n",
"\n",
"jdata = opencharitiesLookup(charityNum)\n",
"jdata"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 136,
"text": [
"{'charity': {'financial_breakdown': {'assets': {'other_assets': '108400000',\n",
" 'total_liabilities': '-68600000',\n",
" 'long_term_investments': '3000000',\n",
" 'own_use_assets': '13100000'},\n",
" 'income': {'charitable': '166000000',\n",
" 'investment': '900000',\n",
" 'voluntary': '111500000',\n",
" 'other': '800000',\n",
" 'trading': '88700000',\n",
" 'investment_gains': '0'},\n",
" 'spending': {'investment_management': '100000',\n",
" 'governance': '1300000',\n",
" 'generating_voluntary_income': '23200000',\n",
" 'other': '1400000',\n",
" 'trading': '68600000',\n",
" 'charitable_activities': '290000000'}},\n",
" 'activities': \"OXFAM'S OBJECTS ARE TO PREVENT AND RELIEVE POVERTY AND PROTECT THE VULNERABLE ANYWHERE IN THE WORLD. OXFAM FURTHERS ITS OBJECTS THROUGH INTERLINKED ACTIVITIES OF HUMANITARIAN RELIEF, DEVELOPMENT WORK AND ADVOCACY AND CAMPAIGNING. OXFAM IS AN AFFILIATE MEMBER OF OXFAM INTERNATIONAL. DETAILS OF OXFAM'S MISSION AND WAYS OF WORKING CAN BE FOUND AT WWW.OXFAM.ORG.UK\",\n",
" 'updated_at': '2014-06-11T20:59:28Z',\n",
" 'income': None,\n",
" 'grants': [{'classification_id': None,\n",
" 'cost_centre': None,\n",
" 'transaction_type': 'LotteryGrant',\n",
" 'invoice_date': None,\n",
" 'description': 'Roars Not Whispers Is A 3-Year Youth Led Programme That Will Build The Capacity Of Young People To Represent Themselves And Others, Take Action On Issues Of Concern And Bring These Issues Further Up The Public And Political Agenda In Scotland. Oxfam And The Scottish Youth Parliament Will Deliver The Work With A Wide Range Of Other Youth Focused Groups Involved. Young People From Every Local Authority Area In Scotland Will Be Trained As Peer Educators. They Will Then Pass On Skills To Other Young People In',\n",
" 'date_fuzziness': None,\n",
" 'uid': None,\n",
" 'created_at': '2010-09-23T09:51:14+01:00',\n",
" 'id': 243306,\n",
" 'supplier_id': 47408,\n",
" 'value': 1187948.0,\n",
" 'department_name': None,\n",
" 'date': '2006-06-30',\n",
" 'invoice_number': None,\n",
" 'source_url': 'http://www.lottery.culture.gov.uk/details.asp?ID=YSC/1/010201390&DBID=BL',\n",
" 'csv_line_number': None,\n",
" 'service': 'Health, Education, Environment and Charitable Expenditure',\n",
" 'updated_at': '2010-09-23T09:51:14+01:00'},\n",
" {'classification_id': None,\n",
" 'cost_centre': None,\n",
" 'transaction_type': 'LotteryGrant',\n",
" 'invoice_date': None,\n",
" 'description': 'This project will work in partnership with Refugee Community Organisations (grass roots refugee and asylum seeker organisations) in Swansea, Newport and Cardiff to address the particular issues faced by refugee and asylum seeking women. This will include provision of drop-ins and activities to reduce social isolation. It will provide training to women to become champions for their community and advocates for better services to meet the specific needs of the target beneficiaries.',\n",
" 'date_fuzziness': None,\n",
" 'uid': None,\n",
" 'created_at': '2010-09-23T10:41:31+01:00',\n",
" 'id': 288237,\n",
" 'supplier_id': 47408,\n",
" 'value': 247411.0,\n",
" 'department_name': None,\n",
" 'date': '2008-04-25',\n",
" 'invoice_number': None,\n",
" 'source_url': 'http://www.lottery.culture.gov.uk/details.asp?ID=PPA/1/010248997&DBID=BL',\n",
" 'csv_line_number': None,\n",
" 'service': 'Health, Education, Environment and Charitable Expenditure',\n",
" 'updated_at': '2010-09-23T10:41:31+01:00'}],\n",
" 'area_of_benefit': 'NATIONAL AND OVERSEAS',\n",
" 'contact_name': 'MR JOSS SAUNDERS',\n",
" 'created_at': '2010-08-23T00:02:03+01:00',\n",
" 'annual_reports': [{'endowment_funds': 2600000,\n",
" 'grants_to_institutions': 46000000,\n",
" 'financial_year_end': '2007-04-30',\n",
" 'support_costs': 21300000,\n",
" 'voluntary_income': 147800000,\n",
" 'total_current_assets': 99200000,\n",
" 'created_at': '2010-10-26T22:44:03+01:00',\n",
" 'reserves': 48600000,\n",
" 'other_expenses': 800000,\n",
" 'income_from_endowments': 0,\n",
" 'cash': 75700000,\n",
" 'investment_gains': 0,\n",
" 'total_expenses': 297200000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 211200000,\n",
" 'total_funds': 70700000,\n",
" 'investment_management_costs': 100000,\n",
" 'fixed_investment_assets_at_end_of_year': 2900000,\n",
" 'voluntary_income_costs': 23900000,\n",
" 'fixed_assets_at_end_of_year': 19800000,\n",
" 'other_income': 1100000,\n",
" 'long_term_creditors_or_provisions': 9900000,\n",
" 'annual_return_code': 'AR07',\n",
" 'employees': 5728,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 20300000,\n",
" 'updated_at': '2010-10-26T22:44:03+01:00',\n",
" 'total_income': 290700000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2006-05-01',\n",
" 'depreciation': 4600000,\n",
" 'investment_income': 3300000,\n",
" 'income_from_charitable_activities': 62700000,\n",
" 'activities_generating_funds': 75800000,\n",
" 'creditors_within_1_year': 18300000,\n",
" 'fundraising_trading_costs': 60000000,\n",
" 'total_assets': 70700000,\n",
" 'gains_on_pension_fund': 2000000,\n",
" 'fixed_investment_assets_at_start_of_year': 2900000,\n",
" 'volunteers': 21000,\n",
" 'pension_assets': -20100000,\n",
" 'governance_costs': 1200000,\n",
" 'id': 103,\n",
" 'income_from_legacies': 12300000,\n",
" 'restricted_funds': 24500000,\n",
" 'unrestricted_funds': 43600000},\n",
" {'endowment_funds': 2700000,\n",
" 'grants_to_institutions': 53400000,\n",
" 'financial_year_end': '2008-04-30',\n",
" 'support_costs': 25400000,\n",
" 'voluntary_income': 142100000,\n",
" 'total_current_assets': 99400000,\n",
" 'created_at': '2010-10-26T22:44:03+01:00',\n",
" 'reserves': 44600000,\n",
" 'other_expenses': 600000,\n",
" 'income_from_endowments': 0,\n",
" 'cash': 68300000,\n",
" 'investment_gains': 100000,\n",
" 'total_expenses': 298400000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 212300000,\n",
" 'total_funds': 76900000,\n",
" 'investment_management_costs': 100000,\n",
" 'fixed_investment_assets_at_end_of_year': 3300000,\n",
" 'voluntary_income_costs': 23500000,\n",
" 'fixed_assets_at_end_of_year': 18500000,\n",
" 'other_income': 4800000,\n",
" 'long_term_creditors_or_provisions': 12100000,\n",
" 'annual_return_code': 'AR08',\n",
" 'employees': 4800,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 19800000,\n",
" 'updated_at': '2010-10-26T22:44:03+01:00',\n",
" 'total_income': 299700000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2007-05-01',\n",
" 'depreciation': 4700000,\n",
" 'investment_income': 3600000,\n",
" 'income_from_charitable_activities': 71500000,\n",
" 'activities_generating_funds': 77700000,\n",
" 'creditors_within_1_year': 15900000,\n",
" 'fundraising_trading_costs': 60600000,\n",
" 'total_assets': 76900000,\n",
" 'gains_on_pension_fund': 4800000,\n",
" 'fixed_investment_assets_at_start_of_year': 2900000,\n",
" 'volunteers': 20000,\n",
" 'pension_assets': -13000000,\n",
" 'governance_costs': 1300000,\n",
" 'id': 104,\n",
" 'income_from_legacies': 11600000,\n",
" 'restricted_funds': 27400000,\n",
" 'unrestricted_funds': 46800000},\n",
" {'endowment_funds': 2400000,\n",
" 'grants_to_institutions': 53900000,\n",
" 'financial_year_end': '2009-04-30',\n",
" 'support_costs': 28200000,\n",
" 'voluntary_income': 133900000,\n",
" 'total_current_assets': 90500000,\n",
" 'created_at': '2010-10-26T22:44:03+01:00',\n",
" 'reserves': 34700000,\n",
" 'other_expenses': 200000,\n",
" 'income_from_endowments': 0,\n",
" 'cash': 45700000,\n",
" 'investment_gains': 0,\n",
" 'total_expenses': 318600000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 234400000,\n",
" 'total_funds': 52300000,\n",
" 'investment_management_costs': 100000,\n",
" 'fixed_investment_assets_at_end_of_year': 2900000,\n",
" 'voluntary_income_costs': 19800000,\n",
" 'fixed_assets_at_end_of_year': 17200000,\n",
" 'other_income': 4200000,\n",
" 'long_term_creditors_or_provisions': 11600000,\n",
" 'annual_return_code': 'AR09',\n",
" 'employees': 4624,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 18500000,\n",
" 'updated_at': '2010-10-26T22:44:03+01:00',\n",
" 'total_income': 308300000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2008-05-01',\n",
" 'depreciation': 4500000,\n",
" 'investment_income': 2500000,\n",
" 'income_from_charitable_activities': 88400000,\n",
" 'activities_generating_funds': 79300000,\n",
" 'creditors_within_1_year': 19300000,\n",
" 'fundraising_trading_costs': 62900000,\n",
" 'total_assets': 52300000,\n",
" 'gains_on_pension_fund': -14300000,\n",
" 'fixed_investment_assets_at_start_of_year': 3300000,\n",
" 'volunteers': None,\n",
" 'pension_assets': -24500000,\n",
" 'governance_costs': 1200000,\n",
" 'id': 105,\n",
" 'income_from_legacies': 10500000,\n",
" 'restricted_funds': 24000000,\n",
" 'unrestricted_funds': 25900000},\n",
" {'endowment_funds': 2500000,\n",
" 'grants_to_institutions': 54300000,\n",
" 'financial_year_end': '2010-03-31',\n",
" 'support_costs': 21500000,\n",
" 'voluntary_income': 125600000,\n",
" 'total_current_assets': 113100000,\n",
" 'created_at': '2011-03-05T22:23:09+00:00',\n",
" 'reserves': 38100000,\n",
" 'other_expenses': 2000000,\n",
" 'income_from_endowments': 100000,\n",
" 'cash': 66500000,\n",
" 'investment_gains': 0,\n",
" 'total_expenses': 294800000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 216300000,\n",
" 'total_funds': 68300000,\n",
" 'investment_management_costs': 100000,\n",
" 'fixed_investment_assets_at_end_of_year': 3000000,\n",
" 'voluntary_income_costs': 17200000,\n",
" 'fixed_assets_at_end_of_year': 15600000,\n",
" 'other_income': 2800000,\n",
" 'long_term_creditors_or_provisions': 12800000,\n",
" 'annual_return_code': 'AR10',\n",
" 'employees': 4371,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 17200000,\n",
" 'updated_at': '2011-03-05T22:23:09+00:00',\n",
" 'total_income': 318000000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2009-04-01',\n",
" 'depreciation': 4300000,\n",
" 'investment_income': 900000,\n",
" 'income_from_charitable_activities': 114300000,\n",
" 'activities_generating_funds': 74400000,\n",
" 'creditors_within_1_year': 18300000,\n",
" 'fundraising_trading_costs': 58100000,\n",
" 'total_assets': 68300000,\n",
" 'gains_on_pension_fund': -7200000,\n",
" 'fixed_investment_assets_at_start_of_year': 2900000,\n",
" 'volunteers': None,\n",
" 'pension_assets': -29300000,\n",
" 'governance_costs': 1100000,\n",
" 'id': 27676,\n",
" 'income_from_legacies': 13300000,\n",
" 'restricted_funds': 39300000,\n",
" 'unrestricted_funds': 26500000},\n",
" {'endowment_funds': 2500000,\n",
" 'grants_to_institutions': 72500000,\n",
" 'financial_year_end': '2011-03-31',\n",
" 'support_costs': 26800000,\n",
" 'voluntary_income': 138400000,\n",
" 'total_current_assets': 120400000,\n",
" 'created_at': '2012-01-26T23:44:12+00:00',\n",
" 'reserves': 41400000,\n",
" 'other_expenses': 1200000,\n",
" 'income_from_endowments': 0,\n",
" 'cash': 73700000,\n",
" 'investment_gains': 400000,\n",
" 'total_expenses': 361100000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 272700000,\n",
" 'total_funds': 83400000,\n",
" 'investment_management_costs': 200000,\n",
" 'fixed_investment_assets_at_end_of_year': 3400000,\n",
" 'voluntary_income_costs': 20800000,\n",
" 'fixed_assets_at_end_of_year': 15100000,\n",
" 'other_income': 1800000,\n",
" 'long_term_creditors_or_provisions': 10700000,\n",
" 'annual_return_code': 'AR11',\n",
" 'employees': 4689,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 15600000,\n",
" 'updated_at': '2012-01-26T23:44:12+00:00',\n",
" 'total_income': 367500000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2010-04-01',\n",
" 'depreciation': 3900000,\n",
" 'investment_income': 900000,\n",
" 'income_from_charitable_activities': 140500000,\n",
" 'activities_generating_funds': 85900000,\n",
" 'creditors_within_1_year': 23100000,\n",
" 'fundraising_trading_costs': 65000000,\n",
" 'total_assets': 83400000,\n",
" 'gains_on_pension_fund': 8300000,\n",
" 'fixed_investment_assets_at_start_of_year': 3000000,\n",
" 'volunteers': 22000,\n",
" 'pension_assets': -18300000,\n",
" 'governance_costs': 1200000,\n",
" 'id': 31739,\n",
" 'income_from_legacies': 14100000,\n",
" 'restricted_funds': 38000000,\n",
" 'unrestricted_funds': 42900000},\n",
" {'endowment_funds': 2600000,\n",
" 'grants_to_institutions': 90600000,\n",
" 'financial_year_end': '2012-03-31',\n",
" 'support_costs': 30600000,\n",
" 'voluntary_income': 129700000,\n",
" 'total_current_assets': 122700000,\n",
" 'created_at': '2014-06-12T22:47:36+01:00',\n",
" 'reserves': 39200000,\n",
" 'other_expenses': 300000,\n",
" 'income_from_endowments': 100000,\n",
" 'cash': 63300000,\n",
" 'investment_gains': 400000,\n",
" 'total_expenses': 378700000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 286400000,\n",
" 'total_funds': 73900000,\n",
" 'investment_management_costs': 100000,\n",
" 'fixed_investment_assets_at_end_of_year': 3000000,\n",
" 'voluntary_income_costs': 23000000,\n",
" 'fixed_assets_at_end_of_year': 14800000,\n",
" 'other_income': 2300000,\n",
" 'long_term_creditors_or_provisions': 12500000,\n",
" 'annual_return_code': 'AR12',\n",
" 'employees': 4885,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 15100000,\n",
" 'updated_at': '2014-06-12T22:47:36+01:00',\n",
" 'total_income': 385500000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2011-04-01',\n",
" 'depreciation': 3500000,\n",
" 'investment_income': 1200000,\n",
" 'income_from_charitable_activities': 162400000,\n",
" 'activities_generating_funds': 89900000,\n",
" 'creditors_within_1_year': 20500000,\n",
" 'fundraising_trading_costs': 67600000,\n",
" 'total_assets': 73900000,\n",
" 'gains_on_pension_fund': -16300000,\n",
" 'fixed_investment_assets_at_start_of_year': 3400000,\n",
" 'volunteers': 22000,\n",
" 'pension_assets': -30600000,\n",
" 'governance_costs': 1300000,\n",
" 'id': 45244,\n",
" 'income_from_legacies': 15000000,\n",
" 'restricted_funds': 39100000,\n",
" 'unrestricted_funds': 32200000},\n",
" {'endowment_funds': 2600000,\n",
" 'grants_to_institutions': 79300000,\n",
" 'financial_year_end': '2013-03-31',\n",
" 'support_costs': 27700000,\n",
" 'voluntary_income': 111500000,\n",
" 'total_current_assets': 108400000,\n",
" 'created_at': '2014-06-12T22:47:36+01:00',\n",
" 'reserves': 26500000,\n",
" 'other_expenses': 1400000,\n",
" 'income_from_endowments': 0,\n",
" 'cash': 52900000,\n",
" 'investment_gains': 0,\n",
" 'total_expenses': 384600000,\n",
" 'current_investment_assets': 0,\n",
" 'charity_only_accounts': None,\n",
" 'charitable_activities_costs': 290000000,\n",
" 'total_funds': 55900000,\n",
" 'investment_management_costs': 100000,\n",
" 'fixed_investment_assets_at_end_of_year': 3000000,\n",
" 'voluntary_income_costs': 23200000,\n",
" 'fixed_assets_at_end_of_year': 16100000,\n",
" 'other_income': 800000,\n",
" 'long_term_creditors_or_provisions': 14100000,\n",
" 'annual_return_code': 'AR13',\n",
" 'employees': 5046,\n",
" 'consolidated_accounts': True,\n",
" 'fixed_assets_at_start_of_year': 14800000,\n",
" 'updated_at': '2014-06-12T22:47:36+01:00',\n",
" 'total_income': 367900000,\n",
" 'charity_id': 92548,\n",
" 'gains_from_asset_revaluations': 0,\n",
" 'financial_year_start': '2012-04-01',\n",
" 'depreciation': 3200000,\n",
" 'investment_income': 900000,\n",
" 'income_from_charitable_activities': 166000000,\n",
" 'activities_generating_funds': 88700000,\n",
" 'creditors_within_1_year': 25900000,\n",
" 'fundraising_trading_costs': 68600000,\n",
" 'total_assets': 55900000,\n",
" 'gains_on_pension_fund': -1300000,\n",
" 'fixed_investment_assets_at_start_of_year': 3000000,\n",
" 'volunteers': 22000,\n",
" 'pension_assets': -28600000,\n",
" 'governance_costs': 1300000,\n",
" 'id': 45245,\n",
" 'income_from_legacies': 11800000,\n",
" 'restricted_funds': 31700000,\n",
" 'unrestricted_funds': 21600000}],\n",
" 'corrected_company_number': '00612172',\n",
" 'charity_number': '202918',\n",
" 'address': {'lat': 51.7332804,\n",
" 'addressee_type': 'Charity',\n",
" 'postal_code': 'OX4 2JY',\n",
" 'updated_at': '2012-04-30T01:03:50+01:00',\n",
" 'locality': 'OXFORD',\n",
" 'addressee_id': 92548,\n",
" 'raw_address': None,\n",
" 'created_at': '2010-08-31T23:07:32+01:00',\n",
" 'region': None,\n",
" 'id': 136328,\n",
" 'former': False,\n",
" 'country': None,\n",
" 'street_address': 'OXFAM GB, OXFAM, 2700 JOHN SMITH DRIVE, OXFORD BUSINESS PARK SOUTH',\n",
" 'lng': -1.2060825},\n",
" 'youtube_account_name': 'OxfamGreatBritain',\n",
" 'company_number': '612172',\n",
" 'normalised_title': 'oxfam',\n",
" 'twitter_account_name': 'oxfamgb',\n",
" 'classifications': [{'id': 678,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '202',\n",
" 'title': 'ELDERLY/OLD PEOPLE',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 691,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '308',\n",
" 'title': 'SPONSORS OR UNDERTAKES RESEARCH',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 685,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '302',\n",
" 'title': 'MAKES GRANTS TO ORGANISATIONS',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 689,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '306',\n",
" 'title': 'PROVIDES SERVICES',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 683,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '207',\n",
" 'title': 'THE GENERAL PUBLIC/MANKIND',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 690,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '307',\n",
" 'title': 'PROVIDES ADVOCACY/ADVICE/INFORMATION',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 677,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '201',\n",
" 'title': 'CHILDREN/YOUNG PEOPLE',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 733,\n",
" 'parent_id': 732,\n",
" 'extended_title': None,\n",
" 'updated_at': '2010-10-29T22:19:14+01:00',\n",
" 'grouping': 'ICNPO',\n",
" 'uid': '9100',\n",
" 'title': 'International activities',\n",
" 'created_at': '2010-10-29T22:19:14+01:00'},\n",
" {'id': 667,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '105',\n",
" 'title': 'THE PREVENTION OR RELIEF OF POVERTY',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 686,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '303',\n",
" 'title': 'PROVIDES OTHER FINANCE',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 679,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '203',\n",
" 'title': 'PEOPLE WITH DISABILITIES',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 668,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '106',\n",
" 'title': 'OVERSEAS AID/FAMINE RELIEF',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'},\n",
" {'id': 681,\n",
" 'parent_id': None,\n",
" 'extended_title': None,\n",
" 'updated_at': '2012-01-26T08:54:30+00:00',\n",
" 'grouping': 'CharityClassification',\n",
" 'uid': '205',\n",
" 'title': 'OTHER CHARITIES OR VOLUNTARY BODIES',\n",
" 'created_at': '2010-10-26T07:28:30+01:00'}],\n",
" 'title': 'OXFAM',\n",
" 'accounts': [{'spending': '384600000',\n",
" 'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\\\0000202918_SIR_20130331_E.PDF',\n",
" 'accounts_url': None,\n",
" 'accounts_date': '31 Mar 2013',\n",
" 'income': '367900000',\n",
" 'consolidated': 13},\n",
" {'spending': '378700000',\n",
" 'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\\\0000202918_SIR_20120331_E.PDF',\n",
" 'accounts_url': None,\n",
" 'accounts_date': '31 Mar 2012',\n",
" 'income': '385500000',\n",
" 'consolidated': 13},\n",
" {'spending': '361100000',\n",
" 'sir_url': None,\n",
" 'accounts_url': None,\n",
" 'accounts_date': '31 Mar 2011',\n",
" 'income': '367500000',\n",
" 'consolidated': 13},\n",
" {'spending': '294800000',\n",
" 'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\\\0000202918_SIR_10_E.PDF',\n",
" 'accounts_url': None,\n",
" 'accounts_date': '31 Mar 2010',\n",
" 'income': '318000000',\n",
" 'consolidated': 13},\n",
" {'spending': '318600000',\n",
" 'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\\\0000202918_SIR_09_E.PDF',\n",
" 'accounts_url': None,\n",
" 'accounts_date': '30 Apr 2009',\n",
" 'income': '308300000',\n",
" 'consolidated': 13}],\n",
" 'spending': None,\n",
" 'trustees': [{'uid': '2490956', 'full_name': 'MS MAJA DARUWALA'},\n",
" {'uid': '3358688', 'full_name': 'MR MATTHEW HENRY MARTIN'},\n",
" {'uid': '3480415', 'full_name': 'MS Katy Steward'},\n",
" {'uid': '3517246', 'full_name': 'MS PATRICIA MARY ZIPFEL'},\n",
" {'uid': '3517247', 'full_name': 'MR RAJIV JOSHI'},\n",
" {'uid': '3570943', 'full_name': 'MS MARJORIE SCARDINO'},\n",
" {'uid': '3666324', 'full_name': 'MR JAMES ANDREW DARCY'},\n",
" {'uid': '3700697', 'full_name': 'KAREN BROWN MS'},\n",
" {'uid': '3710351', 'full_name': 'MR DAVID PITT-WATSON'},\n",
" {'uid': '11169670', 'full_name': 'MS NKOYO TOYO'},\n",
" {'uid': '11369445', 'full_name': 'MR GAVIN MACNEILL STEWART'},\n",
" {'uid': '11369446', 'full_name': 'MR STEPHEN MARK WALTON'}],\n",
" 'website': 'http://www.oxfam.org.uk',\n",
" 'last_checked': '2014-05-16T16:44:09+01:00',\n",
" 'housing_association_number': None,\n",
" 'signed_up_for_1010': False,\n",
" 'other_names': ['OXFAM GB'],\n",
" 'volunteers': 22000,\n",
" 'subsidiary_number': None,\n",
" 'fax': None,\n",
" 'accounts_date': '2013-03-31',\n",
" 'id': 92548,\n",
" 'date_removed': None,\n",
" 'manually_updated': None,\n",
" 'facebook_account_name': None,\n",
" 'feed_url': None,\n",
" 'date_registered': '1965-09-07',\n",
" 'employees': 5046,\n",
" 'telephone': '0870 333 2444',\n",
" 'vat_number': None,\n",
" 'governing_document': 'MEMORANDUM AND ARTICLES INCORPORATED 01/10/58 AS AMENDED 07/05/65, 26/07/86, 18/10/96, 10/07/98, 29/10/98, 05/01/99, 02/99, 13/01/ 00, 23/11/07 AND 24/04/09, NOW ARTICLES ADOPTED AND AMENDED BY SPECIAL RESOLUTION DATED 04/12/09.;'}}"
]
}
],
"prompt_number": 136
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We can start to pull out specifics from this data...\n",
"chdata = jdata['charity']\n",
"fdata = chdata['financial_breakdown']\n",
"\n",
"data={}\n",
"#The data will include employee and volunteer numbers..\n",
"for tmp in ['volunteers','employees']:\n",
" data[tmp] = chdata[tmp]\n",
" \n",
"#...as well as financial information\n",
"for tmp in ['assets','spending','income']:\n",
" if fdata != None and tmp in fdata:\n",
" for tmp2 in fdata[tmp]:\n",
" data[tmp+'_'+tmp2] = fdata[tmp][tmp2]\n",
"data"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 159,
"text": [
"{'spending_generating_voluntary_income': '23200000',\n",
" 'assets_other_assets': '108400000',\n",
" 'assets_total_liabilities': '-68600000',\n",
" 'assets_own_use_assets': '13100000',\n",
" 'income_charitable': '166000000',\n",
" 'spending_investment_management': '100000',\n",
" 'assets_long_term_investments': '3000000',\n",
" 'volunteers': 22000,\n",
" 'income_trading': '88700000',\n",
" 'income_investment_gains': '0',\n",
" 'spending_other': '1400000',\n",
" 'income_voluntary': '111500000',\n",
" 'spending_charitable_activities': '290000000',\n",
" 'spending_trading': '68600000',\n",
" 'employees': 5046,\n",
" 'spending_governance': '1300000',\n",
" 'income_other': '800000',\n",
" 'income_investment': '900000'}"
]
}
],
"prompt_number": 159
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Or how about get a list of trustees?\n",
"pd.DataFrame(jdata['charity']['trustees'])\n",
"\n",
"#A search on the Charities Commission site - http://www.charitycommission.gov.uk/find-charities/ - for a particular charity\n",
"# includes a list of trustees. Clicking a trustee name shows you other charities for which they are a trustee.\n",
"#Unfortunately, OpenCharities doesnlt seem to support this level of search - to implement it ourselves, we'd have to scrape the\n",
"# data ourselves..."
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>full_name</th>\n",
" <th>uid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> MS MAJA DARUWALA</td>\n",
" <td> 2490956</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> MR MATTHEW HENRY MARTIN</td>\n",
" <td> 3358688</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> MS Katy Steward</td>\n",
" <td> 3480415</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> MS PATRICIA MARY ZIPFEL</td>\n",
" <td> 3517246</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> MR RAJIV JOSHI</td>\n",
" <td> 3517247</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> MS MARJORIE SCARDINO</td>\n",
" <td> 3570943</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> MR JAMES ANDREW DARCY</td>\n",
" <td> 3666324</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> KAREN BROWN MS</td>\n",
" <td> 3700697</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> MR DAVID PITT-WATSON</td>\n",
" <td> 3710351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> MS NKOYO TOYO</td>\n",
" <td> 11169670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> MR GAVIN MACNEILL STEWART</td>\n",
" <td> 11369445</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> MR STEPHEN MARK WALTON</td>\n",
" <td> 11369446</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 160,
"text": [
" full_name uid\n",
"0 MS MAJA DARUWALA 2490956\n",
"1 MR MATTHEW HENRY MARTIN 3358688\n",
"2 MS Katy Steward 3480415\n",
"3 MS PATRICIA MARY ZIPFEL 3517246\n",
"4 MR RAJIV JOSHI 3517247\n",
"5 MS MARJORIE SCARDINO 3570943\n",
"6 MR JAMES ANDREW DARCY 3666324\n",
"7 KAREN BROWN MS 3700697\n",
"8 MR DAVID PITT-WATSON 3710351\n",
"9 MS NKOYO TOYO 11169670\n",
"10 MR GAVIN MACNEILL STEWART 11369445\n",
"11 MR STEPHEN MARK WALTON 11369446"
]
}
],
"prompt_number": 160
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Simple function to look for trustees for a charity using OpenCharities by Charity Number\n",
"def lookupTrusteesForCharity(charitynumber):\n",
" url = 'http://opencharities.org/charities/{0}.json'.format(charitynumber)\n",
" jsondata = json.loads(requests.get(url).text)\n",
" df = pd.DataFrame(jsondata['charity']['trustees'])\n",
" df['charityname']=jsondata['charity']['title']\n",
" df['charitynumber']=jsondata['charity']['charity_number']\n",
" return df\n",
"\n",
"lookupTrusteesForCharity('516163')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>full_name</th>\n",
" <th>uid</th>\n",
" <th>charityname</th>\n",
" <th>charitynumber</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> MR MICHAEL PEARS</td>\n",
" <td> 1556031</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> MR ROBERT THURLBY</td>\n",
" <td> 1768031</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> MRS CYNTHIA HINDLEY</td>\n",
" <td> 2026503</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MR DAVID BUTTERWORTH</td>\n",
" <td> 2026712</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MR ADAM LOMAX</td>\n",
" <td> 3256338</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 149,
"text": [
" full_name uid charityname \\\n",
"0 MR MICHAEL PEARS 1556031 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"1 MR ROBERT THURLBY 1768031 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"2 MRS CYNTHIA HINDLEY 2026503 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"3 MR DAVID BUTTERWORTH 2026712 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"4 MR ADAM LOMAX 3256338 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"\n",
" charitynumber \n",
"0 516163 \n",
"1 516163 \n",
"2 516163 \n",
"3 516163 \n",
"4 516163 "
]
}
],
"prompt_number": 149
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Display charity classification areas\n",
"\n",
"def lookupCharityClassification(charitynumber):\n",
" url = 'http://opencharities.org/charities/{0}.json'.format(charitynumber)\n",
" jsondata = json.loads(requests.get(url).text)\n",
" df = pd.DataFrame(jsondata['charity']['classifications'])\n",
" \n",
" df['charityname']=jsondata['charity']['title']\n",
" df['charitynumber']=jsondata['charity']['charity_number']\n",
" return df\n",
"\n",
"lookupCharityClassification('516163')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>created_at</th>\n",
" <th>extended_title</th>\n",
" <th>grouping</th>\n",
" <th>id</th>\n",
" <th>parent_id</th>\n",
" <th>title</th>\n",
" <th>uid</th>\n",
" <th>updated_at</th>\n",
" <th>charityname</th>\n",
" <th>charitynumber</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2010-10-26T07:28:30+01:00</td>\n",
" <td> None</td>\n",
" <td> CharityClassification</td>\n",
" <td> 664</td>\n",
" <td> NaN</td>\n",
" <td> EDUCATION/TRAINING</td>\n",
" <td> 102</td>\n",
" <td> 2012-01-26T08:54:30+00:00</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2010-10-26T07:28:30+01:00</td>\n",
" <td> None</td>\n",
" <td> CharityClassification</td>\n",
" <td> 677</td>\n",
" <td> NaN</td>\n",
" <td> CHILDREN/YOUNG PEOPLE</td>\n",
" <td> 201</td>\n",
" <td> 2012-01-26T08:54:30+00:00</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2010-10-26T07:28:30+01:00</td>\n",
" <td> None</td>\n",
" <td> CharityClassification</td>\n",
" <td> 687</td>\n",
" <td> NaN</td>\n",
" <td> PROVIDES HUMAN RESOURCES</td>\n",
" <td> 304</td>\n",
" <td> 2012-01-26T08:54:30+00:00</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2010-10-29T22:19:13+01:00</td>\n",
" <td> None</td>\n",
" <td> ICNPO</td>\n",
" <td> 714</td>\n",
" <td> 712</td>\n",
" <td> Scout groups and youth clubs</td>\n",
" <td> 4110</td>\n",
" <td> 2010-10-29T22:19:13+01:00</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2010-10-26T07:28:30+01:00</td>\n",
" <td> None</td>\n",
" <td> CharityClassification</td>\n",
" <td> 688</td>\n",
" <td> NaN</td>\n",
" <td> PROVIDES BUILDINGS/FACILITIES/OPEN SPACE</td>\n",
" <td> 305</td>\n",
" <td> 2012-01-26T08:54:30+00:00</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 2010-10-26T07:28:30+01:00</td>\n",
" <td> None</td>\n",
" <td> CharityClassification</td>\n",
" <td> 672</td>\n",
" <td> NaN</td>\n",
" <td> AMATEUR SPORT</td>\n",
" <td> 110</td>\n",
" <td> 2012-01-26T08:54:30+00:00</td>\n",
" <td> 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP</td>\n",
" <td> 516163</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 156,
"text": [
" created_at extended_title grouping id \\\n",
"0 2010-10-26T07:28:30+01:00 None CharityClassification 664 \n",
"1 2010-10-26T07:28:30+01:00 None CharityClassification 677 \n",
"2 2010-10-26T07:28:30+01:00 None CharityClassification 687 \n",
"3 2010-10-29T22:19:13+01:00 None ICNPO 714 \n",
"4 2010-10-26T07:28:30+01:00 None CharityClassification 688 \n",
"5 2010-10-26T07:28:30+01:00 None CharityClassification 672 \n",
"\n",
" parent_id title uid \\\n",
"0 NaN EDUCATION/TRAINING 102 \n",
"1 NaN CHILDREN/YOUNG PEOPLE 201 \n",
"2 NaN PROVIDES HUMAN RESOURCES 304 \n",
"3 712 Scout groups and youth clubs 4110 \n",
"4 NaN PROVIDES BUILDINGS/FACILITIES/OPEN SPACE 305 \n",
"5 NaN AMATEUR SPORT 110 \n",
"\n",
" updated_at charityname \\\n",
"0 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"1 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"2 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"3 2010-10-29T22:19:13+01:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"4 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"5 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP \n",
"\n",
" charitynumber \n",
"0 516163 \n",
"1 516163 \n",
"2 516163 \n",
"3 516163 \n",
"4 516163 \n",
"5 516163 "
]
}
],
"prompt_number": 156
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Notes on sqlalchemy\n",
"\n",
"#There's something odd goes on - sqlalchemy tablenames may include case by default.\n",
"#To drop a tablename with cased characters, enclose the name in \"\" in the DROP statement.\n",
"#I also seem to have a table that is listed but that doesn't exist?"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#import sqlalchemy\n",
"#meta = sqlalchemy.MetaData(engine, schema='tm351test')\n",
"#meta.reflect()\n",
"#pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 42
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#from sqlalchemy import inspect\n",
"#eng=inspect(engine)\n",
"#eng.get_columns('charitiescommission')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#dir(eng)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 44
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#eng.get_table_names()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#dir(chC)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#psql(\"SELECT table_schema,table_name FROM information_schema.tables \\\n",
"# ORDER BY table_schema,table_name;\", engine)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 47
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment