Skip to content

Instantly share code, notes, and snippets.

@phenders
Last active July 23, 2018 18:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save phenders/99ede97853961e0ec62ece9666e05741 to your computer and use it in GitHub Desktop.
Save phenders/99ede97853961e0ec62ece9666e05741 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Which of the FAMGA companies spent most on lobbying over the last five years?\n",
"\n",
"Facebook, Apple, Microsoft, Google, and Amazon (FAMGA) all spend a lot of money on lobbying. The issues they hope to influence lawmakers on range from those you might expect, like \"High-skilled immigration\", \"Cybersecurity issues\", and \"Autonomous vehicle technology,\" to more obscure ones, like \"Issues related to online wine sales\" and \"Saving High Tech Inventors from Egregious Legal Disputes (SHIELD).\" According to data on [Enigma Public](https://public.enigma.com), the total amount spent on lobbying in the U.S. Senate by all companies has more than doubled since 2000. But which of the FAMGA companies spends the most on lobbying? \n",
"\n",
"Lobbyists working in the U.S. Senate are required to report all of their activity. You can find these reports on Enigma Public in the [U.S. Senate Lobbying Reports collection](https://public.enigma.com/browse/collection/u-s-senate-lobbying-reports/3b169a69-cbf4-4137-92d4-0ca268816153). Within this collection, [Primary Lobbying Reports](https://public.enigma.com/browse/collection/primary-lobbying-reports/b4c9a21a-79d1-495e-a6ea-d643a3820b43) includes the information about clients and payments. We'll use the datasets in this collection to answer the question about who spends most.\n",
"\n",
"### Some initial setup\n",
"\n",
"This article assumes a basic knowledge of Python and pandas, as well as some familiarity with the [Enigma Public API](http://docs.enigma.com/public/public_v20_api_about.html). If you haven't used the API before, the tutorial [Getting started with the API](http://docs.enigma.com/public/public_v20_user_api.html) provides information on exploring the API using [Postman](https://www.getpostman.com/). Another tutorial, [Using Python to access Enigma Public](http://docs.enigma.com/public/public_v20_user_python.html), demonstrates how to make API calls from Python.\n",
"\n",
"The code samples in this article are shown in Jupyter Notebook. If you want to follow along, you'll need to install [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/content-quickstart.html) (or [JupyterLab](http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)) along with the two required Python modules, [requests](https://docs.python-requests.org/en/master/user/quickstart/) and [pandas](https://pandas.pydata.org/), into your Python environment (`pip install requests` and `pip install pandas`). You can then import the modules into your Jupyter project. The code below shows how to create an HTTP session object and include your [Enigma Public API key](http://docs.enigma.com/public/public_v20_api_authentication.html) in the authorization header."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import pandas as pd\n",
"\n",
"base_url = 'https://public.enigma.com/api/'\n",
"s = requests.Session()\n",
"headers = {'authorization': 'Bearer ' + 'YOUR-API-KEY'}\n",
"s.headers.update(headers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get the snapshot IDs for the datasets of interest\n",
"\n",
"The [Primary Lobbying Reports collection](https://public.enigma.com/browse/collection/primary-lobbying-reports/b4c9a21a-79d1-495e-a6ea-d643a3820b43) has data going back to 1999. Since we're interested only in the data for the last five years, you'll need to get the snapshot ID associated with each of the datasets going back to 2013. First, we'll use the `parent_collection_id` parameter on the API's [GET /datasets/](http://docs.enigma.com/public/public_v20_api_get_datasets.html) endpoint to get the datasets within the Primary Lobbying Reports collection, and then get the ID of the current snapshot for datasets where the year in the dataset name is 2013 or later."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"collection_id = 'b4c9a21a-79d1-495e-a6ea-d643a3820b43' # Primary Lobbying Reports\n",
"url = base_url + 'datasets/?parent_collection_id={}'.format(collection_id)\n",
"response = s.get(url).json()\n",
"ids = [dataset['current_snapshot']['id'] for dataset in response if int(dataset['display_name'].split()[-1]) >= 2013]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you make a request to the Enigma Public API, the response comes back in JSON format. If you open one of the datasets in Enigma Public (for example, [Lobbying Disclosures - Primary Report 2017](https://public.enigma.com/datasets/lobbying-disclosures-primary-report-2017/bbb5f44a-f9c4-4c01-b112-03d576984d67)) and click the API button, you'll see an example of a JSON response. Navigating the JSON hierarchy to get the information you want can be one of the biggest challenges when working with the API (see [Getting the most recent snapshot ID](http://docs.enigma.com/public/public_v20_user_python.html#snapshot_id)). In the example above, we're going to the `current_snapshot` node and getting the value associated with the `id` key, but saving it only if the numeric value of the last word in the display name (the year) is greater than or equal to 2013. This gives us a list of the required snapshot IDs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get the rows of interest from each snapshot\n",
"\n",
"Each of the datasets in the collection is quite large. Since we're interested only in data about five specific companies over five specific years, you can request only the rows where the \"Client Name\" column includes Facebook, Apple, Microsoft, Google, or Amazon, and the \"Year\" column is in the range 2013-2017 (since a report can include filings for previous years).\n",
"\n",
"\"Client Name\" and \"Year\" are the column *display names* (the names shown in the column headers in Enigma Public). To request information for specific columns, you need to know their *field names*. If you clicked the API button in Enigma Public as suggested earlier, you'll see a list of field names in the JSON response. The field names in this case are `year` and `client_name`.\n",
"\n",
"The [GET /snapshots/id](http://docs.enigma.com/public/public_v20_api_get_snapshots_id.html) API endpoint has an [advanced query mode](http://development-dockerd-0.enigma:8888/public_v20_api_get_snapshots_id.html#advanced) that lets you search specific columns, rather than entire rows. To search a column for multiple terms, you need to \"OR\" the search terms and must include a URL-encoded space character (`%20`) before and after the operator, as done by the Python `join` functions below. Since we're searching two columns, \"AND\" the two column searches together to create the query string. You can then iterate over the snapshot ID list and call the API using the same query for each call. Note that you need to specify a `row_limit` (<10,000) or you won't get any rows back. In the code below, `rows.extend(response['table_rows']['rows'])` creates a list-of-lists with all of the resulting rows for all five years."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"companies = ['Facebook', 'Apple', 'Microsoft', 'Google', 'Amazon']\n",
"years = ['2013', '2014', '2015', '2016', '2017']\n",
"query = '(client_name:({})AND(year:({})))'.format('%20OR%20'.join(companies), '%20OR%20'.join(years))\n",
"rows = []\n",
"for snapshot_id in ids:\n",
" url = base_url + 'snapshots/{}?query_mode=advanced&query={}&row_limit=1000'.format(snapshot_id, query)\n",
" response = s.get(url).json()\n",
" rows.extend(response['table_rows']['rows'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a dataframe using the field names as the column headers\n",
"\n",
"Using pandas, it's easy to convert a list-of-lists into a DataFrame. You can use the list of field names contained in the JSON API response to set the column headers."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>year</th>\n",
" <th>received</th>\n",
" <th>amount</th>\n",
" <th>type</th>\n",
" <th>period</th>\n",
" <th>registrant_id</th>\n",
" <th>registrant_name</th>\n",
" <th>registrant_general_description</th>\n",
" <th>address</th>\n",
" <th>...</th>\n",
" <th>client_general_description</th>\n",
" <th>client_id</th>\n",
" <th>self_filer</th>\n",
" <th>contact_full_name</th>\n",
" <th>is_state_or_local_gov</th>\n",
" <th>client_country</th>\n",
" <th>client_ppb_country</th>\n",
" <th>client_state</th>\n",
" <th>client_ppb_state</th>\n",
" <th>serialid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1129</th>\n",
" <td>4EC593A7-CD18-425F-86F3-EE782103AD40</td>\n",
" <td>2016</td>\n",
" <td>2017-01-04T14:02:53+00:00</td>\n",
" <td>30000</td>\n",
" <td>FOURTH QUARTER REPORT</td>\n",
" <td>4th Quarter (Oct 1 - Dec 31)</td>\n",
" <td>401036920</td>\n",
" <td>Bloom Strategic Counsel</td>\n",
" <td>Law, public policy, and government relations</td>\n",
" <td>1155 F Street, NW\\r\\nSuite 1200\\r\\nWashington,...</td>\n",
" <td>...</td>\n",
" <td>Online retailer</td>\n",
" <td>125</td>\n",
" <td>FALSE</td>\n",
" <td>Seth Bloom</td>\n",
" <td>TRUE</td>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" <td>WASHINGTON</td>\n",
" <td>WASHINGTON</td>\n",
" <td>16034</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1130</th>\n",
" <td>904E3F0B-FF0B-4B8B-94D7-8D468A4672CC</td>\n",
" <td>2016</td>\n",
" <td>2017-01-08T11:57:36+00:00</td>\n",
" <td>30000</td>\n",
" <td>FOURTH QUARTER REPORT</td>\n",
" <td>4th Quarter (Oct 1 - Dec 31)</td>\n",
" <td>306011</td>\n",
" <td>Fidelis Government Relations</td>\n",
" <td>Government relations consulting firm</td>\n",
" <td>1300 Penn Ave Suite 700\\r\\nWashington, DC 20004</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>51</td>\n",
" <td>FALSE</td>\n",
" <td>TERRY G. ALLEN</td>\n",
" <td>TRUE</td>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" <td>DISTRICT OF COLUMBIA</td>\n",
" <td>WASHINGTON</td>\n",
" <td>16525</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1131</th>\n",
" <td>2FE1FBFB-A894-4FED-9BB1-FD1012E44422</td>\n",
" <td>2016</td>\n",
" <td>2017-01-20T17:22:01+00:00</td>\n",
" <td>60000</td>\n",
" <td>FOURTH QUARTER REPORT</td>\n",
" <td>4th Quarter (Oct 1 - Dec 31)</td>\n",
" <td>400424029</td>\n",
" <td>Polaris-Hutton Group, LLC</td>\n",
" <td>Government Relations Firm</td>\n",
" <td>PO Box 1304\\r\\nAlexandria, VA 22313</td>\n",
" <td>...</td>\n",
" <td>Software company that manufacturers, licenses ...</td>\n",
" <td>48</td>\n",
" <td>FALSE</td>\n",
" <td>DANIEL J. GANS</td>\n",
" <td>TRUE</td>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" <td>DISTRICT OF COLUMBIA</td>\n",
" <td>WASHINGTON</td>\n",
" <td>16981</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1132</th>\n",
" <td>4B354BA0-9FA5-4433-AB7A-C2A7FA553F04</td>\n",
" <td>2016</td>\n",
" <td>2017-01-20T21:55:24+00:00</td>\n",
" <td>50000</td>\n",
" <td>FOURTH QUARTER REPORT</td>\n",
" <td>4th Quarter (Oct 1 - Dec 31)</td>\n",
" <td>57475</td>\n",
" <td>Ernst &amp; Young LLP (Washington Council Ernst &amp; ...</td>\n",
" <td>professional services firm</td>\n",
" <td>1001 Pennsylvania Avenue, NW\\r\\nSuite 601 Nort...</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>2031</td>\n",
" <td>FALSE</td>\n",
" <td>Nick Giordano</td>\n",
" <td>TRUE</td>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" <td>DISTRICT OF COLUMBIA</td>\n",
" <td>DISTRICT OF COLUMBIA</td>\n",
" <td>17245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1133</th>\n",
" <td>53826956-E747-4B33-94F2-766ADEF4F4A7</td>\n",
" <td>2016</td>\n",
" <td>2017-01-21T17:56:06+00:00</td>\n",
" <td>None</td>\n",
" <td>FOURTH QUARTER REPORT</td>\n",
" <td>4th Quarter (Oct 1 - Dec 31)</td>\n",
" <td>44641</td>\n",
" <td>Brown Winick Graves Gross Baskerville &amp; Schoen...</td>\n",
" <td>Law firm</td>\n",
" <td>666 Grand Avenue, Suite 2000\\r\\nDes Moines, IA...</td>\n",
" <td>...</td>\n",
" <td>Internet search and advertising services</td>\n",
" <td>125</td>\n",
" <td>FALSE</td>\n",
" <td>Christopher Sackett</td>\n",
" <td>TRUE</td>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" <td>DISTRICT OF COLUMBIA</td>\n",
" <td>CALIFORNIA</td>\n",
" <td>17511</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>"
],
"text/plain": [
" id year received \\\n",
"1129 4EC593A7-CD18-425F-86F3-EE782103AD40 2016 2017-01-04T14:02:53+00:00 \n",
"1130 904E3F0B-FF0B-4B8B-94D7-8D468A4672CC 2016 2017-01-08T11:57:36+00:00 \n",
"1131 2FE1FBFB-A894-4FED-9BB1-FD1012E44422 2016 2017-01-20T17:22:01+00:00 \n",
"1132 4B354BA0-9FA5-4433-AB7A-C2A7FA553F04 2016 2017-01-20T21:55:24+00:00 \n",
"1133 53826956-E747-4B33-94F2-766ADEF4F4A7 2016 2017-01-21T17:56:06+00:00 \n",
"\n",
" amount type period \\\n",
"1129 30000 FOURTH QUARTER REPORT 4th Quarter (Oct 1 - Dec 31) \n",
"1130 30000 FOURTH QUARTER REPORT 4th Quarter (Oct 1 - Dec 31) \n",
"1131 60000 FOURTH QUARTER REPORT 4th Quarter (Oct 1 - Dec 31) \n",
"1132 50000 FOURTH QUARTER REPORT 4th Quarter (Oct 1 - Dec 31) \n",
"1133 None FOURTH QUARTER REPORT 4th Quarter (Oct 1 - Dec 31) \n",
"\n",
" registrant_id registrant_name \\\n",
"1129 401036920 Bloom Strategic Counsel \n",
"1130 306011 Fidelis Government Relations \n",
"1131 400424029 Polaris-Hutton Group, LLC \n",
"1132 57475 Ernst & Young LLP (Washington Council Ernst & ... \n",
"1133 44641 Brown Winick Graves Gross Baskerville & Schoen... \n",
"\n",
" registrant_general_description \\\n",
"1129 Law, public policy, and government relations \n",
"1130 Government relations consulting firm \n",
"1131 Government Relations Firm \n",
"1132 professional services firm \n",
"1133 Law firm \n",
"\n",
" address ... \\\n",
"1129 1155 F Street, NW\\r\\nSuite 1200\\r\\nWashington,... ... \n",
"1130 1300 Penn Ave Suite 700\\r\\nWashington, DC 20004 ... \n",
"1131 PO Box 1304\\r\\nAlexandria, VA 22313 ... \n",
"1132 1001 Pennsylvania Avenue, NW\\r\\nSuite 601 Nort... ... \n",
"1133 666 Grand Avenue, Suite 2000\\r\\nDes Moines, IA... ... \n",
"\n",
" client_general_description client_id self_filer \\\n",
"1129 Online retailer 125 FALSE \n",
"1130 None 51 FALSE \n",
"1131 Software company that manufacturers, licenses ... 48 FALSE \n",
"1132 None 2031 FALSE \n",
"1133 Internet search and advertising services 125 FALSE \n",
"\n",
" contact_full_name is_state_or_local_gov client_country \\\n",
"1129 Seth Bloom TRUE USA \n",
"1130 TERRY G. ALLEN TRUE USA \n",
"1131 DANIEL J. GANS TRUE USA \n",
"1132 Nick Giordano TRUE USA \n",
"1133 Christopher Sackett TRUE USA \n",
"\n",
" client_ppb_country client_state client_ppb_state serialid \n",
"1129 USA WASHINGTON WASHINGTON 16034 \n",
"1130 USA DISTRICT OF COLUMBIA WASHINGTON 16525 \n",
"1131 USA DISTRICT OF COLUMBIA WASHINGTON 16981 \n",
"1132 USA DISTRICT OF COLUMBIA DISTRICT OF COLUMBIA 17245 \n",
"1133 USA DISTRICT OF COLUMBIA CALIFORNIA 17511 \n",
"\n",
"[5 rows x 23 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(rows, columns=response['table_rows']['fields'])\n",
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract the columns of interest\n",
"\n",
"The resulting DataFrame has 23 columns, but you don't need all of these to understand where and when each company spent its lobbying money. Next, we'll extract the columns we want to keep."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>received</th>\n",
" <th>amount</th>\n",
" <th>registrant_name</th>\n",
" <th>client_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1129</th>\n",
" <td>2016</td>\n",
" <td>2017-01-04T14:02:53+00:00</td>\n",
" <td>30000</td>\n",
" <td>Bloom Strategic Counsel</td>\n",
" <td>Amazon</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1130</th>\n",
" <td>2016</td>\n",
" <td>2017-01-08T11:57:36+00:00</td>\n",
" <td>30000</td>\n",
" <td>Fidelis Government Relations</td>\n",
" <td>Microsoft Corp.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1131</th>\n",
" <td>2016</td>\n",
" <td>2017-01-20T17:22:01+00:00</td>\n",
" <td>60000</td>\n",
" <td>Polaris-Hutton Group, LLC</td>\n",
" <td>Microsoft</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1132</th>\n",
" <td>2016</td>\n",
" <td>2017-01-20T21:55:24+00:00</td>\n",
" <td>50000</td>\n",
" <td>Ernst &amp; Young LLP (Washington Council Ernst &amp; ...</td>\n",
" <td>Microsoft Corporation</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1133</th>\n",
" <td>2016</td>\n",
" <td>2017-01-21T17:56:06+00:00</td>\n",
" <td>None</td>\n",
" <td>Brown Winick Graves Gross Baskerville &amp; Schoen...</td>\n",
" <td>Google Inc.</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year received amount \\\n",
"1129 2016 2017-01-04T14:02:53+00:00 30000 \n",
"1130 2016 2017-01-08T11:57:36+00:00 30000 \n",
"1131 2016 2017-01-20T17:22:01+00:00 60000 \n",
"1132 2016 2017-01-20T21:55:24+00:00 50000 \n",
"1133 2016 2017-01-21T17:56:06+00:00 None \n",
"\n",
" registrant_name client_name \n",
"1129 Bloom Strategic Counsel Amazon \n",
"1130 Fidelis Government Relations Microsoft Corp. \n",
"1131 Polaris-Hutton Group, LLC Microsoft \n",
"1132 Ernst & Young LLP (Washington Council Ernst & ... Microsoft Corporation \n",
"1133 Brown Winick Graves Gross Baskerville & Schoen... Google Inc. "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df[['year', 'received', 'amount', 'registrant_name', 'client_name']]\n",
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set the column types as needed\n",
"\n",
"When you read data from the JSON response, the values are read as strings. In order to compute aggregates, you'll need to convert the strings in the \"Amount\" column to numeric values. The code below does this, and also shows how to convert the datetime strings to dates in YYYY-MM-DD format."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>received</th>\n",
" <th>amount</th>\n",
" <th>registrant_name</th>\n",
" <th>client_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1129</th>\n",
" <td>2016</td>\n",
" <td>2017-01-04</td>\n",
" <td>30000.0</td>\n",
" <td>Bloom Strategic Counsel</td>\n",
" <td>Amazon</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1130</th>\n",
" <td>2016</td>\n",
" <td>2017-01-08</td>\n",
" <td>30000.0</td>\n",
" <td>Fidelis Government Relations</td>\n",
" <td>Microsoft Corp.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1131</th>\n",
" <td>2016</td>\n",
" <td>2017-01-20</td>\n",
" <td>60000.0</td>\n",
" <td>Polaris-Hutton Group, LLC</td>\n",
" <td>Microsoft</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1132</th>\n",
" <td>2016</td>\n",
" <td>2017-01-20</td>\n",
" <td>50000.0</td>\n",
" <td>Ernst &amp; Young LLP (Washington Council Ernst &amp; ...</td>\n",
" <td>Microsoft Corporation</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1133</th>\n",
" <td>2016</td>\n",
" <td>2017-01-21</td>\n",
" <td>NaN</td>\n",
" <td>Brown Winick Graves Gross Baskerville &amp; Schoen...</td>\n",
" <td>Google Inc.</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year received amount \\\n",
"1129 2016 2017-01-04 30000.0 \n",
"1130 2016 2017-01-08 30000.0 \n",
"1131 2016 2017-01-20 60000.0 \n",
"1132 2016 2017-01-20 50000.0 \n",
"1133 2016 2017-01-21 NaN \n",
"\n",
" registrant_name client_name \n",
"1129 Bloom Strategic Counsel Amazon \n",
"1130 Fidelis Government Relations Microsoft Corp. \n",
"1131 Polaris-Hutton Group, LLC Microsoft \n",
"1132 Ernst & Young LLP (Washington Council Ernst & ... Microsoft Corporation \n",
"1133 Brown Winick Graves Gross Baskerville & Schoen... Google Inc. "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['amount'] = df['amount'].apply(pd.to_numeric)\n",
"df['received'] = df['received'].apply(pd.to_datetime)\n",
"df['received'] = df['received'].apply(lambda x: x.strftime('%Y-%m-%d'))\n",
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Normalize the company names\n",
"\n",
"As you can see above, lobbyists sometimes refer to their clients using variations of the client's name in the disclosure. For example, it's probably safe to assume Microsoft Corp., Microsoft, and Microsoft Corporation are all the same company. To do the aggregation, you'll need to resolve these variants to a single name. We'll use the list of names created earlier and convert anything in the `client_name` column that contains the name to the shorter version."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>received</th>\n",
" <th>amount</th>\n",
" <th>registrant_name</th>\n",
" <th>client_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1129</th>\n",
" <td>2016</td>\n",
" <td>2017-01-04</td>\n",
" <td>30000.0</td>\n",
" <td>Bloom Strategic Counsel</td>\n",
" <td>Amazon</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1130</th>\n",
" <td>2016</td>\n",
" <td>2017-01-08</td>\n",
" <td>30000.0</td>\n",
" <td>Fidelis Government Relations</td>\n",
" <td>Microsoft</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1131</th>\n",
" <td>2016</td>\n",
" <td>2017-01-20</td>\n",
" <td>60000.0</td>\n",
" <td>Polaris-Hutton Group, LLC</td>\n",
" <td>Microsoft</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1132</th>\n",
" <td>2016</td>\n",
" <td>2017-01-20</td>\n",
" <td>50000.0</td>\n",
" <td>Ernst &amp; Young LLP (Washington Council Ernst &amp; ...</td>\n",
" <td>Microsoft</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1133</th>\n",
" <td>2016</td>\n",
" <td>2017-01-21</td>\n",
" <td>NaN</td>\n",
" <td>Brown Winick Graves Gross Baskerville &amp; Schoen...</td>\n",
" <td>Google</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year received amount \\\n",
"1129 2016 2017-01-04 30000.0 \n",
"1130 2016 2017-01-08 30000.0 \n",
"1131 2016 2017-01-20 60000.0 \n",
"1132 2016 2017-01-20 50000.0 \n",
"1133 2016 2017-01-21 NaN \n",
"\n",
" registrant_name client_name \n",
"1129 Bloom Strategic Counsel Amazon \n",
"1130 Fidelis Government Relations Microsoft \n",
"1131 Polaris-Hutton Group, LLC Microsoft \n",
"1132 Ernst & Young LLP (Washington Council Ernst & ... Microsoft \n",
"1133 Brown Winick Graves Gross Baskerville & Schoen... Google "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for company in companies:\n",
" df.loc[df['client_name'].str.contains(company, case=False), 'client_name'] = company\n",
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compute the totals\n",
"\n",
"Finally we're ready to compute the total amount spent by each company. Since we resolved the company names, we can use the `client_name` column in the panda's `groupby` function to combine all rows with the same company name. Here, we want to sum the values in the \"Amount\" column in the resulting row. We'll then format the numeric values and display the rows in descending order based on the amount."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_name</th>\n",
" <th>amount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Google</td>\n",
" <td>90,037,877.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Microsoft</td>\n",
" <td>51,908,500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Facebook</td>\n",
" <td>38,372,000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Amazon</td>\n",
" <td>30,431,831.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Apple</td>\n",
" <td>23,135,000.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_name amount\n",
"0 Google 90,037,877.0\n",
"1 Microsoft 51,908,500.0\n",
"2 Facebook 38,372,000.0\n",
"3 Amazon 30,431,831.0\n",
"4 Apple 23,135,000.0"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = df.groupby(['client_name']).agg({'amount': sum}).reset_index()\n",
"total['amount'] = total.apply(lambda x: \"{:,}\".format(x['amount']), axis=1)\n",
"total = total.sort_values(by='amount', ascending=0).reset_index(drop=True)\n",
"total"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have the answer: Google spent the most on lobbying.\n",
"\n",
"One caveat -- if you look at the source datasets, you'll see that lobbyists have an opportunity to amend their reports. When they do, the orginal report and the amended report are both listed, meaning some transactions are counted twice. This will reduce the total amount actually spent by each company, but most likely won't impact the overall result -- that Google spent the most by a large margin."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research\n",
"\n",
"The datasets examined here indicate how much each company spent and who lobbied on their behalf, but don't tell you about the issues they tried to influence. If you take an issue ID from column 1 of the primary report, you can look it up in the corresponding issues report in the [Lobbying Issues collection](https://public.enigma.com/browse/collection/lobbying-issues/6ecaafbc-6962-4dc2-a4ac-b067ba3ba021). If you're interested in exploring further, try joining the primary dataset and issues dataset for each year, using the issue ID as the column to join on."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment