@phenders
Last active September 25, 2018 17:03
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Some initial setup\n",
"\n",
"This article assumes a basic knowledge of Python, as well as some familiarity with the [hierarchy of collections, datasets, and snapshots](http://docs.enigma.com/public/public_v20_user_organization.html) available on [Enigma Public](https://public.enigma.com).\n",
"\n",
"The code samples here are shown in Jupyter Notebook. If you want to follow along, you'll need to install [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/content-quickstart.html) (or [JupyterLab](http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)) into your Python virtual environment. Then download the .ipynb file from the GitHub URL above to your Jupyter folder.\n",
"\n",
"The tutorial requires two Python modules: the [Enigma SDK](https://pypi.org/project/enigma-sdk/) and [pandas](https://pandas.pydata.org/). You'll need to install these into your Python environment (`pip install enigma-sdk` and `pip install pandas`). You can then import the modules into your Jupyter project as shown below. The code here also creates a Public SDK client object and sets your [Enigma Public API key](http://docs.enigma.com/public/public_v20_api_authentication.html)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Install the Enigma SDK and pandas into your virtual environment\n",
"# TODO: Insert your Enigma Public API key as indicated below. \n",
"import enigma\n",
"import pandas as pd\n",
"\n",
"public = enigma.Public()\n",
"\n",
"public.set_auth(apikey='YOUR-API-KEY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting all top-level collections\n",
"\n",
"Begin by requesting a list of Enigma Public's top-level collections using the SDK client's [collections.list( )](http://docs.enigma.com/public/public_v20_sdk_collections_list.html) method. Then print the display name and ID of each returned collection."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Companies: 5f8faa60-e6c3-4dc0-8eea-ade8c81d1265\n",
"Organizations: bc5c2c88-687e-4da2-93c3-32237ece39f0\n",
"Universities: c396139e-d685-4311-a447-2dd7669d963a\n",
"Governments: 651a30cd-c864-49ca-8d8b-9418029127db\n",
"United States: 41026df2-2db0-41b1-8d7f-5a6c8e34de62\n",
"Curated Collections: 52dfb31c-f22e-49fb-bc05-8f5d8a5e7cab\n"
]
}
],
"source": [
"collections = public.collections.list()\n",
"for collection in collections:\n",
" print('{}: {}'.format(collection.display_name, collection.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`collections.list()` returns a list of Collection objects. Each Collection object has a set of attributes you can examine using Python's `object.attribute` notation (for example, `collection.display_name` and `collection.id` above). To see a full list of attributes for the Collection object, see the [Collection model](http://docs.enigma.com/public/public_v20_sdk_collection.html) in the SDK Reference.\n",
"\n",
"Note: All of the SDK client's `list()` methods return a Python list with at most 20 resources (collections, datasets, etc.). If you want more, append `.all()`. For example, to request all available collections, use `public.collections.list().all()`. This returns a ResourceList object that represents the entire batch and supports all standard list operations (indexing, slicing, iteration, and so on), but fetches resources from the server only as needed."
]
},
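{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea of a list that fetches resources from the server only as needed can be illustrated with a short pure-Python sketch. This is *not* the SDK's ResourceList implementation: `LazyPagedList`, `fetch_page(offset, limit)`, and `count()` are hypothetical names, and an in-memory list stands in for the API. Pages of 20 items are fetched on first access and cached. A negative slice such as `items[-10:]` first asks for the total count and then fetches only the page(s) that hold the requested items. For simplicity, slicing here returns a plain list.\n",
"\n",
"```python\n",
"# Hypothetical sketch of lazy, page-by-page fetching (not the Enigma SDK's code).\n",
"class LazyPagedList:\n",
"    '''Read-only sequence that fetches fixed-size pages on demand.'''\n",
"\n",
"    def __init__(self, fetch_page, count, page_size=20):\n",
"        self._fetch_page = fetch_page  # hypothetical: fetch_page(offset, limit) -> list\n",
"        self._count = count            # hypothetical: count() -> total number of items\n",
"        self._page_size = page_size\n",
"        self._pages = {}               # cache: page number -> list of items\n",
"        self._len = None\n",
"\n",
"    def __len__(self):\n",
"        if self._len is None:\n",
"            self._len = self._count()\n",
"        return self._len\n",
"\n",
"    def _item(self, index):\n",
"        page_no, pos = divmod(index, self._page_size)\n",
"        if page_no not in self._pages:\n",
"            offset = page_no * self._page_size\n",
"            self._pages[page_no] = self._fetch_page(offset, self._page_size)\n",
"        return self._pages[page_no][pos]\n",
"\n",
"    def __getitem__(self, key):\n",
"        if isinstance(key, slice):\n",
"            # slice.indices() resolves negative bounds against the total length\n",
"            return [self._item(i) for i in range(*key.indices(len(self)))]\n",
"        if key < 0:\n",
"            key += len(self)\n",
"        if not 0 <= key < len(self):\n",
"            raise IndexError('index out of range')\n",
"        return self._item(key)\n",
"\n",
"    def __iter__(self):\n",
"        for i in range(len(self)):\n",
"            yield self._item(i)\n",
"\n",
"\n",
"# Fake backend: 95 items, plus a record of the offsets that were requested.\n",
"data = ['dataset-{}'.format(i) for i in range(95)]\n",
"calls = []\n",
"\n",
"def fake_fetch(offset, limit):\n",
"    calls.append(offset)\n",
"    return data[offset:offset + limit]\n",
"\n",
"items = LazyPagedList(fake_fetch, count=lambda: len(data))\n",
"last_10 = items[-10:]\n",
"print(last_10[0], last_10[-1])  # dataset-85 dataset-94\n",
"print(calls)                    # [80]: only one page was fetched\n",
"```"
]
},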
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting specific collections\n",
"\n",
"The example above returns an unfiltered list of top-level collections. The [collections.list( )](http://docs.enigma.com/public/public_v20_sdk_collections_list.html) method has several optional keyword arguments for locating specific collections. For example, to see the collections within one of the top-level collections, use the `parent_collection_id` argument.\n",
"\n",
"The example below fetches all collections that are children of the U.S. Federal Government collection. Since this collection has more than 20 child collections, we'll append `.all()`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"63"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cid = 'bf068aa3-a15c-4db3-bdb1-6d51f91eae5a' # US Fed Govt collection\n",
"fed_collections = public.collections.list(\n",
" parent_collection_id=cid\n",
").all()\n",
"len(fed_collections)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's another way to get a list of child collections:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"63"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cid = 'bf068aa3-a15c-4db3-bdb1-6d51f91eae5a' # US Fed Govt collection\n",
"fed_collections = public.collections.get(cid).child_collections().all()\n",
"len(fed_collections)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example above uses the SDK client's [collections.get( )](http://docs.enigma.com/public/public_v20_sdk_collections_get.html) method with the collection ID as an argument to get a reference to the U.S. Federal Government collection. It then uses the Collection object's [child_collections( )](http://docs.enigma.com/public/public_v20_sdk_collection_child_collections.html) method to get this collection's child collections. This pattern, where you first get an object (or list of objects) with the appropriate SDK client `get()` (or `list()`) method and then call the SDK helper methods on the returned object(s), is one we'll use throughout this tutorial.\n",
"\n",
"Another way to get specific collections is to search for them using the `query` argument on [collections.list( )](http://docs.enigma.com/public/public_v20_sdk_collections_list.html). For example, to locate all collections with the words `federal` and `reserve` in the collection metadata:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<Collection 'Federal Reserve System'>, <Collection 'Federal Reserve Bank'>, <Collection 'E-Payments Routing Directory'>]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"public.collections.list(query='federal reserve').all()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: To search for the *phrase* `federal reserve`, use double quotes, like this: `query='\"federal reserve\"'`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting the datasets within a collection\n",
"\n",
"The Collection object's [child_datasets( )](http://docs.enigma.com/public/public_v20_sdk_collection_child_datasets.html) method returns a list of datasets that are immediate children of the collection. First, let's see if the U.S. Federal Government collection has any child datasets."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cid = 'bf068aa3-a15c-4db3-bdb1-6d51f91eae5a' # US Fed Govt collection\n",
"public.collections.get(cid).child_datasets().all()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method returned an empty list, so the U.S. Federal Government collection has no child datasets.\n",
"\n",
"A collection that *does* have child datasets is the NOAA National Climatic Data Center collection:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<Dataset 'Daily Average Wind Speed'>, <Dataset 'Daily Maximum Temperatures'>, <Dataset 'Daily Minimum Temperatures'>, <Dataset 'Daily Precipitation'>, <Dataset 'Daily Snow Depth'>, ...]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cid = 'b87242f0-65e0-4f9f-a051-084d63ca6dd9' # NOAA collection\n",
"noaa_datasets = public.collections.get(cid).child_datasets().all()\n",
"noaa_datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each Dataset object has a set of attributes you can examine using the same `object.attribute` notation used earlier for Collection objects. The example below prints the `display_name` and `id` attributes of each dataset. To see a full list of attributes for the Dataset object, see the [Dataset model](http://docs.enigma.com/public/public_v20_sdk_dataset.html) in the SDK Reference."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Daily Average Wind Speed: 55838ea3-0123-4342-a78c-92af05d18856\n",
"Daily Maximum Temperatures: 879b48e7-5532-41f0-8f45-62e83befc4ea\n",
"Daily Minimum Temperatures: 0bec0e1b-7ab5-4edd-8601-547e5d5d3e29\n",
"Daily Precipitation: 2e97c8d3-32f4-4056-8913-f889bc0be792\n",
"Daily Snow Depth: 4e2d6d2f-d02c-4955-9bf5-cc3ff28dd9bb\n",
"Daily Snowfall: 859932a2-d93c-446c-b0a4-d7b4bb285cd2\n",
"Daily Weather Indications: d70070ae-5fce-4ccb-8b6f-c72f6ee5a75b\n"
]
}
],
"source": [
"for dataset in noaa_datasets:\n",
" print('{}: {}'.format(dataset.display_name, dataset.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting specific datasets\n",
"\n",
"By default, the SDK client's [datasets.list( )](http://docs.enigma.com/public/public_v20_sdk_datasets_list.html) method returns a list with the first 20 Enigma Public datasets, ordered by display name. If you want *all* datasets, append `.all()`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<Dataset '100th Congress Amendments'>, <Dataset '100th Congress Basic Information'>, <Dataset '100th Congress Becoming Law'>, <Dataset '100th Congress Bill History'>, <Dataset '100th Congress Committees'>, ...]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"public.datasets.list().all()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get, say, the last 10 datasets, use Python [list slicing](https://docs.python.org/3/tutorial/introduction.html?highlight=slice#lists) as shown below."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Year Structure Built\n",
"Year Structure Built\n",
"Zambia - National Assembly - 11th Assembly\n",
"Zimbabwe - House of Assembly - 8th Parliament\n",
"Zimbabwe - Senate - 8th Parliament\n",
"Zip Code Business Patterns by Employment Size Class\n",
"Zip Code County Business Patterns\n",
"Zip-Code to Core-Based Statistical Area (CBSA)\n",
"Zip-Code to County\n",
"Zip-Code to Tract\n"
]
}
],
"source": [
"last_10 = public.datasets.list()[-10:]\n",
"for dataset in last_10:\n",
" print(dataset.display_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you make a request like this, the SDK determines the total number of datasets and requests only the last 10 for you. Doing the same with raw API calls takes more work, because you must handle each step yourself:\n",
"\n",
"```python\n",
"import requests\n",
"\n",
"api_endpoint = 'https://public.enigma.com/api/datasets/'\n",
"headers = {'authorization': 'Bearer YOUR-API-KEY'}\n",
"\n",
"# A HEAD request returns the total dataset count in the Content-Range header\n",
"api_response = requests.head(api_endpoint, headers=headers)\n",
"count = int(api_response.headers.get('content-range').split('/')[1])\n",
"\n",
"# Use a Range header to request only the last 10 datasets\n",
"headers['Range'] = 'resources={}-{}'.format(count-10, count)\n",
"last_10 = requests.get(api_endpoint, headers=headers).json()\n",
"for dataset in last_10:\n",
"    print(dataset['display_name'])\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you use list slicing, the SDK returns a Python generator object rather than a list. Unlike the ResourceList object described earlier, a generator supports only iteration: you can't index it, slice it, or get its length with `len()`. To get its length, convert it to a list first, as shown below."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"last_10 = public.datasets.list()[-10:]\n",
"len(list(last_10))"
]
},
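{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two generator behaviors are worth knowing. The sketch below shows them with a stand-in generator function rather than an SDK call (`fake_slice` is hypothetical): `len()` raises `TypeError`, and converting a generator to a list consumes it, so a second pass yields nothing. If the SDK's slice is a plain generator, as described above, then `last_10` in the cell above is empty once `len(list(last_10))` has run. If you need the rows more than once, keep the list instead.\n",
"\n",
"```python\n",
"# Stand-in for the generator returned by slicing (hypothetical, not an SDK call)\n",
"def fake_slice():\n",
"    for i in range(10):\n",
"        yield 'dataset-{}'.format(i)\n",
"\n",
"gen = fake_slice()\n",
"try:\n",
"    len(gen)\n",
"except TypeError as err:\n",
"    print('len() is not supported:', err)\n",
"\n",
"rows = list(gen)   # consumes the generator\n",
"print(len(rows))   # 10\n",
"print(list(gen))   # [] because the generator is already exhausted\n",
"```"
]
},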
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Searching for datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [datasets.list( )](http://docs.enigma.com/public/public_v20_sdk_datasets_list.html) method has a number of optional arguments that let you tailor the list of returned datasets. One is the `query` argument, which lets you search for datasets that include specific words in the dataset metadata or the data records:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<Dataset 'O*NET Tools and Technology'>, <Dataset 'World Bank - Crowd Source Prices Data'>, <Dataset 'The Inclusive Internet Index'>, <Dataset \"PhilGEPS - Bidder's List\">, <Dataset 'FEC Federal Campaign Contributions - 2016'>, ...]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datasets = public.datasets.list(query='enigma technologies').all()\n",
"datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you're interested only in datasets whose data records contain the search terms, set the `match_metadata` argument to `False` to exclude metadata matches, as shown below. This example also shows how to search for the exact phrase \"enigma technologies\" (by wrapping it in double quotes) and uses `row_limit=200` to return matching data records as well (the default is 0 rows)."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<Dataset \"PhilGEPS - Bidder's List\">, <Dataset 'FEC Federal Campaign Contributions - 2016'>, <Dataset 'PhilGEPS - Awards'>, <Dataset 'Form D Related Persons - 2015'>, <Dataset 'Form D Related Persons - 2014'>, ...]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datasets = public.datasets.list(\n",
" query='\"enigma technologies\"', \n",
" match_metadata=False, \n",
" row_limit=200\n",
").all()\n",
"datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting snapshot data\n",
"\n",
"Once you have a dataset or a list of datasets, you can use the attributes and methods on the [Dataset object](http://docs.enigma.com/public/public_v20_sdk_dataset.html) to explore each dataset. For example, you can use the `current_snapshot` attribute to get the current snapshot for the New York City Restaurant Inspections dataset:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Snapshot 'a8cfa297-290b-49f1-870b-7868b350cd22'>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dsid = 'd8c29d0d-f283-4eb5-b4d4-460c9779d05d' # Restaurant insp. dataset\n",
"snapshot = public.datasets.get(dsid).current_snapshot\n",
"snapshot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What you see above is a [Snapshot object](http://docs.enigma.com/public/public_v20_sdk_snapshot.html) reference you can use to access the snapshot data through its attributes and methods. For example, to print the field names:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"camis\n",
"dba\n",
"boro\n",
"address\n",
"zipcode\n",
"neighborhood\n",
"phone\n",
"cuisine_description\n",
"food_type\n",
"inspection_date\n",
"violation_code\n",
"grade\n",
"score\n",
"latitude\n",
"longitude\n",
"geo_location\n",
"serial_a8cfa297_290b_49f1_870b_7868b350cd22\n"
]
}
],
"source": [
"for field in snapshot.fields:\n",
" print(field.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To view the snapshot's data records, you can use the Snapshot object's [get_rows( )](http://docs.enigma.com/public/public_v20_sdk_snapshot_get_rows.html) method. This returns a [TableView object](http://docs.enigma.com/public/public_v20_sdk_tableview.html) with the first 200 rows. You can request additional rows (up to 10,000) by including the `row_limit` argument, for example:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"500"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tableview = snapshot.get_rows(row_limit=500)\n",
"len(tableview)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The snapshot's [get_rows( )](http://docs.enigma.com/public/public_v20_sdk_snapshot_get_rows.html) method supports the same arguments that are available on [snapshots.get( )](http://docs.enigma.com/public/public_v20_sdk_snapshots_get.html). For example, to generate a new TableView with the first 200 rows that include the word `mexican`, use the `query` argument as shown below. You can then use [list slicing](https://docs.python.org/3/tutorial/introduction.html?highlight=slice#lists) to access specific rows:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[camis: '41054695', dba: 'Picante Mexican Bar and Grill Restaurant', boro: 'Manhattan', address: '3424 Broadway', zipcode: '10031', neighborhood: 'Harlem', phone: '2122346479', cuisine_description: 'Mexican', food_type: 'Mexican', inspection_date: '2018-07-10T00:00:00', violation_code: '10F 02G', grade: 'A', score: '12.0', latitude: '40.82286', longitude: '-73.95294', geo_location: {'lat': 40.82286, 'lng': -73.95294}],\n",
" [camis: '41086895', dba: 'Chipotle Mexican Grill', boro: 'Manhattan', address: '9 W 42nd St', zipcode: '10036', neighborhood: 'Midtown', phone: '2123546760', cuisine_description: 'Mexican', food_type: 'Mexican', inspection_date: '2018-08-27T00:00:00', violation_code: '04N 08A 10E', grade: 'A', score: '12.0', latitude: '40.753679999999996', longitude: '-73.98113000000001', geo_location: {'lat': 40.75368, 'lng': -73.98113}]]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tableview = snapshot.get_rows(query='mexican')\n",
"tableview[10:12]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [get_rows( )](http://docs.enigma.com/public/public_v20_sdk_snapshot_get_rows.html) method also supports advanced queries: set `query_mode='advanced'` (see the argument of the same name in [snapshots.get( )](http://docs.enigma.com/public/public_v20_sdk_snapshots_get.html)) and pass the query expression in `query`. This lets you perform more sophisticated searches. The example below searches for rows where the borough is Brooklyn and the food type is pizza."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"435 matching rows\n"
]
},
{
"data": {
"text/plain": [
"[camis: '40365632', dba: 'J&v Famous Pizza', boro: 'Brooklyn', address: '6322 18th Ave', zipcode: '11204', neighborhood: 'Borough Park', phone: '7182322700', cuisine_description: 'Pizza', food_type: 'Pizza', inspection_date: '2017-11-30T00:00:00', violation_code: '08A 04L', grade: 'A', score: '10.0', latitude: '40.62017', longitude: '-73.98924', geo_location: {'lat': 40.62017, 'lng': -73.98924}]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"advanced_query = '(boro:(Brooklyn) && food_type:(pizza))'\n",
"tableview = snapshot.get_rows(\n",
" row_limit=10000, \n",
" query_mode='advanced', \n",
" query=advanced_query\n",
")\n",
"print('{} matching rows'.format(len(tableview)))\n",
"tableview[1]"
]
},
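{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you build advanced queries from several field/value pairs, a small helper keeps the syntax consistent. `build_advanced_query` below is a hypothetical convenience function, not part of the SDK. It only reproduces the `field:(value) && field:(value)` form used above, and it does not escape characters that have special meaning in the query syntax, so it assumes plain values.\n",
"\n",
"```python\n",
"# Hypothetical helper (not part of the Enigma SDK) that assembles a query string\n",
"# in the same form as the advanced_query example above.\n",
"def build_advanced_query(filters, operator='&&'):\n",
"    clauses = ['{}:({})'.format(field, value) for field, value in filters.items()]\n",
"    return '(' + ' {} '.format(operator).join(clauses) + ')'\n",
"\n",
"\n",
"query = build_advanced_query({'boro': 'Brooklyn', 'food_type': 'pizza'})\n",
"print(query)  # (boro:(Brooklyn) && food_type:(pizza))\n",
"\n",
"# The result can be passed exactly as before:\n",
"# tableview = snapshot.get_rows(row_limit=10000, query_mode='advanced', query=query)\n",
"```"
]
},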
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A [TableView object](http://docs.enigma.com/public/public_v20_sdk_tableview.html) is a list of TableRow objects, where each TableRow object represents one data record. The TableRow object lets you reference individual cells by name, as well as by numeric index:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Philadelphia Grille Express\n",
"Philadelphia Grille Express\n",
"J&v Famous Pizza\n",
"J&v Famous Pizza\n"
]
}
],
"source": [
"for row in tableview[0:2]:\n",
" print(row.dba) # Reference by name\n",
" print(row[1]) # Reference by index"
]
},
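{
"cell_type": "markdown",
"metadata": {},
"source": [
"Index-based access works because cell positions follow the snapshot's field order: `dba` is the second field listed by `snapshot.fields` earlier, so `row[1]` and `row.dba` refer to the same cell. The sketch below models this with a `collections.namedtuple` stand-in built from the first three fields. It illustrates the access pattern only and makes no claim about how the SDK implements TableRow.\n",
"\n",
"```python\n",
"from collections import namedtuple\n",
"\n",
"# Stand-in for a TableRow (hypothetical), using the first three snapshot fields\n",
"Row = namedtuple('Row', ['camis', 'dba', 'boro'])\n",
"row = Row('40365632', 'J&v Famous Pizza', 'Brooklyn')\n",
"\n",
"print(row.dba)                    # J&v Famous Pizza (by name)\n",
"print(row[1])                     # J&v Famous Pizza (by index)\n",
"print(row._fields.index('dba'))   # 1: position of a field, looked up by name\n",
"```"
]
},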
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Exporting a snapshot as a pandas dataframe\n",
"\n",
"The [pandas Python library](https://pandas.pydata.org/) provides data structures and data manipulation tools that greatly simplify data analysis and data cleaning. The SDK includes a convenient [export_dataframe( )](http://docs.enigma.com/public/public_v20_sdk_snapshot_export_dataframe.html) method that exports a snapshot to a pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>camis</th>\n",
" <th>dba</th>\n",
" <th>boro</th>\n",
" <th>address</th>\n",
" <th>zipcode</th>\n",
" <th>neighborhood</th>\n",
" <th>phone</th>\n",
" <th>cuisine_description</th>\n",
" <th>food_type</th>\n",
" <th>inspection_date</th>\n",
" <th>violation_code</th>\n",
" <th>grade</th>\n",
" <th>score</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>geo_location</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30075445</td>\n",
" <td>Morris Park Bake Shop</td>\n",
" <td>Bronx</td>\n",
" <td>1007 Morris Park Ave</td>\n",
" <td>10462</td>\n",
" <td>Morris Park</td>\n",
" <td>7188924968</td>\n",
" <td>Bakery</td>\n",
" <td>Bakery</td>\n",
" <td>2018-05-11T00:00:00</td>\n",
" <td>10F 08C</td>\n",
" <td>A</td>\n",
" <td>5.0</td>\n",
" <td>40.84846</td>\n",
" <td>-73.85624</td>\n",
" <td>40.84846,-73.85624</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30112340</td>\n",
" <td>Wendy's</td>\n",
" <td>Brooklyn</td>\n",
" <td>469 Flatbush Ave</td>\n",
" <td>11225</td>\n",
" <td>Prospect Park</td>\n",
" <td>7182875005</td>\n",
" <td>Hamburgers</td>\n",
" <td>Hamburgers</td>\n",
" <td>2018-03-13T00:00:00</td>\n",
" <td>04L 08A 10B</td>\n",
" <td>A</td>\n",
" <td>12.0</td>\n",
" <td>40.66313</td>\n",
" <td>-73.96232</td>\n",
" <td>40.66313,-73.96232</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30191841</td>\n",
" <td>Dj Reynolds Pub and Restaurant</td>\n",
" <td>Manhattan</td>\n",
" <td>351 W 57th St</td>\n",
" <td>10019</td>\n",
" <td>Hell's Kitchen</td>\n",
" <td>2122452912</td>\n",
" <td>Irish</td>\n",
" <td>Irish</td>\n",
" <td>2018-05-16T00:00:00</td>\n",
" <td>08A 10F 04L</td>\n",
" <td>A</td>\n",
" <td>12.0</td>\n",
" <td>40.76782</td>\n",
" <td>-73.98481</td>\n",
" <td>40.76782,-73.98481</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" camis dba boro address \\\n",
"0 30075445 Morris Park Bake Shop Bronx 1007 Morris Park Ave \n",
"1 30112340 Wendy's Brooklyn 469 Flatbush Ave \n",
"2 30191841 Dj Reynolds Pub and Restaurant Manhattan 351 W 57th St \n",
"\n",
" zipcode neighborhood phone cuisine_description food_type \\\n",
"0 10462 Morris Park 7188924968 Bakery Bakery \n",
"1 11225 Prospect Park 7182875005 Hamburgers Hamburgers \n",
"2 10019 Hell's Kitchen 2122452912 Irish Irish \n",
"\n",
" inspection_date violation_code grade score latitude longitude \\\n",
"0 2018-05-11T00:00:00 10F 08C A 5.0 40.84846 -73.85624 \n",
"1 2018-03-13T00:00:00 04L 08A 10B A 12.0 40.66313 -73.96232 \n",
"2 2018-05-16T00:00:00 08A 10F 04L A 12.0 40.76782 -73.98481 \n",
"\n",
" geo_location \n",
"0 40.84846,-73.85624 \n",
"1 40.66313,-73.96232 \n",
"2 40.76782,-73.98481 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TODO Confirm you installed pandas when you set up your virtual environment\n",
"# TODO Make sure you entered your API key in step 1 of this tutorial\n",
"dataset = public.datasets.get('d8c29d0d-f283-4eb5-b4d4-460c9779d05d')\n",
"df = dataset.current_snapshot.export_dataframe()\n",
"df.head(3)"
]
},
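{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exported values may arrive as text. The TableRow output earlier shows `score` as `'12.0'` and `inspection_date` as an ISO string, so a typical first cleaning step is converting columns to proper dtypes. The sketch below rebuilds three columns of the rows shown above in a small DataFrame, assuming they are strings. Check `df.dtypes` on your own export before applying the same conversions.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Three columns from the df.head(3) output above, stored as strings (an assumption\n",
"# about how values arrive; inspect df.dtypes on your own export).\n",
"sample = pd.DataFrame({\n",
"    'boro': ['Bronx', 'Brooklyn', 'Manhattan'],\n",
"    'inspection_date': ['2018-05-11T00:00:00', '2018-03-13T00:00:00', '2018-05-16T00:00:00'],\n",
"    'score': ['5.0', '12.0', '12.0'],\n",
"})\n",
"\n",
"# Convert text columns to datetime and numeric dtypes before analysis;\n",
"# errors='coerce' turns unparseable scores into NaN instead of raising.\n",
"sample['inspection_date'] = pd.to_datetime(sample['inspection_date'])\n",
"sample['score'] = pd.to_numeric(sample['score'], errors='coerce')\n",
"\n",
"print(sample.dtypes)\n",
"print(sample['score'].mean())\n",
"print(sample.sort_values('inspection_date')['boro'].tolist())  # ['Brooklyn', 'Bronx', 'Manhattan']\n",
"```"
]
},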
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can include arguments (the same ones as [snapshots.get( )](http://docs.enigma.com/public/public_v20_sdk_snapshots_get.html)) to export only selected rows. This example exports Asian restaurants that received an 'A' inspection grade and are within 100 meters of Enigma's New York offices."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>camis</th>\n",
" <th>dba</th>\n",
" <th>boro</th>\n",
" <th>address</th>\n",
" <th>zipcode</th>\n",
" <th>neighborhood</th>\n",
" <th>phone</th>\n",
" <th>cuisine_description</th>\n",
" <th>food_type</th>\n",
" <th>inspection_date</th>\n",
" <th>violation_code</th>\n",
" <th>grade</th>\n",
" <th>score</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>geo_location</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>50019000</td>\n",
" <td>Idea Coffee</td>\n",
" <td>Manhattan</td>\n",
" <td>246 5th Ave</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>6468961533</td>\n",
" <td>Korean</td>\n",
" <td>Asian, Korean</td>\n",
" <td>2018-02-14T00:00:00</td>\n",
" <td>06D 02H</td>\n",
" <td>A</td>\n",
" <td>12.0</td>\n",
" <td>40.74471</td>\n",
" <td>-73.98738</td>\n",
" <td>40.74471,-73.98738</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>50043278</td>\n",
" <td>Bubble Tea &amp; Crepes</td>\n",
" <td>Manhattan</td>\n",
" <td>251 5th Ave</td>\n",
" <td>10016</td>\n",
" <td>Midtown</td>\n",
" <td>6465901959</td>\n",
" <td>Asian</td>\n",
" <td>Asian</td>\n",
" <td>2017-11-30T00:00:00</td>\n",
" <td>04L 08A 10F</td>\n",
" <td>A</td>\n",
" <td>13.0</td>\n",
" <td>40.74465</td>\n",
" <td>-73.98720</td>\n",
" <td>40.74465,-73.9872</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>50044541</td>\n",
" <td>Teisui</td>\n",
" <td>Manhattan</td>\n",
" <td>246 5th Ave</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>9173883596</td>\n",
" <td>Japanese</td>\n",
" <td>Asian, Japanese/Sushi</td>\n",
" <td>2018-03-13T00:00:00</td>\n",
" <td>10F 06A 06D</td>\n",
" <td>A</td>\n",
" <td>13.0</td>\n",
" <td>40.74471</td>\n",
" <td>-73.98738</td>\n",
" <td>40.74471,-73.98738</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>50047830</td>\n",
" <td>Pondicheri</td>\n",
" <td>Manhattan</td>\n",
" <td>15 W 27th St</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>7138200154</td>\n",
" <td>Indian</td>\n",
" <td>Indian, South Asian</td>\n",
" <td>2018-07-20T00:00:00</td>\n",
" <td>10B 02G</td>\n",
" <td>A</td>\n",
" <td>13.0</td>\n",
" <td>40.74450</td>\n",
" <td>-73.98822</td>\n",
" <td>40.7445,-73.98822</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>50069929</td>\n",
" <td>Chikarashi</td>\n",
" <td>Manhattan</td>\n",
" <td>1158 Broadway</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>9172620623</td>\n",
" <td>Japanese</td>\n",
" <td>Asian, Japanese/Sushi</td>\n",
" <td>2017-11-09T00:00:00</td>\n",
" <td>10H 10B 06C</td>\n",
" <td>A</td>\n",
" <td>12.0</td>\n",
" <td>40.74456</td>\n",
" <td>-73.98847</td>\n",
" <td>40.74456,-73.98847</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>50072908</td>\n",
" <td>Bondi Sushi</td>\n",
" <td>Manhattan</td>\n",
" <td>6 W 28th St</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>9174787078</td>\n",
" <td>Japanese</td>\n",
" <td>Asian, Japanese/Sushi</td>\n",
" <td>2018-02-05T00:00:00</td>\n",
" <td>06C</td>\n",
" <td>A</td>\n",
" <td>6.0</td>\n",
" <td>40.74476</td>\n",
" <td>-73.98809</td>\n",
" <td>40.74476,-73.98809</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>50073594</td>\n",
" <td>Noda</td>\n",
" <td>Manhattan</td>\n",
" <td>6 W 28th St</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>9147153029</td>\n",
" <td>Japanese</td>\n",
" <td>Asian, Japanese/Sushi</td>\n",
" <td>2018-08-01T00:00:00</td>\n",
" <td>08B 06C</td>\n",
" <td>A</td>\n",
" <td>7.0</td>\n",
" <td>40.74476</td>\n",
" <td>-73.98809</td>\n",
" <td>40.74476,-73.98809</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" camis dba boro address zipcode \\\n",
"0 50019000 Idea Coffee Manhattan 246 5th Ave 10001 \n",
"1 50043278 Bubble Tea & Crepes Manhattan 251 5th Ave 10016 \n",
"2 50044541 Teisui Manhattan 246 5th Ave 10001 \n",
"3 50047830 Pondicheri Manhattan 15 W 27th St 10001 \n",
"4 50069929 Chikarashi Manhattan 1158 Broadway 10001 \n",
"5 50072908 Bondi Sushi Manhattan 6 W 28th St 10001 \n",
"6 50073594 Noda Manhattan 6 W 28th St 10001 \n",
"\n",
" neighborhood phone cuisine_description food_type \\\n",
"0 Midtown 6468961533 Korean Asian, Korean \n",
"1 Midtown 6465901959 Asian Asian \n",
"2 Midtown 9173883596 Japanese Asian, Japanese/Sushi \n",
"3 Midtown 7138200154 Indian Indian, South Asian \n",
"4 Midtown 9172620623 Japanese Asian, Japanese/Sushi \n",
"5 Midtown 9174787078 Japanese Asian, Japanese/Sushi \n",
"6 Midtown 9147153029 Japanese Asian, Japanese/Sushi \n",
"\n",
" inspection_date violation_code grade score latitude longitude \\\n",
"0 2018-02-14T00:00:00 06D 02H A 12.0 40.74471 -73.98738 \n",
"1 2017-11-30T00:00:00 04L 08A 10F A 13.0 40.74465 -73.98720 \n",
"2 2018-03-13T00:00:00 10F 06A 06D A 13.0 40.74471 -73.98738 \n",
"3 2018-07-20T00:00:00 10B 02G A 13.0 40.74450 -73.98822 \n",
"4 2017-11-09T00:00:00 10H 10B 06C A 12.0 40.74456 -73.98847 \n",
"5 2018-02-05T00:00:00 06C A 6.0 40.74476 -73.98809 \n",
"6 2018-08-01T00:00:00 08B 06C A 7.0 40.74476 -73.98809 \n",
"\n",
" geo_location \n",
"0 40.74471,-73.98738 \n",
"1 40.74465,-73.9872 \n",
"2 40.74471,-73.98738 \n",
"3 40.7445,-73.98822 \n",
"4 40.74456,-73.98847 \n",
"5 40.74476,-73.98809 \n",
"6 40.74476,-73.98809 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset = public.datasets.get('d8c29d0d-f283-4eb5-b4d4-460c9779d05d')\n",
"df = dataset.current_snapshot.export_dataframe(\n",
" query_mode='advanced',\n",
" geo_query='geo_location:40.744460,-73.987340;distance:100m', \n",
" query='(food_type:asian)AND(grade:A)'\n",
")\n",
"df"
]
},
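{
"cell_type": "markdown",
"metadata": {},
"source": [
"To sanity-check the `geo_query` filter, you can compute each result's distance from the query point on the client side. This sketch uses the haversine formula with a mean Earth radius of 6,371 km and the coordinates shown in the output above. It is a spherical approximation, and we don't know exactly how the service measures distance, so expect small differences near the 100 m boundary.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):\n",
"    '''Great-circle distance in meters between two (lat, lon) points given in degrees.'''\n",
"    phi1, phi2 = math.radians(lat1), math.radians(lat2)\n",
"    dphi = math.radians(lat2 - lat1)\n",
"    dlmb = math.radians(lon2 - lon1)\n",
"    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2\n",
"    return 2 * radius_m * math.asin(math.sqrt(a))\n",
"\n",
"center = (40.744460, -73.987340)  # point used in geo_query above\n",
"results = {                        # (latitude, longitude) from the output above\n",
"    'Idea Coffee': (40.74471, -73.98738),\n",
"    'Bubble Tea & Crepes': (40.74465, -73.98720),\n",
"    'Teisui': (40.74471, -73.98738),\n",
"    'Pondicheri': (40.74450, -73.98822),\n",
"    'Chikarashi': (40.74456, -73.98847),\n",
"    'Bondi Sushi': (40.74476, -73.98809),\n",
"    'Noda': (40.74476, -73.98809),\n",
"}\n",
"for name, (lat, lon) in results.items():\n",
"    print('{}: {:.0f} m'.format(name, haversine_m(*center, lat, lon)))\n",
"```"
]
},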
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Enigma Public users are limited to 80 exports per month, so using `snapshot.export_dataframe()` as a query interface isn't recommended. Instead, use `snapshot.get_rows()` to get the rows you want and then convert the [TableView](http://docs.enigma.com/public/public_v20_sdk_tableview.html) object to a DataFrame as shown here. "
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>camis</th>\n",
" <th>dba</th>\n",
" <th>boro</th>\n",
" <th>address</th>\n",
" <th>zipcode</th>\n",
" <th>neighborhood</th>\n",
" <th>phone</th>\n",
" <th>cuisine_description</th>\n",
" <th>food_type</th>\n",
" <th>inspection_date</th>\n",
" <th>violation_code</th>\n",
" <th>grade</th>\n",
" <th>score</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>geo_location</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>50019000</td>\n",
" <td>Idea Coffee</td>\n",
" <td>Manhattan</td>\n",
" <td>246 5th Ave</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>6468961533</td>\n",
" <td>Korean</td>\n",
" <td>Asian, Korean</td>\n",
" <td>2018-02-14T00:00:00</td>\n",
" <td>06D 02H</td>\n",
" <td>A</td>\n",
" <td>12.0</td>\n",
" <td>40.74471</td>\n",
" <td>-73.98738</td>\n",
" <td>{'lat': 40.74471, 'lng': -73.98738}</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>50043278</td>\n",
" <td>Bubble Tea &amp; Crepes</td>\n",
" <td>Manhattan</td>\n",
" <td>251 5th Ave</td>\n",
" <td>10016</td>\n",
" <td>Midtown</td>\n",
" <td>6465901959</td>\n",
" <td>Asian</td>\n",
" <td>Asian</td>\n",
" <td>2017-11-30T00:00:00</td>\n",
" <td>04L 08A 10F</td>\n",
" <td>A</td>\n",
" <td>13.0</td>\n",
" <td>40.74465</td>\n",
" <td>-73.9872</td>\n",
" <td>{'lat': 40.74465, 'lng': -73.9872}</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>50044541</td>\n",
" <td>Teisui</td>\n",
" <td>Manhattan</td>\n",
" <td>246 5th Ave</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>9173883596</td>\n",
" <td>Japanese</td>\n",
" <td>Asian, Japanese/Sushi</td>\n",
" <td>2018-03-13T00:00:00</td>\n",
" <td>10F 06A 06D</td>\n",
" <td>A</td>\n",
" <td>13.0</td>\n",
" <td>40.74471</td>\n",
" <td>-73.98738</td>\n",
" <td>{'lat': 40.74471, 'lng': -73.98738}</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>50047830</td>\n",
" <td>Pondicheri</td>\n",
" <td>Manhattan</td>\n",
" <td>15 W 27th St</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>7138200154</td>\n",
" <td>Indian</td>\n",
" <td>Indian, South Asian</td>\n",
" <td>2018-07-20T00:00:00</td>\n",
" <td>10B 02G</td>\n",
" <td>A</td>\n",
" <td>13.0</td>\n",
" <td>40.7445</td>\n",
" <td>-73.98822</td>\n",
" <td>{'lat': 40.7445, 'lng': -73.98822}</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>50069929</td>\n",
" <td>Chikarashi</td>\n",
" <td>Manhattan</td>\n",
" <td>1158 Broadway</td>\n",
" <td>10001</td>\n",
" <td>Midtown</td>\n",
" <td>9172620623</td>\n",
" <td>Japanese</td>\n",
" <td>Asian, Japanese/Sushi</td>\n",
" <td>2017-11-09T00:00:00</td>\n",
" <td>10H 10B 06C</td>\n",
" <td>A</td>\n",
" <td>12.0</td>\n",
" <td>40.74456</td>\n",
" <td>-73.98846999999999</td>\n",
" <td>{'lat': 40.74456, 'lng': -73.98847}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" camis dba boro address zipcode \\\n",
"0 50019000 Idea Coffee Manhattan 246 5th Ave 10001 \n",
"1 50043278 Bubble Tea & Crepes Manhattan 251 5th Ave 10016 \n",
"2 50044541 Teisui Manhattan 246 5th Ave 10001 \n",
"3 50047830 Pondicheri Manhattan 15 W 27th St 10001 \n",
"4 50069929 Chikarashi Manhattan 1158 Broadway 10001 \n",
"\n",
" neighborhood phone cuisine_description food_type \\\n",
"0 Midtown 6468961533 Korean Asian, Korean \n",
"1 Midtown 6465901959 Asian Asian \n",
"2 Midtown 9173883596 Japanese Asian, Japanese/Sushi \n",
"3 Midtown 7138200154 Indian Indian, South Asian \n",
"4 Midtown 9172620623 Japanese Asian, Japanese/Sushi \n",
"\n",
" inspection_date violation_code grade score latitude \\\n",
"0 2018-02-14T00:00:00 06D 02H A 12.0 40.74471 \n",
"1 2017-11-30T00:00:00 04L 08A 10F A 13.0 40.74465 \n",
"2 2018-03-13T00:00:00 10F 06A 06D A 13.0 40.74471 \n",
"3 2018-07-20T00:00:00 10B 02G A 13.0 40.7445 \n",
"4 2017-11-09T00:00:00 10H 10B 06C A 12.0 40.74456 \n",
"\n",
" longitude geo_location \n",
"0 -73.98738 {'lat': 40.74471, 'lng': -73.98738} \n",
"1 -73.9872 {'lat': 40.74465, 'lng': -73.9872} \n",
"2 -73.98738 {'lat': 40.74471, 'lng': -73.98738} \n",
"3 -73.98822 {'lat': 40.7445, 'lng': -73.98822} \n",
"4 -73.98846999999999 {'lat': 40.74456, 'lng': -73.98847} "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
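"# Fetch the dataset by ID and get its current snapshot\n",
"snapshot = public.datasets.get('d8c29d0d-f283-4eb5-b4d4-460c9779d05d').current_snapshot\n",
"# Retrieve matching rows as a TableView (no export needed), sorted by camis\n",
"tableview = snapshot.get_rows(\n",
"    query_mode='advanced',\n",
"    geo_query='geo_location:40.744460,-73.987340;distance:100m',\n",
"    query='(food_type:asian)AND(grade:A)',\n",
"    row_sort='camis'\n",
")\n",
"# Use the TableView's field names as the DataFrame's column names\n",
"fields = [field.name for field in tableview.fields]\n",
"df = pd.DataFrame(tableview, columns=fields)\n",
"df.head()"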
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once your rows are in a DataFrame, the full range of pandas tools is available for analysis. Many books and websites cover using pandas for data analysis; consult any of them for help with analyzing Enigma Public datasets."
]
},
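{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following is a minimal sketch of that kind of follow-up analysis, using pandas and NumPy (which pandas depends on). It recomputes each restaurant's distance from the `geo_query` center point to sanity-check the 100 m radius, then averages inspection scores by cuisine. The distance uses the haversine formula and assumes a spherical Earth with a radius of 6,371 km, so it is approximate. `haversine_m` and `sample` are hypothetical names used for illustration; they are not part of the Enigma SDK.\n",
"\n",
"The sample rows are copied from the `get_rows()` output above. To run the same checks on your own results, use `df` in place of `sample`, converting `latitude`, `longitude`, and `score` with `pd.to_numeric()` first if they are not already numeric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical helper (not part of the Enigma SDK): great-circle distance in\n",
"# metres via the haversine formula, assuming a spherical Earth of radius 6,371 km\n",
"def haversine_m(lat1, lng1, lat2, lng2):\n",
"    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))\n",
"    a = (np.sin((lat2 - lat1) / 2) ** 2\n",
"         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)\n",
"    return 2 * 6371000 * np.arcsin(np.sqrt(a))\n",
"\n",
"# A few rows copied from the get_rows() output above; in practice, use df\n",
"sample = pd.DataFrame({\n",
"    'dba': ['Idea Coffee', 'Bubble Tea & Crepes', 'Teisui', 'Pondicheri', 'Chikarashi'],\n",
"    'cuisine_description': ['Korean', 'Asian', 'Japanese', 'Indian', 'Japanese'],\n",
"    'score': [12.0, 13.0, 13.0, 13.0, 12.0],\n",
"    'latitude': [40.74471, 40.74465, 40.74471, 40.7445, 40.74456],\n",
"    'longitude': [-73.98738, -73.9872, -73.98738, -73.98822, -73.98847],\n",
"})\n",
"\n",
"# Distance of each row from the geo_query center point used above\n",
"sample['distance_m'] = haversine_m(40.744460, -73.987340,\n",
"                                   sample['latitude'], sample['longitude'])\n",
"\n",
"# Every returned row should fall inside the 100 m search radius\n",
"assert (sample['distance_m'] <= 100).all()\n",
"\n",
"# Mean inspection score per cuisine, lowest first\n",
"sample.groupby('cuisine_description')['score'].mean().sort_values()"
]
},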
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Moving on\n",
"\n",
"This tutorial provided a brief introduction to the Enigma SDK and a few of the convenience methods that simplify programmatic interaction with Enigma Public through the API. For a complete list of all SDK classes and methods, refer to the [Python SDK Reference](http://docs.enigma.com/public/public_v20_sdk_about.html). "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}