Skip to content

Instantly share code, notes, and snippets.

Created Sep 3, 2014
What would you like to do?
javier's notes on data munging yelp

How to process YELP data for Map visualization

Download the data

Take it from here:

Transforming the data from JSON to CSV string to be able to import it

Change quotes so that each JSON line inside the file gets imported as a row as a CSV

sed -i.bak "s/\"/\"\"/g" yelp_academic_dataset_business.json
sed -i.bak "s/\(.*\)/\"\1\"/" yelp_academic_dataset_business.json
awk 'BEGIN{print "data_json"}1' yelp_academic_dataset_business.json > yelp_bussiness.csv

Lets do the same with the checkins

sed -i.bak "s/\"/\"\"/g" yelp_academic_dataset_checkin.json
sed -i.bak "s/\(.*\)/\"\1\"/" yelp_academic_dataset_checkin.json

##Upload the yelp_bussiness.csv file to CartoDB You can do this manually just drag and drop

##Create a new table with the data parsed We are going to do a SQL query that reads the JSON and transforms it into multiple columns so CartoDB can use them easily. We will run this inside CartoDB already.

data_json::json->>'type' as type, 
data_json::json->>'business_id' as business_id, 
data_json::json->>'name' as name, 
data_json::json->>'full_address' as full_address, 
data_json::json->>'city' as city, 
data_json::json->>'state' as state, 
(data_json::json->>'longitude')::float as longitude_f, 
(data_json::json->>'latitude')::float as latitude_f,
(data_json::json->>'stars')::float as stars, 
(data_json::json->>'review_count')::integer as review_count, 
(data_json::json->>'open')::boolean as open,
(data_json::json->'attributes')::json as attributes
from yelp_bussiness

If all this works fine, now we create a table from this view. Click on Option and then "Table from Query". Give it a name such as yelp_businesses.

After that we can georeference the table clicking on Options and selecting the latitude and longitude columns.

Now we have a table with all the business already imported.

create table checkins AS SELECT '2014-07-06 00:00'::timestamp + ( ( (((string_to_array(json_data.key,'-'))[2]::integer)*24) + (string_to_array(json_data.key,'-'))[1]::integer) ||' hour')::interval as timecheckin, json_data.value::integer AS checkins, y.data_json::json->>'business_id' as business_id FROM yelp_checkin as y, json_each_text(y.data_json::json->'checkin_info') AS json_data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment