'''
Convert Yelp Academic Dataset from JSON to CSV
Requires Pandas (https://pypi.python.org/pypi/pandas)
By Paul Butler, No Rights Reserved
'''
# Written for Python 2 (print statement, file() builtin).
import json
import pandas as pd
from glob import glob

def convert(x):
    ''' Convert a json string to a flat python dictionary
    which can be passed into Pandas. '''
    ob = json.loads(x)
    # In Python 2, items() returns a list, so ob can be modified in the loop.
    for k, v in ob.items():
        if isinstance(v, list):
            # Lists become one comma-separated string.
            ob[k] = ','.join(v)
        elif isinstance(v, dict):
            # Nested dicts become parent_child columns.
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob

# Convert every JSON file in the current directory (one JSON object per line).
for json_filename in glob('*.json'):
    csv_filename = '%s.csv' % json_filename[:-5]
    print 'Converting %s to %s' % (json_filename, csv_filename)
    df = pd.DataFrame([convert(line) for line in file(json_filename)])
    df.to_csv(csv_filename, encoding='utf-8', index=False)

This worked great for reviews and businesses. Thanks a lot for the code. But for users it gives me an error.
What should I be doing?
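The error message isn't shown here, so the cause can't be confirmed. One thing worth checking, purely as an assumption about the user records: if any list field holds non-string items (numbers, for instance), `','.join(v)` raises a `TypeError`, because `str.join` only accepts strings. A minimal sketch of a join that converts each item first (`join_list` is a hypothetical helper name, not part of the script):

```python
def join_list(values):
    """Join list items with commas, converting each item to text first."""
    return ','.join(str(item) for item in values)


# Plain ','.join([2012, 2013]) would raise TypeError; this does not.
print(join_list(['a', 'b']))     # a,b
print(join_list([2012, 2013]))   # 2012,2013
```

If that turns out to be the problem, replacing `','.join(v)` in `convert` with a call like this should get past it.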
I am getting this error while converting the review dataset. Need help urgently.
Converting yelp_academic_dataset_review.json to yelp_academic_dataset_review.csv
Traceback (most recent call last):
@paulgb Thanks for this excellent start! I used it as a first step in my code to process the 2017 dataset from Yelp: https://github.com/tothebeat/Yelp-Challenge-Dataset
I also got "RuntimeError: dictionary changed size during iteration", while using the above code to open the Round 11 business.json file. For the review.json file, I didn't get the error message but the code ran for almost an hour and nothing happened. Finally, I had to terminate the execution. Any advice to get around these issues is highly appreciated.
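On the `RuntimeError`: in Python 3, `ob.items()` is a live view of the dictionary, so adding or deleting keys inside the loop trips "dictionary changed size during iteration". Looping over a snapshot made with `list(...)` avoids that. A minimal sketch of `convert` for Python 3, assuming the rest of the script is also ported (`print(...)` as a function, `open(...)` instead of `file(...)`); the sample record is made up for illustration:

```python
import json


def convert(line):
    """Parse one JSON line and flatten one level of nested dicts."""
    ob = json.loads(line)
    # list(...) takes a snapshot of the items, so adding and deleting keys
    # inside the loop no longer changes the thing being iterated over.
    for k, v in list(ob.items()):
        if isinstance(v, list):
            ob[k] = ','.join(v)
        elif isinstance(v, dict):
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob


# Hypothetical record, not taken from the Yelp data.
sample = '{"name": "Cafe", "hours": {"Mon": "9-5"}, "tags": ["a", "b"]}'
print(convert(sample))
```

This only addresses the `RuntimeError`. It does not by itself explain the long run on review.json, which may simply come from building one DataFrame for the whole file in memory.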
I wrote another one that works with the 2018 version of the dataset. In theory it should work with any dataset, as long as it's structured as one JSON object per line:
Give it the directory where the JSON files reside on the command line and it should do the trick.
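That script isn't reproduced here. For anyone who wants the general shape, here is a rough sketch of the same idea (a directory passed on the command line, one JSON object per line), written with only the standard library. It is not the script referred to above, and every function name in it is made up for illustration. It reads each file twice, once to collect the column names and once to write the rows, so the whole file never has to sit in memory at once.

```python
import csv
import json
import os
import sys
from glob import glob


def flatten(record):
    """Flatten one level of nested dicts and join lists into strings."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat['%s_%s' % (key, sub_key)] = sub_value
        elif isinstance(value, list):
            flat[key] = ','.join(str(item) for item in value)
        else:
            flat[key] = value
    return flat


def read_records(json_path):
    """Yield one flattened record per non-empty line of the file."""
    with open(json_path, encoding='utf-8') as handle:
        for line in handle:
            if line.strip():
                yield flatten(json.loads(line))


def convert_file(json_path, csv_path):
    """Convert one line-delimited JSON file to CSV in two streaming passes."""
    # First pass: collect every column name, in first-seen order.
    columns = {}
    for record in read_records(json_path):
        for key in record:
            columns.setdefault(key, None)
    # Second pass: write rows; keys missing from a record become empty cells.
    with open(csv_path, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns), restval='')
        writer.writeheader()
        for record in read_records(json_path):
            writer.writerow(record)


def convert_directory(directory):
    """Convert every *.json file in `directory` to a CSV next to it."""
    for json_path in sorted(glob(os.path.join(directory, '*.json'))):
        csv_path = os.path.splitext(json_path)[0] + '.csv'
        print('Converting %s to %s' % (json_path, csv_path))
        convert_file(json_path, csv_path)


# Only runs when a directory is actually given on the command line.
if __name__ == '__main__' and len(sys.argv) > 1:
    convert_directory(sys.argv[1])
```

Saved under a name of your choosing, it would be run as `python your_script.py /path/to/json/files`. The two-pass approach trades some speed for a small memory footprint; if the files fit comfortably in memory, the pandas version is shorter.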