@jasonrhaas
Last active August 29, 2015 14:16
Convert GNIP data into newline-separated (\n) JSON
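#!/usr/bin/env python
import fileinput
import json
import sys


def process_file(files):
    """Split run-together GNIP JSON objects and parse one object per line."""
    record_count = 0
    errors = 0
    objs = []
    for chunk in files:
        # GNIP output can run objects together as ..."}{"...; put each on its own line.
        chunk = chunk.replace('"}{"', '"}\n{"')
        for line in chunk.split('\n'):
            line = line.strip()
            if not line:
                # Skip the empty strings left behind by trailing newlines.
                continue
            record_count += 1
            try:
                objs.append(json.loads(line))
            except ValueError:
                sys.stderr.write('failed to parse: {}\n'.format(line))
                errors += 1
    stats = "{} records processed; {} errors\n".format(record_count, errors)
    return (stats, objs)


if __name__ == '__main__':
    stats, objs = process_file(fileinput.input())
    for obj in objs:
        print(json.dumps(obj, indent=4))
    # Report the summary on stderr so it stays out of redirected output.
    sys.stderr.write(stats)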
@jasonrhaas
Author

The script writes JSON data to stdout. To write to a file, redirect the output:
python gnip_snippet.py [fileglob] > output.json
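
The script splits records on the literal text `"}{"`, which only matches when one object ends with a string value and the next starts with a key. Below is a minimal sketch of an alternative, not the gist's method: it uses the standard library's `json.JSONDecoder.raw_decode` to walk through objects that sit back to back, and writes one compact object per line. The helper names `iter_json_objects` and `to_ndjson` and the sample data are hypothetical, made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value found back to back in text (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def to_ndjson(text):
    """Return the objects in text as one compact JSON document per line."""
    return '\n'.join(json.dumps(obj, sort_keys=True)
                     for obj in iter_json_objects(text))


# Made-up sample: the second object ends with "}}", which '"}{"' would not split.
sample = '{"id": 1, "body": "a"}{"id": 2, "meta": {"n": 3}}{"id": 3}'
print(to_ndjson(sample))
```

Malformed input still raises `ValueError` from `raw_decode`, so a caller that wants the original script's count-and-continue behavior would need to catch it and decide how far to skip ahead.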
