@vitalibertas
Last active March 11, 2020 16:23
Python API Download Zipped JSON file, Unzip and Format for Redshift, Upload to S3 as GZip.
```python
import gzip
import json
import zipfile
from io import BytesIO, StringIO

import requests


def download_responses_as_gzip(request_url, file_id, json_header):
    """Download a zipped JSON export, reformat it for Redshift, and return a gzip buffer."""
    gz_buffer = BytesIO()
    json_buffer = StringIO()
    download_url = "{0}{1}/file".format(request_url, file_id)
    request_download = requests.get(download_url, headers=json_header, stream=True)
    # The download is a zip archive containing a single JSON file
    with zipfile.ZipFile(BytesIO(request_download.content), mode='r') as z:
        unzip_file = StringIO(z.read(z.infolist()[0]).decode('utf-8'))
    json_responses = json.load(unzip_file)['responses']
    # Write one JSON object per line so Redshift's COPY can ingest each row
    for response in json_responses:
        json_buffer.write(json.dumps(response) + '\n')
    with gzip.GzipFile(mode='wb', fileobj=gz_buffer) as f:
        f.write(json_buffer.getvalue().encode('utf-8'))
    gz_buffer.seek(0)  # rewind so the caller can stream it straight to S3
    return gz_buffer
```
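Because the function returns an in-memory gzip buffer, the core transformation can be checked locally with no network or AWS access. This sketch uses made-up sample data standing in for the API's `responses` list and verifies the round trip:

```python
import gzip
import json
from io import BytesIO, StringIO

# Hypothetical sample standing in for the API's 'responses' list
responses = [{"id": 1, "ok": True}, {"id": 2, "ok": False}]

json_buffer = StringIO()
for response in responses:
    json_buffer.write(json.dumps(response) + "\n")  # one object per line

gz_buffer = BytesIO()
with gzip.GzipFile(mode="wb", fileobj=gz_buffer) as f:
    f.write(json_buffer.getvalue().encode("utf-8"))

gz_buffer.seek(0)  # rewind before reading back (or handing to boto3)
decoded = gzip.decompress(gz_buffer.read()).decode("utf-8")
print(decoded.splitlines()[0])  # → {"id": 1, "ok": true}
```

Rewinding with `seek(0)` matters: a caller passing the buffer to something like `boto3`'s `put_object` would otherwise upload zero bytes, since the write left the cursor at the end.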
Loading data into Redshift is most efficient with a gzipped JSON file. However, Redshift doesn't like how the json library serializes a Python list, wrapping everything in brackets -- []. So you need to write out each list element individually instead. This is all done in memory, so mind how much memory you've got versus how much data you're pulling down!
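The bracket issue above is easy to see with a tiny hypothetical example: dumping the list whole produces the `[...]`-wrapped form, while dumping each element yields the plain objects Redshift's COPY expects.

```python
import json

responses = [{"user": "a"}, {"user": "b"}]  # hypothetical sample

as_list = json.dumps(responses)  # wraps everything in brackets: [...]
per_row = "\n".join(json.dumps(r) for r in responses)  # one object per line

print(as_list.startswith("["))   # → True
print(per_row.splitlines()[0])   # → {"user": "a"}
```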
