@vitalibertas
Last active March 11, 2020 16:23
Python API Download Zipped JSON file, Unzip and Format for Redshift, Upload to S3 as GZip.
import gzip
import json
import zipfile
from io import BytesIO, StringIO

import requests

def download_responses_as_gzip(request_url, file_id, json_header):
    gz_buffer = BytesIO()
    json_buffer = StringIO()
    download_url = "{0}{1}/file".format(request_url, file_id)
    request_download = requests.get(download_url, headers=json_header, stream=True)
    with zipfile.ZipFile(BytesIO(request_download.content), mode='r') as z:
        unzip_file = StringIO(z.read(z.infolist()[0]).decode('utf-8'))
    json_responses = json.load(unzip_file)['responses']
    for response in json_responses:
        # One JSON object per line, so Redshift's COPY can parse each record
        json_buffer.write(json.dumps(response) + '\n')
    with gzip.GzipFile(mode='wb', fileobj=gz_buffer) as f:
        f.write(json_buffer.getvalue().encode('utf-8'))
    gz_buffer.seek(0)  # rewind so the subsequent S3 upload reads from the start
    return gz_buffer
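Before uploading, the buffer can be sanity-checked with an in-memory round trip. A minimal sketch (the records below are made-up placeholders for the API's `responses` list, not real data):

```python
import gzip
import json
from io import BytesIO, StringIO

# Hypothetical records standing in for the downloaded 'responses' list
records = [{"id": 1, "answer": "yes"}, {"id": 2, "answer": "no"}]

# Same serialization as the gist: one JSON object per line, then gzip
json_buffer = StringIO()
for record in records:
    json_buffer.write(json.dumps(record) + "\n")

gz_buffer = BytesIO()
with gzip.GzipFile(mode="wb", fileobj=gz_buffer) as f:
    f.write(json_buffer.getvalue().encode("utf-8"))
gz_buffer.seek(0)

# Round-trip check: decompress and parse each line independently
decoded = gzip.decompress(gz_buffer.read()).decode("utf-8")
parsed = [json.loads(line) for line in decoded.splitlines()]
```

If `parsed` matches the original records, the buffer is well-formed newline-delimited JSON and safe to hand to the S3 upload.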
@vitalibertas (Author)

Loading data into Redshift is most efficient with a gzipped JSON file. However, Redshift's COPY doesn't accept what the json library produces when it serializes a Python list -- a JSON array wrapped in brackets, []. So you serialize each list element individually instead. This is all done in memory, so mind how much memory you've got versus how much data you're pulling down!
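The difference is easy to see side by side. A small sketch (the `responses` list here is a hypothetical stand-in):

```python
import json

# Hypothetical stand-in for the list parsed out of the API download
responses = [{"score": 5}, {"score": 3}]

# json.dumps on the whole list yields a JSON array -- enclosing brackets included
as_array = json.dumps(responses)

# Serializing element by element yields newline-delimited objects,
# the shape Redshift's COPY expects
as_lines = "\n".join(json.dumps(r) for r in responses)
```

`as_array` starts with `[`, which is what trips up the COPY; `as_lines` is one self-contained JSON object per line.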
