Binoy Shah ShahBinoy

  • Vibrent Health
  • Fairfax, VA
ShahBinoy / pandas_s3_streaming.py
Created November 1, 2020 05:51 — forked from uhho/pandas_s3_streaming.py
Streaming pandas DataFrame to/from S3 with on-the-fly processing and GZIP compression
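import gzip

import pandas as pd


def s3_to_pandas(client, bucket, key, header=None):
    # get key using boto3 client
    obj = client.get_object(Bucket=bucket, Key=key)
    # wrap the response body so it is decompressed as it is read
    gz = gzip.GzipFile(fileobj=obj['Body'])
    # load stream directly to DF, keeping every column as a string
    return pd.read_csv(gz, header=header, dtype=str)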
def s3_to_pandas_with_processing(client, bucket, key, header=None):
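
The body of `s3_to_pandas_with_processing` is not shown here, so the following is only a rough sketch of what "on-the-fly processing" of a gzipped CSV stream could look like, not the gist's actual code. To stay dependency-free it uses the standard library's `csv` module instead of pandas. The names `iter_processed_rows`, `FakeS3Client`, `my-bucket`, and `data.csv.gz` are hypothetical and exist only for this illustration; the one assumption about the client is that `get_object` returns a dict whose `'Body'` is a readable binary stream, as in the function above.

```python
import csv
import gzip
import io


def iter_processed_rows(client, bucket, key, process=lambda row: row):
    """Yield CSV rows from a gzipped object, applying `process` to each row.

    Hypothetical helper: rows are decompressed and parsed lazily, so the
    whole file never has to be held in memory at once.
    """
    obj = client.get_object(Bucket=bucket, Key=key)
    # gzip.open accepts a file-like object; 'rt' decodes the bytes to text.
    with gzip.open(obj['Body'], mode='rt', encoding='utf-8', newline='') as text:
        for row in csv.reader(text):
            yield process(row)


class FakeS3Client:
    """Hypothetical in-memory stand-in for a boto3 S3 client (demo only)."""

    def __init__(self, payload):
        self._payload = payload

    def get_object(self, Bucket, Key):
        # Mimic the shape of the boto3 response used above: a dict with 'Body'.
        return {'Body': io.BytesIO(self._payload)}


# Build a small gzipped CSV in memory and stream it back with processing.
payload = gzip.compress(b"a,1\nb,2\n")
client = FakeS3Client(payload)
rows = list(
    iter_processed_rows(
        client,
        'my-bucket',
        'data.csv.gz',
        process=lambda r: [r[0].upper(), r[1]],  # upper-case the first column
    )
)
```

If a DataFrame is wanted at the end, the yielded rows could be passed to `pd.DataFrame(rows)`; whether that matches the original function's behavior is not something this preview confirms.
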
#!/bin/bash
# init script for Cassandra.
# chkconfig: 2345 90 10
# description: Cassandra
# script slightly modified from
# http://blog.milford.io/2010/06/installing-apache-cassandra-on-centos/
. /etc/rc.d/init.d/functions