Skip to content

Instantly share code, notes, and snippets.

View shahidash's full-sized avatar

shahidash

  • applied informatics
  • srinagar Kashmir
View GitHub Profile
@uhho
uhho / pandas_s3_streaming.py
Last active December 2, 2022 18:57
Streaming pandas DataFrame to/from S3 with on-the-fly processing and GZIP compression
def s3_to_pandas(client, bucket, key, header=None):
# get key using boto3 client
obj = client.get_object(Bucket=bucket, Key=key)
gz = gzip.GzipFile(fileobj=obj['Body'])
# load stream directly to DF
return pd.read_csv(gz, header=header, dtype=str)
def s3_to_pandas_with_processing(client, bucket, key, header=None):