@jitsejan
Last active February 13, 2024 12:53
Write a Pandas dataframe to CSV format on AWS S3.
from io import StringIO

import boto3

# Name of the target S3 bucket. Placeholder value (not in the original gist); replace with your own.
DESTINATION = "my-bucket"


def _write_dataframe_to_csv_on_s3(dataframe, filename):
    """Write a dataframe to a CSV on S3."""
    print("Writing {} records to {}".format(len(dataframe), filename))
    # Create buffer
    csv_buffer = StringIO()
    # Write dataframe to buffer
    dataframe.to_csv(csv_buffer, sep="|", index=False)
    # Create S3 object
    s3_resource = boto3.resource("s3")
    # Write buffer to S3 object
    s3_resource.Object(DESTINATION, filename).put(Body=csv_buffer.getvalue())
@RMCollins175

Great, thanks! But do you know how to add UTF-8-sig encoding? Otherwise, using the above, I get words like 'R√©union' instead of 'Réunion' when I download my CSV from the S3 bucket.

@jitsejan
Author

> Great, thanks! But do you know how to add UTF-8-sig encoding? Otherwise, using the above, I get words like 'R√©union' instead of 'Réunion' when I download my CSV from the S3 bucket.

I assume you can use the encoding parameter of Pandas to_csv. Are you able to add encoding="utf-8" to the dataframe.to_csv() step?
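
One caveat on the encoding suggestion above: as far as I can tell from the pandas documentation, the encoding argument of to_csv is not supported when the target is a text buffer such as StringIO, so passing it there may have no effect on the uploaded bytes. A possible workaround is to encode the buffer's text explicitly before the put call. The sketch below is a minimal illustration under that assumption. The helper name encode_csv_buffer is hypothetical, and the buffer is filled by hand as a stand-in for dataframe.to_csv(csv_buffer, sep="|", index=False) so that the example runs without pandas or AWS credentials. It also shows that decoding UTF-8 bytes as Mac Roman gives the garbled 'R√©union' from the question, which suggests the data itself is fine and the viewer is guessing the wrong encoding.

```python
from io import StringIO


def encode_csv_buffer(csv_buffer, encoding="utf-8-sig"):
    """Return the buffer's text as bytes (hypothetical helper, not part of the gist).

    With "utf-8-sig", Python prepends the UTF-8 byte order mark (BOM),
    which spreadsheet applications commonly use to detect UTF-8.
    """
    return csv_buffer.getvalue().encode(encoding)


# Stand-in for: dataframe.to_csv(csv_buffer, sep="|", index=False)
csv_buffer = StringIO()
csv_buffer.write("region|population\n")
csv_buffer.write("Réunion|1\n")  # illustrative row, not real data

body = encode_csv_buffer(csv_buffer)

# Likely cause of the garbled text: UTF-8 bytes interpreted as Mac Roman.
garbled = "Réunion".encode("utf-8").decode("mac_roman")

# The upload step from the gist would then pass bytes instead of str:
# s3_resource.Object(DESTINATION, filename).put(Body=body)
```

Passing bytes as Body keeps the encoding decision in your own code rather than leaving the conversion of the string to the client library.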
