@chucknado
Last active November 23, 2020 07:41
Sample script for "Write large data sets in Excel with Python and pandas" at https://support.zendesk.com/hc/en-us/articles/212227138
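import dateutil.parser
import pandas as pd
# Load the serialized topic data; it is expected to expose 'posts' and 'users' entries
topic = pd.read_pickle('my_serialized_data')
# Build one DataFrame per record type, keeping only the columns needed for the report
posts_df = pd.DataFrame(topic['posts'], columns=['id', 'title', 'created_at', 'author_id'])
users_df = pd.DataFrame(topic['users'], columns=['id', 'name']).drop_duplicates(subset=['id'])
# Convert the timestamp strings to plain dates
posts_df['created_at'] = posts_df['created_at'].apply(lambda x: dateutil.parser.parse(x).date())
# Attach each author's name to their posts; both frames have an 'id' column,
# so the merge suffixes them as 'id_x' (posts) and 'id_y' (users)
merged_df = pd.merge(posts_df, users_df, how='left', left_on='author_id', right_on='id')
merged_df.rename(columns={'id_x': 'post_id'}, inplace=True)
merged_df.drop(['id_y', 'author_id'], axis=1, inplace=True)
# Write the result to an Excel workbook without the DataFrame index column
merged_df.to_excel('topic_posts.xlsx', index=False)
print('Spreadsheet saved.')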
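The script above depends on a pickled file that is not included in the gist. To try the merge logic without it, the following minimal sketch runs the same steps on a small in-memory dictionary. The `sample_topic` data and the `build_posts_table` helper are hypothetical, made up for illustration, and the real pickled data may be shaped differently. The Excel export is left out so the sketch runs without a spreadsheet engine installed.

```python
import dateutil.parser
import pandas as pd

# Hypothetical sample data standing in for the pickled topic (an assumption,
# not taken from the original data set)
sample_topic = {
    'posts': [
        {'id': 101, 'title': 'First post', 'created_at': '2015-03-01T10:15:00Z', 'author_id': 1},
        {'id': 102, 'title': 'Second post', 'created_at': '2015-03-02T08:00:00Z', 'author_id': 2},
        {'id': 103, 'title': 'Third post', 'created_at': '2015-03-03T12:30:00Z', 'author_id': 1},
    ],
    'users': [
        {'id': 1, 'name': 'Alice'},
        {'id': 2, 'name': 'Bob'},
        {'id': 1, 'name': 'Alice'},  # duplicate user record, removed below
    ],
}


def build_posts_table(topic):
    """Hypothetical helper: apply the same steps as the script above."""
    posts_df = pd.DataFrame(topic['posts'], columns=['id', 'title', 'created_at', 'author_id'])
    users_df = pd.DataFrame(topic['users'], columns=['id', 'name']).drop_duplicates(subset=['id'])
    # Convert the timestamp strings to plain dates
    posts_df['created_at'] = posts_df['created_at'].apply(lambda x: dateutil.parser.parse(x).date())
    # Left join keeps every post, even if its author is missing from users_df
    merged_df = pd.merge(posts_df, users_df, how='left', left_on='author_id', right_on='id')
    merged_df = merged_df.rename(columns={'id_x': 'post_id'})
    return merged_df.drop(['id_y', 'author_id'], axis=1)


table = build_posts_table(sample_topic)
print(table)
```

Returning a new DataFrame instead of using `inplace=True` is a stylistic choice in this sketch; the result has the same columns as `merged_df` in the script above.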

nimicent commented Jul 8, 2017

Love this, thank you for that tutorial!
