@samuelleach
Created March 9, 2016 15:21
Preparation script to create data for a Sankey diagram in Tableau. The input data is row-based, and 'country' and 'sector' are two dimensions of the data.
# Preparation script to create data for a Sankey diagram in Tableau.
# The input data is row-based, and 'country' and 'sector' are two dimensions of the data.
# See https://public.tableau.com/profile/samuelleach#!/vizhome/SankeyTemplate/Dashboard1
import pandas as pd

infile = 'all_loans.csv'
outfile = 'all_loans_sankey.csv'
sankey_columns = ['country', 'sector']

print('Reading ' + infile)
df = pd.read_csv(infile, low_memory=False)

# Sum the numeric columns for each (country, sector) pair.
# numeric_only=True keeps recent pandas versions from trying to sum text columns.
print('Performing groupby operations')
df = df.groupby(sankey_columns).sum(numeric_only=True)

# Each aggregated row is written twice: once tagged 'Dummy', once tagged 'Real'.
print('Adding RowType column')
df['RowType'] = 'Dummy'

print('Copying dataframe')
df2 = df.copy()
df2['RowType'] = 'Real'

print('Concatenating dataframes')
result = pd.concat([df, df2])

# The grouped columns form the index, so to_csv writes them as the leading columns.
print('Writing data frame to ' + outfile)
result.to_csv(outfile)
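To see what the script produces without needing all_loans.csv, the same steps can be sketched on a small in-memory frame. This is only an illustration: the helper name `duplicate_for_sankey`, the sample rows, and the `amount` measure column are assumptions, not part of the original gist.

```python
import pandas as pd


def duplicate_for_sankey(df, sankey_columns):
    """Aggregate by the Sankey dimensions and return every row twice.

    Hypothetical helper mirroring the script above: one copy of each
    aggregated row is tagged 'Dummy', the other 'Real'.
    """
    grouped = df.groupby(sankey_columns).sum(numeric_only=True)
    dummy = grouped.assign(RowType='Dummy')
    real = grouped.assign(RowType='Real')
    return pd.concat([dummy, real])


# Made-up sample data; 'amount' is an assumed numeric measure column.
loans = pd.DataFrame({
    'country': ['Kenya', 'Kenya', 'Peru'],
    'sector': ['Food', 'Food', 'Retail'],
    'amount': [100, 50, 75],
})

result = duplicate_for_sankey(loans, ['country', 'sector'])
print(result)
```

With this input the two Kenya/Food rows collapse into one aggregated row, so the result holds two 'Dummy' rows followed by the same two rows tagged 'Real'.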