-
-
Save toddbirchard/9f6f1508cb24e78315778837a7d31328 to your computer and use it in GitHub Desktop.
"""Find Symmetric Differences between two Pandas DataFrames.""" | |
def dataframe_difference(df1, df2, which=None): | |
"""Find rows which are different.""" | |
comparison_df = df1.merge( | |
df2, | |
indicator=True, | |
how='outer' | |
) | |
if which is None: | |
diff_df = comparison_df[comparison_df['_merge'] != 'both'] | |
else: | |
diff_df = comparison_df[comparison_df['_merge'] == which] | |
diff_df.to_csv('data/diff.csv') | |
return diff_df |
Super useful -- thank you!
I am able to get a left or right but not both, what am I doing wrong?
(below is whole script)
import pandas as pd
def dataframe_difference(wk15, wk16, which=None):
"""Find rows which are different between two DataFrames."""
comparison_df = wk15.merge(wk16,indicator=True, how='outer')
if which is None:
diff_df = comparison_df[comparison_df['_merge'] != 'both']
else:
diff_df = comparison_df[comparison_df['_merge'] == which]
diff_df.to_csv('diff.csv')
return diff_df
wk15 = pd.read_excel('GRNI-wk15.xlsx', sheet_name='GRNI-wk15')
wk16 = pd.read_excel('GRNI-wk16.xlsx', sheet_name='GRNI-wk16')
dataframe_difference(wk15,wk16)
Yes, I am receiving the same results as Velascog;
Can you please explain?
I am also getting the same issue. For small data sets its working file, I have tested it for 50 rows but for around 25k rows. It breaks :(
Thanks for this helper function!
Might as well help with some questions while I am here:
I am able to get a left or right but not both, what am I doing wrong? (below is whole script)
import pandas as pd def dataframe_difference(wk15, wk16, which=None): """Find rows which are different between two DataFrames.""" comparison_df = wk15.merge(wk16,indicator=True, how='outer') if which is None: diff_df = comparison_df[comparison_df['_merge'] != 'both'] else: diff_df = comparison_df[comparison_df['_merge'] == which] diff_df.to_csv('diff.csv') return diff_df
wk15 = pd.read_excel('GRNI-wk15.xlsx', sheet_name='GRNI-wk15') wk16 = pd.read_excel('GRNI-wk16.xlsx', sheet_name='GRNI-wk16')
dataframe_difference(wk15,wk16)
The code is selecting !='both' (not equal to 'both') when not passing in a which paramater value - so you only get left_only and right_only.
To get the values in 'both' dataframes try:
dataframe_difference(wk15,wk16,which='both')
Thank you!
excellent work!