Compare two Pandas DataFrames
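import numpy as np
import pandas as pd


def compare_two_dfs(input_df_1, input_df_2):
    # Work on copies so the caller's frames are never modified.
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    # Boolean mask of differing cells, stacked into a (row, column) MultiIndex Series.
    ne_stacked = (df_1 != df_2).stack()
    # Keep only the entries where the mask is True.
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    # Row/column positions of the differing cells, in the same row-major order.
    difference_locations = np.where(df_1 != df_2)
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    df = pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
    return df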
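A minimal usage sketch, assuming both frames share the same index and column labels (the `!=` comparison raises otherwise). The function is restated so the snippet runs on its own; the `before` and `after` frames are made-up sample data, not part of the gist.

```python
import numpy as np
import pandas as pd


def compare_two_dfs(input_df_1, input_df_2):
    # Same logic as the gist, restated so this snippet is self-contained.
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    ne_stacked = (df_1 != df_2).stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    difference_locations = np.where(df_1 != df_2)
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    return pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)


# Hypothetical sample data: identical labels, two cells differ.
before = pd.DataFrame({'name': ['ann', 'bob', 'cat'], 'score': [1, 2, 3]})
after = pd.DataFrame({'name': ['ann', 'rob', 'cat'], 'score': [1, 2, 5]})

diff = compare_two_dfs(before, after)
print(diff)
```

The result has one row per changed cell, indexed by (`id`, `col`): here (1, `name`) going from `bob` to `rob`, and (2, `score`) going from 3 to 5.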
@vincentclaes commented Sep 7, 2018

you should add:
import numpy as np

great job btw!

@ryancollingwood commented Sep 11, 2018

Thank you for this. Just a heads-up for DataFrames that use np.nan as the missing value: np.nan == np.nan is always False, so a cell that is NaN in both frames will be reported as changed.
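One possible way around the NaN issue, sketched under the assumption that two missing values in the same cell should count as "unchanged". The name `compare_two_dfs_nan_aware` and the sample frames are hypothetical; this is a variant for illustration, not the gist author's code.

```python
import numpy as np
import pandas as pd


def compare_two_dfs_nan_aware(input_df_1, input_df_2):
    # Hypothetical variant of the gist's function that ignores NaN-vs-NaN cells.
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    # A cell is "changed" only if the values differ and are not both missing.
    both_missing = df_1.isna() & df_2.isna()
    differs = (df_1 != df_2) & ~both_missing
    changed = differs.stack()
    changed = changed[changed]
    changed.index.names = ['id', 'col']
    locations = np.where(differs)
    return pd.DataFrame(
        {'from': df_1.values[locations], 'to': df_2.values[locations]},
        index=changed.index,
    )


print(np.nan == np.nan)  # False: NaN never compares equal, even to itself

# Made-up sample data: cell (1, 'a') is NaN in both frames.
left = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 5.0, 6.0]})
right = pd.DataFrame({'a': [1.0, np.nan, 4.0], 'b': [2.0, 5.0, 6.0]})

diff = compare_two_dfs_nan_aware(left, right)
print(diff)
```

With this mask, the cell that is NaN on both sides is skipped, while a NaN that becomes a real value (or the reverse) is still reported.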
