Compare two Pandas DataFrames
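import numpy as np
import pandas as pd


def compare_two_dfs(input_df_1, input_df_2):
    # Work on copies so the callers' frames are never modified.
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    # Boolean Series indexed by (row label, column); True where the cells differ.
    ne_stacked = (df_1 != df_2).stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    # Positional (row, column) indices of the differing cells.
    difference_locations = np.where(df_1 != df_2)
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    # One row per changed cell, showing the old and the new value.
    df = pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
    return df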
@vincentclaes commented Sep 7, 2018

You should add:
import numpy as np
The function calls np.where, so without that import it raises a NameError.

Great job btw!

@ryancollingwood commented Sep 11, 2018

Thank you for this. Just a heads-up for DataFrames that use np.nan as the missing value:

np.nan == np.nan is always False, so a cell that is NaN in both frames gets reported as changed.
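
One possible way around the NaN issue, as a rough sketch rather than the gist author's method: mask out cells that are missing in both frames before collecting the differences. The name compare_two_dfs_nan_aware and the sample frames old and new are made up for illustration, and the sketch assumes both frames share the same index and columns.

```python
import numpy as np
import pandas as pd


def compare_two_dfs_nan_aware(input_df_1, input_df_2):
    """Hypothetical variant of compare_two_dfs that treats NaN vs NaN as unchanged.

    Assumes both frames have identical index and column labels.
    """
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    # A cell counts as changed only if the values differ AND
    # the cell is not missing in both frames.
    both_nan = df_1.isna() & df_2.isna()
    diff_mask = (df_1 != df_2) & ~both_nan
    ne_stacked = diff_mask.stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    # Positional (row, column) indices of the differing cells.
    difference_locations = np.where(diff_mask.to_numpy())
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    return pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)


# Illustrative data: row 1 of column 'a' is NaN in both frames.
old = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', 'z']})
new = pd.DataFrame({'a': [1.0, np.nan, 4.0], 'b': ['x', 'q', 'z']})
result = compare_two_dfs_nan_aware(old, new)
print(result)
```

With this mask, the shared NaN in row 1 of column 'a' is skipped, and only the two genuinely different cells are listed. A cell that is NaN in one frame and a real value in the other still counts as a change.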
