@yassineAlouini
Created July 4, 2017 13:50
Compare two Pandas DataFrames
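import numpy as np
import pandas as pd

def compare_two_dfs(input_df_1, input_df_2):
    # Both frames must have identical index and columns; otherwise the
    # element-wise != comparison raises a ValueError.
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    # Boolean mask of the cells whose values differ.
    ne = df_1 != df_2
    # Stack into a Series indexed by (row label, column label) and keep the True cells.
    ne_stacked = ne.stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    # Positional (row, column) coordinates of the same cells; np.where and
    # stack() both walk the frame row by row, so the two orders line up.
    difference_locations = np.where(ne)
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    df = pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
    return df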
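On pandas 1.1 or later, the built-in `DataFrame.compare` method produces a similar report without any NumPy indexing. This is a different tool from the function above, shown only as a sketch: it also requires identically labelled frames, keeps only the rows and columns that differ, and lays each changed column out as a `('self', 'other')` pair instead of `from`/`to` columns. Per the pandas documentation, NaNs in matching positions are not reported as differences. The sample frames are made up for illustration.

```python
import pandas as pd

# Two identically labelled frames (compare() raises if the labels differ).
df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
df_2 = pd.DataFrame({'a': [1, 20, 3], 'b': ['x', 'y', 'w']})

# Rows and columns with no differences are dropped (keep_shape=False by default);
# cells that are equal within a kept row/column are shown as NaN.
diff = df_1.compare(df_2)
print(diff)
```

Row 0 is identical in both frames, so only rows 1 and 2 remain in `diff`.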
@vincentclaes

You should add:
import numpy as np

Great job, btw!

@ryancollingwood

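Thank you for this. Just a heads-up for DataFrames that use np.nan as the missing value:

np.nan == np.nan is always False (and np.nan != np.nan is always True), so a cell that is NaN in both frames will be reported as changed.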
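One way to address this is to also require that the two cells are not both missing before counting them as a change. The sketch below is a hypothetical variant (the name `compare_two_dfs_nan_aware` is made up here, not part of the gist); it assumes, like the original, that both frames share the same index and columns. It builds the `(id, col)` index directly from `np.where` positions rather than through `stack()`, so the mask is computed only once.

```python
import numpy as np
import pandas as pd


def compare_two_dfs_nan_aware(df_1, df_2):
    """Hypothetical NaN-aware variant of compare_two_dfs.

    Assumes df_1 and df_2 have identical index and columns
    (the != comparison raises ValueError otherwise).
    """
    # A cell counts as changed if the values differ AND they are not both missing;
    # without the isna() term, NaN != NaN would flag every shared NaN.
    changed_mask = (df_1 != df_2) & ~(df_1.isna() & df_2.isna())
    # Positional (row, column) coordinates of the changed cells, row by row.
    rows, cols = np.where(changed_mask.to_numpy())
    index = pd.MultiIndex.from_arrays(
        [df_1.index[rows], df_1.columns[cols]], names=['id', 'col']
    )
    return pd.DataFrame(
        {'from': df_1.to_numpy()[rows, cols], 'to': df_2.to_numpy()[rows, cols]},
        index=index,
    )


# Example: the shared NaNs at (0, 'y') and (1, 'x') are no longer reported.
df_a = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [np.nan, 5.0, 6.0]})
df_b = pd.DataFrame({'x': [1.0, np.nan, 4.0], 'y': [np.nan, 5.0, 7.0]})
result = compare_two_dfs_nan_aware(df_a, df_b)
print(result)
```

A cell that is NaN in only one of the two frames is still reported, since that is a genuine change.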
