@ryancollingwood
Forked from yassineAlouini/compare_dfs.py
Last active September 11, 2018 02:11
Compare two Pandas DataFrames
import pandas as pd
import numpy as np
def compare_two_dfs(input_df_1, input_df_2):
    # Explicitly fill missing values with "" before comparing:
    # np.nan never compares equal to anything, itself included,
    # i.e. `np.nan == np.nan` is always False.
    df_1, df_2 = input_df_1.copy().fillna(""), input_df_2.copy().fillna("")
    ne_stacked = (df_1 != df_2).stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ["id", "col"]
    difference_locations = np.where(df_1 != df_2)
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    df = pd.DataFrame({"from": changed_from, "to": changed_to}, index=changed.index)
    return df
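To see what each step produces, here is a minimal walk-through of the same logic applied inline to two small frames. The sample data and the names `before`, `after`, and `diff` are made up for illustration, and the sketch assumes both frames share the same index and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not part of the gist.
before = pd.DataFrame(
    {"name": ["ann", "bob", "cy"], "city": ["Oslo", None, "Rome"]},
    index=[10, 11, 12],
)
after = pd.DataFrame(
    {"name": ["ann", "rob", "cy"], "city": ["Oslo", None, "Riga"]},
    index=[10, 11, 12],
)

# Same steps as compare_two_dfs, applied inline.
df_1, df_2 = before.fillna(""), after.fillna("")

# Boolean Series indexed by (row label, column label).
ne_stacked = (df_1 != df_2).stack()

# Keep only the entries where the two frames differ.
changed = ne_stacked[ne_stacked]
changed.index.names = ["id", "col"]

# Positions of the differences, in the same row-major order as stack().
rows, cols = np.where((df_1 != df_2).to_numpy())

diff = pd.DataFrame(
    {"from": df_1.values[rows, cols], "to": df_2.values[rows, cols]},
    index=changed.index,
)
print(diff)
```

The missing `city` value in row 11 is filled with "" on both sides, so it is not reported as a change; only `name` in row 11 and `city` in row 12 show up.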
@ryancollingwood
Author

Using blank strings as the fillna value probably isn't as performant as using some other known sentinel value.
But meh for now.
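One possible way to avoid a fill value altogether is to treat cells that are missing on both sides as equal. The sketch below is an assumption, not the gist's method: `compare_without_sentinel` is a hypothetical name, it expects both frames to share the same index and columns, and whether it is actually faster has not been measured.

```python
import numpy as np
import pandas as pd


def compare_without_sentinel(df_1, df_2):
    """Hypothetical variant of compare_two_dfs that skips fillna("")."""
    # A cell counts as changed if the values are unequal,
    # unless both sides are missing.
    both_missing = df_1.isna() & df_2.isna()
    mask = (df_1 != df_2) & ~both_missing

    # Row-major positions of the changed cells.
    rows, cols = np.where(mask.to_numpy())

    # Build the (id, col) index directly from the positions.
    index = pd.MultiIndex.from_arrays(
        [df_1.index[rows], df_1.columns[cols]], names=["id", "col"]
    )
    return pd.DataFrame(
        {"from": df_1.values[rows, cols], "to": df_2.values[rows, cols]},
        index=index,
    )


# Hypothetical sample data with NaN in matching and non-matching cells.
a = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 6.0]})
b = pd.DataFrame({"x": [1.0, np.nan, 4.0], "y": [7.0, 5.0, 6.0]})
result = compare_without_sentinel(a, b)
print(result)
```

A side effect of this variant is that numeric columns keep their numeric dtype, and a value that goes from missing to present is reported with NaN in the `from` column rather than an empty string.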