@dyerrington
Last active May 4, 2016 18:53
One limitation of interpolate() in pandas is that it only works on continuous (numeric) data. ffill(), by contrast, can also fill object columns. By iterating over groupby() groups, we can fill in missing categorical / object-type cells in a DataFrame one subset at a time.
import numpy as np
import pandas as pd

data = [
    ["blabla", "234234234", "yoyoyo", "Super Store235"],
    [np.nan, np.nan, np.nan, "Super Store"],
    [np.nan, np.nan, np.nan, "Super Store"],
    ["yo yo yo", 456, 789, "Super Store"],
    [np.nan, np.nan, np.nan, "Super Store"],
    [np.nan, np.nan, np.nan, "Super Store"],
    [123, 456, 789, "Super Store2"],
    [np.nan, np.nan, np.nan, "Super Store2"],
    [np.nan, np.nan, np.nan, "Super Store2"],
    [np.nan, np.nan, np.nan, "Super Store2"],
    [np.nan, np.nan, np.nan, "Super Store2"],
]
df = pd.DataFrame(data, columns=["county_number", "county_district", "random_num", "store"])

# Fill the missing cells one store at a time.
for group_label, group_df in df.groupby("store"):
    mask = df["store"] == group_label
    # sort_values() replaces the removed DataFrame.sort(). NaN rows sort last,
    # so ffill() fills them from the last populated row of the group.
    # The assignment aligns on the index, so the original row order is kept.
    df.loc[mask] = group_df.sort_values("county_number").ffill()
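The same idea can probably be written without an explicit loop. The sketch below is not part of the original gist: it uses a small hypothetical frame (the `stores`, `county`, and `district` names are made up for illustration) and assumes each group has at least one populated row. It takes the first non-null value per group rather than sorting and forward-filling, which gives the same result when each group has a single populated row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one labelled row per store, the rest missing.
stores = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "county": [np.nan, "Polk", np.nan, "Linn", np.nan],
    "district": [np.nan, "North", np.nan, "East", np.nan],
})

fill_cols = ["county", "district"]

# "first" picks the first non-null value in each group; transform()
# broadcasts that value back onto every row of the group.
first_known = stores.groupby("store")[fill_cols].transform("first")

# Keep existing values and use the group value only where a cell is missing.
filled = stores.copy()
filled[fill_cols] = stores[fill_cols].where(stores[fill_cols].notna(), first_known)

print(filled)
```

Unlike a plain ffill(), this also fills rows that appear before the populated row in a group, which is the job the sort does in the loop version.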