@ramhiser
Last active August 29, 2015 14:15
Reindexing Pandas DataFrame with MultiIndex.from_product triggers missing values
import pandas as pd
df = pd.DataFrame([['01-02-2015', 'a', 17],
                   ['01-09-2015', 'a', 42],
                   ['01-30-2015', 'a', 19],
                   ['01-02-2015', 'b', 23],
                   ['01-23-2015', 'b', 1],
                   ['01-30-2015', 'b', 13]])
df.columns = ['date', 'group', 'response']
df.set_index(['date', 'group'], inplace=True)
#date_idx = pd.date_range('01-02-2015', '01-30-2015', freq='7D')
date_idx = ['01-02-2015', '01-09-2015', '01-16-2015', '01-23-2015', '01-30-2015']
group_idx = ['a', 'b']
idx_product = pd.MultiIndex.from_product([date_idx, group_idx], names=['date', 'group'])
df.reindex(idx_product, fill_value=0)
# Pandas 0.15.2
import pandas as pd
# Goal: Fill missing date/group pairs with response = 0 using Cartesian product
df = pd.DataFrame([['01-02-2015', 'a', 17],
                   ['01-09-2015', 'a', 42],
                   ['01-30-2015', 'a', 19],
                   ['01-02-2015', 'b', 23],
                   ['01-23-2015', 'b', 1],
                   ['01-30-2015', 'b', 13]])
df.columns = ['date', 'group', 'response']
df.set_index(['date', 'group'], inplace=True)
# Cartesian product of factors to fill missing values
date_idx = pd.date_range('01-02-2015', '01-30-2015', freq='7D')
group_idx = ['a', 'b']
iterables = [date_idx, group_idx]
idx_product = pd.MultiIndex.from_product(iterables, names=['date', 'group'])
# df.response is all NaN values after this line
df = df.reindex(idx_product)
ramhiser commented Feb 9, 2015

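Replacing date_idx with a list of strings fixes the issue. The NaN values appear because pd.date_range yields Timestamp labels while the DataFrame's date index level holds plain strings, so none of the new labels match an existing row during the reindex.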
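
A minimal sketch of an alternative fix, assuming you would rather keep pd.date_range than hand-write the date strings: parse the date column with pd.to_datetime before calling set_index, so both sides of the reindex use Timestamp labels. The format string and the name `filled` are illustrative choices, not part of the original gist.

```python
import pandas as pd

# Same data as in the gist; the dates start out as plain strings.
df = pd.DataFrame(
    [['01-02-2015', 'a', 17],
     ['01-09-2015', 'a', 42],
     ['01-30-2015', 'a', 19],
     ['01-02-2015', 'b', 23],
     ['01-23-2015', 'b', 1],
     ['01-30-2015', 'b', 13]],
    columns=['date', 'group', 'response'],
)

# Assumption: the strings are month-day-year. Parsing them up front means
# the index holds Timestamps, the same label type pd.date_range produces.
df['date'] = pd.to_datetime(df['date'], format='%m-%d-%Y')
df = df.set_index(['date', 'group'])

# Cartesian product of weekly dates and groups.
date_idx = pd.date_range('01-02-2015', '01-30-2015', freq='7D')
group_idx = ['a', 'b']
idx_product = pd.MultiIndex.from_product([date_idx, group_idx],
                                         names=['date', 'group'])

# Labels now match, so only the genuinely missing pairs are filled with 0.
filled = df.reindex(idx_product, fill_value=0)
print(filled)
```

With matching label types, the six original rows keep their response values and only the four absent date/group pairs receive the fill value.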
