@CaptainAshis
Created September 25, 2018 06:58
# Step 8
# Next we'll fill in missing values to avoid complications with NA's.
# NA (not available) is how Pandas indicates missing values; many models have problems when missing values are present,
# so it's always important to think about how to deal with them.
# Here we pick arbitrary signal values to stand in for the missing entries: 1900 for the years and 1 for the month and week.
for df in (joined, joined_test):
    df['CompetitionOpenSinceYear'] = df.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)
    df['CompetitionOpenSinceMonth'] = df.CompetitionOpenSinceMonth.fillna(1).astype(np.int32)
    df['Promo2SinceYear'] = df.Promo2SinceYear.fillna(1900).astype(np.int32)
    df['Promo2SinceWeek'] = df.Promo2SinceWeek.fillna(1).astype(np.int32)
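
# A minimal sketch of the fill-then-cast pattern used above, on a hypothetical toy frame
# (the name `toy` and its values are made up for illustration, not taken from the real data).
# It shows why fillna() has to come before astype(): a column containing NA is stored as float.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing year.
toy = pd.DataFrame({"CompetitionOpenSinceYear": [2008.0, np.nan, 2013.0]})

# With an NA present, pandas stores the column as float64.
dtype_before = str(toy.CompetitionOpenSinceYear.dtype)

# Replace the NA with the signal value, then cast to a compact integer type.
filled = toy.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)

print(dtype_before)     # float64
print(filled.tolist())  # [2008, 1900, 2013]
```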
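# Next we'll extract the features "CompetitionOpenSince" and "CompetitionDaysOpen".
# Note that pd.to_datetime() assembles the dates from the year and month columns in one vectorized call (the day is fixed at 15),
# and that subtracting two datetime columns gives timedeltas, from which .dt.days takes the whole number of days.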
for df in (joined, joined_test):
    df["CompetitionOpenSince"] = pd.to_datetime(dict(year=df.CompetitionOpenSinceYear,
                                                     month=df.CompetitionOpenSinceMonth, day=15))
    df["CompetitionDaysOpen"] = df.Date.subtract(df.CompetitionOpenSince).dt.days
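
# The same two steps can be tried on a small hypothetical frame (`toy` and its rows are
# illustrative assumptions, not real data). The second row uses the 1900/1 signal values,
# which shows how large "CompetitionDaysOpen" becomes for rows that were originally missing.

```python
import pandas as pd

# Hypothetical toy rows: one real opening date, one filled with the signal values.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "CompetitionOpenSinceYear": [2015, 1900],
    "CompetitionOpenSinceMonth": [7, 1],
})

# Build a datetime column from year/month columns, with the day fixed at 15.
toy["CompetitionOpenSince"] = pd.to_datetime(dict(year=toy.CompetitionOpenSinceYear,
                                                  month=toy.CompetitionOpenSinceMonth,
                                                  day=15))

# Subtracting datetimes yields timedeltas; .dt.days extracts whole days.
toy["CompetitionDaysOpen"] = toy.Date.subtract(toy.CompetitionOpenSince).dt.days

print(toy.CompetitionDaysOpen.iloc[0])  # 16
```

# The signal-value row ends up with a day count in the tens of thousands, so it may be worth
# capping or otherwise handling such values before modelling.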