Created
September 25, 2018 06:58
# Step 8
# Imports used below (normally done once at the top of the file).
import numpy as np
import pandas as pd

# Next we'll fill in missing values to avoid complications with NAs.
# NA (not available) is how pandas indicates missing values; many models have problems when missing values are present,
# so it's always important to think about how to deal with them.
# In these cases, we fill with arbitrary signal values (1900 for years, 1 for months and weeks);
# a signal value is only unambiguous when it doesn't otherwise appear in the data.
# `joined` and `joined_test` are the DataFrames built in the earlier steps.
for df in (joined, joined_test):
    df['CompetitionOpenSinceYear'] = df.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)
    df['CompetitionOpenSinceMonth'] = df.CompetitionOpenSinceMonth.fillna(1).astype(np.int32)
    df['Promo2SinceYear'] = df.Promo2SinceYear.fillna(1900).astype(np.int32)
    df['Promo2SinceWeek'] = df.Promo2SinceWeek.fillna(1).astype(np.int32)
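# A minimal sketch of the fill-then-cast pattern above, run on a toy frame. The frame `toy` and its values
# are hypothetical, made up for illustration; only the column names and fill values come from the code above.
# It shows why fillna() comes before astype(): a column holding NaN is stored as float64 and cannot be cast
# to an integer dtype until the gaps are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second row is missing both year and month.
toy = pd.DataFrame({
    "CompetitionOpenSinceYear": [2008.0, np.nan, 2012.0],
    "CompetitionOpenSinceMonth": [9.0, np.nan, 3.0],
})

# Casting while NaN is still present raises a ValueError.
try:
    toy["CompetitionOpenSinceYear"].astype(np.int32)
    cast_failed = False
except ValueError:
    cast_failed = True

# Fill first, then cast: the same pattern as in the loop above.
filled = toy.copy()
filled["CompetitionOpenSinceYear"] = filled["CompetitionOpenSinceYear"].fillna(1900).astype(np.int32)
filled["CompetitionOpenSinceMonth"] = filled["CompetitionOpenSinceMonth"].fillna(1).astype(np.int32)

print(cast_failed)
print(filled.dtypes)
print(filled)
```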
# Next we'll extract features "CompetitionOpenSince" and "CompetitionDaysOpen".
# Note that pd.to_datetime() builds a date column from the year and month columns (the day is fixed at 15),
# and subtracting it from Date gives the number of days the competitor has been open.
for df in (joined, joined_test):
    df["CompetitionOpenSince"] = pd.to_datetime(dict(year=df.CompetitionOpenSinceYear,
                                                     month=df.CompetitionOpenSinceMonth, day=15))
    df["CompetitionDaysOpen"] = df.Date.subtract(df.CompetitionOpenSince).dt.days
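# A minimal sketch of the date construction above on a toy frame. The frame `toy` and its values are
# hypothetical; only the column names and the fixed day of 15 come from the code above.
# It assembles a date from separate year and month columns, then takes the difference in whole days.

```python
import pandas as pd

# Hypothetical toy data: row 0 has a real opening date, row 1 carries the 1900/1 fill values.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "CompetitionOpenSinceYear": [2015, 1900],
    "CompetitionOpenSinceMonth": [7, 1],
})

# pd.to_datetime accepts year/month/day components; the scalar day is broadcast to every row.
toy["CompetitionOpenSince"] = pd.to_datetime(dict(
    year=toy.CompetitionOpenSinceYear,
    month=toy.CompetitionOpenSinceMonth,
    day=15,
))

# Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
toy["CompetitionDaysOpen"] = toy.Date.subtract(toy.CompetitionOpenSince).dt.days

print(toy[["CompetitionOpenSince", "CompetitionDaysOpen"]])
```

# In this toy example the row filled with 1900 ends up with a day count in the tens of thousands,
# so rows that were originally missing stand out sharply from the rest in "CompetitionDaysOpen".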