# Step 1.
# Unpack the datasets into their respective variables.
train, test, store, store_states, state_names, googletrend, weather = tables
# Step 2.
# Check the length of the train and test datasets.
len(train), len(test)
# (1017209, 41088)
# Step 3.
# We turn StateHoliday into booleans, to make it more convenient for modeling.
# We can do calculations on pandas fields using notation very similar (often identical) to numpy.
train.StateHoliday = train.StateHoliday != '0'
test.StateHoliday = test.StateHoliday != '0'
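
# Illustrative sketch (not part of the original pipeline): the values below are
# made up. Comparing a string column with '0' is applied elementwise and returns
# a boolean Series, much like comparing a numpy array with a scalar.
import pandas as pd

demo_holiday = pd.Series(["0", "a", "b", "0", "c"])  # hypothetical holiday codes
demo_is_holiday = demo_holiday != "0"                # True wherever the code is not '0'
# Note: this assumes the column holds strings; an integer 0 would not equal '0'.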
# Step 4.
# `join_df` is a function for joining tables on specific fields. By default, it does a left outer join of `right` onto
# `left` using the given fields for each table.
# Pandas does joins using the `merge` method. The `suffixes` argument describes the naming convention for duplicate fields.
# We've elected to leave the duplicate field names on the left untouched, and append "_y" to those on the right.
def join_df(left, right, left_on, right_on=None, suffix='_y'):
    # left_on: column(s) in the left table to match against the right table.
    # right_on: column(s) in the right table to match against the left table; defaults to left_on.
    # If both tables share a column name, the right table's column gets the suffix ('_y' by default).
    if right_on is None:
        right_on = left_on
    return left.merge(right, how='left', left_on=left_on, right_on=right_on,
                      suffixes=("", suffix))

# Left-join state_names onto weather, matching weather's "file" column to state_names' "StateName" column.
weather = join_df(weather, state_names, "file", "StateName")
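
# Illustrative sketch (not part of the original pipeline): a toy left join that
# calls `merge` directly with the same arguments `join_df` passes. The frames,
# column values, and `demo_*` names are made up for demonstration.
import pandas as pd

demo_left = pd.DataFrame({"file": ["Bayern", "Berlin", "Unknown"], "Value": [1, 2, 3]})
demo_right = pd.DataFrame({"StateName": ["Bayern", "Berlin"],
                           "State": ["BY", "BE"],
                           "Value": [10, 20]})
demo_joined = demo_left.merge(demo_right, how="left", left_on="file",
                              right_on="StateName", suffixes=("", "_y"))
# Every left row is kept. The unmatched "Unknown" row gets NaN in the right-hand
# columns, and the right table's duplicate `Value` column is renamed `Value_y`.
# Counting nulls in a right-hand column is a quick way to spot unmatched keys.
demo_unmatched = int(demo_joined["State"].isnull().sum())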