@CaptainAshis
Last active October 11, 2018 16:03
# Step 1.
# Unpack the datasets into their respective variables.
# (`tables` is assumed to be a sequence of DataFrames loaded earlier, in this order.)
train, test, store, store_states, state_names, googletrend, weather = tables
# Step 2.
# Check the number of rows in the train and test datasets.
len(train), len(test)
# (1017209, 41088)
# Step 3.
# We turn StateHoliday into booleans, which are more convenient for modeling: any value other than '0' counts as a holiday.
# We can do calculations on pandas fields using notation very similar (often identical) to numpy.
train.StateHoliday = train.StateHoliday != '0'
test.StateHoliday = test.StateHoliday != '0'
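# A minimal sketch of the comparison above on toy data. The values are made up
# for illustration; the assumption is that StateHoliday holds strings such as
# '0', 'a', 'b'. The `_demo_*` names are hypothetical and not part of the pipeline.
import pandas as pd
_demo_holiday = pd.Series(['0', 'a', '0', 'b'])
_demo_is_holiday = _demo_holiday != '0'   # element-wise comparison, returns a boolean Series
# _demo_is_holiday.tolist() -> [False, True, False, True]
# Caveat: if a column mixes the integer 0 with the string '0', the integer
# compares as "not equal" and would be flagged as a holiday. Casting to str
# first is one possible way to guard against that.
_demo_mixed = pd.Series([0, '0', 'a'], dtype=object)
_demo_mixed_raw = _demo_mixed != '0'                # [True, False, True]
_demo_mixed_safe = _demo_mixed.astype(str) != '0'   # [False, False, True]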
# Step 4.
# `join_df` is a function for joining tables on specific fields. By default, it does a left outer join of `right` onto
# `left` using the given fields for each table, so every row of `left` is kept.
# Pandas does joins using the `merge` method. The `suffixes` argument describes the naming convention for duplicate fields.
# We've elected to leave the duplicate field names on the left untouched, and append "_y" to those on the right.
def join_df(left, right, left_on, right_on=None, suffix='_y'):
    # left_on: the column(s) of the left table to match against the right table.
    # right_on: the column(s) of the right table to match against the left table; defaults to left_on.
    # If both tables share a non-key column name, the right table's column gets the suffix ('_y' by default).
    if right_on is None:
        right_on = left_on
    return left.merge(right, how='left', left_on=left_on, right_on=right_on,
                      suffixes=("", suffix))
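# A small sketch of what the merge call inside `join_df` does, repeated here on
# two made-up frames so it runs on its own. All `_demo_*` names and values are
# hypothetical, chosen only for illustration.
import pandas as pd
_demo_left = pd.DataFrame({'file': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
_demo_right = pd.DataFrame({'StateName': ['A', 'B'], 'Value': [10, 20]})
_demo_joined = _demo_left.merge(_demo_right, how='left', left_on='file',
                                right_on='StateName', suffixes=("", "_y"))
# Columns are now: file, Value, StateName, Value_y.
# The left 'Value' keeps its name and the right one becomes 'Value_y'.
# Row 'C' has no match on the right, so its StateName and Value_y are NaN,
# but the row itself is kept because this is a left join.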
# Join state_names onto weather, matching weather's "file" column against state_names' "StateName" column.
weather = join_df(weather, state_names, "file", "StateName")
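# After a left join it can be worth checking that every left-hand row found a
# match. One possible way, sketched on made-up data, is pandas' `indicator=True`
# option of `merge`, which adds a `_merge` column. This check is an assumption
# about good practice, not a step from the original pipeline, and the `_demo_*`
# names and the 'State' column values are hypothetical.
import pandas as pd
_demo_weather = pd.DataFrame({'file': ['Bayern', 'Berlin', 'Atlantis']})
_demo_states = pd.DataFrame({'StateName': ['Bayern', 'Berlin'], 'State': ['BY', 'BE']})
_demo_check = _demo_weather.merge(_demo_states, how='left', left_on='file',
                                  right_on='StateName', indicator=True)
# Rows marked 'left_only' had no partner in the right-hand table.
_demo_unmatched = int((_demo_check['_merge'] == 'left_only').sum())
# _demo_unmatched -> 1 (the 'Atlantis' row)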