@WillKoehrsen
Created May 16, 2018 15:15
import numpy as np
import pandas as pd

# `data` is assumed to be an existing pandas DataFrame
# Copy the original data
features = data.copy()

# Select the numeric columns (copy so new columns are added to an independent frame)
numeric_subset = data.select_dtypes('number').copy()

# Create columns with log of numeric columns
for col in numeric_subset.columns:
    # Skip the Energy Star Score column
    if col == 'score':
        continue
    numeric_subset['log_' + col] = np.log(numeric_subset[col])

# Select the categorical columns
categorical_subset = data[['Borough', 'Largest Property Use Type']]

# One hot encode
categorical_subset = pd.get_dummies(categorical_subset)

# Join the two dataframes using concat
# Make sure to use axis = 1 to perform a column bind
features = pd.concat([numeric_subset, categorical_subset], axis = 1)
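
`np.log` returns `-inf` for zeros and `NaN` for negative values, so the log columns created above can contain non-finite entries. Below is a minimal sketch of one way to guard against that. The helper name `add_log_columns`, its `skip` parameter, and the toy data are hypothetical and not part of the original gist, and the choice to turn non-positive values into `NaN` is an assumption rather than the author's method.

```python
import numpy as np
import pandas as pd


def add_log_columns(df, skip=('score',)):
    """Return the numeric columns of df plus a log_<col> column for each.

    Hypothetical helper, not from the original gist. Assumption: values
    that are zero or negative have no real logarithm and become NaN.
    """
    out = df.select_dtypes('number').copy()
    for col in list(out.columns):
        # Leave the target column (and anything else in skip) untransformed
        if col in skip:
            continue
        # Replace non-positive values with NaN before taking the log
        positive = out[col].where(out[col] > 0)
        out['log_' + col] = np.log(positive)
    return out


# Toy data for illustration only
toy = pd.DataFrame({'score': [50, 75], 'area': [100.0, 0.0]})
result = add_log_columns(toy)
```

Here `result` keeps `score` and `area` and gains `log_area`, whose second entry is `NaN` because the corresponding area is zero. Whether to mask, drop, or shift such values depends on the dataset.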