@jurand71
Created June 16, 2022 10:31
# Import libraries
import numpy as np
import pandas as pd
# Display all columns
pd.set_option('display.max_columns', None)
# Import Houseprice data from GitHub
data = pd.read_csv('https://github.com/jurand71/datasets/raw/master/HouseSalePriceCompetition/houseprice.csv')
# Three categorical variables were chosen for count encoding
usecols = ['Neighborhood','Exterior1st','Exterior2nd']
# Copy the subset so that modifying it in place does not raise SettingWithCopyWarning
data = data[usecols].copy()
# Number of distinct categories in each selected variable
for col in usecols:
    print(col, ': ', len(data[col].unique()))
# Count how often each category occurs and replace every category with its count
def count_encoding(df, variable):
    count_map = df[variable].value_counts().to_dict()
    df[variable] = df[variable].map(count_map)

for var in usecols:
    count_encoding(data, var)
data
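
A minimal sketch of the same count encoding on toy data, assuming the counts should be learned from one frame and then applied to another. The helper names `fit_count_map` and `apply_count_map`, the toy values, and the choice to map unseen categories to 0 are illustrative assumptions and are not part of the original gist.

```python
import pandas as pd

# Toy frames with made-up values (not taken from houseprice.csv)
train = pd.DataFrame(
    {"Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown", "NAmes", "CollgCr"]}
)
test = pd.DataFrame({"Neighborhood": ["NAmes", "Edwards", "OldTown"]})


def fit_count_map(df, variable):
    """Hypothetical helper: learn {category: count} from a frame."""
    return df[variable].value_counts().to_dict()


def apply_count_map(df, variable, count_map):
    """Hypothetical helper: return a new Series of counts.

    Categories missing from count_map become NaN after .map();
    filling them with 0 is an assumption made for this sketch.
    """
    return df[variable].map(count_map).fillna(0).astype(int)


count_map = fit_count_map(train, "Neighborhood")
train_encoded = apply_count_map(train, "Neighborhood", count_map)
test_encoded = apply_count_map(test, "Neighborhood", count_map)

print(count_map)
print(train_encoded.tolist())
print(test_encoded.tolist())
```

Unlike `count_encoding` above, these helpers return a new Series instead of overwriting the column, which keeps the original categories available for inspection.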