@francoisstamant
Created May 28, 2020 15:57
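# df is assumed to be an already loaded DataFrame whose columns hold raw strings
import numpy as np
import pandas as pd

# Clean size: keep the first run of digits, convert to a number, then add 0.5
df['size'] = pd.to_numeric(df['size'].str.extract(r'(\d+)', expand=False))
df['size'] += 0.5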
# Clean location: extract the distance to the centre; missing values default to 1
df['distance_center'] = pd.to_numeric(df['distance_center'].str.extract(r'(\d+)', expand=False))
df['distance_center'] = df['distance_center'].fillna(1)
# Clean price: keep only the digits, drop the last two, and convert to a number
digits = df['price'].str.replace(r'\D', '', regex=True).str[:-2]
# Strings left empty become NaN and are dropped in the final step
df['price'] = pd.to_numeric(digits, errors='coerce')
# Drop rows with missing values, then remove price outliers
df = df.dropna()
df = df[df['price'] < 10000]
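
To see what these steps do to concrete values, here is a minimal sketch that runs the same cleaning on a tiny DataFrame. The sample rows and the helper name `clean_listings` are invented for illustration and are not part of the gist. Only the column names and the transformations follow the code above.

```python
import pandas as pd

# Hypothetical raw listings: column names follow the gist, values are invented.
raw = pd.DataFrame({
    "size": ["3.5 rooms", "2 rooms", None],
    "distance_center": ["4 km from centre", None, "12 km"],
    "price": ["$1,250.00", "$980.00", "$2,100.00"],
})


def clean_listings(df):
    """Apply the gist's cleaning steps to a copy of df (illustrative helper)."""
    df = df.copy()
    # Size: first run of digits, as a number, shifted up by 0.5.
    df["size"] = pd.to_numeric(df["size"].str.extract(r"(\d+)", expand=False)) + 0.5
    # Distance: first run of digits; rows with no distance default to 1.
    df["distance_center"] = pd.to_numeric(
        df["distance_center"].str.extract(r"(\d+)", expand=False)
    ).fillna(1)
    # Price: keep digits only, then drop the last two digits unconditionally.
    # This assumes every price string ends in two decimal digits.
    digits = df["price"].str.replace(r"\D", "", regex=True).str[:-2]
    df["price"] = pd.to_numeric(digits, errors="coerce")
    # Drop incomplete rows, then filter out prices of 10000 or more.
    df = df.dropna()
    return df[df["price"] < 10000]


cleaned = clean_listings(raw)
print(cleaned)
```

Two details are worth checking against real data. The pattern `(\d+)` keeps only the first run of digits, so "3.5 rooms" is read as 3 before 0.5 is added. Slicing off the last two digits of the price only gives the intended value when the raw string always includes two decimal places.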