@JoeReis
Last active August 29, 2015 14:03
Simple Python Script To Convert A Table Into A Machine-Learning-Ready Format
"""
Goal: transform a delimited text file (CSV) into a numeric, machine-readable form
"""
import pandas as pd
import numpy as np
from sklearn import preprocessing
#open the file
df = pd.read_csv('<your file name>')
#select categoricals
categoricals = df.select_dtypes(include=['object'])
#select non-categoricals
non_cats = df.select_dtypes(include=['int64', 'float64'])
#collect each categorical column as a Series
df_categoricals = []
for i in categoricals:
    df_categoricals.append(df[i])
#transform categoricals into dummy variables
dummies = []
for i in df_categoricals:
    #prefix with the column name so equal values in different columns do not collide
    dummies.append(pd.get_dummies(i, prefix=i.name))
dummies.append(non_cats)
df_new = pd.concat(dummies, axis=1)
#fill in nulls
df_new.fillna(0, inplace = True)
print(df_new)
#apply standardization (zero mean, unit variance per column) to new df
X_scaled = preprocessing.scale(df_new)
print(X_scaled)
#export the dummy-encoded (unscaled) data to csv, remove header; X_scaled is not written
df_new.to_csv("output.csv", header=False)
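The script above can be tried on a small in-memory table without a CSV file. The following is a minimal sketch, not the gist author's code: the function name `to_ml_ready` and the sample columns `color` and `size` are hypothetical, and the standardization is done with plain pandas arithmetic (population standard deviation, `ddof=0`) instead of `preprocessing.scale`, which should give the same zero-mean, unit-variance result as long as no column is constant.

```python
import pandas as pd


def to_ml_ready(df):
    """Dummy-encode non-numeric columns, keep numeric ones, fill nulls with 0.

    Hypothetical helper that mirrors the steps of the script above.
    """
    non_cats = df.select_dtypes(include=['number'])
    categoricals = df.select_dtypes(exclude=['number'])
    # Passing a DataFrame makes get_dummies prefix each new column
    # with the name of the column it came from.
    dummies = pd.get_dummies(
        categoricals, columns=list(categoricals.columns), dtype=float
    )
    return pd.concat([dummies, non_cats], axis=1).fillna(0)


# Made-up sample data with one missing value per column.
sample = pd.DataFrame({
    'color': ['red', 'blue', None],
    'size': [1.0, None, 3.0],
})

features = to_ml_ready(sample)

# Standardize each column: subtract the mean, divide by the population
# standard deviation (assumes no column is constant).
X_scaled = (features - features.mean()) / features.std(ddof=0)

print(features)
print(X_scaled)
```

A missing category becomes a row of zeros across that column's dummies, and a missing number becomes 0, matching the `fillna(0)` step in the script.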