
@alistairwalsh
Created September 15, 2015 08:31
Impute missing values
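import numpy as np
from sklearn.preprocessing import Imputer  # available in scikit-learn < 0.22 only
# generate some data: 100 samples x 10 features drawn from N(0, 1)
X = np.random.randn(100, 10)
# make values close to zero 'NaN' to simulate missing data
X[(X > -0.05) & (X < 0.05)] = np.nan
print(X)
# replace each NaN with the mean of the non-missing values in its column (axis=0)
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
X = imp.fit_transform(X)
print(X)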
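The snippet above targets the scikit-learn of 2015: Imputer was deprecated in scikit-learn 0.20 and removed in 0.22, and its replacement is sklearn.impute.SimpleImputer. Below is a minimal sketch of the same steps with the newer class, assuming scikit-learn 0.20 or later. SimpleImputer always imputes column-wise, so it has no axis argument, and its default marker for missing values is np.nan rather than the string 'NaN'. The seeded generator rng and the name X_filled are additions for reproducibility and clarity, not part of the original gist.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same kind of synthetic data as the original: 100 samples x 10 features from N(0, 1).
# Seeding the generator (an addition) makes the run reproducible.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))

# Mark values close to zero as missing, as in the original snippet
X[(X > -0.05) & (X < 0.05)] = np.nan

# missing_values=np.nan and strategy="mean" are SimpleImputer's defaults;
# imputation is always per column (feature), so there is no axis argument
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X)  # returns a new array; X keeps its NaNs

print(np.isnan(X).sum(), "missing before,", np.isnan(X_filled).sum(), "after")
```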
@alistairwalsh
Author

The working bits of code are:

imp = Imputer()

This creates an instance of the Imputer.

The arguments missing_values='NaN', strategy='mean', axis=0 are all defaults, so they don't actually need to be stated. With the default axis=0 the Imputer works column-wise: each missing value is replaced by the mean of the non-missing values in the same column (feature).

and

X = imp.fit_transform(X)

This fits the Imputer to the data (computing each column's mean from its non-missing values) and returns a new array with the missing values filled in.
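To make the fit/transform split concrete, here is a small sketch using SimpleImputer, the class that replaced Imputer in scikit-learn 0.20+ (Imputer itself was removed in 0.22). The toy arrays X_train and X_new are made up for illustration. fit() learns one mean per column and stores it in statistics_; transform() fills gaps using those stored means, even on data the imputer was not fitted on; fit_transform() does both steps on the same data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (illustrative only); NaN marks a missing value
X_train = np.array([[1.0, 2.0],
                    [np.nan, 6.0],
                    [5.0, np.nan]])
X_new = np.array([[np.nan, np.nan]])

imp = SimpleImputer(strategy="mean")
imp.fit(X_train)                    # learn one mean per column, ignoring NaNs
print(imp.statistics_)              # column means: 3.0 and 4.0
print(np.nanmean(X_train, axis=0))  # the same means computed directly with NumPy

# transform() reuses the means learned from X_train; it does not refit on X_new
print(imp.transform(X_new))         # the single row becomes 3.0, 4.0

# fit_transform() is fit() followed by transform() on the same data
print(imp.fit_transform(X_train))   # rows: (1, 2), (3, 6), (5, 4)
```

Keeping fit and transform separate matters when the same column means should be applied to new data (for example a test set) rather than recomputed from it.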
