Using the features that have no missing values, we can predict the nulls with a machine learning algorithm. This method may result in better accuracy, unless the missing values are expected to have very high variance. Here we use linear regression to replace the nulls in the feature ‘age’, using the other available features. One …
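import pandas as pd
from sklearn.linear_model import LinearRegression

# `data` is assumed to be a DataFrame loaded earlier (not shown in this gist)
linreg = LinearRegression()
# Keep the three predictors and the target; 'age' still contains nulls here
data_with_null = data[['passengerid', 'class', 'survived', 'age']].copy()
# Rows with a known age form the training set
data_without_null = data_with_null.dropna()
train_data_x = data_without_null.iloc[:, :3]
train_data_y = data_without_null.iloc[:, 3]
linreg.fit(train_data_x, train_data_y)
# Predict an age for every row from the three predictor columns
test_data = data_with_null.iloc[:, :3]
age_predicted = pd.DataFrame(linreg.predict(test_data), columns=['age'], index=test_data.index)
# Fill only the missing ages; known ages are left unchanged
data_with_null['age'] = data_with_null['age'].fillna(age_predicted['age'])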
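The same idea can be wrapped in a small helper and tried on made-up data. This is only a sketch under stated assumptions: `impute_with_regression` is a hypothetical name, the `demo` table is synthetic (it is not the dataset used above), and every predictor column is assumed to be numeric with no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def impute_with_regression(df, target, predictors):
    """Return a copy of df with nulls in `target` filled by a linear model.

    Hypothetical helper: assumes every column in `predictors` is numeric
    and has no missing values.
    """
    out = df.copy()
    missing = out[target].isna()
    if not missing.any() or missing.all():
        return out  # nothing to fill, or no known values to train on
    model = LinearRegression()
    # Train only on rows where the target is known
    model.fit(out.loc[~missing, predictors], out.loc[~missing, target])
    # Predict only for rows where the target is missing
    out.loc[missing, target] = model.predict(out.loc[missing, predictors])
    return out


# Synthetic stand-in table (all values are made up for illustration)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "passengerid": np.arange(1, n + 1),
    "class": rng.integers(1, 4, size=n),
    "survived": rng.integers(0, 2, size=n),
})
demo["age"] = 45.0 - 6.0 * demo["class"] + rng.normal(0.0, 5.0, size=n)
# Blank out 20% of the ages to simulate missing values
demo.loc[demo.sample(frac=0.2, random_state=0).index, "age"] = np.nan

filled = impute_with_regression(demo, "age", ["passengerid", "class", "survived"])
print(filled["age"].isna().sum())  # number of nulls left in 'age'
```

Training on the known rows and predicting only for the missing ones avoids overwriting observed ages, and returning a copy leaves the original table untouched.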
@toshihiroryuu
Author

Age is predicted from the other available features so that its missing values can be filled in later, giving the final model a dataset with no missing data.
