Last active
April 29, 2019 11:05
Using the features that have no missing values, we can predict the nulls with a machine learning algorithm. This method may give better accuracy, unless the missing values are expected to have very high variance. We will use linear regression to replace the nulls in the feature ‘age’, using the other available features. One …
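import pandas as pd
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
# Columns used for the imputation; 'age' (last column) is the target and contains nulls.
data_with_null = data[['passengerid', 'class', 'survived', 'age']].copy()
# Training rows: only those where 'age' is known.
data_without_null = data_with_null.dropna()
train_data_x = data_without_null.iloc[:, :3]
train_data_y = data_without_null.iloc[:, 3]
linreg.fit(train_data_x, train_data_y)
# Predict 'age' for every row from the three predictor columns.
test_data = data_with_null.iloc[:, :3]
age_predicted = pd.DataFrame({'age': linreg.predict(test_data)}, index=test_data.index)
# Replace only the missing ages with the predicted ones.
data_with_null['age'] = data_with_null['age'].fillna(age_predicted['age'])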
Age is predicted from the other available features. The predicted values later replace the missing ages, so that the final model is built on data with no missing values.
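For trying the idea without the original dataset, here is a minimal self-contained sketch. The toy table is invented for illustration (only the column names come from the snippet above), and it assumes the three predictor columns themselves contain no nulls. It differs slightly from the snippet in that it predicts only for the rows where ‘age’ is missing and writes those predictions back by boolean mask.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: column names mirror the snippet, values are invented.
data = pd.DataFrame({
    "passengerid": [1, 2, 3, 4, 5, 6],
    "class":       [1, 3, 2, 3, 1, 2],
    "survived":    [1, 0, 1, 0, 1, 0],
    "age":         [38.0, 22.0, np.nan, 35.0, np.nan, 27.0],
})

predictors = ["passengerid", "class", "survived"]
missing = data["age"].isna()  # True for rows whose age must be imputed

# Fit the regression on rows where 'age' is known.
linreg = LinearRegression()
linreg.fit(data.loc[~missing, predictors], data.loc[~missing, "age"])

# Predict only for the rows with a missing age and write the values back.
imputed = data.copy()
imputed.loc[missing, "age"] = linreg.predict(data.loc[missing, predictors])

print(imputed)
```

Working on a copy keeps the original ‘age’ column available, which makes it easy to compare the imputed values against a simpler baseline such as the column mean.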