@inoryy
Created November 18, 2015 15:22
import pandas
from sklearn.linear_model import LogisticRegression
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it.
from sklearn.model_selection import cross_val_score


def sanitize(dataset):
    # Fill missing numeric values with the column median.
    dataset["Age"] = dataset["Age"].fillna(dataset["Age"].median())
    dataset["Fare"] = dataset["Fare"].fillna(dataset["Fare"].median())
    # Encode categorical columns as integers so the model can use them.
    dataset["Sex"] = dataset["Sex"].map({"male": 0, "female": 1})
    # Missing ports are filled with "S" before encoding.
    dataset["Embarked"] = dataset["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})
    return dataset


if __name__ == '__main__':
    titanic = sanitize(pandas.read_csv("train.csv"))
    titanic_test = sanitize(pandas.read_csv("test.csv"))

    predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
    alg = LogisticRegression(random_state=1)

    # Fit on the full training set and predict the test set.
    alg.fit(titanic[predictors], titanic["Survived"])
    predictions = alg.predict(titanic_test[predictors])

    # Report mean accuracy from 3-fold cross-validation on the training set.
    scores = cross_val_score(alg, titanic[predictors], titanic["Survived"], cv=3)
    print(scores.mean())

    submission = pandas.DataFrame({"PassengerId": titanic_test["PassengerId"], "Survived": predictions})
    submission.to_csv("kaggle.csv", index=False)
inoryy commented Nov 18, 2015

Extracted and tweaked code from the "Getting started with Kaggle" tutorial for the Titanic Kaggle challenge.
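
For anyone who wants to see what `sanitize` does without downloading the Kaggle data, here is a minimal sketch on a made-up three-row frame. The `sample` frame and its values are assumptions for illustration only, not rows from `train.csv`, and the encoding is written with `Series.map`, which should give the same integer codes as the script for these columns.

```python
import pandas

# Hypothetical three-row frame standing in for train.csv; values are made up.
sample = pandas.DataFrame({
    "Age": [22.0, None, 38.0],
    "Fare": [7.25, 71.28, None],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", None, "C"],
})

# Same transformations as sanitize(): median-fill numeric gaps,
# then integer-encode the categorical columns.
sample["Age"] = sample["Age"].fillna(sample["Age"].median())
sample["Fare"] = sample["Fare"].fillna(sample["Fare"].median())
sample["Sex"] = sample["Sex"].map({"male": 0, "female": 1})
sample["Embarked"] = sample["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})

print(sample)
```

After this runs, no column should contain missing values, and `Sex` and `Embarked` hold small integers that `LogisticRegression` can consume directly.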
