@ogyalcin
Created July 14, 2018 06:57
Clean the Test Data
import pandas as pd

test = pd.read_csv("test.csv")  # load the test data
ids = test[['PassengerId']]  # keep the passenger IDs as a separate frame for the submission file
test = test.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])  # drop the columns not used as features and keep the rest
test = test.fillna(2)  # fill (instead of drop) missing values so the row count stays exactly what the submission requires
test = pd.get_dummies(test)  # convert non-numerical variables to dummy variables
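
The same cleaning steps can be tried without test.csv on a small in-memory frame. This is only a sketch: the `toy` frame and its values are made up for illustration, and only the column names follow the snippet above. It shows that filling instead of dropping keeps every row, and that `pd.get_dummies` expands the text columns into indicator columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv: three rows with a few missing values.
toy = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Name": ["A", "B", "C"],
    "Ticket": ["t1", "t2", "t3"],
    "Cabin": [np.nan, "C85", np.nan],
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "male"],
    "Age": [34.5, np.nan, 62.0],
    "Fare": [7.83, np.nan, 9.69],
    "Embarked": ["Q", "S", "Q"],
})

ids = toy[["PassengerId"]]  # IDs kept aside for the submission file
features = toy.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
features = features.fillna(2)  # same placeholder value as in the snippet above
features = pd.get_dummies(features)  # Sex and Embarked become indicator columns

print(features.shape)          # rows are preserved, columns grow
print(list(features.columns))
```

Because no row is dropped, `features` and `ids` stay the same length, so predictions made on `features` can be matched back to `ids` row by row.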