@ogyalcin
Last active October 8, 2020 14:18
Clean the Training Data
import pandas as pd

train = pd.read_csv("train.csv")  # load the training data from disk
train = train.drop(columns=['Cabin'])  # drop 'Cabin' first: it has many missing values, so dropna() below would otherwise discard every row without a cabin entry
train = train.dropna()  # then delete the remaining rows that still contain missing values
y = train['Survived']  # target: the column recording survival
X = train.drop(columns=['Survived', 'PassengerId', 'Name', 'Ticket'])  # drop the target and the identifier-like columns; keep the rest as features
X = pd.get_dummies(X)  # convert the remaining non-numeric columns to dummy (one-hot) variables
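The script reads `train.csv` from disk, so it can't be checked in isolation. Below is a minimal sketch that runs the same steps on a small in-memory DataFrame instead. The columns `PassengerId`, `Survived`, `Name`, `Ticket` and `Cabin` come from the script. `Pclass`, `Sex`, `Age` and `Embarked` are assumptions, chosen because the script's column names suggest the Kaggle Titanic training file. All row values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: column names mirror the Titanic-style data the
# script appears to use, but every value here is made up.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 1, 2],
    "Name": ["A", "B", "C", "D", "E"],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, np.nan, 35.0, 27.0],
    "Ticket": ["t1", "t2", "t3", "t4", "t5"],
    "Cabin": [np.nan, "C85", np.nan, np.nan, np.nan],
    "Embarked": ["S", "C", "S", "S", "Q"],
})

# Fraction of missing values per column (mean of the boolean null mask).
missing_share = train.isna().mean()

# Dropping incomplete rows without removing 'Cabin' first keeps very little.
rows_if_cabin_kept = len(train.dropna())

# Same order as the script: drop 'Cabin', then drop rows with any NaN left.
cleaned = train.drop(columns=["Cabin"]).dropna()

y = cleaned["Survived"]
X = cleaned.drop(columns=["Survived", "PassengerId", "Name", "Ticket"])
X = pd.get_dummies(X)  # 'Sex' and 'Embarked' become indicator columns

print(missing_share["Cabin"])  # 0.8
print(rows_if_cabin_kept)      # 1
print(len(cleaned))            # 4 (only the row with a missing Age is removed)
print(list(X.columns))
```

On this toy data, calling `dropna()` before removing `Cabin` keeps only 1 of 5 rows. Removing `Cabin` first keeps 4 of 5. That difference is the reason the script drops the column before the rows. `get_dummies` leaves the numeric columns (`Pclass`, `Age`) in place and appends one indicator column per category of each text column.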