@mmmayo13
Created March 21, 2018 23:09
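# Split full dataset into train and test sets
# (.copy() makes each slice an independent frame, so the in-place drop below is safe)
train = titanic_full.iloc[:889].copy(); train.name = 'titanic_train_clean'
test = titanic_full.iloc[889:].copy(); test.name = 'titanic_test_clean'
test.drop(['Survived'], axis=1, inplace=True)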
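import numpy as np

def validate_test_split(df, validate_percent=.25, seed=42):
    # Shuffle the index labels reproducibly, then cut them into two groups
    np.random.seed(seed)
    perm = np.random.permutation(df.index)
    m = len(df.index)
    validate_end = int(validate_percent * m)
    # .loc selects rows by label; the original .ix indexer was removed in pandas 1.0
    validate = df.loc[perm[:validate_end]]
    test = df.loc[perm[validate_end:]]
    return validate, test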
validate, test = validate_test_split(test)
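
A minimal, self-contained sketch of how the split behaves, assuming a small stand-in frame instead of the real `titanic_full` data. The `toy` frame, its column values, and its index range are hypothetical and only chosen to mimic a test slice whose index does not start at zero.

```python
import numpy as np
import pandas as pd

def validate_test_split(df, validate_percent=.25, seed=42):
    # Same logic as the snippet above, using label-based .loc
    np.random.seed(seed)
    perm = np.random.permutation(df.index)
    validate_end = int(validate_percent * len(df.index))
    return df.loc[perm[:validate_end]], df.loc[perm[validate_end:]]

# Hypothetical stand-in data: 12 rows labelled 889..900, like a tail slice
toy = pd.DataFrame(
    {'Pclass': [1, 2, 3] * 4, 'Age': list(range(20, 32))},
    index=range(889, 901),
)

validate, test = validate_test_split(toy)

# With 12 rows and validate_percent=.25, 3 rows go to validate and 9 to test
print(len(validate), len(test))
# The two parts share no rows and together cover the whole frame
print(set(validate.index).isdisjoint(test.index))
```

Because the permutation is taken over index labels rather than positions, the function works on a frame whose index starts at 889, which is the situation after slicing off the tail of the full dataset.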