@makispl
Created April 10, 2020 15:08
import pandas as pd

# Read in the tab-separated data set (the file has no header row)
spam_collection = pd.read_csv('SMSSpamCollection', sep='\t', header=None, names=['Label', 'SMS'])
# Shuffle the rows; the fixed random_state makes the split reproducible
randomized_collection = spam_collection.sample(frac=1, random_state=3)
# Index of the 80/20 split boundary
training_test_index = round(len(randomized_collection) * 0.8)
# Split into training (first 80%) and test (remaining 20%) sets
training_set = randomized_collection[:training_test_index].reset_index(drop=True)
test_set = randomized_collection[training_test_index:].reset_index(drop=True)