@LauraLangdon
Created August 10, 2021 05:02
import math

import numpy as np


def split_train_test(tweet_vectors, randomized_tweet_vectors) -> tuple:
    """
    Split into train and test sets

    :param tweet_vectors: tweets in vector form (used only for its shape)
    :param randomized_tweet_vectors: shuffled tweet vectors to split
        (assumed, from the name, to be a row-shuffled copy of tweet_vectors)
    :return: (train_set, test_set) tuple of train set and test set
    """
    n_rows = tweet_vectors.shape[0]
    x_train_dim = math.floor(0.8 * n_rows)  # Use 80% of data for train set
    # Use the remaining ~20% for the test set; taking the remainder keeps the
    # two sizes summing to n_rows even if float rounding skews 0.2 * n_rows
    x_test_dim = n_rows - x_train_dim
    y_dim = tweet_vectors.shape[1]

    # Slice rather than copy element by element; astype(int) returns new
    # integer arrays, so the inputs are left unmodified
    randomized = np.asarray(randomized_tweet_vectors)
    train_set = randomized[:x_train_dim, :y_dim].astype(int)
    test_set = randomized[x_train_dim:x_train_dim + x_test_dim, :y_dim].astype(int)
    return train_set, test_set
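
The function above expects an already-shuffled array, but the gist does not show how `randomized_tweet_vectors` is produced. The following is a minimal sketch of one way to shuffle and split in a single step, assuming the tweets are rows of a 2-D NumPy array. The helper name `shuffle_and_split`, its `train_fraction` and `seed` parameters, and the toy data are hypothetical and not part of the original gist.

```python
import math

import numpy as np


def shuffle_and_split(tweet_vectors, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle rows, then split into train and test sets."""
    rng = np.random.default_rng(seed)
    # permutation() returns a row-shuffled copy; the input array is untouched
    randomized = rng.permutation(tweet_vectors)
    n_train = math.floor(train_fraction * randomized.shape[0])
    # Everything after the first n_train rows becomes the test set
    return randomized[:n_train], randomized[n_train:]


# Toy data: 10 "tweets" with 3 features each (made up for illustration)
toy_vectors = np.arange(30).reshape(10, 3)
train_set, test_set = shuffle_and_split(toy_vectors)
print(train_set.shape, test_set.shape)
```

Fixing the seed makes the split reproducible between runs; whether that is desirable depends on how the resulting sets are used.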