@LauraLangdon
Created August 10, 2021 04:54
import numpy as np


def individual_tweet_vectorizer(corpus, tweet, index=0, author=''):
    """
    Formats a single tweet as a vector
    :param corpus: list of all words in tweets
    :param tweet: tweet to be vectorized
    :param index: index of tweet in main list of tweets
    :param author: Trump or general
    :return: Single tweet in vector form, shape (1, len(corpus) + 2)
    """
    # One slot per corpus word, plus two trailing slots: [-2] = index, [-1] = author flag
    individual_tweet_vector = np.zeros((1, len(corpus) + 2), dtype=int)
    for position, word in enumerate(corpus):
        # `in` is a substring test if tweet is a str, a membership test if it is a list of tokens
        if word in tweet:
            individual_tweet_vector[0, position] = 1
    if author != '':  # If author is specified, set the last value of the tweet vector to 1
        individual_tweet_vector[0, -1] = 1
    individual_tweet_vector[0, -2] = index  # Keep track of index of tweet for interpretation
    return individual_tweet_vector
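
If `tweet` is passed as a plain string, the `in` test above matches substrings, so a corpus word such as `he` would also fire on `The`. The following is a minimal sketch of a whole-word variant, not the gist author's method. The name `vectorize_tweet_tokens`, the whitespace tokenization, and the lowercasing are all assumptions made for illustration, and the corpus is assumed to be lowercase.

```python
import numpy as np


def vectorize_tweet_tokens(corpus, tweet, index=0, author=''):
    """Hypothetical whole-word variant of individual_tweet_vectorizer."""
    # Assumption: tweet is a plain string; split on whitespace and lowercase it
    tokens = set(tweet.lower().split())
    vector = np.zeros((1, len(corpus) + 2), dtype=int)
    for position, word in enumerate(corpus):
        if word in tokens:  # exact token match, not a substring match
            vector[0, position] = 1
    vector[0, -2] = index  # second-to-last slot: index of the tweet
    if author != '':
        vector[0, -1] = 1  # last slot: author flag
    return vector


# Hypothetical three-word corpus and sample tweet
corpus = ['great', 'he', 'wall']
vec = vectorize_tweet_tokens(corpus, 'The wall is great', index=7, author='Trump')
print(vec)  # [[1 0 1 7 1]]
```

With substring matching on the same input, the slot for `he` would be set to 1 as well, because `The` contains `he`.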