@lakshay-arora
Created December 9, 2019 11:33
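from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, Word2Vec

# define stage 1: tokenize the tweet text by splitting on non-word characters
stage_1 = RegexTokenizer(inputCol='tweet', outputCol='tokens', pattern='\\W')
# define stage 2: remove the stop words from the tokens
stage_2 = StopWordsRemover(inputCol='tokens', outputCol='filtered_words')
# define stage 3: turn the filtered words into a vector of size 100
# (Word2Vec averages the word vectors of each tweet)
stage_3 = Word2Vec(inputCol='filtered_words', outputCol='vector', vectorSize=100)
# define stage 4: logistic regression model trained on the tweet vectors
model = LogisticRegression(featuresCol='vector', labelCol='label')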
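
To see what the first two stages do to a single tweet without starting Spark, here is a minimal pure-Python sketch. It only approximates `RegexTokenizer` with its default settings (lowercasing, splitting on the pattern, dropping empty tokens) and `StopWordsRemover`. The helper names `tokenize` and `remove_stop_words`, the sample tweet, and the tiny `STOP_WORDS` set are assumptions made for illustration; Spark's default English stop-word list is much longer.

```python
import re

# Hypothetical, tiny stop-word set for illustration only; Spark's
# StopWordsRemover uses a much longer default English list.
STOP_WORDS = {"the", "is", "a", "to", "and", "of"}


def tokenize(text):
    """Roughly mimic RegexTokenizer(pattern='\\W') with default settings."""
    # Lowercase, split on every non-word character, drop empty strings.
    # re.ASCII keeps \W close to Java's default (ASCII-only) behaviour.
    parts = re.split(r"\W", text.lower(), flags=re.ASCII)
    return [tok for tok in parts if tok]


def remove_stop_words(tokens):
    """Roughly mimic StopWordsRemover using the small set above."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = tokenize("The new phone is great, @user!")
filtered_words = remove_stop_words(tokens)
print(tokens)          # ['the', 'new', 'phone', 'is', 'great', 'user']
print(filtered_words)  # ['new', 'phone', 'great', 'user']
```

Because punctuation and symbols such as `@` count as non-word characters, they act as separators and never appear in the tokens that reach the Word2Vec stage.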