@geroldcsendes
Last active November 15, 2019 15:45
Get twitter data, clean & analyze it
# Get Twitter data
library(rtweet) # Provides search_tweets()
trainer <- search_tweets(q = "Guardiola", n = 18000, type = "mixed", lang = "en")
# Source for URL removal: https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/text-mining-twitter-data-intro-r/
# Remove http and https URLs only, keeping any text that follows a link
trainer$stripped_text <- gsub("https?://\\S+", "", trainer$text)
# Emoji removal
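# Convert to the native encoding; where that encoding cannot represent an emoji
# (e.g. a non-UTF-8 locale), it is written as a <U+XXXX> escape
trainer$plain_tweet <- enc2native(trainer$stripped_text)
trainer$plain_tweet <- gsub("<[^>]+>", "", trainer$plain_tweet) # Remove each <...> escape individually
trainer$plain_tweet <- trimws(trainer$plain_tweet) # Remove leading and trailing whitespace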