Last active
November 15, 2019 15:45
Get Twitter data, clean and analyze it
# Get Twitter data (assumes the rtweet package is installed and API credentials are configured)
library(rtweet)
trainer <- search_tweets(q = "Guardiola", n = 18000, type = "mixed", lang = "en")
# Source for URL removal: https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/text-mining-twitter-data-intro-r/
# Match http and https URLs only, so the text after a link is kept
trainer$stripped_text <- gsub("https?://\\S+", "", trainer$text)
# Emoji removal
# Convert to native encoding; in a non-UTF-8 locale emojis become <U+XXXX> escapes
trainer$plain_tweet <- enc2native(trainer$stripped_text)
# Remove each <U+XXXX> escape individually
trainer$plain_tweet <- gsub("<U\\+[0-9A-Fa-f]+>", "", trainer$plain_tweet)
trainer$plain_tweet <- trimws(trainer$plain_tweet) # Remove leading and trailing whitespace
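
# A minimal offline sketch of the three cleaning steps (URL removal,
# <U+XXXX> escape removal, trimming), for trying the patterns without API access.
# `sample_text` and `cleaned` are hypothetical names, and the two strings are
# hand-written examples, not real tweets.
sample_text <- c("Guardiola speaks https://t.co/abc123 after the match",
                 "  Great win! <U+26BD>  ")
cleaned <- gsub("https?://\\S+", "", sample_text)    # drop URLs, keep surrounding text
cleaned <- gsub("<U\\+[0-9A-Fa-f]+>", "", cleaned)    # drop escaped code points
cleaned <- trimws(cleaned)                            # trim both ends
# Sanity checks: no URL or escape should survive the cleaning
stopifnot(!any(grepl("http", cleaned)), !any(grepl("<U\\+", cleaned)))
print(cleaned)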