@stephaniehicks
Created June 10, 2014 18:58
Clean the tweets from searchTwitter()
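clean.tweets <- function(some_txt)
{
  # strip HTML-escaped ampersands (the trailing ";" is removed with the punctuation below)
  some_txt <- gsub("&amp", "", some_txt)
  # strip retweet/via attributions such as "RT @user", then any remaining @mentions
  some_txt <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
  some_txt <- gsub("@\\w+", "", some_txt)
  # strip punctuation and digits
  some_txt <- gsub("[[:punct:]]", "", some_txt)
  some_txt <- gsub("[[:digit:]]", "", some_txt)
  # strip links; with punctuation already gone, each URL is a single "http..." token
  some_txt <- gsub("http\\w+", "", some_txt)
  # collapse runs of spaces/tabs to one space so neighbouring words are not joined
  some_txt <- gsub("[ \t]{2,}", " ", some_txt)
  # trim leading and trailing whitespace
  some_txt <- gsub("^\\s+|\\s+$", "", some_txt)
  # define "tolower error handling" function:
  # tolower() can fail on invalid multibyte strings, so return NA instead of stopping
  try.tolower <- function(x)
  {
    tryCatch(tolower(x), error = function(e) NA_character_)
  }
  # vapply() always returns an unnamed character vector, even for empty input
  some_txt <- vapply(some_txt, try.tolower, character(1), USE.NAMES = FALSE)
  # drop tweets that are empty after cleaning
  some_txt <- some_txt[some_txt != ""]
  return(some_txt)
}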
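
To see what the individual substitutions do, the rough sketch below runs a made-up tweet through the same removal patterns in the same order and prints the string after each step. The tweet text and the helper names (`patterns`, `stripped`, `glued`, `spaced`, `trim`) are assumptions for illustration only, not part of the gist. It also compares two ways of handling the leftover runs of blanks: replacing them with the empty string joins neighbouring words, while replacing them with a single space keeps the words apart.

```r
# Made-up tweet, used only for illustration.
tweet <- "RT @user: Check this out &amp; more http://t.co/abc123 #R  rocks!"

# Removal patterns, in the same order as in clean.tweets().
patterns <- c("&amp", "(RT|via)((?:\\b\\W*@\\w+)+)", "@\\w+",
              "[[:punct:]]", "[[:digit:]]", "http\\w+")

stripped <- tweet
for (p in patterns) {
  stripped <- gsub(p, "", stripped)
  # Show the intermediate string after each substitution.
  cat(sprintf("%-30s -> '%s'\n", p, stripped))
}

# Runs of blanks are left behind wherever something was removed.
glued  <- gsub("[ \t]{2,}", "", stripped)   # empty replacement joins words
spaced <- gsub("[ \t]{2,}", " ", stripped)  # single space keeps words apart

# Hypothetical helper: trim leading and trailing whitespace.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

print(tolower(trim(glued)))   # "check this outmorerrocks"
print(tolower(trim(spaced)))  # "check this out more r rocks"
```

Because punctuation and digits are stripped before the link pattern runs, the URL has already collapsed into a single `http...` token by the time `http\\w+` is applied, which is why that short pattern is enough to remove it.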