@Robinlovelace
Created October 27, 2014 15:21
Removal of sensitive words
# Remove sensitive text
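summary(factor(Encoding(tdft$text))) # tabulate the declared encodings before cleaning
Encoding(tdft$text) <- "UTF-8" # mark every string as UTF-8
tdft$text <- iconv(tdft$text, "UTF-8", "UTF-8", sub = '') # drop bytes that are not valid UTF-8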
tdft$text <- gsub('@\\S+', '@', tdft$text) # replace each @mention with a bare '@'
tdft$text <- gsub('http\\S+', 'http', tdft$text) # replace each hyperlink with a bare 'http'
head(tdft$text)
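
# A minimal, self-contained sketch of the two substitutions above on made-up data.
# Assumption: `tdft` is a data frame with a character column `text`. The `demo`
# object below is hypothetical and is not part of the original data.
demo <- data.frame(
  text = c("@alice see http://example.com/page now", "no handles or links here"),
  stringsAsFactors = FALSE
)
demo$text <- gsub('@\\S+', '@', demo$text)       # "@alice" becomes "@"
demo$text <- gsub('http\\S+', 'http', demo$text) # the URL becomes "http"
demo$text # "@ see http now" "no handles or links here"
# Note: the placeholders '@' and 'http' stay in the text, so the position of each
# mention or link is still visible even though its content has been removed.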