@gomesfellipe
Created February 20, 2018 03:47
# Excerpt of the function obtained from:
# http://www.sthda.com/english/wiki/word-cloud-generator-in-r-one-killer-function-to-do-everything-you-need
# Download a webpage and extract its text
html_to_text <- function(url) {
  library(RCurl)
  library(XML)
  # Download the raw HTML as a character string
  html.doc <- getURL(url)
  # Parse the HTML string into a document tree
  doc <- htmlParse(html.doc, asText = TRUE)
  # "//text()" selects every text node in the document.
  # The predicates drop text inside script, style, noscript, and form elements
  text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)
  # Collapse the text vector into a single character string
  paste(text, collapse = " ")
}
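
# Usage sketch (not part of the original function): a minimal offline check of
# the XPath filter, assuming the XML package is installed. The HTML string and
# the names sample_html, sample_doc, and sample_text are made up for illustration.
library(XML)
sample_html <- "<html><head><style>p {color: red;}</style></head><body><p>Hello</p><script>var x = 1;</script><p>world</p></body></html>"
sample_doc <- htmlParse(sample_html, asText = TRUE)
sample_text <- xpathSApply(
  sample_doc,
  "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]",
  xmlValue
)
# The paragraph text should survive, while style and script contents are dropped
sample_result <- paste(sample_text, collapse = " ")
print(sample_result)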