@hrbrmstr
Last active October 17, 2018 16:15
really pathetic child text tag extraction
library(rvest)
library(purrr) # map_chr() comes from purrr, not rvest
read_html(paste0(readLines(textConnection("<html>
<body>
<p> Simple paragraph </p>
<p> Another properly formatted simple paragraph </p>
<div>
<p> Another properly formatted simple paragraph in a div element </p>
</div>
</body>
</html>")), collapse="\n")) -> doc
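# html_children(doc) returns only the direct children of <html> (here just <body>),
# so html_text() collapses the text of every descendant into a single string
map_chr(html_children(doc), html_text)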
## [1] "\n Simple paragraph \n Another properly formatted simple paragraph \n\n Another properly formatted simple paragraph in a div element \n\n"
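One way to get one string per paragraph rather than the single blob above is to select the `<p>` nodes directly and read each node's text. This is a minimal sketch, not the gist author's approach: it assumes the CSS selector `"p"` is the right target, passes the same sample markup to `read_html()` as a literal string, and uses `html_text(trim = TRUE)` to drop the padding spaces. The name `paragraphs` is introduced here for illustration only.

```r
library(rvest)

# Same sample markup as above, passed to read_html() as a literal string
doc <- read_html("<html>
<body>
<p> Simple paragraph </p>
<p> Another properly formatted simple paragraph </p>
<div>
<p> Another properly formatted simple paragraph in a div element </p>
</div>
</body>
</html>")

# Select every <p> at any depth, then read each node's text separately.
# trim = TRUE strips the leading and trailing whitespace of each string.
paragraphs <- html_text(html_nodes(doc, "p"), trim = TRUE)
paragraphs
```

Because the selector matches at any depth, the paragraph nested inside the `<div>` is returned alongside the two top-level ones, giving a character vector of length three.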