@mhkeller
Created September 9, 2012 17:11
XPath scraping in R
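library("XML")    # HTML parsing and XPath queries
library("RCurl")  # getURL() for fetching the page

url = "http://thecaucus.blogs.nytimes.com/"
# XPath for the text nodes directly under the entry's child <div> elements
xpath = '//*[@id="entry-230579"]/div/text()'

page = getURL(url)                                 # download the raw HTML as a string
tree = htmlTreeParse(page, useInternalNodes=TRUE)  # internal nodes are required for XPath
text = getNodeSet(tree, xpath)                     # use the xpath defined above (it was unused)
text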
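
The difference between selecting the `<div>` elements and selecting their `text()` nodes is easier to see on a small inline document than on the live page. The following is a minimal sketch, not the gist author's code: the HTML string and the `entry-1` id are made up for illustration, and the real page's markup will differ. It uses `htmlParse` with `asText=TRUE` so that no network access is needed.

```r
library("XML")

# Hypothetical stand-in for the downloaded page (assumption: not the real markup)
html = '<html><body><div id="entry-1"><div>First <b>bold</b> paragraph.</div><div>Second paragraph.</div></div></body></html>'
tree = htmlParse(html, asText=TRUE)

# Element nodes: one result per child <div>
divs = getNodeSet(tree, '//*[@id="entry-1"]/div')

# Full text of each <div>, including text inside nested tags such as <b>
div_text = xpathSApply(tree, '//*[@id="entry-1"]/div', xmlValue)

# Only the text nodes that are direct children of each <div>;
# text inside the nested <b> is not selected
direct_text = xpathSApply(tree, '//*[@id="entry-1"]/div/text()', xmlValue)

print(div_text)
print(direct_text)
```

If the goal is the readable text of each entry, applying `xmlValue` to the element nodes is often the more convenient choice, since `/text()` skips anything wrapped in inline tags.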