PaulieGillett / gist:c7daa503a001040c16b190f7025b5320
Created April 26, 2017 21:17
R: scrape multiple pages with XML and readHTMLTable
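The sketch below shows the whole pattern named in the title and runs offline. It rests on three assumptions:

* The data sit in the second table of each page, as in the gist's getNodeSet(doc, "//table")[[2]].
* Each table is read with readHTMLTable().
* plyr's ldply() stacks the per-page results into one data frame.

fake.pages and GetTableFromText() are hypothetical stand-ins. They parse inline HTML strings instead of fetching the live ttmeiju.com URLs, so this illustrates the technique and is not the gist's own code.

```r
library(XML)
library(plyr)

# Hypothetical stand-in pages so the sketch runs without network access.
# Each page has a layout table first and the data table second.
fake.pages <- c(
  "<html><body><table><tr><td>menu</td></tr></table>
   <table><tr><th>Title</th><th>Year</th></tr>
   <tr><td>A</td><td>2015</td></tr>
   <tr><td>B</td><td>2016</td></tr></table></body></html>",
  "<html><body><table><tr><td>menu</td></tr></table>
   <table><tr><th>Title</th><th>Year</th></tr>
   <tr><td>C</td><td>2017</td></tr></table></body></html>"
)

# Hypothetical offline variant of GetTable(): parses page text, not a URL
GetTableFromText <- function(page.number) {
  doc <- htmlParse(fake.pages[page.number], asText = TRUE)
  node <- getNodeSet(doc, "//table")[[2]]   # second <table> holds the data
  tbl <- readHTMLTable(node, header = TRUE, stringsAsFactors = FALSE)
  tbl$page <- page.number                   # record which page each row came from
  tbl
}

# ldply() calls GetTableFromText() once per page and row-binds the data frames
all.rows <- ldply(seq_along(fake.pages), GetTableFromText)
print(all.rows)
```

To scrape the real site, use the URL-based htmlParse() call from GetTable() instead and pass the page numbers you need (for example 1:5) to ldply().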
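library(XML)   # htmlParse(), getNodeSet(), xmlSize(), readHTMLTable()
library(plyr)

# The page number is appended as "?page=<n>", e.g. ...Movie.html?page=3
base.url <- "http://www.ttmeiju.com/meiju/Movie.html?page"

# Fetch one listing page and locate its data table
GetTable <- function(page.number) {
  full.url <- paste(base.url, page.number, sep = "=")
  # Decode the page as GBK (a Chinese character encoding)
  doc <- htmlParse(full.url, encoding = "GBK")
  # The listings are in the second <table> on the page
  node <- getNodeSet(doc, "//table")[[2]]
  # xmlSize() counts the table's child nodes; keep that count minus one
  last.row <- xmlSize(node) - 1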