@crsh
Last active February 1, 2018 11:16
Download data from GitHub repository
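batch_download_github <- function(url, pattern, path, ...) {
  if (!requireNamespace("rvest", quietly = TRUE)) stop("Please install the 'rvest' package.")
  if (!requireNamespace("RCurl", quietly = TRUE)) stop("Please install the 'RCurl' package.")

  # Fetch file names and links from the repository's file listing.
  # Note: the CSS selector depends on GitHub's page markup and may need updating.
  github_page <- rvest::read_html(url)
  file_nodes <- rvest::html_nodes(github_page, ".content .css-truncate-target .js-navigation-open")
  file_names <- rvest::html_text(file_nodes)

  # Keep only the files whose names match `pattern`
  matches <- grep(pattern, file_names)
  file_names <- file_names[matches]
  file_url <- rvest::html_attr(file_nodes, "href")[matches]

  # Turn the relative "blob" links into raw-content URLs
  file_url <- paste0("https://raw.githubusercontent.com", file_url)
  file_url <- sub("/blob/", "/", file_url, fixed = TRUE)

  # Download each file (`...` is forwarded to RCurl::getURL) and write it to `path`
  file_content <- vapply(file_url, RCurl::getURL, character(1), ...)
  file_paths <- file.path(path, file_names)
  for (i in seq_along(file_paths)) {
    writeLines(text = file_content[[i]], con = file_paths[i])
  }

  # Return the written paths invisibly so callers can inspect them
  invisible(file_paths)
}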
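The function rewrites each relative "blob" link into a raw-content URL. As a rough, network-free sketch of that rewrite, the snippet below applies the same two string steps to a single link. The helper name `blob_to_raw` and the example link are hypothetical and not part of the gist.

```r
# Hypothetical helper (not part of the gist): apply the same kind of URL
# rewrite that batch_download_github() performs on a relative "blob" link.
blob_to_raw <- function(href) {
  raw_url <- paste0("https://raw.githubusercontent.com", href)
  # Drop only the first "/blob/" segment so the rest of the path is untouched
  sub("/blob/", "/", raw_url, fixed = TRUE)
}

# Made-up link, for illustration only
blob_to_raw("/some-user/some-repo/blob/master/data/file_01.csv")
```

Matching the literal "/blob/" segment once, rather than every occurrence of "blob/", leaves file or folder names that happen to contain "blob" unchanged.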