@crsh
Last active January 31, 2018 17:05
Function to quickly rbind multiple data-files from a GitHub repository
batch_read_github <- function(url, extension, read_fun, ...) {
  if(!require("rvest")) stop("Please install the 'rvest' package.")
  if(!require("RCurl")) stop("Please install the 'RCurl' package.")

  # Fetch file names and links from the repository folder page
  github_page <- read_html(url)
  file_nodes <- html_nodes(github_page, ".content .css-truncate-target .js-navigation-open")
  file_names <- html_text(file_nodes)

  # Keep links whose file name matches `extension` and turn them into raw-content URLs
  file_url <- html_attr(file_nodes, "href")[grep(extension, file_names)]
  file_url <- paste0("https://raw.githubusercontent.com", file_url)
  file_url <- gsub("blob/", "", file_url)

  # Download each file as text, parse it with read_fun, and stack the results
  data <- lapply(file_url, getURL)
  data <- lapply(data, function(x) read_fun(textConnection(x), ...))
  data <- do.call("rbind", data)

  data
}

crsh commented Apr 1, 2016

Arguments

url Character. URL to folder in GitHub repository containing the data.
extension Character. File extension as regular expression (could also contain other parts of the file name).
read_fun Function. Function used to read data files (e.g. read.csv).
... Other parameters passed to read_fun.
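
Because extension is treated as a regular expression, "\\.csv" matches anywhere in a file name. The following offline sketch (base R only) shows the difference between an unanchored and an anchored expression, and how a matching link is turned into a raw-content URL. The file names and the user/repo path are made up for illustration; they stand in for what the scraping step would return.

```r
# Hypothetical file listing (assumption: not taken from a real repository)
file_names <- c("README.md", "subject_01.csv", "subject_02.csv", "notes.csv.bak")
hrefs <- paste0("/user/repo/blob/master/data/", file_names)

# Unanchored: matches ".csv" anywhere in the name, including "notes.csv.bak"
loose <- grep("\\.csv", file_names, value = TRUE)

# Anchored with "$": matches only names that end in ".csv"
strict <- grep("\\.csv$", file_names, value = TRUE)

# Same rewriting as in the function: prepend the raw host, then drop "blob/"
selected <- hrefs[grep("\\.csv$", file_names)]
raw_urls <- gsub("blob/", "", paste0("https://raw.githubusercontent.com", selected))
```

If a folder contains backups or similarly named files, anchoring the expression with "$" is probably the safer choice.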

Example

data <- batch_read_github(
  url = "https://github.com/methexp/subliminal-EC/tree/master/data"
  , extension = "\\.csv"
  , read_fun = read.csv
  , header = TRUE
)
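
The read-and-combine step can be tried without network access. This is only a sketch: the two CSV strings below are invented and stand in for the text that getURL would return for each file. Note that rbind expects every file to have the same columns.

```r
# Hypothetical file contents (assumption: stand-ins for downloaded text)
csv_text <- c(
  "id,score\n1,0.5\n2,0.7\n",
  "id,score\n3,0.9\n"
)

# Parse each string through a text connection, closing it afterwards
parts <- lapply(csv_text, function(x) {
  con <- textConnection(x)
  on.exit(close(con))
  read.csv(con, header = TRUE)
})

# Stack the per-file data frames row-wise into one data frame
combined <- do.call("rbind", parts)
```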