@ryan-hill
Last active June 11, 2018 22:54
Batch download StreamCat files for the entire US based on a desired table
library(RCurl)

# FTP location of the StreamCat hydroregion files
ftpdir <- 'ftp://newftp.epa.gov/EPADataCommons/ORD/NHDPlusLandscapeAttributes/StreamCat/HydroRegions/'
# Desired table (change to the name of the table you want)
table <- 'PredictedBioCondition'
# List the FTP directory, split the listing into file names (handling both
# '\r\n' and '\n' line endings), and keep only the .zip files for 'table'
url_list <- getURL(ftpdir, dirlistonly = TRUE)
url_list <- strsplit(url_list, split = '\r?\n')[[1]]
url_list <- url_list[grepl(table, url_list, fixed = TRUE) & grepl('\\.zip$', url_list)]
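# Loop through the files on the FTP site, download each one, and read its CSV
tables <- vector('list', length(url_list))
for (i in seq_along(url_list)) {
  message('Downloading ', url_list[i], ' (', i, ' of ', length(url_list), ')')
  temp <- tempfile(fileext = '.zip')
  # mode = 'wb' forces a binary transfer so the zip is not corrupted on Windows
  download.file(paste0(ftpdir, url_list[i]), temp, mode = 'wb')
  # Replace the trailing .zip with .csv to get the file name inside the archive
  csv_file <- sub('\\.zip$', '.csv', url_list[i])
  # Read in the CSV, then delete the temporary zip file
  tables[[i]] <- read.csv(unz(temp, csv_file))
  unlink(temp)
}
# Append all regions into the final table with a single rbind
outdf <- do.call(rbind, tables)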
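The same workflow can also be sketched as small functions, which separates the network download from the listing-filter and combine logic so those parts can be checked offline. This is a minimal sketch, not the author's method: the helper names (`select_table_files`, `zip_to_csv`, `combine_tables`, `read_remote_zip`) and the sample listing are hypothetical, and it assumes, as the script does, that each zip holds a CSV with the same base name.

```r
# Hypothetical helpers sketching the StreamCat batch download; base R only.

# Split a directory-only FTP listing into file names and keep the .zip
# files whose names contain `table` (fixed-string match, not a regex).
select_table_files <- function(listing, table) {
  files <- strsplit(listing, split = "\r?\n")[[1]]
  files[grepl(table, files, fixed = TRUE) & grepl("\\.zip$", files)]
}

# Swap a trailing ".zip" for ".csv"; the dot is escaped so it is literal.
zip_to_csv <- function(zip_name) {
  sub("\\.zip$", ".csv", zip_name)
}

# Read each file with `reader` and row-bind all results in one call.
combine_tables <- function(files, reader) {
  do.call(rbind, lapply(files, reader))
}

# Network reader: download one zip to a temp file, read the CSV inside it,
# and delete the temp file when the function exits.
read_remote_zip <- function(zip_name, ftpdir) {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp))
  download.file(paste0(ftpdir, zip_name), temp, mode = "wb")
  read.csv(unz(temp, zip_to_csv(zip_name)))
}

# Offline demo with a made-up listing and a stand-in reader.
listing <- "PredictedBioCondition_Region01.zip\r\nPredictedBioCondition_Region02.zip\r\nOtherTable_Region01.zip\r\nreadme.txt\r\n"
files <- select_table_files(listing, "PredictedBioCondition")
fake_reader <- function(f) data.frame(csv = zip_to_csv(f), stringsAsFactors = FALSE)
combined <- combine_tables(files, fake_reader)
print(combined)

# Against the real FTP site (requires RCurl and network access):
# library(RCurl)
# files <- select_table_files(getURL(ftpdir, dirlistonly = TRUE), table)
# outdf <- combine_tables(files, function(f) read_remote_zip(f, ftpdir))
```

Collecting the per-region data frames in a list and calling `do.call(rbind, ...)` once avoids copying the growing table on every iteration. If no files match, `combine_tables` returns `NULL` rather than an empty data frame, so a caller may want to check for that before using the result.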