@jamesthomson
Last active August 29, 2015 14:11
msd files list
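library(XML)
# read the bucket listing (XML) from S3
# note: a single S3 listing request returns at most 1000 keys
search <- readLines('http://tbmmsd.s3.amazonaws.com/')
# convert the listing to a data.frame
df <- xmlToDataFrame(search)
# pull out the list of file keys
Files <- df$Key
# drop the NAs (rows without a Key, e.g. listing metadata)
Files2 <- Files[!is.na(Files)]
# construct one copy command per file
code <- paste0("hadoop fs -cp s3://tbmmsd/", Files2, " /data/files/", Files2)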
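# to get the list, run either of the following:
# copy the commands to the clipboard (writeClipboard() is Windows-only)
writeClipboard(code)
# or write them to a text file, one command per line
write.table(code, "file_list.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)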