@millerh1
Created August 27, 2021 15:45
Changing filenames on AWS S3 from R (Example, RSeq Testing Data)
# Script for wrangling test files to correct naming conventions
library(tidyverse)
library(parallel)
S3_BAM_URI <- "s3://rseq-testing/bam-files/"
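# List the bucket contents; each output line looks like "<date> <time> <size> <key>"
bamsAvail <- system(paste0("aws s3 ls ", S3_BAM_URI), intern = TRUE)
# Keep only lines that name a BAM file in the expected format, so that
# non-matching rows (e.g. "PRE" prefix rows) are never passed on to `aws s3 mv`
bamsAvail <- grep("[ES]RX[0-9]+_.+\\.(hg|mm)[0-9]+\\.bam$", bamsAvail, value = TRUE)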
# Map each existing file name to its shortened form: <accession>_<genome>.bam
oldnew <- tibble(
  oldfls = gsub(bamsAvail, pattern = ".+ ([ES]RX[0-9]+_.+\\.(hg|mm)[0-9]+\\.bam)", replacement = "\\1"),
  newfls = gsub(bamsAvail, pattern = ".+ ([ES]RX[0-9]+)_.+\\.((hg|mm)[0-9]+\\.bam)", replacement = "\\1_\\2")
) %>%
  # Prefix both columns with the bucket URI to get full S3 paths
  mutate(across(everything(), function(x) {paste0(S3_BAM_URI, x)}))
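# Rename every file in parallel, one worker per file (the moves are network-bound)
mclapply(seq_len(nrow(oldnew)), function(i) {
  old <- oldnew$oldfls[i]
  new <- oldnew$newfls[i]
  # Quote both paths so unusual characters in a key cannot break the shell command
  system(paste("aws s3 mv", shQuote(old), shQuote(new)))
}, mc.cores = max(1L, nrow(oldnew)))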
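
Before moving anything on S3, it can help to check the rename mapping offline. The following is a minimal sketch in base R, not part of the original script: the `listing` lines and file names are made up for illustration, and `pattern` and `plan` are hypothetical names. It assumes the `aws s3 ls` output has the usual "date time size key" layout and that file names contain no spaces.

```r
# Hypothetical `aws s3 ls` output lines; the file names are invented examples.
listing <- c(
  "2021-08-27 15:45:00   1048576 SRX1025890_TC32_NT_DRIP.hg38.bam",
  "2021-08-27 15:45:01   2097152 ERX2277510_E-MTAB-6318.mm10.bam",
  "                           PRE archive/"
)

# Accession, then any description, then the genome tag (hg or mm plus digits).
pattern <- "^.+ ([ES]RX[0-9]+)_.+\\.((hg|mm)[0-9]+\\.bam)$"

# Drop lines that do not name a matching BAM file (here, the "PRE" row).
hits <- grep(pattern, listing, value = TRUE)

# Build the old -> new table without touching S3.
plan <- data.frame(
  old = sub("^.+ ", "", hits),          # strip date, time, and size columns
  new = sub(pattern, "\\1_\\2", hits),  # keep accession and genome only
  stringsAsFactors = FALSE
)
print(plan)
```

Inspecting `plan` first makes it easy to spot two different old names collapsing onto the same new name, which would cause one `aws s3 mv` to overwrite the result of another.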