@seandavi
Last active January 8, 2024 21:12
Translate GDC file_ids to TCGA barcodes
library(GenomicDataCommons)
library(magrittr)
TCGAtranslateID = function(file_ids) {
  # Query the GDC files endpoint for the requested file_ids, returning
  # only the submitter_id of each sample attached to each file.
  info = files() %>%
    GenomicDataCommons::filter( ~ file_id %in% file_ids) %>%
    GenomicDataCommons::select('cases.samples.submitter_id') %>%
    results_all()
  # Extract the TCGA barcodes from the nested result.
  # id_list will contain a list (one item for each file_id)
  # of TCGA barcodes of the form 'TCGA-XX-YYYY-ZZZ'
  id_list = lapply(info$cases, function(a) {
    a[[1]][[1]][[1]]
  })
  # Count barcodes per file so we can later expand to a
  # data.frame of the right size
  barcodes_per_file = lengths(id_list)
  # And build the data.frame, repeating each file_id once per barcode
  return(data.frame(file_id = rep(ids(info), barcodes_per_file),
                    submitter_id = unlist(id_list)))
}
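
# Example usage (a sketch, not part of the original function).
# Assumptions: network access to the GDC API, and that
# 'cases.project.project_id' is a valid filter field on the files
# endpoint. 'TCGA-OV' is only an illustrative project choice, and
# example_file_ids / barcode_table are hypothetical names.
example_file_ids = files() %>%
  GenomicDataCommons::filter( ~ cases.project.project_id == 'TCGA-OV') %>%
  results(size = 5) %>%
  ids()
# One row per (file_id, barcode) pair; a file linked to several
# samples appears on several rows.
barcode_table = TCGAtranslateID(example_file_ids)
head(barcode_table)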