@PoisonAlien
Last active February 18, 2019 16:13
Takes variants as input and annotates them using the Broad Institute's Oncotator API (http://www.broadinstitute.org/oncotator/). The output is a data frame of annotated variants in MAF format.
# Usage
#
# oncotate(maflite, header = FALSE, basename = NULL)
# Arguments
#
# maflite
# input tsv file whose first five columns are chr, start, end, ref_allele, alt_allele. Any further columns are attached to the output maf.
#
# header
# logical. Whether input has a header line. Default is FALSE.
#
# basename
# character. Default is NULL. If given, annotations are also written to <basename>.maf.
#
# Details
#
# Input should be a file whose first five columns are chr, start, end, ref_allele, alt_allele. Only these five are used for annotation; any remaining columns are attached to the resulting maf. Note: one web request is made per variant, so large inputs are slow.
oncotate = function (maflite, header = FALSE, basename = NULL) {
# library() stops with an error if rjson is missing; require() would only warn.
library(rjson)
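# Read the variants and build one key per row: chr_start_end_ref_alt.
m = read.delim(maflite, stringsAsFactors = FALSE, header = header)
anno = paste(m[, 1], m[, 2], m[, 3], m[, 4], m[, 5], sep = "_")
# Query the Oncotator web API once per variant and collect the records in a
# list; binding once at the end avoids growing a data frame inside the loop.
anno.list = vector("list", length(anno))
for (i in seq_along(anno)) {
rec.url = paste("http://www.broadinstitute.org/oncotator/mutation",
anno[i], sep = "/")
annot = fromJSON(file = rec.url)
anno.list[[i]] = as.data.frame(annot)
}
anno.df = do.call(rbind, anno.list)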
colnames(anno.df) = gsub(pattern = "^X", replacement = "",
x = colnames(anno.df))
colnames(m)[1:5] = c("Chromosome", "Start_Position", "End_Position",
"Reference_Allele", "Tumor_Seq_Allele2")
anno.df = cbind(m, anno.df)
anno.df$Center = NA
anno.df$Tumor_Seq_Allele1 = anno.df$Reference_Allele
colnames(anno.df)[which(colnames(anno.df) == "gene")] = "Hugo_Symbol"
colnames(anno.df)[which(colnames(anno.df) == "variant_classification")] = "Variant_Classification"
colnames(anno.df)[which(colnames(anno.df) == "variant_type")] = "Variant_Type"
colnames(anno.df)[which(colnames(anno.df) == "HGNC_Entrez.Gene.ID.supplied.by.NCBI.")] = "Entrez_Gene_Id"
colnames(anno.df)[which(colnames(anno.df) == "strand")] = "Strand"
colnames(anno.df)[which(colnames(anno.df) == "build")] = "NCBI_Build"
anno.df1 = anno.df[, c("Hugo_Symbol", "Entrez_Gene_Id", "Center",
"NCBI_Build", "Chromosome", "Start_Position", "End_Position",
"Strand", "Variant_Classification", "Variant_Type", "Reference_Allele",
"Tumor_Seq_Allele1", "Tumor_Seq_Allele2")]
anno.df2 = anno.df[, colnames(anno.df)[!colnames(anno.df) %in%
colnames(anno.df1)]]
anno.df = cbind(anno.df1, anno.df2)
if (!is.null(basename)) {
write.table(anno.df, paste(basename, "maf", sep = "."),
quote = FALSE, row.names = FALSE, sep = "\t")
}
return(anno.df)
}
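
As a rough illustration of the input format and of the per-variant request the function builds, the sketch below writes a one-line maflite file and reconstructs the URL without contacting the server. The variant is the one quoted in the comment thread; the temporary file and the `basename` value are placeholders chosen for this example. The call to `oncotate()` is left commented out because it needs network access and a responding Oncotator service.

```r
# Write a one-line maflite file (tab-separated, no header):
# chr, start, end, ref_allele, alt_allele.
maflite = tempfile(fileext = ".tsv")
writeLines("7\t50367240\t50367256\tAAAGCCCCCCTGTAAGC\tAGTGGGG", maflite)

# Rebuild the per-variant key and request URL the same way oncotate() does.
m = read.delim(maflite, stringsAsFactors = FALSE, header = FALSE)
key = paste(m[, 1], m[, 2], m[, 3], m[, 4], m[, 5], sep = "_")
rec.url = paste("http://www.broadinstitute.org/oncotator/mutation", key, sep = "/")
print(rec.url)

# Actual annotation (placeholder basename; requires the web service):
# maf = oncotate(maflite, header = FALSE, basename = "annotated")
```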

rsklav commented Oct 6, 2017

Hello,

I am having issues with mutations that are not SNPs. For example, although Oncotator has no problem annotating the mutation below, I get an error when I run your script on a tsv file that contains it, whether as part of a larger file or on its own:
Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match

This is the mutation I have been using as an example to test the script: 7 50367240 50367256 AAAGCCCCCCTGTAAGC AGTGGGG.

Thanks so much for this script!
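
The error above is what `rbind` reports when the data frames it is given have different numbers of columns. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that the API returns a different set of fields for some non-SNP variants. Under that assumption, a minimal sketch of a workaround is to pad every record to the union of all column names before binding. The helper name `rbind_fill` and the two toy records are hypothetical and not part of the gist.

```r
# Hypothetical helper (not part of the gist): row-bind data frames whose
# columns differ, filling the missing columns with NA.
rbind_fill = function(dfs) {
  all.cols = unique(unlist(lapply(dfs, colnames)))
  padded = lapply(dfs, function(d) {
    for (col in setdiff(all.cols, colnames(d))) {
      d[[col]] = NA
    }
    d[, all.cols, drop = FALSE]
  })
  do.call(rbind, padded)
}

# Two made-up records standing in for API responses with different fields.
a = data.frame(gene = "G1", variant_type = "SNP", stringsAsFactors = FALSE)
b = data.frame(gene = "G2", stringsAsFactors = FALSE)
res = rbind_fill(list(a, b))
print(res)
```

Inside `oncotate`, the per-variant data frames would be collected in a list and passed to such a helper in place of the plain `rbind` call.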
