Skip to content

Instantly share code, notes, and snippets.

@hyphaltip
Last active September 3, 2020 19:56
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save hyphaltip/6ef2c688a15e2fed27d07de94eb381a3 to your computer and use it in GitHub Desktop.
Save hyphaltip/6ef2c688a15e2fed27d07de94eb381a3 to your computer and use it in GitHub Desktop.
Hyper geometric domain enrich

Domain set enrichment. code by Diego Martinez used in this paper https://mbio.asm.org/content/3/5/e00259-12

Comparative Genome Analysis of Trichophyton rubrum and Related Dermatophytes Reveals Candidate Genes Involved in Infection Diego A. Martinez, Brian G. Oliver, Yvonne Gräser, Jonathan M. Goldberg, Wenjun Li, Nilce M. Martinez-Rossi, Michel Monod, Ekaterina Shelest, Richard C. Barton, Elizabeth Birch, Axel A. Brakhage, Zehua Chen, Sarah J. Gurr, David Heiman, Joseph Heitman, Idit Kosti, Antonio Rossi, Sakina Saif, Marketa Samalova, Charles W. Saunders, Terrance Shea, Richard C. Summerbell, Jun Xu, Sarah Young, Qiandong Zeng, Bruce W. Birren, Christina A. Cuomo, Theodore C. White mBio Sep 2012, 3 (5) e00259-12; DOI: 10.1128/mBio.00259-12

Diego's instructions Lump multiple species together when comparing a set - I just added them as if they were a psuedo genome.

I think this is commented for use but let me know if it doesnt make sense. give it  (<genomes, domains matrix>, <rownames for 1st genomes>, <rownames of 2nd genomes>)

#!/usr/bin/env Rscript
testTehTablez.HG.AnyGenome <- function(x,list_1,list_2){#x is a table of size X, rows are genomes, columns are IPR domains,
out <- c() #the point of this version is to be able to give a table and spit out any combination of values.
fam <- c() #old was table == 1gene_fam 2genome1 3genome2 4gfamcount1 5gfamcount2 6genomesize1 7genomesize2
gen1 <- c() #MUST have library(qvalue) loaded.
gen2 <- c()
enrich <- c()
deplete <- c()
famSize1 <- c()
famSize2 <- c()
genColSize1 <- c()
genColSize2 <- c()
for(fs1 in list_1){
famSize1[fs1] <- sum(x[fs1,])
}
for(fs2 in list_2){
famSize2[fs2] <- sum(x[fs2,])
}
for(col in 1:dim(x)[2]){
genColSize1[col] <- sum(x[list_1,col])
genColSize2[col] <- sum(x[list_2,col])
if(sum(x[list_1,col],x[list_2,col]) > 0 ){#if both are zero, phyper does not work.
if(sum(x[list_1,col]) == 0){
enrich[col] <- phyper(sum(x[list_1,col]),sum(famSize1[list_1]),sum(famSize2[list_2]),
sum(x[list_1,col]) + sum(x[list_2,col]), lower.tail = FALSE)#enrichment p-valueA
#enrich[rw] <- phyper(x[rw,4],x[rw,6],x[rw,7], x[rw,4] + x[rw,5], lower.tail = FALSE)#enrichment p-valueA
}else{
enrich[col] <- phyper(sum(x[list_1,col]) - 1,sum(famSize1[list_1]),sum(famSize2[list_2]),
sum(x[list_1,col]) + sum(x[list_2,col]), lower.tail = FALSE)#enrichment p-valueA
}
if(sum(x[list_1,col]) == 0){
deplete[col] <- phyper(sum(x[list_1,col]),sum(famSize1[list_1]),sum(famSize2[list_2]),
sum(x[list_1,col]) + sum(x[list_2,col]), lower.tail = TRUE)#enrichment p-valueA
}else{
deplete[col] <- phyper(sum(x[list_1,col]) + 1,sum(famSize1[list_1]),sum(famSize2[list_2]),
sum(x[list_1,col]) + sum(x[list_2,col]), lower.tail = TRUE)#enrichment p-valueA
}
}else{
enrich[col] <- 1
deplete[col] <- 1
}
}
# out <- data.frame(fam,gen1,gen2,enrich,deplete)
# out <- data.frame(x,enrich,enwo,deplete,depwo)
enrich.qval <- c()
deplete.qval <- c()
out <- data.frame(colnames(x),genColSize1,genColSize2,enrich,deplete,qvalue(enrich)$qvalues,qvalue(deplete)$qvalues)
colnames(out) <- c('fam',paste(list_1,sep="_",collapse="_"),paste(list_2,sep="_",collapse="_"),'enrich.pvalue','deplete.pvalue','enrich.qvalue','deplete.qvalue')
out }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment