Skip to content

Instantly share code, notes, and snippets.

@cbare
Created April 20, 2012 17:14
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cbare/2430361 to your computer and use it in GitHub Desktop.
Save cbare/2430361 to your computer and use it in GitHub Desktop.
Extract gene expression data from a cmonkey output file
# Extract expression data
# =======================
#
# Gene expression data comes from the cmonkey output RData file:
# ~/Documents/work/projects/network_portal/cm_session.RData
#
# ...in a variable: env$ratios$ratios, a 3491 by 739 matrix
conditions <- c('X130.1', 'X127.1', 'X435.8', 'X435.7', 'X435.6', 'X435.1', 'X435.2', 'X435.3', 'X435.4', 'X435.10', 'X435.9', 'X435.5', 'X435.11', 'X449.1', 'X450.1', 'X450.2', 'X450.3', 'X450.4', 'X450.5', 'X1271.1', 'X1271.2', 'X1271.3', 'X1271.4', 'X1271.5', 'X1271.7', 'X1271.9', 'X1271.10', 'X1709.2', 'X1709.3', 'X1709.4', 'X1709.5', 'X1709.6', 'X1709.7', 'X1709.8', 'X1709.9', 'X1709.10', 'X1709.12', 'X1709.13', 'X1709.14', 'X1709.15', 'X1713.1', 'X1713.2', 'X1713.3', 'X1713.4', 'X1713.5', 'X1736.5', 'X1736.10', 'X1736.17', 'X1736.19', 'X1736.21')
# dvu gene names are the second entry in the synonyms table
trans <- sapply(env$genome.info$synonyms, `[`, 2)
# the feature table is keyed by YP_ identifiers
# we want to use DVU gene names
ft = env$genome.info$feature.tab
dvu.features.1 <- ft[trans[rownames(env$ratios$ratios)],c('contig','strand','start_pos','end_pos')]
trans[rownames(env$ratios$ratios)]
# are all gene names in translation table?
all(rownames(env$ratios$ratios) %in% names(trans))
# are any mapped to NAs?
trans[ is.na(trans) ]
# 'DVU0490','DVU0557','DVU0699','DVU1831','DVU2001','DVU2049','DVU2950','DVU3126','DVU3280','DVU3304'
# missing.genes = read.delim(file='missing-genes.txt', header=F, col.names=c('name','sequence','strand','start','end'), stringsAsFactors=F)
missing.genes.txt <- "DVU0490 chromosome - 558218 559487
DVU0557 chromosome + 624872 625726
DVU0699 chromosome + 773972 775650
DVU1831 chromosome + 1896865 1897691
DVU2001 chromosome + 2082778 2084039
DVU2049 chromosome + 2126385 2126851
DVU2950 chromosome - 3054878 3056595
DVU3126 chromosome + 3273316 3274547
DVU3280 chromosome + 3455425 3456410
DVU3304 chromosome + 3481262 3482740
"
# read the string above into a data.frame
missing.genes <- read.delim(file=textConnection(missing.genes.txt, open="r"), header=F, col.names=c('name','sequence','strand','start','end'))
dvu.features <- data.frame(
name=rownames(env$ratios$ratios),
sequence=sapply(dvu.features.1$contig, function(x)
if (is.na(x)) NA
else if (x=='NC_002937.3') 'chr'
else if (x=='NC_005863.1') 'megaplasmid'
else x),
strand=sapply(dvu.features.1$strand, function(x)
if (is.na(x)) NA
else if (x=='R') {'-'}
else if (x=='D') {'+'}
else {x}),
start=dvu.features.1$start_pos,
end=dvu.features.1$end_pos,
stringsAsFactors=F)
rownames(dvu.features) <- rownames(env$ratios$ratios)
# fix entries that either can't be translated or are missing from the features table
dvu.features[missing.genes$name,] <- missing.genes
dvu.expression <- cbind( dvu.features, env$ratios$ratios[, conditions] )
@cbare
Copy link
Author

cbare commented Apr 20, 2012

Create a heatmap track of gene expression in the Gaggle Genome Browser, given expression data in a cmonkey output file for the organism Desulfovibrio vulgaris Hildenborough. This is not really a complete script. Some manual intervention is required.

The first complication is that the rows in the ratios matrix are keyed by DVU gene identifiers, while the features table that has location on the genome is keyed by some other identifiers which look like this: "YP_009868.1". We create a lookup table called trans to fix this. This covers most genes. We add the missing genes (in the ratios matrix, but not in the features table) manually.

Next, create dvu.features, translating sequence names and strand along the way.

Finally, extract the desired conditions and cbind the chromosomal locations to the expression data. The resulting data.frame can be written out with write.table and read into SQLite with .import

Ugly? Yes! But, it worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment