Immanuel Purushothaman ipurusho

## tsne_and_sos.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                ipurusho
                / tsne_and_sos.md
            
            
              Last active
              October 7, 2016 19:49
            
              
                A simple tutorial of how to use the python implementation of tsne and Stochastic Outlier Selection
              
          
    tSNE

Download the following script: https://gist.github.com/ipurusho/44e06d43aab0a7dd2641589a4fd3351c
In R, write the variance stabilized values per sample, subsetting for the top 500 variable genes, to a file #without# row and column labels. You can then use the tsne.py script as follows:
python /path/to/tSNE.py /path/to/tsne_input_vsd.csv 30 /path/to/output_file.csv

Where 30 is the perplexity value, which is dependent on sample size for optimum output (see documentation). The output file will contain two columns, dimension 1, dimension 2. The rows correspond to the Samples which will be in the same order as the input vsd columns. You can load this data back into R and visualize it using your preferred method.

  
## tsne.py
import sys
import numpy as np
from sklearn.manifold import TSNE


my_data = np.genfromtxt(str(sys.argv[1]), delimiter=',')

model = TSNE(n_components=2, random_state=0,perplexity=int(sys.argv[2]))
tsne = model.fit_transform(np.transpose(my_data))
np.savetxt(str(sys.argv[3]), tsne, delimiter=",")

## DAVID.R
### "transcript" parameter is the original matrix of normalized counts
### "signifcant" parameter is the filtered matrix of significant genes (using desired cutoff)
### "email" is an email address needed to run the web service

go.david<-function(transcript,significant,email){
  require("RDAVIDWebService")
  if(dim(significant)[1] !=0){
  david<-DAVIDWebService$new(email=email)
  background<-addList(david, rownames(transcript), idType="ENSEMBL_GENE_ID",listName="full_transcript", listType="Background")
  sig.genes<-addList(david, rownames(significant), idType="ENSEMBL_GENE_ID",listName="significant", listType="Gene")

## Running_RRHO.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                ipurusho
                / Running_RRHO.md
            
            
              Last active
              September 28, 2016 19:41
            
          
    ###Download modified RRHO scripts
Obtain the modified R scripts from https://gist.github.com/ipurusho/8e4beef4fb51c2485c9f and source it in your R environment.
###Prepare the data
Create two separate matrices, one for P value and one for log fold change from your differential analysis. The rows should be genes and the columns should be the comparisons (column order should be consistent between the two matrices). For example:
P.vals = as.data.frame(cbind(Comparison1 = Comparison1[intersect.genes,"P.Value"],
                             Comparison2 = Comparison2[intersect.genes,"P.Value"],
 Comparison3 = Comparison3[intersect.genes,"P.Value"]))


## PCA_covariate_heatmap.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                ipurusho
                / PCA_covariate_heatmap.md
            
            
              Last active
              September 28, 2016 17:05
            
          
    ###Download R scripts
Download the following R script which contains necessary functions:
https://gist.github.com/ipurusho/64d6dbf1a32aa7c83f665785155e494b
Then, load the  covariate_contribution.R script into your R environment.
###Performing Covariate Analysis
Once you have obtained a variance stabilized count matrix and organized your meta data (meta data rows should correspond to VST columns), you can use the top.var function (now in your R environment) to filter for the top 500 (or more) variable genes for input into PCA.

  
## covariate_contribution.R
top.var<-function (x, ntop = 500)
{
  require("genefilter")
  rv = rowVars(x)
  select = order(rv, decreasing = TRUE)[seq_len(ntop)]
  topvar = x[select, ]

  return(topvar)
}

## rMATS_splicing.md

      
              1 file
            
          
              2 forks
            
          
              0 comments
            
          
              3 stars
            
          
                ipurusho
                / rMATS_splicing.md
            
            
              Last active
              July 12, 2023 06:59
            
              
                Brief tutorial for using rMATS to conduct a pairwise differential splicing analysis
              
          
    In order to perform a pairwise splicing analysis, sorted bam files per sample are needed as well as reference annotation in the form of a GTF file
###Download rMATS
Download rMATS at http://rnaseq-mats.sourceforge.net/download.html
###Running rMATS
Via python, you can run rMATs using the following example command:
python /path/to/rMATS/directory/RNASeq-MATS.py -b1 conditon1_replicate_1.bam,condition_1_replicate_2.bam -b2 condition_2_replicate_1.bam,condition_2_replicate_2.bam -gtf /path/to/gtf/file -o /path/to/results/directory -t single -len {read length}

  
## STAR.md

      
              1 file
            
          
              2 forks
            
          
              5 comments
            
          
              7 stars
            
          
                ipurusho
                / STAR.md
            
            
              Last active
              September 27, 2023 16:58
            
              
                A brief tutorial on how to run the STAR aligner on medinfo.mssm.edu
              
          
    ###Download STAR###
Obtain STAR source from https://github.com/alexdobin/STAR
Add the following to your .bashrc file and source it:
export PATH=/path/to/STAR/bin/:$PATH
###Generate Reference Genome

  
## merge_or_rename.py
import sys
import os
import re
import glob


#sys.argv[1] = action
#sys.argv[2] = top level folder
#sys.argv[3] = read
def subdirectories(args):

## bin_index_ucsc.scala
import scala.collection.mutable.ArrayBuffer

def binFromRange(start: Int, end: Int): ArrayBuffer[Int] ={

  val bin_offsets = Array(512+64+8+1,64+8+1,8+1,0)
  val binFirstShift = 17
  val binNextShift = 3


  var startBin = start >> binFirstShift
	import sys
	import numpy as np
	from sklearn.manifold import TSNE


	my_data = np.genfromtxt(str(sys.argv[1]), delimiter=',')

	model = TSNE(n_components=2, random_state=0,perplexity=int(sys.argv[2]))
	tsne = model.fit_transform(np.transpose(my_data))
	np.savetxt(str(sys.argv[3]), tsne, delimiter=",")
	### "transcript" parameter is the original matrix of normalized counts
	### "signifcant" parameter is the filtered matrix of significant genes (using desired cutoff)
	### "email" is an email address needed to run the web service

	go.david<-function(transcript,significant,email){
	require("RDAVIDWebService")
	if(dim(significant)[1] !=0){
	david<-DAVIDWebService$new(email=email)
	background<-addList(david, rownames(transcript), idType="ENSEMBL_GENE_ID",listName="full_transcript", listType="Background")
	sig.genes<-addList(david, rownames(significant), idType="ENSEMBL_GENE_ID",listName="significant", listType="Gene")
	top.var<-function (x, ntop = 500)
	{
	require("genefilter")
	rv = rowVars(x)
	select = order(rv, decreasing = TRUE)[seq_len(ntop)]
	topvar = x[select, ]

	return(topvar)
	}
	import sys
	import os
	import re
	import glob


	#sys.argv[1] = action
	#sys.argv[2] = top level folder
	#sys.argv[3] = read
	def subdirectories(args):
	import scala.collection.mutable.ArrayBuffer

	def binFromRange(start: Int, end: Int): ArrayBuffer[Int] ={

	val bin_offsets = Array(512+64+8+1,64+8+1,8+1,0)
	val binFirstShift = 17
	val binNextShift = 3


	var startBin = start >> binFirstShift