Skip to content

Instantly share code, notes, and snippets.

@ipurusho
ipurusho / tsne_and_sos.md
Last active October 7, 2016 19:49
A simple tutorial of how to use the python implementation of tsne and Stochastic Outlier Selection

tSNE

Download the following script: https://gist.github.com/ipurusho/44e06d43aab0a7dd2641589a4fd3351c

In R, write the variance stabilized values per sample, subsetting for the top 500 variable genes, to a file #without# row and column labels. You can then use the tsne.py script as follows:

python /path/to/tSNE.py /path/to/tsne_input_vsd.csv 30 /path/to/output_file.csv

Where 30 is the perplexity value, which is dependent on sample size for optimum output (see documentation). The output file will contain two columns, dimension 1, dimension 2. The rows correspond to the Samples which will be in the same order as the input vsd columns. You can load this data back into R and visualize it using your preferred method.

@ipurusho
ipurusho / tsne.py
Created October 7, 2016 18:19
tSNE script for VSD as input to visualize clusters in sequencing data
import sys
import numpy as np
from sklearn.manifold import TSNE
my_data = np.genfromtxt(str(sys.argv[1]), delimiter=',')
model = TSNE(n_components=2, random_state=0,perplexity=int(sys.argv[2]))
tsne = model.fit_transform(np.transpose(my_data))
np.savetxt(str(sys.argv[3]), tsne, delimiter=",")
@ipurusho
ipurusho / DAVID.R
Last active September 29, 2016 15:49
A function to perform DAVID GO term analysis using R directly.
### "transcript" parameter is the original matrix of normalized counts
### "signifcant" parameter is the filtered matrix of significant genes (using desired cutoff)
### "email" is an email address needed to run the web service
go.david<-function(transcript,significant,email){
require("RDAVIDWebService")
if(dim(significant)[1] !=0){
david<-DAVIDWebService$new(email=email)
background<-addList(david, rownames(transcript), idType="ENSEMBL_GENE_ID",listName="full_transcript", listType="Background")
sig.genes<-addList(david, rownames(significant), idType="ENSEMBL_GENE_ID",listName="significant", listType="Gene")

###Download modified RRHO scripts Obtain the modified R scripts from https://gist.github.com/ipurusho/8e4beef4fb51c2485c9f and source it in your R environment.

###Prepare the data Create two separate matrices, one for P value and one for log fold change from your differential analysis. The rows should be genes and the columns should be the comparisons (column order should be consistent between the two matrices). For example:

P.vals = as.data.frame(cbind(Comparison1 = Comparison1[intersect.genes,"P.Value"],
                             Comparison2 = Comparison2[intersect.genes,"P.Value"],
 Comparison3 = Comparison3[intersect.genes,"P.Value"]))

###Download R scripts

Download the following R script which contains necessary functions: https://gist.github.com/ipurusho/64d6dbf1a32aa7c83f665785155e494b

Then, load the covariate_contribution.R script into your R environment.

###Performing Covariate Analysis Once you have obtained a variance stabilized count matrix and organized your meta data (meta data rows should correspond to VST columns), you can use the top.var function (now in your R environment) to filter for the top 500 (or more) variable genes for input into PCA.

@ipurusho
ipurusho / covariate_contribution.R
Last active September 28, 2016 15:09
Functions to help determine the contribution of covariates, weighted by the percent contribution of principal components
top.var<-function (x, ntop = 500)
{
require("genefilter")
rv = rowVars(x)
select = order(rv, decreasing = TRUE)[seq_len(ntop)]
topvar = x[select, ]
return(topvar)
}
@ipurusho
ipurusho / rMATS_splicing.md
Last active July 12, 2023 06:59
Brief tutorial for using rMATS to conduct a pairwise differential splicing analysis

In order to perform a pairwise splicing analysis, sorted bam files per sample are needed as well as reference annotation in the form of a GTF file

###Download rMATS Download rMATS at http://rnaseq-mats.sourceforge.net/download.html

###Running rMATS Via python, you can run rMATs using the following example command:

python /path/to/rMATS/directory/RNASeq-MATS.py -b1 conditon1_replicate_1.bam,condition_1_replicate_2.bam -b2 condition_2_replicate_1.bam,condition_2_replicate_2.bam -gtf /path/to/gtf/file -o /path/to/results/directory -t single -len {read length}

@ipurusho
ipurusho / STAR.md
Last active September 27, 2023 16:58
A brief tutorial on how to run the STAR aligner on medinfo.mssm.edu

###Download STAR### Obtain STAR source from https://github.com/alexdobin/STAR

Add the following to your .bashrc file and source it: export PATH=/path/to/STAR/bin/:$PATH

###Generate Reference Genome

import sys
import os
import re
import glob
#sys.argv[1] = action
#sys.argv[2] = top level folder
#sys.argv[3] = read
def subdirectories(args):
import scala.collection.mutable.ArrayBuffer
def binFromRange(start: Int, end: Int): ArrayBuffer[Int] ={
val bin_offsets = Array(512+64+8+1,64+8+1,8+1,0)
val binFirstShift = 17
val binNextShift = 3
var startBin = start >> binFirstShift