@ipurusho
Last active October 7, 2016 19:49
A simple tutorial on how to use Python implementations of tSNE and Stochastic Outlier Selection (SOS)

tSNE

Download the following script: https://gist.github.com/ipurusho/44e06d43aab0a7dd2641589a4fd3351c

In R, write the variance-stabilized values per sample, subset to the 500 most variable genes, to a file without row and column labels. You can then run the tSNE.py script as follows:

python /path/to/tSNE.py /path/to/tsne_input_vsd.csv 30 /path/to/output_file.csv

Here 30 is the perplexity, which should be chosen according to the number of samples for best results (see the documentation). The output file contains two columns, dimension 1 and dimension 2. Each row corresponds to one sample, in the same order as the columns of the input vsd matrix. You can load this data back into R and visualize it using your preferred method.
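Because the output file carries no labels, it can help to re-attach sample names before plotting. The following is a minimal Python sketch of that step, not part of tSNE.py. It assumes the output is a comma-separated file with two numeric columns; the function name `label_embedding`, the sample names, and the example values are hypothetical.

```python
import csv
import io


def label_embedding(lines, sample_names):
    """Pair each embedding row with its sample name, preserving order.

    `lines` is any iterable of CSV text lines (an open file works).
    `sample_names` must list the samples in the same order as the
    columns of the input vsd matrix.
    """
    rows = [row for row in csv.reader(lines) if row]  # skip blank lines
    if len(rows) != len(sample_names):
        raise ValueError(
            f"expected {len(sample_names)} rows, found {len(rows)}"
        )
    return [
        (name, float(row[0]), float(row[1]))
        for name, row in zip(sample_names, rows)
    ]


# Hypothetical stand-in for the contents of /path/to/output_file.csv.
example_output = io.StringIO("1.5,-2.0\n0.25,3.0\n-4.0,0.5\n")
labeled = label_embedding(example_output, ["sampleA", "sampleB", "sampleC"])
for name, dim1, dim2 in labeled:
    print(name, dim1, dim2)
```

The length check is there because a mismatch between the number of rows and the number of sample names usually means the input matrix was written in the wrong orientation.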

SOS

Please refer to https://github.com/jeroenjanssens/scikit-sos and install the package with pip install scikit-sos

For SOS, write the transposed matrix of filtered variance-stabilized values to a text file, again without row and column labels. For example, a matrix of 40 samples filtered for the 500 most variable genes gives a tSNE input file with 500 rows and 40 columns. For SOS, transpose it so that there are 40 rows and 500 columns.
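The tutorial prepares both input files in R. For readers who prefer Python, the two layouts can be sketched as follows using only the standard library. This is an illustration under assumptions, not the author's method: the helper names (`top_variable_rows`, `transpose`, `write_unlabeled_csv`) and the toy matrix are hypothetical, and genes are ranked by plain sample variance.

```python
import csv
import io
import statistics


def top_variable_rows(matrix, n_top):
    """Return the n_top rows (genes) with the largest sample variance.

    Rows come back ordered by decreasing variance; the column (sample)
    order is left unchanged.
    """
    ranked = sorted(matrix, key=statistics.variance, reverse=True)
    return ranked[:n_top]


def transpose(matrix):
    """Swap rows and columns: genes x samples -> samples x genes."""
    return [list(column) for column in zip(*matrix)]


def write_unlabeled_csv(matrix, handle):
    """Write numeric rows with no header row and no row names."""
    csv.writer(handle, lineterminator="\n").writerows(matrix)


# Toy genes x samples matrix (4 genes, 3 samples); values are made up.
vsd = [
    [5.0, 5.1, 4.9],  # nearly constant gene
    [2.0, 8.0, 5.0],  # highly variable gene
    [7.0, 7.0, 7.0],  # constant gene
    [1.0, 3.0, 2.0],  # moderately variable gene
]

filtered = top_variable_rows(vsd, n_top=2)  # layout used for tSNE.py
for_sos = transpose(filtered)               # layout used for sos

buffer = io.StringIO()
write_unlabeled_csv(for_sos, buffer)
print(buffer.getvalue())
```

With real data, `n_top` would be 500 and `buffer` would be replaced by a file opened for writing.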

Execute the SOS algorithm:

cat /path/to/vsd.csv | sos -p 30 > /path/to/output.txt

The -p flag sets the perplexity, which again should be adjusted to the sample size. Each row of the output corresponds to one sample, that is, one column of the original vsd matrix (one row of the transposed input), in the same order. Each output value is the outlier probability of that sample.
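Since the output holds one probability per sample and no labels, a small post-processing step can make it easier to read. The sketch below pairs the probabilities with sample names and sorts them, most outlying first. It is an illustration only: the function name `rank_outliers`, the sample names, the example values, and the 0.5 cut-off are assumptions, not something prescribed by this tutorial.

```python
def rank_outliers(probabilities, sample_names, threshold=0.5):
    """Pair probabilities with sample names and sort, most outlying first.

    `threshold` is an assumed cut-off for flagging a sample; adjust it
    to suit your data.
    """
    if len(probabilities) != len(sample_names):
        raise ValueError("one probability is expected per sample")
    ranked = sorted(
        zip(sample_names, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(name, p, p > threshold) for name, p in ranked]


# Hypothetical stand-in for the lines of /path/to/output.txt.
output_lines = ["0.12\n", "0.87\n", "0.40\n"]
probabilities = [float(line) for line in output_lines if line.strip()]
report = rank_outliers(probabilities, ["sampleA", "sampleB", "sampleC"])
for name, p, flagged in report:
    print(f"{name}\t{p:.2f}\t{'outlier?' if flagged else ''}")
```

The sample names must be given in the same order as the rows of the transposed input, otherwise the probabilities will be attached to the wrong samples.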
