@hussius
hussius / kallisto_setup.sh
Last active December 11, 2020 15:45
Kallisto setup
# Download Kallisto and the SRA Toolkit (the latter is needed to download reads from SRA)
wget https://github.com/pachterlab/kallisto/releases/download/v0.42.3/kallisto_mac-v0.42.3.tar.gz
tar zvxf kallisto_mac-v0.42.3.tar.gz
wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.5.2/sratoolkit.2.5.2-mac64.tar.gz
tar zxvf sratoolkit.2.5.2-mac64.tar.gz
# Download and merge human cDNA and ncRNA files from Ensembl for the index.
wget ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
wget ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz
cat Homo_sapiens.GRCh38.cdna.all.fa.gz Homo_sapiens.GRCh38.ncrna.fa.gz > Homo_sapiens.GRCh38.rna.fa.gz
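The `cat` step above joins two gzip files without decompressing them. This works because a gzip stream may contain several members back to back. A minimal Python sketch of that property, using made-up stand-in records rather than real Ensembl sequences:

```python
import gzip

# Two tiny stand-in FASTA records (hypothetical, not real Ensembl data).
cdna = b">ENST_demo_1 cdna\nACGT\n"
ncrna = b">ENST_demo_2 ncrna\nTTGA\n"

# Compress each one separately, then join the compressed bytes,
# which is what `cat a.gz b.gz > merged.gz` does on disk.
merged = gzip.compress(cdna) + gzip.compress(ncrna)

# gzip.decompress reads every member of a multi-member stream in order.
restored = gzip.decompress(merged)
print(restored.decode())
```

The decompressed result is the two records in their original order, so tools that read gzip input should see the merged file as a single FASTA.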
hussius / ae_toy_example.py
Last active June 28, 2019 17:12
Toy example of single-layer autoencoder in TensorFlow
import tensorflow as tf
import numpy as np
import math
#import pandas as pd
#import sys
input = np.array([[2.0, 1.0, 1.0, 2.0],
                  [-2.0, 1.0, -1.0, 2.0],
                  [0.0, 1.0, 0.0, 2.0],
                  [0.0, -1.0, 0.0, -2.0],
"""
* Converts images to GGB (grayscale)
* Creates subsets for training and validation
* Adds columns to indicate training or validation (useful for analysis of the deployed model)
* Adds rotated images to the dataset
* Creates comma-separated CSV file
* Creates zip archive
"""
import pandas as pd
1. Install the appropriate version of the TensorFlow (Python) framework from https://www.tensorflow.org/versions/r0.12/get_started/os_setup.html
In my case (Mac OS X 10.11), I did:
- Get the .whl file (this is more likely to work than a direct pip install)
wget https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.11.0-py3-none-any.whl
- Install using the non-Anaconda pip
/usr/local/bin/pip3 install tensorflow-0.11.0-py3-none-any.whl
hussius / preprocess_yeast_dna.py
Created June 14, 2018 12:17
Preprocess yeast DNA csv file from Genome Research paper
from pathlib import Path
import os
import sys
from fire import Fire
import numpy as np
import pandas as pd
from tqdm import tqdm
hussius / decode_cossmo_example.py
Created April 26, 2018 11:02 — forked from hannes-brt/decode_cossmo_example.py
Function to decode a COSSMO training example in tfrecord format
def read_single_cossmo_example(serialized_example, n_tissues=1, coord_sys='rna1'):
    """Decode a single COSSMO example
    coord_sys must be one of 'rna1' or 'dna0', if 'dna0' then an extra 'strand' field
    must exist in the tfrecord and is extracted.
    """
    assert coord_sys in ['dna0', 'rna1']
    context_features = {
hussius / sleuth_commands.R
Created September 14, 2015 07:52
Sleuth commands
# Installation (only needs to be done once)
source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
install.packages("devtools")
devtools::install_github("pachterlab/sleuth")
# Now load the package
library("sleuth")
# A function (borrowed from the Sleuth documentation) for connecting Ensembl transcript names to common gene names
library(tensorflow)
tf$reset_default_graph()
x_data <- runif(100, min=0, max=1)
y_data <- x_data * 0.1 + 0.3 + rnorm(100, mean=0, sd=0.025)
W <- tf$Variable(tf$random_uniform(shape(1L), -1.0, 1.0))
b <- tf$Variable(tf$zeros(shape(1L)))
y <- W * x_data + b
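The snippet above sets up a line fit on synthetic data with a true slope of 0.1 and a true intercept of 0.3. The preview is cut off before any training step, so the following is only a rough Python sketch of the same fit. It uses plain gradient descent on mean squared error with no TensorFlow. The learning rate, iteration count and random seed are assumptions and are not taken from the gist.

```python
import random

random.seed(0)  # assumed seed, only to make the run repeatable

n = 100
x_data = [random.uniform(0.0, 1.0) for _ in range(n)]
y_data = [0.1 * x + 0.3 + random.gauss(0.0, 0.025) for x in x_data]

# Same starting scheme as the snippet: random slope, zero intercept.
W = random.uniform(-1.0, 1.0)
b = 0.0
learning_rate = 0.5  # assumed value

for _ in range(2000):  # assumed iteration count
    errors = [W * x + b - y for x, y in zip(x_data, y_data)]
    # Gradients of the mean squared error with respect to W and b.
    grad_W = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
    grad_b = 2.0 * sum(errors) / n
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print(W, b)
```

The fitted values should land close to 0.1 and 0.3, with the remaining gap set by the noise term.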
hussius / sum_by_gene.py
Created November 17, 2016 09:39
Sum transcript TPMs by gene using the FASTA file used for a Kallisto index
import sys
import gzip
if len(sys.argv)<3:
    sys.exit("python sum_by_gene.py <cDNA FASTA file> <TPM table>")
ensg = {}
mapf = gzip.open(sys.argv[1])
ctr = 0
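The preview stops before the mapping and summing logic, so here is a minimal sketch of the idea in the gist title. It reads a transcript-to-gene map from the FASTA headers, then adds up TPMs per gene. It assumes Ensembl-style headers that carry a `gene:` field. All IDs and TPM values are made up, and the helper names `transcript_to_gene` and `sum_by_gene` are hypothetical, not the author's.

```python
import re
from collections import defaultdict

# Hypothetical FASTA lines in the Ensembl cDNA header style (IDs are made up).
fasta_lines = [
    ">ENST0001.1 cdna chromosome:GRCh38:1:100:200:1 gene:ENSG0001.1 gene_biotype:protein_coding",
    "ACGTACGT",
    ">ENST0002.1 cdna chromosome:GRCh38:1:300:400:1 gene:ENSG0001.1 gene_biotype:protein_coding",
    "TTGACCA",
    ">ENST0003.2 cdna chromosome:GRCh38:2:10:90:-1 gene:ENSG0002.4 gene_biotype:lincRNA",
    "GGGCCC",
]

# Hypothetical TPM values keyed by transcript ID.
tpm_by_transcript = {"ENST0001.1": 10.0, "ENST0002.1": 2.5, "ENST0003.2": 7.0}


def transcript_to_gene(lines):
    """Map each transcript ID to the gene ID named in its header line."""
    mapping = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        transcript_id = line[1:].split()[0]
        match = re.search(r"\bgene:(\S+)", line)
        if match:
            mapping[transcript_id] = match.group(1)
    return mapping


def sum_by_gene(mapping, tpms):
    """Add up transcript TPMs for each gene; unmapped transcripts are skipped."""
    totals = defaultdict(float)
    for transcript_id, tpm in tpms.items():
        gene_id = mapping.get(transcript_id)
        if gene_id is not None:
            totals[gene_id] += tpm
    return dict(totals)


gene_tpm = sum_by_gene(transcript_to_gene(fasta_lines), tpm_by_transcript)
print(gene_tpm)
```

For a real run the header lines would come from the gzipped FASTA (opened in text mode) and the TPM values from the table given as the second argument.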
hussius / merge_kallisto_TPM.R
Last active November 17, 2016 09:38
R script for merging Kallisto TPMs from output directories below path given as command-line argument
args = commandArgs(trailingOnly=TRUE)
path=args[1]
files=Sys.glob(paste0(path,"/*/abundance.tsv"))
#print(files)
merge_two <- function(x,y){
  #print(dim(x))
  if ("tpm" %in% colnames(x)){
    x_ <- x[,c(1,5)]
  }
  else{
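The R preview ends partway through `merge_two`, so the rest of the merge is not visible. As a rough illustration of the overall goal (one TPM column per sample, keyed by transcript), here is a Python sketch using only the standard library. It assumes the usual Kallisto `abundance.tsv` columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`), which matches the R code picking columns 1 and 5. The sample names and values are made up, and `read_tpm` and `merge_tpm` are hypothetical helper names.

```python
import csv
import io

# Hypothetical abundance.tsv contents for two samples (values are made up).
HEADER = "target_id\tlength\teff_length\test_counts\ttpm\n"
samples = {
    "sampleA": HEADER + "tx1\t1000\t850\t20\t4.0\ntx2\t500\t350\t5\t1.5\n",
    "sampleB": HEADER + "tx1\t1000\t850\t10\t2.0\ntx2\t500\t350\t0\t0.0\n",
}


def read_tpm(text):
    """Return {target_id: tpm} for one abundance table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


def merge_tpm(tables):
    """Combine per-sample tables into {target_id: {sample: tpm}}."""
    merged = {}
    for sample, text in tables.items():
        for target_id, tpm in read_tpm(text).items():
            merged.setdefault(target_id, {})[sample] = tpm
    return merged


merged = merge_tpm(samples)
for target_id, row in merged.items():
    print(target_id, row)
```

In practice the text for each sample would be read from the `abundance.tsv` file in that sample's output directory, with the directory name used as the sample label.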