@omaraflak
omaraflak / main.cpp
Last active March 23, 2024 14:44
Image convolution in C++ + Gaussian blur
#include <iostream>
#include <vector>
#include <cassert>
#include <cmath>
#include <png++/png.hpp>
using namespace std;
typedef vector<double> Array;
typedef vector<Array> Matrix;
@slowkow
slowkow / counts_to_tpm.R
Last active March 18, 2024 20:38
Convert read counts to transcripts per million (TPM).
#' Convert counts to transcripts per million (TPM).
#'
#' Convert a numeric matrix of features (rows) and conditions (columns) with
#' raw feature counts to transcripts per million.
#'
#' Lior Pachter. Models for transcript quantification from RNA-Seq.
#' arXiv:1104.3889v2
#'
#' Wagner, et al. Measurement of mRNA abundance using RNA-seq data:
#' RPKM measure is inconsistent among samples. Theory Biosci. 24 July 2012.
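The TPM conversion described in this gist's header can be sketched in a few lines of Python. This is only a minimal illustration of the usual definition for a single sample (divide each count by its feature length, then rescale so the rates sum to one million); it is not the gist's R code, and the function name `counts_to_tpm` and its arguments are hypothetical.

```python
def counts_to_tpm(counts, lengths):
    """Convert raw counts for one sample to transcripts per million.

    counts  -- read counts per feature
    lengths -- feature lengths in bases, in the same order as counts
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    # Reads per base for each feature.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Rescale so the values sum to one million.
    return [r / total * 1e6 for r in rates]
```

Because every sample is rescaled to the same total, TPM values sum to one million in each column of a count matrix, which is the property the Wagner et al. reference contrasts with RPKM.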
@stephenturner
stephenturner / deseq2-analysis-template.R
Created July 30, 2014 12:20
Template for analysis with DESeq2
## RNA-seq analysis with DESeq2
## Stephen Turner, @genetics_blog
# RNA-seq data from GSE52202
# http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse52202. All patients with
# ALS, 4 with C9 expansion ("exp"), 4 controls without expansion ("ctl")
# Import & pre-process ----------------------------------------------------
# Import data from featureCounts
@peterk87
peterk87 / parseSNPs.py
Created January 13, 2014 22:52
Python: Parse SNPs from one or more multiple sequence alignments in multifasta format and output a concatenated SNP fasta, a basic SNP report, and/or [binarized] SNP table.
import argparse
import textwrap
import os
import sys
from datetime import timedelta, datetime
# function for reading a multifasta file
# returns a dictionary with sequence headers and nucleotide sequences
def get_seqs_from_fasta(filepath):
@taoliu
taoliu / bin_chromosome.py
Created November 6, 2013 05:39
Script to bin chromosome
#!/usr/bin/env python
# Time-stamp: <2013-09-24 15:23:09 Tao Liu>
import os
import sys
# ------------------------------------
# Main function
# ------------------------------------
def main():
    if len( sys.argv ) < 4:
@johnstantongeddes
johnstantongeddes / RPKM-TPM.r
Created October 10, 2013 20:48
Script to compare calculation of Reads per Kilobase per Million mapped reads (RPKM) to Transcripts per Million (TPM) using example data from http://blog.nextgenetics.net/?e=51. Wagner et al. 2012 "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples" Theory Biosci. 131:281-285
# Script to compare Reads per Kilobase per Million mapped reads (RPKM) to Transcripts per Million (TPM) for gene expression count data
# Wagner et al. 2012 "Measurement of mRNA abundance using RNA-seq data: RPKM measure
# is inconsistent among samples" Theory Biosci. 131:281-285
library(plyr)
## Worked example from http://blog.nextgenetics.net/?e=51
X <- data.frame(gene=c("A","B","C","D","E"), count=c(80, 10, 6, 3, 1),
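The comparison this script sets up can be sketched in Python as follows. The counts are the five values shown in the preview above; the gene lengths are hypothetical placeholders (the preview is cut off before any lengths appear), and the function names `rpkm` and `tpm_from_rpkm` are illustrative, not taken from the gist.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of feature per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    # count / (length in kb) / (total reads in millions)
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths)]


def tpm_from_rpkm(rpkm_values):
    """Rescale RPKM values so that they sum to one million."""
    s = sum(rpkm_values)
    if s == 0:
        raise ValueError("all RPKM values are zero")
    return [v / s * 1e6 for v in rpkm_values]


counts = [80, 10, 6, 3, 1]                # counts for genes A-E from the preview
lengths = [1000, 2000, 500, 1500, 3000]   # hypothetical gene lengths in bases

rpkm_values = rpkm(counts, lengths)
tpm_values = tpm_from_rpkm(rpkm_values)
print(sum(rpkm_values), sum(tpm_values))
```

The sum of the RPKM values depends on the length distribution of the expressed genes, while the TPM values always sum to one million.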
@peterk87
peterk87 / get_snps_from_msas.py
Created May 9, 2013 16:34
Python: Get SNPs from MSAs
from Bio import SeqIO

# aln_files is assumed to be an iterable of paths to multi-FASTA alignments
aln_snps = {}
for aln in aln_files:
    recs = list(SeqIO.parse(aln, 'fasta'))
    # strain names should be the last dash-delimited element in the FASTA header
    strains = [rec.name.split('-')[-1] for rec in recs]
    # get a dictionary of strain names and sequences
    strain_seq = {rec.name.split('-')[-1]: str(rec.seq) for rec in recs}
    # get length of the MSA and check that all of the seqs are the same length
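The preview stops before the SNP detection itself. One plausible way to do the length check and find variable columns, given a dictionary like `strain_seq`, is sketched below. The helper name `find_snp_columns` is hypothetical and this is not the gist's code; in particular, it treats gap characters like any other symbol, which may or may not match the original behavior.

```python
def find_snp_columns(strain_seq):
    """Return 0-based alignment columns where the strains differ.

    strain_seq -- dict mapping strain name to its aligned sequence string
    """
    seqs = list(strain_seq.values())
    if not seqs:
        return []
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("sequences in an MSA must all have the same length")
    # A column is variable when it holds more than one distinct character.
    return [i for i in range(aln_len) if len({s[i] for s in seqs}) > 1]
```

For example, `find_snp_columns({"a": "ACGT", "b": "ACCT", "c": "ACGA"})` reports columns 2 and 3 as variable.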
@DarioS
DarioS / calcFPKMs.R
Created August 22, 2011 06:07
Convert feature counts into RPKM counts per gene
setGeneric("calcFPKMs", function(counts, ...) {standardGeneric("calcFPKMs")})
setMethod("calcFPKMs", c("GRanges"),
    function(counts, verbose = TRUE)
{
    counts.df <- as.data.frame(counts)
    counts.cols <- metadata(counts)[["counts.cols"]] + 5
    # Only use read counts from the known transcriptome.
    counts.df <- counts.df[counts.df[, "type"] %in% c("exon", "junction"), ]