Created Jun 18, 2013
Parses output from Cufflinks' cuffdiff, averaging two samples. Outputs tab-delimited files (log2-transformed and untransformed).
from sys import argv
import csv,StringIO
import math
expr = open(argv[1], 'r')
output = open(argv[1] + '.bed','w')
output_log = open(argv[1] + '_log2.bed','w')
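A rough sketch of the averaging step follows. It is not the gist's code. It assumes cuffdiff's `gene_exp.diff` layout, with a tab-delimited header containing `gene`, `locus` (`chrom:start-end`), `value_1` and `value_2`. The locus coordinates are copied as-is, so check them against your BED convention. The pseudocount added before the log2 transform is also an assumption.

```python
import csv
import io
import math


def cuffdiff_to_bed(lines, pseudocount=1.0):
    """Yield (plain, log2) BED-like rows from cuffdiff gene_exp.diff-style lines.

    Assumes columns 'gene', 'locus' (chrom:start-end), 'value_1', 'value_2';
    the pseudocount used before log2 is an assumption, not cuffdiff's.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # split "chr1:100-200" into chromosome and coordinates
        chrom, span = row["locus"].rsplit(":", 1)
        start, end = span.split("-")
        # average the two samples' expression values
        mean = (float(row["value_1"]) + float(row["value_2"])) / 2
        plain = [chrom, start, end, row["gene"], mean]
        logged = [chrom, start, end, row["gene"], math.log2(mean + pseudocount)]
        yield plain, logged


# usage with an in-memory example table
example = "test_id\tgene\tlocus\tvalue_1\tvalue_2\ng1\tA\tchr1:100-200\t1\t5\n"
for plain, logged in cuffdiff_to_bed(io.StringIO(example)):
    print("\t".join(map(str, plain)))
    print("\t".join(map(str, logged)))
```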
Created Aug 5, 2013
The tool reads the JASPAR matrix list and all the matrix files needed to produce proper input for clover. jaspar2fasta takes a single argument: the JASPAR directory containing matrix_list.txt. Redirect the output to a new file, which can then be used with clover.
#!/usr/bin/perl -w
# Convert JASPAR matrices to fasta-like format
# Written by Martin C Frith
# I intend that anyone who finds this code useful be free to use,
# modify, or redistribute it without any restrictions
=head1 NAME
jaspar2fasta - conversion of JASPAR database release for use with clover
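The core of such a conversion is a transpose, sketched below in Python. It is not the Perl script's logic. It assumes an older plain-number JASPAR count matrix, with four rows (A, C, G, T) and one column per position. It also assumes clover wants a `>name` header followed by one `A C G T` line per position. The function name is hypothetical.

```python
def pfm_to_fasta_like(name, pfm_text):
    """Transpose a 4-row count matrix (rows A, C, G, T) into a '>name'
    header followed by one tab-separated 'A C G T' line per motif position.

    Hypothetical helper; assumes plain whitespace-separated numbers.
    """
    rows = [line.split() for line in pfm_text.strip().splitlines()]
    if len(rows) != 4:
        raise ValueError("expected 4 rows (A, C, G, T)")
    out = [">" + name]
    # zip(*rows) turns the four base rows into per-position columns
    for column in zip(*rows):
        out.append("\t".join(column))
    return "\n".join(out)


print(pfm_to_fasta_like("MA0001", "1 0\n0 2\n3 0\n0 1"))
```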
Last active Dec 20, 2015
Batch script to preprocess paired-end (PE) ChIP-seq read samples, parallelized with GNU parallel.
# Pipeline for PE samples
# paths and variables to change
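A PE pipeline needs to match each read-1 file to its read-2 mate before dispatching per-sample jobs. The sketch below shows that pairing step. It assumes a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme, which is not taken from the gist, so adapt the pattern to your files. Each resulting pair could then be handed to GNU parallel or a process pool.

```python
import re


def pair_reads(filenames):
    """Group paired-end FASTQ names into {sample: (read1, read2)}.

    Assumes a hypothetical '<sample>_R1.fastq.gz' / '<sample>_R2.fastq.gz' scheme.
    """
    pattern = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")
    mates = {}
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            continue  # not a read file in the expected naming scheme
        mates.setdefault(match.group("sample"), {})[match.group("mate")] = name
    # keep only samples for which both mates were found
    return {s: (m["1"], m["2"]) for s, m in mates.items() if "1" in m and "2" in m}


print(pair_reads(["a_R1.fastq.gz", "a_R2.fastq.gz", "b_R1.fastq.gz", "notes.txt"]))
```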
Created Dec 26, 2013
Reveal the likely WPA key of a Thomson router from the network's SSID. Usage: <SSID>
#!/usr/bin/env python
#modified from:
import sys
import hashlib
from binascii import hexlify, unhexlify
from itertools import product
from multiprocessing import Process
Created Jan 27, 2014
Takes a bed file with gene annotation ("annotationFile.bed") and makes another with TSS annotation ("annotationFile.TSSs.bed")
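# + strand: the TSS is the first base, [start, start+1) in 0-based BED
awk -v OFS='\t' '$6 == "+" {print $1, $2, $2+1, $4, $5, $6}' annotationFile.bed > tmp
# - strand: the TSS is the last base; BED end is exclusive, so use [end-1, end)
awk -v OFS='\t' '$6 == "-" {print $1, $3-1, $3, $4, $5, $6}' annotationFile.bed >> tmp
bedtools sort -i tmp > annotationFile.TSSs.bed
rm tmp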
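The same TSS extraction can be sketched in Python, which makes the coordinate reasoning explicit. It assumes standard 0-based, half-open BED coordinates. On `+` the TSS is `start`. On `-` it is the last covered base, `end - 1`. The sort by chromosome then start is meant to resemble `bedtools sort`'s default order. The function names are illustrative only.

```python
def tss_record(fields):
    """Return a 1-bp BED record at the TSS of a 6-column BED gene record."""
    chrom, start, end, name, score, strand = fields[:6]
    start, end = int(start), int(end)
    if strand == "+":
        tss = start
    elif strand == "-":
        tss = end - 1  # end is exclusive, so the last base is end - 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return [chrom, tss, tss + 1, name, score, strand]


def tss_bed(lines):
    """Convert tab-delimited BED lines to sorted TSS records."""
    records = [tss_record(line.rstrip("\n").split("\t")) for line in lines if line.strip()]
    # sort by chromosome, then start position
    return sorted(records, key=lambda r: (r[0], r[1]))


for rec in tss_bed(["chr2\t10\t50\tg2\t0\t-\n", "chr1\t100\t200\tg1\t0\t+\n"]):
    print("\t".join(map(str, rec)))
```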
Last active Jan 4, 2016
Reads in a score (1-100) and prints out the corresponding character (grade). Done without the if control structure.
#Create a method that reads in a score (1-100) and prints out the corresponding character (grade).
#Assume the following grade assignment: 'A' = 100-81 points, 'B' = 80-61 points, 'C' = 60-41 points, 'D' = 40-21 points and 'E' = 20-1 points.
# Don't use the "if" control structure
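score = int(input("Score (1-100): "))
# range() excludes its stop value, so each stop is one past the top of the band
scale = [range(81, 101), range(61, 81), range(41, 61), range(21, 41), range(1, 21)]
grades = ["A", "B", "C", "D", "E"]
for grade in range(0, len(scale)):
    # a while loop that runs at most once stands in for "if"
    while score in scale[grade]:
        print(grades[grade])
        break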
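Another way to avoid `if`, not the gist's approach, is to compute the band index arithmetically. `(100 - score) // 20` gives 0 for 81-100, 1 for 61-80, and so on up to 4 for 1-20, so it can index straight into the grade list. This sketch assumes the input is already within 1-100 and does not validate it. For example, 101 would silently map to "E".

```python
GRADES = "ABCDE"


def grade(score):
    """Map a score in 1-100 to A-E using integer division instead of 'if'."""
    # 100..81 -> 0, 80..61 -> 1, 60..41 -> 2, 40..21 -> 3, 20..1 -> 4
    return GRADES[(100 - score) // 20]


for s in (100, 81, 80, 41, 21, 20, 1):
    print(s, grade(s))
```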
Last active Jan 4, 2016
Make a tab-delimited chromosome size file from a FASTA genome
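import csv
from Bio import SeqIO
fastagenome = "data/oikopleura/assembly/Oikopleura_reference_unmasked_v3.0.fa"
output = "data/oikopleura/assembly/Oikopleura_reference_chrSizes.tsv"
# text mode with newline="" is what the csv module expects in Python 3
with open(output, "w", newline="") as myfile:
    spamwriter = csv.writer(myfile, delimiter='\t', quoting=csv.QUOTE_MINIMAL)
    # one row per sequence: name and length
    for seq_record in SeqIO.parse(fastagenome, "fasta"):
        spamwriter.writerow([seq_record.id, len(seq_record)])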
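Where Biopython is not installed, sequence lengths can be computed with the standard library alone. In the sketch below, the function name is hypothetical. It mimics `record.id` by keeping only the first word of each header. It assumes every header line has at least one word after `>`.

```python
def fasta_sizes(lines):
    """Return [(name, length)] for each record in an iterable of FASTA lines."""
    sizes = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            # keep only the first word of the header, like Biopython's record.id
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


for name, size in fasta_sizes([">chr1 description", "ACGT", "AC", ">chr2", "A"]):
    print(f"{name}\t{size}")
```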
import numpy as np
import pandas as pd


class DifferentialRegions(object):
    """Compute two-tailed empirical p-value for difference between values of two variables."""

    def __init__(self, df, a, b, permutations=100, alpha=0.05, correct=True):
        super(DifferentialRegions, self).__init__()
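A two-tailed empirical p-value like the one described can be estimated with a label-permutation test. The sketch below is generic, not the class's implementation. It pools both groups, reshuffles them, and counts how often the absolute difference in means is at least as large as the observed one. The add-one correction keeps the estimate away from exactly zero. The function name and the difference-in-means statistic are assumptions. The class's `alpha` and `correct` options are not reproduced.

```python
import numpy as np


def empirical_pvalue(x, y, permutations=1000, seed=0):
    """Two-tailed permutation p-value for the difference in means of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    n = len(x)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(pooled)
        diff = perm[:n].mean() - perm[n:].mean()
        # two-tailed: count differences at least as extreme in either direction
        if abs(diff) >= abs(observed):
            hits += 1
    # add-one correction: the observed labelling counts as one permutation
    return (hits + 1) / (permutations + 1)


print(empirical_pvalue(range(20, 30), range(0, 10)))
```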
Created Feb 4, 2016
Mass rename files cheatsheet
# I just need to have these somewhere to remember them later
# rename e.g. CM12s... to CM12-...
for F in `find . | grep -e 'CM[0-9]\{2,\}s'`
do
    echo "$F" "$(echo "$F" | sed 's/CM\([0-9]\{2,\}\)s/CM\1-/g')"
    mv "$F" "$(echo "$F" | sed 's/CM\([0-9]\{2,\}\)s/CM\1-/g')"
done
for F in `find . | grep -e '_[1-2]_' | grep -v PBMC`
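A Python version of the first rename, with a dry-run mode, might look like the sketch below. It applies the same substitution as the `sed` expression (`CM` + two or more digits + `s` becomes `CM` + digits + `-`). One assumption differs from the shell loop: it rewrites only file names, not directory names. `rename_all` and its `dry_run` flag are hypothetical names.

```python
import re
from pathlib import Path

# same substitution as sed 's/CM\([0-9]\{2,\}\)s/CM\1-/g'
PATTERN = re.compile(r"CM([0-9]{2,})s")


def rename_all(root, dry_run=True):
    """Rename files under root whose names match PATTERN; return (old, new) pairs."""
    changes = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        new_name = PATTERN.sub(r"CM\1-", path.name)
        if new_name != path.name:
            target = path.with_name(new_name)
            changes.append((path, target))
            if not dry_run:
                path.rename(target)
    return changes


for old, new in rename_all("."):
    print(old, "->", new)
```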
Last active Feb 4, 2016
NGS for dummies

Introduction to next-generation sequencing (NGS)

General workflow

The most widely used next-generation sequencing technology today is Illumina sequencing. Other platforms cannot compete with its speed, price and output, so they have specialized in niche applications (not discussed here).

However, no sequencing technology can simply read a chromosome from one end to the other.

The approach therefore is: