André F. Rendeiro afrendeiro

@afrendeiro
afrendeiro / cutdiff_output_parser.py
Created Jun 18, 2013
Parses output from Cufflinks' cuffdiff, averaging two samples. Outputs tab-delimited files (log2-transformed and not).
View cutdiff_output_parser.py
#!/usr/bin/python
from sys import argv
import csv,StringIO
import math
expr = open(argv[1], 'r')
output = open(argv[1] + '.bed','w')
output_log = open(argv[1] + '_log2.bed','w')
@afrendeiro
afrendeiro / jaspar2fasta.pl
Created Aug 5, 2013
The tool reads the JASPAR description and all the files required to provide decent input to clover. jaspar2fasta takes the directory containing JASPAR's matrix_list.txt file as its only argument. The output should be redirected to a new file for subsequent use with clover.
View jaspar2fasta.pl
#!/usr/bin/perl -w
# Convert JASPAR matrices to fasta-like format
# Written by Martin C Frith
# I intend that anyone who finds this code useful be free to use,
# modify, or redistribute it without any restrictions
=head1 NAME
jaspar2fasta - conversion of JASPAR database release for use with clover
@afrendeiro
afrendeiro / ChIP_mapping_pipeline.sh
Last active Dec 20, 2015
Batch script to preprocess paired-end (PE) ChIP-seq read samples. Parallelization with GNU parallel.
View ChIP_mapping_pipeline.sh
#!/bin/bash
# Pipeline for PE samples
# paths and variables to change
RAW=/sysdev/s3/share/data/oikopleura/chip-seq/raw
MAPPED=/sysdev/s3/share/data/oikopleura/chip-seq/mapped
GENOMEREF=~/data/oikopleura/assembly/Oikopleura_reference_unmasked_v3.0.fa
CHRSIZES=~/data/oikopleura/assembly/Oikopleura_reference_chrSizes.tsv
@afrendeiro
afrendeiro / speedtouchkey.py
Created Dec 26, 2013
Reveal the likely WPA key for Thomson routers based on the network's SSID. Usage: speedtouchkey.py <SSID>
View speedtouchkey.py
#!/usr/bin/env python
#original: http://www.korokithakis.net/posts/thomsonspeedtouch-routers-and-wpa-keys/
#modified from: http://pastie.org/3108591
import sys
import hashlib
from binascii import hexlify, unhexlify
from itertools import product
from multiprocessing import Process
@afrendeiro
afrendeiro / annotateTSSs.sh
Created Jan 27, 2014
Takes a bed file with gene annotation ("annotationFile.bed") and makes another with TSS annotation ("annotationFile.TSSs.bed")
View annotateTSSs.sh
awk -v OFS='\t' '$6 == "+" {print $1, $2, $2+1, $4, $5, $6}' annotationFile.bed > tmp
awk -v OFS='\t' '$6 == "-" {print $1, $3, $3+1, $4, $5, $6}' annotationFile.bed >> tmp
bedtools sort -i tmp > annotationFile.TSSs.bed
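The same TSS extraction can be sketched in plain Python, which avoids the temporary file. This is a minimal sketch that mirrors the awk commands as written (start column for "+" genes, end column for "-" genes) and assumes a 6-column BED input; the function names `tss_record` and `annotate_tss` are hypothetical, and the final sort by chromosome then start stands in for `bedtools sort`.

```python
def tss_record(fields):
    """Return a 6-column BED record covering the 1-bp TSS of a gene record."""
    chrom, start, end, name, score, strand = fields[:6]
    # Plus strand: TSS is the start column; minus strand: the end column.
    pos = int(start) if strand == "+" else int(end)
    return [chrom, str(pos), str(pos + 1), name, score, strand]


def annotate_tss(lines):
    """Convert BED lines to TSS records, sorted by chromosome then start."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Like the awk filters, skip records without a "+" or "-" strand.
        if len(fields) < 6 or fields[5] not in ("+", "-"):
            continue
        records.append(tss_record(fields))
    records.sort(key=lambda r: (r[0], int(r[1])))
    return ["\t".join(r) for r in records]
```

Passing the lines of "annotationFile.bed" to `annotate_tss` and writing the result out, one record per line, would give the equivalent of "annotationFile.TSSs.bed".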
@afrendeiro
afrendeiro / gradeStuff.py
Last active Jan 4, 2016
Reads in a score (1-100) and prints out the corresponding character (grade). Done without the if control structure.
View gradeStuff.py
#Create a method that reads in a score (1-100) and prints out the corresponding character (grade).
#Assume the following grade assignment: 'A' = 100-81 points, 'B' = 80-61 points, 'C' = 60-41 points, 'D' = 40-21 points and 'E' = 20-1 points.
# Don't use the "if" control structure
score = 1
# range() excludes its upper bound, so each range ends one past the top score of the grade
scale = [range(81,101), range(61,81), range(41,61), range(21,41), range(1,21)]
grades = ["A", "B", "C", "D", "E"]
for grade in range(0, len(scale)):
    while score in scale[grade]:
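An alternative way to avoid `if` is to compute the grade's position arithmetically, since every grade spans exactly 20 points. This is a minimal sketch, not the gist's own approach; the names `GRADES` and `grade` are hypothetical, and it assumes the score is an integer in 1-100 (out-of-range scores are not handled).

```python
GRADES = ["A", "B", "C", "D", "E"]


def grade(score):
    """Map an integer score in 1-100 to a letter grade without using `if`."""
    # 100-81 -> index 0, 80-61 -> index 1, ..., 20-1 -> index 4
    return GRADES[(100 - score) // 20]
```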
@afrendeiro
afrendeiro / getChrSizesFromFasta.py
Last active Jan 4, 2016
Make a tab-delimited chromosome size file from a FASTA genome
View getChrSizesFromFasta.py
import csv
from Bio import SeqIO
fastagenome = "data/oikopleura/assembly/Oikopleura_reference_unmasked_v3.0.fa"
output = "data/oikopleura/assembly/Oikopleura_reference_chrSizes.tsv"
myfile = open(output, "wb")
spamwriter = csv.writer(myfile, delimiter='\t', quoting=csv.QUOTE_MINIMAL)
for seq_record in SeqIO.parse(fastagenome, "fasta"):
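For cases where Biopython is not installed, the same name/length table can be sketched with the standard library only. This is a minimal sketch under stated assumptions: the function names `chrom_sizes` and `format_tsv` are hypothetical, and the sequence name is taken as the first whitespace-delimited token of each header line.

```python
def chrom_sizes(fasta_lines):
    """Return (name, length) pairs for each record in FASTA-formatted lines."""
    sizes = []
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                sizes.append((name, length))
            name = (line[1:].split() or [""])[0]
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def format_tsv(sizes):
    """Render (name, length) pairs as tab-delimited text, one record per line."""
    return "".join("%s\t%d\n" % (name, length) for name, length in sizes)
```

Calling `chrom_sizes` on an open FASTA file handle and writing `format_tsv` of the result to disk would produce the chromosome sizes file.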
View empirical_pvalues.py
import numpy as np
import pandas as pd
class DifferentialRegions(object):
    """
    Compute two-tailed empirical p-value for difference between values of two variables.
    """
    def __init__(self, df, a, b, permutations=100, alpha=0.05, correct=True):
        super(DifferentialRegions, self).__init__()
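The preview stops before the computation itself, so the following is only a generic illustration of a two-tailed empirical p-value, not the class's actual method. It assumes an observed difference and a list of differences obtained under the null (for example from permutations); the function name `empirical_pvalue` is hypothetical, and the added pseudo-count of 1 is a common convention that keeps the p-value above zero, not something the gist confirms.

```python
def empirical_pvalue(observed, null_values):
    """Two-tailed empirical p-value of `observed` against a null distribution."""
    # Count null differences at least as extreme as the observed one, in either direction.
    n_extreme = sum(1 for v in null_values if abs(v) >= abs(observed))
    # The +1 in numerator and denominator treats the observed value as one more draw.
    return (n_extreme + 1.0) / (len(null_values) + 1.0)
```

With the default of 100 permutations, the smallest p-value this sketch can return is 1/101, which is one reason to raise the permutation count when a stricter threshold is needed.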
@afrendeiro
afrendeiro / mass_rename.sh
Created Feb 4, 2016
Mass rename files cheatsheet
View mass_rename.sh
# I just need to have these somewhere to remember them later
for F in `find . | grep -e 'CM[0-9]\{2,\}s'`
do
    # print the old and new names, then rename; quotes keep odd file names intact
    echo "$F" "$(echo "$F" | sed 's/CM\([0-9]\{2,\}\)s/CM\1-/g')"
    mv "$F" "$(echo "$F" | sed 's/CM\([0-9]\{2,\}\)s/CM\1-/g')"
done
for F in `find . | grep -e '_[1-2]_' | grep -v PBMC`
do
@afrendeiro
afrendeiro / ngs_101.md
Last active Feb 4, 2016
NGS for dummies
View ngs_101.md

Introduction to next-generation sequencing (NGS)

General workflow

The currently dominant technology for next-generation sequencing is Illumina sequencing. Other platforms cannot compete with its speed, price and output, and have therefore specialized in niche applications (not discussed here).

Nevertheless, no sequencing technology can simply start at one end of a chromosome and read continuously to the other end.

The approach therefore is: