Thiago Britto Borges tbrittoborges

@tbrittoborges
tbrittoborges / bioinfo_bits.py
Created February 22, 2016 12:02
python interesting bits of bioinformatics
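import operator
import random

sequence = "ACGACTGATCGATCGATCGATGCATCGATCGACGAT"
# pick 30 distinct random positions (the sequence has 36 bases)
random_positions = random.sample(range(len(sequence)), 30)
# itemgetter with several indices returns a tuple of the items at those indices
get_positions = operator.itemgetter(*random_positions)
get_positions(sequence)
# example output (varies between runs):
# ('T', 'C', 'G', 'C', 'A', 'C', 'C', 'T', 'A', 'T', 'G', 'T', 'A', 'T', 'C', 'C', 'T', 'T', 'A', 'G', 'T', 'A', 'A', 'A', 'C', 'G', 'G', 'C', 'G', 'A')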
from itertools import groupby
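One gotcha with this trick: operator.itemgetter returns a tuple only when it is given two or more indices. With a single index it returns the bare item, and with no indices it raises TypeError. A minimal sketch of a wrapper that always returns a tuple; the name pick_bases is hypothetical:

```python
import operator
import random


def pick_bases(sequence, positions):
    """Return the bases at `positions` as a tuple, even for 0 or 1 positions."""
    if not positions:
        return ()
    bases = operator.itemgetter(*positions)(sequence)
    # with a single index, itemgetter returns the bare item, not a 1-tuple
    return (bases,) if len(positions) == 1 else bases


sequence = "ACGACTGATCGATCGATCGATGCATCGATCGACGAT"
# sorting the sampled positions keeps the bases in sequence order
positions = sorted(random.sample(range(len(sequence)), 5))
print(positions, pick_bases(sequence, positions))
print(pick_bases(sequence, [0]))  # ('A',)
```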
tbrittoborges / pd_latex_table.py
Last active February 9, 2018 22:14
Pandas recipe for better latex tables
def better_table(table, caption, name):
    # doubled braces are literal braces in str.format; {} are the placeholders
    start = r"""
\begin{{table}}[!htb]
\sisetup{{round-mode=places, round-precision=2}}
\caption{{{}}}\label{{table:{}}}
\centering
""".format(caption, name)
    end = r"\end{table}"
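The preview stops before start and end are used. A minimal sketch of how the pieces could be joined, assuming the table argument is already LaTeX text (for example the tabular produced by df.to_latex()); wrap_latex_table is a hypothetical name, not the author's function:

```python
def wrap_latex_table(tabular, caption, name):
    """Wrap a LaTeX tabular string in a floating table environment."""
    # doubled braces are literal braces in str.format; {} are placeholders
    start = r"""
\begin{{table}}[!htb]
\sisetup{{round-mode=places, round-precision=2}}
\caption{{{}}}\label{{table:{}}}
\centering
""".format(caption, name)
    end = r"\end{table}"
    return start + tabular + end


# hypothetical tabular body, as a DataFrame.to_latex() call might return
tabular = "\\begin{tabular}{lr}\nA & 1.00 \\\\\n\\end{tabular}\n"
print(wrap_latex_table(tabular, "Read counts", "counts"))
```

Passing the tabular text in, rather than a DataFrame, keeps the wrapper independent of how the table body was rendered.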
tbrittoborges / unlistfy.py
Last active May 20, 2016 11:16
Pandas unlistfy one column
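import pandas as pd

df['new'] = df['new'].str.split('/')  # example: turn a column of strings into lists
values = df['new'].dropna()
# one column per list element; keep the original index so the join lines up
temp = pd.DataFrame(values.tolist(), index=values.index)
temp = temp.stack().dropna()  # long format; drop the padding from shorter lists
temp.index = temp.index.droplevel(1)  # index must match the original dataframe
temp.name = 'new_column'  # name of the new column in the original dataframe
df = df.join(temp)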
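Since pandas 0.25, DataFrame.explode gives each list element its own row in one step, repeating the original index, which is what the stack/droplevel/join steps build by hand. A minimal sketch on made-up toy data (the gene column is hypothetical); unlike the recipe, it replaces the list column instead of adding a new_column next to it:

```python
import pandas as pd

# toy data: one row has two slash-separated values, one is missing
df = pd.DataFrame({"gene": ["g1", "g2", "g3"],
                   "new": ["a/b", "c", None]})

# split into lists, then explode: one row per list element,
# with the original index repeated for each element
exploded = df.assign(new=df["new"].str.split("/")).explode("new")
print(exploded)
```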
tbrittoborges / example_flowdiagram.tex
Last active July 21, 2016 15:04
example of a flow diagram with latex tikzpicture
\documentclass{article}
\usepackage{tikz}
\usepackage{array}
\usepackage{siunitx}
\usetikzlibrary{shapes.geometric, shapes.misc, arrows, fit, calc}
\newcommand\addvmargin[1]{
\node[fit=(current bounding box),inner ysep=#1,inner xsep=0]{};
}
tbrittoborges / gist:f3a58425f5f5d5fbab747af5dc364d83
Created November 7, 2017 13:36
remove_r_installed_by_conda.sh
# run this in your bash command line
# list all r3 packages installed with conda:
conda list | grep r3 | awk '{print $1}'
# remove all r3 packages
for i in $(conda list | grep r3 | awk '{print $1}'); do conda remove -y "$i"; done
# finally, remove R
conda remove r-essentials
tbrittoborges / pandas_reverse_complement.py
Created December 22, 2017 12:26
Pandas reverse complement
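def reverse_complement(sequence):
    """Reverse complement of a DNA sequence; only uppercase A, C, G, T are complemented."""
    tab = str.maketrans("ACGT", "TGCA")
    return sequence.translate(tab)[::-1]


def apply_rc(row):
    """Reverse-complement row['seq'] when the feature is on the minus strand."""
    if row['strand'] == '-':
        row['seq'] = reverse_complement(row['seq'])
    return row

# usage: df = df.apply(apply_rc, axis=1)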
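Applying a function row by row is slow on large tables. A minimal sketch of a vectorized alternative that selects the minus-strand rows with a boolean mask and maps the reverse complement over just those; the translation table below also handles N and lowercase bases, which is an assumption beyond the original ACGT-only table:

```python
import pandas as pd

# assumption: also complement N and lowercase (soft-masked) bases
_RC_TABLE = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Reverse complement of a DNA string."""
    return sequence.translate(_RC_TABLE)[::-1]


df = pd.DataFrame({"seq": ["AACG", "TTGA", "acgN"],
                   "strand": ["+", "-", "-"]})

# only touch the minus-strand rows instead of applying row by row
minus = df["strand"] == "-"
df.loc[minus, "seq"] = df.loc[minus, "seq"].map(reverse_complement)
print(df)
```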
tbrittoborges / Junction_type_classification.py
Created January 22, 2018 12:58
Junction type classification
def junction_type2(row):
    """Junction type classification"""
    # if there are no exons supported by reliable junctions,
    # return an iterable of empty strings
    if row['exons_w_junct_sup'] is None:
        return ['', '']
    type_ = []
    # each row is a junction
    j_start, j_end, exons, strand = row.loc[[
tbrittoborges / dorina_example.py
Created February 8, 2018 21:41
code for the Carina website
# Signature of dorina.run.analyse:
# analyse(genome, set_a, match_a='any', region_a='any',
#         set_b=None, match_b='any', region_b='any',
#         combine='or', genes=None, window_a=-1, window_b=-1,
#         datadir=None)
# It takes the name of the genome assembly to use and at least a list of
# set A regulator names. A simple analysis run with a custom regulator:
from dorina.run import analyse
results = analyse('hg19', ['/path/to/custom/regulator.bed', 'PARCLIP_PUM2_hg19'])
tbrittoborges / mean_sd_read_length.sh
Created February 23, 2018 12:54
Calculates the mean and standard deviation of read length, adapted from https://www.biostars.org/p/243552/#243563
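for f in raw_reads{39..50}.fq.gz
do
    echo "$f"
    # FASTQ records span 4 lines; the sequence is line 2 of each record
    gzip -cd "$f" | awk 'BEGIN { t = 0.0; sq = 0.0; n = 0 }
        NR % 4 == 2 { n++; L = length($0); t += L; sq += L * L }
        END { m = t / n; printf("total %d avg=%f stddev=%f\n", n, m, sqrt(sq / n - m * m)) }'
done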
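The same single-pass computation (count, sum and sum of squares of the lengths of every second line of each 4-line record) can be sketched in Python. This assumes plain 4-line FASTQ records with no wrapped sequences and reports the population standard deviation, as the awk version does; read_length_stats and stats_for_file are hypothetical names:

```python
import gzip
import io
import math


def read_length_stats(handle):
    """Return (count, mean, population sd) of read lengths in a FASTQ stream."""
    n = total = sq = 0
    for i, line in enumerate(handle):
        # the sequence is the 2nd line of each 4-line FASTQ record
        if i % 4 == 1:
            length = len(line.rstrip("\r\n"))
            n += 1
            total += length
            sq += length * length
    if n == 0:
        raise ValueError("no reads found")
    mean = total / n
    # clamp at 0 to guard against tiny negative values from float rounding
    return n, mean, math.sqrt(max(0.0, sq / n - mean * mean))


def stats_for_file(path):
    # text mode so lines are str; reads .fq.gz files directly
    with gzip.open(path, "rt") as handle:
        return read_length_stats(handle)


fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n"
print(read_length_stats(io.StringIO(fastq)))  # (2, 5.0, 1.0)
```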
def read_fasta_from_str(fasta):
    """
    :param str fasta: multiple sequences in fasta string
    """
    from itertools import groupby

    def is_header(line):
        return line.startswith(">")
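The snippet stops after the header test. A minimal sketch of how a groupby-based FASTA parser can be finished, assuming well-formed input where each record is one header line followed by one or more sequence lines; parse_fasta_str and its dict return value are hypothetical, not the author's code:

```python
from itertools import groupby


def parse_fasta_str(fasta):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None  # sequence lines before any header end up under None
    lines = (line.strip() for line in fasta.splitlines() if line.strip())
    # groupby splits the lines into alternating runs of header / sequence lines
    for is_header, group in groupby(lines, key=lambda line: line.startswith(">")):
        if is_header:
            # consecutive headers with no sequence between them: keep the last
            header = list(group)[-1][1:]
        else:
            # multi-line sequences are joined into one string
            records[header] = "".join(group)
    return records


fasta = """>seq1 first
ACGT
ACGT
>seq2
TTTT
"""
print(parse_fasta_str(fasta))  # {'seq1 first': 'ACGTACGT', 'seq2': 'TTTT'}
```

Duplicate headers overwrite earlier records in this sketch; a list of (header, sequence) pairs would keep them all.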