Skip to content

Instantly share code, notes, and snippets.

@radaniba
radaniba / read_and_convert_phylo_trees.py
Created February 14, 2013 17:47
Ete2 is a mighty useful toolkit for working with phylogenies. Now, one of the problems of phylogenetic data is the variety of formats it can come in, some programs needing one rather than the other. Ete2 is fairly good in this regard is that it can read and write Newick and PhyloXML (and NeXML). However, it cannot inter-convert between the two, …
import ete2
def tree_to_phyloxml (ete_tree):
"""
Convert an Ete2 tree to PhyloXML.
:Parameters:
ete_tree
An Ete2 format tree
@radaniba
radaniba / largedata.py
Created November 29, 2012 17:25
Object-oriented approach to handle very large (i.e., genomic) multi-sequence fasta files.
"""Object-oriented approach to handle very large (i.e., genomic) multi-sequence fasta files.
LFastaSeq is the central class of this module providing nucleotide-based access to individual sequence entries as if those entries were separate objects in memory.
Example for use in a command line pipe: varlist_generator | gedit <seq_file> > out_file
This writes a new sequence to <out_file> with nucleotide changes incorporated as specified by the list of variations <varlist>."""
import sys
import IOhelper
@radaniba
radaniba / readexcel.rb
Created November 29, 2012 17:25
Excel Spreadsheet Reader with Ruby
### IMPORTS
require 'roo'
require 'pp'
### IMPLEMENTATION ###
# A Excel spreadsheet reader that can clean up column names and convert data.
#
# Assumptions: The data is read off the first sheet of the workbook. The sheet
@radaniba
radaniba / bulk-bigwig.pl
Created November 29, 2012 17:23
This script calls the program bigWigSummary to extract enrichment from a bunch of bigWig files (they can be a lot of Chip-Seq experiments for example) for a set of intervals stored in a bed file. You need to files to run the script, a bed file containing
# author : Radhouane Aniba
#!/usr/bin/perl -w
@files = <FOLDER_FOR_BigWigs/*.bigWig>;
# For this example we have 4 bigwig files
open(BEDFILE, "FILENAME");
while(<BEDFILE>)
@radaniba
radaniba / ambigous.rb
Created November 29, 2012 17:22
Some times you need sequences that are unambiguous (i.e. only 'ACGT', lacking gaps). This script reads in a sequence file of any format, reports the location and nature of ambiguous characters, and optionally corrects these from a consensus sequence and s
checkseqs [options] INFILE [INFILE, INFILE ...]
where options are:
--repair-with-conc: patch ambiguous characters with the consensus sequence and save
--overwrite: newly created files can write over pre-existing ones
Consensus is calculated on a 50% threshold.
#!/usr/bin/env ruby
@radaniba
radaniba / fulltextsearch.sql
Created November 29, 2012 17:21
Mysql Fulltext search
# CREATE a TABLE TO USE FOR testing fulltext INDEX AND full-text search
DROP TABLE IF EXISTS test;
CREATE TABLE test (
id int UNSIGNED PRIMARY KEY AUTO_INCREMENT,
title char(100) NOT NULL,
article text NOT NULL,
fulltext (article) # CREATE the fulltext INDEX ON FIELD article
)engine=myisam; # Fulltext INDEX only works WITH MyISAM TABLES, which IS usually DEFAULT
# A way TO CREATE a fulltext INDEX ON an already existing TABLE
@radaniba
radaniba / format.pl
Created November 29, 2012 17:20
This is an example on how the Perl formatter works.
#!/usr/bin/perl
# This script takes something like this as input:
# ID WorkId Date
# 1 2 12-10-2009
# 2 16 01-01-2008</pre>
# And prints out this:
#+-----+----------+---------------+
#|ID |WorkId |Date |
@radaniba
radaniba / exon-intron.pl
Created November 29, 2012 17:20
Is there any signal in the intron of genes that could guide the mRNA splicing? This could be kind of a old question, but let's just start with it. So to find out the answer, the first thing you need is the sequence flanking the exon-intron border, or mayb
use Bio::EnsEMBL::Registry;
use Bio::SeqIO;
use Bio::Seq;
my $sp = "mouse";
my $ensembl = "Bio::EnsEMBL::Registry";
$ensembl->load_registry_from_db(
-host => 'ensembldb.ensembl.org',
-user => 'anonymous',
-verbose => '1'
@radaniba
radaniba / conservation2.pl
Created November 29, 2012 17:19
Get everything working, extract line by line in BED file, run hgWiggle and calculate conserved bases and conserved regions and output into bed file
#!/usr/bin/perl -w
my $numb = 1;
open(RefGenes, "< ../RefGenes.bed");
while(<RefGenes>)
{
print("$numb\r");
@radaniba
radaniba / conservedbases.R
Created November 29, 2012 17:18
Count the conserved bases and conserved region within a genomic region from phastCons44way UCSC track
cmd_args <- commandArgs(TRUE)
chr <- cmd_args[1]
cbeg <- cmd_args[2]
cend <- cmd_args[3]
tmp <- read.table("tmp",header=F)
nbases <- length(which(tmp$V1 != 0))
vec <- tmp$V1
i <- 1