@radaniba
radaniba / parsefasta.html
Created November 29, 2012 16:53
Parse fasta with Javascript
#I found this stuff interesting to share, more on http://code.google.com/p/bio-js/
#This is my attempt at a bioinformatics framework in JavaScript. I've noticed more and more bioinformatics interfaces going live online, and so we are in a kind of cloud-computing era in bioinformatics. For example, there are many sites out there which let a user paste FASTA sequences into a web form. Things like parsing FASTA files should be easy to do and should be object oriented. I have decided on using the existing framework PrototypeJS to facilitate the process and to ensure that the project is compatible with most operating systems and browsers.
#Example
<html>
<head>
<title>Bio-JS Test</title>
<script src='lib/prototype.js' type='text/javascript'></script>
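The idea of an easy, object-oriented FASTA parser can be illustrated with a minimal sketch. It is written in Python rather than the JavaScript/PrototypeJS used by the gist, and the names `FastaRecord` and `parse_fasta` are hypothetical: they are not part of bio-js.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class FastaRecord:
    """One FASTA entry: the header line (without '>') and its sequence."""
    header: str
    sequence: str

    @property
    def identifier(self) -> str:
        # By convention, the identifier is the first whitespace-delimited token.
        tokens = self.header.split()
        return tokens[0] if tokens else ""


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield FastaRecord objects from FASTA-formatted text (e.g. a pasted form field)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield FastaRecord(header, "".join(chunks))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated; text before the first header is ignored.
            chunks.append(line)
    if header is not None:
        yield FastaRecord(header, "".join(chunks))
```

Calling `list(parse_fasta(pasted_text))` on the contents of a web form field would give one object per sequence, each with `header`, `identifier` and `sequence` attributes.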
@radaniba
radaniba / pearsoncorrelation.pl
Created November 29, 2012 16:54
A small script to calculate the Pearson correlation
#!/usr/bin/perl
# Correlation
# Didier Gonze
# Updated: 28/4/2004
##########################################################################################
&ReadArguments;
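For readers who only see the truncated preview, the calculation the description refers to can be sketched as follows. This is a plain Python illustration of the standard Pearson formula, not a port of the Perl script; the function name `pearson` is an assumption.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")

    return cov / math.sqrt(var_x * var_y)
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one.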
@radaniba
radaniba / parseblast-anotherscript.pl
Created November 29, 2012 16:55
Perl script to parse BLAST output
# Variables:
# $NQuery - the number of query sequences
# $QueryHeader{$i} - the header line for query $i
# $QueryLength{$i} - the length of the query
# $Database{$i} - the database searched
# $DbSequences{$i} - the number of sequences in the database
# $DbLength{$i} - the number of residues in the database
# $Lambda{$i} - lambda factor
# $Kterm{$i} - K term
# $Information{$i} - expected information content of the alignment
@radaniba
radaniba / parsegb.rb
Created November 29, 2012 16:56
Parse Genbank with BioRuby
#You can parse GenBank files with BioRuby the standard way, but there's a hidden problem. If the file ends with blank lines, i.e. there are empty lines after the GenBank terminator (two forward slashes, //), BioRuby reads these as additional, empty records. However, you can work around this by trimming the blank lines before handing the file to the parser.
puts "Parsing seqs ..."
Bio::FlatFile.auto("foo.genbank").each_entry { |gb|
puts "Sequence '#{gb.to_biosequence.entry_id}'"
}
puts "Finished."
which will print the id of every sequence in the file. However, if the file ends with blank lines, i.e. there are empty lines after the GenBank terminator (two forward slashes, which the wiki markup doesn't like), BioRuby reads these as additional, empty records:
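The workaround mentioned in this gist's header comment, trimming blank lines before the text reaches a parser, could look roughly like this. The sketch is in Python and works on plain text only; it does not call BioRuby, and both function names are hypothetical.

```python
from typing import List


def trim_trailing_blank_lines(text: str) -> str:
    """Drop blank lines at the end of a flat file, keeping one final newline."""
    return text.rstrip() + "\n" if text.strip() else ""


def split_genbank_records(text: str) -> List[str]:
    """Split flat-file text into records, each ending with the '//' terminator.

    Blank lines between or after records are skipped, so trailing blank
    lines never produce an extra, empty record. Text after the final
    terminator that lacks its own '//' is discarded in this sketch.
    """
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "//":
            current.append(line)
            records.append("\n".join(current) + "\n")
            current = []
        elif line.strip() or current:
            # Keep blank lines inside a record, skip them outside one.
            current.append(line)
    return records
```

Either function can be applied to the file contents first; the cleaned text is then what gets handed to the actual GenBank parser.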
@radaniba
radaniba / colornodes.rb
Created November 29, 2012 16:58
Coloring Nodes on a Phylogeny
#An automation of a tedious task I have to do often: coloring the nodes on a phylogeny. This script takes a dendroscope tree file and a "color description" file, a simple csv file with taxa labels and a corresponding color. The color may either be an RGB triplet or a scalar value which will be mapped to a palette. Usage is:
#color-dendro.rb [options] CLRFILE TREEFILE1 [...]
#where the options are:
#-h, --help Display this screen
#-m, --default-color STR The default color nodes will be given
#--map-to-colors The coloring instructions give a float value which will be mapped to a color
#--save STR
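The "scalar value mapped to a palette" step can be pictured with a small sketch. It is in Python, not the gist's Ruby, and it assumes a simple two-color linear gradient with min/max normalization; the palette the script actually uses is not shown in the preview, and the function names are hypothetical.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def lerp_color(low: RGB, high: RGB, t: float) -> RGB:
    """Linearly interpolate between two RGB colors; t is clamped to [0, 1]."""
    t = min(1.0, max(0.0, t))
    r, g, b = (round(a + (c - a) * t) for a, c in zip(low, high))
    return (r, g, b)


def map_values_to_colors(
    values: Dict[str, float],
    low: RGB = (0, 0, 255),    # assumed color for the smallest value
    high: RGB = (255, 0, 0),   # assumed color for the largest value
) -> Dict[str, RGB]:
    """Map each taxon's scalar value to a color on a low-to-high gradient."""
    if not values:
        return {}
    smallest = min(values.values())
    span = max(values.values()) - smallest
    return {
        label: lerp_color(low, high, (v - smallest) / span if span else 0.0)
        for label, v in values.items()
    }
```

With taxa labels as keys, the result is a label-to-RGB table of the same shape as a color description file that already contains RGB triplets.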
@radaniba
radaniba / parsetree.py
Created November 29, 2012 16:59
A sample script using the ETE package to detect shift in internal nodes where a significant change in enrichment values (could be anything) happens
#A sample script using the ETE package to detect shift in internal nodes where a significant change in enrichment values (could be anything) happens
#An example (using the scipy python module to perform a K-S test):
from scipy import stats
from ete2 import Tree
newick = "((((A, B)edge1, C)edge2, ((D, E)edge3, F)edge4)edge5, (((G, H)edge6, I)edge7, ((J, K)edge8, L)edge9)edge10)RootEdge;"
@radaniba
radaniba / multifasta.cs
Created November 29, 2012 17:00
Multifasta Parser is a new parser for FASTA files. It basically allows you to extract FASTA sequences from a multi-FASTA file.
/*
to compile:
$gmcs multifasta-parser.cs -out:multifasta-parser
to run:
$mono multifasta-parser [/path/multifasta-file]
*/
using System;
using System.IO;
@radaniba
radaniba / convertformat.py
Created November 29, 2012 17:00
A simple Python script to convert biosequences between different formats.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Convert biosequences from one format to another.
Usage is: convbioseq [options] FORMAT INFILES ...
Options:
--version show program's version number and exit
@radaniba
radaniba / getseqbyid.rb
Created November 29, 2012 17:01
A simple script to download sequences by accession, via BioRuby. (Like much of BioRuby, finding a relevant example of how to do something can often be difficult.) It can accept accession ids on the command line or from a piped file (one accession per line).
#!/usr/bin/env ruby
# download sequences from db by id
### IMPORTS
require 'bio'
require 'ostruct'
require 'timeout'
require 'pp'
require 'test/unit/assertions'
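The input handling described above (ids as arguments, or one per line from a pipe) can be sketched independently of the download step. This is a Python illustration, not the gist's Ruby code; `read_accessions` is a hypothetical name, and the rule that arguments take precedence over piped input is an assumption.

```python
from typing import Iterable, List, Sequence


def read_accessions(argv: Sequence[str], piped_lines: Iterable[str] = ()) -> List[str]:
    """Return accession ids from the argument list, or else from piped lines.

    Arguments win if any are given (an assumption of this sketch);
    otherwise each non-blank input line is treated as one accession.
    """
    source = argv if argv else piped_lines
    accessions: List[str] = []
    for raw in source:
        acc = raw.strip()
        if acc:  # skip blank lines and empty arguments
            accessions.append(acc)
    return accessions
```

In a command-line script this would typically be called as `read_accessions(sys.argv[1:], sys.stdin)`, with the resulting list passed on to whatever performs the database lookup.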
@radaniba
radaniba / alignment.rb
Created November 29, 2012 17:02
In which we explore the ill-defined and undocumented: In BioRuby, alignments are equipped with several methods for obtaining consensus sequences. Unfortunately, these have terse descriptions which point you at the BioPerl documentation, with the added bon
# For demonstration purposes, let's create a very simple alignment, where
# everything agrees except the last sequence which leads with a differing
# character and ends with a gap:
require 'bio'
aln = Bio::Alignment.new(['acgt', 'acgt', 'acgt', 'ccg-'])
# consensus_iupac produces a "true" consensus sequence across all members.
# If sequences differ, the consensus sequence has an ambiguous character
# that sums these differences:
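To make the idea of an ambiguity-code consensus concrete, here is a minimal sketch in Python. It uses the standard IUPAC nucleotide codes, but it is not BioRuby's implementation: the function name is hypothetical, and the choice to ignore gaps when summarizing a column is an assumption, so BioRuby's `consensus_iupac` may treat the gapped column differently.

```python
from typing import Sequence

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("a"): "a", frozenset("c"): "c",
    frozenset("g"): "g", frozenset("t"): "t",
    frozenset("ag"): "r", frozenset("ct"): "y",
    frozenset("cg"): "s", frozenset("at"): "w",
    frozenset("gt"): "k", frozenset("ac"): "m",
    frozenset("cgt"): "b", frozenset("agt"): "d",
    frozenset("act"): "h", frozenset("acg"): "v",
    frozenset("acgt"): "n",
}


def iupac_consensus(seqs: Sequence[str], gap_chars: str = "-.", missing: str = "?") -> str:
    """Column-wise consensus using IUPAC ambiguity codes.

    Gaps are ignored within a column (an assumption of this sketch);
    a column with only gaps or with unrecognized characters yields `missing`.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*seqs):
        bases = frozenset(ch.lower() for ch in column if ch not in gap_chars)
        consensus.append(IUPAC.get(bases, missing))
    return "".join(consensus)
```

Under these assumptions, the four-sequence alignment above collapses its first column (a and c) to the ambiguity code `m`, while the columns where every sequence agrees keep their base.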