Michael Barton michaelbarton

Continuous, reproducible genome assembler benchmarking

New bioinformatics software is constantly being produced and published. This steady stream of new developments makes it difficult to keep track of the software available for common bioinformatics tasks. Genome assembly is one example: a large number of assemblers already exist.

If you are researching which bioinformatics software to use, it can be difficult to understand how effective each tool is. For example, given a new

import System.Environment
import Data.List.Split
import Data.List
import qualified Data.Map.Strict as Map
main = do
  [f] <- getArgs
  contents <- readFile f
  putStr . unlines . map unwords . map (kmers 6) . getSequences $ contents
require 'open-uri'
require 'rubygems'
require 'hpricot'
ARGV.each do |gi|
  # Fetch the URL and pass the contents to Hpricot
  url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=#{gi}&rettype=fasta&retmode=xml"
  doc = Hpricot.XML(open(url).read)
michaelbarton / specification.mkd
Last active August 29, 2015 14:11
Specification for bioinformatics containers

Introduction

Purpose

The purpose of this document is to provide a detailed specification for developers writing community-standardised bioinformatics containers. The intended audience is bioinformaticians writing bioinformatics code that can be shared interchangeably using Linux containers. This document describes the interfaces that a developer may expect to be available when the container is run.

bind-key C-a last-window
bind-key C-c run-shell "tmux show-buffer | reattach-to-user-namespace pbcopy"
bind-key C-o rotate-window
bind-key C-v run-shell "reattach-to-user-namespace pbpaste | tmux load-buffer - && tmux paste-buffer"
bind-key C-z suspend-client
bind-key Space copy-mode
bind-key ! break-pane
bind-key " split-window
bind-key # list-buffers
bind-key $ command-prompt -I #S "rename-session '%%'"
library(reshape)
library(lattice)
#
# Categorical example
#
data1 <- read.csv(file="plots/categorical.csv")
# Reshape the data into long format
++ mktemp -d
+ TMP_DIR=/tmp/tmp.5CiDif8NjP
+ cd /tmp/tmp.5CiDif8NjP
+ PROC=default
++ cut -f 2 -d :
++ egrep '^default:' /Procfile
+ CMD=' gatb -p /inputs/reads.fq.gz --multik -o genome '
+ [[ -z gatb -p /inputs/reads.fq.gz --multik -o genome ]]
+ eval gatb -p /inputs/reads.fq.gz --multik -o genome
++ mktemp -d
+ TMP_DIR=/tmp/tmp.jL0YC9HdH6
+ cd /tmp/tmp.jL0YC9HdH6
+ PROC=default
++ cut -f 2 -d :
++ egrep '^default:' /Procfile
+ CMD=' minia -in /inputs/reads.fq.gz -kmer-size 55 -abundance 5 -out genome'
+ [[ -z minia -in /inputs/reads.fq.gz -kmer-size 55 -abundance 5 -out genome ]]
+ eval minia -in /inputs/reads.fq.gz -kmer-size 55 -abundance 5 -out genome
++ minia -in /inputs/reads.fq.gz -kmer-size 55 -abundance 5 -out genome
#!/usr/bin/env Rscript
# Assume that the libraries are installed in "../vendor/r/"
# Adjust this path to the relative path between the script
# and the packrat directory
set_package_environment <- function(args){
  dir <- dirname(sub("--file=", "", args[grep("--file", args)]))
  libs <- file.path(dir, "..", "vendor", "r", "packrat", "lib", "*", "*")
  .libPaths(c(.libPaths(), libs))
# Prompt with just directory basename
PS1="\W $ "
# Colourise ls output and sort by modification time
alias ls='ls -l -G -t'