Nathan S. Watson-Haigh nathanhaigh

  • Alkahest Inc.
  • Adelaide, Australia
  • X @watsonhaigh
@nathanhaigh
nathanhaigh / mlFASTA2slFASTA.sh
Created November 14, 2014 04:09
Convert a FASTA file containing multi-line (wrapped) sequences so that each sequence occupies a single line
#!/bin/bash
# Convert a multi-line FASTA file into a single line FASTA file
# These are easier/faster to process using native UNIX tools like paste - - < in.fasta
dos2unix | awk 'BEGIN { RS = "\n>"; FS = "\n"; OFS = "" };
{
if (NR == 1) {
print $1
} else
if (NR > 1) {
print ">"$1
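The preview above is cut off by the listing. As a minimal sketch of the same unwrapping idea, and not the author's full script, the following joins wrapped sequence lines with plain awk. The function name `unwrap_fasta` is hypothetical, and stripping trailing carriage returns inside awk is an assumption that stands in for the `dos2unix` step.

```shell
#!/bin/bash
# Hypothetical helper: unwrap multi-line FASTA records read from STDIN.
unwrap_fasta() {
  awk '
    { sub(/\r$/, "") }                  # drop DOS line endings (stands in for dos2unix)
    /^>/ {                              # header line: flush the previous sequence first
      if (seq != "") print seq
      seq = ""
      print
      next
    }
    { seq = seq $0 }                    # sequence line: append to the current record
    END { if (seq != "") print seq }    # flush the final record
  '
}

# Made-up two-record input to illustrate
printf '>a\nAC\nGT\n>b\nTT\n' | unwrap_fasta
```

Once every record is exactly two lines, `paste - -` puts each header and its sequence on one tab-separated row.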
nathanhaigh / asqg2dot.pl
Created August 27, 2012 03:39
Convert asqg formatted graph into dot format
#!/usr/bin/perl
# Usage: gunzip -c graph.asqg.gz | asqg2dot.pl ['<node_label_regex>'] > graph.dot
use strict;
use warnings;
my $bidirected = 1;
my $highlight_vertex_regex = shift @ARGV || '';
my %vertex_labels;
nathanhaigh / quality_value_summary.sh
Last active October 10, 2015 19:38
This script takes FASTQ formatted sequences on STDIN and computes the number of occurrences of each quality character. Useful for determining which quality encoding the FASTQ file might be using.
#!/bin/bash
# Usage: quality_value_summary.sh < in.fastq > quality_stats.txt
# Inspired by Torsten Seemann's blog post:
# http://thegenomefactory.blogspot.com.au/2012/05/cool-use-of-unix-paste-with-ngs.html
# Inspired by:
# http://www.unix.com/shell-programming-scripting/37305-how-convert-hex-value-dec.html
paste - - - - | \
cut -f 4 | \
xxd -g 1 -c 1 -p | \
grep -v '0a' | \
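The preview stops mid-pipeline. As a rough illustration of the same counting idea, and not the author's implementation, the sketch below tallies quality characters with awk instead of `xxd`. The function name `quality_char_counts` is hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines).

```shell
#!/bin/bash
# Hypothetical helper: count each quality character in FASTQ read from STDIN.
# Assumes every record is exactly four lines, so the quality string is each 4th line.
quality_char_counts() {
  awk 'NR % 4 == 0 {
         n = length($0)
         for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
       }
       END { for (c in count) print c "\t" count[c] }' | LC_ALL=C sort
}

# Made-up single read to illustrate
printf '@r1\nACG\n+\nII#\n' | quality_char_counts
```

Sorting in the C locale lists characters in ASCII order, so the lowest character seen appears first. Characters below ASCII 59 (`;`) point to a Phred+33 encoding, since the +64 encodings do not use them.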
nathanhaigh / my.mates
Created October 16, 2012 22:37
Add read groups to SAM file using AMOS-style mates file
library Illimina_pe 200 500 ^(AGRF).*$
# Not sure if the following single-ended "library" is valid for AMOS:
library 454_se 0 0 ^(GKUZBF401|GKUZBF402|GK7W4KZ01|GK7W4KZ02|GK538RQ01|GK538RQ02|GK0HZXL01|GK0HZXL02|F196STF01|F196STF02|F4S7VAM01|F4S7VAM02|F5ZSAHT01|F5ZSAHT02|F6J8SXW01|F6J8SXW02).*$
library 454_3kb 1000 5000 ^(GK33D5G01|GK33D5G02|GK9RFDF01|GK9RFDF02|GLBLI5G01|GLBLI5G02|GLDJ2JW01|GLKV9FY01).*$
library 454_20kb 8000 32000 ^(F5XZBDV01|F5XZBDV02).*$
pair ^(.+?)_left.*$ ^(.+?)_right.*$
pair ^(.+?)/1.*$ ^(.+?)/2.*$
nathanhaigh / ace2fas.pl
Created October 17, 2012 02:17
Convert contigs of an ACE file into a FASTA file.
#!/usr/bin/perl
#
# Convert contigs of an ACE file into a FASTA file.
# Usage: ace2fas.pl < my.ace > my.fas
#
use strict;
use warnings;
while(<>){
next until /^CO\s+(\S+).+?$/; # read ACE file till next contig starts
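The Perl preview ends just after it finds a contig (`CO`) line. A minimal awk sketch of the same conversion follows; it is not the author's script. It assumes the usual ACE layout, where the consensus sequence follows the `CO` line and ends at the first blank line, and it assumes pad characters (`*`) should be dropped. The function name `ace2fasta` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract contig consensus sequences from ACE on STDIN.
ace2fasta() {
  awk '
    $1 == "CO" { print ">" $2; in_seq = 1; next }   # contig header: name is field 2
    in_seq && NF == 0 { in_seq = 0; next }          # blank line ends the consensus
    in_seq { gsub(/\*/, ""); print }                # assumption: strip "*" pads
  '
}

# Made-up ACE fragment to illustrate
printf 'AS 1 1\n\nCO Contig1 7 1 1 U\nACG*T\nGGA\n\nBQ\n 20 20\n' | ace2fasta
```

Keeping the pads instead only requires removing the `gsub` call.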
nathanhaigh / mira_install.sh
Created October 18, 2012 00:41
Download, build and install MIRA assembler and dependencies for regular, non-sudo user.
#!/bin/bash
#
# NOTE: This may not entirely work! I couldn't quite get this working on my CentOS release 5.6 (Final) box
#
# References: http://www.freelists.org/post/mira_talk/header-file-missing-in-my-configlog,7
# https://svn.boost.org/trac/boost/ticket/5917
# http://permalink.gmane.org/gmane.comp.lib.boost.user/69898
####################
# Specify versions #
nathanhaigh / sspace_evidence2agp.pl
Created October 29, 2012 12:22
Convert an SSPACE evidence file into an AGP 2.0 file
#!/usr/bin/perl
#
# Usage: sspace_evidence2agp.pl formattedcontigs.fasta < final.evidence > out.agp 2> out.stderr
# e.g. sspace_evidence2agp.pl intermediate_results/standard_output.formattedcontigs.fasta < standard_output.final.evidence > standard_output.agp 2> standard_output.agp.stderr
#
# What this script does:
# 1) Uses the *.final.evidence file created by SSPACE to generate an AGP v2.0 file.
# 2) Uses information in the *.formattedcontigs.fasta file to recover the original contig
# names.
# 3) Non-positive length gaps are output as component_type=U and gap length 100, as per the
nathanhaigh / ACAD.sh
Created November 3, 2012 13:39
Script to deploy the ACAD workshop to the NGSTrainingV1.2 image.
#!/bin/bash
export WORKSHOP_NAME="ACAD"
export PARENT_DIR="/mnt"
export SUDO_USER="ubuntu"
export TRAINEE_USER="ngstrainee"
export TMPDIR=/mnt/tmp
# set the timezone
TZ="Australia/Adelaide"
echo "$TZ" > /etc/timezone
nathanhaigh / NXServer.sh
Created November 3, 2012 13:37
Script to build an Ubuntu machine with the FreeNX server for a remote desktop-like connection. Can be used as "user data" to be passed to cloud-init in cloud environments.
#!/bin/bash
apt-get update
apt-get -y dist-upgrade
apt-get -y install ubuntu-desktop gnome-session-fallback python-software-properties
add-apt-repository -y ppa:freenx-team
apt-get update
apt-get -y install freenx
cd /tmp
wget https://bugs.launchpad.net/freenx-server/+bug/576359/+attachment/1378450/+files/nxsetup.tar.gz
tar xzf nxsetup.tar.gz
nathanhaigh / fastq2fasta.sh
Last active December 10, 2015 22:08
Converts FASTQ to FASTA
#!/bin/bash
# Usage: fastq2fasta.sh < in.fastq > out.fasta
# Taken from http://stackoverflow.com/a/10359425/1413849 and happens to
# be able to convert 10 million reads in 14 seconds!!
# The fastest converter I've seen!
sed -n '1~4s/^@/>/p;2~4p'
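To make the one-liner easier to read, here it is wrapped in a function with each address explained. The function name `fastq2fasta` and the sample reads are made up for illustration. The `first~step` address form is a GNU sed extension, so this sketch assumes GNU sed and strict four-line FASTQ records.

```shell
#!/bin/bash
# Hypothetical wrapper around the one-liner above (GNU sed only).
fastq2fasta() {
  # -n      : print nothing unless a command asks for it
  # 1~4 ... : on lines 1, 5, 9, ... (headers) replace the leading @ with > and print
  # 2~4p    : on lines 2, 6, 10, ... (sequences) print the line unchanged
  sed -n '1~4s/^@/>/p;2~4p'
}

# Two made-up reads to illustrate
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | fastq2fasta
```

Because lines are selected by position rather than by content, a quality line that happens to start with `@` is never mistaken for a header.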