by Camille Scott
v1.0.dev0, 2016
#!/bin/bash
# Usage: deinterleave_fastq.sh < interleaved.fastq f.fastq r.fastq [compress]
#
# Deinterleaves a FASTQ file of paired reads into two FASTQ
# files specified on the command line. Optionally GZip compresses the output
# FASTQ files using pigz if the 3rd command line argument is the word "compress"
#
# Can deinterleave 100 million paired reads (200 million total
# reads; a 43Gbyte file), in memory (/dev/shm), in 4m15s (255s)
#
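The deinterleaving step described in the header above can be sketched in Python as follows. This is only a minimal illustration, not the script's own implementation (which is not shown here). It assumes strict 4-line FASTQ records with the two mates of each pair alternating, and the function name `deinterleave` is hypothetical.

```python
import io
from itertools import islice


def deinterleave(interleaved, forward, reverse):
    """Split interleaved 4-line FASTQ records into forward and reverse streams.

    Assumes records strictly alternate: forward mate, reverse mate, ...
    Returns the number of read pairs written.
    """
    pairs = 0
    while True:
        # One pair is 8 lines: 4 for the forward read, 4 for the reverse read.
        chunk = list(islice(interleaved, 8))
        if not chunk:
            break
        if len(chunk) != 8:
            raise ValueError("input ended in the middle of a read pair")
        forward.writelines(chunk[:4])
        reverse.writelines(chunk[4:])
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Tiny in-memory demo with a single read pair.
    demo = io.StringIO(
        "@r1/1\nACGT\n+\nIIII\n"
        "@r1/2\nTGCA\n+\nIIII\n"
    )
    fwd, rev = io.StringIO(), io.StringIO()
    print(deinterleave(demo, fwd, rev))
```

Any line-iterable input and writable outputs will do, so `sys.stdin` and two files opened for writing could stand in for the in-memory buffers.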
#!/usr/bin/perl
my $cmd = 'strace -e trace=mmap,munmap,brk ';
for my $arg (@ARGV) {
    # Escape embedded single quotes so each argument survives the shell intact.
    $arg =~ s/'/'\\''/g;
    $cmd .= " '$arg'";
}
# Send strace's stderr output into the pipe and discard the command's stdout.
$cmd .= ' 2>&1 >/dev/null';
open( PIPE, "$cmd|" ) or die "Cannot execute command \"$cmd\"\n";
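The quoting loop above wraps each argument in single quotes and rewrites every embedded single quote as `'\''`, so the shell sees each argument as one literal word. A minimal Python sketch of the same idea follows. The names `quote_arg` and `build_command` are hypothetical, and `shlex.split` is used only to check that the quoting round-trips.

```python
import shlex


def quote_arg(arg):
    """Single-quote arg for a POSIX shell, escaping embedded single quotes."""
    # Close the quote, emit an escaped quote, reopen: ' -> '\''
    return "'" + arg.replace("'", "'\\''") + "'"


def build_command(argv):
    """Build the strace command line the way the Perl loop does."""
    cmd = "strace -e trace=mmap,munmap,brk "
    for arg in argv:
        cmd += " " + quote_arg(arg)
    return cmd + " 2>&1 >/dev/null"


if __name__ == "__main__":
    print(build_command(["echo", "it's"]))
    # shlex.split undoes the quoting, recovering the original argument.
    print(shlex.split(quote_arg("it's")))
```

In new Python code, the standard library's `shlex.quote` would normally be preferred. It produces a different but equivalent escape for embedded single quotes.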
These commands are based on an Ask Ubuntu answer: http://askubuntu.com/a/581497
To install gcc-6 (gcc-6.1.1), I had to take the additional steps shown below.
USE THESE COMMANDS AT YOUR OWN RISK. I SHALL NOT BE RESPONSIBLE FOR ANYTHING.
ABSOLUTELY NO WARRANTY.
If you are still reading, let's carry on with the code.
sudo apt-get update && \
sudo apt-get install build-essential software-properties-common -y && \
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y && \
#!/bin/bash
# $1: input sequence file, $2: database type passed to -dbtype
j=$(basename "$1")
/usr/bin/makeblastdb -in "$1" -dbtype "$2" -title "$j"
killall sequenceserver
/usr/bin/screen -dmS ss /usr/local/bin/sequenceserver -d=/home//blast/public/blast/ -H 127.0.0.1 -p 4567 > /dev/null
exit
This is my recommended pipeline for assembly and annotation of small eukaryotic genomes (50 - 500 Mb).
All small scripts are available at CGP-scripts. A link is provided for each program.
Please cite it if you find the pipeline useful!
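#!/usr/bin/python
import sys
'''
Just a simple tool that adds the line number at
the end of each line
'''
with open(sys.argv[1]) as f_in, open(sys.argv[2], 'w') as f_out:
    # Assumed completion: 1-based line number, appended after a tab.
    for number, line in enumerate(f_in, start=1):
        f_out.write('%s\t%d\n' % (line.rstrip('\n'), number))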
import gzip
import csv
import argparse
import sys
parser = argparse.ArgumentParser(description="Script to convert an all-sites VCF to SweepFinder format. FASTA description will be the sample name in the VCF header. Only does one chromosome/region at a time.")
parser.add_argument("-v", "--vcf", action="store", required=True, help="Input VCF file. Should be a multisample VCF, though it should theoretically work with a single sample.")
parser.add_argument("-o", "--out", action="store", required=True, help="Output filename")
parser.add_argument("-c", "--chromosome", action="store", required=True, help="Chromosome to output. Should be something in the first column of the VCF.")
parser.add_argument("-g", "--gzip", action="store_true", required=False, help="Set if the VCF is gzipped.")
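The `--gzip` and `--chromosome` options above suggest that the script picks a plain or gzip-aware reader and then keeps only records from one chromosome. One way this might look is sketched below. `open_vcf` and `data_lines` are hypothetical helpers, not part of the original script, and the SweepFinder conversion itself is not shown.

```python
import gzip


def open_vcf(path, gzipped):
    """Return a text-mode file handle for a plain or gzip-compressed VCF."""
    if gzipped:
        # "rt" makes gzip.open yield str lines, like the built-in open().
        return gzip.open(path, "rt")
    return open(path, "r")


def data_lines(handle, chromosome):
    """Yield tab-split records whose first column matches chromosome."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip VCF meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chromosome:
            yield fields
```

With the parser defined above, a call could look like `open_vcf(args.vcf, args.gzip)` after `args = parser.parse_args()`.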
credplot.gg <- function(d){
  # d is a data frame with 4 columns
  # d$x gives variable names
  # d$y gives center point
  # d$ylo gives lower limits
  # d$yhi gives upper limits
  require(ggplot2)
  p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi))+
    geom_pointrange()+