peterdfields

## example.md

      
camillescott / example.md
Last active October 24, 2016 20:13
Example dammit 1.0 output

dammit

a tool for easy de novo transcriptome annotation

by Camille Scott
v1.0.dev0, 2016
submodule: annotate

Dependency Check


## deinterleave_fastq.sh
#!/bin/bash
# Usage: deinterleave_fastq.sh < interleaved.fastq f.fastq r.fastq [compress]
#
# Deinterleaves a FASTQ file of paired reads into two FASTQ
# files specified on the command line. Optionally GZip compresses the output
# FASTQ files using pigz if the 3rd command line argument is the word "compress"
#
# Can deinterleave 100 million paired reads (200 million total
# reads; a 43Gbyte file), in memory (/dev/shm), in 4m15s (255s)
#
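
The header above describes splitting an interleaved FASTQ stream into forward and reverse files. The script body is not shown here, so the following is only a minimal Python sketch of the same idea, assuming every record is exactly four lines and that mates strictly alternate. The function names `deinterleave` and `split_to_files` are hypothetical, and the optional pigz compression step is left out.

```python
from typing import Iterable, Iterator, List, Tuple


def deinterleave(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (forward_record, reverse_record) pairs from interleaved FASTQ lines.

    Assumption: each record is exactly four lines and mates alternate,
    so every block of eight lines holds one read pair.
    """
    block: List[str] = []
    for line in lines:
        block.append(line)
        if len(block) == 8:
            yield block[:4], block[4:]
            block = []
    if block:
        # Leftover lines mean the input stopped partway through a pair.
        raise ValueError("input ended in the middle of a read pair")


def split_to_files(interleaved, forward_out, reverse_out) -> int:
    """Write mates to two open text handles and return the number of pairs."""
    pairs = 0
    for fwd, rev in deinterleave(interleaved):
        forward_out.writelines(fwd)
        reverse_out.writelines(rev)
        pairs += 1
    return pairs
```

With real files, `split_to_files(sys.stdin, open("f.fastq", "w"), open("r.fastq", "w"))` would mirror the usage line of the shell script.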

## measure-memory.pl
#!/usr/bin/perl

my $cmd = 'strace -e trace=mmap,munmap,brk ';
for my $arg (@ARGV) {
    $arg =~ s/'/'\\''/g;
    $cmd .= " '$arg'";
}
$cmd .= ' 2>&1 >/dev/null';
open( PIPE, "$cmd|" ) or die "Cannot execute command \"$cmd\"\n";
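
The Perl lines above wrap each argument in single quotes and rewrite any embedded `'` as `'\''` before handing the command to the shell. As an illustration only, here is the same quoting step sketched in Python. The names `sh_single_quote` and `build_strace_command` are hypothetical and not part of the original script, and the sketch only builds the command string without running strace.

```python
from typing import List


def sh_single_quote(arg: str) -> str:
    """Quote one argument for a POSIX shell, as the Perl substitution does.

    Each embedded single quote closes the quoted string, adds an escaped
    quote, and reopens the string: ' becomes '\\''.
    """
    return "'" + arg.replace("'", "'\\''") + "'"


def build_strace_command(argv: List[str]) -> str:
    """Build the shell command string that traces mmap, munmap, and brk."""
    parts = ["strace -e trace=mmap,munmap,brk"]
    parts.extend(sh_single_quote(a) for a in argv)
    # stderr (where strace writes) goes to the pipe; the traced program's
    # stdout is discarded.
    return " ".join(parts) + " 2>&1 >/dev/null"
```

In new Python code, `shlex.quote` from the standard library serves the same purpose; the manual version is shown here only because it matches the Perl substitution character for character.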

## how-to-install-latest-gcc-on-ubuntu-lts.txt
These commands are based on an askubuntu answer http://askubuntu.com/a/581497
To install gcc-6 (gcc-6.1.1), I had to do more stuff as shown below.
USE THESE COMMANDS AT YOUR OWN RISK. I SHALL NOT BE RESPONSIBLE FOR ANYTHING.
ABSOLUTELY NO WARRANTY.

If you are still reading let's carry on with the code.

sudo apt-get update && \
sudo apt-get install build-essential software-properties-common -y && \
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y && \

## addblastdb.sh
#!/bin/bash
# Quote the arguments so paths containing spaces are passed through intact.
j=$(basename "$1")
/usr/bin/makeblastdb -in "$1" -dbtype "$2" -title "$j"
killall sequenceserver
/usr/bin/screen -dmS ss /usr/local/bin/sequenceserver -d=/home//blast/public/blast/ -H 127.0.0.1 -p 4567 > /dev/null
exit

## CGP-Pipeline.md

      
GDKO / CGP-Pipeline.md
Last active April 2, 2020 16:42
CGP Pipeline

Preface

This is my recommended pipeline for assembly and annotation of small eukaryotic genomes (50 - 500 Mb).
All small scripts are available at CGP-scripts. For the programs a link is provided.
Please cite if you found the pipeline useful!


## example_tool.py
#!/usr/bin/python

import sys

'''
Just a simple tool that adds the line number at
the end of each line
'''

with open(sys.argv[1]) as f_in, open(sys.argv[2], 'w') as f_out:
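
The preview stops at the `with` statement, so the body of the tool is not shown. Going only by the docstring ("adds the line number at the end of each line"), one possible completion is sketched below. The tab separator, the 1-based numbering, and the helper name `append_line_numbers` are assumptions, not taken from the original.

```python
import sys
from typing import Iterable, Iterator, List


def append_line_numbers(lines: Iterable[str], sep: str = "\t") -> Iterator[str]:
    """Yield each input line with its 1-based line number appended.

    Assumption: the number is separated from the text by `sep` (a tab here).
    """
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        yield f"{text}{sep}{number}\n"


def main(argv: List[str]) -> None:
    """Read argv[1], write the numbered lines to argv[2]."""
    with open(argv[1]) as f_in, open(argv[2], "w") as f_out:
        f_out.writelines(append_line_numbers(f_in))
```

Calling `main(sys.argv)` at the bottom of the file would give the same two-argument command-line behaviour the original `open` calls imply.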

## vcf2sweepfinder.py
import gzip
import csv
import argparse
import sys

parser = argparse.ArgumentParser(description="Script to convert an all-sites VCF to sweepfinder format. FASTA description will be the sample name in the VCF header. Only does one chromosome/region at a time.")
parser.add_argument("-v", "--vcf", action="store", required=True, help="Input VCF file. Should be a multisample vcf, though it should theoretically work with a single sample.")
parser.add_argument("-o", "--out", action="store", required=True, help="Output filename")
parser.add_argument("-c", "--chromosome", action="store", required=True, help="Chromosome to output. Should be something in the first column of the vcf.")
parser.add_argument("-g", "--gzip", action="store_true", required=False, help="Set if the VCF is gzipped.")
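
The rest of the script is cut off, so how it uses these options is not shown. The sketch below illustrates one plausible way the `--gzip` and `--chromosome` options could drive the reading step. The helpers `build_parser`, `open_vcf`, and `iter_chromosome_rows` are hypothetical names, the help strings are shortened, and no SweepFinder output is produced.

```python
import argparse
import gzip
from typing import Iterable, Iterator, List


def build_parser() -> argparse.ArgumentParser:
    """Recreate the four options listed above (help text abbreviated)."""
    parser = argparse.ArgumentParser(description="VCF to sweepfinder sketch")
    parser.add_argument("-v", "--vcf", required=True, help="Input VCF file")
    parser.add_argument("-o", "--out", required=True, help="Output filename")
    parser.add_argument("-c", "--chromosome", required=True, help="Chromosome to output")
    parser.add_argument("-g", "--gzip", action="store_true", help="Set if the VCF is gzipped")
    return parser


def open_vcf(path: str, gzipped: bool):
    """Open the VCF in text mode, through gzip when the flag is set."""
    return gzip.open(path, "rt") if gzipped else open(path, "r")


def iter_chromosome_rows(lines: Iterable[str], chromosome: str) -> Iterator[List[str]]:
    """Yield tab-split data rows whose first column equals `chromosome`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header line
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chromosome:
            yield fields
```

A caller would combine them as `args = build_parser().parse_args()` followed by `with open_vcf(args.vcf, args.gzip) as handle:` and a loop over `iter_chromosome_rows(handle, args.chromosome)`.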

## credplot.R

credplot.gg <- function(d){
  # d is a data frame with 4 columns
  # d$x gives variable names
  # d$y gives center point
  # d$ylo gives lower limits
  # d$yhi gives upper limits
  require(ggplot2)
  p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi))+
    geom_pointrange()+

## sam_sambamba.md

      
macmanes / sam_sambamba.md
Last active January 22, 2017 20:00
samtools v sambamba when streaming

Streaming into sambamba view/sort is about 12% faster than streaming into samtools.
Going from 50M raw pe reads to a sorted BAM file in 15 minutes is pretty sweet.
samtools 1.2

bwa index -p index bwa.Trinity.fasta