Srividya Ramakrishnan srividya22

## counts_to_tpm.R
#' Convert counts to transcripts per million (TPM).
#'
#' Convert a numeric matrix of features (rows) and conditions (columns) with
#' raw feature counts to transcripts per million.
#'
#'    Lior Pachter. Models for transcript quantification from RNA-Seq.
#'    arXiv:1104.3889v2
#'
#'    Wagner, et al. Measurement of mRNA abundance using RNA-seq data:
#'    RPKM measure is inconsistent among samples. Theory Biosci. 24 July 2012.

## rpkm_versus_tpm.R
# RPKM versus TPM
#
# RPKM and TPM are both normalized for library size and gene length.
#
# RPKM is not comparable across different samples.
#
# For more details, see: http://blog.nextgenetics.net/?e=51

rpkm <- function(counts, lengths) {
  rate <- counts / lengths
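
The `rpkm()` preview above is cut off, so the following is only a rough Python sketch of the contrast the comments describe, not the gist's own code. It assumes the textbook definitions of RPKM and TPM for a single sample; the function names and toy numbers are made up for illustration.

```python
def rpkm(counts, lengths):
    """Reads per kilobase per million mapped reads, for one sample.

    Assumes the textbook definition: count / (length in kb) / (total reads in millions).
    """
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million, for one sample.

    Each feature's rate (count / length) is divided by the sum of all rates,
    so the values always add up to one million.
    """
    rates = [c / l for c, l in zip(counts, lengths)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Hypothetical toy data: three features, two samples.
lengths = [1000, 2000, 4000]
sample_a = [10, 20, 40]
sample_b = [10, 20, 400]

# TPM totals are identical across samples; RPKM totals are not.
tpm_totals = [sum(tpm(s, lengths)) for s in (sample_a, sample_b)]
rpkm_totals = [sum(rpkm(s, lengths)) for s in (sample_a, sample_b)]
print("TPM totals: ", tpm_totals)
print("RPKM totals:", rpkm_totals)
```

Because the RPKM total changes from sample to sample, an RPKM value is not a fixed share of the sample, which is why it does not compare cleanly across samples.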

## rename_genes_in_maker_gff.pl
#!/usr/bin/env perl

=head1 NAME

    rename_genes_in_maker_gff.pl

=head1 SYNOPSIS

    rename_genes_in_maker_gff.pl input_gff output_gff outputdir species
        where input_gff is the input gff file,

## maker_genome_annotation.md

      
srividya22 / maker_genome_annotation.md
Created September 8, 2017 19:20, forked from darencard/maker_genome_annotation.md

In-depth description of running MAKER for genome annotation.

Genome Annotation using MAKER

MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these were quite simplified examples and it took a bit of effort to wrap my head completely around everything. Here I will describe a de novo genome annotation for Boa constrictor in detail, so that there is a record and that it is easy to use this as a guide to annotate any genome.
Software & Data

Software prerequisites:


- RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (version used was 20150807).
- MAKER MPI version 2.31.8 (though any other version 2 release should be okay).
- [Augustus](http://bio


## xargs.sh
# turn the output of find or cut (comma delimiter, first column) into a single space-separated list
find . -name "*bash*" | xargs
cut -d, -f1 file.csv | xargs

# find a file and grep for a word in the file
find . -name "*.java" | xargs grep "Stock"

# handling filenames which contain WHITESPACE (-d is a GNU xargs option)
ls *txt | xargs -d '\n' grep "cost"

## get_gap_postions.py
#!/usr/bin/env python
# Script to identify gap regions in an assembly
# input : fasta
# output : bed
# usage : get_gap_postions.py fasta bed
# Import necessary packages
import argparse
import re
from Bio import SeqIO
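
The preview stops at the imports, so the script's actual logic is not shown here. As a minimal sketch of one common way to do this, assuming gaps are runs of `N`/`n`, the code below scans each sequence with `re.finditer` and yields 0-based, half-open BED intervals. It uses a tiny hand-rolled FASTA reader instead of `Bio.SeqIO` so it runs without Biopython; `read_fasta`, `gap_intervals`, and `min_len` are hypothetical names.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_intervals(records, min_len=1):
    """Yield (name, start, end) for every run of N/n at least min_len long.

    Coordinates are 0-based and half-open, matching the BED convention.
    """
    for name, seq in records:
        for match in re.finditer(r"[Nn]+", seq):
            if match.end() - match.start() >= min_len:
                yield name, match.start(), match.end()


# Hypothetical toy assembly with one gap per sequence.
fasta = [">chr1 example", "ACGTNNNN", "NNACGT", ">chr2", "nnAC"]
gaps = list(gap_intervals(read_fasta(fasta)))
for name, start, end in gaps:
    print(f"{name}\t{start}\t{end}")
```

With a real file, the same functions could be fed an open file handle in place of the `fasta` list.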

## Advanced bedtools usage
Links:
http://quinlanlab.org/tutorials/bedtools/bedtools.html

Use Case 1: Given a.bam and b.regions.bed, how do you get the parts of b.regions.bed that are not covered by a.bam?
Answer:
bedtools genomecov -ibam a.bam -bga \
               | awk '$4==0' \
               | bedtools intersect -a b.regions.bed -b - > foo

The -bga option reports depth in BedGraph format, like -bg, but also reports regions with zero coverage. This makes it easy to extract all zero-coverage regions of a genome, either with awk '$4==0' as above or by applying grep -w '0$' to the output.
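
To make the two filtering steps concrete, here is a small pure-Python sketch of what the pipeline computes. It is not a replacement for bedtools. It assumes BedGraph lines of the form `chrom start end depth`, and it clips each region to its overlap with the zero-depth intervals, as `bedtools intersect` does by default. The names `zero_coverage` and `intersect` are made up for illustration.

```python
def zero_coverage(bedgraph_lines):
    """Yield (chrom, start, end) for BedGraph rows whose depth is 0.

    Mirrors the awk '$4==0' filter.
    """
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, depth = fields[:4]
        if float(depth) == 0:
            yield chrom, int(start), int(end)


def intersect(regions, others):
    """Yield the overlapping part of each region with each interval in others.

    Both inputs are sequences of (chrom, start, end) with half-open coordinates.
    This is a simple quadratic scan, fine for small inputs only.
    """
    for chrom, start, end in regions:
        for o_chrom, o_start, o_end in others:
            if chrom == o_chrom:
                lo, hi = max(start, o_start), min(end, o_end)
                if lo < hi:
                    yield chrom, lo, hi


# Hypothetical toy data: coverage drops to zero at both ends of chr1.
bedgraph = ["chr1\t0\t100\t0", "chr1\t100\t200\t5", "chr1\t200\t300\t0"]
regions = [("chr1", 50, 250)]

uncovered = list(intersect(regions, list(zero_coverage(bedgraph))))
for chrom, start, end in uncovered:
    print(f"{chrom}\t{start}\t{end}")
```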

## README.md

      
srividya22 / README.md
Created May 29, 2018 22:08, forked from jdblischak/README.md

snakemake_vmem_usage

Testing Snakemake virtual memory usage

John Blischak
2014-05-14
Multiple users have observed that submitting jobs via Snakemake
requires much more memory than is necessary to run the command
(e.g. mailing list post, [Bitbucket issue][issue]).