Daren Card (darencard)

## missing_from_vcf.md

      
Last active: January 12, 2017 16:26

Extract the proportion of missing data per sample in a VCF/BCF file

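Simply replace `<<FILE>>` with the name of your properly formatted VCF/BCF file (2 places). Requires bcftools v1.2+. A genotype is counted as missing if it is `./.`, `.|.`, or `.`, and the output is one line per sample: the sample name followed by the proportion of sites at which its genotype is missing. Samples with no missing genotypes are reported as 0.

```bash
paste \
<(bcftools query -l <<FILE>>) \
<(bcftools query -f '[%GT\t]\n' <<FILE>> | awk '{ if (NF > n) n = NF; for (i = 1; i <= NF; i++) if ($i == "./." || $i == ".|." || $i == ".") sum[i]++ } END { for (i = 1; i <= n; i++) print sum[i] / NR }')
```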
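For readers without bcftools, the same per-sample calculation can be sketched in plain Python. This is a minimal sketch, not part of the original gist: it assumes an uncompressed VCF, reads the genotype from the `GT` FORMAT key, and treats a genotype as missing when every allele is `.` (an absent `GT` is also treated as missing, which is an assumption). The function name `missing_per_sample` and the example records are hypothetical.

```python
import io
import re
from typing import Dict, List, TextIO


def missing_per_sample(vcf: TextIO) -> Dict[str, float]:
    """Return {sample: proportion of records whose GT is missing} for an uncompressed VCF."""
    samples: List[str] = []
    missing: List[int] = []
    n_records = 0
    for line in vcf:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            missing = [0] * len(samples)
            continue
        n_records += 1
        fmt = fields[8].split(":")
        gt_index = fmt.index("GT") if "GT" in fmt else None
        for i, sample_field in enumerate(fields[9:]):
            values = sample_field.split(":")
            # Assumption: an absent GT counts as missing, just like "./.", ".|." or "."
            if gt_index is None or gt_index >= len(values):
                gt = "."
            else:
                gt = values[gt_index]
            if all(allele == "." for allele in re.split(r"[/|]", gt)):
                missing[i] += 1
    if n_records == 0:
        return {sample: 0.0 for sample in samples}
    return {sample: count / n_records for sample, count in zip(samples, missing)}


example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\n"
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t.|.:0\t1|1:12\n"
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/1\n"
    "chr1\t400\t.\tT\tC\t.\tPASS\t.\tGT\t0/0\t./.\n"
)
result = missing_per_sample(example)
print(result)  # {'S1': 0.25, 'S2': 0.5}
```

Like the awk version, the denominator is the total number of records, so both approaches should agree on a well-formed VCF.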

  
## filter_high_missing_samples.md

      
Created: January 12, 2017 17:14

Filter away samples from a VCF/BCF that have high amounts of missing data

Simply replace `<<INPUT>>`, `<<OUTPUT>>`, and `<<PROP>>` with the input file name, the output file name, and the proportion of missing data above which samples are excluded, respectively. For example, 0.75 means that samples with more than 75% missing data are filtered out. Requires bcftools v1.2+.

```bash
bcftools view -S ^<(paste <(bcftools query -l <<INPUT>>) <(bcftools query -f '[%GT\t]\n' <<INPUT>> | awk '{ if (NF > n) n = NF; for (i = 1; i <= NF; i++) if ($i == "./." || $i == ".|." || $i == ".") sum[i]++ } END { for (i = 1; i <= n; i++) print sum[i] / NR }') | awk '{ if ($2 > <<PROP>>) print $1 }') <<INPUT>> | bgzip > <<OUTPUT>>
```
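The filtering step keeps only samples whose missing proportion is strictly greater than `<<PROP>>` and passes their names to `bcftools view -S ^...`, which reads one sample name per line and excludes them because of the `^` prefix. A minimal Python sketch of that thresholding step, assuming the per-sample proportions are already in a dictionary; the function names and example values are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def samples_to_exclude(missing: Dict[str, float], prop: float) -> List[str]:
    """Samples whose missing proportion is strictly greater than prop (same test as `$2 > <<PROP>>`)."""
    return [sample for sample, value in missing.items() if value > prop]


def write_sample_file(samples: List[str], path: Path) -> None:
    """Write one sample name per line, the layout read by bcftools -S/--samples-file."""
    path.write_text("".join(f"{sample}\n" for sample in samples))


missing = {"S1": 0.10, "S2": 0.75, "S3": 0.80}
excluded = samples_to_exclude(missing, 0.75)
print(excluded)  # ['S3']  (S2 sits exactly at the threshold and is kept)
```

A file written this way (say, a hypothetical `exclude.txt`) could then be used as `bcftools view -S ^exclude.txt <<INPUT>>`.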

  
## parse_ncbi_python.md

      
Last active: February 15, 2017 14:52

Shell one-liner that parses the FASTA headers from the NCBI python genome. It will likely work on other genomes from NCBI as well.

Output fields:

- transcript ID
- full transcript ID with version (.1, .2, etc.)
- full gene identifier (watch out for spaces and unusual symbols)
- gene symbol
- transcript variant (watch out for spaces), with NA meaning none
- type of transcript (mRNA, ncRNA, etc.)
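As a rough illustration of how those six fields can be pulled out of a header, here is a Python sketch. It assumes RefSeq-style model-transcript headers of the form `>ACCESSION.VERSION PREDICTED: <description> (<SYMBOL>), transcript variant <X>, <type>`, with both the `PREDICTED:` prefix and the variant clause optional. The regular expression, the function name `parse_ncbi_header`, and the example header (with a made-up accession) are illustrative assumptions, not the gist's code.

```python
import re
from typing import Optional, Tuple

# Assumed RefSeq-style header layout; "PREDICTED:" and "transcript variant" are optional.
HEADER_RE = re.compile(
    r"^>(?P<acc>\S+)\s+"
    r"(?:PREDICTED:\s+)?"
    r"(?P<desc>.+)\s+\((?P<symbol>[^()]+)\),\s+"
    r"(?:transcript variant (?P<variant>[^,]+),\s+)?"
    r"(?P<type>[^,]+)$"
)


def parse_ncbi_header(header: str) -> Optional[Tuple[str, str, str, str, str, str]]:
    """Return (ID, ID.version, gene description, symbol, variant or 'NA', type), or None."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        return None
    acc = m.group("acc")
    return (
        acc.split(".")[0],           # transcript ID without the version suffix
        acc,                         # full transcript ID with version
        m.group("desc"),             # full gene identifier (may contain spaces)
        m.group("symbol"),           # gene symbol
        m.group("variant") or "NA",  # transcript variant, NA when absent
        m.group("type"),             # e.g., mRNA or ncRNA
    )


example = ">XM_000000001.1 PREDICTED: Python bivittatus myosin heavy chain 9 (MYH9), transcript variant X1, mRNA"
print("\t".join(parse_ncbi_header(example)))
```

Because the description is matched greedily, the gene symbol is taken from the last parenthesized group before the final comma-separated fields, which tolerates parentheses inside the description itself.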


## parse_ncbi_transcripts.md

      
Last active: February 21, 2017 15:50

Parse transcripts out of an NCBI GFF based on gene IDs

Script to parse an NCBI GFF based on transcript IDs (e.g., XM_000..). These transcript IDs must not include the version suffix (.1, .2, etc.).

Columns returned:

- chromosome/scaffold
- start position of transcript
- end position of transcript
- transcript number
- gene number
- gene ID (NCBI)
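The Python sketch below illustrates the same kind of lookup under several assumptions that should be checked against your own GFF: transcript features carry a `transcript_id=XM_...` attribute and have `ID`/`Parent` values like `rna12`/`gene3` (read here as the transcript and gene numbers), and the NCBI gene ID is the `GeneID:` entry in `Dbxref`. The function names and the example lines (with made-up IDs) are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs: Dict[str, str] = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def transcripts_from_gff(
    lines: Iterable[str], wanted: Set[str]
) -> Iterator[Tuple[str, int, int, str, str, str]]:
    """Yield (seqid, start, end, transcript number, gene number, GeneID) for wanted transcript IDs."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments/directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID", "")
        # Assumption: transcript features have IDs like "rna12"; exons/CDS use other prefixes.
        if not feature_id.startswith("rna"):
            continue
        transcript = attrs.get("transcript_id", "").split(".")[0]  # drop the .1/.2 suffix
        if transcript not in wanted:
            continue
        dbxref = dict(
            entry.split(":", 1) for entry in attrs.get("Dbxref", "").split(",") if ":" in entry
        )
        yield (
            cols[0],                                         # chromosome/scaffold
            int(cols[3]),                                    # start
            int(cols[4]),                                    # end
            feature_id[len("rna"):],                         # "rna12" -> "12"
            attrs.get("Parent", "").replace("gene", "", 1),  # "gene3" -> "3"
            dbxref.get("GeneID", "NA"),                      # NCBI gene ID
        )


example_gff = [
    "##gff-version 3\n",
    "chr1\tGnomon\tmRNA\t1000\t5000\t.\t+\t.\tID=rna12;Parent=gene3;"
    "Dbxref=GeneID:999001,Genbank:XM_000000001.1;transcript_id=XM_000000001.1\n",
    "chr1\tGnomon\texon\t1000\t1200\t.\t+\t.\tID=id40;Parent=rna12;"
    "Dbxref=GeneID:999001,Genbank:XM_000000001.1;transcript_id=XM_000000001.1\n",
    "chr2\tGnomon\tmRNA\t300\t900\t.\t-\t.\tID=rna13;Parent=gene4;"
    "Dbxref=GeneID:999002,Genbank:XM_000000002.1;transcript_id=XM_000000002.1\n",
]
for row in transcripts_from_gff(example_gff, {"XM_000000001"}):
    print("\t".join(map(str, row)))
```

Matching on the version-free ID mirrors the requirement above that query IDs omit the `.1`/`.2` suffix.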


## image_dimensions.md

      
Created: March 3, 2017 16:39

Python script that determines image dimensions using ImageJ

Overview:

Below is an ImageJ Python script that reads a user-provided image file and writes its width and height, with units, to stdout. It is designed to be called from the command line using the ImageJ executable; on my Mac OS X computer running Fiji, the path is /Applications/Fiji.app/Contents/MacOS/ImageJ-macosx. It has not been tested elsewhere and may not work without some effort. It relies on the Bio-Formats plugin to read the file and was written for Zeiss .czi files, so there is no guarantee that it works as desired with other formats. In particular, it does not set the scale but infers it from the metadata stored in the .czi file, so it will probably not work well with other file types.
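Because the scale comes from the file's metadata, the dimensions reported are pixel counts multiplied by the calibrated size of one pixel. A minimal plain-Python sketch of that arithmetic, with hypothetical names and example values; it does not call ImageJ or Bio-Formats.

```python
from typing import Tuple


def physical_dimensions(width_px: int, height_px: int,
                        pixel_width: float, pixel_height: float) -> Tuple[float, float]:
    """Convert pixel dimensions to calibrated units using per-pixel sizes read from metadata."""
    return width_px * pixel_width, height_px * pixel_height


# Hypothetical example: a 2048 x 1536 pixel image with 0.5 x 0.5 micron pixels.
width, height = physical_dimensions(2048, 1536, 0.5, 0.5)
print(f"{width} x {height} micron")  # 1024.0 x 768.0 micron
```

If the metadata lacks a calibration, a pixel size of 1 would simply return the pixel counts, which is one reason results on non-.czi files may be misleading.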
Usage:

```bash
/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx --headless get_image_dims.py input
```

Python script:


## plotly_tutorial_offline_beers.ipynb

      
Created: July 11, 2017 18:11

Nice intro to Plotly in Python

## active_heatmap.R
```r
# install.packages(c("plotly", "reshape2", "ggdendro"))
# devtools::install_github("sjmgarnier/viridis")
library(ggplot2)
library(ggdendro)
library(plotly)
library(viridis)


# helper function for creating dendrograms
ggdend <- function(df) {
```

## SLiM_intro_annotation_simulation.Md

      
Created: August 25, 2017 15:52

Introduction to forward-time simulations using SLiM

Some introductory notes on forward-time population genetic simulations using SLiM

SLiM is a newer, powerful piece of population genetic simulation software that is well documented, user-friendly, and flexible, and it has a pretty sweet GUI (plus command-line capability). The following script is an initial dummy simulation I created as I got my feet wet with SLiM, and I added many notes to make clear what each command does.
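SLiM handles all of the bookkeeping, but the core idea of a forward-time simulation is easy to see in miniature: each generation, the next population is produced by sampling from the current one. The Python sketch below is not SLiM and not the author's script; it is a toy Wright-Fisher model that follows the frequency of a single neutral allele under pure drift in a diploid population of hypothetical size `n` (2n gene copies), with no mutation or selection.

```python
import random
from typing import List


def wright_fisher(n: int, p0: float, generations: int, seed: int = 1) -> List[float]:
    """Track a neutral allele's frequency forward in time in a diploid population of size n."""
    rng = random.Random(seed)
    copies = 2 * n  # number of gene copies in a diploid population
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Each gene copy in the next generation draws its parent copy at random (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
        if p in (0.0, 1.0):
            break  # the allele has been lost or fixed; drift can no longer change it
    return freqs


trajectory = wright_fisher(n=50, p0=0.5, generations=200)
print(len(trajectory) - 1, "generations simulated; final frequency:", trajectory[-1])
```

SLiM extends this same generation-by-generation loop with explicit genomes, new mutations at the rate set by `initializeMutationRate()`, recombination, and selection.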
Lines beginning with `//` are comments in SLiM.

```
// set up a simple neutral simulation
initialize() {
	initializeMutationRate(1e-7);
```


## config_jbrowse.md

      
Last active: December 14, 2017 17:55

Configuring JBrowse to display gene annotation tracks

Configuring JBrowse

JBrowse is a handy genome browser and is especially useful for viewing the results of iterative rounds of MAKER. The documentation is decent, but it can be difficult to follow for those not used to setting up a data server; I struggled a bit at first.

Software & Data

Software

- JBrowse version 1.12.3 (other versions should also work)

Data


## variant_effect_analysis.md

      
Last active: January 26, 2018 21:11

Discerning the impacts of genetic variants based on genome annotations

Variant Effect Analysis using VEP

When calling variants, it is often useful to know what impact a polymorphism might have on the biology of an organism. It is usually difficult to link a list of variants to specific phenotypes, but each variant can be broadly classified by its likely impact on protein-coding genes (e.g., missense mutations). A very useful tool for this kind of analysis is the Variant Effect Predictor (VEP), produced by the folks at Ensembl. This tutorial describes how to perform such an analysis on your organism of interest using variants stored in a VCF file and annotations from a GFF file.
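VEP does this at scale using the genome, the annotation, and a full set of consequence rules. The core idea for a single-nucleotide change in a coding sequence fits in a few lines of Python: find the affected codon, translate it before and after the change, and compare. This toy is not VEP; it assumes a spliced CDS on the forward strand, a 0-based position within the CDS, and the standard genetic code, and it returns only a handful of the Sequence Ontology terms VEP reports (it ignores, for example, start-codon changes). The names `classify_snv` and `CODON_TABLE` are hypothetical.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto the string above ('*' = stop).
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution at 0-based CDS position `pos` by its codon effect."""
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


cds = "ATGGCTTGGTAA"  # hypothetical CDS: Met-Ala-Trp-Stop
for pos, alt in [(3, "A"), (5, "C"), (8, "A"), (9, "C")]:
    print(pos, alt, classify_snv(cds, pos, alt))
```

Here the four substitutions come out as missense (GCT to ACT), synonymous (GCT to GCC), stop gained (TGG to TGA), and stop lost (TAA to CAA).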
Software and Data

- VEP must be installed. Detailed information on this tool and its installation can be found in the Ensembl VEP documentation.
- Boa_constrictor_SGA_7C_scaffolds.fa: A genome FASTA file for your organism is required.