
Daren Card darencard

@darencard
darencard / variant_effect_analysis.md
Last active January 26, 2018 21:11
Discerning the impacts of genetic variants based on genome annotations

Variant Effect Analysis using VEP

When calling variants, it can be very useful to know what impact a polymorphism might have on the biology of an organism. It is often difficult to link a simple list of variants to specific phenotypes, but it is possible to broadly classify a variant based on its possible impact on protein-coding genes (e.g., missense mutations). A very useful tool for performing such an analysis is the Variant Effect Predictor (VEP), which is produced by the folks at Ensembl. This tutorial describes how to perform such an analysis on your organism of interest using variants stored in a VCF file and annotations from a GFF file.
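To make "impact on protein-coding genes" concrete, here is a toy sketch of the core idea: translate the reference and alternate codon and compare the amino acids. This is not how VEP works internally; it is an illustration under stated assumptions (a single-nucleotide variant, a plus-strand coding sequence that starts at the start codon, the standard genetic code, one transcript, no splicing or indels). The function name `classify_snv` is hypothetical; the returned labels mirror Sequence Ontology terms that VEP reports, such as `missense_variant` and `stop_gained`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Toy consequence call for one SNV (hypothetical helper, not VEP's logic).

    cds: plus-strand coding sequence starting at the first base of the start codon.
    pos: 0-based offset of the variant within cds.
    alt: alternate base (A, C, G or T).
    """
    cds, alt = cds.upper(), alt.upper()
    if alt not in ("A", "C", "G", "T") or not 0 <= pos < len(cds):
        raise ValueError("need a single A/C/G/T base at a position inside the CDS")
    codon_start = pos - pos % 3          # first base of the affected codon
    offset = pos % 3                     # position of the variant within the codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE.get(ref_codon)
    alt_aa = CODON_TABLE.get(alt_codon)
    if ref_aa is None or alt_aa is None:
        raise ValueError("variant falls in an incomplete or ambiguous codon")
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if codon_start == 0 and ref_aa == "M":
        return "start_lost"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"
```

For example, `classify_snv("ATGGCTTGGTAA", 8, "A")` turns the TGG (Trp) codon into TGA, a stop codon. VEP adds what this sketch ignores: strand, overlapping transcripts, splice sites, UTRs, and indels.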

Software and Data

  1. VEP must be installed. Detailed information on this tool and its installation can be found here.
  2. Boa_constrictor_SGA_7C_scaffolds.fa: A genome FASTA file for your organism is requi
darencard / extract_fastq_bam.md
Last active July 14, 2023 06:19
Extract paired FASTQ reads from a BAM mapping file

Please see the most up-to-date version of this protocol on my blog at https://darencard.net/blog/.

Extracting paired FASTQ read data from a BAM mapping file

Sometimes FASTQ data is aligned to a reference and stored as a BAM file instead of the normal FASTQ read files. This is okay, because it is possible to recreate raw FASTQ files from the BAM file. The following outlines this process. Both samtools and bedtools are required.

From each BAM file, we need to extract:

  1. reads that mapped properly as pairs
  2. reads that didn’t map properly as pairs (both didn’t map, or one didn’t map)
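These categories are defined by the SAM FLAG bits that `samtools view -f` (require all listed bits) and `-F` (exclude any listed bit) test: 0x1 paired, 0x2 properly paired, 0x4 read unmapped, 0x8 mate unmapped, plus 0x100 (secondary) and 0x800 (supplementary) for non-primary records. The sketch below sorts a flag into those buckets; the function names are hypothetical and the extra `both_mapped_not_proper` bucket (both mates mapped but not flagged as a proper pair) is my addition, since the list above does not say where such reads go.

```python
# SAM FLAG bits (SAM format specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def passes_samtools_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic `samtools view -f require -F exclude` for a single record."""
    return (flag & require) == require and (flag & exclude) == 0


def classify_flag(flag: int) -> str:
    """Put one primary paired-end record into an extraction bucket (hypothetical)."""
    if not flag & PAIRED:
        return "unpaired"
    if flag & (SECONDARY | SUPPLEMENTARY):
        return "non_primary"  # usually skipped so each read is written once
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both_unmapped"
    if read_unmapped:
        return "read_unmapped_mate_mapped"
    if mate_unmapped:
        return "read_mapped_mate_unmapped"
    if flag & PROPER_PAIR:
        return "proper_pair"
    return "both_mapped_not_proper"
```

For instance, flag 77 (paired, read unmapped, mate unmapped, first in pair) passes `samtools view -f 12`, which is the "both didn't map" case.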
darencard / gnuplot_quickstart.md
Created August 31, 2017 14:20
A quick-start guide for using gnuplot for in-terminal plotting

A quick-start guide for using gnuplot for in-terminal plotting

Sometimes it is really nice to just take a quick look at some data. However, when working on remote computers, it is a bit of a burden to move data files to a local computer to create a plot in something like R. One solution is to use gnuplot and make a quick plot that is rendered in the terminal. It isn't very pretty by default, but it gets the job done quickly and easily. There are also advanced gnuplot capabilities that aren't covered here at all.

gnuplot has its own internal syntax that can be fed in as a script, which I won't get into. Here is the very simplified gnuplot code we'll be using:

set terminal dumb size 120, 30; set autoscale; plot '-' using 1:3 with lines notitle

Let's break this down:
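In that command, `set terminal dumb size 120, 30` renders ASCII output 120 characters wide and 30 lines tall, `plot '-'` reads inline data that follows the command and ends at a line containing only `e`, and `using 1:3` plots column 3 (y) against column 1 (x), counting columns from 1. A minimal Python sketch, assuming gnuplot is on your `PATH`, that assembles exactly that script plus the inline data and pipes it to gnuplot on stdin; the helper names are hypothetical.

```python
import shutil
import subprocess


def build_dumb_plot_script(rows, xcol=1, ycol=3, width=120, height=30):
    """Return gnuplot commands followed by inline data terminated by 'e'."""
    header = (f"set terminal dumb size {width}, {height}; set autoscale; "
              f"plot '-' using {xcol}:{ycol} with lines notitle")
    # Whitespace-separated columns; gnuplot's `using` counts from 1.
    data_lines = ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join([header, *data_lines, "e"]) + "\n"


def plot_in_terminal(rows, **kwargs):
    """Send the script to gnuplot's stdin and return the ASCII plot it prints."""
    if shutil.which("gnuplot") is None:
        raise RuntimeError("gnuplot not found on PATH")
    script = build_dumb_plot_script(rows, **kwargs)
    result = subprocess.run(["gnuplot"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout
```

For example, `print(plot_in_terminal([(i, "x", i * i) for i in range(20)]))` would draw a parabola in the terminal.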

darencard / SLiM_intro_annotation_simulation.Md
Created August 25, 2017 15:52
Introduction to forward-time simulations using SLiM

Some introductory notes on forward-time population genetic simulations using SLiM

SLiM is a newer, powerful piece of population genetic simulation software that is well documented, user-friendly, and flexible, and it has a pretty sweet GUI (plus command-line capability). The following script represents an initial dummy simulation I created as I got my feet wet with SLiM, and I added many notes to make it clear what each command does.
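"Forward-time" means the simulation starts from an ancestral population and steps forward one generation at a time. As a rough illustration of that idea only (not SLiM and not a translation of the script below), here is a minimal Python sketch of neutral Wright-Fisher drift at a single biallelic locus; mutation, selection, recombination and genomic structure, which SLiM configures in its `initialize()` callback, are deliberately left out. The function name and parameters are hypothetical.

```python
import random
from typing import List, Optional


def wright_fisher(p0: float, n_diploids: int, generations: int,
                  seed: Optional[int] = None) -> List[float]:
    """Allele-frequency trajectory at one neutral, biallelic locus.

    Each generation, the 2N gene copies of the offspring are drawn with
    replacement from the parental gene pool (binomial sampling), so the
    frequency changes by genetic drift alone.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be between 0 and 1")
    if n_diploids < 1 or generations < 0:
        raise ValueError("need at least one individual and non-negative generations")
    rng = random.Random(seed)
    n_copies = 2 * n_diploids
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Count how many of the 2N new gene copies carry the focal allele.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory
```

Running `wright_fisher(0.5, 100, 500, seed=42)` several times with different seeds shows the characteristic random wandering, and eventual loss or fixation, that SLiM models at much larger scale.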

Lines beginning with // are comments in SLiM.

// set up a simple neutral simulation
initialize() {
	initializeMutationRate(1e-7);
darencard / gdrive_download
Created August 1, 2017 18:58
Script to download files from Google Drive using Bash
#!/usr/bin/env bash
# gdrive_download
#
# script to download Google Drive files from command line
# not guaranteed to work indefinitely
# taken from Stack Overflow answer:
# http://stackoverflow.com/a/38937732/7002068
gURL=$1
darencard / active_heatmap.R
Created August 1, 2017 18:57
R functions for creating interactive heatmap using Plotly (now packages exist to do this)
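The R code below installs reshape2, presumably to convert a matrix into long format (one row per cell) before drawing tiles with ggplot2 and plotly; that purpose is my assumption, since the functions are truncated here. A minimal Python sketch of that reshaping step, with a hypothetical `melt` helper loosely modelled on reshape2's `melt()`:

```python
def melt(matrix, row_names, col_names):
    """Turn a wide matrix into (row, column, value) triples, row by row."""
    if len(matrix) != len(row_names):
        raise ValueError("one row name per matrix row is required")
    long_rows = []
    for row_name, row in zip(row_names, matrix):
        if len(row) != len(col_names):
            raise ValueError("one column name per matrix column is required")
        for col_name, value in zip(col_names, row):
            long_rows.append((row_name, col_name, value))
    return long_rows
```

Each triple then maps directly onto a heatmap tile: x = column, y = row, fill = value. Reordering `row_names` and `col_names` (for example, by dendrogram order) before melting controls the tile order.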
# install.packages(c("plotly", "reshape2", "ggdendro"))
# devtools::install_github("sjmgarnier/viridis")
library(ggplot2)
library(ggdendro)
library(plotly)
library(viridis)
# helper function for creating dendrograms
ggdend <- function(df) {
darencard / popstats_from_vcf.Md
Created July 17, 2017 16:16
Calculating population genetic statistics from VCF files using BCFtools

Useful Oneliners for Calculating Population Genetic Statistics from VCF files

The following commands require non-standard software like BCFtools and VCFtools.

Thin variants to reduce linkage bias, then output the number of sampled alleles and the reference-allele frequency
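The awk step in the pipeline below is truncated here, so this is not the author's awk logic. It is a Python sketch of the same kind of per-site bookkeeping, assuming diploid genotypes as printed by `bcftools query -f '%CHROM\t%POS[\t%GT]\n'`: count missing, homozygous-reference, heterozygous and homozygous-alternate calls, then sampled alleles = 2 × called genotypes and reference-allele frequency = reference alleles / sampled alleles. The helper names are hypothetical.

```python
def parse_query_line(line):
    """Split one `%CHROM\t%POS[\t%GT]\n` line into (chrom, pos, genotypes)."""
    chrom, pos, *genotypes = line.rstrip("\n").split("\t")
    return chrom, int(pos), genotypes


def site_allele_stats(genotypes):
    """Per-site genotype counts, sampled alleles and reference-allele frequency.

    Assumes diploid calls such as 0/0, 0/1, 1|1; anything else ('./.', '.',
    haploid calls) is counted as missing.
    """
    counts = {"missing": 0, "hom_ref": 0, "het": 0, "hom_alt": 0}
    ref_alleles = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            counts["missing"] += 1
            continue
        ref_alleles += alleles.count("0")  # allele index 0 is the REF allele
        if alleles[0] != alleles[1]:
            counts["het"] += 1
        elif alleles[0] == "0":
            counts["hom_ref"] += 1
        else:
            counts["hom_alt"] += 1
    sampled = 2 * (len(genotypes) - counts["missing"])
    counts["sampled_alleles"] = sampled
    counts["ref_freq"] = ref_alleles / sampled if sampled else float("nan")
    return counts
```

Counting `0` alleles directly, rather than computing 2 × hom_ref + het, keeps the frequency correct at multiallelic sites such as a `1/2` genotype.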

vcftools --thin 10000 --recode --recode-INFO-all --stdout --gzvcf <my_variants.vcf.gz> | \
  bcftools query -f '%CHROM\t%POS[\t%GT]\n' - | \
 awk -v OFS="\t" '{ miss=0; hom_ref=0; hom_alt=0; het=0; \
darencard / plotly_tutorial_offline_beers.ipynb
Created July 11, 2017 18:11
Nice intro to Plotly in Python
darencard / config_jbrowse.md
Last active December 14, 2017 17:55
Configuring JBrowse to display gene annotation tracks

Configuring JBrowse

JBrowse is a handy genome browser and is especially useful for viewing the results of iterative rounds of MAKER. The documentation is decent, but for those not used to creating a data server, it can be difficult to understand. I struggled a bit at first.

Software & Data

Software

  1. JBrowse version 1.12.3 (though other versions should work just fine)

Data

darencard / maker_genome_annotation.md
Last active March 7, 2024 08:50
In-depth description of running MAKER for genome annotation.

Please see the most up-to-date version of this protocol on my blog at https://darencard.net/blog/.

Genome Annotation using MAKER

MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these are quite simplified examples, and it took a bit of effort to wrap my head completely around everything. Here I describe a de novo genome annotation for Boa constrictor in detail, so that there is a record and so that it can easily serve as a guide for annotating any genome.

Software & Data

Software prerequisites:

  1. RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (ver