Daren Card darencard

@darencard
darencard / maker_genome_annotation.md
Last active Dec 12, 2019
In-depth description of running MAKER for genome annotation.

Genome Annotation using MAKER

MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these are fairly simplified examples, and it took some effort to fully wrap my head around everything. Here I describe a de novo genome annotation for Boa constrictor in detail, both to keep a record and so that it can serve as a guide for annotating any genome.

Software & Data

Software prerequisites:

  1. RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (version used was 20150807).
  2. MAKER MPI version 2.31.8 (any other version 2 release should also work).
  3. [Augustus](http://bio
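MAKER 2 reads its settings from control files that `maker -CTL` writes into the working directory (`maker_opts.ctl`, `maker_bopts.ctl` and `maker_exe.ctl`), and much of the setup is editing `key=value` lines in `maker_opts.ctl`. The sketch below is a hypothetical helper, `set_ctl_option`, not part of MAKER; it assumes each option sits on its own line as `key=value #comment` and that the value contains no `|`, `&` or `\` characters, which `sed` would interpret.

```bash
#!/usr/bin/env bash
# set_ctl_option: hypothetical helper (not part of MAKER) that sets one
# option in a MAKER control file, keeping the trailing "#comment".
# Usage: set_ctl_option <file.ctl> <key> <value>
# Assumes the value contains no '|', '&' or '\' characters.
set_ctl_option() {
  local ctl=$1 key=$2 value=$3
  # Refuse to edit a key that is not already in the file.
  grep -q "^${key}=" "$ctl" || { echo "no option '${key}' in ${ctl}" >&2; return 1; }
  # Replace everything between "key=" and the first '#' (or end of line).
  # Writing to a temporary file keeps this portable across GNU and BSD sed.
  sed "s|^${key}=[^#]*|${key}=${value} |" "$ctl" > "${ctl}.tmp" && mv "${ctl}.tmp" "$ctl"
}
```

For example, after running `maker -CTL`, `set_ctl_option maker_opts.ctl genome genome.fasta` points MAKER at the assembly (the FASTA name is a placeholder). The MPI build is then typically launched with `mpiexec -n <cores> maker`.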
darencard / orthomcl_tutorial.md
Last active Dec 9, 2019
Running OrthoMCL on a set of protein annotations

Running OrthoMCL on a set of protein annotations from various species

OrthoMCL is the leading piece of software for inferring orthologs across several organisms. In this tutorial I will provide detailed instructions for running a set of protein annotations through OrthoMCL.

Software and Data

  1. OrthoMCL and its dependencies must be installed. Detailed information on this tool and its installation can be found here. I actually used a slightly modified version of OrthoMCL made available by the author of orthomcl-pipeline (see below). There aren't many details on how it differs from the standard OrthoMCL, but it is available here.
  2. orthomcl-pipeline must also be installed, as this is how we will automate the OrthoMCL process. Detailed information on this tool and its installation can be found [here](https
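OrthoMCL works on "compliant" FASTA files whose headers take the form `>taxon|proteinID`, where the taxon code is a short, unique abbreviation per species (OrthoMCL ships `orthomclAdjustFasta` for this step). As a rough stand-in, assuming the protein ID is the first whitespace-delimited word of each header and contains no `|`, the rewrite can be done with plain `awk`; the function name `compliant_fasta` and the taxon code in the test are hypothetical.

```bash
#!/usr/bin/env bash
# compliant_fasta: hypothetical helper that rewrites FASTA headers to the
# ">taxon|proteinID" form OrthoMCL expects.
# Usage: compliant_fasta <taxon_code> <proteins.fasta> > <taxon_code>.fasta
# Assumes the protein ID is the first whitespace-delimited word of each
# header and contains no '|' characters.
compliant_fasta() {
  local taxon=$1 fasta=$2
  awk -v taxon="$taxon" '
    /^>/ {
      id = substr($1, 2)           # drop the leading ">"
      print ">" taxon "|" id       # description text after the ID is dropped
      next
    }
    { print }                      # sequence lines pass through unchanged
  ' "$fasta"
}
```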
darencard / gene_structure_stats.md
Last active Dec 7, 2019
Script to produce estimates of gene structure

Inferring the structure of gene annotations

When annotating genomes, it is often desirable to know the overall structure of genes, including metrics such as exon and intron lengths. Here is a program, genestats, that calculates such measures.
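For the two most basic measures, exon and intron lengths, a minimal `awk` sketch is possible under a few assumptions: the annotation is GFF3, each `exon` line carries exactly one `Parent=` transcript ID, and introns are the gaps between consecutive exons of the same transcript. The function name `exon_intron_lengths` is hypothetical and is not part of genestats.

```bash
#!/usr/bin/env bash
# exon_intron_lengths: hypothetical helper (not part of genestats).
# Usage: exon_intron_lengths <annotation.gff3>
# Prints one line per feature: transcript<TAB>exon|intron<TAB>length (bp).
# Assumes each exon line has exactly one Parent= ID.
#
# Stage 1 reduces exon lines to "parent<TAB>start<TAB>end".
# Stage 2 sorts by transcript, then numerically by start.
# Stage 3 reports exon length = end - start + 1 (GFF3 is 1-based, inclusive)
# and intron length = this exon's start - previous exon's end - 1.
exon_intron_lengths() {
  awk -F'\t' -v OFS='\t' '
    $3 == "exon" {
      parent = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
      if (parent != "") print parent, $4, $5
    }' "$1" |
  sort -t $'\t' -k1,1 -k2,2n |
  awk -F'\t' -v OFS='\t' '
    {
      if ($1 == prev_tx) print $1, "intron", $2 - prev_end - 1
      print $1, "exon", $3 - $2 + 1
      prev_tx = $1; prev_end = $3
    }'
}
```

Summary statistics (means, medians) can then be computed per feature type from the second and third columns.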

#!/usr/bin/env bash

usage()
{
cat << EOF
darencard / gnuplot_quickstart.md
Created Aug 31, 2017
A quick-start guide for using gnuplot for in-terminal plotting

A quick-start guide for using gnuplot for in-terminal plotting

Sometimes it is really nice to just take a quick look at some data. However, when working on remote computers, it is a bit of a burden to move data files to a local computer to create a plot in something like R. One solution is to use gnuplot and make a quick plot that is rendered in the terminal. It isn't very pretty by default, but it gets the job done quickly and easily. There are also advanced gnuplot capabilities that aren't covered here at all.

gnuplot has its own internal syntax that can be fed in as a script, which I won't get into. Here is the very simplified gnuplot code we'll be using:

set terminal dumb size 120, 30; set autoscale; plot '-' using 1:3 with lines notitle

Let's break this down:
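In that command, `set terminal dumb size 120, 30` renders an ASCII plot 120 characters wide and 30 lines tall, `set autoscale` fits the axes to the data, `plot '-'` reads the data from standard input, and `using 1:3` plots column 3 against column 1. A small wrapper can make the size and columns adjustable; `gnuplot_cmd` and `termplot` are hypothetical names, and `termplot` assumes gnuplot is on the `PATH`.

```bash
#!/usr/bin/env bash
# gnuplot_cmd: hypothetical helper that builds the one-line gnuplot script.
# Usage: gnuplot_cmd <width> <height> <x_column> <y_column>
gnuplot_cmd() {
  local width=$1 height=$2 xcol=$3 ycol=$4
  printf "set terminal dumb size %s, %s; set autoscale; plot '-' using %s:%s with lines notitle\n" \
    "$width" "$height" "$xcol" "$ycol"
}

# termplot: hypothetical wrapper; pipe whitespace-delimited columns in.
# Defaults mirror the command above (120x30, column 3 against column 1).
termplot() {
  gnuplot -e "$(gnuplot_cmd "${1:-120}" "${2:-30}" "${3:-1}" "${4:-3}")"
}
```

For example, `seq 1 20 | awk '{ print $1, 0, $1 * $1 }' | termplot` should draw y = x² in the terminal.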

darencard / gdrive_download
Created Aug 1, 2017
Script to download files from Google Drive using Bash
#!/usr/bin/env bash
# gdrive_download
#
# script to download Google Drive files from command line
# not guaranteed to work indefinitely
# taken from Stack Overflow answer:
# http://stackoverflow.com/a/38937732/7002068
gURL=$1
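The script takes a Google Drive share link as its first argument (`gURL=$1`) and needs the file ID from it. As an illustration only, the hypothetical helper below (`gdrive_file_id`, not part of the script or the Stack Overflow answer) extracts the ID from the two link shapes assumed here, `.../file/d/<ID>/...` and `...?id=<ID>`, using only bash parameter expansion.

```bash
#!/usr/bin/env bash
# gdrive_file_id: hypothetical helper (not part of gdrive_download).
# Usage: gdrive_file_id <share_url>; prints the file ID or returns 1.
gdrive_file_id() {
  local url=$1
  case "$url" in
    */file/d/*)
      url=${url#*/file/d/}        # drop everything up to ".../file/d/"
      printf '%s\n' "${url%%/*}"  # keep the ID, drop "/view?..." etc.
      ;;
    *id=*)
      url=${url#*id=}             # drop everything up to the first "id="
      printf '%s\n' "${url%%&*}"  # cut at the next query parameter
      ;;
    *)
      return 1                    # unrecognized link shape
      ;;
  esac
}
```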
darencard / auto_git_file.md
Created May 1, 2017
Automatic file git commit/push upon change

Automatically push an updated file whenever it is changed

Linux

  1. Make sure inotify-tools is installed (https://github.com/rvoicilas/inotify-tools)
  2. Configure git as usual
  3. Clone the git repository of interest from GitHub and, if necessary, add the file you want to monitor
  4. Allow your username/password to be cached so you aren't asked every time
git config credential.helper store
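With credentials cached, the watching itself can be a loop around `inotifywait` from inotify-tools. This is an illustrative sketch, not the gist's own script: `watch_and_push` and `auto_commit_message` are hypothetical names, and it assumes it is run from inside the repository, that the file is already tracked, and that a commit should happen only when `git diff` reports a change.

```bash
#!/usr/bin/env bash
# auto_commit_message: hypothetical helper producing a timestamped message.
auto_commit_message() {
  printf 'Auto-update %s (%s)\n' "$(basename "$1")" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# watch_and_push: hypothetical loop. Blocks until the file is written and
# closed, then commits and pushes it. Run from inside the repository.
watch_and_push() {
  local file=$1
  while inotifywait -qq -e close_write "$file"; do
    # Skip saves that did not change the tracked contents.
    if ! git diff --quiet -- "$file"; then
      git add -- "$file"
      git commit -q -m "$(auto_commit_message "$file")"
      git push -q
    fi
  done
}
```

Start it with `watch_and_push notes.md &` (placeholder file name). Editors that save by writing a new file and renaming it over the old one replace the watched inode, so with those editors watching the containing directory may be more reliable.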
darencard / genome_oneliners.md
Last active Nov 5, 2019
Quick one-liner commands that are useful for genomics

Useful Genomics Oneliners

The following commands sometimes require non-standard software like bioawk and seqtk.

rename scaffold headers with sequential numbers and lengths (">scaffold-N <length>")

bioawk -c fastx '{ print ">scaffold-" ++i" "length($seq)"\n"$seq }' < genome.fasta > new_genome.fasta

make association table of old and renamed scaffold names after above renaming command
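Because the renaming command above numbers scaffolds in file order, original names can be paired with new ones by walking the original headers in the same order. One way to do this with plain `awk` is sketched below; it is not the gist's own command, `scaffold_name_map` is a hypothetical name, and it assumes the old name is the first whitespace-delimited word of each header (the part bioawk exposes as `$name`).

```bash
#!/usr/bin/env bash
# scaffold_name_map: hypothetical helper.
# Usage: scaffold_name_map genome.fasta > scaffold_names.tsv
# Run it on the ORIGINAL assembly. Prints old_name<TAB>scaffold-N, numbering
# headers in file order to match the "++i" counter in the bioawk command.
scaffold_name_map() {
  awk -v OFS='\t' '/^>/ { print substr($1, 2), "scaffold-" (++i) }' "$1"
}
```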

darencard / install_run_provean_notes.md
Created Oct 10, 2018
Notes on installing and running Provean

Notes from installing and running PROVEAN to predict the impact of variants on proteins. PROVEAN input files were produced from VEP output using the commands below. Some trial runs were completed on a computer to gauge how quickly PROVEAN can be run in parallel to work through all annotated genes.
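PROVEAN takes a protein sequence plus a list of amino-acid changes written like `A123G`. Assuming VEP's default tab-delimited output (a `#Uploaded_variation` header line naming the columns, with `Amino_acids` given as `A/G`), missense calls can be converted to that notation with `awk`. This is a hypothetical sketch, not the commands used in these notes; the function name is made up, and it looks columns up by header name rather than by position.

```bash
#!/usr/bin/env bash
# vep_to_provean: hypothetical helper.
# Usage: vep_to_provean vep_output.txt > missense_by_transcript.tsv
# Prints transcript<TAB>variant (e.g. "ENST...<TAB>A123G") for missense calls.
# Assumes VEP default tab-delimited output with a "#Uploaded_variation" header.
vep_to_provean() {
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                        # skip VEP meta-information lines
    /^#/ {                                # column header: record positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["Consequence"]) ~ /missense_variant/ {
      split($(col["Amino_acids"]), aa, "/")   # "A/G" -> aa[1]="A", aa[2]="G"
      print $(col["Feature"]), aa[1] $(col["Protein_position"]) aa[2]
    }' "$1"
}
```

The output would still need splitting into one variant file per transcript, each paired with that transcript's protein sequence.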

# running PROVEAN

# installation & dependencies
# 1. checked that BLAST was installed and reinstalled cd-hit to avoid an issue with a certain version
# 2. installed the NCBI nr database
sudo mkdir /opt/ncbi_blast_nr_db_2018-01-29
sudo chmod 775 /opt/ncbi_blast_nr_db_2018-01-29
darencard / extract_fastq_bam.md
Last active Oct 2, 2019
Extract paired FASTQ reads from a BAM mapping file

Extracting paired FASTQ read data from a BAM mapping file

Sometimes FASTQ data is aligned to a reference and stored as a BAM file instead of the usual FASTQ read files. This is okay, because raw FASTQ files can be recreated from the BAM file. The following outlines this process. Both samtools and bedtools are required.

From each bam, we need to extract:

  1. reads that mapped properly as pairs
  2. reads that didn’t map properly as pairs (both didn’t map, or one didn’t map)
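Both categories can be told apart from the SAM FLAG field: bit 0x1 marks a paired read, 0x4 means the read itself is unmapped, and 0x8 means its mate is unmapped. `samtools view -f` keeps reads with all of the given bits set and `-F` drops reads with any of them set, so `-f 1 -F 12` keeps paired reads where both the read and its mate mapped. The hypothetical `pair_class` function below applies the same bit tests to one flag value; it uses the mapped/unmapped bits rather than the proper-pair bit (0x2) and ignores secondary and supplementary alignments (0x100, 0x800).

```bash
#!/usr/bin/env bash
# pair_class: hypothetical helper classifying one SAM FLAG value.
# Usage: pair_class <flag>
pair_class() {
  local flag=$1
  # 0x1 unset: the read was not sequenced as part of a pair.
  if (( (flag & 0x1) == 0 )); then
    echo unpaired
    return
  fi
  local self_unmapped=$(( flag & 0x4 ))  # non-zero if this read is unmapped
  local mate_unmapped=$(( flag & 0x8 ))  # non-zero if its mate is unmapped
  if (( !self_unmapped && !mate_unmapped )); then
    echo both_mapped      # matches: samtools view -f 1 -F 12
  elif (( self_unmapped && mate_unmapped )); then
    echo both_unmapped    # matches: -f 13
  elif (( self_unmapped )); then
    echo read_unmapped    # matches: -f 5 -F 8
  else
    echo mate_unmapped    # matches: -f 9 -F 4
  fi
}
```

For example, `pair_class 99` reports `both_mapped` and `pair_class 73` reports `mate_unmapped`.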

For #1, the following command will work. This was taken from this webpage.

darencard / ucsc_genome_track_setup.md
Last active Mar 8, 2019
Creating a UCSC Genome Track for viewing genome annotations
layout: posts
title: Visualizing Genome Annotations
date: 2019-01-25
excerpt: Creating a UCSC Genome Track for Viewing Genome Annotations

Creating a UCSC Genome Track for Viewing Genome Annotations

When creating a custom, in-house genome annotation, there is no straightforward way to share it upon publication. NCBI does not accept and archive these, so most users end up depositing the text files in an online repository. Fortunately, the UCSC Genome Browser team has come up with a decent way to share an annotation graphically, so readers can browse the assembly and annotation at some level without much work.
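UCSC shares annotations like this through track hubs: a `hub.txt` file describing the hub, a `genomes.txt` file that points each genome at its `trackDb.txt` (and, for an assembly hub, at the assembly itself), and a `trackDb.txt` with one stanza per track. As a minimal, hypothetical sketch (function names, labels and file names are placeholders, and the annotation is assumed to be a BED12 file already converted to bigBed, e.g. with UCSC's `bedToBigBed`), the hub and track stanzas could be written like this; `genomes.txt` is left out because its contents depend on how the assembly is hosted.

```bash
#!/usr/bin/env bash
# write_hub_txt: hypothetical helper writing a minimal hub.txt.
# Usage: write_hub_txt <dir> <hub_name> <short_label> <long_label> <email>
write_hub_txt() {
  local dir=$1
  mkdir -p "$dir"
  cat > "$dir/hub.txt" << EOF
hub $2
shortLabel $3
longLabel $4
genomesFile genomes.txt
email $5
EOF
}

# track_stanza: hypothetical helper printing one trackDb.txt stanza for a
# BED12-derived bigBed; stanzas are separated by a blank line.
# Usage: track_stanza <track_name> <file.bb> <short_label> <long_label> >> trackDb.txt
track_stanza() {
  printf 'track %s\nbigDataUrl %s\nshortLabel %s\nlongLabel %s\ntype bigBed 12\n\n' \
    "$1" "$2" "$3" "$4"
}
```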
