Daren Card darencard

@darencard
darencard / CDS_extract.md
Last active March 28, 2024 13:30
Extracting spliced sequences (e.g., CDS) from GFF files

Extracting spliced sequences (e.g., CDS) from GFF files

GFF is a common format for storing genetic feature annotations. In gene annotations, a single feature is often split over multiple lines, because elements like exons and CDS segments are separated by gaps in the full genome sequence. Therefore, while it is easy to extract exon and CDS lines, it can be difficult to associate them with one another through a parent (e.g., transcript) ID and perform downstream operations. For this reason, even extracting the full CDS sequence using a GFF file, which seems trivial, can be tricky.

Here we'll overcome this difficulty using the gffread tool. Installation is pretty easy and is documented in the GitHub README. gffread has a lot of options, but here we'll just document one that extracts the spliced CDS for each GFF transcript (-x option). Note that you can do the same thing for exons (-w option) and can also produce the protein sequence (-y option).
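As a minimal sketch of the -x extraction described above: the wrapper below only assembles a gffread call. The function name and the file names (genome.fasta, annotation.gff, cds.fasta) are placeholders for illustration, and it assumes gffread is on the PATH and that the genome FASTA is supplied with -g so that sequences can be written.

```shell
# Hypothetical helper: write the spliced CDS of every transcript to a FASTA file.
# Arguments: genome FASTA, GFF annotation, output FASTA (all placeholder names).
extract_spliced_cds() {
    genome=$1
    annotation=$2
    output=$3
    # -g: genome sequence to pull bases from; -x: output file for spliced CDS
    gffread -g "$genome" -x "$output" "$annotation"
}

# Example call (commented out so the snippet can be sourced without side effects):
# extract_spliced_cds genome.fasta annotation.gff cds.fasta
```

Swapping -x for -w or -y in the same call should write spliced exons or protein translations instead.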

Let's extra

darencard / genome_oneliners.md
Last active March 12, 2024 17:40
Quick one-liner commands that are useful for genomics

Useful Genomics Oneliners

The following commands sometimes require non-standard software like bioawk and seqtk.

rename scaffold headers with sequential numbers and lengths ("scaffold-N <length>")

bioawk -c fastx '{ print ">scaffold-" ++i" "length($seq)"\n"$seq }' < genome.fasta > new_genome.fasta

make association table of old and renamed scaffold names after above renaming command
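One possible sketch of such a table, not necessarily the original one-liner: it assumes genome.fasta and new_genome.fasta list their records in the same order (which holds if the second was produced from the first by the renaming command), uses plain awk rather than bioawk, and the function name make_scaffold_table is made up for illustration.

```shell
# Hypothetical helper: print "old header<TAB>new header" for each record.
# $1 = original FASTA, $2 = renamed FASTA (same record order assumed).
make_scaffold_table() {
    awk -v OFS='\t' '
        !/^>/ { next }                      # skip sequence lines
        { sub(/^>/, "") }                   # drop the leading ">"
        NR == FNR { old[++n] = $0; next }   # first file: remember original headers
        { print old[++m], $0 }              # second file: pair headers by position
    ' "$1" "$2"
}

# Example call:
# make_scaffold_table genome.fasta new_genome.fasta > scaffold_names.tsv
```

The new header keeps the appended length, so the second column can be split on the space if only the scaffold name is wanted.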

darencard / maker_genome_annotation.md
Last active March 7, 2024 08:50
In-depth description of running MAKER for genome annotation.

Please see the most up-to-date version of this protocol on my blog at https://darencard.net/blog/.

Genome Annotation using MAKER

MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these are quite simplified examples, and it took a bit of effort to wrap my head completely around everything. Here I will describe a de novo genome annotation for Boa constrictor in detail, so that there is a record and so that this can easily be used as a guide to annotate any genome.

Software & Data

Software prerequisites:

  1. RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (ver
darencard / gnuplot_quickstart.md
Created August 31, 2017 14:20
A quick-start guide for using gnuplot for in-terminal plotting

A quick-start guide for using gnuplot for in-terminal plotting

Sometimes it is really nice to just take a quick look at some data. However, when working on remote computers, it is a bit of a burden to move data files to a local computer to create a plot in something like R. One solution is to use gnuplot and make a quick plot that is rendered in the terminal. It isn't very pretty by default, but it gets the job done quickly and easily. There are also advanced gnuplot capabilities that aren't covered here at all.

gnuplot has its own internal syntax that can be fed in as a script, which I won't get into. Here is the very simplified gnuplot code we'll be using:

set terminal dumb size 120, 30; set autoscale; plot '-' using 1:3 with lines notitle

Let's break this down:
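As a sketch, the wrapper below restates the same gnuplot one-liner with a comment for each statement. The function name termplot and the idea of feeding it whitespace-delimited data on standard input are assumptions for illustration, and it presumes gnuplot is installed.

```shell
# set terminal dumb size 120, 30
#     render the plot as plain text, 120 characters wide and 30 lines tall
# set autoscale
#     let gnuplot choose the axis ranges from the data
# plot '-' using 1:3 with lines notitle
#     '-' reads the data inline (here, from standard input); column 1 is x,
#     column 3 is y; points are joined by lines; no legend entry is drawn
termplot() {
    gnuplot -e "set terminal dumb size 120, 30; set autoscale; plot '-' using 1:3 with lines notitle"
}

# Example call with a hypothetical three-column file:
# termplot < data.txt
```

Changing the `using 1:3` part is the quickest way to plot a different pair of columns.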

darencard / imagej_macros_scale_png.md
Last active February 26, 2024 17:53
ImageJ macro to automate scale bar addition using command line

Overview:

Below is an ImageJ macro that will read a user-provided image file, add a scale bar, and then output a PNG image. The user must specify two arguments, the input file and the output file, as one quote-enclosed argument (see example below). The scale bar characteristics have been hard coded and can be changed by hand, or the script can be modified to allow argument specifications.

This macro is designed to be called from the command line using the ImageJ executable. With my Mac OSX computer running Fiji, the path is /Applications/Fiji.app/Contents/MacOS/ImageJ-macosx. This has not been tested elsewhere and may not work without some effort. It relies on the Bio-Formats plugin to read the file and was written to convert from Zeiss's .czi files, so no guarantee that it works with others as desired. It is especially important to note that this does not set the scale, but infers it based on the metadata stored in the .czi files. Therefore, it will probably not work well with other file types.

If y

darencard / auto_git_file.md
Last active January 6, 2024 10:33
Automatic file git commit/push upon change

Please see the most up-to-date version of this protocol on my blog at https://darencard.net/blog/.

Automatically push an updated file whenever it is changed

Linux

  1. Make sure inotify-tools is installed (https://github.com/rvoicilas/inotify-tools)
  2. Configure git as usual
  3. Clone the git repository of interest from GitHub and, if necessary, add the file you want to monitor
  4. Allow your username/password to be cached so you aren't asked every time
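
The steps above could be tied together roughly as follows. This is a sketch under assumptions rather than the protocol's own script: the function name watch_and_push, the commit message, and the choice of the close_write event are illustrative, and it assumes the function is run inside the cloned repository with a push target already configured. Credential caching can be enabled beforehand with git's built-in cache helper (git config credential.helper cache).

```shell
# Hypothetical helper: commit and push a file each time it is written and closed.
watch_and_push() {
    file=$1
    # -m: keep monitoring; -q: quiet; -e close_write: fire when a writer closes the file
    inotifywait -q -m -e close_write "$file" |
    while read -r _event; do
        # Stop the chain early if any step fails (e.g. nothing to commit)
        git add "$file" &&
            git commit -q -m "Auto-commit: $file changed" &&
            git push -q
    done
}

# Example call (runs until interrupted):
# watch_and_push notes.md
```

Some editors save by replacing the file rather than writing to it in place, in which case watching the containing directory may be needed instead.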
darencard / popstats_from_vcf.Md
Created July 17, 2017 16:16
Calculating population genetic statistics from VCF files using BCFtools

Useful Oneliners for Calculating Population Genetic Statistics from VCF files

The following commands require non-standard software like BCFtools and VCFtools.

thin variants to prevent linkage biases and output the number of sampled alleles and the allele frequency for the reference allele

vcftools --thin 10000 --recode --recode-INFO-all --stdout --gzvcf <my_variants.vcf.gz> | \
  bcftools query -f '%CHROM\t%POS[\t%GT]\n' - | \
 awk -v OFS="\t" '{ miss=0; hom_ref=0; hom_alt=0; het=0; \
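
As a sketch of the counting step (not the original awk program, which tracks missing, homozygous, and heterozygous genotypes separately), reference allele frequency could be derived from the bcftools query output like this. It assumes biallelic sites, genotypes written as 0/0, 0/1, 1/1 or ./. (phased | separators are also handled), and a made-up function name ref_allele_freq.

```shell
# Hypothetical helper: read "CHROM POS GT GT ..." lines on standard input and
# print CHROM, POS, number of sampled (non-missing) alleles, and ref allele frequency.
ref_allele_freq() {
    awk -v OFS='\t' '{
        ref = 0; alt = 0
        for (i = 3; i <= NF; i++) {
            n = split($i, allele, "[/|]")      # split each genotype into its alleles
            for (j = 1; j <= n; j++) {
                if (allele[j] == "0") ref++
                else if (allele[j] == "1") alt++
                # anything else (e.g. ".") is treated as missing and skipped
            }
        }
        total = ref + alt
        if (total > 0) print $1, $2, total, ref / total   # skip sites with no calls
    }'
}

# Example call on a hypothetical genotype table:
# ref_allele_freq < genotypes.tsv
```

Sites where every genotype is missing are dropped rather than reported with an undefined frequency.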

I am attesting that this GitHub handle darencard is linked to the Tezos account tz1aE51gtg4Rbim7hPZPmim2tE5EpMTmXhAG for tzprofiles

sig:edsigtzSNbxXjBfZyzBbmo72qaJzmkMEEHjAQULLbDmjjVUqWmMd8KDdJSBdeg14mbTEm5LmfUrZnBWDZ8VgdxWk5QaRjqQVumD

darencard / ucsc_genome_track_setup.md
Last active November 4, 2023 09:16
Creating a UCSC Genome Track for viewing genome annotations
layout: posts
title: Visualizing Genome Annotations
date: 2019-01-25
excerpt: Creating a UCSC Genome Track for Viewing Genome Annotations

Creating a UCSC Genome Track for Viewing Genome Annotations

When creating a custom, in-house genome annotation, there is no straightforward way to share it upon publication. NCBI does not accept and archive these, so most users just end up depositing the text files in an online repository. Fortunately, the UCSC Genome Browser team has come up with a decent way to share an annotation graphically, so readers can browse the assembly and annotation at some level without a lot of work.

darencard / entware_synology.md
Last active November 3, 2023 03:31
Setting up and using Entware on Synology device