iracooke / README.md
Last active January 13, 2023 03:10
Marine Omics Nextflow Pipeline Setup

Using Marine Omics Nextflow Pipelines

Installing Java

First make sure you have a sufficiently modern Java (version 11 or higher). You can check with:

java -version

If your Java is too old, you can try installing a newer version using sdkman.
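
A minimal sketch of that route (the Java version identifier below is just an example; run sdk list java to see what is currently available):

# install sdkman and load it into the current shell
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# pick and install a recent java, then check the version
sdk install java 11.0.21-tem
java -version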

iracooke / setup_bats.sh
Created July 20, 2020 01:27
Setup BATS
# install bats-core under ~/.local (the bats binary ends up in ~/.local/bin)
cd
mkdir -p ~/.local/bin
mkdir -p ~/.local/lib
git clone https://github.com/bats-core/bats-core.git
cd bats-core
./install.sh ~/.local/
cd ~/.local/lib
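
For bats to be available on the command line, ~/.local/bin also needs to be on your PATH; a minimal check (assuming a bash-like shell):

export PATH="$HOME/.local/bin:$PATH"
bats --version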
iracooke / README.md
Last active January 23, 2020 00:02
Split a fasta file

Split a Fasta file

This method relies on bioawk. First make sure you have bioawk installed, then download the file split_fasta.awk from this repository. The instructions below assume this file is available in your working directory.

Installing bioawk (instructions specific to the JCU HPC)

  1. Make a bin directory if you haven't already
cd ~
mkdir -p bin
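
The remaining steps are, roughly, to build bioawk and put the binary into ~/bin. A minimal sketch, assuming the lh3/bioawk GitHub repository and a C toolchain with bison available:

# clone, build and install bioawk into ~/bin
git clone https://github.com/lh3/bioawk.git
cd bioawk
make
cp bioawk ~/bin/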
iracooke / jcu_hpc_faq.md
Last active September 20, 2019 09:29
JCU HPC FAQ

The first port of call for information on the JCU HPC system is the official wiki. This gist supplements the main wiki, providing quick answers to common questions along with links back to the wiki and to other useful resources.

This gist assumes that your local machine (i.e. your personal computer, not the HPC) is running a unix-like OS (macOS or Linux). Windows users should consider setting up Windows Subsystem for Linux (WSL) so that they also have a unix-like operating system to work with.

What is the JCU HPC system?

It is a fairly substantial collection of high-performance computers. At the time of writing this consisted of 15 nodes, each with 80 CPUs and just under 400 GB of memory. All the nodes are networked together so that large jobs can be distributed across multiple nodes. A range of high-capacity data storage is also networked to HPC accounts as [detailed here](ht

iracooke / ms_for_psmc.md
Created March 12, 2019 23:11
Use ms to simulate data for PSMC

How to use MS to simulate data for PSMC/MSMC

The ms command usage looks like this:

usage: ms nsam howmany

So it is necessary to provide nsam (the number of haplotypes to be sampled) and howmany (the number of replicate sets of data to generate).

For PSMC data we always choose nsam to be 2 because the method is designed for diploid genomes. For convenience, howmany should just be set to 1 because we will rerun ms to generate separate random replicate datasets.
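
Putting this together, a minimal sketch of a single-replicate, single-diploid simulation (the -t, -r and -eN values below are purely illustrative, not recommendations):

# 2 haplotypes (one diploid individual), 1 replicate, with mutation (-t),
# recombination across 10 Mb (-r) and step changes in population size (-eN)
ms 2 1 -t 30000 -r 6000 10000000 -eN 0.01 0.1 -eN 0.06 1 > ms_rep1.txt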

iracooke / README.md
Last active August 1, 2018 23:32
Tutorial 2 : Fix

Run these commands from within the tutorial 2 directory

mkdir -p bin
touch bin/greet.sh

Copy the content from greet.sh above into the file bin/greet.sh that you just created

Change permissions
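
Presumably this means making the script executable, e.g.:

chmod +x bin/greet.sh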

iracooke / extract_answers.awk
Created August 1, 2018 00:03
BC3203 Patches
# start outside a question block
BEGIN { in_q=0 }
# a fenced code block whose info string contains "question_" starts a question;
# open a shell function named after the question id
/^```.*question_/ {
    in_q=1;
    match($0,"question_[a-z]*[0-9]+")
    printf("%s(){\n",substr($0,RSTART,RLENGTH));
}
# rule for lines matching the answer placeholder
/#Your answer here/ {
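
Assuming the full script closes each generated function after the answer block, usage would be along these lines (the input filename is hypothetical):

awk -f extract_answers.awk questions.md > answers.sh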
iracooke / sweep_details.R
Last active July 14, 2018 21:02
Sweep Plots
library(tidyverse)

# build a sweep-detail plot for a single scaffold from a gff annotation and sf2_data
sweep_detail_plot <- function(scaffold,xl,rl,gff,sf2_data){
  # restrict the annotation to this scaffold and pull a gene id out of the attributes column
  anno_data <- gff %>%
    filter(seqid==scaffold) %>%
    mutate(geneid=str_extract(attributes,"m[^\\;]+"))
  anno_data$type <- factor(anno_data$type)

Running SignalP on Trinity/TransDecoder output

Assume we have a fasta file of proteins, called transdecoder.pep, with IDs generated by Trinity and TransDecoder. Truncate the names as follows:

cat transdecoder.pep | sed -r  's/[^:]*::/>/' > transdecoder_truncated.pep

Note that on a Mac you should use -E instead of -r.
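
For illustration, a hypothetical TransDecoder-style header shows what the substitution does (it strips everything up to and including the first ::):

# hypothetical header, before and after truncation
echo ">Gene.1::TRINITY_DN1000_c0_g1_i1::g.1::m.1" | sed -r 's/[^:]*::/>/'
# prints: >TRINITY_DN1000_c0_g1_i1::g.1::m.1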

iracooke / README.md
Last active May 7, 2021 07:32
Stacks vcf filtering

Basic filtering for AGRF RADseq data

This data consists of VCF output from Stacks. See this post for some information about this output.

The general filtering strategy is as follows (a sketch of the commands is given after the list):

  1. Remove sites where the minor allele frequency is too low, as these might also be the result of sequencing or alignment errors in a handful of individuals.

  2. Remove individuals where the depth is too low. Ideally we would use a likelihood-based scoring measure here instead (e.g. the GQ field) but this is not provided by Stacks.
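
A minimal sketch of these two steps using vcftools (tool choice, input filename and thresholds are assumptions rather than part of the original recipe):

# drop sites with minor allele frequency below 0.05 and set genotypes
# with depth below 5 to missing, writing a new filtered vcf
vcftools --vcf populations.snps.vcf \
    --maf 0.05 \
    --minDP 5 \
    --recode --recode-INFO-all \
    --out stacks_filtered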