iracooke / README.md
Last active January 13, 2023 03:10
Marine Omics Nextflow Pipeline Setup

Using Marine Omics Nextflow Pipelines

Installing Java

First make sure you have a sufficiently modern Java (version 11 or higher). You can check with:

java -version

If your Java is too old, you can try installing a newer version using sdkman.
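
A minimal sketch of that route (the Java version identifier below is just an example; run sdk list java to see what is currently available):

# install sdkman and load it into the current shell
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# pick and install a recent java, then check the version
sdk install java 11.0.21-tem
java -version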

iracooke / setup_bats.sh
Created July 20, 2020 01:27
Setup BATS
# install bats-core under ~/.local (the bats binary ends up in ~/.local/bin)
cd
mkdir -p ~/.local/bin
mkdir -p ~/.local/lib
git clone https://github.com/bats-core/bats-core.git
cd bats-core
./install.sh ~/.local/
cd ~/.local/lib
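
For bats to be available on the command line, ~/.local/bin also needs to be on your PATH; a minimal check (assuming a bash-like shell):

export PATH="$HOME/.local/bin:$PATH"
bats --version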
iracooke / README.md
Last active January 23, 2020 00:02
Split a fasta file

Split a Fasta file

This method relies on bioawk. First make sure you have bioawk installed, then download the file split_fasta.awk from this repository. The instructions below assume this file is available in your working directory.

Installing bioawk (instructions specific to the JCU HPC)

  1. Make a bin directory if you haven't already
cd ~
mkdir -p bin
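
The remaining steps are, roughly, to build bioawk and put the binary into ~/bin. A minimal sketch, assuming the lh3/bioawk GitHub repository and a C toolchain with bison available:

# clone, build and install bioawk into ~/bin
git clone https://github.com/lh3/bioawk.git
cd bioawk
make
cp bioawk ~/bin/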
iracooke / jcu_hpc_faq.md
Last active September 20, 2019 09:29
JCU HPC FAQ

The first port of call for information on the JCU HPC system is the official wiki. This gist supplements the main wiki, providing quick answers to common questions along with links back to the wiki and to other useful resources.

This gist assumes that your local machine (i.e. your personal computer, not the HPC) is running a unix-like OS (macOS or Linux). Windows users should consider setting up Windows Subsystem for Linux (WSL) so that they also have a unix-like operating system to work with.

What is the JCU HPC system?

It is a fairly substantial collection of high-performance computers. At the time of writing this consisted of 15 nodes, each with 80 CPUs and just under 400 GB of memory. All the nodes are networked together so that large jobs can be distributed across multiple nodes. A range of high-capacity data storage is also networked to HPC accounts as [detailed here](ht

iracooke / ms_for_psmc.md
Created March 12, 2019 23:11
Use ms to simulate data for PSMC

How to use MS to simulate data for PSMC/MSMC

The ms command usage looks like this:

usage: ms nsam howmany

So it is necessary to provide nsam (the number of haplotypes to be sampled) and howmany (the number of replicate sets of data to generate).

For PSMC data we always choose nsam to be 2 because the method is designed for diploid genomes. For convenience, howmany should just be set to 1 because we will rerun ms to generate separate random replicate datasets.
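
Putting this together, a minimal sketch of a single-replicate, single-diploid simulation (the -t, -r and -eN values below are purely illustrative, not recommendations):

# 2 haplotypes (one diploid individual), 1 replicate, with mutation (-t),
# recombination across 10 Mb (-r) and step changes in population size (-eN)
ms 2 1 -t 30000 -r 6000 10000000 -eN 0.01 0.1 -eN 0.06 1 > ms_rep1.txt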

iracooke / README.md
Last active August 1, 2018 23:32
Tutorial 2 : Fix

Run these commands from within the tutorial 2 directory

mkdir -p bin
touch bin/greet.sh

Copy the content from greet.sh above into the file bin/greet.sh that you just created

Change permissions
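
Presumably this means making the script executable, e.g.:

chmod +x bin/greet.sh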

iracooke / extract_answers.awk
Created August 1, 2018 00:03
BC3203 Patches
# start outside a question block
BEGIN { in_q=0 }
# a fenced code block whose info string contains "question_" starts a question;
# open a shell function named after the question id
/^```.*question_/ {
    in_q=1;
    match($0,"question_[a-z]*[0-9]+")
    printf("%s(){\n",substr($0,RSTART,RLENGTH));
}
# rule for lines matching the answer placeholder
/#Your answer here/ {
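
Assuming the full script closes each generated function after the answer block, usage would be along these lines (the input filename is hypothetical):

awk -f extract_answers.awk questions.md > answers.sh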
iracooke / sweep_details.R
Last active July 14, 2018 21:02
Sweep Plots
library(tidyverse)

# build a sweep-detail plot for a single scaffold from a gff annotation and sf2_data
sweep_detail_plot <- function(scaffold,xl,rl,gff,sf2_data){
  # restrict the annotation to this scaffold and pull a gene id out of the attributes column
  anno_data <- gff %>%
    filter(seqid==scaffold) %>%
    mutate(geneid=str_extract(attributes,"m[^\\;]+"))
  anno_data$type <- factor(anno_data$type)

Running SignalP on Trinity/TransDecoder output

Assume we have a fasta file of proteins, called transdecoder.pep, with IDs generated by Trinity and TransDecoder. Truncate the names as follows:

cat transdecoder.pep | sed -r  's/[^:]*::/>/' > transdecoder_truncated.pep

Note that on a Mac you should use -E instead of -r.
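
For illustration, a hypothetical TransDecoder-style header shows what the substitution does (it strips everything up to and including the first ::):

# hypothetical header, before and after truncation
echo ">Gene.1::TRINITY_DN1000_c0_g1_i1::g.1::m.1" | sed -r 's/[^:]*::/>/'
# prints: >TRINITY_DN1000_c0_g1_i1::g.1::m.1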

iracooke / README.md
Last active May 7, 2021 07:32
Stacks vcf filtering

Basic filtering for AGRF RADseq data

This data consists of VCF output from Stacks. See this post for some information about this output.

The general filtering strategy is as follows (a sketch of the commands is given after the list):

  1. Remove sites where the minor allele frequency is too low, as these might also be the result of sequencing or alignment errors in a handful of individuals.

  2. Remove individuals where the depth is too low. Ideally we would use a likelihood-based scoring measure here instead (e.g. the GQ field) but this is not provided by Stacks.
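
A minimal sketch of these two steps using vcftools (tool choice, input filename and thresholds are assumptions rather than part of the original recipe):

# drop sites with minor allele frequency below 0.05 and set genotypes
# with depth below 5 to missing, writing a new filtered vcf
vcftools --vcf populations.snps.vcf \
    --maf 0.05 \
    --minDP 5 \
    --recode --recode-INFO-all \
    --out stacks_filtered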