This data consists of VCF output from Stacks. See this post for some information about this output.
The general filtering strategy is as follows:
- Remove sites where the minor allele frequency is too low, as such rare variants may be the result of sequencing or alignment errors in a handful of individuals.
- Remove individuals where the depth is too low. Ideally we would use a likelihood-based scoring measure here instead (e.g. the GQ field), but this is not provided by Stacks.
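As a rough illustration of the two filters, the Python sketch below works on VCF text directly. It is not part of the original workflow: it assumes biallelic sites, diploid `GT` calls and a `DP` subfield listed in the FORMAT column, and the thresholds `MIN_MAF` and `MIN_DP` are hypothetical values chosen only for the example. Low-depth genotypes are masked (set to `./.`) rather than deleted, so each site keeps one column per individual.

```python
"""Sketch of the MAF and depth filters on Stacks-style VCF text.

Assumptions: biallelic sites, diploid GT calls (0/0, 0/1, 1/1, ./.),
and a DP subfield named in the FORMAT column. MIN_MAF and MIN_DP are
hypothetical thresholds for illustration only.
"""

MIN_MAF = 0.05  # hypothetical minor allele frequency cutoff
MIN_DP = 5      # hypothetical per-genotype read depth cutoff


def minor_allele_frequency(gts):
    """MAF across a site's GT strings, ignoring missing alleles."""
    alleles = [a for gt in gts for a in gt.replace("|", "/").split("/") if a != "."]
    if not alleles:
        return 0.0
    alt_freq = sum(1 for a in alleles if a != "0") / len(alleles)
    return min(alt_freq, 1 - alt_freq)


def mask_low_depth(fmt_keys, sample_field, min_dp):
    """Set GT to ./. when the genotype's DP is missing or below min_dp."""
    values = dict(zip(fmt_keys, sample_field.split(":")))
    dp = values.get("DP", ".")
    if dp == "." or int(dp) < min_dp:
        values["GT"] = "./."
    return ":".join(values.get(k, ".") for k in fmt_keys)


def filter_vcf(lines, min_maf=MIN_MAF, min_dp=MIN_DP):
    """Yield header lines unchanged and data lines that pass both filters."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        fmt_keys = cols[8].split(":")
        gt_index = fmt_keys.index("GT")
        gts = [s.split(":")[gt_index] for s in cols[9:]]
        # Filter 1: drop the whole site if the minor allele is too rare.
        if minor_allele_frequency(gts) < min_maf:
            continue
        # Filter 2: mask individual genotypes supported by too few reads.
        samples = [mask_low_depth(fmt_keys, s, min_dp) for s in cols[9:]]
        yield "\t".join(cols[:9] + samples)
```

The sketch applies the filters in the order listed above, so MAF is computed before low-depth genotypes are masked. Masking first would instead compute MAF only from well-supported calls; which order suits a given dataset is a judgment call.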
So it is necessary to provide `nsam` (the number of haplotypes to be sampled) and `howmany` (the number of replicate datasets to generate).
For PSMC data we always set `nsam` to 2 because the method is designed for diploid genomes. For convenience, `howmany` should simply be set to 1, because we will rerun ms to generate each separate random replicate dataset.
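A minimal Python sketch of this approach: each replicate is a separate ms run with `nsam=2` and `howmany=1`, given its own random seeds through ms's `-seeds` option. The values of `theta`, `rho` and `nsites` below are hypothetical placeholders rather than parameters from this workflow, and the helper names `ms_command` and `replicate_commands` are illustrative only. The sketch only builds and prints command lines; it does not run ms.

```python
import random
import shlex


def ms_command(theta, rho, nsites, rng, nsam=2, howmany=1):
    """Return argv for one ms run; nsam=2 gives one diploid genome for PSMC."""
    # ms -seeds expects three integers. They are kept small here so they fit
    # whatever integer type ms uses internally (an assumption, to be safe).
    seeds = [str(rng.randint(1, 32767)) for _ in range(3)]
    return ["ms", str(nsam), str(howmany),
            "-t", str(theta),
            "-r", str(rho), str(nsites),
            "-seeds", *seeds]


def replicate_commands(n_replicates, theta, rho, nsites, master_seed=1):
    """One ms command per replicate, each drawing fresh seeds."""
    rng = random.Random(master_seed)
    return [ms_command(theta, rho, nsites, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    cmds = replicate_commands(3, theta=30000, rho=6000, nsites=30000000)
    for i, cmd in enumerate(cmds):
        print(shlex.join(cmd), f"> replicate_{i}.ms")
```

Each printed line can be run in the shell to write one replicate to its own file. Drawing all seeds from a single `master_seed` keeps the whole set of replicates reproducible.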
The first port of call for information on the JCU HPC system is the official wiki. This gist supplements the main wiki by providing quick answers to common questions, with links to the wiki and to other useful resources.
This gist assumes that your local machine (i.e. your personal computer, not the HPC) is running a Unix-like OS (macOS or Linux). Windows users should consider setting up Windows Subsystem for Linux (WSL) so that they also have a Unix-like operating system to work with.
What is the JCU HPC system?
It is a fairly substantial collection of high-performance computers. At the time of writing it comprised 15 nodes, each with 80 CPUs and just under 400 GB of memory. All the nodes are networked together so that large jobs can be distributed across multiple nodes. A range of high-capacity data storage is also networked to HPC accounts as [detailed here](ht
This method relies on bioawk. First make sure you have bioawk installed, then download the file split_fasta.awk from this repository. The instructions below assume this file is available in your working directory.
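The contents of `split_fasta.awk` are not reproduced here. Assuming its job is to split a multi-FASTA file into one file per record (as its name suggests), the Python sketch below does the equivalent without bioawk. The function names and the `<record_id>.fasta` output naming are hypothetical choices, not taken from the awk script; the record ID is taken as the first whitespace-delimited word of each header.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header_line, sequence_lines) for each record in a FASTA file."""
    header, seq_lines = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, seq_lines
                header, seq_lines = line, []
            elif header is not None:
                seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def split_fasta(path, out_dir):
    """Write each record to <out_dir>/<record_id>.fasta; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for header, seq_lines in read_fasta(path):
        record_id = header[1:].split()[0]  # first word after '>'
        out_path = out_dir / f"{record_id}.fasta"
        out_path.write_text("\n".join([header, *seq_lines]) + "\n")
        written.append(out_path)
    return written
```

Reading records through a generator keeps memory use to one record at a time, which matters for large assemblies with many contigs.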
Installing bioawk (instructions specific to the JCU HPC)