To get the right code, clone the correct repository and branch:
git clone -b ecl298 https://github.com/arundurvasula/angsd-wrapper.git
First things first, we need to create a directory in the cloned folder called results:
mkdir results
Now in order to create a site frequency spectrum, we need to tell ANGSD a few things.
- where the data is located (individual bam files)
- where the reference sequence is located (and the ancestral sequence)
- the inbreeding coefficients for each sample
Let's tackle these one at a time.
To tell ANGSD where the data is located, we need to create a file called ${TAXON}_samples.txt in the data folder. ${TAXON} is a variable in ANGSD-wrapper that can be set to whatever you want; it is meant to identify the type of sample you have. For example, for Oryza glumaepatula samples, this can be og, so you would create a file called data/og_samples.txt.
In this file, we need to place the path of each BAM file you want to include in the analysis, one path per line. Make sure these are absolute paths, not relative paths.
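As a minimal sketch of building the sample list, the helper below writes the absolute path of every .bam file in a directory to a file, one per line. The function name write_sample_list and the example directory /path/to/bams are hypothetical and not part of ANGSD-wrapper.

```shell
# Hypothetical helper (not part of ANGSD-wrapper): list every .bam file
# in a directory by absolute path, one path per line.
write_sample_list() {
    bam_dir=$1     # directory holding the BAM files
    out_file=$2    # e.g. data/og_samples.txt
    # Resolve the directory first so every entry written is absolute.
    abs_dir=$(cd "$bam_dir" && pwd) || return 1
    for bam in "$abs_dir"/*.bam; do
        # Skip the unexpanded pattern when the directory has no BAM files.
        if [ -e "$bam" ]; then
            printf '%s\n' "$bam"
        fi
    done > "$out_file"
}

# Example call, run from the cloned folder:
#   TAXON=og
#   write_sample_list /path/to/bams "data/${TAXON}_samples.txt"
```

Resolving the directory once with cd and pwd keeps the paths absolute even when the function is given a relative directory.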
To tell ANGSD where the reference sequence is, we need to edit the configuration file. We can modify the example one: scripts/sfs_example.conf.
In this file, we need to add values for the ANC_SEQ and REF_SEQ variables. Each should be set to the absolute path of the corresponding file. In this case, the ancestral species is Oryza meridionalis and the reference is Oryza sativa indica.
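A quick sanity check on these two variables might look like the sketch below. It assumes the configuration file is a shell file of VARIABLE=value assignments that can be sourced; that format is an assumption here, and check_seq_paths is a hypothetical helper, not part of ANGSD-wrapper.

```shell
# Hypothetical check (not part of ANGSD-wrapper): confirm that ANC_SEQ
# and REF_SEQ are absolute paths to existing files.
# Assumes the config is a shell file of VARIABLE=value assignments.
check_seq_paths() {
    conf=$1
    (
        # Source the config in a subshell so its variables do not leak out.
        . "$conf" || exit 1
        for seq in "$ANC_SEQ" "$REF_SEQ"; do
            case $seq in
                /*) ;;   # starts with a slash: absolute path
                *)  echo "not an absolute path: '$seq'" >&2; exit 1 ;;
            esac
            if [ ! -f "$seq" ]; then
                echo "file not found: $seq" >&2
                exit 1
            fi
        done
    )
}

# Example call, run from the cloned folder:
#   check_seq_paths scripts/sfs_example.conf
```

The function returns a non-zero status on the first problem it finds, so it can be chained in front of a job submission.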
Normally, the inbreeding coefficients would need to be estimated from the data, but in this case we can use 1 for each sample. To tell ANGSD what these values are, we need to create a file data/${TAXON}_F.txt. In this file, we put the inbreeding coefficient for each sample on its own line.
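One way to generate this file is sketched below: it writes the same coefficient once for every non-empty line of the sample list. The helper name write_inbreeding_file is hypothetical, and the sketch assumes the coefficient lines are expected in the same order as the sample list.

```shell
# Hypothetical helper (not part of ANGSD-wrapper): write one inbreeding
# coefficient per sample, using the same value for every sample.
write_inbreeding_file() {
    samples=$1       # e.g. data/og_samples.txt
    out_file=$2      # e.g. data/og_F.txt
    value=${3:-1}    # coefficient to repeat; defaults to 1
    # Print the value once for each non-empty line of the sample list.
    awk -v f="$value" 'NF { print f }' "$samples" > "$out_file"
}

# Example call, run from the cloned folder:
#   TAXON=og
#   write_inbreeding_file "data/${TAXON}_samples.txt" "data/${TAXON}_F.txt"
```

Deriving the file from the sample list keeps the two line counts equal when samples are added or removed.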
The default region set in common.conf is chromosome 1. This can be changed by modifying the REGIONS variable using the correct region syntax.
Now that we have all this data for ANGSD, we can submit our job to farm.
sbatch -p class scripts/ANGSD_SFS.sh scripts/sfs_example.conf
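Before submitting, it can help to confirm that the input files agree with each other. The sketch below checks that every listed BAM file exists and that the sample list and the inbreeding file have the same number of entries. The helper check_inputs is hypothetical and not part of ANGSD-wrapper.

```shell
# Hypothetical pre-submission check (not part of ANGSD-wrapper): every
# listed BAM exists, and the sample list and inbreeding file have the
# same number of non-empty lines.
check_inputs() {
    samples=$1    # e.g. data/og_samples.txt
    f_file=$2     # e.g. data/og_F.txt
    status=0
    # Report any listed path that does not point to an existing file.
    while IFS= read -r bam; do
        [ -z "$bam" ] && continue
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam" >&2
            status=1
        fi
    done < "$samples"
    # Count the non-empty lines in each file and compare the totals.
    n_samples=$(awk 'NF { n++ } END { print n + 0 }' "$samples")
    n_f=$(awk 'NF { n++ } END { print n + 0 }' "$f_file")
    if [ "$n_samples" -ne "$n_f" ]; then
        echo "$n_samples samples but $n_f inbreeding values" >&2
        status=1
    fi
    return "$status"
}

# Example call, run from the cloned folder:
#   check_inputs data/og_samples.txt data/og_F.txt
```

Catching a mismatch here is cheaper than waiting for a queued job to fail.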
To run the theta script, you will need to edit the theta_example.conf file (or create your own) to point to the data above. The theta script will look for the output of the SFS script in the results directory and use the same data files we created above. Once you have adjusted the values in the configuration file, you can submit your job to farm using syntax similar to that of the SFS script.
You also need to modify common.conf in the scripts directory so that the project directory points to wherever you have angsd-wrapper installed.