arundurvasula/gist:4a402ef7af76a03dfc7e

## gistfile1.md

      
To get the right code, you need to clone the correct repository and branch:
git clone -b ecl298 https://github.com/arundurvasula/angsd-wrapper.git

SFS

First, we need to create a directory called results inside the cloned folder:
mkdir results

To create a site frequency spectrum, we need to tell ANGSD a few things:

where the data is located (individual bam files)
where the reference sequence is located (and the ancestral sequence)
the inbreeding coefficients for each sample

Let's tackle these one at a time.
1) bam files

To do this, we need to create a file called ${TAXON}_samples.txt in the data folder. ${TAXON} is a variable in ANGSD-wrapper that can be set to whatever you want; it is meant to identify the type of sample you have. For example, for Oryza glumaepatula samples, it can be og, so the file would be data/og_samples.txt.
In this file, place the path of each bam file you want to include in the analysis, one per line. Make sure these are absolute paths, not relative paths.
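As a minimal sketch of this step, the list can be generated rather than typed by hand. The directory and file names below are hypothetical: `BAM_DIR` is a temporary directory filled with empty placeholder files so the sketch runs anywhere, and it should be replaced with the directory that actually holds your bam files.

```shell
TAXON=og                      # example value from the text above
BAM_DIR=$(mktemp -d)          # hypothetical: replace with your bam directory
# Demo only: empty placeholder files standing in for real bam files.
touch "$BAM_DIR/sample1.bam" "$BAM_DIR/sample2.bam"

mkdir -p data
# Resolve BAM_DIR to an absolute path so every listed path is absolute.
ABS_DIR=$(cd "$BAM_DIR" && pwd -P)
# Write one bam path per line.
for bam in "$ABS_DIR"/*.bam; do
    printf '%s\n' "$bam"
done > "data/${TAXON}_samples.txt"
```

Running this from the top of the cloned folder overwrites data/og_samples.txt, so remove the placeholder lines before using it on real data.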
2) reference sequence & ancestral sequence

To tell ANGSD where the reference sequence is, we need to edit the configuration file. We can modify the example one: scripts/sfs_example.conf.
In this file, we need to add values for the ANC_SEQ and REF_SEQ variables. Each should be set to the absolute path of the corresponding file. In this case, the ancestral species is Oryza meridionalis and the reference is Oryza sativa indica.
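For illustration, the two entries might look something like the fragment below. This assumes the configuration file uses shell-style `NAME=value` assignments, and both paths are hypothetical placeholders rather than the real file locations.

```shell
# Fragment of scripts/sfs_example.conf (assumed shell-style assignments).
# Both paths are hypothetical; replace them with the real absolute paths.
ANC_SEQ=/absolute/path/to/oryza_meridionalis.fa     # ancestral sequence
REF_SEQ=/absolute/path/to/oryza_sativa_indica.fa    # reference sequence
```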
3) inbreeding coefficients

Normally, these values would need to be estimated from the data, but in this case we can use 1 for each one. To tell ANGSD what these values are for each sample, we need to create a file data/${TAXON}_F.txt. In this file, we put the inbreeding coefficient for each sample, one value per line.
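Since every sample gets the same value here, one possible way to build this file is to write a `1` for each line of the sample list. The sketch below assumes `TAXON` is `og` and that data/og_samples.txt already exists; if it does not, a two-line list with hypothetical paths is created purely so the sketch runs.

```shell
TAXON=og                                  # example value from the text above
SAMPLES="data/${TAXON}_samples.txt"
mkdir -p data
# Demo only: create a hypothetical sample list if none exists yet.
[ -s "$SAMPLES" ] || printf '%s\n' /path/to/sample1.bam /path/to/sample2.bam > "$SAMPLES"

# Replace every line of the sample list with the value 1,
# giving one inbreeding coefficient per sample.
sed 's/.*/1/' "$SAMPLES" > "data/${TAXON}_F.txt"
```

Deriving the file from the sample list keeps the two files the same length.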
4) Other stuff

The default region set in common.conf is chromosome 1. This can be changed by modifying the REGIONS variable, using ANGSD's region syntax.
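As a rough illustration, ANGSD's own `-r` option accepts regions such as `chr:` (a whole chromosome) or `chr:start-stop` (a range). The fragment below assumes the wrapper passes REGIONS through to ANGSD unchanged and that common.conf uses shell-style assignments; the chromosome name `1` is illustrative and has to match the names used in your bam files.

```shell
# Fragment of common.conf (assumed shell-style assignments).
REGIONS="1:"               # all of chromosome 1
# REGIONS="1:1-1000000"    # alternative: only positions 1 to 1,000,000
```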
Now that we have all this data for ANGSD, we can submit our job to farm.
sbatch -p class scripts/ANGSD_SFS.sh scripts/sfs_example.conf
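A quick sanity check of the input files can catch common mistakes before the job reaches the queue. The following is a sketch, not part of ANGSD-wrapper: the `check_inputs` function is hypothetical, and it assumes `TAXON` is `og` and the file names described above.

```shell
TAXON=og
SAMPLES="data/${TAXON}_samples.txt"
INBREEDING="data/${TAXON}_F.txt"

# Hypothetical helper: returns 0 if the two input files look consistent.
check_inputs() {
    samples=$1
    inbreeding=$2
    status=0
    [ -f "$samples" ] || { echo "missing file: $samples" >&2; return 1; }
    [ -f "$inbreeding" ] || { echo "missing file: $inbreeding" >&2; return 1; }

    # There should be one inbreeding coefficient per bam file.
    n_samples=$(awk 'END { print NR }' "$samples")
    n_f=$(awk 'END { print NR }' "$inbreeding")
    if [ "$n_samples" -ne "$n_f" ]; then
        echo "$n_samples bam files but $n_f inbreeding coefficients" >&2
        status=1
    fi

    # Every listed bam path should be absolute and point to an existing file.
    while IFS= read -r bam || [ -n "$bam" ]; do
        case $bam in
            /*) [ -f "$bam" ] || { echo "not found: $bam" >&2; status=1; } ;;
            *)  echo "not an absolute path: $bam" >&2; status=1 ;;
        esac
    done < "$samples"
    return "$status"
}

if check_inputs "$SAMPLES" "$INBREEDING"; then
    echo "inputs look consistent"
else
    echo "fix the problems above before submitting" >&2
fi
```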

Theta

To run the theta script, you will need to edit the theta_example.conf file (or create your own) to point to the data above. The theta script will look for the output of the SFS script in the results directory and use the same data files we created above. Once you have adjusted the values in the configuration file, you can submit your job to farm using similar syntax to the SFS script.