August 27, 2017, Justin C. Bagley, Richmond, VA
In this Gist, I provide brief examples of how to set DNA substitution models in the program Seq-Gen (Rambaut and Grassly 1997). The software is available for download from Andrew Rambaut's website, and its (infrequent) development can be tracked in the Seq-Gen GitHub repository.
Here is an HKY + G example for sequences 9077 sites long (-l), using an alpha shape parameter of 0.5 (-a) for gamma-distributed rate heterogeneity, 4 discrete gamma categories (-g), empirical (fixed) base frequencies (-f), and a Ts:Tv ratio of 1.5 (-t):
seqgencommand = -mHKY -l9077 -a0.5 -g4 -f0.314,0.198,0.218,0.270 -t1.5
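To make -a and -g a bit more concrete, here is a minimal Python sketch of how a gamma distribution with a given alpha and number of categories can be turned into per-category relative rates. It uses the "median" discretization (each category is represented by the median of its equal-probability slice of a mean-1 gamma, and the rates are rescaled to average 1). Seq-Gen's own discretization may differ (for example, it may use category means), so treat the numbers as illustrative. The function names are hypothetical, and only the Python standard library is used.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion; converges quickly for x < a + 1.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15 and n < 1000:
            n += 1
            term *= x / (a + n)
            total += term
        return total * math.exp(log_prefactor)
    # Continued fraction for the upper function Q(a, x) (modified Lentz method).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def gamma_quantile(p, alpha):
    """Quantile of a gamma with shape alpha and rate alpha (mean 1), by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, mid) < p:
            lo = mid
        else:
            hi = mid
    # P(alpha, x) is the CDF of a rate-1 gamma; divide by alpha for rate alpha.
    return 0.5 * (lo + hi) / alpha


def discrete_gamma_rates(alpha, ncat):
    """Median-based discrete gamma category rates, rescaled to have mean 1."""
    medians = [gamma_quantile((2 * k + 1) / (2 * ncat), alpha) for k in range(ncat)]
    total = sum(medians)
    return [m * ncat / total for m in medians]


rates = discrete_gamma_rates(alpha=0.5, ncat=4)  # mirrors -a0.5 -g4
print([round(r, 3) for r in rates])
```

With a small alpha such as 0.5, most categories get rates well below 1 and the last category carries a rate well above 1, which is what "strong rate heterogeneity" means in practice.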
This HKY + I + G example is the same as the HKY + G example above, except that a proportion of invariable sites of 0.5 is added with the -i flag:
seqgencommand = -mHKY -l9077 -a0.5 -g4 -i0.5 -f0.314,0.198,0.218,0.270 -t1.5
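The +I part can be pictured as a two-class mixture across sites. The Python sketch below draws one relative rate per site: with probability pinv (the value given to -i) a site is invariable and gets rate 0; otherwise its rate comes from a mean-1 gamma with shape alpha (the value given to -a). Two simplifications are assumptions of this sketch, not a description of Seq-Gen's internals: it samples from a continuous gamma instead of using the -g discrete categories, and it divides the variable-site rates by (1 - pinv) so that the average rate over all sites stays near 1. The function name is hypothetical.

```python
import random


def sample_site_rates(n_sites, alpha, pinv, seed=None):
    """Draw one relative rate per site under a +I+G mixture (continuous gamma)."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sites):
        if rng.random() < pinv:
            # Invariable site: it never changes along the tree.
            rates.append(0.0)
        else:
            # Gamma(shape=alpha, scale=1/alpha) has mean 1; rescale by 1/(1 - pinv)
            # so the mean over all sites (including invariable ones) is about 1.
            rates.append(rng.gammavariate(alpha, 1.0 / alpha) / (1.0 - pinv))
    return rates


rates = sample_site_rates(9077, alpha=0.5, pinv=0.5, seed=1)
frac_invariable = sum(r == 0.0 for r in rates) / len(rates)
mean_rate = sum(rates) / len(rates)
print(f"invariable fraction: {frac_invariable:.3f}, mean rate: {mean_rate:.3f}")
```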
To get K80 + G (K80 is also known as the Kimura 2-parameter model, or "K2P"), start from HKY and set all base frequencies equal to one another with -fe:
seqgencommand = -mHKY -l3263 -a0.5 -g4 -fe -t1.5
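If you generate many of these option strings from a script, a small helper keeps the flags in a consistent order and catches base frequencies that do not sum to 1. The function below is hypothetical (it is not part of Seq-Gen); it simply reproduces the option strings used in this Gist, emitting -fe when no frequencies are given (the K80 and SYM cases). It assumes the frequencies are passed in A, C, G, T order and formats them to three decimals to match the examples.

```python
def seqgen_options(model, length, alpha, ncat, freqs=None, tstv=None, pinv=None):
    """Assemble a Seq-Gen option string in the flag order used in this Gist."""
    opts = [f"-m{model}", f"-l{length}", f"-a{alpha:g}", f"-g{ncat}"]
    if pinv is not None:
        opts.append(f"-i{pinv:g}")
    if freqs is None:
        # Equal base frequencies: turns HKY into K80 and GTR into SYM.
        opts.append("-fe")
    else:
        if len(freqs) != 4 or abs(sum(freqs) - 1.0) > 1e-6:
            raise ValueError("need four base frequencies that sum to 1")
        opts.append("-f" + ",".join(f"{f:.3f}" for f in freqs))
    if tstv is not None:
        opts.append(f"-t{tstv:g}")
    return " ".join(opts)


print(seqgen_options("HKY", 9077, 0.5, 4, freqs=[0.314, 0.198, 0.218, 0.270], tstv=1.5))
# -mHKY -l9077 -a0.5 -g4 -f0.314,0.198,0.218,0.270 -t1.5
print(seqgen_options("HKY", 3263, 0.5, 4, pinv=0.5, tstv=1.5))
# -mHKY -l3263 -a0.5 -g4 -i0.5 -fe -t1.5
```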
Important note: if the -t flag is not used to provide a Ts:Tv rate ratio in the K80 commands, then Ts:Tv will be set to 1 and the model will reduce to the Jukes-Cantor (1969) model (JC69), which is probably not what you want.
seqgencommand = -mHKY -l3263 -a0.5 -g4 -i0.5 -fe -t1.5
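The reduction described in the important note can be checked directly on the substitution rate matrix. The Python sketch below builds an unnormalized HKY85 rate matrix parameterized by kappa, the transition/transversion rate ratio; how Seq-Gen converts the value passed to -t into this rate ratio is not reproduced here, and kappa = 3 is just an illustrative value. With equal base frequencies only two distinct off-diagonal rates remain (K80), and with kappa = 1 all off-diagonal rates are equal (JC69). The function names are hypothetical.

```python
def hky_rate_matrix(freqs, kappa):
    """Unnormalized HKY85 rate matrix; freqs in A, C, G, T order.

    q[i][j] = kappa * pi_j for transitions (A<->G, C<->T), pi_j for
    transversions, and each diagonal entry makes its row sum to zero.
    """
    bases = "ACGT"
    purines = set("AG")
    q = [[0.0] * 4 for _ in range(4)]
    for i, bi in enumerate(bases):
        for j, bj in enumerate(bases):
            if i == j:
                continue
            is_transition = (bi in purines) == (bj in purines)
            q[i][j] = (kappa if is_transition else 1.0) * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums off-diagonals
    return q


def off_diagonal_values(q):
    """Distinct off-diagonal rates, rounded to hide floating-point noise."""
    return sorted({round(q[i][j], 12) for i in range(4) for j in range(4) if i != j})


equal = [0.25] * 4
k80 = hky_rate_matrix(equal, kappa=3.0)
jc69 = hky_rate_matrix(equal, kappa=1.0)
print(off_diagonal_values(k80))   # [0.25, 0.75] -> transversion and transition rates
print(off_diagonal_values(jc69))  # [0.25] -> a single rate, i.e. JC69
```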
(Use HKY + G model as a substitute.)
(Use HKY + I + G model as a substitute.)
Here is a GTR + G example, with gamma-distributed rate heterogeneity, the Ts:Tv ratio, and other parameters set similarly to the HKY examples above:
seqgencommand = -mGTR -l499 -a0.5 -g4 -f0.314,0.198,0.218,0.270 -t1.5
seqgencommand = -mGTR -l499 -a0.5 -g4 -i0.5 -f0.314,0.198,0.218,0.270 -t1.5
Get the SYM model by starting from GTR and then setting all base frequencies equal with the -fe flag:
seqgencommand = -mGTR -l2237 -a0.5 -g4 -fe -t1.5
seqgencommand = -mGTR -l499 -a0.5 -g4 -i0.5 -fe -t1.5
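The same logic applies to GTR and SYM. In the Python sketch below, the GTR rate from base i to base j is an exchangeability s_ij (symmetric in i and j) times the target frequency pi_j. With unequal frequencies the matrix is not symmetric but still satisfies detailed balance (time reversibility); setting all frequencies equal, as in the SYM commands, makes the matrix itself symmetric. The six exchangeability values and their AC, AG, AT, CG, CT, GT order are assumptions for illustration only, not Seq-Gen settings, and the function names are hypothetical.

```python
def gtr_rate_matrix(freqs, exch):
    """Unnormalized GTR rate matrix.

    freqs: base frequencies in A, C, G, T order.
    exch:  six exchangeabilities in the order AC, AG, AT, CG, CT, GT.
    """
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    s = [[0.0] * 4 for _ in range(4)]
    for (i, j), rate in zip(pairs, exch):
        s[i][j] = s[j][i] = rate  # exchangeabilities are symmetric
    q = [[s[i][j] * freqs[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def is_symmetric(q, tol=1e-12):
    return all(abs(q[i][j] - q[j][i]) < tol for i in range(4) for j in range(4))


def is_reversible(q, freqs, tol=1e-12):
    """Detailed balance: pi_i * q_ij == pi_j * q_ji for all i, j."""
    return all(abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) < tol
               for i in range(4) for j in range(4))


emp = [0.314, 0.198, 0.218, 0.270]
exch = [1.0, 4.0, 0.5, 1.2, 4.5, 1.0]  # illustrative values only
gtr = gtr_rate_matrix(emp, exch)
sym = gtr_rate_matrix([0.25] * 4, exch)
print(is_symmetric(gtr), is_reversible(gtr, emp))  # False True
print(is_symmetric(sym))                           # True
```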
- Rambaut, A. and Grassly, N. C. (1997) Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences 13, 235-238.