Skip to content

Instantly share code, notes, and snippets.

@dereneaton
Last active August 29, 2015 13:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dereneaton/9567786f0150354dd0c3 to your computer and use it in GitHub Desktop.
Save dereneaton/9567786f0150354dd0c3 to your computer and use it in GitHub Desktop.
** PERMALINKED **
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"PyRAD manuscript supplementary notebook"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Scripts to fully reproduce all analyses "
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"D.A.R. Eaton (Nov. 12, 2013)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------- \n",
"\n",
"This IPython notebook contains a combination of Python, bash and R scripts that describe the full analysis used in the \n",
"pyRAD manuscript. If you wish to execute the scripts yourself you can copy and paste commands from the input cells, \n",
"or you can download this notebook as a .pynb and re-execute all scripts in an IPython notebook of your own. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are unfamiliar with IPython notebooks (you are reading one now) you can read more about them [here](http://ipython.org/notebook.html), however I'll \n",
"explain briefly: The default language in any coding cell is _Python_, however other languages can be executed within cells \n",
"as well. An easy way to do this is to designate the language at the beginning of the cell using the __%%__ character. Cells \n",
"which begin with %%bash are executed like a shell script, and those with \"%%R\" execute _R_ scripts, including the ability \n",
"to embed high quality graphics in the output below cells, which is demonstrated near the end of this notebook. Below I use \n",
"the __ls -l__ unix command to check what is in my current working directory. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------------------------------- \n",
"\n",
"If you plan to execute scripts in this notebook you will need the following Python modules: \n",
"\n",
"* Numpy \n",
"* Scipy \n",
"* Egglib \n",
"\n",
"In addition you will need PyRAD v.1.64 and Stacks 1.1 or higher (the MYSQL installation is not required). You will also need \n",
"two directories containing the necessary scripts and input files, which are located in a zipped directory in the supplemental \n",
"materials. Unzip this archive and move the contents into your current working directory.\n",
"\n",
"-------------------------------- \n",
"\n",
"You will need to change the location of pyRAD and Stacks in all of the following scripts to reflect its location on your machine.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!ls -l"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"total 400\r\n",
"-rw-rw-r-- 1 deren deren 43695 Dec 2 18:31 PyRAD_STACKS.ipynb\r\n",
"drwxrwxr-x 2 deren deren 4096 Dec 2 18:21 REALDATA\r\n",
"drwxrwxr-x 2 deren deren 4096 Dec 2 18:22 SIMbig\r\n",
"-rw-rw-rw- 1 deren deren 7687 Nov 20 14:14 simradsMANU.py\r\n",
"drwxrwxr-x 2 deren deren 4096 Dec 2 18:23 SIMsmall\r\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------------------------- \n",
"At this point you see there is a .pynb file, which is this Ipython notebook, and also three directories and a python script for \n",
"simulating RADseq data under a coalescent model. This will also create a barcodes output files for each data set. The SIMsmall \n",
"directory contains the params input files for three _PyRAD_ analyses."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!ls -l SIMsmall/"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"total 24\r\n",
"-rw-rw-r-- 1 deren deren 4369 Dec 2 18:23 params_high.txt\r\n",
"-rw-rw-r-- 1 deren deren 4370 Dec 2 18:23 params_low.txt\r\n",
"-rw-rw-r-- 1 deren deren 4365 Dec 2 18:24 params_no.txt\r\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Simulating RAD sequences"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The script simradsMANU.py (included with supplemental materials) uses the _egglib_ module to simulate sequences under a coalescent \n",
"model and then trims them to emulate RADseq-like data. To begin, I simulate three different data sets which I use to compare the \n",
"performance of _PyRAD_ and _Stacks_ on data containing indels. \n",
"\n",
"The first argument to the script is the rate of indel deletions, and the second is whether mutations should arise in the restriction \n",
"recognition site (locus dropout), which I do not use here (set to 0). The third argument is the number of loci to simulate and the \n",
"fourth is the number of individuals to sample per species. The final argument is the name of the output file. Data are simulated on \n",
"a species tree as described in the manuscript text. All other parameters used in the simulation are kept constant, because I am \n",
"primarily interested in examining the effect of indel variation."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"## you must have the egglib Python module installed to do this step. \n",
"python simradsMANU.py 0.00 0 10000 1 SIMsmall/simRADs_no.fastq\n",
"python simradsMANU.py 0.02 0 10000 1 SIMsmall/simRADs_low.fastq\n",
"python simradsMANU.py 0.05 0 10000 1 SIMsmall/simRADs_high.fastq"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tsimulating 10000 RAD loci across 12 tip taxa with 1 samples per taxon\n",
"\tindels arise at frequency of 0.000000\n",
"\tmutations in restriction site = False\n",
"\ttheta=4Nu= 0.0014\n",
"\tcreating new barcode map\n",
". . . . . . . . . .\n",
"\tsimulating 10000 RAD loci across 12 tip taxa with 1 samples per taxon\n",
"\tindels arise at frequency of 0.020000\n",
"\tmutations in restriction site = False\n",
"\ttheta=4Nu= 0.0014\n",
"\tcreating new barcode map\n",
". . . . . . . . . .\n",
"\tsimulating 10000 RAD loci across 12 tip taxa with 1 samples per taxon\n",
"\tindels arise at frequency of 0.050000\n",
"\tmutations in restriction site = False\n",
"\ttheta=4Nu= 0.0014\n",
"\tcreating new barcode map\n",
". . . . . . . . . .\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------- \n",
"\n",
"The raw simulated data are in _fastQ_ format as shown below. With uniformly high quality scores."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n12 SIMsmall/simRADs_no.fastq"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"@lane1_fakedata_R1_0 1:N:0:\r\n",
"ATGGATTGCAGTGATATATTCCCATTTCTGGTTCTGGAGATACCCTCGTCCGGGAATCTAACTATCTGTGTCTTTACGGCTGAGGTTGTCCCATTGTATTTCCAGTCTTAG\r\n",
"+\r\n",
"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\r\n",
"@lane1_fakedata_R1_1 1:N:0:\r\n",
"ATGGATTGCAGTGATATATTCCCATTTCTGGTTCTGGAGATACCCTCGTCCGGGAATCTAACTATCTGTGTCTTTACGGCTGAGGTTGTCCCATTGTATTTCCAGTCTTAG\r\n",
"+\r\n",
"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\r\n",
"@lane1_fakedata_R1_2 1:N:0:\r\n",
"ATGGATTGCAGTGATATATTCCCATTTCTGGTTCTGGAGATACCCTCGTCCGGGAATCTAACTATCTGTGTCTTTACGGCTGAGGTTGTCCCATTGTATTTCCAGTCTTAG\r\n",
"+\r\n",
"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\r\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Small simulated data analysis "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"PyRAD analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I run the _PyRAD_ analysis. To begin, I create a different working directory for each analysis, then execute _PyRAD_ on the three \n",
"different params files which point to one of the three working directories, respectively. The following cells will each take some time \n",
"to run. The __time__ command is used to measure how long. And each is run in a separate cell. The params files designate to use parallel \n",
"processing on 12 processors."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"## create working directories for each PyRAD analysis\n",
"mkdir SIMsmall/Py_no\n",
"mkdir SIMsmall/Py_low\n",
"mkdir SIMsmall/Py_high"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"No Indels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p SIMsmall/params_no.txt 2> SIMsmall/log.pyno"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\r\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\r\n",
"http://drive5.com/usearch\r\n",
"\r\n",
"Licensed to: daeaton@uchicago.edu\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tingroup ['1A0', '1B0', '1C0', '1D0', '2E0', '2F0', '2G0', '2H0', '3I0', '3J0', '3K0', '3L0']\r\n",
"\taddon []\r\n",
"\texclude []\r\n",
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\tfinal stats written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/SIMsmall/Py_no/stats/c85d6m2pN.stats\r\n",
"\toutput files written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/SIMsmall/Py_no/outfiles/ directory\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 1.92 s, sys: 0.34 s, total: 2.26 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 1128.01 s\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Low Indels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p SIMsmall/params_low.txt 2> SIMsmall/log.pylow"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\r\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\r\n",
"http://drive5.com/usearch\r\n",
"\r\n",
"Licensed to: daeaton@uchicago.edu\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tingroup ['1A0', '1B0', '1C0', '1D0', '2E0', '2F0', '2G0', '2H0', '3I0', '3J0', '3K0', '3L0']\r\n",
"\taddon []\r\n",
"\texclude []\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\tfinal stats written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/SIMsmall/Py_low/stats/c85d6m2pN.stats\r\n",
"\toutput files written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/SIMsmall/Py_low/outfiles/ directory\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 1.81 s, sys: 0.30 s, total: 2.10 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 1112.46 s\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"High Indels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p SIMsmall/params_high.txt 2> SIMsmall/log.pyhigh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\r\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\r\n",
"http://drive5.com/usearch\r\n",
"\r\n",
"Licensed to: daeaton@uchicago.edu\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tingroup ['1A0', '1B0', '1C0', '1D0', '2E0', '2F0', '2G0', '2H0', '3I0', '3J0', '3K0', '3L0']\r\n",
"\taddon []\r\n",
"\texclude []\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\tfinal stats written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/SIMsmall/Py_high/stats/c85d6m2pN.stats\r\n",
"\toutput files written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/SIMsmall/Py_high/outfiles/ directory\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 1.73 s, sys: 0.33 s, total: 2.06 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 1133.35 s\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"## The following script removes unneeded files leftover from the analysis to save memory space.\n",
"rm -r SIMsmall/Py_no/fastq/ SIMsmall/Py_no/edits/ \n",
"rm -r SIMsmall/Py_low/fastq/ SIMsmall/Py_low/edits/ \n",
"rm -r SIMsmall/Py_high/fastq/ SIMsmall/Py_high/edits/"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Stacks Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I run the _Stacks_ analysis. To begin, I create three output directories, one for each analysis."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"## create working directories for each stacks analysis\n",
"mkdir SIMsmall/stackf_no\n",
"mkdir SIMsmall/stackf_low\n",
"mkdir SIMsmall/stackf_high"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I then use process\\_tags to sort reads by their barcodes and then the denovo\\_map.pl pipeline to run the three _Stacks_ analyses. \n",
"Parameter setting used are analagous (as close as possible) to those used in the _PyRAD_ analysis. Because the barcodes were randomly \n",
"generated from the simulation and they must be entered by hand in the stacks script I use the commands below to generate a _Stacks_ input \n",
"script as a .sh file. I show the file and then execute it. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"## create a barcode file for stacks analyses\n",
"cut -f 2 SIMsmall/simRADs_no.fastq.barcodes > SIMsmall/simRADs_no.stacks.barcodes\n",
"cut -f 2 SIMsmall/simRADs_low.fastq.barcodes > SIMsmall/simRADs_low.stacks.barcodes\n",
"cut -f 2 SIMsmall/simRADs_high.fastq.barcodes > SIMsmall/simRADs_high.stacks.barcodes"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- assemble denovo RADseq data sets using Stacks for each data set. \n",
"- mindepth = m = 2\n",
"- clust.within = M = 13 differences\n",
"- secondary clust = N = M+2\n",
"- across sample clust = n = 15\n",
"- These settings are (close to being) analagous to the .85 clust similarity used in PyRAD given that read length are 100 bp. "
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"No Indels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"process_radtags -f SIMsmall/simRADs_no.fastq -o SIMsmall/stackf_no/ \\\n",
" -b SIMsmall/simRADs_no.stacks.barcodes -e pstI -E phred33"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Using Phred+33 encoding for quality scores.\n",
"Found 1 input file(s).\n",
"Searching for single-end, inlined barcodes.\n",
"Loaded 12 barcodes (6bp).\n",
"Processing file 1 of 1 [simRADs_no.fastq]\n",
" 2600000 total reads; -200000 ambiguous barcodes; -0 ambiguous RAD-Tags; +0 recovered; -0 low quality reads; 2400000 retained reads.\n",
"Closing files, flushing buffers...\n",
"Outputing details to log: 'SIMsmall/stackf_no/process_radtags.log'\n",
"\n",
"2600000 total sequences;\n",
" 200000 ambiguous barcode drops;\n",
" 0 low quality read drops;\n",
" 0 ambiguous RAD-Tag drops;\n",
"2400000 retained reads.\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function below creates the shell script (\".sh\") to run a stacks analysis."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import glob \n",
"def makestackslist(dset):\n",
" infiles = [\"-s \"+ff+\" \" for ff in glob.glob(\"SIMsmall/stackf_\"+dset+\"/*.fq\")]\n",
" denovo = \"denovo_map.pl -m 2 -M 13 -n 15 -T 12 -b 1 -B 1 -S -X \"+\\\n",
" \"'populations:-m 6' -D 'SIMsmall_no' -o SIMsmall/stackf_\"+dset+\" \"+\"\".join(infiles)\n",
" with open(\"SIMsmall/stacks_\"+dset+\".sh\",'w') as outfile:\n",
" outfile.write(denovo)\n",
" "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"makestackslist('no')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, the stacks_no.sh script below will run stacks on the data set with no indels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat SIMsmall/stacks_no.sh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"denovo_map.pl -m 2 -M 13 -n 15 -T 12 -b 1 -B 1 -S -X 'populations:-m 6' -D 'SIMsmall_no' -o SIMsmall/stackf_no -s SIMsmall/stackf_no/sample_AAGAGG.fq -s SIMsmall/stackf_no/sample_GAGAGA.fq -s SIMsmall/stackf_no/sample_GGTTTA.fq -s SIMsmall/stackf_no/sample_ATAGTA.fq -s SIMsmall/stackf_no/sample_TAAATG.fq -s SIMsmall/stackf_no/sample_TGAAAT.fq -s SIMsmall/stackf_no/sample_GAGAAT.fq -s SIMsmall/stackf_no/sample_GAGTAT.fq -s SIMsmall/stackf_no/sample_GATAGA.fq -s SIMsmall/stackf_no/sample_ATGGAT.fq -s SIMsmall/stackf_no/sample_TGTAGA.fq -s SIMsmall/stackf_no/sample_ATGGTT.fq "
]
}
],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! sh SIMsmall/stacks_no.sh 2> SIMsmall/log.stackno"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 1 of 12 [sample_AAGAGG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 2 of 12 [sample_GAGAGA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 3 of 12 [sample_GGTTTA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 4 of 12 [sample_ATAGTA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 5 of 12 [sample_TAAATG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 6 of 12 [sample_TGAAAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 7 of 12 [sample_GAGAAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 8 of 12 [sample_GAGTAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 9 of 12 [sample_GATAGA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 10 of 12 [sample_ATGGAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 11 of 12 [sample_TGTAGA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 12 of 12 [sample_ATGGTT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 7.00 s, sys: 0.99 s, total: 7.99 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 2490.63 s\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Low Indels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"process_radtags -f SIMsmall/simRADs_low.fastq -o SIMsmall/stackf_low/ \\\n",
" -b SIMsmall/simRADs_low.stacks.barcodes -e pstI -E phred33"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Using Phred+33 encoding for quality scores.\n",
"Found 1 input file(s).\n",
"Searching for single-end, inlined barcodes.\n",
"Loaded 12 barcodes (6bp).\n",
"Processing file 1 of 1 [simRADs_low.fastq]\n",
" 2600000 total reads; -200000 ambiguous barcodes; -0 ambiguous RAD-Tags; +0 recovered; -0 low quality reads; 2400000 retained reads.\n",
"Closing files, flushing buffers...\n",
"Outputing details to log: 'SIMsmall/stackf_low/process_radtags.log'\n",
"\n",
"2600000 total sequences;\n",
" 200000 ambiguous barcode drops;\n",
" 0 low quality read drops;\n",
" 0 ambiguous RAD-Tag drops;\n",
"2400000 retained reads.\n"
]
}
],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"makestackslist(\"low\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat SIMsmall/stacks_low.sh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"denovo_map.pl -m 2 -M 13 -n 15 -T 12 -b 1 -B 1 -S -X 'populations:-m 6' -D 'SIMsmall_no' -o SIMsmall/stackf_low -s SIMsmall/stackf_low/sample_ATGTGG.fq -s SIMsmall/stackf_low/sample_AAAAAG.fq -s SIMsmall/stackf_low/sample_GTTGGA.fq -s SIMsmall/stackf_low/sample_GTATGT.fq -s SIMsmall/stackf_low/sample_GGTAGG.fq -s SIMsmall/stackf_low/sample_TAGTAA.fq -s SIMsmall/stackf_low/sample_GGTAGT.fq -s SIMsmall/stackf_low/sample_GGTATG.fq -s SIMsmall/stackf_low/sample_GGGTTG.fq -s SIMsmall/stackf_low/sample_GGGGAG.fq -s SIMsmall/stackf_low/sample_AAGATT.fq -s SIMsmall/stackf_low/sample_TTTGTG.fq "
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! sh SIMsmall/stacks_low.sh 2> SIMsmall/log.stacklow"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 1 of 12 [sample_ATGTGG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 2 of 12 [sample_AAAAAG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 3 of 12 [sample_GTTGGA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 4 of 12 [sample_GTATGT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 5 of 12 [sample_GGTAGG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 6 of 12 [sample_TAGTAA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 7 of 12 [sample_GGTAGT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 8 of 12 [sample_GGTATG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 9 of 12 [sample_GGGTTG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 10 of 12 [sample_GGGGAG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 11 of 12 [sample_AAGATT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 12 of 12 [sample_TTTGTG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 7.28 s, sys: 1.13 s, total: 8.42 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 2485.87 s\n"
]
}
],
"prompt_number": 20
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"High Indels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"process_radtags -f SIMsmall/simRADs_high.fastq -o SIMsmall/stackf_high/ \\\n",
" -b SIMsmall/simRADs_high.stacks.barcodes -e pstI -E phred33"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Using Phred+33 encoding for quality scores.\n",
"Found 1 input file(s).\n",
"Searching for single-end, inlined barcodes.\n",
"Loaded 12 barcodes (6bp).\n",
"Processing file 1 of 1 [simRADs_high.fastq]\n",
" 2600000 total reads; -200000 ambiguous barcodes; -0 ambiguous RAD-Tags; +0 recovered; -0 low quality reads; 2400000 retained reads.\n",
"Closing files, flushing buffers...\n",
"Outputing details to log: 'SIMsmall/stackf_high/process_radtags.log'\n",
"\n",
"2600000 total sequences;\n",
" 200000 ambiguous barcode drops;\n",
" 0 low quality read drops;\n",
" 0 ambiguous RAD-Tag drops;\n",
"2400000 retained reads.\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"makestackslist(\"high\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat SIMsmall/stacks_high.sh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"denovo_map.pl -m 2 -M 13 -n 15 -T 12 -b 1 -B 1 -S -X 'populations:-m 6' -D 'SIMsmall_no' -o SIMsmall/stackf_high -s SIMsmall/stackf_high/sample_TAGATT.fq -s SIMsmall/stackf_high/sample_GGATAT.fq -s SIMsmall/stackf_high/sample_AGAATG.fq -s SIMsmall/stackf_high/sample_TTAGGG.fq -s SIMsmall/stackf_high/sample_ATATGT.fq -s SIMsmall/stackf_high/sample_TTAAAG.fq -s SIMsmall/stackf_high/sample_TTGAAG.fq -s SIMsmall/stackf_high/sample_AGATAT.fq -s SIMsmall/stackf_high/sample_TTAAAA.fq -s SIMsmall/stackf_high/sample_GAAATG.fq -s SIMsmall/stackf_high/sample_ATAGGA.fq -s SIMsmall/stackf_high/sample_GGAAAT.fq "
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! sh SIMsmall/stacks_high.sh 2> SIMsmall/log.stackhigh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 1 of 12 [sample_TAGATT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 2 of 12 [sample_GGATAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 3 of 12 [sample_AGAATG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 4 of 12 [sample_TTAGGG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 5 of 12 [sample_ATATGT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 6 of 12 [sample_TTAAAG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 7 of 12 [sample_TTGAAG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 8 of 12 [sample_AGATAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 9 of 12 [sample_TTAAAA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 10 of 12 [sample_GAAATG]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 11 of 12 [sample_ATAGGA]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 12 of 12 [sample_GGAAAT]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 7.86 s, sys: 1.15 s, total: 9.00 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 2777.69 s\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Simulation results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I create a list measuring taxon coverage for each _Stacks_ analysis, and similarly for each _PyRAD_ analysis below. This measures the \n",
"total number of loci that have data for at least N taxa for each value of N. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"## Stacks results ##\n",
"lines = open(\"SIMsmall/stackf_high/batch_1.haplotypes.tsv\").readlines()\n",
"cnts = [int(field.strip().split(\"\\t\")[1]) for field in lines[1:]]\n",
"shigh = [cnts.count(i) for i in range(1,13)]\n",
"SHIGH = [int(sum(shigh)-sum(shigh[:i-1])) for i in range(1,13)]\n",
"\n",
"lines = open(\"SIMsmall/stackf_low/batch_1.haplotypes.tsv\").readlines()\n",
"cnts = [int(field.strip().split(\"\\t\")[1]) for field in lines[1:]]\n",
"slow = [cnts.count(i) for i in range(1,13)]\n",
"SLOW = [int(sum(slow)-sum(slow[:i-1])) for i in range(1,13)]\n",
"\n",
"lines = open(\"SIMsmall/stackf_no/batch_1.haplotypes.tsv\").readlines()\n",
"cnts = [int(field.strip().split(\"\\t\")[1]) for field in lines[1:]]\n",
"sno = [cnts.count(i) for i in range(1,13)]\n",
"SNO = [int(sum(sno)-sum(sno[:i-1])) for i in range(1,13)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"## PyRAD results ##\n",
"infile = open(\"SIMsmall/Py_no/stats/c85d6m2pN.stats\").readlines()\n",
"pno = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(26,38)]\n",
"PNO = [sum(pno)-sum(pno[:i-1]) for i in range(1,13)]\n",
"\n",
"infile = open(\"SIMsmall/Py_low/stats/c85d6m2pN.stats\").readlines()\n",
"plow = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(26,38)]\n",
"PLOW = [sum(plow)-sum(plow[:i-1]) for i in range(1,13)]\n",
"\n",
"infile = open(\"SIMsmall/Py_high/stats/c85d6m2pN.stats\").readlines()\n",
"phigh = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(26,38)]\n",
"PHIGH = [sum(phigh)-sum(phigh[:i-1]) for i in range(1,13)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"## convert PyRAD lists to R objects ##\n",
"%load_ext rmagic \n",
"%Rpush PNO PLOW PHIGH SNO SLOW SHIGH"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"RES = cbind(PNO,PLOW,PHIGH,SNO,SLOW,SHIGH)\n",
"print(RES)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "display_data",
"text": [
" PNO PLOW PHIGH SNO SLOW SHIGH\n",
" [1,] 10000 9997 10049 10000 12085 16041\n",
" [2,] 10000 9997 10033 10000 10918 12683\n",
" [3,] 10000 9997 10021 10000 10707 11986\n",
" [4,] 10000 9996 10010 10000 10466 11100\n",
" [5,] 10000 9995 9988 10000 10062 10162\n",
" [6,] 10000 9995 9982 10000 9991 9929\n",
" [7,] 10000 9993 9974 10000 9966 9781\n",
" [8,] 10000 9993 9968 10000 9882 9481\n",
" [9,] 10000 9992 9945 10000 9523 8651\n",
"[10,] 10000 9991 9934 10000 9301 8000\n",
"[11,] 10000 9991 9923 10000 9110 7432\n",
"[12,] 10000 9991 9911 9996 8301 5755\n"
]
}
],
"prompt_number": 28
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Plotting the results"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"makeplot <- function() {\n",
" par(mfcol=c(1,3),\n",
" cex.axis=1.1,\n",
" cex.main=1.1,\n",
" cex.lab=1.1,\n",
" mar=c(4,4,3,0), \n",
" oma=c(1,2,0,2))\n",
"\n",
" plot(RES[,1], ylim=c(0,18000),xlim=c(1,12),\n",
" xlab=\"Minimum taxon coverage\",\n",
" ylab=\"Number of loci recovered\",\n",
" main=\"No indels\")\n",
"\n",
" points(RES[,1], pch=20, col='black', cex=2.5)\n",
" lines(RES[,1], lwd=3)\n",
" points(RES[,4], pch=20, col='darkgrey', cex=1.35)\n",
" lines(RES[,4], lwd=2.5, col='darkgrey')\n",
"\n",
" plot(RES[,2], ylim=c(0,18000),xlim=c(1,12),\n",
" xlab=\"Minimum taxon coverage\",\n",
" ylab=\"\",#loci with at least N taxa\",\n",
" main=\"Low indels\")\n",
"\n",
" points(RES[,2], pch=20, col='black', cex=2.5)\n",
" lines(RES[,2], lwd=3)\n",
" points(RES[,5], pch=20, col='darkgrey', cex=1.35)\n",
" lines(RES[,5], lwd=2.5, col='darkgrey')\n",
"\n",
" plot(RES[,3], ylim=c(0,18000),xlim=c(1,12),\n",
" xlab=\"Minimum taxon coverage\",\n",
" ylab=\"\",#loci with at least N taxa\",\n",
" main=\"High indels\")\n",
"\n",
" points(RES[,3], pch=20, col='black', cex=2.5)\n",
" lines(RES[,3], lwd=3)\n",
" points(RES[,6], pch=20, col='darkgrey', cex=1.35)\n",
" lines(RES[,6], lwd=2.5, col='darkgrey')\n",
"}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"pdf(\"SIMsmall/FIG_1.pdf\",\n",
" width=6.5, height=2.8,\n",
" family=\"Times\")\n",
"makeplot()\n",
"dev.off()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%R makeplot()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAHgCAMAAABKCk6nAAADAFBMVEUAAAABAQECAgIDAwMEBAQF\nBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcY\nGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKior\nKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs8PDw9PT0+\nPj4/Pz9AQEBBQUFCQkJDQ0NERERFRUVGRkZHR0dISEhJSUlKSkpLS0tMTExNTU1OTk5PT09QUFBR\nUVFSUlJTU1NUVFRVVVVWVlZXV1dYWFhZWVlaWlpbW1tcXFxdXV1eXl5fX19gYGBhYWFiYmJjY2Nk\nZGRlZWVmZmZnZ2doaGhpaWlqampra2tsbGxtbW1ubm5vb29wcHBxcXFycnJzc3N0dHR1dXV2dnZ3\nd3d4eHh5eXl6enp7e3t8fHx9fX1+fn5/f3+AgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmK\nioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5ucnJyd\nnZ2enp6fn5+goKChoaGioqKjo6OkpKSlpaWmpqanp6eoqKipqamqqqqrq6usrKytra2urq6vr6+w\nsLCxsbGysrKzs7O0tLS1tbW2tra3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDBwcHCwsLD\nw8PExMTFxcXGxsbHx8fIyMjJycnKysrLy8vMzMzNzc3Ozs7Pz8/Q0NDR0dHS0tLT09PU1NTV1dXW\n1tbX19fY2NjZ2dna2trb29vc3Nzd3d3e3t7f39/g4ODh4eHi4uLj4+Pk5OTl5eXm5ubn5+fo6Ojp\n6enq6urr6+vs7Ozt7e3u7u7v7+/w8PDx8fHy8vLz8/P09PT19fX29vb39/f4+Pj5+fn6+vr7+/v8\n/Pz9/f3+/v7////isF19AAAgAElEQVR4nO2dCXwU5d3Hx9paW7yQ12rVWl/eak+NRy3lBlkgAbkE\nRDkMKocQEOUIICGGQ6lXaRorjaBiAeOzm4ScEEyQcIjIKRBASEO4ISCEOwlJ9nmfmdmw2Zll88zO\nf47s/r+fD+TZJ0+e/DLf3dlnZ555RqBISCNYHQAxFhQc4qDgEAcFhzgoOMRBwSEOCg5xUHCIg4JD\nHBQc4hgreJ6wkU5o7XmQFFFXPae9osI0stV/r8Wx5t1J6chOnt/p+wUkktGCHV7B5Qfrqm0l2OJY\nsmBPCM+vhoxksOCm966c0PpsryaPbBNTvvvbJ251lbW5vU37Ay1vH12bFLHzjzd2OWdoAgWSYCnP\nQx8Q4WjvSZbHqnsFn+54R/euSfe3vG8ZbCSDBd85v8X41jN+U9S/lRT7ls39HeMf3hzRfmifQ/ct\nSYqY9n97hq03NIECSbCU57Uhk27OumeF5bHmCT/96Y+Y4Pjfbnysa9KNG7r3ho1ktODqBx5s/exw\n+kVTKXYr+m7ryEl0UvsWd/z2roSkiOLuN7Taa2gCBZJgKU/ug0+OGXHDJctjzWu2Z89AJrjvaPpG\n16SH2X+wkYwWTL8QWic8sMvzCm7NYo+L2BLRfuDg8ilbkiLmJh68Z46hCRRkC8XFxfFings33PP1\nTR2p5bHEXXQMEzzxD1ue6Mreaz2CwSIZLrj2kdZne/w8YtvV2Ef/euuf2xc/3mx4TVJE4QM3/PmA\noQkUZAuMI1Keto5LP3qTWh6rTvDRJ5p2VAgGiYSfg23CV/9yv/CcAf2iYJtQ8uivH9lpQL8oOMRB\nwSEOCg5xUHCIg4JDHBQc4qDgEAcFhzhBC17XwmEenSbxRKppY2IkR5srPJmmPGlipBar1QGCFrwk\nOdifDIJL3XhaVXUxOkd9ul3iadXTzNPdH3+mrkPBwYKCAUHBXKBgSFAwICiYCxQMCQoGBAVzgYIh\nQcGAoGAuUDAkKBgQFMwFCoYEBQOCgrlAwZCgYEBQMBcoGBIUDAgK5gIFQ4KCAUHBXKBgSFAwICiY\nCxQMCQoGBAVzgYIhQcGAoGAuUDAkKBgQFMwFCoYEBQOCgrlAwZCgYEBQMBcoGBIUDAgK5gIFQ4KC\nAUHBXKBgSFAwICiYCxQMCQoGBAVzgYIhQcGAoGAuUDAkKBgQFMwFCoYEBQOCgrlAwZCgYEBQMBco\nGBIUDAgK5gIFQ4KCAUHBXKBgSFAwICiYCxQMCbfgg4UbuW7Qoh8UDAmv4EuEkO+MDiODgiHhFVzO\nBG8wOowMCoaEV7B7PSGnjQ4jg4Ih4R9kZbjcRoeRQcGQ8AsuIFxN9YOCIeEXvIGUGR1GBgVDwi94\nJ9lvdBgZFAwJv+D9ZKfRYWRQMCT8gk+Sb4wOI4OCIeEXfIkUGB1GRpfgUleys7TeYxsIVkaygWBl\nJEmw25VpTiQ9guOj4pLjeiR4K6wXrIpkvWBVJPlkwzJSbUokPYLbSf/X287WC1ZFsl6wKpIseDU5\na0okPYIHxmdtzZoZ7a2wXrAqkvWCVZFkwZvJUVMi6RFc5YodNtFV5a2wXrAqkvWCVZFkwd+TvaZE\nwkEWJPyDLHqYbDUlEg6yINEwyDpD1poSCQdZkGgYZF0hy02JhIMsSDQMsujSVFMi4SALEg2DLJpP\nKsyIhIMsSDQMsuh6csqMSDjIgkTDIItuJwfMiAQ2yPrHiBEj2k6CCcVFw4OsNSzSsAfNS8QxyLrI\nIo24T5qO9V+yy4xIYIOsYyUlJUOeg4rFQcODrHMs0s7bTYzU8CCrlkUqueOIWDxBvjUjEuggK2Yw\nSCY++AZZF5qZFojyDrLukgRfIF+ZEUmP4P3UXZBYUG92oPWCVZGsF6yK5BFc68w2I5IewRH0jafn\nj3jFW2G9YFUk6wWrInkE0xxnrQmR9AmOZE/Mjt4KOwhWRLKDYEWkOsGryHkTIukR3OzNTqtoTmdv\nhfWCVZGsF6yKVCd4IzluQiQ9gg+tXriKJhzzVlgvWBXJesGqSHWCd5NiEyKBTrqzXrAK6wWr8Qg+\naMoFhnoEV8h4K6wXrIpkvWBVpDrBP5CvTYikR3CU0LETw1thvWBVJOsFqyLVCa4kX5oQSdcu+jHF\nxEDrBasiWS9YFalOME1LNzoP1Sn4tOISSBsIVkaygWBlpKuCV5AqVWNwcJAFiaZBFl1nxkXgKBgS\nbYK3kUNGhpFBwZBoE7yP7DEyjAwKhkSb4GNkk5FhZFAwJNoEnyOFRoaRQcGQaBNcQ3KNDCODgiHR\nJphmOY1fagcFQ6JR8Epy0cAwMigYEo2CzVhqBwVDolFwESkxMIwMCoZEo+BSssPAMDIoGBKNgs1Y\nagcFQ6JR8GWSb2AYGRQMiUbBNDXDuCweUDAkWgUvI4av+46CIdEqeA0pNy6MDAqGRKvgLeRIoIYQ\noGBItAo2YakdFAyJVsFHyBbjwsigYEi0Ci4na4wLI4OCIdEquNr4pXZQMCRaBdOMVKNPGKJgSDQL\nzieXDQsjg4Ih0SzY+KV2UDAkmgXvIKVGZfGAgiHRLLiEFBkWRgYFQ6JZsPFL7aBgSDQLvkhWGhZG\nBgVDolmw25VlWBgZFAyJZsE0h9QYFUYGBUOiXXAhMfh+7ygYEu2CN5Fj124IAQqGRLtgw5faQcGQ\naBd8iGwzKowMCoZEu+AiYvDVDSgYEu0fkwghq4xKI4GCIdH+CmaClxmVRgIFQ6Jd8JE8Z5qhZwxR\nMCTaBYv3blhnTBgZFAxJMIJrcg2dO4uCIQlGMC0jmQZe34CCIQlKMN1g5ORZFAxJcIKrMpzGTdxB\nwZAEJ5iWkuWG3b4BBUMSpGBaaNxNslAwJMEKvpjmMuqsYTgIPrDd+MVsZIIVTPcYdpesMBB8kBBT\nbmBDdQh2f0n2g6eRCAPB25ngw0ankQlaMD3tTN1uyP2Ew0DwKSa40ug0MsELpnnEmL10GAimF5aR\nH4xOI6ND8FfsaWjEAa1wEExLzFiYWUSH4CJCDFlyJywEX0lNN3hyqgcdgumxTEMuRAsLwXSDObdT\n1yWYHiCrQcPIhIfgEwbPi6lDl2D3MiNewuEhmOaYsDIz1SmYlhrxEg4TwTsNv0xTQp9gQ17CYSL4\nojPH+NXz9Qo25CUcJoLpKhMWV9ct2IiXcLgILiUbjAzjQadgI17C4SK4Oj3V8IVd9Qs24CUcLoLp\nRhPuj6BbMN0P/hIOG8GnDF8sgQIIdudCv4TDRjBdZsJJYd2C4V/C3IKXygTszNaC95DtxoXxoF9w\nbQ7wS5hb8Lx5kU+92WNAwM5sLfiyM8vwj8L6BYO/hDXsojuzf46AndlaMF1j9GoJIILZS/gEUBoJ\nDYLb5J7NbRewM3sLPkTWGxbGA4BguouQ9YC7Gg2C9w/rNLI0YGf2Flyb4TJ65g6E4C2EQL4NaxBc\nm51yIPBTy96C2abbZ1QYDxCCi5hgwEnSGgSPmNArPSZgZzYXfIbkGnzSEELwlY0uchYmjogGwa1o\nNI0M2JnNBVd+Qchpo+JIQAimdB/kcXMNgrsl907vGbAzmws+zHZ+BYa+DcMIrslwwu1pNAg+Oyd6\nZuAXgM0Fn2OCSdo2A5fEgBHMBtJwFwxrENyzwcG7zQXTsi3F21OJcz3gW5wvQIKr0lLBrnLQIHjk\nqM9droCd2V2wSGXRUkJWkFUXDIgEJZh+B3fnaA2C54sE7KwxCGaKd6azXbURU1TBBF92pUOdvQ6n\nz8FXOS++GW824FovKMF0E9mjO4xMOH0O9rKH5GYQ11bwOR5ggi84M4GuxdD1ObjUlewsrffYBoKV\nka55hX/1nnSydA/w0hh+BSsj8Qim68l/YSLp+RwcHxWXHNcjwVthvWBVpABLOFRudZKs3ALIczf+\nBKsicQkuJ0DzfDUILlZ+DpZPLdXbztYLVkUKuEZH+Wrx0AdgJn+CVZG4BNPV5CBIJA2Cu/Rc7Dvp\nZWB81tasmdHeCusFqyI1sAgLgb043J9gVSQ+wWVkBUgkLXOyTn3S55n6j6tcscMmuqq8FdYLVkVq\nQPARJjgHbq6WP8GqSHyC6UqY+QkaBJ9e/KzjLZ+axjzIqsO9iWSegcoEN8gSn3sg00A1CO6UeMi3\nopEPsuooImnHgTLBDbKouGoHxHl/LQc68j7I8/lY0egHWR6Kna5DDbfiAXCQJV7Hkg0wztIgeOjw\nj4e/VL8iBAZZMgddTphPnbyDrCufxMTLt9O5sH7TtQZ5J9n4QP+uRYNgB1XMqvQ3yPom9jX5tvT7\n48cslA7GlL8z6u/Skf0rH8UkyM/djHFTtkqFDbGvyXcsKI2P+VRqffadUe9LravnxyTIq1tljpss\nnz37NvbVXKlwQGzNPcg6/+Hri6VjVjVLXv+wXKpdEf9uqVTYOnu22PeJNLJkwVqp5vIX7+ZI+yn3\nsndT5G2/ce4CeWd5NOVz+Q84n7Zwt1SoWbHAs177+o+XV/MOsg50Ehji57MM9rW1eAPSM4MFoY+0\nYOVXPTsPlxZFK2GC86QTX5d31K3zVVG/Hy40CI5atG9RQ0ey/iXcfZ8wiRVXC82aX9+zmtLDwk3N\nfy6wDVTZ9SfNmwqb2fdGXnf/ncIiVpgn/PI+YQIrrBVb92Ctj0itT7LWkWJrcXGcl6/79V3Cf1gh\nWWw9Xmp9QzOhxznOQdYBcWO2Y1v+clexJF6hNEIsLJcCMOaxQv4itjVjmdgjLYQbhS5MbHVf9q22\n4jWnU8VGG1khRSyIuTeLhSmscKoNK/RiuWvE1q3KOAdZfxd/Xvh9dn6uVPgLe2r3l1IynalS1Qn2\nehgifoZzfnuR/eGC8JT4FNsXKQj9xEiV8W2FwUfFvi4tei9f7rV8m9+1ojQILosdMMbnsI96kDVA\n6Mm200MzEhM79yZkTsSYxMSn27GYLZ9NTBzx2NuEdItMTJz2UBwh0Q/NTXzvoeGETH4oITGxSy/W\n+pGYxMS+bVnrVqz1yEdZ6+5dxdbTCBnKWr8vtp7YPG7u3LYdCJlw3b85B1n9pC3WJzm5t1R4PDl5\nnFQQnM6P5cICp1OYJW7NL4hzUte3yYQfj8jPHys0H/CY0CU/P1FqE5Gfv1RunZ6f/7BUmJufHykV\nRoutRZ7nHGS9JCj46W3y1z85HP8jFbomfTpIaPKH2xz/Ic6MX4g1A9kztL1Y6M56eFEstL3I3lzE\nZ9hgN9vfjGeFEeJOcEtnQZjBnnO06p+9+i6s1SB4+ng6JK5+hc/wYVb//v1/01JIICZx/6gGB1n5\nLFKfJsqN6Z9IQv79xnxP358mzpk+YTwrRF3P99N1NDjIusAi9f/ZVLn1DRw9/tjxEVkSlzCK/RlN\nbpar7vvDI3Kh++tvPy4V3i4peVcq/I3tL6TCZPa7nhYL4zQIbkvFEw718Bk+XDhz5syw3sIEQpKH\nT5w6dfgctn1iXp06NWYqK0waM3XquLGsMGv01KkT2CuR/GP4lKmThyf5bR2raP0FSaxrveC16bPf\nifuAfe/2GQ0OsipZpMPNOkp/8S3Nm8vb52fNm8svEuF/m98jF25uyl5Af2x3o/DzXz3R+Q3W9+Kr\nT6PF77zS83d/nTHuVi6/7RscZLlZpDO/WC+1nkbpXvnn8rJS5MILkyffLRXuefDuH3s6vXnIEhbk\nlet4AvyoadOb5JKjl0MuJGp4D844ltm1foWfQVbXW4aOvLPlZUon/qTv2N+Jb3gpwpNj2whsKLVT\neHhsz+tnsXFUi3tHDWnSl7WPvCWatWYbJVZqzUayhLVuK+SwD6fCQ2N7XT+T0nMt7nl5SJM+rHW3\nm6NH3iW2nvyzoVNaCDs4B1nvSH/nV5QWSIX3KD0kFQax1h3EQgfWfLBUxcY2790w6q3+wrKqi1kt\n2GZ9J8Ujekw/9nyR9pDtKijtIbVmw6xpUoFQmiYVYnmPZG1iW3+WuD+Vfu4bVvhcLMTUFcT34H9I\nhddKtiY+LCaYn5kuv1kI/0x8Uy50cLRs2PkUfsHHJ3R/xfcTnHqQVRbNhgHiR44qthuKWiNWikOE\nxWLhSzbImSn+VUXs3XCEePToJGvdV/y4UPU6ay3NsfiItRZHVDSftU4QW+/qIwjDpdZDWWtx6nrV\nNLE175GsWtGC9Fd+ygpx4gi5kG3foeJ4el9PQegpdlk+QBBaiEeOamNZI2neSmKzDvfFXLl4eGcq\n27yfiYcNd7A3vNbfscIRcRS8hBUqxSfGdLH1DFZ4rkL7kaxTuV/KR0q3ThknL2WY21seRdewP1cY\nID4xJrN3i3ksRE5Sq5dfuP0jViON0nqxt97xksRlP5RMkQq9nc5RUuHeyTG9ZMHv65jR0WiOZF29\nUOBq4VLd6fSz5XXfqtvqlQeq5cKVnZ55W5vZxl0hTmSt3rfX870DOz3zQU5t8/R57rsy4CNZ9OBX\nnjWkdyzMqLj0fb60L5E/VuaPfH6+GKVCHK8tEMOKw/goFrK6s6h1BStII8AhemZ0hMqRrAapqFhH\n0o9yNAQ9kqXi0h4mOE1xWcu5Is9zbvWCAunld+XTMdP3ioXzEzt0mF2lZ0ZHyBzJ4qCIODlmSQGe\nLvTLVmY49XtNMwH0zOgIgdOF/BxKI+sbnCYFebrQLxculaSRFeUNN7xKOF3ZoI/yHLLiYAOrqYFN\nugvAxa+Iq4j/RaznyoYKGW+F9YJVkQBvq3O5gO0gA19T4k+wKpJOwdS9z0UK9hZzTgnVc2VDlNCx\nE8NbYb1gVSTI+ybViqPYgLtHf4JVkfQKpvTMcsJ943BdVzY8Vu3bwnrBqkigN8ZaKxr+OsCVTX53\n0cpI+gXTmmwWhO/Ekq51sk4rdto2EKyMBCq4csf2vVmEFF5zJOJXsDISgGDxVkGBV7S6SvgshAZE\nbQlTvGr9Wr/DLTMGWRLuA5mcc/K4BWfSRr4QGhjV32eIu2rljlfENMHinLw8rqE0t+APadLfk95M\nCthZeAhmisVBTt4B9XUvJgqmK/luNKLnfLCKcBFMj7G3QEIydykvUDRT8EmSw3NllZ7zwSrCRjCt\nPEvPbHAR59pv1tSfrmemYLqGa1UoLXOylOeDVYSPYLnvbeKV5PVvm2eq4HJnhr9xgAINgtXng5WE\nmWD2ZryG+NzZ1FTB9Bue+4brmfiuIuwE04vEZ91BcwVfdKU1fOGcnonvKsJPMK3dW/+usOYKplvI\ntgbb6Jn4riIMBYv37vZOBTBZcGWaq8EV03RNfFcSjoLZR6ZlV484mCyY7iTfNtREy8T3aQNeD7ze\nQVgKpoXe+7mYLbg609nQom7cgl0yATsLT8Hlzsy6jytmC6b7yLoGWmi4Z4NEwM7CU3C9jyumC67N\nIaWBD0nj2ST9sI8rnmOWpgsWlw9fFXCqGLfgXbThWYVhKphuq5vJY75g8QYAASeKcQtuM+ORWYyA\nvy1cBVctdcqzlc0XvLehWUTcgs+tfWEtI+BvC1fBdI9nrGO+4NrduYHHWXioEoKabPmApfmC2Uel\n3IAnhvFQJQil8oJ5Vgimp5zpAX4tHqoEwf2ldFLJEsFsjFd47W/ioUoYjhOytsoiwbXLA9wbGQ9V\nwlDGRrO7LRJMz7jSrnmHAjzQAQN7BZMiqwTTHWTltY5noWAY3N+SwkrLBNd+ec35WWF1Wx3DsUow\nLXelXmPN3LC7rY6hWCaY7iYF/l9/YXhbHQOxTrC7gKwo9veNsLytjmFYJ5iWEvKFv37D87Y6RmGh\n4P1sGO9vFm043V7WeCwUXMkEF/rZwYbT7WWNx0LBtOaH5f5ewrgICyRWCmZ/a7pTfc2wBsEX/z3l\n08D3+0PBlgqmR8lS1TxpLbvouLy4vgF/AQq2VjDdTlYoJ2hpueuK59+1QcEWC3avJpsUVdyCt217\nbvbKWS8G7B8FWyyYVmaR/b413IInywTsHgVbLZj+4Er1HQhr2EUvHzpo0KCAvaNgywXTYpLtc02p\nBsEtdx0/Hvg+PijYesH0W7L068vehxoED97dUN8o2AaCzxFCNnsfahD8zKO+Sy6qQcE2EHyeCa53\nUanGWZWBQcE2EEy3EXLS+0iD4Genp6SkBOwaBdtBMN1SfyY8nvCHxB6CjwS5i94kErBnFGwLwVec\nWd4HGgRPmza5Tb+APaNgWwimBcQ7A0/btFk3nvAPiE0E7yDe6VkaBGdnZy9sE7BjFGwPwWXk66tl\nDYLnzp37QeBboaNgewiuTU2/OnmHW3C0TMCOUbA9BNNCcvWMA7fgYsbS348N2C8KtongPeTqgioa\ndtGnx0QVBe4XBdtE8BnvFcPcgqv/9ZeMhi5OQsE2EezOSK2busMt+E/3/42NsuYG7BcF20Qw/ZrU\nXcnNLThFJmC3KNgugv9LdnhKeH0wJLYRfIHke0ooGBLbCKY5Ts+d71AwJPYRvJF4fgkKhsQ+gg/W\nLZ+JgiGxj+AKskwuoGBI7COY5hE5CwqGxEaCt5FS6SsKhsRGgo+SDdJXFAyJjQRXuzKlrygYEhsJ\npiuJtEg5CobEToJ3yovfoWBI7CT4JJGW50fBkNhJcG1amrg+PwqGxE6C6RrpzsYoGBJbCf5eWlUJ\nBUNiK8HlZBVFwbDYSjDNcFWjYFjsJXg9OY6CYbGX4A2EHEXBoNhKsJsQ9i6MgiGxlWAqrj+LgkGx\nl+BjhatPo2BQ7CVYAgVDgoIBQcFcoGBIQk1wqSvZWVrvsQ0EKyPZQLAyUuMRHB8VlxzXI8FbYb1g\nVSTrBasiNR7B7aT/621n6wWrIlkvWBWp8QgeGJ+1NWtmtLfCesGqSNYLVkVqPIKrXLHDJrqqvBXW\nC1ZFsl6wKlLjEYyDLCU4yAIEB1lc4CALEhxkAYKDLC7ABlmTHQ7HvVFguRqm4UFWDovUoYmJkRoe\nZJ1nkRw3HjYxEg6yIMFBFiA4yOICB1mQ4CALEBxkcYFHsiAJsSNZ+6m7ILGg3gKW1gtWRbJesCpS\n4xEcQd94ev6IV7wV1gtWRbJesCpSYxIcyZ6YHb0VdhCsiGQHwYpIjUdwszc7raI5nb0V1gtWRbJe\nsCpS4xF8aPXCVTThmLfCesGqSNYLVkVqPIJVWC9YhfWC1aBgLlAwFygYEhQMCArmAgVDgoIBQcFc\noGBIUDAgKJgLFAwJCgYEBXOBgiFBwYCgYC5QMCQoGBAUzAUKhgQFA4KCuUDBkKBgQFAwFygYEhQM\nCArmAgVDgoIBQcFcoGBIUDAgKJgLFAwJCgYEBXOBgiFBwYCgYC5QMCQoGBAUzAUKhgQFA4KCuUDB\nkKBgQFAwFygYEhQMCArmAgVDgoIBQcFcoGBIUDAgKJgLFAwJCgYEBXOBgiFBwYCgYC5QMCQoGBAU\nzAUKhgQFA4KCuUDBkKBgQFAwFygYEhQMCArmAgVDgoIBQcFcoGBIUDAgKJgLFAwJCgYEBXOBgiFB\nwYCgYC5QMCQoGBAUzAUKhgQFA4KCuUDBkKBgQFAwFygYEhQMCArmAgVDgoIBQcFcoGBIUDAgKJgL\nFAwJCgYEBXOBgiFBwYCgYC5QMCQoGBAUzAUKhgQFA4KCuUDBkKBgQFAwFygYEhQMCArmAgVDgoIB\nQcFcoGBIUDAgKJgLFAwJCgYEBXOBgiEJNcGlrmRnab3HNhCsjGQDwcpIjUdwfFRcclyPBG+F9YJV\nkawXrIrUeAS3k/73bOfJDofjgRdhQnHhV7BPpBwWqeMvzUvkX7BPpPMskuO2MhMj6RE8MD5ra9bM\naG/FkmSISJz4FayKVNXFrDwi/gSrItGe58zKQ/UJrnLFDpvoqvJWWC9YFcl6wapIjUewCusFq7Be\nsBoUzAUK5gIFQ4KCAUHBXKBgSFAwICiYCxQMCQoGBAVzgYIhQcGAoGAuUDAkKBgQFMwFqOBd7R1e\nOtxxu5Imqppbb+FodPNtypqmdzscnabxRKp5pl4kx6/AAtys7on1/+QVnkyJnepFevRWjgA3NVXW\n3KYOoP6xWx5xONquVgcIWrAPRweqqnqcV9Zkv6tq1EFVM2OlsqbsmeAyqfte+AlHo3HfKWt2jeb4\nMS7ez1TWXOyuajT4sLJmzXSOAIlp/n8nClaAgv2BgrlAwSjYpoKv7FdVFdcoa86eUDXaq6o5pnpe\nVJcEl0nd9w8/cDQ6dFlZU3GQ48e4OFGurKktVjXaX6WsuXCUI0CZqm8ZGMGIbUHBIQ4KDnFQcIiD\ngkMcEMHnn27dT3ngbk20oqJ2wl9f9q25FPX4WLdPTVQFrRrUNt63xl/vQUVSZQouUrCZgoykczOB\nCP4skU4kiqBtohWNCl6ig0p9ahbOpj131Xt8+nGhgrriaVSpT42f3oOKpM4UTKTgMwUZSedmAhG8\n4Rido+h70aRoRaMpn9GLFT41ronVnet/gHZXd6ygsVk0zuVT46f3oCKpMwUTKfhMQUbSuZlg3oPd\ni1v7HqC43L0oWtFm2PPtR/nuQiru/3Vv331Ppwo6bCtNTvapUfceVCQ/mYKLFHSm4CLp3EwggmsG\nDz3rWzM7b0+0otGrb9eOW+RTE/dZ7dgsnxqWcyJ7ajp9atS9BxXJT6bgIgWbKchIOjcTiOBFqtO1\ng7q2vvtj36rF82jCQp+a0dk03vfwMMvpmkl7lPrUqHsPKpKfTMFFCjZTkJF0biYQwSMfiIhwKupU\nr+DKZ1v0rfSpOdD2L319329Yzqoh3RJ8a/z1HlQkVabgIgWbKchIOjcTfg4OcVBwiIOCQxwUHOKg\n4BAHBYc4KDjEQcEhDgoOcVBwiIOCQxwUHOLoFpzSrJrSQYM+lCftf6iauy9Rslbvr8FIQaJf8D2r\n6ZWHBwVulD9H76/RAkaqh37Bo8bTVaMHJS2dP6pvVDX78mznAUNaLp+bTadt8pQpfe6xtU92fq72\nnf8c7X68m+PFWqktpce7tY0vEx93Ok27FD/V6uXa+SN7HRNbHukc2Sf/rFiDkfRF0i/4n5Hu1/Kk\n6HTcOvZlOn10966hcnS5LD43t66jTx250rnf7mmL6PA8qS2lEzLo+FjxceLiUz3nJNMJ2fNHuaWW\nr+S42+VLNZTzoa0AAAEWSURBVBhJXyT9gpPGF3UvFaMvorNWsS8ptBM9PohFn7zJUxajl7z06sOH\naXI3OmAf/fgDqS2l3Y9R+fHBZz9ZFN07ulfK/M+p1LLrSTo+X6rBSPoiAQguHDxJip5SP/oHC91t\n6kV/a1yhu8uhS12e3/D6IjoiT2pL6StZdEiM+Jg6nj4/M52mH5jvolLL0cvYc1OqwUj6IgEIrm62\nTh39UJvIft7oex9bENknOmFiXlnHI5GdX6r1RD/SpdW0E+Jj+tYz9HSPyBg3i75abHmwYw/HWqkG\nI+mLZNPPwUt3097qy3ItpZFGsqng3e36xVqdQUEjjWRTwQgUKDjEQcEhDgoOcVBwiIOCQxwUHOKg\n4BAHBYc4/w/j4tk5AxZrNwAAAABJRU5ErkJggg==\n"
}
],
"prompt_number": 31
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Statistics"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import glob\n",
"RES = []\n",
"for run in [\"no\", \"low\", \"high\"]:\n",
" handle = \"SIMsmall/stackf_\"+run+\"/sample_*.tags.tsv\"\n",
" c = [open(ff,'r').read().count(\"consensus\") for ff in glob.glob(handle)]\n",
" RES.append(['stacks',run,float(sum(c))/len(c)])\n",
" \n",
" handle = \"SIMsmall/Py_\"+run+\"/stats/s5.consens.txt\"\n",
" c = map(float,[line.split(\"\\t\")[3] for line in open(handle,'r').readlines()[1:-7]]) \n",
" RES.append(['pyrad',run,float(sum(c))/len(c)])\n",
"\n",
"for res in RES:\n",
" print 'mean N loci per sample from %s indel %s run \\t%f' % (res[1],res[0],res[2])\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"mean N loci per sample from no indel stacks run \t10000.000000\n",
"mean N loci per sample from no indel pyrad run \t10000.000000\n",
"mean N loci per sample from low indel stacks run \t10027.083333\n",
"mean N loci per sample from low indel pyrad run \t10000.500000\n",
"mean N loci per sample from high indel stacks run \t10087.083333\n",
"mean N loci per sample from high indel pyrad run \t10002.000000\n"
]
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"infile = open(\"SIMsmall/stackf_high/batch_1.haplotypes.tsv\").readlines()\n",
"tcov = [line.split(\"\\t\")[1] for line in infile[1:]]\n",
"print mean(map(int,tcov)), 'mean coverage stacks highindel'\n",
"\n",
"infile = open(\"SIMsmall/stackf_low/batch_1.haplotypes.tsv\").readlines()\n",
"tcov = [line.split(\"\\t\")[1] for line in infile[1:]]\n",
"print mean(map(int,tcov)), 'mean coverage stacks lowindel'\n",
"\n",
"infile = open(\"SIMsmall/stackf_no/batch_1.haplotypes.tsv\").readlines()\n",
"tcov = [line.split(\"\\t\")[1] for line in infile[1:]]\n",
"print mean(map(int,tcov)), 'mean coverage stacks noindel'\n",
"\n",
"infile = open(\"SIMsmall/Py_high/stats/c85d6m2pN.stats\").readlines()[26:38]\n",
"tcov = []\n",
"for line in infile:\n",
" a = line.split(\"\\t\")\n",
" tcov += [int(a[0])]*int(a[1]) \n",
"print mean(tcov), 'mean coverage pyrad highindel'\n",
"\n",
"infile = open(\"SIMsmall/Py_low/stats/c85d6m2pN.stats\").readlines()[26:38]\n",
"tcov = []\n",
"for line in infile:\n",
" a = line.split(\"\\t\")\n",
" tcov += [int(a[0])]*int(a[1]) \n",
"print mean(tcov), 'mean coverage pyrad lowhindel'\n",
"\n",
"infile = open(\"SIMsmall/Py_no/stats/c85d6m2pN.stats\").readlines()[26:38]\n",
"tcov = []\n",
"for line in infile:\n",
" a = line.split(\"\\t\")\n",
" tcov += [int(a[0])]*int(a[1]) \n",
"print mean(tcov), 'mean coverage pyrad noindel'\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7.54323296553 mean coverage stacks highindel\n",
"9.95548200248 mean coverage stacks lowindel\n",
"11.9996 mean coverage stacks noindel\n",
"11.9154144691 mean coverage pyrad highindel\n",
"11.9963989197 mean coverage pyrad lowhindel\n",
"12.0 mean coverage pyrad noindel\n"
]
}
],
"prompt_number": 33
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Empirical Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An empirical analysis was conducted on the published data set from Eaton & Ree (2013), available de-multiplexed in fastQ format at the \n",
"ncbi sequence read archive (SRA072507). This step takes many hours using 12 processors, as described in the paper. The data are already \n",
"sorted, so PyRAD is started from step 2. After trimming the barcode and adapter read lengths are 69bp. At .85 similarity this is equivalent \n",
"to 10 base differences. The params.txt file is available in the supplementary materials. Because _Stacks_ and _PyRAD_ use different quality \n",
"filtering methods I employed filtering by _PyRAD_ only, and the data are converted back to fastQ format to be read into _Stacks_. This way \n",
"quality filtering does not affect the results of comparison between the two programs. "
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Initial filtering "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p REALDATA/paramsPAR.txt -s 2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\r\n",
" ------------------------------------------------------------------\r\n",
" pyRAD : RAD/GBS for phylogenetics & introgression analyses\r\n",
" ------------------------------------------------------------------\r\n",
"\r\n",
"\tsorted .fastq from /home/deren/Dropbox/raw_cyatho/*.fq being used\r\n",
"\tstep 2: editing raw reads \r\n",
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 0.37 s, sys: 0.04 s, total: 0.40 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 157.25 s\n"
]
}
],
"prompt_number": 34
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To remove the effect of different error filtering of the two programs the filtered data from PyRAD were used for both analyses. Because \n",
"PyRAD converts the data from fastq to fasta at this step, discarding read quality scores, the data were converted back to fastq with \n",
"arbitrarily high quality scores (b) added to for each base for use in Stacks. \n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! mkdir REALDATA/edits4stacks"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for ff in glob.glob(\"REALDATA/edits/*.edit\"):\n",
" dat = open(ff).readlines()\n",
" outfile = open(\"REALDATA/edits4stacks/\"+ff.split(\"/\")[-1].replace(\".edit\",\".fq\"),'w')\n",
" writing = []\n",
" for line in dat:\n",
" if \">\" in line:\n",
" x,y = line.split(\"_\")[2:4]\n",
" nn = \"@GRC13_0027_FC:4:1:\"+x+\":\"+y+\"#0/1\"\n",
" writing.append(nn.strip())\n",
" else:\n",
" seq = \"TGCAG\"+line.strip()\n",
" writing.append(seq)\n",
" writing.append(\"+\")\n",
" writing.append(\"b\"*len(seq)+\"\\n\")\n",
" outfile.write(\"\\n\".join(writing))\n",
" writing = []\n",
" outfile.close()\n",
" "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 36
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"PyRAD analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyRAD analysis is done with and without paralog filtering"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p REALDATA/paramsNOPAR.txt -s 34567 2> REALDATA/log.py.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\r\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\r\n",
"http://drive5.com/usearch\r\n",
"\r\n",
"Licensed to: daeaton@uchicago.edu\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tingroup ['FieldMuseum_Ree_29154', 'FieldMuseum_Ree_30556', 'FieldMuseum_Ree_30686', 'FieldMuseum_Ree_32082', 'FieldMuseum_Ree_33405', 'FieldMuseum_Ree_33413', 'FieldMuseum_Ree_33588', 'FieldMuseum_Ree_35855', 'FieldMuseum_Ree_38362', 'FieldMuseum_Ree_39618', 'FieldMuseum_Ree_40578', 'FieldMuseum_Ree_41478', 'FieldMuseum_Ree_41954']\r\n",
"\taddon []\r\n",
"\texclude []\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\tfinal stats written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/REALDATA/stats/c85d6m2pN.stats\r\n",
"\toutput files written to:\r\n",
"\t /home/deren/Dropbox/PYRADMANU/REALDATA/outfiles/ directory\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 208.31 s, sys: 32.33 s, total: 240.64 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 127436.99 s\n"
]
}
],
"prompt_number": 37
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"mv REALDATA/stats REALDATA/stats.nopar\n",
"mkdir REALDATA/stats\n",
"rm REALDATA/clust.85/*.consens\n",
"\n",
"~/pyrad_v.1.62/pyRAD -p REALDATA/paramsPAR.txt -s 4567 2> REALDATA/log.py2.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\n",
"http://drive5.com/usearch\n",
"\n",
"Licensed to: daeaton@uchicago.edu\n",
"\n",
"\tingroup ['FieldMuseum_Ree_29154', 'FieldMuseum_Ree_30556', 'FieldMuseum_Ree_30686', 'FieldMuseum_Ree_32082', 'FieldMuseum_Ree_33405', 'FieldMuseum_Ree_33413', 'FieldMuseum_Ree_33588', 'FieldMuseum_Ree_35855', 'FieldMuseum_Ree_38362', 'FieldMuseum_Ree_39618', 'FieldMuseum_Ree_40578', 'FieldMuseum_Ree_41478', 'FieldMuseum_Ree_41954']\n",
"\taddon []\n",
"\texclude []\n",
"\t............\n",
"\tfinal stats written to:\n",
"\t /home/deren/Dropbox/PYRADMANU/REALDATA/stats/c85d6m2p3H3N3.stats\n",
"\toutput files written to:\n",
"\t /home/deren/Dropbox/PYRADMANU/REALDATA/outfiles/ directory\n",
"\n"
]
}
],
"prompt_number": 38
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Stacks analysis"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!mkdir REALDATA/stacks85"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 39
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import glob\n",
"infiles = [\"-s \"+ff+\" \" for ff in glob.glob(\"REALDATA/edits4stacks/*\")]\n",
"denovo = \"denovo_map.pl -m 2 -M 10 -N 10 -n 10 -T 12 -b 1 -B 1 -S -X \"+\\\n",
" \"'populations:-m 6' -X 'ustacks:--max_locus_stacks 2' -D 'stacks85' \"+\\\n",
" \"-o REALDATA/stacks85 \"+\"\".join(infiles)\n",
"with open(\"REALDATA/stacks85/stacks.sh\",'w') as outfile:\n",
" outfile.write(denovo)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 40
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat REALDATA/stacks85/stacks.sh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"denovo_map.pl -m 2 -M 10 -N 10 -n 10 -T 12 -b 1 -B 1 -S -X 'populations:-m 6' -X 'ustacks:--max_locus_stacks 2' -D 'stacks85' -o REALDATA/stacks85 -s REALDATA/edits4stacks/FieldMuseum_Ree_41478.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_35855.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_30556.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_33588.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_39618.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_38362.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_40578.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_29154.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_33413.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_33405.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_41954.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_32082.fq -s REALDATA/edits4stacks/FieldMuseum_Ree_30686.fq "
]
}
],
"prompt_number": 41
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time !sh REALDATA/stacks85/stacks.sh 2> REALDATA/log.stacks.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 1 of 13 [FieldMuseum_Ree_41478]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 2 of 13 [FieldMuseum_Ree_35855]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 3 of 13 [FieldMuseum_Ree_30556]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 4 of 13 [FieldMuseum_Ree_33588]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 5 of 13 [FieldMuseum_Ree_39618]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 6 of 13 [FieldMuseum_Ree_38362]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 7 of 13 [FieldMuseum_Ree_40578]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 8 of 13 [FieldMuseum_Ree_29154]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 9 of 13 [FieldMuseum_Ree_33413]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 10 of 13 [FieldMuseum_Ree_33405]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 11 of 13 [FieldMuseum_Ree_41954]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 12 of 13 [FieldMuseum_Ree_32082]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 13 of 13 [FieldMuseum_Ree_30686]\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 800.48 s, sys: 143.01 s, total: 943.49 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 266088.77 s\n"
]
}
],
"prompt_number": 42
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Compiling results"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"infile = open(\"REALDATA/stats.nopar/c85d6m2pN.stats\").readlines()\n",
"pno = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(27,40)]\n",
"PEMPN = [sum(pno)-sum(pno[:i-1]) for i in range(1,14)]\n",
"\n",
"infile = open(\"REALDATA/stats/c85d6m2p3H3N3.stats\").readlines()\n",
"fno = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(27,40)]\n",
"PEMPF = [sum(fno)-sum(fno[:i-1]) for i in range(1,14)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 43
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Stacks does not have the same paralog filtering options that pyrad does and so I implement three additional filters on the data. \n",
"First, I remove loci that have more than three alleles in a locus, and second I remove loci with more than three heterozygous sites, \n",
"and lastly, I remove stacks from the final output that are heterozygous at the same site across more than four samples. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"handle = \"REALDATA/stacks85/batch_1.haplotypes.tsv\"\n",
"lines = open(handle).readlines()\n",
"FILTER = []\n",
"NOFILTER = []\n",
"for line in lines[1:]:\n",
" dat = line.strip().split(\"\\t\")[1:]\n",
" cnt = int(dat[0])\n",
" ocnt = int(dat[0])\n",
" hets = []\n",
" for loc in dat:\n",
" \" remove loci with more than 2 alleles\"\n",
" if loc.count(\"/\") > 1:\n",
" dat.remove(loc)\n",
" cnt -= 1\n",
" else:\n",
" if \"/\" in loc:\n",
" a,b = loc.split(\"/\") \n",
" \" remove loci with more than 4 hetero sites \"\n",
" if len(a) > 3:\n",
" dat.remove(loc)\n",
" cnt -= 1\n",
" else:\n",
" \" remove stack if a hetero site is shared across more than 4 samples\"\n",
" h = [i for i in range(len(a)) if a[i]!=b[i]]\n",
" for i in h:\n",
" hets.append(i)\n",
" if [hets.count(i) for i in set(hets)]:\n",
" if max([hets.count(i) for i in set(hets)]) > 4:\n",
" cnt = 0\n",
" FILTER.append(cnt)\n",
" NOFILTER.append(ocnt)\n",
"\n",
"filt = [FILTER.count(i) for i in range(1,14)]\n",
"nofilt = [NOFILTER.count(i) for i in range(1,14)]\n",
"SEMPF = [int(sum(filt)-sum(filt[:i-1])) for i in range(1,14)]\n",
"SEMPN = [int(sum(nofilt)-sum(nofilt[:i-1])) for i in range(1,14)]\n",
"\n",
"print \"mean coverage stacks no filter\", mean(NOFILTER)\n",
"print \"%fullcov stacks no filter\",float([NOFILTER.count(13)][0])/sum([FILTER.count(i) for i in range(1,14)])*100\n",
"print \"\"\n",
"print \"mean coverage stacks paralog filtered\", mean(FILTER)\n",
"print \"%fullcov stacks filtered\",float([FILTER.count(13)][0])/sum([FILTER.count(i) for i in range(1,14)])*100\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"mean coverage stacks no filter 2.65351217823\n",
"%fullcov stacks no filter 0.93914895677\n",
"\n",
"mean coverage stacks paralog filtered "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2.44040946562\n",
"%fullcov stacks filtered 0.543641380646\n"
]
}
],
"prompt_number": 44
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def meancovpyrad(lines):\n",
" tcov = []\n",
" for line in lines:\n",
" a = line.split(\"\\t\")\n",
" tcov += [int(a[0])]*int(a[1])\n",
" full = 100*float(tcov.count(13))/len(tcov)\n",
" return mean(tcov), full\n",
"\n",
"\n",
"infile = open(\"REALDATA/stats.nopar/c85d6m2pN.stats\").readlines()[27:40]\n",
"nofiltpycov, fullnofiltpy = meancovpyrad(infile)\n",
"infile = open(\"REALDATA/stats/c85d6m2p3H3N3.stats\").readlines()[27:40]\n",
"filtpycov, fullfiltpy = meancovpyrad(infile)\n",
"print \"mean coverage pyrad no filter\", nofiltpycov\n",
"print \"%fullcov pyrad no filter\", fullnofiltpy\n",
"print \"\"\n",
"print \"mean coverage pyrad paralog filtered\", filtpycov\n",
"print \"%fullcov pyrad no filter\", fullfiltpy\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"mean coverage pyrad no filter 2.99135473853\n",
"%fullcov pyrad no filter 1.35862030721\n",
"\n",
"mean coverage pyrad paralog filtered 2.96574310049\n",
"%fullcov pyrad no filter 0.985513914028\n"
]
}
],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%load_ext rmagic \n",
"%Rpush SEMPF SEMPN PEMPF PEMPN"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R \n",
"DIFFN <- c(SEMPN-PEMPN)\n",
"DIFFF <- c(SEMPF-PEMPF)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 47
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"REALRES = cbind(PEMPN,SEMPN,PEMPF,SEMPF,DIFFN,DIFFF)\n",
"print(REALRES)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "display_data",
"text": [
" PEMPN SEMPN PEMPF SEMPF DIFFN DIFFF\n",
" [1,] 180330 213742 165193 206570 33412 41377\n",
" [2,] 78326 86858 72317 82252 8532 9935\n",
" [3,] 51231 52562 47542 48868 1331 1326\n",
" [4,] 42819 42429 39505 39032 -390 -473\n",
" [5,] 37216 35963 34146 32694 -1253 -1452\n",
" [6,] 33022 31270 30151 28009 -1752 -2142\n",
" [7,] 29293 27159 26532 23780 -2134 -2752\n",
" [8,] 25804 23508 23162 20024 -2296 -3138\n",
" [9,] 22171 19807 19470 16182 -2364 -3288\n",
"[10,] 17596 15489 15031 11873 -2107 -3158\n",
"[11,] 12330 10649 10078 7533 -1681 -2545\n",
"[12,] 6843 5791 5165 3678 -1052 -1487\n",
"[13,] 2450 1940 1628 1123 -510 -505\n"
]
}
],
"prompt_number": 48
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Plot of the difference in taxon coverage for the empirical data set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference in the number of loci recovered by the two programs is shown. Above the zero line (grey) Stacks recovers more loci and \n",
"below the line (black) pyRAD recovers more loci. PyRAD returns more loci at high coverage than does Stacks, which instead recovers many \n",
"more singleton and small coverage loci. Plotted on log scale. A better way to visualize the difference in performance between the two programs. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"diffplot <- function() {\n",
" par(cex.axis=1.1,\n",
" cex.main=1.1,\n",
" cex.lab=1.6,\n",
" mar=c(4,4,3,0), \n",
" oma=c(1,2,0,2))\n",
" D <- log(abs(DIFFN)) * c(-1,-1,-1,rep(1,10))\n",
" E <- log(abs(DIFFF)) * c(-1,-1,-1,rep(1,10))\n",
" names(D) <- seq(1,13)\n",
" df.bar <- barplot(D, col=c(rep('grey',3),rep('black',10)),\n",
" ylim=c(-12,10), space=0.5,\n",
" ylab=\"Log difference in number of loci recovered\",\n",
" xlab=\"Minimum taxon coverage\",axes=F)\n",
" box()\n",
" abline(h=0,lwd=2,lty=2)\n",
" points(df.bar, E, bg=c(rep('grey',3),rep('black',10)),\n",
" cex=2, pch=21)\n",
" axis(2,c(-10,-5,0,5,10),c(\"10\",\"5\",\"0\",\"5\",\"10\"))\n",
"} \n",
"diffplot()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAHgCAMAAABKCk6nAAADAFBMVEUAAAABAQECAgIDAwMEBAQF\nBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcY\nGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKior\nKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs8PDw9PT0+\nPj4/Pz9AQEBBQUFCQkJDQ0NERERFRUVGRkZHR0dISEhJSUlKSkpLS0tMTExNTU1OTk5PT09QUFBR\nUVFSUlJTU1NUVFRVVVVWVlZXV1dYWFhZWVlaWlpbW1tcXFxdXV1eXl5fX19gYGBhYWFiYmJjY2Nk\nZGRlZWVmZmZnZ2doaGhpaWlqampra2tsbGxtbW1ubm5vb29wcHBxcXFycnJzc3N0dHR1dXV2dnZ3\nd3d4eHh5eXl6enp7e3t8fHx9fX1+fn5/f3+AgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmK\nioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5ucnJyd\nnZ2enp6fn5+goKChoaGioqKjo6OkpKSlpaWmpqanp6eoqKipqamqqqqrq6usrKytra2urq6vr6+w\nsLCxsbGysrKzs7O0tLS1tbW2tra3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDBwcHCwsLD\nw8PExMTFxcXGxsbHx8fIyMjJycnKysrLy8vMzMzNzc3Ozs7Pz8/Q0NDR0dHS0tLT09PU1NTV1dXW\n1tbX19fY2NjZ2dna2trb29vc3Nzd3d3e3t7f39/g4ODh4eHi4uLj4+Pk5OTl5eXm5ubn5+fo6Ojp\n6enq6urr6+vs7Ozt7e3u7u7v7+/w8PDx8fHy8vLz8/P09PT19fX29vb39/f4+Pj5+fn6+vr7+/v8\n/Pz9/f3+/v7////isF19AAAgAElEQVR4nO2deUAV1f7AjyRiwMUSl9xTLAPEJSoRRVBwyVJT0xbs\naaWRYfXrpeZTy7RUyszUp2U+17QMU+yZe2a0aJmZL5VMS0t58dz3JQXOb87cbe4sl3s4cw8zp+/n\nD+dyvjPf78x8vHNnPYMwIDSoomcACC4gWHBAsOCAYMEBwYIDggUHBAsOCBYcECw4IFhwQLDggGDB\nAcGCA4IFBwQLDggWHBAsOCBYcECw4IBgwQHBggOCBQcECw4IFhwQLDggWHBAsOCAYMEBwYIDggUH\nBAsOCBYcECw4IFhwQLDgsAk+NnVY/6HTjpk0L0AQYBK8ISI5e3R2SkS+WXMDmA6T4IQF8uCj1mbM\nCRAUmARHHpUHl24wZVaAYMAkuMvQc9K/F0Z2MWlmAPNhEnwwqUp8ckLVxCNmzQ1gOmx70aU/LJ+1\nbGepoqVgh8zXOwBunA2eYC0zXpCp+gLAi8f+j6dgF7DbxY8fn/YXZRKc7UYTAcH8YBA8ToFefF50\nHAiucBgEIw+R0bojjM0ymBIE84NBcFHR/ibNl+wpWNr8jnO6I+ycazAlCOYH02/wkKbyPvjZmMF0\nRUEwP5gEx4xyDl+IoSsKgvnBJPiGZ5zDp6vTFQXB/GASnFarkAwKa6bTFQXB/GASvAHVmZqfP7VO\npc/oioJgfrCd6JgfTY6Sar9HWRQE84PxTNaZNdNmrj9PWxQE8yOIpyqNAcH8YBN8Madz611b865R\nFgXB/GASfC4BOdC2uaj7RbqiIJgfTIJHoXcvom1XJqLJdEVBMD/YzmT1xpfRNoz7xNEVBcH8YBJc\ndbJT8ORwuqIgmB9MgmOHOQUPaEZXFATzg0lwluMAEZwf9ihdURDMDybBJ+tGDURZA8KqF9EVBcH8\nYDsOLsysjFBIzwOURUEwP5gE7z6Mr+7bc4W6KAjmB5Pgyj3LVxQE84NJ8N116L+9BBDMDybBp9M7\n7CqPYsEEl+zdsa9UL3DljXYIJT2juwv65xe5H/0S3PmSYRKc2ELaxapGoCsqlOA/J8q3Dk8v1kRO\nprtuK96hnWqOHOiyP+izxyS4rwe6orYUfPXUGb3mC26Lva6qQz09N46fVodGuCO7gjGrSuB6cGDs\n6SfZSJyo/UH6u8ei+orLV94nAyaoQps9kVTtF99cQHBAfOzykaH+Ll72WkSq3+Gx3kgn1VR9vKFN\nmlonXu+Rcd90/WcJ6IEL/oHwq8eH+sfoW4Vg1T7T44qQaipF5FV1rVxXYL05sw4X/APhKa+Q3b6R\ntQpX3/iGnvZGUlX5FBM9rwp5t946u2blAC74Kyhd2Vvas31L+9+1g1fINN+In2/wv7yRTFU+xUSv\nq0Lpnkh35uUhwAV/L+d7OddscoE6ohAy3Dfi5zf4tDeyRZXvMW9I9T09pMh3in2R4IK/kn7uNdtG\nfSOwYq2PV4W8e9GT1Pncv6ZIs44LPBPdq4psUpTaybg8MnDB38MO4+1mL29ojSp0oYsr0NvnOHgv\n6f/krTg5kr2d/OGzJ7rENVHqUVU670+wScfIcMHfwwTvqlV37fWJJ9JeqfHlDIlOMXKgWTr5Y6or\nUog0LPTJ+E030jZc88zA74op/PaPEyhwwd/DYO+qTVM0F5C+ahLckUzpj8klrlCqxmJnV+SgVvAc\nV+jsJicfL16y0fnpW+VcdPRM0MuUpforXvDfPyNn1m/a5mFeGR0VzTlaV+6zHWnlEfy6NqQ8B+o9\n/7VHM4elawe1a3vfvy7RLCzrmSz7XfA/fK+8+u7XdHM827vG+ymazRbsJ5+M+8j6S82cX3btBbY/\nSLG4TIJpv7luKlJwYbJrBbb3OQqRdoo2eNf4G9JOkfvHlrdgfObdRzIGLbqgnfWB7vGTKb7DTIJR\n2mLKc1hOKlLwg57Vquze6bB2rc9zhTgKfl/TT12eYh5/9E4wPfDlZRL8cDiKevI73WvdfqlAwYqz\nD+hPb3P5hJgtOMUwH2Git7lb4AvM9ht84f17K6OEt04EXk+mAgV/qVh7ihNW1hBsnI8wxNus3Msv\nA+bLhSfeSa1UpX/gBQkVKHidYu0pDk/sIDi7YgRj/FOW5nJYGVSg4P8o1l6ht9kOgqd7m3sHvsCs\nggvGN0eo5WuBFyRUlOA3+/W737uWavTr18+91u0gWLEj+GHgy8wkeO/L8QjdPHpv4OWcVJTgcp15\nsoxg/Ia7tTfFfi3bYRKqmf01/U50hQnmJyQogvEUZ+NAml5vmAQPWKu5kzAgQDBtvrMpiYQWDaKr\n142TPyZ+Fdgy/6VuurOvYD/5yoBR8OZh3TKyPw2slBcQbBPBJQ8iFBmFUGaJ//HUgGCbCJ6BBhws\nLT2YiWYFVswNCLaJ4DYp8le3JLltYMXcgGCbCHa84hyOdwRWzA0ItongRq4rbk81CayYGxBsuuCj\nI9sj1PUD7UkJtqcLo+QLMgUO611sODszLfXOgVt8Gy0hJCiC3fff9/8Tq2ASvD88bPCMGYPDwilv\n7Qi+4N1tnQv8uM/De5YQEgzBRz2tT6rXBNtx8PY2JGnSt2WMpibogk95Ftjn0R9LCAmG4BHe5l9V\nq4LxREfpH/n5RdRno4Mu+EXvAh9WNFtCSDAEt/c2z1StCtZTldJxEv2TrEEXnKa/HiwhxOx8axMT\nExXNNyUmpit/iBnf2RAj/YfJbLy8jNHUBF2wYoFfUDRbQgjnuzSZBG+q1GAjxksao7X+x1PDU/CL\nimarCzE7H4FJcMZt8uMz5+NS/I+nJuiCu3gX9n1Fs9WFmJ2PwCS4hqtzkQk3+h9PTVAFn8zNzc3y\nLuyi3NyN7pDVhZidj8AkuN5I53B4Xf/jqQmqYD/P/lhdiNn5CEyC76snP1Z4tK76MeYyCKpgGwsx\nOx+BSXDB9TXHfrhiQu3QH/yPpwYE88lHYDtM+qIlydeUtscfEMwnH4HxREfx7hUffK85wV0WIJhP\nPoJ4N93ZWIjZ+Qji9XRnYyFm5yOI19OdjYWYnY8gXk93NhZidj6CeD3d2ViI2fkI4vV0Z2MhZucj\niNfTnY2FmJ2PIF5PdzYWYnY+QhB7ult3CZcu693/Y20EBPPJRwhiT3foEJ7nGPFSrYWaCAjmk48Q\nxFfbSYLvysV4i2IX+1tn54yUD0LQYWMhZucjBPHVdpLgJr9gfEKxi70wRybCf1o2bCzE7HyEIL7a\nDs3elTkX48W3ayKwieaTjxDEV9sNSq6JbsGrQlZrIiCYTz5CcF9td2Yv3qV5AQII5pWPIN6r7Wws\nxOx8BLgeLGMNIWbnI4BgGWsIMTsfAQTLWEOI2fkIIFjGGkLMzkcAwTLWEGJ2PgIIlrGGELPzEUCw\njDWEmJ2PwNbLjgf/46kBwXzyEZgEd/Xgfzw1IJhPPgJsomWsIcTsfATx+uiwsRCz8xHE66PDxkLM\nzkcQr48OGwsxOx9BvD46bCzE7HwE8frosLEQs/MRxOujw8ZCzM5HgD46ZKwhxOx8BOijQ8YaQszO\nR4A+OmSsIcTsfAToo0PGGkLMzkeAU5Uy1hBidj6CeC/GsrEQs/MRxHsxlo2FmJ2PIN6LsWwsxOx8\nBPFejGVjIWbnI4j3YiwbCzE7H0G8F2PZWIjZ+QjivRjLxkLMzkcQ78VYNhZidj6CeC/GsrEQs/MR\nxHsxlo2FmJ2PAKcqZawhxOx8BAbB0Qr8jacFBPPJR2AQ3EuBv/G0gGA++QiwiZaxhhCz8xFAsIw1\nhJidjwCCZawhxOx8BBAsYw0hZucjgGAZawgxOx8BBMtYQ4jZ+QhMgiM+8x83AgTzyUdgEJy3Dk1a\n58LfeFpAMJ98BAbBypT+xtMCgvnkIzAIXpWHXs5z4W88LSCYTz4C02/wAz/6jxsBgvnkI8BetIw1\nhJidjwA3vstYQ4jZ+Qhw47uMNYSYnY8AN77LWEOI2fkIcOO7jDWEmJ2PADe+y1hDiNn5CHDju4w1\nhJidjwA3vstYQ4jZ+Qhw47uMNYSYnY8AN77LWEOI2fkIcOO7jDWEmJ2PAKcqZawhxOx8BBAsYw0h\nZucjgGAZawgxOx8BBMtYQ4jZ+QisbwAvFyCYTz5CEN8AbgwI5pOPEMQ3gBsDgvnkIwTxDeDGgGA+\n+QjBfQO4ASCYTz4CvAFcxhpCzM5HgMMkGWsIMTsfAQTLWEOI2fkIbIIv5nRuvWtr3rUyRlMDgvnk\nIzAJPpeAHGjbXNT9ov/x1IBgPvkITIJHoXcvom1XJqLJ/sdTA4L55CMwCY7pjS+jbRj3ifM/nhoQ\nzCcfgUlw1clOwZPD/Y+nBgTzyUdgEhw7zCl4QDP/46kBwXzyEZgEZzkOEMH5YY/6H08NCOaTj8Ak\n+GTdqIEoa0BY9SLd8LGpw/oPnXZMGwDBfPIR2I6DCzMrIxTSU/+u2Q0Rydmjs1Mi8jUREMwnH4H1\nTNbVfXuMriclLJAHH7X2Nq2YI1M5gzDgT6nlt14psRJtZknNk1rGyp+Xyu3SGHf4tkufv5PbpWkX\n+LZLn1NdOXUWOM1Vq7omUsc5Dxnt9Fags1ZTbai7a/61U1V3LZeOkFjX8urkO+1a3hs0kc6u5b1P\nJ1/Zy0umTbqTTTAuNrxpNvKoPLik+L5uypVxCu5zWWrZ363pTU2aNIkdM27cuKebNiHE3nqatJMF\n9m0fM25sZ7k9I6NrC5926XPtWFfOwtzprRIIbZZKpcjnVu5aSb7tEp2c85CR3tK3Xfp8zDkPGSmq\n9tzcOd1c8/9NrrpWG9dyFc9upa7VwbW8Kep5aLOi1LW8d6rnIXena3l75+aqa3Uoe3nJtEktWQSf\nev620IjWb1zWDXYZek7698LILpqIzyY6J+czFe09myx1ZKNnk9VNHRozBwM6MG2ij9dH9br3qI9a\n6J6qPJhUJT45oWriEU0EBPODSfBj6PWrGF/LQX/XDZf+sHzWsp06m3AQzA+2U5X3OIfd4+mKgmB+\nMAmuNt45fDmKrigI5geT4E73O4d90uiKgmB+MAn+vOqsYumQYBZ6j64oCOYHg+CBAwfeihr17NkI\nxb9CVxQE84NBcJgCuqIgmB8Vf9MdCA4qIFhw2LoyXDQk0wldURDMDybBoxEKuU6GrigI5geT4Lpx\nBdQdsBBAMD+YBEe+Xb6iIJgfTILbjStfURDMDybBm278olxFQTA/mARfS0I3J8rQFQXB/GASPAw5\nElrK0BUFwfxgEtwwnfKpMxcgmB9MgmvNLV9REMwPJsG9ny1fURDMDybBR2LeLdc2GgTzg0lwfG1U\nKUKGrigI5gfbq+080BUFwfyAy4WCA4IFh0nwFg90RUEwP5gEe59noysKgvnBJPgDwsLRtTvuoCsK\ngvlhxm/wqSZT6IqCYH6YspM1OYauKAjmhymCZ1WhKwqC+WGG4JOtWF5OCYKDCpPgZjK3hCF4dMWy\nMAlOddLtzWK6oiCYH3AmS3BAsOCwCS5a8Z4TuqIgmB9Mgr+IhFOVVoftxvfwt/K/lKErCoL5wSQ4\nYkz5ioJgfjAJvqGcKxUE84NJcGafcj1cCII5wiT4WMLQgyXlKAqC+cF2V2VjhELhrkpLA3dVCg6c\nyRIcECw4IFhwQLDggGDBAcGCA4IFB64HCw5cDxYcuB4sOHA9WHDgerDgwPVgwYHrwYID14MFB64H\nCw6cyRIcECw4TIIbeaArCoL5wSS4qwe6oiCYH7CJFhwQLDggWHBAsOCAYMEBwYIDggWHTfDFnM6t\nd23Nu0ZZFATzg0nwuQTkQNvmou6Ur+YAwfxgEjwKvXsRbbsyEU2mKwqC+cEkOKY3voy2Ydwnjq4o\nCOYHk+Cqk52CJ4frRdddwqXLevf/WBsBwfxgEhw7zCl4QDPdiQ/heY4RL9VaqImAYH4wCc5yHCCC\n88Me1Z34EL4rF+Mt2u03COYHk+CTdaMGoqwBYdWLdCc+hJv8gvEJxfZ7YY6Mzy1cIDiosB0HF2ZW\nRiik5wH9iWfvypyL8eLbvU3fbpJxKEcDwUGF9UzW1X17rhiEBiXXRLfgVSGrNRHYRPODVXAJxueM\no2f24l0F2mYQzA82wfNjZmKc2Xg5ZVEQzA8mwZsqNdiI8ZLGaC1dURDMDybBGbedJYPzcSl0RUEw\nP5gE15jgHE64ka4oCOYHk+B6I53D4XXpioJgfjAJvq+efIbjaN176YqCYH4wCS64vubYD1dMqB36\nA11REMwPtsOkL1qSHliarqcsCoL5wXiio3j3ig++/5O2KAjmB9x0JzjQEZrgQEdoggMdoQkOdIQm\nONARmuBAR2iCAx2hCQ50hCY40BGa4MCZLMEBwYIDzwcLDjwfLDjwfLDgwPPBghPE54ONAcH8COLz\nwcaAYH4E8flgY0wTvH7yo4+MzgPBfgji88HGmCV4TNyTr015/q4HNoFgQ4L4fLAxJgke2XWD3Ppc\nVxBsCJPg3Yf9PR9sjDmC85pvcjX3ygHBRjAJrtyzfEXNEfz8cHfz0k4g2AgmwXfXof/2EkwQnLp0\nadd5nvbYpUuzQbAuTIJPp3fYVR7F7ILPZ/br13iJp71Vv379vivHjPwFYBKc2ELaxapGoCvKLpgw\nbqq7eX0GXf2/EkyC+3qgK2qO4K093M0vvkpX/6+ETa8Hy9wzw9m68rZTwZlPEbCz4JNdniUHwq+1\n3BGc2RQCm17wd3JtXtfmcWnj4PvrB5te8AcCxaYX/IFAsekFfyBQbHrBHwgUm17wBwLFphf8gUCx\n6QV/IFBsesEfCJQgdghuDAjmB1sXDp+VrygI5geD4Lx1aNI6F3RFQTA/GAQjBXRFQTA/GASvykMv\n57mgKwqC+cH2hP+P5SsKgvkBO1mCAztZggM7WYIDO1mCAztZgmPnm+6AAGAS3MgDXVEQzA8mwV09\n0BWlFbxm6ToQXE6sv4ne/HxiUmqbu8aC4HJhecGfpj9Gvr6fPNhrMwguBwyCxymgKxqo4JQpU6bc\n/ZSr9eH7p0zJAcG0mHKiIzKarmigglfl5EyK/dTVur5hTk4O7Su4AAbBRUX7mzRfsqdgafM7/LwE\nXI9ABUt839fT3Pl3uioAgek3eEhT+f3BZ2MG0xWlELz2CU/zQ9/QVQEIbE82jHIOX4ihK0oh+OtM\nT3OPfXRVAALbW1eecQ6frk5XlELwuds9zfG0DzECmFFwWq1CMiismU5XlEIwHvSGq3XscLoigAyT\n4A2oztT8/Kl1KlFe+KcRfKzNm3LjxBTKPTlAhu1Ex/xocpRUm/LdlFSC8cnB7Yf+Pavts+cpiwAy\njGeyzqyZNnM99aqnEozx8X/PWaNqAgIlmKcqj00d1n/otGPaAKVggIEgCt4QkZw9OjslIl8TAcH8\nCKLghAXy4KPWmggI5kcQBUcelQeXbtBEQDA/gii4y1ByYHNhZBdNBATzI4iCDyZViU9OqJp4RBMB\nwfxgEjzXxaJ131/WCZf+sHzWsp3KVwy/+oQMCOYHk2DFne913g6kWsEOmSRlGwgOKkyCD7evP/Hj\n1TkNOn/yTge0NvCiaco/QHBQYRI8otEJMjjR6J+4OFVzwSHbjWY6EMwPJsG3/sM5HCUd6s6uo47O\ni44DwRUOk+Dqrgv+I6thvNChCY/NMphOT/Dq0UOyZm4GwWbDJLhLA/lE87EGnXBpjzs04Z1zDabT\nCt70UOJTOeP6x08HwSbDJPjb0LoTV6+eVC/k68Md0cLAi/oIfishNTW11nPyd/fju1pIf9SGK7/m\nwXaiY10zcojUcBX+rubrpWWMq8BHcPGpU6cW9Xf9/K6+vejUKfBrIoxnsq5uX7Jg6yVpWEJTNE3d\n0GOlewfrsdU0iYAyqZBHVzSCW3n2oN8cH5ySf1kYBW8e1i0j+1Paon4Evw231pkLk+CSBxGKjEIo\nk2oDrSO4w2a34PHT6FIBZcAkeAYacLC09GAmmkVXVCP42Rluwem76VIBZcAkuE2K/NUtSW5LV1Qj\n+EirNU6/Ux6mywSUBZNgxyvO4XjtWSy/aATjNfFvS3o3PtkJDpFMhq2PDtdp5qea0BXVCsYHn2iV\n2DpjNjydYjZsL6eMKiCDAkd/uqI6goEgwSR4f3jY4BkzBoeFU/bpD4L5wXYcvL0NOVWZ9C1lURDM\nD8YTHaV/5OcXUZyFdgKC+WHKqcrfvqQrCoL5YYrgMZTnrEEwP0Cw4IBgwQHBggOCBQcECw6D4OUe\n+oBgy1IhL+UAwfxgEDxZAV1REMwPa9x0BwQNECw4FSI4dg7Ai5cqQvCOXIAbfq/VB0kwYBVAsOCA\nYMEBwYIDggUHBAsOCBYcECw4IFhwQLDggGDBAcGCA4IFBwQLDggWHBAsOFwFz9f2O+xkd1pU/fH6\nPTXNjglrMM7g8dWSjFf0A4PIfaDrdEMXhtxUe4JevrnyzaP6r9FdmVC1oe5EGK9qFd5K963l8sIe\n6uposVE3or825Db9tSGH/K0NfTgK3vlStIHgczeOPflNvX/qhX4I23zu0ypr9Kd7DRkITp596NCh\nS7qhng8Vfl3tI72ZkCY59MAkvWmOh8w8tiU8T3f+IvIufFbzoKbdubClSUP/Ny/iuE5Ed20423TX\nhjPkd23ow1Hw3Kw4A8EbI4oxfvFevdDhDfjKnhrbdSfbHtPBQHDNXUYz8Uv4ael7VWgQ3Z5WrNd8\n4Yb5l76r9oVeaGpH6Z/Hx2ranQv7Q5WzGLedqhPRXRvONt214Qz5WxsGcN1E63QNL3PuN4yL0/6u\nHzxeCT2nu1E61yy/l77gMygzJmmB7kQrm09KSHzbYCN37Y6f9AOfo0pogm5kXbUfS7Y3zNSJkIVd\nHk8+DNOJYP21QdoM1oZzdMO1YYQlBEv80qXTSYNQ8fZaeptUPHAMNhC895Z3DyxzfKgXmo2G7lsb\n/b5+pSkGHdgX1l585Zu6uu8kKX3FEdG6by+dCFnYWcnSh9HqHorKEIz114YrZLQ2jLCG4Ctja0/U\n70DrwD7pn2eG6ESW3XXVSLDMiN56rYujpe3f6Pt0pzgX9Yt+qjntpX/GGHTRV3IeD3pSp50s7LIE\n8kEdLUuw/togIeO1YYglBJd0767zilqZ2d2kfx5Xb+QIA8Ojo0Ov175mQGLXKumfl3S79vqKCB6j\n3+vX7E4GMyF/Ef+hO9Evg67g0kZ6r5QiC7uz6gWMO7ypE8HGgg3WBgkZrw1DLCF4Q7WfpD3Yo3qh\nn6ssP7MuUm//5uSRI0e6DP9Db6KCkMUnPq+1Si9UEj/8RH70Ct3Z6DBdf/bw7463T39WXXcv+nLt\n0f8dVVPvBdpkYUsTX7i8MkKtqwzBBmuDhIzXhiGWEPyqfAiq90OG8YdtI+KWGSU02kT/u3V47Hz9\nkHRkGjNbN3K2suG+91dJ4bcavH9kawtH+s96AXlhj3Su1vJz3YixYIO1IY/ud23oAmeyBAcECw4I\nFhwQLDggWHBAsOCAYMEBwYIDggUHBAsOCBYcECw4IFhwQLDggGDBsb7gZ1Gl/zo/namCHsc4vZFP\nWPUnoMIOgpHrTovFiAi+J9YnrPozIBagbSbMmD2wg+Bayc5P99Yigk0ABFuJZ9FQdJh8OF1lKAim\nxg6Ct6A3yIdF0gdJcNdGGD8Qf/yR6pHtyevYnH8W9q3daNi1T9s76o8twTi1GRn/A7RFGyKkI4Sk\nqfK71g9r+MQfGP9YpXep9G/oACl2oF/jqLYzyYiKGk5+6l2/RudNWDnSALSX/P0wOoDxyaFxVZv+\ng7wduW/i6btDd3vzY7y9W3SzkYcRua3LOxof7CD4UJs7yYd7Yo95BDdpcd/bo8MdRa4/azd9Zk4n\nlHTjS7Nj0WJfwaoQ4fvhaPqXOA81fXH8PYhs/l9D7+Ortzc8I5mIcDw1oR16CPvUkPkyvO7w8bdV\nWqAcaS16WYpcCE/F+L/1HNmTeqEWFyXBLdLr9DisyL++Sv2RoxqlEMGK0fhgC8Fvol+lLXTouOMe\nwWio9J2bhla6/3wX44sO9AXGe9EQX8GqkIy8ie4XSu5MfQRJq7q4XfWiCZU+x7i0bZj0lSzJRJ/5\n1CAUt6opfRnP1m+qHOlqDfJsyvvkf86AqP3SxzkoRxKMUq4o819rVlf6eKI+EawYjQ+2EHwETcZ4\nIdqjECwZx/noPdeflcgXIuFm6Z/SkExfwaqQjCz4/Fnp07W70WlpcCC8XegIaVjk/JH/FT3tU4Ow\nG8lPmG2cWaIcKRsVSDt/URfx1coDL0tcdHQmgsn9t97829FEMsFESbByND7YQjBu3wrj7vHYKziU\n/Ep+6REcQUZsmUj+vU4lWBWSce5k/TT/uQwHkgXj2ShW+tJJOmeQv0oju/nUIKxA7medlCN9jcbj\nE5WfxHif++00rSXBISSXN/8S9G/y98eSYOVofLCH4Jno51Oh4xWCZW1lCF7oX/CrKHrgvP887xQ8\nAjl+x153URk+NQgfIPfjEMqRSm9uLm1vt2O8B2Vtk5G+vH2rkbg3/wK0mjSslgQrR+ODPQQXhUxY\nIG0MAxZ8C/k0xq/gE9d1Jg94OQV/XikrvLP0i/uH82f6EMrWCP7e+fzoksxzypHwaLQvNUGa8koI\n2QfHV+ZudQlW5N+K5JcS5UiClaPxwR6Cccf4uxNwwIK7hZ3C+NRNfgR/Lf2EPit9OhuHpFHPNLz9\n6gw0R/pCtqn6k7z/tEkj+M+Ym6Q9pQtNGvuMRL6Sld4i8R5h+Zj8zq50CVbkv9Sg3jHp+KgB2clS\njMYHmwh+B4W8ErjgN1Hr2a816WgoeCnK/LDktusn5r5aLw69+D/8t9D/4JJ2kVKdb8KjhklHQORJ\nUZVgvC607ohX4lGuz0hSalTlBBkeqB72SM79qNs1l2Bl/lUhDV8YfXNntMhnND7YRPCx69C+wAUX\nj785tPHrOwwFH+8e2Rr/2rdG9fTNRa0c+R/JW9+fq3aS9qr2973Z0eaf8okOlWC8s/tNN7ST+8fw\njkSOoV2Plf7xt1uuj51I9tmdv8He/Bh/1alaq5mfyN3CeEfjg/UF25+r38mnWueg7yugOAgOPiU1\nOkq7YcUd6nfqGNYAAABQSURBVFyugOIgmAOvoR6L3klD+o8lBxkQzIHS+bdH3tSJqnsr0wDBggOC\nBQcECw4IFhwQLDggWHBAsOCAYMEBwYIDggUHBAsOCBYcECw4/w/pEquuHWln1wAAAABJRU5ErkJg\ngg==\n"
}
],
"prompt_number": 49
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"pdf(\"diffplot.pdf\", width=7, height=6, family='Times')\n",
"diffplot()\n",
"dev.off()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 50
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Estimate proportion of indels in pyRAD assembled empirical data set"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def counts(infile, mincov):\n",
" lines = [i.split(\"\\n\") for i in infile]\n",
" INDS = []\n",
" for loc in lines[:-1]:\n",
" seqs = []\n",
" for line in loc:\n",
" if \">\" in line:\n",
" seqs.append(line.split(\" \")[-1])\n",
" if len(seqs) >= mincov:\n",
" N = numpy.array([list(i) for i in seqs])\n",
" cols = iter(N.transpose())\n",
" inds = 0\n",
" col = cols.next()\n",
" while 1:\n",
" if \"-\" in col:\n",
" try: col = cols.next()\n",
" except StopIteration: break\n",
" if \"-\" not in col:\n",
" inds += 1\n",
" else:\n",
" try: col = cols.next()\n",
" except StopIteration: break\n",
" INDS.append(inds)\n",
" i = [i for i in INDS if i!=0]\n",
" return float(len(i))/len(INDS)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 51
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#infile1 = open(\"REALDATA/outfiles/c85d6m2pN.loci\").read().split(\"//\")\n",
"infile1 = open(\"REALDATA/outfiles/c85d6m2p3H3N3.loci\").read().split(\"//\")\n",
"infile2 = open(\"SIMsmall/Py_high/outfiles/c85d6m2pN.loci\").read().split(\"//\")\n",
"infile3 = open(\"SIMsmall/Py_low/outfiles/c85d6m2pN.loci\").read().split(\"//\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 52
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for i in [13,12,10,8,6,4]:\n",
" dat = counts(infile1,i)\n",
" print 'cyatho',i,\"%0.3f\" % dat\n",
"indelesthigh = counts(infile2,12)\n",
"indelestlow = counts(infile3,12)\n",
"print 'simhigh', 12, \"%0.3f\" % indelesthigh\n",
"print 'simlow', 12, \"%0.3f\" % indelestlow"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"cyatho 13 0.141\n",
"cyatho"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 12 0.156\n",
"cyatho"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 10 0.214\n",
"cyatho"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 8 0.260\n",
"cyatho"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 6 0.287\n",
"cyatho"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 4 0.306\n",
"simhigh"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 12 0.467\n",
"simlow 12 0.195\n"
]
}
],
"prompt_number": 53
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Large Simulation Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simulate the large RADseq data set on the same species tree as before, with no indel variation, but allowing mutations to arise in the \n",
"restriction recognition site (25bp region in this case to simulate large dropout). I simulate 500K loci to create a data set that has \n",
"many fewer loci within any individual sample due to locus dropout (~150K) such that within-sample clustering is fast, but many loci to \n",
"cluster across samples (500K) such that across-sample clustering is slow. I compare run times to analyze these data in _Stacks_ and _PyRAD_ \n",
"including when implementing hierarchical clustering in _PyRAD_. "
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"!python simradsMANU.py 0.0 50 500000 1 SIMbig/big12.fastq"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tsimulating 500000 RAD loci across 12 tip taxa with 1 samples per taxon\r\n",
"\tindels arise at frequency of 0.000000\r\n",
"\tmutations in restriction site at length (freq) = 50\r\n",
"\ttheta=4Nu= 0.0014\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tcreating new barcode map\r\n"
]
}
],
"prompt_number": 54
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"mkdir SIMbig/big0/\n",
"mkdir SIMbig/big4/"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 64
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p SIMbig/paramsbig.txt -s 12345 2> SIMbig/log.big"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 10.67 s, sys: 1.66 s, total: 12.33 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 7884.25 s\n"
]
}
],
"prompt_number": 65
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p SIMbig/paramsbig.txt -s67 2> SIMbig/log.big"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\r\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\r\n",
"http://drive5.com/usearch\r\n",
"\r\n",
"Licensed to: daeaton@uchicago.edu\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tingroup ['1A0', '1B0', '1C0', '1D0', '2E0', '2F0', '2G0', '2H0', '3I0', '3J0', '3K0', '3L0']\r\n",
"\taddon []\r\n",
"\texclude []\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\tfinal stats written to:\r\n",
"\t SIMbig/big0/stats/big.c85d6m2pN.stats\r\n",
"\toutput files written to:\r\n",
"\t SIMbig/big0/outfiles/ directory\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 44.70 s, sys: 7.13 s, total: 51.83 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 33415.28 s\n"
]
}
],
"prompt_number": 66
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"mv SIMbig/big0/clust.85/ SIMbig/big4/\n",
"mkdir SIMbig/big4/stats"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 67
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! ~/pyrad_v.1.62/pyRAD -p SIMbig/paramsbigH4.txt -s 67 2> SIMbig/log.bigH4"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 4\r\n",
"2 4\r\n",
"3 4\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"usearch_i86linux32 v6.0.307, 4.0Gb RAM (49.5Gb total), 24 cores\r\n",
"(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.\r\n",
"http://drive5.com/usearch\r\n",
"\r\n",
"Licensed to: daeaton@uchicago.edu\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\tingroup ['1A0', '1B0', '1C0', '1D0', '2E0', '2F0', '2G0', '2H0', '3I0', '3J0', '3K0', '3L0']\r\n",
"\taddon []\r\n",
"\texclude []\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\t"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
". . . . . ."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"\tfinal stats written to:\r\n",
"\t SIMbig/big4/stats/big.c85d6m2pN.stats\r\n",
"\toutput files written to:\r\n",
"\t SIMbig/big4/outfiles/ directory\r\n",
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 8.23 s, sys: 1.12 s, total: 9.36 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 6755.55 s\n"
]
}
],
"prompt_number": 68
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"mkdir SIMbig/stackf\n",
"cut -f 2 SIMbig/big12.fastq.barcodes > SIMbig/big12.stacks.barcodes"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 69
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! process_radtags -f SIMbig/big12.fastq -o SIMbig/stackf -b SIMbig/big12.stacks.barcodes -e pstI -E phred33"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Using Phred+33 encoding for quality scores.\r\n",
"Found 1 input file(s).\r\n",
"Searching for single-end, inlined barcodes.\r\n",
"Loaded 12 barcodes (6bp).\r\n",
" Processing RAD-Tag 22170000 \r",
" 22173620 total reads; -10000000 ambiguous barcodes; -0 ambiguous RAD-Tags; +0 recovered; -0 low quality reads; 12173620 retained reads.\r\n",
"Closing files, flushing buffers...\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Outputing details to log: 'SIMbig/stackf/process_radtags.log'\r\n",
"\r\n",
"22173620 total sequences;\r\n",
" 10000000 ambiguous barcode drops;\r\n",
" 0 low quality read drops;\r\n",
" 0 ambiguous RAD-Tag drops;\r\n",
"12173620 retained reads.\r\n"
]
}
],
"prompt_number": 71
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the barcodes were made randomly and every single of the 120 input files has to be written in by hand for the stacks command \n",
"denovo_map.pl I wrote a script here to find the sample names, as output by process_radtags, and create an input file for denovo_map.pl \n",
"written to a bash script called stacks.sh"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"makestackslist(\"nom\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 73
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"infiles = [\"-s \"+ff+\" \" for ff in glob.glob(\"SIMbig/stackf/*.fq\")]\n",
"denovo = \"denovo_map.pl -m 2 -M 13 -n 15 -T 12 -b 1 -B 1 -S -X \"+\\\n",
" \"'populations:-m 6' -D 'SimRADsBIG' -o SIMbig/stackf \"+\"\".join(infiles)\n",
"with open(\"SIMbig/stacks.sh\",'w') as outfile:\n",
" outfile.write(denovo)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 74
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%time ! sh SIMbig/stacks.sh"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Found 12 sample file(s).\r\n",
"Identifying unique stacks; file 1 of 12 [sample_GTTAGG]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_GTTAGG.fq -o SIMbig/stackf -i 1 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 2 of 12 [sample_ATGTGT]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_ATGTGT.fq -o SIMbig/stackf -i 2 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 3 of 12 [sample_TGGTGG]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_TGGTGG.fq -o SIMbig/stackf -i 3 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 4 of 12 [sample_TGATGA]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_TGATGA.fq -o SIMbig/stackf -i 4 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 5 of 12 [sample_ATTTGG]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_ATTTGG.fq -o SIMbig/stackf -i 5 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 6 of 12 [sample_TGGGTG]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_TGGGTG.fq -o SIMbig/stackf -i 6 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 7 of 12 [sample_ATTGAG]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_ATTGAG.fq -o SIMbig/stackf -i 7 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 8 of 12 [sample_GGAGAA]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_GGAGAA.fq -o SIMbig/stackf -i 8 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 9 of 12 [sample_AGTTAT]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_AGTTAT.fq -o SIMbig/stackf -i 9 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 10 of 12 [sample_TAGTTA]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_TAGTTA.fq -o SIMbig/stackf -i 10 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 11 of 12 [sample_GAAATA]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_GAAATA.fq -o SIMbig/stackf -i 11 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Identifying unique stacks; file 12 of 12 [sample_AGTTAA]\r\n",
" /usr/local/bin//ustacks -t fastq -f SIMbig/stackf/sample_AGTTAA.fq -o SIMbig/stackf -i 12 -m 2 -M 13 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Generating catalog...\r\n",
" /usr/local/bin//cstacks -b 1 -o SIMbig/stackf -s SIMbig/stackf/sample_GTTAGG -s SIMbig/stackf/sample_ATGTGT -s SIMbig/stackf/sample_TGGTGG -s SIMbig/stackf/sample_TGATGA -s SIMbig/stackf/sample_ATTTGG -s SIMbig/stackf/sample_TGGGTG -s SIMbig/stackf/sample_ATTGAG -s SIMbig/stackf/sample_GGAGAA -s SIMbig/stackf/sample_AGTTAT -s SIMbig/stackf/sample_TAGTTA -s SIMbig/stackf/sample_GAAATA -s SIMbig/stackf/sample_AGTTAA -n 15 -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 1 of 12 [sample_GTTAGG]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_GTTAGG -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 2 of 12 [sample_ATGTGT]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_ATGTGT -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 3 of 12 [sample_TGGTGG]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_TGGTGG -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 4 of 12 [sample_TGATGA]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_TGATGA -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 5 of 12 [sample_ATTTGG]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_ATTTGG -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 6 of 12 [sample_TGGGTG]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_TGGGTG -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 7 of 12 [sample_ATTGAG]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_ATTGAG -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 8 of 12 [sample_GGAGAA]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_GGAGAA -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 9 of 12 [sample_AGTTAT]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_AGTTAT -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 10 of 12 [sample_TAGTTA]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_TAGTTA -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 11 of 12 [sample_GAAATA]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_GAAATA -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Matching RAD-Tags to catalog; file 12 of 12 [sample_AGTTAA]\r\n",
" /usr/local/bin//sstacks -b 1 -c SIMbig/stackf/batch_1 -s SIMbig/stackf/sample_AGTTAA -o SIMbig/stackf -p 12 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Calculating population-level summary statistics\r\n",
"/usr/local/bin//populations -b 1 -P SIMbig/stackf -s -t 12 -m 6 2>&1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 221.47 s, sys: 34.13 s, total: 255.60 s"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Wall time: 73780.51 s\n"
]
}
],
"prompt_number": 75
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Measure stats for big simulation analysis"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"infile = open(\"SIMbig/stackf/batch_1.haplotypes.tsv\").readlines()\n",
"tcov = [line.split(\"\\t\")[1] for line in infile[1:]]\n",
"print mean(map(int,tcov)), 'mean coverage stacks'\n",
"\n",
"infile = open(\"SIMbig/big0/stats/big.c85d6m2pN.stats\").readlines()[26:38]\n",
"tcov = []\n",
"for line in infile:\n",
" a = line.split(\"\\t\")\n",
" tcov += [int(a[0])]*int(a[1]) \n",
"print mean(tcov), 'mean coverage pyrad'\n",
"\n",
"infile = open(\"SIMbig/big4/stats/big.c85d6m2pN.stats\").readlines()[26:38]\n",
"tcov = []\n",
"for line in infile:\n",
" a = line.split(\"\\t\")\n",
" tcov += [int(a[0])]*int(a[1]) \n",
"print mean(tcov), 'mean coverage pyrad hierarchical'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4.4786988998 mean coverage stacks\n",
"4.47936475229 mean coverage pyrad\n",
"2.8736130253 mean coverage pyrad hierarchical\n"
]
}
],
"prompt_number": 76
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"infile = open(\"SIMbig/big0/stats/big.c85d6m2pN.stats\").readlines()\n",
"phigh = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(26,38)]\n",
"P0 = [sum(phigh)-sum(phigh[:i-1]) for i in range(1,13)]\n",
"\n",
"infile = open(\"SIMbig/big4/stats/big.c85d6m2pN.stats\").readlines()\n",
"phigh = [int(infile[j].strip().split(\"\\t\")[1]) for j in range(26,38)]\n",
"P4 = [sum(phigh)-sum(phigh[:i-1]) for i in range(1,13)]\n",
"\n",
"lines = open(\"SIMbig/stackf/batch_1.haplotypes.tsv\").readlines()\n",
"cnts = [int(field.strip().split(\"\\t\")[1]) for field in lines[1:]]\n",
"sno = [cnts.count(i) for i in range(1,13)]\n",
"S0 = [int(sum(sno)-sum(sno[:i-1])) for i in range(1,13)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 77
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print array([P0,P4,S0]).transpose()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[[ 135884. 85618. 135885.]\n",
" [ 121263. 45613. 121228.]\n",
" [ 106282. 45613. 106261.]\n",
" [ 82504. 45613. 82487.]\n",
" [ 59113. 5609. 59106.]\n",
" [ 43673. 5609. 43670.]\n",
" [ 29057. 5609. 29054.]\n",
" [ 16368. 5609. 16367.]\n",
" [ 8528. 285. 8528.]\n",
" [ 4206. 285. 4206.]\n",
" [ 1511. 285. 1511.]\n",
" [ 285. 285. 285.]]\n"
]
}
],
"prompt_number": 78
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"100*(253./500000)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 79,
"text": [
"0.050600000000000006"
]
}
],
"prompt_number": 79
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment