dereneaton/tutorial_ddRAD_3.0.4.ipynb

## tutorial_pairddRAD_3.0.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:9bbfd6454f4ce02b4db4acbbd8201403cb3b130aafed252e6a44e308d458577a"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Tutorial: paired-end ddRAD analysis\n",
      "### _pyRAD_ v.3.0.1"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "-----------------------------  \n",
      "\n",
      "### __Topics__:  \n",
      "\n",
      " + Setup of params file for paired-end ddRAD (pairddrad)\n",
      " + Assemble simulated data set\n",
      " + Visualize ideal results on simulated data\n",
      "\n",
      "-----------------------------  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "What is ddRAD?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The double digest library preparation method was developed and described by [Peterson et al. 2012](http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0037135). Here I discuss ___paired-end ddRAD___ data and describe my recommendations for how to set up a params file to analyze them in _pyRAD_. "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A ddRAD library is prepared by cutting genomic DNA with two different restriction enzymes and selecting the intervening fragments that fall within a certain size window. These fragments carry overhangs from the respective cut sites on either end. One end has a barcode + Illumina adapter ligated, and the other end has only the reverse Illumina adapter ligated. The first reads may come in one or multiple files with \"\\_R1\\_\" in the name, and the second reads come in separate files with \"\\_R2\\_\" in the name. Each second-read file contains the mates of the reads in the corresponding first-read file, in exactly the same order.\n",
      "\n",
      "-------------------"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![alt](https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/diag_ddradpair.svg)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---------------------  \n",
      "\n",
      "Your data will likely come to you non-demultiplexed (meaning not sorted by which individual the reads belong to). You can demultiplex the reads according to their barcodes using _pyRAD_ or separate software.  \n",
      "\n",
      "A number of programs are available to check for overlap of paired-end reads, and you can run your data through one of them before inputting them into _pyRAD_. At the time of writing, I recommend the software PEAR (https://github.com/xflouris/PEAR), which I demonstrate on pairddrad data in a separate [tutorial here](http://nbviewer.ipython.org/gist/dereneaton/dc6241083c912519064e/tutorial_pairddRAD_3.0-merged.ipynb).\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The example data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For this tutorial I simulated paired-end ddRAD loci on a 12-taxon species tree, shown below. You can download the data using the script below and assemble them by following along with the instructions."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![](https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/fig_tree4sims.svg)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "## download the data\n",
      "wget -q http://www.dereneaton.com/downloads/simpairddrads.zip -O simpairddrads.zip\n",
      "## unzip the data\n",
      "unzip simpairddrads.zip"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Archive:  simpairddrads.zip\n",
        "  inflating: simpairddrads_R1_.fastq.gz  \n",
        "  inflating: simpairddrads_R2_.fastq.gz  \n",
        "  inflating: simpairddrads.barcodes  \n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---------------   \n",
      "# Assembling the data set with _pyRAD_"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We first create an empty params.txt input file for the _pyRAD_ analysis.  \n",
      "The following command will create a template which we will fill in with all relevant parameter settings for the analysis."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pyrad -n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\tnew params.txt file created\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "-------------  \n",
      "\n",
      "Take a look at the default options. Each line designates a parameter and contains a \"##\" symbol, after which comes a description of the parameter and any comments you wish to add. The affected step for each parameter is shown in parentheses. The first 14 parameters are required. Numbers 15-37 are optional."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "cat params.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "==** parameter inputs for pyRAD version 3.0.0  **==========================  affected step ==\n",
        "./                          ## 1. Working directory                                 (all)\n",
        "./*.fastq.gz                ## 2. Loc. of non-demultiplexed files (if not line 16)  (s1)\n",
        "./*.barcodes                ## 3. Loc. of barcode file (if not line 16)             (s1)\n",
        "vsearch                     ## 4. command (or path) to call vsearch (or usearch)    (s3,s6)\n",
        "muscle                      ## 5. command (or path) to call muscle                  (s3,s7)\n",
        "TGCAG                       ## 6. Restriction overhang (e.g., C|TGCAG -> TGCAG)     (s1,s2)\n",
        "2                           ## 7. N processors (parallel)                           (all)\n",
        "6                           ## 8. Mindepth: min coverage for a cluster              (s4,s5)\n",
        "4                           ## 9. NQual: max # sites with qual < 20 (line 18)       (s2)\n",
        ".88                         ## 10. Wclust: clustering threshold as a decimal         (s3,s6)\n",
        "rad                         ## 11. Datatype: rad,gbs,ddrad,pairgbs,pairddrad,merge   (all)\n",
        "4                           ## 12. MinCov: min samples in a final locus             (s7)\n",
        "3                           ## 13. MaxSH: max inds with shared hetero site          (s7)\n",
        "c88d6m4p3                   ## 14. Prefix name for final output (no spaces)         (s7)\n",
        "==== optional params below this line ===================================  affected step ==\n",
        "                       ## 15.opt.: select subset (prefix* only selector)            (s2-s7)\n",
        "                       ## 16.opt.: add-on (outgroup) taxa (list or prefix*)         (s6,s7)\n",
        "                       ## 17.opt.: exclude taxa (list or prefix*)                   (s7)\n",
        "                       ## 18.opt.: loc. of de-multiplexed data                      (s2)\n",
        "                       ## 19.opt.: maxM: N mismatches in barcodes (def= 1)          (s1)\n",
        "                       ## 20.opt.: phred Qscore offset (def= 33)                    (s2)\n",
        "                       ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict   (s2)\n",
        "                       ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5)\n",
        "                       ## 23.opt.: maxN: max Ns in a cons seq (def=5)               (s5)\n",
        "                       ## 24.opt.: maxH: max heterozyg. sites in cons seq (def=5)   (s5)\n",
        "                       ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5)\n",
        "                       ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100)         (s7)\n",
        "                       ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7)\n",
        "                       ## 28.opt.: random number seed (def. 112233)              (s3,s6,s7)\n",
        "                       ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7)\n",
        "                       ## 30.opt.: output formats: p,n,a,s,v,u,t,m,k,g,* (see docs) (s7)\n",
        "                       ## 31.opt.: call maj. consens if depth < stat. limit (def=0) (s5)\n",
        "                       ## 32.opt.: keep trimmed reads (def=0). Enter min length.    (s2)\n",
        "                       ## 33.opt.: max stack size (int), def= max(500,mean+2*SD)    (s3)\n",
        "                       ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3)\n",
        "                       ## 35.opt.: use hierarchical clustering (def.=0, 1=yes)      (s6)\n",
        "                       ## 36.opt.: repeat masking (def.=1='dust' method, 0=no)      (s3,s6)\n",
        "                       ## 37.opt.: vsearch threads per job (def.=6; see docs)       (s3,s6)\n",
        "==== optional: list group/clade assignments below this line (see docs) ==================\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---------------   \n",
      "\n",
      "### Edit the params file"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "I will use the script below to substitute new values, but you should simply __use any text editor__ to make changes. For this analysis I made the following changes from the defaults:  \n",
      "\n",
      "--------------------  \n",
      "\n",
      "     6. set the two restriction enzymes used to generate the ddRAD data\n",
      "     10. lowered clustering threshold to .85\n",
      "     11. set datatype to pairddrad\n",
      "     14. changed the output name prefix\n",
      "     19. mismatches for demultiplexing set to 0, exact match.\n",
      "     24. Raised maxH. Lower is better for filtering paralogs.\n",
      "     30. added additional output formats (e.g., nexus,SNPs,STRUCTURE)\n",
      "\n",
      "--------------------  \n",
      "    "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "sed -i '/## 6. /c\\TGCAG,AATT                 ## 6. cutsites... ' ./params.txt\n",
      "sed -i '/## 10. /c\\.85                       ## 10. lowered clust thresh' ./params.txt\n",
      "sed -i '/## 11. /c\\pairddrad                 ## 11. datatype... ' ./params.txt\n",
      "sed -i '/## 14. /c\\c85d6m4p3                 ## 14. prefix name ' ./params.txt\n",
      "sed -i '/## 19./c\\0                     ## 19. errbarcode... ' ./params.txt\n",
      "sed -i '/## 24./c\\10                    ## 24. maxH... ' ./params.txt\n",
      "sed -i '/## 30./c\\*                     ## 30. outformats... ' ./params.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "-----------------  \n",
      "\n",
      "Let's take a look at the edited params.txt file\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "cat params.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "==** parameter inputs for pyRAD version 3.0.0  **==========================  affected step ==\n",
        "./                          ## 1. Working directory                                 (all)\n",
        "./*.fastq.gz                ## 2. Loc. of non-demultiplexed files (if not line 16)  (s1)\n",
        "./*.barcodes                ## 3. Loc. of barcode file (if not line 16)             (s1)\n",
        "vsearch                     ## 4. command (or path) to call vsearch (or usearch)    (s3,s6)\n",
        "muscle                      ## 5. command (or path) to call muscle                  (s3,s7)\n",
        "TGCAG,AATT                 ## 6. cutsites... \n",
        "2                           ## 7. N processors (parallel)                           (all)\n",
        "6                           ## 8. Mindepth: min coverage for a cluster              (s4,s5)\n",
        "4                           ## 9. NQual: max # sites with qual < 20 (line 18)       (s2)\n",
        ".85                       ## 10. lowered clust thresh\n",
        "pairddrad                 ## 11. datatype... \n",
        "4                           ## 12. MinCov: min samples in a final locus             (s7)\n",
        "3                           ## 13. MaxSH: max inds with shared hetero site          (s7)\n",
        "c85d6m4p3                 ## 14. prefix name \n",
        "==== optional params below this line ===================================  affected step ==\n",
        "                       ## 15.opt.: select subset (prefix* only selector)            (s2-s7)\n",
        "                       ## 16.opt.: add-on (outgroup) taxa (list or prefix*)         (s6,s7)\n",
        "                       ## 17.opt.: exclude taxa (list or prefix*)                   (s7)\n",
        "                       ## 18.opt.: loc. of de-multiplexed data                      (s2)\n",
        "0                     ## 19. errbarcode... \n",
        "                       ## 20.opt.: phred Qscore offset (def= 33)                    (s2)\n",
        "                       ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict   (s2)\n",
        "                       ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5)\n",
        "                       ## 23.opt.: maxN: max Ns in a cons seq (def=5)               (s5)\n",
        "10                    ## 24. maxH... \n",
        "                       ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5)\n",
        "                       ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100)         (s7)\n",
        "                       ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7)\n",
        "                       ## 28.opt.: random number seed (def. 112233)              (s3,s6,s7)\n",
        "                       ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7)\n",
        "*                     ## 30. outformats... \n",
        "                       ## 31.opt.: call maj. consens if depth < stat. limit (def=0) (s5)\n",
        "                       ## 32.opt.: keep trimmed reads (def=0). Enter min length.    (s2)\n",
        "                       ## 33.opt.: max stack size (int), def= max(500,mean+2*SD)    (s3)\n",
        "                       ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3)\n",
        "                       ## 35.opt.: use hierarchical clustering (def.=0, 1=yes)      (s6)\n",
        "                       ## 36.opt.: repeat masking (def.=1='dust' method, 0=no)      (s3,s6)\n",
        "                       ## 37.opt.: vsearch threads per job (def.=6; see docs)       (s3,s6)\n",
        "==== optional: list group/clade assignments below this line (see docs) ==================\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Step 1: De-multiplexing the data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "----------------  \n",
      "  \n",
      "Four examples of acceptable input file name formats for paired-end data: \n",
      "\n",
      "       1. xxxx_R1_001.fastq      xxxx_R2_001.fastq\n",
      "       2. xxxx_R1_001.fastq.gz   xxxx_R2_001.fastq.gz\n",
      "       3. xxxx_R1_001.fq.gz      xxxx_R2_001.fq.gz\n",
      "       4. xxxx_R1_001.fq         xxxx_R2_001.fq\n",
      "\n",
      "The file ending can be .fastq or .fq, optionally gzipped (.gz).  \n",
      "There should be a unique name or number shared by each pair, along with the characters \"\\_R1\\_\" and \"\\_R2\\_\".  \n",
      "For every file name with \"\\_R1\\_\" there should be a corresponding \"\\_R2\\_\" file. \n",
      "\n",
      "If your data are _already_ demultiplexed skip step 1 and see step 2 below.  \n",
      "\n",
      "-----------------  \n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pyrad -p params.txt -s 1"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        "\n",
        "     ------------------------------------------------------------\n",
        "      pyRAD : RADseq for phylogenetics & introgression analyses\n",
        "     ------------------------------------------------------------\n",
        "\n",
        "\n",
        "\tstep 1: sorting reads by barcode\n",
        "\t ."
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can look at the stats output for this step:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "cat stats/s1.sorting.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "file    \tNreads\tcut_found\tbar_matched\n",
        "simpairddrads_.fastq.gz\t240000\t240000\t240000\n",
        "\n",
        "\n",
        "sample\ttrue_bar\tobs_bars\tN_obs\n",
        "2G0    \tAATATT    \tAATATT\t20000   \n",
        "1D0    \tATATGG    \tATATGG\t20000   \n",
        "1B0    \tATGAAG    \tATGAAG\t20000   \n",
        "1A0    \tCATCAT    \tCATCAT\t20000   \n",
        "3L0    \tGAGTTG    \tGAGTTG\t20000   \n",
        "1C0    \tGTATAA    \tGTATAA\t20000   \n",
        "3K0    \tGTGGAA    \tGTGGAA\t20000   \n",
        "2H0    \tGTTTAT    \tGTTTAT\t20000   \n",
        "3I0    \tTATATA    \tTATATA\t20000   \n",
        "2E0    \tTGAAAG    \tTGAAAG\t20000   \n",
        "3J0    \tTGTAGT    \tTGTAGT\t20000   \n",
        "2F0    \tTTGGTT    \tTTGGTT\t20000   \n",
        "\n",
        "nomatch  \t_            \t0\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The de-multiplexed reads are written to a new file for each individual in a new directory called fastq/, created within your working directory."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash \n",
      "ls fastq/"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1A0_R1.fq.gz\n",
        "1A0_R2.fq.gz\n",
        "1B0_R1.fq.gz\n",
        "1B0_R2.fq.gz\n",
        "1C0_R1.fq.gz\n",
        "1C0_R2.fq.gz\n",
        "1D0_R1.fq.gz\n",
        "1D0_R2.fq.gz\n",
        "2E0_R1.fq.gz\n",
        "2E0_R2.fq.gz\n",
        "2F0_R1.fq.gz\n",
        "2F0_R2.fq.gz\n",
        "2G0_R1.fq.gz\n",
        "2G0_R2.fq.gz\n",
        "2H0_R1.fq.gz\n",
        "2H0_R2.fq.gz\n",
        "3I0_R1.fq.gz\n",
        "3I0_R2.fq.gz\n",
        "3J0_R1.fq.gz\n",
        "3J0_R2.fq.gz\n",
        "3K0_R1.fq.gz\n",
        "3K0_R2.fq.gz\n",
        "3L0_R1.fq.gz\n",
        "3L0_R2.fq.gz\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "An individual file looks like this:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "## FIRST READS file\n",
      "gunzip -c fastq/1A0_R1.fq.gz | head -n 12 | cut -c 1-80"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "@lane1_fakedata0_R1_0 1:N:0:\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTAC\n",
        "+\n",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\n",
        "@lane1_fakedata0_R1_1 1:N:0:\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTAC\n",
        "+\n",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\n",
        "@lane1_fakedata0_R1_2 1:N:0:\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTAC\n",
        "+\n",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "## SECOND READS file\n",
      "gunzip -c fastq/1A0_R2.fq.gz | head -n 12 | cut -c 1-80"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "@lane1_fakedata0_R2_0 1:N:0:\n",
        "AATTATTAACCCAGACGAGTTGTCAGAGAGTGCTTTACCCCTTCTACGGTCCTCTGGGAGATTCGTGTGATTGTACTGGT\n",
        "+\n",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\n",
        "@lane1_fakedata0_R2_1 1:N:0:\n",
        "AATTATTAACCCAGACGAGTTGTCAGAGAGTGCTTTACCCCTTCTACGGTCCTCTGGGAGATTCGTGTGATTGTACTGGT\n",
        "+\n",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\n",
        "@lane1_fakedata0_R2_2 1:N:0:\n",
        "AATTATTAACCCAGACGAGTTGTCAGAGAGTGCTTTACCCCTTCTACGGTCCTCTGGGAGATTCGTGTGATTGTACTGGT\n",
        "+\n",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "-------------------------\n",
      "\n",
      "The reads were sorted into a separate file for the first (R1) and second (R2) reads for each individual.  \n",
      "\n",
      "__If your data were previously de-multiplexed__ you need the following things before step 2:  \n",
      "\n",
      "+  your sorted file names should be formatted similarly to those above, but with your sample names substituted for 1A0, 1B0, etc.   \n",
      "+  the files can be gzipped (.gz) or not (.fq or .fastq).  \n",
      "+  the barcode should be removed (not on left side of reads)  \n",
      "+  the restriction site should _not_ be removed, but if it is, enter a '@' symbol before the location of your sorted data.\n",
      "\n",
      "+  __Enter on line 18 of the params file the location of your sorted data.__\n",
      "\n",
      "--------------------  \n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Step 2: Quality filtering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Next we apply the quality filtering.  \n",
      "\n",
      "+ We left the quality filter (line 21) at the default value of 0, meaning that it will only filter based on quality scores and will not look for the presence of Illumina adapters. Low-quality sites are converted to Ns, and any read with more than X Ns is discarded, where X is the number set on line 9 of the params file. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pyrad -p params.txt -s 2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        "\n",
        "     ------------------------------------------------------------\n",
        "      pyRAD : RADseq for phylogenetics & introgression analyses\n",
        "     ------------------------------------------------------------\n",
        "\n",
        "\n",
        "\tstep 2: quality filtering \n",
        "\t............"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Statistics for the number of reads that passed filtering can be found in the stats/ directory. \n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash \n",
      "cat stats/s2.rawedit.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "sample\tNreads\texclude\ttrimmed\tpassed\n",
        "3I0\t20000\t0\t0\t20000\n",
        "1B0\t20000\t0\t0\t20000\n",
        "2H0\t20000\t0\t0\t20000\n",
        "3J0\t20000\t0\t0\t20000\n",
        "3K0\t20000\t0\t0\t20000\n",
        "2G0\t20000\t0\t0\t20000\n",
        "2E0\t20000\t0\t0\t20000\n",
        "1C0\t20000\t0\t0\t20000\n",
        "1D0\t20000\t0\t0\t20000\n",
        "2F0\t20000\t0\t0\t20000\n",
        "3L0\t20000\t0\t0\t20000\n",
        "1A0\t20000\t0\t0\t20000\n",
        "\n",
        "    Nreads = total number of reads for a sample\n",
        "    exclude = reads that were excluded\n",
        "    trimmed = reads that had adapter trimmed but were kept\n",
        "    passed = total kept reads\n",
        "    \n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The filtered data files are converted to fasta format and written to a directory called edits/.  I show this below:\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "ls edits/"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1A0.edit\n",
        "1B0.edit\n",
        "1C0.edit\n",
        "1D0.edit\n",
        "2E0.edit\n",
        "2F0.edit\n",
        "2G0.edit\n",
        "2H0.edit\n",
        "3I0.edit\n",
        "3J0.edit\n",
        "3K0.edit\n",
        "3L0.edit\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "------------  \n",
      "\n",
      "Here you can see that the first and second reads have been concatenated into a single read separated by 'nnnn' in these files, and the read quality scores have now been discarded. \n",
      "\n",
      "----------  \n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "head -n 8 edits/1A0.edit"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ">1A0_0_pair\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTACCGAATGGTTGCGAnnnnTCATAAAACATTACTCTAAGACCAGTACAATCACACGAATCTCCCAGAGGACCGTAGAAGGGGTAAAGCACTCTCTGACAACTCGTCTGGGTTAATAATT\n",
        ">1A0_1_pair\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTACCGAATGGTTGCGAnnnnTCATAAAACATTACTCTAAGACCAGTACAATCACACGAATCTCCCAGAGGACCGTAGAAGGGGTAAAGCACTCTCTGACAACTCGTCTGGGTTAATAATT\n",
        ">1A0_2_pair\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTACCGAATGGTTGCGAnnnnTCATAAAACATTACTCTAAGACCAGTACAATCACACGAATCTCCCAGAGGACCGTAGAAGGGGTAAAGCACTCTCTGACAACTCGTCTGGGTTAATAATT\n",
        ">1A0_3_pair\n",
        "TGCAGTTACCTACTGTGATCGCCTAGACGGCAGTAAAACCGATGAGGCCCTCTCTAGAGTAACGGCTGAACTTATCCTACCGAATGGTTGCGAnnnnTCATAAAACATTACTCTAAGACCAGTACAATCACACGAATCTCCCAGAGGACCGTAGAAGGGGTAAAGCACTCTCTGACAACTCGTCTGGGTTAATAATT\n"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "--------------   \n",
      "\n",
      "## Step 3: Within-sample clustering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "From here you have two options for how to proceed. The first is to treat these concatenated reads like single-end ddRAD sequences and cluster them as concatenated reads. To do this, change the datatype option in the params file from \"pairddrad\" to \"ddrad\", and proceed with steps 3-7. It will detect and remove the 'nnnn' separator and cluster the long reads. \n",
      "\n",
      "The more complex option, and the one I prefer, is to cluster the reads separately, which I call the \"split clustering\" method. This will do the following:  \n",
      "\n",
      "+ de-replicate concatenated reads \n",
      "+ split them and cluster first reads only \n",
      "+ match second reads back to firsts\n",
      "+ align second read clusters separately from first-read clusters\n",
      "+ discard loci for which second reads align poorly (incomplete digestion or paralogs)\n",
      "\n",
      "This is illustrated graphically below:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![](https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/pairclustermethod.svg)"
     ]
    },
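    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The first two steps in the list above (de-replicating and splitting) can be sketched in plain Python. This is only a toy illustration, not pyRAD's own code: the function names `dereplicate` and `split_pair` are hypothetical, the reads are made up and far shorter than real data, and the only thing assumed about the input is the 'nnnn' separator seen in the edits file."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from collections import Counter\n",
      "\n",
      "def dereplicate(reads):\n",
      "    # collapse identical concatenated reads and count each one\n",
      "    return Counter(reads)\n",
      "\n",
      "def split_pair(read, sep='nnnn'):\n",
      "    # split one concatenated read into (first read, second read)\n",
      "    first, second = read.split(sep, 1)\n",
      "    return first, second\n",
      "\n",
      "# toy reads, much shorter than real data\n",
      "reads = ['TGCAGTTACCnnnnTCATAAAACA'] * 3 + ['TGCAGTTACGnnnnTCATAAAACA']\n",
      "for seq, count in dereplicate(reads).most_common():\n",
      "    first, second = split_pair(seq)\n",
      "    print('%d %s %s' % (count, first, second))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },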
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "The screen captures below show split clustering performed on an empirical paired ddRAD data set: \n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The well-aligned clusters are written to a .clustS.gz file"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![](https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/goodpairs.png)"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The poorly aligned pairs, which are excluded from further analysis, are written to a separate file ending with .badpairs.gz."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![](https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/badpairs.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "--------------  \n",
      "\n",
      "The benefit of this method over clustering concatenated first+second reads is that second reads can be retained even if they are more divergent than the clustering threshold or contain many low-quality base calls (Ns), thus potentially yielding more SNPs. Also, for very large data sets it is _much_ faster to cluster shorter reads. The drawbacks are that it discards sequences for which the first and second reads do not appear to be from the same DNA fragment, whereas the concatenated clustering method _might_ cluster them separately, and that even a single incompletely digested second read can cause an otherwise good cluster to be discarded."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The split clustering method filters the alignment of second reads based on the number of indels in the alignment. This number can be set on line 27 of the params file. The default values are 3,6,99,99, meaning that for within-sample clusters we allow 3 indels in first-read clusters and 6 in second-read clusters, and for across-sample clusters we allow 99 in first-read clusters and 99 in second-read clusters. If only two numbers are set (e.g., 3,10) this is equivalent to 3,3,10,10. "
     ]
    },
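    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the two-number shorthand concrete, here is a small sketch of how such a setting could be expanded into four limits. The helper `expand_indel_filter` is hypothetical and is not part of pyRAD; it only mirrors the rule described above (two values are doubled into within-sample and across-sample pairs)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def expand_indel_filter(value):\n",
      "    # hypothetical helper, not pyRAD code: returns the four limits as\n",
      "    # [within 1st, within 2nd, across 1st, across 2nd]\n",
      "    nums = [int(x) for x in value.split(',')]\n",
      "    if len(nums) == 2:\n",
      "        # e.g. '3,10' is treated the same as '3,3,10,10'\n",
      "        nums = [nums[0], nums[0], nums[1], nums[1]]\n",
      "    if len(nums) != 4:\n",
      "        raise ValueError('expected 2 or 4 comma-separated integers')\n",
      "    return nums\n",
      "\n",
      "print(expand_indel_filter('3,6,99,99'))\n",
      "print(expand_indel_filter('3,10'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },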
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "## we are using the split-clustering method since the \n",
      "## datatype is still set to pairddrad\n",
      "pyrad -p params.txt -s 3 "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        "\n",
        "     ------------------------------------------------------------\n",
        "      pyRAD : RADseq for phylogenetics & introgression analyses\n",
        "     ------------------------------------------------------------\n",
        "\n",
        "\n",
        "\tde-replicating files for clustering...\n",
        "\n",
        "\tstep 3: within-sample clustering of 12 samples at \n",
        "\t        '.85' similarity. Running 2 parallel jobs\n",
        "\t \twith up to 6 threads per job. If needed, \n",
        "\t\tadjust to avoid CPU and MEM limits\n",
        "\n",
        "\tsample 2E0 finished, 1000 loci\n",
        "\tsample 1A0 finished, 1000 loci\n",
        "\tsample 1C0 finished, 1000 loci\n",
        "\tsample 3J0 finished, 1000 loci\n",
        "\tsample 3I0 finished, 1000 loci\n",
        "\tsample 1B0 finished, 1000 loci\n",
        "\tsample 3L0 finished, 1000 loci\n",
        "\tsample 2H0 finished, 1000 loci\n",
        "\tsample 2F0 finished, 1000 loci\n",
        "\tsample 3K0 finished, 1000 loci\n",
        "\tsample 1D0 finished, 1000 loci\n",
        "\tsample 2G0 finished, 1000 loci\n"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "head -n 23 stats/s3.clusters.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "taxa\ttotal\tdpt.me\tdpt.sd\td>5.tot\td>5.me\td>5.sd\tbadpairs\n",
        "1A0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "1B0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "1C0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "1D0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "2E0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "2F0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "2G0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "2H0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "3I0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "3J0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "3K0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "3L0\t1000\t20.0\t0.0\t1000\t20.0\t0.0\t0\n",
        "\n",
        "    ## total = total number of clusters, including singletons\n",
        "    ## dpt.me = mean depth of clusters\n",
        "    ## dpt.sd = standard deviation of cluster depth\n",
        "    ## >N.tot = number of clusters with depth greater than N\n",
        "    ## >N.me = mean depth of clusters with depth greater than N\n",
        "    ## >N.sd = standard deviation of cluster depth for clusters with depth greater than N\n",
        "    ## badpairs = mismatched 1st & 2nd reads (only for paired ddRAD data)\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Steps 4 & 5: Consensus base calling"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We next make consensus base calls for each cluster within each individual. First we estimate the error rate and heterozygosity within each sample:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pyrad -p params.txt -s 4"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        "\n",
        "     ------------------------------------------------------------\n",
        "      pyRAD : RADseq for phylogenetics & introgression analyses\n",
        "     ------------------------------------------------------------\n",
        "\n",
        "\n",
        "\tstep 4: estimating error rate and heterozygosity\n",
        "\t............"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Calling consensus sequences applies a number of filters, as listed in the params file, to remove potential paralogs or highly repetitive markers from the data set. For paired-end data the maxN filter applies only to first reads, since we don't mind allowing low-quality base calls in second reads. The maxH filter applies to both reads together. The diploid filter (which allows at most 2 haplotypes in a consensus sequence) also applies to the two reads together."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pyrad -p params.txt -s 5"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        "\n",
        "     ------------------------------------------------------------\n",
        "      pyRAD : RADseq for phylogenetics & introgression analyses\n",
        "     ------------------------------------------------------------\n",
        "\n",
        "\n",
        "\tstep 5: created consensus seqs for 12 samples, using H=0.00141 E=0.00050\n",
        "\t............"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "cat stats/s5.consens.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "taxon\tnloci\tf1loci\tf2loci\tnsites\tnpoly\tpoly\n",
        "2G0\t1000\t1000\t1000\t183006\t287\t0.0015683\n",
        "3I0\t1000\t1000\t1000\t183003\t268\t0.0014645\n",
        "2H0\t1000\t1000\t1000\t183003\t272\t0.0014863\n",
        "3K0\t1000\t1000\t1000\t183008\t266\t0.0014535\n",
        "3J0\t1000\t1000\t1000\t183008\t274\t0.0014972\n",
        "1B0\t1000\t1000\t1000\t183010\t269\t0.0014699\n",
        "1D0\t1000\t1000\t1000\t183008\t245\t0.0013387\n",
        "2E0\t1000\t1000\t1000\t183009\t232\t0.0012677\n",
        "1C0\t1000\t1000\t1000\t183008\t267\t0.001459\n",
        "2F0\t1000\t1000\t1000\t183004\t264\t0.0014426\n",
        "1A0\t1000\t1000\t1000\t183004\t245\t0.0013388\n",
        "3L0\t1000\t1000\t1000\t183004\t241\t0.0013169\n",
        "\n",
        "    ## nloci = number of loci\n",
        "    ## f1loci = number of loci with >N depth coverage\n",
        "    ## f2loci = number of loci with >N depth and passed paralog filter\n",
        "    ## nsites = number of sites across f loci\n",
        "    ## npoly = number of polymorphic sites in nsites\n",
        "    ## poly = frequency of polymorphic sites\n"
       ]
      }
     ],
     "prompt_number": 19
    },
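    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a quick sanity check on this table, the poly column appears to be npoly divided by nsites. The sketch below recomputes it for two rows so it can be compared with the output above. The numbers are copied by hand from that output, and the dictionary name `rows` is just for illustration."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# (nsites, npoly) copied by hand from stats/s5.consens.txt above\n",
      "rows = {'2G0': (183006, 287), '3I0': (183003, 268)}\n",
      "for taxon in sorted(rows):\n",
      "    nsites, npoly = rows[taxon]\n",
      "    # float() avoids integer division under Python 2\n",
      "    print('%s %.7f' % (taxon, float(npoly) / nsites))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },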
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Step 6: Across-sample clustering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This step clusters consensus sequences across samples. It can take a very long time for very large data sets. If run normally it will print its progress to the screen so you can judge how long it might take. This example will take less than a minute."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pyrad -p params.txt -s 6"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "vsearch v1.0.1_linux_x86_64, 62.8GB RAM, 40 cores\n",
        "https://github.com/torognes/vsearch\n",
        "\n",
        "\n",
        "\tfinished clustering\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        "\n",
        "     ------------------------------------------------------------\n",
        "      pyRAD : RADseq for phylogenetics & introgression analyses\n",
        "     ------------------------------------------------------------\n",
        "\n",
        "\n",
        "\tstep 6: clustering across 12 samples at '.85' similarity \n",
        "\n",
        "Reading file /home/deren/Dropbox/Public/PyRAD_TUTORIALS/tutorial_pairddRAD/clust.85/cat.firsts_ 0%  \r",
        "Reading file /home/deren/Dropbox/Public/PyRAD_TUTORIALS/tutorial_pairddRAD/clust.85/