josenavas/hyde_et_al_komodo_microbiome.ipynb

## hyde_et_al_komodo_microbiome.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:2cce2b7d34fdafbb608a096a80079ef1e74f935f37e466ba31257040674a6b26"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "The oral and skin microbiomes of captive Komodo Dragons are significantly shared with their habitat"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This IPython notebook contains the commands for the analysis described in ER Hyde et al., The oral and skin microbiomes of captive Komodo Dragons are significantly shared with their habitat."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Overview of the Dataset and Obtaining Raw Data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This dataset represents the largest captive Komodo dragon microbiome dataset to date. Twelve zoos across the U.S. (Zoo Atlanta, ABQ BioPark, Bronx Zoo, Denver Zoo, Fort Worth Zoo, Gladys Porter Zoo, Houston Zoo, Honolulu Zoo, Jacksonville Zoo and Gardens, Los Angeles Zoo, the Virginia Aquarium, and Woodland Park Zoo) agreed to sample the skin, salivary, and fecal microbiomes of their resident Komodo dragons. Additionally, the Denver and Los Angeles Zoos collected samples from the Komodos' environments (soil, rock, plastic, etc.). Other reptiles were also sampled (Gray's Monitor at the LA Zoo, *Varanus rudicollis* and *Varanus indicus* at the Greeley Zoo, and wild-caught rattlesnakes in Colorado); however, due to small sample sizes, samples from these reptiles were not included in the current analysis. The V4 region of the 16S rRNA gene was amplified and sequenced on the Illumina HiSeq platform; samples were barcoded, which allowed for multiplexed sequencing. You can download the raw sequencing files and mapping files used for this study before getting started with this Notebook.\n",
      "\n",
      "Note: If you are using an Apple computer, use the `curl` command as OS X does not come with wget pre-installed."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Download the data using wget (if you are using OS X do not run this command)\n",
      "!wget ftp://ftp.microbio.me/pub/Hyde_et_al_captive_komodo_dragon_microbiome.tgz\n",
      "# Or alternativelly, if you're using OS X\n",
      "#!curl -OL ftp://ftp.microbio.me/pub/Hyde_et_al_captive_komodo_dragon_microbiome.tgz\n",
      "\n",
      "# Untar the files\n",
      "tar xzvf Hyde_et_al_captive_komodo_dragon_microbiome.tgz"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Demultiplexing and Quality Filtering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We first need to demultiplex and quality filter the Illumina sequences. QIIME demultiplexes and quality filters using a single script: split_libraries_fastq.py. The inputs of this script are the sequencing outputs (raw reads and barcode fastq files) and the mapping file. We changed the maximum unacceptable Phred score from the default value of q=3 to a value of q=19 to ensure high-quality reads; the remaining defaults (r=3, p=75%, and n=0) were used. We also used the --rev_comp_mapping_barcodes option as our mapping file contains the reverse compliment of the barcodes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!split_libraries_fastq.py -i $PWD/komodo/KomodoDragon_16sV4_L001_R1_001.fastq \\\n",
      "                          -b $PWD/komodo/KomodoDragon_16sV4_L001_I1_001.fastq \\\n",
      "                          -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                          -q 19 --rev_comp_mapping_barcodes \\\n",
      "                          -o $PWD/sl_out"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
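    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To put the `-q 19` setting in perspective, the short sketch below converts Phred scores into per-base error probabilities using the standard relation P = 10^(-Q/10). It is only an illustration and not part of the original analysis; the helper name `phred_to_error_probability` is our own. Because `-q` is the maximum *unacceptable* score, `-q 19` means that only bases with a score of 20 or higher (an error probability of at most 1%) are treated as high quality."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def phred_to_error_probability(q):\n",
      "    # Standard Phred relation: P(error) = 10 ** (-Q / 10)\n",
      "    return 10 ** (-q / 10.0)\n",
      "\n",
      "# -q is the maximum unacceptable score, so the lowest accepted score is q + 1\n",
      "for max_unacceptable in (3, 19):\n",
      "    min_accepted = max_unacceptable + 1\n",
      "    p_error = phred_to_error_probability(min_accepted)\n",
      "    print('-q %d: lowest accepted score Q%d, error probability %.4f'\n",
      "          % (max_unacceptable, min_accepted, p_error))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },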
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "OTU Picking and Tree Building"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "After preprocessing the raw data, we then cluster the sequences into Operational Taxonomic Units (OTUs), assign taxonomy to each OTU, build a phylogenetic tree, and make an OTU table. There are three approaches to OTU picking: closed-reference, open-reference, and *de novo* OTU picking. Closed-reference OTU picking is the fastest method. Sequences are aligned against a reference database, and all sequences that do not align against the reference database are discarded. *De novo* OTU picking is the most time-consuming method, as sequences are aligned against each other and clustered into OTUs *de novo*. Open-reference OTU picking is a mixed of closed-reference and *de novo* OTU picking, as it performs an initial closed-reference OTU picking step, followed by a *de novo* OTU picking step on those reads that did not cluster to the database. We first performed closed-reference OTU picking, using uclust as the clustering algorithm (QIIME default) and the May 2013 release of the Greengenes database as the reference database. A sequence identity of 97% was used for clustering."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pick_closed_reference_otus.py -i $PWD/sl_out/seqs.fna -o $PWD/closed_ref_otus"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Since closed-reference OTU picking discards the sequences that did not match the reference, we will check how many sequences clustered against the reference database:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from biom import load_table\n",
      "from qiime.util import count_seqs_in_filepaths\n",
      "\n",
      "closed_ref_table = load_table('closed_ref_otus/otu_table.biom')\n",
      "clustered_seqs = closed_ref_table.matrix_data.sum()\n",
      "count_data, _, _ = count_seqs_in_filepaths(['sl_out/seqs.fna'])\n",
      "num_input_seqs = count_data[0][0][0]\n",
      "clustered_seqs / num_input_seqs * 100"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can see that only 72.7% of sequences in the dataset aligned to the reference database. To recover more information from the dataset, we can perform an open-reference OTU picking workflow, again using uclust as the underlying clustering algorithm and the May 2013 release of the Greengenes database as the reference."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pick_open_reference_otus.py -i $PWD/sl_out/seqs.fna -o $PWD/open_ref_otus"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now inspect the resulting OTU table to check how many sequences we kept using the open reference method."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "open_ref_table = load_table('open_ref_otus/otu_table_mc2_w_tax.biom')\n",
      "clustered_seqs = open_ref_table.matrix_data.sum()\n",
      "clustered_seqs / num_input_seqs * 100"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can see that in the open reference case we recover 95.2% of the sequences in the dataset. Since we kept more information with the open reference approach, this is the table that we are going to use for Downstream Analyses."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "OTU Table Filtering Prior to Downstream Analyses"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Before doing any Downstream Analyses, we are going to refine our data by removing low abundance, spurious OTUs which may be product of PCR or sequencing error by removing those OTUs that each represent less than 0.005% of the total number of sequences in the dataset (as recommended by Navas-Molina et al., 2013, Methods in Enzymology vol 531)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_otus_from_otu_table.py -i $PWD/open_ref_otus/otu_table_mc2_w_tax.biom \\\n",
      "                               -o $PWD/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \\\n",
      "                               --min_count_fraction 0.00005"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
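    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a rough sketch of what `--min_count_fraction 0.00005` means, the toy example below applies the same 0.005% rule to a handful of made-up OTU totals. The counts, the OTU names, and the helper `filter_low_abundance` are hypothetical, and we assume that an OTU is kept when its total count is at or above the threshold; this is an illustration rather than QIIME's actual implementation."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def filter_low_abundance(otu_totals, min_count_fraction):\n",
      "    # The threshold is a fraction of all sequences in the table\n",
      "    total = float(sum(otu_totals.values()))\n",
      "    min_count = total * min_count_fraction\n",
      "    kept = dict((otu, n) for otu, n in otu_totals.items() if n >= min_count)\n",
      "    return kept, min_count\n",
      "\n",
      "# Made-up per-OTU sequence totals (not taken from the Komodo dataset)\n",
      "toy_totals = {'OTU_1': 600000, 'OTU_2': 399930, 'OTU_3': 60, 'OTU_4': 10}\n",
      "kept, min_count = filter_low_abundance(toy_totals, 0.00005)\n",
      "print('Threshold: %.1f sequences (0.005%% of %d)'\n",
      "      % (min_count, sum(toy_totals.values())))\n",
      "print('OTUs kept: %s' % sorted(kept))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },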
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then perform an additional filtering step to remove samples that do not belong to Komodo dragons or their environments, plust an additional 5 samples with insufficient metadata to determine whether they belong to Komodo dragons or Komodo environments."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \\\n",
      "                                  -o $PWD/open_ref_otus/otu_table_komodo.biom \\\n",
      "                                  -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                  -s 'HOST_COMMON_NAME:Komodo Dragon'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Alpha Diversity Analyses"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Alpha diversity is defined as the diversity of organisms in one sample or environment, and QIIME implements dozens of the most widely used alpha diversity indices (both phylogenetic and non-phylogenetic). Rarefaction plots, which not only enable visualization of the diversity of samples or environments, are also useful for assessing the sequencing effort sufficient for representing and comparing microbial communities in your dataset. The plot curves reach asymptotes when the sequencing depth does not contribute additional OTUs, indicating that the depth of sequencing was sufficient to sample the majority of diversity present in the sample. It is recommended that input OTU tables are filtered so that any samples containing less reads than the maximum number of reads you will sample are removed from the OTU table; if this step is omitted, some alpha diversity curves will not reach the end of the x-axis on the rarefaction plot. You can do this using the command filter_samples_from_otu_table.py. Because we are only interested in comparing the diversity of Komodo dragon skin, salivary, and fecal microbiomes right now, we will will filter the Komodo-only OTU table so that environmental samples are not present, and then we will filter so that any samples with less than 3000 reads are removed from the OTU table (this number was chosen to maximize the number of reads per sample while minimizing the number of samples discarded from analysis). "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# First filter the table to get only the Komodo dragon skin, salivary and fecal microbiomes\n",
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo.biom \\\n",
      "                                  -o $PWD/open_ref_otus/otu_table_komodo_no_env.biom \\\n",
      "                                  -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                  -s 'BODY_PRODUCT:*,!NA'\n",
      "\n",
      "# Then filter to get only the samples with more than 3000 reads\n",
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_no_env.biom \\\n",
      "                                  -o $PWD/open_ref_otus/otu_table_komodo_no_env_n3000.biom \\\n",
      "                                  -n 3000 "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Although QIIME will perform a series of steps for calculating alpha diversity and creating rarefaction plots using the script alpha_rarefaction.py, we will perform each of these steps individually, as we alter some default parameters. The first step is rarefying the OTU table so that all samples have the same number of reads associated with them. This helps eliminate bias caused by different sequencing depths between samples. We will randomly rarefy the OTU table ten times (QIIME default), and later, the alpha diversity metrics calculated on each of these tables will be averaged. The script multiple_rarefactions.py requests the minimum and maximum number of sequences to rarefy to, as well as the size of each sampling step between the minimum and maximum number. Here, our minimum number will be 10, and our maximum will be 3300, with steps of 100. We will first make a directory within our working directory called \"alpha_diversity\" into which we can point the outputs of all alpha diversity scripts."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!mkdir alpha_diversity\n",
      "!multiple_rarefactions.py -i $PWD/open_ref_otus/otu_table_komodo_no_env_n3000.biom \\\n",
      "                          -o $PWD/alpha_diversity/multiple_rarefactions_komodo_no_env/ \\\n",
      "                          -m 10 -x 3300 -s 100"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
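    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The sketch below lists the depths implied by `-m 10 -x 3300 -s 100`. It assumes that depths start at the minimum and increase by the step size without exceeding the maximum (as Python's `range` does with an inclusive upper bound), and that ten tables are generated per depth, as stated above; the variable names are ours."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "min_depth, max_depth, step = 10, 3300, 100\n",
      "reps_per_depth = 10  # QIIME default, as noted above\n",
      "\n",
      "# Depths from the minimum up to (and possibly including) the maximum\n",
      "depths = list(range(min_depth, max_depth + 1, step))\n",
      "print('First depths: %s' % depths[:3])\n",
      "print('Deepest rarefaction: %d reads per sample' % depths[-1])\n",
      "print('Rarefied tables expected: %d' % (len(depths) * reps_per_depth))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },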
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then calculate the desired alpha diversity metrics for each of the rarefied OTU tables created in the previous step. We will calculate the number of observed species (number of OTUs) as well as the Shannon Diversity index (a measure of richness and evenness of the community) using the comman alpha_diversity.py."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!alpha_diversity.py -i $PWD/alpha_diversity/multiple_rarefactions_komodo_no_env/ \\\n",
      "                    -o $PWD/alpha_diversity/adiv_shannon_OS_komodo_no_env/ \\\n",
      "                    -m shannon,observed_species"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
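    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the two metrics concrete, here is a minimal sketch that computes them for made-up vectors of OTU counts. The functions are our own simplified versions (using a base-2 logarithm for Shannon, which we believe matches QIIME's default) and are not the code that alpha_diversity.py runs."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import math\n",
      "\n",
      "def observed_species(counts):\n",
      "    # Richness: number of OTUs with at least one read\n",
      "    return sum(1 for c in counts if c > 0)\n",
      "\n",
      "def shannon(counts, base=2):\n",
      "    # H = -sum(p_i * log(p_i)) over OTUs with non-zero counts\n",
      "    total = float(sum(counts))\n",
      "    props = [c / total for c in counts if c > 0]\n",
      "    return -sum(p * math.log(p, base) for p in props)\n",
      "\n",
      "even_sample = [25, 25, 25, 25]   # four equally abundant OTUs\n",
      "skewed_sample = [97, 1, 1, 1]    # same richness, one dominant OTU\n",
      "for name, counts in (('even', even_sample), ('skewed', skewed_sample)):\n",
      "    print('%s: observed species = %d, Shannon = %.3f'\n",
      "          % (name, observed_species(counts), shannon(counts)))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },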
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then collate the resulting alpha diversity results so that one file, representing the averages for all OTU tables at each read depth, is created for each alpha diversity metric."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!collate_alpha.py -i $PWD/alpha_diversity/adiv_shannon_OS_komodo_no_env/ \\\n",
      "                  -o $PWD/alpha_diversity/collated_alpha_komodo_no_env/"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Finally, we create rarefaction plots."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!make_rarefaction_plots.py -i $PWD/alpha_diversity/collated_alpha_komodo_no_env/ \\\n",
      "                           -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                           -o $PWD/alpha_diversity/rarefaction_plots_komodo_no_env/"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can open the html file to manipulate the rarefaction plots in our browser. Choosing the metadata category BODY_PRODUCT, we see that the number of OTUs and the Shannon Diversity index is lower for fecal samples than for skin or saliva samples. This is interesting, as in humans, the GI tract microbiome is the most diverse microbiome. We can further test whether this difference is significant and create a boxplot of the diversity metric distribution for each group of samples by running the script compare_alpha_diversity.py. This script uses as input the collated alpha diversity metrics files created in the collate_alpha.py step, as well as the mapping file."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# With observed species\n",
      "!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo_no_env/observed_species.txt \\\n",
      "                            -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                            -o $PWD/alpha_diversity/OS_komodo_no_env \\\n",
      "                            -c BODY_PRODUCT\n",
      "\n",
      "# With Shannon Diversity index\n",
      "!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo_no_env/shannon.txt \\\n",
      "                            -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                            -o $PWD/alpha_diversity/Shannon_Komodo_no_env \\\n",
      "                            -c BODY_PRODUCT"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We also want to compare the diversity of environmental samples to Komodo skin, saliva, and fecal microbiomes. We can therefore repeat each of the analyses steps outlined above, using an OTU table that includes Komodo skin, saliva, fecal, and environmental samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Filter the OTU table to get the samples with more than 3000 reads\n",
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo.biom \\\n",
      "                                  -o $PWD/open_ref_otus/otu_table_komodo_n3000.biom \\\n",
      "                                  -n 3000\n",
      "# Perform multiple rarefactions\n",
      "!multiple_rarefactions.py -i $PWD/open_ref_otus/otu_table_komodo_n3000.biom \\\n",
      "                          -o $PWD/alpha_diversity/multiple_rarefactions_komodo/ \\\n",
      "                          -m 10 -x 3300 -s 100\n",
      "# Perform alpha diversity\n",
      "!alpha_diversity.py -i $PWD/alpha_diversity/multiple_rarefactions_komodo/ \\\n",
      "                    -o $PWD/alpha_diversity/adiv_shannon_OS_komodo/ \\\n",
      "                    -m shannon,observed_species\n",
      "\n",
      "# Collate alpha diversity\n",
      "!collate_alpha.py -i $PWD/alpha_diversity/adiv_shannon_OS_komodo/ \\\n",
      "                  -o $PWD/alpha_diversity/collated_alpha_komodo/\n",
      "\n",
      "# Generate plots\n",
      "!make_rarefaction_plots.py -i $PWD/alpha_diversity/collated_alpha_komodo/ \\\n",
      "                           -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                           -o $PWD/alpha_diversity/rarefaction_plots_komodo/\n",
      "\n",
      "# Compare alpha diversity\n",
      "!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo/observed_species.txt \\\n",
      "                            -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                            -o $PWD/alpha_diversity/OS_Komodo \\\n",
      "                            -c BODY_PRODUCT\n",
      "!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo/shannon.txt \\\n",
      "                            -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                            -o $PWD/alpha_diversity/shannon_Komodo \\\n",
      "                            -c BODY_PRODUCT"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Creating Taxa Summaries to Visualize the Composition of the Komodo dragon Skin, Salivary, and Fecal Microbiomes"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To visualize which bacterial taxa comprise the microbiomes of individual samples or environments, we can create taxa plots. We first rarefy the OTU table to 3323 reads per sample."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!single_rarefaction.py -i $PWD/open_ref_otus/otu_table_komodo.biom \\\n",
      "                       -o $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                       -d 3323"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "First, we want to characterize the Komodo salivary, skin, and fecal microbiomes. We can collapse the OTU table such that all salivary, skin, and fecal samples are an average of the entire cohort, with one column in the OTU table for each metadata category. *Note, because this OTU table also contains environmental samples, a column for environmental samples (\"NA\") will also be created. We will not focus on environmental samples yet, but can return to this taxa plot later.*"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!mkdir taxa_summaries\n",
      "!collapse_samples.py -b $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                     -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                     --output_biom_fp $PWD/taxa_summaries/otu_table_komodo_even3323_by_body_product.biom \\\n",
      "                     --output_mapping_fp $PWD/taxa_summaries/Komodo_Mapping_file_by_body_product.txt \\\n",
      "                     --collapse_fields BODY_PRODUCT"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can then run a single script in QIIME which will split the OTU table into 5 different taxonomic-level OTU tables, labelled L2, L3, L4, L5, and L6.txt (L2 being Phylum and L6 being genus) and then create interactive html files (as well as 2D pdfs for each possible plot) for exploring the average relative abundances of individual taxa in each group of samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!summarize_taxa_through_plots.py -i $PWD/taxa_summaries/otu_table_komodo_even3323_by_body_product.biom \\\n",
      "                                 -m $PWD/taxa_summaries/Komodo_Mapping_file_by_body_product.txt \\\n",
      "                                 -o $PWD/taxa_summaries/Komodo_by_body_product_summaries/"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Beta Diversity Analyses"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Beta diversity is defined as the difference in the diversities across samples or environments. QIIME can compute many phylogenetic and non-phylogenetic beta diversity metrics, although UniFrac is the most generally useful metric. Using the script beta_diversity_through_plots.py, QIIME generates a distance matrix by computing the beta diversity between each pair of input sample as well as an Emperor html file for visualizing a PCoA plot interactively in 3-dimensions. We will create two sets of distance matrices and PCoA plots-one that includes Komodo dragon skin, salivary, and fecal samples and one that also includes environmental samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# First create a directory for all beta diversity results\n",
      "!mkdir beta_div\n",
      "\n",
      "# Komodo skin, saliva and fecal samples plus Komodo environmental samples\n",
      "!beta_diversity_through_plots.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                                 -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                 -t $PWD/open_ref_otus/rep_set.tre \\\n",
      "                                 -o $PWD/beta_div/bdiv_even3323_komodo\n",
      "\n",
      "# Filter environment samples from rarefied table\n",
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                                  -o $PWD/open_ref_otus/otu_table_komodo_no_env_even3323.biom \\\n",
      "                                  -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                  -s 'BODY_PRODUCT:*,!NA'\n",
      "\n",
      "# Only Komodo skin, saliva and fecal samples\n",
      "!beta_diversity_through_plots.py -i $PWD/open_ref_otus/otu_table_komodo_no_env_even3323.biom \\\n",
      "                                 -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                 -t $PWD/open_ref_otus/rep_set.tre \\\n",
      "                                 -o $PWD/beta_div/bdiv_even3323_komodo_no_env"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Coloring the dots by the metadata category BODY_PRODUCT, we see that fecal samples cluster separately from skin and saliva samples; additionally, environmental samples cluster among skin and saliva, but not fecal, samples.\n",
      "\n",
      "We can test the signficance of the clustering and whether body site is a driver of this clustering pattern using the script compare_categories.py. We will use two tests: anosim and permanova. This script requires the UniFrac distance matrix produced during the beta_diversity_through_plots.py step; we will only use the distance matrix containing Komodo body, but not environmental, samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# anosim - unweighted unifrac\n",
      "!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/unweighted_unifrac_dm.txt \\\n",
      "                       -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                       -o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_unweighted_unifrac_anosim \\\n",
      "                       -c BODY_PRODUCT \\\n",
      "                       --method anosim\n",
      "# anosim - weighted unifrac\n",
      "!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/weighted_unifrac_dm.txt \\\n",
      "                       -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                       -o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_weighted_unifrac_anosim \\\n",
      "                       -c BODY_PRODUCT \\\n",
      "                       --method anosim\n",
      "# permanova - unweighted unifrac\n",
      "!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/unweighted_unifrac_dm.txt \\\n",
      "                       -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                       -o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_unweighted_unifrac_permanova \\\n",
      "                       -c BODY_PRODUCT \\\n",
      "                       --method permanova\n",
      "# permanova - weighted unifrac\n",
      "!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/weighted_unifrac_dm.txt \\\n",
      "                       -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                       -o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_weighted_unifrac_permanova \\\n",
      "                       -c BODY_PRODUCT \\\n",
      "                       --method permanova"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Computing the Core Microbiome"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To determine whether or not there is a core Komodo dragon skin, salivary, or fecal microbiome, we can run a script in QIIME called compute_core_microbiome.py. This script will determine which OTUs are present in a user defined subset of samples; for example, we will set the minimum number of samples at 50% of the cohort, and the maximum number at 100% of the cohort. We can then indicate that we want 11 steps in between the minimum and maximum, to ensure that a core microbiome is calculated in steps of 5% (55%, 60%, 65%, etc.). We want to determine the core microbiome for skin, saliva, and fecal samples, and will therefore run this script three times, indicating which body site to use each time."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# First create a directory for all the core microbiome results\n",
      "!mkdir core_microbiome\n",
      "\n",
      "# Saliva\n",
      "!compute_core_microbiome.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                            --mapping_fp $PWD/Komodo_Mapping_file.txt \\\n",
      "                            --valid_states \"BODY_PRODUCT:UBERON:saliva\" \\\n",
      "                            --max_fraction_for_core 1.0 \\\n",
      "                            --min_fraction_for_core 0.5 \\\n",
      "                            --num_fraction_for_core_steps 11 \\\n",
      "                            -o $PWD/core_microbiome/cm_saliva\n",
      "# Skin\n",
      "!compute_core_microbiome.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                            --mapping_fp $PWD/Komodo_Mapping_file.txt \\\n",
      "                            --valid_states \"BODY_PRODUCT:UBERON:sebum\" \\\n",
      "                            --max_fraction_for_core 1.0 \\\n",
      "                            --min_fraction_for_core 0.5 \\\n",
      "                            --num_fraction_for_core_steps 11 \\\n",
      "                            -o $PWD/core_microbiome/cm_skin\n",
      "# Feces\n",
      "!compute_core_microbiome.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                            --mapping_fp $PWD/Komodo_Mapping_file.txt \\\n",
      "                            --valid_states \"BODY_PRODUCT:UBERON:feces\" \\\n",
      "                            --max_fraction_for_core 1.0 \\\n",
      "                            --min_fraction_for_core 0.5 \\\n",
      "                            --num_fraction_for_core_steps 11 \\\n",
      "                            -o $PWD/core_microbiome/cm_feces"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
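    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The fractions evaluated with these settings can be sketched as follows. The toy presence/absence data and the helper `core_otus` are hypothetical, and we assume that an OTU counts as core at a given fraction when it is observed in at least that fraction of samples; this illustrates the idea and is not QIIME's implementation."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "min_fraction, max_fraction, num_steps = 0.5, 1.0, 11\n",
      "\n",
      "# 11 evenly spaced fractions: 0.50, 0.55, ..., 1.00\n",
      "step = (max_fraction - min_fraction) / (num_steps - 1)\n",
      "fractions = [round(min_fraction + i * step, 2) for i in range(num_steps)]\n",
      "\n",
      "def core_otus(presence, fraction):\n",
      "    # presence maps an OTU id to a list of 0/1 flags, one per sample\n",
      "    core = []\n",
      "    for otu, flags in sorted(presence.items()):\n",
      "        if sum(flags) / float(len(flags)) >= fraction:\n",
      "            core.append(otu)\n",
      "    return core\n",
      "\n",
      "# Made-up presence/absence across four samples\n",
      "toy_presence = {'OTU_A': [1, 1, 1, 1],\n",
      "                'OTU_B': [1, 1, 1, 0],\n",
      "                'OTU_C': [1, 0, 0, 0]}\n",
      "print('Fractions: %s' % fractions)\n",
      "print('Core at 50%%: %s' % core_otus(toy_presence, 0.5))\n",
      "print('Core at 100%%: %s' % core_otus(toy_presence, 1.0))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },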
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Using SourceTracker to Characterize Host-Environment Microbiome Sharing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We are also interesed in how much of its microbiome the caprive Komodo dragon shares with its environment. Some have suggested that Komodos passively acquire potential pathogens from the environment; others posit that Komodos share pathogens with each other, usually through feeding on the same carrion, though this could be extrapolated to include any environmental object two Komodos may touch. Nevertheless, the Komodo environment (captive or wild) has been virtually ignored; therefore, analysing the  extent of Komodo-environment microbiome sharing represents an important knowledge gap. We can use SourceTracker to determine whether and which microbiomes the Komodo shares with its environment.\n",
      "\n",
      "In this dataset, two zoos provided Komodo and environmental samples. For our primary analysis, we will focus only on the samples from three Denver Zoo dragons: Anika, Kristika and Raja, due to larger sample sizes and matched dragon-environmental samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Filter the table to get Denver samples only\n",
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                                  -o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.biom \\\n",
      "                                  -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                  -s 'DRAGON_NAME:Anika,Kristika,Raja'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Following the [recommendations](http://qiime.org/tutorials/source_tracking.html) for running SourceTracker, we should filter those OTUs from our OTU table that are present in less than 1% of the samples. In our case, we only have 37 samples, so there is no reason to filter.\n",
      "\n",
      "SourceTracker does not accept biom formatted files, so we need to transform our biom table to a classic tab delimited text format."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom convert -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.biom \\\n",
      "              -o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.txt \\\n",
      "              --table-type \"OTU table\" --to-tsv"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
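    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As an optional sanity check that is not part of the original analysis, the sketch below counts the sample columns in the converted table; the result should agree with the number of Denver samples quoted above. It assumes that the converted file contains a header line starting with `#OTU ID` followed by one tab-separated column per sample, and the helper name `count_samples` is ours."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def count_samples(fp):\n",
      "    # Return the number of sample columns in a classic (tab-delimited) OTU table\n",
      "    with open(fp) as f:\n",
      "        for line in f:\n",
      "            if line.startswith(\"#OTU ID\"):\n",
      "                # The first column holds the OTU ids; the remaining ones are samples\n",
      "                return len(line.rstrip(\"\\n\").split(\"\\t\")) - 1\n",
      "    raise ValueError(\"No '#OTU ID' header line found in %s\" % fp)\n",
      "\n",
      "print(count_samples(\"SourceTracker/tables/otu_table_komodo_even3323_denver.txt\"))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },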
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can now run SourceTracker:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Komodo_Mapping_file_ST_denver.txt \\\n",
      "                            -o $PWD/SourceTracker/st_denver/ \\\n",
      "                            < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
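    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To inspect the result, a minimal sketch such as the following can load the estimated mixing proportions into a table. It assumes that SourceTracker wrote a tab-delimited `sink_predictions.txt` file (one row per sink sample, one column per source) into the output directory and that pandas is installed; both are assumptions, so check the contents of the output directory if the file name differs."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "\n",
      "# Assumed output file name; adjust it if your SourceTracker version differs\n",
      "st_fp = \"SourceTracker/st_denver/sink_predictions.txt\"\n",
      "proportions = pd.read_csv(st_fp, sep=\"\\t\", index_col=0)\n",
      "# Average contribution of each source across all sink samples\n",
      "print(proportions.mean(axis=0))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },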
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can then perform SourceTracker on all Denver and Honolulu Zoo samples to confirm whether the results observed in three Denver Zoo dragons are recapitulated using a larger dataset representing two geographically distinct zoos."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Filter the table to contain only samples from Denver and Honolulu Zoo\n",
      "!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \\\n",
      "                                  -o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.biom \\\n",
      "                                  -m $PWD/Komodo_Mapping_file.txt \\\n",
      "                                  -s 'PROVENANCE:Honolulu Zoo,Denver'\n",
      "\n",
      "# Convert the biom table to classic tab delimited text file\n",
      "!biom convert -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.biom \\\n",
      "              -o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.txt \\\n",
      "              --table-type \"OTU table\" --to-tsv\n",
      "\n",
      "# Run SourceTracker\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Komodo_Mapping_file_ST_denver_honolulu.txt \\\n",
      "                            -o $PWD/SourceTracker/st_denver_honolulu/ \\\n",
      "                            < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also perform independence tests on our sources (Komodo skin, saliva, and fecal microbiomes) to determine whether each source is independant from each other source. To do this, we simply add \"-s\" to the SourceTracker command:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Komodo_Mapping_file_ST_denver_honolulu.txt \\\n",
      "                            -o $PWD/SourceTracker/st_denver_honolulu_independence_test/ \\\n",
      "                            -s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Comparing Captive Komodos and Their Environments to Wild Amphibians and Their Environments"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We next want to establish the specificity of the Komodo dragon environment. Therefore, we will combine the Komodo dragon dataset with a published wild amphibian-environmental dataset. We first combine the split_library_fastq.py outputs from both projects into a single .fna file, and then use that file as input for pick_open_reference_otus.py."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create a directory to store the Komodo-Amphibians results\n",
      "!mkdir komodo_amphibians\n",
      "# Concatenate the split libraries output from both datasets\n",
      "!cat sl_out/seqs.fna amphibians/seqs.fna >> komodo_amphibians/seqs.fna\n",
      "# Run open reference OTU picking\n",
      "!pick_open_reference_otus.py -i $PWD/komodo_amphibians/seqs.fna -o $PWD/komodo_amphibians/open_ref_otus"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Once we have the OTU table, we can then perform the same quality filter outlined above to remove spurious OTUs:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_otus_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax.biom \\\n",
      "                               -o $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \\\n",
      "                               --min_count_fraction 0.00005"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As outlined above, the Komodo dataset actually includes samples from other Varanids, but due to limited sample size, we are interested only in the Komodo dragon and their environment samples. Additionally, in the Amphibian dataset there are two control samples that should also be filtered out:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Filter the Amphibian's control samples:\n",
      "!filter_samples_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \\\n",
      "                                  -o $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005_no_Amph_controls.biom \\\n",
      "                                  -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                                  -s 'host_species_abbrev:*,!Sterile_water_control,!sterile_glove_control'\n",
      "\n",
      "# Filter the non-Komodo related varanid samples\n",
      "!filter_samples_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005_no_Amph_controls.biom \\\n",
      "                                  -o $PWD/komodo_amphibians/open_ref_otus/otu_table_AK.biom \\\n",
      "                                  -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                                  -s \"host_common_name:*,!Gray's Monitor,!NA,!Prairie Rattlesnake,!Varanus Indicus,!Varanus Rudicollis\""
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then rarefy our table prior to any analysis:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!single_rarefaction.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK.biom \\\n",
      "                       -o $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \\\n",
      "                       -d 5870"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then perform beta diversity analysis to confirm that Komodo and amphibian microbial communities are different. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!beta_diversity_through_plots.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \\\n",
      "                                 -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                                 -t $PWD/komodo_amphibians/open_ref_otus/rep_set.tre \\\n",
      "                                 -o $PWD/komodo_amphibians/bdiv_even_5870"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also make OTU networks, which we can visualize using Cytoscape (please see the Cytoscape tutorial available [here](http://qiime.org/tutorials/making_cytoscape_networks.html)), allowing us to determine if there are any shared OTUs between the two datasets."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!make_otu_network.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \\\n",
      "                     -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                     -o $PWD/komodo_amphibians/otu_network"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Comparing Captive Komodos and Their Environments to Humans, Their Pets, and Their Homes"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To further explore the specificity of the captive Komodo dragon environment, as well as to further our analysis of host-microbiome sharing in closed, \"built\" environments vs. open environments, we can combine the Komodo dragon dataset with a published human-pet-house dataset. As we did when we combined the Komodo dataset with the amphibian dataset, we will first combine split_libraries_fastq.py files from both datasets and then run the open reference OTU picking workflow."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create a directory to store the Komodo-Humans-Pets results\n",
      "!mkdir komodo_humans_pets\n",
      "# Concatenate the split libraries output from both datasets\n",
      "!cat sl_out/seqs.fna humans_pets/seqs.fna >> komodo_humans_pets/seqs.fna\n",
      "# Run open reference OTU picking\n",
      "!pick_open_reference_otus.py -i $PWD/komodo_humans_pets/seqs.fna -o $PWD/komodo_humans_pets/open_ref_otus"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As in the previous cases, we applied the recommended filter for spurious OTUs and the filter for getting only Komodo related samples from the Komodo dataset:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Filter the spurious OTUs\n",
      "!filter_otus_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_mc2_w_tax.biom \\\n",
      "                               -o $PWD/komodo_humans_pets/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \\\n",
      "                               --min_count_fraction 0.00005\n",
      "# Filter other varanids data\n",
      "!filter_samples_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \\\n",
      "                                  -m $PWD/Humans_Komodo_Mapping_file.txt \\\n",
      "                                  -o $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK.biom \\\n",
      "                                  -s 'HOST_COMMON_NAME:Komodo Dragon,no_data'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then rarefy our table prior to any analysis:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!single_rarefaction.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK.biom \\\n",
      "                       -o $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \\\n",
      "                       -d 5367"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then perform beta diversity analyses and associated statistical tests."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!beta_diversity_through_plots.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \\\n",
      "                                 -m $PWD/Humans_Komodo_Mapping_file.txt \\\n",
      "                                 -t $PWD/komodo_humans_pets/open_ref_otus/rep_set.tre \\\n",
      "                                 -o $PWD/komodo_humans_pets/bdiv_even_5367"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "We can also make OTU networks, as we did above for the Komodo-amphibian combined dataset. In this case, in order to avoid a really complex network that will be hard to see anything, we will filter out those OTUs that are not present in 1% of the total number of samples (in our case 18 samples):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Filter OTUs present in less than 1% of the samples\n",
      "!filter_otus_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \\\n",
      "                               -o $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367_msc18.biom \\\n",
      "                               -s 18\n",
      "# Create the network\n",
      "!make_otu_network.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367_msc18.biom \\\n",
      "                     -m $PWD/Humans_Komodo_Mapping_file.txt \\\n",
      "                     -o $PWD/komodo_humans_pets/otu_networks"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Using SourceTracker to Analyze Human/Pet-House and Amphibian-Environment Microbiome Sharing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now that we have established that Komodo environments are specific to Komodo dragons, we will then analyze host environment sharing among humans/pets and their homes as well as amphibians and their environments. In this way we can answer the following two questions:\n",
      " 1. Is host-environment microbiome sharing observed in captive Komodos typical of that observed among other vertebrates (in this case, humans, dogs, and cats) living in closed environments?\n",
      " 2. Is the host-environment microbiome sharing observed among captive Komodos characteristically different from that observed among wild vertebrates (in this case, six frog/newt species)?"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Human/Pet-House SourceTracker Analyses:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In order to run the SourceTracker Analyses in the Human/Pet-House data, we first need to filter the Komodo data out of the OTU table containing human/pet/home and Komodo/environment samples. We need to do this in 2 steps. First, we remove the samples that belong to the Komodo dragons and their environment. Second, we filter those OTUs that have zero counts after the first filtering step because they're only present in the Komodo dragon samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create a directory for only humans/pet-house data\n",
      "!mkdir $PWD/komodo_humans_pets/humans_pets_only\n",
      "\n",
      "# Filter Komodo dragon and their environment samples\n",
      "!filter_samples_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \\\n",
      "                                  -m $PWD/Humans_Komodo_Mapping_file.txt \\\n",
      "                                  -o $PWD/komodo_humans_pets/humans_pets_only/otu_table_humans_even5367.biom \\\n",
      "                                  -s 'HOST_COMMON_NAME:no_data'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Since we have more samples in this case, we can now apply the recommended filter to exclude those OTUs from our OTU table that are present in less than 1% of the samples, which is 16:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_otus_from_otu_table.py -i $PWD/komodo_humans_pets/humans_pets_only/otu_table_humans_even5367.biom \\\n",
      "                               -o $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.biom \\\n",
      "                               -s 16"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
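    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The value passed to `-s` above is the 1% recommendation expressed as a number of samples. A minimal sketch of that conversion is shown below; the helper name `min_samples_for_fraction` is ours, and rounding up with a floor of one sample is an assumption, since the rounding rule is not stated in this notebook. For the 37 Denver samples it returns a single sample, which is why no filtering was needed there."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import math\n",
      "\n",
      "def min_samples_for_fraction(n_samples, fraction=0.01):\n",
      "    # Smallest whole number of samples that covers `fraction` of the table\n",
      "    # (rounding up, never below one sample; the rounding rule is an assumption)\n",
      "    return max(1, int(math.ceil(fraction * n_samples)))\n",
      "\n",
      "# Denver-only table used in the Komodo SourceTracker analysis\n",
      "print(min_samples_for_fraction(37))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },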
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can convert our OTU table from biom to the classic tab delimited format and run SourceTracker, including the test of source independence:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Convert from BIOM to classic txt table\n",
      "!biom convert -i $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.biom \\\n",
      "              -o $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.txt \\\n",
      "              --table-type \"OTU table\" --to-tsv\n",
      "# Run SourceTracker\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Humans_Mapping_file_ST.txt \\\n",
      "                            -o $PWD/SourceTracker/st_humans/ \\\n",
      "                            < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r\n",
      "# Run SourceTracker test of source independence\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Humans_Mapping_file_ST.txt \\\n",
      "                            -o $PWD/SourceTracker/st_humans_source_independence_test/ \\\n",
      "                            -s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Amphibian-Environment SourceTracker Analyses"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here we are going to apply the same table filterings as in the case above: removing all Komodo-related samples and then apply the recommended filter to filter out OTUs that are not present in 1% of the samples (in this case 2)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create a directory for only amphibians data\n",
      "!mkdir $PWD/komodo_amphibians/amphibians_only\n",
      "\n",
      "# Filter Komodo dragon and their environment samples\n",
      "!filter_samples_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \\\n",
      "                                  -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                                  -o $PWD/komodo_amphibians/amphibians_only/otu_table_amphibians_even5870.biom \\\n",
      "                                  -s 'host_common_name:*,!Komodo Dragon'\n",
      "# Filter OTUs not present in 1% (2) of the samples\n",
      "!filter_otus_from_otu_table.py -i $PWD/komodo_amphibians/amphibians_only/otu_table_amphibians_even5870.biom \\\n",
      "                               -o $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.biom \\\n",
      "                               -s 2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can now convert our BIOM table to classic tab delimited text file and run SourceTracker:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Convert from BIOM to classic txt table\n",
      "!biom convert -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.biom \\\n",
      "              -o $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \\\n",
      "              --table-type \"OTU table\" --to-tsv\n",
      "# Run SourceTracker\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST.txt \\\n",
      "                            -o $PWD/SourceTracker/st_amphibians/ \\\n",
      "                            < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r\n",
      "# Run SourceTracker test of source independence\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST.txt \\\n",
      "                            -o $PWD/SourceTracker/st_amphibians_source_independence_test/ \\\n",
      "                            -s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Because we only see host-evironment sharing in water samples (with minimal to no sharing of the host microbiome with soil and sediment samples), we want to determine whether the environment shares its microbes with the host. Therefore, we perform SourceTracker analyses as before, but adjust our mapping file so that amphibian skin is the sink and water, soil, and sediment are the sources. We also perform independence tests."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Run SourceTracker\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST_reversed.txt \\\n",
      "                            -o $PWD/SourceTracker/st_amphibians_reversed/ \\\n",
      "                            < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r\n",
      "# Run SourceTracker test of source independence\n",
      "!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \\\n",
      "                            -m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST_reversed.txt \\\n",
      "                            -o $PWD/SourceTracker/st_amphibians_reversed_independence_test/ \\\n",
      "                            -s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Comparing Distance Matrices Between Komodos, Humans/Pets, Amphibians, and Their Environments"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To further support the idea that host-environment microbe sharing is more similar between captive Komodo dragons and humans/pets and their environments than between captive Komodos and wild amphibians and their evironments, we will next compare the Unifrac distance matrices. First, we'll compare Komodo dragons to humans, as we expect the distances between these vertebrate hosts and their environments to be similarly small. Then, we'll compare Komodos to amphibians, as we expect the distances between amphibians and their environments to be larger than those between Komodos and their environments. We first use the make_distance_boxplots commands within Qiime to get an idea of what the pattern is; we will be sure to pass --save_raw_data, which will produce .txt files that we'll later use to create only the boxplot comparisons of interest. This command also performs two-sample t-tests to determine if distance distributions are significantly different."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# First, we filter the distance matrix so that only the Komodo dragons with paired\n",
      "# environmental samples are included (Denver and Honolulu):\n",
      "\n",
      "# Komodo vs Humans/Pets\n",
      "!filter_distance_matrix.py -i $PWD/komodo_humans_pets/bdiv_even_5367/unweighted_unifrac_dm.txt \\\n",
      "                           -m $PWD/Humans_Komodo_Mapping_file.txt \\\n",
      "                           -o $PWD/komodo_humans_pets/bdiv_even_5367/unweighted_unifrac_dm_filt.txt \\\n",
      "                           -s 'PROVENANCE:no_data,Denver,Honolul Zoo'\n",
      "\n",
      "!make_distance_boxplots.py -d $PWD/komodo_humans_pets/bdiv_even_5367/unweighted_unifrac_dm_filt.txt \\\n",
      "                           -m $PWD/Humans_Komodo_Mapping_file.txt2 \\\n",
      "                           -o $PWD/komodo_humans_pets/bdiv_even_5367/uw_distance_boxplots \\\n",
      "                           -f \"Sample_Type\" -n 999 --sort median --save_raw_data\n",
      "\n",
      "# Komodo vs Amphibians\n",
      "!filter_distance_matrix.py -i $PWD/komodo_amphibians/bdiv_even_5870/unweighted_unifrac_dm.txt \\\n",
      "                           -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                           -o $PWD/komodo_amphibians/bdiv_even_5870/unweighted_unifrac_dm_filt.txt \\\n",
      "                           -s 'PROVENANCE:no_data,Denver,Honolul Zoo'\n",
      "\n",
      "!make_distance_boxplots.py -d $PWD/komodo_amphibians/bdiv_even_5870/unweighted_unifrac_dm_filt.txt \\\n",
      "                           -m $PWD/Amphibians_Komodo_Mapping_file.txt \\\n",
      "                           -o $PWD/komodo_amphibians/bdiv_even_5870/uw_distance_boxplots \\\n",
      "                           -f \"Env\" -n 999 --sort median --save_raw_data"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we are ready to create boxplots containing only the comparisons of interest, as the Qiime-produced boxplots show more comparisons than we are interested in. In order to do that, we are going to perform some custom scripting. We will start by defining the function `parse_distances`, which will parse the raw data files that we saved from the `make_distance_boxplots.py` QIIME command:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "def parse_distances(fp):\n",
      "    # Parse the input file\n",
      "    data = {}\n",
      "    with open(fp, \"U\") as f:\n",
      "        for line in f:\n",
      "            line = line.strip()\n",
      "            values = line.split()\n",
      "            data[values[0]] = np.asarray(values[1:], dtype=np.float64)\n",
      "    return data"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can use this function to parse the distances:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "hk_dists = parse_distances(\"komodo_humans_pets/bdiv_even_5367/uw_distance_boxplots/Sample_Type_Distances.txt\")\n",
      "ak_dists = parse_distances(\"komodo_amphibians/bdiv_even_5870/uw_distance_boxplots/Env_Distances.txt\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
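    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Before choosing which comparisons to plot, it can help to list the labels that were actually parsed. The short sketch below is our addition rather than part of the original analysis; it relies on the `hk_dists` and `ak_dists` dictionaries created in the previous cell and prints each label with its number of distances and median distance."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "for name, dists_by_label in [(\"Komodo vs Humans/Pets\", hk_dists), (\"Komodo vs Amphibians\", ak_dists)]:\n",
      "    print(name)\n",
      "    for label in sorted(dists_by_label):\n",
      "        dists = dists_by_label[label]\n",
      "        # Label, number of distances, median distance\n",
      "        print(\"  %s\\t%d\\t%.3f\" % (label, dists.size, np.median(dists)))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },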
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can take a look at the Komodo vs Human/Pet distances:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.boxplot([hk_dists['Human_Environment_vs._Human-Pet'], hk_dists['Komodo_Environment_vs._Komodo']])\n",
      "plt.savefig(\"komodo_humans_pets/bdiv_even_5367/uw_distance_boxplots/boxplots.pdf\", format=\"pdf\")\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
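    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To attach a p-value to the comparison plotted above, one option is a Monte Carlo permutation test restricted to the two groups of interest. The sketch below is a simplified stand-in and not the test that `make_distance_boxplots.py` reports: it permutes group labels and compares the absolute difference in means, the function name `permutation_pvalue` and the fixed seed are ours, and because distances that share a sample are not independent, the result should be read as indicative only."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "def permutation_pvalue(a, b, n_perm=999, seed=0):\n",
      "    # Two-sided Monte Carlo p-value for the difference in group means\n",
      "    rng = np.random.RandomState(seed)\n",
      "    observed = abs(a.mean() - b.mean())\n",
      "    pooled = np.concatenate([a, b])\n",
      "    hits = 0\n",
      "    for _ in range(n_perm):\n",
      "        # Shuffle the pooled distances and split them into two groups again\n",
      "        rng.shuffle(pooled)\n",
      "        diff = abs(pooled[:a.size].mean() - pooled[a.size:].mean())\n",
      "        if diff >= observed:\n",
      "            hits += 1\n",
      "    # Count the observed statistic as one of the permutations\n",
      "    return (hits + 1.0) / (n_perm + 1.0)\n",
      "\n",
      "print(permutation_pvalue(hk_dists['Human_Environment_vs._Human-Pet'],\n",
      "                         hk_dists['Komodo_Environment_vs._Komodo']))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },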
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And to the Komodo vs Amphibian:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.boxplot([ak_dists['Amphibian_Environment_vs._Amphibian_Skin'], ak_dists['Komodo_Environment_vs._Komodo']])\n",
      "plt.savefig(\"komodo_amphibians/bdiv_even_5870/uw_distance_boxplots/boxplots.pdf\", format=\"pdf\")\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}