jennomics/gist:9790892

## gistfile1.txt
{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "This notebook is for the analysis of 15 16S PCR libraries that were produced from DNA extracted from meals, each collected by blending a plate of food in a blender. The meals were prepared to be typical of three diet types (Average American, USDA-recommended, and Vegan).</p></p>\n",
      "\n",
      "Before launching this ipython notebook, I typed the macqiime command to configure the shell. I'm using macqiime 1.8.0\n",
      "http://www.wernerlab.org/software/macqiime/macqiime-installation \n",
      "\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "I copied the stuff below from a QIIME/iPython notebook tutorial: http://nbviewer.ipython.org/github/qiime/qiime/blob/1.8.0/examples/ipynb/illumina_overview_tutorial.ipynb</p></p>I'm not sure what all of it does...\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from os import chdir, mkdir\n",
      "from os.path import join\n",
      "#the following are only available in the current development branch of IPython\n",
      "from IPython.display import FileLinks, FileLink"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "At the time I started this project, the in-house demultiplexing script that I was using returned all of the sequences in the wrong orientation, so I start by reverse-complementing them. This is something that I figured out by trial and error. </p></p>\n",
      "Note: If all of the sequences are in the wrong orientation, then I get zero OTUs found when doing the OTU picking</p></p>\n",
      "Also note: you can enable reverse strand matching during the otu-picking step to avoid this problem. I chose to do it this way because the otu-picking runs faster without the reverse strand matching enabled"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!adjust_seq_orientation.py -i MicrobesWeEat.faa -o MicrobesWeEat.fasta -r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Substitute the file paths below, and then there is no need to change any of the code in the rest of the notebook.</p></p>\n",
      "The version of MacQIIME that I'm using does not install the Greengenes 99% cutoff OTUs and taxonomy, so I did that manually as per the instructions on the MacQIIME installation site. I just substituted the gg_13_8_otus folder, which has all of the OTU cutoffs, for the one included in macqiime/greengenes/ \n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "project_name = \"MicrobesWeEat\"\n",
      "sequence_file = \"/Users/Jenna/Dropbox/Projects/MicrobesWeEat/MicrobesWeEat.fasta\"\n",
      "non_chimeric_sequence_file = \"/Users/Jenna/Dropbox/Projects/MicrobesWeEat/non_chimeric_sequences.fasta\"\n",
      "mapping_file = \"/Users/Jenna/Dropbox/Projects/MicrobesWeEat/MicrobesWeEat.txt\"\n",
      "otu_base = \"/macqiime/greengenes/gg_13_8_otus/\"\n",
      "# join() ignores otu_base when the second argument is an absolute path, so pass paths relative to otu_base\n",
      "reference_seqs_99 = join(otu_base, \"rep_set/99_otus.fasta\")\n",
      "reference_tree_99 = join(otu_base, \"trees/99_otus.tree\")\n",
      "reference_tax_99 = join(otu_base, \"taxonomy/99_otu_taxonomy.txt\")\n",
      "reference_seqs_97 = join(otu_base, \"rep_set/97_otus.fasta\")\n",
      "reference_tree_97 = join(otu_base, \"trees/97_otus.tree\")\n",
      "reference_tax_97 = join(otu_base, \"taxonomy/97_otu_taxonomy.txt\")\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
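    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Optional sanity check (a minimal sketch, not part of the original tutorial): list any of the input paths set above that do not exist yet. The name paths_to_check is only for illustration. non_chimeric_sequence_file is left out on purpose because it is only written later by filter_fasta.py."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from os.path import exists\n",
      "\n",
      "# Illustrative only: one label per input path defined in the cell above\n",
      "paths_to_check = {\n",
      "    'sequence_file': sequence_file,\n",
      "    'mapping_file': mapping_file,\n",
      "    'reference_seqs_99': reference_seqs_99,\n",
      "    'reference_tax_99': reference_tax_99,\n",
      "    'reference_seqs_97': reference_seqs_97,\n",
      "}\n",
      "\n",
      "# Collect the labels whose path is not on disk\n",
      "missing = sorted(name for name in paths_to_check if not exists(paths_to_check[name]))\n",
      "print('Missing paths: %s' % (', '.join(missing) or 'none'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },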
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Make sure mapping file is good to go. This also provides a quick check for macqiime."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!validate_mapping_file.py -m $mapping_file"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Errors and/or warnings detected in mapping file.  Please check the log and html file for details.\r\n"
       ]
      }
     ],
     "prompt_number": 43
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The mapping file errors were due to the fact that I do not have a column for barcodes. Because I did the demultiplexing before getting started with QIIME, I don't need this column. So, I'll ignore the errors and get rid of the associated corrected mapping file, log file, and html file."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!rm $project_name\\_corrected.txt\n",
      "!rm $project_name\\.html\n",
      "!rm $project_name\\.log"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 53
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Next task: check for chimeric sequences. </p></p>\n",
      "I kept getting this error when trying to run the chimera-checking script below:</p></p>\n",
      "\n",
      "cogent.app.util.ApplicationNotFoundError: Cannot find usearch61. Is it installed? Is it in your path?</p></p>\n",
      "\n",
      "the answer was yes, it's installed and in my path</p></p>\n",
      "\n",
      "the macqiime installation instructions recommend against doing this, but the only way I could get the chimera-checking script to find usearch61 was to copy it into /macqiime/bin"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!identify_chimeric_seqs.py -i $sequence_file -m usearch61 -o usearch_chimera_detection/ -r $reference_seqs_97"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "FileLinks('usearch_chimera_detection/')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "usearch_chimera_detection/<br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/chimeras.txt' target='_blank'>chimeras.txt</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/identify_chimeric_seqs.log' target='_blank'>identify_chimeric_seqs.log</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_denovo.log' target='_blank'>MicrobesWeEat.fasta_chimeras_denovo.log</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_denovo.uchime' target='_blank'>MicrobesWeEat.fasta_chimeras_denovo.uchime</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_ref.log' target='_blank'>MicrobesWeEat.fasta_chimeras_ref.log</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_ref.uchime' target='_blank'>MicrobesWeEat.fasta_chimeras_ref.uchime</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_consensus_fixed.fasta' target='_blank'>MicrobesWeEat.fasta_consensus_fixed.fasta</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_consensus_with_abundance.fasta' target='_blank'>MicrobesWeEat.fasta_consensus_with_abundance.fasta</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_consensus_with_abundance.uc' target='_blank'>MicrobesWeEat.fasta_consensus_with_abundance.uc</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_smallmem_clustered.log' target='_blank'>MicrobesWeEat.fasta_smallmem_clustered.log</a><br>\n",
        "&nbsp;&nbsp;<a href='files/usearch_chimera_detection/non_chimeras.txt' target='_blank'>non_chimeras.txt</a><br>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 54,
       "text": [
        "usearch_chimera_detection/\n",
        "  chimeras.txt\n",
        "  identify_chimeric_seqs.log\n",
        "  MicrobesWeEat.fasta_chimeras_denovo.log\n",
        "  MicrobesWeEat.fasta_chimeras_denovo.uchime\n",
        "  MicrobesWeEat.fasta_chimeras_ref.log\n",
        "  MicrobesWeEat.fasta_chimeras_ref.uchime\n",
        "  MicrobesWeEat.fasta_consensus_fixed.fasta\n",
        "  MicrobesWeEat.fasta_consensus_with_abundance.fasta\n",
        "  MicrobesWeEat.fasta_consensus_with_abundance.uc\n",
        "  MicrobesWeEat.fasta_smallmem_clustered.log\n",
        "  non_chimeras.txt"
       ]
      }
     ],
     "prompt_number": 54
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Now, remove the sequences that have been flagged as chimeric."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_fasta.py -f $sequence_file -o $non_chimeric_sequence_file -s usearch_chimera_detection/non_chimeras.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Pick OTUs and run PICRUSt"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "While there are good reasons NOT to use closed-reference otu-picking in general, because one of the things I'd like to do is to use PICRUSt to predict a metagenome, I will start by using the 99% cutoff greengenes reference files for closed-reference picking"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pick_closed_reference_otus.py -o greengenes_99_otus -i $non_chimeric_sequence_file -r $reference_seqs_99 -t $reference_tax_99 -a -O 2 -f"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom summarize-table -i greengenes_99_otus/otu_table.biom -o otu_table_summary.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "FileLink('otu_table_summary.txt')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<a href='files/otu_table_summary.txt' target='_blank'>otu_table_summary.txt</a><br>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 41,
       "text": [
        "/Users/Jenna/Dropbox/Projects/MicrobesWeEat/otu_table_summary.txt"
       ]
      }
     ],
     "prompt_number": 41
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#make a directory to hold all of the biom tables I will be generating and copy the first one into it\n",
      "!mkdir otu_tables\n",
      "!cp greengenes_99_otus/otu_table.biom otu_tables/closed_ref_99_otu_table.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 57
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Remove the chloroplast and mitochondrial sequences"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_taxa_from_otu_table.py -i otu_tables/closed_ref_99_otu_table.biom -o otu_tables/closed_ref_99_otu_table_no_euks.biom -n c__Chloroplast,f__mitochondria"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 56
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Normalize otu table by 16S copy number"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i otu_tables/closed_ref_99_otu_table_no_euks.biom -o otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py\", line 18, in <module>\r\n",
        "    from picrust.predict_metagenomes import transfer_observation_metadata,\\\r\n",
        "ImportError: No module named picrust.predict_metagenomes\r\n"
       ]
      }
     ],
     "prompt_number": 61
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "When I first tried to run normalize_by_copy_number.py, I got this error:</p></p>\n",
      "Traceback (most recent call last):\n",
      "  File \"/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py\", line 18, in <module>\n",
      "    from picrust.predict_metagenomes import transfer_observation_metadata,\\\n",
      "ImportError: No module named picrust.predict_metagenomes</p></p>\n",
      "That was happening because python wasn't finding the modules installed with PICRUSt. I confirmed this by typing:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!echo $PYTHONPATH"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "/macqiime/lib/python2.7/site-packages:\r\n"
       ]
      }
     ],
     "prompt_number": 68
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "And, I didn't see the path to the PICRUSt modules there, so I added it like this:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!export PYTHONPATH=$PYTHONPATH:/Applications/picrust-1.0.0/picrust"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 65
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Now, hopefully, the normalization script will work!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i otu_tables/closed_ref_99_otu_table_no_euks.biom -o otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py\", line 18, in <module>\r\n",
        "    from picrust.predict_metagenomes import transfer_observation_metadata,\\\r\n",
        "ImportError: No module named picrust.predict_metagenomes\r\n"
       ]
      }
     ],
     "prompt_number": 66
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Nope, that didn't work. I'm guessing that this has something to do with the fact that I am in the macqiime environment, so I need to add that path somewhere where macqiime can see it. So, I dug around and found the profile file that macqiime is sourcing here:</p>\n",
      "/macqiime/configs/bash_profile.txt</p>\n",
      "and, in order to edit it, I have to first make it writable by me."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!sudo chmod 777 /macqiime/configs/bash_profile.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Password:"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n"
       ]
      }
     ],
     "prompt_number": 69
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Then, add this line to it:</p>\n",
      "export PYTHONPATH=/Applications/picrust-1.0.0/picrust:${PYTHONPATH}</p>\n",
      "Don't forget to source it! "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i otu_tables/closed_ref_99_otu_table_no_euks.biom -o otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py\", line 18, in <module>\r\n",
        "    from picrust.predict_metagenomes import transfer_observation_metadata,\\\r\n",
        "ImportError: No module named picrust.predict_metagenomes\r\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!echo $PYTHONPATH"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "/Applications/picrust-1.0.0/picrust:/macqiime/lib/python2.7/site-packages:\r\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "It looks like the picrust modules should be in my PYTHONPATH now, but I'm getting the same error. So, I emailed the developers for help...</p></p>\n",
      "I was told to add /Applications/picrust-1.0.0 instead of /Applications/picrust-1.0.0/picrust to my PYTHONPATH, because Python needs the directory that contains the picrust package, not the package directory itself. Again, I had to add this to /macqiime/configs/bash_profile.txt, source it, and then re-launch the notebook"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!echo $PYTHONPATH"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "/Applications/picrust-1.0.0:/macqiime/lib/python2.7/site-packages:/Applications/picrust-1.0.0:/macqiime/lib/python2.7/site-packages:\r\n"
       ]
      }
     ],
     "prompt_number": 2
    },
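    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Side note, as a sketch rather than what was done above: each ! command runs in its own temporary shell, so an export typed there is gone by the next cell. Setting os.environ inside the notebook does carry over to later ! commands, because they inherit the environment of the notebook's Python process. The variable name picrust_root is made up for this example, and the path assumes PICRUSt is unpacked where it is in the rest of this notebook."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import os\n",
      "\n",
      "# Assumption: the directory that CONTAINS the picrust package\n",
      "picrust_root = '/Applications/picrust-1.0.0'\n",
      "\n",
      "# Split the current PYTHONPATH, dropping empty entries\n",
      "parts = [p for p in os.environ.get('PYTHONPATH', '').split(os.pathsep) if p]\n",
      "\n",
      "# Prepend picrust_root only if it is not already listed\n",
      "if picrust_root not in parts:\n",
      "    parts.insert(0, picrust_root)\n",
      "\n",
      "# Later ! commands started from this notebook inherit this value\n",
      "os.environ['PYTHONPATH'] = os.pathsep.join(parts)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },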
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i otu_tables/closed_ref_99_otu_table_no_euks.biom -o otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py\", line 146, in <module>\r\n",
        "    main()\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py\", line 91, in main\r\n",
        "    count_table_fh = gzip.open(input_count_table,'rb')\r\n",
        "  File \"/macqiime/lib/python2.7/gzip.py\", line 34, in open\r\n",
        "    return GzipFile(filename, mode, compresslevel)\r\n",
        "  File \"/macqiime/lib/python2.7/gzip.py\", line 89, in __init__\r\n",
        "    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')\r\n",
        "IOError: [Errno 2] No such file or directory: '/Applications/picrust-1.0.0/picrust/data/16S_13_5_precalculated.tab.gz'\r\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Ugh! So now, it's not finding the 16S_13_5_precalculated.tab.gz file that it needs. Well, the version of QIIME I am now using is picking OTUs using gg_13_8_otus anyway, so I'm guessing that the 16S_13_5_precalculated.tab.gz won't be compatible. I guess I'll have to go hunt for 16S_13_8_precalculated.tab.gz</p></p>\n",
      "Can't find it on the google; emailed developers again...</p></p>\n",
      "So, apparently those releases are the same, and somehow I have 16S_13_5_precalculated.tab but not 16S_13_5_precalculated.tab.gz in /Applications/picrust-1.0.0/picrust/data</p></p>\n",
      "So, I downloaded 16S_13_5_precalculated.tab.gz from https://github.com/picrust/picrust/releases"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i otu_tables/closed_ref_99_otu_table_no_euks.biom -o otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Woot! That seems to have finally worked!</p></p>\n",
      "Now, to predict the metagenome. Note, I am including the -a option to calculate NSTI for each of my samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/predict_metagenomes.py -a normalized_closed_ref_99_otu_table_no_euks_NSTI.tab -i otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom -o otu_tables/metagenome_prediction_from_normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/predict_metagenomes.py\", line 337, in <module>\r\n",
        "    main()\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/predict_metagenomes.py\", line 171, in main\r\n",
        "    ids_to_load=ids_to_load,verbose=opts.verbose,transpose=True)\r\n",
        "  File \"/Applications/picrust-1.0.0/scripts/predict_metagenomes.py\", line 113, in load_data_table\r\n",
        "    genome_table_fh = gzip.open(data_table_fp,'rb')\r\n",
        "  File \"/macqiime/lib/python2.7/gzip.py\", line 34, in open\r\n",
        "    return GzipFile(filename, mode, compresslevel)\r\n",
        "  File \"/macqiime/lib/python2.7/gzip.py\", line 89, in __init__\r\n",
        "    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')\r\n",
        "IOError: [Errno 2] No such file or directory: '/Applications/picrust-1.0.0/picrust/data/ko_13_5_precalculated.tab.gz'\r\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "OK, guess I also need to download ko_13_5_precalculated.tab.gz from here: http://sourceforge.net/projects/picrust/files/precalculated_files"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/predict_metagenomes.py -a normalized_closed_ref_99_otu_table_no_euks_NSTI.tab -i otu_tables/normalized_closed_ref_99_otu_table_no_euks.biom -o otu_tables/metagenome_prediction_from_normalized_closed_ref_99_otu_table_no_euks.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!more normalized_closed_ref_99_otu_table_no_euks_NSTI.tab"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\u001b[?1h\u001b=#Sample Metric  Value\r\n",
        "USDAdinner      Weighted NSTI   0.0346476072363\r\n",
        "VEGANlunch      Weighted NSTI   0.0414567707318\r\n",
        "VEGANbreakfast  Weighted NSTI   0.0243931905111\r\n",
        "VEGANdinner     Weighted NSTI   0.0457568163566\r\n",
        "VEGANsnack3     Weighted NSTI   0.0494097346343\r\n",
        "VEGANsnack1     Weighted NSTI   0.00559926339809\r\n",
        "AMERICANbreakfast       Weighted NSTI   0.0154447927366\r\n",
        "USDAbreakfast   Weighted NSTI   0.0387559430454\r\n",
        "USDAsnack2      Weighted NSTI   0.0430168527016\r\n",
        "AMERICANsnack   Weighted NSTI   0.0523967100835\r\n",
        "USDAlunch       Weighted NSTI   0.0358958561468\r\n",
        "AMERICANdinner  Weighted NSTI   0.0373577746488\r\n",
        "AMERICANlunch   Weighted NSTI   0.0590079302876\r\n",
        "USDAsnack1      Weighted NSTI   0.0719431102504\r\n",
        "VEGANsnack2     Weighted NSTI   0.0685329453216\r\n",
        "\r",
        "\u001b[K\u001b[?1l\u001b>"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -i otu_tables/metagenome_prediction_from_normalized_closed_ref_99_otu_table_no_euks.biom -c KEGG_Pathways -l 3 -o $project_name\\_predicted_metagenomes.L3.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom summarize-table -i $project_name\\_predicted_metagenomes.L3.biom -o picrust_biomL3_summary.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num samples: 15\r\n",
        "Num observations: 6909\r\n",
        "Total count: 425016246\r\n",
        "Table density (fraction of non-zero values): 0.820\r\n",
        "Table md5 (unzipped): 7ceb0f852d33cf5db50937f3c5237bd1\r\n",
        "\r\n",
        "Counts/sample summary:\r\n",
        " Min: 465561.0\r\n",
        " Max: 136504275.0\r\n",
        " Median: 6824784.000\r\n",
        " Mean: 28334416.400\r\n",
        " Std. dev.: 36086904.928\r\n",
        " Sample Metadata Categories: None provided\r\n",
        " Observation Metadata Categories: KEGG_Description; KEGG_Pathways\r\n",
        "\r\n",
        "Counts/sample detail:\r\n",
        " VEGANsnack2: 465561.0\r\n",
        " VEGANdinner: 1774364.0\r\n",
        " USDAbreakfast: 2917633.0\r\n",
        " VEGANbreakfast: 3280000.0\r\n",
        " USDAdinner: 3335969.0\r\n",
        " USDAlunch: 5480990.0\r\n",
        " VEGANlunch: 6359016.0\r\n",
        " AMERICANdinner: 6824784.0\r\n",
        " AMERICANlunch: 25296577.0\r\n",
        " USDAsnack2: 27048105.0\r\n",
        " VEGANsnack3: 29336002.0\r\n",
        " USDAsnack1: 49754606.0\r\n",
        " VEGANsnack1: 57572715.0\r\n",
        " AMERICANsnack: 69065649.0\r\n",
        " AMERICANbreakfast: 136504275.0\r\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!cat picrust_biomL3_summary.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num samples: 15\r\n",
        "Num observations: 328\r\n",
        "Total count: 720063157\r\n",
        "Table density (fraction of non-zero values): 0.839\r\n",
        "Table md5 (unzipped): d21edfb86c52a5907862a36688bc617e\r\n",
        "\r\n",
        "Counts/sample summary:\r\n",
        " Min: 783312.0\r\n",
        " Max: 233856269.0\r\n",
        " Median: 11458795.000\r\n",
        " Mean: 48004210.467\r\n",
        " Std. dev.: 61620780.189\r\n",
        " Sample Metadata Categories: None provided\r\n",
        " Observation Metadata Categories: KEGG_Pathways\r\n",
        "\r\n",
        "Counts/sample detail:\r\n",
        " VEGANsnack2: 783312.0\r\n",
        " VEGANdinner: 2994618.0\r\n",
        " USDAbreakfast: 4872468.0\r\n",
        " VEGANbreakfast: 5577176.0\r\n",
        " USDAdinner: 5647133.0\r\n",
        " USDAlunch: 9269937.0\r\n",
        " VEGANlunch: 10426834.0\r\n",
        " AMERICANdinner: 11458795.0\r\n",
        " AMERICANlunch: 42465249.0\r\n",
        " USDAsnack2: 44994834.0\r\n",
        " VEGANsnack3: 49509781.0\r\n",
        " USDAsnack1: 84697383.0\r\n",
        " VEGANsnack1: 97643632.0\r\n",
        " AMERICANsnack: 115865736.0\r\n",
        " AMERICANbreakfast: 233856269.0\r\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "FileLink('$project_name\\_predicted_metagenomes.L3.txt')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "Path (<tt>$project_name\\_predicted_metagenomes.L3.txt</tt>) doesn't exist. It may still be in the process of being generated, or you may have the incorrect path."
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 50,
       "text": [
        "/Users/Jenna/Dropbox/Projects/MicrobesWeEat/$project_name\\_predicted_metagenomes.L3.txt"
       ]
      }
     ],
     "prompt_number": 50
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!sed '1d' MWE_predicted_metagenomes.L3.txt | rev | cut -f 2- | rev > MWE_predicted_metagenome.L3.spf"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 26
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -f -i otu_tables/metagenome_prediction_from_normalized_closed_ref_99_otu_table_no_euks.biom -c KEGG_Pathways -l 2 -o MWE_predicted_metagenomes.L2.txt "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 27
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!sed '1d' MWE_predicted_metagenomes.L2.txt | rev | cut -f 2- | rev > MWE_predicted_metagenome.L2.spf"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 28
    },
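    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The sed | rev | cut | rev pipelines above drop the first line of the table and then the last tab-separated column of every remaining line. A rough Python equivalent is sketched below for anyone who finds the shell version hard to read. The function name table_to_spf is hypothetical, and this is an illustration rather than the method used for the results here. It would be called as table_to_spf('MWE_predicted_metagenomes.L2.txt', 'MWE_predicted_metagenome.L2.spf')."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def table_to_spf(in_path, out_path):\n",
      "    # Hypothetical helper mirroring: sed '1d' | rev | cut -f 2- | rev\n",
      "    with open(in_path) as src, open(out_path, 'w') as dst:\n",
      "        next(src, None)  # skip the first line, like sed '1d'\n",
      "        for line in src:\n",
      "            fields = line.rstrip('\\n').split('\\t')\n",
      "            # cut passes lines without a tab through unchanged\n",
      "            if len(fields) > 1:\n",
      "                fields = fields[:-1]  # drop the last column\n",
      "            dst.write('\\t'.join(fields) + '\\n')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },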
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Cool! Now, I can use these .spf files and my metadata file in STAMP. </p></p>\n",
      "The most significant difference between Diet Types is \"Other Glycan Degradation\" at L3 (see figure).</p>\n",
      "There are no significant differences between different nutrients at L3.</p>\n",
      "There are no significant differences at L2."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!core_diversity_analyses.py -i otu_tables/closed_ref_99_otu_table_no_euks.biom -o core_diversity_analyses_closed_ref_99 -m MicrobesWeEat.txt -e 974 -c DietType -t 97_uclust_otus/rep_set.tre"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 21
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Pick OTUs"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The QIIME developers prefer the open-reference OTU-picking approach, and I have no reason not to go with the default 97% cutoff, so that's what I'll do here. The pick_open_reference_otus.py script will cluster all of the sequences, assign taxonomy to the OTUs (when possible, a Greengenes ID will be assigned), choose a representative sequence from each OTU (rep_set), and align and build a phylogenetic tree from the representative sequences."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pick_open_reference_otus.py -r $reference_seqs_97 -i $non_chimeric_sequence_file -o 97_uclust_otus"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 61
    },
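    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "To make the open-reference idea concrete, here is a toy Python sketch: each read is first compared against the reference sequences, and reads that match nothing at the identity threshold seed or join new (de novo) clusters. This is only an illustration under simplifying assumptions (equal-length sequences, position-by-position identity, greedy assignment); it is not how pick_open_reference_otus.py or uclust is actually implemented, and all names and sequences below are invented."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def identity(a, b):\n",
      "    \"\"\"Fraction of matching positions between two equal-length strings.\"\"\"\n",
      "    matches = sum(1 for x, y in zip(a, b) if x == y)\n",
      "    return matches / float(len(a))\n",
      "\n",
      "\n",
      "def toy_open_reference(reads, reference, threshold=0.97):\n",
      "    \"\"\"Greedy toy clustering: try the reference first, then de novo centroids.\"\"\"\n",
      "    otus = {}       # OTU id -> list of read ids\n",
      "    centroids = []  # (new OTU id, centroid sequence), in order of creation\n",
      "    for read_id, seq in reads:\n",
      "        hit = None\n",
      "        # Step 1: closed-reference style match against known sequences\n",
      "        for ref_id, ref_seq in reference:\n",
      "            if identity(seq, ref_seq) >= threshold:\n",
      "                hit = ref_id\n",
      "                break\n",
      "        # Step 2: reads with no reference hit are clustered de novo\n",
      "        if hit is None:\n",
      "            for otu_id, centroid in centroids:\n",
      "                if identity(seq, centroid) >= threshold:\n",
      "                    hit = otu_id\n",
      "                    break\n",
      "        # Step 3: still unmatched, so this read seeds a new cluster\n",
      "        if hit is None:\n",
      "            hit = 'New.%d' % len(centroids)\n",
      "            centroids.append((hit, seq))\n",
      "        otus.setdefault(hit, []).append(read_id)\n",
      "    return otus\n",
      "\n",
      "\n",
      "# Invented 10-base sequences; a 0.9 threshold is used here so that one mismatch is tolerated\n",
      "reference = [('ref1', 'ACGTACGTAC')]\n",
      "reads = [('r1', 'ACGTACGTAC'), ('r2', 'ACGTACGTAT'),\n",
      "         ('r3', 'TTTTTTTTTT'), ('r4', 'TTTTTTTTTA')]\n",
      "clusters = toy_open_reference(reads, reference, threshold=0.9)\n",
      "for otu_id in sorted(clusters):\n",
      "    print('%s\\t%s' % (otu_id, ','.join(clusters[otu_id])))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },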
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Filter out singletons, mitochondria, and chloroplasts."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom summarize-table -i 97_uclust_otus/otu_table_mc2_w_tax_no_pynast_failures.biom -o otu_table_summary_before_cleanup.txt\n",
      "!cat otu_table_summary_before_cleanup.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/macqiime/bin/pyqi\", line 5, in <module>\r\n",
        "    pkg_resources.run_script('pyqi==0.3.1', 'pyqi')\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/setuptools-0.9.8-py2.7.egg/pkg_resources.py\", line 540, in run_script\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/setuptools-0.9.8-py2.7.egg/pkg_resources.py\", line 1455, in run_script\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/EGG-INFO/scripts/pyqi\", line 177, in <module>\r\n",
        "    optparse_main(cmd_obj, argv[1:])\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interfaces/optparse/__init__.py\", line 276, in optparse_main\r\n",
        "    result = optparse_cmd(local_argv[1:])\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interface.py\", line 42, in __call__\r\n",
        "    return self._output_handler(cmd_result)\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interfaces/optparse/__init__.py\", line 251, in _output_handler\r\n",
        "    opt_value)\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interfaces/optparse/output_handler.py\", line 55, in write_list_of_strings\r\n",
        "    raise IOError(\"Output path '%s' already exists.\" % option_value)\r\n",
        "IOError: Output path 'otu_table_summary_before_cleanup.txt' already exists.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num samples: 15\r\n",
        "Num observations: 3653\r\n",
        "Total count: 4151371\r\n",
        "Table density (fraction of non-zero values): 0.202\r\n",
        "Table md5 (unzipped): 4ac82db71487abb5ada0f2edb8f1ed98\r\n",
        "\r\n",
        "Counts/sample summary:\r\n",
        " Min: 168669.0\r\n",
        " Max: 318956.0\r\n",
        " Median: 288319.000\r\n",
        " Mean: 276758.067\r\n",
        " Std. dev.: 36395.138\r\n",
        " Sample Metadata Categories: None provided\r\n",
        " Observation Metadata Categories: taxonomy\r\n",
        "\r\n",
        "Counts/sample detail:\r\n",
        " USDAbreakfast: 168669.0\r\n",
        " VEGANbreakfast: 238057.0\r\n",
        " VEGANsnack2: 244886.0\r\n",
        " AMERICANbreakfast: 267254.0\r\n",
        " USDAsnack1: 270166.0\r\n",
        " VEGANdinner: 274360.0\r\n",
        " USDAlunch: 277213.0\r\n",
        " VEGANsnack3: 288319.0\r\n",
        " VEGANsnack1: 291459.0\r\n",
        " AMERICANdinner: 298442.0\r\n",
        " AMERICANlunch: 299035.0\r\n",
        " USDAsnack2: 299998.0\r\n",
        " VEGANlunch: 303246.0\r\n",
        " AMERICANsnack: 311311.0\r\n",
        " USDAdinner: 318956.0\r\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_taxa_from_otu_table.py -i 97_uclust_otus/otu_table_mc2_w_tax_no_pynast_failures.biom -o otu_tables/open_ref_97_otu_table_no_euks.biom -n c__Chloroplast,f__mitochondria"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 79
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_otus_from_otu_table.py -i otu_tables/open_ref_97_otu_table_no_euks.biom -o otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom  -n 2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 80
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom summarize-table -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o otu_tables/open_ref_97_otu_table_no_euks_no_singletons.summary"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 81
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom summarize-table --qualitative -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o otu_tables/open_ref_97_otu_table_no_euks_no_singletons.qualitative.summary"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!cat otu_tables/open_ref_97_otu_table_no_euks_no_singletons.summary\n",
      "!cat otu_tables/open_ref_97_otu_table_no_euks_no_singletons.qualitative.summary"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num samples: 15\r\n",
        "Num observations: 3015\r\n",
        "Total count: 1115390\r\n",
        "Table density (fraction of non-zero values): 0.211\r\n",
        "Table md5 (unzipped): b2a3608cd226ece27fa5e1283f9ac75b\r\n",
        "\r\n",
        "Counts/sample summary:\r\n",
        " Min: 974.0\r\n",
        " Max: 279136.0\r\n",
        " Median: 16456.000\r\n",
        " Mean: 74359.333\r\n",
        " Std. dev.: 91486.528\r\n",
        " Sample Metadata Categories: None provided\r\n",
        " Observation Metadata Categories: taxonomy\r\n",
        "\r\n",
        "Counts/sample detail:\r\n",
        " VEGANsnack2: 974.0\r\n",
        " VEGANdinner: 3576.0\r\n",
        " USDAbreakfast: 5002.0\r\n",
        " USDAdinner: 6149.0\r\n",
        " VEGANbreakfast: 7310.0\r\n",
        " AMERICANdinner: 11666.0\r\n",
        " VEGANlunch: 13874.0\r\n",
        " USDAlunch: 16456.0\r\n",
        " VEGANsnack3: 54483.0\r\n",
        " VEGANsnack1: 62446.0\r\n",
        " AMERICANlunch: 96898.0\r\n",
        " USDAsnack2: 104114.0\r\n",
        " USDAsnack1: 226403.0\r\n",
        " AMERICANbreakfast: 226903.0\r\n",
        " AMERICANsnack: 279136.0\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num samples: 15\r\n",
        "Num observations: 3015\r\n",
        "Table md5 (unzipped): b2a3608cd226ece27fa5e1283f9ac75b\r\n",
        "\r\n",
        "Observations/sample summary:\r\n",
        " Min: 229\r\n",
        " Max: 1838\r\n",
        " Median: 502.000\r\n",
        " Mean: 637.533\r\n",
        " Std. dev.: 387.462\r\n",
        " Sample Metadata Categories: None provided\r\n",
        " Observation Metadata Categories: taxonomy\r\n",
        "\r\n",
        "Observations/sample detail:\r\n",
        " VEGANsnack2: 229\r\n",
        " USDAsnack2: 333\r\n",
        " USDAsnack1: 334\r\n",
        " VEGANbreakfast: 399\r\n",
        " VEGANdinner: 417\r\n",
        " USDAdinner: 476\r\n",
        " VEGANlunch: 480\r\n",
        " USDAbreakfast: 502\r\n",
        " USDAlunch: 607\r\n",
        " AMERICANlunch: 622\r\n",
        " VEGANsnack1: 644\r\n",
        " AMERICANdinner: 660\r\n",
        " AMERICANsnack: 969\r\n",
        " VEGANsnack3: 1053\r\n",
        " AMERICANbreakfast: 1838\r\n"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Core Diversity Analyses"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Running core_diversity_analyses.py with libraries subsampled to 974 sequences (the smallest per-sample count after filtering, from VEGANsnack2) and categorized by DietType. Phylogeny-based analyses will use the tree produced by the pick_open_reference_otus.py workflow."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!core_diversity_analyses.py -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o core_diversity_analyses_open_ref_97 -m MicrobesWeEat.txt -e 974 -c DietType -t 97_uclust_otus/rep_set.tre"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 88
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!core_diversity_analyses.py -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o core_diversity_analyses_open_ref_97_notbyDietType -m MicrobesWeEat.txt -e 974 -t 97_uclust_otus/rep_set.tre "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
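    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The -e 974 value matches the smallest per-sample count in the filtered table summary shown earlier. Below is a small Python sanity check of that choice, using the counts copied from that summary. The variable names are arbitrary, and this is not part of the QIIME workflow itself."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Counts/sample detail copied from the summary of the filtered table above\n",
      "counts = {\n",
      "    'VEGANsnack2': 974, 'VEGANdinner': 3576, 'USDAbreakfast': 5002,\n",
      "    'USDAdinner': 6149, 'VEGANbreakfast': 7310, 'AMERICANdinner': 11666,\n",
      "    'VEGANlunch': 13874, 'USDAlunch': 16456, 'VEGANsnack3': 54483,\n",
      "    'VEGANsnack1': 62446, 'AMERICANlunch': 96898, 'USDAsnack2': 104114,\n",
      "    'USDAsnack1': 226403, 'AMERICANbreakfast': 226903, 'AMERICANsnack': 279136,\n",
      "}\n",
      "\n",
      "# Rarefying to the minimum keeps every sample, at the cost of discarding most reads\n",
      "smallest_sample = min(counts, key=counts.get)\n",
      "depth = counts[smallest_sample]\n",
      "kept_fraction = depth * len(counts) / float(sum(counts.values()))\n",
      "print('%s sets the depth: %d' % (smallest_sample, depth))\n",
      "print('fraction of reads kept at this depth: %.4f' % kept_fraction)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },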
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Statistical Analyses"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!nmds.py -i core_diversity_analyses_open_ref_97/bdiv_even974/weighted_unifrac_dm.txt -o NMDS_output"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 100
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!cat NMDS_output"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "samples\tNMDS1\tNMDS2\tNMDS3\r\n",
        "AMERICANbreakfast\t-0.389494419878\t0.00865409856052\t0.989002218031\r\n",
        "AMERICANsnack\t1.1668613141\t0.205811069656\t-0.0268782053646\r\n",
        "VEGANsnack1\t-1.16269656128\t1.62416158224\t-0.0350176227513\r\n",
        "AMERICANdinner\t-0.637575313264\t-0.190578233223\t0.114462831387\r\n",
        "VEGANsnack3\t-0.757046232072\t-0.427842753588\t-0.284042245494\r\n",
        "VEGANlunch\t0.309650800429\t0.0891678122833\t-0.760614243534\r\n",
        "USDAlunch\t0.0967382998757\t0.522305422021\t-0.15600527316\r\n",
        "USDAbreakfast\t-0.663563312609\t-0.213171210746\t-0.164820483877\r\n",
        "USDAsnack1\t1.3970680081\t0.0547609159045\t0.0207038638468\r\n",
        "AMERICANlunch\t1.22050812948\t-0.0483763297997\t0.00786515738269\r\n",
        "VEGANsnack2\t-0.548791103877\t-0.573646824527\t-0.501889379\r\n",
        "VEGANdinner\t-0.986305644888\t-0.427828502491\t0.101240742954\r\n",
        "USDAdinner\t-0.489593049147\t-0.414695252419\t0.170753782106\r\n",
        "USDAsnack2\t1.2078612921\t0.0810019093889\t0.123929129133\r\n",
        "VEGANbreakfast\t0.23637779293\t-0.289723703259\t0.401309728339\r\n",
        "\r\n",
        "stress\t0.0289760072641\t0\t0\r\n",
        "% variation explained\t0\t0\t0\t"
       ]
      }
     ],
     "prompt_number": 101
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!estimate_observation_richness.py -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o observation_richness_estimate"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "^C\r\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!compare_categories.py --method permanova -m $mapping_file -c DietType -i /Users/Jenna/Dropbox/Projects/MicrobesWeEat/core_diversity_analyses_open_ref_97_notbyDietType/bdiv_even974/weighted_unifrac_dm.txt -o permanova_DietTypes"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom convert -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o otu_tables/open_ref_97_otu_table_no_euks_no_singletons.txt -b"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!alpha_rarefaction.py -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -m $mapping_file -o alpha_uneven -t 97_uclust_otus/rep_set.tre -f"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 11
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Filter OTU table before doing the group significance tests so that every OTU tested is present in at least 3 of the samples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom add_metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2.biom -m $mapping_file -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata.biom\n",
      "!biom add_metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3.biom -m $mapping_file -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3_with_metadata.biom\n",
      "!biom add_metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4.biom -m $mapping_file -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4_with_metadata.biom\n",
      "!biom add_metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5.biom -m $mapping_file -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5_with_metadata.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Traceback (most recent call last):\r\n",
        "  File \"/macqiime/bin/pyqi\", line 5, in <module>\r\n",
        "    pkg_resources.run_script('pyqi==0.3.1', 'pyqi')\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/setuptools-0.9.8-py2.7.egg/pkg_resources.py\", line 540, in run_script\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/setuptools-0.9.8-py2.7.egg/pkg_resources.py\", line 1455, in run_script\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/EGG-INFO/scripts/pyqi\", line 177, in <module>\r\n",
        "    optparse_main(cmd_obj, argv[1:])\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interfaces/optparse/__init__.py\", line 276, in optparse_main\r\n",
        "    result = optparse_cmd(local_argv[1:])\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interface.py\", line 42, in __call__\r\n",
        "    return self._output_handler(cmd_result)\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/pyqi-0.3.1-py2.7.egg/pyqi/core/interfaces/optparse/__init__.py\", line 251, in _output_handler\r\n",
        "    opt_value)\r\n",
        "  File \"/macqiime/lib/python2.7/site-packages/biom/interfaces/optparse/output_handler.py\", line 29, in write_biom_table\r\n",
        "    raise IOError(\"Output path '%s' already exists.\" % option_value)\r\n",
        "IOError: Output path 'core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata.biom' already exists.\r\n"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata.biom -m $mapping_file -c DietType -o group_significance_L2\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3_with_metadata.biom -m $mapping_file -c DietType -o group_significance_L3\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4_with_metadata.biom -m $mapping_file -c DietType -o group_significance_L4\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5_with_metadata.biom -m $mapping_file -c DietType -o group_significance_L5"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!filter_otus_from_otu_table.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata.biom -s 3 -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata_in_at_least_3_samples.biom\n",
      "!filter_otus_from_otu_table.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3_with_metadata.biom -s 3 -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3_with_metadata_in_at_least_3_samples.biom\n",
      "!filter_otus_from_otu_table.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4_with_metadata.biom -s 3 -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4_with_metadata_in_at_least_3_samples.biom\n",
      "!filter_otus_from_otu_table.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5_with_metadata.biom -s 3 -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5_with_metadata_in_at_least_3_samples.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
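    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "What the -s 3 option does, sketched on a toy table in plain Python: an observation is kept only if it has a nonzero count in at least 3 samples. The table, names, and helper below are invented for illustration and do not use the biom API."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def keep_observations_in_min_samples(table, min_samples):\n",
      "    \"\"\"Keep rows (observations) that are nonzero in at least min_samples columns (samples).\"\"\"\n",
      "    kept = {}\n",
      "    for obs_id, sample_counts in table.items():\n",
      "        n_present = sum(1 for c in sample_counts if c > 0)\n",
      "        if n_present >= min_samples:\n",
      "            kept[obs_id] = sample_counts\n",
      "    return kept\n",
      "\n",
      "# Toy table: observation id -> counts in four hypothetical samples\n",
      "toy_table = {\n",
      "    'taxonA': [5, 0, 2, 1],  # present in 3 samples, kept\n",
      "    'taxonB': [0, 0, 7, 0],  # present in 1 sample, dropped\n",
      "    'taxonC': [1, 1, 0, 0],  # present in 2 samples, dropped\n",
      "}\n",
      "filtered = keep_observations_in_min_samples(toy_table, 3)\n",
      "print(sorted(filtered))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },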
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata_in_at_least_3_samples.biom -m $mapping_file -c DietType -o group_significance_L2_3\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3_with_metadata_in_at_least_3_samples.biom -m $mapping_file -c DietType -o group_significance_L3_3\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4_with_metadata_in_at_least_3_samples.biom -m $mapping_file -c DietType -o group_significance_L4_3\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5_with_metadata_in_at_least_3_samples.biom -m $mapping_file -c DietType -o group_significance_L5_3"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata_in_at_least_4_samples.biom -m $mapping_file -c DietType -o group_significance_L2_4\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L3_with_metadata_in_at_least_4_samples.biom -m $mapping_file -c DietType -o group_significance_L3_4\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L4_with_metadata_in_at_least_4_samples.biom -m $mapping_file -c DietType -o group_significance_L4_4\n",
      "!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L5_with_metadata_in_at_least_4_samples.biom -m $mapping_file -c DietType -o group_significance_L5_4"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "No metadata in biom table.\r\n"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Some Attempts to Visualize the Data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!make_otu_heatmap.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc974_sorted_L2_with_metadata_in_at_least_4_samples.biom"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "\r\n",
        "Warning: The lineages are missing from the OTU table. If you used single_rarefaction.py to create your otu_table, make sure you included the OTU lineages.\r\n",
        "\r\n"
       ]
      }
     ],
     "prompt_number": 36
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!alpha_diversity.py -m shannon -i /Users/Jenna/Dropbox/Projects/MicrobesWeEat/core_diversity_analyses_open_ref_97/taxa_plots_DietType/DietType_otu_table_sorted_L5.biom -o shannon\n",
      "!alpha_diversity.py -m simpson -i /Users/Jenna/Dropbox/Projects/MicrobesWeEat/core_diversity_analyses_open_ref_97/taxa_plots_DietType/DietType_otu_table_sorted_L5.biom -o simpson\n",
      "!alpha_diversity.py -m simpson_e -i /Users/Jenna/Dropbox/Projects/MicrobesWeEat/core_diversity_analyses_open_ref_97/taxa_plots_DietType/DietType_otu_table_sorted_L5.biom -o simpson_e"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 44
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#brewer2mpl makes it easier to use color tables from colorbrewer2.org in matplotlib\n",
      "!pip install brewer2mpl"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Requirement already satisfied (use --upgrade to upgrade): brewer2mpl in /anaconda/lib/python2.7/site-packages\r\n",
        "Cleaning up...\r\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%matplotlib inline\n",
      "from urllib import urlopen\n",
      "\n",
      "import brewer2mpl\n",
      "import pandas as pd\n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Set up some better defaults for matplotlib\n",
      "from matplotlib import rcParams\n",
      "\n",
      "#colorbrewer2 Dark2 qualitative color table\n",
      "dark2_colors = brewer2mpl.get_map('Dark2', 'Qualitative', 7).mpl_colors\n",
      "\n",
      "rcParams['figure.figsize'] = (10, 6)\n",
      "rcParams['figure.dpi'] = 150\n",
      "rcParams['axes.color_cycle'] = dark2_colors\n",
      "rcParams['lines.linewidth'] = 2\n",
      "rcParams['axes.facecolor'] = 'white'\n",
      "rcParams['font.size'] = 14\n",
      "rcParams['patch.edgecolor'] = 'white'\n",
      "rcParams['patch.facecolor'] = dark2_colors[0]\n",
      "rcParams['font.family'] = 'StixGeneral'\n",
      "\n",
      "\n",
      "def remove_border(axes=None, top=False, right=False, left=True, bottom=True):\n",
      "    \"\"\"\n",
      "    Minimize chartjunk by stripping out unnecessary plot borders and axis ticks\n",
      "    \n",
      "    The top/right/left/bottom keywords toggle whether the corresponding plot border is drawn\n",
      "    \"\"\"\n",
      "    ax = axes or plt.gca()\n",
      "    ax.spines['top'].set_visible(top)\n",
      "    ax.spines['right'].set_visible(right)\n",
      "    ax.spines['left'].set_visible(left)\n",
      "    ax.spines['bottom'].set_visible(bottom)\n",
      "    \n",
      "    #turn off all ticks\n",
      "    ax.yaxis.set_ticks_position('none')\n",
      "    ax.xaxis.set_ticks_position('none')\n",
      "    \n",
      "    #now re-enable visibles\n",
      "    if top:\n",
      "        ax.xaxis.tick_top()\n",
      "    if bottom:\n",
      "        ax.xaxis.tick_bottom()\n",
      "    if left:\n",
      "        ax.yaxis.tick_left()\n",
      "    if right:\n",
      "        ax.yaxis.tick_right()\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 26
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Pass the path directly so pandas opens and closes the file itself\n",
      "order = pd.read_csv('OrderProportions.csv')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 29
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "order"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<pre>\n",
        "&lt;class 'pandas.core.frame.DataFrame'&gt;\n",
        "Int64Index: 3 entries, 0 to 2\n",
        "Columns: 110 entries, DietType to o__.11\n",
        "dtypes: float64(109), object(1)\n",
        "</pre>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 30,
       "text": [
        "<class 'pandas.core.frame.DataFrame'>\n",
        "Int64Index: 3 entries, 0 to 2\n",
        "Columns: 110 entries, DietType to o__.11\n",
        "dtypes: float64(109), object(1)"
       ]
      }
     ],
     "prompt_number": 30
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\"\"\"\n",
      "Demo of a basic pie chart plus a few additional features.\n",
      "\n",
      "In addition to the basic pie chart, this demo shows a few optional features:\n",
      "\n",
      "    * slice labels\n",
      "    * auto-labeling the percentage\n",
      "    * offsetting a slice with \"explode\"\n",
      "    * drop-shadow\n",
      "    * custom start angle\n",
      "\n",
      "Note about the custom start angle:\n",
      "\n",
      "The default ``startangle`` is 0, which would start the \"Frogs\" slice on the\n",
      "positive x-axis. This example sets ``startangle = 90`` such that everything is\n",
      "rotated counter-clockwise by 90 degrees, and the frog slice starts on the\n",
      "positive y-axis.\n",
      "\"\"\"\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "\n",
      "# The slices will be ordered and plotted counter-clockwise.\n",
      "labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'\n",
      "sizes = [15, 30, 45, 10]\n",
      "colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']\n",
      "explode = (0, 0.1, 0, 0) # only \"explode\" the 2nd slice (i.e. 'Hogs')\n",
      "\n",
      "plt.pie(sizes, explode=explode, labels=labels, colors=colors,\n",
      "        autopct='%1.1f%%', shadow=True, startangle=90)\n",
      "# Set aspect ratio to be equal so that pie is drawn as a circle.\n",
      "plt.axis('equal')\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Export for Use with Phinch"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!biom add-metadata -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons_in_at_least_4_samples.biom -o otu_tables/open_ref_97_otu_table_no_euks_no_singletons_in_at_least_4_samples_with_metadata.biom -m MicrobesWeEat.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}