Skip to content

Instantly share code, notes, and snippets.

@jennomics
Created February 6, 2014 19:28
Show Gist options
  • Save jennomics/8850954 to your computer and use it in GitHub Desktop.
Save jennomics/8850954 to your computer and use it in GitHub Desktop.
Microbes We Eat notebook
{
"metadata": {
"name": "MicrobesWeEat-Copy0"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Before launching this ipython notebook, I typed the macqiime command to configure the shell. I'm using macqiime 1.8.0\nhttp://www.wernerlab.org/software/macqiime/macqiime-installation </p></p>\n\nThis notebook is for the analysis of 15 16S PCR libraries that were produced from DNA extracted from meals, collected by blending a plate of food in the blender. The meals were prepared to be typical of three diet types (Average American, USDA-recommended, and Vegan)"
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "I copied the stuff below from a QIIME/iPython notebook tutorial: http://nbviewer.ipython.org/github/qiime/qiime/blob/1.8.0/examples/ipynb/illumina_overview_tutorial.ipynb</p></p>I'm not sure what all of it does...\n"
},
{
"cell_type": "code",
"collapsed": false,
"input": "from os import chdir, mkdir\nfrom os.path import join\n#the following are only available in the current development branch of IPython\nfrom IPython.display import FileLinks, FileLink",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "At the time I started this project, the in-house demultiplexing script that I was using returned all of the sequences in the wrong orientation, so I start by reverse-complementing them. This is something that I figured out by trial and error. </p></p>\nNote: If all of the sequences are in the wrong orientation, then I get zero OTUs found when doing the OTU picking</p></p>\nAlso note: you can enable reverse strand matching during the otu-picking step to avoid this problem. I chose to do it this way because the otu-picking runs faster without the reverse strand matching enabled"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!adjust_seq_orientation.py -i MicrobesWeEat.faa -o MicrobesWeEat.fasta -r",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Substitute the file path for your files, and then there is no need to change any of the code in the rest of the notebook.</p></p>\nThe version of macqiime that I'm using does not install the greengenes 99% cutoff OTUs and taxonomy, so did that manually as per the instructions on the MacQIIME Installation site. I just substituted the gg_13_8_otus folder that has all of the otu cutoffs for the one included in macqiime/greengenes/ \n"
},
{
"cell_type": "code",
"collapsed": false,
"input": "sequence_file = \"/Users/Jenna/Dropbox/Projects/MicrobesWeEat/MicrobesWeEat.fasta\"\nnon_chimeric_sequence_file = \"/Users/Jenna/Dropbox/Projects/MicrobesWeEat/non_chimeric_sequences.fasta\"\nmapping_file = \"/Users/Jenna/Dropbox/Projects/MicrobesWeEat/MicrobesWeEat.txt\"\notu_base = \"/macqiime/greengenes/gg_13_8_otus/\"\nreference_seqs_99 = join(otu_base,\"/macqiime/greengenes/gg_13_8_otus/rep_set/99_otus.fasta\")\nreference_tree_99 = join(otu_base,\"/macqiime/greengenes/gg_13_8_otus/trees/99_otus.tree\")\nreference_tax_99 = join(otu_base,\"/macqiime/greengenes/gg_13_8_otus/taxonomy/99_otu_taxonomy.txt\")\nreference_seqs_97 = join(otu_base,\"/macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta\")\nreference_tree_97 = join(otu_base,\"/macqiime/greengenes/gg_13_8_otus/trees/97_otus.tree\")\nreference_tax_97 = join(otu_base,\"/macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt\")\n",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Make sure mapping file is good to go. This also provides a quick check for macqiime."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!check_id_map.py -m $mapping_file",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "No errors or warnings were found in mapping file.\r\n"
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Next task: check for chimeric sequences. </p></p>\nI kept getting this error when trying to run the chimera-checking script below:</p></p>\n\ncogent.app.util.ApplicationNotFoundError: Cannot find usearch61. Is it installed? Is it in your path?</p></p>\n\nthe anwer was yes, it's installed and in my path</p></p>\n\nthe macqiime installation instructions recommend against doing this, but the only way I could get the chimera-checking script to find usearch61 was to copy it into /macqiime/bin"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!identify_chimeric_seqs.py -i $sequence_file -m usearch61 -o usearch_chimera_detection/ -r $reference_seqs_97",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "FileLinks('usearch_chimera_detection/')",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "usearch_chimera_detection/<br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/chimeras.txt' target='_blank'>chimeras.txt</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/identify_chimeric_seqs.log' target='_blank'>identify_chimeric_seqs.log</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_denovo.log' target='_blank'>MicrobesWeEat.fasta_chimeras_denovo.log</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_denovo.uchime' target='_blank'>MicrobesWeEat.fasta_chimeras_denovo.uchime</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_ref.log' target='_blank'>MicrobesWeEat.fasta_chimeras_ref.log</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_chimeras_ref.uchime' target='_blank'>MicrobesWeEat.fasta_chimeras_ref.uchime</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_consensus_fixed.fasta' target='_blank'>MicrobesWeEat.fasta_consensus_fixed.fasta</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_consensus_with_abundance.fasta' target='_blank'>MicrobesWeEat.fasta_consensus_with_abundance.fasta</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_consensus_with_abundance.uc' target='_blank'>MicrobesWeEat.fasta_consensus_with_abundance.uc</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/MicrobesWeEat.fasta_smallmem_clustered.log' target='_blank'>MicrobesWeEat.fasta_smallmem_clustered.log</a><br>\n&nbsp;&nbsp;<a href='files/usearch_chimera_detection/non_chimeras.txt' target='_blank'>non_chimeras.txt</a><br>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": "usearch_chimera_detection/\n chimeras.txt\n identify_chimeric_seqs.log\n MicrobesWeEat.fasta_chimeras_denovo.log\n MicrobesWeEat.fasta_chimeras_denovo.uchime\n MicrobesWeEat.fasta_chimeras_ref.log\n MicrobesWeEat.fasta_chimeras_ref.uchime\n MicrobesWeEat.fasta_consensus_fixed.fasta\n MicrobesWeEat.fasta_consensus_with_abundance.fasta\n MicrobesWeEat.fasta_consensus_with_abundance.uc\n MicrobesWeEat.fasta_smallmem_clustered.log\n non_chimeras.txt"
}
],
"prompt_number": 8
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Now, remove the sequences that have been flagged as chimeric."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!filter_fasta.py -f $sequence_file -o $non_chimeric_sequence_file -s usearch_chimera_detection/non_chimeras.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Pick OTUs and preprocess .biom tables"
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "There are good reasons NOT to use closed-reference otu-picking in general, but because one of the things I'd like to do is to use PICRUSt to predict a metagenome, I will start by using the 99% cutoff greengenes reference files for closed-reference picking"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!pick_closed_reference_otus.py -o greengenes_99_otus -i $non_chimeric_sequence_file -r $reference_seqs_99 -t $reference_tax_99 -a -O 2 -f",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "^CTraceback (most recent call last):\r\n File \"/macqiime/QIIME/bin/pick_closed_reference_otus.py\", line 144, in <module>\r\n main()\r\n File \"/macqiime/QIIME/bin/pick_closed_reference_otus.py\", line 140, in main\r\n status_update_callback=status_update_callback)\r\n File \"/macqiime/lib/python2.7/site-packages/qiime/workflow/upstream.py\", line 442, in run_pick_closed_reference_otus\r\n close_logger_on_success=close_logger_on_success)\r\n File \"/macqiime/lib/python2.7/site-packages/qiime/workflow/util.py\", line 108, in call_commands_serially\r\n stdout, stderr, return_value = qiime_system_call(e[1])\r\n File \"/macqiime/lib/python2.7/site-packages/qcli/util.py\", line 39, in qcli_system_call\r\n stdout, stderr = proc.communicate()\r\n File \"/macqiime/lib/python2.7/subprocess.py\", line 754, in communicate\r\n return self._communicate(input)\r\n File \"/macqiime/lib/python2.7/subprocess.py\", line 1312, in _communicate\r\n stdout, stderr = self._communicate_with_poll(input)\r\n File \"/macqiime/lib/python2.7/subprocess.py\", line 1366, in _communicate_with_poll\r\n ready = poller.poll()\r\nKeyboardInterrupt\r\n"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "!print_biom_table_summary.py -i greengenes_99_otus/otu_table.biom",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Num samples: 15\r\nNum observations: 5190\r\nTotal count: 4386818.0\r\nTable density (fraction of non-zero values): 0.1910\r\nTable md5 (unzipped): ab9cc0db56ca5826554499bbb9c9a75b\r\n\r\nCounts/sample summary:\r\n Min: 239008.0\r\n Max: 341584.0\r\n Median: 298383.0\r\n Mean: 292454.533333\r\n Std. dev.: 26006.4321784\r\n Sample Metadata Categories: None provided\r\n Observation Metadata Categories: taxonomy\r\n\r\nCounts/sample detail:\r\n VEGANbreakfast: 239008.0\r\n USDAbreakfast: 245880.0\r\n AMERICANbreakfast: 268466.0\r\n USDAlunch: 276852.0\r\n USDAsnack1: 282100.0\r\n VEGANsnack3: 291074.0\r\n VEGANsnack1: 292209.0\r\n AMERICANlunch: 298383.0\r\n USDAsnack2: 299246.0\r\n VEGANdinner: 304858.0\r\n VEGANlunch: 307036.0\r\n AMERICANsnack: 310317.0\r\n VEGANsnack2: 313699.0\r\n AMERICANdinner: 316106.0\r\n USDAdinner: 341584.0\r\n"
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": "#make a directory to hold all of the biom tables I will be generating\n!mkdir otu_tables",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": "#I know that I will need to remove at least the chloroplast and mitochondria, so I will start with that\n!filter_taxa_from_otu_table.py -i greengenes_99_otus/otu_table.biom -o otu_tables/99_otu_table_no_euks.biom -n c__Chloroplast,f__mitochondria",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "!print_biom_table_summary.py -i otu_tables/99_otu_table_no_euks.biom",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Num samples: 15\r\nNum observations: 4999\r\nTotal count: 1110191.0\r\nTable density (fraction of non-zero values): 0.1796\r\nTable md5 (unzipped): f20a55c887213e95537247b2f0ed33ec\r\n\r\nCounts/sample summary:\r\n Min: 972.0\r\n Max: 278095.0\r\n Median: 16437.0\r\n Mean: 74012.7333333\r\n Std. dev.: 91112.1524521\r\n Sample Metadata Categories: None provided\r\n Observation Metadata Categories: taxonomy\r\n\r\nCounts/sample detail:\r\n VEGANsnack2: 972.0\r\n VEGANdinner: 3309.0\r\n USDAbreakfast: 4989.0\r\n USDAdinner: 6121.0\r\n VEGANbreakfast: 7250.0\r\n AMERICANdinner: 11626.0\r\n VEGANlunch: 14103.0\r\n USDAlunch: 16437.0\r\n VEGANsnack3: 53990.0\r\n VEGANsnack1: 62030.0\r\n AMERICANlunch: 96292.0\r\n USDAsnack2: 103668.0\r\n USDAsnack1: 225591.0\r\n AMERICANbreakfast: 225718.0\r\n AMERICANsnack: 278095.0\r\n"
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": "!core_diversity_analyses.py -i otu_tables/99_otu_table_no_euks.biom -o 99_core_output -m $mapping_file -a -O 2 -c DietType -t $reference_tree_99 -e 972",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": "!core_diversity_analyses.py -i otu_tables/99_otu_table_no_euks.biom -o 99_core_output_3000min -m $mapping_file -a -O 2 -c DietType -t $reference_tree_99 -e 3000",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": "!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i otu_tables/99_otu_table_no_euks.biom -o otu_tables/99_otu_table_no_euks_normalized.biom",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": "!/Applications/picrust-1.0.0/scripts/predict_metagenomes.py -i otu_tables/99_otu_table_no_euks_normalized.biom -o otu_tables/99_otu_table_no_euks_metagenome_prediction.biom",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": "!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -i otu_tables/99_otu_table_no_euks_metagenome_prediction.biom -c KEGG_Pathways -l 3 -o MWE_predicted_metagenomes.L3.txt ",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": "!sed '1d' MWE_predicted_metagenomes.L3.txt | rev | cut -f 2- | rev > MWE_predicted_metagenomes.L3.spf",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": "!/Applications/picrust-1.0.0/scripts/predict_metagenomes.py -a NSTI.tab -i otu_tables/99_otu_table_no_euks_normalized.biom -o junk.biom",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "!more NSTI.tab",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\u001b[?1h\u001b=#Sample Metric Value\r\nUSDAdinner Weighted NSTI 0.0348534480174\r\nVEGANlunch Weighted NSTI 0.0417364423876\r\nVEGANbreakfast Weighted NSTI 0.0244078910126\r\nVEGANdinner Weighted NSTI 0.0458789754022\r\nVEGANsnack3 Weighted NSTI 0.0494569643098\r\nVEGANsnack1 Weighted NSTI 0.00563246336882\r\nAMERICANbreakfast Weighted NSTI 0.0154812858609\r\nUSDAbreakfast Weighted NSTI 0.0389871688165\r\nUSDAsnack2 Weighted NSTI 0.0430064439577\r\nAMERICANsnack Weighted NSTI 0.0523705792364\r\nUSDAlunch Weighted NSTI 0.0358553436883\r\nAMERICANdinner Weighted NSTI 0.0373244270361\r\nAMERICANlunch Weighted NSTI 0.058984148405\r\nUSDAsnack1 Weighted NSTI 0.071916534972\r\nVEGANsnack2 Weighted NSTI 0.0686223821176\r\n\r\u001b[K\u001b[?1l\u001b>"
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment