@gregcaporaso
Created September 10, 2012 20:06
IPython notebooks corresponding to Reagan et al, ISME Journal 2012
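The notebook that follows compares trees with an `RFD` function built on PyCogent tree objects. As a library-free illustration of the same idea, here is a minimal sketch, not the notebook's implementation. It assumes trees are written as nested Python tuples of tip-name strings (a hypothetical representation chosen only for this example), and the names `tip_sets` and `rf_distance` are likewise invented for illustration. The sketch collects the set of tips below each internal node, excluding the root, and counts the sets that appear in only one of the two trees.

```python
def tip_sets(tree):
    """Collect the tip-name set under every internal node except the root.

    `tree` is a nested tuple whose leaves are tip-name strings,
    e.g. (("a", "b"), ("c", "d")). This representation is an assumption
    made for illustration only.
    """
    found = set()

    def walk(node):
        # A string is a tip: its only descendant is itself.
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset()
        for child in node:
            tips |= walk(child)
        found.add(tips)
        return tips

    root_tips = walk(tree)
    found.discard(root_tips)  # the full tip set is shared by every tree
    return found


def rf_distance(tree1, tree2, proportion=False):
    """Count the tip sets that occur in exactly one of the two trees."""
    sets1 = tip_sets(tree1)
    sets2 = tip_sets(tree2)
    dist = len(sets1 ^ sets2)  # symmetric difference
    if proportion:
        total = len(sets1) + len(sets2)
        # Guard added for this sketch: trees with no internal clades.
        return dist / float(total) if total else 0.0
    return dist


balanced = (("a", "b"), ("c", "d"))
swapped = (("a", "c"), ("b", "d"))
print(rf_distance(balanced, balanced))
print(rf_distance(balanced, swapped))
print(rf_distance(balanced, swapped, proportion=True))
```

With `proportion=True` the count is divided by the total number of tip sets in both trees, so the value lies between 0 (identical groupings) and 1 (no grouping in common).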
{
"metadata": {
"name": "compare Pearson distances to Robinson-Foulds distances"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "In this notebook we perform the same calculations as in the \"complete\" analysis, but use Robinson-Foulds distance rather than Pearson tip-to-tip distances, to confirm that our conclusions are robust to the choice of distance metric.\n\nTo avoid wasted compute time, we reuse the trees that were computed in the \"complete\" notebook. Note that **this notebook will not work** if you have not previously run the \"complete\" notebook on the same instance.\n\nWe begin by defining the Robinson-Foulds (RF) distance as the `RFD` function, along with a `shear` helper that reduces a tree to a given set of tip names."
},
{
"cell_type": "code",
"collapsed": false,
"input": "def shear(tree, names):\n    \"\"\"Lop off tips until the tree just has the desired tip names\"\"\"\n\n    tcopy = tree.deepcopy()\n    all_tips = set([n.Name for n in tcopy.tips()])\n    ids = set(names)\n\n    if not ids.issubset(all_tips):\n        raise ValueError(\"ids are not a subset of the tree!\")\n\n    # keep removing unwanted tips until only the requested tips remain\n    while len(tcopy.tips()) != len(ids):\n        for n in tcopy.tips():\n            if n.Name not in ids:\n                n.Parent.removeNode(n)\n    tcopy.prune()\n\n    return tcopy\n\ndef RFD(tree1, tree2, proportion = False):\n    \"\"\"Calculates the Robinson and Foulds symmetric difference\n\n    Implementation based off of code by Julia Goodrich\n    \"\"\"\n    t1names = tree1.getTipNames()\n    t2names = tree2.getTipNames()\n    # if the tip sets differ, shear the larger tree down to the tips of the smaller one\n    if set(t1names) != set(t2names):\n        if len(t1names) < len(t2names):\n            tree2 = shear(tree2, t1names)\n        else:\n            tree1 = shear(tree1, t2names)\n\n    tree1_sets = tree1.subsets()\n    tree2_sets = tree2.subsets()\n\n    not_in_both = tree1_sets ^ tree2_sets\n\n    total_subsets = len(tree1_sets) + len(tree2_sets)\n    dist = len(not_in_both)\n\n    if proportion:\n        dist = dist/float(total_subsets)\n    return dist",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Next we generate the list of trees that we'll work on. Notice that we're reusing the trees computed in the 'complete' notebook here."
},
{
"cell_type": "code",
"collapsed": false,
"input": "base_region_boundaries = [\n    ('v2', 136, 1868), #27f-338r\n    ('v2.v3', 136, 2232),\n    ('v2.v4', 136, 4051),\n    ('v2.v6', 136, 4932),\n    ('v2.v8', 136, 6426),\n    ('v2.v9', 136, 6791),\n    ('v3', 1916, 2232), #349f-534r\n    ('v3.v4', 1916, 4051),\n    ('v3.v6', 1916, 4932),\n    ('v3.v8', 1916, 6426),\n    ('v3.v9', 1916, 6791),\n    ('v4', 2263, 4051), #515f-806r\n    ('v4.v6', 2263, 4932),\n    ('v4.v8', 2263, 6426),\n    ('v4.v9', 2263, 6791),\n    ('v6', 4653, 4932), #967f-1048r\n    ('v6.v8', 4653, 6426),\n    ('v6.v9', 4653, 6791),\n    ('v9', 6450, 6791), #1391f-1492r\n    ('full.length', 0, 7682), # Start 150, 250, 400 base pair reads\n    ('v2.150', 136, 702),\n    ('v2.250', 136, 1752),\n    ('v2.v3.400', 136, 2036), # Skips reads that are larger than amplicon size\n    ('v3.v4.150', 1916, 2235),\n    ('v3.v4.250', 1916, 2493),\n    ('v3.v4.400', 1916, 4014),\n    ('v4.150', 2263, 3794),\n    ('v4.250', 2263, 4046),\n    ('v4.v6.400', 2263, 4574),\n    ('v6.v8.150', 4653, 5085),\n    ('v6.v8.250', 4653, 5903),\n    ('v6.v8.400', 4653, 6419)\n]\npercentages = range(76,98,3)\n\nlabels = []\ntrees = []\nfrom os.path import exists\nfor p in percentages:\n    for rb in base_region_boundaries:\n        tree_path = '/home/ubuntu/data/gg_%s_otus_4feb2011_aligned_%s_pfiltered.tre' % (p,rb[0])\n        if exists(tree_path):\n            labels.append(\"%i.%s\" % (p,rb[0]))\n            trees.append(tree_path)\nprint len(trees)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "256\n"
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Set up the parallel IPython environment and the parallel map function."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from IPython import parallel\nrc = parallel.Client(packer='pickle')\nview = rc.load_balanced_view()\nprint \"We have %i engines available\" % len(rc.ids)\n# skip cached results (for faster debugging)\nrc[:]['skip_cache'] = True\n\nimport socket\nar = rc[:].apply_async(socket.gethostname)\nmapping = ar.get_dict()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "We have 111 engines available\n"
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "# group engine ids by host name, keeping at most per_node engines on each host\nrev_map = {}\n\nper_node = 4\nfor eid, node in mapping.iteritems():\n    if node not in rev_map:\n        rev_map[node] = []\n    if len(rev_map[node]) < per_node:\n        rev_map[node].append(eid)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "from IPython.utils.data import flatten\ntargets = flatten(rev_map.values())\nlen(targets)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 5,
"text": "56"
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "dv = rc[targets]\ndv.push(dict(shear=shear, RFD=RFD, trees=trees), block=False)\nload_ar = dv.execute(\n\"\"\"\nfrom cogent.parse.tree import DndParser\nts = [DndParser(open(tree_fp)) for tree_fp in trees]\n\"\"\", block=False)\nload_ar",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 6,
"text": "<AsyncResult: execute>"
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": "def compute_row(i, ts):\n    \"\"\"function to compute a whole row with RFD\"\"\"\n    import numpy\n    row = []\n    t1 = ts[i]\n    for t2 in ts:\n        row.append(RFD(t1, t2, True))\n    return numpy.array(row)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Initialize the calculations."
},
{
"cell_type": "code",
"collapsed": false,
"input": "ntasks = len(trees)\ntree_ref = parallel.Reference('ts')\nrf_view = rc.load_balanced_view(targets=targets)\nrow_ars = [ rf_view.apply_async(compute_row, i, tree_ref) for i in range(ntasks) ]",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Poll for completion of the jobs, providing updates each time another job has completed. When the computation has completed, write the distance matrix to file."
},
{
"cell_type": "code",
"collapsed": false,
"input": "import sys\nimport time\nimport numpy\n\nready = [ ar.ready() for ar in row_ars ]\nwhile not all(ready):\n    ready = [ ar.ready() for ar in row_ars ]\n    sys.stdout.write('\\r%3i / %3i' % (sum(ready), len(ready)))\n    sys.stdout.flush()\n    time.sleep(5)\n\nrfrows = [ ar.get() for ar in row_ars ]\n\nrf_dist_mat = numpy.array(rfrows)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 0 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 13 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 53 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 56 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 62 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 64 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 80 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 96 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r 97 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r111 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r112 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r113 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r118 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r122 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r127 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r128 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r130 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r143 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r152 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r156 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r160 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r165 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r168 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r173 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r176 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r179 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r180 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r184 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r185 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r187 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r192 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r193 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r194 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r196 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r199 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r204 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r213 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r216 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r218 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r224 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r225 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r226 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r227 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r229 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r232 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r233 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r236 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r237 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r238 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r239 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r240 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r243 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r245 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r247 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r248 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r250 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r251 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r252 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r253 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r255 / 256"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r256 / 256"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
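"input": "from qiime.format import format_distance_matrix\n\nrf_parallel_fp = '/home/ubuntu/data/rf_distance_matrix.txt'\n# Write the RF distance matrix (labels and rf_dist_mat were computed earlier in this notebook).\n# The with-block closes the file even if the write fails.\nwith open(rf_parallel_fp, 'w') as rf_parallel_f:\n    rf_parallel_f.write(format_distance_matrix(labels, rf_dist_mat))\n",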
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We now have a Robinson-Foulds distance matrix that we can compare against our original Pearson distance matrix to confirm that our conclusions are robust to the choice of distance metric.\n\nWe'll begin by comparing the two matrices with the Mantel test."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!compare_distance_matrices.py -i /home/ubuntu/data/distance_matrix_complete.txt,/home/ubuntu/data/rf_distance_matrix.txt -o /home/ubuntu/data/pearson_v_rf --method mantel -n 1000\n!cat /home/ubuntu/data/pearson_v_rf/mantel_results.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "# Number of entries refers to the number of rows (or cols) retained in each\r\n# distance matrix after filtering the distance matrices to include only those\r\n# samples that were in both distance matrices. p-value contains the correct\r\n# number of significant digits.\r\nDM1\tDM2\tNumber of entries\tMantel r statistic\tp-value\tNumber of permutations\tTail type\r\n/home/ubuntu/data/distance_matrix_complete.txt\t/home/ubuntu/data/rf_distance_matrix.txt\t256\t0.76938\t0.001\t1000\ttwo sided\r\n"
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": "The Mantel test gives a significant p-value (p = 0.001, Mantel r = 0.76938, 1000 permutations), telling us that the Pearson distances are significantly correlated with the Robinson-Foulds distances.\n\nBecause we based our original conclusions on our interpretation of a Principal Coordinates Analysis (PCoA) plot, it's also useful to perform a Procrustes analysis. Procrustes analysis superimposes the two sets of principal coordinates and reports the sum of squared distances (M^2) between paired points. An M^2 that is low relative to random permutations suggests that we would be likely to derive the same conclusion from the PCoA plot generated from Robinson-Foulds distances."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!principal_coordinates.py -i /home/ubuntu/data/rf_distance_matrix.txt -o /home/ubuntu/data/rf_pc.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": "!transform_coordinate_matrices.py -i /home/ubuntu/data/rf_pc.txt,/home/ubuntu/data/pc.txt -o /home/ubuntu/data/pearson_v_rf/ -r 1000",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": "!cat /home/ubuntu/data/pearson_v_rf/rf_pc_pc_procrustes_results.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "FP1 FP2 Included_dimensions MC_p_value Count_better M^2\r\n/home/ubuntu/data/rf_pc.txt /home/ubuntu/data/pc.txt 3 0.000 0 0.686"
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": "The Procrustes test also gives a significant p-value (p < 0.001; M^2 = 0.686, and none of the 1000 Monte Carlo permutations gave a better fit). We can also visualize the Procrustes plot in three dimensions."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!compare_3d_plots.py -i /home/ubuntu/data/pearson_v_rf/pc1_transformed.txt,/home/ubuntu/data/pearson_v_rf/pc2_transformed.txt -o /home/ubuntu/data/pearson_v_rf/plots/ -m tree_metadata.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We can view the resulting Procrustes plot <a href=\"/files/data/pearson_v_rf/plots/pc1_transformed_3D_PCoA_plots.html\" target=\"_blank\">here</a>.\n\n**NOTE**: The above link is not static. To view the plot, you must run the notebook."
}
],
"metadata": {}
}
]
}
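The notebook above compares the Pearson and Robinson-Foulds distance matrices with a Mantel test. As a rough illustration of what that test computes, here is a minimal pure-Python sketch. It is not QIIME's implementation: the function names (`upper_triangle`, `pearson`, `mantel`), the seeding, and the `(count + 1) / (permutations + 1)` p-value convention are all assumptions made for this example, and it targets Python 3 rather than the Python 2 used in the notebook.

```python
import math
import random


def upper_triangle(dm):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(dm)
    return [dm[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(dm1, dm2, permutations=1000, seed=0):
    """Two-sided Mantel test (illustrative sketch); returns (r, p_value)."""
    rng = random.Random(seed)
    n = len(dm1)
    flat2 = upper_triangle(dm2)
    observed = pearson(upper_triangle(dm1), flat2)
    at_least_as_extreme = 0
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of dm1 together, keeping dm2 fixed.
        permuted = [[dm1[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if abs(pearson(upper_triangle(permuted), flat2)) >= abs(observed):
            at_least_as_extreme += 1
    # Assumed convention: count the observed arrangement so p is never zero.
    p_value = (at_least_as_extreme + 1) / (permutations + 1)
    return observed, p_value


# Toy example: distances between four points on a line, and the same
# distances doubled, which should be perfectly correlated.
points = [0.0, 1.0, 3.0, 7.0]
dm_a = [[abs(p - q) for q in points] for p in points]
dm_b = [[2.0 * d for d in row] for row in dm_a]
r, p = mantel(dm_a, dm_b, permutations=200)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, so shuffling individual cells would not give a valid null distribution.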