Skip to content

Instantly share code, notes, and snippets.

@gregcaporaso
Created September 10, 2012 20:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save gregcaporaso/3693491 to your computer and use it in GitHub Desktop.
Save gregcaporaso/3693491 to your computer and use it in GitHub Desktop.
IPython notebooks corresponding to Reagan et al, ISME Journal 2012
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "Cloud Demo (fast)"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 0. Connect to IPython Cluster"
},
{
"cell_type": "code",
"collapsed": false,
"input": "from IPython import parallel\nrc = parallel.Client(packer='pickle')\nview = rc.load_balanced_view()\nprint \"We have %i engines available\" % len(rc.ids)\n# skip cached results (for faster debugging)\nrc[:]['skip_cache'] = False",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "We have 16 engines available\n"
}
],
"prompt_number": 41
},
{
"cell_type": "markdown",
"metadata": {},
"source": "A utility for monitoring progress while waiting on an AsyncResult:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import sys\nfrom datetime import datetime\nfrom IPython.core.display import clear_output\n\ndef wait_on(ar):\n N = len(ar.msg_ids)\n rc = ar._client\n submitted = rc.metadata[ar.msg_ids[0]]['submitted']\n\n while not ar.ready():\n ar.wait(1)\n progress = sum([ msg_id not in rc.outstanding for msg_id in ar.msg_ids ])\n dt = (datetime.now()-submitted).total_seconds()\n clear_output()\n print \"%3i/%3i tasks finished after %4i s\" % (progress, N, dt),\n sys.stdout.flush()\n print\n print \"done\"",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 42
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 1. Slice alignments"
},
{
"cell_type": "code",
"collapsed": false,
"input": "base_region_boundaries = [\n ('v2', 136, 1868), #27f-338r\n ('v2.v3', 136, 2232),\n ('v2.v4', 136, 4051),\n ('v2.v6', 136, 4932),\n ('v2.v8', 136, 6426),\n ('v2.v9', 136, 6791),\n ('v3', 1916, 2232), #349f-534r\n ('v3.v4', 1916, 4051),\n ('v3.v6', 1916, 4932),\n ('v3.v8', 1916, 6426),\n ('v3.v9', 1916, 6791),\n ('v4', 2263, 4051), #515f-806r\n ('v4.v6', 2263, 4932),\n ('v4.v8', 2263, 6426),\n ('v4.v9', 2263, 6791),\n ('v6', 4653, 4932), #967f-1048r\n ('v6.v8', 4653, 6426),\n ('v6.v9', 4653, 6791),\n ('v9', 6450, 6791), #1391f-1492r\n ('full.length', 0, 7682), # Start 150, 250, 400 base pair reads\n ('v2.150', 136, 702),\n ('v2.250', 136, 1752),\n ('v2.v3.400', 136, 2036), # Skips reads that are larger than amplicon size\n ('v3.v4.150', 1916, 2235),\n ('v3.v4.250', 1916, 2493),\n ('v3.v4.400', 1916, 4014),\n ('v4.150', 2263, 3794),\n ('v4.250', 2263, 4046),\n ('v4.v6.400', 2263, 4574),\n ('v6.v8.150', 4653, 5085),\n ('v6.v8.250', 4653, 5903),\n ('v6.v8.400', 4653, 6419)\n]\nprint \"%i regions, which we will break up into tasks to be done in parallel\" % len(base_region_boundaries)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "32 regions, which we will break up into tasks to be done in parallel\n"
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": "base_percentages = [82]\n\nseq_file_base = '/home/ubuntu/qiime_software/gg_otus-4feb2011-release/rep_set/gg_%i_otus_4feb2011_aligned.fasta'",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 44
},
{
"cell_type": "markdown",
"metadata": {},
"source": "If we want to use multiple seq_files, that's a nested list:\n\n sub_alignments = []\n for seq_file in seq_files:\n for region_boundary in region_boundaries:\n sub_alignemnts.append(load_sub_alignment(seq_file, region_boundary)\n\nThis can actually be transformed into a flat list suitable for map with clever use of `itertools.product`:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import itertools\nlist_of_tuples = itertools.product(base_percentages, base_region_boundaries)\npercentages, region_boundaries = zip(*list_of_tuples)\nseq_files = [seq_file_base % i for i in percentages]\nlabels = [ \"%i.%s\" % (p,rb[0]) for p,rb in zip(percentages, region_boundaries) ]\n\nntasks = len(region_boundaries)\nntasks",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 45,
"text": "32"
}
],
"prompt_number": 45
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Loading data is an expensive operation. This takes the most time, of any steps"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def load_sub_alignment(seq_file, region_boundary):\n \"\"\"load subregion of data into new file\"\"\"\n from cogent import LoadSeqs\n from cogent.core.alignment import DenseAlignment\n import os\n id_, start, end = region_boundary\n base, ext = os.path.splitext(os.path.basename(seq_file))\n sub_fname = '/home/ubuntu/data/' + base + \"_%s\" % id_ + ext\n if skip_cache and os.path.exists(sub_fname):\n # skip if we've already generated it\n return sub_fname\n aln = LoadSeqs(seq_file, aligned=DenseAlignment)\n sub_alignment = aln.takePositions(range(start, end))\n sub_alignment.writeToFile(sub_fname)\n return sub_fname",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 46
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Submit the loads to be done in parallel"
},
{
"cell_type": "code",
"collapsed": true,
"input": "amr = load_amr = view.map_async(load_sub_alignment, seq_files, region_boundaries)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 47
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Submission is asynchronous, and returns immediately.\n\nNow we wait for the computations to actually finish, returning the list of filenames for the subregions.\n\n**this will take time**"
},
{
"cell_type": "code",
"collapsed": false,
"input": "wait_on(amr)\nsub_aligns = amr.get()\nsub_aligns",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " 32/ 32 tasks finished after 236 s"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\ndone\n"
},
{
"output_type": "pyout",
"prompt_number": 48,
"text": "['/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.v3.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.v4.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.v6.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.v8.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.v9.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v4.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v6.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v8.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v9.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.v6.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.v8.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.v9.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v6.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v6.v8.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v6.v9.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v9.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_full.length.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.150.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.250.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v2.v3.400.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v4.150.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v4.250.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v3.v4.400.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.150.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.250.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v4.v6.400.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v6.v8.150.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v6.v8.250.fasta',\n '/home/ubuntu/data/gg_82_otus_4feb2011_aligned_v6.v8.400.fasta']"
}
],
"prompt_number": 48
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now let's take a quick peek at the overhead of performing this compuation with IPython"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def print_parallel_stats(ar):\n \"\"\"print some performance info for a given AsyncResult\"\"\"\n ar.wait()\n serial = 0.\n times = []\n for start,stop in zip(ar.started, ar.completed):\n elapsed = (stop-start).total_seconds()\n times.append(elapsed)\n longest = max(times)\n serial = sum(times)\n finished = max(ar.received)\n submitted = min(ar.submitted)\n wall = (finished - submitted).total_seconds()\n \n bar(range(len(times)), sorted(times))\n xlim(0, ntasks)\n ylabel(\"time (s)\")\n title(\"min=%is max=%is\" % (min(times), max(times)))\n \n print \"ran %.1fs of work in %.1fs in %i tasks on %i engines\" % (serial, wall, len(ar.msg_ids), len(rc.ids))\n print \"for a speedup of %.1fx\" % (serial/wall)\n print \"longest task was %.1fs, which is the best we could hope to do.\" % (longest)\n print \"IPython overhead: %ippm\" % (1e6*(wall-longest)/longest)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 49
},
{
"cell_type": "code",
"collapsed": false,
"input": "print_parallel_stats(load_amr)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "ran 1397.5s of work in 236.3s in 32 tasks on 16 engines\nfor a speedup of 5.9x\nlongest task was 129.3s, which is the best we could hope to do.\nIPython overhead: 826950ppm\n"
},
{
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAEICAYAAAC55kg0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH4NJREFUeJzt3X1UVHX+B/D3sChqPAiZo4YjasSTA4zKQ64Po+sD6Rqk\nqdFqJXhS1MjMHnaXfo3bGrWmrroyWqfRkqNtuaeyDMncMPUYg+imIoqgiM+apAwCivn9/YHeIBiG\np5lh5r5f59xzmHvv3Plcr9w33++93zsKIYQAERHJkou9CyAiIvthCBARyRhDgIhIxhgCREQyxhAg\nIpIxhgARkYwxBKiOkpISeHh4gHcOE8kDQ4DqUKlUMJlMUCgUbbpdFxcXuLu7w8PDAx4eHnjuuefa\ndPtkX6+//jrUajU6dOiAxYsX11m2bds2DB06FN7e3hg8eDBSU1NRVVUlLb98+TJ0Oh369++P4OBg\nbNq0ydbly5qrvQsg+Th8+DD69u1r7zLICvz9/bF06VKsXbu23h8QZWVl+L//+z8MHz4cZ86cwbPP\nPgtvb2/MmTMHAKDT6VBRUYEffvgBubm5mDJlCh566CFERkbaY1dkhy0BGfDz84Ner8cjjzwCpVKJ\nV155BTdu3MDUqVPRs2dPLFiwACaTCQBQXFwMFxcX3LlzBwCg1Wrx9ttvY9y4cejZsycWLlyIa9eu\ntaiOe9v8raNHj2LSpEno3r07evTogZdeeqllOyoTGzZswNChQ6HT6fDggw9i8ODBOHz4MLZs2QK1\nWo3BgwcjMzNTWn/btm3QaDTw8vLCmDFj8NFHH0nL/v3vf6Nfv37S8c/IyEDPnj1x9erVZtX09NNP\nIyYmpsGuxPj4eIwdOxadOnWCv78/5s6diw0bNgAAqqur8emnn+K1117DAw88gJiYGIwZMwbvvfce\nAEAIgYULFyIwMBDe3t6IjIzE5cuXW/LPRmYwBGRAoVDg/fffh16vx44dO7Bu3TqMHDkSM2bMwIED\nB/DDDz/g888/N/v+tLQ0vPLKK9i/fz/27NmD//znP9Kyrl27wtvbu8HpH//4R53tDB8+HCEhIViy\nZAkuXLggzX/jjTcwcuRInDt3DidPnsTUqVPb/h/ByeTk5KBDhw44cuQIBg8ejLi4OHz55Zf45ptv\nMH/+fMydO1da193dHenp6SgtLcWiRYswf/58FBYWAgCmTZuGIUOGIDk5GVevXsWsWbPwwQcf4P77\n7wcAhIaGmj2+8+fPb1Ht+/btg7+/v/T6zp07df5AuH37No4dOwYA+Prrr3Hw4EHs3bsXpaWlWLdu\nHTp37tyizyUzBDk9Pz8/sXz5cun1mDFjxKRJk6TXS5YsEc8884wQQohTp04JhUIhfvnlFyGEEFqt\nVjz//PPSuqmpqWLatGnNrmH37t2iurpaHD16VDz11FPiiSeekJZNmjRJLFq0SFy8eLHZ25Wj9evX\nCx8fH+kY7dmzRygUCnHo0CEhhBDV1dWiS5cuori4uMH3T58+Xbz77rvS62vXrgmVSiXUarWYM2dO\nq2qbPn260Ol0ZpdnZGSIrl27itOnT0vzZs2aJf70pz+J8+fPiy+//FLcd999IiwsTAghxBdffCEG\nDhwocnJyWlUXmceWgEyEhYVJPyuVyjqvu3fvjnPnzpl9b3h4uPRzjx49Gl3XnKFDh8LV1RVBQUFY\nvnw5MjIypC6HFStWoKKiAgMGDEBMTAx27drV7O3LTXBwMFxcan59lUolAECtVgMAXF1d4ePjIx2n\nvLw8zJw5EwEBAfDy8sKWLVtw6NAhaVteXl544okncOTIEat2xe3btw/Tp0/HZ599BpVKJc1/8803\n0bdvXwwdOhTLli3DhAkToNVqAQATJkxAQkICZs6ciX79+mHp0qVmuxWpZRgCMiXa6BbQ2nf8/HZ6\n++23G33vvV9mlUqFNWvW4OLFi5g6dSri4+P5i96GFi1aBF9fX+zatQvXr1/H5MmT6xz///3vf1i/\nfj2eeuopPP/883XeGxISYvb41u5yqq2hO8sOHjyIuLg4fPjhh9IJ/p4ePXrgzTffRFFREb777jsU\nFRXhj3/8IwDgd7/7HebNm4fDhw9j27Zt0Ov1yMjIaOW/CNXGu4PIosYCo7y83OL7jx49ilu3bkGt\nVqOoqAhvvvkmxo8fjwceeAAAkJ6ejnHjxsHb2xv33Xcf3N3d26x2As6fP49u3brBy8sLW7duxdat\nWxEXFwcAqKqqwvTp05GamoqZM2di8ODB0Ov1SEpKAlDTimiK27dv4/bt2/jll19QXV2NqqoqdOzY\nES4uLjhy5AhiYmKwevVqTJgwod57jx07hgcffBDl5eVIS0vD2bNnpaDIysrC/fffj+DgYLi7u8PF\nxQUeHh5t8w9DANgSkK3af60pFIp6r5u6blNcunQJTz75JLy8vDB16lT4+/tj+fLl0vLMzEwMGDAA\nSqUS6enpeO+996SuDqqvoWPQ2DFZtmwZPvnkE6hUKmzevBmzZ8+Wlv35z39Gnz59MHv2bHTs2BHp\n6elISUlBUVFRs2qaNWsWunTpgo8//hhLlixBly5dkJ6eLn3+1atXkZiYKLUi7nVdATXHv3///hgw\nYABycnKwe/duuLrW/H168eJFTJkyBV27dkVsbCyeffZZDB8+vFm1UeMUoq36BYiIyOFY7c+thIQE\nKJXKOol/z7Jly+Di4oLS0lJp3qpVq+Dv74/g4GDs2bPHWmUREVEtVguBmTNnYvv27fXmnzlzBjt2\n7ECfPn2keZcvX0ZaWhp27twJvV6P5ORka5VFRES1WC0Ehg0bBm9v73rzFy5cWG8QUXZ2NmJiYqBS\nqTBixAgIIaQRjEREZD02vfr2xRdfwNfXF6GhoXXmG41GBAUFSa8DAgJgNBptWRoRkSzZ7BbRiooK\nvPXWW9ixY4c079416YauTTd0t0NbP9mSiEguzN0DZLOWQFFREYqLixEWFoa+ffvi7NmzGDRoEC5d\nuoSoqCgcPXpUWvfYsWOIiIhocDtCCKea3njjDbvXwP2Rz/444z5xfyxPjbFZS0CtVuPSpUvS6759\n+yI3Nxc+Pj6IjIzEyy+/jJKSEpw8eZIDQoiIbMRqLYH4+HgMGTIEBQUF6N27N9avX19nee2uHaVS\niaSkJIwaNQpz587FypUrrVUWERHVYrWWwObNmxtdfvLkyTqvX3jhBbzwwgvWKqfd+u1zVBwd96f9\nc7Z94v60jkONGFYoFBb7t4iIqK7Gzp18QAsRkYwxBIiIZIwhQEQkYwwBIiIZYwgQEckYQ4CISMYY\nAkREMsYQICKSMYYAEZGMMQSIiGSMIUBEJGMMASIiB+Xp6QOFQmF28vT0sbgNPkCOiMhB1TySv7Fz\nYs05kw+QIyKiBjEEiIhkjCFARCRjDAEiIhljCBARyRhDgIhIxhgCREQyxhAgIpIxhgARkYxZLQQS\nEhKgVCqhVquleS+//DKCgoIwcOBALFiwAJWVldKyVatWwd/fH8HBwdizZ4+1yiIiolqsFgIzZ87E\n9u3b68wbO3Ys8vLysH//fty4cQObNm0CAFy+fBlpaWnYuXMn9Ho9kpOTrVUWERHVYrUQGDZsGLy9\nvevMGzNmDFxcXODi4oJx48Zh165dAIDs7GzExMRApVJhxIgREELAZDJZqzQiIrrL1V4f/P7772PW\nrFkAAKPRiKCgIGlZQEAAjEYj/vCHP9R7n06nk37WarXQarXWLpWIyMFk3Z3qnjMbYpcQ+Nvf/gYP\nDw9MmTIFABp8ul3N0/Hqs7RDRESkvTsthk6nw+LFi82uafO7gzZs2IDMzEykp6dL86KionD06FHp\n9bFjxxAREWHr0oiIZMemIbB9+3YsXboUW7duRadOnaT5kZGRyMzMRElJCbKysuDi4gIPDw9blkZE\nJEtW6w6Kj4/Hrl278NNPP6F3795YvHgxUlNTcevWLYwePRoA8MgjjyAtLQ1KpRJJSUkYNWoUOnbs\niHXr1lmrLCIiqoXfLEZE5KD4zWJERNQqDAEiIhljCBARyRhDgIhIxhgCREQyxhAgIpIxhgARUTvi\n6ekDhULR6OTp6dNmn8dxAkRE7Yjle/+B2vf/c5wAERG1GEOAiEjGGAJERDLGECAikjGGABGRjDEE\niIhkjCFARCRjDAEiIhljCBARyRhDgIjIBiw9DqItHwXRHHxsBBGRDTTnEQ98bAQREdkEQ4CISMYY\nAkREMma1EEhISIBSqYRarZbmmUwmxMbGQqVSIS4uDuXl5dKyVatWwd/fH8HBwdizZ4+1yiIiolqs\nFgIzZ87E9u3b68zT6/VQqVQ4ceIEfH19sXbtWgDA5cuXkZaWhp07d0Kv1yM5OdlaZRERUS1WC4Fh\nw4bB29u7zjyj0YjExES4ubkhISEB2dnZAIDs7GzExMRApVJhxIgREELAZDJZqzQiIrrLptcEcnJy\nEBgYCAAIDAyE0WgEUBMCQUFB0noBAQHSMiIish5XW35Yc+7xr7n/tT6dTif9rNVqodVqW1kVEZGz\nybo71T1nNsSmIRAREYH8/HxoNBrk5+cjIiICABAVFYVvv/1WWu/YsWPSst+ytENERKS9Oy2GTqfD\n4sWLza5p0+6gqKgoGAwGVFZWwmAwIDo6GgAQGRmJzMxMlJSUICsrCy4uLvDw8LBlaUREsmS1EIiP\nj8eQIUNQUFCA3r17Y/369UhKSkJJSQkCAgJw7tw5zJkzBwCgVCqRlJSEUaNGYe7cuVi5cqW1yiIi\nolr47CAiIhvgs4OIiKjdYQgQEckYQ4CISMYYAkREMsYQICKSMYYAEZGMMQSIiGSMIUBEJGMMASIi\nGWMIEBHJGEOAiEjGGAJERDLGECAikjGGABGRjDEEiIhkjCFARCRjDAEiIhljCBARyRhDgIhIxhgC\nREQyxhAgIpIxhgARkYwxBIiIWsjT0wcKhaLRydPTx95lNsouIfD+++9jyJAhGDRoEBYsWAAAMJlM\niI2NhUqlQlxcHMrLy+1RGhFRk5lMPwMQjU4167RfNg+B0tJSvPXWW9ixYwdycnJQUFCAzMxM6PV6\nqFQqnDhxAr6+vli7dq2tSyMikh3XxhaaTCZs3rwZBw4cwPHjx6FQKPDwww9j4MCBiI+Ph4eHR7M/\nsHPnzhBC4Pr16wCAiooKdO3aFUajESkpKXBzc0NCQgJSU1NbtkdERK3k6enT6F/wHh7eKCsrtWFF\n1qMQQoiGFsybNw+5ubmYOHEigoKC0K9fPwghcPLkSeTn5+Orr77C4MGD8a9//avZH5qRkYHY2Fi4\nubkhOTkZS5YsQZ8+fXD8+HF06tQJFRUVCAoKwunTp+sWq1DATLlERG1GoVCgpjvH7BoQQjRhveas\na91tmjt3mm0JPPPMM1izZk29+RqNBgCQkpICo9FoodD6rly5gqSkJBw9ehTe3t6YMmUKvvrqqyaf\n3HU6nfSzVquFVqttdg1ERM4t6+5U95zZELMtgYZUV1fj0qVL8PX1bXFp27Ztw8aNG/Hxxx8DAPR6\nPYqLi1FYWIiUlBRoNBrk5uYiNTUVW7ZsqVssWwJEZAP2/qvdli0BixeGR4wYgbKyMty8eRPBwcGI\niYnB22+/beltZg0bNgz79+9HaWkpbt68iYyMDIwdOxZRUVEwGAyorKyEwWBAdHR0iz+DiIiaxmII\nXLt2DZ6enti8eTMef/xxHD58GJ9//nmLP9DT0xMpKSl4/PHHMXToUISFhWHkyJFISkpCSUkJAgIC\ncO7cOcyZM6fFn0FERE1jsTto+PDh2LBhAxITE7Fy5UqEhoYiNDQUhw4dslWNEnYHEVFLWbrjB/j1\nrh97d920q+6g119/HQkJCfj973+P0NBQFBUVwd/f39LbiIhswtKo3Xsjdp1hYJc1NOvCsL2xJUBE\nv2Xvv7AdZZvNbgm8+uqrOHHihNlNFxQU4NVXX7VQKBERtWdmxwmMHTsWr7zyCi5cuICHH34Yfn5+\nEEKguLgYBQUF6NmzJ5KTk21ZKxERtTGL3UHnz5/HoUOHUFhYCADw9/eHWq1Gr169bFJgbewOIqLf\nsnc3i6Ns09y5k9cEiMih2fvk6ijbbPHdQURE5LwYAkREMsYQICKSMYshcOrUKSQlJUlPDz106BD+\n/ve/W70wIiKyPoshoNPpMHHiROm1Wq3G5s2brVoUERHZhsUQKCgowPjx46XXd+7cQceOHa1aFBER\n2YbFEBg6dChyc3MBADdv3sTq1asxbtw4qxdGRPLW1GcCUetYDIEFCxYgLS0NFy9eRL9+/ZCXl8eR\nwkRkdZYe+CbHh71ZQ5MHi92+fdvuXUEcLEYkH203uMrxBna1i+8YvsdkMuGbb77Bvn37cPPmzZrN\nKhRYtWqVpbcSEdVh6Zn+957nT7ZjMQSee+45dO7cGY888gg6duxYK32IiJrn1y4ec8t5brE1iyGQ\nl5dnl28RIyIi67N4TUCv16O0tBTx8fHo2rWrNN/Hx/ZX5nlNgMixNbUPuznr2ruv3VG22eJrAl26\ndMFLL72EtLQ06aKwQqHAyZMnLb2ViIjaOYstgX79+uG7775Dnz59bFWTWWwJEDk2tgTaX0vA4jiB\nhx56CJ07d7a0GhEROSCL3UHe3t4ICwvD6NGjpWsCvEWUiMg5WAyBRx99FI8++midea29RfTGjRuY\nO3cu9u3bB1dXV6xfvx7BwcGYPn06Dh48iIEDByI9PR3u7u6t+hwiImqcXb5ectGiRejcuTP++te/\nwtXVFTdu3MC6detw5swZvPvuu3jppZfg5+eHRYsW1S2W1wSIHBqvCbS/awJmWwJTpkzBp59+CrVa\nXX+zCkWrxg58++232LdvHzp16gQA8PLygtFoREpKCtzc3JCQkIDU1NQWb5+IiJrGbEvg/Pnz6NWr\nF06fPl0vQRQKRYvvFjp79ixGjx6N6Oho5OfnY9KkSUhOTkZgYCCOHz+OTp06oaKiAkFBQTh9+nS9\nz2VLgMhxsSXQ/loCZu8O6tWrFwAgLS0Nfn5+daa0tDQLBZpXVVWFgoICTJ48GVlZWcjLy8Mnn3zS\n5JO7TqeTpqysrBbXQUTkvLIA6ADUnDMbY/GagEajwcGDB+vMGzhwIA4cONDi8oKCgpCfnw8AyMjI\nwEcffYRbt24hJSUFGo0Gubm5SE1NxZYtW+oWy5YAkUNjS8CBWgJ6vR5qtRrHjx+HWq2Wpj59+mD0\n6NEWCmycv78/srOzcefOHWzbtg2jR49GVFQUDAYDKisrYTAYEB0d3arPICIiy8y2BK5fv46ff/4Z\nr732Gt555x0pRZRKZasHjxUUFODpp59GVVUVRo8ejcWLF+POnTsWbxFlS4DIsbEl0P5aAna5RbSl\nGAJEjo0h0P5CwOJjI4iIyHkxBIiIZIwhQEQkYwwBIiIZYwgQEckYQ4CISMYYAkREMsYQICKSMYYA\nEZGMMQSIqNU8PX2gUCjMTp6ePvYukcyw+PWSRESWmEw/o7HHF5hMrftKWrIetgSIiGSMIUBEJGMM\nASIiGWMIEBHJGEOAiEjGGAJERDLGECAikjGGABGRjDEEiKhBlkYBcySwc+CIYSJqkKVRwDXrcCSw\no2NLgIhIxhgCREQyZpcQ+OWXX6DRaDBx4kQAgMlkQmxsLFQqFeLi4lBeXm6PsoiIZMcuIbBy5UoE\nBwdDoajpT9Tr9VCpVDhx4gR8fX2xdu1ae5RFRCQ7Ng+Bs2fP4uuvv8asWbMgRM1FJ6PRiMTERLi5\nuSEhIQHZ2dm2LouISJZsHgIvvvgili5dCheXXz86JycHgYGBAIDAwEAYjUZbl0VEJEs2vUX0q6++\nQvfu3aHRaJCVlSXNv9ciaAqdTif9rNVqodVq265AIiKnkHV3qnvObIhCNOcM3Ep/+ctfsHHjRri6\nuqKqqgplZWWYNGkSKioqkJKSAo1Gg9zcXKSmpmLLli31i1UomhUYRFSfp6fP3TEADfPw8EZZWend\na3aWft9qfictr9u89QC0+Tbtsz/tZ5vmzp027Q566623cObMGZw6dQoff/wxRo0ahY0bNyIqKgoG\ngwGVlZUwGAyIjo62ZVlEsvLrILCGp8YCgpyPXccJ3Ls7KCkpCSUlJQgICMC5c+cwZ84ce5ZFRCQb\nNu0Oai12BxG1nr27JdgdZJ9ttovuICKyDj7sjVqKIUBkY805YVta9956lvr52ddP5jAEiNqINU7Y\nvIhL1sZHSRO1EUuPXuZjl6k9YkuAiEjGGAJERDLGECAikjGGAFEjeOslOTteGCZqBL9nl5wdWwJE\nRDLGECCnYY1BWETOjiFAbao5J9emrstBWETWwwfIUZviA8K4zbZYD+D/j7beJh8gR0RE9TAEiIhk\njCFATcILqUTOieMEqEn4cDQi58SWgIxxNCwRsSUgYxwNS0RsCRARyRhDgIhIxhgCREQyxhAgIpIx\nm4fAmTNnMHLkSISEhECr1WLTpk0AAJPJhNjYWKhUKsTFxaG8vNzWpRERyY7NQ6BDhw5YsWIF8vLy\nsGXLFqSkpMBkMkGv10OlUuHEiRPw9fXF2rVrbV2a0+DALiJqKpuHQI8ePRAeHg4A6NatG0JCQpCT\nkwOj0YjExES4ubkhISEB2dnZti7NafAJmUTUVHa9JlBYWIi8vDxERkYiJycHgYGBAIDAwEAYjUZ7\nlkZEJAt2GyxmMpkwbdo0rFixAu7u7k1+RLROp5N+1mq10Gq11imQiMhhZd2d6p4zGyTs4NatW2LM\nmDFixYoV0rxJkyaJAwcOCCGE2L9/v5g8eXK999mpXIcDQACikQlNXK856zZvPWts0z77w23y/4dj\nbNMcm3cHCSGQmJiIAQMGYMGCBdL8qKgoGAwGVFZWwmAwIDo62talERHJjs1DYO/evUhPT8d///tf\naDQaaDQabN++HUlJSSgpKUFAQADOnTuHOXPm2Lo0IiLZ4ddLOiF7f41dU9Zr2zod76v+uM3G1wP4\n/6Ott2nu3MkRw0REMsYQICKSMYYAEZGMMQSIiGSMIUBEJGMMASIiGWMIEBHJGEOAiEjGGAJERDLG\nECAikjGGgIOw9G1h/MYwImoJhoCdNfWrIC19Wxj4jWFE1AJ2+1IZqvHryd3ccoXtiiEi2WFLgIhI\nxhgCVsD+eyJyFOwOsgJLXTw167Cbh4jsjy2BZmjqRVwiIkfhtCHQ1BN2c7puLN2hw7tziMjROFx3\nUM3XqZnn4eGNsrLSJt91w64bIpIzhwsBnrCJiNqO03YHERGRZQwBIiIZYwgQEclYuwqB77//HkFB\nQfD398fq1avtXQ4RkdNrVyHwwgsvYN26dfj222+xZs0a/PTTT/YuiYjIqbWbELh+/ToAYPjw4ejT\npw/Gjh2L7OxsO1dFROTc2k0I5OTkIDAwUHodHByMH374wY4VERE5PwccJ2B5HMCvA8oaX7ep63Gb\nbbvNugP+2nqbzvFvJOdt8v+HtbbZsHYTAhEREXj55Zel13l5eYiJiamzjhCNDxQjIqLmaTfdQV5e\nXgBq7hAqLi7Gjh07EBUVZeeqiIicW7tpCQDAP//5T8yePRvV1dVITk5Gt27d7F0SEZFTazctAQAY\nMWIE8vPzUVhYiOTkZGm+M44f8PPzQ2hoKDQaDSIjI+1dTrMlJCRAqVRCrVZL80wmE2JjY6FSqRAX\nF4fy8nI7Vth8De2TTqeDr68vNBoNNBoNtm/fbscKm+fMmTMYOXIkQkJCoNVqsWnTJgCOe5zM7Y+j\nHqOqqipERUUhPDwc0dHRWLFiBQA7HB/hAMLDw8WuXbtEcXGxCAgIEFeuXLF3Sa3m5+cnrl69au8y\nWuz7778XBw4cEAMGDJDmvfPOO2L+/PmiqqpKzJs3TyxdutSOFTZfQ/uk0+nEsmXL7FhVy124cEEc\nPHhQCCHElStXRN++fUVZWZnDHidz++PIx+jGjRtCCCGqqqpESEiIKCgosPnxaVctgYY48/gB4cAX\nuocNGwZvb+8684xGIxITE+Hm5oaEhASHO04N7RPguMepR48eCA8PBwB069YNISEhyMnJcdjjZG5/\nAMc9Rl26dAEAlJeX4/bt23Bzc7P58Wn3IeCs4wcUCgVGjRqFuLg4bN261d7ltInaxyowMBBGo9HO\nFbWN1atXIzo6Gu+88w5MJpO9y2mRwsJC5OXlITIy0imO0739uXfziKMeozt37iAsLAxKpRLz58+H\nSqWy+fFp9yHgrPbu3Ysff/wRqampWLhwIS5evGjvklrNUf8aa0xSUhJOnTqFzMxMFBUVYd26dfYu\nqdlMJhOmTZuGFStWwN3d3eGPU+39ue+++xz6GLm4uODHH39EYWEh0tLScPDgQZsfn3YfAhERETh2\n7Jj0Oi8vD9HR0XasqG307NkTABAUFITHHnsMX375pZ0rar2IiAjk5+cDAPLz8xEREWHnilqve/fu\nUCgU8PLywrx58/DZZ5/Zu6Rmqa6uxuTJkzFjxgzExsYCcOzj1ND+OPoxAmpuFBk/fjyys7Ntfnza\nfQg44/iBiooKqcl65coVZGZm1hsY54iioqJgMBhQWVkJg8HgFGF94cIFAMDt27exadMmjB8/3s4V\nNZ0QAomJiRgwYAAWLFggzXfU42Rufxz1GP3000+4du0aAODq1av45ptvEBsba/vjY9XLzm0kKytL\nBAYGiv79+4uVK1fau5xWO3nypAgLCxNhYWFi1KhR4oMPPrB3Sc325JNPip49e4qOHTsKX19fYTAY\nRFlZmXjsscdE7969RWxsrDCZTPYus1nu7VOHDh2Er6+v+OCDD8SMGTOEWq0WgwYNEi+++KJD3dG1\ne/duoVAoRFhYmAgPDxfh4eEiIyPDYY9TQ/vz9ddfO+wxOnTokNBoNCI0NFSMHTtWfPjhh0IIYfPj\noxDCwTsIiYioxdp9dxAREVkPQ4CISMYYAkREMsYQICKSMYYAEZGMMQSIiGTs/wE1FMeVL7tpDgAA\nAABJRU5ErkJggg==\n"
}
],
"prompt_number": 50
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 2. Filter hypervariable positions and mostly gapped positions"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def filter_alignment(fname):\n \"\"\"call out to subcommand, which filters the positions\"\"\"\n import os\n import subprocess\n cmd = \"filter_alignment.py -i %s -e 0.1 -g 0.8 --suppress_lane_mask_filter -o /home/ubuntu/data/\" % fname\n # RUN cmd\n subprocess.call(cmd, shell=True)\n base, ext = os.path.splitext(fname)\n filtered = base + '_pfiltered' + ext\n return filtered",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 51
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This one is quick, so we do it synchronously, but still in parallel:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "%time filtered = view.map_sync(filter_alignment, sub_aligns)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "CPU times: user 0.51 s, sys: 0.07 s, total: 0.58 s\nWall time: 5.37 s\n"
}
],
"prompt_number": 52
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 3. Build trees in parallel"
},
{
"cell_type": "code",
"collapsed": true,
"input": "def build_tree(filtered_aln_fp):\n \"\"\"build tree from a filtered alignment\"\"\"\n import os\n from cogent import LoadSeqs\n from cogent.core.alignment import DenseAlignment\n from cogent.app.fasttree import build_tree_from_alignment\n from cogent import DNA\n \n tree_fp = '%s.tre' % os.path.splitext(filtered_aln_fp)[0]\n if skip_cache and os.path.exists(tree_fp):\n # skip already done\n return tree_fp\n tree = build_tree_from_alignment(LoadSeqs(filtered_aln_fp, aligned=DenseAlignment), moltype=DNA)\n tree.writeToFile(tree_fp,with_distances=True)\n return tree_fp",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 53
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This is the other step that takes some real time."
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**this will take time**"
},
{
"cell_type": "code",
"collapsed": false,
"input": "amr = tree_amr = view.map_async(build_tree, filtered)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 54
},
{
"cell_type": "code",
"collapsed": false,
"input": "wait_on(amr)\ntrees = amr.get()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " 32/ 32 tasks finished after 200 s"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\ndone\n"
}
],
"prompt_number": 55
},
{
"cell_type": "code",
"collapsed": false,
"input": "print_parallel_stats(tree_amr)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "ran 1299.5s of work in 200.1s in 32 tasks on 16 engines\nfor a speedup of 6.5x\nlongest task was 100.7s, which is the best we could hope to do.\nIPython overhead: 986627ppm\n"
},
{
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAEICAYAAAC55kg0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHhdJREFUeJzt3XlQFGf+BvCnWQNqOERZERcHZBcZQISJwhCNcXQ9iCmF\naEw06xHBWiUmrppNoilTwV9lg5ZluerKaBJHjZZms1aSNRglahZz7MrgsaJIBLzwiOtBlEFARd7f\nH8SJhGNAmOkZ3udT1VVMzzs936a1H7rft7sVIYQAERFJyU3tAoiISD0MASIiiTEEiIgkxhAgIpIY\nQ4CISGIMASIiiTEEqI6SkhJ4eXmBI4eJ5MAQoDo0Gg0sFgsURWmzZX7zzTfw8vKqM7m5ueHTTz9t\ns+8gdb311luIiorCI488gsWLF9d7//PPP0ffvn3h5+eHyZMno6KiwvpedXU15syZg4CAAPTp0wcf\nfPCBI0uXHkOA7G7w4MGwWCzWKTMzE56enkhISFC7NGojoaGhWLZsGZ5++ul6f0CcOXMGL7zwAmbO\nnIl///vfOH/+PF555RXr++np6fjqq6+QmZmJxYsXY8GCBfjmm28cvQrSYghIIDg4GEajEY8//jj8\n/f3x+uuv49atW3juuecQEBCAuXPnwmKxAADOnj0LNzc31NTUAAAMBgOWLFmCUaNGISAgAPPnz8eN\nGzdaVc/GjRsxYcIEdOrUCQBw4sQJjBs3Dt27d0ePHj3w6quvtm6F27mNGzfiiSeeQFpaGn7zm99g\nwIABOHbsGLZv346oqCgMGDAAWVlZ1vY7d+6ETqeDj48PRowYgQ8//ND63t///neEhIRYt/+uXbsQ\nEBCA69evt6imqVOnIiEhocFTiZs2bUJ8fDxeeeUV9OnTB0uWLMG2bdtQWVkJAFi/fj0WLlyI/v37\nY9KkSRg/frz1aEAIgfnz50Or1cLX1xdxcXG4cuXKQ/3eqBGC2r3g4GCh0+nEkSNHxNGjR4W3t7eI\njY0VO3bsEJcuXRJ6vV58+OGHQgghzpw5IxRFEffu3RNCCDFkyBDRq1cvsXfvXnHhwgURGxsrPvjg\nA+uyfXx8RJcuXRqcli5dWq+W8vJy4eXlJfbv32+d9+yzz4pVq1aJO3fuiFu3bokDBw7Y+Tfi2jZs\n2CDc3d3FO++8I0pLS8XMmTNFSEiImDp1qrh06ZLYsGGDCAkJsbbPzs4Wx48fF9XV1WL37t3Cy8tL\nFBUVWd//wx/+IF588UVx7do10bNnT7Fz507re1FRUY1u39mzZ9erbfLkySItLa3OvIkTJ4p58+ZZ\nX9+8eVMoiiIKCgpEVVWVUBRFHD161Pr+6tWrRVxcnBBCiMzMTGEwGMS1a9dETU2NOHz4sCgrK2v9\nL5GsOqgdQuQYU6ZMQUxMDABAr9fDy8sLY8aMAQCMHTsW+/btw5QpU+p9TlEUJCUl4fe//z0AYNy4\ncdizZw9SUlIAoMVHBZ988gl+/etf48knn7TOq6mpQUlJCUpLS+Hv7w+9Xv9Q6ygTT09PLFy4EG5u\nbpgyZQree+89fPbZZwgICMDkyZMxe/ZsnDt3DkFBQRgyZIj1c6NGjUJiYiL++c9/Wo+41qxZg379\n+mHo0KEYO3YsRo8ebW2fl5fX6lpLS0sxcOBA62tvb2/4+fnh+vXr8Pb2BgD07t3b+n7v3r2tRyL3\n7t1DWVkZzpw5g27dukGn07W6HqqLp4MkER0dbf3Z39+/zuvu3bvj4sWLjX72fngAQI8ePZpsa8um\nTZswderUOvNWrFiBiooK9O3bFwkJCdi/f/9DL18WERERcHOr/e/r7+8PAIiKigIAdOjQAV27drVu\np/z8fEyfPh1hYWHw8fHB9u3b6+zcfXx88Oyzz+L48eN2ORXXrVs3nD592vq6rKwM165dQ7du3dCt\nWzcAtf0G950+fdo6/+mnn0ZycjKmT5+OkJAQLFu2zHqqktoGQ0BSoo2GgHp6etYb+XN/WrJkSZ22\n58+fx/79++uFgEajwZo1a3D58mU899xzmDRpEv+jt6E///nPCAwMxP79+3Hz5k2MHz++zvb/73//\niw0bNuCFF16o02ELAJGRkY1u35deeqnB7/tlx3BYWBiOHTtmfX38+HF07NgRQUFB8PDwQFBQUJ1Q\nOnbsGMLDwwEAv/rVrzB79mwcO3YMO3fuhNFoxK5du1r9O6Gf8XQQ2dRUYJSXlzd7OZs3b8agQYPq\nHPoDwJYtWzBq1Cj4+vri0Ucfhaen50PXSvVdunQJfn5+8PHxwY4dO7Bjxw4kJSUBAKqqqjB58mSk\np6dj+vTpGDBgAIxGI1JTUwHUHkU0R3V1Naqrq3Hv3j3cvXsXVVVVcHd3h5ubG6ZNm4bly5djzZo1\nGD58OBYsWIAXXnjBOjAgJSUFy5YtQ0REBE6ePIlPPvkEn332GQAgOzsb3bp1Q0REBDw9PeHm5gYv\nLy87/JbkxSMBST3415qiKPVeN7dtS2zevBnTpk2rNz8rKwt9+/aFv78/tmzZgvfee896qoPqa2gb\nNLVNli9fjo8//hgajQbbtm3DzJkzre8tXLgQQUFBmDlzJtzd3bFlyxYsWrQIp06dalFNM2bMQOfO\nnfHRRx/hL3/5Czp37owtW7YAqB2dtnXrVhiNRgwaNAgajQarV6+uU4PBYMDTTz+Nt99+G0uXLsUT\nTzwBALh8+TImTJiALl26IDExES+++GKd/iRqPUW01XkBIiJyOXb7cys5ORn+/v7WzioAeO211xAe\nHo7HHnsMc+fOtY4TBoBVq1YhNDQUERER+Pbbb+1VFhERPcBuITB9+nTs3r27zryRI0ciPz8fBw8e\nxK1bt7B161YAwJUrV5CRkYF9+/bBaDRizpw59iqLiIgeYLcQGDx4MHx9fevMGzFiBNzc3ODm5oZR\no0ZZhwLm5OQgISEBGo0GQ4YMgRDCegUjERHZj2q9b++//771YiWz2WwdEgbUDikzm81qlUZEJA1V\nhoj+3//9H7y8vDBhwgQADQ9BbGi0Q1ve2ZKISCaNjQFy+JHAxo0bkZWVZR0+BtTexuDEiRPW199/\n/z1iY2Mb/LwQol1Nb7/9tuo1cH3kWZ/2uE5cH9tTUxwaArt378ayZcuwY8cOdOzY0To/Li4OWVlZ\nKCkpQXZ2Ni8IISJyELudDpo0aRL279+Pa9euoVevXli8eDHS09Nx584dDB8+HADw+OOPIyMjA/7+\n/khNTcWwYcPg7u6OdevW2assIiJ6gEtdLKYois1DG1eTnZ0Ng8Ggdhlthuvj/NrbOnF9bGtq38kQ\nICJq55rad/IGLUREEmMIEBFJjCFARCQxhgARkcQYAkREEmMIEBFJjCFARCQxhgARkcQYAkREEmMI\nEBFJjCFARCQxhgARkcQYAkREEmMIEBFJjCFARCQxhgARkcQYAkREEmMIEBFJjCFARCQxhgARkcQY\nAkREEmMIEBFJjCFARCQxhgARkcQYAkREEmMIEBFJjCFARCQxu4VAcnIy/P39ERUVZZ1nsViQmJgI\njUaDpKQklJeXW99btWoVQkNDERERgW+//dZeZRER0QPsFgLTp0/H7t2768wzGo3QaDQoKipCYGAg\n1q5dCwC4cuUKMjIysG/fPhiNRsyZM8deZRER0QPsFgKDBw+Gr69vnXlmsxkpKSnw8PBAcnIycnJy\nAAA5OTlISEiARqPBkCFDIISAxWKxV2lERO2Ct3dXKIrS6OTt3dXmMhzaJ5CbmwutVgsA0Gq1MJvN\nAGpDIDw83NouLCzM+h4RETXMYvkRgGh0qn2/aR3sWeAvCSGa3VZRlAbnp6WlWX82GAwwGAytrIqI\nqL3J/mmqu89siENDIDY2FgUFBdDpdCgoKEBsbCwAQK/XY+/evdZ233//vfW9X7K1QkRErszbu6vN\nv+C9vHxRVlbaRAvDT9NipKWlYfHixY22dOjpIL1eD5PJhMrKSphMJsTHxwMA4uLikJWVhZKSEmRn\nZ8PNzQ1eXl6OLI2IyCnYOsXT3NM8zWW3EJg0aRIGDhyIwsJC9OrVCxs2bEBqaipKSkoQFhaGixcv\nYtasWQAAf39/pKamYtiwYXjppZewcuVKe5VFREQPUERLTtSrTFGUFvUrEBG5mtr+UFv7udp9oe22\nP7drbN/JK4aJiCTGECAikhhDgIhIYgwBIiKJMQSIiCTGECAicoC2uM+PPXCIKBGRA7RkOCeHiBIR\nkUMwBIiIJMYQICKSGEOAiEhiDAEiIokxBIiIJMYQICKSGEOAiEhiDAEiIokxBIiIJMYQICKSGEOA\niEhiDAEiIokxBIiIJMYQICKSGEOAiEhiDAEiIokxBIiIJMYQICKSGEOAiEhiDAEiIompEgLvv/8+\nBg4ciP79+2Pu3LkAAIvFgsTERGg0GiQlJaG8vFyN0oiIpOLwECgtLcW7776LPXv2IDc3F4WFhcjK\nyoLRaIRGo0FRURECAwOxdu1aR5dGRCQdh4dAp06dIITAzZs3UVlZiYqKCnTp0gVmsxkpKSnw8PBA\ncnIycnJyHF0aEZF0VAkBo9GI4OBg9OjRA4MGDYJer0dubi60Wi0AQKvVwmw2O7o0IqIW8fbuCkVR\nmpy8vbuqXWaTOjj6C69evYrU1FScOHECvr6+mDBhAjIzMyGEaNbn09LSrD8bDAYYDAb7FEpEZIPF\n8iOApvddFovimGLqyP5pqrvPbIgimrv3bSM7d+7E5s2b8dFHHwEAjEYjzp49i+LiYixatAg6nQ6H\nDh1Ceno6tm/fXrdYRWl2WBAR2ZuiKLAVAkDtfst22+a2e7hlNrbvdPjpoMGDB+PgwYMoLS3F7du3\nsWvXLowcORJ6vR4mkwmVlZUwmUyIj493dGlERNJxeAh4e3tj0aJFeOaZZ/DEE08gOjoaQ4cORWpq\nKkpKShAWFoaLFy9i1qxZji6NiEg6Dj8d1Bo8HUREzoSng4iIyKUxBIiIJMYQICKSGEOAiOgXbF0E\n5uwXgLUEO4aJiH5B7U5cdgwTEZFDMASIiCTGECAikhhDgIhIYgwBIpJCe7jtsz1wdBARScGVRvJw\ndBARETkEQ4CISGIMASIiiTX5eEmLxYJt27bh8OHDOHnyJBRFQZ8+ffDYY49h0qRJ8PLyclSdRERk\nB412DM+ePRuHDh3CmDFjEB4ejpCQEAghcPr0aRQUFCAzMxMDBgzA3/72N8cVy45hInpIrtSJ68iO\n4UZDwGw2Iy4urskymtOmLTEEiOhhudIO2ylCoCF3797F//73PwQGBjb3I22KIUBEv+Tt3RUWy4+N\nvu/l5YuyslKX2mE71RDRIUOGoKysDLdv30ZERAQSEhKwZMkSWx8jInKI2gAQjU5NBQQ1IwRu3LgB\nb29vbNu2Dc888wyOHTuGzz77zBG1ERGRndkMAR8fH5w+fRqbNm3C5MmToSgKKioqHFEbERHZmc0Q\neOutt5CcnIxBgwahX79+OHXqFEJDQx1RGxER2RnvHURELk3tDldXWWaLO4bfeOMNFBUVNbrowsJC\nvPHGGzYKJSIiZ9boFcMjR47E66+/jh9++AF9+vRBcHAwhBA4e/YsCgsLERAQgDlz5jiyViIiamM2\nTwddunQJeXl5KC4uBgCEhoYiKioKPXv2dEiBD+LpICL6JbVPs7jKMtvkYjG1MQSI6JfU3rm6yjL5\nPAEiIqpHlRC4desWpk2bhj59+iAiIgI5OTmwWCxITEyERqNBUlISysvL1SiNiEgqqoTA22+/DY1G\ng7y8POTl5UGr1cJoNEKj0aCoqAiBgYFYu3atGqURkZOw9UxgGZ8HbA82Q+DMmTNITU2FTqcDAOTl\n5eGdd95p1Zfu3bsXb775Jjp27IgOHTrAx8cHZrMZKSkp8PDwQHJyMnJyclr1HUTk2nhPIMewGQJp\naWkYM2aM9XVUVBS2bdv20F944cIFVFVVITU1FXq9HkuXLkVlZSVyc3Oh1WoBAFqtFmaz+aG/g4iI\nmsdmCBQWFmL06NHW1zU1NXB3d3/oL6yqqkJhYSHGjx+P7Oxs5Ofn4+OPP272qJ+0tDTrlJ2d/dB1\nEBG1X9kA0gDU7jObYnOI6GuvvYaJEydixowZOHDgAIxGIy5fvtyq20mHh4ejoKAAALBr1y58+OGH\nuHPnDhYtWgSdTodDhw4hPT0d27dvr1ssh4gSSaPthlS63nBOpxoiOnfuXGRkZODy5csICQlBfn5+\nq68UDg0NRU5ODmpqarBz504MHz4cer0eJpMJlZWVMJlMiI+Pb9V3EBGRbc2+WKy6urrVp4LuKyws\nxNSpU1FVVYXhw4dj8eLFqKmpweTJk3HkyBE89thj2LJlCzw9PesWyyMBImnwSKBtl/nQVwxbLBZ8\n+eWX+M9//oPbt2/XfkhRsGrVKhtFtj2GAJFra+6jIAGGQFsvs7F9Z6M3kLvvj3/8Izp16oTHH38c\n7u7uD3wxEVHL/Dzss7H3uW9xNJshkJ+fj7y8PEfUQkREDmbzdJDRaERpaSkmTZqELl26WOd37er4\nq/V4OojItTX39EVL2qp9msVVlvnQp4M6d+6MV199FRkZGdZOYUVRcPr0aVsfJSIiJ2fzSCAkJAT/\n+te/EBQU5KiaGsUjASLXxiMB5zsSsHmdwO9+9zt06tTJVjMiInJBNk8H+fr6Ijo6GsOHD7f2Cag1\nRJSIiNqWzRB46qmn8NRTT9WZxyGiRETtAx8vSUQOwz4B5+sTaPRIYMKECfjHP/6BqKio+otVFF47\nQETUDjR6JHDp0iX07NkT586dq5cgiqKoMlqIRwJEro1HAs53JNDo6KCePXsCADIyMhAcHFxnysjI\nsFEgERG5AptDRL/88st68/bs2WOXYoiIyLEa7RMwGo3IyMjAqVOn6vQLlJWV4fnnn3dIcUREZF+N\n9gncvHkTP/74IxYsWIClS5dazyf5+/urdvEY+wSInFNzbxHNPgHn6xPgEFEiarW23mHbY5mutMN2\nio5hIiJq/xgCREQSYwgQEUmMIUBEJDGGABGRxBgCREQSYwgQEUmMIUBEJDGGABGRxBgCREQSYwgQ\nEUmMIUBEJDFVQuDevXvQ6XQYM2YMAMBisSAxMREajQZJSUkoLy9XoywiIumoEgIrV65ERETET3fA\nq312gUajQVFREQIDA7F27Vo1yiIiko7DQ+DChQv44osvMGPGDOutTc1mM1JSUuDh4YHk5GTk5OQ4\nuiwiIik5PATmzZuHZcuWwc3t56/Ozc2FVqsFAGi1WpjNZkeXRUQkpUYfL2kPmZmZ6N69O3Q6HbKz\ns63zW/KgmLS0NOvPBoMBBoOh7QokkkBznwJmq92DbcnZZP801d1nNsShTxZ78803sXnzZnTo0AFV\nVVUoKyvDuHHjUFFRgUWLFkGn0+HQoUNIT0/H9u3b6xfLJ4sRNaglO2y1n3DFJ4ups0yneLLYu+++\ni/Pnz+PMmTP46KOPMGzYMGzevBl6vR4mkwmVlZUwmUyIj493ZFlELq82AESTk62QIDmpep3A/dFB\nqampKCkpQVhYGC5evIhZs2apWRYRkTT4oHmidsCVTkvwdJA6y3SK00FERORcGAJERBJjCBARSYwh\nQEQkMYYAEZHEGAJERBJjCBA5MW/vrlAUpdHJ27ur2iWSi3PovYOIqGV+vhK4sfcVxxVD7RKPBIja\nSHP/arfVjn/hkyPxSICojTT3r3Zb7R5sS2RvPBIgKfGvdqJaDAFqN1qyw7Z11837d9zk3TmpvePp\nIGo3eJqFqOV4JEBEJDGGABGRxBgCREQSYwgQEUmMIUBOj7dOILIfjg4ip8dbJxDZD48EiIgkxhAg\nIpIYQ4CISGIMASIiiTEEiIgkxhAgIpIYQ4BUw/H/ROrjdQKkGo7/J1IfjwSIiCTGECAikpjDQ+D8\n+fMYOnQoIiMjYTAYsHXrVgCAxWJBYmIiNBoNkpKSUF5e7ujSiIik4/AQeOSRR7BixQrk5+dj+/bt\nWLRoESwWC4xGIzQaDYqKihAYGIi1a9c6ujRqA+zsJXItDg+BHj16ICYmBgDg5+eHyMhI5Obmwmw2\nIyUlBR4eHkhOTkZOTo6jS2s37PEQ9eYus7nP7iUi56AIIZp+KKsdFRcXY+TIkcjLy0NkZCROnjyJ\njh07oqKiAuHh4Th37lyd9oqiQMVyXYaiKGj6Wbu1v0fb7VrStmXt2rZONdeHy+S/D9dYZmP7TtWG\niFosFjz//PNYsWIFPD09m71zT0tLs/5sMBhgMBjsUyARkcvK/mmqu89skFDBnTt3xIgRI8SKFSus\n88aNGycOHz4shBDi4MGDYvz48fU+p1K5LgeAAEQTE5rZriVtW9bOHstUZ324TP77cI1lNsbhfQJC\nCKSkpKBv376YO3eudb5er4fJZEJlZSVMJhPi4+MdXRoRkXQcHgLfffcdtmzZgq+++go6nQ46nQ67\nd+9GamoqSkpKEBYWhosXL2LWrFmOLo2ISDqqdgy3FDuGm0ftDqjmtGvbOl2vk47LbLodwH8fbb3M\nxvadvGKYiEhiDAEiIokxBIiIJMYQICKSGEOAiEhiDAE7sMc9eVqyTCKi5pI+BNS4iRoeuJFac2+4\n1pJlEhE1l/SPl7T1iMPaNkqz2vJxiETkaqQ/EiAikhlDgIhIYgwBIiKJMQSIiCTGECAikhhDgIhI\nYgwBIiKJMQSIiCTGECAikhhDgIhIYi4XAm19nx8iIpm54L2DeJ8fIqK24nJHAkRE1HYYAkREEmMI\nEBFJjCFARCQxhgARkcQYAkREEmMIEBFJjCFARCQxpwqBr7/+GuHh4QgNDcXq1avVLoeIqN1zqhD4\n05/+hHXr1mHv3r1Ys2YNrl27pnZJRETtmtOEwM2bNwEATz75JIKCgjBy5Ejk5OSoXBURUfvmNCGQ\nm5sLrVZrfR0REYEDBw6oWBERUfvngjeQs33jN0W536bpts1tx2W27TJ/bmePZbaP35HMy+S/D3st\ns2FOEwKxsbF47bXXrK/z8/ORkJBQp40QTd9BlIiIWsZpTgf5+PgAqB0hdPbsWezZswd6vV7lqoiI\n2jenORIAgL/+9a+YOXMm7t69izlz5sDPz0/tkoiI2jWnORIAgCFDhqCgoADFxcWYM2eOdX57vH4g\nODgY/fr1g06nQ1xcnNrltFhycjL8/f0RFRVlnWexWJCYmAiNRoOkpCSUl5erWGHLNbROaWlpCAwM\nhE6ng06nw+7du1WssGXOnz+PoUOHIjIyEgaDAVu3bgXgutupsfVx1W1UVVUFvV6PmJgYxMfHY8WK\nFQBU2D7CBcTExIj9+/eLs2fPirCwMHH16lW1S2q14OBgcf36dbXLeGhff/21OHz4sOjbt6913tKl\nS8XLL78sqqqqxOzZs8WyZctUrLDlGlqntLQ0sXz5chWreng//PCDOHLkiBBCiKtXr4revXuLsrIy\nl91Oja2PK2+jW7duCSGEqKqqEpGRkaKwsNDh28epjgQa0p6vHxAu3NE9ePBg+Pr61plnNpuRkpIC\nDw8PJCcnu9x2amidANfdTj169EBMTAwAwM/PD5GRkcjNzXXZ7dTY+gCuu406d+4MACgvL0d1dTU8\nPDwcvn2cPgTa6/UDiqJg2LBhSEpKwo4dO9Qup008uK20Wi3MZrPKFbWN1atXIz4+HkuXLoXFYlG7\nnIdSXFyM/Px8xMXFtYvtdH997g8ecdVtVFNTg+joaPj7++Pll1+GRqNx+PZx+hBor7777jscPXoU\n6enpmD9/Pi5fvqx2Sa3mqn+NNSU1NRVnzpxBVlYWTp06hXXr1qldUotZLBY8//zzWLFiBTw9PV1+\nOz24Po8++qhLbyM3NzccPXoUxcXFyMjIwJEjRxy+fZw+BGJjY/H9999bX+fn5yM+Pl7FitpGQEAA\nACA8PBxjx47F559/rnJFrRcbG4uCggIAQEFBAWJjY1WuqPW6d+8ORVHg4+OD2bNn49NPP1W7pBa5\ne/cuxo8fjylTpiAxMRGAa2+nhtbH1bcRUDtQZPTo0cjJyXH49nH6EGiP1w9UVFRYD1mvXr2KrKys\nehfGuSK9Xg+TyYTKykqYTKZ2EdY//PADAKC6uhpbt27F6NGjVa6o+YQQSElJQd++fTF37lzrfFfd\nTo2tj6tuo2vXruHGjRsAgOvXr+PLL79EYmKi47ePXbud20h2drbQarXit7/9rVi5cqXa5bTa6dOn\nRXR0tIiOjhbDhg0T69evV7ukFps4caIICAgQ7u7uIjAwUJhMJlFWVibGjh0revXqJRITE4XFYlG7\nzBa5v06PPPKICAwMFOvXrxdTpkwRUVFRon///mLevHkuNaLrm2++EYqiiOjoaBETEyNiYmLErl27\nXHY7NbQ+X3zxhctuo7y8PKHT6US/fv3EyJEjxaZNm4QQwuHbRxHCxU8QEhHRQ3P600FERGQ/DAEi\nIokxBIiIJMYQICKSGEOAiEhiDAEiIon9P5Bklxbu/MQiAAAAAElFTkSuQmCC\n"
}
],
"prompt_number": 56
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 4. Compute distances between trees"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "`compare_trees()` computes the distance *submatrix* corresponding to a given tree."
},
{
"cell_type": "code",
"collapsed": false,
"input": "def compare_trees(list_of_tree_files, i, nreps=50, sample_percent=0.1):\n \"\"\"compute section of distance matrix for a single tree\"\"\"\n from cogent.parse.tree import DndParser\n from numpy import zeros, mean\n\n trees = [DndParser(open(f)) for f in list_of_tree_files]\n dist_mat = zeros((len(trees), len(trees)))\n t1 = trees[i]\n t1_ntips = len(t1.tips())\n for dj,t2 in enumerate(trees[i+1:]):\n j = i+dj+1\n sample_size = int(round((min(t1_ntips, len(t2.tips())) * sample_percent)))\n distances = [t1.compareByTipDistances(t2, sample=sample_size) for r in range(nreps)]\n dist_mat[i,j] = mean(distances)\n dist_mat[j,i] = mean(distances)\n return dist_mat",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 57
},
{
"cell_type": "markdown",
"metadata": {},
"source": "For instance, the distance elements for the third-to-last tree:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "%precision 3\ncompare_trees(trees, ntasks-3)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 58,
"text": "array([[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0. , 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0. , 0. , 0. ],\n ..., \n [ 0. , 0. , 0. , ..., 0. , 0.087, 0.132],\n [ 0. , 0. , 0. , ..., 0.087, 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0.132, 0. , 0. ]])"
}
],
"prompt_number": 58
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We can then compute these submatrices in parallel"
},
{
"cell_type": "code",
"collapsed": false,
"input": "map_trees = [trees]*ntasks\namr = view.map_async(compare_trees, map_trees, range(ntasks)[::-1], ordered=False)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 59
},
{
"cell_type": "markdown",
"metadata": {},
"source": "And compute the final distance matrix by perorming a sum (via builtin `reduce()`)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**this will take the most time**"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def _print_progress(ar):\n N = len(ar.msg_ids)\n rc = ar._client\n submitted = rc.metadata[ar.msg_ids[0]]['submitted']\n progress = sum([ msg_id not in rc.outstanding for msg_id in ar.msg_ids ])\n dt = (datetime.now()-submitted).total_seconds()\n clear_output()\n print \"%4i/%3i tasks finished after %4i s\" % (progress, N, dt),\n sys.stdout.flush()\n \ndef progress_sum(a,b):\n c = a+b\n _print_progress(amr)\n return c\n\ndist_mat = reduce(progress_sum, amr, 0)\ndist_mat.tofile('/home/ubuntu/data/dist_mat_fast.np')",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " 32/ 32 tasks finished after 174 s"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 60
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now we can peek at the distance matrix, to see if there is anything interesting."
},
{
"cell_type": "code",
"collapsed": true,
"input": "# Uncomment here to load dist_mat from cache, to regenerate plots\n# import numpy\n# dist_mat = numpy.fromfile('/home/ubuntu/data/dist_mat.np').reshape(ntasks,ntasks)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 61
},
{
"cell_type": "code",
"collapsed": false,
"input": "pcolor(dist_mat)\nxlim(0,ntasks)\nylim(0,ntasks)\ncolorbar()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 62,
"text": "<matplotlib.colorbar.Colorbar instance at 0x7fd8b47161b8>"
},
{
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAV4AAAD5CAYAAABvRV34AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X1YlNeZP/DvoyJqVBQhQgKCVCKDIkx0GNMaINYFqwXx\n5VfFbbYrNh3NCxhRs9nNNphsG5P+siGlrpk0IU2rJunGNWrXSKDNiDHhJQY1waG+TjVGEjARsKAB\nfPaPiaMDzLkPM8MwM7k/1zXXJdznOec4jMeHM/ecW1FVVQVjjDGPGTTQE2CMsW8bXngZY8zDeOFl\njDEP44WXMcY8jBdexhjzMF54GWPMw4b0V8eKovRX14wxP+RqZusIRUG7ZNuxY8fiyy+/dGk8Vyj9\nlcerKAqmq++51Mehqu+JG3xOdPAHiUE+KgRiCh2Gs8teJ7t4a+8ycYMGooPHyCEAtRAIL3Qc30Rc\n3yQxxk+IeCfdxdrHR2Bt4QiH8QiFej5vJ8dQM+KFcWWE+CWtPq6g8AWgcJXjNv+W+O/CPo5BPIc2\nOH4OrvsaQ4Vx07G5wOZC4IFCh22GhrWIxygbLZ7EVXEYABBIxNcDaC4Eggp7j5/bKDFIocsLr6Io\n+A/Jto/B9YXeFf12x8sYY54WMNATkMR7vIwxvzFE8tGbiooKaDQaxMbGori42OEYNTU1GDJkCHbs\n2GH7XnR0NKZNmwatVovk5GSpeX67jU0b6BnIGZk20DOQcleab9xzpM0Y6BlI0qUN9AzkBKYN9AwA\nAMNduDY/Px9GoxFRUVHIyMhATk4OQkJC7Np0dXXhkUcewdy5c+2+rygKTCYTgoODpcbiO15fWXhH\npQ30DKTwwutmyWkDPQM5w9IGegYArFsNMo/umpubAQApKSmIiopCeno6qqqqerQrLi7GkiVLEBoa\n2iPWlz1jXngZY37D2a2GmpoaxMXF2b6Oj49HZWWlXZvz589j165dWL16NQD7zC1FUTB79mxkZ2dj\n9+7dUvPsN/E45jD2ZssSuoNmIn6ZiMu8YztKHH7rs4V0H1TWwkkiTr+RD/T8D9beJOJ/2waJ9L5Y\nIt5Bd3Ecd4gbaOaI4yfoMcifO7XFdpEeQvTaBYCX8FNhvOnzceQYY0MviRt8SnaBr6+IsxaG/+Ar\nYbz99bH0IL8j4lTWQuTj9BjnCuk2Ehz9vmX+5uGKNWvWYNOmTVAUBaqq2t3hHjx4EOHh4TCbzcjM\nzERycjLCwsIc9sV3vIwxv+HoDjcBwI9uenSn0+lQX19v+7qurg4zZ860a3Po0CEsW7YMEydOxI4d\nO3D//ffb7m7Dw8MBABqNBllZWdizZ49wnrzwMsb8hrN7vEFBQQCsmQ0WiwVlZWXQ6/V2bU6fPo0z\nZ87gzJkzWLJkCbZs2YKsrCy0tbWhtbUVANDY2IjS0tIeb751J9xquHLlClJTU3H16lUMGzYMS5cu\nxcMPP4zW1lb8+Mc/Rm1tLe68805s3boVI0eOpJ8VxhjrR668tVtUVASDwYCOjg7k5eUhJCQERqMR\nAGAwGBxe19DQgEWLFgEAxo0bh4KCAkRGRgrHEi68w4YNw7vvvosRI0bg6tWrmD59On74wx9i586d\nmDBhAv74xz+ioKAAL7zwAtatW9fXvydjjLmVK+lkqampMJvtd4IdLbivvPKK7c8xMTE4fPhwn8Yi\ntxpGjLB+9PHy5cvo7OxEYGAgqqursXLlSgQGBiI3N7fXtAvGGPM0Vz5A4Unkwnvt2jUkJiZi/Pjx\nePDBBzFhwgS71Iu4uDhUV1f3+0QZY4zi7B6vp5GL/6BBg3DkyBFYLBbMmzcP3/ve96QThRsLjbY/\nT0oLx6S022xf/3X0ZPL66ugUcYMQcRj/TA4BEBk96beVkl28k5YpbpBGdBAnkep1QRyeHvO+MH4I\n36XH+BdiHlQaF4CzEO9tkaeYfEKPASMRJ84swrv0EONmi3POomERxoeO/5ocYwzEqV7D0+mztm7D\nZ8L4YHQJ4+d+Rvy8AHxq2CZucA+RLtbbb+vHTNaHbSLkNKR4w92sDOl5RkdHY968eaiqqoJOp4PZ\nbIZWq4XZbIZOp+v1mrmF0902UcaYH4lPsz6ue1PmBDOaN9zNyhBuNTQ1NeHSJest4cWLF/HOO+9g\nwYIF0Ov1KCkpQXt7O0pKSnrkuzHG2EDwiz3eCxcuYPbs2UhMTMTy5cuxbt06hIeHY/Xq1Th79iwm\nT56M8+fPY9UqwcGmjDHmIX6xx5uQkICPPvqox/dHjRqFXbt29dukGGPMGa6kk3mSN9x1M8aYW3jD\n3awMXngZY37DVxa0fq25hs0udk0dakSdMiU1PJGy87LELy8rqWO7qP+HJY79Isv4UXGZMcYTcbro\nWqN6qzAeur1V3EGlOAwAqkmc9qZ8LP7Bf6BqyTGW4TVh/G9PxwnjVFYdALoO3pMSfVDj1FJHnL1M\nDhGh/qMw/umLk8QdDCOHAH6iuKXm2kXJlXdcJ9dcY4wxtxgiu6JJFG7tT7zwMsb8RsDggZ6BHF54\nGWN+Q/qOd4D5yDQZY4wWEDjQM5DDCy9jzH/4yIrGFSgYY/7Dhc8MV1RUQKPRIDY2FsXFxQ6HqKmp\nwZAhQ7Bjx44+X3vzNPtPmiA2TCKV43XitCyq6EU9EQeATiJdbJZEH0OIdLEk4vqLEmnfI4k2HxNH\nc1JFJgHgChFvpecZ8glxhBn1XEicgIZ9RDxBHNa20YdW60eIz5j+211EOlm0xNvmHxD//GRS0sjX\nOJUutpIcIpKoQHpxmbiwZ/vbEgU13cWFFS0/Px9GoxFRUVHIyMhATk4OQkLsj0Ds6urCI4880qO0\nj8y1N+M7XsaY/xgs+eimudla0jwlJQVRUVFIT0/vtcBDcXExlixZgtDQ0D5fezNeeBlj/sPJrYab\nizsAQHx8PCor7T/Nc/78eezatQurV68G8M2HxCSv7W2ajDHmHxxkNZguWx+uWLNmDTZt2gRFsX7K\nzpVPvvHCyxjzHw5WtLQx1sd1Gxvs4zqdDuvXr7d9XVdX12Mf99ChQ1i2zFrepKmpCW+//TYCAgKQ\nmppKXtsdbzUwxvyHk1sNQUFBAKzZCRaLBWVlZdDr9XZtTp8+jTNnzuDMmTNYsmQJtmzZgqysLKlr\ne5smY4z5Bxc+MlxUVASDwYCOjg7k5eUhJCQERqO1wJ+jMu+ia0X6d+Edc9Vx7JzER0wiXByfyPgB\nADTQTUihdBOhMIk21P6UQqSLUaliAEBVcLLQXfzP1B8I48NbxAUe2yMkUo9uIeJUEVQJQ4gikeTz\neUXin5aFiMu8LmqJWmXDiSP+JP6NdOG0MN5+UvwzG/oPLeQYdGlQSS6saKmpqTCbzXbfc7TgvvLK\nK+S1InzHyxjzHz6yovnINBljTIKPrGg+Mk3GGJPAh+QwxpiH+ciK5iPTZIwxCXwQOmOMeZiPrGj9\nO80/CDZcjBLXn6EK9VEFHsWnJlm1icMzJHLaWk3i+IU0cVzmp9BJFKuMJE4OoworAkANEScyrABg\nBg4J40OHCVIMAbR/TI8BcRdAufh1cXTEVHKIDzFD3KCQ6OATcgj65dtJpIoBwA+IdDEqXfIcPcRn\nuE3cgHhtfT1sND2Iu/DCyxhjHsZbDYwx5mE+sqIJz2o4d+4c7rnnHkyZMgVpaWnYvn07AKCwsBAR\nERHQarXQarXYt486mZoxxjxgmORjgAn/fwgICMBzzz2HpKQkNDU1ITk5GZmZmVAUBWvXrsXatWs9\nNU/GGKP5w1ZDWFgYwsKsHxgPCQnBlClTUFNjfQfGlbMoGWOsX/jIVoP0NE+ePIm6ujro9XocOHAA\nxcXF+O///m8sXLgQ999/P0aNGtXzovLCG3+OSbM+GGOs2gTUmNzfr48svIoqceva2tqKtLQ0/Pzn\nP8eCBQvwxRdfIDQ0FC0tLVi/fj3uuOMOrFu3zr5jRQGOCLpeJjE7CxGnTqmSqCGJMUTc/LlEJ+PF\nYarOpMzpZlR60k+JuEw6GXUCGlVcFED1E+JKk0asEsZfPvoAOYa6SVwEVWkQv6RP/4VIjwLwGP5D\nGN8emyvu4Dw5BNBOpIsNIVLFABDTBKKJuMTLe2qeOM/wwjXx85kwiM4RNCk/cPm3aEVRoP6rZNtf\nDuxv7eRB6B0dHVi8eDHuvfdeLFiwAABw6623QlEUBAUF4YEHHsDOnTv7faKMMUZyoby7JwkXXlVV\nsXLlSkydOhVr1qyxff/ChQsAgM7OTmzfvh3z5s3r31kyxpgMFxbeiooKaDQaxMbGori4uEd8165d\nSExMRFJSEubPn297vwsAoqOjMW3aNGi1WiQnJ0tN06GDBw9i69attg4B4Je//CVee+01HD58GEOH\nDkVKSoqt6iZjjA0oF04ny8/Ph9FoRFRUFDIyMpCTk2NXSWLOnDm23/r379+PgoICVFRUALBuc5hM\nJgQHB0uNJVx4Z82ahWvXrvX4/g9+IK4ywBhjA8LJbYTm5mYAQEpKCgAgPT0dVVVVmD9/vq3NLbfc\nYtd+2DD7hOC+7BlzsUvGmP9wcquhpqYGcXE36iDFx8ejsrKyR7udO3ciOjoaubm5ePHFF23fVxQF\ns2fPRnZ2Nnbv3i01TcYY8w8OPkBhOmF9uGrhwoVYuHAh3njjDSxcuBC1tbUArNuy4eHhMJvNyMzM\nRHJysu0zEL3p34VX1LtMIT/iQC6STJoWVRjRLJEXNJxIJ6PmIVOckUp7+yHxa84+cQqWFIlXywji\nyK1ziBTGI6adpAfZT8S/Lw5PPHKBHGJc4kVxgzSig5ckThajClHK/OtcSMSHdYrj79GDXIK4mOVX\njeIX59DxbitlSXPw10nTWB/XbdxrH9fpdFi/fr3t67q6OsydO9fhMEuXLkVeXh7a29sxfPhwhIeH\nAwA0Gg2ysrKwZ88e3HfffQ6v560Gxpj/cHKrISgoCIA1s8FisaCsrAx6vd6uzalTp2z7uHv37sX0\n6dMxfPhwtLW1obW1FQDQ2NiI0tJS4aJ9fZqMMeYfXDiroaioCAaDAR0dHcjLy0NISAiMRuvB4QaD\nATt27MDvf/97BAQEQKvV4plnngEANDQ0YNGiRQCAcePGoaCgAJGR4t/seOFljPkPF04eS01Nhdls\ntvuewWCw/XnDhg3YsGFDj+tiYmJw+PDhPo3FCy9jzH/4yIrmI9NkjDEJ/nAsJGOM+RQfWdH6d5qi\ng6gOtEh04GKRvEtuaJN5p+t9UAUHqVPWADqdrJJIF6uVGEP8fgAgUffzIlFgtA3DhfHWryWOQKMO\nF/tQHK5N1IgbACjeu17cgEoX+6nEyWImIi6TDmkh4sOIf+ISxS5vw2fC+NfjhwrjI6iCsu7ECy9j\njHkYbzUwxpiHeUE9NRm88DLG/Aff8TLGmIf5yIrmI9NkjDEJPrKi+cg0GWNMgo+saP07zcmC2CqJ\nVLEZRPwrIi4+VMmKSPV6KOVXZBflRDVLS0u0MD5qdCs5RmtLL1WcbxI92iKMn5r7HXKM7wSfEsZl\n0oJSqqqF8Wl6ceHDlKEHyDFwhoj/Whw+iUn0GPOJdLH/FaeLPTSPft1QqXcfkv8AgBlE7lwnselZ\nlaIXxgFgI8R/V+rvcQzx5BhvkS0k8R4vY4x5mI+saD4yTcYYk+BCzTVP4oWXMeY/fGRF85FpMsaY\nBB9Z0bgCBWPMfzhZgQKwVp/QaDSIjY1FcXFxj/iuXbuQmJiIpKQkzJ8/HzU1NdLXdscLL2PMb6iD\n5R69yc/Ph9FoRHl5OTZv3oympia7+Jw5c3DkyBEcPnwYGzZsQEFBgfS13fXrjflvfvtTh7Hvo5y8\n/mNMI+IJwvhybCPHuHSHOOcsvusY2cf9gzeLGxCZc1+AKJYJ4Opo8QlQk3FcGLcER5NjjIO4wGMr\nxCltAEBkHmHzbwqE8ZaJ4r8nAGwk6lC+t1ychjVLySTH+KN6SBifRKRIaY+YhXEAOJMYLow3EWla\nADCt7ROyjUjtiCSyzczNRHUF6rC3cTvIMX5JtpDT5eSK1tzcDABISUkBAKSnp6Oqqgrz58+3tbnl\nllvs2g8bNkz62u74jpcx5je6hsg9uqupqUFcXJzt6/j4eFRWVvZot3PnTkRHRyM3Nxe//e1v+3Tt\nzYT/P5w7dw7/9E//hC+++AKhoaH42c9+huXLl6O1tRU//vGPUVtbizvvvBNbt27FyJES56gyxlg/\nuhrY+29MB/Zfw3v71Zu+0+VU/wsXLsTChQvxxhtvIDs7G7W1Mgdd9yS84w0ICMBzzz2Huro6vPnm\nm3jsscfQ2tqKLVu2YMKECThx4gQiIiLwwgsvODU4Y4y5U9fgwb0+vjs7ABs2DrU9utPpdKivr7d9\nXVdXh5kzZzocZ+nSpfjss8/Q3t6OGTNm9OlagFh4w8LCkJRk3QMKCQnBlClTUFNTg+rqaqxcuRKB\ngYHIzc1FVVWVcBDGGPOELgyWenQXFBQEwJqdYLFYUFZWBr3e/uPUp06dgqpa75r37t2L6dOnY/jw\n4RgzZgx5bXfSW9EnT55EXV0dkpOTsWLFCtueRlxcHKqrxZ/PZ4wxT6DOphApKiqCwWBAR0cH8vLy\nEBISAqPRCMBa5n3Hjh34/e9/j4CAAGi1WjzzzDPCa0WkFt7W1lYsXboUzz33HEaOHGlb9Sl7Cz+y\n/Tk2LRyxaeJ3cRlj3w6mD60Pd+tyIVErNTUVZrN9NorBYLD9ecOGDdiwYYP0tSLkLDs6OrB48WLc\ne++9WLBgAQDrfojZbIZWq4XZbIZOp+v12r0VNxZeVAB44qag+E0/q/YTRIOPhNEnJU5FAtrF4UzH\nKXE2e6gG1A+EmAMAsqrhxEXi+JlPJcYYQcSJPC4Af1TF6WI/Wr5b3EG9OAwAaoy4sKdCpIu9p5I/\nMMzaTqwK/0F0QB8sBlA/knclXhdzxMVD8TlxvfiwOKs6Iv46EZfKeCOKtUrqbRvBGwn3eFVVxcqV\nKzF16lSsWbPG9n29Xo+SkhK0t7ejpKSE3EhmjDFPcHaP19OEC+/BgwexdetW/OUvf4FWq4VWq8W+\nffuwevVqnD17FpMnT8b58+exapWojjtjjHnGVQyVegw04VbDrFmzcO3atV5ju3bt6pcJMcaYs1zZ\n4/Uk35glY4xJ8IZtBBm88DLG/AYvvIwx5mGu5PF6Ur8uvGfedZwiEk2daAQAtxHxs0RcJqXnCyJ+\n4T66D+rvcgsRlylXEkvEK4j49yXGcINyIitITSAaTKTHUGoLxWPEiOMbJTKX1HEuzvPP9BhIJeIy\n/0bEpw8CV4i4+IA/q38g4tRzIXGMi3uSyXiPlzHGPI63GhhjzMO+9oJUMRm88DLG/Abv8TLGmIfx\nHi9jjHkY7/EyxpiH8cILYI7q+Fiiz9voAo+Xi8VnWpInL71JDkGmco08QuXrAJdfJ+Z5kujgKXII\n4CAR/18i/p7EGE+1EA2oOGBWZwnjimISd/Dxq+QYalaheIxz4mNLmzvp/L1/G/yIML756/uF8VFD\nL5NjfHp0krjB22QXdDoYdQKazMlhxMsbTxLxTpnjEmVOEqT5yh4vF7tkjPmNrxEo9ehNRUUFNBoN\nYmNjUVxc3CO+bds2JCYmIjExEcuXL8fx4zcqe0dHR2PatGnQarVITk4m58lbDYwxv+HKVkN+fj6M\nRiOioqKQkZGBnJwcu0oSMTExqKioQFBQEF599VU8+eST+MMf/gAAUBQFJpMJwcHBUmPxHS9jzG90\nYrDUo7vm5mYAQEpKCqKiopCent6jluRdd91lq802f/587N+/3y4uW5kH4IWXMeZHujBE6tFdTU2N\nrY4kAMTHx6Oy0nGZnBdffBGZmTcqnSiKgtmzZyM7Oxu7dxNVVsBbDYwxP+Joq+GvpgYcN1Hvxssp\nLy/H1q1b8f7779u+d/DgQYSHh8NsNiMzMxPJyckICwtz2Aff8TLG/IajUj+T0m7HvMI7bY/udDod\n6utvFPyrq6vrtaTZ0aNHsWrVKuzevdtW1h0AwsOthXw1Gg2ysrKwZ4+4rl+/3vHG45jD2PARbeT1\nn6S5mE52jhwCGCUOzxlRTnbx1pRlEgMJaCXaRIvDQXMahPHmTx3/72sTPlocv0TEAQxBF9GCShf7\nCTkGzhSK41PF4dFnviaHuDRpjDCeMFRcJXKERAHTjGmlwvibcYvJPr6+Ik6Na/90rLgDiZPDMIyI\nUyfnmc9LDOIezr65dn3vtqKiAhMmTEBZWRkef/xxuzZnz57F4sWLsW3bNkyadCMVsK2tDV1dXRg1\nahQaGxtRWlqKhx9+WDgebzUwxvzGVakzVntXVFQEg8GAjo4O5OXlISQkBEajEYC1zPsTTzyBL7/8\n0lZjMiAgANXV1WhoaMCiRdYq3+PGjUNBQQEiIyOFY/HCyxjzG66kk6WmpsJstv+wh8FgsP35pZde\nwksvvdTjupiYGBw+fLhPY/HCyxjzG/yRYcYY8zBf+cgwL7yMMb/Bx0IyxpiH+cpWg6L25XNufelY\nUaCecBx/d9JdZB/DiZScKogPo5iM48I4AAxGpzAeKZGTdoGoylmKDGH8bhwgx2jCOGF8yVXxUWyP\nBf6CHCMPvxbGz0H8Ti0ApCrzhPETas83J24m86vip8rfhPGv1Exh/P/NFedYAgA2isMVevFrbxwu\nkkO0YbgwPp6sxAp8iOnC+KJPxEecNU2l88n+F+Kf6QTi38gdEv8OI5SLffrIbW8URcG/qv8u1faX\nypMuj+cKvuNljPmNqz5Sc034ybXc3FyMHz8eCQk3Dv0sLCxEREQEtFottFot9u3b1++TZIwxGc6e\n1eBpwoV3xYoVPRZWRVGwdu1a1NbWora2FnPnzu3XCTLGmCxHHxnu/hhowqX/7rvvhsVi6fH9gdwb\nYYwxR7xhUZXh1CE5xcXFmDlzJp5++mm0tra6e06MMeYUZ8/j9bQ+b3asXr0aP//5z9HS0oL169fD\naDRi3bp1vbYtvOlN8jS99cEYYx+YOvCBqcPt/XrD/q0MMp3MYrEgMzMTH3/c8zSmI0eO4P7778fB\ngz0rMSqKAuwXdP2BxOw2E/ELRLzzI4lBiB/+sxL/W4jrIoLIWAMgkd5ECRenUJHPFQDgd0RcnMYF\nAPvVvcJ46ogqYRztVHVGQP2+OK1N+bM4fekD9UfkGA/iN8L4oZLviTsgf+agC1FuleiDOnROkNIJ\nAGiiT1HDZnHaG54jrqeKvQIAFLekk/2zukWq7e+U1b6VTnbhwgWEh4ejs7MT27dvx7x54hw/xhjz\nlK99JJ1MuPDm5ORg//79aGpqQmRkJDZu3AiTyYTDhw9j6NChSElJwerVqz01V8YYE/KG/VsZwoX3\ntdde6/G93NzcfpsMY4y5wlf2eLn0D2PMb7iSx1tRUQGNRoPY2FgUFxf3iG/btg2JiYlITEzE8uXL\ncfz4celru+OFlzHmN1xZePPz82E0GlFeXo7NmzejqanJLh4TE4OKigocOXIEGRkZePLJJ6Wv7Y4X\nXsaY33A2j7e5uRkAkJKSgqioKKSnp6Oqyj4D56677rLVZps/fz72798vfW13/bshckUQC3BD/1Qf\nnUQaDAAgWBwW/R1k50GRmifRhipIKDNEO5UuFkV2QZ7mJq4hCbTTBTVp4pPcRoH+0E8bRogbUKlg\nRJ1WAMBlIi7zPlE0ERfXQAWaJJYA6gAz6vUvs8rIpN9JcHaPt6amBnFxcbav4+PjUVlZifnz5/fa\n/sUXX0RmZqZT1wJ8OhljzI84Sie7ZDqKS6ajbhmjvLwcW7duxfvvv+90H7zVwBjzG462FkamaRFR\n+BPbozudTof6+nrb13V1dZg5c2aPdkePHsWqVauwe/dujBkzpk/X3owXXsaY33D2WMjre7cVFRWw\nWCwoKyuDXm//qdWzZ89i8eLF2LZtGyZNmtSna7vjrQbGmN9w5XSyoqIiGAwGdHR0IC8vDyEhITAa\njQCsZd6feOIJfPnll1i1ahUAICAgANXV1Q6vFeGFlzHmN1xZeFNTU2E2m+2+ZzAYbH9+6aWX8NJL\nvZeu6u1aEV54GWN+w1fO4+3fhdciiF2SuJ5q0/4l0UAjMQhxOpnMz5E84Knn6W32iJOuAACfi8NU\n2lA7Ub0RAIY/TvRBd/FraiKFRAcWiXSyaqpBmzB6kUg3A4BTX35H3ICqgSqTTkalaRE/cgBAJRGn\n0gxlBpl1uzj+L4o4LrPK/KNEGwlXEeiejvoZ3/EyxvwG3/EyxpiH8cLLGGMe5hfHQjLGmC/xlWMh\nfWOWjDEmgbcaGGPMw3xl4SWLXTrdsaIAuwRdWyQ6eZ2IUyeHyZwsRrXpWYSjp3uIOJGNg4sSY1BF\nDc1EuhiVKiYzhsTzefmM+IV/u3JeGG8upyYBqA+J05cU4iX9t9PjyTHmYp8wbt6uFXcwU+Kf1Z+I\nNKz/T3dBnpKmlhMNEsghpqviapWHTn9X3MFJ4u8JABnuKXYZdFWqqiuaA8N9q9glY4x5q65O31jS\nfGOWjDEmoavTN7YaeOFljPkNXngZY8zDOjt44WWMMY+61uUbS5pvzJIxxmTwVgOATwQxKlUMAD5u\nIRpYqA4kBiGO3Fr/U4kuiHmcpI71kqgF9VW9OK4h0sXMFnqMM+JUL4A6DQ54M3CJMN68iEgX20kO\nAfwzEf/d/wjDf8Ud5BCWlmhxA+r1+4JEChV1+t454uQ8AEigKk3OEYfJf2PAoaeJ0/P2Ex1QJ+e5\n0xXnl7SKigoYDAZ0dnYiLy8PDz30kF28vr4eK1asQG1tLX7xi1+goKDAFouOjsbo0aMxePBguwPS\nHeE7XsaY/3ChWnF+fj6MRiOioqKQkZGBnJwcu0oS48aNQ3FxMd56660e1yqKApPJhOBgomr5N4Q1\n13JzczF+/HgkJNxIsm5tbcWCBQswYcIEZGdn4/JlqkY1Y4x5SKfko5vm5mYAQEpKCqKiopCeno6q\nqiq7NqGhoZgxYwYCAnr/LaMvH8gQLrwrVqzAvn32n+DZsmULJkyYgBMnTiAiIgIvvPCC9GCMMdav\nnFx4a2pqEBcXZ/s6Pj4elZXUKfM3KIqC2bNnIzs7G7t37ybbC7ca7r77blgsFrvvVVdX47HHHkNg\nYCByc3O04nJmAAAQoElEQVTx1FNPSU+OMcb6laNt8Y9M1kc/OXjwIMLDw2E2m5GZmYnk5GSEhTl+\nP6PP5d1v/p8hLi6O3ERmjDGP6XLwSEwDVhTeeHSj0+lQX3/jDey6ujrMnDlTetjw8HAAgEajQVZW\nFvbs2SNs3+c31/p0sMSfC2/8eWIaEJPW1+EYY/6o1QRcNrm/XyffXAsKCgJgzWyYMGECysrK8Pjj\nvWcKdV8D29ra0NXVhVGjRqGxsRGlpaV4+OGHheP1eeHV6XQwm83QarUwm83Q6XSOG/+k0HFsssRg\nH1OFD2OJOJUeBQBEOs4kiS4OUKddUcePEaliAIA4cZhM2aFP5ALEp1DJCKH+rhaXhwBOUA2ihdFA\nfE0OMWp0qzDePn6sK1OwaiLiZipVDHTBTPJ1MZweI5yINzozhzRASbvpa4lirDJkTiR0oKioCAaD\nAR0dHcjLy0NISAiMRiMAa5n3hoYG6HQ6tLS0YNCgQXj++edx7NgxfPHFF1i0aBEAa+ZDQUEBIiMj\nhWP1eeHV6/UoKSnBM888g5KSkj7djjPGWL9yIZ0sNTUVZrPZ7nsGg8H257CwMJw717O89MiRI3H4\n8OE+jSXc483JycF3v/tdHD9+HJGRkXjllVewevVqnD17FpMnT8b58+exatWqPg3IGGP9xsmsBk8T\n3vG+9lrvp4Dv2rWrXybDGGMu8YJFVQZ/co0x5j8kPmXtDXjhZYz5j66BnoAcXngZY/6DtxoAiOrO\n1cl0QJ2c9DkRv1NiDOJ3E6kfJJWS8zIRXykxhji9CeOIy7/6iB5iCHGSlcRzcQDEOA8SHchktJVR\nDcQ/j1vJ1w3Q2jJK3OAU0QFds5NOw5J57VG/WlOvizMSKWuziNz9euIkNpmUTJl/AjJcSCfzJL7j\nZYz5D77jZYwxD+OFlzHGPIwXXsYY8zBOJ2OMMQ/jdDKID9cIEcRsqKwG6h1Zup4U2cefJLogD/gg\n6qFRdd8AkL9DkecBEXWzAPrplJCBUmH86SnUcyGhZ+WVbjQuDxE92iKMmzuJQ3LIOULix26i+zif\nJo5TmRNSt4jEMkFlLchkeLgLZzUwxpiH8R4vY4x5GO/xMsaYh/nIHm+fS/8wxpjXcuFYyIqKCmg0\nGsTGxqK4uLhHvL6+HnfddReGDRuGZ599tk/Xdsd3vIwx/+HCHm9+fj6MRiOioqKQkZGBnJwchITc\nyAIYN24ciouL8dZbPd85pa7tju94GWP+o0Py0U1zczMAICUlBVFRUUhPT0dVVZVdm9DQUMyYMQMB\nAQF9vra7/r3jjRbELBLXj4oQx6nZj5EYI5qIvytRC2oskSL1Q+L6IRJ1r05Gi+O/Iq430UOQ+2PD\n6C7C8Zkwnq1/XRgvTyAO6gGALUQ8Uxy+rUt0epPVnMHlwrh5jFbcQbvEoUS4XRwelUZ3YZIYRuQ9\nOocwPWa3MP7OsAxhPPu2neQYMtl3Uq46d9nN1dMBID4+HpWVlZg/f36/XMtbDYwx/+Foq+FzE/CF\nyYMTEeOtBsaY/3C0tRCcBsQV3nh0o9PpUF9/o9p3XV2ddCFfZ67lhZcx5j+6JB/dBAUFAbBmJ1gs\nFpSVlUGv1/c6hKran0/cl2uv460Gxpj/cCGroaioCAaDAR0dHcjLy0NISAiMRiMAa5n3hoYG6HQ6\ntLS0YNCgQXj++edx7NgxjBw5stdrRXjhZYz5DxcW3tTUVJjNZrvvGQwG25/DwsJw7tw56WtFeOFl\njPkPH/nIsKJ237BwV8eKAswRdG2S6KTzKNGAKtBFHtkF4Etx+B6J07TetRANqP/f6ok4QObfTfyp\nOH5G5hVJPV90rbJ31HXCeHruAXEHRBgA1ChxjS/lz78Vxj9QN5Nj3FVRK26wiuhAps4Y9fI1f0r3\noSVSLpuI689JnIz3MpHuSJ3gR5QLBACUKz32TvtKURTgbsk+Drg+niv4jpcx5j/4dDLGGPMwH9lq\ncHrhjY6OxujRozF48GAEBASgurranfNijLG+85HTyZxeeBVFgclkQnBwsDvnwxhjzvs2bDUM5OY0\nY4z14CMLr9OfXFMUBbNnz0Z2djZ27xYfosEYYx7h5Olknub0He/BgwcRHh4Os9mMzMxMJCcnIyys\nW1W7hsIbf741DRifduNrj+zFEKliAABiq8QiMw71k6ROH5N5JRB9kCeHyVSypFKL6D66qJcUlVok\nc7rUrVQD8XM1Bl/RY1AFGqkikjIn412iGoyg+6DmSY4hcTIe9Xeh7jJ7u77RZH24m5Onk3ma0wtv\neLi1hLBGo0FWVhb27NmD++67z75RQqErc2OM+avQNOvjumMSx6/K8Oethra2NrS2Wm9dGhsbUVpa\nirlz57p1Yowx1mf+vNXw+eefY+HChQCs5TAKCgoQGRnp1okxxlif+XM62cSJE3H48GF3z4Uxxlzj\nI1sN/Mk1xpj/4IWXMcY8zAv2b2X078IrOlxJlTgViTwt62MiniAxBoFMxwHoNKsWIp4mMcYJcZhM\nJ5NIrRuuEcclXtR/xR3iBvcSHci8VUCdhkW8rLfjH+kxqIwz6tQvmXQy8vm8SPcxhkiHjBOHpdIl\no4n4MiI+UmKMNyXayHDhjreiogIGgwGdnZ3Iy8vDQw891KPNo48+ijfeeANjx47Ftm3bbEUu+3qE\nAt/xMsYYgPz8fBiNRkRFRSEjIwM5OTl2lSSqq6tx4MABfPjhhygtLcW6devwpz9Z7wL6eoQC11xj\njH3rNTc3AwBSUlIQFRWF9PR0VFVV2bWpqqrCkiVLEBwcjJycnB4VJ/pyhAIvvIyxb72amhrbtgEA\nxMfHo7Ky0q5NdXU14uPjbV+Hhobi9OnTAPp+hAJvNTDG/IijjfP9ACpc6llVVYd3tVJHKNyE73gZ\nY36k08HjewAevelhT6fTob7+Rgmuuro6zJw5066NXq/HsWPHbF83NjYiJiYGQO9HKIjwwssY8yPO\nfWY4KCgIgDWzwWKxoKysDHq93q6NXq/Hjh07cPHiRWzfvh0ajTULyJkjFPp3q0EmjUSIqmqRTMRv\nd3UCkukp1NNIvdMpcUIUldNDpZONkni3VSYFiqCnfmbjiQ6oOABMIOIn7hSGE7CNHmOsxDxc9Xeq\ngczrgnCZiI+T6IP6dxxCxIMkxnAbmTTV3hUVFcFgMKCjowN5eXkICQmB0WgEYC3znpycjFmzZmHG\njBkIDg7G1q1bAQANDQ1YtGgRAPkjFHiPlzHmR5z/BEVqamqPTAWDwWD39aZNm7Bp0ya778XExPT5\nCAVeeBljfsQ3PjPMCy9jzI/4xmeGeeFljPkRvuNljDEP4ztexhjzMOezGjypfxdeYSrLMxIdbHBt\nfEUiHWcwEZdJsUqKEMepk6ymSIzx19HiOHVCFDUHgP67StxMtBMpUFP1NcK4JWEiPcirRHx4rDCc\ngKP0GB9SDYgT5yKJnxdAH2p3hXhdAUA+EQ8lzg/Yp5BDJN8h/sTXx2HiUwDjRx8TxgHgENlCFm81\nMMaYh/FWA2OMeRjf8TLGmIfxHS9jjHkY3/EyxpiH8R0vY4x5GKeTAQc2Oo7d/Th9PXUqEnXykkQ2\nDhqJ+L9L9GEh4heIeLjEGFoiHk3EyWKY7unjnpMfCOMTJ1mE8fYRI8gxLGaiwT3i8J8xhxwD7xLx\nu4l0sd/QQ5C/Fb8t0UfkVXH8UqBEJ2LLidPcDo2eIYwfQ7ww7l58x8sYYx7Ge7yMMeZhvnHH63QF\nioqKCmg0GsTGxqK4uNidc/Ksr0wDPQM5p00DPQMppiq6jTc4YaL2f7yEj/zcG0x/HegpfMNR6Z/u\nj55k1rRHH30UMTExmD59ul2poL6uh04vvNdr0JeXl2Pz5s1oapL5TKoX8pWF94xpoGcghRdeN/OR\nhfdz0/GBnsI3nCv9A9BrWnV1NQ4cOIAPP/wQ69atw7p166Sv7c6phVemBj1jjHmec3e8MmtaVVUV\nlixZguDgYOTk5NiqVTizHjq18MrUoGeMMc9rl3zYk1nTqqurER9/I0MjNDQUp06dcmo97Oc31wod\nhw4IYp52WpD2Vu65aZD+Ipinl9gIYKNwi+tHLo9Bnl/2rvjErQe/Ce/dWCto9XJfptSTzIlzssoF\nP/d/ceM4Dqx5gGrxIgDg6EZxSXPPKJRqNXJk3yvxqqoKVbU/7U1R6NPdeuPUHa9MDfrrk+QHP/jB\nD5mHq/oy1vVy7H1Z0/R6PY4du3HEZWNjI2JiYjBjxgzy2u6cWnhlatAzxpivkFnT9Ho9duzYgYsX\nL2L79u3QaDQAgDFjxpDXduf0VkNvNegZY8xX9bamGY1GANYy78nJyZg1axZmzJiB4OBgbN26VXit\nkOpm+/fvV+Pi4tRJkyapv/71r93dvdtERUWpCQkJalJSkqrT6QZ6OjYrVqxQb731VnXq1Km277W0\ntKhZWVlqZGSkumDBArW1tXUAZ2jV2zwff/xx9fbbb1eTkpLUpKQk9e233x7AGVqdPXtWTUtLU+Pj\n49XU1FR127Ztqqp633PqaJ7e9Jy2t7erycnJamJioqrX69X//M//VFXV+55LX+D2hTcpKUndv3+/\narFY1MmTJ6uNjY3uHsItoqOj1YsXLw70NHqoqKhQP/roI7sF7emnn1YffPBB9cqVK+oDDzyg/upX\nvxrAGVr1Ns/CwkL12WefHcBZ9XThwgW1trZWVVVVbWxsVCdOnKi2tLR43XPqaJ7e9pz+/e9/V1VV\nVa9cuaJOmTJFPX78uNc9l77A6Q9Q9MbX8ntVN2zou9vdd9+NsWPH2n2vuroaK1euRGBgIHJzc73i\nOe1tnoD3PadhYWFISkoCAISEhGDKlCmoqanxuufU0TwB73pOR3xziNHly5fR2dmJwMBAr3sufYFb\nF15fyu9VFAWzZ89GdnY2du/ePdDTEbr5eY2Li0N1dfUAz8ix4uJizJw5E08//XSPd44H2smTJ1FX\nV4fk5GSvfk6vz/P6GzTe9Jxeu3YNiYmJGD9+PB588EFMmDDBq59Lb+XWhdeXHDx4EEeOHMFTTz2F\ntWvXoqGhYaCn5JA33fGIrF69GmfOnEFpaSlOnTple2PCG7S2tmLp0qV47rnnMHLkSK99Tm+e5y23\n3OJ1z+mgQYNw5MgRnDx5Ev/1X/+F2tpar30uvZlbF16ZXDhvER5uPQRXo9EgKysLe/Z4Q/J373Q6\nne3jiWazGTqdboBn1Ltbb70ViqIgKCgIDzzwAHbu3DnQUwIAdHR0YPHixbj33nuxYMECAN75nPY2\nT299TqOjozFv3jxUVVV55XPp7dy68PpKfm9bW5vtV7bGxkaUlpZi7ty5Azwrx/R6PUpKStDe3o6S\nkhKv/c/swgXrwTOdnZ3Yvn075s2bN8Azsv62sHLlSkydOhVr1qyxfd/bnlNH8/Sm57SpqQmXLl0C\nAFy8eBHvvPMOFixY4HXPpU9w97t1JpNJjYuLU7/zne+ozz//vLu7d4vTp0+riYmJamJiojp79mz1\n5ZdfHugp2SxbtkwNDw9Xhw4dqkZERKglJSVema5zfZ4BAQFqRESE+vLLL6v33nuvmpCQoE6fPl19\n+OGHvSJr5MCBA6qiKGpiYqJdSpa3Pae9zXPv3r1e9ZwePXpU1Wq16rRp09T09HT11VdfVVWV08mc\noagqb9AwxpgnfWvfXGOMsYHCCy9jjHkYL7yMMeZhvPAyxpiH8cLLGGMexgsvY4x52P8B1trTXziB\n6KcAAAAASUVORK5CYII=\n"
}
],
"prompt_number": 62
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Write QIIME's distance matrix format"
},
{
"cell_type": "code",
"collapsed": false,
"input": "from qiime.format import format_distance_matrix\n\nopen('/home/ubuntu/data/distance_matrix_fast.txt','w').write(format_distance_matrix(labels,dist_mat))",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 63
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 5. Compute PCoA: QIIME/PyCogent"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!principal_coordinates.py -i /home/ubuntu/data/distance_matrix_fast.txt -o /home/ubuntu/data/pc_fast.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 64
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Step 6. Display PCoA: QIIME"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This generates an HTML file and java visualization for the data. To do this we need one additional file: tree_metadata.txt. You can view this file directly <a href=\"http://qiime.org/home_static/nih-cloud-apr2012/tree_metadata.txt\">here</a>."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!wget http://qiime.org/home_static/nih-cloud-apr2012/tree_metadata.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "--2012-08-06 22:20:12-- http://qiime.org/home_static/nih-cloud-apr2012/tree_metadata.txt\r\nResolving qiime.org... 216.34.181.97\r\nConnecting to qiime.org|216.34.181.97|:80... connected.\r\nHTTP request sent, awaiting response... "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "200 OK\r\nLength: 9313 (9.1K) [text/plain]\r\nSaving to: `tree_metadata.txt'\r\n\r\n\r 0% [ ] 0 --.-K/s \r100%[======================================>] 9,313 --.-K/s in 0.02s \r\n\r\n2012-08-06 22:20:13 (393 KB/s) - `tree_metadata.txt' saved [9313/9313]\r\n\r\n"
}
],
"prompt_number": 65
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We can then generate 3D PCoA plots."
},
{
"cell_type": "code",
"collapsed": true,
"input": "!make_3d_plots.py -i /home/ubuntu/data/pc_fast.txt -o /home/ubuntu/data/pcoa_plots/ -m /home/ubuntu/tree_metadata.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 69
},
{
"cell_type": "markdown",
"metadata": {},
"source": "And the notebook simply serves these files up in `/files`, so we can visit the visualization <a href=\"/files/data/pcoa_plots/pc_fast_3D_PCoA_plots.html\" target=_blank>directly</a>\n\n**NOTE**: The above link is not static: to view the plot, you must run the notebook. "
}
],
"metadata": {}
}
]
}
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment