neuromusic/allen api with tortilla.ipynb

## allen api with tortilla.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:6b468a0200fa77b500055d89fe83d71723b018e5048910807466780ce85ace41"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# exploring the Allen Brain Atlas API with Tortilla\n",
      "\n",
      "Tortilla simply uses Python-familiar structures like attributes and arguments to construct the URL for an API query and return the response. \n",
      "\n",
      "### We can start by wrapping the top level of the API url and giving it a friendly name."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import tortilla\n",
      "allen = tortilla.wrap(\"http://api.brain-map.org/api/v2/data\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, to really explore Allen's API, you have to do more than just traverse a hierarchy of URLs. \n",
      "\n",
      "It maintains its own syntax for queries, passed in as the value of the \"criteria\" parameter. Tortilla can't help us much in constructing this, so we are just going to construct it by assembling the string appropriately.\n",
      "\n",
      "The full Allen API is beyond the scope of this demo, so we're just going to construct our criteria borrowed from one of Allen's examples (http://www.brain-map.org/api/examples/examples/doc/structures/download_data.py.html)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "MOUSE_PRODUCT_ID = 1 # aba\n",
      "PLANE_ID = 1 # coronal\n",
      "\n",
      "criteria = ','.join(\n",
      "    [\"[failed$eq'false'][expression$eq'true']\",\n",
      "     \"products[id$eq%d]\"%MOUSE_PRODUCT_ID,\n",
      "     \"plane_of_section[id$eq%d]\" % PLANE_ID,\n",
      "     ]\n",
      "    )\n",
      "print criteria"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[failed$eq'false'][expression$eq'true'],products[id$eq1],plane_of_section[id$eq1]\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Once we know our necessary parameters, building the query is easy\n",
      "\n",
      "We can construct our query with the default parameters passed in as a dictionary and save the wrapper to its own variable."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dataset_parameters = {\n",
      "    'criteria': criteria,\n",
      "    'start_row': 0,\n",
      "    'num_rows': 2000,\n",
      "}\n",
      "section_query = allen.SectionDataSet('query.json',params=dataset_parameters)\n",
      "print section_query"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "<Wrap for http://api.brain-map.org/api/v2/data/SectionDataSet/query.json>\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Did you catch what Tortilla did there with the \"SectionDataSet\" attribute?\n",
      "\n",
      "Calling `allen.SectionDataSet` added `/SectionDataSet` to the url defined in `allen`. If your API is hierarchical, you can traverse the whole thing like this. Slick, huh?\n",
      "\n",
      "Then, the `query.json` argument passed into that gets appended as `/query.json` as well.\n",
      "\n",
      "Nicely, this is just a wrapper, so it won't be executed until we call `.get()`, at which point the parameters will be appended as well."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "response = section_query.get()\n",
      "type(response)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "tortilla.utils.Bunch"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### This response is a tortilla \"Bunch\" \n",
      "\n",
      "... at the simplist level, it maintains the JSON of the response in a dictionary structure:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print response.keys()\n",
      "print response['total_rows']\n",
      "print len(response['msg'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[u'msg', u'total_rows', u'success', u'num_rows', u'start_row', u'id']\n",
        "3319\n",
        "2000\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "However, since it is a \"Bunch\", we can also access the key values through attributes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print response.total_rows\n",
      "print len(response.msg)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3319\n",
        "2000\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also grab the next 2000 rows by calling `.get()` again, but updating the `start_row` parameter."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "response = section_query.get(params={'start_row':2000})\n",
      "print response.start_row\n",
      "print len(response.msg)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2000\n",
        "1319\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### We want all the rows, however\n",
      "\n",
      "So, to make our life a bit easier, we'll define a generator so we can just loop through all of queries needed to assemble the full output of the query."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def wrapped_query(wrapped,num_rows):\n",
      "    start_row = 0\n",
      "    while True:\n",
      "        \n",
      "        # let's get the response and spit out the rows\n",
      "        response = wrapped.get(params={\n",
      "            'start_row': start_row,\n",
      "            'num_rows': num_rows,\n",
      "        })\n",
      "        yield response.msg\n",
      "        \n",
      "        # if we are at the end, let's bail. otherwise set up to grab the next set of rows\n",
      "        if len(response.msg) < num_rows:\n",
      "            break\n",
      "        else:\n",
      "            start_row += len(response.msg)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Generators are awesome\n",
      "\n",
      "They are basically functions that you can iterate over.\n",
      "\n",
      "On each iteration of this generator, it is going to query the API and spit out the rows. And each time it is called, it will increment `start_row` by the number of rows it just returned."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "rows = []\n",
      "for r in wrapped_query(section_query,1000):\n",
      "    print len(r)\n",
      "    rows += r\n",
      "print rows[0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1000\n",
        "1000"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "1000"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "319"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "{u'weight': 5270, u'qc_date': u'2009-05-02T22:58:26Z', u'reference_space_id': 9, u'failed_facet': 734881840, u'rnaseq_design_id': None, u'storage_directory': u'/external/aibssan/production32/prod336/image_series_71670698/', u'id': 71670698, u'plane_of_section_id': 1, u'name': None, u'sphinx_id': 141427, u'blue_channel': None, u'green_channel': None, u'failed': False, u'delegate': False, u'specimen_id': 70896862, u'red_channel': None, u'expression': True, u'section_thickness': 25.0}\n"
       ]
      }
     ],
     "prompt_number": 9
    }
   ],
   "metadata": {}
  }
 ]
}
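
The stopping rule in `wrapped_query` (keep requesting pages until one comes back shorter than `num_rows`) can be checked without touching the network. The sketch below is not part of the original notebook: `paginate` and `make_fake_fetch` are hypothetical names, and the fake fetcher simply serves integers in place of real API rows. It also differs from `wrapped_query` in one small way: it skips an empty final page, which would otherwise be yielded when the total row count is an exact multiple of `num_rows`.

```python
def paginate(fetch_page, num_rows):
    """Yield pages of rows until a short page signals the end.

    `fetch_page(start_row, num_rows)` is any callable that returns a list
    of at most `num_rows` rows starting at `start_row`.
    """
    start_row = 0
    while True:
        rows = fetch_page(start_row, num_rows)
        if rows:
            # Skip an empty trailing page instead of yielding it.
            yield rows
        if len(rows) < num_rows:
            # A short (or empty) page means there is nothing left to fetch.
            break
        start_row += len(rows)


def make_fake_fetch(total_rows):
    """Hypothetical stand-in for the API: serves the integers 0..total_rows-1."""
    def fetch_page(start_row, num_rows):
        end = min(start_row + num_rows, total_rows)
        return list(range(start_row, end))
    return fetch_page


# Same total as the query in the notebook (3319 rows), fetched 1000 at a time.
page_sizes = [len(page) for page in paginate(make_fake_fetch(3319), 1000)]
print(page_sizes)
```

To use this with the wrapper from the notebook, `fetch_page` could be something like `lambda start, n: section_query.get(params={'start_row': start, 'num_rows': n}).msg`, assuming the wrapper merges these parameters with its defaults the way the `.get(params={'start_row': 2000})` call above suggests.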