AlexArcPy/arcpy_chain_cursors.ipynb

## arcpy_chain_cursors.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using `itertools` module within `arcpy` workflows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have multiple feature classes and you want to iterate their cursors simultaneously, you will need somehow to merge their cursor iterators objects. You might of course loop over iterators individually in an own `for` loop, however, you might need to iterate them alltogether. A quick way to get this done is to merge the feature classes first; however, this can take some extra time, and may mess up the data schema as you might not have the full control over the data fields.\n",
    "\n",
    "When doing this with Python, a very efficient and elegant solution is to use `itertools.chain` which does just one relatively simple thing:\n",
    "\n",
    "`\n",
    "Return a chain object whose .__next__() method returns elements from the\n",
    "first iterable until it is exhausted, then elements from the next\n",
    "iterable, until all of the iterables are exhausted.\n",
    "`\n",
    "\n",
    "So if you have two `da.SearchCursor` iterators (remember they are iterators and not the plain lists, right?), you would like to chain them. Let's see how it works on a simple example first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "London\n",
      "200\n"
     ]
    }
   ],
   "source": [
    "cur1 = iter([1,'London',200])\n",
    "\n",
    "for i in cur1:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      "Manchester\n",
      "300\n"
     ]
    }
   ],
   "source": [
    "cur2 = iter([2,'Manchester',300])\n",
    "for i in cur2:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But what if you need to iterate both of these cursors at the same time? Maybe you want to do some fancy slicing or data aggregation and data comparison. Well, you could chain these iterators into a single iterable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(1, 'London', 200),\n",
       " (2, 'York', 150),\n",
       " (3, 'Manchester', 300),\n",
       " (4, 'Cape', 450)]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import itertools\n",
    "\n",
    "cur1 = iter([(1,'London',200),(2,'York',150)])\n",
    "cur2 = iter([(3,'Manchester',300),(4,'Cape',450)])\n",
    "\n",
    "list(itertools.chain(cur1,cur2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1, 'London', 200)\n",
      "(2, 'York', 150)\n",
      "(3, 'Manchester', 300)\n",
      "(4, 'Cape', 450)\n"
     ]
    }
   ],
   "source": [
    "cur1 = iter([(1,'London',200),(2,'York',150)])\n",
    "cur2 = iter([(3,'Manchester',300),(4,'Cape',450)])\n",
    "\n",
    "for i in itertools.chain(cur1,cur2):\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This means you can now iterate over features from multiple cursors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<da.SearchCursor object at 0x07290DC8>,\n",
       " <da.SearchCursor object at 0x07290E60>]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'''\n",
    "Chain multiple feature classes da.SearchCursor iterators using itertools.chain\n",
    "'''\n",
    "import os\n",
    "import sys\n",
    "import itertools\n",
    "import arcpy\n",
    "\n",
    "fcs = [r\"C:\\GIS\\Temp\\scratch.gdb\\streets_east\",\n",
    "       r\"C:\\GIS\\Temp\\scratch.gdb\\streets_west\"]\n",
    "\n",
    "fc_name = r\"streets\"\n",
    "out_gdb = r\"in_memory\"\n",
    "out_fc = os.path.join(out_gdb,fc_name)\n",
    "sr = arcpy.Describe(fcs[0]).spatialReference\n",
    "\n",
    "arcpy.Delete_management(out_fc)\n",
    "arcpy.CreateFeatureclass_management(out_path=out_gdb, out_name=fc_name,\n",
    "                                    geometry_type=\"POLYLINE\",\n",
    "                                    template=fcs[0],spatial_reference=sr)\n",
    "\n",
    "#finding out the OID name\n",
    "oid = arcpy.Describe(fcs[0]).OIDFieldName.lower()\n",
    "\n",
    "#creating a list of fields in a feature class\n",
    "fields = [f.name for f in arcpy.ListFields(fcs[0])\n",
    "          if f.name.lower() not in \n",
    "          ('shape_length','shape_area','shape',oid)] + [\"SHAPE@\"]\n",
    "\n",
    "cursors = [arcpy.da.SearchCursor(fc,fields) for fc in fcs]\n",
    "cursors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<itertools.chain at 0x3978690>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#chaining iterators using tuple unpacking\n",
    "feats_generator = (itertools.chain(*cursors))\n",
    "feats_generator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `itertools.chain` takes multiple arguments as input data (namely, multiple iterables), we need somehow to \"explode\" our list of cursors into arguments. This can be done using the tuple unpacking:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "11"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def calc_sum(a,b):\n",
    "    return a + b\n",
    "\n",
    "a = 5\n",
    "b = 6\n",
    "calc_sum(a,b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "15"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def calc_sum(a,b):\n",
    "    return a + b\n",
    "\n",
    "indata = [7,8]\n",
    "calc_sum(*indata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7 8\n"
     ]
    }
   ],
   "source": [
    "from __future__ import print_function\n",
    "print(*indata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now, when we have a single iterable, we can iterate over it and, for instance, create a new feature class that will represent the merged version of those two feature classes. This is essentially the same result we would get using the Merge GP tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "C:\\GIS\\Temp\\scratch.gdb\\streets_all\n"
     ]
    }
   ],
   "source": [
    "with arcpy.da.InsertCursor(out_fc,fields) as icur:\n",
    "    for row in feats_generator:\n",
    "        icur.insertRow(row)\n",
    "\n",
    "arcpy.env.overwriteOutput = True\n",
    "res_fc = r'C:\\GIS\\Temp\\scratch.gdb\\streets_all'\n",
    "print(arcpy.CopyFeatures_management(out_fc,res_fc))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I got curious at this point of time and decided to check whether reading features with `da.SearchCursor` and then writing them with `da.InsertCursor` is faster than using the Merge GP tool. I've done some tests on data of various size (everything from a couple thousands to ten million features ) and it became clear that that the performance of two processes - inserting rows with `da.InsertCursor` and using Merge tool - was identical. Well, at least we know that now. "
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
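
The chaining step in the notebook above can also be tried without an ArcGIS installation. What follows is a minimal sketch, not the notebook author's code: `fake_cursor` is a hypothetical stand-in for `arcpy.da.SearchCursor`, the rows are the toy tuples from the notebook, and `itertools.chain.from_iterable` is used as an alternative to `itertools.chain(*cursors)` that accepts the list of iterators directly. It also shows that a chain object can only be consumed once.

```python
import itertools


def fake_cursor(rows):
    """Hypothetical stand-in for arcpy.da.SearchCursor: yields row tuples lazily."""
    for row in rows:
        yield row


# Toy rows borrowed from the notebook examples
east = [(1, 'London', 200), (2, 'York', 150)]
west = [(3, 'Manchester', 300), (4, 'Cape', 450)]

cursors = [fake_cursor(east), fake_cursor(west)]

# chain.from_iterable takes the list itself, so no * unpacking is needed
merged = itertools.chain.from_iterable(cursors)

# First pass: rows from the first "cursor", then rows from the second
first_pass = list(merged)

# Second pass: the chain and the underlying generators are already exhausted
second_pass = list(merged)

print(first_pass)
print(second_pass)
```

Because the chain is single-pass, code that needs the rows a second time has to build new iterators and a new chain rather than loop over the old one again.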