Practising Python (1)

Practicing.ipynb
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: this blog post is the first I am undertaking with the IPython Notebook. I am still playing with formatting and so on, so please bear with me if the content doesn't seem as easy to read as it should. The notebook itself can be found as [a gist file on GitHub](https://gist.github.com/holdenweb/8341811) and you can alternatively view it using [the online Notebook viewer](http://nbviewer.ipython.org/gist/holdenweb/8341811)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I want to discuss a typical bit of Python, taken from a program sent to me by a colleague (whether it's his code or someone else's I don't know, and it hardly matters). It's the kind of stuff we all do every day in Python, and despite the Zen of Python's advice that \u201c_there should be one, and preferably only one, obvious way to do it_\u201d there are many choices one could make that can impact the structure of the code.\n",
"\n",
"This started out as a way to make the code more readable (I suspect it may have been written by somebody more accustomed to a language like C), but I thought it might be interesting to look at some timings as well.\n",
"\n",
"In order to be able to run the code without providing various dependencies I have taken the liberty of defining a dummy `Button` function and various other \u201cmock\u201d objects to allow the code to run (they implement just enough to avoid exceptions being raised)\\*. This in turn means we can use IPython's __%%timeit__ cell magic to determine whether my \u201cimprovements\u201d actually help the execution speed.\n",
"\n",
"Note that each timed cell is preceded by a garbage collection to try as far as possible to run the samples on a level playing field\\*\\*."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import gc\n",
"\n",
"class MockFrame():\n",
"    def grid(self, row, column, sticky):\n",
"        pass\n",
"mock_frame = MockFrame()\n",
"\n",
"def Button(frame, text=None, fg=None, width=None, command=None, column=None, sticky=None):\n",
"    return mock_frame\n",
"\n",
"class Mock():\n",
"    pass\n",
"\n",
"self = Mock()\n",
"self.buttonRed, self.buttonBlue, self.buttonGreen, self.buttonBlack, self.buttonOpen = (None, )*5\n",
"\n",
"f4 = Mock()\n",
"f4.columnconfigure = lambda c, weight: None\n",
"\n",
"ALL = Mock()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code in this next cell is extracted from the original code to avoid repetition - all loop implementations are written to use the same data."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"button = [\"Red\", \"Blue\", \"Green\", \"Black\", \"Open\"] \n",
"color = [\"red\", \"blue\", \"green\", \"black\", \"black\"] \n",
"commands = [self.buttonRed, self.buttonBlue, self.buttonGreen,\n",
" self.buttonBlack, self.buttonOpen] "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So here's the original piece of code:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"g = gc.collect()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"# Benchmark 1, the original code\n",
"for c in range(5): \n",
" f4.columnconfigure(c, weight=1)\n",
" Button(f4, text=button[c], fg=color[c], width=5,\n",
" command=commands[c]).grid(row=0, column=c, sticky=ALL)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 4.45 \u00b5s per loop\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You might suspect, as I did, that there are better ways to perform this loop. \n",
"\n",
"The most obvious is simply to create a single list to iterate over (using _`zip()`_), with unpacking assignment in the __for__ loop to assign the individual elements to local variables. This certainly makes the loop body a little more readable. We do still need the column number, so we can use the _`enumerate()`_ function to provide it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"g = gc.collect()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"# Benchmark 2, enumerate() over the zipped lists\n",
"for c, (btn, col, cmd) in enumerate(zip(button, color, commands)):\n",
"    f4.columnconfigure(c, weight=1)\n",
"    Button(f4, text=btn, fg=col, width=5,\n",
"           command=cmd).grid(row=0, column=c, sticky=ALL)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 4.26 \u00b5s per loop\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfortunately any speed advantage appears insignificant. These timings aren't very repeatable under the conditions in which I ran them, so really any difference is lost in the noise - what you see depends on the results when this notebook was run (and therefore also on which computer), and it would be unwise of me to make any predictions about the conditions under which you read it.\n",
"\n",
"We can avoid the use of _`enumerate()`_ by maintaining a loop counter, but from an aesthetic point of view this is\n",
"almost as bad as (some would say worse than) iterating over the range of indices. In CPython it usually just comes out ahead, but at the cost of a certain amount of Pythonicity. It therefore makes the program a little less comprehensible."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"g = gc.collect()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"# Benchmark 3, a manually maintained loop counter\n",
"c = 0\n",
"for btn, col, cmd in zip(button, color, commands):\n",
"    f4.columnconfigure(c, weight=1)\n",
"    Button(f4, text=btn, fg=col, width=5,\n",
"           command=cmd).grid(row=0, column=c, sticky=ALL)\n",
"    c += 1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 4.05 \u00b5s per loop\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next two cells repeat the same timings without the loop body, and this merely emphasises the speed gain of ditching\n",
"the call to _`enumerate()`_. At this level of simplicity, though, it's difficult to tell how much optimization is taking place\n",
"since the loop content is effectively null. I suspect PyPy would optimize this code out of existence. Who knows what CPython is\n",
"actually measuring here."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"g = gc.collect()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"for c, (btn, col, cmd) in enumerate(zip(button, color, commands)):\n",
"    pass"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 1.18 \u00b5s per loop\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"g = gc.collect()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"c = 0\n",
"for btn, col, cmd in zip(button, color, commands):\n",
"    c += 1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 854 ns per loop\n"
]
}
],
"prompt_number": 12
},
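{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two measurements above are single `%%timeit` figures, and such figures are noisy. One way to see the spread for yourself is to repeat each measurement with the standard library's `timeit.repeat()`. The cell below is only a sketch and was not part of the original measurements: `labels`, `colours`, `callbacks`, `loop_with_enumerate` and `loop_with_counter` are names invented for illustration, and wrapping each loop in a function adds the same call overhead to both sides."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"# Stand-in data under invented names, so the notebook's own lists are untouched.\n",
"labels = ['Red', 'Blue', 'Green', 'Black', 'Open']\n",
"colours = ['red', 'blue', 'green', 'black', 'black']\n",
"callbacks = [None] * 5\n",
"\n",
"def loop_with_enumerate():\n",
"    for c, (btn, col, cmd) in enumerate(zip(labels, colours, callbacks)):\n",
"        pass\n",
"\n",
"def loop_with_counter():\n",
"    c = 0\n",
"    for btn, col, cmd in zip(labels, colours, callbacks):\n",
"        c += 1\n",
"\n",
"calls = 100000\n",
"for func in (loop_with_enumerate, loop_with_counter):\n",
"    # Five independent runs; the gap between min and max shows the noise.\n",
"    runs = timeit.repeat(func, repeat=5, number=calls)\n",
"    per_call = [1e6 * t / calls for t in runs]  # microseconds per call\n",
"    print('{:<20} min {:.2f} us, max {:.2f} us'.format(\n",
"        func.__name__, min(per_call), max(per_call)))"
],
"language": "python",
"metadata": {},
"outputs": []
},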
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Somewhat irritatingly, manual maintenance of an index variable appears to have a predictable slight edge over use of _`enumerate()`_, and naive programmers might therefore rush to convert all their code to this paradigm. Before they do so, though, they should consider that code's environment. In this particular example the whole piece of code is simply setup, executed only once at the start of the program as the GUI is being created. Optimization at this level would not therefore be a sensible step: to optimize you should look first at the code inside the most deeply-nested and oft-executed loops.\n",
"\n",
"If the timed code were to be executed billions of times inside two levels of nesting then one might, in production, consider using such an optimization if (and hopefully only if) there were a real need to extract every last ounce of speed from the hardware. In this case, since the program uses a graphical user interface and so user delays will use orders of magnitude more time than actual computing, it would be unwise to reduce the readability of the code, for which reason I prefer the _`enumerate()`_-based solution.\n",
"\n",
"With many loops the body's processing time is likely to dominate in real cases, however, and that again supports using _`enumerate()`_. If loop overhead accounts for 5% of each iteration and you reduce your loop control time by 30% you are still only reducing your total loop run time by 1.5%.\n",
"So keep your program readable and Pythonically idiomatic.\n",
"\n",
"Besides which, who knows, some Python dev might come along and change implementations to alter the relative time advantage, and then wouldn't you feel silly changing all that code back again?\n",
"<hr />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\\* If you have a serious need for mock objects in testing, you really should look at [the _`mock`_ module](http://www.voidspace.org.uk/python/mock/ \"Thanks to Michael Foord and the implementation team\"), [part of the standard library](http://docs.python.org/3.3/library/unittest.mock \"fuzzyman rules!\")\n",
"since Python 3.3. Thanks to [Michael Foord](http://www.voidspace.org.uk/python/weblog/index.shtml \"Hero of the Python world\") for his valiant efforts. Please help him by not using _`mock`_ in production."
]
},
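{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of what that might look like here (assuming Python 3.3 or later; this is not how the timed cells above were run), the hand-written stand-ins could be replaced by `unittest.mock.Mock` objects, which also record the calls made on them. The names `frame` and `button_factory` are invented for this illustration, with `button_factory` playing the role of `Button`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from unittest import mock\n",
"\n",
"frame = mock.Mock()            # stands in for f4\n",
"button_factory = mock.Mock()   # stands in for Button\n",
"\n",
"for c, text in enumerate(['Red', 'Blue']):\n",
"    frame.columnconfigure(c, weight=1)\n",
"    button_factory(frame, text=text, width=5).grid(row=0, column=c)\n",
"\n",
"# The mocks remember how they were called.\n",
"print(frame.columnconfigure.call_count)            # number of columns configured\n",
"print(button_factory.return_value.grid.call_args)  # arguments of the last grid() call"
],
"language": "python",
"metadata": {},
"outputs": []
},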
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\\*\\* An interesting issue here. Originally I wrote the above code to create a new MockFrame object for each call to Button(), and I consistently saw the result of the second test as three orders of magnitude slower than the first (_i.e._ ms, not \u00b5s). It took me a while to understand why timeit was running so many iterations for such a long test, adding further to the elapsed time. It turned out the second test was paying the price of collecting the garbage from the first, and that without garbage collections in between runs the GC overhead would distort the timings."
]
}
],
"metadata": {}
}
]
}
