Skip to content

Instantly share code, notes, and snippets.

@pvg
Created April 8, 2016 00:20
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pvg/fb7adc1be9419c3588eef5d69010a1af to your computer and use it in GitHub Desktop.
Save pvg/fb7adc1be9419c3588eef5d69010a1af to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Comparing ways of measuring length of an iterable"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Define several algorithms for determining the length of an iterable. They are all fundamentally O(n), having to walk the iteratable. Note that these methods are destructive, they may consume the iterator's contents. Also iterators aren't necessarily finite, so these methods may never return.\n",
"\n",
"* https://nelsonslog.wordpress.com/2016/04/06/python3-no-len-for-iterators/\n",
"* http://stackoverflow.com/questions/390852/is-there-any-built-in-way-to-get-the-length-of-an-iterable-in-python"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"import collections\n",
"\n",
"def len1(iterable):\n",
" \"\"\"Calculate length by creating a collection\n",
" O(n) in run time\n",
" O(n) in memory\"\"\"\n",
"\n",
" return len(tuple(iterable))\n",
"\n",
"\n",
"def len2(iterable):\n",
" \"\"\"Calculate length by summing the integer 1 for every element\n",
" O(n) in run time\n",
" O(1) in memory\"\"\"\n",
"\n",
" return sum(1 for e in iterable)\n",
"\n",
"\n",
"def len3(iterable):\n",
"\n",
" \"\"\"Calculate length by using enumerate and deque to iterate in C code\n",
" A trick borrowed from the cardinality library\n",
" O(n) in run time, but loop is in C\n",
" O(1) in memory\"\"\"\n",
"\n",
" d = collections.deque(enumerate(iterable, 1), maxlen=1)\n",
" return d[0][0] if d else 0"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"button": false,
"collapsed": true,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"# How big a collection we're going to test\n",
"size = 1000000"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Try on a list collection, which also has an O(1) len() function"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = [0] * size\n",
"type(a)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000000 1000000 1000000 1000000 "
]
}
],
"source": [
"for f in len, len1, len2, len3:\n",
" print(f(a), end=\" \")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The slowest run took 12.36 times longer than the fastest. This could mean that an intermediate result is being cached \n",
"10000000 loops, best of 3: 82.6 ns per loop\n"
]
}
],
"source": [
"%timeit len(a)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100 loops, best of 3: 4.92 ms per loop\n"
]
}
],
"source": [
"%timeit len1(a)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 59 ms per loop\n"
]
}
],
"source": [
"%timeit len2(a)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 35.5 ms per loop\n"
]
}
],
"source": [
"%timeit len3(a)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Now try on an ordinary iterator"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"__main__.Itsy"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Itsy:\n",
" def __init__(self, max):\n",
" self.i = 0\n",
" self.max = max\n",
"\n",
" def __iter__(self):\n",
" return self\n",
"\n",
" def __next__(self):\n",
" self.i += 1 \n",
" if self.i > self.max:\n",
" raise StopIteration\n",
" return 0\n",
"\n",
"b = Itsy(size)\n",
"type(b)\n",
"# Note we'll have to keep recreating b, since it is consumed on use"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"object of type 'Itsy' has no len()\n"
]
}
],
"source": [
"# Try the len() method on b; we expect an error because there's no length for an iterator\n",
"try:\n",
" len(b)\n",
"except Exception as e:\n",
" print(e)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000000 1000000 1000000 "
]
}
],
"source": [
"# Test our length methods to see the output is reasonable\n",
"for f in len1, len2, len3:\n",
" b = Itsy(size)\n",
" print(f(b), end=\" \")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 400 ms per loop\n"
]
}
],
"source": [
"%%timeit\n",
"b = Itsy(size)\n",
"len1(b)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 456 ms per loop\n"
]
}
],
"source": [
"%%timeit\n",
"b = Itsy(size)\n",
"len2(b)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"button": false,
"collapsed": false,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 450 ms per loop\n"
]
}
],
"source": [
"%%timeit\n",
"b = Itsy(size)\n",
"len3(b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"button": false,
"collapsed": true,
"deletable": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment