Skip to content

Instantly share code, notes, and snippets.

@NelsonMinar
Created April 7, 2016 19:12
Show Gist options
  • Save NelsonMinar/90212fbbfc6465c8e263341b86aa01a8 to your computer and use it in GitHub Desktop.
Save NelsonMinar/90212fbbfc6465c8e263341b86aa01a8 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing ways of measuring length of an iterable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define several algorithms for determining the length of an iterable. They are all fundamentally O(n), having to walk the iteratable. Note that these methods are destructive, they may consume the iterator's contents. Also iterators aren't necessarily finite, so these methods may never return.\n",
"\n",
"* https://nelsonslog.wordpress.com/2016/04/06/python3-no-len-for-iterators/\n",
"* http://stackoverflow.com/questions/390852/is-there-any-built-in-way-to-get-the-length-of-an-iterable-in-python"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import collections\n",
"\n",
"def len1(iterable):\n",
" \"\"\"Calculate length by creating a collection\n",
" O(n) in run time\n",
" O(n) in memory\"\"\"\n",
"\n",
" return len(tuple(iterable))\n",
"\n",
"\n",
"def len2(iterable):\n",
" \"\"\"Calculate length by summing the integer 1 for every element\n",
" O(n) in run time\n",
" O(1) in memory\"\"\"\n",
"\n",
" return sum(1 for e in iterable)\n",
"\n",
"\n",
"def len3(iterable):\n",
"\n",
" \"\"\"Calculate length by using enumerate and deque to iterate in C code\n",
" A trick borrowed from the cardinality library\n",
" O(n) in run time, but loop is in C\n",
" O(1) in memory\"\"\"\n",
"\n",
" d = collections.deque(enumerate(iterable, 1), maxlen=1)\n",
" return d[0][0] if d else 0"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# How big a collection we're going to test\n",
"size = 1000000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Try on a list collection, which also has an O(1) len() function"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = [0] * size\n",
"type(a)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000000 1000000 1000000 1000000 "
]
}
],
"source": [
"for f in len, len1, len2, len3:\n",
" print(f(a), end=\" \")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10000000 loops, best of 3: 89.7 ns per loop\n"
]
}
],
"source": [
"%timeit len(a)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100 loops, best of 3: 4.36 ms per loop\n"
]
}
],
"source": [
"%timeit len1(a)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 59.1 ms per loop\n"
]
}
],
"source": [
"%timeit len2(a)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 40.4 ms per loop\n"
]
}
],
"source": [
"%timeit len3(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now try on an ordinary iterator"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"itertools.repeat"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import itertools\n",
"b = itertools.repeat(0, size)\n",
"type(b)\n",
"# Note we'll have to keep recreating b, since it is consumed on use"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"object of type 'itertools.repeat' has no len()\n"
]
}
],
"source": [
"# Try the len() method on b; we expect an error because there's no length for an iterator\n",
"try:\n",
" len(b)\n",
"except Exception as e:\n",
" print(e)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000000 1000000 1000000 "
]
}
],
"source": [
"# Test our length methods to see the output is reasonable\n",
"for f in len1, len2, len3:\n",
" b = itertools.repeat(0, size)\n",
" print(f(b), end=\" \")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100 loops, best of 3: 5.44 ms per loop\n"
]
}
],
"source": [
"%%timeit\n",
"b = itertools.repeat(0, size)\n",
"len1(b)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 58.4 ms per loop\n"
]
}
],
"source": [
"%%timeit\n",
"b = itertools.repeat(0, size)\n",
"len2(b)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 38.8 ms per loop\n"
]
}
],
"source": [
"%%timeit\n",
"b = itertools.repeat(0, size)\n",
"len3(b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@rhaps0dy
Copy link

rhaps0dy commented May 3, 2017

I tried a comparison with:

len(list(a))

It performs slightly better than len(tuple(a))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment