Skip to content

Instantly share code, notes, and snippets.

@grepdisc
Created June 7, 2011 19:24
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save grepdisc/1012945 to your computer and use it in GitHub Desktop.
Save grepdisc/1012945 to your computer and use it in GitHub Desktop.
patterns to stop iteration on a sorted sequence
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""These are patterns to stop iteration on a sorted sequence.
In the case that there is a performance requirement to short-circuit
an iteration, using a sentinel pattern can reduce the cost incurred by
a conditional on each iteration.
itertools.combinations() is used in these example functions to
represent any iterator that emits tuples that are sorted with
respect to a known key function.
In each example, an iterator emits tuples in an order
given by a key function, and a bin corresponding to that
key is incremented by the result of calculations on the
tuple.
>>> results = []
>>> results.append(dict(loop_fcn_v1(10, 3)))
>>> results.append(dict(loop_fcn_v2(10, 3)))
>>> results.append(dict(loop_fcn_v3(10, 3)))
>>> results.append(dict(loop_fcn_v4(10, 3)))
>>> results.append(dict(loop_fcn_v5(10, 3)))
>>> all(r == results[0] for r in results)
True
"""
from collections import defaultdict
from itertools import combinations, groupby
from operator import itemgetter
def f(iterable):
"""This is some calculation on a sequence."""
return sum(iterable)
def g(iterable):
"""This is another calculation on a sequence."""
return max(iterable)
def loop_fcn_v1(length, width):
"""This is the longest loop in a module.
This is the only version that would still work for a generator
which emitted items in an arbitrary order.
"""
d = defaultdict(int)
for comb in combinations(xrange(length), width):
d[comb[0]] += f(comb) * g(comb)
return d
def loop_fcn_v2(length, width, threshold=None):
"""Knowledge of a sequence and a key function permits a conditional stop.
Add a third input parameter and a conditional within the loop.
Since itertools.combinations() is documented to have lexicographic
sort order (sorted order when the input iterable is sorted),
a threshold can be compared against the first index of the output tuple.
If the threshold is reached quickly or the rest of the loop body is slow,
this works great. However, if the optional threshold is not provided and
the rest of the loop body is fast, the cost of performing a comparison
on each iteration can be significant.
"""
d = defaultdict(int)
for comb in combinations(xrange(length), width):
if comb[0] == threshold:
break
d[comb[0]] += f(comb) * g(comb)
return d
def loop_fcn_v3(length, width, threshold=None):
"""A sentinel value can be used to break out of a loop.
A threshold is implemented by introducing None as a sentinel value
into a container of integers. Incrementing an integer by None raises
a TypeError exception, which may be handled by stopping the iteration.
As a cleanup operation, the sentinel value is replaced before returning.
This method gives a performance boost by moving the control flow outside
of the loop. The code remains readable.
"""
d = defaultdict(int)
if threshold is not None:
d[threshold] = None
try:
for comb in combinations(xrange(length), width):
d[comb[0]] += f(comb) * g(comb)
except TypeError:
d.pop(threshold)
return d
def loop_fcn_v4(length, width):
"""Yields (key, value) pairs for each grouping using a conditional.
By making a generator rather than a function, iteration can be
stopped by the calling function.
However, this version handles an edge case imperfectly. If the first
key was nonzero or the iterator was empty, the generator would
initially yield the tuple (0, 0). The defaultdict has the
potential to introduce similar behavior.
"""
key, value = 0, 0
for comb in combinations(xrange(length), width):
if key != comb[0]:
yield (key, value)
key, value = comb[0], 0
value += f(comb) * g(comb)
yield key, value
def loop_fcn_v5(length, width):
"""Yields (key, value) pairs for each grouping by using itertools.groupby().
Neither a sentinel value nor a conditional is necessary because the generator
over the keys allows the iteration to be stopped outside of this function.
The (key, value) pairs can be inserted into a dictionary in the calling
function or processed directly.
Seems pretty readable.
"""
combos = combinations(xrange(length), width)
get_first = itemgetter(0)
for (key, grp) in groupby(combos, get_first):
yield key, sum(f(comb) * g(comb) for comb in grp)
if __name__ == '__main__':
from timeit import Timer
t1 = Timer("loop_fcn_v1(50, 3)", "from __main__ import loop_fcn_v1")
t2 = Timer("loop_fcn_v2(50, 3)", "from __main__ import loop_fcn_v2")
t3 = Timer("loop_fcn_v3(50, 3)", "from __main__ import loop_fcn_v3")
t4 = Timer("dict(loop_fcn_v4(50, 3))", "from __main__ import loop_fcn_v4")
t5 = Timer("dict(loop_fcn_v5(50, 3))", "from __main__ import loop_fcn_v5")
print_template = 'version %d takes %0.1f msec/pass'
print print_template % (1, 1000 * min(t1.repeat(repeat=3, number=10)) / 10)
print print_template % (2, 1000 * min(t2.repeat(repeat=3, number=10)) / 10)
print print_template % (3, 1000 * min(t3.repeat(repeat=3, number=10)) / 10)
print print_template % (4, 1000 * min(t4.repeat(repeat=3, number=10)) / 10)
print print_template % (5, 1000 * min(t5.repeat(repeat=3, number=10)) / 10)
@gofish
Copy link

gofish commented Jun 7, 2011

warning: not a proof

def count(seq):
  return sum(1 for e in seq)

def f(n, w, t):
  return (c for c in combinations(range(n), w) if c[0] < t)

def g(n, w, t):
  for i in range(t):
    for c in combinations(range(i + 1, n), w - 1):
      yield (i,) + c

>>> [count(f(10, 5, i)) for i in range(5)]
[0, 126, 196, 231, 246]
>>> [count(g(10, 5, i)) for i in range(5)]
[0, 126, 196, 231, 246]
>>> list(f(5, 3, 2))
[(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)]
>>> list(g(5, 3, 2))
[(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)]

@grepdisc
Copy link
Author

grepdisc commented Jun 9, 2011

Thanks

I want to thank several BayPIGgies for their helpful advice, which I have tried to incorporate into the documentation and code above. http://baypiggies.net/

Mea culpa

While the above function g(n, w, t) did satisfy the original requirements, the requirements have been more explicitly specified that these patterns should support any iterator that emits tuples that are sorted with respect to a known key function.

Comment

The subset of functions that return iterators may not all support being called with successive parameters as in the above g(n, w, t)

for c in combinations(range(i + 1, n), w - 1):

Conclusion

However, this highlights some advantages of such function generators. They not only support Jeremy's pattern above, but they also can facilitate distributing a (potentially large) calculation. The use of itertools.groupby() in version 5 is my favorite so far. This solution, and several others, are mentioned in this great thread by a couple luminaries in 2003 http://mail.python.org/pipermail/python-dev/2003-November/040469.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment