@joyrexus
Created March 27, 2014 18:17
groupby and countby for python

Python provides the standard functions for applying functions over iterables, viz. map, filter, and reduce (in Python 3, reduce lives in the functools module).

For example, we can use filter to select the numbers that satisfy some criterion:

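even = lambda x: x % 2 == 0   # compare by value with ==, not identity with `is`
odd  = lambda x: not even(x)
data = [1, 2, 3, 4]

# In Python 3, filter returns a lazy iterator, so wrap it in list() to compare
assert list(filter(even, data)) == [2, 4]
assert list(filter(odd, data)) == [1, 3]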
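
The other two functions mentioned above, map and reduce, follow the same pattern. The following is only a minimal sketch, assuming Python 3; the names `squares` and `total` are illustrative and not part of the original gist.

```python
from functools import reduce  # in Python 3, reduce is in functools, not a built-in

data = [1, 2, 3, 4]

# map applies a function to each element; list() materializes the lazy result
squares = list(map(lambda x: x * x, data))

# reduce folds the sequence into a single value, here a running sum starting at 0
total = reduce(lambda acc, x: acc + x, data, 0)

assert squares == [1, 4, 9, 16]
assert total == 10
```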

These built-in functions are supplemented by the collection utilities in itertools and itertoolz.

What follows is just a quick demonstration of how you might implement and use two iteration methods commonly used for data summarization: groupby and countby.

groupby

Group a collection by a key function.

def groupby(f, seq):
    result = {}
    for value in seq: 
        key = f(value)
        if key in result:
            result[key].append(value) 
        else: 
            result[key] = [value]
    return result

Alternatively, using defaultdict to avoid the explicit membership check:

from collections import defaultdict

def groupby(f, seq):
    d = defaultdict(list)
    for i in seq:
        d[f(i)].append(i)
    return dict(d)

data = [1, 2, 3, 4]
assert groupby(even, data) == { False: [1, 3], True: [2, 4] }
assert groupby(odd, data)  == { True: [1, 3], False: [2, 4] }
names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
expected = {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith'], 7: ['Charlie']}
assert groupby(len, names) == expected
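
For comparison, the standard library's `itertools.groupby` behaves differently from the function above: it only merges adjacent items that share a key, so the input generally needs to be sorted by that key first. A small sketch, where the alias `igroupby` and the names `unsorted_keys` and `by_len` are chosen here for illustration only:

```python
from itertools import groupby as igroupby  # aliased so it does not shadow the groupby above

names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']

# On unsorted input, equal keys that are not adjacent end up in separate groups
unsorted_keys = [k for k, _ in igroupby(names, key=len)]

# Sorting by the same key first brings equal keys together
by_len = {k: list(g) for k, g in igroupby(sorted(names, key=len), key=len)}

assert unsorted_keys == [5, 3, 7, 3, 5]
assert by_len == {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith'], 7: ['Charlie']}
```

The dictionary-based groupby shown earlier does not need the sort, at the cost of holding every group in memory at once.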

countby

Count elements of a collection by a key function.

def countby(f, seq):
    result = {}
    for value in seq: 
        key = f(value)
        if key in result:
            result[key] += 1
        else: 
            result[key] = 1
    return result

Alternatively, using defaultdict so that missing keys start at zero:

def countby(f, seq):
    d = defaultdict(int)
    for i in seq:
        d[f(i)] += 1
    return dict(d)

assert countby(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
assert countby(even, [1, 2, 3]) == {True: 1, False: 2}
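
Another option, not used in the original gist, is `collections.Counter`, which does the tallying itself. The function name `countby_counter` below is hypothetical and only serves to keep it apart from the versions above.

```python
from collections import Counter

def countby_counter(f, seq):
    # Counter tallies how often each key produced by f occurs
    return dict(Counter(map(f, seq)))

assert countby_counter(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
assert countby_counter(lambda x: x % 2 == 0, [1, 2, 3]) == {True: 1, False: 2}
```

Returning the Counter directly, instead of converting it to a dict, would also give access to helpers such as `most_common`.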
