@canwe
Forked from joyrexus/README.md
Created October 31, 2020 08:36
groupby and countby for python

Python has standard functions for applying a function over an iterable, viz. `map`, `filter`, and `reduce` (in Python 3, `reduce` lives in the `functools` module).

For example, we can use `filter` to select the numbers that satisfy some criterion:

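```python
even = lambda x: x % 2 == 0   # use ==, not `is`, to compare integers
odd  = lambda x: not even(x)
data = [1, 2, 3, 4]

# In Python 3, filter returns a lazy iterator, so materialize it with list()
assert list(filter(even, data)) == [2, 4]
assert list(filter(odd, data)) == [1, 3]
```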
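The other two functions named above, `map` and `reduce`, work over the same data in a similar way. This short sketch assumes Python 3, where `map` also returns a lazy iterator and `reduce` must be imported from `functools`; the names `squares` and `total` are just illustrative.

```python
from functools import reduce

data = [1, 2, 3, 4]

# map applies a function to every element; wrap it in list()
# because Python 3's map returns a lazy iterator.
squares = list(map(lambda x: x * x, data))
assert squares == [1, 4, 9, 16]

# reduce folds the sequence into a single value, left to right:
# ((1 + 2) + 3) + 4
total = reduce(lambda acc, x: acc + x, data)
assert total == 10
```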

These built-in functions are supplemented by the iteration tools in the standard library's `itertools` module and in `itertoolz` (part of the third-party `toolz` package).
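One caveat worth knowing: `itertools.groupby` is not a drop-in replacement for the dict-based `groupby` implemented below. It takes its arguments as `(iterable, key)` rather than `(f, seq)`, and it only merges *adjacent* elements with equal keys, so unsorted input must be sorted by the same key first. A small sketch (the alias `it_groupby` is only there to avoid a name clash):

```python
from itertools import groupby as it_groupby

data = [1, 2, 3, 4]

def even(x):
    return x % 2 == 0

# itertools.groupby only merges adjacent elements with equal keys,
# so unsorted input yields one group per run of equal keys.
runs = [(k, list(g)) for k, g in it_groupby(data, key=even)]
assert runs == [(False, [1]), (True, [2]), (False, [3]), (True, [4])]

# Sorting by the same key first makes equal keys adjacent,
# which gives exactly one group per distinct key.
grouped = {k: list(g) for k, g in it_groupby(sorted(data, key=even), key=even)}
assert grouped == {False: [1, 3], True: [2, 4]}
```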

What follows is a quick demonstration of how you might implement and use two functions commonly used for data summarization: `groupby` and `countby`.

groupby

Group a collection by a key function.

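```python
def groupby(f, seq):
    """Map each key f(value) to the list of values that produce it."""
    result = {}
    for value in seq:
        key = f(value)
        if key in result:
            result[key].append(value)   # key seen before: extend its group
        else:
            result[key] = [value]       # first value for this key: start a group
    return result
```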
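The `if key in result` branch can also be collapsed with `dict.setdefault`, which inserts a default only when the key is missing and returns the stored value either way. This is a sketch of a third variant; the name `groupby_setdefault` is hypothetical.

```python
def groupby_setdefault(f, seq):
    result = {}
    for value in seq:
        # setdefault inserts an empty list the first time a key is seen
        # and returns the existing list on later visits.
        result.setdefault(f(value), []).append(value)
    return result

assert groupby_setdefault(len, ['Bob', 'Alice', 'Dan']) == {3: ['Bob', 'Dan'], 5: ['Alice']}
```

One trade-off: the `[]` argument is built on every call, even when the key already exists, which `defaultdict` avoids.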

Alternatively, using `collections.defaultdict`, which creates the empty list for a missing key automatically:

```python
from collections import defaultdict

def groupby(f, seq):
    d = defaultdict(list)   # missing keys start out as []
    for item in seq:
        d[f(item)].append(item)
    return dict(d)          # return a plain dict

# Examples, using `even` and `odd` as defined above
data = [1, 2, 3, 4]
assert groupby(even, data) == {False: [1, 3], True: [2, 4]}
assert groupby(odd, data) == {True: [1, 3], False: [2, 4]}

names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
expected = {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith'], 7: ['Charlie']}
assert groupby(len, names) == expected
```
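Any one-argument callable works as the key function, which makes `groupby` handy for grouping records by a field. A sketch using `operator.itemgetter` on a hypothetical list of dicts (the `people` data is made up for illustration, and `groupby` is repeated so the block runs on its own):

```python
from collections import defaultdict
from operator import itemgetter

def groupby(f, seq):
    # Same defaultdict-based groupby as above.
    d = defaultdict(list)
    for item in seq:
        d[f(item)].append(item)
    return dict(d)

# Hypothetical sample records.
people = [
    {'name': 'Alice', 'dept': 'eng'},
    {'name': 'Bob', 'dept': 'ops'},
    {'name': 'Charlie', 'dept': 'eng'},
]

# itemgetter('dept') is equivalent to lambda p: p['dept']
by_dept = groupby(itemgetter('dept'), people)
assert [p['name'] for p in by_dept['eng']] == ['Alice', 'Charlie']
assert [p['name'] for p in by_dept['ops']] == ['Bob']
```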

countby

Count elements of a collection by a key function.

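```python
def countby(f, seq):
    """Map each key f(value) to the number of values that produce it."""
    result = {}
    for value in seq:
        key = f(value)
        if key in result:
            result[key] += 1    # key seen before: bump its count
        else:
            result[key] = 1     # first value for this key
    return result
```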
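Since each count is just the size of the corresponding group, `countby` can also be expressed in terms of `groupby`. This sketch (the name `countby_via_groupby` is hypothetical, and `groupby` is repeated so the block runs alone) makes the relationship explicit:

```python
from collections import defaultdict

def groupby(f, seq):
    d = defaultdict(list)
    for item in seq:
        d[f(item)].append(item)
    return dict(d)

def countby_via_groupby(f, seq):
    # The count for each key is the length of its group.
    return {key: len(values) for key, values in groupby(f, seq).items()}

assert countby_via_groupby(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
```

The direct version is preferable in practice: this one builds every intermediate list only to throw it away.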

Alternatively, using `collections.defaultdict`, which starts a missing key at `0`:

```python
from collections import defaultdict

def countby(f, seq):
    d = defaultdict(int)    # missing keys start out at 0
    for item in seq:
        d[f(item)] += 1
    return dict(d)

assert countby(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
assert countby(even, [1, 2, 3]) == {True: 1, False: 2}
```
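The standard library already has a counting dict, `collections.Counter`, so another option is to map `f` over the sequence and count the resulting keys. A minimal sketch; the name `countby_counter` is hypothetical:

```python
from collections import Counter

def countby_counter(f, seq):
    # Counter tallies the keys produced by mapping f over seq.
    return dict(Counter(map(f, seq)))

assert countby_counter(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
assert countby_counter(lambda x: x % 2 == 0, [1, 2, 3]) == {True: 1, False: 2}
```

Dropping the `dict(...)` wrapper keeps the `Counter`, which adds conveniences such as `most_common()`.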

See also
