@FilippoBovo
Last active January 26, 2022 13:49
Decorator Example in Data Science

This example explains Python decorators in the context of data science. The example acts as a quick reminder, rather than a complete guide.

Consider a Pandas DataFrame about posts on a social media platform. The DataFrame, called posts, contains a column with the number of likes for each post.

post_id   ...   likes   ...
1         ...   43      ...
2         ...   92      ...
3         ...   54      ...

The following function calculates the average number of likes per post.

def average_likes(data):
    return data['likes'].mean()
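
As a quick check, the function can be run on a small DataFrame built from the three sample rows above. This is only a sketch: the post_id and likes values come from the table, and all other columns are left out as an assumption made for brevity.

```python
import pandas

# Sample rows taken from the table above; the elided columns are omitted.
posts = pandas.DataFrame({'post_id': [1, 2, 3], 'likes': [43, 92, 54]})


def average_likes(data):
    return data['likes'].mean()


print(average_likes(posts))  # (43 + 92 + 54) / 3 = 63.0
```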

We would like to decorate the function with a decorator checking that the likes column has integer type. Such a decorated function may look like the following.

@data_column_has_int_type('likes')
def average_likes(data):
    return data['likes'].mean()

The decorator can be defined in the following way.

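import pandas


def data_column_has_int_type(column):
    # Outer function: receives the decorator argument (the column name).
    def decorator(function):
        # Middle function: receives the function being decorated.
        def wrapper(*args, **kwargs):
            # Inner function: runs the check, then calls the original function.
            data = args[0]  # the DataFrame is expected as the first positional argument
            if not pandas.api.types.is_integer_dtype(data[column]):
                raise ValueError(f"Column {column} does not have integer type.")
            return function(*args, **kwargs)
        return wrapper
    return decorator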
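
To see the check in action, the sketch below applies the decorator and calls the function on two DataFrames. The names good_posts and bad_posts are hypothetical and exist only for this illustration; the second one stores the likes as strings, so the column does not have an integer dtype.

```python
import pandas


def data_column_has_int_type(column):
    def decorator(function):
        def wrapper(*args, **kwargs):
            data = args[0]
            if not pandas.api.types.is_integer_dtype(data[column]):
                raise ValueError(f"Column {column} does not have integer type.")
            return function(*args, **kwargs)
        return wrapper
    return decorator


@data_column_has_int_type('likes')
def average_likes(data):
    return data['likes'].mean()


# Hypothetical test data: one valid frame, one with likes stored as strings.
good_posts = pandas.DataFrame({'likes': [43, 92, 54]})
bad_posts = pandas.DataFrame({'likes': ['43', '92', '54']})

print(average_likes(good_posts))  # the check passes: 63.0

try:
    average_likes(bad_posts)
except ValueError as error:
    print(error)  # Column likes does not have integer type.
```

Note that the wrapper reads the DataFrame from args[0], so this version assumes the data is passed positionally. A call such as average_likes(data=good_posts) would fail with an IndexError before the check runs.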

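Calling the decorated function, average_likes(posts), is equivalent to the following expression, in which average_likes denotes the original, undecorated function: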

data_column_has_int_type('likes')(average_likes)(posts)

This composition of functions unwraps as:

  • data_column_has_int_type('likes') returns decorator with column set to 'likes', equivalent to:

    def decorator(function):
        def wrapper(*args, **kwargs):
            data = args[0]
            if not pandas.api.types.is_integer_dtype(data['likes']):  # <- See change
                raise ValueError("Column likes does not have integer type.")
            return function(*args, **kwargs)
        return wrapper
  • decorator(average_likes) returns wrapper with function set to average_likes, equivalent to:

    def wrapper(*args, **kwargs):
        data = args[0]
        if not pandas.api.types.is_integer_dtype(data['likes']):
            raise ValueError("Column likes does not have integer type.")
        return average_likes(*args, **kwargs)  # <- See change
  • wrapper(posts) becomes:

    if not pandas.api.types.is_integer_dtype(posts['likes']):
        raise ValueError("Column likes does not have integer type.")
    average_likes(posts)  # <- See change

This flow can be viewed compactly as:

data_column_has_int_type('likes')(average_likes)(posts)
                        decorator(average_likes)(posts)  # column='likes'
                                         wrapper(posts)  # column='likes', function=average_likes
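
The three steps can also be run one at a time, without the @ syntax. The sketch below is illustrative: the posts DataFrame is rebuilt from the sample table (other columns omitted as an assumption), and the variable bound is a hypothetical name used to inspect what the wrapper's closure has captured.

```python
import pandas


def data_column_has_int_type(column):
    def decorator(function):
        def wrapper(*args, **kwargs):
            data = args[0]
            if not pandas.api.types.is_integer_dtype(data[column]):
                raise ValueError(f"Column {column} does not have integer type.")
            return function(*args, **kwargs)
        return wrapper
    return decorator


def average_likes(data):  # left undecorated on purpose
    return data['likes'].mean()


posts = pandas.DataFrame({'post_id': [1, 2, 3], 'likes': [43, 92, 54]})

decorator = data_column_has_int_type('likes')  # step 1: column is bound
wrapper = decorator(average_likes)             # step 2: function is bound
result = wrapper(posts)                        # step 3: check, then call
print(result)  # 63.0

# The bound values live in the wrapper's closure; pair each free
# variable name with the content of its closure cell.
bound = dict(zip(wrapper.__code__.co_freevars,
                 (cell.cell_contents for cell in wrapper.__closure__)))
print(bound['column'], bound['function'] is average_likes)  # likes True
```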
FilippoBovo commented Jul 27, 2018

@kurtu5, you are correct, and it is good to use @wraps in decorators. With @wraps, the undecorated function name and docstring would be preserved in the decorated function.

I omitted it from the short guide because I wanted to focus on the logic of decorators.

I leave the link to @wraps in the Python documentation here for those interested in learning more about it: https://docs.python.org/3/library/functools.html#functools.wraps
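
For reference, a minimal sketch of the same decorator with functools.wraps applied might look like the following. The docstring on average_likes is not part of the guide and is added here only to show what gets preserved.

```python
import functools

import pandas


def data_column_has_int_type(column):
    def decorator(function):
        @functools.wraps(function)  # copies __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            data = args[0]
            if not pandas.api.types.is_integer_dtype(data[column]):
                raise ValueError(f"Column {column} does not have integer type.")
            return function(*args, **kwargs)
        return wrapper
    return decorator


@data_column_has_int_type('likes')
def average_likes(data):
    """Return the average number of likes per post."""
    return data['likes'].mean()


print(average_likes.__name__)  # average_likes
print(average_likes.__doc__)   # Return the average number of likes per post.
```

Without the functools.wraps line, average_likes.__name__ would be 'wrapper' and the docstring would be None.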
