Decorators in Python

A decorator is a design pattern in Python that allows a user to add new functionality to an existing object without modifying its structure. A decorator is usually applied just above the definition of the function you want to decorate.

def uppercase_decorator(function):
    def wrapper():
        func = function()
        make_uppercase = func.upper()
        return make_uppercase

    return wrapper

def say_hi():
    return 'hello there'

Our decorator function takes a function as an argument, so we can define a function and pass it to the decorator manually:

decorate = uppercase_decorator(say_hi)
decorate()  # 'HELLO THERE'

Python provides a much easier way for us to apply decorators. We simply use the @ symbol before the function we'd like to decorate.

@uppercase_decorator
def say_hi():
    return 'hello there'

say_hi()
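
In practice, a decorator usually needs to forward whatever arguments the wrapped function takes, and it is good practice to apply functools.wraps so the decorated function keeps its original name and docstring. A minimal sketch (greet is just an illustrative name, not part of the gist above):

import functools

def uppercase_decorator(function):
    @functools.wraps(function)        # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):     # forward whatever arguments the caller passes
        result = function(*args, **kwargs)
        return result.upper()
    return wrapper

@uppercase_decorator
def greet(name):
    """Return a greeting for the given name."""
    return 'hello there, ' + name

print(greet('ada'))       # HELLO THERE, ADA
print(greet.__name__)     # greet, thanks to functools.wraps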

List Comprehensions vs. For Loops

Many articles, posts, and Stack Overflow questions emphasize that list comprehensions are faster than for loops in Python. The reality is more nuanced than that.

Create a list using a for loop versus a list comprehension.

import time

iterations = 100000000
start = time.time()
mylist = []
for i in range(iterations):
    mylist.append(i+1)
end = time.time()
print(end - start)

9.90 seconds

start = time.time()
mylist = [i+1 for i in range(iterations)]
end = time.time()
print(end - start)

8.20 seconds

As we can see, the for loop is slower than the list comprehension (9.9 seconds vs. 8.2 seconds).

List comprehensions are faster than for loops to create lists.

But this is largely because the for-loop version builds the list by calling .append() on every single iteration, which is slow.
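
As a side note, wrapping a single run in time.time() is fairly noisy; the standard-library timeit module repeats the measurement and is usually more reliable. A rough sketch of the same comparison (the iteration count is reduced here so it runs quickly; exact numbers will differ by machine):

import timeit

def with_append(n=1000000):
    mylist = []
    for i in range(n):
        mylist.append(i + 1)
    return mylist

def with_comprehension(n=1000000):
    return [i + 1 for i in range(n)]

# best of 5 runs, in seconds
print(min(timeit.repeat(with_append, number=1, repeat=5)))
print(min(timeit.repeat(with_comprehension, number=1, repeat=5)))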

For Loops are Faster than List Comprehensions

Suppose we only want to perform some computations (or call an independent function multiple times) and do not want to create a list.

start = time.time()
for i in range(iterations):
    i+1
end = time.time()
print(end - start)

6.16 seconds

start = time.time()
[i+1 for i in range(iterations)]
end = time.time()
print(end - start)

7.80 seconds

In that case, we see that the list comprehension is 25% slower than the for loop.

For loops are faster than list comprehensions when you only need to run code and don't need the resulting list.

Array Computations are Faster than For Loops

One element is missing here: what is faster than both a for loop and a list comprehension? Array computations. In Python it is generally bad practice to use for loops, list comprehensions, or .apply() in pandas when an array operation is available; prefer array computations instead. For example, we can create our list even faster using list(range()):

start = time.time()
mylist = list(range(iterations))
end = time.time()
print(end - start)

4.84 seconds

It only took 4.84 seconds! That is about 40% faster than our previous list comprehension (8.2 seconds). Moreover, creating a list with list(range(iterations)) is even faster than just performing the simple computations (6.16 seconds with a for loop). If you want to perform some computation on a list of numbers, the best practice is not to use a list comprehension or a for loop but to perform array computations.

Array computations are faster than loops.
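
For the "add one to every number" case above, the array-based equivalent would use NumPy (assuming it is installed), which performs the addition over the whole array in compiled code. A minimal sketch; timings will vary by machine, and 100 million int64 values need several hundred MB of memory:

import time
import numpy as np

iterations = 100000000

start = time.time()
myarray = np.arange(iterations) + 1   # one vectorized operation over the whole array
end = time.time()
print(end - start)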

The View/Copy Headache in pandas

In numpy, the rules for when you get views and when you don’t are a little complicated, but they are consistent: certain behaviors (like simple indexing) will always return a view, and others (fancy indexing) will never return a view.

But in pandas, whether you get a view or not—and whether changes made to a view will propagate back to the original DataFrame—depends on the structure and data types in the original DataFrame.

https://www.practicaldatascience.org/html/views_and_copies_in_pandas.html

Example: In the first modification, one integer was replaced with another, so the operation could be done in the existing integer array; in the second, a floating point number was put into an integer column. That can't be done in place, so a new floating point array was created, and that new array replaced the old one as column a in the original DataFrame, breaking the "view" connection.
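
A rough reconstruction of that example (the values are made up, and the exact behaviour depends on your pandas version, especially if copy-on-write is enabled):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})   # column a starts as int64

col_a = df['a']        # may be a view onto df's underlying integer array

# Replacing one integer with another can happen inside the existing int array,
# so the change can propagate back through the view.
df.loc[0, 'a'] = 10

# Putting a float into the integer column cannot be done in place: pandas builds
# a new float64 array for column a, and the old "view" connection is broken.
# (Older pandas versions upcast silently; newer ones warn about it.)
df.loc[1, 'a'] = 2.5
print(df.dtypes)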

To help address this issue, pandas has a built-in alert system, the SettingWithCopyWarning, that will sometimes warn you when you're in a situation that may cause problems.

If you take a subset for any purpose other than immediately analyzing it, you should add .copy() to that subsetting. Seriously: when in doubt, .copy().
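
Concretely, the advice looks like this (the column names are invented for illustration):

import pandas as pd

df = pd.DataFrame({'city': ['a', 'b', 'c'], 'price': [10, 20, 30]})

# Risky: this subset may be a view or a copy, and modifying it can raise a
# SettingWithCopyWarning or silently fail to do what you expect.
risky_subset = df[df['price'] > 15]

# Safe: take an explicit copy, then modify it freely.
safe_subset = df[df['price'] > 15].copy()
safe_subset['price'] = safe_subset['price'] * 1.1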

You may have noticed a pattern in the illustrations above and developed the intuition that a column only loses its "view-ness" when its datatype is changed. A datatype change will indeed always cause this, but it is not the only place problems can arise. What follows isn't something you need to know, but it may be useful if you're deeply interested.

In the examples above, each column was its own object, and so behaved independently. But this is not always the case in pandas. If a DataFrame is created from a single numpy matrix with multiple columns, pandas will try to be efficient by keeping that matrix intact. As a result, if you do something (like changing the type) to one of the columns tied to that matrix, pandas will create new arrays to back all the columns that were once tied to it. A view of a single column can therefore stop being a view because of changes to a different column.
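
A sketch of that situation (again, what you actually observe depends on your pandas version and its internal block manager; the frame here is made up):

import numpy as np
import pandas as pd

matrix = np.ones((3, 2))
df = pd.DataFrame(matrix, columns=['a', 'b'])   # both columns may be backed by one shared block

col_a = df['a']                  # may start out as a view into that shared block

df['b'] = df['b'].astype(int)    # changing the dtype of b can force pandas to rebuild its arrays

df.loc[0, 'a'] = 100             # whether col_a still sees this change depends on the
print(col_a)                     # pandas version and on whether the block was split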

https://stackoverflow.com/questions/23296282/what-rules-does-pandas-use-to-generate-a-view-vs-a-copy

Here are the rules; later rules override earlier ones (a short illustration follows the list):

  • All operations generate a copy
  • If inplace=True is provided, it will modify in-place; only some operations support this
  • An indexer that sets, e.g. .loc/.iloc/.iat/.at, will set in place.
  • An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example above used .query; that will always return a copy, as it is evaluated by numexpr.)
  • An indexer that gets on a multiple-dtyped object is always a copy.
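
These rules are why chained indexing is unreliable for assignment: a "get" followed by a "set" may write into a temporary copy. A small illustration (the frame here is made up):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})   # multiple dtypes

# Chained indexing: the "get" df[df['a'] > 1] returns a copy here, so the "set"
# may modify that temporary copy instead of df (typically with a SettingWithCopyWarning).
df[df['a'] > 1]['a'] = 0

# A single .loc call selects and sets in one operation, so df itself is updated.
df.loc[df['a'] > 1, 'a'] = 0
print(df)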