@raybuhr
Created June 21, 2019 15:50
More than you wanted to know about loops in python

Loops

Loops are a way to repeat an action on each item in a collection. In day-to-day life, you might think of this like brushing your teeth -- for each tooth in your mouth, scrub with a toothbrush and a little bit of toothpaste.

For Loops

The basic format for looping in python is usually taught like this:

>>> for number in range(5):
...     print(number)

0
1
2
3
4
>>> 

What happened here? What is range? What is number?

In python, calling range creates a range object, which represents a sequence of integers without storing them all in memory.

>>> range(5)

range(0, 5)

>>> 

In simpler terms, range stores a starting point, a stopping point, and how big of steps to take, but doesn't create the integers up front. In order to be super efficient, it waits until you actually use the integers in the range to produce them. If we call the help function on range, you can see more details and explanations:

>>> help(range)

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object

 ...

By default, range only expects the stopping point and assumes the start is 0 and the step is 1, but you can choose other values when they make sense. Additionally, the starting point is always included and the stopping point is always excluded. For our example, range(5), you can visualize the data that the function generates by explicitly converting it to a list.

>>> list(range(5))                                     
[0, 1, 2, 3, 4]
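
To make the start, stop, and step arguments more concrete, here is a small sketch with a few other ranges. The variable names are just for illustration.

```python
# range(start, stop, step): start is included, stop is excluded
evens = list(range(0, 10, 2))      # every second integer from 0 up to (not including) 10
from_two = list(range(2, 6))       # step defaults to 1
countdown = list(range(5, 0, -1))  # a negative step counts down

print(evens)      # [0, 2, 4, 6, 8]
print(from_two)   # [2, 3, 4, 5]
print(countdown)  # [5, 4, 3, 2, 1]
```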

So range(5) gave us 5 integers, 0 through 4. In our example, we print each of those integers by telling python to act on each one. This is all a for loop does. So what was number in our example? It was a variable, the same as if we had assigned one normally like number = 0. So for each item in the collection of integers from 0 through 4, the for loop first assigns the item to the variable number and then carries out whatever task is below. In fact, at the end of your for loop, you can call the variable and it will come back as the last item the loop ran on.

>>> for number in range(5): 
...     pass

>>> number
4

In that example, the pass keyword is used to do nothing.
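
If you are curious what "assign the variable, then run the body" looks like when written out by hand, the sketch below imitates a for loop using the built-in iter and next functions. It is only an illustration of the idea, not how you would normally write a loop, and the seen list is just a stand-in for the loop body.

```python
numbers = iter(range(5))  # ask the range for an iterator
seen = []

while True:
    try:
        number = next(numbers)  # assign the next item to the loop variable
    except StopIteration:       # raised when there are no items left
        break
    seen.append(number)         # the "body" of the loop

print(seen)    # [0, 1, 2, 3, 4]
print(number)  # 4, the last item assigned
```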

While Loops

An alternative to for is while. Any loop can be constructed using either keyword, but usually for is used to repeat through a collection and while is used to repeat until some specific condition is met.

A typical example for a while loop might be:

>>> countdown = 5
>>> while countdown >= 0:
...     if countdown > 0:
...         print(countdown)
...     else:
...         print("Blastoff!")
...     countdown -= 1

5
4
3
2
1
Blastoff!
>>>  

While loops are common sources of bugs in python code. It's really easy to miss something and end up with a loop that never ends. For example, I did this when constructing this example -- I got so excited about printing out "Blastoff!" that I forgot to add that final line to subtract 1 from the countdown!

On the other hand, sometimes you do want a program to run forever until the person using the program decides to quit. For example, if you wanted to build a never ending Tetris clone, you might initiate the program as:

while True:
    play_tetris()

That program would just keep running until its window was closed or maybe a keyboard interrupt was encountered.
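
Since play_tetris isn't a real function here, the sketch below uses a made-up run_until_quit function to show the same shape: loop forever, and use break when the user asks to stop. The command names are hypothetical.

```python
def run_until_quit(commands):
    """Hypothetical stand-in for a game loop: handle commands until 'quit'."""
    handled = []
    commands = iter(commands)
    while True:
        # Treat running out of input the same as the user quitting.
        command = next(commands, "quit")
        if command == "quit":
            break  # the only way out of the loop
        handled.append(command)
    return handled

print(run_until_quit(["left", "rotate", "drop", "quit", "left"]))
# ['left', 'rotate', 'drop']
```

Anything after the "quit" command is never handled, because break leaves the loop immediately.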

Enumerate

Sometimes when using loops you need to keep track of both the item and how far along you are. This is exactly what the enumerate function does.

>>> fruits = ['apple', 'banana', 'cherry', 'dragonfruit']
>>> for i, fruit in enumerate(fruits):
...     print("item:", i, "is", fruit)
item: 0 is apple
item: 1 is banana
item: 2 is cherry
item: 3 is dragonfruit

Any iterable (i.e. thing you can loop over) works with enumerate. You may find this handy when working with different lists of the same length where you want to update one based on the other or want to use items from one list in another.

>>> tastes_good = [False, True, False, True]
>>> for i, fruit in enumerate(fruits):
...     if tastes_good[i]:
...         print(fruit, "tastes good")

banana tastes good
dragonfruit tastes good
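
One more detail that can be handy: enumerate accepts a start argument if you would rather count from 1 (or anything else) instead of 0. A small sketch, reusing the fruits list from above:

```python
fruits = ['apple', 'banana', 'cherry', 'dragonfruit']

labels = []
for position, fruit in enumerate(fruits, start=1):
    labels.append(f"{position}. {fruit}")

print(labels)
# ['1. apple', '2. banana', '3. cherry', '4. dragonfruit']
```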

Zip

Like that last example in enumerate, sometimes you actually want to combine multiple iterables into a single one, pairing up the elements. This is where the zip function comes in.

For example, if we want to loop over the combined lists to create a dict of fruits and their taste, we could do something like:

>>> fruit_data = {}
>>> for fruit, taste in zip(fruits, tastes_good):
...     fruit_data[fruit] = taste
>>> fruit_data
{'apple': False, 'banana': True, 'cherry': False, 'dragonfruit': True}

This example was meant to show how you can loop through a zip object, which you may want to do if you need to apply some function or logic to the items conditionally. That said, you could create the dict more simply with just:

>>> fruit_data = dict(zip(fruits, tastes_good))
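
As a sketch of the "apply some logic conditionally" case, the loop below builds a different value depending on the taste. The verdict strings are made up for illustration. It also shows one thing to watch for: zip stops as soon as the shortest iterable runs out.

```python
fruits = ['apple', 'banana', 'cherry', 'dragonfruit']
tastes_good = [False, True, False, True]

verdicts = {}
for fruit, taste in zip(fruits, tastes_good):
    if taste:
        verdicts[fruit] = "yum"
    else:
        verdicts[fruit] = "no thanks"

print(verdicts)
# {'apple': 'no thanks', 'banana': 'yum', 'cherry': 'no thanks', 'dragonfruit': 'yum'}

# zip stops at the shortest input, so extra fruits are silently dropped
short = list(zip(fruits, [True]))
print(short)  # [('apple', True)]
```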

Looping over Dictionaries

The dict in python is a {key: value} data structure. A dict key can be anything that is hashable, which roughly means anything whose value can't change after it is created. For example, any string or number is a valid key, but a list or another dictionary is not. Interestingly, a tuple is a valid key even though it looks like a list, because it is immutable (i.e. once it is defined it can't be changed), as long as everything inside it is hashable too. A dict value can be any object, including another dict.
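
Here is a small sketch of that rule in action. The grid dict is a made-up example; the point is that a tuple works as a key while a list raises a TypeError.

```python
grid = {(0, 0): "origin", (1, 2): "somewhere else"}  # tuples work as keys
print(grid[(1, 2)])  # somewhere else

try:
    bad = {[0, 0]: "origin"}  # a list is mutable, so it is not hashable
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```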

With that information in mind, python gives you a few options of how to loop over a dict. Using the regular for loop way only looks at the keys.

>>> for thing in fruit_data:
...     print(thing)
apple
banana
cherry
dragonfruit

Additionally, dict objects have methods for getting just the keys, just the values, or both:

>>> fruit_data.keys()
dict_keys(['apple', 'banana', 'cherry', 'dragonfruit'])
>>> fruit_data.values()
dict_values([False, True, False, True])
>>> fruit_data.items()
dict_items([('apple', False), ('banana', True), ('cherry', False), ('dragonfruit', True)])
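
The items method is probably the one you will reach for most in a loop, since it hands you the key and the value together. A short sketch using the same fruit_data:

```python
fruit_data = {'apple': False, 'banana': True, 'cherry': False, 'dragonfruit': True}

tasty = []
for fruit, tastes_good in fruit_data.items():  # unpack each (key, value) pair
    if tastes_good:
        tasty.append(fruit)

print(tasty)  # ['banana', 'dragonfruit']
```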

Nested For Loops

Sometimes your data and logic might be complex enough that a single loop won't be enough to check all conditions or make all necessary changes. This might occur when working with data from the web, which is typically in JSON format.

{
    "time": "2019-04-12 11:49:07",
    "pets": {
        "dogs": [
            {
                "name": "Koda",
                "sex": "female"
            },
            {
                "name": "Wilbur",
                "sex": "male"
            }
        ],
        "cats": [
            {
                "name": "Tipsy",
                "sex": "female"
            },
            {
                "name": "Balto",
                "sex": "male"
            }
        ]
    }
}

Let's assume we converted this JSON to a python dictionary (maybe using json.load()) named pet_data. If we access the pets from the dict, we would get another dict:

>>> import json
>>> with open("pets.json") as pets:
...     pet_data = json.load(pets)

>>> pet_data
{'time': '2019-04-12 11:49:07', 'pets': {'dogs': [{'name': 'Koda', 'sex': 'female'}, {'name': 'Wilbur', 'sex': 'male'}], 'cats': [{'name': 'Tipsy', 'sex': 'female'}, {'name': 'Balto', 'sex': 'male'}]}}

>>> pet_data["pets"]
{'dogs': [{'name': 'Koda', 'sex': 'female'}, {'name': 'Wilbur', 'sex': 'male'}], 'cats': [{'name': 'Tipsy', 'sex': 'female'}, {'name': 'Balto', 'sex': 'male'}]}

Let's say we want to enhance this data by adding the sound the animal makes -- dogs go "woof" and cats go "meow".

First just to show what each item results in:

>>> for pet_type, pets in pet_data["pets"].items():
...     print(pet_type)
...     for pet in pets:
...         print(pet)
dogs
{'name': 'Koda', 'sex': 'female'}
{'name': 'Wilbur', 'sex': 'male'}
cats
{'name': 'Tipsy', 'sex': 'female'}
{'name': 'Balto', 'sex': 'male'}

Next how we might accomplish the task:

>>> for pet_type, pets in pet_data["pets"].items():
...     if pet_type == "dogs": 
...         for pet in pets: 
...             pet["sound"] = "woof!" 
...     if pet_type == "cats": 
...         for pet in pets: 
...             pet["sound"] = "meow"

>>> pet_data["pets"]
{'dogs': [{'name': 'Koda', 'sex': 'female', 'sound': 'woof!'}, {'name': 'Wilbur', 'sex': 'male', 'sound': 'woof!'}], 'cats': [{'name': 'Tipsy', 'sex': 'female', 'sound': 'meow'}, {'name': 'Balto', 'sex': 'male', 'sound': 'meow'}]}

As you can see, nested loops are really small components of logic stacked together. This means we could probably break out the logic into small functions in order to keep our code DRY (don't repeat yourself) and more importantly allow for testing (it's easy to test a small function, but difficult to test a large, complex loop).
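
As a rough sketch of what that could look like, the version below moves the per-pet work into one small function and the looping into another. The names SOUNDS, add_sound, and add_sounds are made up for this example, and the lookup table replaces the two if statements.

```python
# Hypothetical lookup table; the sounds match the ones used above.
SOUNDS = {"dogs": "woof!", "cats": "meow"}

def add_sound(pet, sound):
    """Attach a sound to a single pet dict (small and easy to test)."""
    pet["sound"] = sound
    return pet

def add_sounds(pets_by_type, sounds=SOUNDS):
    """Attach the matching sound to every pet, grouped by pet type."""
    for pet_type, pets in pets_by_type.items():
        sound = sounds.get(pet_type)
        if sound is None:
            continue  # leave pet types we don't know about untouched
        for pet in pets:
            add_sound(pet, sound)
    return pets_by_type

pets = {
    "dogs": [{"name": "Koda", "sex": "female"}, {"name": "Wilbur", "sex": "male"}],
    "cats": [{"name": "Tipsy", "sex": "female"}, {"name": "Balto", "sex": "male"}],
}
add_sounds(pets)
print(pets["dogs"][0])  # {'name': 'Koda', 'sex': 'female', 'sound': 'woof!'}
```

Adding a new kind of pet then only means adding an entry to the lookup table instead of another if block.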

Loop Performance

A lot of the time you want to get the results of a loop back into a list. You could accomplish this by creating an empty list and adding the results to that list.

>>> results = []
>>> for number in range(1, 11): 
...     results.append(number ** 2)

>>> results
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

What's wrong with this approach? It looks straightforward and is easy to understand, which is good, but it's a little inefficient. For small lists, this is totally fine. When working with really large lists, this can start to matter because of how python represents lists "under the hood". Specifically, python has to resize the list from time to time as it grows (it reserves some extra room each time, so not on every single append), and every iteration also pays for looking up and calling append. A more efficient approach would be to create a list of placeholder values at the final size first and then fill in each item (this is known as preallocation).

>>> better_results = [None] * 10
>>> better_results
[None, None, None, None, None, None, None, None, None, None]
>>> for number in range(10): 
...     better_results[number] = (number + 1) ** 2
>>> better_results
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

We can measure this using the timeit module, which is really easy to do in ipython or jupyter using the %%timeit magic.

In [5]: %%timeit 
   ...: results = [] 
   ...: for number in range(10_000_000): 
   ...:     results.append(number ** 0.5) 
   ...:
1.43 s ± 38.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %%timeit 
   ...: results = [None] * 10_000_000 
   ...: for number in range(10_000_000): 
   ...:     results[number] = number ** 0.5 
   ...:
1.14 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

While it might not seem that much better, this is a pretty simple example. Also, if you are doing this often, the savings can add up in the long run.
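
If you aren't working in ipython or jupyter, the same comparison can be sketched with the timeit module directly. The function names below are made up for this example, and the list size is much smaller than above so it finishes quickly. The exact numbers will vary by machine and python version, so no timings are shown.

```python
import timeit

def with_append(n):
    results = []
    for number in range(n):
        results.append(number ** 0.5)
    return results

def with_preallocation(n):
    results = [None] * n
    for number in range(n):
        results[number] = number ** 0.5
    return results

n = 100_000
# timeit.timeit calls the function `number` times and returns total seconds
append_time = timeit.timeit(lambda: with_append(n), number=10)
prealloc_time = timeit.timeit(lambda: with_preallocation(n), number=10)

print(f"append:        {append_time:.3f} s for 10 runs")
print(f"preallocation: {prealloc_time:.3f} s for 10 runs")
```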

Map, Filter, Reduce

Python has support for applying functions to iterables instead of using a loop. The idea is exactly the same, but instead of writing a loop statement, we wrap our work in a function.

Map

The map function runs another function against each item in an iterable. For example, say we want to get the square root of a list of numbers. The for loop syntax could be:

>>> from math import sqrt
>>> square_roots = []
>>> for number in range(10):
...     square_roots.append(sqrt(number))
>>> square_roots
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]

We can shorten the amount of code quite a bit using map instead, but it works a little differently. For one, it doesn't immediately calculate the results. Instead, in order to be efficient, it waits until we do something with that map object, and it lets us work on only one item at a time. This prevents really large lists from having to be built at once. The caveat is that you only get one chance at each item before it gets cleared from memory. To read more about this, check out this article from Real Python on Generators.

>>> square_roots = map(sqrt, range(10))
>>> square_roots
<map object at 0x1251b0828>
>>> list(square_roots)
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]
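
To see the "only one chance at each item" caveat for yourself, here is a small sketch that tries to use the same map object twice. The second pass comes back empty because the first pass already used everything up.

```python
from math import sqrt

square_roots = map(sqrt, range(4))

first_pass = list(square_roots)   # consumes every item in the map object
second_pass = list(square_roots)  # nothing is left the second time

print(first_pass)   # [0.0, 1.0, 1.4142135623730951, 1.7320508075688772]
print(second_pass)  # []
```

If you need the results more than once, store them in a list the first time through.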

Performance of using map can be hit or miss. For this example it was very good. Compared to the preallocated list creation, 1.14 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each), we see quite a big speedup.

In [17]: %%timeit 
    ...: square_roots = list(map(sqrt, range(10_000_000))) 
    ...:                   
697 ms ± 26.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Filter

If we have an iterable and we just want to exclude items from it, the filter function is a great way to do this.

Say we only want to keep the even numbers. The for loop way might be:

>>> def number_is_even(number):
...     return number % 2 == 0

>>> numbers = list(range(20))
>>> even_numbers = []
>>> for number in numbers: 
...     if number_is_even(number):
...         even_numbers.append(number)
>>> even_numbers
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The much shorter filter way would be:

>>> f = filter(number_is_even, numbers)
>>> f
<filter object at 0x126886cf8>
>>> list(f)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

As you can see, this works the same way as map and also has similar performance.

In [22]: %%timeit 
    ...: numbers = list(range(1_000_000)) 
    ...: even_numbers = [] 
    ...: for number in numbers:  
    ...:     if number_is_even(number): 
    ...:         even_numbers.append(number) 
    ...:
162 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [23]: %%timeit 
    ...: list(filter(number_is_even, range(1_000_000))) 
    ...:
110 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Reduce

Whenever you loop through an iterable in order to aggregate or combine results, you store a current state and modify it with each new item. This is known as reduce in functional programming.

Side Note: In python2, reduce was a built-in function. In python3 it was moved to the functools module, so it needs to be imported.

Say we want to multiply all the items of a list together.

>>> total = 1
>>> for number in range(1, 11): 
...     total *= number
>>> total
3628800

The reduce way to do this requires us to use a function that multiplies numbers together instead of using the operator *. We could write this as:

def multiply(x, y):
    return x * y

However, the built-in operators are also accessible from the operator module.

>>> import operator
>>> multiply(4, 5) == operator.mul(4, 5)
True

So the reduce way could be:

>>> from functools import reduce
>>> reduce(operator.mul, range(1, 11))
3628800
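
In the for loop version we started with total = 1. reduce can take a starting value too, as an optional third argument, which matters when the iterable might be empty. A small sketch:

```python
from functools import reduce
import operator

product = reduce(operator.mul, range(1, 11))
print(product)  # 3628800

# With a starting value, an empty iterable just returns that value
empty_product = reduce(operator.mul, [], 1)
print(empty_product)  # 1

# Without a starting value, an empty iterable raises a TypeError
try:
    reduce(operator.mul, [])
    empty_failed = False
except TypeError:
    empty_failed = True
print(empty_failed)  # True
```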

So what about performance? For this specific example, the performance is almost identical.

In [12]: %%timeit 
    ...: total = 1 
    ...: for number in range(1, 100_000): 
    ...:     total *= number 
    ...:
2.33 s ± 71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %%timeit 
    ...: reduce(operator.mul, range(1, 100_000)) 
    ...:  
    ...:
2.29 s ± 40.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In general, performance using map, filter and reduce can sometimes be slower than a for loop depending on the function getting applied and how the code in the for loop is written. Whenever you aren't sure, running some benchmark tests like timeit can be very helpful.

List Comprehensions

What are list comprehensions? They are just a convenient form for building a list from a loop, otherwise known as syntactic sugar. The form of a list comprehension goes like this:

  1. create the empty list
  2. add the result item first
  3. add the for loop statement next
  4. add any conditionals last

Say you have this loop:

divisible_by_seven = [ ]
for num in range(100):
    if not num % 7:
        divisible_by_seven.append(num)

Step 1:

divisible_by_seven = []

Step 2:

divisible_by_seven = [num]

Step 3:

divisible_by_seven = [num for num in range(100)]

Step 4:

divisible_by_seven = [num for num in range(100) if not num % 7]

Additionally, you are free to use extra whitespace to make it more readable, as your list comprehensions can become longer than a typical single line. For example, this is equivalent to the prior example:

divisible_by_seven = [
    num
    for num in range(100)
    if not num % 7
]
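
The result item in step 2 doesn't have to be the loop variable itself; it can be any expression. As a small sketch, here is a loop and the matching comprehension that keep the squares of the multiples of seven (the variable names are just for illustration):

```python
# Loop version
squares_loop = []
for num in range(30):
    if not num % 7:
        squares_loop.append(num ** 2)

# Comprehension version: result expression, then the for, then the condition
squares_comprehension = [num ** 2 for num in range(30) if not num % 7]

print(squares_comprehension)  # [0, 49, 196, 441, 784]
print(squares_loop == squares_comprehension)  # True
```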

Lastly, let's benchmark our previous preallocated list creation using the comprehension syntax. For reference, the stats for that way were 1.14 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each).

In [7]: %%timeit 
   ...: results = [sqrt(number) for number in range(10_000_000)]
   ...:
985 ms ± 44.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Not only is this less to write, it actually gains a little bit more performance over the standard loop. You might also notice this is slower than the map version. That's definitely true in this example, but in practice the difference may not always be very big. In general, map works really great for cases where both the function and the iterable are already defined and the function overhead is minimal, whereas list comprehensions work really well when building logic and conditions on the fly.

When not to use list comprehension

There is no hard and fast rule, but in general an old style for loop can be more readable after a certain amount of complexity. Here are some basic rules that I find helpful.

  • When there are more than two for statements in the loop, you should probably break that nested data down first.
  • When there are more than two if statements in the for loop, you should probably turn that into a function first.
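
To give a feel for where that line sits, here is a sketch using the pets data from earlier. The comprehension with two for clauses and a condition is legal, but it is already getting dense; the function version (names_by_sex is a made-up name) says the same thing with room to grow.

```python
pets = {
    "dogs": [{"name": "Koda", "sex": "female"}, {"name": "Wilbur", "sex": "male"}],
    "cats": [{"name": "Tipsy", "sex": "female"}, {"name": "Balto", "sex": "male"}],
}

# Two for clauses and an if in one comprehension: works, but dense
female_names = [
    pet["name"]
    for pet_list in pets.values()
    for pet in pet_list
    if pet["sex"] == "female"
]

# The same logic as a plain nested loop inside a small function
def names_by_sex(pets_by_type, sex):
    names = []
    for pet_list in pets_by_type.values():
        for pet in pet_list:
            if pet["sex"] == sex:
                names.append(pet["name"])
    return names

print(female_names)                  # ['Koda', 'Tipsy']
print(names_by_sex(pets, "female"))  # ['Koda', 'Tipsy']
```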

Dict Comprehensions

Like the list comprehension, the dict comprehension is syntactic sugar for building dictionaries in an efficient manner.

Old way to build a dict:

>>> is_even = {}
>>> for number in range(10):
...     is_even[number] = number % 2 == 0 
... 
>>> is_even
{0: True,
 1: False,
 2: True,
 3: False,
 4: True,
 5: False,
 6: True,
 7: False,
 8: True,
 9: False}

Using a dict comprehension goes like this:

  1. create the empty dict
  2. add key: value item
  3. add the for loop statement next
  4. add any conditionals last

So it's basically the same as list comprehension but you have a key: value item instead of a single item.

>>> is_even = {number: number % 2 == 0 for number in range(10)}
>>> is_even
{0: True,
 1: False,
 2: True,
 3: False,
 4: True,
 5: False,
 6: True,
 7: False,
 8: True,
 9: False}
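
The example above doesn't use step 4, so here is a small sketch with a conditional at the end. It only keeps the even numbers and maps each one to its square (even_squares is just an illustrative name).

```python
even_squares = {
    number: number ** 2
    for number in range(10)
    if number % 2 == 0  # the conditional goes last, same as a list comprehension
}

print(even_squares)  # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
```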

Itertools

In addition to the built-in looping constructs, python comes with the module itertools, which contains functions that create iterators for efficient looping.

The functions in itertools can help for situations where you need to do things like:

  • cycle through a list forever until you say stop, e.g. start back at the beginning every time it reaches the end
  • repeat the items from a small list into a much bigger list
>>> import itertools
>>> list(itertools.repeat([1, 2], 4))
[[1, 2], [1, 2], [1, 2], [1, 2]]
  • chain multiple lists together into a single long list
>>> list(itertools.chain([1, 2, 3], ['A', 'B', 'C']))
[1, 2, 3, 'A', 'B', 'C']
  • product of multiple lists into a list of all the combinations
>>> list(itertools.product([1, 2, 3], ['A', 'B', 'C']))
[(1, 'A'),
 (1, 'B'),
 (1, 'C'),
 (2, 'A'),
 (2, 'B'),
 (2, 'C'),
 (3, 'A'),
 (3, 'B'),
 (3, 'C')]

And many more! Whenever you think there's a better way to work with loops than what you're doing now, consider looking up itertools examples.
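
The cycle case from the list above can be sketched like this. Because cycle never ends on its own, you have to say when to stop, either with itertools.islice or with a break. The colors list is made up for the example.

```python
import itertools

colors = ["red", "green", "blue"]  # hypothetical data

# Take only the first 7 items from the endless cycle
first_seven = list(itertools.islice(itertools.cycle(colors), 7))
print(first_seven)
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

# Or loop and decide for yourself when to stop
picked = []
for color in itertools.cycle(colors):
    if len(picked) == 4:
        break  # without this, the loop would never end
    picked.append(color)
print(picked)  # ['red', 'green', 'blue', 'red']
```

Be careful never to call list directly on a cycle, since it would try to build an infinitely long list.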
