
@Grissess
Created April 5, 2024 04:52
Iterators and Iterables
{
"cells": [
{
"cell_type": "markdown",
"id": "2a736b9d-916f-47d2-98ec-c9a419325302",
"metadata": {},
"source": [
"# Iterators and Iterables\n",
"\n",
"Every now and then, I find a musing on the nature of these two objects that's subtly incorrect--often from beginners--betraying a lack of understanding of the distinction between the two concepts. They have _almost_ the same name, don't they? Regardless, that they tend to be a source of confusion represents an excellent teaching opportunity, so I invite you to follow along with this (interactive!) conversation with Python to demystify them. I hope that, in the process, you'll come away with a bit more understanding of:\n",
"\n",
"- What does `for` _really_ do?\n",
"- Generators\n",
"- `async`/`await`\n",
"\n",
"Which is a surprising breadth from such a simple little concept!\n",
"\n",
"First, though, let's get the precise definitions out of the way. An **iterable** is defined as follows:\n",
"\n",
"> One method needs to be defined for container objects to provide iterable support:\n",
">\n",
"> `container.__iter__()`\n",
">\n",
"> Return an iterator object. The object is required to support the iterator protocol described below.\n",
"\n",
"-- https://docs.python.org/3/library/stdtypes.html#iterator-types\n",
"\n",
"In other words, an _iterable_ is any object implementing a protocol that returns an _iterator_. We can access that particular iterator using the `iter` built-in function, but let's not get ahead of ourselves; an **iterator** is thus defined:\n",
"\n",
"> `iterator.__next__()`\n",
">\n",
"> Return the next item from the iterator. If there are no further items, raise the StopIteration exception.\n",
"\n",
"-- idem\n",
"\n",
"... and there you have it. As you might've surmised, there's a built-in function `next` that merely calls this method (or its equivalent C slot, more accurately). With that, we can begin to appreciate iterators and iterables in the shell; let's start with a hopefully familiar one, the list."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b8b9ac63-56fb-4b25-9938-f0a45eb66626",
"metadata": {},
"outputs": [],
"source": [
"ls = [1, 2, 3]"
]
},
{
"cell_type": "markdown",
"id": "c21568be-765b-41f5-b2f6-5f608e38494a",
"metadata": {},
"source": [
"One of the most familiar things one can do with an iterable is **iterate** over it with a `for` loop, so let's do that now."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "40712e00-9d5e-4340-97c7-54475552e275",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n"
]
}
],
"source": [
"for item in ls:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "51a460c8-d235-4482-a773-7db5d6c98bdc",
"metadata": {},
"source": [
"Those results should come as a surprise to no one. Just for fun, let's do it again."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f366bf5d-b428-4452-80e2-38b17846c99e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n"
]
}
],
"source": [
"for item in ls:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "3889b8fb-2cce-4e45-bfed-afce8710d692",
"metadata": {},
"source": [
"The list's value hasn't changed, so we can anticipate that we can do this as many times as we'd like and still see the same three numbers. Even though lists in Python are nominally mutable, we aren't mutating it, so, in effect, this list is a fixed data structure of three elements, floating around in memory somewhere. Appreciate that fact for a moment as we spawn our first iterator."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8db07a0d-b2d2-4a46-b372-807101216cd0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<list_iterator object at 0x77d7aa164430>\n"
]
}
],
"source": [
"it = iter(ls)\n",
"print(it)"
]
},
{
"cell_type": "markdown",
"id": "14679fc3-884f-4a9a-a35d-588dcf1d9de4",
"metadata": {},
"source": [
"What is this magical object? It is, indeed, an iterator--a `list_iterator`, more precisely. It's _not_ the original list, as we can see by its representation; however, it does carry a reference to it. We'll see the implications of that later; for now, let's just do the one thing it says we can do with an iterator."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "327acdfd-c350-44b3-881c-65f2ff75a8b1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n"
]
}
],
"source": [
"print(next(it))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c7a5d527-3e84-4e03-9756-ffdf8fa9eb98",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2\n"
]
}
],
"source": [
"print(next(it))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7cb1d77f-d71b-432d-9358-1b43ee2330b9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3\n"
]
}
],
"source": [
"print(next(it))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "73184cc1-959a-46fb-b68e-38744dc0397d",
"metadata": {},
"outputs": [
{
"ename": "StopIteration",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[8], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mit\u001b[49m\u001b[43m)\u001b[49m)\n",
"\u001b[0;31mStopIteration\u001b[0m: "
]
}
],
"source": [
"print(next(it))"
]
},
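{
"cell_type": "markdown",
"id": "5d1c2e3f-8a9b-4c0d-b1e2-f3a4b5c6d7e8",
"metadata": {},
"source": [
"(A small aside: the `next` built-in also takes an optional second argument, a default, which is returned in place of raising once the iterator is exhausted--a handy way to poke at an iterator without a `try`/`except`.)\n",
"\n",
"```python\n",
"it = iter([1, 2, 3])\n",
"print(next(it, 'done'))  # 1\n",
"print(next(it, 'done'))  # 2\n",
"print(next(it, 'done'))  # 3\n",
"print(next(it, 'done'))  # done -- exhausted, so the default comes back\n",
"```"
]
},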
{
"cell_type": "markdown",
"id": "671e4151-6de3-4f5d-99d5-fa451b51d6f4",
"metadata": {},
"source": [
"And there you have it, the iterator protocol in a nutshell. If we ponder that iteration comes down to \"just calling this function repeatedly\", we can start to see what the `for` loop really looks like behind the scenes; for some code written as:\n",
"\n",
"```python\n",
"for NAME, NAME, ... in EXPRESSION:\n",
"    # BLOCK\n",
"```\n",
"\n",
"It's approximately equivalent, up to bytecode variation and name hygiene, to the following:\n",
"\n",
"```python\n",
"iterator = iter(EXPRESSION)\n",
"while True:\n",
"    try:\n",
"        NAME, NAME, ... = next(iterator)\n",
"    except StopIteration:\n",
"        break\n",
"    else:\n",
"        # BLOCK\n",
"```\n",
"\n",
"(There are some complications with the `else` block that would make this a bit harder, so I'm ignoring that for now.)\n",
"\n",
"There are some important implications in this unravelled structure (that are also true in the real implementation): for starters, `EXPRESSION` is only evaluated once, which means that its dependencies on program state are captured _right_ at the entry to the `for` loop; secondly, the object so evaluated is only explicitly _used_ once, as a parameter to `iter` (and thus has its `__iter__` method invoked). That \"explicitly\" is a key word; it is often the case, as an implementation detail, that an iterator borrows the iterable, and thereby uses it during each iteration.\n",
"\n",
"There are other iterators, of course, including ones returned explicitly by other methods. Here's a sampling:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d2b661ef-988e-4369-b3be-e5a9d415f303",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<tuple_iterator object at 0x77d798f3efb0>\n",
"<set_iterator object at 0x77d7982a5940>\n",
"<dict_keyiterator object at 0x77d7982d89a0>\n",
"<dict_itemiterator object at 0x77d7982d89a0>\n",
"<_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>\n"
]
}
],
"source": [
"print(iter(()))\n",
"print(iter(set()))\n",
"print(iter({}))\n",
"print(iter({}.items()))\n",
"import os\n",
"devnull = open(os.devnull)\n",
"print(iter(devnull))"
]
},
{
"cell_type": "markdown",
"id": "ab57d3c2-5524-43bb-80f3-9596863adad5",
"metadata": {},
"source": [
"Let's zoom in on that last one, the \"file iterator\". That looks suspiciously like the open file, doesn't it?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "72b7f6b1-6def-46f8-9123-dca87e186164",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>\n"
]
}
],
"source": [
"print(devnull)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "aeca639e-530a-427b-8704-59a1b771d631",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"print(devnull is iter(devnull))"
]
},
{
"cell_type": "markdown",
"id": "3c641413-98b0-4b4c-bac3-cf78bd12edad",
"metadata": {},
"source": [
"It is! They're the exact same object! This alludes to one more detail of Python's iterator protocol that I didn't mention off the bat, even though the developers think it is a core feature:\n",
"\n",
"> `iterator.__iter__()`\n",
">\n",
"> Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements.\n",
"\n",
"-- https://docs.python.org/3/library/stdtypes.html#iterator-types\n",
"\n",
"As you can see, their justification is that it allows an end-user to be agnostic to the fact that `iter` is always called on the `for` loop `EXPRESSION`, as above, but it points to an even more important fact: **all iterators are iterable**--and every iterator's iterator is itself. That means we can take old `ls` from the beginning and `for`-loop over its iterator, right?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2b3cab59-e7c4-4dea-bc9f-2fb5f735d15d",
"metadata": {},
"outputs": [],
"source": [
"iterator = iter(ls)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "4e4d5d8d-47e7-472b-b56b-65fd88fd1345",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n"
]
}
],
"source": [
"for item in iterator:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "017f3822-e0cf-4c64-9bb6-d7424c3b84a6",
"metadata": {},
"source": [
"Like a charm. As before, let's try doing just that loop again."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "ea524937-eee7-4f6e-adb4-4aa6dfb93e2e",
"metadata": {},
"outputs": [],
"source": [
"for item in iterator:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "d6968473-dfd8-40e5-9d22-3082f16a43dc",
"metadata": {},
"source": [
"Wait, what happened? When we did that with `ls` itself, above, it always produced the same three values! Why does the second loop produce nothing?\n",
"\n",
"Go ahead, I invite you to ponder this yourself for a moment. I can wait.\n",
"\n",
"Ready to continue? Alright, let's go.\n",
"\n",
"As can be seen by calling `next` on the iterator above, the iterator is eventually exhausted by raising `StopIteration`. What's more, since all iterators are iterable, returning themselves from `__iter__`, the `iter` call in the loop setup acts as the identity function, and so the same object receives the `next` calls _across both loops_. I'll admit, we didn't cover what happens when one calls `next` on an already-exhausted iterator, but it shouldn't come as a surprise."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8d4cfd4f-f9d0-424c-89aa-81b3cd98f057",
"metadata": {},
"outputs": [
{
"ename": "StopIteration",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[15], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m iterator \u001b[38;5;241m=\u001b[39m \u001b[38;5;28miter\u001b[39m([])\n\u001b[0;32m----> 2\u001b[0m \u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43miterator\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mStopIteration\u001b[0m: "
]
}
],
"source": [
"iterator = iter([])\n",
"next(iterator)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "6d39f372-fd7b-4397-81a7-de73103e08c3",
"metadata": {},
"outputs": [
{
"ename": "StopIteration",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[16], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43miterator\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mStopIteration\u001b[0m: "
]
}
],
"source": [
"next(iterator)"
]
},
{
"cell_type": "markdown",
"id": "1b77bb6b-dc3a-47ce-8b32-918a0e3fd31d",
"metadata": {},
"source": [
"_Standard_ library iterators always give up once they've stopped; that's true even of iterators on views, such as our friend, the `list_iterator`, which nominally keeps track of changes to the underlying list."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "82ad9a96-8f91-4619-9814-c9e21764366e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n",
"4\n"
]
}
],
"source": [
"another_list = [1, 2, 3]\n",
"iterator = iter(another_list)\n",
"another_list.append(4)\n",
"for item in iterator:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "9b4eef9a-bbf4-48f3-ad5f-7999a5d70631",
"metadata": {},
"source": [
"If we've already exhausted this iterator, it won't return any new items later."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "d48e4524-e4e2-4492-8585-ba0a0a38c2cf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n"
]
}
],
"source": [
"another_list = [1, 2, 3]\n",
"iterator = iter(another_list)\n",
"for item in iterator:\n",
"    print(item)\n",
"\n",
"another_list.append(4)\n",
"for item in iterator:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "a15f5988-5770-40a3-88ce-7752e37a6b47",
"metadata": {},
"source": [
"(The `4` is conspicuously absent.)\n",
"\n",
"Taken altogether, this has another important implication: **iterators carry state**. The state of an iterator for a list (or any sequence, really) could conceptually be the index into the sequence. There's no requirement for that to be the case, of course; a tree-traversing iterator, as we'll see in a moment, might store the current node, and the iterator for `dict` (over its keys) has no clean pure-Python equivalent. This is why we can keep iterating over the iterable and expecting the same results: the `iter(EXPRESSION)` call mints a new, fresh iterator each time, with a new internal state, starting from \"the beginning\"; when that call is omitted (or made ineffective by using `iter` as identity), the state dependency of an iterator is much more obvious.\n",
"\n",
"## Rolling your Own\n",
"\n",
"We can, of course, implement our own iterables and iterators; the slots are automatically populated by Python using the special `__iter__` and `__next__` names. So, for example, if our object has an internal sequence we'd like to be able to iterate over directly, we can expose its implementation directly in these methods. Say we have a class that's really just a view into a sequence, but with something extra (such as caching or lifecycle management). We can make it, for all intents and purposes, as iterable as its underlying sequence by implementing `__iter__`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "8b61e328-3f6b-40ca-a732-a52bc3fa3d48",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"alice\n",
"bob\n",
"eve\n"
]
}
],
"source": [
"class UserCache:\n",
"    def __init__(self, *users):\n",
"        self.users = list(users)\n",
"        # pretend there's other logic here...\n",
"\n",
"    def __iter__(self):\n",
"        return iter(self.users)\n",
"\n",
"cache = UserCache('alice', 'bob', 'eve')\n",
"for user in cache:\n",
"    print(user)"
]
},
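{
"cell_type": "markdown",
"id": "7e2f3a4b-9c0d-4e1f-a2b3-c4d5e6f7a8b9",
"metadata": {},
"source": [
"A nice payoff of implementing `__iter__` is that everything consuming iterables now works on the class for free: `list`, `sorted`, unpacking, even membership tests, since `in` falls back to iteration when `__contains__` is absent. A sketch, with a minimal `UserCache` re-declared so the cell stands alone:\n",
"\n",
"```python\n",
"class UserCache:\n",
"    def __init__(self, *users):\n",
"        self.users = list(users)\n",
"\n",
"    def __iter__(self):\n",
"        return iter(self.users)\n",
"\n",
"cache = UserCache('eve', 'alice', 'bob')\n",
"print(list(cache))     # ['eve', 'alice', 'bob']\n",
"print(sorted(cache))   # ['alice', 'bob', 'eve']\n",
"print('bob' in cache)  # True\n",
"first, *rest = cache\n",
"print(first, rest)     # eve ['alice', 'bob']\n",
"```"
]
},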
{
"cell_type": "markdown",
"id": "1286a896-20b1-4282-8f43-7a0110b8e86e",
"metadata": {},
"source": [
"Similarly, we can make our own iterators, too. This is where it becomes obvious that we need to hold state. Let's start with a reimplementation of `range` in pure Python."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "30218afd-3b66-439b-8476-4d60590dc2ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"3\n",
"5\n",
"1\n",
"3\n",
"5\n"
]
}
],
"source": [
"class Range:\n",
"    def __init__(self, *args):\n",
"        self.slice = slice(*args)\n",
"        self.current = 0 if self.slice.start is None else self.slice.start\n",
"        self.step = 1 if self.slice.step is None else self.slice.step\n",
"\n",
"    def __iter__(self):\n",
"        return self  # Remember: all iterators should be iterable\n",
"\n",
"    def __next__(self):\n",
"        if self.current >= self.slice.stop:\n",
"            raise StopIteration()\n",
"        current = self.current\n",
"        self.current += self.step\n",
"        return current\n",
"\n",
"for i in range(1, 6, 2):\n",
"    print(i)\n",
"for i in Range(1, 6, 2):\n",
"    print(i)"
]
},
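{
"cell_type": "markdown",
"id": "9a0b1c2d-3e4f-4a5b-8c6d-7e8f9a0b1c2d",
"metadata": {},
"source": [
"It's worth pausing to contrast our `Range` with the built-in `range`: the built-in is a re-iterable _container_ (every loop over it mints a fresh iterator), whereas our class above is its own iterator, and so is spent after a single pass. A sketch, with `Range` re-declared so the cell stands alone:\n",
"\n",
"```python\n",
"class Range:\n",
"    def __init__(self, *args):\n",
"        self.slice = slice(*args)\n",
"        self.current = 0 if self.slice.start is None else self.slice.start\n",
"        self.step = 1 if self.slice.step is None else self.slice.step\n",
"\n",
"    def __iter__(self):\n",
"        return self  # this object *is* the iterator...\n",
"\n",
"    def __next__(self):\n",
"        if self.current >= self.slice.stop:\n",
"            raise StopIteration()\n",
"        current = self.current\n",
"        self.current += self.step\n",
"        return current\n",
"\n",
"r = Range(1, 6, 2)\n",
"print(list(r))  # [1, 3, 5]\n",
"print(list(r))  # [] -- ...so one pass exhausts it for good\n",
"\n",
"b = range(1, 6, 2)\n",
"print(list(b))  # [1, 3, 5]\n",
"print(list(b))  # [1, 3, 5] -- the built-in hands out a fresh iterator each time\n",
"```"
]
},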
{
"cell_type": "markdown",
"id": "59b19953-2526-49a8-bbe6-b03ed8ec4995",
"metadata": {},
"source": [
"We can do better, of course; rather than reimplement a standard iterator, let's make one for a contrived singly-linked list. The state will be, of course, the current node."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "69817ded-ee25-4019-aa92-cbfbbcbcce7f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"implicit iterator: 4\n",
"implicit iterator: 5\n",
"implicit iterator: 6\n",
"the explicit iterator: <__main__.LinkedListIter object at 0x77d798624200>\n",
"explicit iterator: 4\n",
"explicit iterator: 5\n",
"explicit iterator: 6\n"
]
}
],
"source": [
"class LinkedList:\n",
"    def __init__(self, value, next_=None):\n",
"        self.value, self.next = value, next_\n",
"\n",
"    def __iter__(self):\n",
"        return LinkedListIter(self)\n",
"\n",
"class LinkedListIter:\n",
"    def __init__(self, current):\n",
"        self.current = current\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current is None:\n",
"            raise StopIteration()\n",
"        current = self.current\n",
"        self.current = current.next\n",
"        return current.value\n",
"\n",
"linked_list = LinkedList(4, LinkedList(5, LinkedList(6)))\n",
"for item in linked_list:\n",
" print('implicit iterator:', item)\n",
"\n",
"iterator = iter(linked_list)\n",
"print('the explicit iterator:', iterator)\n",
"for item in iterator:\n",
" print('explicit iterator:', item)"
]
},
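{
"cell_type": "markdown",
"id": "1c2d3e4f-5a6b-4c7d-9e8f-0a1b2c3d4e5f",
"metadata": {},
"source": [
"Since all of the traversal state lives in `LinkedListIter`, two iterators over the same list advance completely independently--minting one never disturbs another. A sketch, with the classes re-declared so the cell stands alone:\n",
"\n",
"```python\n",
"class LinkedList:\n",
"    def __init__(self, value, next_=None):\n",
"        self.value, self.next = value, next_\n",
"\n",
"    def __iter__(self):\n",
"        return LinkedListIter(self)\n",
"\n",
"class LinkedListIter:\n",
"    def __init__(self, current):\n",
"        self.current = current\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current is None:\n",
"            raise StopIteration()\n",
"        current = self.current\n",
"        self.current = current.next\n",
"        return current.value\n",
"\n",
"linked_list = LinkedList(4, LinkedList(5, LinkedList(6)))\n",
"a, b = iter(linked_list), iter(linked_list)\n",
"print(next(a), next(a))  # 4 5 -- `a` has advanced two nodes...\n",
"print(next(b))           # 4    -- ...while `b` still starts at the head\n",
"```"
]
},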
{
"cell_type": "markdown",
"id": "e50a4bce-ade6-45bd-b579-7e987b4f2a07",
"metadata": {},
"source": [
"In fact, since the iterator protocol comes down to \"just calling a function\", we can flout some of the conventions that aren't hard rules. Remember how the standard iterators never resume after exhaustion?"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "eb140c48-d226-4233-9a94-61c8052b3aab",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"class ResumableIterator:\n",
"    def __init__(self, iterable):\n",
"        self.iterable = iterable\n",
"        self.iterator = iter(iterable)\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        try:\n",
"            return next(self.iterator)\n",
"        except StopIteration:\n",
"            self.iterator = iter(self.iterable)\n",
"            raise\n",
"\n",
"res_iter = ResumableIterator([1, 2, 3])\n",
"print(res_iter is iter(res_iter)) # no tricks here, it really is the iterator"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "c4caa76b-aee5-491d-89c8-d137b3e60958",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n"
]
}
],
"source": [
"for item in res_iter:\n",
"    print(item)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "dfa4f60e-fc72-46d0-b540-edd32193652d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n"
]
}
],
"source": [
"for item in res_iter:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "6fe3264f-154e-4dcc-9e9a-f52497c9a72f",
"metadata": {},
"source": [
"The usefulness of this is dubious, of course, since it would make more sense to just take advantage of the iterable protocol and use the list in such cases, but it provides a good demonstration. Interestingly enough, some other languages' iterators _don't_ have the guarantee of non-resumption that Python implies; in particular, Rust has [Iterator::fuse](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.fuse) to introduce this guarantee onto its iterators where it otherwise isn't provided.\n",
"\n",
"As a little bit of an aside, and a moderate surprise: Python _technically_ endows any object that supports `__getitem__` with an iterable implementation:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "64e44638-55f4-470e-8a67-84c560b08198",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<iterator object at 0x77d798625c60>\n"
]
}
],
"source": [
"class Sequence:\n",
"    def __init__(self, seq):\n",
"        self.seq = seq\n",
"\n",
"    def __getitem__(self, idx):\n",
"        print('index', idx)\n",
"        return self.seq[idx]\n",
"\n",
"seq = Sequence([1, 2, 3])\n",
"iterator = iter(seq)\n",
"print(iterator)"
]
},
{
"cell_type": "markdown",
"id": "4df2e33b-7eba-438e-881a-c58c404e2844",
"metadata": {},
"source": [
"The resulting object is a \"plain old\" iterator, implemented in C, and it has one job: call `__getitem__` sequentially, starting from 0, until it raises `IndexError`."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "d05ccd8b-b1d0-45f3-a0a0-a163d4c81bd3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"index 0\n",
"1\n",
"index 1\n",
"2\n",
"index 2\n",
"3\n",
"index 3\n"
]
}
],
"source": [
"for item in iterator:\n",
"    print(item)"
]
},
{
"cell_type": "markdown",
"id": "302aa22d-c84e-4d6b-8b02-0d446e2fc360",
"metadata": {},
"source": [
"Note that the `index 3` line printed before the `IndexError` was raised--this was the signal to stop, and was silently converted into `StopIteration(None)` by the default iterator. This protocol is entirely unaware of `__len__` or other such methods; it exists only as a stopgap to allow creators of sequence-like classes to be afforded a sane default, especially when their users expect them to be iterable."
]
},
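{
"cell_type": "markdown",
"id": "3e4f5a6b-7c8d-4e9f-a0b1-c2d3e4f5a6b7",
"metadata": {},
"source": [
"To see that `IndexError` really is the only stop signal this fallback understands, here's a contrived class (my own illustration) whose `__getitem__` computes values on demand, with no underlying sequence at all:\n",
"\n",
"```python\n",
"class Squares:\n",
"    def __getitem__(self, idx):\n",
"        if idx >= 5:\n",
"            raise IndexError(idx)  # the one and only stop signal\n",
"        return idx * idx  # computed on demand\n",
"\n",
"print(list(Squares()))  # [0, 1, 4, 9, 16]\n",
"```\n",
"\n",
"(When a class defines both, `__iter__` takes precedence and this fallback is never consulted.)"
]
},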
{
"cell_type": "markdown",
"id": "0397b241-f349-43d2-8f35-c7586d077342",
"metadata": {},
"source": [
"## Generators\n",
"\n",
"As can be seen, the implementation of even simple iterators can be a little untoward, not least because iterators must maintain their state between function calls. Take, for example, this binary tree."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "c2e5fdc2-0530-4e44-a2ef-d51aabbfb694",
"metadata": {},
"outputs": [],
"source": [
"class Bin:\n",
"    def __init__(self, value, left=None, right=None):\n",
"        self.value = value\n",
"        self.left, self.right = left, right\n",
"\n",
"root = Bin('k',\n",
"    Bin('g',\n",
"        Bin('b',\n",
"            Bin('a'),\n",
"        ),\n",
"        Bin('h',\n",
"            None,\n",
"            Bin('j'),\n",
"        ),\n",
"    ),\n",
"    Bin('p',\n",
"        Bin('m'),\n",
"    ),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a75bbd21-0d9f-4dcc-b053-c3c1f136496b",
"metadata": {},
"source": [
"Say we want to print this tree out. A sensible way to do this would be to print out the current value, at an appropriate indentation, and then print out our children. Since we won't assume our tree is complete (the above one is not), it might not be bad to sneak a prefix in as well, so we know which child is addressed."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "c259de31-88ec-488f-b262-9904d5b97de2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k\n",
" L g\n",
" L b\n",
" L a\n",
" R h\n",
" R j\n",
" R p\n",
" L m\n"
]
}
],
"source": [
"def print_tree(here, level=0, prefix=''):\n",
"    # Show what's here...\n",
"    print(' ' * level, prefix, here.value, sep='')\n",
"    # ... then print our children, left-first by convention.\n",
"    if here.left:\n",
"        print_tree(here.left, level + 1, 'L ')\n",
"    if here.right:\n",
"        print_tree(here.right, level + 1, 'R ')\n",
"\n",
"print_tree(root)"
]
},
{
"cell_type": "markdown",
"id": "98d289d2-9d13-46eb-abfb-d722c8f5f79d",
"metadata": {},
"source": [
"That looks at least a bit like the tree as we defined it. It's not a coincidence that recursive functions are often used with structural recursion (here, `Bin` objects that contain `Bin` objects), as the resulting programs are fairly pithy and easy enough to reason about. Now for a simple ask: say that we want to _iterate_ over a representation of the lines above--we want to know the depth and the direction from the parent (it wouldn't be hard to extend this to a full path), as well as the present node under consideration. A straightforward transliteration, right?\n",
"\n",
"Well, no. The iterator protocol demands that a value is _returned_, or an exception raised. With that limitation in mind, I don't expect anyone to casually understand this code, only to marvel at its existence."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "4d54930d-c0e3-4d59-8862-efdf170b307d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k\n",
" L g\n",
" L b\n",
" L a\n",
" R h\n",
" R j\n",
" R p\n",
" L m\n"
]
}
],
"source": [
"class PreOrderIter:\n",
"    def __init__(self, root):\n",
"        self.current = root\n",
"        self.level = 0\n",
"        self.prefix = ''\n",
"        self.visit_stack = []\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current is None:\n",
"            raise StopIteration()\n",
"\n",
"        current, level, prefix = self.current, self.level, self.prefix\n",
"\n",
"        if current.left:\n",
"            self.current = current.left\n",
"            self.visit_stack.append((current, self.level))  # need to find the right child again\n",
"            self.level += 1\n",
"            self.prefix = 'L '\n",
"        elif current.right:\n",
"            self.current = current.right\n",
"            # but no need to find it again here\n",
"            self.level += 1\n",
"            self.prefix = 'R '\n",
"        else:\n",
"            # We're at a leaf; pop visitations until we find a right child to descend\n",
"            while self.visit_stack:\n",
"                node, lev = self.visit_stack.pop()\n",
"                self.level = lev\n",
"                if node.right:\n",
"                    self.current = node.right\n",
"                    self.level += 1\n",
"                    self.prefix = 'R '\n",
"                    break\n",
"            else:\n",
"                self.current = None  # nowhere else to go\n",
"\n",
"        return current, level, prefix\n",
"\n",
"for here, level, prefix in PreOrderIter(root):\n",
"    print(' ' * level, prefix, here.value, sep='')"
]
},
{
"cell_type": "markdown",
"id": "691a8063-556f-4085-a62a-003cd1b5827e",
"metadata": {},
"source": [
"Yikes! Again, I'm not expecting deep understanding, but writing this took _several_ attempts. The important note here is that we need _a_ stack to remember our position in the traversal; if we can't use the call stack as such, `self.visit_stack` will have to suffice. We know we don't have to keep track of the right branch (due to the early return in the recursive implementation above), but adding any further number of branches into this tree would change that fragile assumption. Just from the sheer size, it's now possible to see how much benefit we were getting from the call stack in the recursive implementation.\n",
"\n",
"In fact, since all of the state lives in the iterator objects themselves rather than in the calls to `next`, perhaps we could implement a \"call stack\" of sorts out of iterators..."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "ce689d90-7b1c-4f56-987e-10191b60adf5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k\n",
" L g\n",
" L b\n",
" L a\n",
" R h\n",
" R j\n",
" R p\n",
" L m\n"
]
}
],
"source": [
"class RecPreOrderIter:\n",
"    def __init__(self, root, level=0, prefix=''):\n",
"        self.current = root\n",
"        self.level, self.prefix = level, prefix\n",
"        self.sub_iters = []\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current:\n",
"            current = self.current\n",
"            self.current = None\n",
"            if current.left:\n",
"                self.sub_iters.append(RecPreOrderIter(current.left, self.level + 1, 'L '))\n",
"            if current.right:\n",
"                self.sub_iters.append(RecPreOrderIter(current.right, self.level + 1, 'R '))\n",
"            return current, self.level, self.prefix\n",
"        else:\n",
"            while self.sub_iters:\n",
"                try:\n",
"                    return next(self.sub_iters[0])\n",
"                except StopIteration:\n",
"                    del self.sub_iters[0]\n",
"            raise StopIteration()  # now completely exhausted\n",
"\n",
"for here, level, prefix in RecPreOrderIter(root):\n",
"    print(' ' * level, prefix, here.value, sep='')"
]
},
{
"cell_type": "markdown",
"id": "2ded322e-8106-4cc9-9986-7200b9ff9784",
"metadata": {},
"source": [
"That is a little better, but still much more complicated than the trivial recursive implementation. In effect, we are _suspending_ the call stack by pushing child iterators--each a suspended traversal in its own right--onto `self.sub_iters`. The method by which we invoke them looks a little weird--`next` instead of a straight function call--but the result is the same. What would be nice is if we could somehow introduce those suspension points into the code itself, so that we could use the essential recursive implementation of `print_tree` as a guide.\n",
"\n",
"In effect, this is what `yield` does; when introduced to the body of a function, that function becomes a generator constructor, as we'll discuss below."
]
},
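{
"cell_type": "markdown",
"id": "5a6b7c8d-9e0f-4a1b-b2c3-d4e5f6a7b8c9",
"metadata": {},
"source": [
"Before rewriting the tree traversal, here's a minimal sketch of that suspension in action. Note that calling the function runs none of its body; the first `next` runs it up to the first `yield`, and locals survive in between:\n",
"\n",
"```python\n",
"def count_up_to(n):\n",
"    print('starting')  # runs only once the first next() arrives\n",
"    i = 0\n",
"    while i < n:\n",
"        yield i        # suspend here; `i` is retained until resumption\n",
"        i += 1\n",
"\n",
"gen = count_up_to(2)      # nothing printed yet: the body hasn't started\n",
"print(next(gen))          # prints 'starting', then 0\n",
"print(next(gen))          # 1\n",
"print(next(gen, 'done'))  # falling off the end raises StopIteration\n",
"```"
]
},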
{
"cell_type": "code",
"execution_count": 31,
"id": "cb394ed6-7d58-416a-950f-a33b816c0c7f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k\n",
" L g\n",
" L b\n",
" L a\n",
" R h\n",
" R j\n",
" R p\n",
" L m\n"
]
}
],
"source": [
"def pre_order_tree(node, level=0, prefix=''):\n",
"    yield node, level, prefix\n",
"    if node.left:\n",
"        for child, level_, prefix_ in pre_order_tree(node.left, level + 1, 'L '):\n",
"            yield child, level_, prefix_\n",
"    if node.right:\n",
"        for child, level_, prefix_ in pre_order_tree(node.right, level + 1, 'R '):\n",
"            yield child, level_, prefix_\n",
"\n",
"for node, level, prefix in pre_order_tree(root):\n",
"    print(' ' * level, prefix, node.value, sep='')"
]
},
{
"cell_type": "markdown",
"id": "db44b903-0007-40d7-b75a-5a8c12009028",
"metadata": {},
"source": [
"It's still a little bigger than `print_tree`, though, because of the nested `for`-loops. If we take advantage of a slightly-newer feature, `yield from`, we can basically bring it into parity with `print_tree`."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "d33720f1-7a17-497c-a520-92597deae9c9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k\n",
" L g\n",
" L b\n",
" L a\n",
" R h\n",
" R j\n",
" R p\n",
" L m\n"
]
}
],
"source": [
"def pre_order_tree(node, level=0, prefix=''):\n",
"    yield node, level, prefix\n",
"    if node.left:\n",
"        yield from pre_order_tree(node.left, level + 1, 'L ')\n",
"    if node.right:\n",
"        yield from pre_order_tree(node.right, level + 1, 'R ')\n",
"\n",
"for node, level, prefix in pre_order_tree(root):\n",
"    print(' ' * level, prefix, node.value, sep='')"
]
},
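{
"cell_type": "markdown",
"id": "7c8d9e0f-1a2b-4c3d-8e4f-5a6b7c8d9e0f",
"metadata": {},
"source": [
"True to everything above, the object returned by a `yield`-ing function behaves like any other iterator: it's one-shot, and only calling the function again starts the traversal afresh. A toy sketch:\n",
"\n",
"```python\n",
"def pair():\n",
"    yield 1\n",
"    yield 2\n",
"\n",
"p = pair()\n",
"print(list(p))       # [1, 2]\n",
"print(list(p))       # [] -- spent, like every exhausted iterator we've seen\n",
"print(list(pair()))  # [1, 2] -- a fresh call starts over\n",
"```"
]
},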
{
"cell_type": "markdown",
"id": "8ec20d08-67cb-4c6c-9481-86bcb70fdd76",
"metadata": {},
"source": [
"What is the type of `pre_order_tree(root)`, anyway?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "b32411b6-22fb-43df-a44f-9da7af05288e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<generator object pre_order_tree at 0x77d798f77060>\n"
]
}
],
"source": [
"gen = pre_order_tree(root)\n",
"print(gen)"
]
},
{
"cell_type": "markdown",
"id": "49e4af6a-fdb4-4b68-92da-633e95e8dffb",
"metadata": {},
"source": [
"This is a `generator` object, apparently; while we'll get to the details later, what's relevant to this conversation is that **a generator is an iterator**."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "79bada1a-689c-4221-af88-994066c4f638",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(<__main__.Bin object at 0x77d798626450>, 0, '')\n",
"(<__main__.Bin object at 0x77d798625070>, 1, 'L ')\n",
"(<__main__.Bin object at 0x77d798626960>, 2, 'L ')\n"
]
}
],
"source": [
"print(next(gen))\n",
"print(next(gen))\n",
"print(next(gen))"
]
},
{
"cell_type": "markdown",
"id": "3e61f927-d238-41d1-8070-057b977b5e31",
"metadata": {},
"source": [
"Like all iterators, generators have _state_; the state here is, in essence, a suspension point. Calling the _function_ constructs a new generator object, which is poised to transfer control to the top of the function's code (but hasn't yet done so). Whenever a `yield` is evaluated, the value it is passed is returned from `__next__`, satisfying the iterator protocol, and the execution of the code object is suspended until the next invocation of `__next__`. That suspension is effectively invisible to _most_ user code; any locals, including arguments, are stored with the code (technically the frame), and thus retain their values across suspensions. It is also possible for generators to access globals and shared closure variables, and thus witness the state of the world suddenly change between suspensions, but--for reasons that are hopefully clear--this has to be done carefully.\n",
"\n",
"Note, again, that the **declared function is a constructor**, not \"the generator\" itself. When a function contains `yield` or `yield from`, it is transformed from being a regular function to being a constructor for a generator. There is no other way to create a generator: you can't \"clone\" the state of an existing one, nor can you construct one through the type (which supports neither `__new__` nor `__init__`). One of the notable shortcomings of this feature is that it's a little obtuse to create a generator that, intentionally, produces no values."
]
},
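{
"cell_type": "markdown",
"id": "3f2a9c44-aaaa-4bbb-8ccc-000000000001",
"metadata": {},
"source": [
"To make that implicit state concrete, here is a rough hand-written equivalent of a trivial generator--an illustrative sketch only, not how CPython actually implements generator frames. The class has to store explicitly (here, in `self.i`) what the generator keeps for free in its suspended frame:\n",
"\n",
"```python\n",
"def count_up(n):\n",
"    i = 0\n",
"    while i < n:\n",
"        yield i\n",
"        i += 1\n",
"\n",
"class CountUp:\n",
"    def __init__(self, n):\n",
"        self.n = n\n",
"        self.i = 0  # the 'suspension point' state, kept by hand\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.i >= self.n:\n",
"            raise StopIteration\n",
"        value = self.i\n",
"        self.i += 1\n",
"        return value\n",
"\n",
"list(count_up(3)), list(CountUp(3))  # both produce [0, 1, 2]\n",
"```"
]
},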
{
"cell_type": "code",
"execution_count": 35,
"id": "158784b9-03e5-4798-87cc-57a5041776c7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<generator object empty_gen at 0x77d7aa36fd70>\n"
]
}
],
"source": [
"def empty_gen():\n",
" if False:\n",
" yield None\n",
" return 99\n",
"\n",
"gen = empty_gen()\n",
"print(gen)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "932d2986-ee10-4535-b45a-9310e7d07719",
"metadata": {},
"outputs": [],
"source": [
"for item in gen:\n",
" print(item)"
]
},
{
"cell_type": "markdown",
"id": "cc66d937-273c-4d4c-a632-2d7c86f23bc6",
"metadata": {},
"source": [
"We couldn't simply have omitted the (statically provable) unreachable branch: without a `yield` anywhere in the body, the function wouldn't have been a generator at all."
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "244c3fe5-74ff-4bbe-b042-3420778efdb9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"99\n"
]
}
],
"source": [
"def bad_empty_gen():\n",
" return 99\n",
"\n",
"gen = bad_empty_gen()\n",
"print(gen)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "acf75f7d-6a43-4a97-b91f-3fecbea7fba4",
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'int' object is not iterable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[38], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mitem\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mgen\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m 2\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mprint\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mitem\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mTypeError\u001b[0m: 'int' object is not iterable"
]
}
],
"source": [
"for item in gen:\n",
" print(item)"
]
},
{
"cell_type": "markdown",
"id": "d32e52b9-5fd3-408e-ada5-907b4d0e8ae9",
"metadata": {},
"source": [
"As a little aside, note that `empty_gen` _returns_, rather than yielding, the value 99. Where does that go? Well, it turns out that you can't get it from a standard `for` loop due to its implementation, but if you run it manually..."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "76519e9a-f9c4-44c6-b1d4-cc69435b0777",
"metadata": {},
"outputs": [
{
"ename": "StopIteration",
"evalue": "99",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[39], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m gen \u001b[38;5;241m=\u001b[39m empty_gen()\n\u001b[0;32m----> 2\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mgen\u001b[49m\u001b[43m)\u001b[49m)\n",
"\u001b[0;31mStopIteration\u001b[0m: 99"
]
}
],
"source": [
"gen = empty_gen()\n",
"print(next(gen))"
]
},
{
"cell_type": "markdown",
"id": "ac5d8656-f898-4ee3-a6ab-0c4a95e97c4c",
"metadata": {},
"source": [
"Ah, it's inside the `StopIteration` exception! It turns out this is accessible, as the first `Exception` argument usually is, from the `value` attribute on the exception object."
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "cfc41ba2-df28-4adc-9e99-3e4c30900ed8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generator returned 99\n"
]
}
],
"source": [
"gen = empty_gen()\n",
"try:\n",
" next(gen)\n",
"except StopIteration as e:\n",
" print('generator returned', e.value)"
]
},
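{
"cell_type": "markdown",
"id": "3f2a9c44-aaaa-4bbb-8ccc-000000000002",
"metadata": {},
"source": [
"Incidentally, this return value is exactly what `yield from` evaluates to: when one generator delegates to another, the subgenerator's `StopIteration` value becomes the value of the `yield from` expression. A small self-contained sketch (with `empty_gen` redefined locally so the snippet stands alone):\n",
"\n",
"```python\n",
"def empty_gen():\n",
"    if False:\n",
"        yield\n",
"    return 99\n",
"\n",
"def delegator():\n",
"    result = yield from empty_gen()  # evaluates to the subgenerator's return value\n",
"    yield result\n",
"\n",
"list(delegator())  # [99]\n",
"```"
]
},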
{
"cell_type": "markdown",
"id": "d7d09e60-c853-4ebd-b230-a2a724fc7105",
"metadata": {},
"source": [
"Some things to be aware of: subsequent calls to the same exhausted generator do _not_ produce subsequent return values; in fact, their associated value is the default, `None`. I'd consider it a programming error to attempt to resume a stopped generator twice, but it is strictly allowable by the iterator protocol as above, so I imagine this was the lesser of the various evils."
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "b91cebf5-6111-4fef-b4ae-13957723569d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generator returned None\n"
]
}
],
"source": [
"try:\n",
" next(gen)\n",
"except StopIteration as e:\n",
" print('generator returned', e.value)"
]
},
{
"cell_type": "markdown",
"id": "0530a2a5-e9e3-4e16-9300-68b9adcb7112",
"metadata": {},
"source": [
"Secondly: you _can_, of course, implement custom iterators that raise `StopIteration` with whatever value you want, but the standard Python built-in iterators _always_ finish with a value of `None`."
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "20a751cb-3ba2-4d44-8e7b-031127c6fd82",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n"
]
}
],
"source": [
"def stopping_exception(iterable):\n",
" iterator = iter(iterable)\n",
" while True:\n",
" try:\n",
" next(iterator)\n",
" except StopIteration as e:\n",
" return e\n",
"\n",
"print(stopping_exception([]))\n",
"print(stopping_exception({}))\n",
"print(stopping_exception(set()))\n",
"print(stopping_exception(devnull))"
]
},
{
"cell_type": "markdown",
"id": "376530f9-3198-461f-b220-bfb43c05b2f0",
"metadata": {},
"source": [
"As can be seen, neither the `StopIteration` exceptions nor `print` display any value at all: the built-in iterators raise a bare `StopIteration`, so its `value` attribute defaults to `None`."
]
},
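{
"cell_type": "markdown",
"id": "3f2a9c44-aaaa-4bbb-8ccc-000000000003",
"metadata": {},
"source": [
"By contrast, a hand-rolled iterator can smuggle any value it likes into its `StopIteration`. The class below is purely illustrative: it counts how many items it produced and reports that count on exhaustion.\n",
"\n",
"```python\n",
"class CountdownWithResult:\n",
"    def __init__(self, n):\n",
"        self.n = n\n",
"        self.produced = 0\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.n <= 0:\n",
"            raise StopIteration(self.produced)  # report how many items we yielded\n",
"        self.n -= 1\n",
"        self.produced += 1\n",
"        return self.n\n",
"\n",
"it = CountdownWithResult(2)\n",
"print(next(it), next(it))  # 1 0\n",
"try:\n",
"    next(it)\n",
"except StopIteration as e:\n",
"    print('produced', e.value)  # produced 2\n",
"```"
]
},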
{
"cell_type": "markdown",
"id": "14a2d617-296a-48b0-b7da-ff0d55b4febf",
"metadata": {},
"source": [
"Although such generators seem decidedly useless from a purely-functional perspective, there are times when we might want to abuse this suspension mechanism for something more than merely yielding values from an iterator. For example, we could conceive of generators as cooperatively-scheduled \"tasks\", where each suspension, for lack of a better term, yields its thread of control. When doing so, it's useful to yield something that states a condition for resumption; the most basic condition is \"when to resume\", and a scheduler built around it amounts to a simple discrete-time event simulator."
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "c0093f2c-c876-4ece-9ab0-6aea44656b57",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 - clock A ticks\n",
"1 - clock B ticks\n",
"5 - clock A ticks\n",
"8 - clock B ticks\n",
"10 - main_task starts\n",
"10 - sub_task 0 starts\n",
"10 - clock A ticks\n",
"13 - sub_task 0 ends\n",
"13 - sub_task 1 starts\n",
"15 - clock A ticks\n",
"15 - clock B ticks\n",
"16 - sub_task 1 ends\n",
"16 - sub_task 2 starts\n",
"19 - sub_task 2 ends\n",
"19 - sub_task 3 starts\n",
"20 - clock A ticks\n",
"22 - sub_task 3 ends\n",
"22 - sub_task 4 starts\n",
"22 - clock B ticks\n",
"25 - sub_task 4 ends\n",
"25 - main_task ends\n",
"25 - clock A ticks\n",
"29 - clock B ticks\n",
"30 - clock A ticks\n"
]
}
],
"source": [
"import heapq\n",
"\n",
"now = 0\n",
"tasks = [] # list of (when, priority, generator)\n",
"\n",
"def run(limit=None):\n",
" global now\n",
" \n",
" heapq.heapify(tasks)\n",
" while tasks:\n",
" now, pri, gen = heapq.heappop(tasks)\n",
"        if limit is not None and now > limit:\n",
" break\n",
" try:\n",
" then = next(gen)\n",
" except StopIteration:\n",
" pass\n",
" else:\n",
" heapq.heappush(tasks, (then, pri, gen))\n",
"\n",
"def clock(ident, rate):\n",
" while True:\n",
" print(now, '- clock', ident, 'ticks')\n",
" yield now + rate\n",
"\n",
"def main_task():\n",
" print(now, '- main_task starts')\n",
" for i in range(5):\n",
" yield from sub_task(i)\n",
" print(now, '- main_task ends')\n",
"\n",
"def sub_task(i):\n",
" print(now, '- sub_task', i, 'starts')\n",
" yield now + 3\n",
" print(now, '- sub_task', i, 'ends')\n",
"\n",
"tasks.extend([\n",
" (0, 0, clock('A', 5)),\n",
" (1, 10, clock('B', 7)),\n",
" (10, -50, main_task()),\n",
"])\n",
"\n",
"run(30)"
]
},
{
"cell_type": "markdown",
"id": "e9e04640-0eb2-44a5-84e1-bcaaf2f45796",
"metadata": {},
"source": [
"This is not too far off from domain-specific languages used in digital logic simulations.\n",
"\n",
"It's also sometimes useful to communicate information back into these tasks. There are a couple of ways to make this happen; one of the more surprising is to note that `yield` can evaluate to a value--in fact, it always does, though when invoked via `__next__` in the standard iteration protocol, `None` is passed in. To send some other value, we have to veer off the regular iterator protocol and into generator-specific methods: `send` is used to pass an arbitrary value back as the result of a `yield`."
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "308be781-a8ac-46a7-858b-423679366c86",
"metadata": {},
"outputs": [],
"source": [
"def shows_sent_values():\n",
" while True:\n",
" value = (yield 'more') # the precedence of yield is very low--parentheses like these are always recommended\n",
" print(value)\n",
"\n",
"gen = shows_sent_values()"
]
},
{
"cell_type": "markdown",
"id": "9c0d70ad-d65a-4693-b724-8a38f61b4b42",
"metadata": {},
"source": [
"The first value sent in, right after construction, _must_ be `None`; that is because there's no initial `yield` suspension point, and so there is no sensible place to pass that value."
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "a2d92060-2ced-4117-97d9-eb5293474215",
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "can't send non-None value to a just-started generator",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[45], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mgen\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msend\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mfoo\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m)\n",
"\u001b[0;31mTypeError\u001b[0m: can't send non-None value to a just-started generator"
]
}
],
"source": [
"print(gen.send('foo'))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "23c26bf5-bfe9-4b03-95ae-202ebb038c18",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"more\n"
]
}
],
"source": [
"print(gen.send(None))"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "4fbe7518-3695-4388-807f-4bb90aa560be",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"foo\n",
"more\n"
]
}
],
"source": [
"print(gen.send('foo'))"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "bccbae6f-2a23-41d2-83af-7417196a1e94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3]\n",
"more\n"
]
}
],
"source": [
"print(gen.send([1, 2, 3]))"
]
},
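{
"cell_type": "markdown",
"id": "3f2a9c44-aaaa-4bbb-8ccc-000000000004",
"metadata": {},
"source": [
"A more purposeful use of `send` is a consumer-style coroutine. The running-average sketch below (the names are invented for illustration) receives samples through `yield` and hands the updated average back to the caller:\n",
"\n",
"```python\n",
"def running_average():\n",
"    total = 0.0\n",
"    count = 0\n",
"    average = None\n",
"    while True:\n",
"        value = (yield average)  # receive a sample, emit the current average\n",
"        total += value\n",
"        count += 1\n",
"        average = total / count\n",
"\n",
"avg = running_average()\n",
"next(avg)  # prime the generator to its first yield\n",
"print(avg.send(10))  # 10.0\n",
"print(avg.send(20))  # 15.0\n",
"```"
]
},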
{
"cell_type": "markdown",
"id": "82431414-dee9-4158-ab86-c70926dae7df",
"metadata": {},
"source": [
"The other special generator method, `throw`, allows you to raise an exception inside the generator, as if you had used `raise` at the exact point of the suspended `yield`."
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "78884dd5-acd9-4726-b053-ade3fd68611a",
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "oops",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[49], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mgen\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mthrow\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;167;43;01mValueError\u001b[39;49;00m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43moops\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m)\n",
"Cell \u001b[0;32mIn[44], line 3\u001b[0m, in \u001b[0;36mshows_sent_values\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mshows_sent_values\u001b[39m():\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[0;32m----> 3\u001b[0m value \u001b[38;5;241m=\u001b[39m (\u001b[38;5;28;01myield\u001b[39;00m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmore\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;66;03m# the precedence of yield is very low--parentheses like these are always recommended\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28mprint\u001b[39m(value)\n",
"\u001b[0;31mValueError\u001b[0m: oops"
]
}
],
"source": [
"print(gen.throw(ValueError('oops')))"
]
},
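{
"cell_type": "markdown",
"id": "3f2a9c44-aaaa-4bbb-8ccc-000000000005",
"metadata": {},
"source": [
"The generator is also free to _catch_ what was thrown into it at the suspension point, rather than letting it propagate; this hook is what makes `throw`-based cancellation workable in practice. A minimal sketch:\n",
"\n",
"```python\n",
"def resilient():\n",
"    while True:\n",
"        try:\n",
"            yield 'working'\n",
"        except ValueError:\n",
"            yield 'recovered'\n",
"\n",
"gen = resilient()\n",
"print(next(gen))                      # working\n",
"print(gen.throw(ValueError('oops')))  # recovered\n",
"```"
]
},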
{
"cell_type": "markdown",
"id": "2edf971c-cb3f-4e9d-9e72-7ca3ee3df165",
"metadata": {},
"source": [
"In the `gen.throw(ValueError('oops'))` call above, the exception propagated up to the toplevel _through_ the resumption of the generator. It's not hard to see how, for example, the generator itself could catch the kind of exception thrown in, rather than letting it bubble through. This is how [`asyncio.CancelledError`](https://docs.python.org/3/library/asyncio-exceptions.html#asyncio.CancelledError) is usually handled, with the caveat that user code is allowed to catch it on the way out to perform specific cleanup--indeed, it's a feature that this mechanism triggers `try`/`finally` blocks.\n",
"\n",
"An alternative to this is to `yield` mutable objects; before resumption, the object can be mutated to the desired state of progress. This is especially useful if external, asynchronous processes are indicating completion, rather than the program itself."
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "bb37a695-223e-4d90-ad87-577a99d12b1c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the result was 30\n"
]
}
],
"source": [
"class Future:\n",
" def __init__(self):\n",
" self.ready = False\n",
" self.value = None\n",
"\n",
" def set_ready(self, value):\n",
" self.ready = True\n",
" self.value = value\n",
"\n",
"# Pretend this is some external channel like, say, an event loop\n",
"callbacks = []\n",
"\n",
"def example_coro():\n",
" f = Future()\n",
" callbacks.append(lambda: f.set_ready(30)) # e.g., pretend this was socket.write or the like\n",
" yield f\n",
" print('the result was', f.value)\n",
"\n",
"coro = example_coro()\n",
"value_to_return = None\n",
"while True:\n",
" # Run the callbacks\n",
" for cb in callbacks:\n",
" cb()\n",
" del callbacks[:]\n",
"\n",
" # run the coroutine\n",
" try:\n",
" future = coro.send(value_to_return)\n",
" except StopIteration:\n",
" break\n",
"\n",
"    # value_to_return = future  # would allow the yield to return the future, too\n",
" "
]
},
{
"cell_type": "markdown",
"id": "1ef86a4b-5092-4d45-9608-1bb709a77d24",
"metadata": {},
"source": [
"If you're familiar with [`asyncio`](https://docs.python.org/3/library/asyncio.html), you can probably already see where this is going; it wouldn't be hard to wrap up the generator object with `value_to_return`, make a priority queue as in the simulator above, and thus create an exceedingly-simplified \"event loop\". `asyncio` was, perhaps, the first standard-library module to ever consider generators cast in this light--as suspendable tasks that could be cooperatively scheduled. Unfortunately, some of the idiosyncrasies of generators previously discussed made this harder than usual. For example, going back to the event simulator, let's say we wanted a \"task\" that ran just once, without any dependency on rescheduling or time."
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "37d48fa0-54da-4756-b09a-935630e87a69",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bang!\n"
]
},
{
"ename": "TypeError",
"evalue": "'NoneType' object is not an iterator",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[51], line 6\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mbang!\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 4\u001b[0m tasks \u001b[38;5;241m=\u001b[39m [(\u001b[38;5;241m0\u001b[39m, \u001b[38;5;241m0\u001b[39m, impulse())]\n\u001b[0;32m----> 6\u001b[0m \u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m10\u001b[39;49m\u001b[43m)\u001b[49m\n",
"Cell \u001b[0;32mIn[43], line 15\u001b[0m, in \u001b[0;36mrun\u001b[0;34m(limit)\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[38;5;28;01mbreak\u001b[39;00m\n\u001b[1;32m 14\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 15\u001b[0m then \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mgen\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 16\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m:\n\u001b[1;32m 17\u001b[0m \u001b[38;5;28;01mpass\u001b[39;00m\n",
"\u001b[0;31mTypeError\u001b[0m: 'NoneType' object is not an iterator"
]
}
],
"source": [
"def impulse():\n",
" print('bang!')\n",
"\n",
"tasks = [(0, 0, impulse())]\n",
"\n",
"run(10)"
]
},
{
"cell_type": "markdown",
"id": "b31cc605-00c4-4029-baa1-be6e0bf4d572",
"metadata": {},
"source": [
"We'd have to resort to the \"fake `yield`\" method employed above to transform this into a generator, since it has no actual time dependency.\n",
"\n",
"Similarly, the parenthesization of `yield` is awkward whenever results are returned to the generator. The use of `yield` as a bidirectional communication method was, perhaps, a historical oddity permitted by the implementation back in Python 2, and its significance not fully appreciated until it came to be used in a manner not unlike `await`. Ultimately, a syntax change was needed--though, surprisingly, very little had to be done to the underlying code.\n",
"\n",
"Finally, there's always an awkward little gotcha: it's very easy to construct a generator and then forget to _use_ it at all (e.g. with `yield from`). The syntax and semantics for doing so are perfectly legal, but--in the world where these generators represent tasks--almost always unexpected. Being able to distinguish between \"The Old Way\" of generators as merely (pure or mostly pure) functions which happened to stream values via suspension and \"The New Way\" of treating them as cooperatively-scheduled tasklets became an important design goal.\n",
"\n",
"Enter...\n",
"\n",
"## Async/Await\n",
"\n",
"In my experience, very few programmers really think about what `async` and `await` actually do; they just know, as a matter of fact, that when they see \"`coroutine`\" written in the `asyncio` docs, they have to `await` the result of the function. Intuitively, they're aware that it's a suspension point--a place where their nice, sequential code can stop running--and so they have to be aware of the challenges of data races that can result from the world shifting under their feet. This is not an unfamiliar problem, as it's exactly the same kind of parallelism woes often encountered in multithreading; however, while code scheduled cooperatively is _concurrent_, it is not (yet) _parallel_, at least in the vast majority of cases, and this has the effect of making the minutiae of micro-architectural data race concerns less obvious. Still, in no small part due to `asyncio`'s rather commendable implementation, writing coroutines as such, with `async def`, is a _rather easy_ way of dealing with an asynchronous, unreliable world, especially over best-effort computer networks.\n",
"\n",
"Python wasn't the first, and won't be the last, programming language to add `async` and `await` as keywords, operators, or what-have-you, but it has a very distinct feature that, while making it a little more verbose, tends to make Python's implementation much more flexible. Although `asyncio` is taken for granted as _the_ asynchronous programming API for Python, as you'll soon see, it need not be the only one.\n",
"\n",
"Let's dive right in. As before, the definition of an awaitable is\n",
"\n",
"> An awaitable object generally implements an `__await__()` method. Coroutine objects returned from async def functions are awaitable.\n",
">\n",
"> `object.__await__(self)`\n",
">\n",
"> Must return an iterator. Should be used to implement awaitable objects. For instance, `asyncio.Future` implements this method to be compatible with the `await` expression.\n",
"\n",
"-- https://docs.python.org/3/reference/datamodel.html#awaitable-objects\n",
"\n",
"That's really all there is to it. Note that we know several ways of implementing iterators, now, and it's no surprise that it's common to make the `__await__` method a generator. Let's give it a try."
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "e36da3d7-eae7-4f77-a256-a643d414141a",
"metadata": {},
"outputs": [],
"source": [
"class Awaitable:\n",
" def __init__(self, ident):\n",
" self.ident = ident\n",
"\n",
" def __await__(self):\n",
" return (yield self.ident)"
]
},
{
"cell_type": "markdown",
"id": "8401dbc8-679b-4cd1-8bf1-c344debdc979",
"metadata": {},
"source": [
"This fits the definition of an \"awaitable\", no doubt, but how do we use it? Well, `await`ing it seems like the obvious choice, but syntactically we can only do that in one place: the body of an `async def` function."
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "88f3f7d6-9691-4edb-84bf-556016784461",
"metadata": {},
"outputs": [],
"source": [
"async def coroutine():\n",
" return await Awaitable(33)"
]
},
{
"cell_type": "markdown",
"id": "41882d3b-fa74-4f55-99ab-dc1b2e5addcc",
"metadata": {},
"source": [
"By virtue of being coroutines, objects returned by `async def` are also awaitable."
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "2faec672-507b-495f-b6a7-5364faf53bc2",
"metadata": {},
"outputs": [],
"source": [
"async def main():\n",
" return await coroutine()"
]
},
{
"cell_type": "markdown",
"id": "aaedce16-25b1-4800-ad95-ab125e2e5437",
"metadata": {},
"source": [
"In ordinary `asyncio` code, this is the part where you would `asyncio.run(main())` and let it take care of the minutiae; however, we're already off the rails by having defined our own `Awaitable` class which it will fail to comprehend. In general, the awaitables in use by one implementation should not be assumed to be compatible with others; you can think of the exact nature of an \"awaitable\" provided by an event library as an ABI detail. So that leaves us with one option: construct it and find out."
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "e92728c9-545f-44d6-81b4-6e3ed31f9c15",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<coroutine object main at 0x77d798f7b7c0>\n"
]
}
],
"source": [
"coro = main()\n",
"print(coro)"
]
},
{
"cell_type": "markdown",
"id": "dca0cff7-9c36-4c71-9fe9-e80b30d650c6",
"metadata": {},
"source": [
"Coroutine objects are a little special, by intent. For starters, if you discard one without having ever `await`ed it (more specifically, without having ever _resumed_ it from its initial state), the Python interpreter will warn you."
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "c104add0-8f5a-43ba-9b7b-882c41afcf96",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/run/user/1000/app/org.jupyter.JupyterLab/ipykernel_3196/791138065.py:1: RuntimeWarning: coroutine 'main' was never awaited\n",
" del coro\n",
"RuntimeWarning: Enable tracemalloc to get the object allocation traceback\n"
]
}
],
"source": [
"del coro"
]
},
{
"cell_type": "markdown",
"id": "430860db-0113-44ea-94e1-ef631224a0ed",
"metadata": {},
"source": [
"This is because of an assumed difference in intent: coroutines are _meant_ to be scheduled; it's a common enough error to forget to `await` one such coroutine object that it's worthwhile to complain when it seems like they're improperly used.\n",
"\n",
"Another special feature is the `async` keyword itself: whenever it is present, calling the function returns a coroutine. A function with `await` expressions _must_ be `async`, but an `async` function need not have any `await` expressions. This makes it easy to get around the trouble of \"empty\" coroutines without resorting to syntax hacks."
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "c3ce5a57-175c-4eff-90e6-1afb23c8513b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<coroutine object empty_coro at 0x77d7982b5900>\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/run/user/1000/app/org.jupyter.JupyterLab/ipykernel_3196/2331493078.py:4: RuntimeWarning: coroutine 'empty_coro' was never awaited\n",
" print(empty_coro())\n",
"RuntimeWarning: Enable tracemalloc to get the object allocation traceback\n"
]
}
],
"source": [
"async def empty_coro():\n",
" return 99\n",
"\n",
"print(empty_coro())"
]
},
{
"cell_type": "markdown",
"id": "c6e80206-2aac-48c7-8c63-d4a6eb14131e",
"metadata": {},
"source": [
"Arguably, such functions are probably _better_ designed as synchronous routines, as `async` functions tend to have viral propagation: while an `async` function can call regular (\"synchronous\") functions just fine, the moment they need to also depend on an `async` function (via `await`, most of the time), they must become `async` too. This tends to lead to [a large amount of understandable despair](https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/), since a callously-designed API lends itself to duplicating work between synchronous and asynchronous paths. However, that's a philosophical issue; whereas you may be stuck in Node.js with functions of both ilks in its API, and whereas both C# and JS have their own Task schedulers, Python exposes the machinery to you, the developer, directly. Since, at the end of the day, it's all just a bunch of generators running in a Python thread, you can have your cake and eat it too, doing such terribly inadvisable things as calling blocking APIs inside asynchronous functions if need be. If you take anything away from this text, just understand the consequences of doing so. Indeed, it is an oddity among programming languages that the core of an `asyncio` program is kicked off by running a _synchronous_ function--usually `asyncio.run`--that blocks until the asynchronous work is completely done.\n",
"\n",
"Back to the topic, despite the specialness of these coroutines, however, their API matches the generator API. Let's capture `main` from above and play with it as if it were a generator."
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "606aa4c1-110c-4855-8fa4-314504ed9a09",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<coroutine object main at 0x77d798f7bac0>\n"
]
},
{
"ename": "TypeError",
"evalue": "'coroutine' object is not an iterator",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[58], line 3\u001b[0m\n\u001b[1;32m 1\u001b[0m coro \u001b[38;5;241m=\u001b[39m main()\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28mprint\u001b[39m(coro)\n\u001b[0;32m----> 3\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mcoro\u001b[49m\u001b[43m)\u001b[49m)\n",
"\u001b[0;31mTypeError\u001b[0m: 'coroutine' object is not an iterator"
]
}
],
"source": [
"coro = main()\n",
"print(coro)\n",
"print(next(coro))"
]
},
{
"cell_type": "markdown",
"id": "3444e01f-a838-4b6c-bb3f-f9115fd00335",
"metadata": {},
"source": [
"Ah, well, they don't support `__next__`, that's good to know. The other methods, however..."
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "48df850f-966b-4d93-86f4-0fee9355ca10",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"33\n"
]
}
],
"source": [
"print(coro.send(None))"
]
},
{
"cell_type": "markdown",
"id": "1b3625f3-8e94-4746-a5f6-b3260c01479d",
"metadata": {},
"source": [
"Where'd that come from? Well, it's actually the `yield` inside of `Awaitable.__await__`--we constructed it with a `self.ident` of 33. Since we're conceptually inside that call stack, we can send a value back to it to return it from the `yield`."
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "95a5f212-7aca-4ab9-9e30-da80b9d8b393",
"metadata": {},
"outputs": [
{
"ename": "StopIteration",
"evalue": "33 is good",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[60], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mcoro\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msend\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m33 is good\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m)\n",
"\u001b[0;31mStopIteration\u001b[0m: 33 is good"
]
}
],
"source": [
"print(coro.send('33 is good'))"
]
},
{
"cell_type": "markdown",
"id": "4ac54fdd-c097-4d22-8747-5d11148ab2cb",
"metadata": {},
"source": [
"And here we see that the return value of `main`--which is the return value of `coroutine`, which was the awaited result of `Awaitable(33)`--is exactly what was returned from `Awaitable.__await__`. This is really the crux of the matter; although coroutines are certainly \"different\" in some senses, they are not only very similar to generators, **coroutines are implemented as generators**, with a few special restrictions designed to make the choice between the two explicit and deliberate.\n",
"\n",
"Let's bring it all back together. Remember that discrete-time simulator? The tasks therein were definitely better suited to being coroutines, rather than generators; let's switch that out."
]
},
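{
"cell_type": "markdown",
"id": "c8d5f2bb-2222-4d7f-8b3e-4a6f8b0d1e32",
"metadata": {},
"source": [
"One quick aside before we do--a small check (not from the cells above) that coroutines really do carry the generator machinery, just with the iterator protocol deliberately withheld:\n",
"\n",
"```python\n",
"import inspect\n",
"\n",
"async def nop():\n",
"    return 42\n",
"\n",
"coro = nop()\n",
"print(inspect.iscoroutine(coro))  # True\n",
"# The generator driving methods are all there...\n",
"print(hasattr(coro, 'send'), hasattr(coro, 'throw'), hasattr(coro, 'close'))  # True True True\n",
"# ...but iteration is refused, so you can't accidentally `for` over a coroutine.\n",
"try:\n",
"    iter(coro)\n",
"except TypeError as e:\n",
"    print(e)  # 'coroutine' object is not iterable\n",
"# Completion is still signaled the generator way:\n",
"try:\n",
"    coro.send(None)\n",
"except StopIteration as e:\n",
"    print(e.value)  # 42\n",
"```"
]
},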
{
"cell_type": "code",
"execution_count": 61,
"id": "6f1e9ae8-21bd-4faf-9869-e7afa23889bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 - clock A ticks\n",
"1 - clock B ticks\n",
"5 - clock A has resumed from time 0\n",
"5 - clock A ticks\n",
"8 - clock B has resumed from time 1\n",
"8 - clock B ticks\n",
"10 - main_task starts\n",
"10 - sub_task 0 starts\n",
"10 - clock A has resumed from time 5\n",
"10 - clock A ticks\n",
"13 - sub_task 0 resumed from time 10\n",
"13 - sub_task 0 ends\n",
"13 - sub_task 1 starts\n",
"15 - clock A has resumed from time 10\n",
"15 - clock A ticks\n",
"15 - clock B has resumed from time 8\n",
"15 - clock B ticks\n",
"16 - sub_task 1 resumed from time 13\n",
"16 - sub_task 1 ends\n",
"16 - sub_task 2 starts\n",
"19 - sub_task 2 resumed from time 16\n",
"19 - sub_task 2 ends\n",
"19 - sub_task 3 starts\n",
"20 - clock A has resumed from time 15\n",
"20 - clock A ticks\n",
"22 - sub_task 3 resumed from time 19\n",
"22 - sub_task 3 ends\n",
"22 - sub_task 4 starts\n",
"22 - clock B has resumed from time 15\n",
"22 - clock B ticks\n",
"25 - sub_task 4 resumed from time 22\n",
"25 - sub_task 4 ends\n",
"25 - main_task ends\n",
"25 - clock A has resumed from time 20\n",
"25 - clock A ticks\n",
"29 - clock B has resumed from time 22\n",
"29 - clock B ticks\n",
"30 - clock A has resumed from time 25\n",
"30 - clock A ticks\n"
]
}
],
"source": [
"import heapq\n",
"\n",
"class Task:\n",
" def __init__(self, coro):\n",
" self.coro = coro\n",
" self.value = None # Value to be sent in on next resume\n",
"\n",
" def resume(self):\n",
" return self.coro.send(self.value)\n",
"\n",
"class Delay:\n",
" def __init__(self, value):\n",
" self.deadline = self.map_value(value)\n",
"\n",
" def __await__(self):\n",
" return (yield self.deadline)\n",
"\n",
"now = 0\n",
"tasks = [] # list of (when, priority, Task)\n",
"\n",
"class Absolute(Delay):\n",
" def map_value(self, value):\n",
" return value\n",
"\n",
"class Relative(Delay):\n",
" def map_value(self, value):\n",
" return now + value\n",
"\n",
"def run(limit=None):\n",
" global now\n",
" \n",
" heapq.heapify(tasks)\n",
" while tasks:\n",
" now, pri, task = heapq.heappop(tasks)\n",
" if now > limit:\n",
" break\n",
" try:\n",
" then = task.resume()\n",
" except StopIteration:\n",
" pass\n",
" else:\n",
" task.value = now # last time it suspended\n",
" heapq.heappush(tasks, (then, pri, task))\n",
"\n",
"async def clock(ident, rate):\n",
" while True:\n",
" print(now, '- clock', ident, 'ticks')\n",
" then = await Relative(rate)\n",
" print(now, '- clock', ident, 'has resumed from time', then)\n",
"\n",
"async def main_task():\n",
" print(now, '- main_task starts')\n",
" for i in range(5):\n",
" await sub_task(i)\n",
" print(now, '- main_task ends')\n",
"\n",
"async def sub_task(i):\n",
" print(now, '- sub_task', i, 'starts')\n",
" then = await Relative(3)\n",
" print(now, '- sub_task', i, 'resumed from time', then)\n",
" print(now, '- sub_task', i, 'ends')\n",
"\n",
"tasks.extend([\n",
" (0, 0, Task(clock('A', 5))),\n",
" (1, 10, Task(clock('B', 7))),\n",
" (10, -50, Task(main_task())),\n",
"])\n",
"\n",
"run(30)"
]
},
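{
"cell_type": "markdown",
"id": "d9e6a3cc-3333-4e80-9c4f-5b7a9c1e2f43",
"metadata": {},
"source": [
"A loose end worth noting: the run above only ever awaits `Relative`. A hypothetical extra task (a sketch against the classes above, not part of the executed run) shows where `Absolute` fits in:\n",
"\n",
"```python\n",
"async def alarm():\n",
"    # Sleep until simulated time 12 exactly, however far away that is,\n",
"    # rather than for a span measured from `now`.\n",
"    then = await Absolute(12)\n",
"    print(now, '- alarm fired; it suspended at time', then)\n",
"\n",
"# Schedule it alongside the others before calling run():\n",
"tasks.append((0, 5, Task(alarm())))\n",
"```"
]
},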
{
"cell_type": "markdown",
"id": "ea1490b3-2692-4158-92ed-7da564f49f3f",
"metadata": {},
"source": [
"And there you have it! A foray into the meaning of iteration, iterables, and iterators takes us all the way into the dense, obtuse, and sadly poorly-documented details of a remarkable language feature. May that knowledge help you on your travels :)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}