Skip to content

Instantly share code, notes, and snippets.

@ekzhang
Last active July 11, 2024 01:58
Show Gist options
  • Save ekzhang/3bc0943dc1376d8f434d9bc8cb6bf907 to your computer and use it in GitHub Desktop.
Save ekzhang/3bc0943dc1376d8f434d9bc8cb6bf907 to your computer and use it in GitHub Desktop.
Build Systems à la Carte — Python edition
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "4e18e3b8-947e-47de-9dcc-be766fad098a",
"metadata": {},
"source": [
"# Build Systems à la Carte in Python\n",
"\n",
"A reimplementation of [Build Systems à la Carte (ICFP '18)](https://www.microsoft.com/en-us/research/uploads/prod/2018/03/build-systems-final.pdf) in Python. Based on the original Haskell implementation [here](https://github.com/snowleopard/build).\n",
"\n",
"**Why?** Because I want to understand the paper, and then to share it with others. Build systems are also of interest for programmers who are not familiar with Haskell (most programmers). The paper heavily relies on Haskell ideas like `Applicative` or `MonadState`\n",
"\n",
"Requires Python 3.11 to run."
]
},
{
"cell_type": "markdown",
"id": "c2054f04-10a7-4cea-a167-59a1aa30a270",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"## Getting Started\n",
"\n",
"Feel free to skip this section, just some prerequisites to set up the environment. Interesting part starts in [Section 3](#3.2-The-Task-Abstraction)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "bf7eb825-3f42-4cc2-8da7-66f918c6dbc1",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: oslash in /opt/homebrew/lib/python3.11/site-packages (0.6.3)\n",
"Requirement already satisfied: typing-extensions in /opt/homebrew/lib/python3.11/site-packages (from oslash) (4.8.0)\n"
]
}
],
"source": [
"!pip3 install oslash"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0ce4ed7f-4b18-4d3c-87b7-b2278a71c67c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from __future__ import annotations\n",
"\n",
"import copy\n",
"import time\n",
"from dataclasses import dataclass, field\n",
"from typing import TypeVar, Generic, Callable, Optional\n",
"\n",
"from oslash import State, List, Maybe, Just, Nothing"
]
},
{
"cell_type": "markdown",
"id": "8de45c06-7b39-4ea3-a1b0-24b4b8d86388",
"metadata": {
"tags": []
},
"source": [
"### Fixing the State monad\n",
"\n",
"The `State` monad from OSlash doesn't implement `Applicative`. Let's fix that ourselves."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ec967aeb-f122-4965-ba24-f1d3846995b3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def _state_apply(self: \"State[Callable[[TSource], TResult], S]\", something: \"State[TSource, S]\") -> \"State[TResult, S]\":\n",
" def inner(state):\n",
" f, state = self.run(state)\n",
" x, state = something.run(state)\n",
" return f(x), state\n",
" return State(inner)\n",
"\n",
"State.apply = _state_apply"
]
},
{
"cell_type": "markdown",
"id": "614f6f74-7a65-4260-8c2a-4c34a92c0d8d",
"metadata": {
"tags": []
},
"source": [
"### Implementing the Const applicative"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c307622f-dbc5-4924-9a43-c309463d9b30",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"W = TypeVar(\"W\")\n",
"A = TypeVar(\"A\")\n",
"B = TypeVar(\"B\")\n",
"\n",
"# Only for List right now due to limitations of Python\n",
"class ConstList(Generic[A, W]):\n",
" def __init__(self, const: List[W]) -> None:\n",
" self._const = const\n",
" \n",
" def getConst(self) -> List[W]:\n",
" return self._const\n",
" \n",
" # Functor Section\n",
" # ===============\n",
"\n",
" def map(self, mapper: Callable[[A], B]) -> ConstList[B, W]:\n",
" return self # do nothing\n",
" \n",
" # Applicative Section\n",
" # ===================\n",
" \n",
" @classmethod\n",
" def pure(cls, value: Callable[[A], B]) -> ConstList[Callable[[A], B], W]:\n",
" return ConstList(List.empty())\n",
" \n",
" def apply(self: ConstList[Callable[[A], B], W], something: ConstList[A, W]) -> ConstList[B, W]:\n",
" return ConstList(self.getConst().append(something.getConst()))"
]
},
{
"cell_type": "markdown",
"id": "2d3e9ba2-4a6a-4e3e-8941-0180edb031d7",
"metadata": {},
"source": [
"## 3.2 The Task Abstraction\n",
"\n",
"This section introduces `Task` and provides mathematical basis for the two major kinds.\n",
"\n",
"- _Applicative tasks_ (\"static\"): Tasks whose dependencies are known before they start.\n",
"- _Monadic tasks_ (\"dynamic\"): Tasks whose dependencies can depend on values of other dependencies.\n",
"\n",
"You can skip down to [Section 4](#4-Build-Systems-à-la-Carte) if you don't want to do any category theory."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6313d531-8de5-4c21-829b-7503844567b8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"K = TypeVar(\"K\") # Key\n",
"V = TypeVar(\"V\") # Value"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "82feeb57-6fc3-4b73-886d-fd2e443d79b1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"class Task(Generic[K, V]): # implicit: constraint C, forall \"f. C f\"\n",
" def __init__(self, run: Callable[[Callable[[K], \"fV\"]], \"fV\"]):\n",
" self.run = run"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0ae85f22-b9db-4f96-82bc-e81dc7abf998",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def sprsh1(k: str) -> Maybe[Task[str, int]]: # Tasks Applicative\n",
" match k:\n",
" case \"B1\":\n",
" # Just $ Task $ \\fetch -> ((+) <$> fetch \"A1\" <*> fetch \"A2\")\n",
" return Just(Task(lambda fetch: fetch(\"A1\").map(lambda x: lambda y: x + y).apply(fetch(\"A2\"))))\n",
" case \"B2\":\n",
" # Just $ Task $ \\fetch -> ((*2) <$> fetch \"B1\")\n",
" return Just(Task(lambda fetch: fetch(\"B1\").map(lambda x: x * 2)))\n",
" case _:\n",
" return Nothing()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4c771209-727f-40b5-8962-af2c90be9af4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def busy(tasks, key: K, store: dict[K, V]) -> dict[K, V]: # Build Applicative (or, Build Monad)\n",
" def fetch(k: K) -> State[V, dict[K, V]]:\n",
" task = tasks(k)\n",
" if task.is_nothing():\n",
" # gets (getValue k)\n",
" return State(lambda store: (store[k], store))\n",
" else:\n",
" # do\n",
" # v <- run task fetch\n",
" # modify (putValue k v)\n",
" # return v\n",
" return task._value.run(fetch).bind(\n",
" lambda value: State(\n",
" lambda store: (value, {**store, k: value})\n",
" )\n",
" )\n",
" # execState (fetch key) store\n",
" return fetch(key).run(store)[1]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a809087f-c7e6-457e-b2d5-e5b166f47b5d",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store = {\"A1\": 10, \"A2\": 20}\n",
"busy(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "markdown",
"id": "9c2d0329-57b8-44f7-b4bc-31e6957cf617",
"metadata": {
"tags": []
},
"source": [
"## 3.5 Monadic Tasks"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fb7976d3-73de-4bb1-b9f9-dec21b052cee",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def sprsh2(k: str) -> Maybe[Task[str, int]]: # Tasks Monad\n",
" match k:\n",
" case \"B1\":\n",
" # Just $ Task $ \\fetch -> do\n",
" # c1 <- fetch \"C1\"\n",
" # if c1 == 1 then fetch \"B2\" else fetch \"A2\"\n",
" def _(fetch):\n",
" return fetch(\"C1\").bind(\n",
" lambda c1: fetch(\"B2\") if c1 == 1 else fetch(\"A2\"),\n",
" )\n",
" return Just(Task(_))\n",
" case \"B2\":\n",
" # Just $ Task $ \\fetch -> do\n",
" # c1 <- fetch \"C1\"\n",
" # if c1 == 1 then fetch \"A1\" else fetch \"B1\"\n",
" def _(fetch):\n",
" return fetch(\"C1\").bind(\n",
" lambda c1: fetch(\"A1\") if c1 == 1 else fetch(\"B1\"),\n",
" )\n",
" return Just(Task(_))\n",
" case _:\n",
" return Nothing()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a9f4ba68-2d59-4a81-b3ae-99d451b2187c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'A1': 10, 'A2': 20, 'C1': 1, 'B2': 10, 'B1': 10}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store = {\"A1\": 10, \"A2\": 20, \"C1\": 1}\n",
"busy(sprsh2, \"B1\", store)"
]
},
{
"cell_type": "markdown",
"id": "c293dbd5-e13b-4c49-ba35-a7aa473c15ed",
"metadata": {},
"source": [
"## 3.7 Computing Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "419b85d9-9a37-4ef8-bedd-a59c18a23f8a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def dependencies(task: Task[K, V]) -> List[K]: # works for Task Applicative\n",
" return task.run(lambda k: ConstList(List.pure(k))).getConst()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "cdc13cf2-2a96-4e85-812a-1b84482ee8ba",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[A1, A2]\n",
"[B1]\n"
]
}
],
"source": [
"# dependencies $ fromJust $ sprsh1 \"B1\"\n",
"print(dependencies(sprsh1(\"B1\")._value))\n",
"\n",
"# dependencies $ fromJust $ sprsh1 \"B2\"\n",
"print(dependencies(sprsh1(\"B2\")._value))\n",
"\n",
"\n",
"# does not work, since it's a monadic Task\n",
"# dependencies(sprsh2(\"B1\")._value)"
]
},
{
"cell_type": "markdown",
"id": "b59d3568-c284-4793-81ab-6475cacb225f",
"metadata": {},
"source": [
"The `track` example with the `WriterT` Monad transformer is skipped because it's too complex to model in Python. The Monad transformer `WriterT` is a higher-kinded type with kind `(* -> *) -> * -> *`, and then that output is passed into the `Task` type constructor."
]
},
{
"cell_type": "markdown",
"id": "facca5da-3fda-4c5b-aebd-2ba7d38f6d94",
"metadata": {},
"source": [
"## 4 Build Systems à la Carte\n",
"\n",
"This section conceptually introduces `Scheduler` and `Rebuilder`. **The key idea of the paper: Any scheduler can be combined with any rebuilder to form a build system _à la carte_.**\n",
"\n",
"I'm going to make some simplifying assumptions to reduce the amount of abstraction:\n",
"\n",
"- Higher-kinded types like `Applicative` / `Monad` will no longer be used. So instead of genericizing over `pure()`, `apply()` and `bind()`, I'll just write down the concrete type. We'll fold `State` into a side-effect.\n",
"- Key will always be `str`, and Value will be a hashable, immutable type.\n",
"- We won't use the `IO` monad, feel free to `print()` whenever for debugging purposes (yes, it's impure! —but who cares)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0744797d-893a-4417-be76-4c55e70ddd30",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"class MonadicTask(Generic[V]):\n",
" \"\"\"A task whose dependencies can be determined dynamically.\"\"\"\n",
"\n",
" def run(self, fetch: Callable[[str], V]) -> V:\n",
" raise Exception(\"todo\")\n",
"\n",
"\n",
"class ApplicativeTask(MonadicTask[V]):\n",
" \"\"\"A task whose dependencies are known statically.\"\"\"\n",
"\n",
" def dependencies(self) -> list[str]:\n",
" raise Exception(\"todo\")"
]
},
{
"cell_type": "markdown",
"id": "ed880f6c-c4f4-49d4-acbc-1bef85298ffd",
"metadata": {},
"source": [
"Before we continue, let's reimplement the spreadsheet examples in this simplified view."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "5a62cec1-4dd7-4e2b-b868-72c07eb23947",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def sprsh1(k: str) -> Optional[ApplicativeTask[int]]:\n",
" match k:\n",
" case \"B1\":\n",
" class _(ApplicativeTask):\n",
" def dependencies(self): return [\"A1\", \"A2\"]\n",
" def run(self, fetch): return fetch(\"A1\") + fetch(\"A2\")\n",
" return _()\n",
" case \"B2\":\n",
" class _(ApplicativeTask):\n",
" def dependencies(self): return [\"B1\"]\n",
" def run(self, fetch): return 2 * fetch(\"B1\")\n",
" return _()\n",
" case _:\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "c2659e73-8c0d-4a3a-8b6b-5538ec218475",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# This spreadsheet has dynamic dependencies, so it only implements `MonadicTask`,\n",
"# rather than the more general `ApplicativeTask`.\n",
"\n",
"def sprsh2(k: str) -> Optional[MonadicTask[int]]:\n",
" match k:\n",
" case \"B1\":\n",
" class _(MonadicTask):\n",
" def run(self, fetch):\n",
" if fetch(\"C1\") == 1: return fetch(\"B2\")\n",
" else: return fetch(\"A2\")\n",
" return _()\n",
" case \"B2\":\n",
" class _(MonadicTask):\n",
" def run(self, fetch):\n",
" if fetch(\"C1\") == 1: return fetch(\"A1\")\n",
" else: return fetch(\"B1\")\n",
" return _()\n",
" case _:\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "8a18c264-66e1-4be6-96d8-147eeb460b1a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"I = TypeVar(\"I\") # associated to store\n",
"\n",
"@dataclass\n",
"class Store(Generic[I, V]):\n",
" values: dict[str, V]\n",
" data: I"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f224491a-4f27-4985-9464-8cf5a3c8cb37",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def busy(\n",
" tasks: Callable[[str], Optional[MonadicTask[V]]],\n",
" key: str,\n",
" store: Store[None, V],\n",
") -> dict[str, V]:\n",
" def fetch(k: str) -> V:\n",
" task = tasks(k)\n",
" if task is None:\n",
" return store.values[k]\n",
" else:\n",
" print(f\" BUILD {k}\")\n",
" result = task.run(fetch)\n",
" store.values[k] = result\n",
" return result\n",
"\n",
" fetch(key)\n",
" return store"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "90a1e54a-8022-4d2e-b132-73746e64ad1a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD B2\n",
" BUILD B1\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}, data=None)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store = Store({\"A1\": 10, \"A2\": 20}, None)\n",
"busy(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "0da05069-1567-43da-86a9-eade1d695d5a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'C1': 1, 'B2': 10}, data=None)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store = Store({\"A1\": 10, \"A2\": 20, \"C1\": 1}, None)\n",
"busy(sprsh2, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "637c1f21-4c33-4700-8f09-11554bed0517",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['A1', 'A2']\n",
"['B1']\n"
]
}
],
"source": [
"print(sprsh1(\"B1\").dependencies())\n",
"print(sprsh1(\"B2\").dependencies())\n",
"\n",
"# error: not ApplicativeTask\n",
"# print(sprsh2(\"B2\").dependencies())"
]
},
{
"cell_type": "markdown",
"id": "35e29459-5f31-47d3-8812-06bc43a8ea13",
"metadata": {},
"source": [
"## 5 Build Systems, Concretely\n",
"\n",
"Time to define schedulers and rebuilders! Conceptually, their signatures look like this:\n",
"\n",
"```\n",
"Store = dict[Key, Value]\n",
"Tasks = Key -> Task\n",
"Build = (Task, Key, Store) -> Store\n",
"\n",
"# Chooses which keys to build in what order.\n",
"Scheduler = Rebuilder -> Build\n",
"\n",
"# Rebuild a key only if it has changed, using some cache.\n",
"Rebuilder = (Key, Value, ApplicativeTask) -> MonadicTask (*with extra state for caching)\n",
"```\n",
"\n",
"In other words, the combination of a `Scheduler` and a `Rebuilder` gives us a `Build` system (like `busy`)."
]
},
{
"cell_type": "markdown",
"id": "e23377b6-01c5-4129-a19d-879570090dc2",
"metadata": {},
"source": [
"## 5.1 Make"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "2ef9b5d2-d0ed-4e33-b61e-7912bb4016de",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"@dataclass\n",
"class MakeInfo:\n",
" now: int = 0\n",
" mod_times: dict[str, float] = field(default_factory=dict)\n",
"\n",
"\n",
"def mod_time_rebuilder(key: str, value: V, task: ApplicativeTask[V], state: MakeInfo):\n",
" class RebuilderTask(MonadicTask):\n",
" def run(self, fetch):\n",
" match state.mod_times.get(key):\n",
" case None:\n",
" dirty = True\n",
" case mod_time:\n",
" dirty = False\n",
" for d in task.dependencies():\n",
" d_mod_time = state.mod_times.get(d)\n",
" if d_mod_time is not None and d_mod_time > mod_time:\n",
" dirty = True\n",
" break\n",
"\n",
" if not dirty:\n",
" print(f\" SKIP {key}\")\n",
" return value\n",
" else:\n",
" state.mod_times[key] = state.now\n",
" state.now += 1\n",
" print(f\" BUILD {key}\")\n",
" return task.run(fetch)\n",
"\n",
" return RebuilderTask()\n",
"\n",
"\n",
"def topological(rebuilder): # our first Scheduler! (Applicative)\n",
" def builder(tasks, target: V, store: Store[V, MakeInfo]):\n",
" def dep(key):\n",
" task = tasks(key)\n",
" return task.dependencies() if task is not None else []\n",
"\n",
" order = top_sort(reachable(dep, target))\n",
" for key in order:\n",
" task = tasks(key)\n",
" if task is None:\n",
" continue\n",
" value = store.values.get(key)\n",
" new_task = rebuilder(key, value, task, store.data)\n",
" store.values[key] = new_task.run(lambda k: store.values[k])\n",
"\n",
" return store\n",
"\n",
" return builder\n",
"\n",
"\n",
"## Standard graph algorithms — you should feel free to skip this.\n",
"@dataclass\n",
"class Graph:\n",
" deps: dict[str, list[str]]\n",
"\n",
"\n",
"def reachable(dep: Callable[[str], [str]], key: str) -> Graph:\n",
" g = Graph({})\n",
" s: list[str] = [key]\n",
" while s:\n",
" k = s.pop()\n",
" if k not in g.deps:\n",
" neighbors = dep(k)\n",
" g.deps[k] = neighbors\n",
" for next_k in neighbors:\n",
" s.append(next_k)\n",
" return g\n",
"\n",
"\n",
"def top_sort(g: Graph) -> list[str]:\n",
" # Returns reverse topological order (dependencies first). Throws error on a cyclic graph.\n",
"\n",
" order: dict[str, None] = {} # set[] does not preserve insertion order\n",
" stack: dict[str, None] = {} # set[] does not preserve insertion order\n",
"\n",
" def dfs(n: str) -> None:\n",
" if n in order:\n",
" return\n",
" if n in stack:\n",
" raise ValueError(\"cycle detected: \" + \" -> \".join(stack.keys()))\n",
" stack[n] = None\n",
" for v in g.deps[n]:\n",
" dfs(v)\n",
" del stack[n]\n",
" order[n] = None\n",
"\n",
" for n in g.deps:\n",
" dfs(n)\n",
"\n",
" return list(order.keys())"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "5a139584-0d63-4c73-a341-ca0aeeb84c07",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"make = topological(mod_time_rebuilder)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "952fe1ad-3a31-4dc0-9318-240ee6981421",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD B1\n",
" BUILD B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}, data=MakeInfo(now=2, mod_times={'B1': 0, 'B2': 1}))"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store = Store({\"A1\": 10, \"A2\": 20}, MakeInfo())\n",
"make(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "87e7541a-ab56-4ee1-98b0-e210052c2371",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP B1\n",
" SKIP B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}, data=MakeInfo(now=2, mod_times={'B1': 0, 'B2': 1}))"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "0295cf33-06a7-4ed9-833b-c114ddccb027",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP B1\n",
" BUILD B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'B1': 1080, 'B2': 2160}, data=MakeInfo(now=4, mod_times={'B1': 2, 'B2': 3}))"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Change B1\n",
"store.values[\"B1\"] = 1080\n",
"store.data.mod_times[\"B1\"] = store.data.now\n",
"store.data.now += 1\n",
"\n",
"make(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "6a566168-6aeb-4681-8eda-ab75fb0e7558",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD B1\n",
" BUILD B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 2, 'A2': 20, 'B1': 22, 'B2': 44}, data=MakeInfo(now=7, mod_times={'B1': 5, 'B2': 6, 'A1': 4}))"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Change A1\n",
"store.values[\"A1\"] = 2\n",
"store.data.mod_times[\"A1\"] = store.data.now\n",
"store.data.now += 1\n",
"\n",
"make(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "markdown",
"id": "39840c58-119f-450d-9346-c5a6e55c4146",
"metadata": {},
"source": [
"### Aside: A couple other examples\n",
"\n",
"I'll create a couple other synthetic examples of `Task Applicative` besides `sprsh1`."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "4852b647-67d2-4890-af53-2efedc65b637",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def chain(k: str) -> Optional[ApplicativeTask[int]]:\n",
" \"\"\"H0 <- H1 <- H2 <- H3 <- ...\"\"\"\n",
" num = int(k[1:])\n",
" if num > 0:\n",
" dep = \"H\" + str(num - 1)\n",
" class _(ApplicativeTask):\n",
" def dependencies(self): return [dep]\n",
" def run(self, fetch): return fetch(dep) + 1\n",
" return _()\n",
" else:\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "aadb6d2f-0139-4144-8722-836b6b668c7f",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD H1\n",
" BUILD H2\n",
" BUILD H3\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'H0': 42, 'H1': 43, 'H2': 44, 'H3': 45}, data=MakeInfo(now=3, mod_times={'H1': 0, 'H2': 1, 'H3': 2}))"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store = Store({\"H0\": 42}, MakeInfo())\n",
"make(chain, \"H3\", store)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "6aa4ea37-6129-4b0e-9683-179c7ed06e16",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP H1\n",
" SKIP H2\n",
" SKIP H3\n",
" BUILD H4\n",
" BUILD H5\n",
" BUILD H6\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'H0': 42, 'H1': 43, 'H2': 44, 'H3': 45, 'H4': 46, 'H5': 47, 'H6': 48}, data=MakeInfo(now=6, mod_times={'H1': 0, 'H2': 1, 'H3': 2, 'H4': 3, 'H5': 4, 'H6': 5}))"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(chain, \"H6\", store)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "f5637e8b-3155-4750-a1ae-fc76565a4cd8",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP H1\n",
" SKIP H2\n",
" SKIP H3\n",
" SKIP H4\n",
" SKIP H5\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'H0': 42, 'H1': 43, 'H2': 44, 'H3': 45, 'H4': 46, 'H5': 47, 'H6': 48}, data=MakeInfo(now=6, mod_times={'H1': 0, 'H2': 1, 'H3': 2, 'H4': 3, 'H5': 4, 'H6': 5}))"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(chain, \"H5\", store)"
]
},
{
"cell_type": "markdown",
"id": "e4ea9482-d972-4b33-8a0d-708e831da2ab",
"metadata": {},
"source": [
"We can also use our previous, `busy` build system below. Note how it runs the tasks in reverse order."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "0080c287-0ed6-45f7-9e74-62b49867e82a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD H5\n",
" BUILD H4\n",
" BUILD H3\n",
" BUILD H2\n",
" BUILD H1\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'H0': 42, 'H1': 43, 'H2': 44, 'H3': 45, 'H4': 46, 'H5': 47, 'H6': 48}, data=MakeInfo(now=6, mod_times={'H1': 0, 'H2': 1, 'H3': 2, 'H4': 3, 'H5': 4, 'H6': 5}))"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"busy(chain, \"H5\", store)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "24a9dd3e-e3d6-4a28-98f1-2386f95d825a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def tree(k: str) -> Optional[ApplicativeTask[list[int]]]:\n",
" \"\"\"T1 <- (T2 <- (T4 <- ..., T5 <- ...), T3 <- ...)\"\"\"\n",
" num = int(k[1:])\n",
" if num > 1:\n",
" dep = \"T\" + str(num // 2)\n",
" class _(ApplicativeTask):\n",
" def dependencies(self): return [dep] # aside: this is actually a `Task Functor`\n",
" def run(self, fetch): return [*fetch(dep), dep]\n",
" return _()\n",
" else:\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "0e1caa08-3510-4873-93ba-2da55717495d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"store = Store({\"T1\": []}, MakeInfo())"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "559b4fd9-b1e4-4876-9f3e-c5bcd0192d23",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD T3\n",
" BUILD T6\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'T1': [], 'T3': ['T1'], 'T6': ['T1', 'T3']}, data=MakeInfo(now=2, mod_times={'T3': 0, 'T6': 1}))"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(tree, \"T6\", store)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "126b3b8c-e1df-400c-ad9e-21e42163e348",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP T3\n",
" BUILD T7\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'T1': [], 'T3': ['T1'], 'T6': ['T1', 'T3'], 'T7': ['T1', 'T3']}, data=MakeInfo(now=3, mod_times={'T3': 0, 'T6': 1, 'T7': 2}))"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(tree, \"T7\", store)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "1439fe3f-e587-4b06-8caa-189ca4dc0573",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD T2\n",
" BUILD T5\n",
" BUILD T11\n",
" BUILD T23\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'T1': [], 'T3': ['T1'], 'T6': ['T1', 'T3'], 'T7': ['T1', 'T3'], 'T2': ['T1'], 'T5': ['T1', 'T2'], 'T11': ['T1', 'T2', 'T5'], 'T23': ['T1', 'T2', 'T5', 'T11']}, data=MakeInfo(now=7, mod_times={'T3': 0, 'T6': 1, 'T7': 2, 'T2': 3, 'T5': 4, 'T11': 5, 'T23': 6}))"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(tree, \"T23\", store)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "670121a1-41a1-488a-8f22-c93c1bfb7f23",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP T3\n",
" SKIP T6\n",
" BUILD T12\n",
" BUILD T24\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'T1': [], 'T3': ['T1'], 'T6': ['T1', 'T3'], 'T7': ['T1', 'T3'], 'T2': ['T1'], 'T5': ['T1', 'T2'], 'T11': ['T1', 'T2', 'T5'], 'T23': ['T1', 'T2', 'T5', 'T11'], 'T12': ['T1', 'T3', 'T6'], 'T24': ['T1', 'T3', 'T6', 'T12']}, data=MakeInfo(now=9, mod_times={'T3': 0, 'T6': 1, 'T7': 2, 'T2': 3, 'T5': 4, 'T11': 5, 'T23': 6, 'T12': 7, 'T24': 8}))"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"make(tree, \"T24\", store)"
]
},
{
"cell_type": "markdown",
"id": "c685ef23-16fa-4738-8d42-e5b3751946e4",
"metadata": {},
"source": [
"## 5.2 Excel\n",
"\n",
"A combination of the `restarting` scheduler with the `dirty_bit_rebuilder`.\n",
"\n",
"The paper uses a tricky `MonadState` feature here to act as a coroutine, interrupting the task's execution in the middle if there's a new, non-computed key. We're just going to raise and catch and exception instead."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "baadcd06-c01d-46aa-b58c-15b9508a1c74",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"@dataclass\n",
"class ExcelInfo:\n",
" is_dirty: Callable[[str], bool]\n",
" chain: list[str]\n",
"\n",
"\n",
"def dirty_bit_rebuilder(key: str, value: V, task: MonadicTask[V], is_dirty: Callable[[str], bool]):\n",
" class RebuilderTask(MonadicTask):\n",
" def run(self, fetch):\n",
" if is_dirty(key):\n",
" print(f\" BUILD {key}\")\n",
" return task.run(fetch)\n",
" print(f\" SKIP {key}\")\n",
" return value\n",
" return RebuilderTask()\n",
"\n",
"\n",
"# Custom exception to handle interrupts.\n",
"class InterruptedException(Exception):\n",
" pass\n",
"\n",
"\n",
"def restarting(rebuilder): # scheduler for Task Monad\n",
" def builder(tasks, target: V, store: Store[V, ExcelInfo]):\n",
" keys = [*store.data.chain] # copy the chain\n",
" done: set[str] = set()\n",
"\n",
" if target not in keys:\n",
" keys.append(target)\n",
"\n",
" new_chain: list[str] = []\n",
"\n",
" while keys:\n",
" key = keys.pop(0)\n",
" task = tasks(key)\n",
"\n",
" if task is None:\n",
" done.add(key)\n",
" new_chain.append(key)\n",
" continue\n",
"\n",
" new_task = rebuilder(key, store.values.get(key), task, store.data.is_dirty)\n",
" def fetch(k: str):\n",
" if k not in done:\n",
" print(f\" - interrupted: missing dep {k}\")\n",
" raise InterruptedException(k)\n",
" return store.values[k]\n",
" \n",
" try:\n",
" new_value = new_task.run(fetch)\n",
" store.values[key] = new_value\n",
" done.add(key)\n",
" new_chain.append(key)\n",
" except InterruptedException as exc:\n",
" dep = str(exc)\n",
" keys = [dep, *(k for k in keys if k != dep), key] # put the failed key at the end again and retry later\n",
"\n",
" store.data.chain = new_chain\n",
" return store\n",
"\n",
" return builder"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "2450c5ad-53b8-43f1-9601-96c575bbe7fe",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"excel = restarting(dirty_bit_rebuilder)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "6a0838ce-7e0a-4364-aeda-fc5490eab716",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD B2\n",
" - interrupted: missing dep B1\n",
" BUILD B1\n",
" - interrupted: missing dep A1\n",
" BUILD B2\n",
" - interrupted: missing dep B1\n",
" BUILD B1\n",
" - interrupted: missing dep A2\n",
" BUILD B2\n",
" - interrupted: missing dep B1\n",
" BUILD B1\n",
" BUILD B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}, data=ExcelInfo(is_dirty=<function <lambda> at 0x121057ec0>, chain=['A1', 'A2', 'B1', 'B2']))"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dirty = {\"B1\", \"B2\"}\n",
"store = Store({\"A1\": 10, \"A2\": 20}, ExcelInfo(lambda k: k in dirty, []))\n",
"\n",
"excel(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "4a35e41e-a031-45cd-9f1f-05efe4e5ef66",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" SKIP B1\n",
" SKIP B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}, data=ExcelInfo(is_dirty=<function <lambda> at 0x121057ec0>, chain=['A1', 'A2', 'B1', 'B2']))"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dirty = {}\n",
"excel(sprsh1, \"B2\", store)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "da2ad93c-4eba-4e90-9c7d-9b044b333a65",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" BUILD B1\n",
" BUILD B2\n"
]
},
{
"data": {
"text/plain": [
"Store(values={'A1': 10, 'A2': 32, 'B1': 42, 'B2': 84}, data=ExcelInfo(is_dirty=<function <lambda> at 0x121057ec0>, chain=['A1', 'A2', 'B1', 'B2']))"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# change A2, and set its transitive closure to be dirty\n",
"store.values[\"A2\"] = 32\n",
"dirty = {\"B1\", \"B2\"}\n",
"excel(sprsh1, \"B2\", store)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment