{
"cells": [
{
"cell_type": "markdown",
"id": "175e4c60-6321-4b36-8e58-2a99c338acb7",
"metadata": {},
"source": [
"# Benchmarking Numba's `typed.List` implementation of `getitem` and `setitem` with some ideas on how to improve performance."
]
},
{
"cell_type": "markdown",
"id": "bb4d3290-0dbc-4559-8f2e-36a69927f1ca",
"metadata": {},
"source": [
"In this post I will attempt to examine the performance of the `numba.typed.List` datastructure. From a high level this datastructure is a hybrid between a NumPy array and a regular Python `list`. The datastructure is backed by a c-style contiguous array (or buffer) in memory, to which either primitive types or pointers can be written. In that sense it is similar to a Numpy array in terms of how the data is stored. On the other hand the `numba.typed.List` is like a regular Python `list` as common Python list API functions -- such as `append` and `extend` -- are also supported. In addition the `numba.typed.List` has the same growth semantics as a regular Python list (in fact the implementation of the growth subroutine itself is derived from the cpython C implementation of `list`). That is to to say, the underlying c-style array will grow and shrink as elements are added or removed, with a certain fraction of overallocations internally. Effectively the runtime will be amortized constant (`O(1)`). One common source of grief for Numba users is that the `setitem` and `getitem` implementations as they are anecdotally reported as performing poorly.\n",
"\n",
"In this notebook I will benchmark the three different containers, Python `list`, NumPy array and `numba.typed.List` for the functions `setitem` and `getitem` in both regular Python and Numba `@njit` compiled variants. Based on the initial benchmarks I will attempt to narrow down and investigate the bottlenecks here.\n",
"\n",
"Thank you to Danny Weitekamp from the numba.discourse.group for the initial benchmark that this was derived from: https://numba.discourse.group/t/performance-of-typed-list-outside-of-jit-functions/2560/7?u=esc"
]
},
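{
"cell_type": "markdown",
"id": "3f2a9c1e-8b4d-4a6e-9c21-0f5e7a1b2c3d",
"metadata": {},
"source": [
"To make the hybrid nature described above concrete, here is a minimal usage sketch (illustrative only, using nothing beyond the public `typed.List` API): the container is constructed and grown with the familiar `list` API, but it may only hold elements of a single Numba type.\n",
"\n",
"```Python\n",
"from numba.typed import List as TypedList\n",
"\n",
"tl = TypedList()       # empty list, element type is inferred from the first append\n",
"tl.append(1)           # now an int64 list\n",
"tl.extend([2, 3, 4])   # list-like API, backed by a contiguous buffer\n",
"tl[0] = 10             # setitem, just like a regular list\n",
"print(tl[0], len(tl))  # getitem and len work as expected\n",
"# tl.append(1.5)       # would raise a TypingError -- all elements must share one type\n",
"```"
]
},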
{
"cell_type": "markdown",
"id": "1dc0d13f-b57e-46ca-8761-e6d1e227e13c",
"metadata": {},
"source": [
"## Baseline Code\n",
"\n",
"Below we define some baseline code to perform the benchmarks. We can define generalized test methods that will work will all three container types. We can use `py_func` on the `@njit` compiled functions to access the original function."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e631a54c-7353-44a8-98cf-6bb821f93be5",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import numpy as np\n",
"from numba import njit, types, int64, i8\n",
"from numba.typed import List as TypedList\n",
"from numba.types import ListType"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "20001246-bcf8-469b-a122-b698a6dff1de",
"metadata": {},
"outputs": [],
"source": [
"@njit\n",
"def setitem(l):\n",
" \"\"\" Set all elements in the container to 1. \"\"\"\n",
" for i in range(len(l)):\n",
" l[i] = 1\n",
" return l"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "85c05513-8d4a-4efc-b8fe-29abe3bd9e05",
"metadata": {},
"outputs": [],
"source": [
"@njit\n",
"def getitem(l):\n",
" \"\"\" Sum all elements in the container. \"\"\"\n",
" c = 0\n",
" for i in range(len(l)):\n",
" c += l[i]\n",
" return c"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a89b0123-3725-4a15-8b19-5d2986445339",
"metadata": {},
"outputs": [],
"source": [
"n = 100000 # size of containers to benchmark\n",
"py_list = [0] * n # Python list\n",
"np_array = np.zeros(n) # NumPy array\n",
"nbty_list = TypedList(py_list) # Numba typed.List -- initialized from Python list"
]
},
{
"cell_type": "markdown",
"id": "0ee3967d-0554-47cc-a27c-e42505bfd212",
"metadata": {},
"source": [
"## Setitem Python"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d3ee1bc1-59d5-417a-bfae-18c1f3978c70",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.8 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
]
}
],
"source": [
"%timeit setitem.py_func(py_list) # regular Python list"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c1a7f8d3-6374-472f-bf11-5105c5223ae1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.47 ms ± 52.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%timeit setitem.py_func(np_array) # NumPy array setting from the interpreter"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "afaa896b-391b-4701-9f80-b659b9712d90",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"78.8 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit setitem.py_func(nbty_list) # Numba typed.List setting from the interpreter"
]
},
{
"cell_type": "markdown",
"id": "77208c6c-d603-463c-9548-6774da09255f",
"metadata": {},
"source": [
"In conclusion we observe that the `typed.List` is about an order of magnitude slower compared to NumPy for this example. Intuitively, this is little surprising, since in in principle, both containers must unbox an integer and set the value of a memory location."
]
},
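{
"cell_type": "markdown",
"id": "7d41aa02-55c3-4b9e-8f10-2e6b9c0d4a11",
"metadata": {},
"source": [
"To put the interpreter-side numbers into perspective, here is a back-of-the-envelope per-element estimate based on the timings from this run (the exact values will of course differ between machines):\n",
"\n",
"```Python\n",
"n = 100_000\n",
"print(1.8e-3 / n)   # ~18 ns per setitem on a Python list\n",
"print(5.47e-3 / n)  # ~55 ns per setitem on a NumPy array\n",
"print(78.8e-3 / n)  # ~790 ns per setitem on a typed.List, driven from the interpreter\n",
"```"
]
},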
{
"cell_type": "markdown",
"id": "8e924c7b-64ee-48a0-99bc-d138300f9af5",
"metadata": {},
"source": [
"## Setitem `@njit` compiled"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6afc1712-cb05-4728-9d07-ad86ef7c3cdd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"51.2 ms ± 400 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit setitem(py_list) # Numba reflected list"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ed2b3f92-965c-45d2-8bb3-a94af5ea73b5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)\n"
]
}
],
"source": [
"%timeit setitem(np_array) # NumPy array setting in a JIT compiled function."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "45d736d4-8662-459e-b03b-c18c2c1cfd7e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.12 ms ± 3.49 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
]
}
],
"source": [
"%timeit setitem(nbty_list) # typed.List array setting in a JIT compiled function."
]
},
{
"cell_type": "markdown",
"id": "4f7b30a3-fe96-4edb-a4ef-b697c6451264",
"metadata": {},
"source": [
"In the compiled case, we are primarily looking at NumPy vs typed.List -- and we can see again that NumPy is significantly faster than the `numba.typed.List`. Although, both are now quicker than using a regular Python `list` from the interpreter. To explain the first data-point, if we pass a regular Python list to a `@njit` compiled function, then we use s so-called \"reflected\" list. This is an older -- and now deprecated -- feature of Numba. Essentially it's a structure that only exists inside `@njit` compiled functions. When a Python list is unboxed, it is turned into a proxy datastructure that memoizes all the changes that will be made to the Python list in this function. At the end of the function call, those changes are then reflected back to the Python list. As you can see, it is quite slow and `typed.List` is much better for this use-case"
]
},
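{
"cell_type": "markdown",
"id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
"metadata": {},
"source": [
"As a small illustration of the reflection behaviour (separate from the benchmark itself): modifications made inside the `@njit` function operate on the unboxed native copy and are copied back into the original Python `list` when the function returns, which is where much of the cost comes from. Recent Numba versions also emit a deprecation warning for this pattern.\n",
"\n",
"```Python\n",
"from numba import njit\n",
"\n",
"@njit\n",
"def append_one(lst):\n",
"    lst.append(1)  # operates on the unboxed native copy\n",
"\n",
"py = [0, 0]\n",
"append_one(py)     # changes are reflected back into the Python list on return\n",
"print(py)          # [0, 0, 1]\n",
"```"
]
},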
{
"cell_type": "markdown",
"id": "3555b239-7c93-4e1b-9807-cd988852b27d",
"metadata": {},
"source": [
"## Getitem Python"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "27e7b869-afa9-409f-8274-101ba9e7a9ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.74 ms ± 3.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%timeit getitem.py_func(py_list)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "0d0924fb-c600-4d9f-8517-b9409d9f4a48",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"7.23 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%timeit getitem.py_func(np_array)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c10ea786-a927-4b8f-b17e-019dd56eee5b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"73.8 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit getitem.py_func(nbty_list)"
]
},
{
"cell_type": "markdown",
"id": "0d5547ff-5eaf-4f4c-aa2f-3917d3d4cd2d",
"metadata": {},
"source": [
"Same story here, `typed.List` quite a bit slower."
]
},
{
"cell_type": "markdown",
"id": "8079e66c-f135-46b1-a220-9f0455e83752",
"metadata": {},
"source": [
"## Getitem `@njit` compiled"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "343eda89-4e6b-432c-baa8-a32d479d4d30",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"50.5 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit getitem(py_list)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "41fabe11-1f35-4b94-983f-042a43559fb0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"94.7 µs ± 455 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n"
]
}
],
"source": [
"%timeit getitem(np_array)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "7040e867-cf4f-4d85-b270-fafd8ab47cea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.11 ms ± 5.34 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
]
}
],
"source": [
"%timeit getitem(nbty_list)"
]
},
{
"cell_type": "markdown",
"id": "b5f5654c-cd11-43e1-b76b-d2495efcbe86",
"metadata": {},
"source": [
"# Bypass dispatcher using `entry_point`\n",
"\n",
"As further pointed out by Danny Weitekamp in the aforementioned Discourse post, we can bypass the dispatcher using the `entry_point` directly using the following code."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "7c9616fb-7add-4a82-8c45-f4dc0d556e1b",
"metadata": {},
"outputs": [],
"source": [
"# Annotate with types to force compilation\n",
"@njit(types.void(ListType(int64), int64, int64))\n",
"def setitem_i64(l, index, i):\n",
" l[index] = i\n",
"\n",
"# Get the entry_point\n",
"setitem_i64_ep = setitem_i64.overloads[(ListType(int64), int64, int64)].entry_point\n",
"\n",
"# Use the entry_point in our benchmark\n",
"def nb_setitem(l):\n",
" for i in range(len(l)):\n",
" setitem_i64_ep(l, i, 1)\n",
" return l"
]
},
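{
"cell_type": "markdown",
"id": "9e8d7c6b-5a4f-4e3d-8c2b-1a0f9e8d7c6b",
"metadata": {},
"source": [
"As a side note, the overload key does not have to be spelled out by hand; a sketch relying on the dispatcher's `signatures` attribute (which records the argument types of the compiled overloads) would be:\n",
"\n",
"```Python\n",
"# Equivalent lookup of the entry_point via the recorded signature\n",
"sig = setitem_i64.signatures[0]  # (ListType(int64), int64, int64)\n",
"setitem_i64_ep = setitem_i64.overloads[sig].entry_point\n",
"```"
]
},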
{
"cell_type": "code",
"execution_count": 18,
"id": "b140db96-04ba-4705-83a6-77bca5f69a1d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"numba.core.registry.CPUDispatcher"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(setitem_i64) # type of the annotated function"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "b716325b-384d-48e7-b058-b64c56b557df",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"builtin_function_or_method"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(setitem_i64_ep) # just the entry_point, it's an executable function"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "7144e00d-1cfc-4f72-9a4c-05e79477a43b",
"metadata": {},
"outputs": [],
"source": [
"# Annotate with types to force compilation\n",
"@njit(types.int64(ListType(int64), int64))\n",
"def getitem_i64(lst, index):\n",
" return lst[index]\n",
"\n",
"# Get the entry_point\n",
"getitem_i64_ep = getitem_i64.overloads[(ListType(int64), int64)].entry_point\n",
"\n",
"# Use the entry_point in our benchmark\n",
"def nb_getitem(l):\n",
" c = 0\n",
" for i in range(len(l)):\n",
" c += getitem_i64_ep(l, i)\n",
" return c"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "854a0ad0-0bc6-44a6-9408-81c67bca2304",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"30.9 ms ± 445 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit nb_setitem(nbty_list)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "ab0a5624-196c-4d8c-9774-0ea72ce20ba7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"31.3 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit nb_getitem(nbty_list)"
]
},
{
"cell_type": "markdown",
"id": "554aeb53-5d3e-41a0-be0d-f171327b46bc",
"metadata": {},
"source": [
"OK, so this is nice! We can roughly halve the cost of the benchmark using this."
]
},
{
"cell_type": "markdown",
"id": "070d8d0e-3146-48be-951f-23e3785848c4",
"metadata": {},
"source": [
"# Extracting the overhead of calling a Numba function\n",
"\n",
"Let's continue the investigation a little and look at the overhead of calling a Numba function. We also benchmark the call to `len(l)` to make sure it is neglegible compared to the actual \"work\", which is: Python iteration and function calls."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "be743a3d-7c9f-455d-a770-298bb5ca2d72",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"767 ns ± 4.28 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)\n"
]
}
],
"source": [
"%timeit len(nbty_list)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "b966ca14-2e59-44a5-b9b2-0e1f02b69f4d",
"metadata": {},
"outputs": [],
"source": [
"# Just return None\n",
"@njit(types.void())\n",
"def return_none():\n",
" return None\n",
"\n",
"return_none_ep = return_none.overloads[tuple()].entry_point\n",
"\n",
"def nb_return_none(l):\n",
" for i in range(len(l)):\n",
" return_none_ep()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "73d3cbc9-f983-4057-8a77-4ec411eec457",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.55 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%timeit nb_return_none(nbty_list)"
]
},
{
"cell_type": "markdown",
"id": "cc03b9b1-35f8-47b4-8d2f-33ade6756e6a",
"metadata": {},
"source": [
"So, this is the overhead of calling a Numba compiled function without the dispatcher. Now let's create a function that just takes a `typed.List` as an argument but does no work otherwise. This will force the `typed.List` to be unboxed but do nothing otherwise."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "f42fee73-4865-424d-a7c6-c99ee9d96907",
"metadata": {},
"outputs": [],
"source": [
"# Just accept an argument\n",
"@njit(types.void(ListType(int64)))\n",
"def accept_argument(l):\n",
" return None\n",
"\n",
"accept_argument_ep = accept_argument.overloads[(ListType(int64),)].entry_point\n",
"\n",
"def nb_accept_argument(l):\n",
" for i in range(len(l)):\n",
" accept_argument_ep(l)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "696b18fd-6faa-4386-a57b-cd508cc9396e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"28.3 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit nb_accept_argument(nbty_list)"
]
},
{
"cell_type": "markdown",
"id": "e2e4d1f2-8af3-4f43-8ab2-c2ba020676e7",
"metadata": {},
"source": [
"From this discrepenacy, it does appear obvious, that even just unboxing the `typed.List` incurs a non-neglegible cost, even if we don't do any work whatsoever on the unboxed structure."
]
},
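{
"cell_type": "markdown",
"id": "0a1b2c3d-4e5f-4678-9abc-def012345678",
"metadata": {},
"source": [
"A rough per-call estimate, based on the two measurements from this run:\n",
"\n",
"```Python\n",
"n = 100_000\n",
"loop_and_call = 2.55e-3 / n           # ~26 ns per iteration: Python loop plus a no-op entry_point call\n",
"unbox_only = (28.3e-3 - 2.55e-3) / n  # ~260 ns extra per call, attributable to unboxing the typed.List argument\n",
"print(loop_and_call, unbox_only)\n",
"```"
]
},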
{
"cell_type": "markdown",
"id": "bd206507-9210-49e2-86f7-0ade4b1efd65",
"metadata": {},
"source": [
"# Accelerating unboxing by avoiding a type check\n",
"\n",
"Let's next have a look at some of the unboxing code. We can grab a copy from the current `main` at:\n",
"\n",
"```Python\n",
"@unbox(types.ListType)\n",
"def unbox_listtype(typ, val, c):\n",
" context = c.context\n",
" builder = c.builder\n",
"\n",
" # Check that `type(val) is Dict`\n",
" list_type = c.pyapi.unserialize(c.pyapi.serialize_object(List))\n",
" valtype = c.pyapi.object_type(val)\n",
" same_type = builder.icmp_unsigned(\"==\", valtype, list_type)\n",
"\n",
" with c.builder.if_else(same_type) as (then, orelse):\n",
" with then:\n",
" miptr = c.pyapi.object_getattr_string(val, '_opaque')\n",
"\n",
" native = c.unbox(types.MemInfoPointer(types.voidptr), miptr)\n",
"\n",
" mi = native.value\n",
" ctor = cgutils.create_struct_proxy(typ)\n",
" lstruct = ctor(context, builder)\n",
"\n",
" data_pointer = context.nrt.meminfo_data(builder, mi)\n",
" data_pointer = builder.bitcast(\n",
" data_pointer,\n",
" listobject.ll_list_type.as_pointer(),\n",
" )\n",
"\n",
" lstruct.data = builder.load(data_pointer)\n",
" lstruct.meminfo = mi\n",
"\n",
" lstobj = lstruct._getvalue()\n",
" c.pyapi.decref(miptr)\n",
" bb_unboxed = c.builder.basic_block\n",
"\n",
" with orelse:\n",
" # Raise error on incorrect type\n",
" c.pyapi.err_format(\n",
" \"PyExc_TypeError\",\n",
" \"can't unbox a %S as a %S\",\n",
" valtype, list_type,\n",
" )\n",
" bb_else = c.builder.basic_block\n",
"\n",
" # Phi nodes to gather the output\n",
" lstobj_res = c.builder.phi(lstobj.type)\n",
" is_error_res = c.builder.phi(cgutils.bool_t)\n",
"\n",
" lstobj_res.add_incoming(lstobj, bb_unboxed)\n",
" lstobj_res.add_incoming(lstobj.type(None), bb_else)\n",
"\n",
" is_error_res.add_incoming(cgutils.false_bit, bb_unboxed)\n",
" is_error_res.add_incoming(cgutils.true_bit, bb_else)\n",
"\n",
" # cleanup\n",
" c.pyapi.decref(list_type)\n",
" c.pyapi.decref(valtype)\n",
"\n",
" return NativeValue(lstobj_res, is_error=is_error_res)\n",
"\n",
"```\n",
"\n",
"We observe that this does quite a bit of work to ensure that the \"thing\" that came in as an argument is indeed a `typed.List`. However, this precondition is always satisfied by virtue of registering this unboxing code for the `ListType` which is the type object for the `typed.List`. As such, all the error checking and cleanup and so on is most likely redundent and can be remove. So let's write a potentially faster variant."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "a463d396-5438-4d60-9014-fa392b974be8",
"metadata": {},
"outputs": [],
"source": [
"# Imports from Numba internals\n",
"from numba.core import cgutils\n",
"from numba.core.pythonapi import unbox, _unboxers\n",
"from numba.typed import listobject\n",
"from numba.core.extending import NativeValue\n",
"\n",
"# Evict the original unboxer\n",
"_unboxers.functions.pop(types.ListType)\n",
"\n",
"# Install the accelerated variant\n",
"@unbox(types.ListType)\n",
"def unbox_listtype(typ, val, c):\n",
" context = c.context\n",
" builder = c.builder\n",
"\n",
" miptr = c.pyapi.object_getattr_string(val, '_opaque')\n",
" native = c.unbox(types.MemInfoPointer(types.voidptr), miptr)\n",
"\n",
" mi = native.value\n",
" ctor = cgutils.create_struct_proxy(typ)\n",
" lstruct = ctor(context, builder)\n",
"\n",
" data_pointer = context.nrt.meminfo_data(builder, mi)\n",
" data_pointer = builder.bitcast(\n",
" data_pointer,\n",
" listobject.ll_list_type.as_pointer(),\n",
" )\n",
"\n",
" lstruct.data = builder.load(data_pointer)\n",
" lstruct.meminfo = mi\n",
"\n",
" lstobj = lstruct._getvalue()\n",
" c.pyapi.decref(miptr)\n",
"\n",
" return NativeValue(lstobj)\n",
"\n",
"# recompile and get new entry-point, such that new unboxer is picked up\n",
"@njit(types.void(ListType(int64)))\n",
"def accept_argument(l):\n",
" return None\n",
"accept_argument_ep = accept_argument.overloads[(ListType(int64),)].entry_point"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "96b2225a-4598-42e1-917a-58c87036d2c1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12.7 ms ± 149 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%timeit nb_accept_argument(nbty_list)"
]
},
{
"cell_type": "markdown",
"id": "d642a694-cee2-4daa-bf41-7cc876fb60e3",
"metadata": {},
"source": [
"With the new unboxer in place, we can re-run the benchmark which avoids the dispatcher. Need to re-compile however."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "f34c5960-2d63-4e12-ab10-9840468b03dd",
"metadata": {},
"outputs": [],
"source": [
"# Annotate with types to force compilation\n",
"@njit(types.void(ListType(int64), int64, int64))\n",
"def setitem_i64(l, index, i):\n",
" l[index] = i\n",
"\n",
"# Get the entry_point\n",
"setitem_i64_ep = setitem_i64.overloads[(ListType(int64), int64, int64)].entry_point\n",
"\n",
"# Use the entry_point in our benchmark\n",
"def nb_setitem(l):\n",
" for i in range(len(l)):\n",
" setitem_i64_ep(l, i, 1)\n",
" return l"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "c9008415-d89e-485d-b5c4-6e81bc5ded09",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"16.6 ms ± 99.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%timeit nb_setitem(nbty_list)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc5f85eb-2193-4272-b2c5-d0f7e02326d3",
"metadata": {},
"outputs": [],
"source": [
"# TODO: work out if bypassing the type check is actually safe, `git blame` may suggest otherwise."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}