Skip to content

Instantly share code, notes, and snippets.

@wallabra
Last active October 18, 2021 23:55
Show Gist options
  • Save wallabra/7ae1b71ee944f2511b8d03e1d095a516 to your computer and use it in GitHub Desktop.
Save wallabra/7ae1b71ee944f2511b8d03e1d095a516 to your computer and use it in GitHub Desktop.
PALIE: A proposed system of modifications to Python bytecode and attribute lookup

Python Attribute Lookup Internal Enumeration

This is a proposal to a modified version of Python bytecode, which supports an enumeration of class instance attributes that are unconditionally set by __init__ .

1. Abstract

PALIE is a proposed system of modifications, which concerns both the internal representation of Python operations, the manner in which it is produced, and the execution thereof. A PALIE-enabled Python compiler will produce extra information, alongside what Python bytecode already contains, in a number of circumstances:

a) It will keep track of __init__ methods, both when defined inside a class or when assigned to one, traversing the bytecode within to track attributes that may exist in self, and store those with the __init__ method; the interpreter will then always assign the class’s current table of known attributes to that of its current __init__ method. This is the hot symbol table.

b) It will automatically hash the identifiers of attribute lookups, which should already speed the interpreter up a little as the hash can simply be loaded directly from the bytecode as opposed to hashing the identifier at runtime.

Additionally, a PALIE-enabled Python interpreter will load such hot symbol table data whenever a class is instantiated or its __init__ reassigned. at runtime. Then, when a class is instantiated, the class instance will point to the respective class’s current hot symbol table, and point to that one even if it is later replaced (e.g. by a __init__ reassignment).

Then, when the interpreter’s attribute lookup method, is invoked, it will check the class instance’s hot symbol table pointer, and verify whether the attribute hash (or index, which might also be pre-computed in the compilation step?) exists. in it. If so, it’ll then run the lookup directly in the corresopnding bucket, performing the usual sanity checks (such as, "does the attribute still exist?" etc.) before returning the value. If otherwise, it’ll fall back to the usual method.

The idea is that this will leverage the compilation step to precompute some possibly expensive runtime operations and aid in achieving better runtime performance in a language that is known to be bashed for the suboptimal runtime performance of its reference implementation, CPython. As such, PALIE is a proposal that applies specifically to CPython, and not other Python implementations, such as PyPy (where such a proposal would be redundant by the fact that PyPy tracing JIT compiles Python code into machine code, where I imagine attribute lookups are already compiled where applicable?).

2. Motive

CPython is the reference Python implementation. It’s also the most common, and a relatively simple one at that. It is an interpreter, which utilizes a state machine of sorts in order to execute bytecode serially.

There do exist other implementations which take a different approach to executing Python. For instance, PyPy is a tracing JIT compiler, which will, at runtime, find the "hottest" (most frequent, or most overhead incurring) codepaths in the Python bytecode, and will Just-in-Time compile them into machine code, often with calls back to C methods in the host Python environment.

However, CPython is the most common one, but also one that is relatively mediocre in terms of performance (at least for performance-sensitive use cases). For simple people, this limits the usability of Python, and can be a turn-off.

Native libraries, such as NumPy, exist which attempt to bridge the pervasiveness of CPython with the performance of native(-ish) code. They will often compile large, homogenous operations, usually large numbers of the same instructions, into machine code, much oftentiems leveraging the vectorization and CPU cache capabilities of the host processor.

However, those are only effective in their own subdomain of Python use cases. This is why it’s attractive to help CPython run faster in general – this will make Python a more attractive choice to the common person, especially for applications such as games or multimedia processing.

3. Background

PALIE vies to address one of the most blatant issues with CPython regarding efficiency: attribute lookups.

In Python, attributes are one of the fundamental concepts of the language’s design. And yet, in the reference implementation CPython, the way attributes are found, whether for getting their value or setting it, is quite convoluted and full of pointer indirection.[1]

This is not the only thing that can be improved in CPython, but nonetheless it is the main focus of the PALIE proposal.

4. Changes

Enabling a Python interpreter with PALIE involves a number of modifications, which rely in each other and work in tandem (they’re tightly coupled in order to operate, hence "modification system"). Some lie in the compilation of Python code, others in the compiled bytecode’s execution.

Keep in mind that this involves fundamental changes to Python bytecode, which may make it incompatible with non-PALIE-enabled interpreters, unless it is possible to insert additional information without breaking said compatibility. (I am admittedly unaware.)

4.1. Compilation

The compilation step of a Python implementation involves taking the code in its original text form, tokenizing it, parsing it into a format that is handier to operate on for a computer, then apply a number of optimization steps, which will finally produce a bytecode which an interpreter can easily execute whenever via a state machine.

This bytecode will often contain information regarding the code originally executed. For instance, line numbers (and original lines) are kept, so that tracebacks printed will always match the source code from which the bytecode was produced.

PALIE proposes extra information to be produced by the compilation step:

4.1.1. The hot symbol table

In Python, a class instance’s attributes is assigned by its __init__ method, which is called on the class instance when it is, well, initialized.

This does not guarantee that any particular attribute will always exist at a given class instance, but it does provide a good starting point for attribute lookups where common attributes may be held.

A compiler that implements PALIE will find the __init__ method of a class, as well as follow reassignment to a class’s __init__. For every method which eventually becomes a class’s __init__, a hot symbol table is produced by the compiler. To produce it, the method’s body is analyzed, and all assignments to self are enumerated, even conditional ones.

This does pose a bit of a problem regarding __init__ decoration. For instance, if a class’s __init__ is assigned to a variable in a local scope, and then a function defined which uses this variable, which in turn is then reassigned to the class’s __init__, this is non-trivial to disentangle, especially at compile-time. Especially if this method definition, or local scope, is executed multiple times, and for many different classes, such as with a class decorator.

This proposal addresses this by attaching a hot symbol table to the method definition, and then recursing on all function calls inside the __init__ method body (only up to the first level), checking if they are known to the compiler and this scope and have, too, a hot symbol table.

The compiler may assess whether a hot symbol table can only be exactly disambiguated at runtime, e.g. if a function call is variable with respect to this current scope; if this is the case, a secondary table, pointed to by both the scope and the method’s definitions in bytecode, will list names that the interperter should be tracking whenever inside this scope so that, whenever a hot symbol table is disambiguated (kind of like C++ template instantiation), only whatever those names are currently set to will be accordingly analyzed for symbols to append to the hot symbol table disambiguation.

However, this approach can get complicated fast. PALIE is not yet a consolidated and set-in-stone proposal. Thus, caveat implementor.

4.1.2. The attribute identifier hash

Every attribute lookup, whether it is a get or a set, requires an identifier.

With PALIE, the identifier is hashed at the compilation step. This hash is stored alongside said identifier. If

  • the variable’s type is a class and is known at compile time (e.g. with PEP 585 type hints[2]), and

  • its __init__ is also known in this scope and has a hot symbol table, and

  • said hot symbol table happens to contain this identifier,

the index into this hot symbol table will also be stored alongside the identifier itself and the hash; otherwise the hot symbol table’s own hash mapp has to be used later on in order to be used.

4.2. Execution

Now that every class instance has a pointer to a hot symbol table, the attribute lookup method can look up there. Any given attribute lookup is most likely to be present there.

An attribute lookup in an interpreter with PALIE will, thus, first find the index into the hot symbol table for the given attribute lookup:

  1. If the index is stored alongside it, load the attribute slot pointed to by the hot symbol table entry at that index. Check that the hash of the attribute lookup bytecode matches that of the table entry, just in case. If they do, use that slot and break.

  2. Use the hash provided by the attribute lookup bytecode and lookup into the table’s tiny hash map. It maps identifier hashes directly into attribute slots, since it’d be a waste to have it map into a table entry only for that then to point back into the instance. Since this will be a small and relatively constant mapping, a linked list of buckets with the same hash modulo will tend to be much smaller than your usual internal attribute hashmap.

If Step 2 is reached (Step 1 was a miss), it might actually be slower than the current attribute lookup method. For this reason, being able to determine hot symbol table indices ahead of time is important. PALIE should by default already fallback to the CPython attribute lookup method if this point is reached.

If so desired by the implementor, a heuristic may be used to determine whether it is worthwhile stepping into Step 2 rather than fallbacking early.

In any case, even if the slot is found, the attribute may not exist in this instance, or it may have existed but have since been deleted. If that is the case, just do the usual: raise AttributeException.

It may also be worthwhile to allow the programmer to define some attributes as "undeletable", in which case it is trying to delete them that raises an exception from the interpreter. This would be recorded in any hot symbol table entries corresponding to said attributes (if any), and could skip the possibly expensive "attribute exists?" internal check.

5. Caveats

There do exist some caveats with PALIE that are not yet addressed by this proposal document. Think of this as a to-do list.

  1. Even the ideal PALIE execution will still involve some degree of pointer indirection.

  2. The suboptimal case may be actually slower if the symbol table is still used.

  3. There would be a lot of complicated, finicky, and potentially fruitless, code analysis being done at the CPython compilation time, in order to provide anywhere near the level of sophistication needed to overcome caveat 2. Whether it is worthwhile, or even well thought out at all, is debatable.


1. "Data-oriented programming in Python", Brian Kihoon Lee, written 2020-09-03. (https://www.moderndescartes.com/essays/data_oriented_python/)
2. "PEP 585 — Type Hinting Generics In Standard Collections", Łukasz Langa, written 2019-03-03. (https://www.python.org/dev/peps/pep-0585/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment