@paugier
Last active September 25, 2025 15:05
pyni-pep-draft

PEP 9999: A vision for Python's C API

This document is a tentative revision by Pierre Augier of a PEP draft (https://docs.google.com/document/d/1hi7VJEfd7_59rqgkQhT1frmtxaktjnsVxeKZKFbxpJ8/edit), in particular adding an introduction. I wrote it from my own point of view as a Python user and package maintainer. What I wrote may not be well suited to this first PEP, but I think it can be useful anyway.

Authors: Stepan Sindelar, Petr Viktorin, Antonio Cuni, Tim Felgentreff, Mark Shannon, Ken Jin

Note: this PEP will be moved to the Py-NI repo once it’s more concrete.

Abstract

In this PEP, we set out a vision for the future of Python’s native API and ABI for 3rd party extensions. We propose the Python Native Interface (PyNI) - a modern C API and universal ABI for CPython extensions. PyNI addresses critical limitations of the current C API that constrain both CPython's evolution and alternative Python implementations' performance. The new API and its associated ABI will provide better performance, enhanced developer experience through better debugging capabilities, and universal compatibility across CPython versions, CPython build modes (free-threading), and alternative implementations, all while maintaining full backward compatibility with the existing C API.

Introduction

CPython's current C API places significant constraints on both CPython's internal evolution and the performance of alternative Python implementations. The API's design assumptions increasingly limit CPython's ability to implement performance optimizations, particularly for its new JIT compiler. While alternative implementations like PyPy and GraalPy can achieve strong performance on pure Python code, they face substantial performance penalties with C extensions due to incompatibilities with CPython's C API assumptions. This incompatibility fragments the Python ecosystem and limits adoption of alternative implementations.

The HPy project demonstrated that a modern, implementation-agnostic C API is technically feasible. One of HPy's key innovations is its universal ABI (Application Binary Interface), which provides two major benefits: first, it enables extensions to run across different CPython versions and alternative Python implementations without recompilation; second, it supports multiple runtime modes, including a debug mode that significantly improves development productivity by catching common errors. This concept becomes increasingly relevant as CPython introduces ABI-incompatible build modes (GIL-enabled and free-threading) and develops its JIT compiler. However, HPy faced adoption challenges that prevented it from reaching critical mass.

This PEP proposes the Python Native Interface (PyNI), a new C API and universal ABI designed to address these ecosystem-wide challenges. PyNI will be implemented as an additional layer within CPython, maintaining full compatibility with the existing C API while enabling new capabilities.

The migration path involves gradual adoption: as major packages transition to PyNI (either directly or through updated tools like Cython, nanobind, and PyO3), CPython will gain freedom to optimize its internals and JIT implementation without breaking existing extensions. Extensions built with PyNI's universal ABI will run efficiently across CPython versions and alternative implementations with better JIT compatibility, reducing maintenance overhead for package developers and enabling easier performance comparisons for users.

This document outlines the technical challenges with the current C API, presents the PyNI design, and describes the implementation roadmap for CPython integration.

Motivation

The following concrete examples demonstrate how CPython's current C API limits CPython's evolution and creates poor experiences for developers of Python implementations, Python package maintainers, and Python users. These issues, and the potential improvements enabled by a new C API, fall into three categories: performance, developer productivity, and compatibility.

Performance

CPython's current API exposes implementation details that are internal to CPython. Alternative implementations must emulate CPython internals to support the C API (see https://pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-8083064623681286567.html), which is costly and negates the performance benefits of advanced techniques used in these implementations as soon as extensions are used. The current API also prevents performance improvements in CPython itself. For example, the following implementation details are exposed:

  1. Reference count semantics. This blocks reference count optimizations. For example, see python/cpython#133164, where a reference count optimization in CPython broke NumPy.

  2. Struct details such as lists and tuples. This prevents optimizations to the internal representation of the list or tuple. See issue capi-workgroup/decisions#64.

  3. PyObject. This prevents adding tagged pointers to CPython's C API. CPython already uses tagged pointers in its interpreter loop, but cannot expand this optimization to other parts of the interpreter or C extensions.

A new C API would unlock future optimizations for both CPython and alternative implementations. For example, not depending on reference count semantics may allow CPython to eliminate even more reference counting, and free other implementations from being tied to CPython’s model.

Another significant performance opportunity lies in type information availability from extensions, which Python JITs can use to avoid boxing and unboxing operations. Research has shown substantial benefits from this approach (see https://dl.acm.org/doi/abs/10.1145/3652588.3663316), and preliminary demonstrations with CPython's JIT show potential 3x speedups when type information is available from C extensions.

Developer Productivity and Robustness

Reference leaks are notoriously difficult to debug and track down in CPython extensions. By adopting a different ownership model, we can help both CPython developers and extension authors identify and fix reference leaks more easily. CPython has already implemented improved reference tracking in its interpreter loop (see python/cpython#127705), with plans to expand this functionality further (python/cpython#131527).

A new C API would extend these debugging capabilities beyond CPython's internal development to extension authors. All extensions that adopt the new C API would gain access to enhanced debugging tools for reference management, significantly improving development productivity and code robustness.

The HPy project demonstrated the value of such debugging facilities through its debug mode, which provides detailed tracing and validation of object lifecycle management, helping developers catch errors early in the development process (see https://hpyproject.org/blog/posts/2022/06/hpy-0.0.4-third-public-release/#debug-mode).

TODO: Include an example of a trace from the debugging facility for StackRefs in CPython.

Universal Builds / Forward and Backward Compatibility

CPython's C API changes with every version, requiring C extension authors to build separate versions of their extensions for each minor CPython release. While the stable ABI exists to address this issue, it has seen limited adoption due to performance concerns and API limitations, leaving the maintenance burden largely unresolved.

A new C API would enable universal builds that target multiple CPython versions simultaneously. Extensions built with the new API's universal ABI would run across different CPython versions without recompilation, significantly reducing the maintenance overhead for package developers. While this approach may involve a slight performance trade-off compared to version-specific optimizations, the benefits of reduced build complexity and improved compatibility would outweigh these costs for most use cases.

Additionally, the universal ABI would provide forward compatibility, allowing extensions to work with future CPython versions without modification, and backward compatibility within supported version ranges, enabling easier testing and deployment across diverse Python environments.

TODO: Include links to blog posts/talks about the stable ABI (EuroPython 2024 talk about the stable ABI from the PySide developer). Is it this link, https://ep2024.europython.eu/session/move-the-python-ecosystem-to-the-stable-abi/, which is actually by Victor Stinner?

General presentation of the new API and ABI

The primary users of the new API and ABI will be 3rd party extensions, and the primary goal is to provide a well defined and stable boundary between the CPython VM and native extensions on both the API and ABI levels. While some parts of the CPython codebase may eventually migrate to PyNI, this is not a primary objective. For Python extension authors, it will still be recommended to prefer binding generators and higher level tools, such as nanobind, Cython, and PyO3, which will build on top of the new API and ABI.

PyNI is designed for correctness and performance rather than ergonomic convenience. The API should enable fast implementations and be suitable for native languages beyond C, including Rust and others. This design philosophy is reflected in the name: Python Native Interface (PyNI).

The new API will use the PyNI_ prefix for all functions and types, providing clear namespace separation from the existing C API.

The current C API will continue to be supported. Initially, the current C API will simply coexist with the new C API. In the next phase, the current C API will be built on top of the new C API.

TODO:

  • Define PyNI specification format (Python-based definition, see https://github.com/py-ni/api/tree/main/generate)
  • Specify header file organization (new PyAPI.h header?)
  • Detail PyNI versioning and evolution strategy?
  • Mention the split between CPython's internal APIs, legacy C API, and PyNI?

Context argument

Most new API functions will take a context argument: an opaque pointer-sized value. The context is borrowed from the caller; it is only valid for the duration of a call.

Exceptions:

  • Functions that initialize the interpreter do not take a context. (In the current API, these are functions that can be used when the runtime is not initialized.)

  • To enable gradually/partially converting extensions to the new API, and to support callbacks from non-Python libraries, we'll add functions to create and close a context. These can be used in cases where the context is not passed from the caller.

Finalization functions (for example, traverse/clear/dealloc hooks) will instead take a context-like argument of a distinct type, which will limit the API these functions can call.
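
The calling convention described above can be sketched as follows. This is only an illustration under stated assumptions: ToyContext, toy_add, toy_context_open, toy_context_close and on_event are hypothetical names invented for this sketch, not PyNI API, and the struct contents are a placeholder for whatever the interpreter would keep behind the opaque pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the opaque context; a real one would be
   defined by the interpreter and never inspected by extensions. */
typedef struct toy_context_s { int open; } *ToyContext;

/* Hypothetical API function: the borrowed context comes first. */
static long toy_add(ToyContext ctx, long a, long b) {
    assert(ctx != NULL && ctx->open);  /* only valid during a call */
    return a + b;
}

/* Hypothetical create/close pair for code that was not handed a context,
   for example a callback invoked by a non-Python library. */
static ToyContext toy_context_open(void) {
    ToyContext ctx = malloc(sizeof *ctx);
    if (ctx != NULL) ctx->open = 1;
    return ctx;
}

static void toy_context_close(ToyContext ctx) {
    ctx->open = 0;
    free(ctx);
}

/* A C callback with no context parameter creates one for its own duration. */
static long on_event(long x) {
    ToyContext ctx = toy_context_open();
    if (ctx == NULL) return -1;
    long result = toy_add(ctx, x, 1);
    toy_context_close(ctx);
    return result;
}
```

In this sketch the callback never stores the context: it opens one, uses it for the calls it makes, and closes it before returning.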

References

Instead of PyObject*, the new API will use references for local variables. A reference is an opaque pointer-sized value. A reference can be duplicated; this creates a new reference that must be closed separately.

The lifetime of a reference is limited by the context; that is, references are only valid for the duration of a call. If an object is needed after a call, it needs to be stored on an object using the new API (or converted to PyObject* if needed for unmigrated parts of the codebase).
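
A toy model may help to picture the "duplicate, then close separately" rule. The sketch below assumes a small fixed-size table of slots as the backing store; ToyRef and the toy_ref_* functions are hypothetical names for illustration, and a real implementation could represent references quite differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque, pointer-sized reference. */
typedef struct { intptr_t i; } ToyRef;

#define TOY_MAX_REFS 16
static const char *toy_slots[TOY_MAX_REFS];  /* slot -> referenced "object" */
static int toy_open_count = 0;

/* Open a new reference to obj; returns {0} on failure (slot 0 is unused). */
static ToyRef toy_ref_new(const char *obj) {
    for (intptr_t k = 1; k < TOY_MAX_REFS; k++) {
        if (toy_slots[k] == NULL) {
            toy_slots[k] = obj;
            toy_open_count++;
            return (ToyRef){ k };
        }
    }
    return (ToyRef){ 0 };
}

/* Duplicating yields a distinct reference to the same object. */
static ToyRef toy_ref_dup(ToyRef r) {
    assert(r.i > 0 && toy_slots[r.i] != NULL);
    return toy_ref_new(toy_slots[r.i]);
}

static void toy_ref_close(ToyRef r) {
    assert(r.i > 0 && toy_slots[r.i] != NULL);  /* catches a double close */
    toy_slots[r.i] = NULL;
    toy_open_count--;
}

/* Returns the number of references still open: 0 means nothing leaked. */
static int toy_ref_demo(void) {
    ToyRef a = toy_ref_new("spam");
    ToyRef b = toy_ref_dup(a);
    int distinct = (a.i != b.i);
    toy_ref_close(a);
    toy_ref_close(b);  /* each reference is closed separately */
    return distinct ? toy_open_count : -1;
}
```

Because every reference occupies its own slot, a checker of this kind can report both leaks (slots still open at the end of a call) and double closes, which is the sort of validation a debug mode could perform.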

TODO: "limited by the context" unclear because using the word context in another meaning than just before?

For “global” variables, objects can be stored in module state.

Two target ABIs and compatibility with the current API

The current PyObject*-based API will remain fully available and can be freely mixed with PyNI even within a single extension. However, extensions using this mixed mode will only be compatible with the standard ABI, requiring compilation for specific Python implementation versions.

Extensions written in pure PyNI (without including Python.h) can target two different ABIs:

  1. Standard ABI: Compatible with the current compilation model, targeting specific CPython versions
  2. Universal ABI: Based on opaque references and a context-based function pointer table, enabling cross-version and cross-implementation compatibility

For backward compatibility, a PyPI package will provide PyAPI.h, allowing PyNI extensions to compile against older CPython versions (typically ≥3.11), similar to how pythoncapi-compat works for the current C API.

PyNI universal extensions will be usable with older CPython versions (typically ≥3.11) through a compatibility package available on PyPI, following the approach demonstrated by HPy.

This design provides a clear migration path: developers can start by mixing PyNI with existing code, then gradually transition to pure PyNI for standard ABI benefits, and finally adopt the universal ABI for maximum compatibility when ready.

Planned Steps for CPython

TODO: general remark: it would be great to add a bit of time information. Are these steps planned for one Python version?

New versions of module initialization

TODO: PyNI_MyModuleInit? Returns a versioned struct with module specification. The version of the struct implies the required minimal PyNI version (or do we want a separate mechanism? a pre-hook?). PyNI_MyModuleInit should not call any APIs (it will not get the context argument). The “real” initialization should happen in the init/exec slots (but slots will be implemented in later steps, for the first step we will only load module builtins defined in the specification).
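
To make the open question above more concrete, here is one possible shape for a versioned specification struct. Everything in it is an assumption made for illustration: the ToyModuleSpec fields, the TOY_SPEC_VERSION constant, and the toy_load loader are hypothetical and do not describe a decided design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the first field identifies the struct version, so a
   loader can tell which later fields exist. */
typedef struct {
    int spec_version;                  /* implies the minimal interface version */
    const char *name;
    size_t n_builtins;
    const char *const *builtin_names;
} ToyModuleSpec;

#define TOY_SPEC_VERSION 1

static const char *const spam_builtins[] = { "ham", "eggs" };

static const ToyModuleSpec spam_spec = {
    TOY_SPEC_VERSION, "spam", 2, spam_builtins
};

/* Init hook: takes no context and calls no API; it only returns static data. */
static const ToyModuleSpec *ToyNI_SpamInit(void) {
    return &spam_spec;
}

/* Loader side: refuse specs newer than what this runtime understands.
   Returns the number of builtins to register, or -1 on rejection. */
static int toy_load(const ToyModuleSpec *(*init)(void), int supported) {
    const ToyModuleSpec *spec = init();
    if (spec == NULL || spec->spec_version > supported) return -1;
    return (int)spec->n_builtins;
}
```

Keeping the init hook free of API calls means the loader can inspect the version before any other code from the extension runs.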

New versions of calling conventions for module builtins

The struct with module specification may contain “legacy” functions -- those will work and be handled by the VM exactly like the old C API. New functions will require new calling conventions that will have to be added to the VM. Later, when the “legacy” API is rebuilt on top of the new API, we can remove the “old” calling conventions from the VM and keep only the “new” ones; until then, there will be a short period when the number of calling conventions doubles.

TODO: "a short period"? Short compared to what?

New versions of all API functions

For all C API functions (except ones we don't want to keep), we will add new functions that:

  • Take a context argument
  • Work with references rather than PyObject* pointers

GraalPy already used pycparser to get the existing C API. HPy already used pycparser to generate API wrapper functions. HPy also showed that we can create wrappers to call between the APIs, convert HPy handles to PyObject* and vice versa, and create types with HPy slots around the existing type slots. So we know that all of this is possible. In CPython, though, we want CPython's own internal API to be divorced from the external API, and initially we do not want to refactor either internal code or native extensions. So, in the first step, we cannot use some of the shenanigans HPy used to avoid losing performance. We will have to generate wrapper functions that do cost extensions some performance.

We define the PyContext and PyAny values similarly to how HPy did it, as a pointer to an opaque struct and a pointer-sized struct, respectively:

#if defined(Py_BUILD_CORE) && !defined(Py_BUILD_CORE_MODULE)
struct py_context { PyThreadState *t; };
#endif
typedef struct PyAny { intptr_t i; } PyAny;
typedef struct py_context* PyContext;

Each currently exported (PyAPI_FUNC) function declaration is renamed with, e.g., an _internal suffix and marked as not exported (doable on MSVC, GCC, and LLVM, so maybe good enough?). After it, we add the previous declaration, which is still exported, and a define like:

#if defined(Py_BUILD_CORE) && !defined(Py_BUILD_CORE_MODULE)
#define Py_Foo(x) Py_Foo_internal(x)
#endif

Where the function is defined, we also rename it with _internal suffix, and generate a definition with the old name and signature that just calls into the _internal function directly. Due to the define above, CPython itself will now still use the old functions unchanged directly, and extensions now go through a trampoline.
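
The renaming scheme can be tried out in a single file. In the sketch below, Toy_Foo and TOY_BUILD_CORE are hypothetical stand-ins for an exported API function and for the Py_BUILD_CORE check; in a real build the internal function would be hidden through symbol visibility rather than `static`.

```c
#include <assert.h>

/* TOY_BUILD_CORE stands in for the Py_BUILD_CORE check in the text. */
#define TOY_BUILD_CORE 1

/* Renamed implementation; `static` models "not exported" in this one file. */
static int Toy_Foo_internal(int x) { return x * 2; }

/* Exported symbol with the old name and signature: a plain trampoline. */
int Toy_Foo(int x);
int Toy_Foo(int x) { return Toy_Foo_internal(x); }

#if defined(TOY_BUILD_CORE)
/* Inside the core, calls spelled with the old name become direct calls. */
#define Toy_Foo(x) Toy_Foo_internal(x)
#endif

/* Core code: the macro expands, so no trampoline is involved. */
static int core_caller(int x) { return Toy_Foo(x); }

/* Extension-like code: parentheses suppress the macro, so this goes
   through the exported function, as an extension would. */
static int extension_caller(int x) { return (Toy_Foo)(x); }
```

Both callers get the same result; only the call path differs, which is why the interpreter's own performance is expected to stay unchanged.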

Now we can also generate the PyNI_ functions for the API with context argument and PyAny instead of PyObject*. We could initially generate them to just have entry/exit code around the old functions:

  1. The PyContext pointer is unpacked to a PyThreadState* and stored:
PyThreadState *ts_on_entry = PyThreadState_Swap(context->t);
  2. Any PyAny argument is converted to a PyObject*, using an internal function, so we can in the future convert from e.g. tagged integers or stackrefs at such points:
PyObject *py_object_argN = _PyObject_FromAny(py_ref_argN);
  3. The _internal function is called.

  4. The result is converted to the new-style API:

  • We use _PyAny_FromPyObject to return a PyAny

    Open question: do we use the unified return values here, as proposed by Mark and Petr?

  • If an error occurred, then we report that via the TBD new error reporting scheme
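
The steps above can be sketched with mock types so that the shape of a generated wrapper is visible without CPython headers. All Toy* names are hypothetical stand-ins (ToyObject for PyObject, ToyThreadState for PyThreadState, and so on). Restoring the saved thread state on exit is an assumption of this sketch, and error reporting is left out because the scheme is still to be defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct ToyObject { long value; } ToyObject;        /* mock PyObject */
typedef struct ToyThreadState { int id; } ToyThreadState;  /* mock PyThreadState */
typedef struct toy_context { ToyThreadState *t; } *ToyContext;
typedef struct ToyAny { intptr_t i; } ToyAny;

static ToyThreadState *toy_current = NULL;

/* Mimics the swap step: installs ts and returns the previous state. */
static ToyThreadState *toy_thread_state_swap(ToyThreadState *ts) {
    ToyThreadState *old = toy_current;
    toy_current = ts;
    return old;
}

/* Conversion points; today a cast, later possibly tagged values. */
static ToyObject *toy_object_from_any(ToyAny a) { return (ToyObject *)a.i; }
static ToyAny toy_any_from_object(ToyObject *o) { return (ToyAny){ (intptr_t)o }; }

/* Old-style function, renamed with the _internal suffix. */
static ToyObject *Toy_Identity_internal(ToyObject *o) { return o; }

/* Shape of a generated wrapper: entry code, conversion, call, exit code. */
static ToyAny ToyNI_Identity(ToyContext ctx, ToyAny arg) {
    ToyThreadState *ts_on_entry = toy_thread_state_swap(ctx->t);
    ToyObject *obj = toy_object_from_any(arg);
    ToyObject *res = Toy_Identity_internal(obj);
    toy_thread_state_swap(ts_on_entry);  /* assumed: restore on exit */
    return toy_any_from_object(res);
}

static long toy_wrapper_demo(void) {
    ToyThreadState ts = { 1 };
    struct toy_context c = { &ts };
    ToyObject o = { 42 };
    ToyAny r = ToyNI_Identity(&c, toy_any_from_object(&o));
    return toy_object_from_any(r)->value;
}
```

The wrapper body is entirely mechanical, which is what makes generating it with a script plausible.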

This step can be done programmatically with a script to do the code adaptation. We should now be able to verify performance of the interpreter is unchanged and extensions should have only the overhead of one intermediate call so shouldn’t be too bad.

As an additional step to verify this is moving in the right direction, we can now implement some PyNI functions directly and take advantage of them in the interpreter. E.g. I would like BINARY_OP_ADD_INT to be able to pass stackrefs or tagged integers into PyNI_Long_Add and avoid boxing here, and then see if and how that shows up in a microbenchmark.
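
As a rough illustration of what "avoid boxing" could mean, the sketch below stores small integers inline in a pointer-sized value by setting the low bit, and only allocates when a value does not fit. The tagging scheme, the ToyAny type, and ToyNI_Long_Add are assumptions made for this example; CPython's actual tagged representation may differ, and boxed values are never freed here to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyAny { intptr_t i; } ToyAny;
typedef struct ToyBoxedInt { intptr_t value; } ToyBoxedInt;

static int toy_box_allocs = 0;  /* counts heap allocations for the demo */

/* Small integers are stored inline as 2*v + 1; heap pointers are aligned,
   so their low bit is 0. */
#define TOY_SMALL_MAX (INTPTR_MAX / 4)
static int toy_fits(intptr_t v) { return v >= -TOY_SMALL_MAX && v <= TOY_SMALL_MAX; }
static int toy_is_tagged(ToyAny a) { return (a.i & 1) != 0; }
static ToyAny toy_tag(intptr_t v) { return (ToyAny){ v * 2 + 1 }; }
static intptr_t toy_untag(ToyAny a) { return (a.i - 1) / 2; }

static ToyAny toy_box(intptr_t v) {
    ToyBoxedInt *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->value = v;
    toy_box_allocs++;
    return (ToyAny){ (intptr_t)b };
}

static ToyAny toy_from_int(intptr_t v) { return toy_fits(v) ? toy_tag(v) : toy_box(v); }

static intptr_t toy_value(ToyAny a) {
    return toy_is_tagged(a) ? toy_untag(a) : ((ToyBoxedInt *)a.i)->value;
}

/* Hypothetical add: when both operands are tagged, no object is touched.
   Overflow of large boxed operands is ignored in this sketch. */
static ToyAny ToyNI_Long_Add(ToyAny a, ToyAny b) {
    if (toy_is_tagged(a) && toy_is_tagged(b)) {
        return toy_from_int(toy_untag(a) + toy_untag(b));
    }
    return toy_from_int(toy_value(a) + toy_value(b));
}

static intptr_t toy_add_demo(void) {
    ToyAny r = ToyNI_Long_Add(toy_from_int(40), toy_from_int(2));
    return toy_value(r);
}
```

With a PyObject*-based signature the fast path is unavailable, because every argument must already be a heap object before the call is made.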

First PyNI module

At this point it should be possible to implement and showcase a simple PyNI module.

Declaring native types with methods

Type slots for native types

For all function type slots (except unused ones & legacy variants, like tp_setattr), we will add new slots that:

  • Take a context argument

  • Take the defining class

  • Work with references rather than PyObject* pointers

For the VM to be able to take advantage of these new slots and to still be compatible, the types need a flag to determine if they use the new-style slots. We can have a (potentially lazy) function to get a PyTypeObject* from a PyRef which generates the old slot wrappers around the new-style slots. These old-style wrappers create a PyContext using PyThreadState_Get and PyAny structures by just doing

PyAny py_any_argN = { .i = (intptr_t)py_object_argN };

(Initially this will not cost much except some stack space, but once the VM starts evolving to take advantage of the NI API, this may become more costly).
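
A generated old-style wrapper around a new-style slot might look roughly like the sketch below. The Toy* types are mocks, the hash-like slot is an arbitrary example, and the defining-class argument is omitted for brevity; none of these names are part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyObject { long value; } ToyObject;        /* mock PyObject */
typedef struct ToyThreadState { int id; } ToyThreadState;  /* mock PyThreadState */
typedef struct toy_context { ToyThreadState *t; } *ToyContext;
typedef struct ToyAny { intptr_t i; } ToyAny;

static ToyThreadState toy_main_thread = { 1 };

/* Mock of fetching the current thread state. */
static ToyThreadState *toy_thread_state_get(void) { return &toy_main_thread; }

/* New-style slot written by the extension: context plus references. */
static long new_style_hash(ToyContext ctx, ToyAny self) {
    (void)ctx;  /* unused in this toy slot */
    return ((ToyObject *)self.i)->value * 31;
}

/* Old-style wrapper the VM could generate so legacy callers keep working:
   it builds a context and a ToyAny on the stack, then forwards the call. */
static long old_style_hash(ToyObject *self) {
    struct toy_context c = { toy_thread_state_get() };
    ToyAny any_self = { .i = (intptr_t)self };
    return new_style_hash(&c, any_self);
}

static long toy_slot_demo(void) {
    ToyObject o = { 2 };
    return old_style_hash(&o);
}
```

The only cost in this form is the stack space for the context and the wrapped argument, which matches the remark that the wrappers are cheap until the representation of references starts to diverge from a plain pointer.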

As a validation at this point, we can adapt the VM to take advantage of a new-style type with a nb_add slot (whatever we call it then) that takes e.g. a tagged integer as right hand side and works with that.

Declare old API as frozen

If this works so far, I think we could declare the old API as frozen (provided some of the things that are still in the review stage have landed by then, like the new module initialization and making PyObject* opaque for the stable ABI) and start evolving only the new API (mostly by implementing NI API functions by hand instead of just wrapping the old APIs). As soon as that "old API wrapping" is over, we have neatly split the internal from the external APIs: the _internal old names can now evolve into whatever CPython needs, and as far as the stable ABI is concerned this is no longer a problem (the external symbols no longer change). We can implement (potentially slower) "old-semantics" compatibility code in the old exported API functions without this affecting the interpreter and making everything worse for all Python programs. And we can start implementing the PyNI functions to take advantage of things we can stash in PyContext and PyAny values to gain performance with extensions. And since PyAny and PyContext are opaque, we can start with the Grand Unified Python Object Layout and the vtable in the context without affecting existing extensions.

Grand Unified Python Object Layout

The gist is: abstract the layout, make it declarative, and provide an accessor function to get the memory. The details can be left to later PEPs.

See faster-cpython/ideas#553

  • Make most classes composable, including through multiple inheritance

  • Add API to declare fields; let the VM handle most finalization hooks (traverse, clear, dealloc)

    • [petr] +load/save API converting references from/to instance fields?

    • [petr] use the field declaration API for module state as well; replacing HPyGlobal?
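
One way to picture a declarative layout is a table of field descriptors from which the VM derives the traverse hook itself. The sketch below is purely illustrative: ToyField, ToyFieldKind, PairData, and toy_traverse are hypothetical, and the real design is deferred to later PEPs.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_FIELD_REF, TOY_FIELD_LONG } ToyFieldKind;

/* Hypothetical field descriptor: what the field is and where it lives. */
typedef struct {
    const char *name;
    ToyFieldKind kind;
    size_t offset;
} ToyField;

/* Instance data of an example type with two object fields and a counter. */
typedef struct { void *first; void *second; long count; } PairData;

static const ToyField pair_fields[] = {
    { "first",  TOY_FIELD_REF,  offsetof(PairData, first)  },
    { "second", TOY_FIELD_REF,  offsetof(PairData, second) },
    { "count",  TOY_FIELD_LONG, offsetof(PairData, count)  },
};

typedef int (*ToyVisit)(void *obj, void *arg);

/* Generic traverse the VM could run for any type, driven by the table. */
static int toy_traverse(void *mem, const ToyField *fields, size_t n,
                        ToyVisit visit, void *arg) {
    for (size_t k = 0; k < n; k++) {
        if (fields[k].kind != TOY_FIELD_REF) continue;
        void *ref = *(void **)((char *)mem + fields[k].offset);
        if (ref != NULL) {
            int rc = visit(ref, arg);
            if (rc != 0) return rc;
        }
    }
    return 0;
}

static int count_visit(void *obj, void *arg) {
    (void)obj;
    ++*(int *)arg;
    return 0;
}

/* Visits only the non-NULL object fields of one PairData instance. */
static int toy_layout_demo(void) {
    int x = 0, visited = 0;
    PairData p = { &x, NULL, 7 };
    toy_traverse(&p, pair_fields, sizeof pair_fields / sizeof pair_fields[0],
                 count_visit, &visited);
    return visited;
}
```

With the fields declared up front, the extension no longer writes traverse, clear, or dealloc by hand, and the VM stays free to change how instance memory is laid out.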

vtable in context

The context may include a table of function pointers, with every API function call going through these pointers.

This would allow supporting multiple APIs in a single interpreter without recompilation.

An important use case is a “debug mode”, which does extra work to ensure correct reference handling and memory use.

A second use case is optimization, where we give out extension-specific specialized functions that do e.g. inline caching (think of extensions that use CallMethodObjArgs).
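
The dispatch mechanism can be sketched with a context that carries a single function pointer. The names below (ToyContext, add_fast, add_debug, ToyNI_Add) are hypothetical; a real table would hold one entry per API function, and the debug variant would validate references rather than just count calls.

```c
#include <assert.h>

typedef struct toy_context *ToyContext;

/* Hypothetical context with a one-entry function table. */
struct toy_context {
    long (*add)(ToyContext ctx, long a, long b);
    int calls;  /* bookkeeping used by the debug table only */
};

/* Release implementation: does the work and nothing else. */
static long add_fast(ToyContext ctx, long a, long b) {
    (void)ctx;
    return a + b;
}

/* Debug implementation: extra bookkeeping, then the same work. */
static long add_debug(ToyContext ctx, long a, long b) {
    ctx->calls++;
    return add_fast(ctx, a, b);
}

/* What an extension built for the universal ABI would compile to:
   an indirect call through the context, with no link-time dependency
   on a particular implementation. */
static long ToyNI_Add(ToyContext ctx, long a, long b) {
    return ctx->add(ctx, a, b);
}

/* Runs the same "extension" code under both tables. */
static int toy_vtable_demo(void) {
    struct toy_context release = { add_fast, 0 };
    struct toy_context debug = { add_debug, 0 };
    long r1 = ToyNI_Add(&release, 40, 2);
    long r2 = ToyNI_Add(&debug, 40, 2);
    return (r1 == 42 && r2 == 42) ? debug.calls : -1;
}
```

The extension binary is identical in both runs; only the table handed to it changes, which is what allows a debug mode or specialized functions without recompilation.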

Declare new API as stable

With the vtable we can be forward and backwards compatible just like HPy, but it still means work to keep old entries around. So IMO we should only declare the new API as stable once we feel we’re not immediately going to deprecate half the functions because we want to tweak semantics e.g. around error handling a little bit…
