@simonw
Created February 13, 2025 17:27

2025-02-13T17:14:04 conversation: 01jm04wpgn2t4xwkydp4227seh

Model: gemini-2.0-pro-exp-02-05

Prompt:

git clone https://github.com/bytecodealliance/componentize-py
cd componentize-py
files-to-prompt . -c 

How does this work? Does it include a python compiler or AST trick of some sort?

Response:

componentize-py works by leveraging several technologies in combination, rather than a single, simple trick. It's not a traditional Python compiler in the sense of, say, Cython (which compiles Python to C). Instead, it's a tool that packages a Python application (with its dependencies) and some runtime glue into a WebAssembly component. Here's a breakdown of the key elements:

  1. WASI SDK (wasi-sdk, wasi-libc): Building componentize-py from source requires a WASI SDK (WebAssembly System Interface SDK), a Clang-based C/C++ toolchain that targets WASI. Crucially, it includes wasi-libc, a C standard library implementation for WASI. Because CPython (the standard Python implementation) is written mostly in C, componentize-py's build compiles CPython to WASI with this toolchain. This happens when componentize-py itself is built, not each time you package an app: the pre-built interpreter ships inside the componentize-py tool.

  2. CPython (as a WASI module): This is the heart of it. Instead of compiling your Python code, componentize-py embeds a CPython interpreter compiled to WASI, so a full-fledged Python interpreter runs inside WebAssembly. The interpreter is built as a shared library, libpython3.12.so, which is linked into the component together with the core C runtime (libc.so and related libraries).

  3. Python Dependencies (as data): componentize-py takes your Python application and its dependencies and packages them up as data within the component. It effectively creates a virtual filesystem inside the WebAssembly component. This includes:

    • Your application's .py files.
    • Dependencies installed via pip (or found in the specified --python-path directories). This even works for packages with native extensions (like NumPy), provided you have a WASI-compiled version of those extensions. It searches standard locations for dependencies including site-packages, and can bundle these.
    • A trimmed-down version of the Python standard library. The build process includes a bundled python-lib.tar.zst that's extracted at runtime.
  4. WIT (WebAssembly Interface Types): WIT is a crucial part of the WebAssembly Component Model. It's an interface definition language (IDL) that defines how components interact. You provide a .wit file (or a directory containing WIT files) that describes:

    • Imports: What functions the component needs from the host environment (e.g., WASI functions like file I/O, networking).
    • Exports: What functions the component provides to the host (e.g., your application's entry points).
  5. wit-component and wasm-encoder: These Rust crates are used to construct the WebAssembly component. wit-component processes the WIT definitions and builds the component structure, linking the shared libraries (libpython, libc, the runtime glue, and any native extensions) together and wiring them to the required WASI imports. wasm-encoder is a lower-level library for creating the binary .wasm file.

  6. component-init: This is a tool (and Rust crate) that handles the "pre-initialization" of the component, similar to what Wizer does for core Wasm modules. At build time, it runs the component's initialization code (starting the interpreter and importing your app) and snapshots the resulting state into the final component, so this work is not repeated on every start-up. Initialization is where the environment is set up, such as:

    • Initializing the virtual filesystem containing your app and its dependencies.
    • Making sure sys.argv is set up correctly.
    • Setting environment variables, like PYTHONPATH, to point to the embedded libraries.
    • Calling random.seed() to ensure fresh, non-deterministic random numbers if needed.
  7. Runtime Glue (componentize-py-runtime): componentize-py includes a small Rust library (componentize-py-runtime) that's compiled to WASI and linked into the component. This runtime handles:

    • Marshalling: Converting data between the WIT types (which are Wasm-native) and Python objects. This is where the call_import, to_canon_*, from_canon_*, init, make_list etc. functions you see in the lib.rs of the runtime come into play. These functions handle the low-level details of copying data between the Wasm stack and Python objects.
    • Resource Management: WASI Preview 2 uses "resource" types to manage handles to external resources (like files, sockets, etc.). The runtime helps manage these resources, ensuring they are properly created and released.
    • Calling Python Functions: The runtime provides the componentize-py#CallIndirect function, which is the core mechanism for calling Python functions from the WASI world. This is essentially an indirect function call within the component.
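  8. WASI Preview 1 adapter: CPython and wasi-libc are compiled against the older WASI Preview 1 API, while a component imports WASI Preview 2 interfaces. componentize-py therefore includes the wasi_snapshot_preview1.reactor.wasm adapter, which implements the Preview 1 functions in terms of Preview 2 interfaces. The result still needs a runtime with Component Model and WASI Preview 2 support, such as Wasmtime; the adapter does not make the component runnable on Preview 1-only runtimes.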
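
A simplified Python sketch of the marshalling idea, assuming the Component Model's canonical ABI rules for two common types: a WIT `string` crosses the boundary as a (pointer, byte length) pair over UTF-8 data in linear memory, and a `list<u32>` as a (pointer, element count) pair over 4-byte little-endian values. The `LinearMemory` class and its bump allocator are hypothetical stand-ins; this is not componentize-py's actual `to_canon_*`/`from_canon_*` code, and it ignores details such as the guest's `realloc` export and non-UTF-8 string encodings.

```python
import struct


class LinearMemory:
    """Hypothetical stand-in for a Wasm instance's linear memory."""

    def __init__(self, size: int = 65536) -> None:
        self.data = bytearray(size)
        self.next_free = 0  # bump-allocator cursor

    def alloc(self, size: int, align: int) -> int:
        # Round the cursor up to `align` (a power of two), then reserve `size` bytes.
        ptr = (self.next_free + align - 1) & ~(align - 1)
        if ptr + size > len(self.data):
            raise MemoryError("out of linear memory")
        self.next_free = ptr + size
        return ptr


def lower_string(mem: LinearMemory, s: str) -> tuple[int, int]:
    # string -> (pointer, length in bytes) of UTF-8 data written into memory.
    encoded = s.encode("utf-8")
    ptr = mem.alloc(len(encoded), 1)
    mem.data[ptr:ptr + len(encoded)] = encoded
    return ptr, len(encoded)


def lift_string(mem: LinearMemory, ptr: int, length: int) -> str:
    # (pointer, byte length) -> string, decoding the UTF-8 bytes back.
    return bytes(mem.data[ptr:ptr + length]).decode("utf-8")


def lower_u32_list(mem: LinearMemory, values: list[int]) -> tuple[int, int]:
    # list<u32> -> (pointer, element count); each element is 4 bytes, little-endian.
    ptr = mem.alloc(4 * len(values), 4)
    struct.pack_into(f"<{len(values)}I", mem.data, ptr, *values)
    return ptr, len(values)


def lift_u32_list(mem: LinearMemory, ptr: int, count: int) -> list[int]:
    return list(struct.unpack_from(f"<{count}I", mem.data, ptr))


mem = LinearMemory()
s_ptr, s_len = lower_string(mem, "héllo")
print(s_len, lift_string(mem, s_ptr, s_len))  # 6 héllo
l_ptr, l_len = lower_u32_list(mem, [1, 2, 3])
print(l_ptr, lift_u32_list(mem, l_ptr, l_len))  # 8 [1, 2, 3]
```

Note that the byte length of "héllo" is 6, not 5, because "é" takes two bytes in UTF-8, and the list starts at offset 8 because `u32` values must be 4-byte aligned.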

Workflow Summary:

  1. Define the Interface (WIT): You write a .wit file defining the component's interface.
  2. Write Python Code: You write Python code that implements the exported functions defined in your WIT file.
  3. Build: componentize-py takes its pre-built, WASI-compiled CPython and uses wit-component and wasm-encoder to:
    • Package your Python code and dependencies.
    • Link everything together, including the runtime glue.
    • Create a Wasm component that exposes the functions defined in your WIT file.
  4. Pre-initialize: component-init prepares the component for execution.
  5. Run: You run the resulting .wasm file using a Wasm runtime like wasmtime. The runtime handles the WASI imports, and your Python code runs inside the CPython interpreter running within the component.
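
The pre-initialization step can be illustrated with a loose Python analogy, assuming a hypothetical `expensive_init` that stands in for interpreter start-up and app imports: run the slow set-up once at build time, save the resulting state, and start every instance from that saved state. component-init works at a lower level (it snapshots Wasm state rather than pickling Python objects), and `build_snapshot`, `SNAPSHOT`, and `start_instance` are also hypothetical names. The sketch also shows why reseeding the random number generator after restoring matters: without it, every instance restored from the same snapshot produces the same "random" sequence.

```python
import pickle
import random


def expensive_init() -> dict:
    # Stands in for starting the interpreter and importing the application.
    return {"squares": [i * i for i in range(10_000)]}


def build_snapshot() -> bytes:
    # "Build time": run initialization once and capture the resulting state,
    # including the random generator's internal state at that moment.
    random.seed(1234)
    return pickle.dumps({"app": expensive_init(), "rng": random.getstate()})


SNAPSHOT = build_snapshot()


def start_instance(reseed: bool) -> tuple[dict, float]:
    # "Run time": restore the saved state instead of calling expensive_init() again.
    state = pickle.loads(SNAPSHOT)
    random.setstate(state["rng"])
    if reseed:
        random.seed()  # fresh entropy (from the OS where available)
    return state["app"], random.random()


app_a, r_a = start_instance(reseed=False)
app_b, r_b = start_instance(reseed=False)
print(r_a == r_b)  # True: both replay the RNG state frozen in the snapshot
```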

Key Takeaways:

  • It's not compiling Python to Wasm directly. It's running the CPython interpreter as Wasm.
  • It uses WIT to define the component's interface and capabilities.
  • It leverages the WASI SDK and related tools to build a self-contained, sandboxed component.
  • It handles the intricate details of marshalling data between the Wasm world and the Python world.
  • Pure-Python dependencies are bundled as data rather than compiled to Wasm; only native extensions (built for WASI) are linked in as compiled Wasm code.
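
CPython's built-in `zipimport` support offers a small, runnable analogy for shipping pure-Python code as data: the source files below are only bytes inside an archive until the interpreter imports and compiles them. componentize-py uses its own bundling rather than a zip file on `sys.path`, so treat this as an illustration of the idea only; the `greeting` module and the `bundle_sources` helper are hypothetical.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def bundle_sources(sources: dict[str, str], archive: Path) -> None:
    # Store each module's source text in a zip archive as plain data.
    with zipfile.ZipFile(archive, "w") as zf:
        for filename, text in sources.items():
            zf.writestr(filename, text)


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "app.zip"
    bundle_sources({"greeting.py": "def hello():\n    return 'Hello from a bundle'\n"}, archive)

    # CPython's zipimport lets an archive on sys.path act like a directory.
    sys.path.insert(0, str(archive))
    try:
        greeting = importlib.import_module("greeting")
        print(greeting.hello())  # Hello from a bundle
    finally:
        sys.path.remove(str(archive))
```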

This architecture allows for relatively straightforward porting of existing Python applications, even those using C extensions, to WebAssembly, as long as WASI-compatible builds of those extensions are available. It achieves a high degree of sandboxing and portability due to the nature of WebAssembly components.

2025-02-13T17:25:16

Prompt:

how does it differ from pyodide?

Response:

componentize-py and Pyodide both allow running Python in WebAssembly, but they have significantly different goals, architectures, and use cases. Here's a breakdown of their key differences:

1. Core Goal and Target Environment:

  • componentize-py: Designed to create standalone WebAssembly components that adhere to the WASI (WebAssembly System Interface) standards, particularly WASI Preview 2 and the Component Model. It's aimed at building server-side, CLI, and other non-browser applications that can run in any runtime supporting the Component Model and WASI Preview 2, such as Wasmtime. The focus is on portability and sandboxing across different environments. It's part of the Bytecode Alliance's effort to build a modular, composable ecosystem for WebAssembly.

  • Pyodide: Primarily focused on bringing Python to the browser. It compiles CPython (and a large scientific stack) to WebAssembly so that you can run Python code directly within a web page. It provides JavaScript interoperability, allowing Python code to interact with the browser's DOM, web APIs, and JavaScript libraries. Its main goal is to enable scientific computing and data analysis within the browser.

2. Architecture:

  • componentize-py: Creates a WebAssembly Component. This is a self-contained, modular unit that adheres to the Component Model specification. It bundles:

    • A compiled CPython interpreter (as a WASI module).
    • Your Python application code and its dependencies (as data).
    • A small runtime library (written in Rust) for handling interactions between the Wasm world (WASI) and the Python world.
    • Optionally, the WASI Preview 1 adapter.
    • The component communicates with the outside world through a well-defined interface described in WIT.
  • Pyodide: Ships a large, browser-oriented distribution that includes:

    • A compiled CPython interpreter.
    • A significant portion of the Python standard library.
    • A large collection of pre-built scientific packages (NumPy, pandas, SciPy, Matplotlib, scikit-learn, etc.), distributed alongside the core runtime and loaded on demand.
    • A foreign function interface (FFI) layer that provides bidirectional communication with JavaScript. This is a core feature.
    • Emscripten: Pyodide uses Emscripten as its build toolchain. Emscripten provides JavaScript glue code and POSIX-style emulation (such as an in-memory filesystem) that targets browsers and other JavaScript hosts rather than WASI.

3. Dependencies:

  • componentize-py: Bundles your application's dependencies as data within the component. It expects that any dependencies with native code (like NumPy) are pre-compiled for WASI. You must have WASI-compatible builds of your dependencies. This contributes to the sandboxed nature of the component.

  • Pyodide: Includes pre-compiled versions of many common scientific packages within the Pyodide distribution itself. These are compiled specifically for Pyodide's Emscripten environment. It also provides micropip, which can install pure-Python wheels from PyPI as well as packages built for Pyodide. Adding new packages with C extensions requires building them specifically for Pyodide, which can be complex.

4. Interoperability:

  • componentize-py: Interoperability is strictly defined through the WIT interface. Communication with the host (which could be written in any language) happens through the functions and types defined in the WIT. It's designed for language-agnostic component composition.

  • Pyodide: Provides deep, bidirectional interoperability with JavaScript. You can access JavaScript objects from Python and vice-versa. This is essential for interacting with the browser environment.

5. Use Cases:

  • componentize-py:

    • Serverless functions (e.g., Fermyon Spin applications, including those deployed to Fermyon Cloud).
    • CLI tools built as Wasm components.
    • Embedded systems (where a WASI runtime is available).
    • Plugins/extensions for applications that support Wasm components.
    • Sandboxed execution of untrusted Python code.
  • Pyodide:

    • Interactive data analysis and visualization in the browser.
    • Creating client-side web applications that leverage Python libraries.
    • Educational tools for teaching Python and scientific computing.
    • Building interactive documentation with executable Python examples.

6. Sandboxing:

  • componentize-py: Strong sandboxing. A component can only call the interfaces its WIT world imports, and the host decides which of those capabilities to actually provide (for example, which directories are visible or whether outbound networking is allowed). By default, the component has no access to the host system beyond what is explicitly granted.

  • Pyodide: Runs within the browser's sandbox, which provides security. However, Pyodide itself gives Python code considerable access to the browser's environment (DOM, Web APIs), which can be a security concern if running untrusted code. While the browser sandboxes Pyodide, the Python code within Pyodide has a lot of power within that sandbox.
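
To illustrate capability-style file access, here is a Python sketch of the rule a WASI host enforces for a preopened directory (granted in Wasmtime with, for example, `wasmtime run --dir`): every guest path is resolved inside the granted directory and rejected if it escapes. The `Preopen` class is hypothetical and is not how Wasmtime implements the check; a real runtime enforces it outside the guest, so guest code cannot bypass it. `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


class Preopen:
    """Hypothetical capability: access to one host directory and nothing outside it."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def resolve(self, guest_path: str) -> Path:
        # Resolve the guest path against the granted root and refuse anything that
        # ends up outside it (absolute paths elsewhere, "..", or escaping symlinks).
        candidate = (self.root / guest_path).resolve()
        if not candidate.is_relative_to(self.root):
            raise PermissionError(f"{guest_path!r} is outside the granted directory")
        return candidate

    def read_text(self, guest_path: str) -> str:
        return self.resolve(guest_path).read_text()


with tempfile.TemporaryDirectory() as tmp:
    granted = Path(tmp) / "granted"
    granted.mkdir()
    (granted / "data.txt").write_text("visible")
    (Path(tmp) / "secret.txt").write_text("hidden")

    cap = Preopen(granted)
    print(cap.read_text("data.txt"))  # visible
    for attempt in ("../secret.txt", "/etc/passwd"):
        try:
            cap.read_text(attempt)
        except PermissionError as err:
            print("denied:", err)
```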

7. Performance:

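  • componentize-py: Your Python code is still interpreted, by CPython compiled to Wasm, so compute-heavy code generally runs slower than on native CPython. Pre-initialization reduces start-up cost, because interpreter setup and module imports happen once at build time rather than on every instantiation.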

  • Pyodide: Performance can be a concern, especially for computationally intensive tasks. While it's impressive that it runs in the browser at all, it's often slower than native Python due to the overhead of the Emscripten environment and the JavaScript interop. However, for many interactive use cases, it's "fast enough".

8. Asynchronous Programming

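  • componentize-py: WASI Preview 2 does not have native async functions in the Component Model. Asynchronous I/O is instead expressed through pollables (wasi:io/poll), which guest code drives with its own event loop. First-class async support in the Component Model is planned for WASI Preview 3.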

  • Pyodide: Originally used a synchronous model, making asynchronous operations and JavaScript promises difficult to work with. More recent versions (especially with the introduction of pyodide.webloop) have significantly improved async support, allowing you to use async/await with JavaScript promises. However, the underlying execution model is still fundamentally different from componentize-py.
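
WASI Preview 2 expresses asynchronous I/O through pollables: the `wasi:io/poll` interface's `poll` function takes a list of pollables and returns the indices of those that are ready, and guest code builds its own scheduling on top. The Python sketch below models that shape with a hypothetical `Pollable` class (ready after a fixed number of polls) and a small generator-based scheduler. It illustrates the pattern under those assumptions and is not componentize-py's or Wasmtime's code.

```python
from collections import deque
from typing import Generator


class Pollable:
    """Hypothetical stand-in for a WASI pollable that is ready after `delay` polls."""

    def __init__(self, delay: int) -> None:
        self.remaining = delay

    def tick(self) -> bool:
        # Simulate time passing; report whether the operation has completed.
        self.remaining -= 1
        return self.remaining <= 0


def poll(pollables: list[Pollable]) -> list[int]:
    # Shaped like wasi:io/poll's `poll`: return the indices of ready pollables.
    # A real host would block until at least one is ready; here we just tick once.
    return [i for i, p in enumerate(pollables) if p.tick()]


Task = Generator[Pollable, None, str]


def run(tasks: dict[str, Task]) -> list[str]:
    """Drive generator tasks, each of which yields the pollable it is waiting on."""
    finished: list[str] = []
    runnable = deque(tasks.items())
    waiting: list[tuple[str, Task, Pollable]] = []
    while runnable or waiting:
        # Advance every runnable task until it blocks on a pollable or finishes.
        while runnable:
            name, task = runnable.popleft()
            try:
                waiting.append((name, task, next(task)))
            except StopIteration as done:
                finished.append(f"{name}: {done.value}")
        if waiting:
            ready = set(poll([p for _, _, p in waiting]))
            runnable.extend((name, task) for i, (name, task, _) in enumerate(waiting) if i in ready)
            waiting = [entry for i, entry in enumerate(waiting) if i not in ready]
    return finished


def fetch(label: str, delay: int) -> Task:
    # A task that waits on one simulated I/O operation, then returns a result.
    yield Pollable(delay)
    return f"{label} done"


order = run({"slow": fetch("slow", 3), "fast": fetch("fast", 1)})
print(order)  # ['fast: fast done', 'slow: slow done']
```

The single-threaded loop is the design point: tasks never run in parallel, they only take turns whenever `poll` reports that something they were waiting on is ready.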

In summary:

Think of componentize-py as a way to build portable, sandboxed Python applications that can run in any runtime supporting WASI Preview 2 and the Component Model. It's a building block for serverless, CLI, and embedded scenarios.

Think of Pyodide as a way to run a Python environment within the browser, with tight integration with JavaScript. It's excellent for in-browser data science and interactive applications. It prioritizes browser integration over the strict, capability-based sandboxing and cross-runtime portability that componentize-py emphasizes.
