@ocelotl
Last active March 2, 2020 17:43
Python Auto Instrumentation Design Specification

Overview

Auto instrumentation is a mechanism for producing telemetry data from an uninstrumented application without modifying the application's code. It works by patching the libraries the application uses and by launching the application through a command-line script:

auto-instrumentation-command python3 uninstrumented_program.py

When uninstrumented_program.py is run this way, it produces telemetry as if it had been instrumented beforehand. The practical benefit of auto instrumentation is that end users save the time and effort of instrumenting existing code by hand.

Example

To make this clearer, here is a brief example. The complete example can be found here. It runs two services, formatter and publisher, plus a script, hello.py, that communicates with these services.

                 ----------------
                |                |
hello.py -----> | formatter:8081 |
    |           |                |
    |            ----------------
    |            ----------------
    |           |                |
     ---------> | publisher:8082 |
                |                |
                 ----------------

The publisher comes in two flavors: instrumented and uninstrumented. The most relevant parts of these components are shown below:

Formatter

@app.route("/format_request")
def format_request():

    with tracer.start_as_current_span(
        "format_request",
        parent=propagators.extract(get_as_list, request.headers),
    ):
        hello_to = request.args.get("helloTo")
        return "Hello, %s!" % hello_to

Instrumented Publisher

@app.route("/publish_request")
def publish_request():

    with tracer.start_as_current_span(
        "publish_request",
        parent=propagators.extract(get_as_list, request.headers),
    ):
        hello_str = request.args.get("helloStr")
        print(hello_str)
        return "published"

Uninstrumented Publisher

@app.route("/publish_request")
def publish_request():
    hello_str = request.args.get("helloStr")
    print(hello_str)
    return "published"

Hello Script

with tracer.start_as_current_span("hello") as hello_span:

    with tracer.start_as_current_span("hello-format", parent=hello_span):
        hello_str = http_get(8081, "format_request", "helloTo", hello_to)

    with tracer.start_as_current_span("hello-publish", parent=hello_span):
        http_get(8082, "publish_request", "helloStr", hello_str)

First, the instrumented publisher is run with python3 publisher_instrumented.py. When the hello.py script is then run, the publisher produces output similar to this:

Hello, testing!
Span(name="publish", context=SpanContext(trace_id=0xd18be4c644d3be57a8623bbdbdbcef76, span_id=0x6162c475bab8d365, trace_state={}), kind=SpanKind.SERVER, parent=SpanContext(trace_id=0xd18be4c644d3be57a8623bbdbdbcef76, span_id=0xdafb264c5b1b6ed0, trace_state={}), start_time=2019-12-19T01:11:12.172866Z, end_time=2019-12-19T01:11:12.173383Z)
127.0.0.1 - - [18/Dec/2019 19:11:12] "GET /publish?helloStr=Hello%2C+testing%21 HTTP/1.1" 200 -

Next, the uninstrumented publisher is run with opentelemetry-auto-instrument python3 publisher_uninstrumented.py. When hello.py is run again, the publisher again produces output similar to this:

Hello, testing!
Span(name="publish", context=SpanContext(trace_id=0xd18be4c644d3be57a8623bbdbdbcef76, span_id=0x6162c475bab8d365, trace_state={}), kind=SpanKind.SERVER, parent=SpanContext(trace_id=0xd18be4c644d3be57a8623bbdbdbcef76, span_id=0xdafb264c5b1b6ed0, trace_state={}), start_time=2019-12-19T01:11:12.172866Z, end_time=2019-12-19T01:11:12.173383Z)
127.0.0.1 - - [18/Dec/2019 19:11:12] "GET /publish?helloStr=Hello%2C+testing%21 HTTP/1.1" 200 -

Both outputs are essentially the same. This shows that, in this example, auto instrumentation produces the same telemetry as manual instrumentation.

Implementation

The Python auto instrumentation mechanism consists of three parts: the command line interface, entry points, and the patchers.

Command Line Interface

The command line interface provides the opentelemetry-auto-instrument command. The command is registered through the console_scripts entry point group (entry points are explained later). This registration currently lives in the API package [1], so the command becomes available in the shell once the API package is installed. Running the opentelemetry-auto-instrument script executes the following Python function:

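from os import environ, execl, pathsep
from os.path import abspath, dirname
from shutil import which
from sys import argv


def run() -> None:

    bootstrap_dir = dirname(abspath(__file__))
    python_path = environ.get("PYTHONPATH")

    # Add our bootstrap directory to the head of $PYTHONPATH to ensure
    # it is loaded before program code. PYTHONPATH entries are separated
    # by os.pathsep (":" on POSIX, ";" on Windows).
    if python_path:
        environ["PYTHONPATH"] = pathsep.join([bootstrap_dir, python_path])
    else:
        environ["PYTHONPATH"] = bootstrap_dir

    # argv[1] names the interpreter (e.g. python3); replace this process
    # with it, forwarding the program to run and its arguments.
    executable = which(argv[1])
    if executable is None:
        raise SystemExit("Could not find executable: " + argv[1])
    execl(executable, executable, *argv[2:])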

This function does two important things:

  1. It adds a directory to the front of the PYTHONPATH environment variable (more on this in the next section).
  2. It replaces the current process with the Python interpreter named by its first argument, passing along the remaining arguments. For example, opentelemetry-auto-instrument python3 publisher_uninstrumented.py runs python3 publisher_uninstrumented.py.
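PYTHONPATH holds a list of directories separated by os.pathsep (":" on POSIX, ";" on Windows). The bootstrap directory therefore has to be added as a new entry, not combined with os.path.join, which treats its arguments as parts of a single path.

The sketch below isolates that step in a hypothetical helper, prepend_to_pythonpath, which is not part of any OpenTelemetry package. Dropping an existing duplicate of the directory is an extra choice made by this sketch.

```python
import os
from typing import Optional


def prepend_to_pythonpath(bootstrap_dir: str, current: Optional[str]) -> str:
    """Return a PYTHONPATH value whose first entry is bootstrap_dir (hypothetical helper)."""
    if not current:
        # PYTHONPATH unset or empty: the bootstrap directory is the only entry.
        return bootstrap_dir
    # Split into entries, drop an existing copy of bootstrap_dir (a choice made
    # by this sketch), and put bootstrap_dir first.
    entries = [entry for entry in current.split(os.pathsep) if entry != bootstrap_dir]
    return os.pathsep.join([bootstrap_dir] + entries)


print(prepend_to_pythonpath("/opt/bootstrap", None))  # /opt/bootstrap
existing = os.pathsep.join(["/srv/lib", "/opt/bootstrap"])
print(prepend_to_pythonpath("/opt/bootstrap", existing))  # on POSIX: /opt/bootstrap:/srv/lib
# By contrast, os.path.join treats its arguments as parts of one path and,
# when the second one is absolute, discards the first entirely.
print(os.path.join("/opt/bootstrap", "/srv/lib"))  # on POSIX: /srv/lib
```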

The directory in PYTHONPATH

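The function called by the script inserts the auto_instrument directory at the beginning of the PYTHONPATH environment variable. This directory contains three files:

  1. __init__.py (irrelevant at this moment)
  2. auto_instrument.py (which holds the previous function)
  3. sitecustomize.py (the relevant file now)

The sitecustomize.py file is executed before Python begins to execute publisher_uninstrumented.py. This is a mechanism provided by site, a module in the Python Standard Library.

During startup, Python automatically imports site (unless the -S option is given), and site in turn tries to import a module named sitecustomize. PYTHONPATH entries are placed on sys.path ahead of the standard library and site-packages, so the sitecustomize.py in the auto_instrument directory is the one that gets imported. This lets the auto instrumentation mechanism tap into Python's startup sequence and run code before the application's own code.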
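The startup hook can be observed without any OpenTelemetry package installed. This sketch assumes nothing beyond the standard library. It does three things:

  1. It writes a stand-in sitecustomize.py into a temporary directory. The stand-in only prints a marker; the real file loads patchers instead.
  2. It puts that directory first on PYTHONPATH.
  3. It starts a fresh interpreter.

The helper name run_with_bootstrap is hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_with_bootstrap(program: str) -> List[str]:
    """Run `program` (Python source) in a new interpreter whose PYTHONPATH
    starts with a directory containing a stand-in sitecustomize.py."""
    with tempfile.TemporaryDirectory() as bootstrap_dir:
        # Stand-in for the real bootstrap file: it only prints a marker.
        Path(bootstrap_dir, "sitecustomize.py").write_text('print("sitecustomize ran")\n')

        env = dict(os.environ)
        existing = env.get("PYTHONPATH")
        # Put the bootstrap directory first so its sitecustomize is found first.
        env["PYTHONPATH"] = (
            os.pathsep.join([bootstrap_dir, existing]) if existing else bootstrap_dir
        )

        result = subprocess.run(
            [sys.executable, "-c", program],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.splitlines()


print(run_with_bootstrap('print("program ran")'))
# ['sitecustomize ran', 'program ran']
```

The marker appears before the program's own output. Starting the interpreter in isolated mode (-I), which ignores PYTHON* environment variables, would bypass the bootstrap directory, because PYTHONPATH would not be read.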

The sitecustomize.py File

The code in this file runs before the uninstrumented code. Before looking at it, here is a short explanation of entry points.

Entry Points

Entry points have been mentioned earlier in this document; they now need a fuller explanation.

Python provides a standard system for installing packages, as many other languages do. As part of it, a package can register entry points: named references to Python objects, grouped under a group name that is simply a string. Any package, including the one that registered them, can then look up all the entry points installed under a given group.

For example, the API package looks up the opentelemetry_patcher group here. An entry point in that group is registered by another package, opentelemetry-ext-flask [2], here. Here is that registration:

entry_points={
    "opentelemetry_patcher": [
        "flask = opentelemetry.ext.flask:FlaskPatcher"
    ]
},

This entry point is named flask, and its value is just an import path to a Python object: the FlaskPatcher attribute of the opentelemetry.ext.flask module, which in this case is a class [3].

When a package is installed, its entry points are recorded in its package metadata under their group. After that, the entry points machinery lets any code list the entry points registered in a group and load the object each one points to, along with its name. For example, loading the entry point named flask returns the FlaskPatcher class.
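The standard library's importlib.metadata module (Python 3.8+) models the same concept. The sketch below builds an EntryPoint by hand, instead of reading it from an installed package, to show how load() resolves a value of the form module:attribute. Because opentelemetry.ext.flask is probably not installed, the sketch points at a standard-library function, os.path:join. The group name example_group is made up.

```python
import os.path
from importlib.metadata import EntryPoint

# Built by hand for illustration; normally entry points come from the
# metadata of installed distributions. The group name is hypothetical.
entry_point = EntryPoint(name="join", value="os.path:join", group="example_group")

# load() imports the module named before the colon ("os.path") and returns
# the attribute named after it ("join"), just as loading the flask entry
# point would import opentelemetry.ext.flask and return FlaskPatcher.
loaded = entry_point.load()
print(entry_point.name, loaded is os.path.join)  # join True
```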

Back to the sitecustomize.py file. Here is its relevant content:

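from logging import getLogger
from pkg_resources import iter_entry_points

_LOG = getLogger(__name__)

for entry_point in iter_entry_points("opentelemetry_patcher"):
    try:
        # Load the patcher class, instantiate it, and apply its patch.
        entry_point.load()().patch()  # type: ignore
        _LOG.debug("Patched %s", entry_point.name)
    except Exception:  # pylint: disable=broad-except
        # A failing patcher is logged instead of stopping the application.
        _LOG.exception("Patching of %s failed", entry_point.name)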

The code in this file iterates through all the entry points registered in the opentelemetry_patcher group and calls load on each of them. load returns the object that the entry point's import path refers to. In our example, it returns the FlaskPatcher class when the loop reaches the flask entry point.
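Registration and discovery can be tried end to end without publishing a package. The sketch below fakes an installed distribution in three steps:

  1. It writes a module containing a hypothetical DemoPatcher class.
  2. It writes a minimal demo_patchers-0.1.dist-info directory whose entry_points.txt uses the INI-style format that packaging tools generate from an entry_points argument.
  3. It adds the directory to sys.path.

The loop has the same shape as the one above. As an assumption of this sketch, it uses importlib.metadata instead of pkg_resources.iter_entry_points. All demo names are hypothetical, and the temporary directory is left on disk for simplicity.

```python
import importlib
import logging
import sys
import tempfile
from importlib.metadata import entry_points
from pathlib import Path

_LOG = logging.getLogger(__name__)
GROUP = "opentelemetry_patcher"


def iter_group(group):
    """Return the entry points registered in `group` on Python 3.8+."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return eps.select(group=group)
    return eps.get(group, [])  # Python 3.8 and 3.9 return a dict of groups


site_dir = Path(tempfile.mkdtemp())

# A module holding a hypothetical patcher class.
(site_dir / "demo_patcher.py").write_text(
    "class DemoPatcher:\n"
    "    patched = False\n"
    "\n"
    "    def patch(self):\n"
    "        DemoPatcher.patched = True\n"
    "\n"
    "    def unpatch(self):\n"
    "        DemoPatcher.patched = False\n"
)

# Minimal metadata of a fake installed distribution that registers the class.
dist_info = site_dir / "demo_patchers-0.1.dist-info"
dist_info.mkdir()
(dist_info / "METADATA").write_text(
    "Metadata-Version: 2.1\nName: demo-patchers\nVersion: 0.1\n"
)
(dist_info / "entry_points.txt").write_text(
    "[" + GROUP + "]\ndemo = demo_patcher:DemoPatcher\n"
)

sys.path.insert(0, str(site_dir))
importlib.invalidate_caches()

patched = []
# Same shape as the sitecustomize.py loop.
for entry_point in iter_group(GROUP):
    try:
        entry_point.load()().patch()
        patched.append(entry_point.name)  # recorded only for this demo
        _LOG.debug("Patched %s", entry_point.name)
    except Exception:  # pylint: disable=broad-except
        _LOG.exception("Patching of %s failed", entry_point.name)

print(patched)  # ['demo']
print(sys.modules["demo_patcher"].DemoPatcher.patched)  # True
```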

The most important line here is this one:

entry_point.load()().patch()

When the loop reaches the flask entry point, that line is equivalent to:

FlaskPatcher().patch()

The first pair of parentheses after FlaskPatcher instantiates the class, so this is the same as:

flask_patcher_object = FlaskPatcher()
flask_patcher_object.patch()

Finally, the patch method is called.

The point of using entry points this way is that patchers can be added dynamically, that is, without modifying any code in the core auto instrumentation system. Each patcher ships in its own package and is loaded simply because that package is installed. The loader code never has to change when a new patcher is added. This is the standard plugin mechanism in the Python packaging ecosystem, and it keeps the components of the system cleanly separated.

The Patchers

The previous section showed how the patch method is called for every entry point registered in the opentelemetry_patcher group (in other words, for every patcher object). Now let's look at what a patcher is.

The BasePatcher Class

Every patcher (such as FlaskPatcher) is an instance of a subclass of BasePatcher, which can be found here.

This class is simply an interface (in Python terms, an Abstract Base Class, or ABC) that requires its subclasses to define a patch method and an unpatch method:

from abc import ABC, abstractmethod


class BasePatcher(ABC):
    """An ABC for patchers"""

    @abstractmethod
    def patch(self) -> None:
        """Patch"""

    @abstractmethod
    def unpatch(self) -> None:
        """Unpatch"""

This base class lives in the API package and serves as the interface for all patchers, such as FlaskPatcher.
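Declaring the methods abstract means Python refuses to instantiate a patcher class that forgets one of them. A broken patcher therefore fails loudly when it is instantiated rather than silently doing nothing. In the sitecustomize.py loop, the resulting TypeError is caught and logged by the except clause.

The sketch below repeats BasePatcher so it runs on its own. IncompletePatcher and NoOpPatcher are hypothetical.

```python
from abc import ABC, abstractmethod


class BasePatcher(ABC):
    """An ABC for patchers (repeated so this block runs on its own)."""

    @abstractmethod
    def patch(self) -> None:
        """Patch"""

    @abstractmethod
    def unpatch(self) -> None:
        """Unpatch"""


class IncompletePatcher(BasePatcher):
    """Hypothetical patcher that forgets to implement unpatch()."""

    def patch(self) -> None:
        pass


class NoOpPatcher(BasePatcher):
    """Hypothetical patcher implementing both required methods."""

    def patch(self) -> None:
        pass

    def unpatch(self) -> None:
        pass


try:
    IncompletePatcher()
except TypeError as error:
    # The message names the missing abstract method, unpatch.
    print("rejected:", error)

NoOpPatcher().patch()  # instantiates and runs without error
```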

Patchers are expected to monkey patch their respective frameworks. FlaskPatcher, for instance, replaces the flask.Flask class with a subclass of it:

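import flask


class _PatchedFlask(flask.Flask):
    ...


class FlaskPatcher(BasePatcher):
    ...

    def patch(self):
        # Keep a reference to the original class so unpatch can restore it.
        self._original_flask = flask.Flask
        flask.Flask = _PatchedFlask

    def unpatch(self):
        flask.Flask = self._original_flask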

So, when sitecustomize.py loads the flask entry point and calls FlaskPatcher().patch(), the flask.Flask class is replaced with _PatchedFlask. Each patcher implements patching differently, because each framework is built differently. The patched code is where the actual instrumentation happens. For example, a patcher can modify framework functions so that every call runs inside an OpenTelemetry span.

Keep in mind that all of this happens before any code in publisher_uninstrumented.py is executed. By the time that code runs, flask.Flask has already been replaced by a class that performs instrumentation. The uninstrumented application therefore ends up instrumented without a single change to its source.

Final Considerations

So far, most of what has been explained here (with the exception of the opentelemetry-ext-flask package) can be placed in the opentelemetry-python repo. DataDog and SignalFx already provide several patcher packages for frameworks such as Django, Requests, gRPC and PyMongo. These components appear to provide an interface similar to the one explained here, with patch or instrument methods that do the actual patching. Adapting that code into classes with patch and unpatch methods should be relatively straightforward, although there will probably be many details to consider.
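One way to do that is sketched below, assuming an existing integration exposes plain module-level patch and unpatch functions. The names are hypothetical and are not the actual DataDog or SignalFx APIs. A small factory wraps those functions in a BasePatcher subclass. The subclass takes no constructor arguments, because the sitecustomize.py loop instantiates each loaded object with none, as in entry_point.load()().

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Type


class BasePatcher(ABC):
    """An ABC for patchers (repeated so this block runs on its own)."""

    @abstractmethod
    def patch(self) -> None:
        """Patch"""

    @abstractmethod
    def unpatch(self) -> None:
        """Unpatch"""


def patcher_from_functions(
    patch_fn: Callable[[], None], unpatch_fn: Callable[[], None]
) -> Type[BasePatcher]:
    """Return a no-argument BasePatcher subclass delegating to existing functions."""

    class _AdaptedPatcher(BasePatcher):
        def patch(self) -> None:
            patch_fn()

        def unpatch(self) -> None:
            unpatch_fn()

    return _AdaptedPatcher


# Hypothetical existing integration that only offers plain functions.
calls: List[str] = []


def existing_patch() -> None:
    calls.append("patch")


def existing_unpatch() -> None:
    calls.append("unpatch")


# An entry point could point at this class, e.g. "demo = some_module:DemoPatcher".
DemoPatcher = patcher_from_functions(existing_patch, existing_unpatch)

patcher = DemoPatcher()  # instantiated with no arguments, as the loader does
patcher.patch()
patcher.unpatch()
print(calls)  # ['patch', 'unpatch']
```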

Both DataDog and SignalFx have also implemented an auto instrumentation mechanism that works very much like the one explained above. Both provide a command through the console_scripts entry point, and both use sitecustomize.py to tap into the execution order. The DataDog repo is much larger and also includes a lot of code for other concerns, such as threading.

Provided there is agreement on this approach, relevant code could be ported from both repos (and other community sources) on a feature by feature basis.

[1] The location of the console_scripts entry point for this command is expected to change soon; it will be placed in an auto-instrumentation-specific package.
[2] The opentelemetry-python repo contains several Python packages: the opentelemetry-api package, the opentelemetry-sdk package, etc.
[3] Everything is an object in Python, even classes.