ocelotl/configuration_proposal.rst

TLDR

We have been mixing up two things that are actually separate:

The pipeline configuration (the creation of providers, meters, etc.)
This should take its values from a YAML configuration file.
The configuration of the SDK (using a Configuration object)
This should take its values from environment variables only.

If we do this:

YAML file values are never overridden by environment variables, which avoids all the issues that such overriding causes.
There are no breaking changes.
We can support environment variable substitution in the YAML file if we want.
We can support overriding of YAML file values with another YAML file.


Limit the scope of this project

Here is an example of a typical application coded in Python that adds telemetry
manually:
from sys import argv

from requests import get

from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer_provider().get_tracer(__name__)

trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

assert len(argv) == 2

with tracer.start_as_current_span("client"):

    with tracer.start_as_current_span("client-server"):
        headers = {}
        inject(headers)
        requested = get(
            "http://localhost:8082/server_request",
            params={"param": argv[1]},
            headers=headers,
        )

        assert requested.status_code == 200
Notice these lines above:
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer_provider().get_tracer(__name__)

trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
Similar lines can be found in pretty much any application that uses
OpenTelemetry, lines where traces, meters, processors, exporters, etc. are
created and configured.
To make this point easier to discuss, let's give these lines a name: the
pipeline.
As far as I can understand, this project aims to introduce a mechanism to
create and configure the objects in the pipeline by using a YAML file that,
when processed, runs code that instantiates some objects (traces, meters,
processors, exporters, etc.).
In other words, this project aims to introduce a new mechanism to do the exact
same thing a user can already do by hand by adding the code above to the
user application.
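To make the idea concrete, here is a minimal sketch of such a mechanism. It assumes the YAML file has already been parsed into a dictionary. The schema (tracer_provider, span_processors, type, exporter), the REGISTRY mapping and the create_pipeline function are all hypothetical names invented for this sketch, and the three classes are stand-ins for the SDK classes of the same name so that the sketch runs without the SDK installed:

```python
# Stand-ins for the SDK classes of the same name (assumption: a real
# implementation would import them from opentelemetry.sdk.trace instead).
class ConsoleSpanExporter:
    pass


class BatchSpanProcessor:
    def __init__(self, exporter):
        self.exporter = exporter


class TracerProvider:
    def __init__(self):
        self.span_processors = []

    def add_span_processor(self, processor):
        self.span_processors.append(processor)


# Hypothetical registry mapping the names used in the file to classes.
REGISTRY = {
    "batch": BatchSpanProcessor,
    "console": ConsoleSpanExporter,
}

# Hypothetical result of parsing the YAML file (schema invented here).
parsed_file = {
    "tracer_provider": {
        "span_processors": [
            {"type": "batch", "exporter": {"type": "console"}},
        ]
    }
}


def create_pipeline(configuration):
    """Instantiate the same objects a user would otherwise create by hand."""
    provider = TracerProvider()
    for entry in configuration["tracer_provider"]["span_processors"]:
        exporter = REGISTRY[entry["exporter"]["type"]]()
        provider.add_span_processor(REGISTRY[entry["type"]](exporter))
    return provider


provider = create_pipeline(parsed_file)
```

The result is the same provider, processor and exporter that the manual code above creates, only driven by data instead of by code.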
If the statement above is true, then we have a problem. We are naming this
project OpenTelemetry Configuration. This name is misleading to many (it
was for me at least), because it is easy to think this project is a general
purpose configuration mechanism (something that can also configure the SDK,
for example).
To avoid this confusion, I propose we rename this project to something else
that is more specific. If pipeline is a good name, I suggest we rename
this project to Pipeline Configuration.
This being said, I understand others may have had more ambitious goals, like
creating an object that would abstract all configuration and that we can use
in our SDKs, something like this:
# There is some value defined in an environment variable:
# OTEL_SOME_TIMEOUT_VALUE == 45

# There is also some value defined in some configuration file:
# exporter_endpoint: "http://abc.xyz"

from opentelemetry.configuration import Configuration

# This object abstracts both the environment variable and the configuration
# file:
configuration = Configuration()

some_exporter = SomeExporter(
    timeout=configuration.some_timeout_value,
    endpoint=configuration.exporter_endpoint
)
This would be great and very convenient, but it can be a next step. In fact
there is probably a lot that needs to be defined first before we can consider
the definition of such a configuration object (how will boolean values be
treated, same with integers or floats, etc.).
This approach (of limiting this project to the configuration of the pipeline)
has an advantage: it automatically solves the very complicated issue of
environment variable overriding.
Right now, every SDK is doing something with the environment variables. I
don't know exactly what, but they are doing something. If we limit this project
to just pipeline configuration, which is by definition equivalent to what any
user can already do manually, then there is no environment variable overriding
problem to solve. If a user decides to use this pipeline configuration
mechanism, any environment variable that has been defined is handled exactly
as it would have been if the user had created the same objects by hand instead
of through the pipeline configuration mechanism.
If we decide to limit this project to just pipeline configuration, then there is
no major issue with this proposed pipeline configuration project and the
SDKs can happily continue to do what they have been doing so far undisturbed.
Just a small comment here: it would be convenient for the user to know what
objects are being created by the pipeline configuration mechanism. I have
suggested (in a very Python-biased way) that we add a feature that would print the
code that would be equivalent to the manual creation of the pipeline objects.
Another approach with the same intention could work better in some other
languages.
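As a rough sketch of that feature, the function below turns a parsed configuration into the Python source a user would have written by hand. The schema of parsed_file and the name equivalent_code are assumptions made only for this sketch, and it handles nothing beyond span processors with an exporter:

```python
# Hypothetical parsed configuration (the schema is invented for this sketch).
parsed_file = {
    "span_processors": [
        {"type": "BatchSpanProcessor", "exporter": "ConsoleSpanExporter"},
    ]
}


def equivalent_code(configuration):
    """Return Python source that creates the pipeline objects manually."""
    names = set()
    calls = []
    for entry in configuration["span_processors"]:
        names.update((entry["type"], entry["exporter"]))
        calls.append(
            "trace.get_tracer_provider().add_span_processor(\n"
            f"    {entry['type']}({entry['exporter']}())\n"
            ")"
        )
    imports = [
        "from opentelemetry import trace",
        "from opentelemetry.sdk.trace import TracerProvider",
        "from opentelemetry.sdk.trace.export import "
        + ", ".join(sorted(names)),
    ]
    setup = ["", "trace.set_tracer_provider(TracerProvider())"]
    return "\n".join(imports + setup + calls)


source = equivalent_code(parsed_file)
print(source)
```

The printed text only describes what the mechanism would create. It is generated as a string and never executed here.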
We can also support environment variable substitution in the config file,
for the users who want it.
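A minimal sketch of substitution, assuming a ${NAME} placeholder syntax (the actual syntax would have to be agreed on) and applied to the raw text of the file before it is parsed. The function name and the example key are hypothetical:

```python
import os
import re

# Assumed placeholder syntax: ${NAME}
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_environment_variables(text, environ=os.environ):
    """Replace every ${NAME} in text with the value of the variable NAME."""

    def replace(match):
        name = match.group(1)
        if name not in environ:
            # Fail loudly instead of silently leaving a hole in the file.
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return PLACEHOLDER.sub(replace, text)


file_contents = 'exporter_endpoint: "${EXPORTER_ENDPOINT}"\n'
substituted = substitute_environment_variables(
    file_contents, {"EXPORTER_ENDPOINT": "http://abc.xyz"}
)
```

This is different from overriding: an environment variable only has an effect where the author of the file explicitly wrote a placeholder for it.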

Specify environment variable and parameter relationship

I believe we have two problems with environment variables that:

Need to be solved at some point in time
Do not need to be solved right now for this project


First Problem

Right now our specification defines this environment variable:
OTEL_EXPORTER_OTLP_ENDPOINT
This environment variable defines the endpoint for an exporter. Now, an
exporter is usually defined as a class that receives an endpoint as a
parameter, something like this:
class SomeOTLPExporter:

    def __init__(self, endpoint: str = None):
        self._endpoint = endpoint
Now, if we want to be able to use the environment variable for this exporter we
first need to define what will have precedence, the argument passed to the
exporter class or the environment variable?
Let's say the argument will have precedence:
from os import environ

class SomeOTLPExporter:

    def __init__(self, endpoint: str = None):
        self._endpoint = endpoint or environ.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT"
        )
Nice. Now, I think the specification does not define this precedence and it
should. If we decide to limit the scope of this project as explained before we
can solve this parameter vs environment variable precedence problem
independently of the pipeline configuration mechanism.
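If the specification did choose "argument first", the rule could be written once and reused by every exporter. The helper below is only a sketch; its name, resolve, and the endpoint values are made up for illustration. It checks for None instead of using or, so an argument that is explicit but falsy (an empty string, 0, False) still wins over the environment variable:

```python
from os import environ as process_environ


def resolve(argument, variable_name, environ=process_environ):
    """Return the argument if it was passed, else the environment variable."""
    if argument is not None:
        return argument
    return environ.get(variable_name)


# A plain dictionary stands in for the process environment in this example.
fake_environ = {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://from-environment:4317"}

explicit = resolve(
    "http://from-argument:4317", "OTEL_EXPORTER_OTLP_ENDPOINT", fake_environ
)
fallback = resolve(None, "OTEL_EXPORTER_OTLP_ENDPOINT", fake_environ)
missing = resolve(None, "OTEL_EXPORTER_OTLP_ENDPOINT", {})
```

Whether an explicit falsy argument should win is exactly the kind of detail the specification would need to state.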

Second Problem

Clearly defining this relationship also solves the second problem. Any user may
see OTEL_EXPORTER_OTLP_ENDPOINT and wonder which of all exporters this
environment variable applies to. But if this environment variable is clearly
defined as the value that will be used for the endpoint parameter of any OTLP
exporter class when that parameter is not specified, then it all becomes clear
for the user.
It should be noted here that not all parameters can be associated with an
environment variable. The ones that can be are those that take scalar
values. Nevertheless, I think we should be able to define an environment
variable for every complex

Define the behavior of confusing environment variables

After doing what was said before, we can define a configuration object. This
object will be an abstraction on top of environment variables only.
Let's consider an example:
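from os import environ


def convert_from_string_to_bool(value: str) -> bool:
    # Hypothetical helper: only "true" (in any letter case) means True.
    return value.strip().lower() == "true"


class SomeOTLPExporter:

    def __init__(self, secure: bool = False):

        # Environment variables are always strings, so convert explicitly.
        environ_secure = convert_from_string_to_bool(
            environ.get("OTEL_EXPORTER_OTLP_SECURE", "false")
        )

        self._secure = secure or environ_secure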
It would be nice to have an object that does all this conversion, something like this:
# OTEL_EXPORTER_OTLP_SECURE = true

configuration = Configuration()
# The configuration object automatically converts the "true" string to the True
# boolean value:
assert configuration.otel_exporter_otlp_secure is True
assert configuration.otel_exporter_otlp_secure != "true"
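One possible shape for such an object is sketched below. The conversion rules ("true"/"false" become booleans, then int, then float, otherwise the string is kept) are assumptions of mine. They are exactly the part that would need to be specified first. The constructor accepts a dictionary so the example does not depend on the real process environment:

```python
import os


class Configuration:
    """Read-only view of environment variables with type conversion."""

    def __init__(self, environ=None):
        self._environ = os.environ if environ is None else environ

    def __getattr__(self, name):
        # configuration.otel_exporter_otlp_secure -> OTEL_EXPORTER_OTLP_SECURE
        key = name.upper()
        if name.startswith("_") or key not in self._environ:
            raise AttributeError(name)
        return self._convert(self._environ[key])

    @staticmethod
    def _convert(value):
        # Assumed rules: booleans first, then integers, then floats.
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value


configuration = Configuration(
    {
        "OTEL_EXPORTER_OTLP_SECURE": "true",
        "OTEL_SOME_TIMEOUT_VALUE": "45",
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://abc.xyz",
    }
)
```

With this sketch, configuration.otel_exporter_otlp_secure is the boolean True, configuration.otel_some_timeout_value is the integer 45, and the endpoint stays a string. Accessing a variable that is not set raises AttributeError.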

Use another YAML file to override another one

Users may want to override the values in a YAML file with something else. I suggest we let them do that with another YAML file.
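A minimal sketch of how that could work once both files have been parsed into dictionaries. The function name, the keys and the merge rules (nested mappings are merged key by key, anything else, lists included, is replaced wholesale) are assumptions for illustration:

```python
def override(base, overrides):
    """Return a new dict in which values from overrides replace those in base.

    Neither base nor overrides is modified.
    """
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = override(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale.
            merged[key] = value
    return merged


# Hypothetical contents of the two parsed YAML files.
base_file = {
    "exporter": {"endpoint": "http://abc.xyz", "timeout": 45},
    "processors": ["batch"],
}
override_file = {
    "exporter": {"timeout": 10},
    "processors": ["simple"],
}

merged = override(base_file, override_file)
```

Here merged keeps the endpoint from the first file, takes the timeout from the second one and replaces the processors list completely.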