@cstockton
Created May 21, 2019 17:29

TLDR, I want to propose / have a conversation around two things:

  1. Create a protocol / wire format / some form of structural definition for distributed tracing. I urge the project owners to reconsider having the libraries be the specification: I should not have to import an SDK to participate in distributed tracing, I should only need to be a cooperative client able to produce data in the agreed-upon format.
  2. Design the protocol / specification to be 100% stateless. For lack of better words, "eventual causality": I should be able to tell which set of spans are open, and there should be no penalty for a span that is never closed. Leave it to visualization and tooling to iron out the edge cases. Tracing is about observation across systems; a failed transaction, a segfault, or a crashing program should not lose all the data that led up to that event, as it does in the implementations I've used today (because they buffer log records and hold unfinished spans in memory).

Where can I find more information about the higher-level goals of the upcoming OpenTelemetry effort? From what I gather it seems to be focused on defining common APIs, similar to what OpenTracing does. I would like to open an issue to discuss the consequences of having the "specification" live in each language's API. In short, I think that before everyone starts implementing libraries, yet again, a well-defined protocol should be created first. The current specification for OpenTracing is a document with a few verbs in it; the definition of a standard protocol or data format has been completely avoided in favor of having it live inside an API.

But does it really make sense to have every single tracing provider implement a language-specific API instead of a common protocol / data format? Given that each vendor already has to implement all of these libraries, wouldn't it be less effort to provide a single ingest endpoint that accepts the common protocol? Instead of implementing the "API" times the N languages the vendor wishes to support, they provide one endpoint with one common, well-defined format.

This leaves best-in-class implementations to emerge, rather than being stuck with whatever interfaces the initial contributors create. While I'm sure the intentions are pure and the efforts may result in top-notch software, it's impossible to satisfy the requirements of all systems or meet all use cases. I should not have to import a library to participate in the distributed tracing ecosystem; I should only need to produce data for a well-defined protocol.
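As a rough illustration of what "only need to produce the data" could mean in practice, here is a minimal Go sketch of a client emitting a span-open event with nothing but the standard library. The endpoint URL, field names, and event shape are all invented for discussion; they are not part of any existing OpenTelemetry or OpenTracing format.

package main

import (
  "bytes"
  "encoding/json"
  "net/http"
  "time"
)

// emitSpanOpen posts a single hypothetical "span_open" event to a made-up
// vendor ingest endpoint. No SDK involved: any cooperative client able to
// produce the agreed-upon format could participate in tracing.
func emitSpanOpen(traceID, spanID, name string) error {
  event := map[string]interface{}{
    "type":     "span_open",
    "trace_id": traceID,
    "span_id":  spanID,
    "name":     name,
    "time":     time.Now().UTC().Format(time.RFC3339Nano),
  }
  body, err := json.Marshal(event)
  if err != nil {
    return err
  }
  resp, err := http.Post("https://ingest.example.com/v1/trace", "application/json", bytes.NewReader(body))
  if err != nil {
    return err
  }
  return resp.Body.Close()
}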

I suspect some of the major design issues I've found in the current libraries would surface while carefully designing a protocol to satisfy all the use cases. The biggest flaws in the current design of opentracing-go, for example:

  • Spans accumulate log records on the span object
  • Spans stay in memory until they are finished (perhaps only because they hold log records?)

There are several design flaws / warts in the current APIs, but I strongly feel the two above are orders of magnitude more detrimental. First, they cause a loss of visibility for in-flight spans: you don't know a span is running until it's finished. That makes dashboards, alerting, or any visibility into ongoing issues impossible. Second, they create an entire class of software problems:

They strain memory usage and bandwidth under any burst of requests: hot paths I had with zero (or amortized-to-zero) allocations suddenly have dozens when you add a handful of spans or log entries, each one "held" until the span finishes. You can't pool log records either, because they may be read after the span is finished by the OpenTracing "driver"; a caller can't use a sync.Pool because releasing a record some time after a span finishes does not guarantee it's done being read. You're at the mercy of the API.
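To make the sync.Pool point concrete, here is a sketch using the opentracing-go LogFields call; the pool and helper function are hypothetical. Because a tracer implementation may keep reading the fields after the span finishes, the Put below can hand a recycled, mutated slice to whatever is still reporting the span.

package main

import (
  "sync"

  opentracing "github.com/opentracing/opentracing-go"
  otlog "github.com/opentracing/opentracing-go/log"
)

// fieldPool is a hypothetical attempt to reuse log field slices.
var fieldPool = sync.Pool{
  New: func() interface{} { return make([]otlog.Field, 0, 8) },
}

func logJobStart(span opentracing.Span, jobID string) {
  fields := fieldPool.Get().([]otlog.Field)
  fields = append(fields[:0], otlog.String("event", "job.start"), otlog.String("job_id", jobID))
  span.LogFields(fields...)
  // Unsafe: the span buffers these fields until Finish, and the tracer may
  // read them even later, so recycling the slice here can corrupt the trace.
  fieldPool.Put(fields)
}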

It also makes it impossible to have a top-level span in your main() and produce independent child spans in long-running services. You're limited to using spans in exactly one case: short-lived requests with minimal metadata. You have to log cautiously, because if you produce too many records you lose your entire span: "Span contains too many log records".

Because you buffer, you now have an entire class of engineering problems. Responsible software should define bounds, but what do you do when a log record is added that exceeds the upper bound? There is no error propagation. Worse, some libraries don't check the current number of records and accumulate log records forever, so an innocent enough piece of software like the one below:

func startWork(...) {
  span_start()
  defer span_finish() // never runs: the loop below never returns
  for {
    job := job_next()
    job_log_start(job) // each call buffers another record on the still-open span
    job_handle(job)
    job_log_end(job)
  }
}

Becomes a runaway buffer until the program runs out of memory or exits. Worse, if you rely on tracing for visibility into issues, the entire span with all its glorious millions of entries is discarded: "Span contains too many log records". Yes, discarded, not truncated. Maybe some libraries truncate, maybe some log a helpful record at limit-1 saying "truncated N more messages". The problem is that this implementation-specific behavior exists at all, and it only becomes a problem when you take the approach that the implementations are the specification.

All of these problems go away if there are distinct events produced for spans and log records. If a protocol exists that defines event types (nomenclature unimportant, just for illustration):

  • span_open // span has been opened
  • span_close // span has been closed
  • span_oneshot // maybe for a use case where a span event is its own open and close, one I haven't discovered yet?
  • span_log // log record for span, emitted as an independent event, the delivery guarantees & consistency model would be defined in protocol

Now the problem goes away and an entire class of tooling opens up for closer-to-realtime visibility into open transactions. I can also start spans anywhere in my software, regardless of how long they stay open or how many child spans they produce. If a protocol were defined with this approach, library authors could still choose a library like opentracing-go that buffers log records in spans for whatever reason, but users like me could produce high-performance, high-quality libraries that don't.
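For illustration only, here is a minimal sketch of what those event records could look like as Go types; the field names and shapes are invented for this discussion, not taken from any existing specification. The point is that each record stands alone and can be streamed immediately, with nothing buffered on an in-memory span object.

package events

import "time"

// Hypothetical, stateless trace event records; each one is emitted on its own.

type SpanOpen struct {
  TraceID  string    `json:"trace_id"`
  SpanID   string    `json:"span_id"`
  ParentID string    `json:"parent_id,omitempty"`
  Name     string    `json:"name"`
  Time     time.Time `json:"time"`
}

type SpanClose struct {
  TraceID string    `json:"trace_id"`
  SpanID  string    `json:"span_id"`
  Time    time.Time `json:"time"`
}

// SpanLog is an independent log record for a span; delivery guarantees and
// ordering would be defined by the protocol, not by an in-memory buffer.
type SpanLog struct {
  TraceID string            `json:"trace_id"`
  SpanID  string            `json:"span_id"`
  Time    time.Time         `json:"time"`
  Fields  map[string]string `json:"fields,omitempty"`
}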

@cstockton (Author)

For an example of an event specification I wrote 4-6 years ago, before I even knew the term "distributed tracing", and which was used at my company, see: https://github.com/godaddy/py-emit/blob/master/EVENT.md - it's dated, and a protocol / spec for tracing today would surely emerge in a much different form thanks to much smarter and more experienced people producing it.

But it illustrates that it is helpful to have a concrete data structure to observe. Right now you have to look at large client libraries implemented across N languages and build a mental model of what you're actually representing. I think we can do better.

@cstockton (Author)

I was provided some links to the proto spec files; I'm annotating the conversation here for tracking.

Sergey, thanks for the links. I think https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/src/main/proto/trace/v1/trace.proto#L200 is what worries me here: the fact that spans seem to carry child data directly.
I'll read up on the W3C spec, the current Java SDK, and any other design documents I can get my hands on to try to fully understand the current design rationale before I raise any issues. Thanks for the links.
Sergey Kanzhelev
@SergeyKanzhelev
May 21 10:56
Time events are not child spans. They're like measurements inside the span that are small enough to be attributed directly to the span and cannot be justified as promotion to a child span. Like an exception that happened at this point in the span's life, or a single chunk of a multi-chunk transfer completing.
Chris Stockton
@cstockton
07:25
Yeah, I know they are not child spans, but they are a child "thing", which means they have to be held until the span is released. I also think https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/src/main/proto/trace/v1/trace.proto#L238 should be avoided as well. I think spans should be broken down into distinct events, at the very least a "start" and a "stop", as well as every detail that happens in between. Clients should be able to stream out trace events with minimal buffering; it prevents an entire class of problems with the current approach. The second you accumulate buffers you have to either make assumptions or define upper bounds.
The proto makes sense as a logical view of a finished span, something I would receive as the result of a query by "traceid" or something along those lines: a convenient structure to work with after the fact. But it doesn't look like something that should live inside trace clients, for the reasons mentioned above, which I expanded on in my gist and could give further detail on.
Chris Stockton
@cstockton
07:31
That is concern number one about the implementation of clients. The second is the fact that each language defines trace events by implementation: there is no formal specification for trace data structures. The API is the specification, so the interface provided by each language will vary due to a number of factors. This means that vendors need to implement "drivers" for every single language, all of which will vary in subtle ways (if they didn't, they wouldn't feel natural for the target language).
To me it would make far more sense to have a common structural definition of trace events. First, it helps people visualize tracing, because developers have something concrete to digest. More importantly, it would mean vendors could write ingest servers in their own preferred languages / systems, wrapped with whatever transports or protocols they wanted. They wouldn't need a Python, Rust, Go, Java, ... N driver implementation in every language, just one ingest that accepts traces, in the language they prefer.
Chris Stockton
@cstockton
07:39
Which means I don't HAVE to import the community trace client in order to send trace data. It also doesn't stop anyone from importing the currently planned clients for each language if they choose to; they just won't be forced to use the trace clients to participate in tracing. I would just like to have an open discussion about the design rationale here, the benefits of the current approach versus the ones I mentioned, and any others.

@cstockton (Author)

Tigran Najaryan
@tigrannajaryan
08:47
@/all I have recently been thinking about what the long-term vision for the OpenTelemetry Agent/Collector should be. I believe it is important to have a clearly articulated set of high-level goals and guiding principles for the product we work on. I took a first stab at it and would highly appreciate feedback on it and on whether you think it is aligned with the goals of the OpenTelemetry community.
Please see here: https://docs.google.com/document/d/10ujkphyRi2Eyv-5teUZ14m0SVPb74ZskoTNl7MgKM_U/edit#
Chris Stockton
@cstockton
09:27
@tigrannajaryan Where may I find documentation (perhaps an architectural overview) outlining how this agent fits into the greater OpenTelemetry project? It's hard to comment on individual aspects of the agent without understanding the system it serves. It would also help me better understand how the OpenTelemetry project is going to execute its goals, specifically "We are creating a new, unified set of libraries and specifications for observability telemetry." I want to understand where the software and specification intersect. Which specification will this agent implement? If there is none, will one be created? Is the implementation going to serve as the specification?
Chris Stockton
@cstockton
09:33
I'm trying to understand whether the agent is an interchangeable or required component: may I ship directly to an endpoint, or must everything proxy through an agent? Has a language been selected, or is that open for debate? The remote configuration is a little alarming and I wonder if it's necessary; maybe hot config reloading would be a good start? Is there a place I could direct questions in a more structured format than here?
Paulo Janotti
@pjanotti
09:36
The agent/collector is not required. The plan is to leverage the OC work in this area, so the language will be Go. Remote configuration shouldn't be a top priority but something down the road, and disabled by default.
Tigran Najaryan
@tigrannajaryan
09:36
@cstockton the OC docs on the Agent and Collector are probably a good starting point: https://opencensus.io/service/components/agent/ The OpenTelemetry Agent is likely going to be inherited from the OC Agent. The Agent is an interchangeable component: it helps with collection, but you can also ship directly to the backend.
Please feel free to add questions in the form of comments to the Google Doc I posted.
Chris Stockton
@cstockton
09:42
I see, this page helps a lot. If I'm interpreting this document correctly, it seems that exporters are written in software and live inside the agent?
Tigran Najaryan
@tigrannajaryan
09:42
Yes, exporters are part of the agent codebase.
Chris Stockton
@cstockton
09:44
What led to that design decision? Wouldn't there be many more advantages to settling on an open specification for trace ingest?
If vendors already have to get involved and write software to participate, there isn't any advantage to having the agent implementation be the integration point. You have no specification, no protocol. Every single vendor must merge code into the agent; every single user must update their agents to use a new vendor. It adds a middleman, a moderator. It's the opposite of open, I feel.
There should be an agreed-upon specification for trace data ingest; then vendors can implement ingest in their own systems, in their own languages of choice. The agent could still be useful for enrichment, sending to multiple vendors, and so on. A specification allows the tracing pipeline to be composable: add an ingest endpoint that fronts Kafka, which pushes to an agent that copies to multiple endpoints, and so on.
Of course the agent could still do what it pleases, but this removes a single codebase as the moderator of the entire tracing ecosystem and follows the well-navigated path of open technologies: open specifications that anyone may implement.
Paulo Janotti
@pjanotti
09:48
Ingest: the idea is to support the widely used OSS formats (Jaeger, Zipkin, Prometheus), plus OpenCensus and OpenTelemetry. Vendors come with their own code as exporters, the idea being that until the last hop you are not tied to any vendor.
Tigran Najaryan
@tigrannajaryan
09:49
@cstockton it is a very good point. It is one of the directions that we think the codebase should evolve: there will be a "core" which implements base functionality and there will be "contrib" with vendor specific receivers and exporters.
Chris Stockton
@cstockton
09:51
Should it really be an afterthought for a brand-new collaborative project? Why not start from day 1? It would allow parallel efforts and best-in-class implementations of the spec to emerge naturally.
Tigran Najaryan
@tigrannajaryan
09:52
@cstockton we want to reuse the agent/collector from OC. It has existed for a while now and is used in production by many companies. It is a good starting point.
Chris Stockton
@cstockton
09:52
You could write a jaeger-exporter that implemented the open specification, a zipkin-exporter, etc., all as independent efforts. The agent could accept traces and expose metrics about them and still be useful.
Could they not continue to use it today?
Paulo Janotti
@pjanotti
09:56
That's basically what they do today in the OC service: the internal pipeline is all in OC format, so the exporters don't know what the original format was. It has a bit of conversion cost, but in practice this is small.
@cstockton I agree that naturally a very good exporter will have a tendency to win, and vendors will then receive data in that format. The plan is to have a very good exporter with the OpenTelemetry data format.
Chris Stockton
@cstockton
09:59
So there is a data format? I.e., a specification for trace data ingest? I would like to review it. I have an issue open about this area and the project in general.
Paulo Janotti
@pjanotti
09:59
In the case of the service we are starting based on the OpenCensus format.
Chris Stockton
@cstockton
10:00
What I'm hearing is that OpenTelemetry is essentially OpenCensus?
Steven Karis
@sjkaris
10:01
The format will evolve from the discussions at https://github.com/open-telemetry/opentelemetry-specification I believe. A proto spec will be published, which will serve as the OT trace data format
Chris Stockton
@cstockton
10:03
Shouldn't that happen before locking in the project to the existing OpenCensus formats?
Steven Karis
@sjkaris
10:03
The service/agent will use OC as a starting point, but the goal is to implement the OT format. The service/agent will also implement other trace data formats (Jaeger, Zipkin, etc.), so that we can provide maximum compatibility with already-instrumented systems.
Chris Stockton
@cstockton
10:03
Otherwise open technical debate / engineering rigor will be impossible, as any opposition can be silenced with "backwards compat" (a very valid argument).
OpenCensus is already a project; I'm having trouble understanding why OpenTelemetry is starting with all of the software from OpenCensus.
I would be fine with it, but when the time comes to debate https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace.proto - will technical merit be accepted as grounds for changes? I know that if someone on my team proposed a large incompatible protobuf change for a project in production it would be an immediate "no", technical merit aside.
Chris Stockton
@cstockton
10:12
My concern is that the opportunity to reevaluate architectural decisions and project goals is going to be somewhere between difficult and impossible when the starting point is literally the same project. It resembles a deletion of OpenTracing rather than the emergence of a next-generation system built from the experience gained in this space. Maybe I'm way off, maybe not. I guess the voice of one person can't have much effect here; I just really care about this space and thought this was an opportunity to improve some of the pain points of the current design. I appreciate the efforts in this, though. Thanks for clarifying these things.
Steven Karis
@sjkaris
10:19
I think the ultimate OT spec is independent of the OC spec, so breaking/incompatible changes between OC and OT aren't a worry. By reusing the OpenCensus service codebase we get a lot of stuff for free, but the service itself isn't tied to the OC spec; the final OT spec will be used as the core. Also, the OC service is relatively new: it's still in beta and still undergoing significant changes. I think documents like the one @tigrannajaryan shared are a great place to share concerns/learnings from the previous projects, since their goal is to give guiding principles for the service implementation.
Tigran Najaryan
@tigrannajaryan
11:16
@cstockton you have valid concerns. As regards being able to propose a new data exchange protocol, this should not be a problem at all. The idea is that the Agent is able to support many protocols, both on the receiving and on the exporting side. So if anyone has an idea for a more efficient protocol, that should be absolutely doable and possible to add to "contrib". You don't need to change existing protocols; you can add a new one, so you keep compatibility for existing users but also have a path to suggest better protocols.
As for reevaluating the architecture, you are right that using the existing codebase makes it more difficult to re-architect the internals of the Agent (e.g. how data is processed in the pipeline), but given that the OC codebase is relatively young, refactoring the implementation is hopefully not an insurmountable effort. As an example: right now I am working on a new, more consistent and flexible configuration format for the OC Agent [1], and as I implement it I expect I will be refactoring parts of the core functionality as I go.
[1] - https://docs.google.com/document/d/1GWOzV0H0RTN1adiwo7fTmkjfCATDDFGuOB4jp3ldCc8/edit#
Ted Young
@tedsuo
11:45

@cstockton I see OpenTelemetry as the next "version" of both projects. An opportunity to re-evaluate things, but also a lot of attention given to ensuring backwards compatibility and an easy migration from the existing projects.

Since I'm coming from the OpenTracing side, my concerns so far have mostly been in evaluating the language interfaces. I haven't gotten up to speed yet with the Agent/Collector side of things. I imagine a lot of us are feeling out parts of the existing projects which we have not previously worked on! It would be good to start having some community meetings to see each other and get oriented, and then write some orientation guides to help others when they arrive. :)

But! One goal that I believe is agreed upon: using one part of OpenTelemetry does not require you to use all of the other parts. You can use the interfaces without the SDK, and the SDK without the Collector. You can use the recommended data formats and wire protocols, or use your own. So you will never be “required” to run a sidecar, or something like that. Understanding where we need to provide loose coupling is an important part of designing the system correctly.
Chris Stockton
@cstockton
14:06
Thanks for the replies; it seems this is a pretty large effort with a lot of moving parts. These dialogs have helped me piece things together and build a mental model. I appreciate it.