Skip to content

Instantly share code, notes, and snippets.

@tarekgh
Last active February 11, 2020 17:36
Show Gist options
  • Save tarekgh/01cdf2036e2908db44a589dac9792d96 to your computer and use it in GitHub Desktop.
Save tarekgh/01cdf2036e2908db44a589dac9792d96 to your computer and use it in GitHub Desktop.
Activity Proposed Additions

Activity Proposed Additions

OpenTelemetry has introduced the Span type which is very similar to System.Diagnostics.Activity. In general .NET libraries expected to use System.Diagnostics APIs to publish the tracing data. We need to support the scenario of allowing the .NET libraries exporting the published data through OpenTelemetry. As the OpenTelemetry exporting APIs work with the Span class, we need to ensure all features supported by the Span class can be achieved by the Activity class too and fill any gap between the Span and Activity.

Although Activity has some more properties than Span (e.g. Parent property), Span also support some properties which not exist in Activity. This document is listing these missing properties and proposing the APIs we need to add to Activity. Most of the proposed additions here are almost identical to what OpenTelemetry has.

Activity Missing Properties

Note

  • All proposed types will implement Equals, GetHashCode and ToString methods which not explicitly mentioned in the proposal and possibly we may override the == and != operators on some of them.

Events

Event is a type carry event name, optional event timestamp and optional attributes (key-value pair list). Activity will have a property called Events which will return the list of the attached events. Activity will provide a method called AddEvent to allow adding a new event to the Activity.

namespace System.Diagnostics
{
    public readonly struct ActivityEvent
    {
        // Constructors
        public ActivityEvent(string name) {}
        public ActivityEvent(string name, DateTimeOffset timestamp) {}
        public ActivityEvent(string name, IDictionary<string, object> attributes) {}
        public ActivityEvent(string name, DateTimeOffset timestamp, IDictionary<string, object> attributes) {}

        // Properties
        public string Name { get; }
        public DateTimeOffset Timestamp { get; }
        public IDictionary<string, object> Attributes { get; }
    }
}

Links

Link contains ActivityContext object which can be linked to the activity which being processed in the batch. Activity will provide a method called AddLink to allow adding a new link to the context.

namespace System.Diagnostics
{
    public readonly struct ActivityContext
    {
        public ActivityContext(in ActivityTraceId traceId, in ActivitySpanId Id, ActivityTraceFlags traceOptions, bool isRemote = false, IEnumerable<KeyValuePair<string, string>> tracestate = null) {}

        public ActivityTraceId TraceId { get; }
        public ActivitySpanId SpanId { get; }
        public ActivityTraceFlags TraceOptions { get; }

        // is a boolean flag which returns true if the ActivityContext was propagated from a remote parent.
        public bool IsRemote { get; }

        // is a boolean flag which returns true if the ActivityContext has a non-zero TraceID and a non-zero SpanID.
        public bool IsValid { get; }
    }

    public readonly struct ActivityLink
    {
        public ActivityLink(ActivityContext context) {}
        public ActivityLink(ActivityContext context, IDictionary<string, object> attributes) {}

        public ActivityContext Context { get; }
        public IDictionary<string, object> Attributes { get; }
    }
}

Status

Status is representing the activity execution status code. It contains the canonical code and optionally a description of the code. The Status type doesn't expose any public constructor and instead it expose a static properties which return a well defined Status objects for specific supported canonical codes.

namespace System.Diagnostics
{
    public enum ActivityCanonicalCode
    {
        Ok = 0,
        Cancelled = 1,
        Unknown = 2,
        InvalidArgument = 3,
        DeadlineExceeded = 4,
        NotFound = 5,
        AlreadyExists = 6,
        PermissionDenied = 7,
        ResourceExhausted = 8,
        FailedPrecondition = 9,
        Aborted = 10,
        OutOfRange = 11,
        Unimplemented = 12,
        Internal = 13,
        Unavailable = 14,
        DataLoss = 15,
        Unauthenticated = 16,
    }

    public readonly struct ActivityStatus
    {
        public ActivityCanonicalCode CanonicalCode { get; }
        public string Description { get; }

        // Convenient property which expected to be frequently used.
        public bool IsOk => this.CanonicalCode == CanonicalCode.Ok;

        // Create a new status object with CanonicalCode from this object and using a different description.
        public ActivityStatus CreateWithDescription(string description) {}

        public static readonly Status Ok => ...;
        public static readonly Status Cancelled => ...;
        public static readonly Status Unknown => ...;
        public static readonly Status InvalidArgument => ...;
        public static readonly Status DeadlineExceeded => ...;
        public static readonly Status NotFound => ...;
        public static readonly Status AlreadyExists => ...;
        public static readonly Status PermissionDenied => ...;
        public static readonly Status Unauthenticated => ...;
        public static readonly Status ResourceExhausted => ...;
        public static readonly Status FailedPrecondition => ...;
        public static readonly Status Aborted => ...;
        public static readonly Status OutOfRange => ...;
        public static readonly Status Unimplemented => ...;
        public static readonly Status Internal => ...;
        public static readonly Status Unavailable => ...;
        public static readonly Status DataLoss => ...;
    }
}

Open Questions

  • In OpenTelemetry side Status has IsValid property which can tell if the Status is one of the predefined canonical code status while Status public interface doesn't expose any public constructor and instead it expose static property returning specif status. Theoretically IsValid is not really needed.
  • OpenTelemetry has a method which allow creating a status object with one of the existing canonical codes but attaching a description. I don't think we need to do that and maybe we can has the description of all codes in our resources which will allow localization too at some point.
  • Do we need to have a public constructor allow creating some non standard status objects? I am thinking in the future we may have some codes that is not defined yet and having such constructor will allow using such code.

Kind

Describes the relationship between the Activity, its parents, and its children in a Trace. Activity will have a property that can set or get the activity kind.

    public enum ActivityKind
    {
        Internal = 1,   // Default value. Indicates that the span represents an internal operation within an application, as opposed to an operations with remote parents or children.
        Server = 2,     // Server activity represents request incoming from external component.
        Client = 3,     // Client activity represents outgoing request to the external component.
        Producer = 4,   // Producer activity represents output provided to external components.
        Consumer = 5,   // Consumer span represents output received from an external component.
    }

Activity Changes

namespace System.Diagnostics
{
    public partial class Activity
    {
        // Event Support
        public void AddEvent(in ActivityEvent event) {}
        public IEnumerable<ActivityEvent> Events {get;}

        // Link Support
        public void AddLink(in ActivityLink link) {}
        public IEnumerable<ActivityLink> Links {get;}

        // Status
        public ActivityStatus Status {get; set;}

        // Kind
        public ActivityKind Kind {get; set;}
    }
}
@reyang
Copy link

reyang commented Feb 7, 2020

OpenTelemetry has a method which allow creating a status object with one of the existing canonical codes but attaching a description. I don't think we need to do that and maybe we can has the description of all codes in our resources which will allow localization too at some point.

The idea is that we will have a default description string, and the user could provide a custom override in case more information is needed.
Here goes some Python code that we could refer to.

[Tarek] ok, I'll add a similar method WithDescription

@reyang
Copy link

reyang commented Feb 7, 2020

Do we need to have a public constructor allow creating some non-standard status objects? I am thinking in the future we may have some codes that are not defined yet and having such a constructor will allow using such code.

This is not clear to me. Could you elaborate the scenario?

[Tarek] what I mean is Status doesn't have a public constructor and the only way to create the Status is by the static methods which are limited to the predefined Canonical codes. I was wondering do we have a case someone wants to create a Status with code which not predefined codes.
In other words, would it make sense to to expose the constructor

        public Status(CanonicalCode canonicalCode, string description = null)

and callers can do something like:

        new Status((CanonicalCode) 777, "This is a custom code");

Although I don't recommend that but I wanted to raise the question to have more feedback on that.

@reyang
Copy link

reyang commented Feb 7, 2020

Describes the relationship between the Activity, its parents, and its children in a Trace. Activity will have a property that can set or get the activity kind.

The spec is here. We would want to change the comment from Internal = 1, // not specified. to something like Internal = 1, // Default value. Indicates that the span represents an internal operation within an application, as opposed to an operations with remote parents or children..

[Tarek] I'll update the comment.

@lmolkova
Copy link

lmolkova commented Feb 10, 2020

  1. Activity Context
    Second this. I suggest to merge it with Noah's proposal and introduce context. Instrumentation does not need to create Activity to represent context from HTTP request. Same goes to links - they should work with Context, not the Activity (activity needs to be started to have a SpanId and it's never clear what to use for links - ParentId or SpanId)

  2. Status.
    I suggest to work on this from OpenTelemetry side and rework approach to status as gRPC status codes makes no sense for anything except gRPC. I started it here: open-telemetry/opentelemetry-specification#306 but didn't have time to move it forward.
    I believe exception alone is not good enough: Exception message and type are not a good value to bucket errors on and it is language specific. Also not all errors are exceptions (e.g. http 404). At the same time, Status seems to be specific to the lib (e.g. Azure Storage SDK would perhaps benefit from "BlobAleadyExists")

  3. We'll need to understand how OTel will behave with new Activity:

    • Activity to SpanData: When new Activity is created it should not be transformed to Span by OTel. It should be possible to create SpanData out of it. I don't envision any serious issues here
    • Span to Activity: Users (at least for a while) should be able to create spans without too much inefficiency. Current issues (not sure if we have to resolve them completely though):
      • OTel supports mode (it's default) when it does not make Span Current by default. There are now hacks in OTel to work around the fact that activity is always current
      • How to get current Span from current Activity (it's ConditionalWeakTable now)

@SergeyKanzhelev
Copy link

@tarekgh is the overall approach proposed here is for Activity to catch up to what Span has? At some point of maturity of certain concepts of Span (like the concept of parent/child relationship) it is a great idea. However for other aspects it may be better to allow composition of Activity with Span. This way if Links will introduce a notion of "link type" - we will not need to wait for a new .NET release to introduce this concept.

Composition will also allow to use custom Span implementations. So one can implement feature like "child count" which requires to hold an additional integer counter on every Activity and on every new child Activity increment the counter.

Those are just an examples of what composition may help with

@tarekgh
Copy link
Author

tarekgh commented Feb 10, 2020

@SergeyKanzhelev

is the overall approach proposed here is for Activity to catch up to what Span has?

Yes but part of it. My proposal here is just adding the properties that exist in Span and missing from Activity which we should have it anyway. I understand your future scenarios which I agree with and it is on my radar. I already thought to have an extension model for activity in which you can associate some property name (as key in a dictionary) to some Object value. We can discuss the details of that. The question now is, do we need to support this extension model only or do we need to have this model and exposing the properties which Span currently has?

Are you back from your trip? if so, I can schedule some time we can discuss that in detail.

@noahfalk
Copy link

Events

I had forgotten earlier... A while ago @lmolkova suggested that Events may not be that important overall and in one of the OT comments she asked "perhaps after v1, will we regret having Events?"

If we are uncertain that APIs are going to stick around or be useful then we should be cautious about adding them to the BCL. Once those APIs are in there they we need to keep supporting them indefinately. If it winds up they aren't useful then that latent code and its implementing data structures will be a complexity cost and perf cost that every .NET application pays with no commensurate benefit.

If these APIs are still in a doubtful state I suggest we remove them from our proposal for now and add them later once we are more certain they are part of the long term roadmap.

@lmolkova
Copy link

lmolkova commented Feb 11, 2020

Events

From the current scenarios we have, we have some need to attach rich exceptions to spans (this is Azure SDK scenario) and we might have some need to track some other events (e.g. chunks of data/messages during communications that are not worth creating a new span - gRPC scenario).

The only problem with that is dependency on Microsoft.Extensions.Logging and if library doesn't have it, it has to leverage DiagnosticsSource/EventSource. At this point, we either have to come up with complicated convention or won't be able to have generic solution.

As far as I understand this is library concern only - services/application can use logging library of choice and we can support them with ILogger and give guidance how to correlate logs to Activities for other loggers.

I agree that removing events (with possibility to return them back later) is the most reasonable option at this point. Thanks for bringing it, @noahfalk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment