Skip to content

Instantly share code, notes, and snippets.

@eswdd
Last active August 29, 2015 14:05
Show Gist options
  • Save eswdd/44aba96d4ce46e89f79a to your computer and use it in GitHub Desktop.
Save eswdd/44aba96d4ce46e89f79a to your computer and use it in GitHub Desktop.
Cougar filters - detailed sketch
/**
* Base utils for building ExecutionContexts
*/
public abstract class BaseExecutionContextBuilder<T extends BaseExecutionContextBuilder> {
private BitSet whatsSet = new BitSet(MAX_BASE_BITS+getNumSpecificComponents());
protected GeoLocationDetails location;
protected RequestUUID uuid;
protected Date receivedTime;
protected Date requestTime;
protected boolean traceLoggingEnabled;
protected int transportSecurityStrengthFactor;
protected abstract int getNumSpecificComponents();
protected final static int MAX_BASE_BITS = 6;
public T setLocation(GeoLocationDetails location) {
this.location = location;
set(0);
return (T) this;
}
public T setRequestUUID(RequestUUID uuid) {
this.uuid = uuid;
set(1);
return (T) this;
}
public T setReceivedTime(Date receivedTime) {
this.receivedTime = receivedTime;
set(2);
return (T) this;
}
public T setRequestTime(Date requestTime) {
this.requestTime = requestTime;
set(3);
return (T) this;
}
public T setTraceLoggingEnabled(boolean traceLoggingEnabled) {
this.traceLoggingEnabled = traceLoggingEnabled;
set(4);
return (T) this;
}
public T setTransportSecurityStrengthFactor(int factor) {
transportSecurityStrengthFactor = factor;
set(5);
return (T) this;
}
protected void beenSet(int subComponent) {
set(MAX_BASE_BITS + subComponent);
}
private void set(int bit) {
if (whatsSet.get(bit)) {
throw new IllegalStateException("Component has already been set");
}
whatsSet.set(bit);
}
protected void checkReady() {
if (whatsSet.nextClearBit(0) <= getNumSpecificComponents()) {
throw new IllegalStateException("Not all components have been set on this execution context");
}
}
}
public enum DehydratedExecutionContextComponent {
Location,
IdentityTokens,
RequestUuid,
ReceivedTime,
RequestedTime,
TraceLoggingEnabled,
TransportSecurityStrengthFactor
}

Requirements

  • ExecutionContext must remain immutable once passed to the ExecutionVenue
  • Maintain transport independence/agnosticism for services:
  • Execution context interface must be the same for all transports (within a paradigm)
  • Filters must support all transports within a paradigm (including in-process)

Aims

  • Enable Zipkin integration (tracer calls and resolution of ids)
  • Server transports initially, but will need to consider client (at least for zipkin)
  • Cleanup of existing transport code would be a bonus
  • Provides the option to change how the ExecutonContext is resolved
  • Doesn't try to deal with ExecutionContext extension

Context

Cougar users have long been requesting a filters capability that would allow them access to the http request (in the case of Jetty transport) and a bucket o' values in the ExecutionContext to put stuff, and for just as long we've been rejecting it on the following basis:

  • We want to maintain Cougar's key selling point - transport independence with the interface wholely defined in an IDD

However, a recent request for filters was made in the context of providing tracing integration with Zipkin, and although our stance remained the same to begin, we soon realised that there was scope for a filters capability which provides protocol specific implementations of transport independent features (as opposed to protocol specific features).

Andre Pinto has put together a sketch of what a filters capability for Cougar might look like, which provides hooks for pre- and post- command execution.

Note, this gist doesn't try to address the idea of ExecutionContext extension, as I don't believe it is currently necessary and also to be a hard problem in terms of its implications.

Proposal

In addition to Andres sketch, which provides hooks for pre- and post- command execution, I'd like to suggest we rework execution context resolution such that plugins can provide alternate implementations from those currently supported in the transports (mostly in ExecutionContextFactory) which would enable more flexibility that current mechanism allow (for example you may want to resolve data from more than 2 headers for building request uuids).

This would entail the following changes:

  • Introduce an ExecutionContextBuilder, which provides a mutable interface for constructing an ExecutionContext
  • ExecutionContextFactory would delegate to this builder, but we'd remove from it the methods which take headers.
  • Introduce a new interface ExecutionContextResolver<In>(?), where each transport defines what type <In> is (c.f. IdentityTokenResolvers which provides 2 methods:
  • getComponents():ExecutionContextComponent[] - allows the resolver to declare what parts it can resolve
  • resolve(ExecutionContextBuilder builder, In request, ExecutionContextComponent[] allowed):void - requests that the resolver resolve those components it has been selected to resolve (if it does more or less there will be an error)
  • EC resolution will be setup at startup such that there is exactly one resolver for each part of the EC. Resolvers will always be called if they have been registered so that they have an option to error if they don't get enough - which would neatly affect health service calls too).
  • Would possibly need to have a ResolverFactory which returns the appropriate one for each transport. Should mandate that resolver used for one part of EC is the same for all transports.

Changes to be made alongside this

  • Rename ExecutionContextWithTokens to DehydratedExecutionContext - makes it obvious that we're hydrating when we resolve ID tokens to IDs rather than mutating EC and I never liked the name anyway. Perhaps rework so both this and EC inherit from a base iface.
  • Rework tracer and ID token resolution into filters / EC resolvers
  • Provide default resolver which resolves EC in the way we currently do now

In-process calls

  • Introduce an InProcessExecutable, which acts as both client and server transport. Make default namespace on clients "InProcess" or similar, and then when they initialise, make a call to ensure that the InProcessExecutable is registered for the service interface that instance represents (ie register if not already done so). This would ensure we have a new request uuid etc for in process calls.

Client support

  • Something similar to the IdentityTokenResolver's rewrite() method, but we would need a seperate set of resolvers per client. Perhaps have a default set of resolvers that all clients use and then a seperate list of overrides to construct the specific set with.

Applicability to Zipkin

This details how the Zipkin integration would work in the context of this proposal:

  • Zipkin would provide a ZipkinRequestUUID iface which extends from RequestUUID which provides access to numeric ids.
  • Zipkin would provide a ZipkinRequestUUIDGenerator iface which extends from RequestUUIDGenerator which constructs extended uuids it needs.
  • At startup the plugin would validate that the configured generator implemented this interface.
  • Plugin would provide an EC resolver providing resolution for the request uuid. This would obtain the extra headers containing the numeric ids and then construct a zipkin uuid using the configured generator. If it was called and it was not requested to construct the uuid then it could error or warn as appropriate.
  • Similarly it would provide an EC writer for the client, we'll have to work something out so it's easy to register default writers for all clients so zipkin can be used for all clients without having to register with each individually.

Assuming that the tracer SPI has been completed, zipkin would also provide a Tracer implementation, and the plugin's spring context could auto-wire itself in as a tracer into the configured CompoundTracer.

@andredasilvapinto
Copy link

What do you practically mean with "not dealing with ExecutionContext extension"? Does that mean we are not supposed to try to solve the request scoped data storage problem in a generic way with this solution?

What is in practice an ExecutionContextComponent and how is it represented?

requests that the resolver resolve those components it has been selected to resolve (if it does more or less there will be an error)

How would you implement this constrain?

Zipkin would provide a ZipkinRequestUUIDGenerator iface which extends from RequestUUIDGenerator [I assume you are talking about UUIDGenerator] which constructs extended uuids it needs

If we already have a ZipkinRequestUUID with numeric ids why would Zipkin need to extend UUIDGenerator if it can't benefit from any of this interface's methods? validateUuid is inadequate if we consider Zipkin uses separate and numeric ids, not merged strings, and getNextUUID is not really about Zipkin data but a Cougar requirement for UUID which Zipkin doesn't use.

Is it just because of the cougar.client.uuidgenerator property? In that case it will not solve the problems I referred on the email (someone might just implement a cougar.client.uuidgenerator not compatible with Zipkin) and will still make it impossible for any Cougar that uses Zipkin to have a different system to generate the UUID (which Zipkin actually doesn't enforce in any way). The first problem could be solved by removing the support for a configurable cougar.client.uuidgenerator but that would break backwards compatibility, the second one seems impossible to solve if the Zipkin generator must absolutely be an UUIDGenerator. If you really want to implement this on RequestUUID then wouldn't it be better to handle Zipkin ids generation only in ZipkinRequestUUID (or a non-configurable class used by this class) and delegate the UUID to a traditional Cougar generator set by cougar.client.uuidgenerator? In my opinion, it seems that you are forcing Zipkin data into a Cougar construction when it doesn't really fit in there.

Also, even though you have the ZipkinRequestUUID, this doesn't solve the problem of accessing the numeric ids via the provided interface on ExecutionContext, as it only returns the RequestUUID which only supports Strings. So still reflection or casting would be required. Additional validations might grant partial integrity for these expectations to actually not trigger runtime errors, but still it is only runtime validation, not enforced by compile time contracts and possibly more complex and harder to maintain.

I think your proposal has the right intentions to bring flexibility to Cougar and it is surely nice to be able to define how the ExecutionContext primitives are actually built in a composable way, but, in practice, it will maintain the current request scoped data limitation, the casts/reflection requirement for Zipkin, the uuidgenerator inadequacy and the isolation problem between UUID and Zipkin Ids generation. It will still require the additional Filters constructions to access the data, while removing from them the needed control over the data they use, and end up trimming the flexibility to the small number of components already thought for a generic Cougar implementation and, even those, only under the inflexible constrains of their interfaces.

I appreciate your effort in trying to bring flexibility and improvement while keeping backwards compatibility in the ExecutionContext but I'm afraid any solution that doesn't actually address the ExecutionContext will come out as limited, contrived and unnecessarily complex. In any case, as I said before, if we really can't change the ExecutionContext and this is the way to go in order to implement Zipkin, then let's do it.

P.S.: I didn't understand InProcessExecutable role in all this. Could you clarify please?

@eswdd
Copy link
Author

eswdd commented Aug 15, 2014

What do you practically mean with "not dealing with ExecutionContext extension"? Does that mean we are not supposed to try to solve the request scoped data storage problem in a generic way with this solution?

Correct.

What is in practice an ExecutionContextComponent and how is it represented?

See DehydratedExecutionContextComponent - it is a property of a DehydratedExecutionContext

How would you implement this constrain?

See BaseExecutionContextBuilder

Regarding Zipkin and RequestUUIDs, there is rational behind forcing one to use the other:

  1. Logically I see Zipkins numeric ids as request ids, I don't think that's a hard case to argue.
  2. RequestUUID is already a core Cougar concept, and so must always be present in an ExecutionContext
  3. Because it's a core Cougar concept, sub-uuids are already handled in various places (e.g. in the client where sub-uuids are created). Zipkin also needs sub-uuids and so needs handling at the same points.
  4. Because the string formatted ids are required for human consumers, we still need to produce RequestUUID instances that provide those strings.
    Certainly, it may make more sense for a Zipkin generator to call the standard one, rather than extend, which as you say gives people the flexibility to change the string format without breaking Zipkin.

So still reflection or casting would be required.

Casting I would presume. This was anticipated and expected.

Additional validations might grant partial integrity for these expectations to actually not trigger runtime errors, but still it is only runtime validation, not enforced by compile time contracts and possibly more complex and harder to maintain.

I do not see this as a concern. So much of what happens in Cougar is verified at runtime, and at least with this proposal I can ensure it's done at startup rather than request time.

I think your proposal has the right intentions to bring flexibility to Cougar and it is surely nice to be able to define how the ExecutionContext primitives are actually built in a composable way, but, in practice, it will maintain the current request scoped data limitation, the casts/reflection requirement for Zipkin, the uuidgenerator inadequacy and the isolation problem between UUID and Zipkin Ids generation.

Maintenance of that limitation is an express purpose of this proposal, I still contest that Zipkin doesn't need to break out of that limitation, and that the limitation is a long-held feature of Cougar, albeit one that most consumers (certainly within Betfair) don't see.

It will still require the additional Filters constructions to access the data, while removing from them the needed control over the data they use, and end up trimming the flexibility to the small number of components already thought for a generic Cougar implementation and, even those, only under the inflexible constrains of their interfaces.

Yes it will, although for Zipkin, the tracing spi will suffice. It doesn't quite limit the flexibility to the components already thought, but more to the capabilities which can be built with the data which is standard. As I've already proven, Zipkin can be integrated without changing these. I'm not convinced that the flexibility outweighs the benefits of the inflexibility at this time.

I appreciate your effort in trying to bring flexibility and improvement while keeping backwards compatibility in the ExecutionContext but I'm afraid any solution that doesn't actually address the ExecutionContext will come out as limited, contrived and unnecessarily complex.

So be it. I disagree on "unnecessarily complex", for the reasons stated above, but I understand why you see it that way.

In any case, as I said before, if we really can't change the ExecutionContext and this is the way to go in order to implement Zipkin, then let's do it.

That's my position for now.

P.S.: I didn't understand InProcessExecutable role in all this. Could you clarify please?

On the original issue, Richard mentioned that we need to treat in-process calls the same as out of process calls, this Executable provides the hook to enable us to do this (a failing in Cougar as it stands).

@andredasilvapinto
Copy link

Hi Simon,

I've started trying to integrate Zipkin (through Brave) on your SPI and I have two questions:

  • how should we handle sampling? i.e. deciding if a specific request will or will not be traced? A request shouldn't be partially traced, so the decision of whether to enable tracing or not should be preserved somewhere for each request. We have a traceLoggingEnabled() method on the ExecutionContext, but we don't have anyone for Zipkin. A possible workaround might be to use the value of null in one of the mandatory Zipkin ids (trace id or span id) to represent this situation. It lacks some semantics, but it would work.
  • Brave API requires some data to describe the host emitting the annotation ( https://github.com/kristofa/brave/blob/aa03bf2da1bb1a895504fc0130d5c58a76c70580/brave-interfaces/src/main/java/com/twitter/zipkin/gen/Endpoint.java ) which I don't know if Cougar provides:
    • IP (IPv4)
    • Port
    • Service Name

Do we have any easy access to these values through the Cougar API? The IP we can obtain easily through java.net.InetAddress as long as we are using a simple network configuration (not multiple network interfaces/multiple IPs), but the other ones are more tricky. Maybe we can use the Hostname as the service name, in which case it is also available through the same class. The port, I don't really know how to obtain, do you? The host is an optional field for each annotation, but I think it kind of defeats the purpose to lack this information.

@andredasilvapinto
Copy link

I would also like to better understand the relation between your RequestUUID components and the Zipkin ones.

Till now I have thought the relation was this:
rootUUID -> traceId
localUUID -> spanId
parentUUID -> parentSpanId

However that doesn't match the interface documentation:

    /**
     * Get the root component of this uuid. Returns null if there is none.
     */
    String getRootUUIDComponent();
    /**
     * Get the parent component of this uuid. Returns null if there is none.
     */
    String getParentUUIDComponent();
    /**
     * Get the local component of this uuid. Always returns a valid string.
     */
    String getLocalUUIDComponent();

Zipkin's traceId and spanId are mandatory fields (when tracing is enabled for the request). The only nullable field is the parentSpanId as the current service might be the starting point of the request when hitting our infrastructure, in which case there is no parent span.

@eswdd
Copy link
Author

eswdd commented Aug 19, 2014

how should we handle sampling? i.e. deciding if a specific request will or will not be traced? A request shouldn't be partially traced, so the decision of whether to enable tracing or not should be preserved somewhere for each request. We have a traceLoggingEnabled() method on the ExecutionContext, but we don't have anyone for Zipkin.

My expectation is that Zipkin tracing would be triggered by traceLoggingEnabled() too. If you extend AbstractTracer you'll get this for free.

Zipkin's traceId and spanId are mandatory fields (when tracing is enabled for the request). The only nullable field is the parentSpanId as the current service might be the starting point of the request when hitting our infrastructure, in which case there is no parent span.

I would expect that if there is no parent and tracing enabled then the local component becomes the new root. In theory it should never happen if every service were using Zipkin within an estate, but in reality it will occur for a period where rollout is occurring. That said, I expect the Zipkin holder will contain it's own ids and will resolve those seperately, you won't have the hook to do this until I get far enough along the EC resolution.

Will dig around for the other data this eve/later/when I get a chance.

@andredasilvapinto
Copy link

My expectation is that Zipkin tracing would be triggered by traceLoggingEnabled() too. If you extend AbstractTracer you'll get this for free.

The name is a little bit misleading then, but, in any case, the sampling logic, which would set that field, would reside somewhere in the ExecutionContext building process you will construct, right?

I would expect that if there is no parent and tracing enabled then the local component becomes the new root. In theory it should never happen if every service were using Zipkin within an estate, but in reality it will occur for a period where rollout is occurring. That said, I expect the Zipkin holder will contain it's own ids and will resolve those seperately, you won't have the hook to do this until I get far enough along the EC resolution.

If there is no traceId we can't immediately assume it is the starting point. It can be the nth RPC in our infrastructure of a not sampled request. We need to find a way to differentiate between not having a traceId because we didn't create one yet, and not having a traceId because it was already decided that this request won't be traced across our infrastructure. This may require adding a new transport field, or picking a value to one of the existent ones to represent the not sampled request (e.g. traceId present with no value -> not sampled, no traceId field at all -> starting point).
Also, what do you mean by picking the local UUID? Parsing Cougar's UUID through the {cougar.transport.uuidgenerator}.validateUuid() and get the 3rd element of the array? That won't conform with Zipkin's format restrictions.

Will dig around for the other data this eve/later/when I get a chance.

thanks

@eswdd
Copy link
Author

eswdd commented Aug 19, 2014

Ip & port should presumably be the ones the request was made to? in which case they could be obtained from the request?

The service, that's hard. If it's a simple request (ie non-batch json rpc or rescript or soap or socket) then there will be the name of a service interface, although it won't necessarily be available when the request is initiated (but if the data can be stored somewhere then we can get it out later). Unless the zipkin idea of service is different? In which case maybe hostname is fine. Do you know what is meant by "service" in zipkin?

@andredasilvapinto
Copy link

Ip & port can be obtained from the request, but we don't have access to the request in the Tracer methods (only the RequestUUID and, sometimes, the ExecutionContext).

Regarding the service name:

The name of the service handling the RPC
https://twitter.github.io/finagle/docs/com/twitter/finagle/zipkin/thrift/Span.html

and it should be the same on the client and server span:
openzipkin/brave#18
https://groups.google.com/d/topic/zipkin-user/Q_EZp3pQXk4/discussion

so ideally it shouldn't be the hostname as it is quite common to have the client directing the RPCs to an intermediate network layer and not to a specific host, in which case the client wouldn't know the hostname of the machine that will process its request.

@eswdd
Copy link
Author

eswdd commented Aug 24, 2014

The name is a little bit misleading then, but, in any case, the sampling logic, which would set that field, would reside somewhere in the ExecutionContext building process you will construct, right?

Yes, misleading, but legacy, we can add a comment to the EC that this is no longer just about logging. Yes, the sampling logic would go in a zipkin specific resolver in that EC building process.

@eswdd
Copy link
Author

eswdd commented Aug 24, 2014

If there is no traceId we can't immediately assume it is the starting point. It can be the nth RPC in our infrastructure of a not sampled request. We need to find a way to differentiate between not having a traceId because we didn't create one yet, and not having a traceId because it was already decided that this request won't be traced across our infrastructure. This may require adding a new transport field, or picking a value to one of the existent ones to represent the not sampled request (e.g. traceId present with no value -> not sampled, no traceId field at all -> starting point).

Could we make some assumptions - namely that if someone has passed us a cougar request uuid (which we expect to be sent always by cougar clients - ie internal clients) then they're also zipkin aware, in which case we can build a little truth table based on tracingEnabled, requestuuid sent and zipkin root sent? If not, having a new field for zipkin is not unreasonable, or we can change how we resolve X-Trace-Me (at the moment non-null = trace me, but now is an ideal time to change as I think there's very little actually using tracing, or we can choose a new header name and intentionally not pass tracing enabled to older servers).

Also, what do you mean by picking the local UUID? Parsing Cougar's UUID through the {cougar.transport.uuidgenerator}.validateUuid() and get the 3rd element of the array? That won't conform with Zipkin's format restrictions.

I meant Zipkin's locally generated id.

@eswdd
Copy link
Author

eswdd commented Aug 27, 2014

Ip & port can be obtained from the request, but we don't have access to the request in the Tracer methods (only the RequestUUID and, sometimes, the ExecutionContext).

But you do in the context resolver. Seems to me that context resolution (or in this case uuid resolution) should populate all info required by zipkin.

so ideally it shouldn't be the hostname as it is quite common to have the client directing the RPCs to an intermediate network layer and not to a specific host, in which case the client wouldn't know the hostname of the machine that will process its request.

We can use servicename/version which is a part of our operation identifier. the only q then is what to do in the case of json-rpc or any other future batch transport and which could call multiple services in the same request.. Then we just need to work out where we need to add a hook (unless we pass extra params to the start method (which is not unreasonable))

@andredasilvapinto
Copy link

Sorry for the delay, I was on holidays.

Could we make some assumptions - namely that if someone has passed us a cougar request uuid (which we expect to be sent always by cougar clients - ie internal clients) then they're also zipkin aware

They may not be zipkin aware if they run old cougar versions without zipkin support. I think if we don't want to add a new header then we need to rely on having some reserved value for some zipkin specific header (like the null hypothesis I stated before). Otherwise we can add a new one similar to "X-Trace-Me" like you suggested. I think reusing the existent one can be dangerous as the sampling decisions of the old cougars can be inadequate for zipkin (e.g. too much data being sampled). It would be a possibility if we knew that no one was using it anywhere, but I don't know if that is true.

I meant Zipkin's locally generated id.

OK. Then why is the root UUID a nullable field on the RequestUUID interface while the local one is not?

But you do in the context resolver. Seems to me that context resolution (or in this case uuid resolution) should populate all info required by zipkin.

Yes, but where should it store that info so it is available when emitting traces?

@andredasilvapinto
Copy link

I have some additional questions:

Considering that:

Certainly, it may make more sense for a Zipkin generator to call the standard one, rather than extend, which as you say gives people the flexibility to change the string format without breaking Zipkin.

if we are going to have ZipkinRequestUUIDImpl include the traditional Cougar UUID besides its own IDs, what should getUUID() return?

  • traditionalD (ignoring the zipkin specific fields)?
  • traceID:parentSpanID:spanID (ignoring the traditional generator specified by cougar.transport.uuidgenerator)?
  • traditionalID:traceID:parentSpanID:spanID (creating a new format with 4 instead of 3 components, incompatible with the old one)?

and what about the readExternal and writeExternal methods? which fields should they (de)serialize? the same question for hashCode and equals: should we just consider the traditional cougar ID or check all the other Zipkin specific data?

@andredasilvapinto
Copy link

I've just committed what I have done till now, maybe that will help in exploring the limitations of the design I'm referring to on the previous posts.

andredasilvapinto/cougar@0607d31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment