Skip to content

Instantly share code, notes, and snippets.

@eswdd
Last active August 29, 2015 14:05
Show Gist options
  • Save eswdd/44aba96d4ce46e89f79a to your computer and use it in GitHub Desktop.
Save eswdd/44aba96d4ce46e89f79a to your computer and use it in GitHub Desktop.
Cougar filters - detailed sketch
/**
* Base utils for building ExecutionContexts
*/
public abstract class BaseExecutionContextBuilder<T extends BaseExecutionContextBuilder> {
private BitSet whatsSet = new BitSet(MAX_BASE_BITS+getNumSpecificComponents());
protected GeoLocationDetails location;
protected RequestUUID uuid;
protected Date receivedTime;
protected Date requestTime;
protected boolean traceLoggingEnabled;
protected int transportSecurityStrengthFactor;
protected abstract int getNumSpecificComponents();
protected final static int MAX_BASE_BITS = 6;
public T setLocation(GeoLocationDetails location) {
this.location = location;
set(0);
return (T) this;
}
public T setRequestUUID(RequestUUID uuid) {
this.uuid = uuid;
set(1);
return (T) this;
}
public T setReceivedTime(Date receivedTime) {
this.receivedTime = receivedTime;
set(2);
return (T) this;
}
public T setRequestTime(Date requestTime) {
this.requestTime = requestTime;
set(3);
return (T) this;
}
public T setTraceLoggingEnabled(boolean traceLoggingEnabled) {
this.traceLoggingEnabled = traceLoggingEnabled;
set(4);
return (T) this;
}
public T setTransportSecurityStrengthFactor(int factor) {
transportSecurityStrengthFactor = factor;
set(5);
return (T) this;
}
protected void beenSet(int subComponent) {
set(MAX_BASE_BITS + subComponent);
}
private void set(int bit) {
if (whatsSet.get(bit)) {
throw new IllegalStateException("Component has already been set");
}
whatsSet.set(bit);
}
protected void checkReady() {
if (whatsSet.nextClearBit(0) <= getNumSpecificComponents()) {
throw new IllegalStateException("Not all components have been set on this execution context");
}
}
}
public enum DehydratedExecutionContextComponent {
Location,
IdentityTokens,
RequestUuid,
ReceivedTime,
RequestedTime,
TraceLoggingEnabled,
TransportSecurityStrengthFactor
}

Requirements

  • ExecutionContext must remain immutable once passed to the ExecutionVenue
  • Maintain transport independence/agnosticism for services:
  • Execution context interface must be the same for all transports (within a paradigm)
  • Filters must support all transports within a paradigm (including in-process)

Aims

  • Enable Zipkin integration (tracer calls and resolution of ids)
  • Server transports initially, but will need to consider client (at least for zipkin)
  • Cleanup of existing transport code would be a bonus
  • Provides the option to change how the ExecutonContext is resolved
  • Doesn't try to deal with ExecutionContext extension

Context

Cougar users have long been requesting a filters capability that would allow them access to the http request (in the case of Jetty transport) and a bucket o' values in the ExecutionContext to put stuff, and for just as long we've been rejecting it on the following basis:

  • We want to maintain Cougar's key selling point - transport independence with the interface wholely defined in an IDD

However, a recent request for filters was made in the context of providing tracing integration with Zipkin, and although our stance remained the same to begin, we soon realised that there was scope for a filters capability which provides protocol specific implementations of transport independent features (as opposed to protocol specific features).

Andre Pinto has put together a sketch of what a filters capability for Cougar might look like, which provides hooks for pre- and post- command execution.

Note, this gist doesn't try to address the idea of ExecutionContext extension, as I don't believe it is currently necessary and also to be a hard problem in terms of its implications.

Proposal

In addition to Andres sketch, which provides hooks for pre- and post- command execution, I'd like to suggest we rework execution context resolution such that plugins can provide alternate implementations from those currently supported in the transports (mostly in ExecutionContextFactory) which would enable more flexibility that current mechanism allow (for example you may want to resolve data from more than 2 headers for building request uuids).

This would entail the following changes:

  • Introduce an ExecutionContextBuilder, which provides a mutable interface for constructing an ExecutionContext
  • ExecutionContextFactory would delegate to this builder, but we'd remove from it the methods which take headers.
  • Introduce a new interface ExecutionContextResolver<In>(?), where each transport defines what type <In> is (c.f. IdentityTokenResolvers which provides 2 methods:
  • getComponents():ExecutionContextComponent[] - allows the resolver to declare what parts it can resolve
  • resolve(ExecutionContextBuilder builder, In request, ExecutionContextComponent[] allowed):void - requests that the resolver resolve those components it has been selected to resolve (if it does more or less there will be an error)
  • EC resolution will be setup at startup such that there is exactly one resolver for each part of the EC. Resolvers will always be called if they have been registered so that they have an option to error if they don't get enough - which would neatly affect health service calls too).
  • Would possibly need to have a ResolverFactory which returns the appropriate one for each transport. Should mandate that resolver used for one part of EC is the same for all transports.

Changes to be made alongside this

  • Rename ExecutionContextWithTokens to DehydratedExecutionContext - makes it obvious that we're hydrating when we resolve ID tokens to IDs rather than mutating EC and I never liked the name anyway. Perhaps rework so both this and EC inherit from a base iface.
  • Rework tracer and ID token resolution into filters / EC resolvers
  • Provide default resolver which resolves EC in the way we currently do now

In-process calls

  • Introduce an InProcessExecutable, which acts as both client and server transport. Make default namespace on clients "InProcess" or similar, and then when they initialise, make a call to ensure that the InProcessExecutable is registered for the service interface that instance represents (ie register if not already done so). This would ensure we have a new request uuid etc for in process calls.

Client support

  • Something similar to the IdentityTokenResolver's rewrite() method, but we would need a seperate set of resolvers per client. Perhaps have a default set of resolvers that all clients use and then a seperate list of overrides to construct the specific set with.

Applicability to Zipkin

This details how the Zipkin integration would work in the context of this proposal:

  • Zipkin would provide a ZipkinRequestUUID iface which extends from RequestUUID which provides access to numeric ids.
  • Zipkin would provide a ZipkinRequestUUIDGenerator iface which extends from RequestUUIDGenerator which constructs extended uuids it needs.
  • At startup the plugin would validate that the configured generator implemented this interface.
  • Plugin would provide an EC resolver providing resolution for the request uuid. This would obtain the extra headers containing the numeric ids and then construct a zipkin uuid using the configured generator. If it was called and it was not requested to construct the uuid then it could error or warn as appropriate.
  • Similarly it would provide an EC writer for the client, we'll have to work something out so it's easy to register default writers for all clients so zipkin can be used for all clients without having to register with each individually.

Assuming that the tracer SPI has been completed, zipkin would also provide a Tracer implementation, and the plugin's spring context could auto-wire itself in as a tracer into the configured CompoundTracer.

@eswdd
Copy link
Author

eswdd commented Aug 19, 2014

how should we handle sampling? i.e. deciding if a specific request will or will not be traced? A request shouldn't be partially traced, so the decision of whether to enable tracing or not should be preserved somewhere for each request. We have a traceLoggingEnabled() method on the ExecutionContext, but we don't have anyone for Zipkin.

My expectation is that Zipkin tracing would be triggered by traceLoggingEnabled() too. If you extend AbstractTracer you'll get this for free.

Zipkin's traceId and spanId are mandatory fields (when tracing is enabled for the request). The only nullable field is the parentSpanId as the current service might be the starting point of the request when hitting our infrastructure, in which case there is no parent span.

I would expect that if there is no parent and tracing enabled then the local component becomes the new root. In theory it should never happen if every service were using Zipkin within an estate, but in reality it will occur for a period where rollout is occurring. That said, I expect the Zipkin holder will contain it's own ids and will resolve those seperately, you won't have the hook to do this until I get far enough along the EC resolution.

Will dig around for the other data this eve/later/when I get a chance.

@andredasilvapinto
Copy link

My expectation is that Zipkin tracing would be triggered by traceLoggingEnabled() too. If you extend AbstractTracer you'll get this for free.

The name is a little bit misleading then, but, in any case, the sampling logic, which would set that field, would reside somewhere in the ExecutionContext building process you will construct, right?

I would expect that if there is no parent and tracing enabled then the local component becomes the new root. In theory it should never happen if every service were using Zipkin within an estate, but in reality it will occur for a period where rollout is occurring. That said, I expect the Zipkin holder will contain it's own ids and will resolve those seperately, you won't have the hook to do this until I get far enough along the EC resolution.

If there is no traceId we can't immediately assume it is the starting point. It can be the nth RPC in our infrastructure of a not sampled request. We need to find a way to differentiate between not having a traceId because we didn't create one yet, and not having a traceId because it was already decided that this request won't be traced across our infrastructure. This may require adding a new transport field, or picking a value to one of the existent ones to represent the not sampled request (e.g. traceId present with no value -> not sampled, no traceId field at all -> starting point).
Also, what do you mean by picking the local UUID? Parsing Cougar's UUID through the {cougar.transport.uuidgenerator}.validateUuid() and get the 3rd element of the array? That won't conform with Zipkin's format restrictions.

Will dig around for the other data this eve/later/when I get a chance.

thanks

@eswdd
Copy link
Author

eswdd commented Aug 19, 2014

Ip & port should presumably be the ones the request was made to? in which case they could be obtained from the request?

The service, that's hard. If it's a simple request (ie non-batch json rpc or rescript or soap or socket) then there will be the name of a service interface, although it won't necessarily be available when the request is initiated (but if the data can be stored somewhere then we can get it out later). Unless the zipkin idea of service is different? In which case maybe hostname is fine. Do you know what is meant by "service" in zipkin?

@andredasilvapinto
Copy link

Ip & port can be obtained from the request, but we don't have access to the request in the Tracer methods (only the RequestUUID and, sometimes, the ExecutionContext).

Regarding the service name:

The name of the service handling the RPC
https://twitter.github.io/finagle/docs/com/twitter/finagle/zipkin/thrift/Span.html

and it should be the same on the client and server span:
openzipkin/brave#18
https://groups.google.com/d/topic/zipkin-user/Q_EZp3pQXk4/discussion

so ideally it shouldn't be the hostname as it is quite common to have the client directing the RPCs to an intermediate network layer and not to a specific host, in which case the client wouldn't know the hostname of the machine that will process its request.

@eswdd
Copy link
Author

eswdd commented Aug 24, 2014

The name is a little bit misleading then, but, in any case, the sampling logic, which would set that field, would reside somewhere in the ExecutionContext building process you will construct, right?

Yes, misleading, but legacy, we can add a comment to the EC that this is no longer just about logging. Yes, the sampling logic would go in a zipkin specific resolver in that EC building process.

@eswdd
Copy link
Author

eswdd commented Aug 24, 2014

If there is no traceId we can't immediately assume it is the starting point. It can be the nth RPC in our infrastructure of a not sampled request. We need to find a way to differentiate between not having a traceId because we didn't create one yet, and not having a traceId because it was already decided that this request won't be traced across our infrastructure. This may require adding a new transport field, or picking a value to one of the existent ones to represent the not sampled request (e.g. traceId present with no value -> not sampled, no traceId field at all -> starting point).

Could we make some assumptions - namely that if someone has passed us a cougar request uuid (which we expect to be sent always by cougar clients - ie internal clients) then they're also zipkin aware, in which case we can build a little truth table based on tracingEnabled, requestuuid sent and zipkin root sent? If not, having a new field for zipkin is not unreasonable, or we can change how we resolve X-Trace-Me (at the moment non-null = trace me, but now is an ideal time to change as I think there's very little actually using tracing, or we can choose a new header name and intentionally not pass tracing enabled to older servers).

Also, what do you mean by picking the local UUID? Parsing Cougar's UUID through the {cougar.transport.uuidgenerator}.validateUuid() and get the 3rd element of the array? That won't conform with Zipkin's format restrictions.

I meant Zipkin's locally generated id.

@eswdd
Copy link
Author

eswdd commented Aug 27, 2014

Ip & port can be obtained from the request, but we don't have access to the request in the Tracer methods (only the RequestUUID and, sometimes, the ExecutionContext).

But you do in the context resolver. Seems to me that context resolution (or in this case uuid resolution) should populate all info required by zipkin.

so ideally it shouldn't be the hostname as it is quite common to have the client directing the RPCs to an intermediate network layer and not to a specific host, in which case the client wouldn't know the hostname of the machine that will process its request.

We can use servicename/version which is a part of our operation identifier. the only q then is what to do in the case of json-rpc or any other future batch transport and which could call multiple services in the same request.. Then we just need to work out where we need to add a hook (unless we pass extra params to the start method (which is not unreasonable))

@andredasilvapinto
Copy link

Sorry for the delay, I was on holidays.

Could we make some assumptions - namely that if someone has passed us a cougar request uuid (which we expect to be sent always by cougar clients - ie internal clients) then they're also zipkin aware

They may not be zipkin aware if they run old cougar versions without zipkin support. I think if we don't want to add a new header then we need to rely on having some reserved value for some zipkin specific header (like the null hypothesis I stated before). Otherwise we can add a new one similar to "X-Trace-Me" like you suggested. I think reusing the existent one can be dangerous as the sampling decisions of the old cougars can be inadequate for zipkin (e.g. too much data being sampled). It would be a possibility if we knew that no one was using it anywhere, but I don't know if that is true.

I meant Zipkin's locally generated id.

OK. Then why is the root UUID a nullable field on the RequestUUID interface while the local one is not?

But you do in the context resolver. Seems to me that context resolution (or in this case uuid resolution) should populate all info required by zipkin.

Yes, but where should it store that info so it is available when emitting traces?

@andredasilvapinto
Copy link

I have some additional questions:

Considering that:

Certainly, it may make more sense for a Zipkin generator to call the standard one, rather than extend, which as you say gives people the flexibility to change the string format without breaking Zipkin.

if we are going to have ZipkinRequestUUIDImpl include the traditional Cougar UUID besides its own IDs, what should getUUID() return?

  • traditionalD (ignoring the zipkin specific fields)?
  • traceID:parentSpanID:spanID (ignoring the traditional generator specified by cougar.transport.uuidgenerator)?
  • traditionalID:traceID:parentSpanID:spanID (creating a new format with 4 instead of 3 components, incompatible with the old one)?

and what about the readExternal and writeExternal methods? which fields should they (de)serialize? the same question for hashCode and equals: should we just consider the traditional cougar ID or check all the other Zipkin specific data?

@andredasilvapinto
Copy link

I've just committed what I have done till now, maybe that will help in exploring the limitations of the design I'm referring to on the previous posts.

andredasilvapinto/cougar@0607d31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment