
@codefromthecrypt
Created July 18, 2017 11:02
UUID is a 128bit ID like a square is a rectangle

I've been party to some pretty interesting conversations around Zipkin's ID format and UUIDs. It seems worth sharing, since some of this stuff is neat, if geeky.

So, we love UUIDs! Many people use UUID format to pass around things unlikely to ever clash. These are used for long-term retrieval and other handy things. A lot of correlation IDs are UUIDs, too.

What about distributed tracing? Trace IDs should be UUIDs, too! To this I answer... maybe?

Firstly, there are a lot of systems inspired by the Dapper paper, which hints at how to put a system together with probabilistically random 64bit IDs. Turns out folks have done this. For example, Zipkin started with 64bit trace IDs, and now supports 128bi<<< HEY USE UUID!!! ok ok I'll get to that.

Zipkin used the random 64bit thing everywhere. The 64bit part eased data transport and storage concerns, since it is fixed width. The random part was used for consistent sampling algorithms. Basically, a simplifying assumption carried all over.
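To make the "consistent sampling" point concrete, here is a minimal Python sketch. It assumes a hypothetical `should_sample` function that keys off the low 64 bits of the trace ID. This is not Zipkin's actual sampler. It only shows why random bits matter: every service can derive the same decision from the ID alone, with no coordination.

```python
import secrets

# Hypothetical sketch, not Zipkin code: a "consistent" sampler decides from
# the trace ID alone, so every service that sees the same ID makes the same
# choice. This only spreads traces evenly if the ID bits are random.

def new_trace_id_64() -> int:
    """Return a random 64-bit trace ID as an unsigned int."""
    return secrets.randbits(64)

def should_sample(trace_id: int, rate: float) -> bool:
    """Keep roughly `rate` (0.0 to 1.0) of traces, keyed on the low 64 bits."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Map the low 64 bits onto [0, 10000) and compare against a boundary.
    boundary = int(rate * 10_000)
    return (trace_id & 0xFFFF_FFFF_FFFF_FFFF) % 10_000 < boundary

trace_id = new_trace_id_64()
# Three "hops" looking at the same ID with the same rate reach one decision.
decisions = {should_sample(trace_id, 0.1) for _hop in range(3)}
print(len(decisions))  # 1
```

If the IDs were sequential or clustered instead of random, the same modulo trick would sample some time windows or hosts far more heavily than others.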

The encoding of an identifier was a work in progress. For example, trace identifiers are commonly stored as 64bit numbers. The reason is not that they mean anything as numbers. It is that 64 bits is a convenient fixed width. At first, numeric encoding was used in things like headers and logs. However, people would encode this as a signed number and then miss the negative sign when copy/pasting it, losing time (try it!). Or they would get stuck because you can't reliably fit a 64bit number in a JSON number field. Many parsers, JavaScript's included, use doubles that only hold 53 bits of integer precision. So at the end of it all, a fixed-width hex encoding ended up best.
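Here is an illustrative Python sketch of the encoding trouble. The helper names (`to_signed_decimal`, `to_hex16`, `from_hex16`) are hypothetical and are not Zipkin APIs. The sketch takes the same 64-bit ID and renders it as a signed decimal and as fixed-width hex. It then round-trips the ID through a JSON number parsed as a double, which is what a JavaScript client would do.

```python
import json

# Illustrative only: one 64-bit ID rendered different ways.

def to_signed_decimal(trace_id: int) -> str:
    """Decimal as a signed 64-bit integer (e.g. a Java long) would print it."""
    signed = trace_id - (1 << 64) if trace_id >= (1 << 63) else trace_id
    return str(signed)

def to_hex16(trace_id: int) -> str:
    """Fixed-width, lowercase, zero-padded 16-character hex."""
    return format(trace_id, "016x")

def from_hex16(text: str) -> int:
    """Parse 16 lowercase hex characters back into an unsigned int."""
    if len(text) != 16 or any(c not in "0123456789abcdef" for c in text):
        raise ValueError("expected 16 lowercase hex characters")
    return int(text, 16)

trace_id = 0xA1B2C3D4E5F60718  # high bit set, so the signed decimal is negative
print(to_signed_decimal(trace_id))  # starts with '-', easy to drop on copy/paste
print(to_hex16(trace_id))           # a1b2c3d4e5f60718

# Simulate a parser that reads JSON integers as IEEE 754 doubles, which hold
# integers exactly only up to 2**53.
as_double = json.loads(json.dumps(trace_id), parse_int=float)
print(int(as_double) == trace_id)   # False: the low bits were rounded away
```

The hex form has no sign, always has the same length, and travels as a JSON string. That sidesteps all three problems.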

At sites like Twitter, you can collide on 64bit IDs. IIRC, someone on the front-end team said it could happen on the order of hours. There are ways to get around this. Then there's long-term storage. Zipkin trace data is typically kept for only a matter of days. If you wanted to archive traces, though, you are more likely to eventually clash (even if a clash is still unlikely). In real life, though, the best reason for an ID larger than 64bit is interop with a system that is 128bit. Zipkin was inspired by Google's Dapper, and a relative, Google Stackdriver, used 128bit trace IDs. Passing these through a Zipkin system without losing bits is what tipped the scales towards 128bi<<< HEY USE UUID!!! ok ok I'll get to that.
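Why can random 64bit IDs collide at all? The birthday problem. Here is a back-of-the-envelope Python sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / 2N) for n uniformly random IDs drawn from N = 2^bits values. The numbers it prints are illustrative. They are not measurements from Twitter or any other site.

```python
import math

# Birthday-bound estimate, assuming IDs are uniformly random.

def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance that at least two of `num_ids` random IDs collide."""
    space = 2 ** bits
    # p ~= 1 - exp(-n(n-1) / 2N); expm1 keeps precision when p is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))

for n in (10**6, 10**9, 2**32, 10**10):
    print(f"{n:>14,} IDs  64-bit: {collision_probability(n, 64):.3g}"
          f"  128-bit: {collision_probability(n, 128):.3g}")
```

Around 2^32 (about 4.3 billion) random 64bit IDs, the chance of at least one collision is already close to 40%. At the same volume, 128bit IDs are astronomically safer. That is why archiving traces long-term changes the math.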

So in Zipkin, we had to grow to transport and store 128bits of trace ID, even if we didn't necessarily need 128bits of uniqueness. Long story short, the easiest way to handle this was to double the encoding: instead of 16 hex characters, use 32. A system could "downgrade" by ignoring the left-most 16 characters and still work well enough. <<< HEY USE UUID!!! ok ok I'll get to that.
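Here is a small Python sketch of that downgrade. It assumes a hypothetical `downgrade_to_64` helper, not an actual Zipkin function. A 128bit ID is 32 lowercase hex characters, and a 64bit-only system keeps the right-most 16 (the low 64 bits). The example ID is arbitrary.

```python
# Hypothetical helpers for the "double the hex" approach: a 64-bit-only
# system keeps the right-most 16 characters and ignores the left-most 16.

def downgrade_to_64(trace_id_hex: str) -> str:
    """Return the low 64 bits (right-most 16 hex chars) of a trace ID."""
    if len(trace_id_hex) == 32:
        return trace_id_hex[16:]
    if len(trace_id_hex) == 16:
        return trace_id_hex
    raise ValueError("trace ID must be 16 or 32 hex characters")

def same_trace(a: str, b: str) -> bool:
    """Compare IDs when one side may only know the low 64 bits."""
    if len(a) == len(b):
        return a == b
    return downgrade_to_64(a) == downgrade_to_64(b)

trace_128 = "463ac35c9f6413ad48485a3953bb6124"
trace_64 = downgrade_to_64(trace_128)
print(trace_64)                         # 48485a3953bb6124
print(same_trace(trace_128, trace_64))  # True
print(trace_128.endswith(trace_64))     # True: a plain suffix match works too
```

Since the 64bit ID is literally a suffix of the 128bit one, even a grep over logs finds both forms.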

So, we don't use UUID format because it doesn't actually fit the problem. For one, it re-introduces a copy/paste problem, this time with hyphens (copy/paste one and try). Even if you think that's weak, it would also break what we were trying to be compatible with, which was 32-character hex encoding. Even if you think that's weak, remember that UUID has a format, and random is only one of its versions. If you put fully random 128bits into a UUID, you will only match the random (version 4) format by accident, about 1 time in 64, since six of its bits are fixed. Even if you think that's weak, it is less straightforward to downgrade a UUID into just 16 hex characters (even if it is only a matter of stripping hyphens first). Even if you think that's weak, a string match of 16 hex chars works against an ID of 32 hex chars, but not against a UUID. There are more reasons, but I'm bored typing at this point.
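Here is a Python sketch that puts a few of these points side by side. It uses only the standard `uuid` module. `looks_like_uuid4` is a hypothetical helper name. The sketch renders the same random 128 bits as 32 hex characters and as a UUID string, tries the 16-character suffix match on both, and counts how many of the 64 possible settings of the fixed version and variant bits look like a version 4 UUID.

```python
import secrets
import uuid

# Sketch: the same 128 random bits as raw hex vs. UUID text.

def looks_like_uuid4(value: int) -> bool:
    """True if the 6 fixed UUIDv4 bits line up: version 4, RFC 4122 variant."""
    u = uuid.UUID(int=value)
    return u.variant == uuid.RFC_4122 and u.version == 4

raw = secrets.randbits(128)
hex32 = format(raw, "032x")        # 32 lowercase hex chars, no separators
as_uuid = str(uuid.UUID(int=raw))  # 8-4-4-4-12 groups joined by hyphens

low16 = hex32[-16:]
print(hex32.endswith(low16))                    # True
print(as_uuid.endswith(low16))                  # False: a hyphen splits the low 64 bits
print(as_uuid.replace("-", "")[-16:] == low16)  # True, but only after stripping

# Try every combination of the 4 version bits (76-79) and the 2 top variant
# bits (62-63); only one of the 64 matches the version 4 layout.
matches = sum(
    looks_like_uuid4((version << 76) | (variant << 62))
    for version in range(16)
    for variant in range(4)
)
print(matches, "of 64")  # 1 of 64
```

Conversely, a real `uuid.uuid4()` only carries 122 random bits. The other six are spent on the format.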

The main thing is that context matters when choosing an ID format for a system. UUIDs are great, and yes, they carry 128bits (mostly). People know what UUIDs are (mostly). Both of those can be true without it being universally true that all 128bit IDs are, or must be, UUIDs. Hope this was fun!
