@enaeseth
Created June 12, 2013 19:29
Convert a MongoDB ObjectID to a valid, semantically similar UUID.
"""
Convert a MongoDB ObjectID to a version-1 UUID.

Python 2.7+ required for datetime.timedelta.total_seconds().

ObjectID:
    - UNIX timestamp (32 bits)
    - Machine identifier (24 bits)
    - Process ID (16 bits)
    - Counter (24 bits)

UUID 1:
    - Timestamp (60 bits)
    - Clock sequence (14 bits)
    - Node identifier (48 bits)
"""

import datetime
import uuid

from bson import objectid


class _UTC(datetime.tzinfo):
    ZERO = datetime.timedelta(0)

    def utcoffset(self, dt):
        return self.ZERO

    def tzname(self, dt):
        return 'UTC'

    def dst(self, dt):
        return self.ZERO


UTC = _UTC()

# Version-1 UUID timestamps count 100-nanosecond ticks since 1582-10-15.
UUID_1_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=UTC)
UUID_TICKS_PER_SECOND = 10000000
# Variant bits (binary 10) in the top two bits of the 16-bit clock field.
UUID_VARIANT_1 = 0b1000000000000000


def _unix_time_to_uuid_time(dt):
    return int((dt - UUID_1_EPOCH).total_seconds() * UUID_TICKS_PER_SECOND)


def objectid_to_uuid(oid):
    oid_time = oid.generation_time.astimezone(UTC)
    oid_hex = str(oid)

    # Characters 8-17 (zero-based) hold the machine identifier and process ID;
    # characters 18-23 hold the counter.
    machine_pid_hex = oid_hex[8:18]
    counter = int(oid_hex[18:], 16)

    # The leading '1' is the version nibble, followed by the 60-bit timestamp.
    timestamp_hex = '1%015x' % (_unix_time_to_uuid_time(oid_time))
    # Only the low 14 bits of the counter fit in the clock sequence.
    clock_hex = '%04x' % (UUID_VARIANT_1 | (counter & 0x3fff))
    node_hex = '%012x' % int(machine_pid_hex, 16)

    converted_uuid = uuid.UUID(
        '%s-%s-%s-%s-%s' % (
            timestamp_hex[-8:],  # time_low
            timestamp_hex[4:8],  # time_mid
            timestamp_hex[:4],   # version + time_hi
            clock_hex,
            node_hex
        )
    )

    assert converted_uuid.variant == uuid.RFC_4122
    assert converted_uuid.version == 1

    return converted_uuid


if __name__ == '__main__':
    oid = objectid.ObjectId()
    # Parenthesized single-argument print works on both Python 2.7 and 3.
    print(oid)

    oid_as_uuid = objectid_to_uuid(oid)
    print('{%s}' % oid_as_uuid)
@jdahlin

jdahlin commented Apr 24, 2015

Hi, could you put a license on this so it can be reused in applications?

Thanks

@perlun

perlun commented Nov 1, 2015

Agree. A Ruby implementation would also be totally cool. 😄

@nyov

nyov commented Nov 5, 2015

People don't get notified on gist comments AFAIK.

@enaeseth, would you consider adding a license to your MongoDB ObjectID to UUID Python code?
It's cool stuff. Thanks!

@samprakos

Am I correct to assume that this will produce different UUIDs for the same ObjectID over time?

@kevhill

kevhill commented Feb 17, 2017

@samprakos no, it will always produce the same UUID from a given ObjectID.

Three values are extracted from the ObjectID: the time the ObjectID was created, the machine identifier and process ID that created it, and the counter. Those three values are used to build a version-1 UUID deterministically.
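
To illustrate the determinism, here is a minimal sketch that mirrors the gist's field mapping but works on a 24-character hex string, so it runs without `bson`. The function name `objectid_hex_to_uuid`, the constant names, and the sample ObjectID are made up for this example; the epoch offset is the number of seconds between 1582-10-15 and 1970-01-01.

```python
import uuid

# Seconds between the UUID epoch (1582-10-15) and the UNIX epoch (1970-01-01).
UUID_EPOCH_OFFSET_SECONDS = 12219292800
UUID_TICKS_PER_SECOND = 10000000


def objectid_hex_to_uuid(oid_hex):
    """Hypothetical helper: same mapping as the gist, from a hex string."""
    unix_seconds = int(oid_hex[0:8], 16)
    machine_pid = int(oid_hex[8:18], 16)
    counter = int(oid_hex[18:24], 16)

    # Integer arithmetic only, so nothing depends on the current time.
    ticks = (unix_seconds + UUID_EPOCH_OFFSET_SECONDS) * UUID_TICKS_PER_SECOND

    return uuid.UUID(fields=(
        ticks & 0xffffffff,                  # time_low
        (ticks >> 32) & 0xffff,              # time_mid
        ((ticks >> 48) & 0x0fff) | 0x1000,   # version 1 + time_hi
        0x80 | ((counter >> 8) & 0x3f),      # variant + clock_seq_hi
        counter & 0xff,                      # clock_seq_low
        machine_pid,                         # node (top 8 bits left at zero)
    ))


# Made-up sample ObjectID; converting it twice gives the same UUID.
sample = '51b8c5f0e4b0a1b2c3d4e5f6'
assert objectid_hex_to_uuid(sample) == objectid_hex_to_uuid(sample)
```

Every input to the UUID comes from the ObjectID itself, which is why repeated conversions agree.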

@Benjamin-Dobell

Benjamin-Dobell commented Jul 2, 2018

Just a quick heads up that there are 3 fairly minor issues with this implementation.

  1. The 4 most significant bits of the timestamp are wasted.
  2. The multicast bit isn't set on the node field, which is a violation of the UUID v1 spec.
  3. The 10 most significant bits of MongoDB's "counter" are discarded.

Issue 1

In practice this isn't going to be a problem unless you're creating objects on a computer whose system clock is set later than 2491 AD, so I think it's reasonable enough to ignore that for now.

Issue 2

RFC 4122 indicates that when the node is not a MAC address, the multicast bit must be set: http://www.ietf.org/rfc/rfc4122.txt
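
As a rough sketch of what setting that bit looks like: the multicast bit is the least significant bit of the first octet of the 48-bit node, i.e. bit 40 when the node is treated as an integer. The names `MULTICAST_BIT` and `node_from_machine_pid` are made up for this example.

```python
# Least significant bit of the first (most significant) octet of the node.
MULTICAST_BIT = 1 << 40


def node_from_machine_pid(machine_pid):
    """Hypothetical helper: build a node value from the 40 machine/PID bits."""
    return MULTICAST_BIT | (machine_pid & 0xffffffffff)


node = node_from_machine_pid(0xe4b0a1b2c3)
assert '%012x' % node == '01e4b0a1b2c3'
```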

Issue 3

This depends on how your MongoDB driver uses the "counter" field. Most implementations simply increment it, in which case you'd need to insert more than 16,384 MongoDB documents in a second to run into a collision. However, that assumes the driver is simply incrementing this number; if it's not, you may run into issues sooner.
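
The 16,384 figure follows from the 14-bit mask the gist applies to the counter. A small sketch of the wraparound (the helper name `clock_seq_from_counter` is made up for this example):

```python
CLOCK_SEQ_MASK = 0x3fff  # 14 bits, as in the gist's `counter & 0x3fff`


def clock_seq_from_counter(counter):
    """Return the part of a 24-bit ObjectID counter that survives conversion."""
    return counter & CLOCK_SEQ_MASK


first = 0x000001
wrapped = first + (1 << 14)  # 16,384 increments later

# Two different counters in the same second map to the same clock sequence.
assert first != wrapped
assert clock_seq_from_counter(first) == clock_seq_from_counter(wrapped)
```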

Solution

  1. There's room to insert the additional 4 bits of the timestamp without any black magic; they fit right in the field. They're only omitted as a result of the chosen implementation.

  2. Easy enough, just set the bit.

  3. This is where things get a bit tricky. We need to find space for 10 additional bits in our UUID, and we've just consumed 1 additional bit above (the multicast bit).

We still have the 7 most significant bits of the node field free, so we can place 7 bits there.

MongoDB's timestamp only has a resolution of one second, whereas UUID v1 has a resolution of 100 nanoseconds. So there are actually 7 (least significant) bits free in the timestamp, and the remaining 3 bits can be placed there.

Of course, the semantics are somewhat altered if you place those last 3 bits in UUID's time field, so this may not be desirable in all circumstances. However, if you want completely lossless (and therefore reversible) conversion, then this is at least one possible solution.
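
One possible reading of this layout, as a hedged sketch rather than a reference implementation: it works on a 24-character hex string so it runs without `bson`. The function and constant names are made up, and the choice of which counter bits go to the node field versus the timestamp is an assumption, since the comment above does not specify it. The low 7 bits of the timestamp are free because 10,000,000 is a multiple of 128.

```python
import uuid

UUID_EPOCH_OFFSET_SECONDS = 12219292800  # 1582-10-15 to 1970-01-01
UUID_TICKS_PER_SECOND = 10000000         # divisible by 2**7
MULTICAST_BIT = 1 << 40


def objectid_hex_to_uuid_lossless(oid_hex):
    """Hypothetical encoder: keeps all 96 bits of the ObjectID."""
    unix_seconds = int(oid_hex[0:8], 16)
    machine_pid = int(oid_hex[8:18], 16)
    counter = int(oid_hex[18:24], 16)

    # Assumed split of the 24-bit counter: 3 + 7 + 14 bits.
    counter_top3 = (counter >> 21) & 0x7
    counter_mid7 = (counter >> 14) & 0x7f
    counter_low14 = counter & 0x3fff

    seconds = unix_seconds + UUID_EPOCH_OFFSET_SECONDS
    ticks = (seconds * UUID_TICKS_PER_SECOND) | counter_top3
    node = (counter_mid7 << 41) | MULTICAST_BIT | machine_pid

    return uuid.UUID(fields=(
        ticks & 0xffffffff,                 # time_low
        (ticks >> 32) & 0xffff,             # time_mid
        ((ticks >> 48) & 0x0fff) | 0x1000,  # version 1 + time_hi
        0x80 | (counter_low14 >> 8),        # variant + clock_seq_hi
        counter_low14 & 0xff,               # clock_seq_low
        node,
    ))


def uuid_to_objectid_hex(u):
    """Hypothetical decoder: reverses objectid_hex_to_uuid_lossless."""
    ticks = u.time
    unix_seconds = ticks // UUID_TICKS_PER_SECOND - UUID_EPOCH_OFFSET_SECONDS
    counter = ((ticks & 0x7) << 21) | ((u.node >> 41) << 14) | u.clock_seq
    machine_pid = u.node & 0xffffffffff
    return '%08x%010x%06x' % (unix_seconds, machine_pid, counter)


sample = '51b8c5f0e4b0a1b2c3d4e5f6'  # made-up ObjectID
assert uuid_to_objectid_hex(objectid_hex_to_uuid_lossless(sample)) == sample
```

As noted above, the timestamp read back from such a UUID can be off by up to 700 nanoseconds from the ObjectID's whole-second value, which is the price of the reversible encoding.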
