=== Background ===
For one issue at work, we tossed around the idea of generating UUIDs
client-side in order to ensure that we had a unique, immutable ID without
having to go to the server to get an ID back, primarily for p2p applications.
Obviously, if somebody's maliciously creating IDs, they can choose arbitrary
values and create collisions. Additionally, one client could tell whether another
client had already chosen a certain value for an ID by trying to commit a
certain UUID to the db.
For example:
> Client 1 creates a "todo" with a certain UUID.
> Client 2 tries to create a "todo" with the same UUID.
> Client 2 now has some information about client 1's actions.
I posed the purely academic question of how we might be able to send arbitrary
messages over that channel, and the below is a possible incomplete spec that
attempts to propose a basic solution. To be perfectly clear: This is an
interesting solution to a toy problem.
=== Model ===
Each client can create a record (like a todo) with a certain UUID in this space:
----
0: 00000000-0000-0000-0000-000000000000
1: 00000000-0000-0000-0000-000000000001
...
n: ffffffff-ffff-ffff-ffff-ffffffffffff
----
Assume that if an ID hasn't been used already, you get a 201 CREATED back
from the server. If an ID has been used, you get a 409 CONFLICT back. All
GETs will 404 NOT FOUND, so in order to tell whether an ID has been used,
you have to try to create a "todo" with that ID. The interesting bit is
that by reading whether or not another client has created a record with
a UUID, you inevitably end up creating one if it doesn't already exist.
We can imagine a table of sequential UUIDs and whether
they're taken.
----
uuid:   | ...0-0000001 | ...0-0000002 | ...0-0000003 | ...
taken?: |      no      |     yes      |     yes      |
----
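For concreteness, a probe like that might look something like the sketch below; the endpoint,
method, and payload are made up for illustration, and only the 201/409 semantics come from
the description above.
----
// Hypothetical probe primitive. Returns true if the UUID was already taken,
// but note that probing a free UUID unavoidably claims it.
async function probe(uuid: string): Promise<boolean> {
  const res = await fetch(`/todos/${uuid}`, { method: "PUT", body: "{}" });
  if (res.status === 201) return false; // we just created it: it was free
  if (res.status === 409) return true;  // it already existed
  throw new Error(`unexpected status ${res.status}`);
}
----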
We can model the space of taken vs. untaken UUIDs as
shared memory, where the address is the sequential index
of the UUID and the data is whether the record exists or not.
----
addr: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ...
data: | 0 | 1 | 1 | 0 | 1 | 1 | 0 | ...
----
The "create a record" operation can be viewed in this model as a joint
"read and write true" operation. The interesting thing about this operation
is that any time you read a byte, that byte gets overwritten with all 1s.
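To make the later sections easier to follow, here's a toy in-memory stand-in for that
primitive (a sketch only; a real client would implement readAndSet on top of the
create-record probe):
----
// Toy model of the UUID space as shared memory with destructive reads.
// Address i stands for the i-th sequential UUID; a cell is 1 iff a record
// with that UUID exists. Reading a cell unavoidably sets it to 1.
class UuidMemory {
  private cells: Uint8Array;
  constructor(size: number) {
    this.cells = new Uint8Array(size);
  }
  // The only primitive the server gives us: "read and write true".
  readAndSet(addr: number): number {
    const value = this.cells[addr];
    this.cells[addr] = 1;
    return value;
  }
}
----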
=== First Pass: Shared Message Store ===
The first attempt I'm going for uses a sled of 1s at the bottom of
the memory space, a data structure I'm calling a "shared message store" that
sits just beyond the sled of 1s, and a data heap that starts at the top of the
address space and grows downward.
The rough overview is that when a client polls for messages, it "slides down" the sled
of 1s to try to find the shared message store, reads the whole shared message store,
performs manipulations on it, and rewrites the shared message store just beyond where
the previous one ended (which is now filled with 1s due to the read). A more precise
specification follows:
To make it easier on ourselves, we're going to start with the
assumption that each client acts atomically. This is of course a
horrible assumption, but I'll try to address it later.
( Message Store Memory Layout )
[ 0     | 00000001  | 01000100 | 00101110  | 01001001 | 01101010 | ... ]
  start | msg_count | head_ptr | dest_addr | msg_ptr  | msg_size |
| ---------- header ---------- | ------- message record 1 ------ | -- message record 2 -..
( Main memory layout )
[ ones | shared message store | zeros | message_data ]
                                        ^
                                        head_ptr from message store
Glossary of fields:
start: The start bit, indicating the beginning of the message store
msg_count: The number of messages in the message store
head_ptr: The pointer to where the next message's data will be written.
Gets updated when a new message is added to the message store.
dest_addr: The destination address of the message
msg_ptr: The pointer to the message data for that message record
msg_size: The length of the message data for that message record
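One possible concrete encoding of these fields, as a sketch; the field widths below are my
own guesses, since the layout above only fixes the field order:
----
// Hypothetical field widths, in bits.
const MSG_COUNT_BITS = 8;
const HEAD_PTR_BITS = 8;
const RECORD_BITS = 8 + 8 + 8; // dest_addr + msg_ptr + msg_size

interface MessageRecord {
  destAddr: number; // address identifying the receiving client
  msgPtr: number;   // address of the message body in the data heap
  msgSize: number;  // length of the message body, in bits
}

interface MessageStore {
  headPtr: number;          // where the next message's data will be written
  records: MessageRecord[]; // one record per undelivered message
}
----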
-- Reading a message --
To read from the message store, read down the field of 1s until you find a zero,
then read the subsequent 2 bytes to get the message count. Since messages are
of a fixed size (2 bytes), we can now read the whole set of message records.
Once we've read the message store, the field of 1s extends through where it
was written. We therefore rewrite the message store just past the end of the
newly invalidated block, removing any messages addressed to this client.
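Sketched against the toy UuidMemory model and the guessed field widths from earlier, reading
might look roughly like this (readBits is an assumed helper; every bit it touches becomes a 1):
----
// Assumed helper: destructively read n bits starting at addr, MSB first.
function readBits(mem: UuidMemory, addr: number, n: number): number {
  let out = 0;
  for (let i = 0; i < n; i++) out = (out << 1) | mem.readAndSet(addr + i);
  return out;
}

// Slide down the sled of 1s, then read the whole store. Everything read is
// now 1s, so the caller must rewrite the store starting at `end`.
function readMessageStore(mem: UuidMemory): { store: MessageStore; end: number } {
  let addr = 0;
  while (mem.readAndSet(addr) === 1) addr++; // the first 0 is the start bit
  const msgCount = readBits(mem, addr + 1, MSG_COUNT_BITS);
  const headPtr = readBits(mem, addr + 1 + MSG_COUNT_BITS, HEAD_PTR_BITS);
  let cursor = addr + 1 + MSG_COUNT_BITS + HEAD_PTR_BITS;
  const records: MessageRecord[] = [];
  for (let i = 0; i < msgCount; i++) {
    records.push({
      destAddr: readBits(mem, cursor, 8),
      msgPtr: readBits(mem, cursor + 8, 8),
      msgSize: readBits(mem, cursor + 16, 8),
    });
    cursor += RECORD_BITS;
  }
  return { store: { headPtr, records }, end: cursor };
}
----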
-- Inserting a message --
Follow the steps for reading a message. Additionally, before rewriting the
message store, perform these steps (sketched in code just below):
1: Write the message data to the head_ptr address.
2: Update head_ptr to the end of the new data blob.
3: Add a new message record to the message store.
4: Update msg_count.
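Continuing the same sketch, insertion reuses readMessageStore plus two more assumed helpers:
writeBits, which can only set bits and therefore only works in a still-zero region, and
writeMessageStore, which mirrors the read layout.
----
// Assumed helper: write the 1-bits of an n-bit value at addr. Because the
// channel can only set bits, this only works on a region that is still zeros.
function writeBits(mem: UuidMemory, addr: number, value: number, n: number): void {
  for (let i = 0; i < n; i++) {
    if ((value >> (n - 1 - i)) & 1) mem.readAndSet(addr + i);
  }
}

// Serialize the store at `at`; cell `at` stays 0 and acts as the start bit.
function writeMessageStore(mem: UuidMemory, at: number, store: MessageStore): void {
  writeBits(mem, at + 1, store.records.length, MSG_COUNT_BITS);
  writeBits(mem, at + 1 + MSG_COUNT_BITS, store.headPtr, HEAD_PTR_BITS);
  let cursor = at + 1 + MSG_COUNT_BITS + HEAD_PTR_BITS;
  for (const r of store.records) {
    writeBits(mem, cursor, r.destAddr, 8);
    writeBits(mem, cursor + 8, r.msgPtr, 8);
    writeBits(mem, cursor + 16, r.msgSize, 8);
    cursor += RECORD_BITS;
  }
}

// Send `size` bits of message data to the client identified by destAddr.
function sendMessage(mem: UuidMemory, destAddr: number, data: number, size: number): void {
  const { store, end } = readMessageStore(mem);
  const msgPtr = store.headPtr - size;     // data heap grows downward
  writeBits(mem, msgPtr, data, size);      // 1: write the message data
  store.headPtr = msgPtr;                  // 2: update head_ptr
  store.records.push({ destAddr, msgPtr, msgSize: size }); // 3: add record
  writeMessageStore(mem, end, store);      // 4: msg_count is records.length
}
----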
=== Second Pass: Concurrency issues + locks ===
I'm going to attempt to tackle this one by whittling down from strong guarantees
to weak ones. The concurrency models from strongest to weakest are:
1: Clients can execute an entire routine atomically
2: Clients can execute a readwrite of k addresses atomically (with some limit).
3: Clients can execute a readwrite of 1 address atomically.
The above protocol is robust under (1) but not under the others, and thus will have to be amended.
-- Tackling k < O(log(n)) --
First, we split up the shared message store into a static part and a dynamic part:
(head_ptr_record)
| 01    | 00100101 |
  start   head_ptr
We move the residual part of the message store to just beyond the head ptr:
(shared_message_store)
| message 1 | ... | message m-1 | message m | msg_count |
where each message is still of the form:
| dest_addr | msg_ptr | msg_size |
Our whole memory space now looks like:
[ ones | head_ptr_rec | zeros | message_store | message_data ]
We modify the start sequence such that a 00 means someone else has a lock on the
datastore, and a 01 means that you've successfully gained a lock on the datastore. If
a client tries to read the head_ptr_record and gets a 00 in the start field, that
implies that someone else has a lock and so nothing should be done. The convenient
part of this operation is that the lock is maintained atomically: reading a lock
maintains the lock, and reading a head_ptr_record atomically commits a lock.
This requires atomic manipulation of 2 + log2(n) bits, where n is the size of the
space of available UUIDs.
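Against the same toy memory, acquiring the lock might look like the sketch below; reading
the 2-bit start field both tests the lock and, as noted above, maintains it.
----
// Slide down the sled of 1s, then destructively read the 2-bit start field:
//   01 -> we now hold the lock and may read head_ptr and the message store
//   00 -> someone else holds the lock; our read leaves it looking locked
function tryAcquireLock(mem: UuidMemory): { acquired: boolean; headPtrAddr: number } {
  let addr = 0;
  while (mem.readAndSet(addr) === 1) addr++; // first 0: start of the start field
  const second = mem.readAndSet(addr + 1);
  return { acquired: second === 1, headPtrAddr: addr + 2 };
}
----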
-- Setting up the memory space --
This protocol does not handle the initial coordination of the clients. If the
memory space were blank at the onset of clients joining, then all clients would
loop indefinitely, each thinking that some other client already holds the lock.
To solve this, we'll add in a quick asymmetry to the specification: a client that
writes to the first word automatically gains the read/write lock, and assumes the
default head pointer state of the form [ 01 | top_of_address_space ]. Since a field
of zeros is a valid shared message store, the client performs no additional setup.
An extremely rough implementation of this version of the protocol can be found in
my scratchpad repo evinism/graveyard:
https://github.com/evinism/graveyard/blob/master/uuid_protocol.js
-- Tackling k=2 --
Really, the essence of the locking mechanism we need is the start sequence. We can
replace the whole lock operation with just the sled of 1s, followed by a two-bit
sequence. From a practical perspective, this makes it far less likely that two clients
will conflict horrifically if atomicity can only be guaranteed to k=1, but that's
little consolation when the jitter between requests is not completely negligible.
If we wanted to go down this path, we could split the memory into 3 sections
and treat each one of them as a contiguous shared memory space.
- M1 for i % 3 == 0
- M2 for i % 3 == 1
- M3 for i % 3 == 2
We could arbitrarily assign
M1: Locking mechanism (just the 01 sequence from the head_ptr data structure)
M2: Shared Message Store (identical to the non-concurrent construction)
M3: Message Data
Since the locking mechanism can no longer store a pointer, we must still
put in a sled of 1s on both M1 and M2. This is the primary modification to
get us to k=2. Super simple.
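A sketch of that partitioning on top of the toy memory; the modulus and the section roles
are as proposed above, everything else is illustrative.
----
// Expose every third cell of the underlying memory as its own contiguous
// address space: M1 (i % 3 == 0), M2 (i % 3 == 1), M3 (i % 3 == 2).
class InterleavedSection {
  constructor(private mem: UuidMemory, private offset: 0 | 1 | 2) {}
  readAndSet(addr: number): number {
    return this.mem.readAndSet(addr * 3 + this.offset);
  }
}

const mem = new UuidMemory(3 * (1 << 16));    // arbitrary toy size
const lock = new InterleavedSection(mem, 0);  // M1: locking mechanism
const store = new InterleavedSection(mem, 1); // M2: shared message store
const data = new InterleavedSection(mem, 2);  // M3: message data
----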
=== Third Pass: Error correction ===
[ TODO ]
IDK yet. Obviously, there will be existing entries in the UUID space. We'll
have to work around them and mitigate the rare errors incurred. A simple Hamming
code should do the trick for now.
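Purely as an illustration of that suggestion (this whole section is admittedly still a
TODO), a Hamming(7,4) code corrects any single flipped bit per 7-bit block:
----
// Hamming(7,4): encode 4 data bits as 7, correcting any single-bit error.
function hammingEncode([d1, d2, d3, d4]: number[]): number[] {
  const p1 = d1 ^ d2 ^ d4;
  const p2 = d1 ^ d3 ^ d4;
  const p3 = d2 ^ d3 ^ d4;
  return [p1, p2, d1, p3, d2, d3, d4]; // codeword positions 1..7
}

function hammingDecode(c: number[]): number[] {
  const syndrome =
    (c[0] ^ c[2] ^ c[4] ^ c[6]) * 1 + // parity over positions 1,3,5,7
    (c[1] ^ c[2] ^ c[5] ^ c[6]) * 2 + // parity over positions 2,3,6,7
    (c[3] ^ c[4] ^ c[5] ^ c[6]) * 4;  // parity over positions 4,5,6,7
  const fixed = [...c];
  if (syndrome !== 0) fixed[syndrome - 1] ^= 1; // syndrome = error position
  return [fixed[2], fixed[4], fixed[5], fixed[6]]; // recover d1..d4
}
----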
An optimal choice of ECC would probably take into account that errors are over the
Z channel, that is, the errors can only occur in one direction. I'm not well-versed
enough to give a good answer in this section, but one probably exists.
Definitely still a work in progress