Last active November 12, 2018 05:31
=== Background ===

For one issue at work, we tossed around the idea of generating UUIDs
clientside in order to ensure that we had a unique immutable id without
having to go to the server to get an ID back, primarily for p2p applications.

Obviously, if somebody's maliciously creating ids, they can choose arbitrary
values and create collisions. Additionally, one client could tell if another
client had already chosen a certain value for an id by trying to commit a
certain UUID to the db.

For example:

> Client 1 creates a "todo" with a certain UUID.
> Client 2 tries to create a "todo" with the same UUID.
> Client 2 now has some information about client 1's actions.

I posed the purely academic question of how we might be able to send arbitrary
messages over that channel, and the below is a possible incomplete spec that
attempts to propose a basic solution. To be perfectly clear: this is an
interesting solution to a toy problem.
=== Model ===

Each client can create a record (like a todo) with a certain UUID in this space:

----
0: 00000000-0000-0000-0000-000000000000
1: 00000000-0000-0000-0000-000000000001
...
n: ffffffff-ffff-ffff-ffff-ffffffffffff
----
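The space above is just the integers 0 through 2^128 - 1 written out in UUID form. A minimal sketch of that mapping (the helper names are mine, not part of any real API):

```javascript
// Convert a sequential index (as a BigInt) to its UUID in the space
// above, and back. Hypothetical helpers for illustration only.
function indexToUuid(n) {
  const hex = n.toString(16).padStart(32, "0");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

function uuidToIndex(uuid) {
  return BigInt("0x" + uuid.replace(/-/g, ""));
}
```
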
Assume that if an ID hasn't been used already, you get a 201 CREATED back
from the server. If an ID has been used, you get a 409 CONFLICT back. All
GETs will 404 NOT FOUND, so in order to tell whether an ID has been used,
you have to try to create a "todo" with that ID. The interesting bit is
that by reading whether or not another client has created a record with
a UUID, you inevitably end up creating one if it doesn't already exist.

We can imagine a table of sequential UUIDs and whether they're taken.

----
uuid:   | ...0-0000001 | ...0-0000002 | ...0-0000003 | ...
taken?: | no           | yes          | yes          |
----
We can model the space of taken vs. untaken UUIDs as shared memory, where
the address is the sequential index of the UUID, and the data is whether
the record exists or not.

----
addr: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ...
data: | 0 | 1 | 1 | 0 | 1 | 1 | 0 | ...
----

The "create a record" operation can be viewed in this model as a joint
"read and write true" operation. The interesting thing about this operation
is that any time you read a bit, that bit gets overwritten with a 1.
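Under the HTTP semantics above, the joint operation can be sketched as a single primitive. The in-memory `createTodo` here is a stand-in for the real POST endpoint (both names are my own):

```javascript
// Stand-in for the server's todo table and its create endpoint:
// returns 201 the first time an id is used, 409 thereafter.
const taken = new Set();
function createTodo(id) {
  if (taken.has(id)) return 409;
  taken.add(id);
  return 201;
}

// The channel's only primitive: read the bit at addr, then force it to 1.
// Returns the bit's value from *before* the implied write.
function readAndSetOne(addr) {
  return createTodo(addr) === 409 ? 1 : 0;
}
```

A 201 means the bit was 0 (and is now 1); a 409 means it was already 1.
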
=== First Pass: Shared Message Store ===

The first attempt I'm going for is the use of a sled of 1s at the bottom of
the memory space, a data structure I'm calling a "shared message store" that's
just beyond the sled of 1s, and a data heap that starts at the top of the
address space and grows downwards.

The rough overview is that when a client polls for messages, it "slides down"
the sled of 1s to try to find the shared message store, reads the whole shared
message store, performs manipulations to it, and rewrites it just beyond where
the previous one ended (which is now filled with 1s due to the read). A more
precise specification follows.

To make it easier on ourselves to start, we're going to start with the
assumption that each client acts atomically. This is of course a horrible
assumption, but I'll try to address it later.
( Message Store Memory Layout )

[ 0 | 00000001 | 01000100 | 00101110 | 01001001 | 01101010 | ... ]
  start  msg_count  head_ptr  dest_addr  msg_ptr  msg_size
| --------- header ---------- | ----- message record 1 ------ | -- message record 2 -..

( Main memory layout )

[ ones | shared message store | zeros | message_data ]
                                        ^
                                        head_ptr from message store
Glossary of fields:

start:     The start bit, indicating the beginning of the message store
msg_count: The number of messages in the message store
head_ptr:  The pointer to the next free location in the message data region.
           Gets updated when a new message is added to the message store.
dest_addr: The destination address of the message
msg_ptr:   The pointer to the message data for that message record
msg_size:  The length of the message data for that message record
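As a concreteness check, here's one way the layout above might serialize into bits. The field widths are my assumptions; the write-up doesn't pin exact sizes:

```javascript
// Assumed field widths (not specified in the write-up).
const MSG_COUNT_BITS = 8, PTR_BITS = 16, SIZE_BITS = 8, ADDR_BITS = 16;

// Serialize a message store into a flat bit array per the layout above:
// start bit, then the header, then fixed-size message records.
function serializeStore(store) {
  const bits = [];
  const push = (value, width) => {
    for (let i = width - 1; i >= 0; i--) bits.push((value >> i) & 1);
  };
  push(0, 1); // start bit: the single 0 that ends the sled of 1s
  push(store.messages.length, MSG_COUNT_BITS);
  push(store.headPtr, PTR_BITS);
  for (const m of store.messages) {
    push(m.destAddr, ADDR_BITS);
    push(m.msgPtr, PTR_BITS);
    push(m.msgSize, SIZE_BITS);
  }
  return bits;
}
```
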
-- Reading a message --

To read from the message store, read down the field of 1s until you find a
zero, then read the subsequent 2 bytes to get the message count. Since message
records are of a fixed size (2 bytes), we can now read the whole set of
message records. After reading, the field of 1s extends through where the
shared message store was written, so we re-copy the message store just past
the end of the newly invalidated block (removing any messages addressed to
that client).
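The read path can be sketched against a simulated bit array, still assuming atomic clients (the widths and helper names are my own):

```javascript
// Simulated shared memory: reading any address destructively sets it to 1.
const mem = new Array(1024).fill(0);
function readAndSet(addr) {
  const v = mem[addr];
  mem[addr] = 1;
  return v;
}

// Slide down the sled of 1s; returns the address of the first 0 found
// (the start bit, which the read has now overwritten with a 1).
function findStoreStart() {
  let addr = 0;
  while (readAndSet(addr) === 1) addr++;
  return addr;
}

// Destructively read an n-bit big-endian field starting at addr.
function readField(addr, n) {
  let v = 0;
  for (let i = 0; i < n; i++) v = (v << 1) | readAndSet(addr + i);
  return v;
}
```

After `findStoreStart` and the `readField` calls, everything up to the end of the old store is 1s, which is exactly why the store must be re-copied just beyond it.
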
-- Inserting a message --

Follow the steps for reading a message. Additionally, before rewriting the
message store, perform these steps:

1: Write the message data to the head_ptr address.
2: Update head_ptr to the end of the new data blob.
3: Add a new message record to the message store.
4: Update msg_count.
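The write side has one quirk worth making explicit: in this channel you can only ever set bits to 1 (by creating a record), so every structure must be written into a region of fresh zeros. A sketch, using the same simulated memory model:

```javascript
// Simulated shared memory; a real client would create records instead.
const mem = new Array(1024).fill(0);

// "Write" a value by setting only its 1 bits; 0 bits are left as
// untouched zeros. Only valid over a region of fresh zeros.
function writeField(addr, value, width) {
  for (let i = 0; i < width; i++) {
    const bit = (value >> (width - 1 - i)) & 1;
    if (bit) mem[addr + i] = 1;
  }
}
```

For example, `writeField(0, 5, 4)` leaves addresses 0..3 holding 0, 1, 0, 1.
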
=== Second Pass: Concurrency issues + locks ===

I'm going to attempt to tackle this by whittling down from strong guarantees
to weak ones. The concurrency models from strongest to weakest are:

1: Clients can execute an entire routine atomically.
2: Clients can execute a readwrite of k addresses atomically (with some limit k).
3: Clients can execute a readwrite of 1 address atomically.

The above protocol is robust under (1) but under no others, and thus will
have to be amended.
-- Tackling k < O(log(n)) --

First, we split up the shared message store into a static part and a dynamic part:

(head_ptr_record)
| 01    | 00100101 |
  start   head_ptr

We move the residual part of the message store to just beyond the head ptr:

(shared_message_store)
| message 1 | ... | message m-1 | message m | msg_count |

where each message is still of the form:

| dest_addr | msg_ptr | msg_size |

Our whole memory space now looks like:

[ ones | head_ptr_rec | zeros | message_store | message_data ]
We modify the start sequence such that a 00 means someone else has a lock on
the datastore, and a 01 means that you've successfully gained a lock on the
datastore. If a client tries to read the head_ptr_record and gets a 00 in the
start field, that implies that someone else has a lock and so nothing should
be done. The convenient part of this operation is that the lock is maintained
atomically: reading a lock maintains the lock, and reading a head_ptr_record
atomically commits a lock. This requires atomic manipulation of 2 + log2(n)
bits, where n is the size of the space of available uuids.
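Lock acquisition can be sketched as follows, assuming the whole head_ptr_record (2 + log2(n) bits) is readwritten atomically. The simulated memory and the 8-bit pointer width are my own simplifications:

```javascript
const PTR_BITS = 8; // assumed pointer width for the sketch
// Simulated shared memory: reads destructively set bits to 1.
const mem = new Array(256).fill(0);
function readAndSet(addr) {
  const v = mem[addr];
  mem[addr] = 1;
  return v;
}

// Slide down the sled to the head_ptr_record. Reading "01" in the start
// field commits the lock to us; reading "00" means someone else holds it.
function tryLock() {
  let addr = 0;
  while (readAndSet(addr) === 1) addr++; // first 0 = first bit of start field
  if (readAndSet(addr + 1) === 0) return null; // "00": lock held elsewhere
  let head = 0; // "01": we hold the lock; the pointer follows
  for (let i = 0; i < PTR_BITS; i++) head = (head << 1) | readAndSet(addr + 2 + i);
  return { recordAddr: addr, head };
}
```

Note that a failed attempt still consumes the two bits it read, which is part of why the lock holder must rewrite a fresh head_ptr_record past the extended sled when it's done.
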
-- Setting up the memory space --

This protocol does not handle the initial coordination of the clients. If the
memory space were blank at the onset of clients joining, then all clients
would loop consistently, each thinking that some other client held the lock.

To solve this, we'll add in a quick asymmetry to the specification: a client
that writes to the first word automatically gains the read/write lock, and
assumes the default head pointer state of the form [ 01 | top_of_address_space ].
Since a field of zeros is a valid shared message store, the client performs no
additional setup.
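The asymmetry might be sketched like so: whether the first readwrite of address 0 comes back 0 tells a client if it was first on the scene. The return shape and constants are my assumptions:

```javascript
const TOP_OF_ADDRESS_SPACE = 255; // stand-in for the real top of the space
const mem = new Array(256).fill(0);
function readAndSet(addr) {
  const v = mem[addr];
  mem[addr] = 1;
  return v;
}

// The first client to touch address 0 becomes the initial lock holder
// and assumes the default head pointer; everyone else joins unlocked.
function bootstrapOrJoin() {
  if (readAndSet(0) === 0) {
    return { haveLock: true, head: TOP_OF_ADDRESS_SPACE };
  }
  return { haveLock: false };
}
```
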
An extremely rough implementation of this version of the protocol can be
found in my scratchpad repo evinism/graveyard:

https://github.com/evinism/graveyard/blob/master/uuid_protocol.js
-- Tackling k=2 --

Really, the essence of the locking mechanism we need is the start sequence.
We can replace the whole lock operation with just the sled of 1s, followed by
a two-bit sequence. From a practical perspective, this makes it far less
likely that two clients will conflict horrifically if atomicity can only be
guaranteed to k=1, but that's little consolation when the jitter between
requests is not completely negligible.

If we wanted to go down this path, we could section off the memory into 3
sections and treat each one of them as a contiguous shared memory space.

- M1 for i % 3 == 0
- M2 for i % 3 == 1
- M3 for i % 3 == 2
We could arbitrarily assign:

M1: Locking mechanism (just the 01 sequence from the head_ptr data structure)
M2: Shared message store (identical to the non-concurrent construction)
M3: Message data

Since the locking mechanism can no longer store a pointer, we must still put
in a sled of 1s on both M1 and M2. This is the primary modification to get us
to k=2. Super simple.
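The sectioning above is just an address interleave. A sketch of the mapping between each section's logical addresses and the physical UUID index space:

```javascript
// Section s (0 for M1, 1 for M2, 2 for M3) owns every third physical
// address; each section then behaves like its own contiguous memory.
function toPhysical(section, logicalAddr) {
  return logicalAddr * 3 + section;
}
function toLogical(physicalAddr) {
  return { section: physicalAddr % 3, addr: Math.floor(physicalAddr / 3) };
}
```
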
=== Third Pass: Error correction ===

[ TODO ]

IDK yet. Obviously, there will be existing entries in the UUID space. We'll
have to work around them and mitigate the rare errors they incur. A simple
Hamming code should do the trick for now.

An optimal choice of ECC would probably take into account that errors are
over the Z-channel; that is, errors can only occur in one direction (a bit
can be flipped from 0 to 1, but never back). I'm not well-versed enough to
give an answer here, but a better code probably exists.
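For the "simple Hamming code" route, a Hamming(7,4) sketch that corrects any single flipped bit per codeword, which covers a lone pre-existing record turning one of our 0s into a 1. This is the generic textbook construction, not something from the spec above:

```javascript
// Hamming(7,4): 4 data bits -> 7-bit codeword, positions 1..7 laid out
// as p1, p2, d1, p3, d2, d3, d4 (parity bits at powers of two).
function encode([d1, d2, d3, d4]) {
  const p1 = d1 ^ d2 ^ d4; // covers positions 1, 3, 5, 7
  const p2 = d1 ^ d3 ^ d4; // covers positions 2, 3, 6, 7
  const p3 = d2 ^ d3 ^ d4; // covers positions 4, 5, 6, 7
  return [p1, p2, d1, p3, d2, d3, d4];
}

// Decode a 7-bit codeword with at most one flipped bit; the syndrome is
// the 1-based position of the error (0 means no error).
function decode(c) {
  const s1 = c[0] ^ c[2] ^ c[4] ^ c[6];
  const s2 = c[1] ^ c[2] ^ c[5] ^ c[6];
  const s3 = c[3] ^ c[4] ^ c[5] ^ c[6];
  const err = s1 + 2 * s2 + 4 * s3;
  const fixed = c.slice();
  if (err) fixed[err - 1] ^= 1;
  return [fixed[2], fixed[4], fixed[5], fixed[6]];
}
```

A Z-channel-aware code could do better than this, since Hamming spends its redundancy protecting against 1→0 flips that this channel can never produce.
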
Definitely still a work in progress