@de-sh
Last active July 2, 2021 06:20
Building dstore, a partially distributed memory store

At the end of 2020 I had learnt a lot about programming in rust-lang and had worked on some large codebases, all while doing some interesting system implementations on the side. We had planned to implement our thesis project by the end of the academic year, but faced a lot of challenges in getting started; procrastination had hit us hard. By January I felt we had done enough experimentation and needed to do some "real" work, so I got started on a core piece of the puzzle: the partially distributed datastore, which consists of a central server and multiple clients that can cache values and check whether those cached values have been invalidated.

To do this, I had to learn gRPC, and I figured Tonic would be a great place to start. With the proto3 definitions that @bkp31415 had written, I got started implementing what would be the first iteration of dstore's gRPC API. An interesting DSL to learn, proto turned out to be quite a bit more complex than we had anticipated; we will talk a bit about that later.

Next, we had to implement the datastore, and to do that we had to write a server implementation. I was still learning how Mutex locks work (I sure know a lot more now), but at that point I felt our application wouldn't mind having to wait a few seconds to get the lock while other operations were being performed on the datastore. Yes, at this point the datastore was built on top of a simple HashMap<String, String>. I was only trying to hold key-value data, and this lack of complexity made things a lot simpler. Three RPC definitions were also made in this commit, and they are how dstore clients perform operations on the server: set(), get() and del(), the most obvious name choices I had. Here, main() is basically hardcoded to run the server on localhost:50051. There is not much else to see; maybe the cargo config files are slightly interesting, but we will get to that later.
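To make the idea concrete, here is a minimal sketch of a HashMap<String, String> guarded by a Mutex and exposing set(), get() and del(). It uses only the standard library and leaves out the gRPC layer entirely. The type name Datastore, the method signatures and the choice to let set() overwrite an existing key are all assumptions made for illustration, not dstore's actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the server-side store: one map behind one lock.
/// Cloning a `Datastore` clones the `Arc`, so all clones share the same map.
#[derive(Clone, Default)]
pub struct Datastore {
    db: Arc<Mutex<HashMap<String, String>>>,
}

impl Datastore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores `value` under `key` and returns the previous value, if any.
    /// (Assumption: overwriting is allowed; the real API may differ.)
    pub fn set(&self, key: String, value: String) -> Option<String> {
        // Every caller waits here until the lock is free.
        let mut db = self.db.lock().unwrap();
        db.insert(key, value)
    }

    /// Returns a copy of the value stored under `key`.
    pub fn get(&self, key: &str) -> Option<String> {
        self.db.lock().unwrap().get(key).cloned()
    }

    /// Removes `key` and returns the value it held, if any.
    pub fn del(&self, key: &str) -> Option<String> {
        self.db.lock().unwrap().remove(key)
    }
}
```

Since every operation takes the same lock, requests are served one at a time. That is the trade-off described above: simple to reason about, at the cost of callers waiting on each other.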

When I started writing the dstore client implementation, I copied a lot of code from a previous project, KVDB. With a command parser and a CLI that mimicked something like redis now in the project, it was time to integrate them with the logic that performs operations on the server. At this point, the idea was to build an MVP, to see if something like KVDB, which was written to be used entirely locally, could be refactored for use over the wire. That meant the implementation was strictly useful as a toy educational tool, maybe to be extended for future use in other projects. Anyone who reads the code can see something similar take shape:

struct Store {
  db: HashMap<String, String>,
  global: DstoreClient<Channel>,
}

The code at this stage only allowed us to perform operations on the global datastore while keeping the local cache updated. I still had to experiment with a few ways of performing the cache invalidation step, but let's keep that aside for a while. The parser is as simple as it gets, a crude implementation meant only to help us move forward with implementing the rest. Also, given that the cache was to be operated on by a single entity, i.e. the user's commands, I decided to implement it without a Mutex; that, as you will see, changes later.
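As a rough illustration of "operate on the global store, keep the local cache updated", the sketch below wraps a local HashMap around a remote store. To keep it runnable without a gRPC server, the DstoreClient<Channel> field is replaced by a hypothetical Global trait with an in-memory implementation. The trait, the InMemoryGlobal type, the reads counter and the read-through behaviour of get() are all assumptions for illustration, and cache invalidation is deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over the remote store (a gRPC client in dstore).
pub trait Global {
    fn set(&mut self, key: &str, value: &str);
    fn get(&mut self, key: &str) -> Option<String>;
    fn del(&mut self, key: &str);
}

/// In-memory stand-in for the server; `reads` counts remote lookups.
#[derive(Default)]
pub struct InMemoryGlobal {
    data: HashMap<String, String>,
    pub reads: usize,
}

impl Global for InMemoryGlobal {
    fn set(&mut self, key: &str, value: &str) {
        self.data.insert(key.to_string(), value.to_string());
    }
    fn get(&mut self, key: &str) -> Option<String> {
        self.reads += 1;
        self.data.get(key).cloned()
    }
    fn del(&mut self, key: &str) {
        self.data.remove(key);
    }
}

/// Local cache plus a handle to the global store. No Mutex is needed
/// because every method takes `&mut self`, i.e. there is a single owner.
pub struct Store<G: Global> {
    db: HashMap<String, String>,
    global: G,
}

impl<G: Global> Store<G> {
    pub fn new(global: G) -> Self {
        Store { db: HashMap::new(), global }
    }

    /// Writes to the global store first, then mirrors the value locally.
    pub fn set(&mut self, key: &str, value: &str) {
        self.global.set(key, value);
        self.db.insert(key.to_string(), value.to_string());
    }

    /// Serves from the local cache when possible, otherwise asks the
    /// global store and caches whatever it returns.
    pub fn get(&mut self, key: &str) -> Option<String> {
        if let Some(value) = self.db.get(key) {
            return Some(value.clone());
        }
        let value = self.global.get(key)?;
        self.db.insert(key.to_string(), value.clone());
        Some(value)
    }

    /// Removes the key from both the global store and the local cache.
    pub fn del(&mut self, key: &str) {
        self.global.del(key);
        self.db.remove(key);
    }

    /// Read-only access to the global handle, handy for inspection.
    pub fn global(&self) -> &G {
        &self.global
    }
}
```

Note that a cached value here can go stale if another client changes it on the server, which is exactly the problem the invalidation step has to solve.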

After a day, I decided to use bytes::Bytes, which also meant a few pieces of additional conversion logic had to be written. I found this gist really helpful for that purpose: a concise and well-written set of examples, very much a time saver. Rewriting the proto file was also fun and packed with new knowledge, especially the difference between how the string and bytes types are handled.
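The kind of conversion logic involved looks roughly like the sketch below. To stay dependency-free it uses Vec<u8> from the standard library as a stand-in for bytes::Bytes, and the function names are hypothetical. The underlying difference is that a proto3 string field must hold valid UTF-8 while a bytes field can hold arbitrary data, so going from text to bytes always works, but going back can fail.

```rust
use std::string::FromUtf8Error;

/// Text to raw bytes: always succeeds, and reuses the String's buffer.
pub fn text_to_bytes(text: String) -> Vec<u8> {
    text.into_bytes()
}

/// Raw bytes to text: fails if the bytes are not valid UTF-8.
pub fn bytes_to_text(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Raw bytes to text for display purposes: invalid sequences are replaced
/// with U+FFFD instead of causing an error.
pub fn bytes_to_text_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

For a CLI that prints values back to the user, the lossy variant is convenient, while the strict variant is the safer choice when the text is going to be stored or compared.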

Now that we were ready with a basic "KVDB over the network" implementation, I could move on to rewriting dstore as a library, and that is what I did. Pulling the code that was specific to the demo use case of "a database that can be queried" out into the examples folder helped clear out a lot of the mess that had been brought in with what was essentially an unnecessary CLI/parser implementation.

To be Continued