
@dacamp
Last active June 17, 2024 15:12

Challenge

Your environment runs on Ubuntu 16.04 in a Docker container as a non-root user. Each of your containers will run in a separate network namespace but share the same mount and PID namespaces.

Your job is to build a highly available distributed database... of counters. We will run N instances of the database. Package your challenge so that it extracts into the following layout:

challenge
challenge/bin
challenge/bin/challenge-executable
challenge/...

The working directory we will execute from is the directory directly above challenge. If you need us to set any environment variables, let us know.

You are to provide a compressed tar archive (gzip or bzip2) that contains only one folder, 'challenge'.
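A minimal packaging sketch in Python, assuming the build output already sits in a local `challenge/` directory; the function name `build_artifact` and the output file name are hypothetical. It uses the standard `tarfile` module and sets `arcname` so that the archive's only top-level entry is `challenge`.

```python
import os
import tarfile


def build_artifact(src_root: str, out_path: str) -> None:
    """Pack <src_root>/challenge into a gzip tarball with a single top-level folder."""
    challenge_dir = os.path.join(src_root, "challenge")
    exe = os.path.join(challenge_dir, "bin", "challenge-executable")
    if not os.path.isfile(exe):
        raise FileNotFoundError(exe)
    # Set the executable bit so ./challenge/bin/challenge-executable runs after extraction.
    os.chmod(exe, os.stat(exe).st_mode | 0o111)
    # Use "w:bz2" instead of "w:gz" for a bzip2 artifact.
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname drops src_root so every member path starts with "challenge".
        tar.add(challenge_dir, arcname="challenge")
```

For example, `build_artifact(".", "challenge.tar.gz")` run from the directory above `challenge`. The output path should not be inside `challenge/`, or the archive would try to include itself.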

We will execute ./challenge/bin/challenge-executable , where id is an integer, 1... You can use any location on the system as scratch storage. You do not have to ensure that the data is persisted synchronously.

You are to bind to 0.0.0.0:7777 on each instance and provide the following API:


* POST /config {"actors": ["1.2.3.4", "1.2.3.5", "1.2.3.6"]} (JSON)
* GET  /counter/:name:/value -> value
* GET  /counter/:name:/consistent_value -> value
* POST /counter/:name: non-negative-integer in ASCII 0...
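One way to dispatch these four routes, sketched in Python with the standard `re` module. It treats `:name:` as a single path segment (no slashes), which is an assumption, and labels the POST counter route `increment`, a hypothetical name: the spec itself does not say how a POSTed integer is applied to the counter.

```python
import re
from typing import Optional, Tuple

# (method, full-path pattern, label); the labels are hypothetical names for this sketch.
ROUTES = [
    ("POST", re.compile(r"/config"), "config"),
    ("GET", re.compile(r"/counter/([^/]+)/value"), "value"),
    ("GET", re.compile(r"/counter/([^/]+)/consistent_value"), "consistent_value"),
    ("POST", re.compile(r"/counter/([^/]+)"), "increment"),
]


def parse_route(method: str, path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a request to (endpoint label, counter name), or None for unknown routes."""
    for route_method, pattern, label in ROUTES:
        if method != route_method:
            continue
        match = pattern.fullmatch(path)
        if match:
            # Only the counter routes capture a :name: segment.
            name = match.group(1) if pattern.groups else None
            return label, name
    return None
```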

Because we understand that bootstrapping communication is a complex process, before we send you any other requests we will POST a config listing the actors, i.e. the IP address of each container. The IPs in the example above may not be the IPs that are actually used.
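A hedged sketch of handling that bootstrap request in Python: it parses the JSON body and checks that every actor is a well-formed IP address before the node starts using the list. The name `parse_config` is hypothetical, and how a node recognizes its own address among the actors is left open.

```python
import ipaddress
import json
from typing import List


def parse_config(body: bytes) -> List[str]:
    """Validate a POST /config body and return the actor IPs in the order given."""
    doc = json.loads(body)  # json.JSONDecodeError is a subclass of ValueError
    actors = doc.get("actors") if isinstance(doc, dict) else None
    if not isinstance(actors, list) or not actors:
        raise ValueError("config must contain a non-empty 'actors' list")
    for ip in actors:
        if not isinstance(ip, str):
            raise ValueError(f"actor {ip!r} is not a string")
        ipaddress.ip_address(ip)  # raises ValueError for malformed addresses
    return actors
```

Since the config is sent only once, a real service would likely want to store the actor list alongside its counters so that a restarted process does not wait for a config that will not arrive again.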

Your service may experience network disruptions and process failures. We expect you to recover from network disruption without sending another config.
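Recovering without a new config suggests that nodes keep retrying their peers on their own. A minimal Python sketch, assuming each counter is stored as a grow-only counter (G-counter) in which every actor tracks its own contribution; that data structure is our choice, not part of the spec. Merging takes the per-actor maximum, so a gossip round can be repeated indefinitely and converges once the partition heals. `exchange` is a hypothetical transport callable that sends our state to a peer and returns the peer's state.

```python
from typing import Callable, Dict, Iterable

# One counter's state: actor id -> that actor's contribution (hypothetical G-counter layout).
State = Dict[str, int]


def merge(a: State, b: State) -> State:
    """Join two G-counter states by taking the per-actor maximum."""
    out = dict(a)
    for actor, n in b.items():
        out[actor] = max(out.get(actor, 0), n)
    return out


def anti_entropy_round(local: State, peers: Iterable[str],
                       exchange: Callable[[str, State], State]) -> State:
    """Push local state to every peer and merge whatever comes back.

    Unreachable peers are skipped; the next round retries them, so
    convergence resumes once the network heals, with no new config.
    """
    for peer in peers:
        try:
            local = merge(local, exchange(peer, local))
        except OSError:
            continue  # peer partitioned or down; retry next round
    return local
```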

Your service is to provide three HTTP REST API endpoints for counters. The first is GET /counter/:name:/value, which returns the integer value of the given counter. If the counter does not exist, respond with 0.
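A node can answer `value` from its local replica without contacting anyone. A minimal in-memory sketch in Python, assuming POST adds the posted integer to the counter and that each counter is a G-counter keyed by actor id; `LocalCounters` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict


class LocalCounters:
    """Per-node view of all counters, one G-counter per name (hypothetical design)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        # counter name -> {actor id -> that actor's total increments}
        self.state: Dict[str, Dict[str, int]] = defaultdict(dict)

    def increment(self, name: str, amount: int) -> None:
        # Each node only ever updates its own slot.
        slot = self.state[name]
        slot[self.node_id] = slot.get(self.node_id, 0) + amount

    def value(self, name: str) -> int:
        # Unknown counters read as 0; .get() avoids creating an empty entry.
        return sum(self.state.get(name, {}).values())
```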

That value may be inconsistent, so there is a second endpoint, GET /counter/:name:/consistent_value. This endpoint is allowed to block for a period of time, and we will not send any requests until it responds. If network conditions prohibit an accurate answer, you may return an error. However, after we heal the network, we expect you to respond with the current value of the counter within a reasonable amount of time.
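One possible approach, sketched in Python under the same G-counter assumption: ask every actor (including this node) for its state of the counter, merge by per-actor maximum, and fail if any actor cannot be reached. Because each actor's own entry is authoritative and only grows, a merge that includes a fresh answer from every actor should account for every increment acknowledged before the read started. `fetch` is a hypothetical transport callable, and `Unavailable` is a hypothetical error type a handler could map to an HTTP error status.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # actor id -> contribution (hypothetical G-counter layout)


class Unavailable(Exception):
    """Raised when not every actor could be reached."""


def consistent_value(name: str, actors: List[str],
                     fetch: Callable[[str, str], State]) -> int:
    merged: State = {}
    unreachable = []
    for actor in actors:
        try:
            remote = fetch(actor, name)
        except OSError:
            unreachable.append(actor)
            continue
        # Per-actor maximum, so stale copies never lower a total.
        for owner, n in remote.items():
            merged[owner] = max(merged.get(owner, 0), n)
    if unreachable:
        # The spec allows an error while the network prevents an accurate answer.
        raise Unavailable(f"unreachable actors: {unreachable}")
    return sum(merged.values())
```

Since the endpoint may block, a handler could retry the whole read for a bounded time before returning the error.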

The third endpoint is POST /counter/:name:. We will only POST integers that are greater than or equal to 0.
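A small Python sketch of validating that body: it accepts only ASCII digits, allows surrounding whitespace such as a trailing newline (an assumption about client behavior), and rejects everything else, including negative numbers. `parse_increment` is a hypothetical name.

```python
def parse_increment(body: bytes) -> int:
    """Parse a POST /counter/:name: body as a non-negative ASCII integer."""
    try:
        text = body.decode("ascii").strip()
    except UnicodeDecodeError:
        raise ValueError("body must be ASCII") from None
    # text is pure ASCII, so str.isdigit() accepts only the characters 0-9.
    if not text.isdigit():
        raise ValueError(f"not a non-negative integer: {text!r}")
    return int(text)
```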

Your objectives, in order of priority (highest first), are:

  1. To be accurate for the consistent endpoint. That endpoint should always return values that are accurate as of a point in time after all messages in the system have finished processing.
  2. To be available in light of network or process failure.
  3. To be crash-tolerant.
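For crash tolerance, one simple option is to periodically snapshot the in-memory state to scratch storage and reload it at startup. A Python sketch; the function names and the snapshot layout are hypothetical. It writes to a temporary file and then calls `os.replace`, which atomically replaces the target on POSIX, so a crash mid-write leaves the previous snapshot intact. The `fsync` call could be dropped, since synchronous durability is not required.

```python
import json
import os
import tempfile
from typing import Dict

# counter name -> {actor id -> contribution}
Snapshot = Dict[str, Dict[str, int]]


def save_snapshot(path: str, state: Snapshot) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".snapshot-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic swap: readers see old or new, never partial
    except BaseException:
        os.unlink(tmp)
        raise


def load_snapshot(path: str) -> Snapshot:
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # first start: no snapshot yet
```

If a snapshot lags behind, peers may still hold this node's newer entry; with a per-actor-maximum merge, one option is to pull from peers before accepting new writes so the node does not build on a stale value.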

You should assume that there will be fewer than 10,000 counters and that the machine will have more than enough memory to hold them.

Potential Compromises

A lot of software engineering is about compromise. If you find that consistency is very difficult, let us know, explain why it's difficult, and describe how you might implement it. We test the API programmatically, but as long as a change is within reason, we should be able to adapt the tests.

Other examples of compromises may include persistence, speed, etc.

Why we do it

This is an actual problem we've solved at Mesosphere in the Systems & Networking team. In fact, every time we test it, we test what counter convergence looks like across some 20-something processes. You're allowed to use this code base as inspiration for your work.
