@javisantana
Last active April 25, 2023 02:42

The problem

A client has a CSV like this one that is updated daily with new data (1.2 GB of new data per update). They want to expose an API like this:

curl http://rambo.com/api?month=0

{
  "trip_count": 123456,
  "pickup_locations": [148, 114,...]
}

where trip_count is the number of trips for month=0 (January) and pickup_locations is the list of distinct PULocationID values (no repeats) for that month.
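To make the expected response concrete, here is a minimal sketch of the aggregation the API describes. It is only an illustration, not a proposed solution to the test. It assumes the CSV has a pickup timestamp column named tpep_pickup_datetime in "YYYY-MM-DD HH:MM:SS" format (that column name and format are assumptions; only PULocationID is named above), and the function name aggregate_by_month is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def aggregate_by_month(csv_file, datetime_column="tpep_pickup_datetime"):
    """Return {month: {"trip_count": int, "pickup_locations": [int, ...]}}.

    month is zero-based (0 = January), matching the month=0 example.
    The datetime_column default is an assumed column name.
    """
    counts = defaultdict(int)
    locations = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # Assumed timestamp format: "YYYY-MM-DD HH:MM:SS".
        pickup = datetime.strptime(row[datetime_column], "%Y-%m-%d %H:%M:%S")
        month = pickup.month - 1
        counts[month] += 1
        # A set keeps each PULocationID only once per month.
        locations[month].add(int(row["PULocationID"]))
    return {
        month: {
            "trip_count": counts[month],
            "pickup_locations": sorted(locations[month]),
        }
        for month in counts
    }


# Tiny made-up sample, only to show the shape of the result.
sample = io.StringIO(
    "tpep_pickup_datetime,PULocationID\n"
    "2019-01-01 00:46:40,148\n"
    "2019-01-15 08:10:00,114\n"
    "2019-01-20 09:00:00,148\n"
    "2019-02-01 12:00:00,7\n"
)
result = aggregate_by_month(sample)
print(result[0])  # {'trip_count': 3, 'pickup_locations': [114, 148]}
```

The sketch reads the file once and keeps only one counter and one set per month, so the result is small enough to compute ahead of time instead of on every request.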

It must support 1000 QPS with a p99 response time under 100 ms and a maximum response time under 1 second.

The test

Write a document explaining how you would solve the problem to someone with development skills. You don't need to implement it.

Rules

  • You can't mention any tech provider (AWS, Google...) or any related technology.
  • One page limit, about 700 words, in English. You don't need to be Shakespeare (I'm currently working for this company and this is my real English level).
  • This problem is open on purpose, make all the decisions you want.
  • Ask anything you want to (jobs@tinybird.co). We recommend preparing a set of questions before you start.
  • It'd be nice if you could share the writeup with us using a platform where we can leave comments (Google Docs, for example).

What we value, in order of importance

  • All the questions you ask and, in general, the communication during the test. This is what this "tech" test is about; this is how we work.
  • Clarity.
  • Technical approach.

Bonus

  • Explain how it'd work if the CSV is updated every 60 seconds.