@tlockney
Last active November 24, 2015 08:10
What questions would you put on a phone screen for a distributed systems position?

These come from @tsantero, with the last two additions courtesy of @ifesdjeen, in reply to this question from @SeanTAllen.

  1. explain the life of an http request.
  2. what does the FLP result teach us?
  3. what is a byzantine failure?
  4. explain CRDTs
  5. explain linearizability.
  6. how does DNS work?
  7. crash-stop vs. crash-recovery?
  8. difference between soft and hard real time
  9. model GC in an eventually consistent system
  10. discuss clock skew, NTP, and AWS vs metal
  11. what is causal consistency?
  12. what's the difference between a vector clock and a version vector?
  13. what is chain replication?
  14. at-most vs. at-least once
  15. model RYOW in an EC system.
  16. why does fail-stop rely on perfect fault detection?
  17. how is this flawed?
  18. how does 2PC fail?
  19. how does 3PC fail?
  20. why can't Raft survive byzantine failures?
  21. how is 2PC different from fast-consensus?
  22. what do Merkle trees accomplish?
  23. how do replicated logs work?
  24. discuss an EC system that implements idempotent increments and decrements of counters
  25. why are timeouts common in RPC?
  26. push vs. poll for stats and why?
  27. advantages and limitations of the actor model?
  28. what is process calculus?
  29. what do termination, agreement, and validity guarantee in consensus protocols, respectively?
  30. which systems are best optimized for safety?
  31. liveness?
  32. what are the characteristic differences between the two?
  33. why is WAN replication difficult?
  34. which programming languages are best suited for building concurrent systems? why?
  35. have you ever been to a RICON?
  36. watched the videos?
  37. what are 3 fundamental diffs between Cassandra and Riak?
  38. how can you reduce tail latencies in distributed systems?
  39. why do they matter?
  40. explain set union vs set intersection
  41. in your opinion, why aren't there any decent distributed time series databases on the market?
  42. how can you break Paxos?
  43. why is a total order reflexive?
  44. what are the fundamentals of atomic broadcast?
  45. what does a POSET guarantee?
  46. what is TCP incast?
  47. compare and contrast Kafka, Storm, and Spark
  48. are you familiar with LASP?
  49. difficulties of programming concurrent systems in shared memory env?
  50. (last one) do you even know how vector clocks work?
  51. describe typical concurrency bugs (race conditions, deadlocks, etc.). how do you avoid, prevent, and detect them?
  52. which existing concurrency models do you know? advantages/pitfalls/use cases?
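
As a possible starting point for discussing questions 4 and 24, here is a minimal sketch of a state-based PN-counter, one well-known CRDT design for counters that support both increments and decrements. It is an illustration rather than an answer the list's authors endorse; the class name `PNCounter`, its fields, and the replica ids `"a"` and `"b"` are assumptions made for this example. The property worth noticing is that `merge` takes a per-replica maximum, so receiving the same state twice leaves the counter unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PNCounter:
    """Hypothetical state-based PN-counter: per-replica tallies of
    increments and decrements, kept in two grow-only maps."""
    replica_id: str
    incs: Dict[str, int] = field(default_factory=dict)
    decs: Dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # A replica only ever advances its own entry.
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n: int = 1) -> None:
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other: "PNCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so duplicated or reordered state exchanges are harmless.
        for rid, count in other.incs.items():
            self.incs[rid] = max(self.incs.get(rid, 0), count)
        for rid, count in other.decs.items():
            self.decs[rid] = max(self.decs.get(rid, 0), count)


a = PNCounter("a")
b = PNCounter("b")
a.increment(3)
b.increment(2)
b.decrement(1)

a.merge(b)
a.merge(b)  # simulated duplicate delivery of b's state
b.merge(a)
```

After the merges both replicas report the same value. A candidate could be asked what this sketch leaves out, for example how per-replica entries are ever garbage collected (question 9).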
@tlockney (author):

For what it's worth, since this came up on Twitter: I think the above questions are fantastic, but they also set a very high bar that's not necessarily appropriate for everyone who might be doing distributed systems development. That said, I think this list can be an excellent guide and benchmark for developing an overall "ideal" for a dist-sys engineer.
