Skip to content

Instantly share code, notes, and snippets.

@paambaati
Last active June 26, 2023 13:29
Show Gist options
  • Save paambaati/d74cd31823ed99f0dc3e280e28d5c545 to your computer and use it in GitHub Desktop.
Save paambaati/d74cd31823ed99f0dc3e280e28d5c545 to your computer and use it in GitHub Desktop.
Designing an Undo system

Designing an Undo system on the backend (v2)

Updates to implementation details based on today's call –

  1. On an update call, make the client send a "snapshot" of the current record, along with the incoming UPDATE payload and save these to cache. Let's call this v1.

  2. Generate the UPDATE query, execute it, and return the updated payload as usual, along with a unique undo_id. Additionally, save these to cache as well. Let's call this v2.

  3. Now if the API consumer needs an undo, they need to send the undo_id, with which we should be able to look up both the v1 record, the v2 record, do a diff and execute a new UPDATE query that rolls back v2 to v1.

  4. Evict the v1 and v2 records from the cache based on a configurable TTL or after an undo operation is executed, whichever is earliest.

  5. Also expose the undo TTL along with undo_id (see #2).

Designing an Undo system on the backend

Problem statement

When the user performs certain "update" actions (i.e. actions that update 1 or more records on the database), we'd need to give them an "undo" option that lets them roll it back.

Prior art / Inspiration

  1. https://blog.google/products/gmail/how-to-unsend-email-gmail/

High-level design

For certain update operations, we queue them up to a time-bound append-only log (say 10 seconds, but configurable), and at the end of the 10 seconds, "flush" or "commit" them to the database. Users can undo the previous operation by calling a specific .../undo API.

When an undo operation is called, we generate the undo (or rollback) database query and queue it too. At the end of the commit window, we play all the queued operations in order.

Constraints

To keep the system reasonably observable and operationally simple, we should add a few self-imposed constraints –

  1. Allow undo support for only a few tightly-scoped operations.

    1. Allow undo operations only for atomically update-able records. For example, records only inside a single table.
  2. Allow only very few undo steps (say a maximum of 1).

    1. While this might be a fun and interesting problem to solve engineering-wise, this can quickly get out of hand in complexity.
    2. Study user behaviour, and know for sure that users need more than 1 undo at a time, before increasing the maximum undo steps allowed.

      For example, Gmail allows only 1 undo, because at any given point in time, a user is trying to send only 1 email.

Low-level design

  1. Each undo-compatible operation will only return an acknowledgement in the API response after queueing the UPDATE query to the queue. Each queue item also gets a timestamp.

    Use a queue (like SQS) so we can get retries, error handling and delayed reading (i.e. executing at the end of the undo time window) out of the box. If this is not feasible, use Redis streams and poll them with a delay.

  2. On undo, generate the rollback UPDATE query and add it to the queue.

This assumes the UPDATE can be generated entirely in-place, and does not require knowledge of what inputs the previous operation did.

  1. Process the queue with a delay (= undo time window), and replay all the queries, in order, on the database.

Working around constraints

  1. If we have to support a large undo stack size (i.e. more than 1 undo possible within a window), we can still use the same system, with a few changes –

    1. Assign a unique ID (say undo_id) to each queued item, and return it in the acknowledgement.
    2. Use a hashmap-like data structure (if Redis, ordered sets will also suffice) to lookup and either delete the original operation, or to append a rollback operation, according to the performance characteristics of the queue implementation.

Edge-cases

  1. What happens to queued items mid-deployment?

We can have the queue consumer lookup items in the queue on bootstrap and replay them, according to the timestamp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment