
Scaling

Tips

  • avoid concurrency
  • avoid cyclic dependencies
  • avoid full table scans
  • KISS (keep it stupid simple)
    • prioritize simplicity over complexity
    • don't be too clever
    • leverage open-source software solutions

Scaling Solutions

Concurrency

It is tempting to scale an existing design by making it concurrent. There are cases where introducing concurrency is appropriate; however, exhaust all other solutions first.

Concurrency is not a catch-all solution. There are many more things to consider when running in parallel:

  • race conditions
    • clobbering data (see the lost-update sketch after this list)
    • failure of one or more threads
  • debugging
    • stack traces are no longer clear
    • event logging can be misleading
  • resource constraints
    • managing resources at scale
    • internal and external bottlenecks
    • zombie threads/processes
  • call return
    • managing return values from joined and non-joined threads
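
A minimal sketch of the "clobbering data" case above, assuming a hypothetical accounts(id, balance) table: two sessions read the same row, compute a new value in application code, and write it back, so one of the updates is silently lost.

-- Hypothetical accounts(id, balance) table; balance starts at 100.
-- Session A and Session B both run this read-modify-write without locking:
BEGIN;
SELECT balance FROM accounts WHERE id = 1;     -- both sessions read 100
-- application code subtracts 10 from the value it read
UPDATE accounts SET balance = 90 WHERE id = 1; -- both sessions write 90
COMMIT;
-- Two withdrawals of 10 should leave 80, but the final balance is 90.
-- SELECT ... FOR UPDATE, or an atomic UPDATE accounts SET balance = balance - 10,
-- avoids the lost update.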

Look for the low-hanging fruit: changes with low risk and high reward. Otherwise, concurrency may leave the system more complex and less scalable.

Asynchronous Workers and Acyclic Dependencies

Imagine the architecture as a graph, where each vertex is a system and each edge is a dependency on another system. Ideally we want this graph to be a directed acyclic graph (DAG).

For example, AWS Kinesis streams should all flow in one direction. That is to say, a consumer should never be a producer to one of its direct or indirect dependencies. Otherwise the system becomes much more complex. A likely consequence of introducing a cyclic dependency is a positive feedback loop, where a processor handles the same event more than once, possibly in an infinite loop.
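
One way to keep this property honest is to record the dependency edges and check them for cycles. A sketch, assuming a hypothetical stream_edges(producer, consumer) table and PostgreSQL's recursive CTE syntax:

-- One row per edge: "consumer reads from producer".
-- Walk every path and flag a path as a cycle when it revisits a vertex.
WITH RECURSIVE paths (vertex, path, is_cycle) AS (
    SELECT consumer, ARRAY[producer, consumer], producer = consumer
    FROM stream_edges
  UNION ALL
    SELECT e.consumer, p.path || e.consumer, e.consumer = ANY(p.path)
    FROM paths p
    JOIN stream_edges e ON e.producer = p.vertex
    WHERE NOT p.is_cycle
)
SELECT path FROM paths WHERE is_cycle;
-- An empty result means the dependency graph is a DAG.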

Full-table Scan

A full-table scan occurs when a database query must read most or all of the records in a table, typically because no index can satisfy the query's filter.

-- Example of a full table scan
SELECT * FROM users;

Sounds easy to avoid? Full-table scans happen by accident more often than you might think.

For example, when implementing pagination for a frontend, a search query performed by the user is restricted by a limit and an offset. A subset of the data is returned for a specific page, with an upper limit on the number of entries. While this is fine, what is often overlooked is the total count of records that match the user's search criteria. When the programmer wants to present the total number of records or pages, they may fall prey to implementing a full-table scan.

-- Query: get second page matching search criteria
-- Runtime: O(log(n)), assuming an index supports the filter and sort
SELECT *
FROM users
WHERE email LIKE '%@gmail.com'
ORDER BY created_at DESC
LIMIT 5 -- records per page
OFFSET 10 -- skip first 10 records (or 2 pages)
;
-- Query: get total records matching search term
-- Runtime: O(n)
SELECT COUNT(*) AS total_records
FROM users
WHERE email LIKE '%@gmail.com' -- search criteria
;

In this scenario, when a user performs a search using the frontend application, the backend will perform a full-table scan each time the page is loaded. There are a couple of ways around this.

  1. cache the query result, especially if the exact total must be shown - consider memcache
  2. have the database approximate the total rather than produce an exact count
    • set an upper bound on query duration
    • set an upper bound on the number of records scanned
    • determine an optimized approximation query for the specific database technology (see the sketch after this list)
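
A sketch of the last option for PostgreSQL specifically: the planner's row estimate can be read directly instead of running COUNT(*). It is O(1), but it covers the whole table rather than a filtered search, and it is only as fresh as the last VACUUM or ANALYZE.

-- PostgreSQL-specific: read the planner's row estimate instead of COUNT(*).
SELECT reltuples::bigint AS approximate_total
FROM pg_class
WHERE relname = 'users';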

KISS

Full-text Search

Don't use ElasticSearch unless the company is willing to pay for it.

  • Full-text search is available in many database technologies, including Postgres, MongoDB, and others (see the Postgres sketch after this list). Introducing ES as an auxiliary database means that data must be replicated from the primary database. Replication introduces replication lag, which prevents data inserted or updated in the primary database from being immediately searchable. In addition, replication forces the system design to introduce a batch processor or streaming technology. It is easier to explore the feature sets of existing technologies first, before adding to the tech sprawl.
  • ElasticSearch needs maintenance and requires dedicated employees. ES is backed by Lucene, whose index segments are immutable, so an ES cluster with heavy writes will need its indices rebuilt or merged frequently to sustain write and read capacity and to reclaim storage. ES is not necessarily backwards compatible with previous versions, and an ES index schema cannot be modified once created. ElasticSearch will cost the company more than the infrastructure bill.
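
A sketch of Postgres full-text search, assuming a hypothetical posts(title, body) table: a generated tsvector column plus a GIN index (PostgreSQL 12+) keeps text searchable without a separate search cluster.

-- Index the searchable text once, as a stored generated column.
ALTER TABLE posts
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX posts_search_idx ON posts USING GIN (search_vector);

-- Rank and return the best matches for a search phrase.
SELECT title
FROM posts
WHERE search_vector @@ websearch_to_tsquery('english', 'scaling databases')
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', 'scaling databases')) DESC
LIMIT 10;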