PHP: Shared Nothing Concurrency Architecture

Shared Nothing Concurrency Architecture

Although PHP now has a number of features that allow you to control concurrency, it was designed with the shared-nothing architecture philosophy.

Anything that needed to be shared was intended to be pushed down to a coordinated, robust database or a network filesystem. All state inside a PHP process was meant to be short-lived and to exist only within a single PHP lifecycle, which is also the lifecycle of an HTTP request and response.

This allows a PHP application to scale linearly. In theory, you can spin up as many PHP process handlers as you need, with the only bottleneck being your underlying database or filesystem layer.

Understanding this philosophy is very important in dealing with any kind of concurrency issue in a shared-nothing PHP application. All kinds of consistency and synchronisation solutions must be provided by your underlying datastore.

I made a mistake once when writing a PHP library. I inadvertently caused any PHP application using this library to have non-deterministic operational semantics, due to the cache-coherency problem.

Basically, instead of relying on the database to handle all shared state, the library queried state from the database at the beginning of each PHP lifecycle and kept it in memory. This led the developer to think that they could mutate this data and safely save it back to the database. That would only be true if no concurrent requests affected the same data. If there were any, data could be lost: the mutations depended on the data as it was queried at the beginning of the lifecycle, but by the time the result was written to the database, that data was already outdated. The new data was therefore written based on assumptions that no longer held.

Here's a simpler example. Suppose I have a summary counter that tracks the number of page views, and I want to increment it on each request. On the first request, I queried the data as 1000, added 1 to get 1001, and wrote this to the database. Now suppose that 2 requests happen concurrently: each of them queries the data and gets 1001, each increments it to 1002, and each submits 1002 to the database. This is obviously incorrect; the correct counter would be 1003.
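The interleaving above can be reproduced in a few lines. This is only a sketch: it simulates the two overlapping requests sequentially against an in-memory SQLite database through PDO, and the `counters` table, the `views` column and the helper functions are made-up names for illustration.

```php
<?php
// Simulates two overlapping requests against one in-memory SQLite database.
// Table, column and function names are assumptions made for this sketch.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE counters (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)');
$db->exec('INSERT INTO counters (id, views) VALUES (1, 1001)');

// Reads the counter into the "request's" own memory.
function readViews(PDO $db): int
{
    return (int) $db->query('SELECT views FROM counters WHERE id = 1')->fetchColumn();
}

// Overwrites the counter with a value computed inside PHP.
function writeViews(PDO $db, int $views): void
{
    $stmt = $db->prepare('UPDATE counters SET views = :views WHERE id = 1');
    $stmt->execute([':views' => $views]);
}

$requestA = readViews($db); // request A sees 1001
$requestB = readViews($db); // request B also sees 1001, before A has written

writeViews($db, $requestA + 1); // A writes 1002
writeViews($db, $requestB + 1); // B writes 1002 too, overwriting A's increment

$finalViews = readViews($db);
echo $finalViews, PHP_EOL; // prints 1002, although two page views happened
```

Neither request did anything wrong in isolation; the increment is lost only because both computed it from the same stale in-memory copy.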

The above is exactly what happened when using this library. I basically brought the cache-coherency problem into what should have been a shared-nothing PHP application.

In PHP there are many ways to solve this cache-coherency problem. The easiest way is to simply remove this stateful computation and push it down to the database. Instead of mutating based on state that exists inside the PHP process, mutate using only the database API, such as a transaction, locks or an atomic update SQL query:

UPDATE table SET x = x + 1 WHERE id = y;
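As a rough sketch of how such a query might be issued from PHP, the following uses PDO with an in-memory SQLite database. The `counters` table, the `views` column and the `incrementViews` helper are assumptions for illustration; the point is only that PHP never reads the current value, so there is no stale copy to write back.

```php
<?php
// Atomic increment sketch; table, column and function names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE counters (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)');
$db->exec('INSERT INTO counters (id, views) VALUES (1, 1001)');

// The database computes views + 1 itself, inside a single statement.
function incrementViews(PDO $db, int $id): void
{
    $stmt = $db->prepare('UPDATE counters SET views = views + 1 WHERE id = :id');
    $stmt->execute([':id' => $id]);
}

// Two "requests" each record one page view, in any order.
incrementViews($db, 1);
incrementViews($db, 1);

$finalViews = (int) $db->query('SELECT views FROM counters WHERE id = 1')->fetchColumn();
echo $finalViews, PHP_EOL; // prints 1003
```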

For the usage of transactional locks, see:

Be wary of using any kind of "active records" or other database abstraction tool, as they can be too leaky and prevent you from fully utilising your database's concurrency control techniques.

If you still want to keep this stateful computation inside your PHP process, then you will need to implement a cache consistency model, which allows your processes to coordinate their mutations, so they end up being consistent.
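One common way for processes to coordinate, though by no means the only one and not necessarily what the library above should have used, is optimistic concurrency control: every row carries a version number, and a write only succeeds if the version is still the one that was read. The sketch below assumes a hypothetical `counters` table with a `version` column and simulates two overlapping requests sequentially against in-memory SQLite.

```php
<?php
// Optimistic concurrency sketch; the schema and helper names are assumptions.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE counters (
    id INTEGER PRIMARY KEY, views INTEGER NOT NULL, version INTEGER NOT NULL
)');
$db->exec('INSERT INTO counters (id, views, version) VALUES (1, 1001, 1)');

// Loads the row together with the version it had when it was read.
function loadCounter(PDO $db, int $id): array
{
    $stmt = $db->prepare('SELECT views, version FROM counters WHERE id = :id');
    $stmt->execute([':id' => $id]);
    return $stmt->fetch(PDO::FETCH_ASSOC);
}

// Writes only if nobody else has bumped the version in the meantime.
function saveCounter(PDO $db, int $id, int $views, int $expectedVersion): bool
{
    $stmt = $db->prepare(
        'UPDATE counters SET views = :views, version = version + 1
         WHERE id = :id AND version = :version'
    );
    $stmt->execute([':views' => $views, ':id' => $id, ':version' => $expectedVersion]);
    return $stmt->rowCount() === 1;
}

$a = loadCounter($db, 1); // both requests read version 1
$b = loadCounter($db, 1);

$aSaved = saveCounter($db, 1, (int) $a['views'] + 1, (int) $a['version']);
$bSaved = saveCounter($db, 1, (int) $b['views'] + 1, (int) $b['version']);
$bFirstAttempt = $bSaved; // false: B's copy was stale, so the write was rejected

if (!$bSaved) {
    // Re-read the fresh state and retry instead of overwriting A's update.
    $b = loadCounter($db, 1);
    $bSaved = saveCounter($db, 1, (int) $b['views'] + 1, (int) $b['version']);
}

$finalViews = (int) loadCounter($db, 1)['views'];
echo $finalViews, PHP_EOL; // prints 1003
```

The stale write is detected rather than silently applied; a real application would also need to bound the number of retries.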

Look into cache-coherency for more information on this.

A Note About Sessions

Due to the shared-nothing architecture, if your PHP application manages the sessions itself, then you had better make sure that every request for a particular user session goes to the same PHP runtime. Otherwise you run into the same cache-coherency problems, but for sessions. Most web servers support this kind of endpoint pinning.

Otherwise you need some way of sharing the sessions across your endpoints.
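One possible way to share sessions is to store them in the database that all endpoints already talk to. The following is a minimal sketch, assuming PHP 8 and a hypothetical `sessions` table; the `PdoSessionHandler` class is invented for illustration, and it uses `REPLACE INTO`, which SQLite and MySQL accept but other databases may not. It does no session locking, so two concurrent requests on the same session can still overwrite each other.

```php
<?php
// Hypothetical PDO-backed session store; class and table names are assumptions.
final class PdoSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $db) {}

    public function open(string $path, string $name): bool { return true; }

    public function close(): bool { return true; }

    // Returns the serialized session data, or '' for an unknown session.
    public function read(string $id): string|false
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = :id');
        $stmt->execute([':id' => $id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : (string) $data;
    }

    public function write(string $id, string $data): bool
    {
        $stmt = $this->db->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (:id, :data, :now)'
        );
        return $stmt->execute([':id' => $id, ':data' => $data, ':now' => time()]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE id = :id');
        return $stmt->execute([':id' => $id]);
    }

    // Deletes sessions that have not been written for $max_lifetime seconds.
    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE updated_at < :cutoff');
        $stmt->execute([':cutoff' => time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

// Demo: two handlers stand in for two endpoints. Here they share one in-memory
// connection; real endpoints would each connect to the same database server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE sessions (
    id TEXT PRIMARY KEY, data TEXT NOT NULL, updated_at INTEGER NOT NULL
)');

$endpointA = new PdoSessionHandler($db);
$endpointB = new PdoSessionHandler($db);

$endpointA->write('abc123', 'user_id|i:42;');
$seenByB = $endpointB->read('abc123');
echo $seenByB, PHP_EOL; // prints user_id|i:42;
```

In a real application the handler would be registered with `session_set_save_handler($handler, true)` before calling `session_start()`, rather than being called directly as in this demo.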


shalvah commented Mar 8, 2018

Thanks for taking the time to write this!

@MichaelObi

Thanks a lot for this.


mykeels commented Mar 8, 2018

"On the first request, I queried the data as 1000, added 1 to get 1002"

Should this be "On the first request, I queried the data as 1000, added 1 to get 1001" or "On the first request, I queried the data as 1001, added 1 to get 1002" @CMCDragonkai?

@CMCDragonkai
Author

The first. Updated thanks.
