Dear reader,
This is a small article to explain the vision and strategy for replacing the TYPO3 caches with a PSR-6 implementation. In the article I will explain the background of caches in TYPO3, the similarity it has to the PSR-6 standard, and the strategy for replacing such a vital component with as few problems as possible.
Current status: framework is almost entirely replaced and the vision described here is being prepared as merge requests.
Note, I have intentionally put the references to the chosen framework last in this article. Please read the article before you check the footnote references - that way you will keep in mind all my decisions when reading about the capabilities of the chosen third-party library.
For the longest time - almost since the very beginning - TYPO3 has used a sort of "mixed caches" strategy and conceptually made a difference between what I call persistent caches (files that get generated and do not expire or have very long life) and the more volatile caches like those storing generated content, which usually lived for only a day by default.
Later, but still much, much earlier than PSR-6 arrived, TYPO3 created a proper caching framework that allowed users to more closely configure how caches would behave, and provided an API that lets integrators change these things as well.
This solution has served us well for many years, but the arrival of PSR-6 made it reasonable to re-evaluate - in particular because of the many similarities between TYPO3 caches and the PSR-6 standard.
In this sense, very little is required to migrate to a different cache implementation that is built on standards but provided by a third-party library - one that is very open to being extended with new drivers for different engines.
The vision is simple enough to state:
Replace the concept of "cache backends" in TYPO3 with the "cache pool" concept from PSR-6
Implementing it is however a multi-step process. I've described this plan in outline on https://forge.typo3.org/issues/81432 and will elaborate on that plan here.
I apologise for the rather technical nature; if you're not a developer this may be hard to relate to, but I'll try to explain in "human words" why each step is a good idea to pursue. Developers will likely immediately understand the reasons and implementation decisions; non-developers will hopefully get a good idea of how TYPO3 caching will work after the operation.
Without further ado and in no particular order:
The core of the vision is to replace our backends which implies that our frontends should be changed so they speak to these new replacement "backends". Hereafter I will refer to TYPO3 cache backends as "backends" and PSR-6 "backends" as "cache pools". Currently TYPO3 contains several cache frontends:
- StringFrontend which only handles strings and is compatible with the most basic backends
- VariableFrontend which is capable of storing other variables than simple strings (by serializing)
- PhpFrontend which only works with a particular set of backends that store generated PHP code
- Slightly off-topic: FluidCache from TYPO3 is theoretically also a cache-frontend, specifically a PhpFrontend, but contains a few adaptations that make this TYPO3 frontend work with the much simpler FluidCacheInterface implementation.
Jumping here for a second to the similarities between TYPO3 caches and PSR-6, the TYPO3 backends are currently a rather complex structure of several interfaces which communicate different capabilities. However, PSR-6 has no such concept of differences in the cache pools and instead provides a generic API and leaves it up to each implementation which type of values it supports and how it stores those values.
Because of this unified backend interface it now makes sense to unify the TYPO3 cache frontends to such a degree that we will have only one frontend (but preserve the option for developers to create custom frontends).
In bulletpoint form this means:
- The single cache frontend will be CacheFrontend and it will speak to PSR-6 cache pools.
- I'm currently attempting to somehow unify the FluidCache, but worst case scenario: there will be one subclass of this CacheFrontend that does not speak to cache pools but rather acts as a bridge between TYPO3 and Fluid caches. Best case, there will be no need for a Fluid cache implementation (and in theory you could use any storage for Fluid caches too).
- All current frontends become deprecated but the interface is preserved (with one significant change which I'll explain later in the article as a separate section).
- All current backends become deprecated as well (which I will also explain in a separate section).
The conclusion regarding cache frontends is that the public contract must be preserved as much as possible because it is the way TYPO3 and extensions integrate with caches. Preserving the FrontendInterface but deprecating all current frontends in favor of a universal PSR-6 aware frontend serves this goal perfectly.
The vision for this has two main inspirations:
- A failed patch I created recently for TYPO3 - https://review.typo3.org/#/c/52415/ which would have allowed zero-config caches to be exploited.
- The current lack of support for switching all core cache backends (there are a lot of them, quite a collection if one were to override each one individually).
This is the first of three side-features that will help integrators and developers with a few vital aspects:
- Make it significantly easier to migrate existing cache configurations to the new expected configuration
- Allow integrators to switch all core caches from one backend to another with a single setting
- Make it possible to operate a cache with system defaults without explicitly declaring that the cache exists
- Make cache backend "engines" possible to consume separately from caches (described later)
To achieve this the TYPO3 cache configurations need to be significantly trimmed down and utilise a configured default rather than, as it is now, only refer to a (protected) PHP class property which contains defaults. The next step is to allow the previously hardcoded defaults to be configured as part of TYPO3's system configuration. And finally this makes it easily achievable to operate caches on-the-fly without first having configured them.
The resulting TYPO3 cache configuration might look like:
// DefaultConfiguration.php -> SYS.caching.cacheConfigurations
// Shared base configuration which applies to all caches. Custom configurations
// are then array_replaced on top of these defaults.
'_default_' => [
    'frontend' => \TYPO3\CMS\Core\Cache\Frontend\CacheFrontend::class,
    'backend' => \Cache\Adapter\Filesystem\FilesystemCachePool::class,
    'communicator' => 'FlysystemCache', // refers to configured "communicator"
    'options' => [
        'defaultLifetime' => 0,
    ],
    'groups' => []
],
// The cache_core cache is for core php code only and must
// not be abused by third party extensions.
'cache_core' => [
    'backend' => \TYPO3\CMS\Core\Cache\GeneratedPhpCache::class,
    'groups' => ['system']
],
'cache_hash' => [
    'groups' => ['pages']
],
'cache_pages' => [
    'groups' => ['pages']
],
'cache_pagesection' => [
    'groups' => ['pages']
],
'cache_phpcode' => [
    'backend' => \TYPO3\CMS\Core\Cache\GeneratedPhpCache::class,
    'groups' => ['system']
],
'cache_runtime' => [
    'backend' => \Cache\Adapter\PHPArray\ArrayCachePool::class,
],
'cache_rootline' => [
    'groups' => ['pages']
],
'cache_imagesizes' => [
    'groups' => ['lowlevel'],
],
'assets' => [
    'groups' => ['system']
],
'l10n' => [
    'groups' => ['system']
],
'fluid_template' => [
    'frontend' => \TYPO3\CMS\Fluid\Core\Cache\FluidTemplateCache::class,
    'backend' => \TYPO3\CMS\Core\Cache\GeneratedPhpCache::class,
    'groups' => ['system'],
],
'extbase_object' => [
    'groups' => ['system']
],
'extbase_reflection' => [
    'groups' => ['system']
],
'extbase_datamapfactory_datamap' => [
    'groups' => ['system'],
],
This would be the configuration, in its entirety, required to operate TYPO3. All caches would, unless explicitly configured, simply use the system default cache configuration (the _default_ set in config). As you can see from this example, for nearly all caches the only configuration needed is the groups list that associates the cache with a group so that it can be flushed when you flush a group of caches. Anything else has the nature of an override that you only define if you need the value to be different from the default.
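The inheritance from _default_ described above can be sketched in a few lines. This is only an illustration under assumptions: the helper name resolveCacheConfiguration() is hypothetical, and the class names are replaced by plain strings so that the example runs on its own.

```php
<?php
declare(strict_types=1);

/**
 * Resolve the effective configuration for one cache identifier by laying the
 * cache's own (possibly missing) configuration on top of the '_default_' entry.
 * Hypothetical helper; name and signature are assumptions for illustration.
 */
function resolveCacheConfiguration(array $cacheConfigurations, string $identifier): array
{
    $defaults = $cacheConfigurations['_default_'] ?? [];
    // An identifier without an entry is a "zero-config" cache: it inherits everything.
    $overrides = $cacheConfigurations[$identifier] ?? [];

    // Shallow replace: a key present in the override wins as a whole.
    return array_replace($defaults, $overrides);
}

// Placeholder strings stand in for the real class names.
$cacheConfigurations = [
    '_default_' => [
        'frontend' => 'CacheFrontend',
        'backend' => 'FilesystemCachePool',
        'communicator' => 'FlysystemCache',
        'options' => ['defaultLifetime' => 0],
        'groups' => [],
    ],
    'cache_pages' => ['groups' => ['pages']],
    'cache_runtime' => ['backend' => 'ArrayCachePool'],
];

$pages = resolveCacheConfiguration($cacheConfigurations, 'cache_pages');
$runtime = resolveCacheConfiguration($cacheConfigurations, 'cache_runtime');
$unconfigured = resolveCacheConfiguration($cacheConfigurations, 'my_unconfigured_cache');
```

Because array_replace() is shallow, a cache that overrides 'options' replaces the whole options array rather than merging single keys into it; whether the real implementation merges deeper is not something this sketch tries to answer.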
Because no specific configuration is required to operate a cache (any zero-config entry would simply inherit all options from the _default_ set) there is only a tiny step more to allow caches to "exist" without demanding that they have an entry in this cache configurations array. The only challenge that needs to be overcome there is that, since the cache would not be explicitly configured and might not be initialized on all requests, TYPO3 might not know to flush the cache when a user flushes caches in the backend. The straightforward solution to this would be to store an entry in cache_core which simply applies a single line of additional TYPO3 configuration:
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['dynamically_used_identifier']['groups'] =
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['dynamically_used_identifier']['groups'] ?? ['all'];
Then, including this file will inform CacheManager that the on-the-fly cache exists and should be cleared along with "all" caches. And by using null coalesce in assignment the assignment only happens until the developer or integrator decides to configure the cache. Until then, it can be used freely without worrying about configuration.
This pragma will be created as a separate feature merge request.
The kicker is, with a proper zero-config capability that simply uses system defaults, migration can actually be done by deleting your custom configuration of a cache with frontend and backend class names. Delete, rinse, use system defaults that support any type, and you're done. Configuration that isn't there has zero risk of containing bad values. And when it is possible to change a single global default to switch cache strategy it becomes increasingly easy to migrate a site which might depend on caches being in DB, for example.
Which is why I decided to include this even though it is a feature not specifically related to PSR-6.
Like the first feature, this one is not specifically related to PSR-6 migration but makes perfect sense if combined with a larger caching framework refactoring. In essence, the feature consists of just three new methods on FrontendInterface:
/**
 * Lock a specific entry. When an entry is locked, it does not matter if the entry itself
 * exists or not - any attempt to get() the entry will throw an "Entry Locked" exception
 * and it is up to the consuming code to determine what to do (e.g. display a temporary
 * message, do sleep() to wait for lock release, automatically remove stale locks, etc.)
 *
 * Call this when your implementation is generating cached content on-the-fly and the
 * generating happens in user-land, and you want to avoid multiple threads working on
 * creating the cached entry simultaneously.
 *
 * (moved to caching frontend as native feature instead of depending on consumers to do
 * locking and waiting on entries)
 *
 * @param string $entryIdentifier
 * @return bool
 */
public function lock($entryIdentifier);

/**
 * Unlocks a cache entry that was locked with lock($entryIdentifier). Can be used to remove
 * stale lock entries - and must be called by consumers after doing set() on a locked entry.
 *
 * @param string $entryIdentifier
 * @return bool
 */
public function unlock($entryIdentifier);

/**
 * Gracefully checks isLocked($entryIdentifier) until either the maximum number of retries
 * is reached, or the entry gets unlocked - whichever comes first.
 *
 * @param string $entryIdentifier Identifier to get
 * @param int $retries Number of times to retry
 * @return mixed
 */
public function await($entryIdentifier, $retries = 3);
If this is done along with the deprecation of all current frontends it presents little problem. The deprecated frontends can simply implement a no-op solution and the CacheFrontend can be fitted with support for identifier locking.
The code comments should already explain what the purpose is, but to make it easier to consume:
- The idea is that a specific identifier can be locked externally while expensive functions fill the cache entry.
- A locked identifier that you get() will throw a WaitException.
- If you use await() the frontend will wait for the lock to release and fail after some retries.
In essence, it is a lock that prevents multiple threads from attempting to write the same cache entry. One of the more prominent examples of this in real life is the "page is being generated" page that gets displayed when one thread is generating page content and another is requesting the same page. The pragma there is condensed to a single function called await() which for example can let the page renderer wait 3 seconds and retry every second, instead of failing with a temporary page.
In addition: by applying the concept of locking directly to the cache frontend, filling a locked entry immediately releases the lock and makes await() return the entry on next retry, with no requirements put on the code that consumes the cache to support such lock waiting.
The benefits from introducing locking combined with the ideal opportunity to add it during the refactoring, is why I chose to create this feature as part of the overall PSR-6 refactoring. While it can of course be used to replace the current "page is being generated" page in the (near) future, it also makes such locking available to extensions which work with heavy data sources.
"But what if your cache is distributed; locks need to be distributed too then!" you may be thinking right now. To calm that worry: the implementation of locks is done in such a way that locks are stored right alongside the actual entries of the cache. Therefore, if the cache engine is distributed then your locks are distributed as well. You would actually need to override classes to make it behave otherwise (and you would then be asking for the problems you would inevitably get).
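To make the locking behaviour a bit more tangible, here is a minimal in-memory sketch. It is not the actual CacheFrontend: the class name LockingCacheSketch, the EntryLockedException, the lock key prefix, the extra wait-time argument on await() and the choice to return false on a miss are all assumptions for illustration. What it does take from the description above is that lock markers live in the same storage as the entries and that filling a locked entry releases the lock.

```php
<?php
declare(strict_types=1);

/** Hypothetical exception; stands in for the "entry locked" error described above. */
final class EntryLockedException extends RuntimeException
{
}

/** In-memory sketch: entries and lock markers share one storage array. */
final class LockingCacheSketch
{
    private const LOCK_PREFIX = 'lock_';

    private array $storage = [];

    public function lock(string $entryIdentifier): bool
    {
        if ($this->isLocked($entryIdentifier)) {
            return false; // someone else is already generating this entry
        }
        $this->storage[self::LOCK_PREFIX . $entryIdentifier] = time();
        return true;
    }

    public function unlock(string $entryIdentifier): bool
    {
        $wasLocked = $this->isLocked($entryIdentifier);
        unset($this->storage[self::LOCK_PREFIX . $entryIdentifier]);
        return $wasLocked;
    }

    public function isLocked(string $entryIdentifier): bool
    {
        return isset($this->storage[self::LOCK_PREFIX . $entryIdentifier]);
    }

    public function set(string $entryIdentifier, $value): void
    {
        $this->storage[$entryIdentifier] = $value;
        // Assumption: filling an entry releases its lock.
        $this->unlock($entryIdentifier);
    }

    public function get(string $entryIdentifier)
    {
        if ($this->isLocked($entryIdentifier)) {
            throw new EntryLockedException('Entry "' . $entryIdentifier . '" is locked');
        }
        return $this->storage[$entryIdentifier] ?? false; // false signals a miss in this sketch
    }

    public function await(string $entryIdentifier, int $retries = 3, int $waitMicroseconds = 1000000)
    {
        for ($attempt = 0; $attempt < $retries; $attempt++) {
            if (!$this->isLocked($entryIdentifier)) {
                break;
            }
            usleep($waitMicroseconds);
        }
        // Throws EntryLockedException if the lock outlived all retries.
        return $this->get($entryIdentifier);
    }
}

$cache = new LockingCacheSketch();
$cache->lock('page_1');               // this request starts rendering
$cache->set('page_1', '<html>...');   // filling the entry releases the lock
$content = $cache->await('page_1');   // a waiting request now gets the entry
```

In a single PHP process the waiting obviously cannot be observed; with a shared storage such as Redis or the filesystem, a second request would sit in await() while the first one renders.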
At this point, describing the actual procedure of replacing TYPO3 cache backends with PSR-6 cache pools is trivial:
- The unified CacheFrontend will be programmed to a slightly different interface.
- The configuration will be automatically migrated and logged (like TCA migration) in all cases where this is possible.
- When automatic migration is not possible, an adapter for TYPO3 cache backends fitted with a PSR-6 interface will be used.
But the last point on this list should raise some questions in you, the reader. The immediate question is:
If there's an adapter for the TYPO3 backends that make them PSR-6 compatible, why is everything else described in this article even necessary?
The answer to that is found in part in the introduction. The overall goal is to be able to use third party PSR-6 cache implementations directly in TYPO3. The opposite approach of the one I selected would also be possible, but IMHO makes less sense on a technical level. The opposite approach being to create a TYPO3 cache backend that speaks to PSR-6 cache pools.
But rather than fit PSR-6 to our cache inventions I chose to do the opposite. And with that, eliminate the need for TYPO3 to maintain a (rather impressive) collection of cache backends which already have community-maintained alternatives in the chosen PSR-6 cache package (or via Doctrine caches). Bonus feature: we gain access to the concept of chained caches as a simple matter of configuration - something that with the current TYPO3 caching framework requires some overrides such as the ones I created in https://github.com/NamelessCoder/typo3-cms-multilevel-cache to add this exact capability in TYPO3.
Because there now are alternatives for nearly all TYPO3 cache backends, the ones we ship (possibly with one or two exceptions) become technically redundant - and the PSR-6 alternatives with very few differences function as near drop-in replacements for our cache backends. This is why deprecations become reasonable.
There are two parts of this migration that are yet undecided:
- Whether or not to migrate the "Typo3DatabaseBackend" to a PSR-6 compatible version, or select a third-party library that can work with for example a Doctrine querybuilder as only input. Although technically undecided it is expected that this specific cache backend will indeed be recreated as a cache pool (and will be schema compatible if possible)
- What should be the fate of the "PdoBackend" which may no longer be remotely relevant since switching to Doctrine.
All in all those are minor details in the whole. Should migration of both be required, it is still easily achieved.
You could argue - and you'd be right - that many sites are currently using custom cache backend implementations. In fact, there are community extensions which provide custom cache backends and all of these use the current TYPO3 caching framework interfaces and so on.
The migration to PSR-6 alternatives does make nearly all TYPO3 core cache backends redundant, and as such it becomes quite reasonable to drop those in favor of the alternatives. It does not however become equally reasonable to completely drop all support for custom cache backends via the TYPO3 caching framework. Instead, this can be a deprecation period of any length desired, with slight performance drop as the only sacrifice you make by not migrating.
The way this can be achieved:
- We deprecate all of our current cache backends but preserve the interfaces
- We create a single "delegate backend" which adopts all our legacy capability-signaling interfaces
- We automatically convert configurations to use this backend when a configuration does not specify a frontend
- We convert configurations which use any TYPO3 core cache frontend to use this delegate with the unified CacheFrontend
By being selective about which configurations we actually do migrate on-the-fly, we can preserve the ability for developers to configure the following cases:
- A custom frontend and backend which do not speak PSR-6 function unaffected (and both get logged).
- A custom frontend which speaks PSR-6 still functions with legacy backends via a bridge (and backend use gets logged).
- A configured core frontend class name gets converted to one that speaks PSR-6. If the configured backend doesn't speak PSR-6 then the bridge is implemented between the two.
The result is that all existing cache configurations are either automatically rewritten or allowed to work as-is, depending on the exact combination of frontend and backend. In essence: TYPO3 caching framework exists alongside PSR-6 caching for as long as this is wanted. Once finally ready to be removed, the selection logic and bridge are removed and errors thrown when an incompatible frontend or backend (read: one that does not use PSR-6 as API) is configured.
The goal here is a soft deprecation, simply due to the vital nature and widespread usage of our existing API. Only when it is certain that a configuration works with PSR-6 adapters will it even be attempted bridged.
The deprecation of our backends brings me to the final feature yielded from this research.
One of the key differences between the chosen cache library and existing cache backends is this: in the replacement library each cache backend does not contain the logic that speaks to a service or file system. A separate class does this.
To illustrate let me compare three of the implementations: Redis, Memcached and Flysystem.
- Redis in TYPO3 cache backend creates an internal Redis object instance and uses it.
- Whereas the new cache pool alternative takes the Redis object as constructor argument.
- Memcached in TYPO3 internally selects the appropriate Memcached or Memcache object then uses it.
- Whereas the cache pool alternative has two implementations, one for Memcached and one for Memcache, each taking the appropriate object as constructor argument.
- Filesystem caches in TYPO3 cache backends use direct file access and have no delegate.
- Whereas the cache pool alternative uses Flysystem which - no surprise - is provided to the backend as constructor arg.
This makes it patently obvious that it becomes necessary to be able to separate the concept of a "cache backend" from the concept of "the thing that communicates with the service that stores caches". Which is what the concept of "communicators" does in the following way:
- A shared interface, CommunicatorInterface, is provided.
- The interface has methods that receive configuration options for the communication, e.g. host and port number.
- The option setting from TYPO3 cache backends is adopted (e.g. declare option port and method setPort($port) will be called to set the actual value; and any option that does not have a setter causes an error).
This allows us to define a "communicator" by class name and provide options for the construction of the class, just like we provided options for the construction of cache backends (taking Memcached as example there is a servers option which is an array of hostname:port style definitions; that same option is present on the MemcachedCommunicator). And once we've defined the communicator it can then be created via TYPO3 API and used as constructor arguments for pools which require a particular type of service, e.g. Redis.
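The setter convention described above (option port calls setPort($port), unknown options cause an error) could look roughly like this. It is a sketch only: AbstractCommunicatorSketch and ExampleServiceCommunicator are hypothetical names, and the real CommunicatorInterface may well be shaped differently.

```php
<?php
declare(strict_types=1);

/** Hypothetical base class; only the setter convention is taken from the text. */
abstract class AbstractCommunicatorSketch
{
    public function __construct(array $options = [])
    {
        foreach ($options as $name => $value) {
            // Option "port" maps to setPort(), "host" to setHost(), and so on.
            $setter = 'set' . ucfirst($name);
            if (!method_exists($this, $setter)) {
                throw new InvalidArgumentException(
                    'Invalid option "' . $name . '" for ' . static::class
                );
            }
            $this->{$setter}($value);
        }
    }
}

/** Hypothetical communicator for some remote service reachable by host and port. */
final class ExampleServiceCommunicator extends AbstractCommunicatorSketch
{
    private string $host = '127.0.0.1';
    private int $port = 6379;

    public function setHost(string $host): void
    {
        $this->host = $host;
    }

    public function setPort(int $port): void
    {
        $this->port = $port;
    }

    public function getAddress(): string
    {
        return $this->host . ':' . $this->port;
    }
}

$communicator = new ExampleServiceCommunicator([
    'host' => 'cache.example.org',
    'port' => 6380,
]);
echo $communicator->getAddress(), PHP_EOL; // prints "cache.example.org:6380"
```

A cache pool that needs the service would then receive whatever object the communicator creates as its constructor argument; that last step is left out here because it depends on the concrete pool.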
The implementation I chose for this is as follows:
// Runtime-registered communicators with configuration; assoc array
// [$communicatorClassOrName => ['class' => $className, 'options' => $optionalOptionsArray]].
// Classes not registered can still be loaded by class name directly.
$GLOBALS['TYPO3_CONF_VARS']['SYS']['communicators'] = [
    'FlysystemCache' => [
        'class' => \TYPO3\CMS\Core\Communication\Native\FlysystemCommunicator::class,
        'options' => [
            'directory' => 'typo3temp/var/Cache/Doctrine/',
            'skipVerifyConnect' => true
        ]
    ]
];
If you refer to the example cache configurations array in a previous section, you can see that the filesystem-based caches refer to this communicator by name, which then causes the factory that instantiates configured communicators to use this configuration. Alternatively it is also possible to configure a communicator for a cache pool by class name alone:
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['my_cache'] = [
    'backend' => \My\Psr6\Backend::class,
    'communicator' => \My\Psr6\Communicator::class,
    'options' => [
        'optionforcommunicator' => 'foobar'
    ],
    'groups' => ['pages']
];
Which would then create an instance of your communicator and attempt to configure it with this option.
You may have noticed at this point that communicators and legacy cache backends use the exact same options location. This is done to make the implementations drop-in replaceable: TYPO3's MemcachedBackend can be replaced by the Memcached cache pool plus the MemcachedCommunicator without having to rename or move options (including those options that may be overridden by additional configuration, extensions, hooks, etc.)
If it was not immediately obvious, the goal is to convert all the remote-service cache backends TYPO3 currently contains into "communicators" that support all the same options but can now be combined with PSR-6 caches.
The bonus capability is that we gain a proper API for system-level configuration for remote services that can then be used also by extensions; for example to expose configuration options for a Redis service that you consume manually in your code. The impact is of course that it is no longer necessary for an extension developer to create configuration options that configure services the extension uses - TYPO3 now contains an API for this exact thing along with communicators for the most popular such remote services that can now be used not just for caches but for any desired purpose. It also gives TYPO3 a way to list configured "communicators" and let the user select which one to use, if/when this applies to future features (think GUI to configure caching or TYPO3 core integrations with Redis as key-value storage, etc.)
Once again this isn't explicitly a demand of the PSR-6 migration, but like the features, this change of strategy would make much sense to implement along with this refactoring (as one break rather than several smaller ones in subsequent versions).
The idea is to replace the current database-first caches strategy with one that uses filesystem as default.
Why, you might ask. The reasons are maybe a bit obvious:
- Putting the caches in DB was a much better decision in terms of performance many years ago than it is today; e.g. keys and relational entries. It means less today because filesystems themselves use many of the optimisations a DB would do.
- By "polluting" the SQL connection traffic with caches you not only weigh down the SQL service but also cause slaves to need write access to the DB tables.
- The benefit of DB-based caches in multi-slave environments is acknowledged and true - however, this same argument is a double-edged sword in that the more slaves you add, the more DB-based caches will "pollute" the SQL traffic and the more likely it is you will see compounded load issues.
- There are plenty dedicated services that provide distributed caches much, much more efficiently than SQL could ever do.
- Such dedicated services are infinitely easier to set up than SQL replication (without assistant tools).
- Most importantly perhaps: the majority of TYPO3 sites are assumed to not be multi-slave setups and those that are, we can (IMHO) almost guarantee at some point will have faced issues due to TYPO3 caches being on databases when multiple slaves need to access it with every single request - and have already looked to or switched to some distributed caches.
The arguments that stand out are that database is already distributed in multi-slave setups and that using filesystem as cache needs other measures, like NFS, to become distributed. The combined sum of the points above should also make it clear that 1) the most frequent use case we have does not benefit from DB-based caches - it suffers from it, and 2) those use cases that demand a distributed cache can be reasonably expected to both not want it on DB and have chosen a proper cache service after discovering the rather serious issues TYPO3 has with caches on remote DB servers (numbers not shown here but if you wish, I don't mind sharing a horror story or two over a beer - wink).
The short conclusion to this is simple:
As part of this migration I will switch TYPO3 to filesystem-first caches in order to fit the majority of use cases in the best possible way.
And having the new features I described available hopefully makes it easier to convince you of this being a good idea: to switch to DB storage - or to switch to a proper distributed cache engine while you're at it - you need only change a single configuration option.
Now on to the less pleasant part:
This all boils down to the things that aren't similarities between TYPO3's caching framework and the PSR-6 standard. The main difference is in how tags are understood:
- In TYPO3 it is possible to read a list of identifiers, via public API, by providing a tag.
- In PSR-6 tags are used for invalidation, not fetching.
This difference is expressed in the public API of PSR-6, namely that it has no counterpart for TYPO3's getByTags
method
which is how you read items by tag value (technically: the chosen cache implementation does internally know how to read a
list of identifiers by tag and so is in theory capable of this, but the public contract prohibits it because in PSR-6 tags
are for invalidation, not public fetching).
Therefore, without rewriting nearly all cache pools provided by the chosen library, there is no way to reproduce the public
contract for getByTags
and we are forced to drop this capability from our public API.
It sounds a lot worse than it is though. At the time of writing this, there is exactly ONE implementation of this function in all of TYPO3. It sits in the admin panel shown in FE when you're logged into BE and have the admin panel enabled, and it gets used to count the number of cache entries for a given page ID provided as tag.
That is literally the extent of what we need to sacrifice. It is a small offering that prevents an inordinate amount of overrides we would otherwise need.
You've waded through a lot of argumentation and strategy to get to this point so here is the reward:
The cache library I chose is http://www.php-cache.com/en/latest/ - in the full-monty version which includes every adapter and cache pool. This library provides:
- PSR-6 implementation alternatives for nearly all TYPO3 cache engines.
- Doctrine adapters with PSR-6 interface for the rest, plus many others.
- The chained cache pool which combines multiple pools (for example to make L1+L2 caches or introduce redundancy)
- A PSR-16 (SimpleCache) adapter to extend the support even further
At the time of writing this I've managed to implement the library and fully migrate my development project to use PSR-6 with "communicators" as described above. The incompatibilities are so far limited to:
- Some characters like \ not being allowed by the PSR-6 alternatives; which needs to be handled in the cache frontend.
- Not having the getByTags method available.
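As an illustration of the first point: PSR-6 reserves the characters {}()/\@: in keys, so a frontend has to translate identifiers such as class names before handing them to a pool. The helper below is a hypothetical sketch of one way to do that, not the way the frontend actually does it; it also ignores other key constraints such as length.

```php
<?php
declare(strict_types=1);

/**
 * Map an arbitrary identifier to a key without the characters PSR-6 reserves.
 * Hypothetical helper; the actual frontend may handle this differently.
 */
function toPsr6Key(string $identifier): string
{
    $reserved = '{}()/\\@:';
    if (strpbrk($identifier, $reserved) === false) {
        return $identifier; // already safe, keep it readable
    }
    // Replace every reserved character with an underscore ...
    $readable = strtr($identifier, $reserved, str_repeat('_', strlen($reserved)));
    // ... and append a short hash of the original so that identifiers which
    // differ only in a reserved character do not end up on the same key.
    return $readable . '_' . substr(sha1($identifier), 0, 8);
}

$plainKey = toPsr6Key('cache_pages_42');
$classKey = toPsr6Key('My\\Vendor\\Domain\\Model\\Page');
```

Since the mapping is deterministic, the same identifier always lands on the same key, which is all the frontend needs for get(), set() and remove() to agree with each other.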
In case you are worried about the performance I suggest to pick another topic to worry about ;) so far I tested this with all states of empty/semi-empty/full cache and have profiled the result. In broad terms:
- Yes, switching to filesystem first does increase the performance even with MySQL on localhost
- This includes performance when flushing caches
- Performance is identical for remote service based caches like Memcached or Redis (similar strategies used)
- Surprisingly, several core caches perform better with the alternative cache pools. There are two reasons: the new cache pools require much less serialising and unserialising, and I replaced the current file-based backends with a custom compiled-code cache which generates a PHP file that returns a value immediately when you include it. The current backends serialize/unserialize or even substr() the file source to get the cached value; the new ones include the file, return an array, and are done. They can be used to cache any plain data type including arrays, as well as generated classes.
- Although I did not yet confirm this, I've confirmed with a similar implementation that the "chained cache pool" will increase performance of multi-slave setups significantly (see https://github.com/NamelessCoder/typo3-cms-multilevel-cache)
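The "include the file, return the value" idea behind the compiled-code cache mentioned in the list above can be sketched as follows. This is not the GeneratedPhpCache class itself; the two function names and the file location are made up for the example, and it only covers plain data that var_export() can represent.

```php
<?php
declare(strict_types=1);

/** Write a value as a PHP file that simply returns it (hypothetical helper). */
function writeGeneratedPhpEntry(string $file, $value): void
{
    $code = '<?php' . PHP_EOL . 'return ' . var_export($value, true) . ';' . PHP_EOL;
    file_put_contents($file, $code, LOCK_EX);
}

/** Read the value back by including the file; false signals a miss in this sketch. */
function readGeneratedPhpEntry(string $file)
{
    if (!is_file($file)) {
        return false;
    }
    // No unserialize() and no string slicing: the included file returns the value.
    return include $file;
}

$file = sys_get_temp_dir() . '/generated_php_cache_example.php';
writeGeneratedPhpEntry($file, ['title' => 'Home', 'uids' => [1, 2, 3]]);
$value = readGeneratedPhpEntry($file);
unlink($file);
```

A likely side benefit, though not something measured in this article, is that such files can be held by PHP's opcode cache like any other PHP file.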
And with that, you've reached the end of the article. If something wasn't explained or you have relevant experiences or advice about any of the above, don't hesitate to write a comment!
This is not a flaw per se but something to keep in mind while rolling out and documenting this new (promisingly cool) cache framework: benchmarking should be available in the install tool, so that on a new server instance or on any system you could easily try out what works best.
This might come as useless or overkill, but if you have a Webserver that is not Linux, your performance results with Filesystem-first might not be as good as if you'd use database. I know from personal (painful) experience that anything stored on a Windows File System (that might be encrypted) takes magnitudes of time more to be read than anything in a plain old MySQL database. Therefore an integrated benchmark would be very appreciable for any TYPO3 admin to help him or her to choose what framework is best suited for the underlying infrastructure.