Skip to content

Instantly share code, notes, and snippets.

@tedivm
Last active December 18, 2015 00:29
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tedivm/5696677 to your computer and use it in GitHub Desktop.
Save tedivm/5696677 to your computer and use it in GitHub Desktop.
Caching Proposal Differences

Caching Proposal Differences

This document outlines the difference between the two Caching PSRs-

tedivm's proposal - https://github.com/tedivm/fig-standards/blob/Cache/proposed/PSR-Cache.md

tedivm's original PR - php-fig/fig-standards#17

tedivm's PR - php-fig/fig-standards#140

Dragoonis's proposal - https://github.com/dragoonis/fig-standards/blob/master/proposed/psr-cache.md

Dragoonis's PR - php-fig/fig-standards#96

It's highly recommended that you read through both proposals before reading this document.

Disclaimer

I am the author of the majority of both of these proposals. However, I am bias towards the original proposal I wrote (tedivm's proposal, tedivm being me) and this document is primarily written to demonstrate why.

Proposal Text Differences

This section outlines the differences in the text of the proposals, going on a section by section basis.

Introduction

This is literally lifted from my proposal verbatim, so there are no differences to speak of.

Definitions

Many of these definitions are taken from my proposal. There are modifications, which I'll go through.

  • TTL - The only real difference here is that my proposal also takes a DateTime object. I think this is reasonable option, but it's also reasonable to take it out.

  • Expiration - This is also mostly from my proposal, but removes the references to a DateTime object.

  • Key - This one has some major differences. My proposal goes into detail about what's expected of a key and how libraries should handle them. It also reserves some special characters that shouldn't be used here- the original reason for this, as discussed on the list, was to allow those characters to be used by future proposals that extend how caching works. Removing that text removes many of the options we have for the future extensions.

  • Miss - This is absent from the reworked proposal. I feel this definition removes ambiguity on how the caching systems should be expected to work.

  • Implementing Library, Calling Library - These are in the original proposal primarily to make explanations easier later on. I feel it makes things more clear.

Data

This section is missing from the Dragoonis Proposal. I think this section is important- it explicitly outlines what caching libraries need to support in order to be interoperable, and how they should handle situations where they for some reason can't do that. Data is the entire point of caching, so how caching libraries handle data needs to be clear.

Although this seems like a simple list of data types, it does also mention a points that really do need to be considered.

  • String Encoding - all strings should be put in and pulled out without issue, regardless of their encoding.

  • Large Integers - very large numbers can be problematic, and it's possible that caching libraries implementing this spec won't properly test or handle them if it's not stated that they should.

  • Null - the system needs to be able to store null values.

  • Objects - whether or not objects themselves can be cached, and what the requirements for that are, is a pretty important factor that should be defined by this standard.

Key Concepts

In the original proposal this section was used to describe the concepts behind the Pool (called Cache in the revised proposal) and Item classes. I think it's important for anyone using or implementing these libraries to understand the purpose behind the individual classes, and feel this section should never have been removed.

The Interfaces

As with the rest of the proposal, there are both huge similarities and stark differences between these proposals. My proposal was the basis for most of the text in the Dragoonis Proposal, although much of the content was removed and changes were obviously made.

Names

There are a lot of small changes to names, which really are up to the preferences of the group. There are some areas where I think we can be more consistent, and others where our previous conversations are relevant, but for the most part these are superficial issues which require consistency and commitment more than ideology at this point.

  • Item versus Item Interface - This depends on the preferences of the group, but I'm not sure the Interface part is really needed after the name.

  • get vs getValue - In my proposal the name is "get", rather than "getValue". I think this is more consistent, as in both proposals values are stored using the "set" function rather than a "setValue" one.

  • isHit vs isValid - this got switched around a lot on the list. Another option discussed was isMiss (since most of the time you're only performing a special action is when a miss occurs). There seemed to be a lot of strong opinions over this, and is one of those areas where consensus seems to be needed.

  • Pool vs CacheInterface - I had originally named my "Pool" class as Cache, and then switched it to Pool after conversations on the mailing list. Reverting the name is fine, but does ignore that previous conversation. The main logic behind the change was that the work Cache itself was loaded and could mean a few different things to different people, so renaming it to something like Pool helped distinguish it's unique functionality.

  • getItem versus get - My proposal names the function which returns the Item "getItem" so it's clear it's being returned. With the revised proposal it's just called "get", which I think can cause confusion (get and set are both used, but apply to different types of data altogether) and is inconsistent with the Item class's use of "getValue" already.

Multiples

There are two ways to handle this that are very similar, but have different implications. One is to simply send an array of keys to a set of functions that return the results as another set of arrays. This method is the one proposed in the dragoonis proposal. My proposal attempts to use an object (an item iterator object to be specific) as the return object. This has some performance and emulation benefits- by using objects the underlying library can optimize calls in a number of ways (running batches of multiple gets to allow processing to begin sooner, queuing set operations so they don't all have to be stored in memory, etc), and it makes it easier to emulate support for drivers that don't support it (a custom iterator can just make the cache call each time instead of merging them into a single call).

There are a lot of benefits to each option, and they both need to be explored more.

Comments

Simply put, I feel my proposal has better and more in depth comments. People can take the interfaces themselves and use them as a pretty solid start to creating their actual implementations. Many of the comments on the dragoonis proposal are comments lifted verbatim from my proposal, only with a significant portion edited out for no apparent reason. Standards should not be ambiguous, and having more in depth commenting benefits everyone.

Additional Problems

  • Weak TTL - By allowing silent failure when TTLs aren't supported the revised proposal makes using TTLs impossible. If you depend on them working and they don't you're basically out of luck, so no one can actually use this dependably.

  • Weak Keys - There is a recommendation on what cache keys should have, but no actual standard. The wording is also really vague, and it's not entirely clear whether the ,._- symbols are allowed or discouraged. Keys really, really need to be properly defined as part of this standard.

Weak Versus Strong Item Class

A major difference between the two proposals is the strength of the Item class. My proposal encapsulates all actions related to individual cache entrees to the Item class itself, while the Pool class handles all of the actions related to the caching system as a whole. The other proposal weakens the Item object by mixing in both individual entry and full system actions into it's version of the Pool class.

The crux of this centers around the Set and Remove functions. These two functions each operate on individual cache entrees. This has a huge number of repercussions which I'll explore below.

Single Responsibility Principle

Classes benefit greatly by being responsible for one man function. It raises the level of cohesion for the individual classes while lowering coupling with others. The Single Responsibility Principle is a well known one, and the benefits of it can easily be found with some quick googling.

The revised proposal breaks this model by placing multiple levels of responsibility inside it's Cache class. Although there is an Item object, it doesn't contain all of the functionality of an Item.

##Clear and Remove

With the Weak Item class, the super class (CacheInterface in the Dragoonis proposal) is responsible for clearing individual items and the entire cache. This means there are two functions, "clear" and "remove", in the same class. A user of the libraries who isn't paying particular attention to the "clear" and "remove" functions can make some silly but meaningful mistakes. This also just leads to general confusing, forcing people to refer to the documentation to confirm they're doing the right thing.

##API Consistency

Splitting the functionality up for individual cache entrees results in very inconsistent APIs. The get function in the cache class returns a class, but the set function is expecting a series of parameters (key, value, ttl). Putting the set function back into the Item class clarifies the API significantly by placing all of the value manipulation functions into a single class.

##Unclear Behavior

By splitting the responsibility for managing individual cache items between two classes the Dragoonis proposal leads to potentially unclear situations.

The simplest example of this is when a cache item gets updated- in my proposal the Item object itself gets the new value passed to it via the set method, which allow it to keep it's internal data consistent. This means that future calls to the "get" function will return the updated information. A library attempting to do that with the Dragoonis proposal would be forced to implement a series of observers, or make redundant calls to the caching system.

##Reduced Boilerplate

By placing all of the item manipulation functions in the Item class itself, we reduce the need to throw the "key" value around everywhere. We use it a single time to generate the item, and then have the ability to get, set and remove the item from the cache system without the need for a key. This may not seem like a big deal, but it makes working with multiple items at once a lot easier.

##Stateful versus Stateless

Since the Dragoonis puts the set and remove functions in the Cache class they are extremely limited in what information they can store between requests. The Item class itself is partially stateful, in that it stores some metadata as well as it's value. This is why functions in the Item class do not need a key passed to them, while those in the Cache class do. Although this may not have a huge affect on libraries that incorporating caching libraries, it will limit what the authors of those libraries will be capable of doing. There is a significant amount of functionality that libraries can add to make caching more efficient behind the scenes that is certainly beyond the scope of this standard, but just as certainly shouldn't be limited by it.

Process Issues

The process of how the new proposal was developed is cause for concern. As is evident by anyone who reads both proposals, a significant- if not majority- of the content in both comes from my proposal. My proposal was worked on for over a year and included input from many members of this group, and included far more than my own opinions and work.

When the new proposal was put together no one followed anything resembling best practices, and instead of creating a branch off of my repository they started fresh. The copy/paste ninjas came out and placed the text from my proposal into theirs, removing all of the history, attribution, comments and reasoning behind the changes made. More so, they made the bulk of their changes over a couple of days in IRC without consulting or talking with the group about those changes, and essentially excluded me from the process despite leaning heavily on my work.

I have no doubt that there are good ideas in that proposal. However, the process made it hard to find those and work with them. By starting completely fresh with another competing proposal it no longer became about individual issues, as everything was clumped into one big change. Ideally those changes could have been made as pull requests against my repository, allowing for discussion of them on an individual basis. Instead my proposal was essentially discarded and I was forced into a position where I had to defend it as a whole, despite the fact that most of the alternate proposal was also written by me.

Recommendations

In order to get this proposal passed I think there are some steps that should be taken. I personally think these are reasonable, but am happy to adjust them based on other input.

  • The first step should be moving back to the tedivm standard as the basis for the caching proposal. This isn't to say the work should be discarded, because I don't believe it should be. However, I think discarding the history and work of the tedivm proposal is also flawed. Going back to my original proposal allows us build on top of that, and that building should include the work people have been putting into other proposals.

  • Create another fork of the tedivm branch and incorporate the changes from the Dragoonis proposal. These changes can then be submitted back as Pull Requests, which will allow them to be discussed in an organized way by the mailing list. We can put them to a vote after a certain timeframe or do whatever is needed to gain closure on those issues.

  • As the first PR to consider, we should focus on the Weak versus Strong Item question. I honestly feel it is the biggest technical hurdle to getting the caching proposal passed, and it has huge ramifications for how this standard can be implemented. This is the hardest question left, and should be solved.

  • Alongside the WvSI question, individual items such as definition changes, removed comments, and other smaller scope items should be submitted as additional PRs and voted on quickly to keep the process moving.

  • Certain functionality, such as dealing with Multiple actions, Incrementing/Decrementing, and Group Invalidation should be held off into a second caching proposal. These are very important items, but will need more time to be developed. Getting the first simple proposal passed will give libraries a chance to implement that while the next more complex phase is worked out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment