@chancancode
Last active December 16, 2015 21:29
Some thoughts on string-based caching for AM::S

Observations

JSON generation is quite slow, so caching the serializable hash is not enough. On a cache read we still pay for marshal/unmarshal and hash merging (thanks wycats for pointing that out), and then for generating the JSON. String stitching is a lot faster by comparison. I have seen a significant speedup in a real-world application (a side project of mine) from a hand-rolled stitching solution built on erb and rabl fragments, and my (very) basic benchmark seems to support this.

Thoughts

  1. "Russian doll caching"

    Basically, things should be cached separately and each object should get its own cache key. Assuming you are correctly using touch: true on your associations, this should be quite easy to accomplish. Fragments should be reused as much as possible. For example, when one of the posts in a topic is edited, only the fragment for that specific post should be re-rendered; everything else should already have an existing fragment that's ready to be stitched together with the new post fragment.
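    A minimal sketch of per-object fragment caching, assuming a plain Hash in place of Rails.cache and hypothetical `Post`, `FragmentCache`, `post_fragment` and `posts_json` names. Each post is cached under its own updated_at-based key, and the collection is produced by joining cached strings, so editing one post re-renders only that post's fragment.

```ruby
require 'json'

# Hypothetical stand-in for an ActiveRecord model: cache_key combines the id
# with updated_at, like Rails' updated_at-based keys.
Post = Struct.new(:id, :body, :updated_at) do
  def cache_key
    "posts/#{id}-#{updated_at}"
  end
end

# A plain Hash standing in for Rails.cache; `misses` counts how many
# fragments actually had to be rendered.
class FragmentCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  def fetch(key)
    @store.fetch(key) do
      @misses += 1
      @store[key] = yield
    end
  end
end

# Each post gets its own fragment under its own cache key.
def post_fragment(cache, post)
  cache.fetch(post.cache_key) { JSON.generate({ "id" => post.id, "body" => post.body }) }
end

# The collection is stitched from cached strings; nothing is re-parsed.
def posts_json(cache, posts)
  "[" + posts.map { |post| post_fragment(cache, post) }.join(",") + "]"
end

cache = FragmentCache.new
posts = [Post.new(1, "a", 100), Post.new(2, "b", 100), Post.new(3, "c", 100)]
posts_json(cache, posts)                  # renders all three fragments
posts[1] = Post.new(2, "b (edited)", 200) # editing bumps updated_at
json = posts_json(cache, posts)           # re-renders only post 2
```

    After the edit, `cache.misses` is 4: three initial renders plus one for the edited post.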

  2. Side-loading (embed IDs)

    For embedded associations, this is basically 1:1 analogous to Russian doll caching in Rails views, so we should be able to look up a topic from the DB and decide whether we can reuse what we have or have to load all the posts for this topic. I don't want side-loaded associations to incur the unnecessary penalty of fetching all the objects just to determine whether they have changed. In theory this should not be a problem; the only requirement is that any change in your side-loaded associations must be reflected in the parent's cache key (e.g. touch: true and updated_at-based cache keys). I think this is sane, but I might have missed some cases where this is not possible or desirable.
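    A sketch of the "parent key covers side-loaded children" requirement, assuming touch: true keeps the topic's updated_at current. `SideLoadingCache` and the `posts_for` loader are hypothetical; the loader stands in for the DB query, and the counter shows that on a cache hit the posts are never fetched.

```ruby
require 'json'

# Hypothetical model: with `touch: true` on Post's belongs_to :topic, editing a
# post bumps the topic's updated_at, so the topic's key also covers its posts.
Topic = Struct.new(:id, :updated_at) do
  def cache_key
    "topics/#{id}-#{updated_at}"
  end
end

class SideLoadingCache
  attr_reader :post_loads

  # `posts_for` is a hypothetical callable standing in for the query that
  # loads a topic's posts; we count calls to show when it is skipped.
  def initialize(posts_for)
    @store = {}
    @posts_for = posts_for
    @post_loads = 0
  end

  # Only the topic row is needed to decide whether the cached payload is
  # still valid; the posts are loaded on a miss only.
  def render(topic)
    @store[topic.cache_key] ||= begin
      @post_loads += 1
      posts = @posts_for.call(topic)
      JSON.generate({
        "topic" => { "id" => topic.id, "post_ids" => posts.map { |p| p["id"] } },
        "posts" => posts
      })
    end
  end
end

loader = ->(topic) { [{ "id" => 1, "topic_id" => topic.id }, { "id" => 2, "topic_id" => topic.id }] }
cache = SideLoadingCache.new(loader)
cache.render(Topic.new(7, 100))
cache.render(Topic.new(7, 100))           # hit: the posts are not loaded again
payload = cache.render(Topic.new(7, 200)) # a touched topic forces a reload
```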

  3. Metadata

    Metadata presents basically the same problem as side-loading. For this to work, any metadata change must result in a cache key change. I'm not sure we can do this automatically for the user; they will probably have to override the cache key method to explicitly take any external metadata dependency into account. I'm not 100% sure what people use metadata for besides the obvious pagination, counts, etc., so I'm not sure whether the requirement makes sense. I am also wondering whether there are cases where you might want to reuse the cached fragments for the object(s) but recompute the metadata every time (e.g. to add a freshness timestamp). If we need to support that, we might need some sort of configuration option.
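    A sketch of one way the "reuse fragments, recompute metadata" option could behave, assuming a plain Hash cache and hypothetical `payload_cache_key` and `render_page` helpers. The cached data fragment is keyed on the record keys plus the page number (its metadata dependency); the `meta` object is regenerated on every call and stitched around the cached string.

```ruby
require 'json'
require 'digest'

# Hypothetical payload key: it must cover every input the cached part depends
# on, here the records' own cache keys plus the page number.
def payload_cache_key(record_keys, meta_deps)
  parts = record_keys + meta_deps.sort.map { |name, value| "#{name}=#{value}" }
  Digest::SHA256.hexdigest(parts.join("|"))
end

CACHE = {}

# One possible "configuration": cache the data fragment, but recompute the
# metadata (e.g. a freshness timestamp) on every request and stitch it in.
def render_page(record_keys, records, page:, total:, generated_at:)
  key = payload_cache_key(record_keys, { "page" => page })
  data = (CACHE[key] ||= JSON.generate(records))
  meta = JSON.generate({ "page" => page, "total" => total, "generated_at" => generated_at })
  %({"data":#{data},"meta":#{meta}})
end

first  = render_page(["posts/1-100"], [{ "id" => 1 }], page: 1, total: 10, generated_at: "t1")
second = render_page(["posts/1-100"], [{ "id" => 1 }], page: 1, total: 11, generated_at: "t2")
```

    Both calls share one cached data fragment, while `total` and `generated_at` reflect the latest call.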

  4. Stitching

    Stitching is generally quite "fragile" and "dangerous", in the sense that it's easy to generate bad data. But I think the speed benefit easily outweighs the cons, and I'm hoping that with extensive testing we can be relatively confident that we generate valid JSON. I think this is one of the more reusable parts of this project, so I'm thinking maybe we should make it a separate gem and have AM::S depend on it. Thoughts?
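    The "extensive testing" could include randomized round-trip checks. Below is a minimal sketch, assuming a trivial hypothetical `stitch_array` as the code under test: for random rows (including quotes and newlines that need escaping), the stitched string must parse and decode to the same data as the original rows.

```ruby
require 'json'

# The stitcher under test: joins pre-rendered element fragments into an array.
def stitch_array(fragments)
  "[#{fragments.join(',')}]"
end

# Hypothetical randomized round-trip check: the stitched string must parse,
# and must decode to the same data as the rows it was built from.
def stitching_round_trips?(rng, trials: 200)
  trials.times do
    rows = Array.new(rng.rand(0..5)) do |i|
      { "id" => i, "title" => "t\"#{rng.rand(100)}\n", "tags" => Array.new(rng.rand(0..3)) { rng.rand(10) } }
    end
    stitched = stitch_array(rows.map { |row| JSON.generate(row) })
    return false unless JSON.parse(stitched) == rows
  end
  true
end

puts stitching_round_trips?(Random.new(42))
```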

    Challenges/thoughts:

    Make this work with all popular JSON gems. We need to do things like throwing away square brackets, which might be non-trivial when working with multiple JSON gems.
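    A sketch of bracket stripping that does not assume a particular output style, using hypothetical `array_body` and `concat_arrays` helpers. It tolerates the whitespace a pretty-printer adds (compact `JSON.generate` output and `JSON.pretty_generate` output are mixed here) and skips empty arrays so no dangling comma is emitted. Other JSON gems are not exercised; that they only differ in whitespace is an assumption.

```ruby
require 'json'

# Hypothetical helper: removes the outer [ ] from a JSON array string so its
# elements can be spliced into another array. It tolerates surrounding
# whitespace and returns "" for an empty array.
def array_body(json)
  s = json.strip
  raise ArgumentError, "not a JSON array: #{json.inspect}" unless s.start_with?("[") && s.end_with?("]")
  s[1...-1].strip
end

# Concatenates several JSON array strings into one, skipping empty bodies so
# we never emit a dangling comma.
def concat_arrays(*jsons)
  "[" + jsons.map { |json| array_body(json) }.reject(&:empty?).join(",") + "]"
end

compact = JSON.generate([1, 2])
pretty  = JSON.pretty_generate([{ "a" => 3 }])
result  = concat_arrays(compact, "[]", pretty)
```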

    I think we might want to cache object fragments, e.g. '{ "a": 1, "b": 2 }' => '"a": 1, "b": 2'. The two main use cases are: 1. stitching together the "base" fields with the "extended" fields, and 2. "ID arrays"/identity maps, e.g. '{ "1": { "id": 1, ... }, "2": { "id": 2, ... } }'. Although I don't think either of these is particularly popular in the AM::S world, they are definitely a thing in other libraries like Rabl (their "inheritance" is basically #1, if I understand correctly) and Brainstem. If possible, I'd like this piece to be reusable by other libraries as well.
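    A sketch of both use cases with hypothetical helpers: `object_body` turns '{ "a": 1, "b": 2 }' into '"a": 1, "b": 2', `merge_objects` stitches base and extended fields, and `id_map` builds an identity map from per-record fragments. None of this checks for duplicate keys.

```ruby
require 'json'

# Hypothetical helper: strips the outer { } so the body can be merged with
# other object bodies by plain concatenation.
def object_body(json)
  s = json.strip
  raise ArgumentError, "not a JSON object: #{json.inspect}" unless s.start_with?("{") && s.end_with?("}")
  s[1...-1].strip
end

# Use case 1: stitch cached "base" fields with cached "extended" fields.
def merge_objects(*jsons)
  "{" + jsons.map { |json| object_body(json) }.reject(&:empty?).join(",") + "}"
end

# Use case 2: an ID map such as { "1": { "id": 1, ... }, ... }, built from
# per-record fragments keyed by id (ids are emitted as JSON string keys).
def id_map(fragments_by_id)
  "{" + fragments_by_id.map { |id, fragment| "#{id.to_s.to_json}:#{fragment}" }.join(",") + "}"
end

base     = '{ "id": 1, "title": "Hello" }'
extended = '{ "body": "...", "comments_count": 2 }'
full = merge_objects(base, extended)
map  = id_map({ 1 => JSON.generate({ "id" => 1 }), 2 => JSON.generate({ "id" => 2 }) })
```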

    ...although caching object fragments introduces tricky problems like key collisions (two fragments defining the same key).
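    One hedged option for key collisions is a guard that runs only in tests or development, since it parses every fragment and so gives up the speed benefit. Below is a sketch with a hypothetical `colliding_keys` helper that reports keys defined by more than one object fragment.

```ruby
require 'json'
require 'set'

# Hypothetical dev/test-mode guard: before stitching object bodies together,
# report every key that appears in more than one fragment.
def colliding_keys(*object_jsons)
  seen = Set.new
  dupes = Set.new
  object_jsons.each do |json|
    JSON.parse(json).each_key do |key|
      # Set#add? returns nil when the key was already present.
      dupes << key unless seen.add?(key)
    end
  end
  dupes.to_a
end

colliding_keys('{"id": 1, "title": "Hi"}', '{"id": 1, "body": "..."}') # => ["id"]
```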

    Inserting arbitrary content in the middle of a JSON string. For example, let's say the parent object (Topic) changed but the associations can be reused. We can obviously just throw them away and regenerate them anyway, but it would be nice if we could feed some sort of placeholder to the JSON library and replace that part of the output with the cached string.
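    A library-agnostic way to get a placeholder is to render the parent with a unique sentinel string and substitute afterwards. This is a minimal sketch with a hypothetical `render_with_cached` helper; the random token is an assumption meant to make accidental matches against real data practically impossible. The block form of `String#sub` is used so backslashes in the cached JSON are not treated as replacement escapes.

```ruby
require 'json'
require 'securerandom'

# Hypothetical helper: render the (changed) parent with a unique sentinel
# string where the association goes, then swap the sentinel, including its
# quotes, for the cached association JSON.
def render_with_cached(parent_hash, key, cached_json)
  token = "__am_s_placeholder_#{SecureRandom.hex(16)}__"
  json = JSON.generate(parent_hash.merge({ key => token }))
  quoted = token.to_json # the sentinel as it appears in the output
  json.sub(quoted) { cached_json }
end

cached_posts = JSON.generate([{ "id" => 1, "body" => "back\\slash" }])
out = render_with_cached({ "id" => 7, "title" => "New title" }, "posts", cached_posts)
```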

  5. Serializer Digest

    Just as Rails added a template digest to the cache key, we would also need to do that for our serializers. Ideally we would do it exactly like view caching and use a digest of the source file. However, if people use non-conventional filenames and require them manually, this won't work. They could also have multiple serializers in the same file, which also makes it unreliable. So I guess we will need to somehow create a digest from the internal state of the serializer class, which I'm not sure how to do yet.
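    One possible direction, sketched against a hypothetical minimal serializer DSL (`BaseSerializer`, `attributes`, `has_many` here are illustrations, not the AM::S API): digest the declared attributes and associations in declaration order, since order affects the output, and recurse into association serializers so a nested change also changes the parent's digest. A known gap is that edits inside custom attribute methods are not captured.

```ruby
require 'digest'

# Hypothetical minimal serializer DSL, just enough internal state to digest.
class BaseSerializer
  def self._attributes
    @_attributes ||= []
  end

  def self._associations
    @_associations ||= {}
  end

  def self.attributes(*names)
    _attributes.concat(names)
  end

  def self.has_many(assoc, serializer:)
    _associations[assoc] = serializer
  end

  # Digest of the declared configuration rather than of the source file,
  # recursing into association serializers so nested changes propagate.
  # `seen` guards against cycles. Custom method bodies are NOT captured.
  def self.digest(seen = [])
    return "cycle:#{name}" if seen.include?(self)

    parts = [name, _attributes.map(&:to_s).join(",")]
    _associations.each do |assoc, klass|
      parts << "#{assoc}:#{klass.digest(seen + [self])}"
    end
    Digest::SHA256.hexdigest(parts.join("|"))
  end
end

class CommentSerializer < BaseSerializer
  attributes :id, :body
end

class PostSerializer < BaseSerializer
  attributes :id, :title
  has_many :comments, serializer: CommentSerializer
end

before = PostSerializer.digest
CommentSerializer.attributes :author_name # simulate editing the nested serializer
after = PostSerializer.digest
```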

  6. ArraySerializers

    Should we cache arrays, or should we always cache only the fragments and stitch them together? This is tricky because we'll need to automatically come up with a (reasonably cheap) cache key based on the content of the array (unlike associations, you can't rely on touch: true). I suppose it could be a digest of all the cache keys in the array, but that might be expensive to compute.
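    A sketch of the "digest of all the cache keys" idea, with a hypothetical `Item` model and `array_cache_key` helper. The cost is linear in the array length, but it needs only each element's key (id plus updated_at), not its rendered fragment. Element order is included in the digest because reordering changes the output.

```ruby
require 'digest'

# Hypothetical element type with an updated_at-based cache key.
Item = Struct.new(:id, :updated_at) do
  def cache_key
    "items/#{id}-#{updated_at}"
  end
end

# Hypothetical array cache key: a digest over every element's cache key, in
# order, with a newline separator so adjacent keys cannot run together.
def array_cache_key(items, prefix: "items/array")
  digest = Digest::SHA256.new
  items.each { |item| digest.update(item.cache_key).update("\n") }
  "#{prefix}/#{items.size}-#{digest.hexdigest}"
end

items = [Item.new(1, 100), Item.new(2, 100)]
key = array_cache_key(items)
key_after_edit = array_cache_key([Item.new(1, 100), Item.new(2, 200)])
key_reordered = array_cache_key(items.reverse)
```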
