A short explainer on how Graphcache does what it does

How does it do the thing?!

There are two main things that Graphcache does as part of the app's lifecycle:

  1. Querying results from the cache
  2. Writing results to the cache

By default we attempt to answer any query definition from the cache, unless it's been marked as network-only. In urql, query operations are already tagged as queries and come from useQuery / client.executeQuery.

Whenever any result comes back from your API, Graphcache writes it to the cache by traversing the query/mutation/subscription definition and storing all normalised "entities" and "links" between them.

We call any object that is derived from a GraphQLObjectType an entity and assume that it carries a __typename field. All non-entity values are called scalars (e.g. numbers, booleans, etc.).

When writing or reading results we rely on selection sets in the GraphQL query definition. Any node that has a selection set (e.g. field { id }) can be assumed to have a value that is an entity. Any node that does not have a selection set (e.g. just field) can be assumed to always be a scalar.

Note: For the sake of handling failing edge-cases gracefully, we also treat entities that are missing the __typename field as scalars.

Since fields can also have arguments, we key fields using the field's name and the field's arguments (the arguments are normalized and stably stringified). That's a field key. So a field id becomes "id", and a field name(lastname: true) becomes 'name({"lastname":true})'.
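As a rough illustration of the field key described above, here is a minimal sketch. The names keyOfField and stableStringify are made up for this example and are not taken from the Graphcache source; the only behaviour assumed is that object keys are sorted so that argument order doesn't matter.

```typescript
type Variables = { [name: string]: unknown };

// Stringify a JSON-like value with object keys sorted, so that the same
// arguments always produce the same string regardless of their order.
function stableStringify(value: unknown): string {
  if (value === null || value === undefined) return 'null';
  if (typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) {
    return '[' + value.map(item => stableStringify(item)).join(',') + ']';
  }
  const obj = value as { [key: string]: unknown };
  const parts = Object.keys(obj)
    .sort()
    .filter(key => obj[key] !== undefined)
    .map(key => JSON.stringify(key) + ':' + stableStringify(obj[key]));
  return '{' + parts.join(',') + '}';
}

// A field without arguments is keyed by its name alone; otherwise the
// stringified arguments are appended in parentheses.
function keyOfField(fieldName: string, args?: Variables | null): string {
  if (!args || Object.keys(args).length === 0) return fieldName;
  return fieldName + '(' + stableStringify(args) + ')';
}
```

With this sketch, keyOfField('name', { lastname: true }) gives the same 'name({"lastname":true})' string as in the text, and two calls that differ only in argument order share a key.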

Any selection set on an entity can hence refer to another entity, a list of entities, or null. We call these connections between entities links. Links are just either null, a key that refers to an entity, or a list of keys (that may also contain null) that all refer to entities.

These keys are derived from the entities themselves. This is the actual process of normalization, and it's very simple. By default we simply use: data.__typename + ':' + (data.id || data._id).

When __typename is missing, or when neither id nor _id is present, the entity is not keyable. In that case the key of the entity becomes the parent entity's key plus the field key: parentKey + fieldKey. This may be repeated for as long as nested entities aren't keyable.
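The default keying and its fallback could look roughly like this. It is only a sketch: the function names are invented here, and the '.' separator between parent key and field key is an assumption added for readability (the text only says parentKey + fieldKey).

```typescript
type Data = {
  __typename?: string;
  id?: string | number;
  _id?: string | number;
  [field: string]: unknown;
};

// Mirrors the default formula from the text:
// data.__typename + ':' + (data.id || data._id)
// Returns null when the entity is not keyable.
function keyOfEntity(data: Data): string | null {
  const id = data.id || data._id;
  if (!data.__typename || !id) return null;
  return data.__typename + ':' + id;
}

// Falls back to the parent's key plus the field key for unkeyable entities.
// The '.' separator is an assumption of this sketch.
function resolveKey(data: Data, parentKey: string, fieldKey: string): string {
  const key = keyOfEntity(data);
  return key !== null ? key : parentKey + '.' + fieldKey;
}
```

Because resolveKey takes the parent's key as input, nesting several unkeyable entities simply keeps appending field keys.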

Entities aren't stored as a whole, but are actually stored as individual fields on a large key-value map. Every key is the entity's key plus the field key.

Note: For any type the keying function can also be customised using the keys config.
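A keys config might be shaped like the sketch below. The types Item and Geo and the uuid field are hypothetical, the config shape (a map from typename to a function) is written from memory rather than copied from the source, and prefixing the custom key with the typename is an assumption of this sketch.

```typescript
type Data = {
  __typename?: string;
  id?: string | number;
  _id?: string | number;
  [field: string]: unknown;
};

type KeyGenerator = (data: Data) => string | null;
type KeysConfig = { [typename: string]: KeyGenerator };

const keys: KeysConfig = {
  // Hypothetical type that is identified by a `uuid` field instead of `id`
  Item: data => (typeof data.uuid === 'string' ? data.uuid : null),
  // Hypothetical type that should never be keyed on its own
  Geo: () => null,
};

// Uses the custom key function for a type if there is one, and the
// default id / _id lookup otherwise.
function keyOfEntity(data: Data, config: KeysConfig): string | null {
  const typename = data.__typename;
  if (!typename) return null;
  const custom = config[typename];
  const id = custom ? custom(data) : data.id || data._id;
  return id ? typename + ':' + id : null;
}
```

Returning null from a key function makes the entity unkeyable, so it falls back to the parent key plus field key as described earlier.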

The process of walking a definition to either read or write a result is very similar. During this process we also collect dependencies. Dependencies are just a Set of keys that have been either read or written during a traversal.

When we're writing or reading on the root Query type, then we track the field keys on Query. When we're writing other entities we only track the entity keys.

The Exchange for urql keeps track of these dependencies. Every query that is in use must be updated when some of its dependencies have been written during a write traversal.
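The bookkeeping behind this can be sketched as a map from operation keys to dependency sets. All names here (trackRead, operationsToUpdate) are invented for the example and say nothing about how the exchange actually structures this.

```typescript
type OperationKey = number;

// For every active query: the set of keys it read during its last traversal.
const dependenciesByOperation = new Map<OperationKey, Set<string>>();

function trackRead(operationKey: OperationKey, dependencies: Set<string>): void {
  dependenciesByOperation.set(operationKey, dependencies);
}

// Given the keys written by a result, find every active query that read
// at least one of them and therefore needs to be updated.
function operationsToUpdate(written: Set<string>): OperationKey[] {
  const stale: OperationKey[] = [];
  dependenciesByOperation.forEach((dependencies, operationKey) => {
    let overlaps = false;
    written.forEach(key => {
      if (dependencies.has(key)) overlaps = true;
    });
    if (overlaps) stale.push(operationKey);
  });
  return stale;
}
```

A query that read Query.todos and Todo:1 would be picked up by a mutation result that writes Todo:1, while a query that only read Author:2 would be left alone.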

When a fragment is encountered during traversal we need to determine whether the fragment's type condition matches the __typename of the entity it's trying to match against at that point. This is trivial when the type condition is exactly the same as __typename. However, since GraphQL supports interfaces there may be fragments that match a "generic type" that isn't __typename. There are two ways we solve this:

  • Without a schema we check whether all of the fragment's fields have been cached
  • With a schema we check whether the schema knows that the fragment is on an interface that matches __typename
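The schema-less heuristic from the first bullet can be sketched as follows. FragmentInfo, the hasField callback, and the '.' separator are assumptions made for this example.

```typescript
interface FragmentInfo {
  typeCondition: string;
  // Field keys selected directly by the fragment
  fieldKeys: string[];
}

// Without a schema: an exact type match always passes; otherwise we guess
// that the fragment applies if every one of its fields is already cached
// for this entity.
function isFragmentHeuristicallyMatching(
  fragment: FragmentInfo,
  typename: string,
  entityKey: string,
  hasField: (key: string) => boolean
): boolean {
  if (fragment.typeCondition === typename) return true;
  return fragment.fieldKeys.every(fieldKey => hasField(entityKey + '.' + fieldKey));
}
```

This is a guess rather than a guarantee, which is why a schema gives a more reliable answer when one is available.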

There are two customisations that can be made to manually alter traversal on top of this.

For querying there are cache resolvers. These can build up keys that the traversal can continue at or read scalars from the cache.
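A cache resolver that builds up a key might look like the sketch below. The Query.todo field is hypothetical, the types are local to this example, and the resolver signature is simplified to (parent, args); treat the exact shape of the config as an assumption rather than as the library's API.

```typescript
type Args = { [name: string]: unknown };
type Resolver = (parent: unknown, args: Args) => string | number | boolean | null;
type ResolverConfig = { [typename: string]: { [fieldName: string]: Resolver } };

const resolvers: ResolverConfig = {
  Query: {
    // Hypothetical field `todo(id: ...)`: instead of reading a stored link,
    // point the traversal at the entity key built from the arguments.
    todo: (_parent, args) => 'Todo:' + String(args.id),
  },
};
```

The point of such a resolver is that a todo(id: 1) query can be answered from a Todo:1 entity that was originally written by a different query, such as a list.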

For writing there are updaters (which only apply to mutations and subscriptions, but not queries!). These can add other updates that should also be made on the cache by starting a nested traversal (a nested write or a fragment update).

As mentioned before, internally, each field on a definition is traversed separately and stored separately. Since undefined is not a valid JSON/GraphQL value, it is consistently used to check/mark the absence of a field, i.e. it not being cached.

Internally our data structure consists of two key-value maps:

  1. A map for entity fields (key: entityKey + fieldKey, value: any scalar)
  2. A map for links (key: entityKey + fieldKey, value: any link)
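To make the two maps concrete, here is a heavily simplified write sketch. It walks the result data instead of the query definition, ignores arguments and custom keys, and uses a '.' separator; all of that, along with every name in it, is an assumption of this example and not how the real traversal works.

```typescript
type Scalar = string | number | boolean | null;
type Link = null | string | Array<string | null>;
interface Entity {
  __typename: string;
  id?: string | number;
  [field: string]: unknown;
}

const records = new Map<string, Scalar>(); // entityKey + fieldKey -> scalar
const links = new Map<string, Link>(); // entityKey + fieldKey -> link

const isEntity = (value: unknown): value is Entity =>
  typeof value === 'object' &&
  value !== null &&
  !Array.isArray(value) &&
  typeof (value as Entity).__typename === 'string';

// Writes a nested entity under its own key (or the fallback key when it
// has no id) and returns the key so the parent can link to it.
function writeLinked(data: Entity, fallbackKey: string): string {
  const key = data.id !== undefined ? data.__typename + ':' + data.id : fallbackKey;
  writeEntity(key, data);
  return key;
}

// Stores every field of an entity separately: scalars go into `records`,
// references to other entities go into `links`.
function writeEntity(entityKey: string, data: Entity): void {
  Object.keys(data).forEach(fieldKey => {
    const value = data[fieldKey];
    const key = entityKey + '.' + fieldKey;
    if (isEntity(value)) {
      links.set(key, writeLinked(value, key));
    } else if (Array.isArray(value) && value.some(isEntity)) {
      links.set(
        key,
        value.map((item, index) => (isEntity(item) ? writeLinked(item, key + '.' + index) : null))
      );
    } else {
      records.set(key, value as Scalar);
    }
  });
}
```

Writing { todo: { __typename: 'Todo', id: 1, text: 'hi' } } under the Query key leaves a link Query.todo pointing at Todo:1 and a record Todo:1.text holding 'hi'.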

Any entity field or link may not have an entry at any point when we're reading from our cache. This is an undefined entry. When that happens there are two things we can do:

  • If we have a schema, we check whether that particular field on that type is nullable. If it is then we simply set that field to null on the query result.
  • If the field is not nullable or we don't have a schema we can't set the field to null. So the entire entity that is being traversed and queried at that point is replaced by undefined.

This is useful during traversal, because it's essentially nested or recursive. When a field is not nullable and undefined is returned, the parent will also check whether the entity that it was querying recursively is on a nullable field.
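The recursive behaviour can be sketched like this. The Selection shape, the IsNullable callback, and the module-level maps are stand-ins invented for the example; lists of links and field arguments are left out to keep it short.

```typescript
// `true` marks a scalar field, a nested object marks a field with a selection set.
type Selection = { [fieldName: string]: true | Selection };
type IsNullable = (typename: string, fieldName: string) => boolean;
type Result = { [fieldName: string]: unknown };

const records = new Map<string, unknown>();
const links = new Map<string, string | null>();

// Returns the entity's data, or undefined when a non-nullable field is
// missing from the cache. Without a schema, pass `() => false`.
function readEntity(entityKey: string, select: Selection, isNullable: IsNullable): Result | undefined {
  const typename = records.get(entityKey + '.__typename');
  if (typeof typename !== 'string') return undefined;

  const result: Result = { __typename: typename };
  const fieldNames = Object.keys(select);
  for (let i = 0; i < fieldNames.length; i++) {
    const fieldName = fieldNames[i];
    const node = select[fieldName];
    const key = entityKey + '.' + fieldName;

    let value: unknown;
    if (node === true) {
      value = records.get(key);
    } else {
      const link = links.get(key);
      if (link === undefined) value = undefined;
      else if (link === null) value = null;
      else value = readEntity(link, node, isNullable);
    }

    if (value === undefined) {
      // Cache miss: either null the field or give up on this whole entity,
      // which lets the parent make the same decision one level up.
      if (!isNullable(typename, fieldName)) return undefined;
      value = null;
    }
    result[fieldName] = value;
  }
  return result;
}
```

If Author.name is missing and non-nullable, the author entity reads as undefined; the parent Todo then sets author to null only if Todo.author is nullable, and otherwise becomes undefined itself.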

This nullable cascade can domino upwards until it reaches the Query root type. Query can very well be null, but when all fields on Query are nullable, instead of giving the user an empty object, we replace the entire result with null as a convenient shortcut.

Our internal data structure is an immutable HAMT, which may later be used to make rollbacks or other diffing efficient. For now it's convenient because we've also made it our layer for optimistic updates.

We can set, get, or remove entries on our internal K/V map. But with the library that we've built we can also call setOptimistic to set values optimistically. This is important if we want to make temporary changes to any piece of data in the cache.

We make these temporary changes, or optimistic updates, for mutations. There's a config that can be provided that's called optimistic. We expect it to be a user-provided object of functions for mutation fields, that return optimistic results for mutations. These optimistic changes will be written to the cache like a normal result, but they're identified by the urql operation key.
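An optimistic config along these lines might look like the sketch below. The favoriteTodo mutation field is hypothetical, the types are local to this example, and the function signature is reduced to the mutation's arguments; the real config may pass more parameters.

```typescript
type Args = { [name: string]: unknown };
type OptimisticResult = { __typename: string; [field: string]: unknown } | null;
type OptimisticConfig = { [mutationField: string]: (args: Args) => OptimisticResult };

const optimistic: OptimisticConfig = {
  // Hypothetical mutation field `favoriteTodo(id: ...)`: guess what the API
  // will return so the cache can be updated before the network responds.
  favoriteTodo: args => ({ __typename: 'Todo', id: args.id, favorite: true }),
};
```

The returned object includes __typename and id so that it normalises to the same entity key as the real result will.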

When a network result comes in, we check whether we've made optimistic updates for that operation key, and roll back the optimistic changes if necessary.
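The idea of optimistic entries identified by operation key can be sketched with layered maps. Note that this uses plain mutable Maps rather than an immutable HAMT, and that only the setOptimistic name comes from the text; its signature, the LayeredMap class, and clearOptimistic are assumptions of this example.

```typescript
interface Layer<V> {
  operationKey: number;
  values: Map<string, V>;
}

class LayeredMap<V> {
  private base = new Map<string, V>();
  private layers: Layer<V>[] = [];

  set(key: string, value: V): void {
    this.base.set(key, value);
  }

  // Optimistic values live in a separate layer per operation key and
  // never touch the base data.
  setOptimistic(operationKey: number, key: string, value: V): void {
    let layer = this.layers.filter(l => l.operationKey === operationKey)[0];
    if (!layer) {
      layer = { operationKey, values: new Map<string, V>() };
      this.layers.push(layer);
    }
    layer.values.set(key, value);
  }

  // Reads check the most recent optimistic layer first, then the base map.
  get(key: string): V | undefined {
    for (let i = this.layers.length - 1; i >= 0; i--) {
      const values = this.layers[i].values;
      if (values.has(key)) return values.get(key);
    }
    return this.base.get(key);
  }

  // Rolling back is just dropping the operation's layer.
  clearOptimistic(operationKey: number): void {
    this.layers = this.layers.filter(l => l.operationKey !== operationKey);
  }
}
```

When the network result for an operation arrives, its layer is cleared first and the real result is then written to the base map like any other result.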

Lastly, we have to be careful around network results. Any network result can't simply be passed through, since cache resolvers may alter the result. This is where we do a root read. We take the network data and update it with data from the cache, which also takes the user's resolvers into account, but is careful to reuse the usual query logic.

Conclusion

When reading the cache's source code after this short summary, it'll hopefully become more obvious what it does at any point of src/operations/. Generally we've been careful to reuse code but also careful to choose the right APIs that balance user convenience with conciseness.

This generally is also reflected by the cache's data structure. We handle optimistic changes on the data structure level so that they become trivial for the cache.

We have helpers to traverse selection sets and a Store and SchemaPredicates class that carefully abstract away data and schema tasks from src/operations/.

Lastly we have a set of generic GraphQL AST helpers that are used here and there.

The resulting code in src/operations is often surprisingly compact 🎉
