kitten/urql-exchange-graphcache.md

## urql-exchange-graphcache.md

      
## How does it do the thing?!

There are two main caching operations that Graphcache performs as part of the app's lifecycle:

- Querying results from the cache
- Writing results to the cache

By default we attempt to read any query definition from the cache, unless it has been
marked as `network-only`. In urql, operations are already marked as queries when they
come from `useQuery` / `client.executeQuery`.

Whenever any result comes back from your API, Graphcache writes it to the cache
by traversing the query/mutation/subscription definition and storing all normalised
"entities" and the "links" between them.

We call any object that is derived from a `GraphQLObjectType` an entity and assume
that it carries a `__typename` field. All non-entity values are called scalars
(e.g. numbers, booleans, etc.).

When writing or reading results we rely on selection sets in the GraphQL query definition.
Any node that has a selection set (e.g. `field { id }`) can be assumed to have a value
that is an entity. Any node that does not have a selection set (e.g. just `field`)
can be assumed to always be a scalar.

Note: To handle failing edge cases gracefully, we also treat entities that are
missing the `__typename` field as scalars.
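
The entity/scalar distinction above can be sketched as a small type guard. This is only an illustration: `EntityLike` and `isEntity` are hypothetical names, not part of Graphcache's API, and lists of entities would be checked item by item.

```typescript
// Hypothetical shape of an entity: an object carrying a __typename field.
interface EntityLike {
  __typename: string;
  [field: string]: unknown;
}

// A value is treated as an entity only if its field has a selection set
// AND the value is an object with a string __typename.
// Objects without __typename fall through and are handled like scalars.
const isEntity = (value: unknown, hasSelectionSet: boolean): value is EntityLike =>
  hasSelectionSet &&
  typeof value === 'object' &&
  value !== null &&
  !Array.isArray(value) &&
  typeof (value as EntityLike).__typename === 'string';
```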

Since fields can also have arguments, we key fields using the field's name and
the field's arguments (the arguments are normalized and stably stringified). The result
is a field key. So a field `id` becomes `"id"`, and a field `name(lastname: true)` becomes
`'name({"lastname":true})'`.

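
A minimal sketch of how such a field key could be computed. The helper names (`stabilise`, `fieldKey`) are assumptions for illustration; the real implementation may normalize arguments differently. The idea is simply to sort object keys before stringifying so that equal arguments always yield the same key.

```typescript
type Variables = { [name: string]: unknown };

// Recursively sort object keys so that equal arguments always stringify identically.
const stabilise = (value: unknown): unknown => {
  if (Array.isArray(value)) return value.map(stabilise);
  if (typeof value === 'object' && value !== null) {
    const input = value as Variables;
    const sorted: Variables = {};
    for (const key of Object.keys(input).sort()) sorted[key] = stabilise(input[key]);
    return sorted;
  }
  return value;
};

// Field key: just the name when there are no arguments,
// otherwise the name followed by the stably stringified arguments.
const fieldKey = (name: string, args?: Variables | null): string =>
  args && Object.keys(args).length > 0
    ? `${name}(${JSON.stringify(stabilise(args))})`
    : name;
```

With this sketch, `fieldKey('todos', { b: 2, a: 1 })` and `fieldKey('todos', { a: 1, b: 2 })` produce the same key, so argument order does not fragment the cache.
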
Any selection set on an entity can hence refer to another entity, a list of entities, or `null`.
We call these connections between entities links. A link is either `null`, a key
that refers to an entity, or a list of keys (which may also contain `null`) that all refer
to entities.

These keys are derived from the entities themselves. This is the actual process of
normalization, and it is very simple. By default we use:
`data.__typename + ':' + (data.id || data._id)`.

When `__typename` is missing, or when neither `id` nor `_id` is present, the entity is not
keyable. In that case the key of the entity becomes the parent entity's key plus the field
key, i.e. `parentKey + fieldKey`, which may be repeated for as long as nested entities
aren't keyable.

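
The default keying and its fallback might look like the following sketch. `EntityData`, `keyOfEntity`, and `keyOrFallback` are illustrative names, and the `.` separator between the parent key and the field key is an assumption made for readability.

```typescript
interface EntityData {
  __typename?: string;
  id?: string | number | null;
  _id?: string | number | null;
  [field: string]: unknown;
}

// Default key: __typename plus id (or _id). Returns null when the entity is not keyable.
const keyOfEntity = (data: EntityData): string | null => {
  const id = data.id || data._id; // mirrors the default expression quoted above
  return data.__typename && id ? `${data.__typename}:${id}` : null;
};

// Fallback for unkeyable entities: the parent's key plus the field key.
// The '.' separator is an assumption of this sketch.
const keyOrFallback = (data: EntityData, parentKey: string, fieldKey: string): string =>
  keyOfEntity(data) ?? `${parentKey}.${fieldKey}`;
```

Because the fallback takes the parent's key as input, it composes naturally: an unkeyable entity nested inside another unkeyable entity simply receives a longer path-like key.
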
Entities aren't stored as a whole, but are actually stored as individual fields on
a large key-value map. Every key is the entity's key plus the field key.

Note: For any type the keying function can also be customised using the `keys`
config.

The process of walking a definition to either read or write a result is very similar.
During this process we also collect dependencies. Dependencies are just a `Set` of
keys that have been either read or written during a traversal.

When we're writing or reading on the root `Query` type, we track the field keys on
`Query`. When we're writing other entities we only track the entity keys.

The Exchange for urql keeps track of these dependencies. Every query that is in use
must be updated when some of its dependencies have been written during
a write traversal.

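
The dependency bookkeeping can be sketched as follows. `DependencyTracker` and its methods are hypothetical names for illustration, and the exchange's real bookkeeping is more involved; the point is only that a write invalidates every active query whose recorded keys overlap with the written keys.

```typescript
type OperationKey = number;

class DependencyTracker {
  // Dependencies recorded by the most recent read of each active query.
  private deps = new Map<OperationKey, Set<string>>();

  recordRead(operation: OperationKey, keysRead: string[]): void {
    this.deps.set(operation, new Set(keysRead));
  }

  // Returns the queries that must be re-read after a write touched `keysWritten`.
  affectedBy(keysWritten: string[]): OperationKey[] {
    const affected: OperationKey[] = [];
    this.deps.forEach((keys, operation) => {
      if (keysWritten.some(key => keys.has(key))) affected.push(operation);
    });
    return affected;
  }
}
```
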
When a fragment is encountered during traversal we need to determine whether the fragment's
type condition matches the `__typename` of the entity it's trying to match against at that
point. This is trivial when the type condition is exactly the same as `__typename`.
However, since GraphQL supports interfaces, there may be fragments that match a "generic type"
that isn't `__typename`. There are two ways we solve this:

- Without a schema, we check whether all of the fragment's fields have been cached
- With a schema, we check whether the schema knows that the fragment is on an interface that matches `__typename`
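
A rough sketch of the two strategies above. The `SchemaLike` shape (a map from abstract type names to their possible object types) and the `fragmentMatches` signature are assumptions for illustration, not Graphcache's internals.

```typescript
interface SchemaLike {
  // Maps an interface (or union) name to the object types that belong to it.
  possibleTypes: { [abstractType: string]: string[] };
}

const fragmentMatches = (
  typeCondition: string,
  typename: string,
  fragmentFieldKeys: string[],
  isFieldCached: (fieldKey: string) => boolean,
  schema?: SchemaLike
): boolean => {
  // Trivial case: the type condition is exactly the entity's __typename.
  if (typeCondition === typename) return true;
  if (schema) {
    // With a schema: ask whether the entity's type belongs to the abstract type.
    const possible = schema.possibleTypes[typeCondition];
    return possible !== undefined && possible.indexOf(typename) !== -1;
  }
  // Without a schema: heuristic, the fragment matches if all of its fields are cached.
  return fragmentFieldKeys.every(key => isFieldCached(key));
};
```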

There are two customisations that can be made to manually alter traversal on top of this.

For querying there are cache resolvers. These can build up keys that the traversal can continue
at, or read scalars from the cache.

For writing there are updaters (which only apply to mutations and subscriptions, but not queries!).
These can add other updates that should also be made on the cache by starting a nested traversal
(a nested write or fragment update).

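
To make the two hooks more concrete, here is a hedged sketch of what such a configuration object could look like. The `resolvers` and `updates` option names and the `updateQuery` method follow Graphcache's documented configuration as far as I know, but `CacheLike` is a simplified local stand-in, the `todo` / `addTodo` fields are made-up examples, and the exact signatures may differ between versions.

```typescript
type Args = { [name: string]: unknown };
type TodosData = { todos: unknown[] } | null;

// Simplified stand-in for the cache object that is passed to updaters.
interface CacheLike {
  updateQuery(input: { query: string }, updater: (data: TodosData) => TodosData): void;
}

const cacheConfig = {
  resolvers: {
    Query: {
      // Resolver: build a keyable object so the traversal can continue at Todo:<id>.
      todo: (_parent: unknown, args: Args) => ({ __typename: 'Todo', id: args.id }),
    },
  },
  updates: {
    Mutation: {
      // Updater: start a nested write that appends the new todo to a cached list.
      addTodo: (result: { addTodo: unknown }, _args: Args, cache: CacheLike) => {
        cache.updateQuery({ query: '{ todos { id } }' }, data =>
          data ? { todos: [...data.todos, result.addTodo] } : data
        );
      },
    },
  },
};
```
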
As mentioned before, internally, each field on a definition is traversed separately and stored
separately. Since it's not a valid JSON/GraphQL value, `undefined` is consistently used to
check/mark the absence of a field, i.e. it not being cached.

Internally, our data structure consists of two key-value maps:

- A map for entity fields (key: `entityKey + fieldKey`, value: any scalar)
- A map for links (key: `entityKey + fieldKey`, value: any link)
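
As a sketch, the two maps could be modelled like this. Plain `Map`s stand in for the real data structure, the `.` separator in `joinKey` is an assumption, and the example data (`Todo:1`, `Author:2`) is made up.

```typescript
type Scalar = string | number | boolean | null;
type Link = null | string | Array<string | null>;

// Both maps share the same key shape: entity key plus field key.
const joinKey = (entityKey: string, fieldKey: string): string => `${entityKey}.${fieldKey}`;

const records = new Map<string, Scalar>();
const links = new Map<string, Link>();

// Normalised form of a todo with an author, plus a list link on the root Query.
records.set(joinKey('Todo:1', 'id'), 1);
records.set(joinKey('Todo:1', 'text'), 'Hi');
records.set(joinKey('Author:2', 'name'), 'Ann');
links.set(joinKey('Todo:1', 'author'), 'Author:2');
links.set(joinKey('Query', 'todos'), ['Todo:1', null]);

// A missing entry reads as undefined, which marks the field as "not cached".
const readRecord = (entityKey: string, fieldKey: string): Scalar | undefined =>
  records.get(joinKey(entityKey, fieldKey));
```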

Any entity field or link may not have an entry at any point when we're reading from our cache. This
is an `undefined` entry. When that happens there are two things we can do:

- If we have a schema, we check whether that particular field on that type is nullable.
  If it is, we simply set that field to `null` on the query result.
- If the field is not nullable or we don't have a schema, we can't set the field to `null`,
  so the entire entity that is being traversed and queried at that point is replaced by
  `undefined`.

This is useful during traversal, because it's essentially nested or recursive. When a field
is not nullable and `undefined` is returned, the parent will also check whether the entity
that it was querying recursively is on a nullable field.

This nullable cascade can domino upwards until it reaches the `Query` root type. `Query` can
very well be `null`, but when all fields on `Query` are nullable, instead of giving the user
an empty object, we replace the entire result with `null` as a convenient shortcut.

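
The cascade can be illustrated with a small recursive reader. This is a simplified sketch under several assumptions: `FieldNode` is a hypothetical stand-in for a selection set with nullability already looked up from a schema (without a schema every field would be treated as non-nullable), only single links are handled, and the final `Query` shortcut is left out.

```typescript
interface FieldNode {
  name: string;
  nullable: boolean;
  selection?: FieldNode[]; // present for entity fields, absent for scalars
}
type Result = { [field: string]: unknown };

const readSelection = (
  entityKey: string,
  selection: FieldNode[],
  records: Map<string, unknown>,
  links: Map<string, string | null>
): Result | undefined => {
  const result: Result = {};
  for (const field of selection) {
    const key = `${entityKey}.${field.name}`;
    let value: unknown;
    if (field.selection) {
      // Entity field: follow the link and read the nested selection recursively.
      const link = links.get(key);
      value =
        typeof link === 'string'
          ? readSelection(link, field.selection, records, links)
          : link; // null (cached as null) or undefined (not cached)
    } else {
      value = records.get(key);
    }
    if (value === undefined) {
      // Cache miss: a non-nullable field invalidates this whole entity,
      // which lets the parent repeat the same check one level up.
      if (!field.nullable) return undefined;
      value = null;
    }
    result[field.name] = value;
  }
  return result;
};
```
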
Our internal data structure is an immutable HAMT (hash array mapped trie), which may be used
to make rollbacks or other diffing efficient. For now it's convenient because we've also made
it our layer for optimistic updates.

We can set, get, or remove entries on our internal K/V map. But with the library that
we've built we can also call `setOptimistic` to set values optimistically.
This is important if we want to make temporary changes to any piece of data in the cache.

We make these temporary changes, or optimistic updates, for mutations. There's a config
option called `optimistic`. We expect it to be a user-provided object of
functions for mutation fields that return optimistic results for mutations. These optimistic
changes are written to the cache like a normal result, but they're identified by the
urql operation key.

When a network result comes in, we check whether we've made optimistic updates for that
operation key, and roll back the optimistic changes if necessary.

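
The optimistic layering can be sketched with plain mutable `Map`s instead of an immutable HAMT, which keeps the example short but is not how the real structure is built. `LayeredMap` and `clearOptimistic` are hypothetical names; only `setOptimistic` is taken from the text above.

```typescript
class LayeredMap<V> {
  private base = new Map<string, V>();
  // One layer of temporary values per urql operation key.
  private optimistic = new Map<number, Map<string, V>>();

  set(key: string, value: V): void {
    this.base.set(key, value);
  }

  setOptimistic(operationKey: number, key: string, value: V): void {
    let layer = this.optimistic.get(operationKey);
    if (!layer) {
      layer = new Map<string, V>();
      this.optimistic.set(operationKey, layer);
    }
    layer.set(key, value);
  }

  get(key: string): V | undefined {
    let value = this.base.get(key);
    // Optimistic layers shadow the base value; later layers win.
    this.optimistic.forEach(layer => {
      if (layer.has(key)) value = layer.get(key);
    });
    return value;
  }

  // Rolling back is just dropping the layer that belongs to the operation.
  clearOptimistic(operationKey: number): void {
    this.optimistic.delete(operationKey);
  }
}
```

Keeping optimistic values in a separate layer means the base data is never overwritten, so a rollback needs no diffing: the layer is discarded and reads fall through to the base values again.
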
Lastly, we have to be careful around network results. A network result can't simply be
passed through, since cache resolvers may alter the result. This is where we do a
root read. We take the network data and update it with data from the cache, which
also takes the user's resolvers into account, but is careful to reuse the usual query logic.

## Conclusion

When reading the cache's source code after this short summary, it'll hopefully become more
obvious what it does at any point of `src/operations/`. Generally we've been careful to reuse
code, but also careful to choose the right APIs that balance user convenience with conciseness.

This is generally also reflected by the cache's data structure. We handle optimistic changes
on the data structure level so that they become trivial for the cache.

We have helpers to traverse selection sets and a `Store` and `SchemaPredicates` class that
carefully abstract away data and schema tasks from `src/operations/`.

Lastly, we have a set of generic GraphQL AST helpers that are used here and there.

The resulting code in `src/operations` is often surprisingly compact 🎉