nodkz/proposal.md

Performance problems in Apollo Client v2, v3 and Relay. Why do we need a new GraphQL client?

Both Apollo Client and Relay provide great state managers that store GraphQL responses and update components when data changes. Under the hood they use cache normalization, which reduces data redundancy. Apollo and Relay each describe the details in their own materials. This normalization is a killer feature for modern web apps: it helps keep the displayed data of an entity in sync across different parts of your application. But this normalization has performance problems on big data sets.
Cache Normalization in Apollo Client 2 and Relay

Apollo Client was originally created as a lightweight alternative to Relay. Apollo re-implemented the basic functionality with developer experience and bundle size in mind. But conceptually Apollo Client 2 and Relay follow similar principles for the normalized cache. Let's take a look at an example from the Relay website:
Assume we have the following GraphQL Fragment:
fragment on User {
  id
  name
  address {
    city
  }
}
And the server returns the following response for this fragment:
{
  id: '842472',
  name: 'Joe',
  address: {
    city: 'Seattle',
  }
}
So after normalization we will keep the following records in the store (cache):
RecordSource {
  '842472': Record {
    __id: '842472',
    __typename: 'User', // the type is known statically from the fragment
    id: '842472',
    name: 'Joe',
    address: {__ref: 'client:842472:address'}, // link to another record
  },
  'client:842472:address': Record {
    // A client ID, derived from the path from parent & parent's ID
    __id: 'client:842472:address',
    __typename: 'Address',
    city: 'Seattle',
  }
}
Wow! In general we will spend 2-3 times more memory than the initial response takes. If we take into consideration how Apollo and Relay generate keys for normalized records, the difference might be even higher.
The unfortunate reality of these caches is that read and write operations impose considerable overhead (in CPU and memory). Ian MacLeod explained this problem quite well in Motivation for Apollo Cache Hermes.
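The normalization step shown above can be sketched in a few lines. This is a minimal illustration, not Relay's or Apollo's actual implementation: the names `normalizeAll`, `StoreRecord` and `RecordSource` (as a plain object type) are assumptions made for this example, `__typename` is omitted because the raw response does not carry it, and arrays are not handled.

```typescript
// Hypothetical types for illustration only.
type Scalar = string | number | boolean | null;
interface Ref { __ref: string }
type StoreRecord = { __id: string; [field: string]: Scalar | Ref };
type RecordSource = { [id: string]: StoreRecord };

type ResponseValue = Scalar | ResponseObject;
interface ResponseObject { [field: string]: ResponseValue }

// Flattens a response object into `source`: every nested object becomes
// its own record, and the parent keeps only a { __ref } link to it.
function normalizeAll(
  obj: ResponseObject,
  source: RecordSource,
  fallbackId: string,
): string {
  // Use the server-provided id when present, otherwise the generated client id.
  const rawId = obj.id;
  const id = typeof rawId === "string" ? rawId : fallbackId;
  const record: StoreRecord = { __id: id };

  for (const [field, value] of Object.entries(obj)) {
    if (value === null || typeof value !== "object") {
      record[field] = value; // scalars are copied as is
      continue;
    }
    // Client id derived from the parent's id and the field name.
    const childFallback = id.startsWith("client:")
      ? `${id}:${field}`
      : `client:${id}:${field}`;
    record[field] = { __ref: normalizeAll(value, source, childFallback) };
  }

  source[id] = record;
  return id;
}

const source: RecordSource = {};
normalizeAll(
  { id: "842472", name: "Joe", address: { city: "Seattle" } },
  source,
  "client:root",
);
console.log(Object.keys(source)); // ["client:842472:address", "842472"]
```

Even in this tiny case one response object turns into two records, each with its own key and `__id`, which is where the extra memory comes from.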
When the cache is ready, the Store re-creates the payload and propagates it to the useQuery hook. It does so again whenever some record in the cache changes, even if it was changed by another query. Unlike Apollo Client 2, Relay has the useFragment helper, which makes it possible to propagate updates to only a part of the React component tree: if some fragment's data changes, only the React component that depends on it is re-rendered, and the rest of the React tree stays untouched. Apollo Client 2 has no such feature, so even a small change propagates an update through the whole React tree that depends on the affected query.
That's why, for example, in big tables where data changes in one row, Relay works much faster than Apollo Client 2, if and only if every row is wrapped in useFragment. So Relay is fragment-centric, while Apollo Client 2 is query-centric. Both have a similar cache structure, but Relay has much more performant read and propagate operations.
Apollo Client 2 has a slow write-to-store operation and a very slow read-from-store operation. So Ian MacLeod suggested a new store architecture that improves write operations, and this proposal inspired the Apollo team to rewrite their store in v3.
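To make the difference between query-centric and fragment-centric propagation concrete, here is a small model. It is only a sketch under simplifying assumptions: a "subscriber" is just a component name plus the set of record IDs it reads, and the `Subscriber` type, the `affected` function and the `Row:N` keys are hypothetical names, not part of Apollo or Relay.

```typescript
// Hypothetical model: a component and the cache records it depends on.
interface Subscriber {
  component: string;
  reads: Set<string>;
}

// Returns the components that must re-render when `changedId` is written.
function affected(subscribers: Subscriber[], changedId: string): string[] {
  return subscribers
    .filter((s) => s.reads.has(changedId))
    .map((s) => s.component);
}

const rowIds = ["Row:1", "Row:2", "Row:3"];

// Query-centric: one subscription at the root that reads every row record.
const queryCentric: Subscriber[] = [
  { component: "Table", reads: new Set(rowIds) },
];

// Fragment-centric: each row component subscribes to its own record only.
const fragmentCentric: Subscriber[] = rowIds.map((id) => ({
  component: `TableRow(${id})`,
  reads: new Set([id]),
}));

console.log(affected(queryCentric, "Row:2")); // ["Table"]
console.log(affected(fragmentCentric, "Row:2")); // ["TableRow(Row:2)"]
```

In the query-centric case the single affected component is the root of the table, so all rows below it are rendered again; in the fragment-centric case only one row is.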
Cache Normalization in Apollo Client 3

In Apollo Client 3 the cache consists of both normalized and non-normalized records. Ben Newman provides some information in PR #5146: Eliminate "generated" cache IDs to avoid normalizing objects without meaningful IDs.
For example, Apollo Client 2 keeps the following records in the cache:
{
  '842472': {
    __id: '842472',
    __typename: 'User',
    id: '842472',
    name: 'Joe',
    address: {__ref: 'client:842472:address'},
  },
  'client:842472:address': {
    __id: 'client:842472:address',
    __typename: 'Address',
    city: 'Seattle',
  }
}
Apollo Client 3 instead stores the address object, which has no meaningful ID, directly in its parent object:
{
  'User:842472': {
    __typename: 'User',
    id: '842472',
    name: 'Joe',
    address: {
      __typename: 'Address',
      city: 'Seattle'
    }
  }
}
These changes brought huge performance improvements for write and read operations on big caches, and they also reduced memory consumption. So writing data to the AC3 cache became faster than in Relay. However, faster reads from the AC3 cache do not have a big impact on application performance: partial fragment updates in Relay still avoid calling tons of React render functions, which Apollo triggers when it propagates updates via useQuery from the root.
So Apollo Client 3 needs to somehow implement useFragment logic like Relay's. But with the new mix of normalized and non-normalized records in the cache, this task becomes non-trivial and quite complicated.
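The AC3-style rule described in this section (objects with a meaningful ID get their own record, everything else stays embedded in the parent) can be sketched as follows. This is an illustration rather than Apollo's actual code: `writeEntity`, `cacheKey` and `EntityCache` are hypothetical names, the response is assumed to carry `__typename`, and arrays are left out.

```typescript
// Hypothetical types for illustration only.
type Json = string | number | boolean | null | JsonObject;
interface JsonObject { [field: string]: Json }
type EntityCache = { [key: string]: JsonObject };

// An object is an entity only if it has both a typename and an id.
function cacheKey(obj: JsonObject): string | undefined {
  const { __typename, id } = obj;
  return typeof __typename === "string" && typeof id === "string"
    ? `${__typename}:${id}`
    : undefined;
}

// Writes `value` into `cache` and returns what the parent should store:
// a { __ref } for entities, the object itself for everything else.
function writeEntity(value: Json, cache: EntityCache): Json {
  if (value === null || typeof value !== "object") return value;

  const stored: JsonObject = {};
  for (const [field, child] of Object.entries(value)) {
    stored[field] = writeEntity(child, cache);
  }

  const key = cacheKey(value);
  if (key === undefined) return stored; // no meaningful ID: stays embedded

  // Merge with fields that other queries may have written earlier.
  cache[key] = { ...(cache[key] ?? {}), ...stored };
  return { __ref: key };
}

const cache: EntityCache = {};
writeEntity(
  {
    __typename: "User",
    id: "842472",
    name: "Joe",
    address: { __typename: "Address", city: "Seattle" },
  },
  cache,
);
console.log(Object.keys(cache)); // ["User:842472"]
```

Compared with full normalization, the address no longer needs a generated key or a separate record, which matches the reduced write cost and memory use described above.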
We need a new GraphQL Client!

Basic operations for current GraphQL clients in terms of performance can be summarized in the following table:


| Client | Write to cache | Read from cache | Data propagation |
|---|---|---|---|
| Apollo Client 2 | normal | normal | bad |
| Apollo Client 3 | good | good | bad |
| Relay | normal | normal | the best |


Relay shows the best performance in real apps. Data propagation via fragments provides better performance, but we still pay in memory and CPU for data normalization.
The best solution would be the AC3 cache with Relay fragments. But when the cache holds non-normalized records and a fragment targets some nested data, it is almost impossible to propagate changes precisely.
So I want to propose the idea of an entity-centric GraphQL client:

with an immutable payload
useEntityQuery returns cache records as is (without data masking like in Relay)
if a cache record has a reference to another record, the reference is returned as is
to obtain the referenced data, use the useEntityRef hook, which reads the Entity and subscribes to its changes
if a normalized record changes, the change is not propagated to parent records, so useEntityQuery is not fired
GraphQL fragments are used in queries and by the code generator for TypeScript
useEntityRef does not use GraphQL fragments; this hook works only on Entities and returns their copy from the cache without any changes and without data masking
the main idea: use the vanilla cache as much as possible, without any data preparation

Assume we have the following GraphQL query:
query {
  me {
    id
    nickname
    address {
      ...AddressFragment
    }
    company {
      ...CompanyFragment
    }
  }
}

fragment AddressFragment on Address {
  city
}

fragment CompanyFragment on Company {
  id
  name
}
which returns the following payload:
{
  me: {
    id: '1',
    nickname: 'nodkz',
    address: {
      city: 'Almaty'
    },
    company: {
      id: '33',
      name: 'Northwind'
    }
  }
}
In the cache it will be stored as in AC3, with both normalized and non-normalized records:
{
  'User:1': {
    __typename: 'User',
    id: '1',
    nickname: 'nodkz',
    address: {
      __typename: 'Address',
      city: 'Almaty'
    },
    company: { __ref: 'Company:33' }
  },
  'Company:33': {
    __typename: 'Company',
    id: '33',
    name: 'Northwind'
  }
}
In React components it can be used like this:
function User() {
  const { data: { me } } = useEntityQuery<GeneratedType>(graphql`...`);
  // `me` is the immutable object from the cache, returned as is:
  //  - It may have additional fields added by other queries (no data masking like in Relay).
  //    It is TypeScript's responsibility to check that the developer uses only the fields listed in the GraphQL query.
  //  - `address` field is non-normalized
  //  - `company` field is normalized and contains just { __ref }
  
  return (<div>
    <div>Nickname - {me.nickname}</div>
    <Address address={me.address} />
    <Company company={me.company} />
  </div>);
}

// type generated by codegen from the GraphQL query
interface AddressFragment {
  city: string;
}

function Address(props: { address: AddressFragment }) {
  return <div>{props.address.city}</div>;
}

// type generated by codegen from the GraphQL query;
// codegen detects that the fragment has an `id`, so it adds the { __ref } variant
type CompanyFragment = {
  id: string;
  name: string;
} | { __ref: string };

function Company(props: { company: CompanyFragment }) {
  // if given a { __ref }, reads the Entity from the cache and subscribes to its changes;
  // otherwise returns the props data unchanged.
  // In TypeScript it takes the argument type, removes { __ref } and returns just the fragment fields.
  const data = useEntityRef(props.company);
  return <div>Company name: {data.name}</div>;
}
So if the User entity changes, the root User component is re-rendered via useEntityQuery.
If the Company entity changes, only useEntityRef fires, not useEntityQuery. The parent React component is not notified about the change.
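The propagation rule above can be sketched without React as a store with per-entity subscriptions. This is only one possible shape, under stated assumptions: `EntityStore`, its `read`/`write`/`subscribe` methods and the `CachedEntity` type are hypothetical, and the two counters merely stand in for the useEntityQuery and useEntityRef hooks of the proposal.

```typescript
type ChangeListener = () => void;
type CachedEntity = Readonly<Record<string, unknown>>;

// Hypothetical entity store: immutable entities, listeners keyed by entity.
class EntityStore {
  private entities = new Map<string, CachedEntity>();
  private listeners = new Map<string, Set<ChangeListener>>();

  read(key: string): CachedEntity | undefined {
    return this.entities.get(key);
  }

  // Replaces the entity with a new frozen object and notifies only the
  // listeners of this key. Parents that hold a { __ref } are not touched.
  write(key: string, fields: Record<string, unknown>): void {
    const next = Object.freeze({ ...this.entities.get(key), ...fields });
    this.entities.set(key, next);
    this.listeners.get(key)?.forEach((listener) => listener());
  }

  subscribe(key: string, listener: ChangeListener): () => void {
    const set = this.listeners.get(key) ?? new Set<ChangeListener>();
    set.add(listener);
    this.listeners.set(key, set);
    return () => {
      set.delete(listener);
    };
  }
}

const store = new EntityStore();
store.write("User:1", {
  __typename: "User",
  id: "1",
  nickname: "nodkz",
  company: { __ref: "Company:33" },
});
store.write("Company:33", { __typename: "Company", id: "33", name: "Northwind" });

let queryNotified = 0; // stands in for useEntityQuery reading `me`
let refNotified = 0; // stands in for useEntityRef reading `company`
store.subscribe("User:1", () => { queryNotified++; });
store.subscribe("Company:33", () => { refNotified++; });

store.write("Company:33", { name: "Northwind Traders" });
console.log(queryNotified, refNotified); // 0 1
```

Because the User record only stores `{ __ref: 'Company:33' }`, its object identity does not change when the company is updated, so nothing subscribed to `User:1` has a reason to run.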

What else to consider:

tune store for cursor based lists (especially for chat applications)
debugging tools
partial cache persistence
two level nested cache for entities