@tannerlinsley
Last active June 4, 2021 15:07
An RFC to rewrite React Query's API to allow for opt-in normalization

React Query v4 (Ezil Amron) RFC

Preface: This is an RFC, which means the concepts outlined here are a work in progress. You are reading this RFC because we need your help to discover edge cases, ask the right questions and offer feedback on how this RFC could improve.

What are we missing today without normalization?

Today, React Query uses unstructured query keys to uniquely identify queries in your application. This essentially means RQ behaves like a big key-value store and uses your query keys as the primary keys. While this makes things conceptually simple to implement and reason about on the surface, it does make other optimizations difficult (and a few others impossible):

  • Finding and using Initial/placeholder single-item queries from list-like queries. While this is possible today, it's very manual, tedious, and prone to error.
  • Manual optimistic updates that span multiple query types and shapes (eg. single-item queries, list queries and infinitely-paginated queries) are tedious and prone to breakage as your app changes.
  • The corresponding rollbacks of said app-wide optimistic updates are brittle, manual, and error prone as you must implement this logic yourself for every mutation
  • Auto-invalidation is near impossible since there is no schema backing queries and mutations are unaware of queries and their keys/data. Thus most, if not all, mutations usually result in manually calling invalidateQueries at the very least.
  • RQ's cache is currently a key-value store, which means it has a larger potential for taking up more memory if many queries store the same piece of data in different lists or locations.
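
To make the duplication problem concrete, here is a minimal sketch in plain JavaScript. It is not React Query's actual cache: the `cache` Map, the key shapes, and `completeTodoEverywhere` are assumptions made up for illustration. It shows how a key-value cache ends up holding independent copies of the same todo, so one change has to be patched into every entry by hand:

```javascript
// Hypothetical stand-in for a key-value query cache: one entry per query key.
const cache = new Map()

const todo = { id: 't1', title: 'Do the dishes', isComplete: false }

// The same todo is stored under a list key and a single-item key,
// as two independent copies.
cache.set(JSON.stringify(['todos']), [{ ...todo }])
cache.set(JSON.stringify(['todo', 't1']), { ...todo })

// Completing the todo means finding and patching every entry that holds it.
function completeTodoEverywhere(id) {
  for (const [key, value] of cache) {
    if (Array.isArray(value)) {
      cache.set(
        key,
        value.map((t) => (t.id === id ? { ...t, isComplete: true } : t))
      )
    } else if (value.id === id) {
      cache.set(key, { ...value, isComplete: true })
    }
  }
}

completeTodoEverywhere('t1')
```

Every new query shape (paginated, infinite, nested) would need another branch in that loop, which is the kind of manual work normalization aims to remove.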

What could React Query look like with an opt-in normalization-first API?

Queries would take on a new API based on query definitions and optional resource identification. Those APIs might look like this:

const todoListQuery = createListQuery({
  kinds: ['todo'],
  fetch: async (variables) => {
    const { data } = await axios.get('/todos', { params: variables })
    return data
  },
  getResources: (todos) =>
    todos.map((todo) => ({
      kind: 'todo',
      id: todo.id,
      data: todo,
    })),
})

const todoQuery = createQuery({
  kind: 'todo',
  getId: (variables) => variables.todoId,
  fetch: async (variables) => {
    const data = await api.get(`/todos/${variables.todoId}`)
    return data
  },
})

What is a resource?

"Resource" is a flexible term, which is why we chose it. In the context of React Query, it represents a fine-grained single item or object that you would normally request from a server. An individual todo or user is a good example, but not a list of todos or users.

What is resource identification?

Resource identification is the process of taking arbitrary data we receive from the server and extracting the information out of it that uniquely identifies it. Take the following todo object for example:

const todo = {
  id: '93jhft2of8fy3j',
  created: Date.now(),
  title: 'Do the dishes',
  isComplete: false,
  notes: `They're really piling up...`,
}

We could easily identify this resource like so:

const resource = {
  kind: 'todo',
  id: todo.id,
  data: todo,
}

This gives us a data structure that is consistent and reliable in helping us normalize many resources with different kinds and ids into a normalized database.
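
As a rough sketch of what "a normalized database" could mean here, the snippet below stores identified resources in a nested map of kind to id to data. The names `createStore`, `writeResources`, and `readResource` are hypothetical and not part of any proposed API:

```javascript
// Hypothetical normalized store: kind -> (id -> data).
function createStore() {
  return new Map()
}

// Write identified resources into the store, one entry per kind and id.
function writeResources(store, resources) {
  for (const { kind, id, data } of resources) {
    if (!store.has(kind)) store.set(kind, new Map())
    store.get(kind).set(id, data)
  }
}

// Look a resource up by kind and id; undefined when it is not stored.
function readResource(store, kind, id) {
  const byId = store.get(kind)
  return byId ? byId.get(id) : undefined
}

const store = createStore()
writeResources(store, [
  { kind: 'todo', id: 'a1', data: { id: 'a1', title: 'Do the dishes' } },
  { kind: 'user', id: 'a1', data: { id: 'a1', name: 'Sam' } },
])
```

Because the kind is part of the address, a todo and a user can share the id 'a1' without colliding.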

What are queries?

Queries are used to define dependencies on one or more resources from the server. There are 2 types of query definitions:

  • List Queries (includes paginated/infinite queries as well)
  • Resource Queries

List Queries

This is what a list query could look like:

const todoListQuery = createListQuery({
  kinds: ['todo'],
  fetch: async (variables) => {
    const { data } = await axios.get('/todos', { params: variables })
    return data
  },
  getResources: (todos) =>
    todos.map((todo) => ({
      kind: 'todo',
      id: todo.id,
      data: todo,
    })),
})

Another approach could be to expose a resource identification function:

const todoListQuery = createListQuery({
  kinds: ['todo'],
  fetch: ({ resource }) => async (variables) => {
    const { data: todos } = await axios.get('/todos', { params: variables })

    // Return each identified resource; a block body without a return
    // would make map() produce an array of undefined
    return todos.map((todo) =>
      resource({
        kind: 'todo',
        id: todo.id,
        data: todo,
      })
    )
  },
})

A list query is used to define a group of resources that are fetched and synchronized from the server together. For example:

  • An array of all todos for a given user
  • An array of todos that have been filtered to a specific search term
  • An array of notification objects
  • An array of users

Did you notice how many times we used array there? 😉

Let's go over some of the basic options of list queries:

  • kinds: string[] - This is an array of strings where each string defines the kinds of resources this query might contain. While a vast majority of queries will likely only contain a single kind of resource, it's possible for a list query to return resources of different kinds, hence the array.

    createListQuery({
      kinds: ['todo'],
      // or
      kinds: ['user', 'bot'],
    })
  • fetch: (meta) => Promise<Data> - Similar to the query function in the current version of React Query, this function should return a promise that resolves your data from the server.

    createListQuery({
      fetch: async () => {
        const data = await api.getTodos()
        return data
      },
    })
  • getResources: (data) => Resource[] - A function that is used to identify the resources received from the fetcher function. In the example below, it receives the data from our fetcher function and returns an array of resources.

    createListQuery({
      getResources: (todos) =>
        todos.map((todo) => ({
          kind: 'todo',
          id: todo.id,
          data: todo,
        })),
    })
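
Putting the three options together, one possible flow is: run fetch, pass the result to getResources, write each resource into a normalized store, and have the list remember only references. The sketch below assumes that flow. `createListQuery` here is a local stand-in for the proposed API, and `store`, `run`, and the fake fetcher are made up for illustration:

```javascript
// Hypothetical normalized store: kind -> (id -> data).
const store = new Map()

// Local stand-in for the proposed createListQuery, not a published API.
function createListQuery({ kinds, fetch, getResources }) {
  return {
    kinds,
    async run(variables) {
      const data = await fetch(variables)
      const resources = getResources(data)

      // Store every resource once, addressed by kind and id.
      for (const { kind, id, data: item } of resources) {
        if (!store.has(kind)) store.set(kind, new Map())
        store.get(kind).set(id, item)
      }

      // The list itself only keeps ordered references.
      return resources.map(({ kind, id }) => ({ kind, id }))
    },
  }
}

const todoListQuery = createListQuery({
  kinds: ['todo'],
  // Fake fetcher standing in for a real request.
  fetch: async () => [
    { id: 1, title: 'Do the dishes' },
    { id: 2, title: 'Walk the dog' },
  ],
  getResources: (todos) =>
    todos.map((todo) => ({ kind: 'todo', id: todo.id, data: todo })),
})
```

Calling `todoListQuery.run()` resolves to the list's references, while the todo objects themselves live in `store`.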

What about returning different kinds of resources in list queries?

If we wanted to return multiple resource types from the server, our getResources function could look like this:

createListQuery({
  getResources: (items) =>
    items.map((item) => ({
      kind: item.kind,
      id: item.id,
      data: item,
    })),
})

What about normalizing nested resources in list queries?

To register nested resources, you could recurse on resources themselves to collect more sub resources:

createListQuery({
  getResources: (users) =>
    users.map((user) => ({
      kind: 'user',
      id: user.id,
      data: user,
      subResources: {
        todos: (todos) =>
          todos.map((todo) => ({
            kind: 'todo',
            id: todo.id,
            data: todo,
          })),
      },
    })),
})
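
One way the subResources idea could be interpreted is as a map from a property name on the parent's data to a function that identifies the resources found under that property. The sketch below assumes exactly that interpretation. `collectResources` is a hypothetical helper, and it only flattens the tree; it does not replace nested data with references:

```javascript
// Flatten a resource and its (hypothetical) subResources map into one list.
// Each subResources key names a property on the parent's data, and its
// function turns that property's value into more resources.
function collectResources(resource) {
  const { subResources = {}, ...self } = resource
  const collected = [self]

  for (const [property, identify] of Object.entries(subResources)) {
    const value = resource.data[property]
    if (value === undefined) continue
    for (const child of identify(value)) {
      collected.push(...collectResources(child)) // recurse for deeper nesting
    }
  }
  return collected
}

const user = {
  id: 'u1',
  name: 'Sam',
  todos: [{ id: 't1', title: 'Do the dishes' }],
}

const flat = collectResources({
  kind: 'user',
  id: user.id,
  data: user,
  subResources: {
    todos: (todos) =>
      todos.map((todo) => ({ kind: 'todo', id: todo.id, data: todo })),
  },
})
```

Here `flat` holds the user resource followed by its todo resource, ready to be written to a normalized store.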

Resource Queries

Let's look one more time at what a resource query could look like:

const todoQuery = createQuery({
  kind: 'todo',
  getId: (variables) => variables.todoId,
  fetch: async (variables) => {
    const data = await api.get(`/todos/${variables.todoId}`)
    return data
  },
})

A resource query is used to define single resources that are fetched and synchronized from the server. For example:

  • Individual todo objects
  • Individual notification objects
  • Individual user objects
  • Individual github repositories
  • Individual results that contain an array of table rows or dates to plot on a chart

Wait, did you just use array there? I thought that was for lists?

The data visualization example above is meant to illustrate that our "list" and "single" query APIs are conceptual categories rather than rigid classification structures. It is very common for a data visualization endpoint to return an array of table rows or dates to plot on a chart, but you would rarely want to normalize each individual row of data or date's data. It makes more sense to treat the result as an individual resource.

Let's go over some of the basic options of resource queries:

  • kind: string - This is the kind of the individual resources that this query returns. This kind should match up with the kinds array in any list queries, so they can be aware of each other.

    createQuery({
      kind: 'todo',
      // or
      kind: 'user',
    })
  • getId: (variables) => variables.todoId - This function is how we uniquely identify our resource requests before and during fetching. You are passed the variables for the query and can return the id for what you are requesting.

  • fetch: (meta) => Promise<Data> - Similar to the query function in the current version of React Query, this function should return a promise that resolves your data from the server. In the example below, we're using the variables to pass our todoId to the server

    createQuery({
      fetch: async (variables) => {
        const data = await api.get(`/todos/${variables.todoId}`)
        return data
      },
    })

Hold on, where is getResources for resource queries?

Since we already know the kind and the id for a resource query before we even request it, we can automatically identify the resource. The above example would result in a resource like:

const resource = {
  kind: 'todo',
  id: variables.todoId,
  data: todo,
}
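
A minimal sketch of that automatic identification, assuming the query simply combines its own kind, the id derived by getId, and whatever fetch returns. `createQuery` below is a local stand-in, and `run` and the fake fetcher are hypothetical:

```javascript
// Local stand-in for the proposed createQuery, not a published API.
function createQuery({ kind, getId, fetch }) {
  return {
    kind,
    async run(variables) {
      const id = getId(variables) // known before the request is made
      const data = await fetch(variables)
      return { kind, id, data } // the resource, with no getResources step
    },
  }
}

const todoQuery = createQuery({
  kind: 'todo',
  getId: (variables) => variables.todoId,
  // Fake fetcher standing in for a real request.
  fetch: async (variables) => ({
    id: variables.todoId,
    title: 'Do the dishes',
  }),
})
```

Calling `todoQuery.run({ todoId: 7 })` resolves to a resource whose kind and id came from the query definition rather than from the response body.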

New query API, new capabilities

Even without discussing mutations (yet 😉), this new query-driven API that is designed for opt-in normalization would allow React Query to gather structured information about your server dependencies and perform out-of-the-box, app-wide, automatic optimizations such as:

  • Automatic initial/placeholder data - It's just magically there.
  • Automatic updates to list queries from resource query data.
  • Better memory management by sharing resources across queries
  • One-touch manual updates to the cache, e.g. you can make a single call to update a resource by kind and id and have it reflected across all list queries and resource queries (as opposed to manually iterating over all queries and searching/updating all of the different query types like list, paginated, single, etc.)
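
The capabilities above mostly fall out of storing each resource once and letting queries hold references. The following sketch assumes that design. The "kind:id" key format, `updateResource`, and the two reference arrays are invented for illustration:

```javascript
// Hypothetical normalized store keyed by "kind:id".
const resources = new Map([
  ['todo:1', { id: 1, title: 'Do the dishes', isComplete: false }],
  ['todo:2', { id: 2, title: 'Walk the dog', isComplete: false }],
])

// Two list queries hold references to resources, not copies of them.
const allTodosRefs = ['todo:1', 'todo:2']
const openTodosRefs = ['todo:1']

const readList = (refs) => refs.map((ref) => resources.get(ref))
const readOne = (kind, id) => resources.get(`${kind}:${id}`)

// One call replaces the single stored copy.
function updateResource(kind, id, updater) {
  const key = `${kind}:${id}`
  resources.set(key, updater(resources.get(key)))
}

// A resource query for todo 1 could show this right away as placeholder
// data, because a list query already stored it.
const placeholder = readOne('todo', 1)

updateResource('todo', 1, (todo) => ({ ...todo, title: 'Do ALL the dishes' }))
```

After the update, both lists and the single-item read return the new title without any per-query patching.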

Normalized Mutations

In React Query today, mutations are not much more than a wrapper around some tracked state variables and asynchronous lifecycle callbacks. They currently make it pretty convenient to call invalidateQueries or do optimistic updates, but the brutal fact is that we still have to do that ourselves for every single mutation.

Mutations can be so much more with normalization baked into the API:

const createTodoMutation = createMutation({
  action: 'create',
  mutate: async (newTodo) => {
    const { data } = await axios.post(`/todos`, newTodo)
    return data
  },
  getOptimisticResources: (optimisticTodo) => {
    const tempId = uuid()

    return [
      {
        kind: 'todo',
        id: tempId,
        data: { ...optimisticTodo, id: tempId },
      },
    ]
  },
  getResources: (newTodo, optimisticResources) => [
    {
      kind: 'todo',
      id: newTodo.id,
      data: newTodo,
      replaceId: optimisticResources[0].id,
    },
  ],
})

const updateTodoMutation = createMutation({
  action: 'update',
  mutate: async (todo) => {
    const { data } = await axios.put(`/todos/${todo.id}`, todo)
    return data
  },
  getOptimisticResources: (todo) => [
    {
      kind: 'todo',
      id: todo.id,
      data: todo,
    },
  ],
  getResources: (todo) => [
    {
      kind: 'todo',
      id: todo.id,
      data: todo,
    },
  ],
})

const removeTodoMutation = createMutation({
  action: 'remove',
  mutate: (todoId) => api.removeTodoById(todoId),
  getOptimisticResources: (todoId) => [{ kind: 'todo', id: todoId }],
})

Alright, so what are these action and getOptimisticResources options?

Mutations, as you saw above, are very similar in spirit to queries and use the same "resource"-esque vocabulary. However, there are a few cool options that make them very powerful:

  • action?: 'create' | 'update' | 'remove' - The action type of a mutation denotes what the mutation is doing with the resources it's handling. This action determines how optimistic updates behave, or whether to perform them at all. You're also not required to pass an action if you are simply firing off an RPC or utility call that doesn't affect any resources.
  • getOptimisticResources: (variables) => Resource[] - As you might have guessed, this function is responsible for returning optimistic information about resources. For create actions, the optimistic resources are added, for update actions they are replaced, and for remove actions, they are removed.

Along with mutations and optimistic updates, list queries could also have options like:

  • optimistic: boolean | ['create', 'update', 'remove'] - Whether the list query should respond to none, all or some optimistic update actions
  • createMode: 'append' | 'prepend' - A quick way to determine whether new resources should be pushed or unshifted onto list queries
  • optimisticCreate/optimisticUpdate/optimisticRemove: (existingResources, optimisticResources) => newResources - Functions to manually override how optimistic actions and their resources are handled. Depending on the action, you could manually append, prepend, replace, or remove resources.
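
As a sketch of the default behavior these options describe, the function below applies a mutation's optimistic resources to a list according to the action and createMode. `applyOptimistic` is a hypothetical helper and only one possible reading of the rules above:

```javascript
// Apply a mutation's optimistic resources to a list of resources.
// `createMode` mirrors the proposed list query option.
function applyOptimistic(list, action, optimistic, createMode = 'append') {
  const same = (a, b) => a.kind === b.kind && a.id === b.id

  switch (action) {
    case 'create':
      return createMode === 'prepend'
        ? [...optimistic, ...list]
        : [...list, ...optimistic]
    case 'update':
      return list.map((item) => optimistic.find((o) => same(o, item)) ?? item)
    case 'remove':
      return list.filter((item) => !optimistic.some((o) => same(o, item)))
    default:
      return list // no action: the mutation does not touch resources
  }
}

const before = [
  { kind: 'todo', id: 1, data: { id: 1, title: 'Do the dishes' } },
  { kind: 'todo', id: 2, data: { id: 2, title: 'Walk the dog' } },
]

const afterCreate = applyOptimistic(
  before,
  'create',
  [{ kind: 'todo', id: 'temp', data: { id: 'temp', title: 'New todo' } }],
  'prepend'
)
const afterUpdate = applyOptimistic(before, 'update', [
  { kind: 'todo', id: 2, data: { id: 2, title: 'Walk the cat' } },
])
const afterRemove = applyOptimistic(before, 'remove', [{ kind: 'todo', id: 2 }])
```

Because each call returns a new array and leaves `before` untouched, a rollback in this sketch is just a matter of going back to the previous array.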

New mutation API, even more new capabilities

With our new structured information about mutations and their dependencies we can make even more incredible optimizations like:

  • Automatic optimistic updates across all list queries and resource queries
  • Automatic rollbacks for optimistic updates
  • Auto-invalidations across all list queries and resource queries after successful mutations

Okay, but what if I don't need normalization? React Query was great at being "simple"...

You're right! That's why this new API is designed to be an easy opt-in for normalization, but definitely not a requirement. Take the following query for example:

const todoQuery = createQuery({
  fetch: async () => {
    const { data } = await axios.get(
      'https://api.github.com/repos/tannerlinsley/react-query'
    )
    return data
  },
})

Because queries are not defined in hooks, but in the module scope, the query instance itself and/or function identity can be used to uniquely identify the query and perform deduping, etc. Which brings us to another great question...

How are queries and requests uniquely identified?

Queries could be uniquely identified by:

  • Query instance
  • fetch function
  • kinds or kind

Individual requests for queries could be uniquely identified further by:

  • Stringified variables
  • Derived resource ID from variables
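
To illustrate how those pieces could combine into a dedupe key, here is a sketch that identifies the query by instance and the request by either the derived resource id or the stringified variables. The helpers `getQueryId`, `stableStringify`, and `getRequestKey` are assumptions, not part of the proposal:

```javascript
// Give each query instance a stable numeric identity.
const queryIds = new WeakMap()
let nextQueryId = 0

function getQueryId(query) {
  if (!queryIds.has(query)) queryIds.set(query, nextQueryId++)
  return queryIds.get(query)
}

// Stringify with sorted object keys so { a, b } and { b, a } match.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`
  }
  if (value && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// A request is the query instance plus either the derived resource id
// or, when the query has no getId, the stringified variables.
function getRequestKey(query, variables = {}) {
  const suffix = query.getId
    ? `id:${stableStringify(query.getId(variables))}`
    : `vars:${stableStringify(variables)}`
  return `${getQueryId(query)}|${suffix}`
}

// Bare objects standing in for query definitions.
const todoQuery = { kind: 'todo', getId: (variables) => variables.todoId }
const todoListQuery = { kinds: ['todo'] }
```

With this scheme, two components asking `todoQuery` for the same todoId share a key even if they pass extra variables, while the list query is keyed by its full variables.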

How could we consume these queries?

With a new and improved useQuery hook!

function Todos({ todoId }) {
  const todosQuery = useQuery({
    query: TodosQuery,
  })
}

function Todo({ todoId }) {
  const todoQuery = useQuery({
    query: TodoQuery,
    variables: { todoId },
  })
}

What happens to the rest of the awesome stuff I'm used to in React Query v3?

It stays! Yes, I'm talking about keeping 99% of what you know and love today in React Query, including (but not limited to)

  • stale/cache timings
  • request deduping
  • polling
  • dependent queries
  • parallel queries
  • pagination/lagged queries

So what's really changing?

  • No longer defining queries on the fly in useQuery(), but instead defining queries ahead of time which can power our subscriptions via hooks like useQuery()
  • No more unstructured query keys. Instead, kind, id and variables would be used to uniquely identify a query
  • No longer required to implement your own:
    • Initial Data / Placeholder Data
    • Optimistic Updates / Rollbacks
    • Mutation-related query invalidations

WIP Proof of concept and inspiration

https://codesandbox.io/s/focused-moser-gu66b?file=/src/App.js

@flybayer

flybayer commented Feb 12, 2021

Comments from discord:

I think we need some way to have multiple queries for a "kind". Because some views of that kind will have some data and other views will have other data. The resource can be placed once in the cache with all the data, but need to have different query fetch functions for different pages


So this would work?

const todoQuery = createQuery({
  kind: 'todo',
  getId: (variables) => variables.todoId,
  fetch: async (variables) => {
    const data = await api.get(`/todos/${variables.todoId}`)
    return data
  },
})

const adminTodoQuery = createQuery({
  kind: 'todo',
  getId: (variables) => variables.todoId,
  fetch: async (variables) => {
    const data = await api.get(`/todosWithExtraDataForAdmin/${variables.todoId}`)
    return data
  },
})

Can a resource query return nested resources? Eg. in this example project.todos would use the todos cache?

const project = {
  id: 1,
  name: 'hi',
  todos: [
    {id: 1, text: 'sweet'}
  ]
}

@tannerlinsley
Author

@flybayer

Re: resource sharing - Yes, resources would be shared and probably shallow-merged to existing properties as they come in, so in your question, the admin resource would gracefully share and extend the existing resource internally as the new data loads

Re: nested resources - We definitely would need to make that possible. I've added a few more notes above on subResource identification that could help us do just that, but we should probably talk about it more.

@flybayer

Re: resource sharing - Yes, resources would be shared and probably shallow-merged to existing properties as they come in, so in your question, the admin resource would gracefully share and extend the existing resource internally as the new data loads

What if the regular data conflicts with the admin data? I guess you can either error or you can automatically de-optimize to a different cache key.

@tannerlinsley
Author

What if the regular data conflicts with the admin data? I guess you can either error or you can automatically de-optimize to a different cache key.

If the data conflicts, then I think that would be a big hint that it needs to be its own separate resource type, no?

@flybayer

Yes probably. It's just an edge case to handle elegantly from the library standpoint

@tannerlinsley
Author

tannerlinsley commented Feb 12, 2021 via email

@boschni

boschni commented Feb 14, 2021

Nice! Some first comments:

  • Maybe good to use the same terminology as GraphQL and use type instead of kind?
  • Do we need to know the "kinds" upfront? Or can they also be inferred from the getResources result?
  • About subResources: not sure how it knows that the todos must be provided to the function, but what if the sub resources are deeply nested?
  • Do we really need a distinction between query and list queries? Probably also need to have a way of identifying different kinds of lists.
  • What would a query result contain? The fetch response with replaced normalized resources? Or only the fetch response? If it would contain the normalized resources then we need to know exactly where resources in the response are located. We would also need to know how to generate types as the query result would be different from the fetch response.
  • Different fetches can return resources with partial data, and even contain sub-resources with partial data. Will all data be deeply merged? And what if some fetch returns the full resources instead of partially, but some properties have been deleted?
  • How to deal with resource mutations? When creating or deleting a resource, it might only need to be added or removed in some lists. Or maybe only the order in some list needs to be updated?
  • What if a user executes multiple optimistic mutations also involving multiple resources/lists? Will we be able to revert them in the correct order?
  • Do we only assign stale times to queries or also to resources?

@flybayer

flybayer commented Feb 14, 2021

How to deal with resource mutations? When creating or deleting a resource, it might only need to be added or removed in some lists. Or maybe only the order in some list needs to be updated?

This is a very good point. A query list may have the same kind, but have a filter applied which would exclude a newly created item. 🤔

@tannerlinsley
Author

These are awesome questions. Can’t wait to talk about them on Monday!

@tannerlinsley
Author

tannerlinsley commented Feb 14, 2021

Nice! Some first comments:

Maybe good to use the same terminology as GraphQL and use type instead of kind?

That may be a good idea, yeah. It's always a hard call, since type is also a very abused property name ;) But, I would be fine with type as long as everyone else is. I'll make that edit.

Do we need to know the "kinds" upfront? Or can they also be inferred from the getResources result?

For most cases, you don't, but for all the times you might need to work with queries before they are fetched, they come in handy. Use cases I could imagine: pausing, cancelling, invalidating while idle w/ no data. There may be more. Another use case for having the types array would be to allow users to continue to manipulate queries without requiring normalization. Much like the query key behaves today, they could simply be "tags" that help with that? Obviously there's more to explore here. :)

About subResources: not sure how it knows that the todos must be provided to the function, but what if the sub resources are deeply nested?

Yep, there is a lot of gray area here and it's the part I'm most unclear about for the entire idea thus far. As a stretch... it could just be a key/value mapping of the shape the user returns and as long as you match the JSON structure with your subResources object and provide a function for properties you'd like to map to entities, it could work. It's a stretch though. Other ideas here could be that we use something schema-like. Really, anything that would resemble an ad-hoc GraphQL-like setup, or even something that could use a GraphQL schema definition 🤔

Do we really need a distinction between query and list queries? Probably also need to have a way of identifying different kinds of lists.

Technically no, but I would imagine that we would end up having many options that are either list/resource specific and it may end up being more options than the ones that are shared. Hard to tell right now.

What would a query result contain? The fetch response with replaced normalized resources? Or only the fetch response? If it would contain the normalized resources then we need to know exactly where resources in the response are located. We would also need to know how to generate types as the query result would be different from the fetch response.

This is a tough one. Without locking down a shape/contract/schema for queries to return (even if it's optional) it would be pretty difficult to make assumptions about the data being fetched. At the very least, I think we should consider a better contract for the data, possibly even in the fetch function. If opting in to normalization requires the user to meet certain heuristics/shapes/contracts, then I don't see any benefit to having a separate getResources layer... but I could be wrong.

Different fetches can return resources with partial data, and even contain sub-resources with partial data. Will all data be deeply merged? And what if some fetch returns the full resources instead of partially, but some properties have been deleted?

Great question. From what I've seen other tools do (Apollo, Relay too maybe?), I think shallow merging at the resource level has been a pretty fair tradeoff for the vast majority of use cases, and where it's not, allowing the user to customize how resources are merged in userland. There are circumstances like @flybayer mentioned where you might have two resources that seem very similar (eg. admin versions of resources), but could and sometimes should be cached separately under a different type.

Either way you slice it, it will be up to the user to decide which resources should be normalized together and which ones shouldn't.
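
For illustration, shallow merging with a userland override could look something like the sketch below. `mergeResource` and its optional `merge` argument are hypothetical; nothing here is a decided API:

```javascript
// Shallow merge by default; a per-kind `merge` function (hypothetical option)
// lets userland override it.
function mergeResource(existing, incoming, merge) {
  if (existing === undefined) return incoming
  return merge ? merge(existing, incoming) : { ...existing, ...incoming }
}

const regular = { id: 1, title: 'Do the dishes', tags: ['home'] }
const admin = { id: 1, assignedBy: 'root', tags: ['audit'] }

// Default: top-level properties are combined, nested values are replaced.
const merged = mergeResource(regular, admin)

// Custom: this kind chooses to union its tags instead.
const mergedTags = mergeResource(regular, admin, (a, b) => ({
  ...a,
  ...b,
  tags: [...new Set([...a.tags, ...b.tags])],
}))
```

In the default case the admin payload extends the regular one but its `tags` array wins outright, which is exactly the kind of conflict that might justify a custom merge or a separate type.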

How do deal with resource mutations? When creating or deleting a resource, it might only need to be added or removed in some lists. Or maybe only the order in some list needs to be updated?

Fantastic question. To me, there is definitely a spectrum of solutions here that require different levels of work on the user's part to implement. Resources that are being updated are more-or-less optimized for free already, assuming your normalization is working correctly, since each resource should only be stored once and link to each list-query in which it appears. For adding, removing and reordering resources though, it's tricky.

Unless you replicate the exact logic from the server that's responsible for filtering/sorting/pagination on the client, you can not reliably push and pop resources onto just any list query you want. I personally don't think it's a good idea, nor an easy task, for users to port all of that logic into the client, which is why from the very beginning, I wrote the docs and the API to favor smart-ish invalidation as opposed to normalization.

Coming full circle, we would definitely need to make a decision on what the defaults are:

  • optimistic by default, or
  • opt-in

Either way we go, we should make it extremely easy to opt-in or out of the optimistic updates. Either a list query is safe to receive optimistic resources, or not. If it's not, then it should at least fall back to invalidation based on the resource types it handles.

Something else to keep in mind: Will users want to only optimistically update and not invalidate? 🤷‍♂️

What if a user executes multiple optimistic mutations also involving multiple resources/lists? Will we be able to revert them in the correct order?

I think we can revert them in the correct order, yes. It will take some extra work to map the optimistic state of a list query to the mutation that caused it and also keep track of the history of those optimistic states in the right order. Assuming we have all that information, we could roll them back in the right order. In fact, we could even ensure that they are executed in the right order too when the network reconnects, right?
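
One way to keep that history, sketched under the assumption that we store the confirmed server state separately from an ordered list of pending optimistic mutations and derive the visible state by replaying them. `createOptimisticList` and its methods are made-up names:

```javascript
// Keep the confirmed server state separate from an ordered list of pending
// optimistic mutations, and derive the visible state by replaying them.
function createOptimisticList(confirmed) {
  let pending = [] // [{ id, apply }] in the order the mutations were started

  return {
    add(id, apply) {
      pending.push({ id, apply })
    },
    // Rolling back one mutation drops only its layer; later layers are kept
    // and replayed on top of the confirmed state.
    rollback(id) {
      pending = pending.filter((layer) => layer.id !== id)
    },
    read() {
      return pending.reduce((state, layer) => layer.apply(state), confirmed)
    },
  }
}

const todos = createOptimisticList(['dishes'])
todos.add('m1', (list) => [...list, 'laundry'])
todos.add('m2', (list) => [...list, 'groceries'])

const optimistic = todos.read()
todos.rollback('m1') // pretend m1 failed on the server
const afterRollback = todos.read()
```

Since the pending list preserves start order, the same structure could also be used to replay mutations in order once the network reconnects.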

Do we only assign stale times to queries or also to resources?

I think that depends on what we define as a "query" and an "observer/subscription". My gut tells me that anywhere we can observe/subscribe to a resource (list or single), we should support a maxAge trigger for background invalidation. Supporting both would allow both bulk and very granular background updates.

@boschni

boschni commented Feb 16, 2021

For most cases, you don't, but for all the times you might need to work with queries before they are fetched, they come in handy. Use cases I could imagine: pausing, cancelling, invalidating while idle w/ no data.

Think the types and params would probably not be enough as they can be similar for completely different queries. We could invalidate queries by object references like invalidateQuery(todoQuery, { id: 1 }) but it is a bit less flexible.

Yep, there is a lot of gray area here and it's the part I'm most unclear about for the entire idea thus far.

You could do something like this to identify and replace any nested resources in a response:

createQuery({
  fetch: fetchTodo,
  normalize: (todo, resource) =>
    resource({
      id: todo.id,
      type: 'todo',
      data: {
        ...todo,
        edits: todo.edits.map(edit => ({
          ...edit,
          user: resource({
            id: edit.user.id,
            type: 'user',
            data: edit.user,
          }),
        })),
      },
    }),
})

But if you have a lot of endpoints with the same resources this can become tedious. Schemas are a way to prevent the duplication, although you quickly get into https://resthooks.io/ territory.

Technically no, but I would imagine that we would end up having many options that are either list/resource specific and it may end up being more options than the ones that are shared.

I guess it could be useful to specify infinite query related options?

This is a tough one. Without locking down a shape/contract/schema for queries to return (even if it's optional) it would be pretty difficult to make assumptions about the data being fetched.

Maybe it would be good to write out a complete flow with multiple fetches and cache updates including lists to see how it would work, like:

const todoQuery = createQuery({
  fetch: fetchTodo,
  normalize: (todo, resource) =>
    resource({
      id: todo.id,
      type: 'todo',
      data: {
        ...todo,
        edits: todo.edits.map(edit => ({
          ...edit,
          user: resource({
            id: edit.user.id,
            type: 'user',
            data: edit.user,
            update: _cacheUser => edit.user, // replace instead of merge (runs only on query update)
          }),
        })),
      },
    }),
})

await todoQuery.fetch({ id: 1 }) // query normalize function runs and updates the cache

cache.get({ type: 'todo', id: 1 })
{
  id: 1,
  text: 'some todo',
  done: false,
  edits: [{
    id: 1,
    name: 'username'
  }]
}

cache.get({ type: 'user', id: 1 })
{
  id: 1,
  name: 'username'
}

// local update
cache.set({ type: 'todo', id: 1 }, prevTodo => ({
  ...prevTodo,
  text: 'some todo with edit'
}))

// query normalize function runs again for related queries, but now they will only read from the cache

cache.get({ type: 'todo', id: 1 })
{
  id: 1,
  text: 'some todo with edit',
  done: false,
  edits: [{
    id: 1,
    name: 'username'
  }]
}

await todoQuery.fetch({ id: 1 }) // query normalize function runs and updates the cache

cache.get({ type: 'todo', id: 1 })
{
  id: 1,
  text: 'some todo',
  done: false,
  edits: [{
    id: 1,
    name: 'username'
  }]
}

This is just a basic flow for a possible implementation, but writing out entire flows definitely helps in shaping the api.

Unless you replicate the exact logic from the server that's responsible for filtering/sorting/pagination on the client, you can not reliably push and pop resources onto just any list query you want.

Yeah that is what I mean, when creating some resource, we need to have a way to add it to specific lists. In the current mutation API I don't see a way to specify which lists the new resource needs to be added to and how it needs to be added.

@Stancobridge

Won't this break v3 code when upgrading?
