Link Expansion can be considered as two distinct components:
- Building a link graph for what links might be set
- Populating the link graph with content
Both of these are relatively expensive tasks to undertake, so it would be good if we had a means to cache them.
The challenge for us has been determining a way to invalidate the cache, as there are many ways it can become stale. For example:
- A user updates some links (taxons) in Whitehall
- A request is made to /v2/expanded-links/{content-id}
- The request then either times out or - since caching was introduced - returns stale data
We'd like to be able to:
- generate expanded links fast enough for a request to not time out
- cache the results of it so it doesn't need to be done multiple times
We have completed the first step and are wondering how to do the second.
A put content call can change content and links, and only for the draft content store:
- Cached expanded link sets with with_drafts: true have a dependency on the content_id of the content that changed
- If links have changed since the previous edition then we need to invalidate the Link Graph for this content_id, locale, and with_drafts: true (and not ones that depend on this, since edition links can't be recursive)
- If the changed links are of reverse link types then we need to invalidate the link graphs, with_drafts: true, for the content_ids of the changed reverse links, and the corresponding expanded link sets
Publish can change content and links, for the live content store:
- Cached expanded link sets with with_drafts: false have a dependency on the content_id of the content that changed
- If links have changed since the previous edition then we need to invalidate the Link Graph for this content_id, locale, and with_drafts: false (and not ones that depend on this, since edition links can't be recursive)
- If the changed links are of reverse link types then we need to invalidate the link graphs, with_drafts: false, for the content_ids of the changed reverse links, and the corresponding expanded link sets
Unpublish can change content and links, for the live content store:
- Cached expanded link sets with with_drafts: true and false have a dependency on the content_id of the content that changed
- If links have changed since the previous edition (ie replacing a draft) then we need to invalidate the Link Graph for this content_id, locale, and with_drafts: true (and not ones that depend on this, since edition links can't be recursive)
- If the changed links are of reverse link types then we need to invalidate the link graphs, with_drafts: true, for the content_ids of the changed reverse links, and the corresponding expanded link sets
Discarding drafts can remove items from links:
- Cached expanded link sets with with_drafts: true have a dependency on the content_id of the content that changed
- Invalidate the Link Graph for this content_id, locale, and with_drafts: true
- If some links were of reverse link types then we need to invalidate the link graphs, with_drafts: true, for the content_ids of the reverse links, and the corresponding expanded link sets
The substitution helper automatically discards drafts when the item is a draft, or automatically unpublishes when it is not.
- When draft: we don't have to do anything; it would be handled by discarding drafts
- When not draft:
  - Cached expanded link sets with with_drafts: false have a dependency on the content_id of the content that has been substituted
  - Invalidate the Link Graph for this content_id, locale, and with_drafts: false
  - If some links were of reverse link types then we need to invalidate the link graphs, with_drafts: false, for the content_ids of the reverse links, and the corresponding expanded link sets
If we are able to store a blob of data (a link graph or expanded link set) which has a number of tags (content_ids) associated with it, then we can easily invalidate the cache.
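A minimal in-memory sketch of that tagged-blob idea (the real store would be a database table, and all names here are hypothetical):

```ruby
class TaggedBlobCache
  Entry = Struct.new(:blob, :tags)

  def initialize
    @entries = {}
  end

  # key: e.g. [content_id, locale, with_drafts]; tags: dependency content_ids
  def write(key, blob, tags)
    @entries[key] = Entry.new(blob, tags.to_a)
  end

  def read(key)
    entry = @entries[key]
    entry && entry.blob
  end

  # Drop every cached blob tagged with the given content_id
  def invalidate(tag)
    @entries.delete_if { |_key, entry| entry.tags.include?(tag) }
  end
end
```

Invalidation then needs no knowledge of which caches exist; it only needs the content_id that changed.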
We tried a model where we had:

```ruby
create_table :link_expansion_link_graph_caches do |t|
  t.uuid :content_id, null: false
  t.string :locale, null: false
  t.boolean :with_drafts, null: false
  t.json :link_graph, null: false
  # Couldn't use a uuid[] column via the migration DSL here, so text[] instead
  t.text :dependency_content_ids, array: true, null: false, default: []
  t.index [:content_id, :locale, :with_drafts], unique: true,
    name: "link_expansion_link_graph_caches_content_id_locale_with_drafts"
  t.index :dependency_content_ids, using: :gin
end
```
The problem we had here was that reads weren't particularly fast.

Without the index (the GIN index isn't used by ANY):

```
(129.1ms) SELECT COUNT(*) FROM "link_expansion_link_graph_caches" WHERE ('ed3f3bbe-0f1e-4cb6-b2d9-adefce039dff' = ANY(dependency_content_ids))
=> 1
```

With the index (which is used by the @> containment operator):

```
(118.6ms) SELECT COUNT(*) FROM "link_expansion_link_graph_caches" WHERE (dependency_content_ids @> ARRAY['ed3f3bbe-0f1e-4cb6-b2d9-adefce039dff'])
=> 1
```
So we then tried using a second table:

```ruby
create_table :expanded_link_set_caches do |t|
  t.uuid :content_id, null: false
  t.string :locale, null: false
  t.boolean :with_drafts, null: false
  t.json :expanded_links, null: false
  t.index [:content_id, :locale, :with_drafts], unique: true,
    name: "expanded_link_set_caches_content_id_locale_with_drafts"
end

create_table :expanded_link_set_cache_dependencies do |t|
  t.uuid :content_id, null: false
  t.references :expanded_link_set_cache,
    foreign_key: { on_delete: :cascade },
    index: { name: "expanded_link_set_cache_dependencies_cache_id" }
  t.index :content_id, name: "expanded_link_set_cache_dependencies_content_id"
end
```
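The shape of this normalised design can be sketched in memory (names hypothetical): the dependency table acts as a direct index from a changed content_id to the cache rows to drop, at the cost of writing one extra row per dependency:

```ruby
class NormalisedExpandedLinkSetCache
  def initialize
    @sets = {}          # cache key -> expanded links blob
    @dependencies = {}  # dependency content_id -> list of cache keys
  end

  def write(key, expanded_links, dependency_content_ids)
    @sets[key] = expanded_links
    dependency_content_ids.each { |dep| (@dependencies[dep] ||= []) << key }
  end

  def read(key)
    @sets[key]
  end

  # Fast invalidation: a direct lookup, rather than scanning every row's tags
  def invalidate(dependency_content_id)
    @dependencies.delete(dependency_content_id).to_a.each { |key| @sets.delete(key) }
  end
end
```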
Reads were very fast, but inserting into the db was taking approximately 4 seconds.
- Can we write the cache to the db and delete it fast enough to happen within a web request?
The idea of storing full expanded link sets in the db doesn't seem great; storing link graphs seems much preferable. Maybe we can store a date or something with the link graph that can be used as a cache key, so the expanded link set doesn't need to be populated each time?
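That closing idea could be sketched like this (names hypothetical): store a timestamp alongside the link graph and fold it into the expanded link set's cache key, so a rebuilt link graph naturally stops hitting stale expanded sets:

```ruby
LinkGraph = Struct.new(:content_id, :graph, :updated_at)

class VersionedExpandedLinkSetCache
  def initialize
    @store = {}
  end

  # Only recomputes (yields) when the link graph's timestamp has moved on
  def fetch(link_graph)
    key = [link_graph.content_id, link_graph.updated_at]
    @store[key] ||= yield
  end
end
```

Old entries would still need expiring eventually, but correctness would no longer depend on deleting them promptly.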