Fang-/cached-eyre.md

# eyre caching

Urbit can function as a webserver, but isn't as fast at that as it could be. It would be good if it could comfortably serve hundreds or even thousands of pageviews a minute. Benchmarks for the status quo are left as an exercise to the reader.

## approaches to improving performance

There are two ways in which we can use the scry namespace to make Eyre more performant. Truthfully, there are probably more than two ways, but these specific two have highest relevance in Eyre's recent history.

- Publication cache: Eyre can track a publication cache with static, known-ahead-of-time responses bound to specific endpoint paths. Eyre would tell the runtime about these known responses, which the runtime would use to serve up responses to GET requests.
- Stateless reads: The runtime, when receiving a GET request, can scry into Eyre to retrieve a response for it. Eyre might scry into agents to further resolve the read. With GET requests handled as pure reads, they could theoretically be served in parallel.

In this document we will talk about the publication cache. Stateless reads will not deliver performance gains by themselves, may actually incur a performance hit in the "scry miss" case, and have unanswered questions around referential transparency. Some prior art can be found in an old draft PR, urbit/urbit#4674.
The publication cache also aligns closely with the imagined future of "solid state subscriptions" and "shrub namespace", wherein new or changed data is explicitly published into the namespace.

## kinds of content

Before talking about the publication cache proper, we must be aware of what kinds of content may get served through Eyre. We identify three kinds:

- Static content is fully self-contained and only changes when the data within it changes. For example: a blog post without comments, or an image.
- Dynamic content changes based on the current state of an agent (or some other datum). For example: a blog post with comments, or a list of pals.
- Procedural content is generated from the request itself. While the response for any given path may be known ahead of time, it may not be possible to enumerate all the valid paths we have responses for. For example: a parameterized /sigil.svg endpoint, or a calculator API.

Note that for dynamic and procedural content, it may not always be possible to publish a known response at all. If the response depends on the timestamp of the request in some way, a cache entry would be busted before it even got stored. We simply ignore this case.

## sources of content

Briefly, note that for both dynamic and procedural content, Hoon code must be run to generate the (original) response. Most commonly, this is a Gall agent. In rare cases, generators are used for this.
Only for static content can we consider another source: Clay. For things that are already files, it makes sense to stick them in Clay. Think JavaScript blobs, image files, and other "earth" content.

For data that originates within agents, however, it is in most cases unergonomic and unsound to store that data in Clay, especially if the agent may want to refer back to the data later.
In practice, both the "dynamic/procedural response from agent" and "static file from Clay" cases are common, though the latter is often handled through Docket's "glob" system instead (for, at this point, largely historical reasons).

## static publication cache

Accounting for the fact that agent-driven responses are already common, and weighing the fact that Eyre does not currently have any connection to or dependency on Clay, we propose the following model for a publication cache in Eyre:
```hoon
+$  task
  $%  to-cache
      ::  etc...
  ==
::
+$  gift
  $%  to-cache
      ::  etc...
  ==
::
+$  to-cache
  $%  [%save =endpoint data=simple-payload:http]
      [%dump =endpoint]
  ==
::
+$  endpoint  ::  exact binding
  $:  binding
      tail=(unit @t)
  ==
```

- Eyre gets a new task & gift, %save, which can be used to publish responses for a specific endpoint (optional site, plus path, plus extension if any) into the cache.
- Eyre stores the cache within itself. Whenever something gets added or removed, it notifies the runtime. On %born, it notifies the runtime of all existing entries.
- The runtime tracks the cache as per Eyre's notifications. Whenever it receives a GET request and an entry for that exact path exists in the cache, it serves the stored response instead of injecting the request as an event.
- %save may be used to overwrite existing entries. %dump may be used to remove existing entries.
- When resolving an incoming GET request, Eyre checks the cache for an exact match. If there is one, it serves that. If there is none, it falls back to the regular binding matching that path, if any.
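To make the rules above concrete, here is a minimal sketch of the cache bookkeeping as a Hoon core. It is not Eyre's actual code: $payload is a hypothetical stand-in for simple-payload:http, $endpoint flattens the binding and tail into one cell, and the regular binding resolution is modeled as a caller-supplied fallback gate.

```hoon
::  minimal model of the publication cache; NOT eyre's implementation.
::  $payload is a hypothetical stand-in for simple-payload:http.
::
|%
+$  endpoint  [site=(unit @t) path=(list @t) tail=(unit @t)]
+$  payload   [status=@ud body=@t]
+$  cache     (map endpoint payload)
+$  to-cache
  $%  [%save =endpoint data=payload]
      [%dump =endpoint]
  ==
::  +update: %save adds or overwrites an entry, %dump removes it
::
++  update
  |=  [c=cache t=to-cache]
  ^-  cache
  ?-  -.t
    %save  (~(put by c) endpoint.t data.t)
    %dump  (~(del by c) endpoint.t)
  ==
::  +resolve: serve an exact cache hit if there is one, otherwise
::  fall back to regular binding resolution (a hypothetical gate here)
::
++  resolve
  |=  [c=cache ep=endpoint fallback=$-(endpoint payload)]
  ^-  payload
  =/  hit  (~(get by c) ep)
  ?^  hit  u.hit
  (fallback ep)
--
```

Keying the map on the full endpoint is what makes this an exact-match cache: in this model, /blog/post with tail `'html'` and /blog/post with tail ~ are distinct entries.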

This is sufficient to let agents publish responses into the cache, and keep those updated as the underlying data changes.
The "static file from Clay" case can be implemented using the affordances here. To avoid repetition of boilerplate patterns regarding this, we might ship a small piece of userspace infrastructure, an /app/file-server if you will, that is responsible for bindings of this kind.
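For the /app/file-server idea, the heart of such an agent would be a mapping from Clay file changes to cache tasks. The sketch below is hypothetical throughout: no such agent exists yet, +file-endpoint and +on-file are illustrative names, the payload type is simplified, and it assumes the last path segment (the mark) doubles as the endpoint's extension.

```hoon
::  hypothetical core of an /app/file-server; all names are illustrative
::
|%
+$  endpoint  [site=(unit @t) path=(list @t) tail=(unit @t)]
+$  payload   [status=@ud content-type=@t body=@t]
+$  to-cache
  $%  [%save =endpoint data=payload]
      [%dump =endpoint]
  ==
::  +file-endpoint: /blog/post/html -> [~ /blog/post `'html']
::  (assumes the final segment is the file's extension)
::
++  file-endpoint
  |=  pax=(list @t)
  ^-  endpoint
  =/  rev  (flop pax)
  ?~  rev  [~ ~ ~]
  [~ (flop t.rev) `i.rev]
::  +on-file: republish a changed file, dump a deleted one (~)
::
++  on-file
  |=  [pax=(list @t) file=(unit [content-type=@t body=@t])]
  ^-  to-cache
  ?~  file  [%dump (file-endpoint pax)]
  [%save (file-endpoint pax) [200 content-type.u.file body.u.file]]
--
```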

## authentication

The above does not account for authentication, limiting cached responses to fully public content. Presently, we do not have the affordances needed to handle private content properly.
But it doesn't have to be that way. The %save task could simply include an auth=? flag alongside the data, indicating whether authentication is required or not. The runtime would then, where needed, check the incoming request for a valid authentication cookie, and either give the response or serve a simple 403.
...Except the runtime does not presently know how to check an incoming request for authentication, and making it do so is outside the scope of the grant. The changes here aren't too big though, and in some ways similar to the behavior outlined above. (Most likely, just Eyre telling the runtime about creation/expiry of session identifiers, and teaching the runtime how to check for their presence in any Cookie headers.) Considering the very-nice-to-have nature of support for private endpoints, we (read: ~palfun-foslup) may offer to implement this in the short term, so that the grant work may make use of it.
Caching may not seem as relevant for private content, since it's significantly less likely to get requested many times a minute. But being able to eagerly cache there still provides tangible benefits: Urbit-generated web UIs can get served faster, and private endpoints stop being surface area for DoS attacks.
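With an auth=? flag on each entry, the per-request decision is small. It is modeled in Hoon here for consistency with the rest of this document, although the real check would live in the runtime. The sketch assumes the runtime has already extracted a session identifier from the Cookie header (cookie) and holds the set of live sessions Eyre told it about (sessions); both names, and the simplified types, are hypothetical.

```hoon
::  hypothetical model of the auth check on a cached entry
::
|%
+$  payload  [status=@ud body=@t]
+$  entry    [auth=? data=payload]
::  +serve: public entries are always served; private entries only
::  when the request carries a session identifier eyre has issued
::
++  serve
  |=  [ent=entry sessions=(set @t) cookie=(unit @t)]
  ^-  payload
  ?.  auth.ent  data.ent
  ?~  cookie  [403 'forbidden']
  ?:  (~(has in sessions) u.cookie)
    data.ent
  [403 'forbidden']
--
```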

## procedural publication cache

We have accounted for static and dynamic content, but are not yet able to serve procedural content. This requires a slightly different approach.
NOTE that this section is still tentative and under discussion. We may or may not want to punt on this until we have solid state subscriptions. Certainly the static publication cache above should be sufficient for most cases. Consider this section out of scope for the grant.
```hoon
+$  to-cache
  $%  [%prep =binding =work]
      [%drop =binding]
      ::  etc...
  ==
::
::TODO  but $inbound-request gets constructed by eyre!
::      the runtime can easily make $request:http, but that doesn't
::      have the auth check/flag...
+$  work  $-(inbound-request simple-payload:http)
::
+$  action
  $%  [%work =work]
      ::  etc...
  ==
```

- Eyre gets a new task & gift, %prep, which can be used to publish a response generation function for a specific binding (optional site, plus top-level path) into the cache.
- In addition to a cache entry, Eyre stores this among the normal bindings. Whenever a %work binding gets added, Eyre notifies the runtime.
- The runtime tracks the cache as per Eyre's notifications. Whenever it receives a GET request, it checks to see if a %prep binding matches. If one does, it runs the gate and serves the generated response, instead of injecting the request as an event.
- When resolving an incoming GET request, Eyre resolves from the bindings as normal.
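The runtime-side handling of %prep bindings amounts to a prefix match followed by running the stored gate. This is a sketch under simplifying assumptions: $request and $payload are hypothetical stand-ins for the real request type and simple-payload:http, the site is ignored, and the list of prepped bindings is assumed to be ordered most-specific first, so the first match wins.

```hoon
::  hypothetical model of serving a %prep binding; NOT eyre's code
::
|%
+$  request  [path=(list @t) args=(list [key=@t value=@t])]
+$  payload  [status=@ud body=@t]
+$  work     $-(request payload)
+$  binding  [site=(unit @t) path=(list @t)]
::  +is-prefix: does .pre prefix .pax?
::
++  is-prefix
  |=  [pre=(list @t) pax=(list @t)]
  ^-  ?
  ?~  pre  &
  ?~  pax  |
  ?&  =(i.pre i.pax)
      $(pre t.pre, pax t.pax)
  ==
::  +try-prep: run the gate of the first binding whose path prefixes
::  the request path. ~ means no match, so the runtime would inject
::  the request into eyre as an event instead
::
++  try-prep
  |=  [preps=(list [=binding =work]) req=request]
  ^-  (unit payload)
  ?~  preps  ~
  ?.  (is-prefix path.binding.i.preps path.req)
    $(preps t.preps)
  `(work.i.preps req)
--
```

This is exactly the non-exact matching the next paragraph is unhappy about: unlike the static cache, the runtime can no longer answer with a single map lookup.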

However:

Not super pleased with how procedural content forces you to bind on non-exact paths, in turn forcing the runtime to do binding resolution.

- We could reduce the friction here by moving Eyre's "find binding for request path" function into lull/the ivory pill, right?
- Alternatively, Eyre just publishes one overarching "cache resolution" function to the runtime, instead of letting it implement its own logic.


A concern with any of these is that the published functions keep old copies of the kernel around. Eyre could re-publish the function(s) on upgrade, but might also need agents to do the same.
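The "one overarching cache resolution function" option can be sketched as a gate that builds a gate. The names (+make-resolver, $resolver) and the simplified types are hypothetical. The sketch also illustrates the concern above: the returned gate closes over the cache state it was built from (and, as with any Hoon gate, over the subject it was built in), so it stays frozen until Eyre builds and publishes a new one.

```hoon
::  hypothetical: a single resolution gate eyre could hand the runtime
::
|%
+$  request   [path=(list @t) args=(list [key=@t value=@t])]
+$  payload   [status=@ud body=@t]
+$  resolver  $-(request (unit payload))
::  +make-resolver: exact cache entries first, then the first
::  procedural prefix match; ~ means "inject the request as usual"
::
++  make-resolver
  |=  $:  exact=(map (list @t) payload)
          preps=(list [pre=(list @t) gen=$-(request payload)])
      ==
  ^-  resolver
  |=  req=request
  ^-  (unit payload)
  =/  hit  (~(get by exact) path.req)
  ?^  hit  hit
  |-  ^-  (unit payload)
  ?~  preps  ~
  ?:  =(pre.i.preps (scag (lent pre.i.preps) path.req))
    `(gen.i.preps req)
  $(preps t.preps)
--
```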