
DRAFT - please do not post to social media! :)

Todo: Add photos from https://www.dropbox.com/sh/xwi8nxs7nysycwv/AAAnLfla3yzZwzpNa-NKxUGOa?dl=0

At Offline Camp California in November 2016, John Kleinschmidt and I tried to tackle the seemingly clear-cut problem of how to get PouchDB replication to run correctly in a Service Worker. However, even though John and I were familiar with both PouchDB and Service Workers, and even though I work for a browser vendor that's in the process of implementing Service Worker, we ran into impedance mismatches between what we thought a Service Worker was capable of and what we were trying to accomplish with offline replication.

The goal of this article is to explain where we went wrong in our understanding of Service Workers, and how we began tackling the problem of adapting PouchDB replication to a context for which it wasn't originally designed. I'll also go over some of the trickier parts of the Service Worker API, which may prove surprising to the uninitiated. John wrote up his own thoughts on the subject, and in this post I'd like to expand on what we've learned since then.

Scoping the problem

First off, what was the problem we were trying to solve? Well, one issue John identified was the performance impact of IndexedDB on the main thread, which can only be alleviated by moving PouchDB either to a Web Worker or to a Service Worker. John's project, HospitalRun.io, was already making use of Service Worker, and so it seemed to be a natural fit – no need to allocate additional workers just to run PouchDB.

However, the first thing we realized, and which I summarized in a talk I later gave at DotJS, was that a Service Worker can't really be treated the same as a Web Worker. It has some superficial similarities, but also some important differences.

On the surface, a Service Worker can appear quite similar to a Web Worker. Both run on separate threads from the main browser UI thread, both have a global self object, and they tend to support the same APIs. However, while Web Workers offer the developer a lot of control over their lifecycle (you can create and terminate them at will) and let you execute long-running JavaScript operations (in fact, they're designed for this), Service Workers explicitly don't allow either of these things. In fact, a Service Worker is best thought of as "fire-and-forget": it responds to events in an ephemeral way, and the browser is free to terminate a misbehaving Service Worker that takes too long to fulfill a request or makes too much use of global in-memory resources.

This led us to our first real hurdle with Service Worker. Our goal, as we originally conceived it, was to use PouchDB's existing replication APIs to enable bi-directional sync between the client and the server, with the client code isolated entirely to the Service Worker. So as a first pass, we simply loaded PouchDB into our serviceWorker.js, waited for the 'activate' event, and then used the standard PouchDB "live" sync API:

https://gist.github.com/42f8fcc8dbf756cfccccd1dcb2cdb58f
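(The embedded gist isn't reproduced in this draft; roughly, the attempt looked like the sketch below, where the script name, database name, and remote CouchDB URL are all placeholders.)

```js
// serviceWorker.js: first (broken) attempt, running "live" sync inside the Service Worker.
// A rough sketch; the script name, database name, and remote URL are placeholders.
importScripts('pouchdb.min.js');

var localDB = new PouchDB('mydb');
var remoteDB = new PouchDB('https://example.com/couchdb/mydb');

self.addEventListener('activate', function () {
  // Live, retrying sync keeps a long-polling HTTP connection open,
  // which, as we were about to learn, a Service Worker is not allowed to do.
  localDB.sync(remoteDB, { live: true, retry: true })
    .on('error', console.log.bind(console));
});
```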

This immediately resulted in a silent failure, which took us a bit of time to debug. The culprit? Well, PouchDB's "live" sync depends on HTTP long-polling – in other words, it maintains an ongoing HTTP connection with the CouchDB server, which is used to send real-time updates from the server to the client. As it turns out, this is a big no-no in Service Worker Land, and the browser will unceremoniously terminate your Service Worker if it tries to keep a long-lived TCP connection open. The same applies to WebSockets, Server-Sent Events, WebRTC, and any other network APIs where you may be tempted to maintain a persistent connection with your server.

What we realized is that "the zen of Service Worker" is all about embracing events. The Service Worker receives events, it responds to events, and (if it knows what's good for it) it does so in a timely manner, lest the browser decide to proactively squash it. And this is actually a good design decision in the Service Worker specification, as it prevents Service Workers on malicious websites from going rogue and abusing the user's battery, memory, or CPU.

So, keeping in mind that we're trying to accomplish bi-directional replication – i.e. data flow from server to client and vice-versa – let's review what events the Service Worker can actually receive and respond to.

The 'fetch' event

This is the most famous feature in the Service Worker arsenal – by capturing this event, the Service Worker can intercept network requests and respond with its own content.

Because the CouchDB API is entirely REST-based, it would be technically possible to implement a caching layer via the 'fetch' event. However, this would be working against the grain of CouchDB's bi-directional replication model, and unlike the existing CouchDB replication protocol, it would do little to answer the question of how to handle conflicts and merges.
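For illustration only, a minimal cache-first handler of that sort might look like the following sketch (the cache name is arbitrary):

```js
// serviceWorker.js: minimal cache-first 'fetch' handler. Fine for a read-only
// REST API, but a poor fit for CouchDB-style bi-directional replication.
self.addEventListener('fetch', function (event) {
  event.respondWith(
    caches.open('api-cache').then(function (cache) {
      return cache.match(event.request).then(function (cached) {
        return cached || fetch(event.request).then(function (response) {
          cache.put(event.request, response.clone());
          return response;
        });
      });
    })
  );
});
```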

Such a technique may make sense for lightweight caching of a read-only REST API, but it doesn't make much sense for CouchDB. In the same way that it would be silly to take the Git protocol and map every HTTPS request to an object that can be independently cached and invalidated, we decided it would be silly to route our PouchDB replication system through 'fetch' events. That led us to the other Service Worker APIs:

The 'sync' event

This event is defined by the Background Sync API, and is probably the most confusingly-named of the Service Worker events. Despite the name, it has little to do with "sync" as such: it's merely an event that fires when the browser goes from an offline state to an online state. The goal is to allow the Service Worker to use this "just went online" moment as an opportunity to push unsynced changes from the client to the server.

In the case of PouchDB, this would be as straightforward as waiting for the 'sync' event and then firing a single-shot replication from the client to the server. There's no need to keep a persistent HTTP connection open: we wait for the 'sync' event to tell us that the device has come online, let PouchDB replicate from the last checkpoint it stored during any previous replication, and then patiently wait for the next 'sync' event.
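A minimal sketch of that flow, assuming the localDB and remoteDB from the earlier snippet and a made-up tag name:

```js
// page.js: register a one-off background sync (the tag name is hypothetical)
navigator.serviceWorker.ready.then(function (registration) {
  return registration.sync.register('pouchdb-push');
});

// serviceWorker.js: single-shot client-to-server replication on 'sync'
self.addEventListener('sync', function (event) {
  if (event.tag === 'pouchdb-push') {
    // replicate.to() resumes from the last stored checkpoint and returns a promise
    event.waitUntil(localDB.replicate.to(remoteDB));
  }
});
```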

After examining this architecture, the first question you might ask is why this 'sync' API even exists, given that there is already navigator.onLine and similar online/offline events available in the browser. The answer is that, because this is available to a Service Worker rather than the main UI thread, you can actually handle this event even when the user doesn't currently have your website open. So in this sense the 'sync' event is much more capable.

The second question you may ask, if you've been working with offline-first architectures for a while, is how a simple indication that the browser believes it is online can handle the trickier cases of captive portals, lie-fi, and other situations where the device believes it's online but the network request still fails for one reason or another. The answer is that the 'sync' API doesn't address this, but the Periodic Sync API potentially does.

The 'periodicsync' event, as defined in the current Background Sync API working draft, allows the developer to schedule repeated events that the Service Worker can intercept. For instance, you might register an event that fires once every 30 minutes (which is guaranteed with only a rough degree of precision). You could also specify that the event should only fire if the device isn't currently on battery, is connected to WiFi rather than a cell network, etc. Unfortunately this API doesn't exist in any browser yet, but it does contain the building blocks for a robust client-to-server sync mechanism.

While Periodic Sync is still unimplemented in any browser, it's potentially polyfillable via setInterval in the main UI thread – with the downside, of course, that it wouldn't work if the user isn't currently on the page (a rough sketch of such a polyfill is shown below). Regardless, we now had a basic idea of how we might implement client-to-server replication using Service Worker. But what about server-to-client?
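Here is that polyfill sketch; the message type is made up, and localDB/remoteDB are assumed to be set up as before:

```js
// page.js: a crude stand-in for 'periodicsync'. Every 30 minutes, ask the
// Service Worker to run a one-shot replication (only works while the page is open).
setInterval(function () {
  if (navigator.serviceWorker.controller) {
    navigator.serviceWorker.controller.postMessage({ type: 'periodic-replicate' });
  }
}, 30 * 60 * 1000);

// serviceWorker.js: respond with a single-shot client-to-server replication
self.addEventListener('message', function (event) {
  if (event.data && event.data.type === 'periodic-replicate') {
    event.waitUntil(localDB.replicate.to(remoteDB));
  }
});
```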

The 'push' event

Again, this is an event that can be easily misunderstood. From reading much of the documentation on the Web Push API, you may be led to believe that it's only useful for push notifications. And indeed, this is the landmark feature of the 'push' event.

However, it turns out that the Service Worker can handle a 'push' event even without displaying a notification (assuming that you've set userVisibleOnly to false). This means that it's perfectly suited for pushing data from the server to the client, without ever interrupting the user's workflow with an unwanted notification.

As a basic example, your server may contain a process that listens to CouchDB's changes feed (via standard HTTP long-polling) and, when it sees an update, sends a notification request to the push messaging service for that device, as defined in the IETF Web Push protocol. In practice, this could be any one of Firebase Cloud Messaging (FCM, formerly GCM) (Chrome), Mozilla Push Service (Firefox), or Windows Push Notification Services (WNS), although since it's a standard protocol, neither the client nor the server needs to know the details of which service they happen to be using.
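On the server, such a process might look roughly like the sketch below, using the web-push Node library; the VAPID keys and the list of client subscriptions are placeholders, and how subscriptions are collected and stored is left out entirely:

```js
// server.js: rough sketch of a push-on-change process (Node).
var PouchDB = require('pouchdb');
var webpush = require('web-push');

// Placeholders: real keys would come from config, and subscriptions from storage.
webpush.setVapidDetails('mailto:admin@example.com',
  process.env.VAPID_PUBLIC_KEY, process.env.VAPID_PRIVATE_KEY);
var subscriptions = []; // PushSubscription objects collected from clients

var db = new PouchDB('https://example.com/couchdb/mydb');

db.changes({ live: true, since: 'now' }).on('change', function () {
  // No payload needed: the client will replicate once its 'push' event fires.
  subscriptions.forEach(function (sub) {
    webpush.sendNotification(sub).catch(console.error);
  });
});
```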

Because push messaging is implemented at the level of the operating system itself, this has a huge advantage over standard long-polling-based PouchDB sync: the device only ever needs to keep one connection open at a time, shared by the entire system. And once the 'push' event is received by the Service Worker, the push message itself doesn't even need to contain any data – we can simply do a single-shot replication from server to client using the regular PouchDB replicate() API, and then wait for the next push message.
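On the client, the corresponding handler can be as small as this sketch (again assuming the localDB and remoteDB from earlier, and a subscription created with userVisibleOnly set to false):

```js
// serviceWorker.js: single-shot server-to-client replication on 'push'.
// No notification is shown, so the subscription must allow userVisibleOnly: false.
self.addEventListener('push', function (event) {
  event.waitUntil(localDB.replicate.from(remoteDB));
});
```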

Conclusion

As we've learned, there's a lot more to Service Workers than may initially meet the eye. Although they share some cosmetic similarities with Web Workers, they are a different beast entirely, and often require a very different approach. Being aware of the Service Worker lifecycle and sticking to a "fire-and-forget" model are the best paths to effective Service Worker code.

We've also shown how it's possible to leverage the Service Worker event model to build a bi-directional replication system, using the 'sync' event to send data from the client to the server and the 'push' event to send data from the server to the client. The 'periodicsync' event may someday be useful for handling 'sync'-triggered client-to-server requests that fail due to lie-fi or other intermittent issues.

After our initial rocky attempts to enable full Service Worker-based replication for PouchDB, John and I expanded our conversation to a larger group and started hammering out the finer-grained details of what such a system might look like. Gregor Martynus and I explored some of these ideas at the CouchDB Dev Summit in February 2017, and later we began collaborating on a shared design document in GitHub, along with Garren Smith and Robin Mehner.

Hopefully this brainstorming will someday result in a new PouchDB plugin, which should give users an experience similar to PouchDB's existing "live" replication APIs, but implemented fully in the "Service Worker way." In the meantime, I hope that this exploration of Service Worker's more beguiling concepts can serve as a useful lesson for those who, like us, thought they knew what Service Worker was all about, only to find themselves more and more surprised the deeper they dug into it.
