@am0c
Forked from audreyt/posa.mkdn
Created November 1, 2012 15:15
EtherCalc Chapter for POSA

From SocialCalc to EtherCalc

Previously, in The Architecture of Open Source Applications, I described SocialCalc, an in-browser spreadsheet system that replaced the server-centric WikiCalc architecture. SocialCalc performs all of its computations in the browser; it uses the server only for loading and saving spreadsheets.

...diagram on SocialCalc's scalability model...

For the Socialtext team, performance was the primary goal behind SocialCalc's design. The key observation was this: Client-side computation in JavaScript, while an order of magnitude slower than server-side computation in Perl, was much faster than the network latency incurred during AJAX roundtrips.

Toward the end of the AOSA chapter, we introduced simultaneous collaboration on spreadsheets, using a simple, chatroom-like architecture:

...diagram on Multiplayer SocialCalc's scalability model...

However, when we began to test it for production deployment, its performance and scalability fell short of practical use, prompting a series of system-wide rewrites to reach acceptable performance.

This chapter discusses the revamped architecture of EtherCalc, a successor to SocialCalc optimized for simultaneous editing.

Design Constraints

The Socialtext platform has both behind-the-firewall and in-the-cloud deployment options, imposing unique constraints on EtherCalc's resource and performance requirements. At the time of this writing, Socialtext requires 2 CPU cores and 4GB RAM for vSphere-based intranet hosting; a typical dedicated EC2-based cloud instance provides about twice that capacity, with 4 cores and 7.5GB of RAM.

Deploying on intranets means that we can't simply throw hardware at the problem in the same way multi-tenant, hosted-only systems did (e.g. DocVerse, which later became part of Google Docs); we can assume only a modest amount of server capacity.

Compared to intranet deployments, cloud-hosted instances offer better capacity with on-demand expansion, but network connections from browsers are usually slower and fraught with frequent disconnections and reconnections.

To recap, these resource constraints shaped EtherCalc's architectural direction:

  • Memory: An event-based server allows us to scale to thousands of concurrent connections with a small amount of RAM.

  • CPU: Based on SocialCalc's original design, we offload most computations and all content rendering to client-side JavaScript.

  • Network: By sending spreadsheet operations, instead of spreadsheet content, we reduce bandwidth use and allow recovery over unreliable network connections.
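As a rough illustration of the bandwidth difference (the message shapes below are hypothetical, not EtherCalc's actual wire format), compare a full-content update against a single serialized operation:

```javascript
// Illustrative only: these message shapes are hypothetical, not EtherCalc's
// actual wire format. A content update must carry every cell; an operation
// carries just the one command a client executed locally.
const sheet = {};
for (let row = 1; row <= 100; row++) {
  sheet[`A${row}`] = { value: row, formula: null };
}

// Operation-based update: a single command.
const op = { type: 'set', cell: 'A42', value: 99 };

const fullUpdate = JSON.stringify({ type: 'content', sheet });
const opUpdate   = JSON.stringify(op);

// The operation is a tiny fraction of the full-content payload, and a
// reconnecting client can catch up by replaying the operations it missed.
console.log(fullUpdate.length, opUpdate.length);
```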

Initial Prototype

We started with a WebSocket server implemented in Perl 5, backed by Feersum, a libev-based event engine developed at Socialtext. Feersum is very fast, capable of handling over 10k requests per second on a typical instance.

On top of Feersum, we use the PocketIO middleware to leverage the popular Socket.io JavaScript client, which provides backward compatibility for legacy browsers without WebSocket support.

The initial prototype closely resembles a chat server. Each collaborative session is a chatroom; clients send their locally executed commands and cursor movements to the server, which relays them to all other clients in the same room.

A typical flow of operation looks like this:

...diagram of operation flow...

Each command is logged on the server with a timestamp. If a client drops and reconnects, it can resume by asking for the backlog during its absence, then replay those commands locally to get to the same state as its peers.
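The backlog-and-replay scheme can be sketched as follows; the function names are illustrative, and the real server keeps one such log per room:

```javascript
// Minimal sketch of one room's command log (identifiers are hypothetical).
const backlog = [];

// Every relayed command is appended with a server-side timestamp.
function appendCommand(cmd) {
  backlog.push({ at: Date.now(), cmd });
}

// A reconnecting client asks for everything after the last timestamp it saw,
// then replays those commands locally to catch up with its peers.
function backlogSince(lastSeen) {
  return backlog.filter((entry) => entry.at > lastSeen)
                .map((entry) => entry.cmd);
}

appendCommand('set A1 value n 42');
appendCommand('set A2 value n 7');
```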

As we described in the AOSA chapter, this simple design minimizes server-side CPU and RAM requirements, and demonstrates reasonable resiliency against network failure.

First Bottleneck

When the prototype was put to field testing in June 2011, we quickly discovered a performance problem with long-running sessions.

Spreadsheets are long-lived documents, and a collaborative session can accumulate thousands of modifications over weeks of editing.

Under the naive backlog model, when a client joins such an edit session, it must replay thousands of commands, incurring a significant startup delay before it can make any modifications.

To mitigate this issue, we implemented a snapshot mechanism. After every 100 commands sent to a room, the server polls each active client for its state and saves the latest snapshot it receives alongside the backlog. A freshly joined client receives the snapshot along with any commands entered after the snapshot was taken, so it needs to replay at most 99 commands.

...diagram of snapshot-polling (keyed by shasum of all past commands)...
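A minimal sketch of this snapshot workaround, with hypothetical names, might look like:

```javascript
// Sketch of the snapshot-every-100-commands workaround (names are hypothetical).
const SNAPSHOT_INTERVAL = 100;

function makeRoom() {
  return { commandCount: 0, snapshot: null, commandsAfterSnapshot: [] };
}

// Called for each command relayed through a room. Once 100 commands have
// accumulated, the server polls a client for its serialized state and keeps
// only the commands issued after that snapshot.
function onCommand(room, cmd, pollClientState) {
  room.commandsAfterSnapshot.push(cmd);
  room.commandCount += 1;
  if (room.commandCount % SNAPSHOT_INTERVAL === 0) {
    room.snapshot = pollClientState();  // trusts the client's state blindly
    room.commandsAfterSnapshot = [];    // newcomers replay at most 99 commands
  }
}
```

Note that the sketch also exhibits the flaw the prose goes on to describe: the server stores whatever `pollClientState()` returns, with no way to validate it.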

This workaround solved the CPU issue for new clients, but created a network performance problem of its own, as it taxes each client's upload bandwidth. Over a slow connection, this delays the reception of subsequent commands from the client.

Moreover, the server has no way to validate the consistency of snapshots. Therefore, an erroneous snapshot can corrupt the state for all newcomers, placing them out of sync with existing peers.

An astute reader may note that both problems are caused by the server's inability to execute spreadsheet commands. If the server could update its own state as it received each command, it would not need to maintain a command backlog at all.

The in-browser SocialCalc engine is written in JavaScript. We considered translating that logic into Perl, but that carries the steep cost of maintaining two codebases going forward. We also experimented with embedded JS engines (V8, SpiderMonkey), but they imposed their own performance penalties when running inside Feersum's event loop.

Finally, by August 2011, we resolved to rewrite the server in Node.js.

Porting to Node.js

The initial rewrite went smoothly. Because both Feersum and Node.js are based on the same libev event model, and PocketIO's API matches Socket.io's closely, it took us only an afternoon to code up a functionally equivalent server.

Nevertheless, a side-by-side comparison of the two implementations revealed several shortcomings in our initial translation:

  • Scalability: Initial micro-benchmarking revealed that Node.js's throughput is about 40% less than Feersum's.

    Because this was within our acceptable range, we accepted the drawback and expected that it would improve over time.

  • Verbosity: The JavaScript API of Express is highly repetitive, requiring roughly twice as many lines of code as its Perl counterpart.

    We addressed this concern by adopting ZappaJS, a concise API layer on top of Express and Socket.io.

  • Composability: In contrast to the straightforward Perl syntax of Coro::AnyEvent, Node.js's callback-based API necessitates deeply nested functions that are difficult to reuse.

    After experimenting with various flow-control libraries, we finally solved this issue by switching to LiveScript, a new language that compiles to JavaScript, with syntax inspired by Perl and Haskell.

    LiveScript eliminated nested callbacks with novel constructs, such as backcalls and cascades, while also providing us with powerful syntactic tools for functional and object-oriented composition.

After settling on the ZappaJS framework, we continued working to maintain each session's ongoing spreadsheet state on the server, reducing client-side CPU use and minimizing bandwidth use:

...diagram of maintaining spreadsheet state on server...

Server-side SocialCalc

The key enabling technology for our work is jsdom, a full implementation of the W3C document object model, which enables Node.js to load client-side JavaScript libraries within a simulated browser environment.

Using jsdom, it's trivial to create any number of server-side SocialCalc spreadsheets, each running in its own sandbox:

require! <[ vm jsdom ]>
create-spreadsheet = ->
  # Simulate a minimal DOM for SocialCalc to render into
  document = jsdom.jsdom \<html><body/></html>
  # Expose just enough browser globals for the engine to run
  sandbox  = vm.createContext window: document.createWindow! <<< {
    setTimeout, clearTimeout, alert: console.log
  }
  # Evaluate the packed SocialCalc code inside the sandbox
  return vm.runInContext """
    #packed-SocialCalc-js-code
    window.ss = new SocialCalc.SpreadsheetControl
  """ sandbox

Each collaboration session corresponds to a sandboxed SocialCalc controller, executing commands as they arrive. The server then transmits this up-to-date controller state to newly joined clients, removing the need for backlogs altogether.

Satisfied with benchmarking results, we coded up a Redis-based persistence layer and launched EtherCalc.org for public beta testing. For the next six months, it scaled remarkably well, performing millions of spreadsheet operations without a single incident.

In April 2012, after delivering a talk on EtherCalc at the OSDC.tw conference, I was invited by Trend Micro to participate in their hackathon, adapting EtherCalc into a programmable visualization engine for their real-time web traffic monitoring data.

For their use case, we created REST APIs for accessing individual cells with GET/PUT, as well as POSTing commands directly to a spreadsheet. During the hackathon, the brand-new REST handler received hundreds of calls per second, updating graphs and the contents of formula cells in the browser, without any hint of slowdown or memory leaks.
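The cell-level REST semantics can be sketched as a plain routing function (the paths and status codes below are illustrative; the actual API may differ), with an in-memory map standing in for per-spreadsheet state so the handler can be exercised without a running server:

```javascript
// Illustrative sketch of the cell-level REST semantics; routes, status
// codes, and names are assumptions, not EtherCalc's documented API.
const cells = new Map();

function handle(method, path, body) {
  // GET/PUT an individual cell, e.g. /_/demo/cells/A1
  const cellMatch = path.match(/^\/_\/([^/]+)\/cells\/([A-Z]+\d+)$/);
  if (cellMatch) {
    const key = cellMatch[1] + '!' + cellMatch[2];
    if (method === 'GET') return { status: 200, body: cells.get(key) ?? null };
    if (method === 'PUT') { cells.set(key, body); return { status: 201, body }; }
  }
  // POST a raw spreadsheet command to the sheet itself, e.g. /_/demo
  if (method === 'POST' && /^\/_\/[^/]+$/.test(path)) {
    return { status: 202, body: 'queued: ' + body };
  }
  return { status: 404, body: null };
}

handle('PUT', '/_/demo/cells/A1', '42');
```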

However, at the end-of-day demo, as we piped traffic data into EtherCalc and started to type formulas into the in-browser spreadsheet, the server suddenly locked up, freezing all active connections. We restarted the Node.js process, only to find it consuming 100% CPU, locking up again soon after.

Flabbergasted, we rolled back to an earlier data set, which did work correctly and allowed us to finish the demo. But I wondered: What caused the lock-up in the first place?

Profiling Node.js

...Fortunately, another application of jsdom is Zombie, which let us recreate the problem under simulated high load and opened the possibility of profiling...

...more investigation needed... ...instrumentation; states of art... ...bottleneck at jsdom... ...introduce server-side profiling tools we used, e.g. nodetime... ...introduce Ghostdriver, the PhantomJS WebDriver backend... ...introduce illumos: http://blog.nodejs.org/2012/04/25/profiling-node-js/ ...as well as exporting spreadsheets to web pages via jsdom's innerHTML support...

dtrace-provider + stackvis
dtrace -n 'profile-97/execname == "node" && arg1/{ @[jstack(100, 8000)] = count(); } tick-60s { exit(0); }' > stacks.out
stackvis dtrace flamegraph-svg < stacks.out > graph.svg

Horizontal Scaling

...multi-core vs horizontal database... ...web workers, thread_a_gogo, and dom-less SC... ...multiple node.js servers, connected with Redis Pub/Sub...

Lessons Learned

...LiveScript and profiling with compiling-to-JS languages... ...JS everywhere; comparing various contexts of JS execution... ...future directions:... ......Moving server-side SC code into Postgres PL/v8js... ......Postgres NOTIFY compared with Redis (w/ benchmarks)...
