============================
Espra Protoplex Architecture
============================
-----
Nodes
-----
Espra Protoplex is designed specifically for deployment on EC2, S3 and App
Engine.

There are 7 types of node, which will run on top of EC2:

* Proxy
* Fileserver
* App
* Mail
* Live
* Seed
* Admin

These will be complemented by 2 App Engine applications:

* Espra
* EspraLog

Node Structure
==============
On startup all nodes establish a connection to the Seed node.
::

    +----------------+
    | Internet Horde |
    +----------------+
             |                                 +-------------+
             |     +-----------+               | Other Nodes |
             |     | Seed Node |               +-------------+
             |     +-----------+                      |
             |           |   \                        |
      +-------------+    |    \                       |
      | Public Port |    |   +----------------------------------+
      +-------------+    |   | Meta Port (Internal Access Only) |
             \           |   +----------------------------------+
              \          |       /
               \         |      /
    +===========\========|=====/=================================+
    |            \       |    /                                  |
    |           +----------------------+                         |
    |           | Node: Parent Process |                         |
    |           +----------------------+                         |
    |                             |                              |
    |                             |                              |
    |     +-----------------------+-----------------------+      |
    |     |                       |                       |      |
    | +---------------+           |           +---------------+  |
    | | Child Process |   +---------------+   | Child Process |  |
    | +---------------+   | Child Process |   +---------------+  |
    |                     +---------------+                      |
    |                                                            |
    +============================================================+

Proxy Nodes
===========
:Protocols: HTTP, HTTPS
:ELB: Yes
:LocalPorts: 8080, 8443
:RemotePorts: 80, 443
Proxy nodes are intelligent proxies to the Live nodes. They:

* Parse only as much of the request as the predefined handlers require.
* Query the Seed node to find out which particular Live node they should be
  relaying the request to.
* Stop any further processing and simply proxy to and from the target Live node.
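
The steps above can be sketched roughly as follows. This is only an
illustration: the ``/live/<key>/`` path convention, ``extract_routing_key`` and
``SeedClient`` are hypothetical names that do not come from the Protoplex
design, and the Seed node query is replaced by a dictionary lookup.

.. code-block:: python

    from typing import Dict, Optional, Tuple

    Address = Tuple[str, int]


    def extract_routing_key(request_line: str) -> Optional[str]:
        """Parse just enough of an HTTP request line to find a routing key.

        Assumes (hypothetically) that paths look like /live/<key>/...
        """
        parts = request_line.split(" ")
        if len(parts) != 3:
            return None
        segments = parts[1].split("/")
        # segments[0] is the empty string before the leading slash.
        if len(segments) >= 3 and segments[1] == "live" and segments[2]:
            return segments[2]
        return None


    class SeedClient:
        """Stand-in for a query to the Seed node, backed by a plain dict."""

        def __init__(self, table: Dict[str, Address]) -> None:
            self._table = table

        def lookup_live_node(self, key: str) -> Optional[Address]:
            return self._table.get(key)


    def route(request_line: str, seed: SeedClient) -> Optional[Address]:
        """Return the Live node to relay to, or None if it cannot be routed."""
        key = extract_routing_key(request_line)
        if key is None:
            return None
        return seed.lookup_live_node(key)

For example, with ``seed = SeedClient({"room42": ("10.0.0.7", 8080)})``, calling
``route("GET /live/room42/events HTTP/1.1", seed)`` returns that address. A real
Proxy node would then relay bytes in both directions without parsing any
further.
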
To facilitate high throughput, Proxy nodes will use the multi-process,
single-threaded, coroutine-based HTTP server.
Depending on whether they are in the ``us-east`` or ``eu-west`` region, the
Proxy nodes will respectively respond to requests on either
``us-1.live.espra.com`` or ``eu-1.live.espra.com``.
For scalability, the Proxy nodes will sit behind Auto Scaling enabled Elastic
Load Balancers (ELB) in both regions.
Fileserver Nodes
================
:Protocols: HTTP, HTTPS
:ELB: Yes
:LocalPorts: 8080, 8443
:RemotePorts: 80, 443
Fileserver nodes handle static files in a variety of ways. Initially, four
handlers will be specified:

1. The ``AppFilesHandler`` will serve the main Espra app assets (JavaScript,
   CSS, images, etc.) from memory. These in-memory caches will be invalidated
   when a new app build is pushed out by the Seed node.

2. The ``S3FilesHandler`` will look for the requested file in a local disk cache
   and query S3 for the source file only if it is not found there. If S3 has
   the file, it will be cached locally and (if appropriate) uncompressed before
   being served as a response. If S3 does not have it, the handler will register
   for an update from the Live nodes and store the file key in an in-memory
   cache so as to minimise unnecessary S3 requests.

3. The ``VhostFilesHandler`` will query the Main Datastore for the storage
   reference for the file key and Host combination and then use the
   ``S3FilesHandler`` to do the actual serving. Storage references, whether
   found or not found, will be saved in local caches and invalidated by
   updates registered with the Live nodes.

4. The ``UploadHandler`` will first validate the upload token sent with a POST
   request. If the token is valid according to the Main Datastore, the handler
   will start saving the uploaded file locally. As the upload progresses, a
   combined (sha256+whirlpool) hash will be computed and the Live nodes notified
   so that upload progress can be relayed back to the uploader. Once the upload
   is complete, the handler will compress the file if it is compressible, before
   updating the Main Datastore and uploading it to S3.
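
The lookup order described for the ``S3FilesHandler`` might look something like
the sketch below. It is a simplified illustration under several assumptions:
the class name ``S3FilesLookup``, the flat cache file layout and the
``fetch_from_origin`` callable (standing in for an S3 request) are all
hypothetical, and decompression and the registration with the Live nodes are
left out.

.. code-block:: python

    import os
    from typing import Callable, Optional, Set


    class S3FilesLookup:
        """Local disk cache in front of a slower origin, with a negative cache."""

        def __init__(self, cache_dir: str,
                     fetch_from_origin: Callable[[str], Optional[bytes]]) -> None:
            self.cache_dir = cache_dir
            self.fetch_from_origin = fetch_from_origin  # stand-in for an S3 GET
            self.missing: Set[str] = set()  # keys known to be absent from S3

        def _path(self, key: str) -> str:
            # Hypothetical layout: one flat file per key. Replacing "/" can
            # collide for some keys; a real handler would escape keys properly.
            return os.path.join(self.cache_dir, key.replace("/", "_"))

        def get(self, key: str) -> Optional[bytes]:
            if key in self.missing:
                return None  # known miss: skip the origin request entirely
            path = self._path(key)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return f.read()
            data = self.fetch_from_origin(key)
            if data is None:
                self.missing.add(key)
                return None
            with open(path, "wb") as f:
                f.write(data)
            return data

        def invalidate(self, key: str) -> None:
            """Called when a Live node reports that the key has changed."""
            self.missing.discard(key)
            path = self._path(key)
            if os.path.exists(path):
                os.remove(path)

Remembering misses as well as hits is what keeps repeated requests for a
non-existent file from turning into repeated S3 requests; ``invalidate`` is the
hook an update from the Live nodes would call.
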
All responses will be aggressively cached using HTTP headers, with an
expiration of at least 1 month. The nodes will use the same HTTP server as the
Proxy nodes and will similarly sit behind an Auto Scaling ELB.
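
As an illustration of the caching policy, the response headers could be built
along these lines. The helper name ``cache_headers``, the reading of "1 month"
as 30 days and the use of the ``public`` directive are assumptions made for the
sketch, not part of the design.

.. code-block:: python

    import time
    from email.utils import formatdate
    from typing import Dict, Optional

    ONE_MONTH_SECONDS = 30 * 24 * 60 * 60  # assumption: "1 month" means 30 days


    def cache_headers(max_age: int = ONE_MONTH_SECONDS,
                      now: Optional[float] = None) -> Dict[str, str]:
        """Build caching headers, never expiring sooner than one month."""
        max_age = max(max_age, ONE_MONTH_SECONDS)
        if now is None:
            now = time.time()
        return {
            "Cache-Control": "public, max-age=%d" % max_age,
            # HTTP dates are always expressed in GMT.
            "Expires": formatdate(now + max_age, usegmt=True),
        }

A shorter ``max_age`` is silently raised to the one month floor, so a handler
cannot accidentally serve a response with a brief expiration.
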
However, since latency isn't too critical an issue with file serving, the
Fileserver nodes and the S3 storage will only exist in the ``us-east`` region
and will respond to requests on ``*.espfile.com`` or appropriately CNAME'd
hosts.
App Nodes
=========
:Protocols: HTTP, HTTPS
:ELB: Yes
:LocalPorts: 8080, 8443
:RemotePorts: 80, 443
The App nodes are more CPU-bound and will therefore use a slightly different
server to the other nodes: a multi-process, multi-threaded HTTP server.
Main Datastore
==============
:Protocols: HTTP, HTTPS
:RemotePorts: 80, 443
The Main Datastore application will run on the ``espra`` App Engine
application. This is where all the structured data will be stored and we rely on
App Engine to provide a query-able and

It will be accessed only via SSL and with a token, and will have two minimal
handlers:

* Provide access to App Engine's Remote API for access by the various nodes.

Mail
Image
Datastore
Taskqueue
-------------
Remote Access
-------------
--------------
Load Balancing
--------------
Multi process
Queue
SSL
Update
Accounting/Quota
Planned Maintenance
GAE
* (Buildbot)
* Urlfetch
* Memcache
Thus will get load balanced by the kernel.
Worker
Queue
cpus -- core
I/O bound
Understudy
node roles
Failure and crashes
DNS
===
DNS for the various Espra domains will be delegated to the DNS services provided
by Linode and Slicehost.
This should be sufficient protection for now in case of extreme failure at
either provider. Both providers also offer relatively decent APIs which can be
used to update the zone records.
Support Services
================
A number of support services will be running on the Linode and Slicehost VPS
servers:

* Since the ``espra.com`` zone apex cannot be CNAME'd onto the ELB, it will
  instead round-robin to Apache instances at the various VPS servers. Apache
  will then redirect the request to the ``www.espra.com`` host on EC2.

* Off-site monitoring apps will test the responsiveness of the various node
  services on EC2 as well as the App Engine applications. The data will be
  logged locally and if any service stops responding, a priority SMS will be
  sent to the Admins.
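
A single monitoring pass could be structured roughly as below. This is a sketch
only: ``check_services``, the probe callables and ``send_sms`` are hypothetical
placeholders, with a plain list standing in for the local log and no real HTTP
or SMS calls being made.

.. code-block:: python

    from typing import Callable, Dict, List


    def check_services(probes: Dict[str, Callable[[], bool]],
                       send_sms: Callable[[str], None],
                       log: List[str]) -> List[str]:
        """Run every probe once, log each result and alert on any failure."""
        failed = []
        for name, probe in sorted(probes.items()):
            try:
                ok = probe()
            except Exception:
                # A probe that raises is treated the same as no response.
                ok = False
            log.append("%s %s" % (name, "ok" if ok else "FAIL"))
            if not ok:
                failed.append(name)
        if failed:
            # One message per pass, so several failures do not flood the Admins.
            send_sms("Not responding: " + ", ".join(failed))
        return failed

Passing the probes and the SMS sender in as callables keeps the check itself
independent of how each service is actually contacted.
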



