tav (owner)

============================
Espra Protoplex Architecture
============================
 
-----
Nodes
-----
 
Espra Protoplex is designed specifically for deployment on EC2, S3 and App
Engine.
 
Seven types of node will run on EC2:
 
* Proxy
* Fileserver
* App
* Mail
* Live
* Seed
* Admin
 
These will be complemented by 2 App Engine applications:
 
* Espra
* EspraLog
 
 
Node Structure
==============
 
On startup all nodes establish a connection to the Seed node.
 
::
 
   +----------------+
   | Internet Horde |
   +----------------+
         |                         +-------------+
         |        +-----------+    | Other Nodes |
         ±        | Seed Node |    +-------------+
         |        +-----------+           |
         |              | \               |
     +-------------+    |  \              |
     | Public Port |    |   +----------------------------------+
     +-------------+    |   | Meta Port (Internal Access Only) |
            \           |   +----------------------------------+
             \          |       /
              \         |      /
   +===========\========|=====/=================================+
   |            \       |    /                                  |
   |           +----------------------+                         |
   |           | Node: Parent Process |                         |
   |           +----------------------+                         |
   |                             |                              |
   |                             |                              |
   |     +-----------------------+-----------------------+      |
   |     |                       |                       |      |
   | +---------------+           |           +---------------+  |
   | | Child Process |   +---------------+   | Child Process |  |
   | +---------------+   | Child Process |   +---------------+  |
   |                     +---------------+                      |
   |                                                            |
   +============================================================+
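
The parent/child layout in the diagram above could be realised along the
following lines. This is only a minimal sketch, assuming a Unix host and
assuming that the child processes share the parent's listening socket; the
names ``start_node``, ``serve_forever`` and ``stop_node`` are hypothetical and
not part of any existing Espra code.

```python
import os
import signal
import socket


def serve_forever(listener):
    """Child loop: accept on the inherited socket and reply with our PID."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.sendall(str(os.getpid()).encode())


def start_node(num_children, host="127.0.0.1", port=0):
    """Parent: bind the public port once, then fork the child processes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)
    children = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: every child blocks in accept() on the same socket, so
            # the kernel decides which child receives each connection.
            try:
                serve_forever(listener)
            finally:
                os._exit(0)
        children.append(pid)
    return listener, children


def stop_node(listener, children):
    """Parent: terminate and reap the children, then release the port."""
    for pid in children:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    listener.close()
```

Binding in the parent keeps port ownership in one place, and the children need
no coordination among themselves to share incoming connections.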
 
 
Proxy Nodes
===========
 
:Protocols: HTTP, HTTPS
:ELB: Yes
:LocalPorts: 8080, 8443
:RemotePorts: 80, 443
 
Proxy nodes are intelligent proxies to the Live nodes. They:
 
* Parse only as much of each request as the predefined handlers require.
 
* Query the Seed node to find out which particular Live node they should be
  relaying the request to.
 
* Stop any further processing and simply proxy to and from the target Live node.
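
The three steps above can be sketched roughly as follows. This is an
illustration only: ``parse_request_head`` and ``choose_live_node`` are
hypothetical names, and the ``lookup`` callable stands in for the Seed node
query, whose actual protocol is not specified here.

```python
def parse_request_head(raw):
    """Return (method, path, host) from the first bytes of an HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    return method, path, host


def choose_live_node(raw, lookup):
    """Pick the target Live node for a request.

    `lookup` is an assumed stand-in for the Seed node query: it takes
    (host, path) and returns the (address, port) of a Live node.
    """
    _method, path, host = parse_request_head(raw)
    return lookup(host, path)
```

Once the target is known, the proxy would stop parsing and simply copy bytes in
both directions between the client and that Live node.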
 
To sustain high throughput, Proxy nodes will use a multi-process,
single-threaded, coroutine-based HTTP server.
 
Depending on whether they are in the ``us-east`` or ``eu-west`` region, the
Proxy nodes will respectively respond to requests on either
``us-1.live.espra.com`` or ``eu-1.live.espra.com``.
 
For scalability, the Proxy nodes will sit behind Auto Scaling enabled Elastic
Load Balancers (ELB) in both regions.
 
 
Fileserver Nodes
================
 
:Protocols: HTTP, HTTPS
:ELB: Yes
:LocalPorts: 8080, 8443
:RemotePorts: 80, 443
 
Fileserver nodes handle static files in a variety of ways. Initially, four
handlers will be defined:
 
1. The ``AppFilesHandler`` will serve the main Espra app related assets from
   memory, e.g. JavaScript, CSS, images, etc. These in-memory caches will be
   invalidated when a new app build is pushed out by the Seed node.
 
2. The ``S3FilesHandler`` will look for the requested file in a local disk cache
   and, on a miss, query S3 for the source file. If S3 has the file, it will be
   cached locally and (if appropriate) uncompressed before being served as a
   response. If S3 does not have it, the handler will register for an update
   from the Live nodes and store the file key in an in-memory cache so as to
   minimise unnecessary S3 requests.
 
3. The ``VhostFilesHandler`` will query the Main Datastore for the storage
   reference matching the file key and Host combination and then use the
   ``S3FilesHandler`` to do the actual serving. Storage references, whether
   found or not found, will be saved in local caches and invalidated via
   registrations with the Live nodes.
 
4. The ``UploadHandler`` will first validate the upload token sent with a POST
   request. If the token is valid according to the Main Datastore, the handler
   will start saving the uploaded file locally. As the upload progresses, a
   combined (sha256+whirlpool) hash will be computed and the Live nodes notified
   so that upload progress can be relayed back to the uploader. Once the upload
   is complete, the handler will compress the file if it's compressible, before
   updating the Main Datastore and uploading it to S3.
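
The lookup order described for the ``S3FilesHandler`` can be sketched as below.
This is a rough illustration under stated assumptions, not the actual handler:
a dict stands in for the local disk cache, the ``fetch_from_s3`` and
``subscribe`` callables are hypothetical stand-ins for the S3 request and the
registration with the Live nodes, and decompression is left out.

```python
class S3FilesSketch:
    """Disk cache first, then S3, with an in-memory record of missing keys."""

    def __init__(self, fetch_from_s3, subscribe):
        self.fetch_from_s3 = fetch_from_s3  # assumed: key -> bytes, or None
        self.subscribe = subscribe          # assumed: register for updates
        self.disk_cache = {}                # stands in for the local disk cache
        self.missing = set()                # keys known to be absent from S3

    def get(self, key):
        if key in self.disk_cache:
            return self.disk_cache[key]
        if key in self.missing:
            # Known miss: answer without another S3 request.
            return None
        data = self.fetch_from_s3(key)
        if data is None:
            self.missing.add(key)
            self.subscribe(key)
            return None
        self.disk_cache[key] = data
        return data

    def on_update(self, key):
        """A Live node reported a change for `key`: forget what we cached."""
        self.missing.discard(key)
        self.disk_cache.pop(key, None)
```

Remembering misses in memory is what prevents repeated requests for a
nonexistent file from each turning into an S3 round trip.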
 
All responses will be aggressively cached via HTTP headers, with an expiration
of at least 1 month. The nodes will use the same HTTP server as the Proxy nodes
and will similarly sit behind an Auto Scaling ELB.
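
As an illustration of the caching policy, the response headers might be built
as follows. This is a sketch only: it assumes "1 month" means 30 days, and the
``cache_headers`` helper and the choice of ``Cache-Control`` plus ``Expires``
are assumptions rather than a settled design.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: "1 month" is taken to mean 30 days.
MIN_EXPIRY = timedelta(days=30)


def cache_headers(now=None, expiry=MIN_EXPIRY):
    """Build caching headers, never expiring sooner than MIN_EXPIRY."""
    expiry = max(expiry, MIN_EXPIRY)
    now = now or datetime.now(timezone.utc)
    return {
        "Cache-Control": "public, max-age=%d" % int(expiry.total_seconds()),
        "Expires": format_datetime(now + expiry, usegmt=True),
    }
```

For example, a response generated at noon UTC on 8 November 2009 would carry
``max-age=2592000`` and an ``Expires`` value of
``Tue, 08 Dec 2009 12:00:00 GMT``.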
 
However, since latency isn't too critical an issue with file serving, the
Fileserver nodes and the S3 storage will only exist in the ``us-east`` region
and will respond to requests on ``*.espfile.com`` or appropriately CNAME'd
hosts.
 
 
App Nodes
=========
 
:Protocols: HTTP, HTTPS
:ELB: Yes
:LocalPorts: 8080, 8443
:RemotePorts: 80, 443
 
The App nodes are more CPU-bound and therefore will use a slightly different
server to the other nodes: a multi-process multi-threaded HTTP server.
 
 
Main Datastore
==============
 
:Protocols: HTTP, HTTPS
:RemotePorts: 80, 443
 
The Main Datastore application will run on the ``espra`` App Engine
application. It will be accessible only over SSL with a token, and will have
two minimal handlers:
 
This is where all the structured data will be stored and we rely on
App Engine to provide a query-able and
 
 
* Provide access to App Engine's Remote API for access by the various nodes.
 
Mail
 
Image
 
Datastore
 
Taskqueue
 
-------------
Remote Access
-------------
 
 
 
--------------
Load Balancing
--------------
 
 
Multi process
 
Queue
 
SSL
 
Update
 
Accounting/Quota
 
Planned Maintenance
GAE
 
* (Buildbot)
* Urlfetch
* Memcache
 
Thus will get load balanced by the kernel.
 
Worker
Queue
 
cpus -- core
 
I/O bound
 
Understudy
 
node roles
 
 
Failure and crashes
 
 
 
 
DNS
===
 
DNS for the various Espra domains will be delegated to the DNS services provided
by Linode and Slicehost.
 
This should be sufficient protection for now in case of extreme failure at
either provider. Both providers also offer relatively decent APIs which can be
used to update the zone records.
 
 
Support Services
================
 
A number of support services will be running on the Linode and Slicehost VPS
servers:
 
* Since the ``espra.com`` zone apex cannot be CNAME'd onto the ELB, it will
  instead round-robin to Apache instances at the various VPS servers. Apache
  will then redirect the request to the ``www.espra.com`` host on EC2.
 
* Off-site monitoring apps will test the responsiveness of the various node
  services on EC2 as well as the App Engine applications. The data will be
  logged locally and if any service stops responding, a priority SMS will be
  sent to the Admins.
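
One pass of such a monitoring app could look roughly like the sketch below. The
function names are hypothetical, and the ``notify`` and ``log`` callables are
stand-ins: the SMS gateway and the log format are not specified here.

```python
import time
import urllib.request


def http_probe(url, timeout=10):
    """Return True if the URL answers with a 2xx or 3xx status."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return 200 <= resp.status < 400


def check_services(services, probe, notify, log):
    """Probe each service once, log every result and alert on failures.

    `services` maps a service name to its URL. `probe`, `notify` and `log`
    are injected so that the HTTP check, the SMS gateway and the local log
    can each be swapped out.
    """
    failed = []
    for name, url in services.items():
        started = time.monotonic()
        try:
            ok = bool(probe(url))
        except Exception:
            # Timeouts, refused connections and HTTP errors all count as down.
            ok = False
        log(name, ok, time.monotonic() - started)
        if not ok:
            failed.append(name)
            notify("PRIORITY: %s is not responding" % name)
    return failed
```

Running ``check_services`` on a timer from each VPS, with ``http_probe`` as the
probe, would give the off-site view of the EC2 and App Engine services.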