CodePen allows users to write html/css/js in their browsers in an in-browser editor.
We'll also preprocess haml/sass/jade/stylus and others for our users. Here's an example:
http://codepen.io/anon/pen/OPBpMj
Preprocessors can be insecure: we're running untrusted code on our servers. People have reported, and we've fixed, remote code execution exploits since we started offering this service. We've done all we can to prevent this via regex stripping of harmful code (things like Kernel and File are regex'd out), but it's an arms race. A dedicated hacker will eventually break us.
You can process haml at the command line like this:
echo '%p hii' > /tmp/thing.haml
/bin/haml /tmp/thing.haml
which produces
<p>hii</p>
We want to start using Docker containers to run short-lived sessions for preprocessing haml at first, and eventually all of our preprocessors.
I've got a proof-of-concept solution that does the following:
- Accepts a web request for preprocessing
- calls out to the Docker daemon like this:
require 'securerandom'

class HamlDocker
  def self.call(markup)
    # Write the untrusted markup where the container's bind mount can see it
    uuid = SecureRandom.uuid
    File.write("/tmp/#{uuid}", markup)
    dir_args = '-v /tmp:/temp'   # host /tmp appears as /temp inside the container
    haml_call = "\"haml /temp/#{uuid}\""
    docker_img = Settings.docker.haml_image
    cmd = "docker run #{dir_args} #{docker_img} su runner -c #{haml_call}"
    `#{cmd}`  # blocks until the container exits; its stdout is our result
  ensure
    File.delete("/tmp/#{uuid}") if uuid  # clean up even if the run raises
  end
end
We've found this to take an acceptable 400ms to start, preprocess, and return.
I'm also aware that I'll need to spawn a docker rm call asynchronously to remove the spent container started above.
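Something like this fire-and-forget call is what I have in mind (a sketch; it assumes the container was started with --name set to the same uuid, which the code above doesn't do yet):
# Fire-and-forget cleanup so the web request isn't blocked on container removal
pid = Process.spawn("docker rm -f #{uuid}")  # uuid as the container name is an assumption
Process.detach(pid)  # reap in the background; we don't care about the exit status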
Basically, what don't we know about stopping/starting thousands of containers a day? Spread across 3 boxes, we get about 1000 requests per minute to the preprocessor service, but most of those calls are cached and returned. I'd say the cache hit/miss ratio is 10:1 (the cache layer is sketched after the questions below), so we're looking at about 100 containers per minute, with a maximum run time of 3 seconds, after which the call is killed to prevent infinite loops. We know how to handle scaling the infrastructure with AWS, but I don't know much about Docker.
- Can the docker daemon handle this type of abuse?
- Are there more crufty things left around besides the containers? For example, is there some log I need to be trimming as well?
- Any words of caution you can provide?
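For context, the cache in front of all this is conceptually just a digest lookup, something like this (Rails.cache and the key scheme here are assumptions, not our actual code):
require 'digest'

def preprocess_haml(markup)
  key = "haml:#{Digest::SHA1.hexdigest(markup)}"      # same markup => same key
  Rails.cache.fetch(key) { HamlDocker.call(markup) }  # only cache misses hit Docker
end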
We've implemented the solution above, but it seems to fall down under load. The docker service seems incapable of doing more than 1.4 docker run calls per second. What's worse, container destruction takes even longer, about a second, regardless of the container type. We've tried the following to speed things up, but with no luck:
- run the service on a ramdisk to avoid disk contention: no dice
- build a tiny image: we used Alpine Linux, which after the ruby install weighs only 33MB
The problem that seems to be killing us is the startup time for ruby. A call to the haml executable (haml /tmp/hi) on an SSD-backed MacBook Pro takes 350ms. In contrast, the same haml call against a webserver that already has the haml gem loaded takes 15ms. So this may not be a Docker problem at all, but a gem-loading problem.
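A rough way to see that split without Docker in the picture at all (a sketch; absolute numbers will vary by machine):
require 'benchmark'

# Cold: shell out to the haml binary, paying ruby + gem startup every time
cold = Benchmark.realtime { system('haml /tmp/thing.haml', out: File::NULL) }

# Warm: the gem is already loaded, so only the render itself is measured
require 'haml'
warm = Benchmark.realtime { Haml::Engine.new('%p hii').render }

puts "cold: #{(cold * 1000).round}ms, warm: #{(warm * 1000).round}ms"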
Our solution: run a tiny webserver in a docker container that does nothing but preprocess haml. Here's the code:
require 'sinatra'
require 'haml'
require 'json'

# Tiny single-purpose server: keeps the haml gem loaded so each request
# skips the ~350ms ruby/gem startup cost.
class SinatraParser < Sinatra::Base
  post '/haml' do
    begin
      haml_engine = Haml::Engine.new(params[:markup] || '')
      { success: true, html: haml_engine.render }.to_json
    rescue Haml::SyntaxError, Haml::Error => e
      # Report parse errors back to the caller instead of crashing the worker
      { success: false, error: e.message, line: e.line }.to_json
    end
  end
end
Our preprocessor service calls out to this server, asking it to process the haml. The haml server itself runs in a crippled environment: read-only filesystem, no networking, etc. So if someone broke out of the regex jail, the attack surface is very small. The solution is not as "pure" as the one where each call happens in an individual docker container, but it is a step forward.
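The calling side is then just a plain HTTP POST, something like this sketch (127.0.0.1:9292 is illustrative; use whatever port the container actually publishes):
require 'net/http'
require 'json'

# Host/port are assumptions, not fixed values
uri = URI('http://127.0.0.1:9292/haml')
res = Net::HTTP.post_form(uri, 'markup' => '%p hii')
JSON.parse(res.body)  # => {"success"=>true, "html"=>"<p>hii</p>\n"}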
One quick note: not sure if you'd still end up eating the rm cost, but --rm on your docker run will remove the container after it's killed, too. I've never timed it to see what the difference is there. Doing an async call might be better for you.

There's a third option that's kind of a hybrid of the two solutions you have here. You could spin up a bunch of idle worker containers that are waiting for a file to appear or an HTTP request to come in or some such. Those would perform the action, then die. With a routing layer in front of it, you could have it manage starting new containers as the old ones get consumed. If you keep enough of a buffer, this should avoid issues with starting too many.
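Sketched very roughly (start_container here is a hypothetical stand-in for however you docker run -d a worker and grab its id):
require 'thread'

class WorkerPool
  def initialize(size)
    @idle = Queue.new
    size.times { @idle << start_container }  # pre-warm the buffer; start_container is hypothetical
  end

  # Hand a warm container to the caller, then replace and reap it
  def with_worker
    id = @idle.pop  # blocks until a warm worker is free
    yield id
  ensure
    Thread.new { @idle << start_container }      # top the buffer back up
    Thread.new { system("docker rm -f #{id}") }  # the worker dies after one job
  end
end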
A quick hack to keep them running would be something like supervisord, with processes that stay running until they've finished. As soon as one finishes and stops, supervisord would detect that as a stopped process and try to restart it.
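e.g., a minimal supervisord entry along these lines (the command path and process count are illustrative):
[program:haml_worker]
; each worker exits after one job; autorestart brings a fresh one up
command=/usr/local/bin/haml_worker
autorestart=true
numprocs=5
process_name=%(program_name)s_%(process_num)02d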