@tsabat

tsabat/docker.md (Secret)

Last active August 29, 2015 14:17

Background

CodePen allows users to write HTML/CSS/JS in their browsers, using an editor that looks like this:

http://codepen.io/pen

We also preprocess Haml/Sass/Jade/Stylus and others for our users. An example is something like this:

http://codepen.io/anon/pen/OPBpMj

Explained here:

http://d.pr/i/xsua/3w9DFvSa

Problem

Preprocessors can be insecure: we're running untrusted code on our servers. People have reported, and we've fixed, remote code execution exploits since we started offering this service. We've done all we can to prevent this via regex stripping of harmful code (things like Kernel and File are regex'd out), but it is an arms race. A dedicated attacker will eventually break us.
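
For a sense of what that regex jail looks like, here is an illustrative sketch; the pattern is invented for this example, not our production list:

# Illustrative only: a denylist of the kind described above.
DANGEROUS = /\b(Kernel|File|IO|Open3|Process|system|exec|eval|require)\b/

def strip_dangerous(source)
  source.gsub(DANGEROUS, '')
end

strip_dangerous("- Kernel.exec('ls /')") # => "- .('ls /')"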

Proposed Solution

You can process haml at the command line like this:

echo '%p hii' > /tmp/thing.haml

/bin/haml /tmp/thing.haml

which produces

<p>hii</p>

We want to start using Docker containers to run short-lived preprocessing sessions: haml at first, and eventually all of our preprocessors.

I've got a proof-of-concept solution that does the following:

  1. Accepts a web request for preprocessing
  2. Calls out to the Docker daemon like this:
require 'securerandom'

class HamlDocker
  def self.call(markup)
    uuid = SecureRandom.uuid

    # Write the user's markup where the container can see it.
    File.write("/tmp/#{uuid}", markup)

    # Mount the host's /tmp into the container as /temp.
    dir_args   = '-v /tmp:/temp'
    haml_call  = "\"haml /temp/#{uuid}\""
    docker_img = Settings.docker.haml_image # configured image name

    # Run haml as the unprivileged 'runner' user in a fresh container.
    cmd = "docker run #{dir_args} #{docker_img} su runner -c #{haml_call}"

    rslt = `#{cmd}`

    File.delete("/tmp/#{uuid}")

    rslt
  end
end
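
For a concrete sense of what gets executed, the assembled command looks like this (the UUID placeholder and image name are invented):

docker run -v /tmp:/temp codepen/haml su runner -c "haml /temp/<uuid>"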

We've found this to take an acceptable 400ms to start, preprocess, and return.

I'm also aware that I'll need to spawn an async docker rm call to remove the spent container started above.
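
A minimal sketch of that async cleanup, assuming the container ID has been captured (e.g. via docker run's --cidfile flag, which the code above does not yet use):

# Remove the spent container without blocking the web response.
def reap_container(container_id)
  pid = Process.spawn('docker', 'rm', container_id,
                      out: '/dev/null', err: '/dev/null')
  Process.detach(pid) # reap in the background; we never wait on the result
end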

Questions

Basically, what don't we know about stopping/starting thousands of containers a day? Spread across 3 boxes, we get about 1000 requests per minute to the preprocessor service, but most of those calls are served from cache. I'd say the cache hit/miss ratio is 10:1, so we're looking at about 100 containers per minute, each with a maximum run time of 3 seconds, after which the call is killed to prevent infinite loops. We know how to handle scaling the infrastructure with AWS, but I don't know much about Docker.

  1. Can the Docker daemon handle this type of abuse?
  2. Are there more crufty things left around besides the containers? For example, is there some log I need to be trimming as well?
  3. Any words of caution you can provide?

Problems

We've implemented the solution above, but it falls down under load. The Docker daemon seems incapable of doing more than 1.4 docker run calls per second. What's worse, container destruction takes even longer, about a second, regardless of the container type. We've tried the following to speed things up, with no luck:

  • run the service on a ramdisk to avoid disk contention: no dice
  • build a tiny image: we used Alpine Linux, which after installing Ruby weighs only 33 MB

The problem that seems to be killing us is the startup time for Ruby. A call to the haml executable (haml /tmp/hi) on an SSD-backed MacBook Pro takes 350ms. In contrast, the same haml call against a web server that already has the haml gem loaded takes 15ms. So this may not really be a Docker problem at all, but a gem-loading problem.
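
To make the comparison concrete, a rough way to see it (the commands are illustrative; the numbers are the ones quoted above):

# Cold path: each call boots ruby and loads the haml gem (~350ms):
time haml /tmp/hi

# Warm path: once the gem is loaded, the render itself is cheap (~15ms):
ruby -rhaml -e 't = Time.now; Haml::Engine.new("%p hii").render; puts Time.now - t'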

Alternate Solution

Run a tiny web server in a Docker container that does nothing but preprocess haml. Here's the code:

require 'sinatra/base'
require 'haml'
require 'json'

class SinatraParser < Sinatra::Base
  post '/haml' do
    content_type :json
    begin
      # Render the submitted markup, or an empty string if none was sent.
      haml_engine = Haml::Engine.new(params[:markup] || '')
      { success: true, html: haml_engine.render }.to_json
    rescue Haml::Error => e # Haml::SyntaxError inherits from Haml::Error
      { success: false, error: e.message, line: e.line }.to_json
    end
  end
end
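
Called like so (Sinatra's default port is 4567; the markup field name comes from the code above):

curl -d 'markup=%p hii' http://localhost:4567/haml
# => {"success":true,"html":"<p>hii</p>\n"}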

Our preprocessor service calls out to this server, asking it to process the haml. The haml server itself runs in a crippled environment: read-only filesystem, no networking, etc. So if someone broke out of the regex jail, the attack surface is very small. This solution is not as "pure" as the one where each call happens in an individual Docker container, but it is a step forward.
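
A sketch of that kind of lockdown; the flags, port, and image name here are assumptions for illustration, not our production invocation:

# --read-only:    read-only root filesystem
# --cap-drop=ALL: drop all Linux capabilities
# -u runner:      run as an unprivileged user
# -p 127.0.0.1:4567:4567: the server is reachable only from the host
docker run -d --read-only --cap-drop=ALL -u runner \
  -p 127.0.0.1:4567:4567 haml-server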

@tswicegood

One quick note: not sure if you'd end up eating the rm cost anyway, but --rm on your run will remove the container after it's killed, too. Never timed it to see what the difference is there. Doing an async call might be better for you.


There's a third option that's kind of a hybrid of the two solutions you have here. You could spin up a bunch of idle worker containers that wait for a file to appear, an HTTP request, or some such. Those would perform the action, then die. With a routing layer in front, you could have it manage starting new containers as the old ones are consumed. If you keep enough of a buffer, this should avoid issues with starting too many.

A quick hack to keep them running would be something like supervisord with processes that stay running until they've finished their one job. As soon as a process finishes and stops, supervisord detects the stopped process and restarts it, giving you a fresh worker; a config along these lines is sketched below.
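
A hypothetical supervisord stanza for that poor man's pool (program name, image, and pool size are all invented):

; Each worker container handles one job and exits; autorestart respawns
; a fresh one, keeping the pool warm.
[program:haml_worker]
command=docker run --rm codepen/haml-worker
process_name=%(program_name)s_%(process_num)02d
numprocs=4
autorestart=true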

@tsabat
Author

tsabat commented Mar 19, 2015

@tswicegood these are great suggestions.

  • Regarding --rm: the teardown happens before standard out is returned. Since docker rm takes > 1 second, it adds more overhead than we can handle.
  • We've considered the 'warm pool' idea, but it is complicated. I'd love a simple solution before we start down that road. We're also still stuck with the slow teardown problem.

@SamSaffron

I think you would be pushing it to spin up and down an enormous number of containers; there is a lot of setup and teardown, and spinning up Ruby is super expensive.

I think the Sinatra parser solution is good. If you want some extra protection you could "fork" on every request, do the rendering in the child, and communicate back to the master; it's possible you may be able to de-elevate the forked child even more after forking.

Forking also gives you the advantage that you can set up a very clean timeout, because you would just kill -9 the child if it takes too long, leaving your master environment pristine.
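
A minimal sketch of that fork-and-kill pattern (the pipe protocol and the 3-second budget are assumptions drawn from the numbers earlier in the thread):

require 'timeout'
require 'haml'

# Render in a forked child; kill -9 the child on timeout so the master
# process stays pristine, as described above.
def render_in_child(markup, timeout_secs = 3)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # (The child could drop privileges further here before rendering.)
    writer.write(Haml::Engine.new(markup).render)
    writer.close
    exit!(0)
  end
  writer.close
  begin
    output = nil
    Timeout.timeout(timeout_secs) do
      output = reader.read # returns at EOF, i.e. when the child finishes
      Process.wait(pid)
    end
    output
  rescue Timeout::Error
    Process.kill('KILL', pid) # the clean kill -9 timeout
    Process.wait(pid)         # reap the zombie
    nil
  ensure
    reader.close
  end
end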

@tswicegood

Definitely more complicated. The poor man's version via supervisord might buy you enough, but it definitely won't handle burst traffic where you overload the number of spare workers you have. :-/

@tsabat
Author

tsabat commented Mar 19, 2015

@SamSaffron great idea! I'm currently using the forking solution, but did not want to complicate the example.

@deedubs

deedubs commented Mar 19, 2015

Have you tried going lower level than Docker, using lxc or rkt directly?

@rheinwein

Re: stale containers, you could also run docker rm $(docker ps -aq -f status=exited) at some interval (assuming you're not using --rm in the run string). -a is for all containers, -q for only numeric IDs, and -f filters based on the key/value provided.
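
For instance, a hypothetical crontab entry running that sweep every five minutes (interval and log path invented):

*/5 * * * * docker rm $(docker ps -aq -f status=exited) >> /var/log/container-sweep.log 2>&1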

@domdavis

My first thought here (and admittedly I've skim-read and I'm tired) is that you could use AWS Lambda here to run the code.

Your alternative solution is also quite a good one. If someone breaks out of the preprocessor, it doesn't matter. The servers are also stateless, so you could sit them behind a load balancer and run as many as needed.

@tsabat
Author

tsabat commented Mar 19, 2015

@deedubs I'm a systems guy, but not a neckbeard! 😝 haha. I fear going my own way. Docker is a community solution, and exploits are closed by the community. My home-grown lxc solution would not be peer reviewed, whereas Docker is.

@nathanleclaire

Cool use case.

Couple of comments:

  • Whichever solution you end up using, make sure to create a non-root user and run all of the code in the containers as this lower-privileged user, via the USER directive in your Dockerfile (see the sketch after this list)
  • I like the Sinatra solution, at least to get a POC together. Spinning up as many containers as you want at once will be difficult without spreading the load over a pool of servers (or idle containers) as mentioned.
  • I highly recommend using AUFS for the graph driver - it's generally the most stable and fastest right now
  • Keep an eye on disk space and make sure you always clean up containers and images that you don't need anymore. For a variety of reasons, the daemon has historically tended to chew up disk over time, and depending on your setup you might need to periodically blow away /var/lib/docker, power-cycle the daemon, etc. And if any of your containers use volumes (not the host-mounted kind), make sure to docker rm with the -v flag
  • Theoretically, if you are removing all of the containers after you run them, you won't have to worry about rotating logs and so on (although you will if you have a long-lived server as in the second example), but keep an eye on it anyway.
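
A minimal sketch of that USER setup for the Alpine image mentioned earlier (the user and group names are assumptions):

# Tail of a hypothetical Dockerfile: create an unprivileged user
# (Alpine/busybox adduser syntax) and switch to it.
RUN addgroup runner && adduser -S -G runner runner
USER runner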

Hail Marys:

  • You could try keeping the gems that you want to load in a volume to take advantage of "normal" disk RW speeds, but this is kind of hacky and bad for a variety of reasons
  • I always wonder if CRIU could help people with situations like this, but CRIU+Docker support is like pre-pre-pre-alpha.

@tsabat
Author

tsabat commented Mar 19, 2015

@rheinwein the docker rm $(docker ps -aq -f status=exited) on cron is a good idea, but it is serial. A listener on a Redis pub/sub would allow me to do it in parallel.
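
A sketch of that reaper; strictly, a Redis list consumed with BLPOP distributes work across parallel reapers (pub/sub would broadcast each ID to every listener), and the queue name here is hypothetical:

require 'redis'

# Run several of these worker processes to remove containers in parallel.
redis = Redis.new
loop do
  _queue, container_id = redis.blpop('containers:spent')
  system('docker', 'rm', container_id)
end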

@mustafaakin

I made a web tool for grading programming assignments (http://blog.docker.com/2014/04/docker-in-education-interview/). Not only do I run untrusted code, I run student code, which is far more dangerous 😳 I have almost 1.5 years of experience with this:

  • Container creation/deletion highly depends on the storage backend. I use AUFS on a Samsung EVO SSD, and it is very fast. To make it faster, you can consider placing /var/lib/docker under tmpfs
  • Don't run direct commands; use the HTTP API, it is a little faster and more reliable.
  • Disable TLS, and protect your Docker daemon with your own measures, maybe by restricting access to a single request IP.
  • Disable network stack creation
  • You do not need to delete every container right after execution. Instead, you can monitor each container's disk usage via its /sys/fs/cgroup/blkio metrics every 0.1 sec, delete the heavy ones instantly, and the others when there is less load.
  • Instead of deleting a container via Docker, you can delete it directly under the /var/lib/docker folder; the reason Docker takes a long time to delete is that it checks every container for possible dependencies (not exactly sure). I hit a bug when Docker was version 0.7: I had over 4000 containers, creating a new one took minutes, and each time a student tried to create a container they kept retrying and crashed my system; I could not even ssh 😄 I went to the university at 4 am for a physical reboot. But here is the catch: you must be sure there are no containers originating from the container you created to run your HAML code, otherwise deleting it manually could crash things. In your case you would be fine, since you are only creating one independent container from the base haml image each time.
  • Use as much memory as you can; with these small containers they will probably not even hit the disk if enough memory is provided.
  • Make sure you set proper ulimits when creating a container; code in a container can still fork-bomb your host. I usually create a script that first sets some ulimits (for open files, max processes, file size, etc.) and then runs the code I want, because in the Operating Systems course students love to fork-bomb me unintentionally. A sketch of such a wrapper follows this list.
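
A sketch of such a wrapper (the specific limits are illustrative, not taken from a real setup):

#!/bin/bash
# Set resource limits, then exec the real command under them.
ulimit -u 64      # max user processes: blunts fork bombs
ulimit -n 128     # max open file descriptors
ulimit -f 10240   # max file size, in 512-byte blocks
exec "$@"

Invoked as, say, limits.sh haml /temp/somefile, so everything after the script name runs under the limits.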

I hope this helps someone 😄

@dougborg

You may want to take a look at @rgbkrk's solution for instant, temporary ipython notebooks: https://lambdaops.com/ops-lessons-and-instant-temporary-ipython-jupyter-notebooks/. It seems like a similar use case.

@athoune

athoune commented Mar 20, 2015

AppArmor is a clean and simple solution for sandboxing, without trusting the language, your code, or foreign code.

@whitmo

whitmo commented Mar 21, 2015

Not sure my input is of any great value (lots of good stuff above, especially the temp IPython notebooks and cleaning up containers). I personally like option B: running a daemon offers a few nice advantages:

  • lower latency
  • easier to monitor (important for security)
  • gives you a way to inject handling and monitoring on ingress and egress
  • less container detritus management

In general:

  • apparmor is great for locking stuff down on the container host and inside the container
  • check out heka for getting logs out of docker (super easy operationally and very flexible)
  • containers are not a replacement for the isolation of VMs, so consider scripting to periodically trash and recycle your container hosts.
  • I assume you would plan to trash and recycle your containers if you go the daemon route
  • dunno if a Go haml implementation would serve your purposes, but having a single binary could let you crank your attack surface way down.
  • take a look at https://github.com/google/cadvisor for keeping track of what's happening with running containers (to keep an eye out for funny biz)
