@craigfurman
Last active August 30, 2017 14:41
Thoughts on how we use unix sockets in garden-runc.

General Notes

  1. The unix socket is executable. Let's change this.
  2. All users have read+write permissions on the socket. This has been the case for a long time. I deployed grr 1.0.0, and as expected we don't chmod the socket in our bosh release there. As far as I can see we never have.
  3. Its default path is /var/vcap/data/garden/garden.sock. This default is unchanged in typical CF deployments. The ancestor dirs in that default path are all readable and executable by all users (mode 755). This is true even when using a umask-hardened stemcell, as we chmod /var/vcap/data/garden to 755 in order to keep Cloud Foundry deployments working on these stemcells.
  4. The socket is owned by whoever gdn runs as: currently either root or maximus (container-root).
  5. Concourse uses TCP rather than a unix socket, so the focus here is on Cloud Foundry deployments.
  6. In CF deployments the diego rep runs as vcap.
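
The notes above can be checked on a live VM with a small audit tool. The following is a minimal sketch, not part of garden-runc: the helper names (worldWritable, worldSearchable, ancestors) are made up for illustration. It relies on the fact that, on Linux, connecting to a unix socket requires write permission on the socket file and search permission on every ancestor directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// worldWritable reports whether users outside the owner and group may write
// to a file with this mode. On Linux, connecting to a unix socket requires
// write permission on the socket file.
func worldWritable(mode os.FileMode) bool {
	return mode.Perm()&0002 != 0
}

// worldSearchable reports whether users outside the owner and group may
// traverse a directory with this mode.
func worldSearchable(mode os.FileMode) bool {
	return mode.Perm()&0001 != 0
}

// ancestors lists every parent directory of path, nearest first.
func ancestors(path string) []string {
	var dirs []string
	for dir := filepath.Dir(path); ; dir = filepath.Dir(dir) {
		dirs = append(dirs, dir)
		if dir == filepath.Dir(dir) {
			return dirs
		}
	}
}

func main() {
	socket := "/var/vcap/data/garden/garden.sock" // default path from the notes above
	if len(os.Args) > 1 {
		socket = os.Args[1]
	}
	for _, dir := range ancestors(socket) {
		info, err := os.Stat(dir)
		if err != nil {
			fmt.Printf("%-28s %v\n", dir, err)
			continue
		}
		fmt.Printf("%-28s %04o world-searchable=%t\n",
			dir, uint32(info.Mode().Perm()), worldSearchable(info.Mode()))
	}
	info, err := os.Stat(socket)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%-28s %04o world-writable=%t\n",
		socket, uint32(info.Mode().Perm()), worldWritable(info.Mode()))
}
```

If every directory prints world-searchable=true and the socket prints world-writable=true, any local user can talk to the garden API.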

Security holes that exist today

  1. Every user on a garden server VM is effectively root, because any of them can use the socket to create privileged containers.
  2. Every user on the VM can use the socket to mess with containers.
  3. In rootless deployments, there are no privileged containers. However, if a user process were to break out of its namespaces (which should be impossible), it could mess with other containers using the socket (or the filesystem).

Who should own the socket?

Traditionally, bosh jobs run as vcap or root. Rep runs as vcap. If we chowned the socket to vcap:vcap in the rootful case, and ran as vcap in the rootless case, we could reduce the mode of the socket to 600, improving security.
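
As a rough illustration of what "chown to vcap:vcap and reduce the mode to 600" could look like in Go, here is a sketch. It is not gdn's actual server code: listenRestricted, chownToUser, socketPerm and tempSocketPath are hypothetical helpers, and the sketch assumes vcap's primary group is the group we want. It tightens the umask while binding so the socket never exists with wider permissions, then sets the exact mode.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"os/user"
	"path/filepath"
	"strconv"
	"syscall"
)

// tempSocketPath returns a socket path in a fresh private directory, plus a
// cleanup function. It stands in for the real socket location.
func tempSocketPath() (string, func(), error) {
	dir, err := os.MkdirTemp("", "garden-sock")
	if err != nil {
		return "", nil, err
	}
	return filepath.Join(dir, "garden.sock"), func() { os.RemoveAll(dir) }, nil
}

// listenRestricted binds a unix socket and leaves it with exactly `mode`.
func listenRestricted(path string, mode os.FileMode) (net.Listener, error) {
	old := syscall.Umask(0177) // socket is created as 0600 at most
	l, err := net.Listen("unix", path)
	syscall.Umask(old)
	if err != nil {
		return nil, err
	}
	if err := os.Chmod(path, mode); err != nil {
		l.Close()
		return nil, err
	}
	return l, nil
}

// chownToUser hands the socket to the named user and that user's primary group.
func chownToUser(path, name string) error {
	u, err := user.Lookup(name)
	if err != nil {
		return err
	}
	uid, err := strconv.Atoi(u.Uid)
	if err != nil {
		return err
	}
	gid, err := strconv.Atoi(u.Gid)
	if err != nil {
		return err
	}
	return os.Chown(path, uid, gid)
}

// socketPerm returns the permission bits of the file at path.
func socketPerm(path string) (os.FileMode, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	path, cleanup, err := tempSocketPath()
	if err != nil {
		fmt.Println(err)
		return
	}
	defer cleanup()
	l, err := listenRestricted(path, 0600)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer l.Close()
	// Only succeeds when running as root on a machine that has a vcap user.
	if err := chownToUser(path, "vcap"); err != nil {
		fmt.Println("chown skipped:", err)
	}
	perm, err := socketPerm(path)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s has mode %04o\n", path, uint32(perm))
}
```

Passing 0660 instead of 0600 gives the group-readable variant mentioned later in these notes.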

In the rootless case, there is another point to question: should we keep mapping the gdn user to container root, or map another user? Today gdn and container root are both maximus, but we could investigate running gdn as vcap and keeping maximus as container root. This could create problems around graph ownership for rootless though.

I want to go on a bit of a tangent about bosh now before coming back to the question of who should own the socket.

Bosh has always been weird from a security point of view (IMO) in 2 ways: SSH and services.

Services are a little odd because bosh / bosh jobs usually don't create a unique user to run their services as. Usually some level of isolation is achieved on Linux by running services (nginx, postgres, etc) as users named after the service itself. In bosh, if an application server running as vcap is compromised, then the malicious user can mess with the backing database service.

As for SSH, if you have bosh ssh privilege then you are root (you enter as a freshly-created user with passwordless sudo privilege). If you ssh as vcap and have vcap's password, you are root too, as vcap is a sudoer. Exploiting a server that runs as vcap isn't enough to become root, as you will still need the password.

The conclusion of that tangent is that both gdn and rep need to read and write the socket, and the bosh-traditional way of doing that is to run both as vcap (in the rootless case at least; in the rootful case gdn can read and write the socket anyway because it has root privilege). If we do this we'll have to figure out how easy/desirable it is to map a different user than gdn's to container root.

If we want to keep gdn running as maximus in the non-root case, and also want to reduce the permissions of the socket, we could add maximus to the vcap group (or create a new group) but that would be weird from bosh's perspective.

If the operator overrides the socket path, it's their responsibility to ensure gdn can read and write it. As well as changing the filemode from 777 to 600 (or 660), we could remove our code that deletes an existing socket file on server startup so that clients can create, chown, and chmod the socket as they wish.

An alternative to ensuring gdn can read/write the socket is to open it as root in the control script and then exec gdn, passing a file descriptor instead of a path in gdn's flags. Gdn could then just use its inherited file descriptor to serve the socket while running rootless.
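
To make the fd idea concrete, here is a sketch in Go. It is an assumption-laden toy, not gdn's control script: the program re-execs itself with a made-up "serve" argument to play both roles, the helper names are hypothetical, and the privilege drop is left as a comment so the example runs unprivileged. The parent binds the socket and passes the descriptor through ExtraFiles; the child rebuilds a listener from fd 3 and never touches the path.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"os/exec"
	"path/filepath"
)

// tempSocketPath returns a socket path in a fresh private directory.
func tempSocketPath() (string, func(), error) {
	dir, err := os.MkdirTemp("", "garden-sock")
	if err != nil {
		return "", nil, err
	}
	return filepath.Join(dir, "garden.sock"), func() { os.RemoveAll(dir) }, nil
}

// socketFile binds a unix socket at path and returns the listening
// descriptor as an *os.File that can be handed to a child process.
func socketFile(path string) (*os.File, error) {
	l, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	ul := l.(*net.UnixListener)
	ul.SetUnlinkOnClose(false) // keep the socket file when this handle closes
	defer ul.Close()
	return ul.File() // a duplicate descriptor that outlives ul
}

// listenerFromFile rebuilds a net.Listener from an inherited descriptor.
func listenerFromFile(f *os.File) (net.Listener, error) {
	defer f.Close() // net.FileListener works on its own duplicate
	return net.FileListener(f)
}

// serveOnce plays the unprivileged server: it only ever sees fd 3.
func serveOnce() error {
	l, err := listenerFromFile(os.NewFile(3, "garden-socket"))
	if err != nil {
		return err
	}
	defer l.Close()
	conn, err := l.Accept()
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = io.WriteString(conn, "served over inherited fd\n")
	return err
}

// launch plays the privileged control script and then acts as a client.
func launch() error {
	path, cleanup, err := tempSocketPath()
	if err != nil {
		return err
	}
	defer cleanup()
	f, err := socketFile(path)
	if err != nil {
		return err
	}
	cmd := exec.Command(os.Args[0], "serve")
	cmd.ExtraFiles = []*os.File{f} // first entry becomes fd 3 in the child
	cmd.Stderr = os.Stderr
	// A real control script would also drop privileges here, for example
	// via cmd.SysProcAttr.Credential; omitted so the sketch runs anywhere.
	err = cmd.Start()
	f.Close() // the child holds its own copy
	if err != nil {
		return err
	}
	conn, err := net.Dial("unix", path)
	if err != nil {
		return err
	}
	defer conn.Close()
	reply, err := io.ReadAll(conn)
	if err != nil {
		return err
	}
	fmt.Print(string(reply))
	return cmd.Wait()
}

func main() {
	run := launch
	if len(os.Args) > 1 && os.Args[1] == "serve" {
		run = serveOnce
	}
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Note that clients such as rep still connect by path, so the socket file's owner and mode continue to decide who may connect; the fd only removes the need for the server user to have permissions on it.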

The file descriptor approach has the advantage of not requiring us to mess about with permissions, but is a bit "magic" and makes testing this path slightly awkward. Another idea is to deprecate sockets, but this may not be desirable (discussed in a section below).

This might all be a bit futile - every server (that could be remotely exploited) that runs on a bosh VM is root or vcap (sudoer), and every ssh session is either vcap (sudoer) or a bosh ssh user (passwordless sudoer).

Authentication / Authorization

To answer the question from acceptance "is another option to use TLS instead of file permissions on the socket": maybe, but only if you're talking about client certs for authentication. We could still use HTTPS to avoid snooping / MITM, but neither of these threats is possible over a unix socket, only over a network (AFAIK).

I do think we should introduce some sort of authentication and authorization, but TLS client certificates seem overcomplicated when you consider that a garden instance is single-user. HTTP basic auth should be sufficient.
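
For a sense of scale, basic auth in front of an HTTP API is a few lines of Go middleware. The sketch below is illustrative only and does not reflect Garden's real handler wiring: requireBasicAuth, demoHandler and statusFor are hypothetical names, the "/ping" route is used purely as a stand-in endpoint, and the credentials are placeholders. Hashes are compared in constant time so that timing does not leak the credentials or their length.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// digest hashes a credential so comparisons run over fixed-length inputs.
func digest(s string) []byte {
	sum := sha256.Sum256([]byte(s))
	return sum[:]
}

// requireBasicAuth rejects any request that lacks the expected credentials.
func requireBasicAuth(user, pass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		userOK := subtle.ConstantTimeCompare(digest(u), digest(user)) == 1
		passOK := subtle.ConstantTimeCompare(digest(p), digest(pass)) == 1
		if !ok || !userOK || !passOK {
			w.Header().Set("WWW-Authenticate", `Basic realm="garden"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoHandler wraps a trivial endpoint with placeholder credentials.
func demoHandler() http.Handler {
	ping := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "pong")
	})
	return requireBasicAuth("rep", "not-a-real-password", ping)
}

// statusFor sends one in-memory request and returns the response status.
// An empty user means "send no Authorization header at all".
func statusFor(h http.Handler, user, pass string) int {
	req := httptest.NewRequest(http.MethodGet, "/ping", nil)
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := demoHandler()
	fmt.Println(statusFor(h, "", ""))                       // 401: no credentials
	fmt.Println(statusFor(h, "rep", "wrong"))               // 401: bad password
	fmt.Println(statusFor(h, "rep", "not-a-real-password")) // 200
}
```

The same handler can be served from a unix socket or a TCP listener, since the middleware is independent of the transport.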

The "everyone is root or vcap" problem rears its head again regarding visibility of the diego rep's config file / control script, one of which must contain the plain text gdn password.

Interaction with umask-hardened stemcells

To keep Cloud Foundry deploying on umask-hardened stemcells (vcap and root have umask of 077), we chmod the default socket path's parent dir (/var/vcap/data/garden) to 755. We should be able to stop doing this if we get operators to override the socket path, or use another dir such as /var/vcap/sys/run to avoid chmodding our state dir.
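
The reason the chmod is needed at all is that a umask of 077 strips every group and other bit at creation time, whereas chmod is not subject to the umask. The following sketch demonstrates this with a throwaway directory; effectiveMode and mkdirUnderUmask are made-up helper names, and the directory only stands in for /var/vcap/data/garden.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// effectiveMode is the mode a new file or directory ends up with when it is
// created with `requested` while `umask` is in force.
func effectiveMode(requested, umask os.FileMode) os.FileMode {
	return requested.Perm() &^ umask.Perm()
}

// mkdirUnderUmask creates dir with the requested mode under the given umask
// and returns the permission bits the directory actually received.
func mkdirUnderUmask(dir string, requested os.FileMode, umask int) (os.FileMode, error) {
	old := syscall.Umask(umask)
	defer syscall.Umask(old)
	if err := os.Mkdir(dir, requested); err != nil {
		return 0, err
	}
	info, err := os.Stat(dir)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	fmt.Printf("predicted:   %04o\n", uint32(effectiveMode(0755, 0077))) // 0700

	base, err := os.MkdirTemp("", "umask-demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(base)

	dir := filepath.Join(base, "garden")
	got, err := mkdirUnderUmask(dir, 0755, 0077)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("after mkdir: %04o\n", uint32(got))

	// chmod sets the mode exactly, regardless of the umask.
	if err := os.Chmod(dir, 0755); err != nil {
		fmt.Println(err)
		return
	}
	info, err := os.Stat(dir)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("after chmod: %04o\n", uint32(info.Mode().Perm()))
}
```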

Should we use sockets at all?

I think we should apply proper authentication/authorization regardless of what else we do. If we do this, then the ownership of a socket will diminish in importance, as it isn't the only gatekeeper to security anymore, and we could conceivably just bind to a loopback IP using TCP.

However, if a process escapes its network namespace and sees the host loopback interface, it will be able to use the garden API if we use TCP. Authn/authz mitigates this somewhat.

If using unix sockets offers us a performance gain then we should keep using them in any case.
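
Whether there is a gain worth keeping is something we can measure rather than guess. The sketch below is a crude micro-benchmark, not a statement about Garden's real traffic: measure and tempSocketPath are hypothetical helpers, and it only times one-byte echo round trips over a unix socket versus loopback TCP. Real API calls are dominated by container operations, so treat any numbers as a rough upper bound on the transport difference.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"time"
)

// tempSocketPath returns a socket path in a fresh private directory.
func tempSocketPath() (string, func(), error) {
	dir, err := os.MkdirTemp("", "garden-sock")
	if err != nil {
		return "", nil, err
	}
	return filepath.Join(dir, "garden.sock"), func() { os.RemoveAll(dir) }, nil
}

// measure starts an echo server on the given network and address, then
// returns the total time a client needs for n one-byte round trips.
func measure(network, address string, n int) (time.Duration, error) {
	l, err := net.Listen(network, address)
	if err != nil {
		return 0, err
	}
	defer l.Close()

	go func() {
		conn, err := l.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		io.Copy(conn, conn) // echo bytes back until the client hangs up
	}()

	conn, err := net.Dial(network, l.Addr().String())
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	buf := make([]byte, 1)
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := conn.Write(buf); err != nil {
			return 0, err
		}
		if _, err := io.ReadFull(conn, buf); err != nil {
			return 0, err
		}
	}
	return time.Since(start), nil
}

func main() {
	const trips = 1000
	path, cleanup, err := tempSocketPath()
	if err != nil {
		fmt.Println(err)
		return
	}
	defer cleanup()

	targets := []struct{ network, address string }{
		{"unix", path},
		{"tcp", "127.0.0.1:0"}, // port 0 lets the kernel pick a free port
	}
	for _, t := range targets {
		d, err := measure(t.network, t.address, trips)
		if err != nil {
			fmt.Println(t.network, err)
			continue
		}
		fmt.Printf("%-4s %v per round trip\n", t.network, d/trips)
	}
}
```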

Conclusion

  1. Can we deprecate sockets and use TCP? Probably not, but if so skip the rest of these steps.
  2. Reduce permissions of the socket (perhaps to 600) and stop making it executable.
  3. We should put the socket in /var/vcap/sys/run by default to avoid having to chmod the garden dir. This might create upgrade difficulties.
  4. Who should own the socket? Possibly vcap for compatibility with diego rep?
  5. What user should we run rootless gdn as?
  6. Look at your answers to the previous two questions. If they are different, we should probably open the socket as root and use the fd in rootless gdn after execing to another user.
  7. We should introduce authentication/authorization to Garden's HTTP API. Basic auth will probably do.
  8. When solving the socket ownership problem, return to chmodding the garden dir to 750 rather than 755.
  9. For rootless gdn, should we map a non-gdn-server user to container root, and not map the gdn server user? In case of namespace escape, this would prevent the container process from using the socket.
  10. Think about whether or not gdn and rep should run as the same user (the bosh way) or different users (the traditional way). This probably affects your answers to all other questions here, and if you decide on the traditional way we should speak to the bosh team.