A custom statically built PID 1 will work everywhere, including the lowest-grade OpenVZ VPSes. So we can implement our cloud controller as PID 1 (i.e. as a /sbin/init or systemd replacement) and avoid the usual bloat. The system management interface will be exposed as a Tor hidden service and will be used to push container images, start containers and perform the other container management operations described in [containers.md].
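To make this concrete, here is a minimal sketch of the non-negotiable core of such a PID 1 in Haskell (the ghc-musl route discussed below). The container entry point path is hypothetical, and the real controller would run the Tor-facing management loop alongside this:

```haskell
-- Minimal PID 1 sketch: start the managed workload, then reap
-- orphans forever (the one duty PID 1 cannot delegate).
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Control.Monad (forever, void)
import System.Posix.Process
  (ProcessStatus, executeFile, forkProcess, getAnyProcessStatus)
import System.Posix.Types (ProcessID)

main :: IO ()
main = do
  -- Spawn a container entry point as a direct child (hypothetical path).
  void $ forkProcess $
    executeFile "/chroots/app1/start" False [] Nothing
  -- Every orphaned process on the system is re-parented to PID 1;
  -- wait on them so zombies never accumulate.
  forever $ do
    r <- try (getAnyProcessStatus True False)
           :: IO (Either SomeException (Maybe (ProcessID, ProcessStatus)))
    case r of
      Left _  -> threadDelay 1000000  -- no children right now; idle briefly
      Right _ -> pure ()
```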
We can remove all, or almost all, of the usual UNIX management tool set from the production system. Ideally, the system will consist of a single /sbin/init executable, a state database holding configuration data and cryptographic key material, and a set of container file systems.
The containers will run as chroots (the only level of sub-container isolation available on OpenVZ) or at higher levels of isolation where the VPS provider allows it. For example, Xen and KVM hosts readily allow Docker, and some KVM providers even allow fully nested KVM (i.e. a different kernel) inside.
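Assuming the chroot level of isolation, entering a container from the init could look roughly like the sketch below. The unix package does not wrap chroot(2), so it is bound via the FFI (which ghc-musl links statically); all paths are hypothetical:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch: exec an application inside its chroot. Meant to run in a
-- forked child, since PID 1 itself must stay outside the containers.
import Foreign.C.Error (throwErrnoIfMinus1_)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))
import System.Posix.Directory (changeWorkingDirectory)
import System.Posix.Process (executeFile)

foreign import ccall unsafe "chroot"
  c_chroot :: CString -> IO CInt

chrootTo :: FilePath -> IO ()
chrootTo path = withCString path $ throwErrnoIfMinus1_ "chroot" . c_chroot

-- Hypothetical: run node from inside the app2 chroot.
launchApp :: IO ()
launchApp = do
  chrootTo "/chroots/app2"
  changeWorkingDirectory "/"  -- drop any handle on the old root
  executeFile "/usr/bin/node" False ["/app/server.js"] Nothing
```

A production version would also drop privileges (setuid/setgid) after the chroot, since a root process can trivially escape a plain chroot.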
I consider the following minimal scenario for my application: one C++ application and two node.js applications deployed from images to an OpenVZ host.
PID 1 is the lowest-level facility under our control there, so we can either leave it in place and run under it, or replace it.
If we leave it, we are stuck with SysVinit, which is bloated and shell-based. Many OpenVZ providers run very old (albeit patched for vulnerabilities) kernels that cannot run systemd.
If we only use PID 1 plus inittab, that is not much different from shipping our own PID 1, as the sketch below shows.
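For comparison, the inittab route would reduce to something like the following (the controller path is hypothetical), after which SysVinit adds nothing beyond respawning a process we could supervise ourselves:

```
# /etc/inittab -- everything else stripped out
id:2:initdefault:
ctl:2345:respawn:/sbin/cloud-controller
```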
Note that, if we need one, we can get an almost normal shell by deploying busybox, or even busybox+dropbear, in a chroot container shipped the same way as the application containers.
We need to store:
- /sbin/init
- /chroots/app1/*
- /chroots/app2/*
- /chroots/app3/*
- Tor hidden service private key (see the torrc sketch after this list)
- A few essential parts of /proc, /dev, /sys and /etc
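If we bundle a stock tor daemon rather than speaking the protocol ourselves, the hidden service side of the management interface is just two torrc lines; the directory and ports below are hypothetical, and HiddenServiceDir is where the private key from the list above would live:

```
# torrc excerpt: publish the management interface as a hidden service
HiddenServiceDir /state/tor/mgmt
HiddenServicePort 80 127.0.0.1:9876
```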
/sbin/init can be self-contained (e.g. statically built against the syscall interface using ghc-musl). One open question is whether we can get a reasonable feature set without relying on shared .so libraries implemented in C.
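One data point: plain syscalls reduce to FFI bindings that musl links into the static binary, so at least the low-level plumbing needs no shared .so at runtime. A sketch of bringing up the essential pseudo-filesystems from the storage list above (assuming the OpenVZ kernel permits these mounts at all):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch: mount /proc, /sys and /dev directly via mount(2);
-- no shell, no mount(8), no shared libc at runtime.
import Foreign.C.Error (throwErrnoIfMinus1_)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CULong (..))
import Foreign.Ptr (Ptr, nullPtr)

foreign import ccall unsafe "mount"
  c_mount :: CString -> CString -> CString -> CULong -> Ptr () -> IO CInt

mountFs :: String -> FilePath -> String -> IO ()
mountFs src target fstype =
  withCString src    $ \s ->
  withCString target $ \t ->
  withCString fstype $ \f ->
    throwErrnoIfMinus1_ "mount" (c_mount s t f 0 nullPtr)

mountEssentials :: IO ()
mountEssentials = do
  mountFs "proc"  "/proc" "proc"
  mountFs "sysfs" "/sys"  "sysfs"
  mountFs "tmpfs" "/dev"  "tmpfs"  -- devtmpfs may be absent on old kernels
```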
A minimal node.js container seems to be around 5 MB even with a normal glibc inside the chroot, so the total disk footprint will never come anywhere near the 300+ MB of an empty CoreOS (in practice closer to 1 GB for various reasons). And Tiny Core Linux relies heavily on kernel features we lack (e.g. compressed file systems) to stay small.
Disk, CPU, memory and bandwidth usage are non-issues today. The smallest VPS you can get for $3 offers about 256 MB RAM, 5 GB HDD and 300 GB of traffic, and even the bloated OSes use about 64-96 MB of RAM, so the overhead is 25-37% on the tiniest systems, which is acceptable.
So we should focus on optimizing the metrics that actually matter instead:
- Overall system complexity and manageability
- Deployment time
- Reproducible deployment
- Hosting costs
- Elasticity
- Disaster recovery