@owulveryck
Last active January 10, 2024 08:24
havre / a Docker-like tool that runs an embedded SquashFS image

About

This is a proof of concept of a portable operating system à la Docker.

It is a single binary that contains an image of an OS in SquashFS and the binary to run it.

Running make builds the binary with an embedded Alpine image. Running make havre-xenial builds a binary with an embedded Ubuntu image.

What does the binary do?

  • locate the offset of the FS image within itself (based on the SquashFS magic numbers for the gzip and LZMA variants);
  • mount it on a loop device;
  • create a new namespace for a new process;
  • call itself in the new namespace (via /proc/self/exe);
  • chroot into the mounted image;
  • finally, execute the command and arguments that were passed on the command line.
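
The first step in the list, locating the image by its magic number, can be illustrated on an in-memory buffer. This is only a sketch: the helper name findMagic and the sample data are hypothetical, and offset.go scans the file on disk rather than a byte slice. The signature "hsqs" is the byte string behind the SQUASHFSMAGICLZMA constant in offset.go.

```go
package main

import (
	"bytes"
	"fmt"
)

// findMagic returns the offset of the first occurrence of magic in data at or
// after start, or -1 if there is none. Hypothetical helper, for illustration.
func findMagic(data, magic []byte, start int) int {
	if start < 0 || start > len(data) {
		return -1
	}
	i := bytes.Index(data[start:], magic)
	if i < 0 {
		return -1
	}
	return start + i
}

func main() {
	// Stand-in for "program bytes followed by an image", as produced by
	// concatenating the executable and the SquashFS file.
	program := []byte("fake-program-bytes")
	image := []byte("hsqs" + "image-payload")
	blob := append(append([]byte{}, program...), image...)

	off := findMagic(blob, []byte("hsqs"), 0)
	// The image starts exactly where the program bytes end.
	fmt.Println(off, off == len(program))
}
```

In the real binary the signature also appears inside the program itself, as a constant, which is why the search has to start past it or skip the first matches.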

For example:

sudo ./havre-alpine run /bin/sh -l

executes a shell in the distribution that is embedded in the binary (alpine linux here).

Credits

The main idea has been taken from Liz Rice's talk "What is a container, really? Let's write one in Go from scratch". She took the idea from Julian Friedman's "Build Your Own Container Using Less than 100 Lines of Go".

I've simply added the principle of the embedded image and the code to mount it on a loopback device.

Disclaimer

Warning: this is a POC. Error handling is barely tested, it has to be run as root, and there is still a lot to do before it can be used in real life.
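
Since the tool needs root, an early check can give a clearer message than a failed loop-device attach. The following is a minimal sketch and not part of the gist: checkRoot is a hypothetical helper, and it assumes that an effective UID of 0 is the requirement (setups based on individual capabilities are not considered).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkRoot returns an error unless euid is 0. Taking the ID as a parameter
// keeps the check easy to test. Hypothetical helper, not in the original code.
func checkRoot(euid int) error {
	if euid != 0 {
		return errors.New("must be run as root to attach loop devices, mount and chroot")
	}
	return nil
}

func main() {
	if err := checkRoot(os.Geteuid()); err != nil {
		fmt.Println("refusing to continue:", err)
		return
	}
	fmt.Println("running as root")
}
```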

TODO

  • Obviously some code cleaning and refactoring.
  • Playing with CGroups...
  • add some fun features
  • add even more fun features
  • sharing

Example

# Running locally... I see all the PIDs
$ ps auxww | wc -l
199
# Entering the "chroot and namespace"
$  sudo ./havre-alpine run /bin/sh -l
localhost:/# cat /etc/alpine-release
3.6.2
# I see only my processes
localhost:/# ps auxww 
PID   USER     TIME   COMMAND
    1 root       0:00 /proc/self/exe child /tmp/.havre689486390 /bin/sh -l
    5 root       0:00 /bin/sh -l
    6 root       0:00 ps auxww
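
PID 1 in the listing above is the binary re-executing itself with a rewritten argument list. As a small sketch of that rewriting (the helper name childArgs is hypothetical; main.go builds the slice inline):

```go
package main

import "fmt"

// childArgs builds the argument list for the re-executed process: the "child"
// marker, the mountpoint, then the user's command and its arguments.
func childArgs(mountpoint string, userArgs []string) []string {
	args := make([]string, 0, len(userArgs)+2)
	args = append(args, "child", mountpoint)
	return append(args, userArgs...)
}

func main() {
	// For "havre-alpine run /bin/sh -l", os.Args[2:] is {"/bin/sh", "-l"}.
	fmt.Println(childArgs("/tmp/.havre689486390", []string{"/bin/sh", "-l"}))
	// prints: [child /tmp/.havre689486390 /bin/sh -l]
}
```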

Files

All the system logic is in main.go. The file offset.go is just a helper to locate the offset and does not carry any system logic for the container.

main.go

//go:build linux
// +build linux

package main

import (
	"io/ioutil"
	"log"
	"os"
	"os/exec"
	"syscall"

	losetup "gopkg.in/freddierice/go-losetup.v1"
)

func main() {
	// Both sub-commands need at least one argument after their name.
	if len(os.Args) < 3 {
		log.Fatalf("usage: %s run <command> [args...]", os.Args[0])
	}
	switch os.Args[1] {
	case "run":
		run()
	case "child":
		child()
	default:
		panic("what should I do")
	}
}

// run locates the embedded image, mounts it, and re-executes this binary
// as "child" in new PID and mount namespaces.
func run() {
	// Find the offset of the embedded image
	o, err := findOffset("/proc/self/exe")
	if err != nil {
		log.Fatal("Cannot find image: ", err)
	}
	// attach a raw file to a loop device (read-only)
	dev, err := losetup.Attach("/proc/self/exe", uint64(o), true)
	if err != nil {
		log.Fatal("Cannot attach device: ", err)
	}
	// create a mountpoint
	mountpoint, err := ioutil.TempDir("", ".havre")
	if err != nil {
		log.Fatal("Cannot create mountpoint: ", err)
	}
	// now mount the loopback device
	err = syscall.Mount(dev.Path(), mountpoint, "squashfs", syscall.MS_RDONLY, "")
	if err != nil {
		log.Fatal("Cannot mount fs: ", err)
	}
	// create an array of arguments for the fork executable
	args := append([]string{"child"}, mountpoint)
	args = append(args, os.Args[2:]...)
	cmd := exec.Command("/proc/self/exe", args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Println(err)
	}
	// clean up: unmount the image, then release the loop device
	if err := syscall.Unmount(mountpoint, 0); err != nil {
		log.Println(err)
	}
	if err := dev.Detach(); err != nil {
		log.Println(err)
	}
}

// child runs inside the new namespaces: it chroots into the mounted image,
// mounts /proc, and executes the requested command.
func child() {
	if len(os.Args) < 4 {
		log.Fatal("child: missing mountpoint or command")
	}
	if err := syscall.Chroot(os.Args[2]); err != nil {
		log.Fatal("Cannot chroot: ", err)
	}
	if err := syscall.Chdir("/"); err != nil {
		log.Fatal("Cannot chdir: ", err)
	}
	if err := syscall.Mount("proc", "/proc", "proc", 0, ""); err != nil {
		log.Fatal("Cannot mount proc: ", err)
	}
	cmd := exec.Command(os.Args[3], os.Args[4:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Println(err)
	}
	// unmount /proc so that the parent can unmount the image afterwards
	if err := syscall.Unmount("/proc", 0); err != nil {
		log.Println(err)
	}
}

Makefile

.PHONY: all clean

all: havre-alpine

alpine:
	mkdir alpine
	curl -s http://dl-cdn.alpinelinux.org/alpine/v3.6/releases/x86_64/alpine-minirootfs-3.6.2-x86_64.tar.gz | sudo tar -C alpine -xzf -

alpine.squash: alpine
	sudo mksquashfs alpine alpine.squash

xenial/etc/lsb-release:
	sudo debootstrap xenial xenial

xenial.squash: xenial/etc/lsb-release
	sudo mksquashfs xenial xenial.squash

havre: *.go
	go build -o havre

havre-alpine: alpine.squash havre
	cat havre alpine.squash > havre-alpine
	chmod +x havre-alpine

havre-xenial: xenial.squash havre
	cat havre xenial.squash > havre-xenial
	chmod +x havre-xenial

clean:
	sudo rm -rf havre havre-xenial havre-alpine alpine.squash xenial.squash alpine/ xenial/

offset.go

package main

import (
	"bytes"
	"io"
	"os"
)

const (
	// SQUASHFSMAGIC is the first signature the scan looks for ("qshs").
	SQUASHFSMAGIC = "\x71\x73\x68\x73"
	// SQUASHFSMAGICLZMA is the second signature the scan looks for ("hsqs").
	SQUASHFSMAGICLZMA = "\x68\x73\x71\x73"
)

// findOffset returns the offset of the embedded image in the file f.
// It returns a nil error on success, and a non-nil error (usually io.EOF)
// when no signature is found.
func findOffset(f string) (int64, error) {
	file, err := os.Open(f) // For read access.
	if err != nil {
		return 0, err
	}
	defer file.Close()

	data := make([]byte, 4)
	lastErr := io.EOF
	// Each attempt is {start offset, number of matches to skip}.
	// First attempt: start at byte 2000000 and take the first match.
	// Fallback: scan from the start and skip the first two matches, which are
	// the two constants above as they are hardcoded in the binary.
	for _, try := range [][2]int64{{2000000, 0}, {0, 2}} {
		occurrence := int64(0)
		for offset := try[0]; ; offset++ {
			n, err := file.ReadAt(data, offset)
			if n < len(data) {
				// Fewer than 4 bytes left: end of file or read error.
				lastErr = err
				break
			}
			if bytes.Equal(data, []byte(SQUASHFSMAGICLZMA)) || bytes.Equal(data, []byte(SQUASHFSMAGIC)) {
				occurrence++
				if occurrence > try[1] {
					return offset, nil
				}
			}
		}
	}
	return 0, lastErr
}

@pauldotknopf

You should check out Darch. It does something similar, but you can boot natively into the images.

https://pknopf.com/post/2018-11-09-give-ubuntu-darch-a-quick-ride-in-a-virtual-machine/
