Putting Docker on its own pseudo filesystem

Docker on BTRFS is very buggy and can leave you with a fully unusable system. It can butcher the underlying BTRFS filesystem so that it uses far more disk space than it needs, and it can get into a state where it cannot delete any image at all. Recovering may require drastic actions, up to and including reformatting the entire affected BTRFS root file system.

According to the official Docker documentation:

btrfs requires a dedicated block storage device such as a physical disk. This block device must be formatted for Btrfs and mounted into /var/lib/docker/.

In my experience, you will still run into issues even if you use a dedicated partition. It seems to require a standalone hard drive, which is a luxury many computers simply cannot afford.

See "Docker gradually exhausts disk space on BTRFS" (#27653) for details of exactly what I have run into. See also "docker does not remove btrfs subvolumes when destroying container".

A pseudo filesystem, as the term is used in this guide, is a filesystem that is contained inside an otherwise ordinary file and mounted by the OS through a loop device. This guide will show you how to set one up and use it exclusively for Docker images and containers in a way that will NOT cripple your BTRFS file system, while still allowing you to store it in normal BTRFS subvolume snapshots.

Steps to migrate /var/lib/docker from a subdirectory to a dedicated pseudo filesystem.

System Preparation:

  1. BACKUP ANY IMPORTANT self-made Docker images! This guide will destroy all of your existing images and containers.

    docker save image/name -o image_name.docker
    bzip2 image_name.docker

  2. Open up a terminal and run the command sudo watch -n10 df /var/lib/docker. Pay attention to the total space available. Because BTRFS deletes files from the system only when the disk is inactive, it is important to know when certain processes have really finished, or whether they are happening at all. In a BTRFS file system that has been corrupted by Docker, often no file will actually be removed from the underlying file system. If this happens, refer to the Drastic Actions section.
  3. Make a BTRFS volume snapshot!! We are messing with your core file system, so it is important to make a snapshot first. If all goes to Hell, refer to the Drastic Actions section for how to restore the snapshot and quickly get back to work.

    sudo mkdir /snaps
    sudo btrfs subvolume snapshot / /snaps/root-$(date '+%Y-%m-%d')-pre
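
If you have more than one image to keep, the backup in step 1 can be looped over every tagged image. The following is only a sketch: archive_name and backup_all_images are hypothetical helper names, and the flat file naming scheme is an assumption, not something Docker prescribes.

```shell
#!/bin/sh
# Sketch: back up every tagged image before wiping /var/lib/docker.
# archive_name and backup_all_images are hypothetical helper names.

# Turn an image reference such as "me/app:1.0" into a flat file name
# by replacing "/" and ":" with "_".
archive_name() {
    printf '%s.docker\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Save and compress each tagged image into the directory given as $1
# (defaults to the current directory). Untagged images are skipped.
backup_all_images() {
    dest=${1:-.}
    mkdir -p "$dest" || return 1
    docker images --format '{{.Repository}}:{{.Tag}}' |
        grep -v '<none>' |
        while IFS= read -r image; do
            out="$dest/$(archive_name "$image")"
            docker save -o "$out" "$image" && bzip2 "$out"
        done
}
```

Run it as backup_all_images /path/to/backup/dir, with a target directory that is not on the affected file system. To restore an image later, decompress it with bunzip2 and feed the file to docker load -i.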

Clean Up Docker /var/lib/docker files.

  1. Delete all of the docker containers: docker rm $(docker ps -aq)

    Afterwards, docker ps -aq should return nothing.

  2. Delete all of the docker images: docker rmi -f $(docker images -q)

    NOTE: If you do not see any activity for several minutes, it is indicative of a BTRFS meltdown. To verify, run sudo du -hs /var/lib/docker. If it is still running after 3-5 minutes, refer to the Drastic Actions section.

    Afterwards, docker images -q should return nothing.

  3. Stop docker: sudo systemctl stop docker

    NOTE: When docker has butchered the BTRFS file system, it often CANNOT be stopped via this step. Fortunately, a simple system reboot resolves this issue. Do that now if you encounter this problem.

  3b. Ensure that docker is completely stopped: ps aux | grep docker

  4. Explore the /var/lib/docker directory:

sudo -s
cd /var/lib/docker
du -h --max-depth=1 | sort -h

Because you have deleted literally 100% of the files which docker stores, your /var/lib/docker should be virtually empty, maybe a few MB at most. However, if Docker has been abusing the underlying root BTRFS system, often many GBs will still be stored.

  5. Attempt to remove all of the files manually: DO NOT USE THE rm COMMAND! This will not work, and if it does, you will have irreversibly corrupted your BTRFS system. Go immediately to the Drastic Actions section if you have accidentally done so.

As discussed in nuking old and broken /var/lib/docker directories is non-trivial, the only safe way to remove broken /var/lib/docker files on BTRFS is to do the following:

for subvolume in /var/lib/docker/btrfs/subvolumes/*; do
    btrfs subvolume delete "$subvolume"
done

  6. Ensure that all docker BTRFS subvolumes have been destroyed: run btrfs subvolume list / and check that no entries have a path under /var/lib/docker.
  7. With the subvolumes gone, manually remove all the other files in /var/lib/docker: rm -r /var/lib/docker/* Then ensure that it is empty: ls should print nothing and du -h . should report 0 disk space used.
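
The subvolume check above can be scripted rather than read by eye. This is a minimal sketch, assuming the usual "ID ... path ..." line format printed by btrfs subvolume list; count_docker_subvolumes is a hypothetical helper name, and the path prefix in front of var/lib/docker depends on your subvolume layout.

```shell
#!/bin/sh
# Sketch: count Docker subvolumes left in `btrfs subvolume list` output.
# count_docker_subvolumes is a hypothetical helper; it reads the listing on stdin.
count_docker_subvolumes() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper from reporting failure.
    grep -c 'path .*var/lib/docker/btrfs/subvolumes/' || true
}
```

Use it as sudo btrfs subvolume list / | count_docker_subvolumes. Anything other than 0 means some Docker subvolumes are still present.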

If all has gone well, you now have a BTRFS file system that is devoid of all docker-related images, containers and various metadata and caches. Congratulations!

Create the pseudo file system

  1. Ensure that you are the root user: sudo -s
  2. Create the pseudo filesystem: The best place to store file-based pseudo filesystems is in /media.

Estimate how much space you will need, or want to reserve, for Docker images. I find that 10-20 GB is far more than enough for properly functioning systems.

cd /media
fallocate -l 10G docker-volume.img
mkfs.ext4 docker-volume.img
mount -o loop -t ext4 /media/docker-volume.img /var/lib/docker
df -h
# You should see: /dev/loop0      9.8G   37M  9.3G   1% /var/lib/docker
umount /var/lib/docker
  3. Add the pseudo filesystem to the "mount on boot" config: echo "/media/docker-volume.img /var/lib/docker ext4 defaults 0 0" >> /etc/fstab
  4. Test mount it: mount /var/lib/docker
  5. Restart docker and confirm that it is using the pseudo filesystem:
systemctl start docker
systemctl stop docker
cd /media
ls /var/lib/docker    # You should see many subdirectories.
du -h /var/lib/docker # It should report approximately 35 directories, and about 256 KB of space used.
                      # You should NOT see any mention of BTRFS subvolumes.
umount /var/lib/docker
du -h /var/lib/docker # You should see: 0	/var/lib/docker/
  6. Now reboot the system and confirm that the volume has auto-mounted and that docker is using it.
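
Two of the steps above are easy to get subtly wrong: running the echo ... >> /etc/fstab line twice leaves a duplicate entry, and "confirm that the volume has auto-mounted" is vague. The sketch below covers both. ensure_fstab_entry and mounted_fstype are hypothetical helper names, and the sketch assumes the whitespace-separated field layout used by /etc/fstab and /proc/mounts.

```shell
#!/bin/sh
# Sketch: idempotent fstab append plus a mount check.
# Both function names are hypothetical helpers, not standard tools.

# Append LINE to FSTAB_FILE only if its mount point (field 2) is not
# already listed on a non-comment line.
# Usage: ensure_fstab_entry FSTAB_FILE "LINE"
ensure_fstab_entry() {
    fstab=$1
    line=$2
    mountpoint=$(printf '%s\n' "$line" | awk '{print $2}')
    if awk -v mp="$mountpoint" \
        '$1 !~ /^#/ && $2 == mp {found=1} END {exit !found}' "$fstab"; then
        echo "already present: $mountpoint"
    else
        printf '%s\n' "$line" >> "$fstab"
        echo "added: $mountpoint"
    fi
}

# Print the filesystem type mounted at MOUNT_POINT, reading a
# /proc/mounts-style table (defaults to /proc/mounts). The last matching
# line wins, since later mounts shadow earlier ones. Prints nothing if
# the mount point is not mounted.
# Usage: mounted_fstype MOUNT_POINT [MOUNTS_FILE]
mounted_fstype() {
    awk -v mp="$1" '$2 == mp {t=$3} END {if (t != "") print t}' \
        "${2:-/proc/mounts}"
}
```

As root, ensure_fstab_entry /etc/fstab "/media/docker-volume.img /var/lib/docker ext4 defaults 0 0" replaces the plain echo. After the reboot, mounted_fstype /var/lib/docker should print ext4, and docker info --format '{{.Driver}}' should report a storage driver other than btrfs.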

Congratulations! You have now moved Docker volumes from BTRFS to a pseudo ext4 file system, which docker supports much better!

  1. IMPORTANT: Take a new snapshot of the fixed system and remove the one we made at the beginning of this guide.
sudo btrfs subvolume snapshot / /snaps/root-$(date '+%Y-%m-%d')
# If the -pre snapshot was taken on an earlier day, substitute its date here (see ls /snaps).
sudo btrfs subvolume delete /snaps/root-$(date '+%Y-%m-%d')-pre
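
The delete command rebuilds the snapshot name from today's date, so it only matches if the whole guide was completed on the same day. A sketch of an alternative is to look the -pre snapshots up by name instead. list_pre_snapshots is a hypothetical helper, and it assumes the /snaps/root-DATE-pre naming used in this guide.

```shell
#!/bin/sh
# Sketch: find the "-pre" snapshots by name instead of recomputing the date.
# list_pre_snapshots is a hypothetical helper; it only prints paths and
# deletes nothing.
list_pre_snapshots() {
    dir=${1:-/snaps}
    for snap in "$dir"/root-*-pre; do
        # If the glob matched nothing it stays literal, so test for existence.
        [ -e "$snap" ] && printf '%s\n' "$snap"
    done
    return 0
}
```

Review the output of list_pre_snapshots first, then pass each path you no longer need to sudo btrfs subvolume delete.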

If you ever run into a corrupted /var/lib/docker in the future, simply stop docker, unmount /var/lib/docker, run sudo rm /media/docker-volume.img and repeat this guide. It is much better than risking your entire BTRFS file system to docker's buggy implementation!

Drastic Actions

Attempt a BTRFS restore

Things didn't go so well? Unfortunately, this happens.

First things first, attempt to restore an older snapshot that may not be corrupted.

Follow the guide here: Using Btrfs for Easy Backup and Rollback

If that fails, restore the snapshot taken in the prep stage of this guide. That will at least get you back to the same state your system was in before you started all of this.

Attempt via a rescue disk

Mount the partition while inside a recovery system like System Rescue CD and reattempt this guide from the very beginning.

When I was in a total desperate situation where Docker had consumed so much of the file system that basic commands would not run, this method saved me.

Back up and Reformat the entire system.

In early 2017, no matter what I tried, nothing worked. If you find yourself in this unfortunate state, back up all of your important files, maybe via a recovery system, and reformat the machine. I still recommend BTRFS as it is vastly superior to all other mainstream file systems. Just don't use it with docker!

Be sure to leave your horror story on the official Docker bug reports for this issue:

@plutext

commented Oct 3, 2018

Thanks for your pseudo file system instructions!
I ran into a little hiccup on reboot. It said:

mount: /var/lib/docker: special device /bvols/\100docker/No_COW/docker-volume.img does not exist.
var-lib-docker.mount: Mount process exited, code=exited status=32
Failed with result 'exit-code'.
Failed to mount /var/lib/docker.

This was because my root filesystem / is cryptroot (it uses LUKS).

It works if I instruct that / must be mounted first, using x-systemd.requires=/

My full fstab entry:

/bvols/docker/No_COW/docker-volume.img /var/lib/docker ext4 noatime,noauto,nofail,x-systemd.automount,x-systemd.requires=/ 0 0

I had this in a subvolume named @docker; the '@' symbol had to be escaped as \100 in fstab, so I changed the subvolume name to Docker to get rid of that complexity.
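
For context on the \100 above: it is the three-digit octal code of the '@' character (ASCII 64), the same notation fstab uses for a space (\040). The following sketch prints that escape for any single character. fstab_escape_char is a hypothetical helper name; it relies on the POSIX printf rule that an argument starting with a quote character is converted to the numeric value of the character after it.

```shell
#!/bin/sh
# Sketch: print the backslash-octal escape for one character.
# fstab_escape_char is a hypothetical helper name.
fstab_escape_char() {
    # "'$1" makes printf convert the character to its numeric code,
    # which %03o then prints as three octal digits.
    printf '\\%03o\n' "'$1"
}

# The reverse direction needs no helper: printf '\100\n' prints @
```

Renaming the subvolume, as described above, avoids the need for any escaping at all.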

Also, it seems to me to be a good idea to disable COW for this pseudo FS.

@pasikon

commented Oct 11, 2018

Thanks man you rescued me!

@andriy-f

commented Mar 10, 2019

I used this command: btrfs subvolume delete /var/lib/docker/btrfs/subvolumes/*. This may cause some information to be lost.

@dmerillat

commented Mar 22, 2019

YOU DIDN'T MARK THE IMAGE FILE NOCOW

You're going to fragment your filesystem into unusability really rapidly if you blindly follow this guide, since you're already running on BTRFS and you're basically running a copy-on-write for docker now.

plutext pointed this out above, but didn't say how to do it. The modified commands are below:

cd /media
touch docker-volume.img
chattr +C docker-volume.img
fallocate -l 10G docker-volume.img
mkfs.ext4 docker-volume.img
mount -o loop -t ext4 /media/docker-volume.img /var/lib/docker
df -h
# You should see: /dev/loop0      9.8G   37M  9.3G   1% /var/lib/docker
umount /var/lib/docker
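
The order of the first three commands matters: chattr(1) notes that on btrfs the C attribute should be set on new or empty files, which is why the file is created with touch before fallocate gives it a size. To check that the flag stuck, inspect the file with lsattr. The sketch below does that; has_nocow_flag is a hypothetical helper, and it assumes lsattr's usual "attributes, space, file name" output line.

```shell
#!/bin/sh
# Sketch: check one line of lsattr output for the C (No_COW) attribute.
# has_nocow_flag is a hypothetical helper name.
has_nocow_flag() {
    # Keep only the attribute field (everything before the first space),
    # so a "C" in the file name cannot cause a false positive.
    case "${1%% *}" in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example: has_nocow_flag "$(lsattr /media/docker-volume.img)" && echo "NOCOW is set".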

That said, there's a much easier fix.
The disaster comes from the way subvolumes work on btrfs with docker: if you look into them, they start as your root filesystem and are then modified. The bigger your root filesystem, the heavier each docker subvolume starts out and the slower everything runs. Metadata gets duplicated over and over, and you end up with tens of gigabytes of duplicate trees as each image change makes a new fork.

Most trivial solution: make /var/lib/docker/btrfs its own subvolume that's mounted in fstab. The copy stops at the mount-point and ignores subvolumes, so each new docker sub starts empty and doesn't exponentially grow your metadata allocations.
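
The comment does not spell out the fstab entry, so the following is only a sketch of what such a line could look like, using the standard btrfs subvol= mount option. btrfs_subvol_fstab_line is a hypothetical helper, and the subvolume name docker-btrfs in the example is an assumption; the subvolume itself must already exist.

```shell
#!/bin/sh
# Sketch: build an fstab line that mounts a named btrfs subvolume.
# btrfs_subvol_fstab_line is a hypothetical helper name.
# Usage: btrfs_subvol_fstab_line FS_UUID SUBVOLUME_NAME MOUNT_POINT
btrfs_subvol_fstab_line() {
    printf 'UUID=%s %s btrfs subvol=%s 0 0\n' "$1" "$3" "$2"
}
```

The UUID of the root file system can be read with findmnt -no UUID /. For example, btrfs_subvol_fstab_line "$(findmnt -no UUID /)" docker-btrfs /var/lib/docker/btrfs prints a line to review before adding it to /etc/fstab.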

A separate partition is also fine. They won't interfere with each other outside IO contention, and if you're wedged to the point that your disk is completely saturated you have something else to fix.
