@dergachev
Last active May 25, 2023 03:55
Caching debian package installation with docker

TLDR: I now add the following snippet to all my Dockerfiles:

# If host is running squid-deb-proxy on port 8000, populate /etc/apt/apt.conf.d/30proxy
# By default, squid-deb-proxy 403s unknown sources, so apt shouldn't proxy ppa.launchpad.net
RUN route -n | awk '/^0.0.0.0/ {print $2}' > /tmp/host_ip.txt
RUN echo "HEAD /" | nc `cat /tmp/host_ip.txt` 8000 | grep squid-deb-proxy \
  && (echo "Acquire::http::Proxy \"http://$(cat /tmp/host_ip.txt):8000\";" > /etc/apt/apt.conf.d/30proxy) \
  && (echo "Acquire::http::Proxy::ppa.launchpad.net DIRECT;" >> /etc/apt/apt.conf.d/30proxy) \
  || echo "No squid-deb-proxy detected on docker host"

Caching apt-get install with docker

If you're using docker, you probably have a Dockerfile that starts like this:

FROM ubuntu:12.04
RUN apt-get update

# install all my favorite utilities, putting it early to facilitate docker caching
RUN apt-get install -y curl git vim make build-essential

# install all pre-requisite packages for our dockerized application
RUN apt-get install -y libyaml-dev libxml2-dev libxslt-dev ruby1.9.1 ruby1.9.1-dev

# other stuff...

The best part about docker (vs vagrant-lxc, for example) is that Docker will automatically cache each successful build step in the Dockerfile, and each time you tweak the Dockerfile and re-run docker build, it only needs to re-run from the first change in the Dockerfile. That's a massive win, unless you like watching your packages install!

Even with this workflow, you'll still regularly invalidate Docker's build caches and be forced to re-install your packages. For example, say you just realized that the base ubuntu image is missing man (what horror!). Sure, you could add it to the bottom of your Dockerfile and keep your caches, but that's just gonna bug you:

FROM ubuntu:12.04
RUN apt-get update

# install all my favorite utilities, putting it early to facilitate docker caching
RUN apt-get install -y curl git vim make build-essential

# install all pre-requisite packages for our dockerized application
RUN apt-get install -y libyaml-dev libxml2-dev libxslt-dev ruby1.9.1 ruby1.9.1-dev

# other stuff...

# all by itself
RUN apt-get install -y man

You can save yourself refactoring, time, and bandwidth by installing a caching proxy for apt. Assuming you're running Debian/Ubuntu on your docker host, the best one seems to be squid-deb-proxy.

I recommend installing it on the docker host, as follows:

sudo apt-get install -y squid-deb-proxy
# it automatically starts listening on port 8000 (on the host)

Now we need to get our docker containers to use the proxy. While there's a package called squid-deb-proxy-client that automatically detects the presence of a local proxy, it relies on the zeroconf daemon, and daemons generally don't work in docker containers without a lot of fuss. Instead, modify your Dockerfile to create /etc/apt/apt.conf.d/30proxy, which configures apt to use the proxy on http://HOST-IP:8000:

FROM ubuntu:12.04

# If host is running squid-deb-proxy on port 8000, populate /etc/apt/apt.conf.d/30proxy
# By default, squid-deb-proxy 403s unknown sources, so apt shouldn't proxy ppa.launchpad.net
RUN route -n | awk '/^0.0.0.0/ {print $2}' > /tmp/host_ip.txt
RUN echo "HEAD /" | nc `cat /tmp/host_ip.txt` 8000 | grep squid-deb-proxy \
  && (echo "Acquire::http::Proxy \"http://$(cat /tmp/host_ip.txt):8000\";" > /etc/apt/apt.conf.d/30proxy) \
  && (echo "Acquire::http::Proxy::ppa.launchpad.net DIRECT;" >> /etc/apt/apt.conf.d/30proxy) \
  || echo "No squid-deb-proxy detected on docker host"

RUN apt-get update
RUN apt-get install -y {PACKAGES}

Obviously it's still faster if you can get Docker to cache the install steps, but caching just the package downloading will give you a big speedup.
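
For reference, the two lines that the Dockerfile snippet writes to /etc/apt/apt.conf.d/30proxy can be sketched as a small shell function. The name render_apt_proxy_conf and its arguments are made up for illustration; only the two Acquire directives come from the snippet above.

```shell
#!/bin/sh
# Hypothetical helper (not part of apt or squid-deb-proxy): print the apt
# proxy configuration for a given proxy host and port to stdout.
render_apt_proxy_conf() {
    proxy_host=$1
    proxy_port=${2:-8000}   # port squid-deb-proxy listens on, per the text above
    # Send apt's http:// traffic through the proxy...
    printf 'Acquire::http::Proxy "http://%s:%s";\n' "$proxy_host" "$proxy_port"
    # ...except for PPAs, which squid-deb-proxy 403s by default.
    printf 'Acquire::http::Proxy::ppa.launchpad.net DIRECT;\n'
}
```

Something like render_apt_proxy_conf "$(cat /tmp/host_ip.txt)" > /etc/apt/apt.conf.d/30proxy would then produce the same file, assuming the host IP has already been detected.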

Want more info? Here are my sources:

Enjoy!

@tomgruner

RUN /sbin/ip route | awk '/default/ { print "Acquire::http::Proxy \"http://"$3":8000\";" }' > /etc/apt/apt.conf.d/30proxy

Wow that saved my day - thank you

@pmoust

pmoust commented Aug 27, 2014

And here is a containerized squid-deb-proxy server over squid3: https://github.com/pmoust/squid-deb-proxy

@luser

luser commented Oct 9, 2015

I ran into problems running this on Ubuntu 15.04, where neither nc nor route were installed in the base image. I hacked up something using awk and perl that seems to work:

RUN awk '/^[a-z]+[0-9]+\t00000000/ { printf("%d.%d.%d.%d\n", "0x" substr($3, 7, 2), "0x" substr($3, 5, 2), "0x" substr($3, 3, 2), "0x" substr($3, 1, 2)) }' < /proc/net/route > /tmp/host_ip.txt
RUN perl -pe 'use IO::Socket::INET; chomp; $socket = new IO::Socket::INET(PeerHost=>$_,PeerPort=>"8000"); print $socket "HEAD /\n\n"; my $data; $socket->recv($data,1024); exit($data !~ /squid-deb-proxy/)' <  /tmp/host_ip.txt \
  && (echo "Acquire::http::Proxy \"http://$(cat /tmp/host_ip.txt):8000\";" > /etc/apt/apt.conf.d/30proxy) \
  && (echo "Acquire::http::Proxy::ppa.launchpad.net DIRECT;" >> /etc/apt/apt.conf.d/30proxy) \
  || echo "No squid-deb-proxy detected on docker host"

@joshbrooks

To set Docker builds to use apt-cacher-ng, check if port 3142 is open on the docker host.

RUN route -n | awk '/^0.0.0.0/ {print $2}' > /tmp/host_ip.txt; if nc -zv "$(cat /tmp/host_ip.txt)" 3142 > /dev/null 2>&1; then echo "Acquire::http::Proxy \"http://$(cat /tmp/host_ip.txt):3142\";" > /etc/apt/apt.conf.d/30proxy; echo "Proxy detected on docker host - using for this build"; fi

After all apt-get update, apt-get install, etc. commands, remove the proxy configuration (so the image is still portable!):

RUN if [ -f "/etc/apt/apt.conf.d/30proxy" ]; then rm /etc/apt/apt.conf.d/30proxy; fi

@z0u

z0u commented Apr 1, 2016

As a separate script with a configurable port; call it detect-apt-proxy.sh:

#!/bin/bash

APT_PROXY_PORT=$1
HOST_IP=$(route -n | awk '/^0.0.0.0/ {print $2}')
nc -z "$HOST_IP" ${APT_PROXY_PORT}

if [ $? -eq 0 ]; then
    cat >> /etc/apt/apt.conf.d/30proxy <<EOL
    Acquire::http::Proxy "http://$HOST_IP:$APT_PROXY_PORT";
    Acquire::http::Proxy::ppa.launchpad.net DIRECT;
EOL
    cat /etc/apt/apt.conf.d/30proxy
    echo "Using host's apt proxy"
else
    echo "No apt proxy detected on Docker host"
fi

Then in your Dockerfile:

ARG APT_PROXY_PORT=
COPY detect-apt-proxy.sh /root
RUN /root/detect-apt-proxy.sh ${APT_PROXY_PORT}

Usage:

sudo docker run -d --name apt-cacher -p 3142:3142 sameersbn/apt-cacher-ng:latest
sudo docker build --build-arg APT_PROXY_PORT=3142 .

It only checks that the port is open, not that it's running any particular proxy server.
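
To go one step further than an open-port check, the reply from the port can be inspected for a proxy-specific marker, which is what the grep squid-deb-proxy in the original snippet does. A minimal sketch, assuming the proxy identifies itself somewhere in its reply; the function name response_mentions is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read a raw HTTP response on stdin and succeed only if
# it contains the given marker (case-insensitive, fixed-string match).
response_mentions() {
    marker=$1
    grep -qiF -- "$marker"
}
```

In the script above this could be used as echo "HEAD /" | nc "$HOST_IP" "$APT_PROXY_PORT" | response_mentions squid-deb-proxy. The right marker for other proxies, such as apt-cacher-ng, is something to confirm against an actual response rather than take from here.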

@mbana

mbana commented Feb 21, 2017

Yeah, this is great, just what most people need. Here is a slightly modified version for anyone wanting to do something similar with no hard dependencies on certain binaries being on the path; it extends the earlier examples:

sudo docker build --build-arg APT_PROXY_PORT=3142 .

FROM debian:stretch
ENV DEBIAN_FRONTEND=noninteractive

ARG APT_PROXY_PORT=
COPY detect-apt-proxy.sh /root
RUN /root/detect-apt-proxy.sh ${APT_PROXY_PORT}

#!/bin/bash -ex
# see:
# https://github.com/sameersbn/docker-apt-cacher-ng
# https://gist.github.com/dergachev/8441335

APT_PROXY_PORT=$1
HOST_IP=$(awk '/^[a-z]+[0-9]+\t00000000/ { printf("%d.%d.%d.%d\n", "0x" substr($3, 7, 2), "0x" substr($3, 5, 2), "0x" substr($3, 3, 2), "0x" substr($3, 1, 2)) }' < /proc/net/route)

if [[ ! -z "$APT_PROXY_PORT" ]] && [[ ! -z "$HOST_IP" ]]; then
    echo 'Acquire::HTTPS::Proxy "false";' >> /etc/apt/apt.conf.d/01proxy
    cat >> /etc/apt/apt.conf.d/01proxy <<EOL
    Acquire::HTTP::Proxy "http://${HOST_IP}:${APT_PROXY_PORT}";
    Acquire::HTTPS::Proxy "false";
EOL
    cat /etc/apt/apt.conf.d/01proxy
    echo "Using host's apt proxy"
else
    echo "No squid-deb-proxy detected on docker host"
fi

Observing the logs, it looks like it goes to the cache:

$ docker exec -it apt-cacher-ng tail -f /var/log/apt-cacher-ng/apt-cacher.log
1487691303|O|66645|172.17.0.1|deb.debian.org/debian/pool/main/n/netcat/netcat-traditional_1.10-41_amd64.deb
1487691303|O|138750|172.17.0.1|deb.debian.org/debian/pool/main/libp/libpcap/libpcap0.8_1.8.1-3_amd64.deb
1487691303|O|247857|172.17.0.1|deb.debian.org/debian/pool/main/n/net-tools/net-tools_1.60+git20161116.90da8a0-1_amd64.deb
1487691303|O|533741|172.17.0.1|deb.debian.org/debian/pool/main/s/strace/strace_4.15-2_amd64.deb
1487691304|O|406026|172.17.0.1|deb.debian.org/debian/pool/main/t/tcpdump/tcpdump_4.9.0-1_amd64.deb
1487691304|I|5411693|172.17.0.1|deb.debian.org/debian/pool/main/v/vim/vim-runtime_8.0.0197-2_all.deb
1487691304|O|5411434|172.17.0.1|deb.debian.org/debian/pool/main/v/vim/vim-runtime_8.0.0197-2_all.deb
1487691304|I|1034137|172.17.0.1|deb.debian.org/debian/pool/main/v/vim/vim_8.0.0197-2_amd64.deb
1487691304|O|1033860|172.17.0.1|deb.debian.org/debian/pool/main/v/vim/vim_8.0.0197-2_amd64.deb
1487691304|O|9288|172.17.0.1|deb.debian.org/debian/pool/main/n/netcat/netcat_1.10-41_all.deb

@mbana

mbana commented Feb 21, 2017

ok i've wrapped this up into a repo for anyone wanting to get a quickstart on it.
it binds to port 3143 - so as not to conflict with the version in the original repo.
there are two images in there, one which makes cache calls and another which doesn't.

https://github.com/mbana/debian-apt-cacher
the images/ folder contains the images just mentioned.

@FengLin-UBNT

@mbana Thanks for your scripts. There is one issue on a big-endian host: the IP address comes out reversed.
I got HOST_IP=1.0.17.172
It should be HOST_IP=172.17.0.1
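
The reversal appears to come from /proc/net/route printing the gateway as a hexadecimal number in the host's byte order, so the order in which the four bytes are read has to match the machine. A minimal sketch of a decoder that takes the byte order as an explicit argument; the function name hex_to_ipv4 and its little/big argument are illustrative, not part of any existing tool:

```shell
#!/bin/sh
# Hypothetical helper: decode an 8-digit hex gateway field from /proc/net/route.
#   $1: hex string, e.g. 010011AC
#   $2: byte order of the host that produced it: "little" (default) or "big"
hex_to_ipv4() {
    hex=$1
    order=${2:-little}
    # Split the field into its four two-digit bytes, left to right.
    b1=$(printf '%s' "$hex" | cut -c1-2)
    b2=$(printf '%s' "$hex" | cut -c3-4)
    b3=$(printf '%s' "$hex" | cut -c5-6)
    b4=$(printf '%s' "$hex" | cut -c7-8)
    if [ "$order" = big ]; then
        # Big-endian host: the bytes are already in network order.
        printf '%d.%d.%d.%d\n' "0x$b1" "0x$b2" "0x$b3" "0x$b4"
    else
        # Little-endian host: the bytes appear reversed.
        printf '%d.%d.%d.%d\n' "0x$b4" "0x$b3" "0x$b2" "0x$b1"
    fi
}
```

With these assumptions, hex_to_ipv4 AC110001 big and hex_to_ipv4 010011AC little both give 172.17.0.1, while reading AC110001 as little-endian reproduces the reversed 1.0.17.172 reported above.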

@islander

islander commented Jun 14, 2019

HOST_IP=$(awk '/^[a-z]+[0-9]+\t00000000/ { printf("%d.%d.%d.%d\n", "0x" substr($3, 7, 2), "0x" substr($3, 5, 2), "0x" substr($3, 3, 2), "0x" substr($3, 1, 2)) }' < /proc/net/route)

This line doesn't work on my installation (Ubuntu 18.04 / GNU Awk 4.1.4).
The interface regexp should be [a-z0-9]+ to match newer interface names such as enp2s0. And printf can't parse a hex number from a string, so you should either convert it explicitly with strtonum("0x" substr($3, 7, 2)) or pass the --non-decimal-data option to awk. So here is my fixed version of the script:

#!/bin/bash -ex
# see:
# https://github.com/sameersbn/docker-apt-cacher-ng
# https://gist.github.com/dergachev/8441335

CONFPATH=/etc/apt/apt.conf.d/01proxy 
APT_PROXY_PORT=$1
HOST_IP=$(awk --non-decimal-data '/^[a-z0-9]+\t00000000/ { printf("%d.%d.%d.%d\n", "0x" substr($3, 7, 2), "0x" substr($3, 5, 2), "0x" substr($3, 3, 2), "0x" substr($3, 1, 2)) }' < /proc/net/route)

if [[ ! -z "$APT_PROXY_PORT" ]] && [[ ! -z "$HOST_IP" ]]; then
    cat > $CONFPATH <<-EOL
        Acquire::HTTP::Proxy "http://${HOST_IP}:${APT_PROXY_PORT}";
        Acquire::HTTPS::Proxy "false";
EOL
    cat $CONFPATH
    echo "Using host's apt proxy"
else
    echo "No squid-deb-proxy detected on docker host"
fi

UPD: also, see this workaround for other versions of awk (mawk, etc)
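
If awk portability is the sticking point, the default-route lookup itself can be done with the shell's read builtin, which also sidesteps the interface-name regexp. A rough sketch, assuming the usual /proc/net/route layout (interface, destination, gateway as the first three whitespace-separated columns); default_gateway_hex is a made-up name, and its output is still the raw hex field that has to be converted to dotted-quad form:

```shell
#!/bin/sh
# Hypothetical helper: read a /proc/net/route-style table on stdin and print
# the hex gateway of the first default route (destination 00000000).
default_gateway_hex() {
    while read -r iface dest gateway rest; do
        if [ "$dest" = "00000000" ]; then
            printf '%s\n' "$gateway"
            return 0
        fi
    done
    return 1   # no default route found
}
```

Inside a container this would be called as default_gateway_hex < /proc/net/route. The header line is skipped naturally because its second column is the word Destination rather than 00000000.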

@bytearchive

🆒

@Aposhian

Aposhian commented Aug 6, 2021

What are the advantages of using squid-deb-proxy vs using buildkit to cache your /var/cache/apt and /var/lib/apt directories between runs?

https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md#example-cache-apt-packages

@aerickson

@Aposhian it doesn't cache during a docker build only a docker run.

I have a gist with what I'm currently using. It doesn't require modifying the Dockerfile at all, just the build command.

https://gist.github.com/aerickson/3f785dd2fb75de27c30468dbac91cb96
