@jperkin
Created April 11, 2016 09:45
pkgsrc rsync

Issue

A pkgsrc binary package repository must be internally consistent. Each repository contains a pkg_summary file holding metadata about every package currently in it, and binary package managers such as pkgin use that file to offer easy downloading and installing of packages to users.

Unlike many other package managers, pkgsrc rebuilds every dependent of a package whenever that package is rebuilt. This is to ensure consistency and avoid nasty surprises at runtime, and many times this has exposed issues that other package managers would not detect until much later. The drawback is that an update to a package such as OpenSSL means that many thousands of packages are rebuilt against the newer OpenSSL package, even though the packages themselves haven't changed at all.

When packages are rebuilt, each package file and its entry in pkg_summary must match. Otherwise binary package managers will fail when trying to install the package, either because the package file has changed but the pkg_summary information is stale, or because pkg_summary has been updated but the new package file is not yet available.

Traditional rsync-based distribution exposes exactly this problem: package directories are updated file by file, so the repository is inconsistent for the entire duration of the sync.

Example

Take an OpenSSL update as an example. It causes a rebuild of the sysutils/adtool package, as adtool builds against security/openssl. Even though the adtool package itself hasn't changed, the resulting binary package will differ in a number of ways.

For example, there are some per-build details embedded in the package:

$ pkg_info -B adtool-1.3.3.tgz | grep ^BUILD
BUILD_DATE=2016-03-17 02:19:48 +0000
BUILD_HOST=SunOS pkgsrc-pbulk-2014Q4-2.local 5.11 joyent_20141030T081701Z i86pc i386 i86pc

as well as the list of exact dependencies it was built against, which will change due to the OpenSSL version being increased:

$ pkg_info -qN adtool-1.3.3.tgz
openldap-client-2.4.44
openssl-1.0.2g
gcc49-libs-4.9.3nb1
digest-20121220
gcc49-4.9.3
cwrappers-20150707

The metadata for the adtool package that will be in the pkg_summary file includes:

$ pkg_info -X adtool-1.3.3.tgz | egrep 'PKGNAME|SIZE_PKG|FILE_SIZE'
PKGNAME=adtool-1.3.3
SIZE_PKG=89615
FILE_SIZE=45116

A rebuild of the package, and the resulting update of its metadata, will very likely change the SIZE_PKG and FILE_SIZE values. When installing, pkgin compares the FILE_SIZE value encoded in the package itself with the one recorded in pkg_summary, and if they differ it aborts with:

download mismatch: adtool-1.3.3.tgz
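
The same kind of comparison can be run across a whole repository to see whether it is currently consistent. The following is a minimal sketch and not part of pkgsrc or pkgin: check_summary is a hypothetical helper, it assumes an uncompressed copy of pkg_summary, and it assumes every package file is named PKGNAME.tgz as in the examples above.

```shell
# check_summary SUMMARY_FILE PKG_DIR   (hypothetical helper)
# Prints one line per package whose on-disk size differs from the
# FILE_SIZE recorded in the summary, or whose file is missing.
# Returns non-zero if any problem was found.
check_summary() {
    summary=$1
    pkgdir=$2

    # pkg_summary entries are KEY=VALUE blocks separated by blank lines.
    # Emit "PKGNAME FILE_SIZE" once per entry.
    awk -F= '
        $1 == "PKGNAME"   { name = $2 }
        $1 == "FILE_SIZE" { size = $2 }
        /^$/ { if (name != "") print name, size; name = ""; size = "" }
        END  { if (name != "") print name, size }
    ' "$summary" | (
        # Explicit subshell so the final status becomes the pipeline's
        # (and therefore the function's) return value.
        status=0
        while read -r name size; do
            file="$pkgdir/$name.tgz"
            if [ ! -f "$file" ]; then
                echo "missing: $name.tgz"
                status=1
                continue
            fi
            # Some wc implementations pad the count with spaces.
            actual=$(wc -c < "$file" | tr -d ' ')
            if [ "$actual" != "$size" ]; then
                echo "mismatch: $name.tgz summary=$size actual=$actual"
                status=1
            fi
        done
        exit "$status"
    )
}
```

Running this against a directory while an in-place rsync is in progress should list the packages that users would currently fail to install.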

rsync race

The race condition is exposed when using a simple rsync to copy packages up from the build host to the package server. By default a simple directory sync is used which will copy files one by one until the directory is in sync, for example:

$ rsync -av packages/SmartOS/2015Q4/i386/ pkgsrc.example.com:/www/packages/SmartOS/2015Q4/i386/
...
adtool-1.3.3.tgz
...lots of other packages...
pkg_summary.xz
...more packages...

From the moment adtool-1.3.3.tgz is synced until pkg_summary.xz is synced, the repository is inconsistent and users will see download mismatches, because pkgin still has the old FILE_SIZE while the new package is being served. The reverse happens for any package which sorts alphabetically after pkg_summary.xz: pkgin has the new FILE_SIZE, but the old package, with a different FILE_SIZE, is still being served until rsync reaches it.

The same goes for any mirrors of pkgsrc.example.com: they must also ensure consistency for all updates.

Fix

For pkgsrc.joyent.com we have moved to using rsync's --link-dest option and atomic directory updates. This needs to be performed at all stages of the publishing process.

Package upload

For copying up package builds to the master package server, we changed pbulk's publishing step to use --link-dest with the following commit:

https://github.com/joyent/pkgsrc/commit/c707841066d781be241c650ae271f2f3a0f9ef60

The basic overview is:

# Old method, files updated in-place.
rsync -av . primaryhost:/www/packages/SmartOS/2015Q4/i386

# New method, update to a hidden directory then switch into place, using hardlinks
# to avoid having to rsync files which haven't changed.
rsync -av --link-dest=/www/packages/SmartOS/2015Q4/i386 . \
    primaryhost:/.www/packages/SmartOS/2015Q4/i386
ssh primaryhost "mv /www/packages/SmartOS/2015Q4/i386 /www/packages/SmartOS/2015Q4/i386-old &&
          mv /.www/packages/SmartOS/2015Q4/i386 /www/packages/SmartOS/2015Q4/i386 &&
          rm -rf /www/packages/SmartOS/2015Q4/i386-old"

This method requires a few things:

  • The shadow directory must reside on the same file system to support hardlinks...
  • ...but should also be hidden to avoid being picked up by other mirrors
  • SSH access to the target host to perform the post-sync renames
  • Being very careful not to delete everything!
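
One way to confirm that the same-file-system requirement is actually being met is to check, after a sync, how many files in the shadow directory are hardlinks to the live copies. This is a rough sketch only: count_linked is a hypothetical helper, and it relies on the -ef file test, which bash, ksh and dash support.

```shell
# count_linked LIVE STAGING   (hypothetical helper)
# Prints "<linked> <total>": the number of regular files under STAGING,
# and how many of them are the same inode as the same-named file
# under LIVE, i.e. were hardlinked rather than copied.
count_linked() {
    live=$1
    staging=$2

    # List files relative to STAGING so the same path can be looked up
    # under LIVE.  The counting loop runs in its own subshell.
    (cd "$staging" && find . -type f) | (
        linked=0
        total=0
        while IFS= read -r f; do
            total=$((total + 1))
            # -ef is true only if both paths exist and refer to the
            # same device and inode.
            if [ "$staging/$f" -ef "$live/$f" ]; then
                linked=$((linked + 1))
            fi
        done
        echo "$linked $total"
    )
}
```

If the first number is zero after a sync in which most packages were unchanged, the shadow directory is probably on a different file system (or --link-dest points at the wrong path), and every sync is transferring and storing a full copy.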

Package mirror

The above takes care of the initial package upload, but all parts of the delivery process must use the same design, including mirrors. For the pkgsrc.joyent.com mirrors we changed the older push-based sync to a pull-based one. The method is similar, but as it is performed locally on the mirror we do not need SSH.

# Old method, files updated in-place.
rsync -av primaryhost:/www/packages/ /www/packages/

# New method (plus lots of error handling!)
rsync -av --link-dest=/www/packages primaryhost:/www/packages/ /.www/packages/
mv /www/packages /.www/packages-old
mv /.www/packages /www/packages
rm -rf /.www/packages-old

There is a tiny tiny race condition between the two mv operations, but we can live with that - it's certainly a lot smaller than the previous race which, with large directory updates, could be hours!
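
As an illustration of the kind of error handling the swap needs, here is a minimal sketch of the mv/mv/rm sequence wrapped in a function with a few guards. It is not the production script: swap_into_place and its argument names are hypothetical, and it assumes the live directory already exists and that the shadow directory sits on the same file system.

```shell
# swap_into_place LIVE STAGING   (hypothetical helper)
# Moves STAGING into LIVE's place and removes the previous LIVE.
swap_into_place() {
    live=${1:-}
    staging=${2:-}
    old="${staging}-old"

    # Never run "rm -rf" on a path built from an empty argument.
    if [ -z "$live" ] || [ -z "$staging" ]; then
        echo "usage: swap_into_place LIVE STAGING" >&2
        return 2
    fi
    # An empty or missing shadow directory means the sync did not
    # complete, so swapping would publish an empty repository.
    if [ ! -d "$staging" ] || [ -z "$(ls -A "$staging")" ]; then
        echo "refusing to swap: $staging is missing or empty" >&2
        return 1
    fi
    # A leftover -old directory means an earlier run did not finish.
    if [ -e "$old" ]; then
        echo "refusing to swap: $old is left over from an earlier run" >&2
        return 1
    fi

    mv "$live" "$old" || return 1
    if ! mv "$staging" "$live"; then
        # Put the previous contents back rather than leave LIVE missing.
        mv "$old" "$live"
        return 1
    fi
    rm -rf "$old"
}
```

With the paths from the example above this would be called as `swap_into_place /www/packages /.www/packages`.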

jperkin commented Apr 11, 2016

The mirror script we use (lightly edited)

#!/bin/bash

# Bail early on failures so that we don't risk corrupting the package dirs.
set -eux

PATH="/usr/bin"

#
# Using OpenSSH improves throughput on a single file from approximately 1MB/s
# to 12.5MB/s.  We can remove this once the mirrors are running a platform with
# OpenSSH, and rely on the primary already running it as sshd.
#
SSH="/opt/local/bin/ssh"

#
# Use a lock file to ensure we do not run multiple rsync processes from
# cron at the same time, as that could have rather bad consequences.
#
lockf="/var/tmp/.pkgsrc-rsync-lock"

cleanup()
{
    rm -f ${lockf}
    exit
}

# Send failure mail, clean up lockfile.
warn()
{
    echo "$*" | mailx -s \
        "rsync-pkgsrc failure (non-fatal) on $(uname -n)" pkgsrc@example.com
    cleanup
}

# Send failure mail, do not clean up lockfile.
fail()
{
    echo "$*" | mailx -s \
        "rsync-pkgsrc failure (fatal) on $(uname -n)" pkgsrc@example.com
    echo "fail" >${lockf}
    exit 1
}

if [ -f ${lockf} ]; then
    read pid < ${lockf}
    # Bail out either if the lockfile contains "fail" from a previously
    # failed run, or if a previous run is still running.
    if [ "${pid}" = "fail" ]; then
        exit
    fi
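    # Test ps directly in the "if": under "set -e" a bare failing ps
    # (i.e. a stale pid) would abort the script before $? was checked.
    if ps -p "${pid}" >/dev/null 2>&1; then
        exit
    fi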
fi

# Ok, no existing processes or failed runs, we're ready to go.
echo $$ >${lockf}

#
# Sync to a temporary directory, using hardlinks from the current live
# directory for unchanged files.  If this operation fails we can safely
# remove the lockfile as the subsequent run will continue where it left
# off, but send mail to investigate in case something is broken (disk
# full or whatever).
#
rsync -aq -e ${SSH} --delete --link-dest=/www/pkgsrc.example.com \
    pkgsrc.example.com:/www/pkgsrc.example.com/ \
    /www/.pkgsrc.example.com/ || warn "rsync failed"

#
# At this point the .dir will have the entire new contents, so we can move
# the live dir out of the way and move .dir into its place.
#
mv /www/pkgsrc.example.com /www/.pkgsrc.example.com-old || fail "mv to -old failed"
mv /www/.pkgsrc.example.com /www/pkgsrc.example.com || fail "mv .dir to live failed"

# We can now remove the old directory.  This may take a while.
rm -rf /www/.pkgsrc.example.com-old || fail "rm -old failed"

# All completed successfully, remove the lock file.
cleanup
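
When a failure mail arrives it helps to see what state the lock file is in before deciding whether to remove it. The following is a small sketch of that check, separate from the script above: lock_state is a hypothetical helper, and it uses kill -0 rather than the script's ps -p, which assumes it is run as the same user as the sync.

```shell
# lock_state LOCKFILE   (hypothetical helper)
# Prints one of:
#   free     no lock file exists
#   failed   a previous run hit a fatal error and left "fail" behind
#   running  the pid in the lock file is still alive
#   stale    the lock file names a process that no longer exists
lock_state() {
    lockf=$1

    if [ ! -f "$lockf" ]; then
        echo free
        return 0
    fi

    pid=
    # read returns non-zero on an empty file; treat that as stale below.
    read -r pid < "$lockf" || true

    case $pid in
        fail)
            echo failed
            ;;
        ''|*[!0-9]*)
            # Empty or non-numeric content cannot be a live pid.
            echo stale
            ;;
        *)
            # kill -0 sends no signal, it only checks the pid exists.
            if kill -0 "$pid" 2>/dev/null; then
                echo running
            else
                echo stale
            fi
            ;;
    esac
}
```

For example, `lock_state /var/tmp/.pkgsrc-rsync-lock` printing "failed" means the mirror will not sync again until the problem is fixed and the lock file is removed by hand.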
