Brainiarc7/transient-clustering-gnu-parallel-sshfs.md

## transient-clustering-gnu-parallel-sshfs.md

      
    Raw
  

              transient-clustering-gnu-parallel-sshfs.md
            
          
    Transient compute clustering with GNU Parallel and sshfs:

GNU Parallel is a multipurpose program for running shell commands in parallel, which can often be used to replace shell script loops,find -exec, and find | xargs. It provides the --sshlogin and --sshloginfile options to farm out jobs to multiple hosts, as well as options for sending and retrieving static resources and and per-job input and output files.
For any particular task, however, keeping track of which files need to pushed to and retrieved from the remote hosts is somewhat of a hassle. Furthermore, cancelled or failed runs can leave garbage on the remote hosts, and if input and output files are large, sending them to local disk on the remote hosts is somewhat inefficient.
In a traditional cluster, this problem would be solved by giving all nodes access to a shared filesystem, usually with NFS or something more exotic. However, NFS doesn't work particularly well for machines that aren't always connected to the same LAN, and is very difficult to implement securely unless you can prevent people from connecting their personal machines to your network.
By using SSHfs instead, we can give all nodes access to a shared filesystem with no more root-level configuration than installing sshfs and adding the relevant user to the fuse group. This makes it easy to build temporary computing clusters out of a motley assortment of desktops, laptops, and friends' machines booted from USB.
GNU Parallel Configuration:
Parallel is configured with files in ~/.parallel. Files under this directory can be used to add groups of options to Parallel's command line with the -J option. For example, if you have a file ~/.parallel/local:
--nice 10
--progress

then parallel -Jlocal will behave as if you had called parallel --nice 10 --progress. The files ~/.parallel/config and ~/.parallel/sshloginfile are special. The first is included by default in all invocations of parallel, and the second can be used by calling parallel --sshloginfile .. rather than providing the path.
We use this sshloginfile to list the hosts in our cluster, like so:
lily
ghasthawk

When Parallel is used to run jobs on remote machines, it invokes that machine's copy of itself, so ~/.parallel/config must not cause parallel to log into any remote machines. If it does, an infinite loop will result. Personally, I leave this file empty and use named profiles. For clustering, we use ~/.parallel/cluster:
--nice 10
--sshloginfile ..
--progress
--workdir .

The --workdir . option causes the working directory on the local machine to be used when running commands remotely. With a shared filesystem mounted at the same path on all nodes, this allows us to distribute jobs without explicitly pushing and retrieving files.
Cluster Filesystem Setup/Teardown:
The script below activates and deactivates the shared filesystem on all nodes. The cluster is started by running clustermode.bash up on the master node, and stopped by running clustermode.bash down. The script must be executable and located somewhere on $PATH on all nodes. On my systems, this is achieved by putting it in my personal bin directory, which is opportunistically synced by git along with the rest of my config files.
We mount the master's home directory on all nodes. This allows you to use the cluster without copying your data files into a particular place. Also, the arcfour (RC4) cipher is used to reduce overhead.
#!/bin/bash

# clustermode.bash:
# Set up a shared sshfs file system in the same path (relative to $HOME)

set -ue

usage_message="Usage: clustermode.bash <up|down>"

# If script is called without specifying a master node, assume it's us.
master=${2:-$(hostname)}
sharedir="${HOME}/sshfs/${master}"

up ()
{
    # Ensure mountpoint exists.
    [ -d "$sharedir" ] || mkdir -p "$sharedir"
    # Mount shared fs; nohup required to avoid immediate unmount.
    nohup sshfs \
       -o Cipher=arcfour \
       -o follow_symlinks \
       "${master}:" "$sharedir" \
       >/dev/null
    # If we are the master node:
    if [[ "$(hostname)" == "$master" ]]; then
        for host in $(grep -v $master "${HOME}/.parallel/sshloginfile"); do
            # Recurse to connect to master node; -t required for passwords.
            # source ~/.profile required to get this script in $PATH.
            ssh -t $host "source ~/.profile; clustermode.bash up ${master}" \
                || echo "$host unreachable"
        done
        # Switch to current path in the shared filesystem.
        wd=$(pwd)
        cd "${sharedir}/${wd#${HOME}}"
        exec "$SHELL"
    fi
}

down ()
{
    fusermount -u "$sharedir"
    if [[ "$(hostname)" == "$master" ]]; then
        for host in $(grep -v $master "${HOME}/.parallel/sshloginfile"); do
            ssh -t $host "source ~/.profile; clustermode.bash down ${master}" \
                || echo "Could not deactivate ${host}."
        done
    fi
}


if [[ $# -eq 0 ]];then
    echo "$usage_message"
    exit 1
fi

if [[ "$1" == "up" ]]; then
    up
elif [[ "$1" == "down" ]]; then
    down
else
    echo "$usage_message"
fi

Running Something:
Once you have the shared filesystem set up, you can run distributed jobs in any directory beneath the mount point. Personally, I like to write a short shell script to specify exactly what each chunk of work is supposed to do, and then use parallel to apply it across the files, as shown in the ffmpeg usage script below:
#!/bin/bash

file="$1"

ffmpeg -i "$file" \
    -loglevel error \
    -c:a copy \
    -c:v libx264 \
    -profile:v high \
    -preset veryslow \
    -tune animation \
    -crf 20 \
    "./encoded/${file%.*}.mkv"

Because ffmpeg already uses all the cores on a node, we use the -j1 option so only one job runs on each node at once.
ls *.mp4 | parallel -Jcluster -j1 ./encode.bash "{}"

If you have everything set up right, you should get something like this:
Computers / CPU cores / Max jobs to run
1:ghasthawk / 2 / 1
2:lily / 2 / 1

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ghasthawk:1/14/50%/3594.8s  lily:1/14/50%/3594.8s 

Note: If you need to tune this for use with NVIDIA's NVENC based encoding on a cluster with multiple GPUs as illustrated here, change the ffmpeg command above as follows:
#!/bin/bash

file="$1"

ffmpeg -i "$file" \
    -loglevel error \
    -c:a copy \
    -filter:v hwupload_cuda,format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
    -profile:v high \
    -c:v h264_nvenc \
    -preset llhq -rc:v vbr_minqp -qmin:v 19 \
    "./encoded/${file%.*}.mkv"