
@mbreese
Created May 9, 2014 05:42
Using modules to version your processing pipeline

Environment Modules

Environment Modules is a utility that has been used to manage executables and paths on high-performance computing clusters for decades (since 1991!). The basic idea is that you use modules to adapt your processing environment (including $PATH) so that it stays consistent. Importantly, this lets system administrators install and maintain multiple versions of software for different users. The same tool can also manage your bioinformatics processing pipelines to ensure consistent analysis runs. For example, if a set of samples needs to be analyzed consistently over a long time span, you can use modules to make sure that the same version of a program is used throughout the entire experiment, while still using newer versions for other experiments.

Installation

If you are running your samples on a computing cluster, chances are you are already using modules to configure your environment (add programs to your path, etc). I'll first discuss installing modules from scratch using a FreeBSD system as the example. Next, I'll discuss using your own modules in the context of an existing cluster setup. Finally we'll look at how to add new modules to keep track of versions of your tools.

Installation from scratch

Environment Modules is likely available in your default OS package repository. For FreeBSD, it is in ports under the name "sysutils/modules", or as a downloadable package under the name "modules". For CentOS/RHEL, it is part of the EPEL repository under the name "environment-modules". I will assume that you have installed the software from the OS repository. If you need source-level installation instructions, see here: http://www.admin-magazine.com/HPC/Articles/Environment-Modules or https://github.com/hpcugent/easybuild/wiki/Installing-environment-modules-without-root-permissions

For FreeBSD, your command would be: pkg install modules. For CentOS (with EPEL enabled), you'd run: yum install environment-modules

Once the package is installed, the module command has to be enabled in your shell's initialization. The CentOS/RHEL EPEL package takes care of this for you. If you are using FreeBSD, you'll need to do it yourself, either on a per-user basis or for all users of the system. Basically, you'll need to add this line to your $HOME/.bashrc file (or the appropriate file for your shell):

source /usr/local/Modules/$VERSION/init/bash

There is a version of this for each type of shell, so choose the one for the shell you are using. Replace the path with the correct path for your installation.
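If the same .bashrc is shared between machines, some of which may not have Modules installed, it can help to guard the source line. The following is a minimal sketch; load_modules_init is a hypothetical helper name, and the path you pass to it is whatever init script your installation provides.

```shell
# Hypothetical helper: source a Modules init script only if it is readable.
# The path is passed in by the caller because it depends on your installation.
load_modules_init() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "Modules init script not found: $1" >&2
        return 1
    fi
}

# Example call from $HOME/.bashrc (replace the path with your own):
#   load_modules_init "/usr/local/Modules/$VERSION/init/bash"
```

With this in place, a missing init script produces a one-line warning at login instead of a shell error.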

At this point, you can test whether the installation worked by logging out, logging back in, and running module. You should see a list of possible commands. To see which modules are currently available, run module avail.
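The check above can also be scripted, for example at the top of a pipeline script. This is a small sketch; check_module_command is a hypothetical name, and it only relies on the standard shell built-in type.

```shell
# Report whether the `module` command is defined in the current shell.
check_module_command() {
    if type module >/dev/null 2>&1; then
        echo "module command found"
    else
        echo "module command not found; check your shell initialization"
        return 1
    fi
}
```

Because module is usually a shell function rather than a binary, testing with type is more reliable than looking for an executable on $PATH.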

Adding a custom module path

If you want to use a specific path for your managed modules for all users, you'll want to edit the file: $MODULEHOME/$MODULE_VERSION/init/.modulespath. To add a new directory for modules, just add the pathname to the end of this file.
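As a rough sketch, the append can be made idempotent so that running it twice does not create duplicate entries. add_modulepath_entry is a hypothetical helper name; the file is passed as an argument because $MODULEHOME and $MODULE_VERSION depend on your installation.

```shell
# Hypothetical helper: append a directory to a .modulespath-style file
# unless that exact line is already present.
add_modulepath_entry() {
    file="$1"
    dir="$2"
    # -x matches whole lines only, -F treats the directory as a fixed string
    if ! grep -qxF -- "$dir" "$file" 2>/dev/null; then
        printf '%s\n' "$dir" >> "$file"
    fi
}

# Example (typically needs root):
#   add_modulepath_entry "$MODULEHOME/$MODULE_VERSION/init/.modulespath" /opt/modulefiles
```

The /opt/modulefiles directory in the example is only a placeholder.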

If you are trying to add a personal module path without root permissions (such as on an existing HPC cluster), you can use the command module use $HOME/.local/mysoftware. This can be set up in your shell initialization or in the file $HOME/.modulerc:

#%Module
module use $HOME/.local/mysoftware

It's important to note that this custom path isn't necessarily the path that contains all of your programs. Rather, this path contains the module definitions that tell the system which programs and versions are available for loading. You can put the programs themselves in another location.
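If you go the shell-initialization route, a guarded version avoids errors on machines where the directory or the module command is missing. This is a sketch only; use_personal_modules is a hypothetical helper name.

```shell
# Hypothetical helper: register a personal modulefile directory, but only
# if the directory exists and the `module` command is available.
use_personal_modules() {
    dir="$1"
    if [ -d "$dir" ] && type module >/dev/null 2>&1; then
        module use "$dir"
    fi
}

# Example call from $HOME/.bashrc:
#   use_personal_modules "$HOME/.local/mysoftware"
```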

Note: If you can't see your custom module, perhaps one of the earlier entries in $MODULEHOME/$MODULE_VERSION/init/.modulespath is causing an error. In the stock FreeBSD setup, this file contains /usr/local/lib, which causes the program to segfault. If you comment out this line, then the program will keep going and show your custom module path.

Testing

To test that your custom path is configured properly, copy the null module file from the primary module path (check $MODULEHOME/$MODULE_VERSION/modulefiles) to your custom path. This is an empty module definition file that can be used to demonstrate that your custom path is working.

Once you've added the null module, run module avail. You should see your new custom module path in the list with the null module listed.
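The copy step can be sketched as a small function. install_null_module is a hypothetical name, and both directories are supplied by the caller, since the location of the stock modulefiles depends on your installation.

```shell
# Hypothetical helper: copy the stock "null" modulefile into a personal
# modulefile directory, creating that directory if needed.
install_null_module() {
    src_dir="$1"   # primary modulefile directory (should contain "null")
    dest_dir="$2"  # personal modulefile directory
    if [ ! -f "$src_dir/null" ]; then
        echo "no null modulefile in $src_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$src_dir/null" "$dest_dir/null"
}

# Example:
#   install_null_module "$MODULEHOME/$MODULE_VERSION/modulefiles" "$HOME/.local/mysoftware"
#   module avail
```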

Adding a new module

The first module that we will add is a simple one that will set the directory that will store our programs. Here, I've chosen the following directory setup:

$HOME/.local/modules
$HOME/.local/modules/modulefiles

All of the programs will go into $HOME/.local/modules, whereas all of the definition files will be in $HOME/.local/modules/modulefiles. The modulefiles directory has been added to the module path with module use $HOME/.local/modules/modulefiles.

For example, for a program named 'foo' with a version of 0.1.2, all of the program's files would go in $HOME/.local/modules/foo/foo-0.1.2, and the definition file would be named $HOME/.local/modules/modulefiles/foo/0.1.2.

The program file directory can be set up any way that you'd like. There is a lot of flexibility in how you set up each module. What I'm going to do is have the following folders: bin, man, and build. Binary files go in bin, man pages go in man, and we will use build to store the downloaded source code and intermediate files.

$HOME/.local/modules/foo
$HOME/.local/modules/foo/foo-0.1.2
$HOME/.local/modules/foo/foo-0.1.2/bin
$HOME/.local/modules/foo/foo-0.1.2/build
$HOME/.local/modules/foo/foo-0.1.2/man

$HOME/.local/modules/modulefiles/foo/0.1.2
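The layout above can be created in one step. This is a minimal sketch; make_module_layout is a hypothetical helper name, and the bin/build/man split simply mirrors the convention chosen in this write-up.

```shell
# Hypothetical helper: create the directory layout for one program version.
# Prints the install prefix so it can be reused, e.g. when building from source.
make_module_layout() {
    root="$1"      # e.g. $HOME/.local/modules
    name="$2"      # e.g. foo
    version="$3"   # e.g. 0.1.2
    prefix="$root/$name/$name-$version"
    mkdir -p "$prefix/bin" "$prefix/build" "$prefix/man" "$root/modulefiles/$name"
    echo "$prefix"
}

# Example:
#   prefix=$(make_module_layout "$HOME/.local/modules" foo 0.1.2)
```

The definition file itself (modulefiles/foo/0.1.2) is not created here; only its parent directory is.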

If you use a directory structure like this, the only tricky part is the definition file $HOME/.local/modules/modulefiles/foo/0.1.2.

Here is an example of what that file could look like. This example simply adds $HOME/.local/modules/foo/foo-0.1.2/bin to your $PATH (and the matching man directory to your $MANPATH):

#%Module1.0#################################################################
##
## foo-0.1.2
##
proc ModulesHelp { } {
    global version

    puts stderr "\tfoo is a program"
    puts stderr "\n\tVersion $version\n"
}

module-whatis "Foo does something..."

conflict foo

# for Tcl script use only
set version "0.1.2"

# modulefiles are Tcl, so the home directory is read from the env array
# rather than from a shell-style $HOME
prepend-path PATH $env(HOME)/.local/modules/foo/foo-0.1.2/bin
prepend-path MANPATH $env(HOME)/.local/modules/foo/foo-0.1.2/man

# report what happened when the module is loaded, switched to, or removed
if [module-info mode load] {
        puts stderr "foo version $version loaded."
}

if [module-info mode switch2] {
        puts stderr "foo version $version loaded."
}

if [module-info mode remove] {
        puts stderr "foo version $version unloaded."
}
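When you install a new version of a tool, you need a new definition file next to the old one. As a rough sketch, a stripped-down modulefile can be generated from the shell. write_modulefile is a hypothetical helper name; it writes absolute paths into the file and leaves out the help text and load messages shown above.

```shell
# Hypothetical helper: write a minimal modulefile for one program version.
# Prints the path of the file it created.
write_modulefile() {
    root="$1"      # e.g. $HOME/.local/modules
    name="$2"      # e.g. foo
    version="$3"   # e.g. 0.1.2
    dest="$root/modulefiles/$name/$version"
    mkdir -p "$root/modulefiles/$name"
    # Unquoted heredoc: the shell expands $root, $name and $version now,
    # so the generated Tcl file contains plain absolute paths.
    cat > "$dest" <<EOF
#%Module1.0
## $name-$version (generated sketch)
module-whatis "$name version $version"
conflict $name
prepend-path PATH $root/$name/$name-$version/bin
prepend-path MANPATH $root/$name/$name-$version/man
EOF
    echo "$dest"
}

# Example:
#   write_modulefile "$HOME/.local/modules" foo 0.1.2
```

A pipeline script can then pin the version it was developed against by calling module load foo/0.1.2 before running foo, while other experiments load a newer version.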