Agave Apps: A Theory of Operations to Help with Debugging

Here are a few of the problems we're solving with Agave:

You have data. Either it's in your direct physical possession or you can access it via a URI. You have a specific process, analysis, or test you need to apply to that data (we're just going to say "analysis" from here on out). The analysis can be based on new code you will write, someone else's complete solution, or an integration of off-the-shelf parts. The analysis code can be of arbitrary complexity, ranging from a single-language binary to a script that calls other scripts and leverages the UNIX userland (sed/awk/grep/etc.). You need to collaborate on development and testing of the analysis code. You need to be able to run your analysis on the data reproducibly and retrieve information about those runs in the future. You need to be able to share the ability to perform this analysis with other people with diverse levels of computational science experience and comfort.

We've worked hard to make Agave approachable, but these are some truly nasty problems. So it's not uncommon for even experienced Agave users to run into roadblocks, especially when developing a new analysis application.

Agave app, job, and system objects are resource definitions that give the underlying platform the information it needs to marshal data, invoke analysis code, and store the results. Let's examine app definitions, as that's the sharpest pain point for most users right now.

Applications

Let's assume your analysis code is in a file named code.py, and also that your code needs two pieces of information to do its work. First, it needs to know the name of the input file (let's say it is mydata.txt for now). Second, it needs the value of a parameter we'll call alpha. The objective of an Agave app is to provide enough detail that the Agave platform can automatically run your analysis code for you on data that you provide, with parameters that you have specified. To understand how to build a good Agave app, it's essential to understand how an Agave job works on a mechanistic level.
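To make this concrete, here is a minimal sketch of what running the analysis by hand might look like, assuming (purely for illustration) that code.py takes the input filename and the alpha value as positional arguments:

python code.py mydata.txt 0.5

An Agave app is, in essence, a recipe that lets the platform construct this command line for you from a job request.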

After a user has submitted a job request against a specific app, the following happens:

  1. Agave verifies that the user has access rights to all of the resources referred to in the request. It also checks that the physical resources (Maverick, Stampede, Cloud VM) are in service.
  2. It then sets up an ephemeral job directory on the Agave system referenced by the job. This is (almost always) the system to which the app has been deployed. This directory is considered a virtual root directory for the job. Its contents are defined by the contents of your app's deploymentPath, which you build and upload as part of creating your app.
  3. Agave copies in the data files specified by the job's inputs.
  4. Then, Agave writes a special file (you have probably seen them) with the suffix *.ipcexe. This is what actually gets run on the target system. The templatePath in your app definition is the template for this file. It's always written in Bash 4 for maximum portability. Agave substitutes values submitted in the job into the *.ipcexe file based on the input and parameter ids defined in the app definition (see the sketch of the resulting job directory after this list).
  5. Agave executes the ipcexe script (or submits it to a scheduler) and monitors the results.
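Putting steps 2 through 4 together, the ephemeral job directory might look roughly like this just before execution. This is a sketch only; the actual file names depend on your deploymentPath and on the job itself:

# A hypothetical listing of the job directory just before step 5:
#
#   code.py        copied in with the rest of the app's deploymentPath (step 2)
#   mydata.txt     staged in from the job's inputs (step 3)
#   myjob.ipcexe   rendered from templatePath with the job's values substituted (step 4)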

For the example we're considering here, the minimal template is:

python code.py ${input1} ${alpha}

If we have a proper app and job definition, it will be written into its ipcexe form as:

python code.py mydata.txt 0.5 

Dependencies

Besides helping to isolate jobs from one another, the Agave app model is intended to represent and package dependencies to ensure portability. We support multiple, non-exclusive ways of accomplishing that objective.

  1. Don't Even Try: Some users don't try to solve for portability at all. They use Agave apps strictly as scripting coordinators. This is fine, but it's not portable, and you're at the mercy of the sysadmins with respect to installed software versions.
  2. UNIX modules: Commonly used on shared systems, modules reconfigure the user's environment (paths, library locations, and so on) in a coordinated way. The assumption at play is that modules are already installed that can satisfy the dependencies of your analysis code.
  3. App deploymentPath: The full contents of an app's deploymentPath are copied into the temporary job directory. If the dependencies are binaries, it is up to the app builder to ensure portability and compatibility with the target system. The app builder also takes on all responsibility for setting up paths, permissions, etc. This is the most flexible option (see the sketch after this list).
  4. Containers: Agave doesn't yet have native container support. While that is under development, we provide some recipes for how to apply technique #3.
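As a concrete illustration of option #3, here is a sketch of a deploymentPath that bundles its own binaries. The file names here are hypothetical, and the path setup happens inside the wrapper template itself:

# Hypothetical deploymentPath layout:
#   wrapper.sh   <- the templatePath script
#   code.py      <- the analysis code
#   bin/         <- binaries built to be compatible with the target system
#
# Inside wrapper.sh, the app builder takes responsibility for paths:
export PATH="$PWD/bin:$PATH"
python code.py ${input1} ${alpha}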

Container-based Apps and Dependencies

Let's revisit code.py in the context of a containerized environment.

Instead of running code.py directly, the job of the *.ipcexe file is to invoke code.py inside a container. The current working directory (in this case, the Agave job directory) is mounted into the container as its working directory (WORKDIR), so the containerized code has access to the input files staged in by Agave as well as any other files that were in the deploymentPath directory. Assuming code.py lives in a container image named myapp:0.1.0, the template could look like:

docker run -v $PWD:/home myapp:0.1.0 python code.py ${input1} ${alpha}

But we don't want to restrict ourselves to Docker, since it's not supported on most HPC systems. So, the TACC team has provided a wrapper that can be bundled with your app (_util/container_exec.sh) and used to simplify the run setup:

container_exec myapp:0.1.0 python code.py ${input1} ${alpha}

In theory, this works out of the box with the Singularity container runtime supported on TACC's HPC systems. But there's a catch: mounting a volume into a container is an administrative process that we can't automatically delegate to end users. TACC does this automatically when a Singularity container is launched, but with the caveat that the following shadow mountpoints must exist in the container image: /scratch, /gpfs, /work, and /data. If you have built your Docker image from a TACC-provided base, these are pre-created and you'll never encounter a problem.
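For reference, on the HPC side the container_exec wrapper ends up invoking Singularity rather than Docker, roughly along these lines. This is a sketch of the general idea only, not the wrapper's exact behavior, and the docker:// URI assumes the image can be pulled from a registry:

singularity exec docker://myapp:0.1.0 python code.py ${input1} ${alpha}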

If you want to add these shadow mountpoints to your Docker image yourself, add this to your Dockerfile (ideally early on):

LABEL description="Additional root-level directories to avoid needing OverlayFS @ TACC HPC"
RUN mkdir -p /work && chown root:root /work
RUN mkdir -p /gpfs && chown root:root /gpfs
RUN mkdir -p /data && chown root:root /data
RUN mkdir -p /scratch && chown root:root /scratch
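A quick way to sanity-check an image before deploying your app is to confirm that those directories exist (this assumes the image contains a standard userland with ls):

docker run --rm myapp:0.1.0 ls -ld /scratch /gpfs /work /data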