@michaelbarton
Last active August 29, 2015 14:11
Specification for bioinformatics containers

Introduction

Purpose

The purpose of this document is to provide a detailed specification for developers writing community-standardised bioinformatics containers. The intended audience is bioinformaticians writing bioinformatics code that can be shared interchangeably using Linux containers. This document describes the interfaces that a developer may expect to be available when the container is run.

Scope of Project

The scope of this document is all bioinformatics software that is packaged within a Linux container for sharing. Bioinformatics software in a Linux container can easily be shared, as all dependencies should be included with the container. Examples of bioinformatics software are genome assemblers, read binners and read aligners. Examples of container software are Docker and LXC. The aim of standardising bioinformatics software in containers is that different research groups and institutions may use the containers interchangeably through the same interface.

Applications of this interface are:

  • A developer uploads a short read aligner as a container to an online repository for others to use. A biologist downloads this aligner and is able to use it immediately, as it follows a standardised interface that the biologist is already familiar with.
  • A genome assembly benchmarking service downloads many containerised genome assemblers and compares them with each other using different performance metrics. The standardised interface allows all containers to be tested in the same way.

References

IEEE. IEEE Std 830-1998 IEEE Recommended Practice for Software Requirements Specifications. IEEE Computer Society, 1998.

Specific Requirements

Functional Requirements

Generic bioinformatics container

Introduction

This specification describes the required inputs for generic containerised bioinformatics software.

Inputs

  • Proc: The argument given to start a container MUST be a single string containing only the characters a-z and '-'. This argument is used to differentiate between the combinations of settings that the containerised software can be run with.
  • INPUT_DIR: The variable INPUT_DIR MUST be present inside the container environment. This will be the absolute path to a read-only directory containing the files required for running the software.
  • OUTPUT_DIR: The variable OUTPUT_DIR MUST be present inside the container environment. This will be the absolute path to a writable directory. Completed files should be written to this location.
  • LOG_FILE: The variable LOG_FILE MUST be present in the container environment. This will be a path relative to OUTPUT_DIR. The developer should use this location to write any details about the process that may be useful for reporting or debugging purposes.
  • AVAIL_CPUS: The variable AVAIL_CPUS MUST be present in the container environment. This describes the number of CPUs that have been provided to the container.
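
As an illustration of how a container entry point might consume the inputs listed above, the following is a minimal Python sketch, not part of the specification itself. The names `read_inputs` and `ContainerInputs` are hypothetical. The sketch also makes assumptions the specification does not state: that an empty Proc string is rejected, and that AVAIL_CPUS is given as a decimal integer.

```python
import posixpath
import re
from dataclasses import dataclass

# Proc may contain only the characters a-z and '-'.
# Assumption: an empty string is not a valid Proc.
PROC_PATTERN = re.compile(r"[a-z-]+")

REQUIRED_VARIABLES = ("INPUT_DIR", "OUTPUT_DIR", "LOG_FILE", "AVAIL_CPUS")


@dataclass(frozen=True)
class ContainerInputs:
    """Hypothetical holder for the validated container inputs."""
    proc: str
    input_dir: str
    output_dir: str
    log_file: str  # LOG_FILE resolved against OUTPUT_DIR
    avail_cpus: int


def read_inputs(argv, environ):
    """Validate the Proc argument and the required environment variables."""
    if len(argv) != 1:
        raise ValueError("expected exactly one Proc argument")
    proc = argv[0]
    if not PROC_PATTERN.fullmatch(proc):
        raise ValueError("Proc may contain only the characters a-z and '-'")

    missing = [name for name in REQUIRED_VARIABLES if name not in environ]
    if missing:
        raise ValueError("missing variables: " + ", ".join(missing))

    # INPUT_DIR and OUTPUT_DIR are specified as absolute paths.
    for name in ("INPUT_DIR", "OUTPUT_DIR"):
        if not posixpath.isabs(environ[name]):
            raise ValueError(name + " must be an absolute path")

    return ContainerInputs(
        proc=proc,
        input_dir=environ["INPUT_DIR"],
        output_dir=environ["OUTPUT_DIR"],
        # LOG_FILE is specified as a path relative to OUTPUT_DIR.
        log_file=posixpath.join(environ["OUTPUT_DIR"], environ["LOG_FILE"]),
        # Assumption: AVAIL_CPUS is a decimal integer.
        avail_cpus=int(environ["AVAIL_CPUS"]),
    )
```

In a real entry point this could be called as `read_inputs(sys.argv[1:], os.environ)`. Failing early on a missing variable or a malformed Proc keeps interface errors separate from errors in the bioinformatics software itself.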

Outputs

All files that are generated by running the container should be created in the directory specified by OUTPUT_DIR. This should be a mounted volume in the container so that this data is available on the host file system after the container completes. The containerised software should return a zero exit code when completing successfully, and return a non-zero exit code when an error occurs.
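
The exit-code convention described above could be sketched in Python as follows. This is only an illustration: `run_task` is a hypothetical helper, `task` stands for whatever callable wraps the containerised software, and the choice of 1 as the failure code is an assumption, since the specification only requires a non-zero value.

```python
import os
import traceback


def run_task(task, output_dir, log_path):
    """Run task(output_dir) and return an exit code for the container.

    Returns 0 on success. On any exception, the traceback is appended
    to log_path and 1 is returned (assumed non-zero failure code).
    """
    try:
        # The task is expected to create all of its files in output_dir.
        task(output_dir)
    except Exception:
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
        return 1
    return 0
```

An entry point would then finish with something like `sys.exit(run_task(my_task, os.environ["OUTPUT_DIR"], log_path))`, where `my_task` and `log_path` are placeholders, so that the host can distinguish success from failure by the exit code alone.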
