michaelbarton/specification.mkd

## specification.mkd

      
Introduction

Purpose

The purpose of this document is to provide a detailed specification for
developers writing community-standardised bioinformatics containers. The
intended audience is bioinformaticians writing bioinformatics software that can
be shared interchangeably using Linux containers. This document describes the
interfaces that a developer may expect to be available when the container is
run.

Scope of Project

The scope of this document is all bioinformatics software that is packaged
within a Linux container for sharing. Bioinformatics software in a Linux
container can easily be shared because all dependencies should be included with
the container. Examples of bioinformatics software are genome assemblers, read
binners and read aligners. Examples of container software are Docker and LXC.
The aim of standardising bioinformatics software in containers is to allow the
containers to be used interchangeably, through the same interface, across
different research groups and institutions.

Applications of this interface are:

* A developer uploads their short read aligner as a container to an online
  repository for others to use. A biologist downloads this aligner and is
  able to use it immediately because it follows a standardised interface that
  the biologist is already familiar with.
* A genome assembly benchmarking service downloads many containerised genome
  assemblers and compares them with each other using different performance
  metrics. The standardised interface allows all containers to be tested in
  the same way.

References

IEEE. IEEE Std 830-1998 IEEE Recommended Practice for Software Requirements
Specifications. IEEE Computer Society, 1998.

Specific Requirements

Functional Requirements

Generic bioinformatics container

Introduction

This specification describes the required inputs for generic containerised
bioinformatics software.

Inputs


* Proc: The argument given to start a container MUST be a single string
  containing only the characters a-z and '-'. This argument is used to
  distinguish between the different combinations of settings that the
  containerised software can be run with.
* INPUT_DIR: The variable INPUT_DIR MUST be present inside the
  container environment. This will be the absolute path to a read-only
  directory containing the files required for running the software.
* OUTPUT_DIR: The variable OUTPUT_DIR MUST be present inside the
  container environment. This will be the absolute path to a writable
  directory. Completed files should be written to this location.
* LOG_FILE: The variable LOG_FILE MUST be present in the container
  environment. This will be a path relative to OUTPUT_DIR. The developer
  should use this location to write any information about the process that
  may be useful for information or debugging purposes.
* AVAIL_CPUS: The variable AVAIL_CPUS MUST be present in the
  container environment. This describes the number of CPUs that have been
  provided to the container.
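
As an illustration of the inputs listed above, the following is a minimal Python sketch of how a container entrypoint might check them before starting any work. It is not part of the specification. The names `read_container_inputs` and `ContainerInputs` are hypothetical, and two assumptions go beyond the text: that Proc arrives as the only command-line argument, and that AVAIL_CPUS holds a decimal integer.

```python
import os
import re
from dataclasses import dataclass

# Proc may contain only the characters a-z and '-'.
PROC_PATTERN = re.compile(r"[a-z-]+")

# Environment variables that MUST be present inside the container.
REQUIRED_VARS = ("INPUT_DIR", "OUTPUT_DIR", "LOG_FILE", "AVAIL_CPUS")


@dataclass
class ContainerInputs:
    """Hypothetical holder for the validated inputs."""
    proc: str
    input_dir: str
    output_dir: str
    log_path: str  # LOG_FILE resolved against OUTPUT_DIR
    avail_cpus: int


def read_container_inputs(args, environ):
    """Validate the Proc argument and the required environment variables."""
    # Assumption: Proc is passed as the single command-line argument.
    if len(args) != 1 or not PROC_PATTERN.fullmatch(args[0]):
        raise ValueError("expected a single Proc argument made of a-z and '-'")

    missing = [name for name in REQUIRED_VARS if name not in environ]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    output_dir = environ["OUTPUT_DIR"]
    return ContainerInputs(
        proc=args[0],
        input_dir=environ["INPUT_DIR"],
        output_dir=output_dir,
        # LOG_FILE is given as a path relative to OUTPUT_DIR.
        log_path=os.path.join(output_dir, environ["LOG_FILE"]),
        # Assumption: AVAIL_CPUS is a decimal integer string.
        avail_cpus=int(environ["AVAIL_CPUS"]),
    )
```

Inside an entrypoint script this could be called as `read_container_inputs(sys.argv[1:], os.environ)`, so that a malformed Proc or a missing variable is reported before the software itself starts.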

Outputs

All files that are generated by running the container should be created in the
directory specified by OUTPUT_DIR. This directory should be a mounted volume in
the container so that the data is available on the host file system after the
container completes. The containerised software should return a zero exit code
on successful completion, and a non-zero exit code when an error occurs.
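
The exit code convention above can be sketched in Python as follows. This is an illustrative example under stated assumptions rather than a required implementation: `run_and_report` and its `task` callable are hypothetical names, and the specification does not prescribe the choice of 1 as the non-zero code or the wording of the log messages.

```python
import traceback


def run_and_report(task, output_dir, log_path):
    """Run task(output_dir); return 0 on success and 1 on any error."""
    try:
        # The task is expected to create all of its files under output_dir.
        task(output_dir)
    except Exception:
        # Record the failure for debugging, then signal it via the exit code.
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
        return 1

    with open(log_path, "a") as log:
        log.write("completed successfully\n")
    return 0
```

An entrypoint could finish with `sys.exit(run_and_report(task, output_dir, log_path))`, so that the host or a benchmarking service can tell success from failure from the exit status alone.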