

@sfletc
Last active September 18, 2017 05:56

Install the SCRAM pipeline (including some useful tools) via Docker


What is Docker (and why is it useful for bioinformatics)?

Long story short - if someone has already packaged the software for you, it takes the pain out of installing bioinformatics software.

Docker lets you download images and run containers on a Windows, Mac or Linux PC. All the software for a bioinformatics workflow can be installed into an image and tested before it's shared. It doesn't matter that the software is designed for Linux and you're using a Windows PC: it'll run!

But what is an image or a container? If you take a snapshot of your PC's hard drive, that's essentially an image: it has an OS, whatever software you have installed, and some data files. When you 'run' an image, a container is created. This is similar to a running PC: it can be interacted with, and its installed software used. Instead of being a discrete physical machine, though, the container is virtualized. One image can be used to create multiple containers, which can run simultaneously on a single PC. Often containers are deleted once the task they are running is complete.
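In Docker terms, the image-versus-container distinction looks something like the sketch below. It assumes the SCRAM image has already been pulled (covered further down) and that you're in a Mac or Linux terminal; the container names scram_demo_1 and scram_demo_2 are made up for illustration. It creates two containers from the one image without starting them, lists them, then deletes them, leaving the image itself untouched.

```shell
#!/bin/sh
# Illustrates "one image, many containers" with the SCRAM image.
IMAGE="sfletcher/scram_docker"

demo_containers() {
    # Create (but don't start) two containers from the same image.
    # The names are hypothetical, chosen only for this demo.
    docker create --name scram_demo_1 "$IMAGE" >/dev/null
    docker create --name scram_demo_2 "$IMAGE" >/dev/null
    # List every container (running or stopped) created from this image.
    docker ps -a --filter "ancestor=$IMAGE"
    # Delete both containers; the image itself stays on disk.
    docker rm scram_demo_1 scram_demo_2 >/dev/null
}

# Only run the demo if Docker is usable and the image is already local
# (docker create would otherwise trigger a large download).
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    demo_containers
fi
```

Day to day you rarely do this by hand: docker run --rm creates a container, runs it, and deletes it again when it exits.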

Red Hat has a nice outline:

Container tools, including Docker, provide an image-based deployment model. This makes it easy to share an application, or set of services, with all of their dependencies across multiple environments. Docker also automates deploying the application (or combined sets of processes that make up an app) inside this container environment.

Rather than installing and troubleshooting software for QC, adapter trimming, read alignment, visualization and electronic lab-book keeping, you use one command to download a pre-configured image and another to start the container.

Why is Docker great for bioinformatics? It makes installing complex pipelines simple and hassle-free (not that the SCRAM pipeline is overly complex). It also supports reproducible science: the same software versions can be used across multiple systems, so a given set of inputs should produce the same outputs wherever it is run. For a publication, not only can the read files be made available online (in the SRA, for example), but the entire pipeline that generated the data for the paper can also be stored in the cloud, allowing anyone to reproduce the results, spot process errors, and suggest improvements.
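One way to pin down "the same software versions" in a methods section is to record the image's registry digest, which identifies one exact build even if the image's tag is later updated. A minimal sketch, assuming the image has already been pulled and a Mac/Linux shell; image_digest is a hypothetical helper name.

```shell
#!/bin/sh
# Record exactly which build of the SCRAM image produced a set of results.
IMAGE="sfletcher/scram_docker"

# Print the image's registry digest (name@sha256:...), which identifies a
# single build regardless of later updates to the tag.
image_digest() {
    docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    digest=$(image_digest "$IMAGE")
    echo "Results generated with $digest"
    # Anyone can later fetch exactly this build with:
    #   docker pull "$digest"
fi
```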

The SCRAM Docker image has the following packages installed (among others). Remember to cite the package authors if you use them!

  1. The SCRAM aligner and plotter
  2. Jupyter Notebook
  3. FastQC
  4. FastX-toolkit
  5. Blast+

The SCRAM Dockerfile (used to build the image) is here if you want to take a look.

Prerequisites

A mid-range PC or better (laptop or desktop), ideally with at least 8GB RAM, though 16GB+ is better for larger projects. Remember, unless it's a Linux machine, the PC is running both its own OS and the SCRAM Docker container (with its own Linux OS) at the same time.

Linux machines (e.g. running Ubuntu) are the easiest. Next are Windows 10 Pro (or similar) machines and Macs running OS X, which have 1-click installs. Windows 7 and (I think) Windows 10 Home machines require a bit more playing around (Docker Toolbox and VirtualBox instead of Hyper-V).
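On Linux or a Mac, a quick way to see where your machine sits relative to those RAM suggestions is sketched below. It's a rough check, not part of SCRAM: it reads MemTotal from /proc/meminfo on Linux or hw.memsize via sysctl on macOS, and rounds to the nearest GB because the OS reserves a little memory, so an "8GB" machine typically reports slightly less. The function names are hypothetical.

```shell
#!/bin/sh
# Rough check of installed RAM against the 8GB minimum / 16GB+ suggestion.

# Round a size in kB to the nearest whole GB (GiB).
kb_to_gb() {
    echo $(( ($1 + 524288) / 1048576 ))
}

# Print advice for a total RAM size given in kB.
ram_advice() {
    gb=$(kb_to_gb "$1")
    if [ "$gb" -lt 8 ]; then
        echo "${gb}GB: below the suggested 8GB minimum"
    elif [ "$gb" -lt 16 ]; then
        echo "${gb}GB: OK, but 16GB+ is better for larger projects"
    else
        echo "${gb}GB: good for larger projects"
    fi
}

# Total physical RAM in kB (Linux: /proc/meminfo; macOS: sysctl, in bytes).
total_ram_kb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ {print $2}' /proc/meminfo
    elif [ "$(uname)" = "Darwin" ]; then
        echo $(( $(sysctl -n hw.memsize) / 1024 ))
    fi
}

kb=$(total_ram_kb)
if [ -n "$kb" ]; then
    ram_advice "$kb"
fi
```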

Install Docker CE

Ubuntu

Via the Terminal:

sudo apt-get install docker.io

You'll need to prefix docker commands with sudo, or follow this guide to run Docker as a non-root user.
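Once it's installed, a small check like the one below can tell you whether Docker is usable from your account. It's a sketch only: docker_status is a hypothetical helper, and it treats a failed docker info as "daemon stopped or permission denied", which is the usual cause on a fresh Ubuntu install. The commented usermod line is the standard way to add yourself to the docker group.

```shell
#!/bin/sh
# Report whether the docker command works for the current user.
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "not installed"
    elif docker info >/dev/null 2>&1; then
        echo "ready"
    else
        # Usually: the daemon isn't running, or this user lacks permission
        # to use it (re-run with sudo, or join the docker group).
        echo "installed, but not usable without sudo (or daemon stopped)"
    fi
}

echo "Docker: $(docker_status)"

# To run docker without sudo, add yourself to the docker group, then log
# out and back in (note: docker group members effectively have root access):
#   sudo usermod -aG docker "$USER"
```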

OS X (Apple Mac)

Full instructions are here

Windows 10 (Pro or Education)

Full instructions are here

Windows 10 (Home) or Windows 7/8

Unfortunately it's a bit more work: full instructions are here.

Start Docker for the first time (Windows 10 and Mac - skip for Linux)

After starting Docker, right-click the Docker icon in the taskbar (Windows) or menu bar (Mac) and click Settings (Preferences on a Mac). You'll probably want to untick the "Start Docker when you log in" check box.

Click on the Shared Drives tab (left), and ensure the drive with your project data is shared. Sometimes antivirus products (like Kaspersky) can interfere, so their settings may need altering.

Next, click the Advanced tab and give Docker sufficient CPUs and memory (RAM). Allocating all CPUs, and all memory except roughly 4GB left for the host OS, should be OK, but check how much RAM the host OS and background processes actually use if this is an issue.

Multiple Docker images (and containers, if you keep them) can take up a lot of disk space, so make sure there is plenty of free space.
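To confirm the Advanced-tab settings have taken effect, the Docker CLI can report the CPUs and memory available to containers. This is a sketch for a Mac (or Linux) terminal; describe_resources is a hypothetical helper. On Windows and Mac the figures reflect Docker's virtual machine allocation rather than the host's total RAM.

```shell
#!/bin/sh
# Show the CPUs and memory Docker has been given.

# Format a CPU count and a memory size in bytes as readable text.
describe_resources() {
    awk -v cpus="$1" -v bytes="$2" \
        'BEGIN { printf "%d CPUs, %.1f GB RAM available to Docker\n", cpus, bytes / 1073741824 }'
}

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # NCPU and MemTotal (in bytes) are fields of `docker info`; the output is
    # deliberately left unquoted so it splits into two arguments.
    describe_resources $(docker info --format '{{.NCPU}} {{.MemTotal}}')
fi
```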
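Docker's own commands can show, and reclaim, that disk space. A sketch for a Mac/Linux terminal; docker_disk_report and docker_tidy are hypothetical helper names. docker_tidy is deliberately not run automatically, since pruning permanently deletes stopped containers.

```shell
#!/bin/sh
# Summarise how much disk space images, containers and volumes use.
docker_disk_report() {
    docker system df
}

# Remove stopped containers and dangling (untagged) images.
# -f skips the confirmation prompt; tagged images such as
# sfletcher/scram_docker are kept.
docker_tidy() {
    docker container prune -f
    docker image prune -f
}

# Only report here; call docker_tidy yourself when you want space back.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    docker_disk_report
fi
```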

Download the SCRAM docker image

With Docker running in the background, open a terminal (Mac and Linux), PowerShell (Windows 10 Pro), or the Docker terminal (Windows 10 Home, Windows 7/8) and enter:

docker pull sfletcher/scram_docker

sudo may be needed on Linux. A decent internet connection is required, as there is a fair bit to download. If the image is updated later, re-running docker pull will usually download only the changed parts (layers) rather than the whole image.

That's it! Running SCRAM will be covered in the next gist.
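After the download finishes, the sketch below checks that the image is available locally and shows its tag, size and creation date. It assumes a Mac/Linux terminal (add sudo on Linux if needed); image_present is a hypothetical helper.

```shell
#!/bin/sh
# Confirm the SCRAM image has been pulled.
IMAGE="sfletcher/scram_docker"

# Succeeds (exit status 0) if the image exists locally.
image_present() {
    docker image inspect "$1" >/dev/null 2>&1
}

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    if image_present "$IMAGE"; then
        # Lists repository, tag, image ID, creation date and size.
        docker images "$IMAGE"
    else
        echo "$IMAGE not found locally; run: docker pull $IMAGE"
    fi
fi
```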
