@chrisamiller
Last active September 22, 2020 13:40
notes from the 2019.09.23 bioinformatics workshop on LSF and Docker

LSF basics

Interactive Jobs

Let's get a job on an interactive blade

bsub -Is -M 2000000 -R 'select[mem>2000] rusage[mem=2000]' -n 1 -q docker-interactive -a 'docker(chrisamiller/docker-genomic-analysis)' /bin/bash

Let's notice some things:

  1. all of our data volumes are mounted and accessible
  2. tools that weren't previously available are there now
  3. we used the docker-interactive queue (slides)
  • bjobs lets you see running jobs
  • bjobs -l lets you see all the gory details

There are shortcuts for getting interactive jobs that don't require quite so much typing

docker-interactive chrisamiller/docker-genomic-analysis

gives an instance with 4G of RAM and 1 core

gsub is a special alias for doing things in GMS (MGI pipelines and tools). We'll talk about that more in a future session.

isub is a similar alias that you can set up, with similar syntax. See: /gscuser/cmiller/usr/bin/isub -h

Non-interactive jobs

When you run a program, you often get two types of output

  • to your files (often through STDOUT)
  • to your screen (generally through STDERR)

With LSF, it's the same concept, but stdout/stderr go to files instead
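To make the two streams concrete, here is a minimal local sketch (no LSF involved). The function emit_both is a hypothetical stand-in for any program that writes to both stdout and stderr; the file names are placeholders too.

```shell
# emit_both is a hypothetical stand-in for a program that writes to both streams.
emit_both() {
    echo "result data"            # written to stdout
    echo "progress message" >&2   # written to stderr
}

workdir=$(mktemp -d)

# Capture each stream in its own file instead of letting it reach the screen.
emit_both >"$workdir/out.txt" 2>"$workdir/err.txt"

stdout_text=$(cat "$workdir/out.txt")
stderr_text=$(cat "$workdir/err.txt")
echo "stdout file: $stdout_text"
echo "stderr file: $stderr_text"
```

In the bsub examples that follow, the -oo option plays a similar role by naming the log file that collects the job's output.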

Running a simple job

bsub -M 2000000 -R 'select[mem>2000] rusage[mem=2000]' -q research-hpc -oo date.log -a 'docker(ubuntu:xenial)' "date"
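The memory numbers in that command have to agree with each other, which is easy to get wrong by hand. The sketch below only prints a bsub command; make_bsub_cmd is a hypothetical helper, and it assumes that -M is given in KB while select[]/rusage[] are given in MB, as the 2000000/2000 pairing above suggests. Units depend on how a cluster is configured, so treat that as an assumption.

```shell
# make_bsub_cmd is a hypothetical helper: it prints a bsub command, it does not submit anything.
# Assumption: -M is in KB and select[]/rusage[] are in MB (matches the 2000000/2000 pairing above).
make_bsub_cmd() {
    local mem_gb=$1
    local queue=$2
    local log=$3
    local image=$4
    local job_cmd=$5
    local mem_mb=$((mem_gb * 1000))
    local mem_kb=$((mem_mb * 1000))
    printf "bsub -M %d -R 'select[mem>%d] rusage[mem=%d]' -q %s -oo %s -a 'docker(%s)' \"%s\"\n" \
        "$mem_kb" "$mem_mb" "$mem_mb" "$queue" "$log" "$image" "$job_cmd"
}

cmd=$(make_bsub_cmd 2 research-hpc date.log ubuntu:xenial date)
echo "$cmd"
```

Asking for 2 GB this way reproduces the command above, and changing the first argument keeps all three memory values in step.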

Redirecting stdout

bsub -M 2000000 -R 'select[mem>2000] rusage[mem=2000]' -q research-hpc -oo date.log -a 'docker(ubuntu:xenial)' "bash -c \"date >date.output\""

Using a script to avoid escaping quotes or complicated expressions

echo "date >date.output" > rundate.sh
cat rundate.sh
bsub -M 2000000 -R 'select[mem>2000] rusage[mem=2000]' -q research-hpc -oo date.log -a 'docker(ubuntu:xenial)' "bash rundate.sh"
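For scripts longer than one line, a quoted heredoc avoids escaping entirely, and running the script locally first is a cheap check before submitting it. This is a sketch under the assumption that the script has no cluster-only dependencies; the temporary directory is just for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A quoted heredoc writes the script verbatim, so nothing inside needs escaping.
cat >rundate.sh <<'EOF'
date >date.output
EOF

# Run it locally first; "bash rundate.sh" is the same string that bsub is given above.
bash rundate.sh
script_status=$?
echo "exit status: $script_status"
cat date.output
```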

Using a job name so that you can track progress

echo "sleep 60" >>rundate.sh
bsub -M 2000000 -R 'select[mem>2000] rusage[mem=2000]' -q research-hpc -oo date.log -a 'docker(ubuntu:xenial)' -J mydate "bash rundate.sh"
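Once a job has a name, a small loop can wait for it. The sketch below defines wait_for_job, a hypothetical helper (not an LSF command) that polls bjobs -J until no pending or running job with that name is listed. It assumes a short job name that bjobs prints in full, and the polling interval is an arbitrary choice.

```shell
# wait_for_job is a hypothetical helper, not part of LSF.
# It polls "bjobs -J <name>" until no active job with that name is listed.
wait_for_job() {
    job_name=$1
    poll_seconds=${2:-30}
    polls=0
    # Only stdout is searched; messages on stderr (such as "not found") are discarded.
    while bjobs -J "$job_name" 2>/dev/null | grep -q "$job_name"; do
        polls=$((polls + 1))
        sleep "$poll_seconds"
    done
    echo "no active job named $job_name (polled $polls times)"
}
```

On the cluster, something like wait_for_job mydate 10 after the bsub command above would return once the job has left the queue.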

Some useful LSF commands:

  • bjobs - list jobs
  • bsub - submit jobs
  • binfo - (/gscuser/cmiller/usr/bin/binfo) - info on old jobs that includes actual mem usage
  • bkill - kill jobs

Use job groups for controlling the rate at which things run: https://confluence.ris.wustl.edu/display/~cmiller/Using+LSF+job+groups

Some LSF tips:

  • Don't launch 100 jobs until you've verified that one will run successfully.
  • If you need to launch 1000 jobs, you're probably doing it wrong.
  • If you need to launch 100,000 jobs, you're definitely doing it wrong.
  • If you're launching hundreds of jobs that take only seconds to complete, refactor your code.
  • Every job has a /tmp/ directory that gets blown away when the job ends. In this era of limited/expensive disk, this is useful!
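One way to act on the "hundreds of tiny jobs" tip is to batch the work so each job runs several tasks. This is a minimal sketch: the ten tasks, the file names, and the batch size of five are all made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Ten hypothetical quick tasks, one shell command per line.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "echo processing sample_$i"
done >tasks.txt

# Group the tasks five per file, so two jobs do the work of ten.
split -l 5 tasks.txt batch_

batch_count=$(ls batch_* | wc -l | tr -d ' ')
echo "number of batch scripts: $batch_count"

# Each batch file is itself a runnable script; try one locally before submitting any.
first_batch_output=$(bash batch_aa)
echo "$first_batch_output"
```

Each batch file could then be handed to bsub as "bash batch_aa", in the same way as rundate.sh earlier.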

Using Docker

On your laptop:

docker pull ubuntu

What is that doing? It's going to https://hub.docker.com/_/ubuntu and pulling down the image with the "latest" tag
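The default-tag rule can be sketched in a few lines of shell. The function with_default_tag is a hypothetical helper that mimics only this one rule; it is a simplification and ignores registry hosts with ports (something like host:5000/image).

```shell
# with_default_tag is a hypothetical helper that mimics one rule only:
# an image reference with no tag is treated as ":latest".
# Simplification: registry hosts with ports (host:5000/image) are not handled.
with_default_tag() {
    case "$1" in
        *:*) echo "$1" ;;
        *)   echo "$1:latest" ;;
    esac
}

untagged=$(with_default_tag ubuntu)
tagged=$(with_default_tag ubuntu:xenial)
echo "$untagged"
echo "$tagged"
```

So "docker pull ubuntu" and "docker pull ubuntu:latest" ask for the same image, while ubuntu:xenial (used in the bsub examples) names a specific release.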

docker run ubuntu

Uhh, nothing happened. Not quite - it started a container from the image, but you didn't tell it to do anything, so it exited right away!

docker run ubuntu echo "hello world" 

What's cool is that this didn't run on macOS; it ran in Linux, and we can prove it:

docker run ubuntu uname -a

But what if we want to do more than one thing at a time? Run docker interactively! (bash is the shell that we most commonly work in)

docker run -it ubuntu /bin/bash

There is one major difference between running docker on the cluster and on your laptop:

whoami

We can't get root access in docker images on the cluster. To oversimplify a complicated topic, that's to prevent people from accessing data they shouldn't be able to.

Here though, we have root, so let's install some software.
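A script can check which situation it is in before trying to install anything. This is a small sketch using id -u, which prints 0 for root; the wording of the messages is just illustrative.

```shell
# id -u prints the numeric user ID; 0 means root.
if [ "$(id -u)" -eq 0 ]; then
    privilege="root: package installs with apt-get should work"
else
    privilege="non-root (uid $(id -u)): apt-get install will likely fail"
fi
echo "$privilege"
```

Run inside the laptop container it should report root; inside a cluster job it should report your own user ID instead.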

python

fails - it's not installed!

apt-get update
apt-get install python

Now it should work:

python
>>> print "Hello World!"

Now exit out of python and your container

<CTRL-D to quit python >
exit 

Let's pop back into our container (run the same docker run -it ubuntu /bin/bash command again) and try python:

python

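Wait! Where'd it go? Let's chat about persistence: each docker run starts a fresh container from the image, so anything installed in a previous container isn't there.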

My First Dockerfile

Create a folder

    mkdir ubuntu-python
    cd ubuntu-python

Create a new text file named "Dockerfile" with the following contents:

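# start from base ubuntu
FROM ubuntu:latest

# MAINTAINER is deprecated; a maintainer label carries the same information
LABEL maintainer="Chris Miller <c.a.miller@wustl.edu>"

# note: newer Ubuntu releases no longer provide a "python" package (use python3 there)
RUN apt-get -y update
RUN apt-get -y install python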

Save that file in the directory, then build a Docker image from that Dockerfile:

    cd ..
    docker build -t chrisamiller/ubuntu-python ubuntu-python/

Let's run it:

    docker run -it chrisamiller/ubuntu-python

and verify that python is installed

    python
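The create-and-build steps above can also be scripted. The sketch below writes a Dockerfile like the one in these notes (maintainer line omitted), does a quick sanity check that its first instruction is FROM, and prints the build command rather than running it, since docker may not be available where the script runs. The temporary directory is an assumption for illustration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/ubuntu-python"

# Write the Dockerfile with a quoted heredoc so its contents are taken verbatim.
cat >"$workdir/ubuntu-python/Dockerfile" <<'EOF'
# start from base ubuntu
FROM ubuntu:latest

RUN apt-get -y update
RUN apt-get -y install python
EOF

# Sanity check: ignoring comments and blank lines, the first instruction should be FROM.
first_instruction=$(grep -v '^#' "$workdir/ubuntu-python/Dockerfile" \
    | grep -v '^[[:space:]]*$' | head -n 1 | cut -d' ' -f1)
echo "first instruction: $first_instruction"

# Print the build command instead of running it.
build_cmd="docker build -t chrisamiller/ubuntu-python $workdir/ubuntu-python/"
echo "$build_cmd"
```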