@disulfidebond
Last active March 27, 2023 10:04
ONT Guppy setup

Overview

This markdown file documents the steps involved in configuring a new computer, running Ubuntu 16.04, to run ONT Guppy GPU basecalling.

Prerequisites

  • CUDA must be installed, which can be simple or extremely difficult, depending on whether the CUDA gods smile on you.
  • The computer must be running Ubuntu 16.04 'xenial', with all updates installed.
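
Both prerequisites can be sanity-checked from a shell before attempting the Guppy install; a quick sketch (standard commands — the exact output will vary by machine):

```shell
# confirm the CUDA toolkit, NVIDIA driver, and OS release are in place
nvcc --version     # CUDA toolkit compiler version
nvidia-smi         # NVIDIA driver version and visible GPUs
lsb_release -a     # should report Ubuntu 16.04 'xenial'
```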

Steps

  • The steps in the ONT Guppy installation manual were followed as directed.

  • For the graphics card that was installed, an RTX 2080 Ti, no additional configuration was necessary, similar to the recommendations for the GTX 1080 Ti.

  • guppy_basecaller was tested with the following parameters and a simple bash for loop:

      # directory contains 0.tar.gz, 1.tar.gz, ...
      # FLOWCELL and KIT must be set for your run before starting
      OUTDIRPATH=/output/path
      INPUTDIRPATH=/input/path
      for i in *.tar.gz ; do
        V=$(echo "$i" | cut -d. -f1)   # 0.tar.gz -> 0
        mkdir -p "$OUTDIRPATH/$V"
        # assumes each archive has been extracted to $INPUTDIRPATH/$V
        guppy_basecaller -x "cuda:0" --input_path "$INPUTDIRPATH/$V" --output_path "$OUTDIRPATH/$V" --flowcell "$FLOWCELL" --kit "$KIT" --records_per_fastq 0
        echo "basecalling for $i done"
      done
      # the option "-x" specifies a CUDA graphics card at slot 0
      # if you have only 1 card, or the card you want is at slot 0, this is not necessary
      # to get info on graphics cards, use this command:
      nvidia-smi
    
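guppy_basecaller reads a directory of fast5 files, so each .tar.gz presumably has to be unpacked before the loop above can basecall it. A minimal sketch of that extraction step, assuming each archive is unpacked into its own subdirectory of the input path (paths are placeholders):

```shell
# extract each archive into a per-run input directory
# before running the basecalling loop above
INPUTDIRPATH=/input/path
for i in *.tar.gz ; do
  V="${i%%.*}"                     # 0.tar.gz -> 0
  mkdir -p "$INPUTDIRPATH/$V"
  tar -xzf "$i" -C "$INPUTDIRPATH/$V"
done
```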

Stats/Benchmark

  • The graphics card at cuda:0 was used:

      bash$ nvidia-smi 
      Fri May 24 18:31:05 2019       
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  GeForce RTX 208...  Off  | 00000000:17:00.0 Off |                  N/A |
      | 52%   82C    P2   252W / 250W |   4218MiB / 10989MiB |     78%      Default |
      +-------------------------------+----------------------+----------------------+
      |   1  Quadro P400         Off  | 00000000:65:00.0  On |                  N/A |
      | 34%   43C    P8    N/A /  N/A |    233MiB /  1992MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
                                                                                     
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |    0      7762      C   guppy_basecaller                            4207MiB |
      |    1      1309      G   /usr/lib/xorg/Xorg                           133MiB |
      |    1      2481      G   compiz                                        87MiB |
      +-----------------------------------------------------------------------------+
    
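For lighter-weight monitoring during a long run, nvidia-smi's query mode can print just the fields of interest instead of the full table; a sketch polling card 0 every 5 seconds:

```shell
# poll GPU 0 utilization, memory, and temperature every 5 s
nvidia-smi -i 0 \
  --query-gpu=timestamp,utilization.gpu,memory.used,temperature.gpu \
  --format=csv -l 5
```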
  • Here is the output from a test basecalling run:

      ONT Guppy basecalling software version 3.1.5+781ed57
      config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_hac.cfg
      model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_hac.jsn
      input path:         /media/drive2/ONTdata/50
      save path:          /media/databk1/ONTdata_05242019-output/50
      chunk size:         1000
      chunks per runner:  1000
      records per file:   0
      num basecallers:    4
      gpu device:         cuda:0
      kernel path:
      runners per device: 2
    
      Found 4000 fast5 files to process.
      Init time: 1547 ms
    
      0%   10   20   30   40   50   60   70   80   90   100%
      |----|----|----|----|----|----|----|----|----|----|
      ***************************************************
      Caller time: 56515 ms, Samples called: 558584255, samples/s: 9.88382e+06
      Finishing up any open output files.
      Basecalling completed successfully.
    
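The reported throughput can be cross-checked from the log's own numbers: 558584255 samples over a 56515 ms caller time. A one-line check:

```shell
# recompute samples/s from the caller time and sample count in the log
awk 'BEGIN { printf "samples/s: %.5e\n", 558584255 / (56515 / 1000) }'
```

which agrees with the 9.88382e+06 samples/s reported above.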
@ginolhac

You are welcome; the optimization was performed by @vplugaru.

To be crystal clear, the basecalling is tweaked as follows:

guppy_basecaller \
  -i fast5/ \
  --config dna_r9.4.1_450bps_hac.cfg \
  --save_path fastq/ \
  --compress_fastq \
  -x "auto" --num_callers 14 --gpu_runners_per_device 8 \
  --chunks_per_runner 768 --chunk_size 500

Quite nice to be able to output gzipped FASTQ, thanks to the latest version.

@smallwhitelama

Thank you!

@callumparr

(quoting @ginolhac's optimized command above)

If I understand correctly, increasing runners per device and num callers can increase speed at the expense of GPU memory, but what is the rationale for deciding how to tweak the chunk size and the number of chunks sent to each basecaller instance? By default both have a value of 1000; was it trial and error to arrive at 768 and 500? Did this change the basecalling results at all? I worry that changing the chunk size may affect basecall quality.

@colindaven

Fairly new to GPU basecalling here: do you have any suggestions for optimizing for an A100 Ampere GPU with 40 GB of GPU RAM? The server has 64 CPU cores. Thanks.

@benbfly

benbfly commented Aug 19, 2021

Did you ever figure this out for A100? That's what we have as well.
