@disulfidebond
Last active March 27, 2023 10:04
ONT Guppy setup

Overview

This markdown file contains the steps involved in configuring a new computer, running Ubuntu 16.04, to run ONT Guppy GPU basecalling.

Prerequisites

  • CUDA must be installed, which can be simple or extremely difficult, depending on whether the CUDA gods smile on you.
  • The computer must be running Ubuntu 16.04 'xenial', with all updates installed.

Steps

  • The steps in the installation manual were followed as directed.

  • For the graphics card that was installed, an RTX 2080 Ti, no additional configuration was necessary, similar to the recommendations for the GTX 1080 Ti.

  • guppy_basecaller was tested with the following parameters and a simple bash for loop:

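      # directory contains 0.tar.gz, 1.tar.gz, ...
      # FLOWCELL and KIT must be set before running this loop
      # assumes each archive has already been extracted to $INPUTDIRPATH/<name>
      OUTDIRPATH=/output/path
      INPUTDIRPATH=/input/path
      for i in *.tar.gz ; do
        # strip the extension: 0.tar.gz -> 0
        V=$(echo "$i" | cut -d. -f1)
        mkdir -p "$OUTDIRPATH/$V"
        guppy_basecaller -x "cuda:0" --input_path "$INPUTDIRPATH/$V" --save_path "$OUTDIRPATH/$V" --flowcell "$FLOWCELL" --kit "$KIT" --records_per_fastq 0
        echo "basecalling for $i done"
      done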
      # the option "-x" specifies a CUDA graphics card at slot 0
      # if you have only 1 card, or the card you want is at slot 0, this is not necessary
      # to get info on graphics cards, use this command:
      nvidia-smi
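
To see which device string corresponds to which card, the `nvidia-smi` listing can be reduced to index and name. This is a minimal sketch: `gpu_devices` is a hypothetical helper, and it assumes the index reported by `nvidia-smi` matches the CUDA device index, which is only guaranteed when `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.

```shell
# Hypothetical helper: turn "index, name" CSV rows into cuda:N labels.
# Assumption: the nvidia-smi index equals the CUDA device index
# (guaranteed only with CUDA_DEVICE_ORDER=PCI_BUS_ID).
gpu_devices() {
  awk -F', ' '{ printf "cuda:%s (%s)\n", $1, $2 }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name --format=csv,noheader | gpu_devices
fi
```

The label in front of the card you want is what goes after `-x`.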
    

Stats/Benchmark

  • The graphics card at cuda:0 was used:

      bash$ nvidia-smi 
      Fri May 24 18:31:05 2019       
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  GeForce RTX 208...  Off  | 00000000:17:00.0 Off |                  N/A |
      | 52%   82C    P2   252W / 250W |   4218MiB / 10989MiB |     78%      Default |
      +-------------------------------+----------------------+----------------------+
      |   1  Quadro P400         Off  | 00000000:65:00.0  On |                  N/A |
      | 34%   43C    P8    N/A /  N/A |    233MiB /  1992MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
                                                                                     
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |    0      7762      C   guppy_basecaller                            4207MiB |
      |    1      1309      G   /usr/lib/xorg/Xorg                           133MiB |
      |    1      2481      G   compiz                                        87MiB |
      +-----------------------------------------------------------------------------+
    
  • Here is the output from a test basecalling run:

      ONT Guppy basecalling software version 3.1.5+781ed57
      config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_hac.cfg
      model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_hac.jsn
      input path:         /media/drive2/ONTdata/50
      save path:          /media/databk1/ONTdata_05242019-output/50
      chunk size:         1000
      chunks per runner:  1000
      records per file:   0
      num basecallers:    4
      gpu device:         cuda:0
      kernel path:
      runners per device: 2
    
      Found 4000 fast5 files to process.
      Init time: 1547 ms
    
      0%   10   20   30   40   50   60   70   80   90   100%
      |----|----|----|----|----|----|----|----|----|----|
      ***************************************************
      Caller time: 56515 ms, Samples called: 558584255, samples/s: 9.88382e+06
      Finishing up any open output files.
      Basecalling completed successfully.
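
The samples/s figure is just samples called divided by caller time in seconds, so it can be recomputed from the summary line when comparing saved logs. A minimal sketch, assuming the "Caller time" line has the layout shown above; `caller_throughput` is a hypothetical helper, not part of Guppy.

```shell
# Hypothetical helper: recompute samples/s from a Guppy "Caller time" line on stdin.
# Whitespace-split fields: $3 = caller time in ms, $7 = samples called (with a trailing comma).
caller_throughput() {
  awk '/Caller time:/ { printf "%.5e\n", ($7 + 0) / ($3 / 1000) }'
}

# The summary line from the test run above
echo "Caller time: 56515 ms, Samples called: 558584255, samples/s: 9.88382e+06" | caller_throughput
# prints 9.88382e+06
```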
    
@smallwhitelama

smallwhitelama commented May 28, 2019

Thank you for this useful article. Could you share your list of computer equipment?

@ginolhac

ginolhac commented May 28, 2019

This can be further improved using tweaked parameters. By default, some units are still waiting for data.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.66       Driver Version: 410.66       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:1C:00.0 Off |                    0 |
| N/A   66C    P0   280W / 300W |   6650MiB / 16130MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    290317      C   .../install/ont-guppy/bin/guppy_basecaller  6639MiB |
+-----------------------------------------------------------------------------+

see the output and modified parameters here:

ONT Guppy basecalling software version 3.1.5+781ed57
config file:        /mnt/irisgpfs/users/aginolhac/install/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /mnt/irisgpfs/users/aginolhac/install/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         mtDNACaco2/20190131_1000_MN22103_FAH49932_9745c045/fast5/
save path:          /scratch/users/aginolhac/170607/bc_guppy_fast_gpu
chunk size:         500
chunks per runner:  768
records per file:   4000
fastq compression:  ON
num basecallers:    14
gpu device:         auto
kernel path:        
runners per device: 8

Found 173 fast5 files to process.
Init time: 882 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 858894 ms, Samples called: 13969376311, samples/s: 1.62644e+07

Edit: using the same config dna_r9.4.1_450bps_hac.cfg leads to better Q scores and still a great speed

Caller time: 893478 ms, Samples called: 14024771315, samples/s: 1.56968e+07
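
To put "still a great speed" in numbers, the two samples/s values above can be compared directly. A minimal sketch; `relative_throughput` is a hypothetical helper, and the comparison assumes both runs used the same GPU and tuning parameters, with only the config changed.

```shell
# Hypothetical helper: second throughput as a percentage of the first.
relative_throughput() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", other / base * 100 }'
}

# fast config vs. hac config, samples/s taken from the two runs above
relative_throughput 1.62644e+07 1.56968e+07
# prints 96.5
```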

@disulfidebond
Author

disulfidebond commented May 28, 2019

Thanks Aurélien!!

smallwhitelama here are the specs for the computer:

Unit: Dell Precision 5820 Tower
CPU: Intel i9-9900X 'SkyLake' 3.5 GHz, 4.5 GHz Turbo
Memory: 64 GB DDR4 RAM
Storage1: 1 TB NVMe SSD
Storage2: 2 TB 7200 RPM SATA HD
GPU: PNY GeForce RTX 2080Ti with 11 GB GDDR6 RAM
OS: Ubuntu 16.04 LTS

@ginolhac

You are welcome; the optimization was performed by @vplugaru.

To be crystal clear, the basecalling is tweaked as follows:

guppy_basecaller \
  -i fast5/ \
  --config dna_r9.4.1_450bps_hac.cfg \
  --save_path fastq/ \
  --compress_fastq \
  -x "auto" --num_callers 14 --gpu_runners_per_device 8 \
  --chunks_per_runner 768 --chunk_size 500

quite nice to be able to output gzipped fastq thanks to the latest version
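
One rough way to compare the two parameter sets reported in this thread is the product of runners per device, chunks per runner, and chunk size. Treating that product as a proxy for how many samples sit on the GPU at once is an assumption made here, not something the Guppy output states, and `in_flight` is a hypothetical helper.

```shell
# Hypothetical helper: runners_per_device * chunks_per_runner * chunk_size.
# Assumption: this product roughly tracks the work held on the GPU at once.
in_flight() {
  echo $(( $1 * $2 * $3 ))
}

in_flight 2 1000 1000   # values from the first run: prints 2000000
in_flight 8 768 500     # tweaked values:            prints 3072000
```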

@smallwhitelama

Thank you!

@callumparr

If I understand correctly, increasing runners per device and num callers can increase speed at the expense of GPU memory, but what is the rationale for tweaking the chunk size and the number of chunks sent to each basecaller instance? By default both have the value 1000; was it trial and error to arrive at 768 and 500? Did this have any effect on the basecalling results? I worry that changing the chunk size may affect basecall quality.

@colindaven

Fairly new GPU basecaller here: do you guys have suggestions for optimizing for an A100 Ampere GPU with 40 GB of GPU RAM? The server has 64 CPU cores. Thanks.

@benbfly

benbfly commented Aug 19, 2021

Did you ever figure this out for A100? That's what we have as well.
