Nanopore basecalling on Google Colab


NOTE: this whole idea is the brainchild of Jürgen Hench. He got it up and running and posted about it here. I am merely wrapping the idea in a hopefully easy-to-follow set of instructions for people to test themselves.


This notebook describes processing of Nanopore sequencing data (fast5 files) in a Google Colab interactive notebook environment. This is made possible by utilising the GPU-enabled runtime available via Colab.

Before we get started there are some important points to consider.

Caveats

Some things to note before proceeding:

  • you will need an ONT community forum account to download Guppy, so make sure you have one and can access the downloads section
  • this is a cloud-based approach, meaning all data will be located in some cloud instance somewhere (I use Google Drive in this example). This may not be appropriate for the data you have, so consider it carefully before uploading any of your data.
  • this is currently a free service; it may be removed at any stage, completely at Google's discretion
    • as part of this, Google has the right to monitor usage and may throttle or deny allocation of resources to users that are running constantly
    • the amount and type of allocated resources can and likely will change. The current GPU instances use GPUs that work with Guppy, and the available disk is about 64 GB, but this can change
  • runtime disconnection is a thing: if the notebook is idle for too long you'll be disconnected
  • it is possible to run out of memory/RAM
  • there is no guarantee that the GPU hardware will be available when you want to use it
    • the GPU that you get allocated might not be compatible with Guppy. For example, in one instance I was assigned a Tesla K80. This is a Kepler-based card and doesn't meet the Guppy requirement of CUDA compute capability >= 6.0. This is the error that I received:
[guppy/error] *common::LoadModuleFromFatbin: Loading fatbin file shared.fatbin failed with: CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:54: CUDA_ERROR_NO_BINARY_FOR_GPU
  • neither I nor ONT take any responsibility - you're on your own! :)

A note of interest from the Google Colab FAQ:

"The types of GPUs that are available in Colab vary over time. This is necessary for Colab to be able to provide access to these resources for free. The GPUs available in Colab often include Nvidia K80s, T4s, P4s and P100s. There is no way to choose what type of GPU you can connect to in Colab at any given time. Users who are interested in more reliable access to Colab’s fastest GPUs may be interested in Colab Pro."

So there are 4 different GPUs on offer, and it's essentially a 'lottery' as to which you get assigned - though it will likely be one of the less powerful options. Here is an overview of these GPUs with respect to which "work" with Guppy:

  • Nvidia K80 - not compatible with Guppy
    • Kepler 2.0 microarchitecture
      • Year of release = 2014
    • CUDA Compute = 3.7
    • 2496 x2 CUDA cores (essentially a dual GPU)
  • Nvidia P4 - compatible with Guppy
    • Pascal microarchitecture
      • Year of release = 2016
    • CUDA Compute = 6.1
    • 2560 CUDA cores
  • Nvidia P100 - compatible with Guppy
    • Pascal microarchitecture
      • Year of release = 2016
    • CUDA Compute = 6.0
    • 3584 CUDA cores
  • Nvidia T4 - compatible with Guppy
    • Turing microarchitecture
      • Year of release = 2018
    • CUDA Compute = 7.5
    • 2560 CUDA cores

So of the 4 types of GPU currently available via the free tier of Google Colab, the Nvidia K80 is the only one which will not work with Guppy as it is currently implemented. If you end up with an instance with a K80 then there is no point continuing, and you can try again later - the snippet below lets you check before you commit. If you sign up for the Pro version of Google Colab (9.99 USD per month) then you get priority access to better GPUs - food for thought.
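
A minimal sketch for that check - nvidia-smi ships with the Colab GPU runtime (juhench makes a similar suggestion in the comments below):

%%shell
# print the model of the allocated GPU; if it reports a Tesla K80,
# factory reset the runtime (Runtime menu) and try your luck again
nvidia-smi --query-gpu=name --format=csv,noheader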


Initiate GPU runtime

The first thing is to make sure the runtime is set to use a GPU. To do this is pretty simple:

  • go to the Runtime menu
  • select the Change runtime type option
  • make sure the Hardware accelerator is set to GPU

Check the presence of a GPU

Once the above is set up you should be able to run the below code block. If successful you should see something like /device:GPU:0 as the output. This means that the GPU is available for use.

import tensorflow as tf
tf.test.gpu_device_name()       # this will tell you device number (should be 0 with a single GPU)

import torch
torch.cuda.get_device_name(0)   # this will tell you the name/model of the GPU
'Tesla T4'

Download Guppy

You will need an account on the ONT community forum here in order to reach the download section and grab a copy of Guppy.

Once you have access, navigate to the 'Software Downloads' section of the ONT community forum and you will see a listing for Guppy. I recommend grabbing the pre-compiled binaries, i.e. the version listed as Linux x64-bit GPU; it should have a file name similar to ont-guppy_X.X.X_linux64.tar.gz, where the X's denote the version number. Copy the link to this download and paste it into the code block below, i.e. replace the section [paste_guppy_link_here].

Run the code block and Guppy will be downloaded.

%%shell
GuppyBinary="[paste_guppy_link_here]"
wget "$GuppyBinary"
...
...
...
Resolving americas.oxfordnanoportal.com (americas.oxfordnanoportal.com)... 96.126.99.215
Connecting to americas.oxfordnanoportal.com (americas.oxfordnanoportal.com)|96.126.99.215|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 637723012 (608M) [application/x-tar]
Saving to: ‘ont-guppy_4.5.3_linux64.tar.gz’

ont-guppy_4.5.3_lin 100%[===================>] 608.18M  44.7MB/s    in 14s     

2021-04-14 10:29:27 (42.0 MB/s) - ‘ont-guppy_4.5.3_linux64.tar.gz’ saved [637723012/637723012]
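
Optionally, you can sanity-check the download before unpacking - a quick sketch (the wildcard assumes the default filename; adjust if yours differs):

%%shell
ls -lh ont-guppy_*_linux64.tar.gz                                      # size should match the 'Length' wget reported
tar -tzf ont-guppy_*_linux64.tar.gz > /dev/null && echo "archive OK"   # a corrupt download will fail here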

Extract the compressed Guppy binaries

Before we can use the Guppy binaries we need to extract the archive we downloaded. Replace the version number in the code block below with the one you downloaded, then run it. Using version 4.5.3 as an example:

%%shell
tar -xzvf ont-guppy_4.5.3_linux64.tar.gz
ont-guppy/bin/
ont-guppy/bin/guppy_basecall_client
ont-guppy/bin/guppy_basecall_server
ont-guppy/bin/guppy_basecaller
ont-guppy/bin/guppy_basecaller_supervisor
ont-guppy/data/
ont-guppy/data/YHR174W.fasta
ont-guppy/data/adapter_scaling_dna_r10.3_min.jsn
ont-guppy/data/adapter_scaling_dna_r10.3_prom.jsn
ont-guppy/data/adapter_scaling_dna_r9.4.1_min.jsn
ont-guppy/data/adapter_scaling_dna_r9.4.1_prom.jsn
ont-guppy/data/certs-bundle.crt
ont-guppy/data/dna_r10.3_450bps_fast.cfg
ont-guppy/data/dna_r10.3_450bps_fast_prom.cfg
ont-guppy/data/dna_r10.3_450bps_hac.cfg
ont-guppy/data/dna_r10.3_450bps_hac_prom.cfg
ont-guppy/data/dna_r10.3_450bps_modbases_5mc_hac_prom.cfg
ont-guppy/data/dna_r10_450bps_fast.cfg
ont-guppy/data/dna_r10_450bps_hac.cfg
ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
ont-guppy/data/dna_r9.4.1_450bps_fast_prom.cfg
ont-guppy/data/dna_r9.4.1_450bps_hac.cfg
ont-guppy/data/dna_r9.4.1_450bps_hac_prom.cfg
ont-guppy/data/dna_r9.4.1_450bps_hac_prom_fw205.cfg
ont-guppy/data/dna_r9.4.1_450bps_modbases_5mc_hac.cfg
ont-guppy/data/dna_r9.4.1_450bps_modbases_5mc_hac_prom.cfg
ont-guppy/data/dna_r9.4.1_450bps_sketch.cfg
ont-guppy/data/dna_r9.5_450bps.cfg
ont-guppy/data/lambda_3.6kb.fasta
ont-guppy/data/lampore_analysis-2.0.0-py3-none-any.whl
ont-guppy/data/mismatch_matrix.txt
ont-guppy/data/rna_r9.4.1_70bps_fast.cfg
ont-guppy/data/rna_r9.4.1_70bps_fast_prom.cfg
ont-guppy/data/rna_r9.4.1_70bps_hac.cfg
ont-guppy/data/rna_r9.4.1_70bps_hac_prom.cfg
ont-guppy/data/template_r10.3_450bps_fast.jsn
ont-guppy/data/template_r10.3_450bps_fast_prom.jsn
ont-guppy/data/template_r10.3_450bps_hac.jsn
ont-guppy/data/template_r10.3_450bps_hac_prom.jsn
ont-guppy/data/template_r10.3_450bps_modbases_5mc_hac_prom.jsn
ont-guppy/data/template_r10_450bps_fast.jsn
ont-guppy/data/template_r10_450bps_hac.jsn
ont-guppy/data/template_r9.4.1_450bps_fast.jsn
ont-guppy/data/template_r9.4.1_450bps_fast_prom.jsn
ont-guppy/data/template_r9.4.1_450bps_hac.jsn
ont-guppy/data/template_r9.4.1_450bps_hac_prom.jsn
ont-guppy/data/template_r9.4.1_450bps_hac_prom_fw205.jsn
ont-guppy/data/template_r9.4.1_450bps_modbases_5mc_hac.jsn
ont-guppy/data/template_r9.4.1_450bps_modbases_5mc_hac_prom.jsn
ont-guppy/data/template_r9.4.1_450bps_sketch.jsn
ont-guppy/data/template_r9.5_450bps_5mer_raw.jsn
ont-guppy/data/template_rna_r9.4.1_70bps_fast.jsn
ont-guppy/data/template_rna_r9.4.1_70bps_fast_prom.jsn
ont-guppy/data/template_rna_r9.4.1_70bps_hac.jsn
ont-guppy/data/template_rna_r9.4.1_70bps_hac_prom.jsn
ont-guppy/bin/
ont-guppy/bin/guppy_aligner
ont-guppy/bin/minimap2
ont-guppy/lib/
ont-guppy/lib/MINIMAP2_LICENSE
ont-guppy/lib/libont_minimap2.so.2
ont-guppy/lib/libont_minimap2.so.2.17.2
ont-guppy/bin/
ont-guppy/bin/Nanopore Product Terms and Conditions (28 November 2018).pdf
ont-guppy/bin/THIRD_PARTY_LICENSES
ont-guppy/bin/
ont-guppy/bin/guppy_barcoder
ont-guppy/data/
ont-guppy/data/barcoding/
ont-guppy/data/barcoding/4x4_mismatch_matrix.txt
ont-guppy/data/barcoding/5x5_mismatch_matrix.txt
ont-guppy/data/barcoding/5x5_mismatch_matrix_simple.txt
ont-guppy/data/barcoding/barcode_arrs_16s.cfg
ont-guppy/data/barcoding/barcode_arrs_dual_nb24_pcr96.cfg
ont-guppy/data/barcoding/barcode_arrs_lwb.cfg
ont-guppy/data/barcoding/barcode_arrs_multivirus1.cfg
ont-guppy/data/barcoding/barcode_arrs_multivirus8.cfg
ont-guppy/data/barcoding/barcode_arrs_nb12.cfg
ont-guppy/data/barcoding/barcode_arrs_nb13-24.cfg
ont-guppy/data/barcoding/barcode_arrs_nb24.cfg
ont-guppy/data/barcoding/barcode_arrs_nb96.cfg
ont-guppy/data/barcoding/barcode_arrs_ncov8.cfg
ont-guppy/data/barcoding/barcode_arrs_ncov96.cfg
ont-guppy/data/barcoding/barcode_arrs_pcr12.cfg
ont-guppy/data/barcoding/barcode_arrs_pcr96.cfg
ont-guppy/data/barcoding/barcode_arrs_rab.cfg
ont-guppy/data/barcoding/barcode_arrs_rbk.cfg
ont-guppy/data/barcoding/barcode_arrs_rbk096.cfg
ont-guppy/data/barcoding/barcode_arrs_rbk4.cfg
ont-guppy/data/barcoding/barcode_arrs_rlb.cfg
ont-guppy/data/barcoding/barcode_arrs_vmk.cfg
ont-guppy/data/barcoding/barcode_arrs_vmk2.cfg
ont-guppy/data/barcoding/barcode_score_vs_classification.png
ont-guppy/data/barcoding/barcodes_masked.fasta
ont-guppy/data/barcoding/configuration.cfg
ont-guppy/data/barcoding/configuration_dual.cfg
ont-guppy/data/barcoding/multivirus_targets.fasta
ont-guppy/data/barcoding/ncov_targets.fasta
ont-guppy/data/barcoding/nw_barcoding_grid.png
ont-guppy/lib/
ont-guppy/lib/libvbz_hdf_plugin.so
ont-guppy/lib/libvbz_hdf_plugin.so.1
ont-guppy/lib/libvbz_hdf_plugin.so.1.0.0

Check Guppy version

We should now be able to run the Guppy binaries we downloaded. They are located in ./ont-guppy/bin. The below code block should run guppy_basecaller and report the version of the software.

%%shell
./ont-guppy/bin/guppy_basecaller --version
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 4.5.3+0ab5ebb
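
If typing the ./ont-guppy/bin/ prefix gets tedious, you can put the binaries on your PATH - but note that each %%shell cell runs in its own fresh shell, so the export does not persist between cells:

%%shell
export PATH="$PWD/ont-guppy/bin:$PATH"   # only lasts for this cell
guppy_basecaller --version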

Mount your Google Drive

By mounting your Google Drive you will be able to upload fast5 files for processing, with the output written back to the same location within Drive.

The below chunk performs the mounting. You will be asked to authenticate; just follow the instructions and things should go pretty smoothly.

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
Mounted at /content/gdrive

For this example I created a directory within My Drive called ONT and then within this folder another directory called example_data. I then uploaded a few fast5 files to this location.
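
If you want to replicate that layout from within the notebook rather than via the Drive web interface, one line does it (a sketch - adjust the directory names to suit):

%%shell
mkdir -p gdrive/MyDrive/ONT/example_data   # -p creates both ONT and example_data in one go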

We can check that the mounted drive and files are identified in the notebook environment below.

%%shell
ls gdrive/MyDrive/ONT/example_data
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_0.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_10.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_11.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_12.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_13.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_14.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_15.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_16.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_17.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_18.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_19.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_1.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_20.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_2.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_3.fast5
PAD42977_84899b42a6019949c8f43092626c45d9beac7752_4.fast5

Looks good! We can see a list of fast5 files.

Basecalling with Guppy

Now for the fun part!

With all of the above working, we can now basecall our data. First we will set a few variables. The below code block creates shell variables for the input and output locations, the Guppy basecaller binary, and several model configuration files for basecalling (i.e. fast, hac and modified bases).

Fast calling model

Once we're happy with these variables we can put together the Guppy command to start basecalling. Below is a fairly simple run using the fast model, with the parameters adjusted slightly for the compute environment.

Run this block and hopefully you'll see basecalling kick off. If so, that's all there is to it. :)

%%shell
inputPath="gdrive/MyDrive/ONT/example_data"
outputPath="gdrive/MyDrive/ONT/example_data"
guppy_bc=./ont-guppy/bin/guppy_basecaller                               # set guppy_basecaller binary location
guppy_cfg_fast=./ont-guppy/data/dna_r9.4.1_450bps_fast.cfg              # fast model calling
guppy_cfg_hac=./ont-guppy/data/dna_r9.4.1_450bps_hac.cfg                # high accuracy calling
guppy_cfg_mod=./ont-guppy/data/dna_r9.4.1_450bps_modbases_5mc_hac.cfg   # base modification calling

$guppy_bc -i $inputPath -s $outputPath  \
--recursive \
--config $guppy_cfg_fast \
--gpu_runners_per_device 16 \
--cpu_threads_per_caller 2 \
--device cuda:0
ONT Guppy basecalling software version 4.5.3+0ab5ebb
config file:        ./ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /content/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         gdrive/MyDrive/ONT/example_data
save path:          gdrive/MyDrive/ONT/example_data
chunk size:         2000
chunks per runner:  160
minimum qscore:     7
records per file:   4000
num basecallers:    4
gpu device:         cuda:0
kernel path:        
runners per device: 16
Found 16 fast5 files to process.
Init time: 696 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 101781 ms, Samples called: 1912424322, samples/s: 1.87896e+07
Finishing up any open output files.
Basecalling completed successfully.
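
Guppy writes its fastq output plus a sequencing_summary.txt into the save path (depending on version and filtering options, the fastq files may land in pass/fail subdirectories). A quick sketch to eyeball the results and count the called reads, assuming the paths used above:

%%shell
outputPath="gdrive/MyDrive/ONT/example_data"
ls "$outputPath"                           # fastq output and summary alongside the input fast5 files
# fastq records are 4 lines each, so total lines / 4 = read count
find "$outputPath" -name "*.fastq" -exec cat {} + | awk 'END{print NR/4, "reads"}'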

HAC model run

This basecalling run performs high accuracy calling. I was actually very surprised by the speed of the GPU that generated this output (an Nvidia T4) - going by the samples/s figures here and in the fast run above, hac ran only about 4x slower than the fast model. I feel it would be a decent option if you wanted to turn around a small amount of data using the hac model.

The below code block will perform hac:

%%shell
inputPath="gdrive/MyDrive/ONT/example_data"
outputPath="gdrive/MyDrive/ONT/example_data"
guppy_bc=./ont-guppy/bin/guppy_basecaller                               # set guppy_basecaller binary location
guppy_cfg_fast=./ont-guppy/data/dna_r9.4.1_450bps_fast.cfg              # fast model calling
guppy_cfg_hac=./ont-guppy/data/dna_r9.4.1_450bps_hac.cfg                # high accuracy calling
guppy_cfg_mod=./ont-guppy/data/dna_r9.4.1_450bps_modbases_5mc_hac.cfg   # base modification calling

$guppy_bc -i $inputPath -s $outputPath  \
--recursive \
--config $guppy_cfg_hac \
--gpu_runners_per_device 16 \
--cpu_threads_per_caller 2 \
--device cuda:0
ONT Guppy basecalling software version 4.5.3+0ab5ebb
config file:        ./ont-guppy/data/dna_r9.4.1_450bps_hac.cfg
model file:         /content/ont-guppy/data/template_r9.4.1_450bps_hac.jsn
input path:         gdrive/MyDrive/ONT/example_data
save path:          gdrive/MyDrive/ONT/example_data
chunk size:         2000
chunks per runner:  512
minimum qscore:     9
records per file:   4000
num basecallers:    4
gpu device:         cuda:0
kernel path:        
runners per device: 16
Found 16 fast5 files to process.
Init time: 1864 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 409705 ms, Samples called: 1904252640, samples/s: 4.64786e+06
Finishing up any open output files.
Basecalling completed successfully.

Modified base run

If you are interested in exploring base modifications then you can provide the appropriate model configuration file and let it run. Again, I was quite surprised by the speed in this cloud instance using an Nvidia T4 - good stuff.

Run the below code block for base modification enabled calling:

%%shell
inputPath="gdrive/MyDrive/ONT/example_data"
outputPath="gdrive/MyDrive/ONT/example_data"
guppy_bc=./ont-guppy/bin/guppy_basecaller                               # set guppy_basecaller binary location
guppy_cfg_fast=./ont-guppy/data/dna_r9.4.1_450bps_fast.cfg              # fast model calling
guppy_cfg_hac=./ont-guppy/data/dna_r9.4.1_450bps_hac.cfg                # high accuracy calling
guppy_cfg_mod=./ont-guppy/data/dna_r9.4.1_450bps_modbases_5mc_hac.cfg   # base modification calling

$guppy_bc -i $inputPath -s $outputPath  \
--recursive \
--config $guppy_cfg_mod \
--gpu_runners_per_device 16 \
--cpu_threads_per_caller 2 \
--device cuda:0
ONT Guppy basecalling software version 4.5.3+0ab5ebb
config file:        ./ont-guppy/data/dna_r9.4.1_450bps_modbases_5mc_hac.cfg
model file:         /content/ont-guppy/data/template_r9.4.1_450bps_modbases_5mc_hac.jsn
input path:         gdrive/MyDrive/ONT/example_data
save path:          gdrive/MyDrive/ONT/example_data
chunk size:         2000
chunks per runner:  512
minimum qscore:     9
records per file:   4000
num basecallers:    4
gpu device:         cuda:0
kernel path:        
runners per device: 16
Found 16 fast5 files to process.
Init time: 1820 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 406350 ms, Samples called: 1904252640, samples/s: 4.68624e+06
Finishing up any open output files.
Basecalling completed successfully.

Final thoughts

Well, that is really all there is to it: cloud-based, GPU-accelerated basecalling on the free tier of Google Colab is not just possible, it's actually quite usable! Again, a massive thanks to Jürgen Hench, who put in all the hard work and created the initial post explaining that this was a possibility.

Moving forward, it would be interesting to see how the paid tiers perform; the Pro version of Google Colab is only 9.99 USD per month and can be cancelled at any time. I might clock up a month or two and try a little benchmarking. It would also be very useful to examine other cloud-based options, e.g. AWS with GPU-enabled instances. The prices of instances with decent GPUs are dropping rather quickly, which is quite exciting.

Happy GPU basecalling everyone!


juhench commented Apr 14, 2021

@sirselim: Thanks a lot for wrapping this up so nicely! Well done!

@sirselim (Author)

@juhench - no worries at all, thanks for providing all the background and testing. There is already one person on the ONT community forum who is using it for GPU basecalling because she doesn't have access to anything else - so in my mind it's already a very worthwhile exercise.


juhench commented Apr 14, 2021

A short note on checking the model of the currently supplied GPU in your Colab session: create a cell that says
!nvidia-smi
This will output comprehensive GPU information. I always include this one-liner in the first cell of a GPU playground on Colab; it saves you from surprises.

One thing that I find useful with Colab is the ability to create a working pipeline template, share it with someone, and test your own software deployment scripts. Of course, one could alternatively create virtual machines, but it is much less tedious with Colab. It is also a good way to have your students play around with Guppy-GPU and experience the speed, i.e. why it is worth all the hassle with CUDA etc. Too bad there is no JetsonAGX-Colab :) (yet?)

@sirselim (Author)

!nvidia-smi is a nice idea; I was using:

import torch
torch.cuda.get_device_name(0)   # this will tell you the name/model of the GPU


husamia commented Apr 15, 2021

I tested the latest Guppy version with the latest bonito model, which takes the longest to run, and it worked.

ONT Guppy basecalling software version 4.5.3+0ab5ebb
config file:        ./ont-guppy/data/res_dna_r941_min_crf_v032.cfg
model file:         /content/ont-guppy/data/res_dna_r941_min_crf_v032.jsn
input path:         gdrive/MyDrive/ONT/
save path:          gdrive/MyDrive/ONT/
chunk size:         720
chunks per runner:  320
minimum qscore:     7
records per file:   4000
num basecallers:    4
gpu device:         cuda:0
kernel path:        
runners per device: 16
Found 2 fast5 files to process.
Init time: 11653 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 602385 ms, Samples called: 537848519, samples/s: 892865

@sirselim (Author)

@husamia - very nice! Which of the GPUs was this using? Did you modify any of the parameters?


husamia commented Apr 16, 2021

I got the free Tesla. I downloaded the model file from rerio and copied it into the data folder. Other than that, everything is the same.

@ramongallego

Works like a charm - tried it with the new Guppy 5 and the sup model for a 10.3 flowcell, and it processed ~1M samples/sec. Can I ask what --device cuda:0 does?

@sirselim (Author)

@ramongallego - great to hear!

The --device argument tells Guppy which GPU to use. If there are multiple GPUs in a system you can select any combination of them: 0 is always the first GPU, 1 the second, 2 the third, and so on. You can address them manually with '--device', or set it to 'auto' and Guppy will try to locate the cards automatically.
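
As a minimal sketch, reusing the example paths and fast config from earlier in the gist, the auto form looks like this:

%%shell
# same run as the fast-model example above, but letting Guppy find the GPU itself
./ont-guppy/bin/guppy_basecaller \
  -i gdrive/MyDrive/ONT/example_data \
  -s gdrive/MyDrive/ONT/example_data \
  --config ./ont-guppy/data/dna_r9.4.1_450bps_fast.cfg \
  --device auto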

@miquelupf

Dear Sirselim,

I used your scripts to basecall several large datasets - I estimate around 200 gigabytes of fast5 files. This would have taken about a month of basecalling on a MinION Mk1C with the high accuracy method.
I have no idea about programming, but it was very straightforward to follow and succeed with.
I see a limitation in Google Colab disconnecting when idle, and in the maximum runtime allowed; the Pro version improves this slightly, but the big difference comes with Pro+.
With a Tesla T4 GPU I could basecall 1260 fast5 files in 6h20', which compared to the MinION is extremely fast.

Thanks a lot for your service to the community!



vebaev commented Jun 3, 2023

Really works like a charm: compared to the 60 CPU cores on my server, which reached 10% of my data in 24h, on the T4 it was like 50 min!
I set up Amphetamine on my Mac to keep it awake and simulate mouse movements; let's see, I hope it will not disconnect, as it will probably take 7-8h…

After 6h (70%) I was disconnected with a message saying I had reached the GPU limit, and I cannot run and --resume basecalling; there is probably a cool-down time…


C-young-maker commented Aug 19, 2023

It worked so well!! I got lucky with the A100 :) I was processing 300 GB of data sequenced using the P2 (ONT).
I would also recommend doing this so it doesn't disconnect while waiting for your run to finish:

Set a JavaScript interval to click on the connect button every 60 seconds.

Open developer-settings (in your web-browser) with Ctrl+Shift+I then click on console tab and type this on the console prompt. (for mac press Option+Command+I)

function ClickConnect(){
  document.querySelector("colab-connect-button").click()
  console.log("Clicked on connect button");
}
setInterval(ClickConnect, 60000)

https://stackoverflow.com/questions/57113226/how-can-i-prevent-google-colab-from-disconnecting

@russellsmithies

Hi Miles, are you going to start on dorado benchmarking soon?
:-)
