@jvfe
Forked from iaradsouza1/Nextflow_on_NPAD_UFRN.md
Last active May 28, 2024 17:14
Run Nextflow pipelines and RStudio at NPAD local cluster

This is a basic guide to the configuration needed to run nf-core/Nextflow pipelines on the NPAD cluster.

It assumes your analysis will need more than 9 GB of storage (your home quota), so you will use the scratch disk to store the results and the Singularity images.

0. Install conda and nf-core

I recommend using conda to install Nextflow and nf-core.

Download and install conda: https://docs.conda.io/en/latest/miniconda.html

Install nf-core with conda: https://nf-co.re/docs/usage/installation#bioconda-installation

Make sure to install them to your scratch directory.
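The two installs above can be scripted roughly as follows. This is a sketch: the Miniconda installer URL is the current generic one, the environment name nf-core is an example, and the install is guarded so it only runs where a /scratch disk exists.

```shell
# Install Miniconda under your scratch directory (not your 9 GB home),
# then create an environment with nextflow and nf-core from Bioconda.
# "/scratch/${USER}" corresponds to the example paths used in this guide.
SCRATCH_DIR="/scratch/${USER}"
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"

# Only run the actual install on the cluster, where /scratch exists.
if [ -d /scratch ]; then
    wget "https://repo.anaconda.com/miniconda/${INSTALLER}"
    bash "${INSTALLER}" -b -p "${SCRATCH_DIR}/miniconda3"
    "${SCRATCH_DIR}/miniconda3/bin/conda" create -y -n nf-core \
        -c bioconda -c conda-forge nextflow nf-core
fi
echo "install prefix: ${SCRATCH_DIR}/miniconda3"
```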

1. Create a directory to be used as singularity cache in your scratch

Complex pipelines need several Singularity images, which you should download beforehand into a cache directory. Create one in your scratch:

mkdir -p /scratch/iddsouza/singularity_images/cache

2. Export env variables to your local profile

Add the following variables to your ~/.bashrc file. Also load the Singularity module by default, so you won't forget to load it when you run the pipeline:

export SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images/cache'
export NXF_SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images'
module load singularity

After modifying the ~/.bashrc file, reload it with source ~/.bashrc.
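One way to script this step. This is a sketch with my own variation: the exports go into a separate snippet file (which you can source from ~/.bashrc), and the module load is guarded so the snippet is safe to test off the cluster. The paths follow the example user iddsouza.

```shell
# Write the cache variables to a snippet file, then load it into the
# current session; add ". ~/.nextflow_env" to ~/.bashrc to make it permanent.
SNIPPET="${HOME}/.nextflow_env"
cat > "${SNIPPET}" <<'EOF'
export SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images/cache'
export NXF_SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images'
EOF
. "${SNIPPET}"

# Load the Singularity module when available (it only exists on the cluster).
if command -v module >/dev/null 2>&1; then
    module load singularity
fi
echo "NXF cache: ${NXF_SINGULARITY_CACHEDIR}"
```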

3. Download the singularity images

After creating the directories, download the images. This example is for the scrnaseq pipeline, version 2.4.0; change the command to match your pipeline:

nf-core download --container-system singularity --container-cache-utilisation amend -r 2.4.0 -p 5 nf-core/scrnaseq
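After the download finishes, you can sanity-check that the images actually landed in the cache. A sketch, assuming the NXF_SINGULARITY_CACHEDIR variable from step 2:

```shell
# List the Singularity images in the cache directory; an empty listing
# means the download did not go where Nextflow will look for the images.
CACHE="${NXF_SINGULARITY_CACHEDIR:-/scratch/${USER}/singularity_images}"
if [ -d "${CACHE}" ]; then
    find "${CACHE}" -name '*.img' -o -name '*.sif'
else
    echo "cache directory ${CACHE} does not exist yet"
fi
```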

4. Change the ~/.nextflow/config file

If this file is not present in your ~/.nextflow/ directory, create it.

This is an example of my ~/.nextflow/config file:

singularity {
  autoMounts = true
}

executor {
  queueSize = 30
  submitRateLimit = '10/1min'
}

process {
  executor = 'slurm'
  queue = 'intel-512'
  clusterOptions = '--qos=preempt'
  time = '24h'
  maxRetries = 3
  errorStrategy = { task.exitStatus in [125,139] ? 'retry' : 'finish' }
}


The essential settings are singularity { autoMounts = true } and, inside the process scope, executor = 'slurm' and clusterOptions = '--qos=preempt'. These make Nextflow submit each process to the compute nodes through SLURM.

5. Create a screen to run the pipeline

Organize the files needed for your run in your scratch directory. Then create a screen session to run the pipeline in, for example with screen -S analysis. Inside the screen session, activate the nf-core environment.

If you prefer to use Nextflow Tower, add the token to your ~/.bashrc file. (https://help.tower.nf/22.2/getting-started/usage/#nextflow-with-tower)

Don't forget to point --outdir at the scratch disk.

nextflow run nf-core/scrnaseq -profile test,singularity --outdir /scratch/iddsouza/scrnaseq_results
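Putting step 5 together as one script. This is a sketch, not the exact workflow above: the detached screen -d -m form replaces the interactive session, the environment name nf-core and the output path are examples, and -resume lets an interrupted run continue from cached results.

```shell
# Launch the pipeline inside a detached screen session on the cluster;
# reattach later with "screen -r analysis".
OUTDIR="/scratch/${USER}/scrnaseq_results"   # always write results to scratch
if command -v nextflow >/dev/null 2>&1 && command -v screen >/dev/null 2>&1; then
    screen -S analysis -d -m bash -lc "
        conda activate nf-core
        nextflow run nf-core/scrnaseq -r 2.4.0 -profile singularity \
            --outdir ${OUTDIR} -resume
    "
fi
echo "results directory: ${OUTDIR}"
```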

Some important notes:

  • If you don't have an account on the supercomputer, talk to your advisor so they can request NPAD access for you.

  • By default, the attached script allocates a random node of the supercomputer exclusively (#SBATCH --exclusive). If you don't need a node exclusively, remove that line from the script, so that everyone can use the supercomputer's resources efficiently.

  • By default the script uses the intel-128 nodes, which have 128 GB of RAM (http://npad.ufrn.br/npad/hardware).

  • If you run on a non-exclusive node and someone on the same node is already using the Jupyter or RStudio port, you will have to change the port.

  • By default, the script below keeps the RStudio session running for up to 24 hours.

Instructions

  1. Copy the run_rstudio.sh script (below) to your working directory on the supercomputer.

  2. Activate your conda environment with the R packages:

conda activate intro-single-cell

  3. Run the script on the supercomputer:

sbatch run_rstudio.sh

The run above will generate a file called saida.txt containing the ssh command that creates a tunnel from your local machine to the RStudio session on the supercomputer. In a terminal on your local machine, run that command:

ssh -p 4422 -N -L 8787:${HOSTNAME}:${PORT} $USER@sc2.npad.ufrn.br

Note: the terminal running this tunnel will stay blocked; keep it open.

  4. Open your browser at http://localhost:8787/

  5. Enter your supercomputer username and password (both are in the saida.txt file).


To list your jobs: squeue -u $USER

To kill a job: scancel <JOB_ID>
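The two commands above can be combined into a small status check. A sketch; saida.txt is the output file the script below writes, and squeue is guarded so the snippet is safe to run off the cluster:

```shell
# Show this user's queued/running jobs and, once the RStudio job has
# started, print the ssh tunnel command it wrote to saida.txt.
JOB_OUTPUT="saida.txt"
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER}"
fi
if [ -f "${JOB_OUTPUT}" ]; then
    grep 'ssh' "${JOB_OUTPUT}"
fi
echo "job log: ${JOB_OUTPUT}"
```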

#!/bin/bash
#SBATCH --time=01-00:00:00
#SBATCH --partition=intel-128
#SBATCH --signal=USR2
#SBATCH --exclusive
#SBATCH --output=saida.txt
module load singularity
# Temporary directory for RStudio's files
basescratch=~/scratch
workdir=$basescratch/rstudio
echo workdir: ${workdir}
mkdir -p -m 700 ${workdir}/run ${workdir}/tmp ${workdir}/var/lib/rstudio-server
cat > ${workdir}/database.conf <<END
provider=sqlite
directory=/var/lib/rstudio-server
END
# Set OMP_NUM_THREADS to prevent OpenBLAS (and any other OpenMP-enhanced
# libraries used by R) from spawning more threads than the number of processors
# allocated to the job.
#
# Set R_LIBS_USER to the conda environment's R library, so packages are not
# mixed with personal libraries from any other R installation in the host
# environment.
cat > ${workdir}/rsession.sh <<END
#!/bin/sh
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}
export R_LIBS_USER=${CONDA_PREFIX}/lib/R/library
exec /usr/lib/rstudio-server/bin/rsession "\${@}"
END
chmod +x ${workdir}/rsession.sh
# Below, insert the scratch folder you want to bind-mount into the container.
# In my case it is "${basescratch}/intro-single-cell:/home/jvfcavalcante/intro-single-cell"
export SINGULARITY_BIND="${workdir}/run:/run,${basescratch}/intro-single-cell:/home/jvfcavalcante/intro-single-cell,${workdir}/tmp:/tmp,${workdir}/database.conf:/etc/rstudio/database.conf,${workdir}/rsession.sh:/etc/rstudio/rsession.sh,${workdir}/var/lib/rstudio-server:/var/lib/rstudio-server"
# Keep the session from being suspended
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0
export SINGULARITYENV_USER=$(id -un)
export SINGULARITYENV_PASSWORD=12345 # change this default password
readonly PORT=8787
cat 1>&2 <<END
1. Command to create the SSH tunnel from your local machine:
ssh -p 4422 -N -L 8787:${HOSTNAME}:${PORT} $USER@sc2.npad.ufrn.br
2. Then open your browser at http://localhost:8787
user: ${SINGULARITYENV_USER}
password: ${SINGULARITYENV_PASSWORD}
END
singularity exec --cleanenv /opt/npad/shared/containers/rstudio-server.sif \
/usr/lib/rstudio-server/bin/rserver --www-port ${PORT} \
--auth-none=0 \
--auth-pam-helper-path=pam-helper \
--auth-stay-signed-in-days=30 \
--auth-timeout-minutes=0 \
--rsession-path=/etc/rstudio/rsession.sh \
--server-user ${SINGULARITYENV_USER}
printf 'rserver exited' 1>&2