@jvfe
Forked from iaradsouza1/Nextflow_on_NPAD_UFRN.md
Last active May 28, 2024 17:14
Run Nextflow pipelines and RStudio at NPAD local cluster

This is a basic guide to the configuration needed to run nf-core/Nextflow pipelines on the NPAD cluster.

It assumes your analysis will need more than 9 GB of storage (your home quota), so you will use the scratch disk to store the results and the Singularity images.

0. Install conda and nf-core

I recommend using conda to install Nextflow and nf-core.

Download and install conda: https://docs.conda.io/en/latest/miniconda.html

Install nf-core with conda: https://nf-co.re/docs/usage/installation#bioconda-installation

Make sure to install them to your scratch directory.
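The two installs above can be scripted roughly as follows. This is a sketch: the Miniconda installer URL is the current generic one, the environment name nf-core is an example, and the install is guarded so it only runs where a /scratch disk exists.

```shell
# Install Miniconda under your scratch directory (not your 9 GB home),
# then create an environment with nextflow and nf-core from Bioconda.
# "/scratch/${USER}" corresponds to the example paths used in this guide.
SCRATCH_DIR="/scratch/${USER}"
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"

# Only run the actual install on the cluster, where /scratch exists.
if [ -d /scratch ]; then
    wget "https://repo.anaconda.com/miniconda/${INSTALLER}"
    bash "${INSTALLER}" -b -p "${SCRATCH_DIR}/miniconda3"
    "${SCRATCH_DIR}/miniconda3/bin/conda" create -y -n nf-core \
        -c bioconda -c conda-forge nextflow nf-core
fi
echo "install prefix: ${SCRATCH_DIR}/miniconda3"
```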

1. Create a directory to be used as singularity cache in your scratch

Complex pipelines need several Singularity images, which you should download beforehand into a cache directory. Create one in your scratch:

mkdir -p /scratch/iddsouza/singularity_images/cache

2. Export env variables to your local profile

Add the following variables to your ~/.bashrc file. Also load the Singularity module by default, so you won't forget to load it when you run the pipeline:

export SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images/cache'
export NXF_SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images'
module load singularity

After modifying the ~/.bashrc file, reload it with source ~/.bashrc.
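One way to script this step. This is a sketch with my own variation: the exports go into a separate snippet file (which you can source from ~/.bashrc), and the module load is guarded so the snippet is safe to test off the cluster. The paths follow the example user iddsouza.

```shell
# Write the cache variables to a snippet file, then load it into the
# current session; add ". ~/.nextflow_env" to ~/.bashrc to make it permanent.
SNIPPET="${HOME}/.nextflow_env"
cat > "${SNIPPET}" <<'EOF'
export SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images/cache'
export NXF_SINGULARITY_CACHEDIR='/home/iddsouza/scratch/singularity_images'
EOF
. "${SNIPPET}"

# Load the Singularity module when available (it only exists on the cluster).
if command -v module >/dev/null 2>&1; then
    module load singularity
fi
echo "NXF cache: ${NXF_SINGULARITY_CACHEDIR}"
```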

3. Download the singularity images

After creating the directories, download the images. This example is for the scrnaseq pipeline, version 2.4.0; change the command to match your pipeline:

nf-core download --container-system singularity --container-cache-utilisation amend -r 2.4.0 -p 5 nf-core/scrnaseq
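After the download finishes, you can sanity-check that the images actually landed in the cache. A sketch, assuming the NXF_SINGULARITY_CACHEDIR variable from step 2:

```shell
# List the Singularity images in the cache directory; an empty listing
# means the download did not go where Nextflow will look for the images.
CACHE="${NXF_SINGULARITY_CACHEDIR:-/scratch/${USER}/singularity_images}"
if [ -d "${CACHE}" ]; then
    find "${CACHE}" -name '*.img' -o -name '*.sif'
else
    echo "cache directory ${CACHE} does not exist yet"
fi
```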

4. Change the ~/.nextflow/config file

If this file is not present in your ~/.nextflow/ directory, create it.

This is an example of my ~/.nextflow/config file:

singularity {
  autoMounts = true
}

executor {
  queueSize = 30
  submitRateLimit = '10/1min'
}

process {
  executor = 'slurm'
  queue = 'intel-512'
  clusterOptions = '--qos=preempt'
  time = '24h'
  maxRetries = 3
  errorStrategy = { task.exitStatus in [125,139] ? 'retry' : 'finish' }
}


The essential settings are singularity { autoMounts = true } and, inside the process scope, executor = 'slurm' and clusterOptions = '--qos=preempt'. These make Nextflow submit each process to the compute nodes through SLURM.

5. Create a screen to run the pipeline

Organize the files needed for your run in your scratch directory. Then create a screen session to run the pipeline in, for example with screen -S analysis. Inside the screen session, activate the nf-core environment.

If you prefer to use Nextflow Tower, add the token to your ~/.bashrc file. (https://help.tower.nf/22.2/getting-started/usage/#nextflow-with-tower)

Don't forget to point --outdir at the scratch disk.

nextflow run nf-core/scrnaseq -profile test,singularity --outdir /scratch/iddsouza/scrnaseq_results
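Putting step 5 together as one script. This is a sketch, not the exact workflow above: the detached screen -d -m form replaces the interactive session, the environment name nf-core and the output path are examples, and -resume lets an interrupted run continue from cached results.

```shell
# Launch the pipeline inside a detached screen session on the cluster;
# reattach later with "screen -r analysis".
OUTDIR="/scratch/${USER}/scrnaseq_results"   # always write results to scratch
if command -v nextflow >/dev/null 2>&1 && command -v screen >/dev/null 2>&1; then
    screen -S analysis -d -m bash -lc "
        conda activate nf-core
        nextflow run nf-core/scrnaseq -r 2.4.0 -profile singularity \
            --outdir ${OUTDIR} -resume
    "
fi
echo "results directory: ${OUTDIR}"
```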

Some important notes:

  • If you don't have an account on the supercomputer, talk to your advisor so they can request NPAD access for you.

  • By default, the attached script allocates a random node of the supercomputer exclusively (#SBATCH --exclusive). If you don't need a node exclusively, remove that line from the script, so that everyone can use the supercomputer's resources efficiently.

  • By default the script uses the intel-128 nodes, which have 128 GB of RAM (http://npad.ufrn.br/npad/hardware).

  • If you run on a non-exclusive node and someone on the same node is already using the Jupyter or RStudio port, you will have to change the port.

  • By default, the script below keeps the RStudio session running for up to 24 hours.

Instructions

  1. Copy the run_rstudio.sh script (below) to your working directory on the supercomputer.

  2. Activate your conda environment with the R packages:

conda activate intro-single-cell

  3. Run the script on the supercomputer:

sbatch run_rstudio.sh

The run above will generate a file called saida.txt containing the ssh command that creates a tunnel from your local machine to the RStudio session on the supercomputer. In a terminal on your local machine, run that command:

ssh -p 4422 -N -L 8787:${HOSTNAME}:${PORT} $USER@sc2.npad.ufrn.br

Note: the terminal running this tunnel will stay blocked; keep it open.

  4. Open your browser at http://localhost:8787/

  5. Enter your supercomputer username and password (both are in the saida.txt file).


To list your jobs: squeue -u $USER

To kill a job: scancel <JOB_ID>
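The two commands above can be combined into a small status check. A sketch; saida.txt is the output file the script below writes, and squeue is guarded so the snippet is safe to run off the cluster:

```shell
# Show this user's queued/running jobs and, once the RStudio job has
# started, print the ssh tunnel command it wrote to saida.txt.
JOB_OUTPUT="saida.txt"
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER}"
fi
if [ -f "${JOB_OUTPUT}" ]; then
    grep 'ssh' "${JOB_OUTPUT}"
fi
echo "job log: ${JOB_OUTPUT}"
```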

#!/bin/bash
#SBATCH --time=01-00:00:00
#SBATCH --partition=intel-128
#SBATCH --signal=USR2
#SBATCH --exclusive
#SBATCH --output=saida.txt
module load singularity
# Temporary directory for RStudio's files
basescratch=~/scratch
workdir=$basescratch/rstudio
echo workdir: ${workdir}
mkdir -p -m 700 ${workdir}/run ${workdir}/tmp ${workdir}/var/lib/rstudio-server
cat > ${workdir}/database.conf <<END
provider=sqlite
directory=/var/lib/rstudio-server
END
# Set OMP_NUM_THREADS to prevent OpenBLAS (and any other OpenMP-enhanced
# libraries used by R) from spawning more threads than the number of processors
# allocated to the job.
#
# Set R_LIBS_USER to the conda environment's R library, so packages are not
# mixed with personal libraries from any other R installation in the host
# environment.
cat > ${workdir}/rsession.sh <<END
#!/bin/sh
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}
export R_LIBS_USER=${CONDA_PREFIX}/lib/R/library
exec /usr/lib/rstudio-server/bin/rsession "\${@}"
END
chmod +x ${workdir}/rsession.sh
# Below, insert the scratch folder you want to bind-mount into the container.
# In my case it is "${basescratch}/intro-single-cell:/home/jvfcavalcante/intro-single-cell"
export SINGULARITY_BIND="${workdir}/run:/run,${basescratch}/intro-single-cell:/home/jvfcavalcante/intro-single-cell,${workdir}/tmp:/tmp,${workdir}/database.conf:/etc/rstudio/database.conf,${workdir}/rsession.sh:/etc/rstudio/rsession.sh,${workdir}/var/lib/rstudio-server:/var/lib/rstudio-server"
# Keep the session from being suspended
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0
export SINGULARITYENV_USER=$(id -un)
export SINGULARITYENV_PASSWORD=12345 # change this default password
readonly PORT=8787
cat 1>&2 <<END
1. Command to create the SSH tunnel from your local machine:
ssh -p 4422 -N -L 8787:${HOSTNAME}:${PORT} $USER@sc2.npad.ufrn.br
2. Then open your browser at http://localhost:8787
user: ${SINGULARITYENV_USER}
password: ${SINGULARITYENV_PASSWORD}
END
singularity exec --cleanenv /opt/npad/shared/containers/rstudio-server.sif \
/usr/lib/rstudio-server/bin/rserver --www-port ${PORT} \
--auth-none=0 \
--auth-pam-helper-path=pam-helper \
--auth-stay-signed-in-days=30 \
--auth-timeout-minutes=0 \
--rsession-path=/etc/rstudio/rsession.sh \
--server-user ${SINGULARITYENV_USER}
printf 'rserver exited' 1>&2