Prom is a 36-node, DGX-based Slurm cluster. There are three main partitions:

  • main/batch: max 4 nodes per user
  • bigjob: max 16 nodes per user
  • backfill: no limits, but jobs run at lower priority and are pre-emptible
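As a quick sketch of how these limits come into play (the exact partition names and job.sh are assumptions, not taken from the source), the partition is selected at submission time with sbatch -p:

```bash
sbatch -p batch    -N 4  job.sh   # main/batch: up to 4 nodes per user
sbatch -p bigjob   -N 16 job.sh   # bigjob: up to 16 nodes per user
sbatch -p backfill -N 32 job.sh   # backfill: no node limit, lower priority, pre-emptible
```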

Below are two scripts, dask-scheduler.script and dask-cuda-worker.script. For the interactive workflows, I think we should do the following:

  1. Allocate a node for interactive use: salloc -N1 bash -- this allocates a node we can ssh into (the client)
  2. Start the scheduler and a set of dask-cuda workers: sbatch dask-scheduler.script -- the scheduler runs on the main/batch partition
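The two scripts themselves are not reproduced in this excerpt. Here are minimal sketches of what they might contain, assuming a shared filesystem for the scheduler file and the standard dask-scheduler and dask-cuda-worker CLIs; the partition names, node counts, and scheduler-file path are placeholders:

```bash
#!/usr/bin/env bash
# dask-scheduler.script -- minimal sketch (assumed contents)
#SBATCH -p batch              # scheduler on the main/batch partition
#SBATCH -N 1

# Publish connection info to a scheduler file on the shared filesystem
dask-scheduler --scheduler-file "$HOME/dask-scheduler.json"
```

```bash
#!/usr/bin/env bash
# dask-cuda-worker.script -- minimal sketch (assumed contents)
#SBATCH -p batch              # placeholder; bigjob/backfill for larger runs
#SBATCH -N 4                  # main/batch allows at most 4 nodes per user

# One task per node; dask-cuda-worker starts one worker per GPU on each node
srun --ntasks-per-node=1 dask-cuda-worker --scheduler-file "$HOME/dask-scheduler.json"
```

Once both jobs are running, a client on the interactively allocated node can connect through the same scheduler file, e.g. dask.distributed.Client(scheduler_file=...).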
pentschev / pynvml_query_memory.py
Last active April 28, 2020 08:22
Query used GPU memory with pynvml
import datetime
import getopt
import os
import sys
import time
import pynvml

def get_printable_util_mem(dev_count, peak_mem):
    # Assumed body (the gist is truncated here): per-GPU used memory with a
    # running peak; requires pynvml.nvmlInit() to have been called first
    lines = []
    for i in range(dev_count):
        used = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(i)).used
        peak_mem[i] = max(peak_mem[i], used)
        lines.append("GPU %d: %d MiB used, peak %d MiB" % (i, used >> 20, peak_mem[i] >> 20))
    return "\n".join(lines)
jrhemstad / ninja_instructions.md
Last active May 27, 2024 07:34
How to build with Ninja

How to Use Ninja

  1. Install Ninja:

sudo apt install ninja-build

  2. Configure CMake to create Ninja build files:

mkdir build && cd build
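The excerpt ends mid-step; a conventional way to finish configuring and building (assumed here, not quoted from the gist) is:

```bash
cmake .. -GNinja   # generate build.ninja instead of Makefiles
ninja              # build using the generated Ninja files
```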