Sean Smith sean-smith

sean-smith /
Last active February 14, 2023 20:01
BE CAREFUL. This removes buckets with the prefix you specify.
# Usage: bash bucket1
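# Note: grep matches "$1" anywhere in the listing line, not only at the start of the bucket name.
for bucket in $(aws s3 ls | grep "$1" | awk '{ print $3 }'); do
    echo "Deleting ${bucket}..."
    aws s3 rm --recursive "s3://${bucket}"
    aws s3 rb --force "s3://${bucket}"
done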
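Because `grep` matches anywhere in the `aws s3 ls` line (including the creation date), it can help to preview which buckets a strict prefix match would select before deleting anything. A minimal sketch, assuming the usual `<date> <time> <name>` listing format; `buckets_with_prefix` is a hypothetical helper, not part of the original script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only bucket names that START with the given prefix.
# Reads `aws s3 ls`-style lines ("<date> <time> <name>") on stdin.
buckets_with_prefix() {
    # Refuse an empty prefix so a missing argument can never select every bucket.
    [ -n "$1" ] || { echo "usage: buckets_with_prefix PREFIX" >&2; return 1; }
    # index($3, prefix) == 1 is true only when the name begins with the prefix.
    awk -v prefix="$1" 'index($3, prefix) == 1 { print $3 }'
}
```

For a dry run, pipe the real listing through it and inspect the result first: `aws s3 ls | buckets_with_prefix bucket1`.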
sean-smith /
Created December 21, 2022 02:34
Remove stuck licenses from LSTC License Manager

LSTC Remove Stuck Licenses

  1. Check status of licenses with lstc_qrun:
$ ./lstc_qrun
Defaulting to server 1 specified by LSTC_LICENSE_SERVER variable

                     Running Programs
sean-smith /
Last active June 21, 2022 23:54
Launch instances with AWS ParallelCluster All-or-nothing scaling

Enable All-or-Nothing Scaling with AWS ParallelCluster

All-or-nothing scaling is useful when you run MPI jobs that can't start until all N instances have joined the cluster.

Slurm launches instances on a best-effort basis: if you request 10 instances but only 9 are available, it provisions those 9 and keeps trying to get the last one. For jobs that can't start until all 10 instances are up, the 9 idle instances incur cost while they wait.

For example, if you submit a job like:

sbatch -N 10 ...
sean-smith /
Created June 7, 2022 14:46
Install older intel mpi versions

Intel MPI Versions

IntelMPI 2018.2

Download and install

tar -xzf l_mpi_2018.2.199.tgz
cd l_mpi_2018.2.199/
sudo ./
sean-smith /
Last active May 25, 2022 02:13
Setup Slurm Accounting with AWS ParallelCluster

Slurm Accounting with AWS ParallelCluster

In this tutorial we will work through setting up Slurm Accounting. This enables many features within Slurm, including job resource tracking, and provides a necessary building block for Slurm federation.

Step 1 - Set Up External Accounting Database


Dynamic Filesystems with AWS ParallelCluster

You can dynamically create a filesystem per job. This is useful for jobs that need a fast filesystem when you don't want to pay to keep that filesystem running 24/7.

To accomplish this without wasting compute time waiting for the filesystem to be created (~15 mins), we've separated the work into three jobs:

  1. Create the filesystem. This needs only a single EC2 instance and can run on the head node. Takes 8-15 minutes.
  2. Start the job, which mounts the filesystem before executing.
  3. Delete the filesystem.
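
One way to wire the three jobs together is with Slurm job dependencies, so that each step starts only after the previous one finishes. The sketch below is an assumption about how the chaining could look, not the author's scripts: `submit_fs_pipeline` and the three script names are hypothetical, while `--parsable` and `--dependency` are standard `sbatch` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit create -> compute -> delete as a dependency chain.
# Arguments are the paths of three job scripts (placeholders, not from the gist).
submit_fs_pipeline() {
    local create_job="$1" compute_job="$2" delete_job="$3"
    local create_id compute_id delete_id

    # --parsable makes sbatch print only the job ID (optionally "id;cluster").
    create_id=$(sbatch --parsable "$create_job") || return 1
    create_id=${create_id%%;*}

    # Run the compute job only if the filesystem was created successfully.
    compute_id=$(sbatch --parsable --dependency="afterok:${create_id}" "$compute_job") || return 1
    compute_id=${compute_id%%;*}

    # Delete the filesystem once the compute job ends, whether it succeeded or failed.
    delete_id=$(sbatch --parsable --dependency="afterany:${compute_id}" "$delete_job") || return 1
    delete_id=${delete_id%%;*}

    echo "create=${create_id} compute=${compute_id} delete=${delete_id}"
}
```

Example: `submit_fs_pipeline create-fs.sbatch run-job.sbatch delete-fs.sbatch`. If the create job fails, the dependent jobs never become eligible and may need to be cancelled by hand, depending on how the cluster is configured.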
sean-smith /
Last active May 15, 2022 18:33
Start CUDA MPS Server on each node

👾 Slurm CUDA MPS Prolog

The following Slurm Prolog starts the CUDA MPS server on each compute node before the job is started.

cat << EOF > /opt/slurm/etc/

# start mps
nvidia-cuda-mps-control -d

🚀 Wifi


So naturally the first thing I wanted to do when we got fiber internet was to rename the wifi network to something sexier than "CenturyLink0483". I decided on 🚀.

To do so I navigated to the router setup page, cringing at all the '90s tech it employs.

Then I added 🚀 and tried to update.

sean-smith /
Last active April 29, 2022 22:35
Setup user and project level tags in AWS ParallelCluster

AWS ParallelCluster Cost Explorer Tags

In a previous gist we discussed using Cost Explorer to see how much a cluster costs at the instance type and cluster level.

This gist describes how to get cost at the Project and User level.


1. Create a Policy


Slurm Failover from Spot to On-Demand

In AWS ParallelCluster you can set up a cluster with two queues, one for Spot pricing and one for On-Demand. When a job fails due to a Spot reclamation, you can automatically requeue that job to the On-Demand queue.

To set that up, first create a cluster with a Spot queue and an On-Demand queue:

- Name: od
      - Name: c6i-od-c6i32xlarge