@sean-smith
Created October 17, 2023 17:50

Disable Protected Mode in AWS ParallelCluster

If your cluster fails to launch instances 10 times in a row (the default value of protected_failure_count), it automatically goes into PROTECTED mode. This disables instance provisioning until the compute fleet is restarted.
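Restarting the compute fleet can be scripted. The following is a minimal sketch, assuming the AWS ParallelCluster 3 `pcluster` CLI is installed and configured wherever you run it; the function name `restart_compute_fleet` and the cluster name `my-cluster` are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Ask ParallelCluster to start the compute fleet again, which takes the
# cluster out of PROTECTED mode.
# Assumption: ParallelCluster 3 CLI syntax (pcluster update-compute-fleet).
restart_compute_fleet() {
    local cluster_name="$1"
    pcluster update-compute-fleet \
        --cluster-name "$cluster_name" \
        --status START_REQUESTED
}

# Usage ("my-cluster" is a placeholder for your cluster's name):
#   restart_compute_fleet my-cluster
```

If the underlying launch failures are not fixed first, the cluster will likely end up in PROTECTED mode again.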

When the cluster is in PROTECTED mode, sinfo shows inact in the AVAIL column for the affected queue:

[ec2-user@ip-10-0-0-98 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
default*       inact   infinite      2  idle~ spot-dy-compute-[1-100]
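This check can also be scripted. The following is a minimal sketch, assuming Slurm's `sinfo` is on the PATH of the head node; the function name `inactive_partitions` is hypothetical. It reads "partition availability" pairs on stdin and prints the partitions whose availability is `inact`.

```shell
#!/usr/bin/env bash
# Print the names of partitions whose availability is "inact".
# Reads lines of the form "<partition> <avail>" on stdin, as produced by
#   sinfo --noheader --format="%P %a"
inactive_partitions() {
    # $1 = partition name (a trailing "*" marks the default partition)
    # $2 = availability (up, down, drain, or inact)
    awk '$2 == "inact" { sub(/\*$/, "", $1); print $1 }'
}

# Usage on the head node:
#   sinfo --noheader --format="%P %a" | inactive_partitions
```

An empty result means no queue is currently reported as inactive.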

How to Disable

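To disable protected mode, set the protected_failure_count parameter to 0 in the clustermgtd configuration file on the head node. This parameter is the number of launch failures at which the cluster goes into protected mode; a value of 0 disables protected mode entirely.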

sudo su -
echo "protected_failure_count = 0" >> /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf
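Because `>>` always appends, running the command above more than once leaves duplicate entries in the file. The following is a minimal sketch of an idempotent alternative, assuming GNU `sed` (as shipped on Amazon Linux); the function name `set_protected_failure_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Set "protected_failure_count = <value>" in a clustermgtd-style config file,
# replacing an existing entry instead of appending a duplicate.
# Assumption: GNU sed, for the -i (in-place) and -E options.
set_protected_failure_count() {
    local conf="$1" value="$2"
    if grep -Eq '^[[:space:]]*protected_failure_count[[:space:]]*=' "$conf"; then
        # Key already present: rewrite that line in place.
        sed -i -E "s/^[[:space:]]*protected_failure_count[[:space:]]*=.*/protected_failure_count = ${value}/" "$conf"
    else
        # Key absent: append it, as the original command does.
        echo "protected_failure_count = ${value}" >> "$conf"
    fi
}

# Usage (as root on the head node):
#   set_protected_failure_count /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf 0
```

Passing a positive number instead of 0 sets a new failure limit rather than disabling protected mode.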

See Protected Mode docs for more information.
