Created June 23, 2020 13:59
Gist: hungyiwu/444d1e8baeccfa58b133b141f5c1a7d6
Handy one-line command to get the IDs of timed-out SLURM job array tasks for re-running
#!/bin/bash
# Use case:
# You ran a job array in SLURM with something like `sbatch --array=1-300 run.sh`,
# ...checked the error logs with `tail -n 1 *.err` and saw that many tasks timed out,
# ...and would like to re-run those tasks with a higher time limit, but first need their task IDs.
# Instead of jotting down the IDs by scanning the terminal by eye,
# or writing yet another Python script to parse the error logs,
# you can use this one-line command:
sacct -j [JOB-ID] -s to --brief \
  | grep TIMEOUT \
  | cut -d ' ' -f 1 \
  | cut -d '_' -f 2 \
  | paste -sd ','
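# Explanation:
# sacct: `-j [JOB-ID]` filters by job ID and `-s to` filters by job state (`to` is short for TIMEOUT);
#   `--brief` gives cleaner output (only the JobID, State and ExitCode columns).
# grep: `sacct -s to` also returns lines in state `CANCELLED` (probably the job steps, e.g. `11166431_247.batch`,
#   which sacct lists separately), so this extra filter keeps only the TIMEOUT lines.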
# At this point the output looks like:
# ```
# 11166431_247 TIMEOUT 0:0
# 11166431_249 TIMEOUT 0:0
# ```
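# An alternative for this step (a minimal sketch, not the original command): assuming
# the CANCELLED lines are job steps, `-X` (`--allocations`) should list only one line
# per array task. `-n` drops the header, `-P` prints `|`-separated fields without
# padding, and `-o JobID,State` selects the columns, so awk can filter on the state
# and strip the job ID prefix without depending on column widths.
# `timed_out_ids_parsable` is a hypothetical helper name.
timed_out_ids_parsable() {
  # $1: the array job ID, e.g. 11166431
  sacct -j "$1" -X -n -P -o JobID,State -s to \
    | awk -F '|' '$2 == "TIMEOUT" { sub(/^[^_]*_/, "", $1); print $1 }' \
    | paste -sd ',' -
}
# Example:
# ```
# timed_out_ids_parsable 11166431
# ```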
# cut: `-d ' '` splits each line on the space delimiter and `-f 1` keeps only the first field,
# which gives
# ```
# 11166431_247
# 11166431_249
# ```
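# A whitespace-tolerant variant of this step (a sketch that swaps `cut` for `awk`):
# `cut -d ' '` treats every single space as a delimiter, so if a line ever started
# with padding spaces, field 1 would be empty. awk's default field splitting skips
# leading blanks and treats runs of blanks as one separator.
# `first_column` is a hypothetical helper name.
first_column() {
  # print the first whitespace-separated field of each input line
  awk '{ print $1 }'
}
# Example:
# ```
# printf '  11166431_247    TIMEOUT      0:0\n' | first_column
# 11166431_247
# ```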
# `-d '_' -f 2` splits each line on the underscore and keeps only the second field (the task ID),
# which gives
# ```
# 247
# 249
# ```
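# The same split can be done in pure bash with parameter expansion instead of an
# external `cut` (a sketch; `strip_job_prefix` is a hypothetical helper name).
# `${id#*_}` removes the shortest prefix matching `*_`, i.e. the array job ID and
# its underscore.
strip_job_prefix() {
  local id
  # read one JobID per line and print what follows the first underscore
  while IFS= read -r id; do
    printf '%s\n' "${id#*_}"
  done
}
# Example:
# ```
# printf '11166431_247\n11166431_249\n' | strip_job_prefix
# 247
# 249
# ```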
# paste: `-s` joins all lines into a single line and `-d ','` uses a comma as the delimiter,
# which gives
# ```
# 247,249
# ```
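# When many consecutive tasks time out, the comma list can get long. `sbatch --array`
# also accepts ranges (e.g. `--array=1-3,7`), so a small awk filter can collapse runs
# of consecutive IDs. A sketch, assuming one numeric task ID per line as input;
# `to_array_ranges` is a hypothetical helper name.
to_array_ranges() {
  # sort numerically first so consecutive IDs end up adjacent
  sort -n | awk '
    NR == 1 { start = prev = $1; next }
    $1 == prev + 1 { prev = $1; next }
    {
      # the current run ended: append it as "N" or "N-M", then start a new run
      out = out sep (start == prev ? start : start "-" prev); sep = ","
      start = prev = $1
    }
    END {
      # flush the last run (prints nothing for empty input)
      if (NR > 0) print out sep (start == prev ? start : start "-" prev)
    }
  '
}
# Example:
# ```
# printf '247\n248\n249\n251\n' | to_array_ranges
# 247-249,251
# ```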
# Now you can raise the time limit in the original job script (e.g. `run.sh`) and copy-paste these task IDs to re-run them,
# with something like `sbatch --array=247,249 run.sh`
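# Instead of editing `run.sh`, the new limit can also be passed on the command line,
# since sbatch options given there take precedence over the `#SBATCH` lines in the
# script. A sketch wrapping the whole flow; the function name, its arguments and the
# example values are hypothetical.
rerun_timed_out() {
  # $1: array job ID, $2: new time limit (e.g. 04:00:00), $3: job script
  local job_id=$1 new_time=$2 script=$3 ids
  ids=$(sacct -j "$job_id" -s to --brief \
    | grep TIMEOUT \
    | cut -d ' ' -f 1 \
    | cut -d '_' -f 2 \
    | paste -sd ',' -)
  if [ -z "$ids" ]; then
    echo "no timed-out tasks found for job $job_id" >&2
    return 1
  fi
  sbatch --time="$new_time" --array="$ids" "$script"
}
# Example (hypothetical values):
# ```
# rerun_timed_out 11166431 04:00:00 run.sh
# ```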