Commonly used commands for high-performance cluster computing with the Slurm workload manager
Published:
SLURM is one of the most common job schedulers on high-performance computing (HPC) clusters. Here, I summarize the SLURM commands I find most useful.
Example usage of SLURM commands
Job submission
sbatch -C "CPU_GEN:HSW|CPU_GEN:BDW|CPU_GEN:SKX" <job script>
: AVX2 support started with Haswell on Intel CPUs. To target Haswell or any later CPU generation (here HSW = Haswell, BDW = Broadwell, SKX = Skylake), we can pass a constraint in which "|" means OR. This assumes the cluster admin has configured these feature names on each node.
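If the list of acceptable generations changes often, the OR-constraint can be built by a small helper instead of typed by hand. This is a minimal POSIX shell sketch; the function name cpu_gen_constraint is hypothetical, and the CPU_GEN:<code> feature names are the ones used above, which only work if your admin tagged the nodes that way (sinfo -o "%20N %f" lists node names with their features).

```shell
#!/bin/sh
# Hypothetical helper: join CPU generation codes into a Slurm OR-constraint,
# e.g. "HSW BDW SKX" -> "CPU_GEN:HSW|CPU_GEN:BDW|CPU_GEN:SKX".
# Assumption: nodes carry features named CPU_GEN:<code> (cluster-specific).
cpu_gen_constraint() {
  local result="" gen
  for gen in "$@"; do
    # Prepend "|" only when something has already been added.
    result="${result:+${result}|}CPU_GEN:${gen}"
  done
  printf '%s\n' "$result"
}

# Usage (job.sh is a placeholder for your batch script):
#   sbatch -C "$(cpu_gen_constraint HSW BDW SKX)" job.sh
```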
scontrol hold <job ID>
scontrol release <job ID>
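To hold or release many jobs at once, one option is to loop over the IDs that squeue reports. A minimal POSIX shell sketch, assuming the hypothetical helper name bulk_hold_release: it lists a user's pending jobs with squeue (-r prints one line per array task, -h drops the header, -o '%i' prints only the job ID) and runs scontrol hold or scontrol release on each one.

```shell
#!/bin/sh
# Hypothetical helper: apply "scontrol hold" or "scontrol release" to every
# pending job of a user (defaults to the current user).
bulk_hold_release() {
  local action="$1" user="${2:-$USER}" job_id
  case "$action" in
    hold|release) ;;
    *) echo "usage: bulk_hold_release hold|release [user]" >&2; return 2 ;;
  esac
  # Held jobs are still in the PENDING state, so the same query serves both.
  squeue -r -u "$user" -t PENDING -h -o '%i' | while read -r job_id; do
    [ -n "$job_id" ] && scontrol "$action" "$job_id"
  done
}

# Usage:
#   bulk_hold_release hold
#   bulk_hold_release release
```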
Update submitted jobs
scontrol update job <job ID> JobName="newJobName"
scontrol update job <job ID> Dependency=afterany:<previous job ID>
scontrol update job <job ID> partition=<job partitions>
scontrol update job <job ID> ExcNodeList=<node name list>
scontrol update job <job ID> MinMemoryCPU=8000
scontrol update job <job ID> TimeLimit=7-00:00:00
scontrol update job <job ID> ArrayTaskThrottle=0
These updates are meant for pending jobs: once a job is running, you can no longer change its resource requirements.
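The Dependency update is handy when several jobs were submitted without any ordering. A minimal POSIX shell sketch, using the hypothetical helper name chain_jobs, that makes each listed job wait for the previous one with the same scontrol update job ... Dependency=afterany:... form as above (afterany releases the next job once the previous one ends, whatever its exit state). Like the other updates, it only makes sense for jobs that are still pending.

```shell
#!/bin/sh
# Hypothetical helper: chain jobs so that job N+1 starts after job N ends.
# Arguments are job IDs in the desired execution order.
chain_jobs() {
  local prev="" id
  for id in "$@"; do
    if [ -n "$prev" ]; then
      scontrol update job "$id" Dependency="afterany:${prev}"
    fi
    prev="$id"
  done
}

# Usage (placeholder IDs):
#   chain_jobs 1001 1002 1003
```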
Resource usage
scontrol show job <job ID>
seff <job ID>
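sacct -j <job ID>
: show the accounting record of a job (state, exit code, etc.). For time and memory usage, use the long format (-l) below; MaxRSS is the peak memory usage (maximum resident set size) of each job step.
sacct -l -j <job ID>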
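MaxRSS is reported per job step with a unit suffix, which makes steps awkward to compare by eye. The following is a minimal sketch that converts the pipe-delimited output of sacct into MiB; maxrss_to_mib is a hypothetical helper, and it assumes the K/M/G/T suffixes that sacct prints by default. Treating a bare number as bytes is an additional assumption; check it against your Slurm version.

```shell
#!/bin/sh
# Hypothetical helper: read "JobID|MaxRSS" lines (sacct --parsable2 output)
# and print "JobID<TAB>MaxRSS in MiB". Empty MaxRSS (e.g. the allocation
# line, which usually has no value) is shown as n/a.
maxrss_to_mib() {
  awk -F'|' '{
    v = $NF; mib = ""
    if (v ~ /K$/)      mib = substr(v, 1, length(v) - 1) / 1024
    else if (v ~ /M$/) mib = substr(v, 1, length(v) - 1) + 0
    else if (v ~ /G$/) mib = substr(v, 1, length(v) - 1) * 1024
    else if (v ~ /T$/) mib = substr(v, 1, length(v) - 1) * 1048576
    else if (v ~ /^[0-9.]+$/) mib = v / 1048576   # assumption: bare = bytes
    printf "%s\t%s\n", $1, (mib == "" ? "n/a" : sprintf("%.1f", mib))
  }'
}

# Usage:
#   sacct -j <job ID> --noheader --parsable2 --format=JobID,MaxRSS | maxrss_to_mib
```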
sacct --starttime 2020-08-01 --
: This is useful for checking your job history, i.e. all jobs since the given start date. Please see the UBCCR page in the references for more details.
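When reviewing job history, a quick count of how your jobs ended can point to recurring failures. A minimal POSIX shell sketch; job_state_summary is a hypothetical helper name. It asks sacct for one line per job allocation (-X skips the individual steps), keeps only the first word of the State column (so "CANCELLED by 1001" counts as CANCELLED), and counts each state.

```shell
#!/bin/sh
# Hypothetical helper: count your jobs by final state since a start date.
# By default sacct reports only the invoking user's jobs.
job_state_summary() {
  local since="${1:?usage: job_state_summary YYYY-MM-DD}"
  sacct -X --noheader --parsable2 --starttime "$since" --format=State \
    | awk '{ print $1 }' \
    | sort | uniq -c | sort -rn
}

# Usage:
#   job_state_summary 2020-08-01
```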