Commonly used commands in high performance cluster computing with Slurm workload manager
Published:
SLURM is one of the most common job schedulers used on high performance computing (HPC) cluster servers. Here, I summarize useful SLURM commands.
sbatch -C "CPU_GEN:HSW|CPU_GEN:BDW|CPU_GEN:SKX"
:- AVX2 support started with Haswell on Intel CPUs. To target Haswell or any later CPU generation, we can specify constraints, assuming that the cluster admin has configured those features on each node.
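As a small sketch of how such a constraint string can be assembled, the helper below joins feature names with `|`, which Slurm reads as OR. The function name `join_features` and the script name `job.sh` are hypothetical, and the `CPU_GEN:*` feature names only work if your cluster defines them; `sinfo -o "%N %f"` lists the features each node actually advertises.

```shell
#!/bin/sh
# join_features is a hypothetical helper: it joins its arguments with "|",
# the OR operator in a Slurm constraint expression.
join_features() {
  out=""
  for f in "$@"; do
    if [ -z "$out" ]; then
      out=$f
    else
      out="$out|$f"
    fi
  done
  printf '%s\n' "$out"
}

# Print the resulting command instead of submitting it (job.sh is a placeholder).
echo "sbatch -C \"$(join_features CPU_GEN:HSW CPU_GEN:BDW CPU_GEN:SKX)\" job.sh"
```

Printing the command first makes it easy to check the quoting before anything is submitted; the quotes matter because an unquoted `|` would be treated as a shell pipe.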
scontrol hold <job ID>
scontrol release <job ID>
scontrol update job <job ID> Dependency=afterany:<previous job ID>
scontrol update job <job ID> JobName="newJobName"
scontrol update job <job ID> partition=<job partitions>
scontrol update job <job ID> MinMemoryCPU=8000
scontrol update job <job ID> TimeLimit=7-00:00:00
- You cannot update the time limit or the resource requirements once the job has started.
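Because changes are restricted once a job is running, it can be easier to set a dependency at submission time rather than patching it in afterwards with `scontrol update`. The sketch below builds the value for `sbatch --dependency`; the helper name `afterany`, the job IDs 1001 and 1002, and the script name `step2.sh` are placeholders for illustration only.

```shell
#!/bin/sh
# afterany is a hypothetical helper: it builds "afterany:<id>[:<id>...]".
# afterany means the new job may start once the listed jobs have ended,
# regardless of their exit status.
afterany() {
  dep="afterany"
  for id in "$@"; do
    dep="$dep:$id"
  done
  printf '%s\n' "$dep"
}

# Print the submission command for placeholder job IDs 1001 and 1002.
echo "sbatch --dependency=$(afterany 1001 1002) step2.sh"
```

On a real cluster, the ID of the earlier job can be captured with `sbatch --parsable`, which prints only the job ID (followed by `;` and the cluster name on multi-cluster setups).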
scontrol show job <job ID>
sacct --starttime 2020-08-01 --
- This is useful for checking the job history. Please see the UBCCR page in the references for more details.
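For a rolling window instead of a fixed date, the start date can be computed in the shell. This is only a sketch under some assumptions: it relies on GNU `date` (the `-d` option), the helper names `days_ago` and `history_cmd` are made up, and the column list is one possible selection of `sacct` fields rather than a recommended set.

```shell
#!/bin/sh
# days_ago prints the date N days before today as YYYY-MM-DD.
# Assumes GNU date (the -d option); BSD/macOS date uses different flags.
days_ago() {
  date -d "$1 days ago" +%Y-%m-%d
}

# history_cmd is a hypothetical helper that builds a sacct call
# covering the last N days with a few commonly useful columns.
history_cmd() {
  echo "sacct --starttime $(days_ago "$1") --format=JobID,JobName,Partition,State,Elapsed"
}

# Print the command for the last 7 days.
history_cmd 7
```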
sacct -l -j <job ID>
: show time and memory usage
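The `-l` output is very wide, so a narrower view limited to the time and memory columns can be easier to read. The sketch below builds such a command; `usage_cmd` is a hypothetical helper, 12345 is a placeholder job ID, and the chosen fields are just one reasonable selection.

```shell
#!/bin/sh
# usage_cmd is a hypothetical helper: it builds a sacct call that shows
# wall time (Elapsed), CPU time (TotalCPU), peak memory (MaxRSS) and
# requested memory (ReqMem) for one job.
usage_cmd() {
  echo "sacct -j $1 --format=JobID,JobName,Elapsed,TotalCPU,MaxRSS,ReqMem,State"
}

# Print the command for placeholder job ID 12345.
usage_cmd 12345
```

Comparing `MaxRSS` with `ReqMem` is a quick way to judge whether a later run could request less memory.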