Commonly used commands in high-performance cluster computing with the Slurm workload manager
SLURM is one of the most common job schedulers used on high-performance computing (HPC) clusters. Here, I summarize useful SLURM commands.
Example usage of SLURM commands
sbatch -C "CPU_GEN:HSW|CPU_GEN:BDW|CPU_GEN:SKX":
- AVX2 support started with Haswell on Intel CPUs. To target Haswell or any later CPU generation, we can specify constraints, assuming that the cluster admin has configured those feature names on each node.
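When the list of acceptable generations grows, the OR-ed constraint string can be assembled by a small helper instead of being typed by hand. This is a minimal sketch: `build_cpu_constraint` is a hypothetical function name, and the `CPU_GEN:*` feature names only work on clusters where the admin has defined them.

```shell
# Build an OR-ed Slurm constraint string such as CPU_GEN:HSW|CPU_GEN:BDW.
# build_cpu_constraint is a hypothetical helper; the CPU_GEN:* feature
# names are site-specific and must be defined by the cluster admin.
build_cpu_constraint() {
    constraint=""
    for gen in "$@"; do
        if [ -z "$constraint" ]; then
            constraint="CPU_GEN:$gen"
        else
            constraint="$constraint|CPU_GEN:$gen"
        fi
    done
    printf '%s\n' "$constraint"
}

# Example (not executed here): submit job.sh to any Haswell-or-newer node.
# sbatch -C "$(build_cpu_constraint HSW BDW SKX)" job.sh
```

To check which feature names a cluster actually exposes, `sinfo -o "%N %f"` lists each node set together with its features.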
scontrol hold <job ID>
scontrol release <job ID>
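Both `scontrol hold` and `scontrol release` accept a comma-separated list of job IDs, so several pending jobs can be held at once. The sketch below shows one way to build that list; `join_job_ids` is a hypothetical helper name, and array jobs may need different handling.

```shell
# Join job IDs (one per line on stdin) into the comma-separated list
# that "scontrol hold" and "scontrol release" accept.
# join_job_ids is a hypothetical helper name.
join_job_ids() {
    paste -sd, -
}

# Example (not executed here): hold, then release, every pending job
# of the current user.
# pending=$(squeue -u "$USER" -t PENDING -h -o %i | join_job_ids)
# scontrol hold "$pending"
# scontrol release "$pending"
```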
Update submitted jobs
scontrol update job <job ID> JobName="newJobName"
scontrol update job <job ID> Dependency=afterany:<previous job ID>
scontrol update job <job ID> partition=<job partitions>
scontrol update job <job ID> ExcNodeList=<node name list>
scontrol update job <job ID> MinMemoryCPU=8000
scontrol update job <job ID> TimeLimit=7-00:00:00
scontrol update job <job ID> ArrayTaskThrottle=0
If your job is already running, you cannot update the resource requirements.
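To apply the same change to several pending jobs, the update can be wrapped in a loop. This is a minimal sketch with hypothetical names (`update_jobs`, `DRY_RUN`); it prints the commands by default so they can be reviewed before anything is changed, and it uses the `JobId=` form of the job specification.

```shell
# Print (or run) one "scontrol update" command per job ID.
# update_jobs and DRY_RUN are hypothetical names used for illustration.
# Usage: update_jobs <Setting=Value> <job ID> [<job ID> ...]
update_jobs() {
    setting="$1"
    shift
    for job_id in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show what would be executed.
            echo "scontrol update JobId=$job_id $setting"
        else
            scontrol update JobId="$job_id" "$setting"
        fi
    done
}

# Example: preview the commands, then run them for real.
# update_jobs TimeLimit=7-00:00:00 101 102
# DRY_RUN=0 update_jobs TimeLimit=7-00:00:00 101 102
```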
scontrol show job <job ID>
seff <job ID>
sacct -j <job ID>: show time and memory usage
- MaxRSS is the peak memory usage (maximum resident set size) of a job step.
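Because `sacct` reports MaxRSS per job step, it can be handy to reduce the output to a single peak value. The sketch below assumes the output was produced with `--parsable2 --noheader --format=JobID,Elapsed,MaxRSS` and that the K/M/G suffixes are 1024-based; `peak_maxrss_mib` is a hypothetical helper name.

```shell
# Report the peak MaxRSS (in MiB) across all steps of a job.
# Reads "JobID|Elapsed|MaxRSS" lines on stdin.
# peak_maxrss_mib is a hypothetical helper name.
peak_maxrss_mib() {
    awk -F'|' '
        $3 != "" {
            value = $3 + 0                      # numeric prefix of e.g. "2048000K"
            unit = substr($3, length($3))       # last character: K, M, G or a digit
            if (unit == "K") value = value / 1024
            else if (unit == "G") value = value * 1024
            else if (unit != "M") value = value / 1024 / 1024  # assumption: plain bytes
            if (value > peak) peak = value
        }
        END { printf "%.1f\n", peak }
    '
}

# Example (not executed here):
# sacct -j <job ID> --parsable2 --noheader --format=JobID,Elapsed,MaxRSS | peak_maxrss_mib
```

Steps with an empty MaxRSS field (typically the top-level job line) are skipped.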
sacct -l -j <job ID>
sacct --starttime 2020-08-01 --
- This is useful for checking job history. See the UBCCR page in the references for more details.