Commonly used commands in high performance cluster computing with Slurm workload manager

SLURM is one of the most common job schedulers on high-performance computing (HPC) clusters. Here, I summarize useful SLURM commands.

Example usage of SLURM commands

Job submission

  • sbatch -C "CPU_GEN:HSW|CPU_GEN:BDW|CPU_GEN:SKX":
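    • AVX2 support started with Haswell on Intel CPUs. To target Haswell or any of the later generations listed, we can pass their node features as a constraint, joined with | (OR). This assumes that the cluster admin has configured those features on each node; the feature names are site-specific.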
  • scontrol hold <job ID>
  • scontrol release <job ID>
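
The constraint string above can be assembled from a list of feature names. This is a minimal sketch: join_constraint is a hypothetical helper, job.sh is a placeholder script name, and the CPU_GEN:* features are the site-defined names from the example (on your cluster, sinfo -o "%N %f" lists the features each node actually advertises). The command is printed rather than executed, so it can be tried off-cluster.

```shell
# Join feature names with "|", which Slurm treats as OR in a constraint.
# The body runs in a subshell so the IFS change does not leak out.
join_constraint() (
    IFS='|'
    printf '%s\n' "$*"
)

# Feature names are site-defined; these are the ones from the example above.
constraint=$(join_constraint CPU_GEN:HSW CPU_GEN:BDW CPU_GEN:SKX)

# Print the submission command instead of running it (job.sh is a placeholder).
echo sbatch -C "\"$constraint\"" job.sh
```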

Update submitted jobs

  • scontrol update job <job ID> JobName="newJobName"
  • scontrol update job <job ID> Dependency=afterany:<previous job ID>
  • scontrol update job <job ID> partition=<job partitions>
  • scontrol update job <job ID> ExcNodeList=<node name list>
  • scontrol update job <job ID> MinMemoryCPU=8000
  • scontrol update job <job ID> TimeLimit=7-00:00:00
  • scontrol update job <job ID> ArrayTaskThrottle=0

If your job is already running, you cannot update the resource requirements.
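
To avoid a failed update, a script can check the job state first. The sketch below assumes a single, non-array job ID; can_update_resources and update_time_limit are hypothetical helper names, and squeue -h -j <job ID> -o %T is used to print the state (for example PENDING or RUNNING) without a header.

```shell
# Succeeds only for pending jobs, following the rule stated above.
can_update_resources() {
    [ "$1" = "PENDING" ]
}

# Change the time limit of a job if it has not started yet.
# Requires a Slurm cluster; defined here but not called.
update_time_limit() {
    job_id=$1
    limit=$2
    state=$(squeue -h -j "$job_id" -o %T)
    if can_update_resources "$state"; then
        scontrol update job "$job_id" TimeLimit="$limit"
    else
        echo "job $job_id is ${state:-not in the queue}; not updating" >&2
        return 1
    fi
}
```

On a cluster, this would be called as update_time_limit <job ID> 7-00:00:00.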

Resource usage

  • scontrol show job <job ID>
  • seff <job ID>
  • sacct -j <job ID>: show time and memory usage
    • MaxRSS is the peak memory usage (maximum resident set size) recorded for the job's tasks
    • sacct -l -j <job ID>: long format with many more fields
  • sacct --starttime 2020-08-01 --
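
MaxRSS is usually printed with a unit suffix such as K, which is awkward to compare across jobs. The sketch below converts a value to MiB and applies it to each step of a job. rss_to_mib and job_max_rss are hypothetical helpers; treating a bare number as bytes is an assumption, so check it against your site's sacct output. The sacct flags used are --format to select fields, -P for "|"-separated output, and -n to drop the header.

```shell
# Convert a MaxRSS value such as "2048K" or "3G" to MiB with one decimal.
rss_to_mib() {
    awk -v v="$1" 'BEGIN {
        n = v + 0                      # numeric prefix of the value
        u = substr(v, length(v))       # last character, the unit suffix
        if (u == "K") n /= 1024
        else if (u == "G") n *= 1024
        else if (u == "T") n *= 1024 * 1024
        else if (u != "M") n /= 1024 * 1024   # assumption: no suffix means bytes
        printf "%.1f\n", n
    }'
}

# Print "<step ID> <MiB>" for each step that reports a MaxRSS.
# Requires a Slurm cluster; defined here but not called.
job_max_rss() {
    sacct -j "$1" --format=JobID,MaxRSS -P -n |
    while IFS='|' read -r step rss; do
        if [ -n "$rss" ]; then
            echo "$step $(rss_to_mib "$rss")"
        fi
    done
}
```

On a cluster, job_max_rss <job ID> prints one line per job step; the job's own summary line typically has an empty MaxRSS and is skipped.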

References