
Examining CPU & Memory Utilization of SLURM Jobs

Question: How do I see the memory usage of a SLURM job?

Answer:

It depends on whether the job is still running or has already completed / terminated. If the job is still running, you can check its "instantaneous" (current) memory usage by ssh-ing to the node used by the job and examining the memory usage with the top command.
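
As a minimal sketch (the job ID 1234567 and the node name are placeholders), the steps might look like:

# Find which node(s) the job is running on
squeue -j 1234567 -o "%N"

# Log into one of those nodes and watch your own processes
ssh <node-name>
top -u $USER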

If the SLURM job has completed, SLURM keeps some statistics about the completed job, including its memory usage. These can be retrieved by querying the SLURM accounting database with the sacct command. Please see here:

https://slurm.schedmd.com/sacct.html

The --format and --units flags are what you want to explore. The --format flag specifies which fields to display. The following fields are most relevant:

AveRSS
MaxRSS
MaxRSSNode
MaxRSSTask
AveVMSize
MaxVMSize
MaxVMSizeNode
MaxVMSizeTask

These give the average and maximum memory usage, along with the node and task where the maximum occurred. VMSize refers to the amount of (virtual) memory allocated, but in practice not all of the allocated memory is actually used. RSS is the amount of memory actually occupied and used by the program, so RSS is typically the value to pay attention to. Other SLURM accounting fields are listed in sacct's manual page, under the subheading JOB ACCOUNTING FIELDS.
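
For example, a minimal query for one completed job might look like the following (1234567 is a placeholder job ID; --units=G reports memory in gigabytes):

sacct -j 1234567 --units=G \
      --format=JobID,JobName,Elapsed,NCPUS,AveRSS,MaxRSS,MaxRSSNode,State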

Below I provide a fuller example (as a bash script) of invoking sacct to return several fields of interest:

#!/bin/bash

# Preset list of sacct output fields; a %NUMBER suffix after a field name
# sets that column's width.
FIELDS_PRESET1=JobID,JobIDRaw,JobName%-20,User,Partition,Elapsed%12,NCPUS%5,CPUTime%12,TotalCPU%12,AveRSS,MaxRSS,Submit,Start,End,State,NodeList%20

# Keep only the header (first two lines) and the primary job records,
# whose JobIDRaw (field 2) is purely numeric; job-step subrecords such as
# 1234567.batch or 1234567.0 are filtered out.
filter_sacct_only_main_jobs () {
    awk '($2 ~ /^[0-9]+$/) || (FNR <= 2) { print }'
}

sacct -o "$FIELDS_PRESET1" "$@" \
   | filter_sacct_only_main_jobs
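
If the script is saved as, say, sacct-usage.sh (a name chosen here purely for illustration), any additional sacct options pass straight through via "$@":

chmod +x sacct-usage.sh
./sacct-usage.sh -j 1234567               # one specific (placeholder) job ID
./sacct-usage.sh -u $USER -S 2024-01-01   # all of your jobs since a given date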

A small auxiliary AWK filter is used above to select only the primary SLURM record for each job. You can bypass that filtering to return all the subrecords, which expose the resource utilization of each program / step launched in the job script. The subrecords vary from job to job, but an MPI parallel job will usually have at minimum a subrecord corresponding to the invocation of the MPI program, which is usually the record of interest.
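
To inspect those subrecords, simply call sacct directly without the filter; a sketch (again with 1234567 as a placeholder job ID):

sacct -j 1234567 --units=M \
      -o JobID,JobName%-20,NCPUS%5,Elapsed%12,TotalCPU%12,MaxRSS,MaxRSSTask,State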