From fa6d73a84272b79b0ce1ee748610276c471e2474 Mon Sep 17 00:00:00 2001
From: Wirawan Purwanto
Date: Tue, 5 Dec 2023 19:52:57 -0500
Subject: [PATCH] * Added a Q&A note on how to examine a SLURM job memory utilization.

---
 slurm/20231205.Examine-CPU-memory-SLURM.md | 47 ++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 slurm/20231205.Examine-CPU-memory-SLURM.md

diff --git a/slurm/20231205.Examine-CPU-memory-SLURM.md b/slurm/20231205.Examine-CPU-memory-SLURM.md
new file mode 100644
index 0000000..dbe2092
--- /dev/null
+++ b/slurm/20231205.Examine-CPU-memory-SLURM.md
@@ -0,0 +1,47 @@
+Examining CPU & Memory Utilization of SLURM Jobs
+================================================
+
+
+
+## Question: How to see memory usage of a SLURM job?
+
+## Answer:
+
+It depends on whether the job is still running or has already completed / terminated.
+If the job is still running, one can check the "instantaneous" (current) memory usage by ssh-ing to the node used by the job and invoking the `top` command.
+
+If the SLURM job has completed, SLURM keeps some statistics on the completed job, including its memory usage. These can be retrieved by querying SLURM with the `sacct` command. Please see here:
+
+https://slurm.schedmd.com/sacct.html
+
+The `--format` and `--units` flags are what you want to explore. The `--format` flag specifies which fields to display. The following fields are most relevant:
+
+```
+AveRSS
+MaxRSS
+MaxRSSNode
+MaxRSSTask
+AveVMSize
+MaxVMSize
+MaxVMSizeNode
+MaxVMSizeTask
+```
+
+These fields report the average and maximum memory usage across the tasks of a job step, together with the node and task where the maximum occurred. `VMSize` refers to the amount of (virtual) memory allocated, but generally not all of the allocated memory is used in practice. `RSS` is the amount of memory *actually* occupied and used by the program. We typically want to pay attention to the RSS. Other SLURM accounting fields are listed in the `sacct` manual page, under the subheading [JOB ACCOUNTING FIELDS](https://slurm.schedmd.com/sacct.html#lbAF).
+
+Below I provide an example (by way of a bash script) of invoking `sacct` to return several fields of interest:
+
+```
+#!/bin/bash
+
+FIELDS_PRESET1=JobID,JobIDRaw,JobName%-20,User,Partition,Elapsed%12,NCPUS%5,CPUTime%12,TotalCPU%12,AveRSS,MaxRSS,Submit,Start,End,State,NodeList%20
+# Keep the two header lines plus rows whose JobIDRaw (column 2) is a plain integer, i.e. the main job records.
+filter_sacct_only_main_jobs () {
+    awk '($2 ~ /^[0-9]+$/) || (FNR <= 2) { print }'
+}
+# Extra arguments (for example -j <jobid>) are passed through to sacct via "$@".
+sacct -o "$FIELDS_PRESET1" "$@" \
+    | filter_sacct_only_main_jobs
+```
+
+A small auxiliary AWK filter is used above to select only the primary SLURM record. You can bypass that filter to return all the subrecords, which expose the resource utilization of each program / step launched in the job script. The subrecords vary from job to job, but an MPI parallel job will usually have at least one subrecord corresponding to the invocation of the MPI program, which is the useful one.
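
The per-step subrecords described above can also be condensed into a single figure. The following is a minimal sketch, not part of the note itself: `peak_maxrss` is a hypothetical helper name (it is not a SLURM command), and it assumes `sacct` is run with `--parsable2 --noheader` so that fields are separated by `|`. The unit handling (a trailing K, M, G or T, with bare numbers treated as bytes) is an assumption about how `MaxRSS` is printed at a given site. Memory figures are usually recorded on the step records (for example `.batch` or `.0`) rather than on the main record, so the main-job filter is deliberately not applied here.

```shell
#!/bin/bash

# Hypothetical helper (not part of SLURM): read "JobID|MaxRSS" records on
# stdin and print the step with the largest MaxRSS, converted to MiB.
peak_maxrss () {
    awk -F'|' '
        function to_mib(v,    n, u) {
            n = v + 0                        # leading numeric part of the value
            u = substr(v, length(v), 1)      # last character: unit suffix, if any
            if (u == "K") return n / 1024
            if (u == "M") return n
            if (u == "G") return n * 1024
            if (u == "T") return n * 1024 * 1024
            return n / (1024 * 1024)         # assumption: no suffix means bytes
        }
        $2 != "" {                           # skip records with an empty MaxRSS
            m = to_mib($2)
            if (!seen || m > max) { max = m; step = $1; seen = 1 }
        }
        END { if (seen) printf "%s %.1f MiB\n", step, max }
    '
}

# Usage on a cluster (JOBID is a placeholder for a real job ID):
#   sacct -j JOBID --parsable2 --noheader --format=JobID,MaxRSS | peak_maxrss
```

Working from `--parsable2` output avoids the column-width truncation of the default table layout, at the cost of a less readable raw listing.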