SLURM ACCOUNTING (sacct)
========================

Created: April 2019
Updated: November 2019

**CAVEAT:**
This document was originally developed with reference to SLURM 18.08.1,
the version used on Turing.
I also tried to consult a newer version (the master branch around July 2019).
Newer versions may introduce additional features, or features
incompatible with this version.
Please take this document with a grain of salt, and always consult the
manual pages, source code, etc. in case of doubt.

*Update 2019-11-06*: The SLURM man page now contains the description of
the accounting fields.
Please look at .

UNDERSTANDING SLURM ACCOUNTING FIELDS
-------------------------------------

SLURM accounting can produce a large number of fields.

`JobID`: The "cooked" job ID.
Please see the discussion below.

`JobIDRaw`: The "raw" job ID.
In the vast majority of cases, the `JobIDRaw` field is identical to
`JobID`, except in the case of array jobs.
Please see the discussion below.

`TimelimitRaw`: The raw value of the time limit, in minutes.

### About SLURM Job IDs

SLURM produces one or more records in the accounting database for every job.
When a user submits a job to SLURM, SLURM assigns that job a unique job
number, like this:

    $ sbatch calculation.job
    Submitted batch job 8918299

However, internally within SLURM, there can be one or more "job steps"
created and executed while this job is being launched and executed.
(Things get even more complicated with the newer "heterogeneous job" feature,
in which various parts of a job can require very different resources.
See [this documentation](https://slurm.schedmd.com/heterogeneous_jobs.html)
for more information.)
The combination of all the job steps constitutes the entire job.
Each job step generates its own record in the SLURM accounting database.

#### Summary on Job ID

A single SLURM job will generate the "master record", which logs the
overall execution of the job.
In addition, there can be zero or more extra records generated by the
"job steps" triggered during the course of that job.
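
As a rough illustration of the relationship between a master record and its job step records, the following Python sketch groups accounting records by job number. It assumes, based on the patterns observed on Turing, that a master record has a plain-integer `JobIDRaw` and that a step record appends a `.suffix` to that number. The sample rows and the `group_by_job` helper are hypothetical, made up for illustration; they are not part of SLURM.

```python
import re
from collections import defaultdict

# Hypothetical (JobIDRaw, User) pairs; the values are made up for illustration.
SAMPLE_ROWS = [
    ("8918299", "alice"),
    ("8918299.batch", ""),
    ("8918299.extern", ""),
    ("8918299.0", ""),
    ("8918300", "bob"),
]

# Assumption: a master record has a plain-integer JobIDRaw.
MASTER_RE = re.compile(r"[0-9]+")
# Assumption: a step record is "<job number>.<suffix>".
STEP_RE = re.compile(r"([0-9]+)\.(.+)")


def group_by_job(rows):
    """Map each job number to its master-record user and its step suffixes."""
    jobs = defaultdict(lambda: {"user": None, "steps": []})
    for job_id_raw, user in rows:
        if MASTER_RE.fullmatch(job_id_raw):
            jobs[job_id_raw]["user"] = user
            continue
        step = STEP_RE.fullmatch(job_id_raw)
        if step:
            jobs[step.group(1)]["steps"].append(step.group(2))
        # Any other pattern is ignored by this sketch.
    return dict(jobs)


if __name__ == "__main__":
    for job_number, info in group_by_job(SAMPLE_ROWS).items():
        print(job_number, info["user"], info["steps"])
```

With the sample rows, job `8918299` ends up with one master record (user `alice`) and three step suffixes (`batch`, `extern`, `0`), while job `8918300` has a master record and no steps.
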
The master record includes the resource usage (CPU, memory, etc.) of
the child "job steps".
The master job record is characterized by a plain number in the
`JobIDRaw` field.
Further, the `User` field must not be empty.

The rest of this section goes into greater detail on the various kinds
of `JobID`.

#### Observed Job ID Patterns

Several regex patterns have been observed in the `JobID` field (from Turing
accounting):

* `[0-9]+`
* `[0-9]+_[0-9]+` (for job arrays)
* `[0-9]+\.[0-9]+`
* `[0-9]+\.batch`

In all cases, `JobIDRaw` is the same as `JobID`, except in the case
of `/[0-9]+_[0-9]+/`, where `JobIDRaw` is a running number `[0-9]+`.
This is the case where the submitter specifies an array of jobs.

From SLURM's sacct source code (`src/sacct/print.c`) one can find
that there are other patterns too (look for the string
`case PRINT_JOBIDRAW:`).
The key function in `print.c` is `print_fields`.
In particular, look at the lengthy `case` statement where it handles the
`PRINT_JOBID` and `PRINT_JOBIDRAW` cases.

A job record can be of different types:

* `JOB`
* `JOBSTEP`
* `JOBCOMP`

A `JOBSTEP` can have several subtypes:

* `SLURM_BATCH_SCRIPT`, in which case `JobIDRaw` gets the `.batch` suffix.
* `SLURM_EXTERN_CONT`, in which case `JobIDRaw` gets the `.extern` suffix.
  Apparently, this is meant to indicate "external" job steps,
  described further below.
* Other types, in which case `JobIDRaw` is printed in the
  `[0-9]+\.[0-9]+` pattern (the suffix is usually an index number
  like 0, 1, 2, ...).

#### Vanilla Job

A "vanilla" job entry corresponds to a single job submitted by a user
to SLURM.
This will not be a job array.

* Regexp match : `JobID ~ /^[0-9]+$/`.

From my observation, only simple single-core jobs (no MPI, no job array,
or other fancy stuff) generate no extra "child records" for job steps
in the SLURM accounting database.

However, several job records with this type of JobID will have no `User`
field set.
Such records are not vanilla jobs either.

#### Array Job

An "array" job entry corresponds to a single job that is part of a job
array submitted by a user to SLURM.

* Regexp match : `JobID ~ /^[0-9]+_[0-9]+$/`.

The job ID contains two numbers separated by an underscore.
The number before the underscore refers to the job ID as reported by
sbatch upon the submission of the job.

NOTE: Newer versions of SLURM allow text instead of a number to
identify one job in an array.
Such a text-based job label (instead of an integer) is marked by
square brackets around the job suffix:

* Characteristics (textual array label): `JobID ~ /^[0-9]+_\[.*\]+$/`.

#### Heterogeneous Job

A heterogeneous job entry corresponds to one part of a heterogeneous job
submitted by a user to SLURM.

* Regexp match: `JobID ~ /^[0-9]+\+[0-9]+$/`.

The job ID contains two numbers separated by a plus sign.
The number before the plus sign refers to the job ID as reported by
sbatch upon the submission of the job.
This will not be a job array.

#### Job Step: Batch script

This corresponds to the execution of the batch script (submitted to
sbatch) when more than one CPU core was requested by the job.

Characteristics of `SLURM_BATCH_SCRIPT` accounting records:

* Regexp match: `JobIDRaw ~ /^[0-9]+\.batch$/`
* The record does NOT have a user ID (field `User`)
* `JobName` is always `batch`

#### Job Step: External

`SLURM_EXTERN_CONT` is apparently a way to account for "external
processes".
It is still not 100% obvious what this means, but from reading the
source code, two kinds of activity fall under this category:

* Job prologue
* Direct SSH access into an allocated compute node:
  in this case, the `pam_slurm_adopt` module determines which SLURM job
  (if any) the ssh session belongs to and attributes this portion of
  the computation to that job.

There were some other steps observed, whose JobIDRaw becomes `NNNNN.N`.
I wonder if these "job steps" are due to calls of `srun` within the
batch script, because the job names are indicative: `pw.x`,
`pmi_proxy`, etc.
(Example job: 5947279, Nov 2018.)

#### Job Step: All the others

These correspond to job steps that were launched by `srun` or a
similar mechanism instead of the job script.
A prime example is an `mpirun` launch, which will record a new job step.

#### Job Completion

`JOBCOMP` appears to mark a job completion.
I am not sure whether this kind of record appears in Turing accounting;
it may appear only when a specific "job completion" task is specified.

#### Questions & (Possible) Answers

* Why is there a separate "NNNNN.batch" record?

  Perhaps this record is made when the job is multi-node.
  It appears to me that the ".batch" record accounts for the batch
  script itself (which runs only on node #0 of the allocated
  resources).

#### The Takeaway

Why all this complicated explanation?
My original goal was to find the accounting records that cover the
whole-job statistics without getting bogged down in the minute details
of each job.
This is what I found after this exploration:

> We only need to include accounting records where the `JobIDRaw` field
> contains only whole integers (i.e. matching regex `^[0-9]+$`).
> Further, the `User` field must not be empty.

## References

- `sacct` manual page:

SLURM administrator's documentation contains helpful bits and pieces
to decipher the accounting records; unfortunately, they are not
sufficient in themselves.

- Accounting: .

- Job Launch design guide: .

  > This guide describes at a high level the processes which occur in
  > order to initiate a job including the daemons and plugins involved
  > in the process. It describes the process of job allocation, step
  > allocation, task launch and job termination.

  In SLURM, launching a job is a multistep process.
  Various "job steps" described in this guide eventually make their own
  entries in the SLURM accounting database.

### Working Notes

These are my private working notes:

- daily-notes/2019/20190326.slurm-acct.txt
- daily-notes/2019/20190411.slurm-acct.txt
- daily-notes/2019/20190430.slurm-acct-201811.txt
- docs/kb/turing-slurm/20180106.SLURM-accounting.txt