* Committed my KB article on SLURM accounting fields.

Last updated: 2019-07-22.
Wirawan Purwanto 5 years ago
commit 0d8b081ac7

@ -0,0 +1,232 @@
This document was originally developed by referencing SLURM
18.08.1 used on Turing.
I also tried to consult the newer version (master branch
around July 2019).
Newer versions may introduce additional features, or features
incompatible with this version.
Please take this document with a grain of salt, and always consult the
manual pages, source code, etc. when in doubt.
SLURM accounting can produce very many fields.
* `JobID`: The "cooked" job ID. Please see the discussion below.
* `JobIDRaw`: The "raw" job ID. Please see the discussion below.
* `TimelimitRaw`: The raw value of the time limit, in minutes.
### About SLURM Job IDs
SLURM produces one or more records in the accounting database for every job.
When a user submits a job to SLURM, SLURM assigns that job a unique job number,
like this:
    $ sbatch calculation.job
    Submitted batch job 8918299
However, internally within SLURM, there can be one or more "job steps" created
and executed while this job is being launched and executed.
(Things get even more complicated with the newer "heterogeneous job" feature,
in which various parts of a job can require very different resources.
See [this documentation](https://slurm.schedmd.com/heterogeneous_jobs.html)
for more information.)
Several regex patterns have been observed in the JobID field (from Turing):
* `[0-9]+`
* `[0-9]+_[0-9]+` (for job arrays)
* `[0-9]+\.[0-9]+`
* `[0-9]+\.batch`
For all cases, the `JobIDRaw` is the same as `JobID` except in the case of
`/[0-9]+_[0-9]+/`, where the `JobIDRaw` is a running number `[0-9]+`.
This is the case where the submitter specifies an array of jobs.
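
The observed patterns above can be turned into a small classifier. This is only a sketch in Python: the label names (`plain`, `array`, and so on) are made up for this example and are not SLURM terminology, and the pattern list covers only what was observed on Turing.

```python
import re

# Patterns observed in the JobID field, as listed above.
# The labels are this sketch's own names, not SLURM terminology.
OBSERVED_PATTERNS = [
    ("plain", re.compile(r"[0-9]+")),
    ("array", re.compile(r"[0-9]+_[0-9]+")),
    ("numbered_step", re.compile(r"[0-9]+\.[0-9]+")),
    ("batch_step", re.compile(r"[0-9]+\.batch")),
]


def classify_job_id(job_id):
    """Return the label of the pattern matching the whole string, or None."""
    for label, pattern in OBSERVED_PATTERNS:
        if pattern.fullmatch(job_id):
            return label
    return None


def raw_id_may_differ(job_id):
    """Per the note above, JobIDRaw differs from JobID only for array entries."""
    return classify_job_id(job_id) == "array"
```

For instance, `classify_job_id("8918299")` falls under `plain`, while an ID with an underscore falls under `array`, the only case where `JobIDRaw` is expected to differ.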
From SLURM's sacct source code (`src/sacct/print.c`) one can find that there
are other patterns too (look for the string `case PRINT_JOBIDRAW:`).
The key function in `print.c` is `print_fields`.
In particular look at the lengthy `case` statement where it tackles
A job can be of different types:
* `JOB`
A `JOBSTEP` can have several subtypes:
* `SLURM_BATCH_SCRIPT`, in which case JobIDRaw gets the `.batch` suffix.
* `SLURM_EXTERN_CONT`, in which case JobIDRaw gets the `.extern` suffix.
Apparently, this is meant to indicate "external" type of job steps,
* many others; in these cases, JobIDRaw is printed in the form `[0-9]+\.[0-9]+`
* Other types (usually it will have index numbers like 0, 1, 2, ...)
#### Vanilla Job
A "vanilla" job entry corresponds to a single job submitted by a user to SLURM.
This will not be a job array.
* Characteristics : `JobID ~ /^[0-9]+$/`.
#### Array Job
An "array" job entry corresponds to a single job as part of a job
array submitted by a user to SLURM.
* Characteristics : `JobID ~ /^[0-9]+_[0-9]+$/`.
The Job ID contains two numbers separated by an underscore.
The number before the underscore refers to the job ID as reported by
sbatch upon the submission of the job.
NOTE: Newer versions of SLURM allow a textual label instead of a
number to identify one job in an array.
Such a text-based job label (instead of an integer) is marked by
square brackets around the job suffix:
* Characteristics (textual array label): `JobID ~ /^[0-9]+_\[.*\]+$/`.
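
To make the two array forms concrete, here is a minimal Python sketch that splits an array-style `JobID` into the number before the underscore and the suffix after it. The function name `split_array_job_id` is made up for this example, and the bracket pattern is a slight simplification of the regex above (it expects exactly one closing bracket).

```python
import re

# Number, underscore, then either an integer index or a bracketed label.
ARRAY_ID = re.compile(r"([0-9]+)_([0-9]+|\[.*\])")


def split_array_job_id(job_id):
    """Return (job number reported by sbatch, suffix), or None if not array-style."""
    match = ARRAY_ID.fullmatch(job_id)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

The suffix is returned as a string on purpose, so that integer indices and bracketed labels can be handled by the same caller.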
#### Heterogeneous Job
A heterogeneous job entry corresponds to a part of a heterogeneous job
submitted by a user to SLURM.
* Characteristics : `JobID ~ /^[0-9]+\+[0-9]+$/`.
The Job ID contains two numbers separated by a plus sign.
The number before the plus sign refers to the job ID as reported by
sbatch upon the submission of the job.
This will not be a job array.
#### Job Step: Batch script
This corresponds to the execution of the batch script (submitted to
sbatch) when more than one CPU core was requested by the job.
Characteristics of SLURM_BATCH_SCRIPT accounting records:
* JobIDRaw =~ /^[0-9]+\.batch$/
* The record does NOT have user ID (field `User`)
* `JobName` is always `batch`
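
The three characteristics above can be checked together. The following Python sketch assumes each accounting record has already been loaded into a dict keyed by sacct field name, with an empty string standing for a missing value; both the dict layout and the function name are assumptions of this example.

```python
import re

BATCH_STEP_ID = re.compile(r"[0-9]+\.batch")


def looks_like_batch_step(record):
    """Check the three SLURM_BATCH_SCRIPT characteristics listed above.

    `record` is assumed to be a dict keyed by sacct field name, with an
    empty string standing for a missing value.
    """
    return (
        BATCH_STEP_ID.fullmatch(record.get("JobIDRaw", "")) is not None
        and record.get("User", "") == ""
        and record.get("JobName", "") == "batch"
    )
```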
#### Job Step: External
SLURM_EXTERN_CONT is apparently a way to account for "external processes".
It is still not 100% obvious what this means, but from reading the
source code, two kinds of activity appear to fall under this category:
* Job prologue
* Direct SSH access into an allocated compute node: in this case, the
`pam_slurm_adopt` module determines which
SLURM job launched the ssh session (if any) and attributes this portion
of the computation to that job.
There were some other steps observed, whose JobIDRaw becomes `NNNNN.N`.
I wonder if these "job steps" are due to the calls of "srun" within
the batch script, because the job names are indicative: `pw.x`,
`pmi_proxy`, etc.
(Example job: 5947279 , Nov 2018.)
#### Job Step: All the others
These correspond to job steps that were launched by `srun` or a
similar mechanism rather than by the job script itself.
A prime example is the `mpirun` launch, which will record a new job step.
#### Job Completion
`JOBCOMP` appears to mark a job completion.
I am not sure whether this kind of record appears in the Turing accounting data;
it may appear only when a specific "job completion" task is specified.
#### Questions & (Possible) Answers
* Why is there a separate "NNNNN.batch" record?
It is perhaps when the job is multi-node.
It appears to me that the ".batch" record is for accounting the batch script
itself (which will run only on node #0 of the allocated resources).
#### The Takeaway
Why all this complicated explanation?
My original goal was to find the accounting records that cover the
whole-job statistics without getting bogged down in the minute details
of each job step.
This is what I found after this exploration:
> We only need to include accounting records where the `JobIDRaw` field
> contains only whole integers (i.e. matching regex `^[0-9]+$`).
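
As a rough illustration of this rule, the Python sketch below filters pipe-delimited accounting text, such as what `sacct --parsable2 --format=JobID,JobIDRaw,JobName,User` is expected to print. The sample rows, the user name `alice`, and the array job numbers are invented for this example and are not taken from Turing.

```python
import csv
import io
import re

WHOLE_JOB = re.compile(r"[0-9]+")


def whole_job_records(sacct_text):
    """Yield records (as dicts) whose JobIDRaw consists solely of digits."""
    reader = csv.DictReader(io.StringIO(sacct_text), delimiter="|")
    for row in reader:
        if WHOLE_JOB.fullmatch(row["JobIDRaw"]):
            yield row


# Hypothetical sample in pipe-delimited form; not real accounting data.
SAMPLE = """JobID|JobIDRaw|JobName|User
8918299|8918299|calculation.job|alice
8918299.batch|8918299.batch|batch|
8918299.0|8918299.0|pw.x|
8918300_1|8918301|array.job|alice
"""

kept = [row["JobID"] for row in whole_job_records(SAMPLE)]
```

Note that the filter is applied to `JobIDRaw`, not `JobID`, so an array entry such as `8918300_1` is kept because its raw ID is a plain number.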
## References
- `sacct` manual page:
SLURM administrator's documentation contains helpful bits and pieces
to decipher the accounting records; unfortunately in themselves they
are not sufficient.
- Accounting:
<https://slurm.schedmd.com/accounting.html> .
- Job Launch design guide:
<https://slurm.schedmd.com/job_launch.html> .
> This guide describes at a high level the processes which occur in
> order to initiate a job including the daemons and plugins involved
> in the process. It describes the process of job allocation, step
> allocation, task launch and job termination.
In SLURM, launching a job is a multistep process.
The various "job steps" described in this guide eventually make their own
entries in the SLURM accounting database.
### Working Notes
These are my private working notes:
- daily-notes/2019/20190326.slurm-acct.txt
- daily-notes/2019/20190411.slurm-acct.txt
- daily-notes/2019/20190430.slurm-acct-201811.txt
- docs/kb/turing-slurm/20180106.SLURM-accounting.txt