Quick-and-dirty knowledge base for ODU RCS.

SLURM ACCOUNTING (sacct)
========================
Created: April 2019<br>
Updated: November 2019
**CAVEAT:**
This document was originally developed by referencing SLURM
18.08.1 used on Turing.
I also tried to consult the newer version (master branch
around July 2019).
Newer versions may introduce additional features, or features
incompatible with this version.
Please take this document with a grain of salt, and always consult the
manual pages, source code, etc. in case of doubt.
*Update 2019-11-06*:
SLURM man page now contains the description of the accounting fields.
Please look at
<https://slurm.schedmd.com/sacct.html#lbAF> .
UNDERSTANDING SLURM ACCOUNTING FIELDS
-------------------------------------
SLURM accounting can produce a large number of fields.
The ones relevant to this discussion are:
`JobID`:
The "cooked" job ID. Please see the discussion below.
`JobIDRaw`:
The "raw" job ID.
In the vast majority of cases, the `JobIDRaw` field is identical to `JobID`,
except in the case of array jobs.
Please see the discussion below.
`TimelimitRaw`:
The raw value of time limit, in minutes.
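To make these fields concrete, here is a minimal sketch of reading them from pipe-delimited `sacct` output, as produced with the `--parsable2` and `--format` options.
The sample text, the user name `alice`, and the helper names `parse_sacct` and `timelimit` are made up for illustration; real output from your cluster may contain values this sketch does not anticipate.

```python
import csv
import io
from datetime import timedelta

# Made-up sample in the pipe-delimited layout that
# `sacct --parsable2 --format=JobID,JobIDRaw,User,TimelimitRaw` prints.
SAMPLE = (
    "JobID|JobIDRaw|User|TimelimitRaw\n"
    "8918299|8918299|alice|90\n"
    "8918299.batch|8918299.batch||\n"
)


def parse_sacct(text):
    """Turn pipe-delimited sacct output (with a header row) into dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def timelimit(record):
    """Convert TimelimitRaw (minutes) to a timedelta; None if not numeric."""
    raw = (record.get("TimelimitRaw") or "").strip()
    return timedelta(minutes=int(raw)) if raw.isdecimal() else None


records = parse_sacct(SAMPLE)
print(records[0]["JobIDRaw"], timelimit(records[0]))
```

Non-numeric or empty `TimelimitRaw` values are mapped to `None` rather than guessed at.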
### About SLURM Job IDs
SLURM produces one or more records in the accounting database for every job.
When a user submits a job to SLURM, SLURM assigns that job a unique job number,
like this:

    $ sbatch calculation.job
    Submitted batch job 8918299

However, internally within SLURM, there can be one or more "job steps" created
and executed while this job is being launched and executed.
(Things get even more complicated with the newer "heterogeneous job" feature,
in which various parts of a job can require very different resources.
See [this documentation](https://slurm.schedmd.com/heterogeneous_jobs.html)
for more information.)
The combination of all the job steps constitutes the entire job.
Each job step generates its own record in the SLURM accounting database.
#### Summary on Job ID
A single SLURM job will generate the "master record" which logs the
overall execution of the job.
In addition, there can be zero or more extra records generated by the
"job steps" triggered during the course of that job.
The master record includes the resource usage (CPU, memory, etc.)
of the child "job steps".
The master job record is characterized by a plain number in the
`JobIDRaw` field.
Further, the `User` field must not be empty.
The rest of this section goes into greater detail of the various
`JobID`'s.
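As a rough illustration of the master/step relationship summarized above, the following sketch groups accounting records by their parent job.
It assumes each record is a dict with a `JobIDRaw` key, and that a step record's `JobIDRaw` has the form `<parent>.<suffix>`; the function names are hypothetical.

```python
from collections import defaultdict


def parent_job_id(jobidraw):
    """Return the part of JobIDRaw before the first dot."""
    return jobidraw.split(".", 1)[0]


def group_by_job(records):
    """Map each parent job ID to its master record and its step records."""
    groups = defaultdict(lambda: {"master": None, "steps": []})
    for rec in records:
        raw = rec["JobIDRaw"]
        group = groups[parent_job_id(raw)]
        if "." in raw:
            group["steps"].append(rec)   # child "job step" record
        else:
            group["master"] = rec        # plain number: the master record
    return dict(groups)
```

A job with no step records simply ends up with an empty `steps` list.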
#### Observed Job ID Patterns
Several regex patterns have been observed in the `JobID` field (from Turing
accounting):
* `[0-9]+`
* `[0-9]+_[0-9]+` (for job arrays)
* `[0-9]+\.[0-9]+`
* `[0-9]+\.batch`
In all cases, `JobIDRaw` is the same as `JobID` except for the
`[0-9]+_[0-9]+` pattern, where `JobIDRaw` is a plain running number (`[0-9]+`).
This is the case where the submitter specifies an array of jobs.
From SLURM's `sacct` source code (`src/sacct/print.c`) one can find that there
are other patterns too (look for the string `case PRINT_JOBIDRAW:`).
The key function in `print.c` is `print_fields`.
In particular, look at the lengthy `case` statement where it handles the
`PRINT_JOBID` and `PRINT_JOBIDRAW` cases.
An accounting record can be of different types:
* `JOB`
* `JOBSTEP`
* `JOBCOMP`
A `JOBSTEP` can have several subtypes:
* `SLURM_BATCH_SCRIPT`, in which case `JobIDRaw` gets the `.batch` suffix.
* `SLURM_EXTERN_CONT`, in which case `JobIDRaw` gets the `.extern` suffix.
  Apparently, this is meant to indicate an "external" type of job step,
  described further below.
* All other step types, in which case `JobIDRaw` is printed in the
  `[0-9]+\.[0-9]+` pattern
  (the suffix is usually an index number like 0, 1, 2, ...).
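
The suffix rules above can be sketched as a small classifier over the `JobIDRaw` field.
The category labels (`job`, `batch`, `extern`, `step`) are names chosen for this sketch, not SLURM terminology, and the patterns cover only the forms described in this document.

```python
import re

# (label, pattern) pairs, checked in order against the whole JobIDRaw string.
_RAW_PATTERNS = [
    ("job", re.compile(r"[0-9]+")),               # plain number
    ("batch", re.compile(r"[0-9]+\.batch")),      # SLURM_BATCH_SCRIPT
    ("extern", re.compile(r"[0-9]+\.extern")),    # SLURM_EXTERN_CONT
    ("step", re.compile(r"[0-9]+\.[0-9]+")),      # numbered job step
]


def classify_jobidraw(jobidraw):
    """Return a label for the record type implied by JobIDRaw."""
    for label, pattern in _RAW_PATTERNS:
        if pattern.fullmatch(jobidraw):
            return label
    return "unknown"
```

Anything that does not match one of the four forms is reported as `unknown` so that unexpected patterns are noticed rather than silently misfiled.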
#### Vanilla Job
A "vanilla" job entry corresponds to a single job submitted by a user to SLURM.
This will not be a job array.
* Regexp match : `JobID ~ /^[0-9]+$/`.
From my observation, only simple single-core jobs that do not involve
MPI or other fancy stuff (no job array, for example) avoid
generating extra "child records" for job steps in the SLURM accounting
database.
However, several job records with this type of JobID have no `User` field set.
These are not vanilla jobs either.
#### Array Job
An "array" job entry corresponds to a single job as part of a job
array submitted by a user to SLURM.
* Regexp match : `JobID ~ /^[0-9]+_[0-9]+$/`.
The Job ID contains two numbers separated by an underscore.
The number before the underscore refers to the job ID as reported by
sbatch upon the submission of the job.
NOTE: Newer versions of SLURM allow a textual label instead of a
number to identify one job in an array.
Such a text-based job label (instead of an integer) is marked by
square brackets around the job suffix:
* Characteristics (textual array label): `JobID ~ /^[0-9]+_\[.*\]$/`.
#### Heterogeneous Job
A heterogeneous job entry corresponds to a part of a heterogeneous job
submitted by a user to SLURM.
* Regexp match: `JobID ~ /^[0-9]+\+[0-9]+$/`.
The Job ID contains two numbers separated by a plus sign.
The number before the plus sign refers to the job ID as reported by
sbatch upon the submission of the job.
This will not be a job array.
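
The top-level `JobID` forms described in the last three subsections (plain, array, heterogeneous) can be told apart with a sketch like the one below.
The function name `split_jobid` and the kind labels are hypothetical; the regular expressions are the ones given above, applied to the whole string.

```python
import re

_PLAIN = re.compile(r"[0-9]+")
_ARRAY = re.compile(r"([0-9]+)_([0-9]+)")
_ARRAY_LABEL = re.compile(r"([0-9]+)_\[(.*)\]")
_HET = re.compile(r"([0-9]+)\+([0-9]+)")


def split_jobid(jobid):
    """Return (kind, base_id, suffix) for a top-level cooked JobID.

    base_id is the number reported by sbatch at submission time;
    suffix is the part after '_' or '+', or None for a plain job.
    """
    if _PLAIN.fullmatch(jobid):
        return ("plain", int(jobid), None)
    for kind, pattern in (
        ("array", _ARRAY),
        ("array-label", _ARRAY_LABEL),
        ("het", _HET),
    ):
        match = pattern.fullmatch(jobid)
        if match:
            return (kind, int(match.group(1)), match.group(2))
    return ("other", None, None)
```

Job step records such as `NNNNN.batch` fall through to `other`, since this sketch only looks at top-level job entries.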
#### Job Step: Batch script
This corresponds to the execution of the batch script (submitted to
sbatch) when more than one CPU core was requested by the job.
Characteristics of SLURM_BATCH_SCRIPT accounting records:
* Regexp match: `JobIDRaw ~ /^[0-9]+\.batch$/`
* The record does NOT have user ID (field `User`)
* `JobName` is always `batch`
#### Job Step: External
SLURM_EXTERN_CONT apparently is a way to account for "external processes".
It is still not 100% obvious what this means, but from reading the
source code, there are two kinds of activity that fall under this
category:
* Job prologue
* Direct SSH access into an allocated compute node: in this case, the
  `pam_slurm_adopt` module determines which SLURM job
  (if any) the ssh session belongs to and attributes this portion of
  the computation to that job.
There were some other steps observed, whose `JobIDRaw` has the form `NNNNN.N`.
I suspect these "job steps" are due to calls to `srun` within
the batch script, because the job names are indicative: `pw.x`,
`pmi_proxy`, etc.
(Example job: 5947279 , Nov 2018.)
#### Job Step: All the others
These correspond to job steps that were launched by `srun` or a
similar mechanism instead of the job script.
A prime example is the `mpirun` launch, which will record a new job step.
#### Job Completion
`JOBCOMP` appears to mark a job completion.
I am not sure whether this kind of record appears in Turing accounting;
it may appear only when a specific "job completion" task is specified.
#### Questions & (Possible) Answers
* Why is there a separate "NNNNN.batch" record?
  Perhaps this record is made when the job is multi-node.
  It appears to me that the ".batch" record accounts for the batch script
  itself (which runs only on node #0 of the allocated resources).
#### The Takeaway
Why all this complicated explanation?
My original goal was to find the accounting records that cover the
whole-job statistics without getting bogged down in the minute details
of each job step.
This is what I found after this exploration:
> We only need to include accounting records where the `JobIDRaw` field
> contains only whole integers (i.e. matching regex `^[0-9]+$`).
> Further, the `User` field must not be empty.
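
A minimal sketch of this filter, combined with the requirement from the summary that the `User` field must not be empty, could look like this.
It assumes each accounting record has already been parsed into a dict keyed by field name; `is_master_record` and `master_records` are hypothetical helper names.

```python
import re

_WHOLE_INT = re.compile(r"[0-9]+")


def is_master_record(record):
    """True if JobIDRaw is a plain integer and User is non-empty."""
    raw = (record.get("JobIDRaw") or "").strip()
    user = (record.get("User") or "").strip()
    return bool(_WHOLE_INT.fullmatch(raw)) and bool(user)


def master_records(records):
    """Keep only the whole-job (master) accounting records."""
    return [rec for rec in records if is_master_record(rec)]
```

Records for job steps (`.batch`, `.extern`, `.N`) and plain-number records without a user are dropped.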
## References
- `sacct` manual page:
<https://slurm.schedmd.com/sacct.html>
The SLURM administrator's documentation contains helpful bits and pieces
for deciphering the accounting records; unfortunately, they are not
sufficient on their own.
- Accounting:
<https://slurm.schedmd.com/accounting.html> .
- Job Launch design guide:
<https://slurm.schedmd.com/job_launch.html> .
> This guide describes at a high level the processes which occur in
> order to initiate a job including the daemons and plugins involved
> in the process. It describes the process of job allocation, step
> allocation, task launch and job termination.
In SLURM, launching a job is a multistep process.
The various "job steps" described in this guide eventually make their own
entries in the SLURM accounting database.
### Working Notes
These are my private working notes:
- daily-notes/2019/20190326.slurm-acct.txt
- daily-notes/2019/20190411.slurm-acct.txt
- daily-notes/2019/20190430.slurm-acct-201811.txt
- docs/kb/turing-slurm/20180106.SLURM-accounting.txt