* Minor updates Nov 2019.

master
Wirawan Purwanto 4 years ago
parent 0d8b081ac7
commit 837e1c7d9f
  1. 30
      slurm/20190411.Slurm-accounting.md

@ -1,7 +1,10 @@
SLURM ACCOUNTING (sacct)
========================
-CAVEAT:
+Created: April 2019<br>
+Updated: November 2019
+**CAVEAT:**
This document was originally developed by referencing SLURM
18.08.1 used on Turing.
I also tried to consult the newer version (master branch
@ -11,6 +14,12 @@ incompatible with this version.
Please take this document with a grain of salt, and always consult the
manual pages, source code, etc. in case of doubt.
+*Update 2019-11-06*:
+SLURM man page now contains the description of the accounting fields.
+Please look at
+<https://slurm.schedmd.com/sacct.html#lbAF> .
UNDERSTANDING SLURM ACCOUNTING FIELDS
-------------------------------------
@ -21,7 +30,10 @@ SLURM accounting can produce very many fields.
The "cooked" job ID. Please see the discussion below.
`JobIDRaw`:
-The "raw" job ID. Please see the discussion below.
+The "raw" job ID.
+In the vast majority of cases, the `JobIDRaw` field is identical to `JobID`;
+array jobs are the exception.
+Please see the discussion below.
`TimelimitRaw`:
The raw value of time limit, in minutes.
@ -76,7 +88,7 @@ A `JOBSTEP` can have several subtypes:
* `SLURM_BATCH_SCRIPT`, in which case JobIDRaw will obtain the `.batch` suffix.
* `SLURM_EXTERN_CONT`, in which case JobIDRaw will obtain the `.extern` suffix.
Apparently, this is meant to indicate "external" type of job steps,
-including.
+described further below.
* many others; but in this case, it will print JobIDRaw in `[0-9]+\.[0-9]+`
pattern
* Other types (usually it will have index numbers like 0, 1, 2, ...)
@ -87,7 +99,7 @@ A `JOBSTEP` can have several subtypes:
A "vanilla" job entry corresponds to a single job submitted by a user to SLURM.
This will not be a job array.
-* Characteristics : `JobID ~ /^[0-9]+$/`.
+* Regexp match : `JobID ~ /^[0-9]+$/`.
#### Array Job
@ -95,7 +107,7 @@ This will not be a job array.
An "array" job entry corresponds to a single job as part of a job
array submitted by a user to SLURM.
-* Characteristics : `JobID ~ /^[0-9]+_[0-9]+$/`.
+* Regexp match : `JobID ~ /^[0-9]+_[0-9]+$/`.
The Job ID contains two numbers separated by an underscore.
The number before the underscore refers to the job ID as reported by
@ -114,7 +126,7 @@ square brackets around the job suffix:
A heterogeneous job entry corresponds to one part of a heterogeneous job
submitted by a user to SLURM.
-* Characteristics : `JobID ~ /^[0-9]+\+[0-9]+$/`.
+* Regexp match: `JobID ~ /^[0-9]+\+[0-9]+$/`.
The Job ID contains two numbers separated by a plus sign.
The number before the plus sign refers to the job ID as reported by
@ -130,7 +142,7 @@ sbatch) when more than one CPU cores were requested by the job.
Characteristics of SLURM_BATCH_SCRIPT accounting records:
-* JobIDRaw =~ /^[0-9]+\.batch$/
+* Regexp match: `JobIDRaw ~ /^[0-9]+\.batch$/`
* The record does NOT have user ID (field `User`)
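
The regular expressions quoted in this document can be combined into a small classifier for accounting records. The following is only a sketch: the function name `classify_job_id` and the category labels are made up for illustration, the `.extern` pattern is inferred from the suffix described above, and the patterns cover only the ID shapes discussed here (other shapes, such as steps of array jobs, fall through to `"unknown"`).

```python
import re

# (label, pattern) pairs; the regexes come from this document,
# the labels are hypothetical names chosen for this sketch.
JOB_ID_PATTERNS = [
    ("vanilla", re.compile(r"^[0-9]+$")),
    ("array", re.compile(r"^[0-9]+_[0-9]+$")),
    ("heterogeneous", re.compile(r"^[0-9]+\+[0-9]+$")),
    ("batch_step", re.compile(r"^[0-9]+\.batch$")),
    ("extern_step", re.compile(r"^[0-9]+\.extern$")),
    ("numbered_step", re.compile(r"^[0-9]+\.[0-9]+$")),
]


def classify_job_id(job_id):
    """Return the label of the first pattern matching the whole ID string."""
    for label, pattern in JOB_ID_PATTERNS:
        # fullmatch rejects IDs with trailing characters such as a newline
        if pattern.fullmatch(job_id):
            return label
    return "unknown"


if __name__ == "__main__":
    for sample in ["12345", "12345_7", "12345+1", "12345.batch", "12345.0"]:
        print(sample, "->", classify_job_id(sample))
```

Keeping only the records classified as `vanilla` on the `JobIDRaw` field would correspond to counting each job once rather than once per job step.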
@ -177,7 +189,7 @@ that may be only when a specific "job completion" task is specified.
#### Questions & (Possible) Answers
* Why is there a separate "NNNNN.batch" record?
-It is perhaps when the job is multi-node.
+Perhaps this record is made when the job is multi-node.
It appears to me that the ".batch" record is for accounting the batch script
itself (which will run only on node #0 of the allocated resources).
@ -192,7 +204,7 @@ This is what I found after this exploration:
> We only need to include accounting records where the `JobIDRaw` field
> contains only whole integers (i.e. matching regex `^[0-9]+$`).
> Further,
## References
