From 0d8b081ac7f04246141a324f1efd18eca5d51603 Mon Sep 17 00:00:00 2001
From: Wirawan Purwanto
Date: Tue, 5 Nov 2019 15:48:45 -0500
Subject: [PATCH] * Committed my KB article on SLURM accounting fields. Last updated: 2019-07-22.

---
 slurm/20190411.Slurm-accounting.md | 232 +++++++++++++++++++++++++++++
 1 file changed, 232 insertions(+)
 create mode 100644 slurm/20190411.Slurm-accounting.md

diff --git a/slurm/20190411.Slurm-accounting.md b/slurm/20190411.Slurm-accounting.md
new file mode 100644
index 0000000..5db10f2
--- /dev/null
+++ b/slurm/20190411.Slurm-accounting.md
@@ -0,0 +1,232 @@
+SLURM ACCOUNTING (sacct)
+========================
+
+CAVEAT:
+This document was originally developed by referencing SLURM
+18.08.1, the version used on Turing.
+I also tried to consult a newer version (master branch
+around July 2019).
+Newer versions may introduce additional features, or features
+incompatible with this version.
+Please take this document with a grain of salt, and always consult the
+manual pages, source code, etc. when in doubt.
+
+
+UNDERSTANDING SLURM ACCOUNTING FIELDS
+-------------------------------------
+
+SLURM accounting can produce a great many fields.
+
+`JobID`:
+The "cooked" job ID. Please see the discussion below.
+
+`JobIDRaw`:
+The "raw" job ID. Please see the discussion below.
+
+`TimelimitRaw`:
+The raw value of the time limit, in minutes.
+
+
+
+### About SLURM Job IDs
+
+SLURM produces one or more records in the accounting database for every job.
+When a user submits a job to SLURM, SLURM assigns that job a unique job number,
+like this:
+
+    $ sbatch calculation.job
+    Submitted batch job 8918299
+
+However, internally within SLURM, there can be one or more "job steps" created
+and executed while this job is being launched and executed.
+(Things get even more complicated with the newer "heterogeneous job" feature,
+in which various parts of a job can require very different resources.
+See [this documentation](https://slurm.schedmd.com/heterogeneous_jobs.html)
+for more information.)
+
+Several regex patterns have been observed in the JobID field (from Turing
+accounting):
+
+* `[0-9]+`
+
+* `[0-9]+_[0-9]+` (for job arrays)
+
+* `[0-9]+\.[0-9]+`
+
+* `[0-9]+\.batch`
+
+In all cases, the `JobIDRaw` is the same as `JobID` except in the case of
+`/[0-9]+_[0-9]+/`, where the `JobIDRaw` is a running number `[0-9]+`.
+This is the case where the submitter specifies an array of jobs.
+
+From SLURM's sacct source code (`src/sacct/print.c`) one can find that there
+are other patterns too (look for the string `case PRINT_JOBIDRAW:`).
+The key function in `print.c` is `print_fields`.
+In particular, look at the lengthy `case` statement where it handles the
+`PRINT_JOBID` and `PRINT_JOBIDRAW` cases.
+
+A job can be of different types:
+
+* `JOB`
+* `JOBSTEP`
+* `JOBCOMP`
+
+A `JOBSTEP` can have several subtypes:
+
+* `SLURM_BATCH_SCRIPT`, in which case JobIDRaw gets the `.batch` suffix.
+* `SLURM_EXTERN_CONT`, in which case JobIDRaw gets the `.extern` suffix.
+  Apparently, this is meant to indicate an "external" type of job step,
+  including.
+* many others; in these cases, JobIDRaw is printed in the `[0-9]+\.[0-9]+`
+  pattern
+* Other types (usually these have index numbers like 0, 1, 2, ...)
+
+
+#### Vanilla Job
+
+A "vanilla" job entry corresponds to a single job submitted by a user to SLURM.
+This will not be a job array.
+
+* Characteristics: `JobID ~ /^[0-9]+$/`.
+
+
+#### Array Job
+
+An "array" job entry corresponds to a single job that is part of a job
+array submitted by a user to SLURM.
+
+* Characteristics: `JobID ~ /^[0-9]+_[0-9]+$/`.
+
+The Job ID contains two numbers separated by an underscore.
+The number before the underscore refers to the job ID as reported by
+sbatch upon the submission of the job.
+
+NOTE: Newer versions of SLURM will allow a textual label instead of
+a number to identify one job in an array.
+Those text-based job labels (instead of integers) are marked by
+square brackets around the job suffix:
+
+* Characteristics (textual array label): `JobID ~ /^[0-9]+_\[.*\]$/`.
+
+
+#### Heterogeneous Job
+
+A heterogeneous job entry corresponds to a part of a heterogeneous job
+submitted by a user to SLURM.
+
+* Characteristics: `JobID ~ /^[0-9]+\+[0-9]+$/`.
+
+The Job ID contains two numbers separated by a plus sign.
+The number before the plus sign refers to the job ID as reported by
+sbatch upon the submission of the job.
+
+This will not be a job array.
+
+
+#### Job Step: Batch script
+
+This corresponds to the execution of the batch script (submitted to
+sbatch) when more than one CPU core was requested by the job.
+
+Characteristics of SLURM_BATCH_SCRIPT accounting records:
+
+* `JobIDRaw ~ /^[0-9]+\.batch$/`
+
+* The record does NOT have a user ID (field `User`)
+
+* `JobName` is always `batch`
+
+
+#### Job Step: External
+
+SLURM_EXTERN_CONT apparently is a way to account for "external processes".
+It is still not 100% obvious what this means, but from reading the
+source code, there are two kinds of activity that fall under this
+category:
+
+* Job prologue
+
+* Direct SSH access into an allocated compute node: in this case, the
+  `pam_slurm_adopt` module determines which
+  SLURM job launched the ssh (if any) and attributes this portion of
+  the computation to the calling job.
+
+There were some other steps observed, whose JobIDRaw becomes `NNNNN.N`.
+I wonder if these "job steps" are due to calls to "srun" within
+the batch script, because the job names are indicative: `pw.x`,
+`pmi_proxy`, etc.
+(Example job: 5947279, Nov 2018.)
+
+
+#### Job Step: All the others
+
+These correspond to job steps that were launched by `srun` or another
+similar mechanism instead of by the job script.
+A prime example is the `mpirun` launch, which will record a new job step.
+
+
+
+
+#### Job Completion
+
+`JOBCOMP` appears to mark a job completion.
+I am not sure whether this kind of record appears in the Turing accounting;
+it may appear only when a specific "job completion" task is specified.
+
+
+#### Questions & (Possible) Answers
+
+* Why is there a separate "NNNNN.batch" record?
+  Perhaps it appears when the job is multi-node.
+  It appears to me that the ".batch" record accounts for the batch script
+  itself (which runs only on node #0 of the allocated resources).
+
+
+#### The Takeaway
+
+Why all this complicated explanation?
+My original goal was to find the accounting records that cover the
+whole-job statistics without getting bogged down in the minute details
+of each job.
+This is what I found after this exploration:
+
+> We only need to include accounting records where the `JobIDRaw` field
+> contains only a whole integer (i.e. matches the regex `^[0-9]+$`).
+
+
+
+## References
+
+- `sacct` manual page:
+
+
+The SLURM administrator's documentation contains helpful bits and pieces
+for deciphering the accounting records; unfortunately, by themselves they
+are not sufficient.
+
+- Accounting:
+  .
+
+- Job Launch design guide:
+  .
+
+  > This guide describes at a high level the processes which occur in
+  > order to initiate a job including the daemons and plugins involved
+  > in the process. It describes the process of job allocation, step
+  > allocation, task launch and job termination.
+
+  In SLURM, launching a job is a multistep process.
+  Various "job steps" described in this guide eventually make their own
+  entries in the SLURM accounting database.
+
+
+
+### Working Notes
+
+These are my private working notes:
+
+- daily-notes/2019/20190326.slurm-acct.txt
+- daily-notes/2019/20190411.slurm-acct.txt
+- daily-notes/2019/20190430.slurm-acct-201811.txt
+- docs/kb/turing-slurm/20180106.SLURM-accounting.txt
+
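The JobID patterns and the takeaway described in the article above (keep only records whose `JobIDRaw` is a bare integer) can be sketched in Python as follows. This is only an illustrative sketch and not part of the patch: the names `classify_job_id`, `whole_job_records`, `parse_parsable` and the pattern labels are hypothetical, the `.extern` pattern is inferred from the suffix the article mentions, and the sample rows are made up rather than taken from real accounting output. It assumes the records arrive as pipe-delimited text with a header row, which is roughly what `sacct --parsable2` produces.

```python
import re

# Patterns copied from the "Characteristics" entries in the article.
# The labels are hypothetical names chosen for this sketch.
JOB_ID_PATTERNS = [
    ("vanilla", re.compile(r"^[0-9]+$")),
    ("array", re.compile(r"^[0-9]+_[0-9]+$")),
    ("array_bracketed", re.compile(r"^[0-9]+_\[.*\]$")),  # per the article's NOTE
    ("heterogeneous", re.compile(r"^[0-9]+\+[0-9]+$")),
    ("step_batch", re.compile(r"^[0-9]+\.batch$")),
    ("step_extern", re.compile(r"^[0-9]+\.extern$")),  # assumed from the suffix
    ("step_numbered", re.compile(r"^[0-9]+\.[0-9]+$")),
]


def classify_job_id(job_id):
    """Return the label of the first matching pattern, else 'unknown'."""
    for label, pattern in JOB_ID_PATTERNS:
        if pattern.match(job_id):
            return label
    # Forms the article does not list (e.g. steps of array jobs) end up here.
    return "unknown"


def whole_job_records(records):
    """Keep only records whose JobIDRaw is a bare integer (the takeaway)."""
    return [r for r in records if re.fullmatch(r"[0-9]+", r["JobIDRaw"])]


def parse_parsable(text):
    """Parse pipe-delimited text with a header row into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("|")
    return [dict(zip(header, ln.split("|"))) for ln in lines[1:]]


# Made-up sample rows for illustration only; not real accounting output.
SAMPLE = """\
JobID|JobIDRaw|JobName
8918299|8918299|calculation.job
8918299.batch|8918299.batch|batch
8918299.0|8918299.0|pw.x
8918300_1|8918301|array.job
"""

if __name__ == "__main__":
    for rec in whole_job_records(parse_parsable(SAMPLE)):
        print(rec["JobID"], classify_job_id(rec["JobID"]))
```

With the sample rows above, only the first and last rows survive the filter, because the `.batch` and `.0` step records do not have a bare-integer `JobIDRaw`. Filtering on `JobIDRaw` rather than `JobID` keeps array tasks, whose cooked `JobID` contains an underscore.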