Computations on the cluster are performed in so-called jobs. A job is a self-contained, user-defined task (or set of tasks). As a user, you specify how many resources (nodes, cores, memory, etc.) your job needs and for how long you need them. You then queue the job for a specific partition, and the scheduler starts it as soon as enough resources are available.
The scheduler software on our systems is called SLURM. SLURM is operated with a number of console commands for queuing jobs, monitoring them and, if necessary, canceling them. If you want to know more about the possible arguments for any of the commands presented on this page, view its man page, e.g. man sbatch for the sbatch command.
Waiting queues (called partitions in SLURM) differ from system to system. Usually, SLURM partitions differ in terms of hardware (e.g. GPU queues and non-GPU queues) and maximum allowed runtime per job.
SLURM queuing is designed to minimize waiting times for everyone. To this end, a priority value is assigned to each job. For example, while a job is waiting, its priority increases continuously; conversely, if a user has recently used a lot of resources, the priority of their next job will be lower.
You can find out the partitioning of our clusters with the sinfo command and get an overview of currently running jobs with squeue. The output of both commands can be highly customized to filter and sort it and to display more information; you can see a full list of options with man sinfo and man squeue respectively. A basic overview of the available hardware can also be found on the corresponding System Information pages (Bender, Bonna, Marvin).
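For example, a summarized partition overview and a customized view of your own jobs could look like this (the format string is only an illustration; pick the fields you need from the man page):

$ sinfo -s
$ squeue --me --format="%.10i %.12P %.20j %.8T %.10M"

Here %i is the job ID, %P the partition, %j the job name, %T the job state and %M the elapsed time.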
Tip: While it always depends on the system, partitions with shorter job time limits typically have shorter waiting times. It therefore makes sense to put your jobs into the shortest queue that they fit into.
Remember that you can also use sinfo to get the queue setup and job limits on each system.
The GPU partitions are split into queues for brief debugging test runs, short jobs and longer jobs on the A40 nodes, and likewise on the A100 nodes; in addition, there is a partition used only for special purposes (current as of August 2023). Use sinfo for the exact partition names and time limits.
The main way to create a job is by writing a job script. In principle, a job script is just a regular Linux shell script with additional instructions for SLURM about what resources to allocate for the job. These instructions take the form of #SBATCH comment lines at the top of the script.
You as the person running the job should review which resources your job needs before you run it (remember your responsibilities). The three resources you basically always need to think about are:
the maximum runtime of your job, set with --time,
the number of CPU cores, requested with --cpus-per-task (and possibly --ntasks-per-node for larger jobs),
and, on systems that use them, the SLURM account, set with --account. This is not necessary on Bender. The sshare -U command shows which accounts you belong to.
Add other parameters as needed (commonly RAM with the --mem option). You can see a full list of available job parameters in the official sbatch documentation. Most parameters have default values, meaning you do not necessarily need to set them.
Here is a very simple job script which requests a single GPU on Bender. Treat it as a sketch: the partition name, resource values and script name are placeholders, so adjust them to your own needs:
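#!/bin/bash
#SBATCH --job-name=gpu-test          # a name of your choice
#SBATCH --partition=<partition>      # placeholder: pick a suitable GPU partition (see sinfo)
#SBATCH --gpus=1                     # request a single GPU
#SBATCH --cpus-per-task=4            # illustrative value
#SBATCH --time=01:00:00              # illustrative time limit

module load Python
python my_script.py                  # placeholder for your actual computation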
The part below the #SBATCH instructions is a regular Linux shell script and can therefore contain any Linux command. In addition to actually launching your computation, you might for example need to do things like loading modules or creating output directories.
To queue a job, simply call the sbatch command on your job script, like this:
$ sbatch <script>
You can also add options to this sbatch command identical to the ones you would use with the #SBATCH instructions. If you do, these options will override their counterparts defined in the script. The hierarchy of options in decreasing ranking order is as follows: options passed on the sbatch command line, then SBATCH_* environment variables, then #SBATCH instructions inside the job script.
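For example, assuming the script contains #SBATCH --time=04:00:00, the following call (with a placeholder script name) would submit it with a two-hour limit instead:

$ sbatch --time=02:00:00 jobscript.sh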
Once you have queued a job with SLURM, it will be assigned an ID number. This job ID is the identifier with which you interact with your job, for example in order to monitor it. You can find out which jobs you are currently running, their IDs and other useful information with:
$ squeue --me
If you want to do CPU-intensive work interactively on a compute node, e.g. for visualization or debugging, a job script might not be an option. Please do not use the login nodes for CPU-intensive tasks over longer periods, as you would slow down the login node for everybody else; we reserve the right to kill any process that causes high CPU load over a longer period.
Instead, we recommend using an interactive job. This is like any other SLURM job, except that only a console gets started and you can use that console interactively. You can start an interactive job with a command similar to the following:
$ srun --pty <other SLURM options> /bin/bash
Note two important details. First, srun is used instead of sbatch. Second, the --pty option is used. Other SLURM options can be added as needed, just like with any other job.
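A slightly more concrete sketch (the resource values are again only illustrative) could look like this:

$ srun --pty --cpus-per-task=2 --mem=4G --time=01:00:00 /bin/bash

Once the job starts, you get a shell on the allocated compute node; exiting that shell ends the job.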
If a job has not finished running yet, you can view highly detailed information about it with:
$ scontrol show job <id>
While scontrol is primarily an admin command, the show job option can be extremely handy for regular users as well. Additionally, there is sacct (see documentation), which is used to query the SLURM database containing information about past jobs. The sacct command is a much more complex but also very powerful tool if you need to access information about old jobs after the fact.
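As a starting point, a query like the following (the field selection is just one possibility) shows some key facts about a finished job:

$ sacct -j <job ID> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS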
You can cancel a specific job preemptively, i.e. before it finishes or is otherwise terminated, with:
$ scancel <job ID>
This command can also take the --me option in order to cancel all your currently running jobs.
By default, the output files of jobs are named slurm-<id>.out and are placed in whatever your current working directory was when you submitted the job. The name of the output file can be changed with the --output=<pattern> option; see also the sbatch documentation.
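For example, the following (purely illustrative) directive writes the output to a logs directory, named after the job name (%x) and the job ID (%j); note that the directory has to exist before the job starts:

#SBATCH --output=logs/%x-%j.out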
Graphics processing units (GPUs) are specialized processors whose massively parallel architecture lets them perform certain operations much faster than CPUs. They belong to the broader class of hardware known as accelerators, of which GPUs are only one subgroup.
In order for a program to use the GPUs, the following conditions have to be met: the job has to actually request one or more GPUs from SLURM, and the software has to support GPU computation and be configured to use it.
In our experience it is very easy, especially as a new user, to accidentally not use the GPUs despite thinking you did. This is quite wasteful, as GPUs are expensive and resource-intensive to operate. Please make sure that your software can see the GPUs and actually makes use of them. Remember you can use small test jobs and the devel queues for debugging.
Tip: There is a simple test you can do: if you disable GPU support in your software and it does not run any slower than before, then your jobs are most likely not actually using the GPUs.
Here is an example for Bender, shown here with the generic --gpus option (only one of several ways to request GPUs; the rest of the job script is omitted):
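#SBATCH --gpus=1                     # request a single GPU for the job

Inside the job, a command like nvidia-smi can be used to check which GPUs are actually visible to your processes.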
There are also a number of other parameters in the Slurm documentation that you can use to control GPU usage.
In order to solve a problem or process larger datasets faster, calculations can often be dissected into pieces and executed in parallel. However, parallelization is quite a big topic on its own and depends on the programming language you want to use, which is why we only briefly mention it here.
squeue - shows all running jobs on the cluster
sinfo - shows all parts of the cluster; check for the idle ones
sview - a graphical view of the cluster usage (don't forget to use ssh -X)
scontrol show job <jobid> - shows all details of the job with the given <jobid>
scancel <jobid> - cancels a running job
scancel -t PENDING - cancels all your pending jobs
scontrol hold <jobid> - puts the job on hold
scontrol release <jobid> - releases the held job
scontrol requeue <jobid> - cancels the job and puts it back into the queue to run again
SLURM also sets a number of environment variables (such as SLURM_JOB_ID) that can be used inside of SLURM jobs to retrieve information about the job; see the sbatch documentation for the full list.
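A small sketch of how this might look inside a job script (SLURM_CPUS_PER_TASK is only set if you requested --cpus-per-task):

echo "Running job $SLURM_JOB_ID on $SLURM_JOB_NODELIST"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # e.g. tell an OpenMP program how many cores it may use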
On systems where you need to use SLURM accounts (Bonna and Marvin), you can view your associated SLURM accounts and your cluster usage with:
$ sshare -U
Checkpointing refers to the technique of writing out intermediate results. Implementing checkpoints within your job scripts means that less computing time, and thus fewer valuable resources, is lost in case of an event that aborts your job. There are several possible sources of such events, e.g. exceeding the time limit or the allocated memory, node failures, or cluster down times. Checkpointing may also help you to narrow down possible error sources in your calculation.
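How checkpoints are written depends entirely on your software; many programs bring their own restart or snapshot files. As a purely illustrative shell sketch for a step-based workflow (all file and program names are placeholders):

CKPT=checkpoint.txt
START=1
# if a checkpoint exists, resume after the last completed step
[ -f "$CKPT" ] && START=$(( $(cat "$CKPT") + 1 ))
for step in $(seq "$START" 100); do
    ./compute_step "$step"       # placeholder for one piece of your actual computation
    echo "$step" > "$CKPT"       # record the last completed step
done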
You can request SLURM to send you an e-mail notification if certain events occur with:
$ sbatch --mail-type=<type> <script>
See man sbatch for all possible arguments.
E-mail notifications are not currently functional on Bender and on Marvin.
By default, specified notifications will be sent to the e-mail address of the submitting user. You can change the receiving user with the additional --mail-user option.
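Inside a job script, this could for example look like the following (the address is a placeholder):

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<your e-mail address>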