This page contains the most basic information you need as a new cluster user; please read it in full before using our systems. It gives a short overview of what a cluster is, how to connect to it, how to work on it, how to run computations, and where to get help.
A cluster is essentially a large computer that is composed of many smaller inter-connected computers. These individual computers are called nodes. Much like regular PCs, each node has one or more CPUs which in turn have multiple cores. Each node also has its own RAM (random-access memory). Nodes may also have additional hardware like graphical processing units (GPUs).
Many things about a cluster are identical to any other (Linux) computer. There are, however, some differences.
Storage devices (hard drives etc.) on HPC clusters are usually centralized, so users can access the same files from any node. On some clusters, individual nodes also have additional local hard drives (or sometimes SSDs); these usually serve special purposes, and how to access them is cluster-specific.
The nodes are usually connected via a fast network (usually called an interconnect), most commonly based on InfiniBand or occasionally Omni-Path technology. This allows multiple nodes to work together much more effectively than regular networked computers.
You always connect to one of the login nodes. There you work interactively, usually via a Linux command-line interface (CLI). However, you do not run your resource-intensive calculations on the login nodes themselves. Instead, you (usually) queue them for execution on the compute nodes via a job scheduler, which on all our systems is the SLURM workload manager.
Never run computations on the login nodes! You will slow down the login nodes for everyone. Only ever run your actual computations via the job scheduling system.
Our systems are available to university members (both employees and students) free of charge. To get access, simply register for HPC access by filling out the corresponding registration form.
Once you have access, connecting to our systems is done via the Secure Shell (SSH) protocol, which can be used under Windows, Linux, and macOS systems.
By default, our systems can be reached from inside the university network or via the university's VPN.
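As an illustrative sketch, a connection from a Linux, macOS, or Windows terminal looks like the following (the hostname below is a placeholder, not a real address; check your system's documentation for the actual login node):

```shell
# Replace <username> with your university account name and the host
# with the login node address from your cluster's documentation.
ssh <username>@login.cluster.example.org
```

After the first successful login you will land in your home directory on a login node.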
Please remember that you are sharing a limited resource with the entire university, with comparatively few technical restrictions. HPC systems also have a very high power consumption and are expensive to operate.
Please stick to the following rules of thumb regarding the use of the login nodes:
| Things that are generally OK | Things that are not OK |
|---|---|
| Compiling software. | Running computations on the login nodes for hours and using a large number of cores. |
| Doing brief debug runs with a few parallel processes to see if they start at all. | Running scripts on the login nodes because the waiting time for jobs is too long. |
| | Leaving phantom processes (processes that persist after logout) lying around forever (some software is known to do that). |
| Copying large files after taking steps to minimize the information that actually needs to be copied. | Blindly making multiple copies of your data for convenience. |
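To check for phantom processes, you can list your own processes on a login node (a generic Linux example; the column selection is just one reasonable choice):

```shell
# List your own processes with PID, elapsed running time, and command line.
# After logging out and back in, anything old that is still listed here
# is a candidate phantom process.
ps -u "$USER" -o pid,etime,cmd
```

Phantom processes you find this way can be terminated with `kill <pid>`.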
Time reserves, however, are in order: if you do not know how long your job will run, give it a generous time limit (e.g. 1.5 to 2 times the expected runtime). It is also fine to run a few longer jobs to experiment with runtimes.
The HPC team will never interfere with the job prioritization algorithm, no matter how urgent your paper deadline is.
Also, note that another user's job running before yours does not mean that job queuing went wrong. A number of factors go into the priority calculation and the algorithm may sometimes fill gaps with small jobs that fit into them.
We have neither the time nor the expertise to do that.
That said, we offer consulting services for both basic use and for advanced topics such as parallelization and performance optimization. You can get in contact with us at email@example.com.
Our clusters, like almost all clusters, run on the Linux operating system.
If you do not know how to operate Linux, check out our Linux Introduction Course which is held at least once each semester, or our Linux video tutorial. This tutorial is part of the larger HPC Wiki which offers more tutorials and a lot of additional information.
Like on any Linux system, you have a home directory on the cluster. Your home directory is limited in size.
Individual systems may have additional file systems for special purposes. These are described in the documentation for each system.
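For a rough check of how much space your data occupies, standard Linux tools work on any of these file systems (a generic sketch; whether and how quotas are reported depends on the specific system):

```shell
# Total size of everything under your home directory
# (may take a while for large directory trees)
du -sh "$HOME"

# Free space on the file system that holds your home directory
df -h "$HOME"
```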
All our systems have a lot of centrally installed software that you can use right away. This software is usually made available in the form of environment modules.
The environment module functionality on our systems is provided by Lmod, whose user guide you can find here.
Additionally, there are various package managers. You can use Conda with `module load Anaconda3`, or EasyBuild with `module load EasyBuild`. Both let you install software into your own home directory.
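Typical Lmod commands look like this (module names differ between clusters; `EasyBuild` is taken from the example above):

```shell
module avail            # list all software available as modules
module spider anaconda  # search for a module by name
module load EasyBuild   # load a module into the current shell
module list             # show which modules are currently loaded
module purge            # unload all modules
```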
There are various ways you can install and build your own software. Some common ways of doing that are described here.
You do not run HPC computations from the command line directly, because anything you type in the console runs on the login node.
Computations on the cluster are run in so-called jobs. In such a job, you define how many resources (CPU cores, RAM, GPUs) you need and for how long you need them. You also typically provide a job script which details what program(s) are to be run in it. You then put the job into a waiting queue and our scheduler SLURM decides, based on the size of the job and other factors, when to run it.
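As an illustrative sketch (the resource values, module name, and script name below are placeholders, not recommendations for our systems), a minimal SLURM job script might look like:

```shell
#!/bin/bash
#SBATCH --job-name=example        # name shown in the queue
#SBATCH --ntasks=1                # number of tasks (processes)
#SBATCH --cpus-per-task=4         # CPU cores per task
#SBATCH --mem=8G                  # RAM for the whole job
#SBATCH --time=02:00:00           # wall-clock time limit (hh:mm:ss)

# Load the software the computation needs (module name is an example)
module load Anaconda3

# Run the actual computation (script name is a placeholder)
python my_analysis.py
```

You would submit such a script with `sbatch jobscript.sh` and monitor it with `squeue -u $USER`.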
This wiki only provides the most important information on how to interact with the HPC systems at the University of Bonn.
An overview of many important terms, concepts and technologies can be found at the HPC Wiki. It is built up continuously within the HPC.NRW project. The HPC Wiki also contains a number of interactive tutorials.
If you are looking for help on a specific Linux command, try the built-in help functions and especially the Linux man pages, which you can access with `man <command>`. If the developer has written a man page for their software (which is usually the case), it will be displayed.
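As a quick sketch, using `ls` as a stand-in for any command:

```shell
# Full manual page for ls (press q to quit the pager)
man ls

# Many commands also print a short usage summary themselves
ls --help
```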
If none of these resources answer your question, you can send an e-mail to firstname.lastname@example.org. The same e-mail address, or one of our system-specific support addresses (email@example.com, firstname.lastname@example.org), can also be used to report problems and request software installations.
Additionally, we offer consulting for development and optimization of your software. Experts from multiple scientific disciplines are available to review your software or otherwise advise you in person. If you would like consulting, you can also send an e-mail to email@example.com.