Just like on any other Linux system, you have a home directory on each of our systems, located under /home/<user>. This is the default directory for all your data and it becomes your current working directory when you connect to the cluster.
Your home directory is the same on all login and compute nodes of the corresponding cluster. It is not shared between clusters, however.
Home directories are not backed up regularly.
On Bender and Marvin, your home directory has a quota (maximum allowed size) of 100 GB. Once you exceed that threshold, you will get an error message when creating additional files. You have to get back under the 90% limit within 7 days in this case.
On Bender, you can check how much disk space you are using with the command quota -s. On Marvin, this is currently not possible; your only options there are built-in Linux commands like ls and du.
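For example, a quick check of your usage could look like this (the subdirectory name is only a placeholder):
$ quota -s                  # Bender only: summary of your current quota usage
$ du -sh ~                  # total size of your home directory
$ du -sh ~/my_project       # size of a single subdirectory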
On Bender, we may grant exceptions to this rule if we deem it justified. If you absolutely cannot do your computations with the 100 GB limit, please contact bender-support@hpc.uni-bonn.de and describe your problem. We may consider increasing your quota. On Marvin, you can use workspaces for large amounts of data.
On Bonna and Marvin, only you have read and write access to your own home directory. You can change the permissions of files and directories with the chmod command.
We strongly advise against giving write access to your home directory to people you do not know.
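As a sketch, the following shows how you could inspect and tighten the permissions of your home directory (the subdirectory name is just a placeholder):
$ ls -ld ~                  # show the current permissions of your home directory
$ chmod 700 ~               # only you can read, write and enter it
$ chmod o-rwx ~/my_data     # remove all access for "others" from a subdirectory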
On Bender, home directories are automatically mounted whenever they are needed. This means that they might not always be visible in an ls call; however, you can still cd into them (provided the owner has set the permissions accordingly). The owner does not need to be logged in for this.
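For example (colleague_user is a placeholder), a directory may not show up in a listing, but entering it still works:
$ ls /home                   # a not-yet-mounted home directory may be missing from the listing
$ cd /home/colleague_user    # entering it triggers the automount, if the permissions allow it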
On our systems, you can in principle use the chown and chmod commands together with Linux file permissions to give other users access to your data. You can even do this for your entire home directory. On Marvin, this also means that you can share data with your research group; see the guide on our page about research groups.
On Bender, your home directory is the primary location for all data from your computations. However, if you need higher file I/O speeds, you may use the local SSDs on the Bender compute nodes.
For every job, a temporary directory is automatically created and you can access it at the path
/local/nvme/${USER}_${SLURM_JOB_ID}
where $USER is your user name and $SLURM_JOB_ID is the numerical job ID (both can be retrieved within a job from environment variables of the same name). For example, a job with ID 23111 of user kuckertz would create a directory named /local/nvme/kuckertz_23111.
All data residing in the temporary directory will be deleted at the end of the job!
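As an illustration, a job script could stage its data on the local SSD and copy the results back to your home directory before the job ends. This is only a sketch: the program name, the file names and the resource requests are placeholders, and you may need additional options such as a partition.
#!/bin/bash
#SBATCH --job-name=local-scratch-demo
#SBATCH --time=01:00:00

# per-job scratch directory on the local SSD (created automatically by the system)
SCRATCH=/local/nvme/${USER}_${SLURM_JOB_ID}

cp ~/input_data.dat "$SCRATCH"/                 # stage the input onto the fast local SSD
cd "$SCRATCH"
~/my_program input_data.dat > results.out       # placeholder for your actual computation
cp results.out ~/                               # copy the results home before the job ends;
                                                # the scratch directory is deleted afterwards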
See the Bonna wiki here.
See our article about the Marvin file systems for more in-depth information on all the different available filesystems.
Marvin has a central file storage with a capacity of 5 Petabytes. It uses the Lustre filesystem.
You do not have a quota or any size restrictions on the Lustre filesystem. Rather, we use the hpc-workspaces mechanism. You can find detailed guides and information on our Marvin workspaces page.
You can also share data in your workspaces with your research group on Marvin; this is described on our page about research groups.
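As a rough sketch, a typical workspace lifecycle with the hpc-workspaces tools could look like this; the workspace name and the duration are placeholders, and the exact limits on Marvin are documented on the workspaces page linked above:
$ ws_allocate my_workspace 30   # request a workspace named my_workspace for 30 days
$ ws_list                       # show your workspaces and their expiration dates
$ ws_find my_workspace          # print the path of the workspace
$ ws_release my_workspace       # release the workspace once the data is no longer needed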
None of our systems have enough storage space to store your research data for longer periods of time. Please refer to our colleagues who manage the research data infrastructure for long-term storage options.
You can access the research data infrastructure from our systems, see below.
You can transfer files to and from any of our systems with any method that works over SSH.
In any case, data should be transferred as little as possible. Ideally, your input data should be copied to our system at the beginning of your compute campaign, and the results should be transferred back after your computations are done.
Streaming data back and forth is not possible, nor is permanently mounting the FDI or other remote storage. Remember that you are sharing the cluster, and thus the network connection, with everyone, and act responsibly.
You can transfer files back and forth between our systems and the Research Data Infrastructure (FDI) of Uni Bonn. All you need is the sftp command, which is available on all our systems. You have to sign up for the FDI as described on their website to use it. The FDI is only available if you have a Uni ID.
Here is a tutorial giving a good overview of SFTP. Information on the sftp command can also be found in its man page and on Wikipedia.
Usage example: if your Uni ID is demouser, you could enter the following command on the cluster:
sftp demouser@fdi-sftp.uni-bonn.de:demouser
to connect to your personal storage. The part before the @ is the user you want to connect as; as mentioned, the FDI uses your Uni ID and password. fdi-sftp.uni-bonn.de is the server hosting your personal storage. See also this page in Confluence on the addresses for personal and project storage. The part after the : is the directory you want to enter. For your personal storage, the top-level directory is by default identical to your username, but you could also specify a subdirectory here.
After typing this command, you will enter the interactive console of SFTP, which is explained in more detail in the aforementioned tutorial. From this point on, you can copy files back and forth between the cluster and the FDI.
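For illustration, a short interactive session could look like this; the file and directory names are placeholders:
sftp> put results.tar.gz
sftp> get input_data.csv
sftp> lcd my_results
sftp> ls
sftp> bye
Here, put uploads a file from the cluster to the FDI, get downloads one to the cluster, lcd changes the local (cluster-side) working directory, ls lists the current remote directory, and bye ends the session.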
You can find information on this on the FDI Confluence pages. Note that you need a Uni-ID both to use the FDI and to access the Confluence pages.
Since SFTP is based on SSH, you can also make use of the usual SSH features like key pairs; however, only the FDI support can help you with that.
Mounting the FDI on our systems via SSHFS is not possible.
OpenSSH comes with the command scp (secure copy), which is used to securely copy files and directories between your local machine and a cluster over SSH. In simplified syntax, the scp command copies the <source> data and creates <target> data with the exact same content as the original, like this:
$ scp <source> <target>
For example, to copy a file foobar.sh from the current working directory of your local machine into your home directory on the cluster, type:
$ scp ./foobar.sh <user>@<cluster-address>:~/foobar.sh
If you instead want to copy a file foobar.sh from your home directory on the cluster into the current working directory of your local system, simply specify source and target the other way around:
$ scp <user>@<cluster-address>:~/foobar.sh ./foobar.sh
Obviously, you must specify the exact name of the <source> file you want to copy. The <target> file, however, can be named as you like, so you can copy and rename data in one step.
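For instance, the following copies foobar.sh to the cluster and stores it there under a different (placeholder) name:
$ scp ./foobar.sh <user>@<cluster-address>:~/renamed.sh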
Transferring whole directories works exactly the same, except that you need to add the -r flag to copy directories recursively. In the following example, we first copy a directory foobar onto the cluster and then back to your local machine:
$ scp -r ./foobar <user>@<cluster-address>:~
$ scp -r <user>@<cluster-address>:~/foobar .
A more sophisticated way to transfer data is the rsync utility. Its simplified syntax and general usage are the same as for the scp command:
$ rsync <source> <target>
To copy a directory recursively, you could add the -r flag as described above. The advantage of rsync, however, is that it can preserve your data's attributes (archive mode), compress the data on the fly during the transfer, and only transfer files that have changed since the last run. This makes it more efficient and much faster than copying directories raw, especially when you need to transfer lots of data. For example, let us assume that you want to transfer a directory foobar from the current working directory of your local system to your home directory on a cluster. Using the rsync command, this could look as follows:
$ rsync -azv ./foobar <user>@<cluster-address>:~
In this example, we use three options: -a for archive mode, -z for compression, and -v to print verbose information about the transfer. You do not need -r in this case, because archive mode already implies recursive copying and additionally preserves metadata such as permissions and timestamps. The -z option compresses the data only while it is in transit; rsync automatically decompresses it on the receiving side, so the directory arrives in its original, uncompressed state.
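The same command works in the other direction, and adding -P (a shorthand for --partial --progress) shows the progress and lets you resume an interrupted transfer. A sketch of downloading the foobar directory back to your local machine:
$ rsync -azvP <user>@<cluster-address>:~/foobar .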