Caution: The information on this page only applies to Marvin.
There are a number of file systems for different purposes on Marvin.
This page contains some pointers on which filesystem to use for which purpose and how to use them.
See the Marvin system information page for technical details on the hard drives.
Here are some general tips on what filesystem to use for which data. Most likely your data breaks down roughly into the following categories:
Code that you write yourself, including job scripts, is best stored in your home directory, because your home directory is a persistent storage location that is central, i.e. the same on any node. Software usually needs only moderate amounts of disk space and is therefore unlikely to exceed your quota.
We know that some users have a setup where one person in the group maintains the software and the entire group uses it. There is nothing wrong with that, although we advise you to always be careful about which data in your home directory you share with whom.
If you want to share data with your group, you can use the regular Linux mechanisms like chown and chmod for that purpose. We describe how to do that on our Marvin research groups page.
If you need to share data with people outside your research group, you can open directories to everyone with the same mechanisms.
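As a rough illustration of these mechanisms, the following sketch opens a directory to a group for reading while keeping everyone else out. The directory is a temporary stand-in and the group is simply your primary group; both are assumptions made so the example runs anywhere, and you would substitute your own directory and your research group's name.

```shell
# Stand-in for a directory in your home; replace with a real path.
shared_dir="$(mktemp -d)"
echo "example" > "$shared_dir/data.txt"

# Stand-in group: your primary group. Replace with your research group's name.
group="$(id -gn)"

# Assign the directory and everything in it to the group.
chown -R ":$group" "$shared_dir"

# Owner: full access. Group: read files and enter directories (capital X
# adds execute only to directories and already-executable files).
# Others: no access at all.
chmod -R u=rwX,g=rX,o= "$shared_dir"

# Show the resulting permissions.
ls -ld "$shared_dir" "$shared_dir/data.txt"
```

Opening a directory to everyone would replace `o=` with `o=rX`. Keep in mind that every parent directory also has to be searchable by the people who should reach the shared one.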
For software that you download and install from the internet, the same conditions as for your own software apply. Of course, we recommend that you first check whether your software is installed centrally already.
We are aware that sometimes software requires such a large amount of storage space that it does not fit into your home quota. If the software has a lot of input data, you might be able to split the input data off into a workspace. Note also that package managers like pip and conda often cache data, and you can free up space with various mechanisms. Consult the package manager's documentation for that.
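As a starting point, the sketch below only inspects the caches and deletes nothing. It assumes reasonably recent versions of pip and conda; the function name show_package_caches is made up for this example, and tools that are not installed are skipped.

```shell
# Report what the package manager caches contain, without deleting anything.
show_package_caches() {
    if command -v pip >/dev/null 2>&1; then
        # Prints the cache location and its size.
        pip cache info || echo "pip cache info is not available in this pip version"
    else
        echo "pip not found, skipping"
    fi

    if command -v conda >/dev/null 2>&1; then
        # Lists what a full clean would remove, but removes nothing.
        conda clean --all --dry-run || echo "conda clean --dry-run failed"
    else
        echo "conda not found, skipping"
    fi
}

show_package_caches
```

If the report shows that a cache is worth clearing, `pip cache purge` and `conda clean --all` perform the actual deletion.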
This problem is rare, and if you absolutely cannot continue your work otherwise, you might want to contact us and we can see if we find a solution together.
If your input data is of significant size, you are best off putting it into a workspace. Remember that you can extend a workspace for up to almost one year, which should be enough for most computation campaigns.
Note also that you can share data in your home and your workspaces with your research group.
If you have input data that you are absolutely certain your group will need for more than one year, you might want to contact us at marvin-support@hpc.uni-bonn.de.
Your output data should generally go into a workspace. If it is a large amount, you might want to condense it down to the data that you actually need. Since your workspace has a finite duration, you will need to copy everything down from the cluster to a more permanent location before the workspace expires.
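One possible way to condense results before copying them off the cluster is to pack them into a single compressed archive. The sketch below uses temporary stand-in directories and a made-up file name so that it runs anywhere; the workspace path and the archive location are assumptions that you would replace with your own.

```shell
# Stand-ins so the sketch runs anywhere; replace with your workspace path.
workspace="$(mktemp -d)"
mkdir -p "$workspace/results"
echo "42" > "$workspace/results/summary.txt"

# Stand-in location for the archive.
archive="$(mktemp -d)/results.tar.gz"

# Pack only the results you actually need into one compressed archive.
tar -czf "$archive" -C "$workspace" results

# List the archive contents to verify it before the workspace expires.
tar -tzf "$archive"

# From your own machine, the archive could then be fetched with rsync, e.g.
#   rsync -av <user>@<marvin-login-host>:/path/to/results.tar.gz .
# (user, host and path are placeholders, not real values)
```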
If you need a long-term storage solution for your data, unfortunately none of our systems is suitable for that, but the research data infrastructure might be of use for you.
Intermediate results, checkpoint data and any other temporary data that your software creates should go into a workspace.
On Marvin, you have a home directory, as is typical on Linux systems. Your home directory has a quota, i.e. a maximum allowed size, of 100 GB.
All further information about home directories can be found on our page Storing and Transferring Data.
There is currently no convenient way to check your quota on Marvin. You can use the Linux command du to check the size of files and folders in your home directory.
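A minimal sketch of such a check, assuming GNU du and sort as found on typical Linux systems. The variable name target is chosen for this example only.

```shell
# Directory to inspect; defaults to your home directory.
target="${1:-$HOME}"

# Total size of the directory, in human-readable units.
du -sh "$target" 2>/dev/null

# Size of each entry directly below it, largest first, top ten only.
du -h --max-depth=1 "$target" 2>/dev/null | sort -rh | head -n 10
```

Comparing the total against the 100 GB quota tells you how much room is left, and the second listing shows which subdirectories are worth cleaning up first.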
The workspaces are where most of your input, output and temporary data for your jobs should go.
There is no limit to the size of your workspaces on the main workspace filesystem (named scratch) except the physical limits of the hard drives. The total capacity is about 5 petabytes, shared between all users.
There is, however, a limit to the duration of your workspaces. When the time is up, the data will be deleted, so you should have saved all meaningful results by that point. The maximum allowed duration for a workspace is rather generous, though: almost a year in total is possible.
The use of workspaces is described on a dedicated page in our wiki, where you can learn how to create them.
Marvin has a central SSD partition which you can also access via the workspace mechanism.
You can read everything about how to use workspaces on our Marvin Workspaces page. If you create a workspace with the -F mlnvme option, your workspace will live on the central SSD partition; see also the section about the different workspace filesystems on the Workspaces page.
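For illustration only: this sketch assumes that workspaces on Marvin are created with the ws_allocate command from the hpc-workspace tools, which this page does not confirm. The workspace name and duration are made-up values. The Marvin Workspaces page remains the authoritative reference.

```shell
# Hypothetical workspace name and duration in days.
ws_name="my_ssd_ws"
ws_days=7

# Assumption: ws_allocate is the workspace command available on Marvin.
if command -v ws_allocate >/dev/null 2>&1; then
    # -F selects the filesystem; mlnvme is the central SSD partition.
    ws_allocate -F mlnvme "$ws_name" "$ws_days"
else
    echo "ws_allocate not found; this sketch is meant to be run on Marvin"
fi
```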
The individual nodes on Marvin each have a local SSD that you can use for jobs with very high file I/O. This should be used sparingly and only if you know that you actually need such high I/O speeds.
Each Marvin node has 2 TB of local SSD capacity for all jobs on that node combined.
Every SLURM job has access to the local SSD on the node(s) where it is running. When the job starts, a new temporary folder for the job will be created and can be accessed under /tmp from inside the job. In other words, you can copy files to and from /tmp in your job script and they will be placed on the SSD.
Two important things to note:
- Each job has its own /tmp folder. If you need to exchange data between jobs, you need to use a shared location.
- The /tmp folder is local to its node. This implies that if your job uses multiple nodes, not all your processes can see the same data inside /tmp, and you need to handle this manually in your job script.
The job's /tmp folder and all its contents will be deleted when the job finishes.
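The typical pattern that follows from this is to stage data in, compute, and stage results out. Below is a minimal single-node sketch of that pattern. The workspace path, the input file and the sort command are stand-ins invented for this example so that it runs as-is; a real job would use your own workspace and program.

```shell
#!/bin/bash
#SBATCH --job-name=local-ssd-demo
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Placeholder: replace with the path of your own workspace.
WORKSPACE="${WORKSPACE:-$(mktemp -d)}"

# Hypothetical input file, created only so that the sketch is runnable.
[ -f "$WORKSPACE/input.txt" ] || printf 'b\na\nc\n' > "$WORKSPACE/input.txt"

# Stage in: copy the input into a working folder below /tmp,
# which is on the node-local SSD inside a job.
JOBDIR="$(mktemp -d /tmp/demo.XXXXXX)"
cp "$WORKSPACE/input.txt" "$JOBDIR/"

# Stand-in for the real, I/O-heavy program.
sort "$JOBDIR/input.txt" > "$JOBDIR/output.txt"

# Stage out: /tmp is deleted when the job finishes,
# so the results must be copied back to shared storage first.
cp "$JOBDIR/output.txt" "$WORKSPACE/"
```

For a multi-node job, the stage-in step would have to run once on every node, for example by launching the copy through srun with one task per node. That variant is not shown here and would need to be adapted to your job.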