Data Storage Organisation

As explained in the Overview, the available storage is divided into filesets, each with its own specific use.

/gpfs/home

The home directories are stored on the /gpfs/home fileset; this is the starting point when you log on to Lucia. Each user has their own private space to store personal data, configurations, codes, etc. It is usually referred to as $HOME or ~.
By default, the permissions are read/write/execute for the owner only, i.e. drwx------, and the group ownership is set to the user's personal group.
The /gpfs/home fileset uses user quotas, and the limits are set as follows:

Block soft limit | Block hard limit | Block grace period | File soft limit | File hard limit | File grace period
200GB            | 260GB            | 7 days             | 1000k files     | 1300k files     | 7 days

You can still write data after exceeding the soft limit, until either the 7-day grace period expires or you reach the hard limit. Once the grace period has expired, you must reduce your usage below the soft limit before you can write data again.
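The interplay between the soft limit, the hard limit and the grace period can be sketched as a small decision function. This is only a simplified model of the behaviour described above, not how GPFS implements it: the function name quota_state and its arguments are hypothetical, and the exact boundary conditions (for example, whether writes stop at exactly 7 days) are assumptions.

```shell
# Hypothetical helper, for illustration only (not part of Lucia's tooling).
# Arguments: usage, soft limit, hard limit (all in the same unit),
#            and the number of days spent above the soft limit.
quota_state() {
    usage=$1; soft=$2; hard=$3; days_over=$4
    if [ "$usage" -ge "$hard" ]; then
        echo "blocked: hard limit reached"
    elif [ "$usage" -gt "$soft" ] && [ "$days_over" -ge 7 ]; then
        # Assumption: the grace period counts as expired after 7 full days.
        echo "blocked: grace period expired"
    elif [ "$usage" -gt "$soft" ]; then
        echo "warning: over soft limit, grace period running"
    else
        echo "ok"
    fi
}

# Example with the /gpfs/home block limits (200GB soft, 260GB hard):
quota_state 220 200 260 3
```

With 220GB used for 3 days, the function reports that the grace period is still running, so writes are still possible.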

You can check your usage and limits with:

mmlsquota -u <username> --block-size g ess:home

Project directories

Each project on Lucia has two main directories: one on the /gpfs/projects fileset for storage, and one on the /gpfs/scratch fileset as a working directory. These directories are intended for sharing data among members of the project.

The default permissions on the directories are "2770", i.e. drwxrws---. The setgid bit is set so that newly created files and subdirectories automatically inherit the group of the parent directory; see Setgid below for more information.

Group quotas are set on blocks and files on both filesets. The limits vary from one project to another, following what was requested for the project. Note that the file limit depends on the block limit: by default, the minimum is 500k files for projects with a block limit ≤500GB, this limit is then increased by 1k files per additional GB, and it is capped at 10000k files for projects with block limits >10000GB (this can be increased on demand).
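The default file-limit rule described above can be written as a short calculation. The following is a minimal sketch, not an official tool: the function name default_file_limit_k is hypothetical, it takes a block limit in GB and prints the corresponding default file limit in thousands of files, and it does not account for limits that were increased on demand.

```shell
# Hypothetical helper (not part of Lucia's tooling): default file limit,
# in thousands of files, for a given block limit in GB.
default_file_limit_k() {
    block_gb=$1
    if [ "$block_gb" -le 500 ]; then
        limit_k=500                               # minimum: 500k files
    else
        limit_k=$((500 + (block_gb - 500)))       # +1k files per GB above 500GB
    fi
    if [ "$limit_k" -gt 10000 ]; then
        limit_k=10000                             # default cap: 10000k files
    fi
    echo "$limit_k"
}

default_file_limit_k 2000   # prints 2000
```

For a 2000GB block limit this gives 2000k files.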

You can check the usage and limits for your project on all filesets with:

mmlsquota -g <project_name> --block-size g ess

If not specified in the DoW, the default quota values for industrial projects are:

Fileset        | Block soft limit | Block hard limit | Block grace period | File soft limit | File hard limit | File grace period
/gpfs/projects | 2000GB           | 2600GB           | 7 days             | 2000k files     | 2600k files     | 7 days
/gpfs/scratch  | 1000GB           | 1300GB           | 7 days             | 1000k files     | 1300k files     | 7 days

Default quota

The default quota on /gpfs/projects and /gpfs/scratch for any Unix group that isn't a project is set to an extremely low value (16KB and 1 file). As a consequence, you might get a Disk quota exceeded error message if the project directory you're working in doesn't have its permissions and ownership properly set; see Setgid below.
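When such an error appears, one way to investigate is to look for entries whose group is not the project group, and for directories that lack the setgid bit. The sketch below is only a suggestion: the function name check_project_dir is hypothetical, and the directory and group you pass to it are your own project's values.

```shell
# Hypothetical helper, for illustration only.
# Arguments: a directory to inspect and the expected Unix group (name or numeric GID).
check_project_dir() {
    dir=$1; group=$2
    echo "Entries not owned by group $group:"
    find "$dir" ! -group "$group" -print
    echo "Directories without the setgid bit:"
    find "$dir" -type d ! -perm -2000 -print
}
```

It could be called as check_project_dir <project_directory> <project_name>. Entries listed in the first part are charged to another group's quota, which is the situation described above.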

/gpfs/projects ($PROJECT_HOME)

The /gpfs/projects fileset is used to store and share data throughout the project's lifespan, typically software, developments, input files and important files that need to be kept after a job is completed.

You can specifically check the project usage and limits for the /gpfs/projects fileset with:

mmlsquota -g <project_name> --block-size g ess:projects

/gpfs/scratch ($SCRATCH_HOME)

The /gpfs/scratch fileset is the workspace used for temporary data during job execution, and it partly consists of NVMe SSD for better performance.

You can specifically check the scratch usage and limits for the /gpfs/scratch fileset with:

mmlsquota -g <project_name> --block-size g ess:scratch

Periodical clean-up of the /gpfs/scratch fileset

As the /gpfs/scratch fileset is a temporary workspace built for performance, it will be cleaned up periodically to avoid dormant data. The clean-up will usually occur during the spring (end of May) and the fall (end of November) maintenance windows. A reminder will be sent 15 days prior to the maintenance window.

Umask

The default umask on Lucia is currently quite permissive as it is set to 0002 (RHEL8’s default), meaning files and directories you create will be group writable and world readable.

While group writable permissions can be useful for instance when collaborating with other users in projects’ directories, be aware that other users of the same project might also intentionally or unintentionally modify or even delete your files and directories.

Depending on your preferences, you might want to restrict the default permissions and only relax them when needed using chmod. Here are some examples of the umask command:

  • Display your current umask in octal values: umask
  • Display your current umask in symbolic values: umask -S
  • Set your umask to allow group read access and no access for others: umask 0027

Note that setting your umask on the command line only modifies it for the current session. If you want a permanent change, add the command to your ~/.bashrc.
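To see the effect of the two umask values mentioned above, you can create a few files in a throwaway directory and inspect their permissions. This is a minimal sketch assuming a Linux system with GNU stat; the directory and file names are arbitrary, and each umask is set inside a subshell so that your current session is left unchanged.

```shell
# Work in a throwaway directory so nothing in $HOME is touched.
demo_dir=$(mktemp -d)

(
    umask 0002                        # Lucia's current default
    touch "$demo_dir/default_file"
    mkdir "$demo_dir/default_dir"
)
(
    umask 0027                        # stricter: group read access, nothing for others
    touch "$demo_dir/strict_file"
    mkdir "$demo_dir/strict_dir"
)

# GNU stat: %a prints the permission bits in octal, %n the name.
stat -c '%a %n' "$demo_dir"/*
```

With umask 0002 the file is created as 664 and the directory as 775; with umask 0027 they become 640 and 750. The demo directory can be removed afterwards with rm -r "$demo_dir".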

For more information on umask and permissions, see Red Hat’s documentation.

Setgid

The setgid bit is set on the project directories so that new files and directories created inside the project directories inherit the same group membership as their parent directory instead of the primary group of the user.
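The inheritance can be observed in a throwaway directory. This is a minimal sketch assuming Linux and GNU stat; the directory names are arbitrary. Note that in a temporary directory the parent's group is already your primary group, so the group inheritance is not visibly different here; what the example does show is that a new subdirectory inherits the setgid bit itself.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/project"
chmod 2770 "$demo_dir/project"        # same mode as the project directories

touch "$demo_dir/project/new_file"
mkdir "$demo_dir/project/new_subdir"

# GNU stat: %A = symbolic mode, %G = group name, %n = name.
stat -c '%A %G %n' \
    "$demo_dir/project" \
    "$demo_dir/project/new_file" \
    "$demo_dir/project/new_subdir"
```

The "s" in the group permissions of new_subdir indicates that the setgid bit was propagated, so files created further down the tree keep the same group.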

Unfortunately, some commands like mv try to preserve the original permissions and ownership and "break" the group inheritance provided by the setgid bit, so it is preferable to use cp instead (without the -p option, which preserves them). Depending on how you use rsync, it may also cause issues; in that case, use the --no-p (turns off preserving permissions), --no-g (turns off preserving the group) and --chmod=ug=rwX (ensures that all non-masked bits get enabled) options, for instance:

rsync -av --no-p --no-g --chmod=ug=rwX <src> <dest>

Alternatively, it might be more convenient to use the newgrp and/or sg commands to temporarily change your primary group to the group of the project you're working on. See man newgrp and man sg for the differences between the two commands.

Setting the setgid bit

Be cautious when setting the setgid bit and avoid using the -R option of chmod, as this also puts the setgid bit on files; when such a file is executed, the process runs with the group that owns the file. Use find instead, e.g.: find /gpfs/projects/company/my_project/my_subdir -type d -exec chmod g+s {} \;
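The difference between the find approach and chmod -R can be checked on a small test tree before touching real project data. This is a minimal sketch; the tree under the temporary directory is made up for the example.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/my_subdir/a/b"
touch "$demo_dir/my_subdir/script.sh" "$demo_dir/my_subdir/a/data.txt"

# Set the setgid bit on directories only; regular files are left untouched.
find "$demo_dir/my_subdir" -type d -exec chmod g+s {} +

# List regular files that carry the setgid bit (nothing should be printed).
find "$demo_dir/my_subdir" -type f -perm -2000
```

If chmod -R g+s has already been run on a directory, the same test can be used to undo it on files only: find <dir> -type f -perm -2000 -exec chmod g-s {} +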