Data Storage Organisation
As explained in the Overview, the available storage is divided into filesets, each with its own specific use.
/gpfs/home
The home directories are stored on the /gpfs/home fileset, which is your starting point when you log in to Lucia. Each user has their own private space to store personal data, configuration files, code, etc. This directory is usually referred to as $HOME or ~.
By default, the permissions are read/write/execute for the owner only, i.e. drwx------, and the group ownership is set to the user's personal group.
The /gpfs/home fileset uses user quotas, with the following limits:
Block soft limit | Block hard limit | Block grace period | File soft limit | File hard limit | File grace period |
---|---|---|---|---|---|
200GB | 260GB | 7 days | 1000k files | 1300k files | 7 days |
You'll still be able to write data after exceeding the soft limit, until the 7-day grace period expires or you reach the hard limit. Once the grace period has expired, you'll have to reduce your usage below the soft limit before you can write data again.
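The soft limit, hard limit and grace period rules above can be summarised as a small decision function. The sketch below is illustrative only: `quota_state` is a hypothetical helper that does not query GPFS; the caller passes the current usage, the soft and hard limits (in the same unit) and whether the grace period has expired.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the quota rules described above.
# It does NOT query GPFS; all values are supplied by the caller.
# Usage: quota_state USED SOFT HARD GRACE_EXPIRED(yes|no)
quota_state() {
  local used=$1 soft=$2 hard=$3 expired=$4
  if (( used >= hard )); then
    # Hard limit reached: writes fail regardless of the grace period.
    echo "blocked"
  elif (( used > soft )); then
    if [[ $expired == yes ]]; then
      # Grace period over: usage must go back below the soft limit.
      echo "blocked"
    else
      # Over the soft limit, but still within the grace period.
      echo "grace"
    fi
  else
    echo "ok"
  fi
}

# Examples using the /gpfs/home block limits (200GB soft, 260GB hard):
quota_state 150 200 260 no    # prints: ok
quota_state 220 200 260 no    # prints: grace
quota_state 220 200 260 yes   # prints: blocked
```

The same logic applies to the file (inode) limits, with file counts instead of GB.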
You can check your usage and limits with:
Project directories
Each project on Lucia has two main directories: one on the /gpfs/projects fileset for storage, and one on the /gpfs/scratch fileset as a working directory. These directories are intended for sharing data among the members of the project.
The default permissions on the directories are "2770", i.e. drwxrws---, with the setgid bit set, so that newly created files and subdirectories automatically inherit the group of the parent directory; see Setgid below for more information.
Group quotas are set on blocks and files on both filesets; the limits vary from one project to another, according to what was requested for the project. Note that the file limit depends on the block limit: by default, the minimum is 500k files for projects with a block limit of 500GB or less, the limit is then increased by 1k files per additional GB, and it is capped at 10000k files for projects with block limits above 10000GB (this cap can be increased on demand).
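The file-limit rule reduces to a clamp: for a block limit of B GB above 500, 500k + (B - 500) x 1k = B x 1k files, so the default file limit in thousands of files equals the block limit in GB, bounded below by 500 and above by 10000. This is consistent with the industrial defaults (e.g. 2000GB and 2000k files on /gpfs/projects). A minimal sketch, assuming whole-GB block limits; `default_file_limit_k` is a hypothetical helper, not a Lucia command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: default file limit, in thousands of files, derived
# from a block limit given in whole GB, following the rule described above.
default_file_limit_k() {
  local gb=$1
  if (( gb <= 500 )); then
    echo 500        # minimum for block limits of 500GB or less
  elif (( gb >= 10000 )); then
    echo 10000      # cap (can be increased on demand)
  else
    echo "$gb"      # 500k + 1k per GB above 500 equals gb thousand files
  fi
}

default_file_limit_k 200     # prints: 500
default_file_limit_k 2000    # prints: 2000
default_file_limit_k 20000   # prints: 10000
```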
You can check the usage and limits for your project on all filesets with:
If not specified in the DoW, the default quota values for industrial projects are:
Fileset | Block soft limit | Block hard limit | Block grace period | File soft limit | File hard limit | File grace period |
---|---|---|---|---|---|---|
/gpfs/projects | 2000GB | 2600GB | 7 days | 2000k files | 2600k files | 7 days |
/gpfs/scratch | 1000GB | 1300GB | 7 days | 1000k files | 1300k files | 7 days |
Default quota
The default quota on /gpfs/projects and /gpfs/scratch for any Unix group that isn't a project is set to an extremely low value (16KB and 1 file). As a consequence, you might get a Disk quota exceeded error message if the project directory you're working in doesn't have its permissions and ownership properly set; see Setgid below.
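Before writing into a project directory, you can check that it belongs to the project group and has the setgid bit set. A minimal sketch, assuming GNU `stat` (as on RHEL8); `check_project_dir` is a hypothetical helper, and the group name you pass is a placeholder for your project's Unix group:

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn if DIR is not owned by GROUP or lacks the setgid bit.
# Usage: check_project_dir DIR GROUP   (returns 0 if both checks pass)
check_project_dir() {
  local dir=$1 group=$2 status=0
  local owner
  owner=$(stat -c '%G' "$dir") || return 2   # group name owning DIR
  if [[ $owner != "$group" ]]; then
    echo "WARNING: $dir belongs to group '$owner', not '$group'"
    status=1
  fi
  if [[ ! -g $dir ]]; then                   # -g: setgid bit is set
    echo "WARNING: setgid bit not set on $dir"
    status=1
  fi
  (( status == 0 )) && echo "OK: $dir"
  return "$status"
}

# Example (hypothetical path and group name):
# check_project_dir /gpfs/projects/company/my_project my_project_group
```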
/gpfs/projects ($PROJECT_HOME)
The /gpfs/projects fileset is used to store and share data throughout the project's lifespan, typically software, developments, input files and important files that need to be kept after a job is completed.
You can specifically check the project usage and limits for the /gpfs/projects fileset with:
/gpfs/scratch ($SCRATCH_HOME)
The /gpfs/scratch fileset is the workspace used for temporary data during job execution; it partly consists of NVMe SSDs for better performance.
You can specifically check the scratch usage and limits for the /gpfs/scratch fileset with:
Periodic clean-up of the /gpfs/scratch fileset
As the /gpfs/scratch fileset is a temporary workspace built for performance, it will be cleaned up periodically to avoid dormant data. The clean-up will usually occur during the spring (end of May) and fall (end of November) maintenance windows. A reminder will be sent 15 days prior to the maintenance window.
Umask
The default umask on Lucia is currently quite permissive, as it is set to 0002 (RHEL8’s default), meaning that files and directories you create will be group writable and world readable.
While group-writable permissions can be useful, for instance when collaborating with other users in project directories, be aware that other members of the same project might also intentionally or unintentionally modify or even delete your files and directories.
Depending on your preferences, you might want to restrict the default permissions and only relax them when needed using chmod. Here are some examples of the umask command:
- Display your current umask in octal values:
umask
- Display your current umask in symbolic values:
umask -S
- Set your umask so that the group only gets read (and, for directories, execute) permissions and others get no permissions:
umask 0027
Note that setting your umask on the command line only modifies it for the current session; for a permanent change, add the command to your ~/.bashrc.
For more information on umask and permissions, see Red Hat’s documentation
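To see the effect of a umask before adopting it, you can create a throw-away file and directory under it. New files start from mode 0666 and new directories from 0777, minus the bits set in the umask. A minimal sketch, assuming GNU `stat -c '%A'` (available on RHEL8); `show_default_perms` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Show the permissions a new file and a new directory would get under a
# given umask, using a temporary directory. The subshell ( ... ) keeps
# your current umask unchanged.
show_default_perms() {
  (
    umask "$1"
    tmp=$(mktemp -d) || exit 1
    touch "$tmp/file"   # files start from mode 0666 before the umask
    mkdir "$tmp/dir"    # directories start from mode 0777 before the umask
    echo "$(stat -c '%A' "$tmp/file") $(stat -c '%A' "$tmp/dir")"
    rm -rf "$tmp"
  )
}

# 0002 is Lucia's current default:
show_default_perms 0002   # prints: -rw-rw-r-- drwxrwxr-x
show_default_perms 0027   # prints: -rw-r----- drwxr-x---
```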
Setgid
The setgid bit is set on the project directories so that new files and directories created inside the project directories inherit the same group membership as their parent directory instead of the primary group of the user.
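The behaviour of mode 2770 can be reproduced in a throw-away directory. In the sketch below, `demo_setgid` and the `project` directory are hypothetical stand-ins for a real project directory, and the umask is fixed to 0027 so the output is predictable. In a temporary directory the group is your own, so only the permission bits are visible; in a real project directory, `stat -c '%G'` would show the project group on the new entries.

```shell
#!/usr/bin/env bash
# Demonstrate mode 2770 and setgid inheritance in a temporary directory.
demo_setgid() {
  (
    umask 0027
    tmp=$(mktemp -d) || exit 1
    mkdir "$tmp/project"
    chmod 2770 "$tmp/project"     # rwx for owner and group, setgid bit set
    mkdir "$tmp/project/subdir"   # new subdirectory inherits group and setgid bit
    touch "$tmp/project/file"     # new file inherits the group only
    stat -c '%A %n' "$tmp/project" "$tmp/project/subdir" "$tmp/project/file" \
      | sed "s|$tmp/||"           # strip the temporary prefix from the names
    rm -rf "$tmp"
  )
}

demo_setgid
# prints:
# drwxrws--- project
# drwxr-s--- project/subdir
# -rw-r----- project/file
```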
Unfortunately, some commands like mv try to preserve the original permissions and ownership and "break" the setgid bit, so it is preferable to use cp instead (without the -p option, obviously). Depending on how you use rsync, it may also cause issues, and you should use the --no-p (turns off permission preservation), --no-g (turns off group preservation) and --chmod=ug=rwX (ensures that all non-masked bits get enabled) options, for instance:
Alternatively, it might be more convenient to use the newgrp and/or sg commands to temporarily change your primary group to the group of the project you're working on; see man newgrp and man sg for the differences between the two commands.
Setting the setgid bit
Be cautious when setting the setgid bit and avoid using the -R option to chmod, as this will also set the setgid bit on files, and when such a file is executed, the process will run with the group that owns the file. Use find instead, e.g.: find /gpfs/projects/company/my_project/my_subdir -type d -exec chmod g+s {} \;
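The difference between the two approaches can be checked on a throw-away tree. The sketch below builds two identical hypothetical trees, applies `chmod -R g+s` to one and the find-based command to the other, and shows the resulting mode of a file. It uses `-exec ... {} +`, which batches arguments; the `\;` form works equally well. An uppercase `S` means the setgid bit is set on a file without group execute permission.

```shell
#!/usr/bin/env bash
# Compare "chmod -R g+s" with the find-based approach on temporary trees.
# Directory and file names are hypothetical.
make_tree() {
  mkdir -p "$1/my_subdir/inner"
  touch "$1/my_subdir/inner/script.sh"
}

demo_setgid_methods() {
  (
    umask 0027
    tmp=$(mktemp -d) || exit 1
    make_tree "$tmp/a"
    make_tree "$tmp/b"
    chmod -R g+s "$tmp/a/my_subdir"                        # also hits the file
    find "$tmp/b/my_subdir" -type d -exec chmod g+s {} +   # directories only
    echo "chmod -R: $(stat -c '%A' "$tmp/a/my_subdir/inner/script.sh")"
    echo "find:     $(stat -c '%A' "$tmp/b/my_subdir/inner/script.sh")"
    rm -rf "$tmp"
  )
}

demo_setgid_methods
# prints:
# chmod -R: -rw-r-S---
# find:     -rw-r-----
```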