Partitions

The available nodes are grouped into partitions (also sometimes called queues), usually according to the type of resource they provide and their intended usage. Each partition has its own limits and preferred type of usage; see the table below.

About resource usage

As you can see in the table below, the GPU nodes have only 32 CPU cores and 240GB of memory available for 4 GPUs. To maximize the use of the GPUs on Lucia, please do not use more than 8 CPU cores and 60GB of memory per GPU.

As a general rule, it is also recommended not to exceed the optimal amount of memory per CPU, so as not to waste computing resources.
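As an illustration, a single-GPU job could stay within these per-GPU ratios with directives like the following (a sketch only; the job name and executable are placeholders, and site defaults may differ):

```shell
#!/bin/bash
# Sketch of a single-GPU job on the "gpu" partition, keeping to the
# recommended 8 CPU cores and 60GB of memory per GPU.
#SBATCH --job-name=gpu-example   # placeholder name
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1             # 1 of the 4 A100s on the node
#SBATCH --cpus-per-task=8        # 32 cores / 4 GPUs
#SBATCH --mem=60G                # 240GB / 4 GPUs

srun ./my_gpu_program            # placeholder executable
```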

| Partition | Job type | Num nodes | CPUs/node | GPUs/node | Available Mem/node | Optimal Mem/CPU | Shared |
|---|---|---|---|---|---|---|---|
| batch | MPI/SMP | 260 | 128 | - | 240GB | 1920MB | NO (ExclusiveUser) |
| medium | MPI/SMP | 30 | 128 | - | 492GB | 3936MB | NO (ExclusiveUser) |
| shared | Serial/SMP | 10 | 128 | - | 492GB | 3936MB | YES |
| large | SMP | 7 | 64 | - | 2000GB | 32000MB | YES |
| xlarge | SMP | 1 | 64 | - | 4000GB | 64000MB | YES |
| gpu | GPU | 50 | 32 | 4 x A100 40GB | 240GB | 7680MB | YES |
| ia | GPU | 2 | 64 | 8 x A100 80GB | 2000GB | 32000MB | YES |
| visu | Visualization | 4 | 32 | 4 x T4 16GB | 492GB | 15744MB | YES |
| debug | Debugging (CPU) | 10 | 128 | - | 240GB | 1920MB | YES |
| debug-gpu | Debugging (GPU) | 2 | 32 | 4 x A100 40GB | 240GB | 7680MB | YES |
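Multiplying the "Optimal Mem/CPU" value by the number of requested cores gives the memory request that keeps the ratio balanced. A minimal shell sketch (values in MB, taken from the table above):

```shell
#!/bin/sh
# Memory (MB) matching a partition's optimal Mem/CPU ratio.
# Usage: optimal_mem_mb <mem_per_cpu_mb> <cpus>
optimal_mem_mb() {
  echo $(( $1 * $2 ))
}

optimal_mem_mb 1920 128   # a full "batch" node: 245760 MB, i.e. ~240GB
optimal_mem_mb 7680 8     # 8 cores on a "gpu" node: 61440 MB, i.e. ~60GB
```

Note how the second example matches the per-GPU recommendation above: 8 cores at the optimal ratio corresponds to 60GB per GPU.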

QoS

We also use QoS (Quality of Service) on top of partitions to set additional parameters or constraints; see the table below for the default (in bold) and available QoS for each partition. Actual limits can also be displayed with the following command:

sacctmgr show qos format=Name,Priority,MaxTRESPU%16,MaxJobsPU,MaxSubmitPU,MaxTRESPA,MaxJobsPA,MaxSubmitPA,MinTRES,MaxTRES%32,MaxWall,Flags

| Partition | QoS | Max walltime | Job resource limits | Account resource limits | User resource limits |
|---|---|---|---|---|---|
| batch & medium | **normal** | 48h | Max 128 nodes | - | Max 2000 queued jobs |
| batch & medium | long | 168h | Max 4 nodes | Max 2048 CPU | Max 512 CPU, max 4 nodes, max 2000 queued jobs |
| shared | **shared** | 168h | Max 1 node | - | Max 500 queued jobs |
| large | **large** | 168h | Min 490GB, max 4 nodes | - | Max 4 nodes, max 16 running jobs, max 200 queued jobs |
| xlarge | **xlarge** | 168h | Min 1000GB, max 1 node | - | Max 1 node, max 4 running jobs, max 200 queued jobs |
| gpu | **gpu** | 48h | Min 1 GPU, max 16 nodes | - | - |
| ia | **ia** | 48h | Min 1 GPU | - | - |
| visu | **visu** | 4h | Min 1 GPU, max 1 GPU, max 8 CPU, max 123GB | - | Max 1 job |
| debug | **debug** | 2h | Max 4 nodes | - | Max 4 nodes, max 4 running jobs, max 20 queued jobs |
| debug-gpu | **debug-gpu** | 2h | Max 2 nodes | - | Max 1 running job, max 10 queued jobs |
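A non-default QoS is selected at submission time. For instance, to run a long job on the batch partition (a sketch; `job.sh` is a placeholder script):

```shell
# Request the "long" QoS (168h walltime) instead of the default "normal"
# on the batch partition; job.sh is a placeholder script.
sbatch --partition=batch --qos=long job.sh
```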

Fairshare

Fairshare allows projects and users to get a fair portion of the system based on their past resource usage. Shares on Lucia are established using the Fair Tree algorithm and are distributed equally between projects of the same category. Category and subcategory shares are as follows:

  • Category 1 (85%): non-economic activities, divided into 2 subcategories:
    • Category 1a (70%): Universities and colleges
    • Category 1b (15%): Accredited research centers
  • Category 2 (15%): economic activities, divided into 3 subcategories:
    • Category 2a (5%): Universities and colleges
    • Category 2b (5%): Accredited research centers
    • Category 2c (5%): Companies and industry
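As an illustration of the equal split described above, a project's raw share is its subcategory's share divided by the number of projects in that subcategory (a sketch only, not the actual Fair Tree computation; Slurm's `sshare` command shows the real values, and the project count here is hypothetical):

```shell
#!/bin/sh
# Illustrative only: equal split of a subcategory's share (in percent)
# among its projects. Not the actual Slurm Fair Tree computation.
# Usage: project_share_pct <subcategory_share_pct> <n_projects>
project_share_pct() {
  awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s / n }'
}

project_share_pct 70 35   # e.g. 35 Category-1a projects -> 2.00% each
```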