Partitions
The available nodes are grouped into partitions (also sometimes called queues), usually according to the type of resource made available and the intended usage. Each partition has its own limits and preferred type of usage; see the table below.
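On Slurm-based systems such as Lucia, the partition configuration can also be inspected directly from the command line; the commands below are standard Slurm, not specific to Lucia:

```bash
# Summarize all partitions: node counts, states and time limits
sinfo --summarize

# Show the full configuration of a single partition (e.g. batch)
scontrol show partition batch
```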
About resource usage
As you can see in the table below, the GPU nodes have only 32 CPU cores and 240GB of memory available for 4 GPUs. To maximize the use of the GPUs on Lucia, please do not use more than 8 CPU cores and 60GB of memory per GPU (see the example job script after the table).
As a general rule, it is also recommended to avoid exceeding the optimal amount of memory per CPU whenever possible, so as not to waste computing resources.
| Partition | Job type | Num nodes | CPUs/node | GPUs/node | Available Mem/node | Optimal Mem/CPU | Shared |
|---|---|---|---|---|---|---|---|
| batch | MPI/SMP | 260 | 128 | - | 240GB | 1920MB | NO (ExclusiveUser) |
| medium | MPI/SMP | 30 | 128 | - | 492GB | 3936MB | NO (ExclusiveUser) |
| shared | Serial/SMP | 10 | 128 | - | 492GB | 3936MB | YES |
| large | SMP | 7 | 64 | - | 2000GB | 32000MB | YES |
| xlarge | SMP | 1 | 64 | - | 4000GB | 64000MB | YES |
| gpu | GPU | 50 | 32 | 4 x A100 40GB | 240GB | 7680MB | YES |
| ia | GPU | 2 | 64 | 8 x A100 80GB | 2000GB | 32000MB | YES |
| visu | Visualization | 4 | 32 | 4 x T4 16GB | 492GB | 15744MB | YES |
| debug | Debugging (CPU) | 10 | 128 | - | 240GB | 1920MB | YES |
| debug-gpu | Debugging (GPU) | 2 | 32 | 4 x A100 40GB | 240GB | 7680MB | YES |
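For example, here is a minimal sketch of a Slurm submission script for the gpu partition that respects this ratio for a single GPU; the account and executable names are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1         # 1 of the 4 A100 40GB GPUs on the node
#SBATCH --cpus-per-task=8    # at most 8 CPU cores per GPU
#SBATCH --mem=60G            # at most 60GB of memory per GPU
#SBATCH --time=01:00:00
#SBATCH --account=my_project # placeholder: replace with your project account

srun ./my_gpu_program        # placeholder executable
```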
QoS
We also use QoS (Quality of Service) on top of partitions to set additional parameters or constraints; see the table below for the default (in bold) and other available QoS for each partition. The actual limits can also be displayed with the following command:
```bash
sacctmgr show qos format=Name,Priority,MaxTRESPU%16,MaxJobsPU,MaxSubmitPU,MaxTRESPA,MaxJobsPA,MaxSubmitPA,MinTRES,MaxTRES%32,MaxWall,Flags
```
| Partition | QoS | Max walltime | Job resource limits | Account resource limits | User resource limits |
|---|---|---|---|---|---|
| batch & medium | **normal** | 48h | Max 128 nodes | - | Max 2000 queued jobs |
| | long | 168h | Max 4 nodes | Max 2048 CPU | Max 512 CPU, max 4 nodes, max 2000 queued jobs |
| shared | **shared** | 168h | Max 1 node | - | Max 500 queued jobs |
| large | **large** | 168h | Min 490GB, max 4 nodes | - | Max 4 nodes, max 16 running jobs, max 200 queued jobs |
| xlarge | **xlarge** | 168h | Min 1000GB, max 1 node | - | Max 1 node, max 4 running jobs, max 200 queued jobs |
| gpu | **gpu** | 48h | Min 1 GPU, max 16 nodes | - | - |
| ia | **ia** | 48h | Min 1 GPU | - | - |
| visu | **visu** | 4h | Min 1 GPU, max 1 GPU, max 8 CPU, max 123GB | - | Max 1 job |
| debug | **debug** | 2h | Max 4 nodes | - | Max 4 nodes, max 4 running jobs, max 20 queued jobs |
| debug-gpu | **debug-gpu** | 2h | Max 2 nodes | - | Max 1 running job, max 10 queued jobs |
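A non-default QoS is selected at submission time with the --qos option; for example, a minimal sketch requesting the long QoS on the batch partition (the executable name is a placeholder):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --qos=long           # non-default QoS: up to 168h, max 4 nodes
#SBATCH --nodes=2
#SBATCH --time=96:00:00      # must stay within the 168h walltime limit

srun ./my_mpi_program        # placeholder executable
```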
Fairshare
Fairshare allows projects and users to get a fair portion of the system based on their past resource usage. Shares on Lucia are established using the Fair Tree algorithm, and shares are distributed equally between projects of the same category. The category and subcategory shares are as follows:
- Category 1 (85%): non-economic activities, divided into 2 subcategories:
    - Category 1a (70%): Universities and colleges
    - Category 1b (15%): Accredited research centers
- Category 2 (15%): economic activities, divided into 3 subcategories:
    - Category 2a (5%): Universities and colleges
    - Category 2b (5%): Accredited research centers
    - Category 2c (5%): Companies and industry
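Your project's share and effective usage can be inspected with the standard Slurm sshare command; a couple of examples:

```bash
# Show fairshare information for your own associations
sshare

# Long format for the whole association tree, including effective usage
sshare -l -a
```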