
Queueing and Running Jobs

The job scheduler used on Lucia is Slurm Workload Manager version 23.02. A quick start guide explaining the basics is available on Slurm's official website, along with the full documentation. Former PBS users may also want to check out the Rosetta Stone of job schedulers, a conversion table of commands, variables, etc. between various job schedulers.

Partitions

The available nodes are grouped into partitions, also sometimes called queues, usually depending on the type of resource made available and the intended usage. Each partition has its own limits and preferred type of usage; see the table below.

About resource usage

As you can see in the table below, the GPU nodes have only 32 CPU cores and 240GB of memory available for 4 GPUs. To maximize the use of the GPUs on Lucia, please do not use more than 8 CPU cores and 60GB of memory per GPU.

As a general rule, it is also recommended not to exceed the optimal amount of memory per CPU (see the table below) so as not to waste computing resources.

Partition   Job type          Num nodes   CPUs/node   GPUs/node       Available Mem/node   Optimal Mem/CPU   Shared
batch       MPI/SMP           260         128         -               240GB                1920MB            NO (ExclusiveUser)
medium      MPI/SMP           30          128         -               492GB                3936MB            NO (ExclusiveUser)
shared      Serial/SMP        10          128         -               492GB                3936MB            YES
large       SMP               7           64          -               2000GB               32000MB           YES
xlarge      SMP               1           64          -               4000GB               64000MB           YES
gpu         GPU               50          32          4 x A100 40GB   240GB                7680MB            YES
ia          GPU               2           64          8 x A100 80GB   2000GB               32000MB           YES
visu        Visualization     4           32          4 x T4 16GB     492GB                15744MB           YES
debug       Debugging (CPU)   10          128         -               240GB                1920MB            YES
debug-gpu   Debugging (GPU)   2           32          4 x A100 40GB   240GB                7680MB            YES
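
As an illustration of the note above, a job using a single GPU of a gpu node should request at most 8 cores and 60GB of memory. A minimal sketch of the corresponding directives (the job name is a placeholder):

#SBATCH --job-name=one_gpu_job
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=60G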

QoS

We're also using QoS (Quality of Service) on top of partitions to set additional parameters or constraints; see the table below for the available QoS of each partition (the default QoS of a partition is listed first). Actual limits can also be displayed with the following command:

sacctmgr show qos format=Name,Priority,MaxTRESPU%16,MaxJobsPU,MaxSubmitPU,MaxTRESPA,MaxJobsPA,MaxSubmitPA,MinTRES,MaxTRES%32,MaxWall,Flags

Partition        QoS         Max walltime   Job resource limits                          Account resource limits   User resource limits
batch & medium   normal      48h            Max 128 nodes                                -                         Max 2000 queued jobs
                 long        168h           Max 4 nodes                                  Max 2048 CPU              Max 512 CPU, max 4 nodes, max 2000 queued jobs
shared           shared      168h           Max 1 node                                   -                         Max 500 queued jobs
large            large       168h           Min 490GB, max 4 nodes                       -                         Max 4 nodes, max 16 running jobs, max 200 queued jobs
xlarge           xlarge      168h           Min 1000GB, max 1 node                       -                         Max 1 node, max 4 running jobs, max 200 queued jobs
gpu              gpu         48h            Min 1 GPU, max 16 nodes                      -                         -
ia               ia          48h            Min 1 GPU                                    -                         -
visu             visu        4h             Min 1 GPU, max 1 GPU, max 8 CPU, max 123GB   -                         Max 1 job
debug            debug       2h             Max 4 nodes                                  -                         Max 4 nodes, max 4 running jobs, max 20 queued jobs
debug-gpu        debug-gpu   2h             Max 2 nodes                                  -                         Max 1 running job, max 10 queued jobs
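
A non-default QoS is requested with the --qos option. For example, a smaller but longer job on the batch partition could run under the long QoS; a minimal sketch (the account name is a placeholder):

#SBATCH --partition=batch
#SBATCH --qos=long
#SBATCH --nodes=2
#SBATCH --time=120:00:00
#SBATCH --account=my_project_name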

Fairshare

Fairshare allows projects and users to get a fair portion of the system based on their past resource usage. Shares on Lucia are established using the Fair Tree algorithm and are distributed equally between projects of the same category. Category and subcategory shares are as follows:

  • Category 1 (85%): non-economic activities, divided into 2 subcategories:
    • Category 1a (70%): Universities and colleges
    • Category 1b (15%): Accredited research centers
  • Category 2 (15%): economic activities, divided into 3 subcategories:
    • Category 2a (5%): Universities and colleges
    • Category 2b (5%): Accredited research centers
    • Category 2c (5%): Companies and industry
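
The current shares and usage of your projects can be inspected with the sshare command, e.g. (the account name is a placeholder):

sshare -l -A my_project_name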

Submitting and controlling jobs

  • sbatch: to submit batch scripts
  • srun: to initiate parallel job steps within a job, and also to start an interactive job
  • salloc: to request an interactive allocation, and then use srun to execute parallel tasks on the allocated resources
  • scancel: to cancel a job
  • squeue: to view queued jobs
  • scontrol: to view various information about Slurm, e.g. job information with scontrol show job <jobid>
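
A typical workflow, assuming a submission script named job.sh and a job id of 123456 (both placeholders):

sbatch job.sh              # submit the batch script, Slurm prints the job id
squeue --me                # list your pending and running jobs
scontrol show job 123456   # display detailed information about the job
scancel 123456             # cancel the job if needed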

Job examples

Single-threaded

Serial job with 1200GB of memory per core, running for 4 days and 12 hours, on the large partition:

#!/bin/bash

#SBATCH --job-name=serial_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=large
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1200G
#SBATCH --time=4-12:00:00
#SBATCH --account=my_project_name

echo "----------------- Environment ------------------"
module purge
module load foss/2022a
module list

echo "------------------- Job info -------------------"
echo "job_id             : $SLURM_JOB_ID"
echo "jobname            : $SLURM_JOB_NAME"
echo "queue              : $SLURM_JOB_PARTITION"
echo "qos                : $SLURM_JOB_QOS"
echo "account            : $SLURM_JOB_ACCOUNT"
echo "submit dir         : $SLURM_SUBMIT_DIR"
echo "number of mpi tasks: $SLURM_NTASKS tasks"
echo "OMP_NUM_THREADS    : $OMP_NUM_THREADS"

echo "------------------- Node list ------------------"
echo $SLURM_JOB_NODELIST

echo "---------------- Checking limits ---------------"
ulimit -a

echo "--------------- Running the code ---------------"

echo -n "This run started on: "
date

./runner.serial

echo -n "This run completed on: "
date

Multi-threaded

SMP/OpenMP job with 64 threads and a total of 60GB memory, running for 12 hours on the batch partition:

#!/bin/bash

# ------------------------------------------------------------------------------
# Slurm directives
# ------------------------------------------------------------------------------

#SBATCH --job-name=openmp_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem=60G
#SBATCH --cpus-per-task=64
#SBATCH --time=12:00:00
#SBATCH --account=my_project_name

# ------------------------------------------------------------------------------
# Setting up the environment
# ------------------------------------------------------------------------------

echo "----------------- Environment ------------------"
module purge
module load foss/2022a
module list

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# ------------------------------------------------------------------------------
# Printing some information
# ------------------------------------------------------------------------------

echo "------------------- Job info -------------------"
echo "job_id             : $SLURM_JOB_ID"
echo "jobname            : $SLURM_JOB_NAME"
echo "queue              : $SLURM_JOB_PARTITION"
echo "qos                : $SLURM_JOB_QOS"
echo "account            : $SLURM_JOB_ACCOUNT"
echo "submit dir         : $SLURM_SUBMIT_DIR"
echo "number of mpi tasks: $SLURM_NTASKS tasks"
echo "OMP_NUM_THREADS    : $OMP_NUM_THREADS"
echo "Executable         : $EXEC"

echo "------------------- Node list ------------------"
echo $SLURM_JOB_NODELIST

echo "---------------- Checking limits ---------------"
ulimit -a

# ------------------------------------------------------------------------------
# And finally running the code
# ------------------------------------------------------------------------------

echo "--------------- Running the code ---------------"

echo -n "This run started on: "
date

./runner.omp

echo -n "This run completed on: "
date

Parallel

Pure MPI

MPI job with 1024 tasks (8 full nodes of the batch partition) and 1920MB of memory per task, running for 24 hours:

#!/bin/bash

# ------------------------------------------------------------------------------
# Slurm directives
# ------------------------------------------------------------------------------

#SBATCH --job-name=mpi_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=batch
#SBATCH --ntasks=1024
#SBATCH --mem-per-cpu=1920M
#SBATCH --time=24:00:00
#SBATCH --account=my_project_name

# ------------------------------------------------------------------------------
# Setting up the environment
# ------------------------------------------------------------------------------

echo "----------------- Environment ------------------"
module purge
module load PrgEnv-cray
module list

# ------------------------------------------------------------------------------
# Printing some information
# ------------------------------------------------------------------------------

echo "------------------- Job info -------------------"
echo "job_id             : $SLURM_JOB_ID"
echo "jobname            : $SLURM_JOB_NAME"
echo "queue              : $SLURM_JOB_PARTITION"
echo "qos                : $SLURM_JOB_QOS"
echo "account            : $SLURM_JOB_ACCOUNT"
echo "submit dir         : $SLURM_SUBMIT_DIR"
echo "number of mpi tasks: $SLURM_NTASKS tasks"
echo "OMP_NUM_THREADS    : $OMP_NUM_THREADS"
echo "Executable         : $EXEC"

echo "------------------- Node list ------------------"
echo $SLURM_JOB_NODELIST

echo "---------------- Checking limits ---------------"
ulimit -a

# ------------------------------------------------------------------------------
# And finally running the code
# ------------------------------------------------------------------------------

echo "--------------- Running the code ---------------"

echo -n "This run started on: "
date

srun ./runner.mpi

echo -n "This run completed on: "
date

Hybrid MPI/OpenMP

Hybrid job with 256 MPI tasks and 8 OpenMP threads per task (16 full nodes of the batch partition), running for 12 hours:

#!/bin/bash

# ------------------------------------------------------------------------------
# Slurm directives
# ------------------------------------------------------------------------------

#SBATCH --job-name=hybrid_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=batch
#SBATCH --ntasks=256
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=1920M
#SBATCH --time=12:00:00
#SBATCH --account=my_project_name

# ------------------------------------------------------------------------------
# Setting up the environment
# ------------------------------------------------------------------------------

echo "----------------- Environment ------------------"
module purge
module load PrgEnv-cray
module list

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# ------------------------------------------------------------------------------
# Printing some information
# ------------------------------------------------------------------------------

echo "------------------- Job info -------------------"
echo "job_id             : $SLURM_JOB_ID"
echo "jobname            : $SLURM_JOB_NAME"
echo "queue              : $SLURM_JOB_PARTITION"
echo "qos                : $SLURM_JOB_QOS"
echo "account            : $SLURM_JOB_ACCOUNT"
echo "submit dir         : $SLURM_SUBMIT_DIR"
echo "number of mpi tasks: $SLURM_NTASKS tasks"
echo "OMP_NUM_THREADS    : $OMP_NUM_THREADS"
echo "Executable         : $EXEC"

echo "------------------- Node list ------------------"
echo $SLURM_JOB_NODELIST

echo "---------------- Checking limits ---------------"
ulimit -a

# ------------------------------------------------------------------------------
# And finally running the code
# ------------------------------------------------------------------------------

echo "--------------- Running the code ---------------"

echo -n "This run started on: "
date

srun ./runner.hybrid

echo -n "This run completed on: "
date

GPU

GPU job using a full node of the gpu partition (4 tasks, 4 x A100 GPUs and 240GB of memory), running for 10 hours:

#!/bin/bash

# ------------------------------------------------------------------------------
# Slurm directives
# ------------------------------------------------------------------------------

#SBATCH --job-name=gpu_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=240G
#SBATCH --gpus=4
#SBATCH --time=10:00:00
#SBATCH --account=my_project_name

# ------------------------------------------------------------------------------
# Setting up the environment
# ------------------------------------------------------------------------------

echo "----------------- Environment ------------------"
module purge
module load CUDA/11.7.0
module list

# ------------------------------------------------------------------------------
# Printing some information
# ------------------------------------------------------------------------------

echo "------------------- Job info -------------------"
echo "job_id             : $SLURM_JOB_ID"
echo "jobname            : $SLURM_JOB_NAME"
echo "queue              : $SLURM_JOB_PARTITION"
echo "qos                : $SLURM_JOB_QOS"
echo "account            : $SLURM_JOB_ACCOUNT"
echo "submit dir         : $SLURM_SUBMIT_DIR"
echo "number of mpi tasks: $SLURM_NTASKS tasks"
echo "OMP_NUM_THREADS    : $OMP_NUM_THREADS"
echo "number of gpus     : $SLURM_GPUS_ON_NODE"
echo "Executable         : $EXEC"

echo "------------------- Node list ------------------"
echo $SLURM_JOB_NODELIST

echo "---------------- Checking limits ---------------"
ulimit -a

# ------------------------------------------------------------------------------
# And finally running the code
# ------------------------------------------------------------------------------

echo "--------------- Running the code ---------------"

echo -n "This run started on: "
date

srun ./runner.cuda

echo -n "This run completed on: "
date

Interactive

Interactive jobs can be started either directly with srun (using the --pty option to get a shell on the allocated resources), or by first requesting an allocation with salloc:

# Request an interactive shell directly (1 node, 16 tasks, 60 minutes):
srun -p batch -A my_project_name -N 1 -n 16 --mem-per-cpu=1024M -t 60 --pty bash

# Or request an allocation first:
salloc -p batch -A my_project_name -N 2 -n 256 --mem=240G -t 2:00:00
# and once the resources are allocated use srun the same way as in submission scripts:
srun ./runner.mpi

Job Arrays

Running many similar jobs with small variations (e.g. different input files or conditions):

#!/bin/bash

# ------------------------------------------------------------------------------
# Slurm directives
# ------------------------------------------------------------------------------

#SBATCH --job-name=array_job
#SBATCH --output=%A-%a_%x.out
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1:00:00
#SBATCH --array=0-19
#SBATCH --account=my_project_name

# ------------------------------------------------------------------------------
# Setting up the environment
# ------------------------------------------------------------------------------

echo "----------------- Environment ------------------"
module purge
module load foss/2022a
module list

# ------------------------------------------------------------------------------
# Printing some information
# ------------------------------------------------------------------------------

echo "------------------- Job info -------------------"
echo "job_id             : $SLURM_JOB_ID"
echo "jobname            : $SLURM_JOB_NAME"
echo "queue              : $SLURM_JOB_PARTITION"
echo "qos                : $SLURM_JOB_QOS"
echo "account            : $SLURM_JOB_ACCOUNT"
echo "submit dir         : $SLURM_SUBMIT_DIR"
echo "number of mpi tasks: $SLURM_NTASKS tasks"
echo "OMP_NUM_THREADS    : $OMP_NUM_THREADS"
echo "Executable         : $EXEC"

echo "------------------- Node list ------------------"
echo $SLURM_JOB_NODELIST

echo "---------------- Checking limits ---------------"
ulimit -a

# ------------------------------------------------------------------------------
# And finally running the code
# ------------------------------------------------------------------------------

echo "--------------- Running the code ---------------"

echo -n "This run started on: "
date

srun ./runner $SLURM_ARRAY_TASK_ID

echo -n "This run completed on: "
date

Packed

Running (many) independent processes inside a single job:

#!/bin/bash

#SBATCH --job-name=packed_job
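
The script above is only a skeleton. A minimal sketch of one common pattern, running several independent single-task job steps in parallel inside the same allocation (the executable, input names and account are placeholders), could look like this:

#!/bin/bash

#SBATCH --job-name=packed_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=shared
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=3936M
#SBATCH --time=1:00:00
#SBATCH --account=my_project_name

# Launch 8 independent single-task job steps inside the allocation;
# --exact gives each step only the resources it asks for, so the steps
# run side by side, and wait blocks until all of them have finished.
for i in $(seq 1 8); do
    srun --exact -n 1 --mem-per-cpu=3936M ./runner input_${i} &
done
wait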

Heterogeneous

Requesting heterogeneous resources within the same job (e.g. 1 CPU with 100GB of memory + 64 CPUs with 2GB each):

#!/bin/bash

#SBATCH --job-name=heterogen_job
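
The script above is only a skeleton. A minimal sketch using Slurm's heterogeneous job support, where "#SBATCH hetjob" separates the two components and srun launches one executable per component (the executables and account are placeholders, and whether a given combination of partitions is allowed depends on the site configuration):

#!/bin/bash

# Component 0: 1 task with 100GB of memory
#SBATCH --job-name=heterogen_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=large
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=100G
#SBATCH --time=12:00:00
#SBATCH --account=my_project_name
#SBATCH hetjob
# Component 1: 64 tasks with 2GB of memory each
#SBATCH --partition=batch
#SBATCH --ntasks=64
#SBATCH --mem-per-cpu=2G

# Run one executable per heterogeneous component; ":" separates the components
srun ./pre_post : ./solver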

Co-simulations

Running several different programs within the same job:

#!/bin/bash

#SBATCH --job-name=cosim_job
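
The script above is only a skeleton. A minimal sketch where two different codes run concurrently as separate job steps within the same allocation (the executables, task split and account are placeholders):

#!/bin/bash

#SBATCH --job-name=cosim_job
#SBATCH --output=%j_%x.out
#SBATCH --partition=batch
#SBATCH --ntasks=128
#SBATCH --mem-per-cpu=1920M
#SBATCH --time=12:00:00
#SBATCH --account=my_project_name

# Split the allocation between the two codes and run them side by side;
# --exact restricts each job step to the resources it requests.
srun --exact -n 96 ./solver_a &
srun --exact -n 32 ./solver_b &

# Wait for both steps to finish
wait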