
Cray Programming Environment

The Cray Programming Environment (CPE) is available on Lucia. It is a comprehensive suite of software tools, compilers, libraries and debuggers designed to build, optimize and manage applications. The CPE is tailored to exploit the high-performance capabilities of Lucia, making it an essential toolkit for scientific computing, research and other applications requiring massive computational power.

Usage

To make the compiler suites available, first load the Cray metamodule:

   module load Cray/24.07
   module avail
   ...
   ---------------------------------------------------------------------------------------- Cray Programming Environment Modules -----------------------------------------------------------------------------------------
   Cray/24.07         (S,L)    cray-R/4.4.0                   cray-libsci/24.07.0                    cray-pals/1.3.2                   craype-network-ucx        gdb4hpc/4.16.2            perftools-lite-gpu
   PrgEnv-aocc/8.4.0           cray-ccdb/5.0.4                cray-mpich-abi-pre-intel-5.0/8.1.30    cray-parallel-netcdf/1.12.3.13    craype-x86-milan          intel-classic/2022.2.0    perftools-lite-hbm
   PrgEnv-cray/8.4.0           cray-cti/2.18.4                cray-mpich-abi/8.1.30                  cray-pmi/6.1.15                   craype/2.7.32             intel-oneapi/2022.2.0     perftools-lite-loops
   PrgEnv-gnu/8.4.0            cray-dsmml/0.2.2               cray-mpich-ucx/8.1.30                  cray-python/3.11.7                craypkg-gen/1.3.33        intel/2022.2.0            perftools-lite
   PrgEnv-intel/8.4.0          cray-dyninst/12.3.2            cray-mpich/8.1.30                      cray-stat/4.12.3                  gcc-native/10.3           libfabric/1.13.1          perftools-preload
   aocc/4.1.0                  cray-fftw/3.3.10.8             cray-mrnet/5.1.3                       craype-accel-host                 gcc-native/11.2           papi/7.1.0.2              perftools
   atp/3.15.4                  cray-hdf5-parallel/1.14.3.1    cray-netcdf/4.9.0.13                   craype-network-infiniband         gcc-native/12.2           perftools-base/24.07.0    sanitizers4hpc/1.1.3
   cce/18.0.0                  cray-libpals/1.3.2             cray-openshmemx/11.4.0.beta            craype-network-ofi                gcc-native/13.2    (D)    perftools-lite-events     valgrind4hpc/2.13.3
   ...

The following table summarises the suites and associated compiler environments:

Suite name   Module   Programming environment collection
CCE          cce      PrgEnv-cray
GCC          gcc      PrgEnv-gnu
AOCC         aocc     PrgEnv-aocc
Intel        intel    PrgEnv-intel

Once the chosen programming environment is loaded, the compiler wrappers become available:

Language   Wrapper
C          cc
C++        CC
Fortran    ftn

In the HPE Cray Programming Environment, the "compilers" — cc, CC and ftn — are not direct compilers like gcc or ifx. They are compiler wrappers provided by the Cray environment. A compiler wrapper is a front-end script that selects and invokes the correct underlying compiler (Cray, Intel, GCC or AOCC) along with system-specific flags, libraries and paths required to build applications correctly. Calling one of these wrappers:

  • picks the active compiler backend (based on the loaded PrgEnv-* module)

  • adds include paths, linker options and libraries automatically (see below)

  • ensures compatibility with Cray’s MPI (if needed)

  • adapts to the architecture (AMD EPYC zen3 or zen4...)

The Cray compiler wrappers automatically handle much of the system-specific setup that would otherwise have to be added manually.
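As an illustrative sketch (the module versions and the expanded command line are assumptions, not exact output), compiling with the wrapper versus invoking the backend compiler directly looks like:

```shell
# With the wrapper: MPI, LibSci and architecture flags are added for you.
module load Cray/24.07 PrgEnv-gnu
cc -O2 -o hello_mpi hello_mpi.c

# Roughly what the wrapper expands to behind the scenes (schematic only):
# gcc -O2 -march=znver3 -I$MPICH_DIR/include -o hello_mpi hello_mpi.c \
#     -L$MPICH_DIR/lib -lmpi -lsci_gnu ...
```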

  1. Include paths

    These are directories the compiler looks in for header files (.h) or Fortran modules (.mod). The wrapper ensures the compiler can find:

    • Cray-provided maths libraries (Cray LibSci)

    • MPI headers (mpi.h)

    • Cray system and runtime headers

    • Any other Cray-specific tools like OpenMP headers, etc...

  2. Linker options

    The wrapper:

    • links the right version of Cray LibSci, MPI, OpenMP runtimes, etc...

    • passes architecture-specific flags like -march, -mcpu or -target

    • ensures compatibility with the Cray interconnect (for example, using libfabric)

  3. Runtime libraries

    Cray systems use custom libraries for:

    • MPI (Cray MPICH)

    • Threading (OpenMP)

    • Math (LibSci)

    • I/O (Cray HDF5 or NetCDF builds)
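To see what a wrapper actually injects, recent craype releases provide print options (a sketch; check `man cc` on Lucia if the flags below differ):

```shell
# Print the full backend command line the wrapper would run:
cc -craype-verbose hello.c

# Show only the include paths the wrapper injects:
cc --cray-print-opts=cflags

# Show only the libraries and linker options it appends:
cc --cray-print-opts=libs
```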

Compiler options

Recommended optimization flags in the HPE Cray Programming Environment for each C/C++/Fortran compiler, grouped by performance profile:

CCE (Cray)
  Default (safe/minimal):       C/C++: -O2 | Fortran: -O2
  Good (balanced performance):  C/C++: -O2 -funroll-loops -ffast-math | Fortran: -O3 -hfp3 -h ipa
  Aggressive (maximum speed):   C/C++: -Ofast -flto -ffp=3 | Fortran: -O3 -hfp3 -h ipa

GCC
  Default:     -O0 (or wrapper default)
  Good:        -O2 -ftree-vectorize -funroll-loops -ffast-math -march=<target>
  Aggressive:  -Ofast -march=<target> -flto -ffast-math

AOCC
  Default:     -O2
  Good:        -O3 -flto -funroll-loops -fopenmp
  Aggressive:  -O3 -flto -funroll-loops -unroll-aggressive -fopenmp

Intel (oneAPI)
  Default:     -O2
  Good:        -O3 -ipo -qopenmp
  Aggressive:  -Ofast -ipo -qopenmp -fp-model fast=2

Note

  • <target> should be replaced with the actual CPU architecture, e.g. -march=native or -march=znver3
  • Floating-point aggressive flags (-ffast-math, -fp-model fast=2, -hfp3) may affect numerical precision and should be used carefully
  • -flto (link-time optimization) is common in aggressive tiers for inter-file inlining and higher optimizations
  • Intel’s aggressive profile uses -Ofast, -ipo, and -fp-model fast=2 for maximum throughput

Other tools

Cray Scientific and Math Libraries (CSML)

CSML is a set of high performance libraries that provide portability for scientific applications by implementing APIs for arrays (NetCDF), sparse and dense linear algebra (BLAS, LAPACK, ScaLAPACK) and fast Fourier transforms (FFTW).

Cray module            Usage
cray-libsci            Cray Scientific Libraries
cray-fftw              Fastest Fourier Transform in the West (FFTW3)
cray-parallel-netcdf   Parallel I/O library for NetCDF file access
cray-R                 R for use on HPE Cray HPC systems
cray-python            Python programming language and libraries for Cray PE

  • cray-libsci is the Cray optimized scientific computing library providing high-performance implementations of BLAS, LAPACK and ScaLAPACK. It is tuned for Cray architectures and supports multi-threading and MPI parallelism.

  • cray-fftw is the Cray-optimized version of the FFTW (Fastest Fourier Transform in the West) library, providing fast and portable discrete Fourier transform routines for serial and parallel (MPI) applications.

  • cray-parallel-netcdf is an optimized parallel I/O library built on top of MPI-IO, supporting efficient read/write access to NetCDF data formats in distributed-memory applications.

  • cray-R is the Cray-tuned version of the R statistical computing environment, adapted for HPC systems to improve performance, scalability and integration with Cray libraries (linking to cray-libsci for math operations).

  • cray-python is a performance-enhanced version of Python designed for Cray systems, bundled with packages like NumPy and SciPy linked against cray-libsci and optimized for parallel execution (via MPI4Py or OpenMP).
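A minimal sketch of how these libraries are picked up implicitly (module names as listed above; the point is that no explicit -lblas, -llapack or -lfftw3 is needed when the modules are loaded):

```shell
module load PrgEnv-gnu cray-libsci cray-fftw

# BLAS/LAPACK calls (dgemm, dgesv, ...) resolve via LibSci automatically:
ftn -O2 -o solver solver.f90

# FFTW headers and libraries are injected by the cray-fftw module:
cc -O2 -o fft_demo fft_demo.c
```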

Cray Message Passing Toolkit (CMPT)

CMPT is a collection of software libraries used to perform data transfers between nodes running in parallel applications. CMPT comprises the Message Passing Interface (MPI) and OpenSHMEM parallel programming models.

Cray module                    Usage
cray-mpich                     Cray MPICH Message Passing Interface
cray-mpich-ucx                 Message Passing Interface (MPI) for the UCX netmod
cray-openshmemx                Logically shared distributed memory access routines
cray-libpals                   Parallel Application Launch Service library
cray-mpich-abi                 Cray MPICH ABI Compatibility module
cray-mpich-abi-pre-intel-5.0   Cray MPICH pre-Intel MPI 5.0 ABI Compatibility module
cray-pals                      Parallel Application Launch Service

These libraries are essential for building scalable applications on HPE-Cray systems and are automatically integrated when using the Cray compiler wrappers.

  • cray-mpich is Cray's highly optimized implementation of the MPI standard, based on MPICH. It provides low-latency, high-throughput communication and supports both MPI-3 features and hybrid MPI/OpenMP applications.

  • cray-openshmemx is Cray's implementation of the OpenSHMEM standard, supporting one-sided communication and Partitioned Global Address Space (PGAS) programming.

  • cray-pals provides the Parallel Application Launch Service (PALS) — a low-level runtime layer used internally to manage parallel job startup, process launch and wire-up across the interconnect.

Cray MPICH uses the UCX backend by default

The default MPI backend has been switched from cray-mpich (OFI-based) to cray-mpich-ucx due to stability and compatibility issues with the OFI transport on the Mellanox HDR200 Infiniband network.

The UCX backend offers:
- better stability and reliability for CPU workloads
- native support for GPU-aware MPI
- improved compatibility with standard Infiniband fabrics
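If you need to verify or pin the backend explicitly, something like the following should work (a sketch; the exact module swap syntax may differ on Lucia):

```shell
module load PrgEnv-gnu

# Select the UCX-based MPICH explicitly instead of the OFI-based default:
module swap cray-mpich cray-mpich-ucx

# Rebuild so the application links against the UCX netmod:
cc -o my_mpi_app my_mpi_app.c
```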

Performance Analysis and Optimization

Cray module      Usage
cray-dsmml       Distributed symmetric memory management library (DSMML)
cray-pmi         Cray Process Management Interface
craype           Setup for Cray PE driver set and targeting modules
papi             Performance API (PAPI) project specifies a standard API
perftools-base   Performance Tools module (CrayPat, Apprentice2, Reveal)

The Cray Performance Measurement and Analysis Tools (CrayPAT) comprise a number of different components:

  • CrayPAT, the full-featured program analysis tool set. CrayPAT consists of pat_build, the utility used to instrument programs; the CrayPAT runtime environment, which collects the specified performance data during program execution; and pat_report, the first-level data analysis tool, used to produce text reports or export data for more sophisticated analysis.

  • CrayPAT-lite, a simplified and easy-to-use version of CrayPAT that provides basic performance analysis information automatically, with a minimum of user interaction.

  • Reveal, the next-generation integrated performance analysis and code optimization tool, which enables the user to correlate performance data captured during program execution directly to the original source and identify opportunities for further optimization.

  • Cray PAPI components, which are support packages for those who want to access performance counters.

  • Cray Apprentice2, the second-level data analysis tool, used to visualize, manipulate and compare sets of program performance data in a GUI environment.

The above tools are made available by first loading the perftools-base module, followed by either perftools (for CrayPAT, Reveal and Apprentice2) or one of the perftools-lite modules.
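As a sketch, the loading order just described:

```shell
module load perftools-base     # makes the tool set visible
module load perftools          # full CrayPAT, Reveal and Apprentice2

# or, for automatic low-overhead profiling instead:
# module load perftools-lite
```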

Using CrayPAT to profile an MPI program

This example shows how to profile an MPI application, my_mpi_app.f90, using CrayPAT.

  1. It is recommended to compile the code using the Cray wrappers, as they include the necessary instrumentation hooks:
    ftn -o my_mpi_app my_mpi_app.f90
  2. Use pat_build to instrument the binary for data collection:
    pat_build -O apa my_mpi_app
    This creates an instrumented version: my_mpi_app+pat. The -O apa option enables Automatic Profile Analysis, which lets CrayPAT analyze where to collect the most useful performance data.
  3. Run the application:
    srun -n 4 ./my_mpi_app+pat
    This run generates a .xf file containing the performance data, for example my_mpi_app+pat+12345-678.xf.
  4. The performance report can be generated:
    pat_report my_mpi_app+pat+12345-678.xf > performance_report.txt
    The report contains:
    • function time breakdown
    • MPI communications costs
    • memory usage
    • suggestions for possible optimizations.
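The four steps above can be collected into a single sketch (module and file names follow the example; scheduler directives are omitted):

```shell
#!/bin/bash
module load perftools-base perftools

ftn -o my_mpi_app my_mpi_app.f90     # 1. build with the Cray wrapper
pat_build -O apa my_mpi_app          # 2. instrument -> my_mpi_app+pat
srun -n 4 ./my_mpi_app+pat           # 3. run; writes a .xf data file

# 4. generate the text report from whichever .xf file was produced:
pat_report my_mpi_app+pat+*.xf > performance_report.txt
```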

Cray Debugger Support Tools (CDST)

Cray module      Usage
gdb4hpc          Cray Line Mode Parallel Debugger
valgrind4hpc     Valgrind-based debugging tool to aid in the detection of memory leaks
cray-stat        Cray Stack Trace Analysis Tool
atp              Abnormal Termination Processing (ATP)
cray-ccdb        Cray Comparative Debugger (CCDB) tool
cray-mrnet       Multicast Reduction Network module
cray-cti         Cray Common Tools Interface (CTI)
cray-dyninst     Dynamic instrumentation libraries
sanitizers4hpc   Tool for running HPC code instrumented with LLVM Sanitizer

  • gdb4hpc is a command-line tool working similarly to gdb that allows users to debug parallel programs. It can launch parallel programs or attach to ones already running and allows the user to step through the execution to identify the causes of any unexpected behaviour.

  • valgrind4hpc is a parallel memory debugging tool that aids in the detection of memory leaks and errors in parallel applications. It aggregates like errors across processes and threads to simplify debugging of parallel applications.

  • cray-stat, the Stack Trace Analysis Tool, generates merged stack traces for parallel applications. It also provides visualisation tools.

  • atp is a runtime tool that automatically detects when an application crashes (e.g., segmentation fault) and collects useful debug information like stack traces, core files and a summary of all MPI ranks.

  • cray-ccdb is a tool for comparative debugging of MPI applications. It allows users to compare variables and behavior across multiple processes to detect divergence (for example, bitwise differences between ranks).

  • sanitizers4hpc is a suite of Cray-integrated sanitizers (based on LLVM/GCC/Valgrind) that help detect memory errors, data races and undefined behavior in C, C++, and Fortran applications.

Using Cray ATP to debug crashes in an MPI program

Cray ATP is a tool that helps diagnose segmentation faults, hangs and other crashes in MPI or OpenMP applications by automatically attaching a debugger (like gdb) when the program fails.

  1. It is recommended to compile the code with -g to get readable function names and line numbers:
    ftn -g -o my_mpi_app my_mpi_app.f90
  2. Prepend atp-enabled to the MPI launch command:
    atp-enabled srun -n 8 ./my_mpi_app
    If the program crashes, ATP will automatically attach gdb to all MPI ranks and collect stack traces in a directory like atp-test-12345.
  3. After the crash, the trace files can be inspected using stat-view:
    stat-view atp-test-12345/statuse
    This gives a rank-by-rank backtrace, helping to find:
    • which function caused the crash
    • which rank(s) were affected
    • the source file and line number (if debug symbols are present).
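The ATP workflow above, collected into one batch-style sketch (file and directory names are illustrative):

```shell
#!/bin/bash
module load atp

ftn -g -o my_mpi_app my_mpi_app.f90    # keep debug symbols for readable traces
atp-enabled srun -n 8 ./my_mpi_app     # ATP attaches gdb to all ranks on a crash

# After a crash, inspect the merged backtraces collected in the atp-test-* directory:
# stat-view atp-test-*/<trace file>
```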

📚 Cray Programming Environment documentation
