Aurora is an Intel/HPE EX supercomputer located in the ALCF at Argonne National Laboratory. Each compute node contains two Intel (Sapphire Rapids) Xeon CPUs and six Intel Xe GPUs (Ponte Vecchio), each with two tiles.

The PBS scheduler is used to submit jobs from login nodes to run on compute nodes.

Configuring Python and Installation

To obtain Python use:

module use /soft/modulefiles
module load frameworks

To obtain libEnsemble:

pip install libensemble

See the advanced installation documentation for more options for installing libEnsemble, including installation with Spack.


Running the forces_gpu tutorial on Aurora

To obtain the example, clone the libEnsemble repository (only the forces subdirectory is needed):

git clone
cd libensemble/libensemble/tests/scaling_tests/forces/forces_app

To compile forces (a C application that uses OpenMP target offload):

mpicc -DGPU -O3 -fiopenmp -fopenmp-targets=spir64 -o forces.x forces.c

Now go to the forces_gpu directory:

cd ../forces_gpu

To make use of all available GPUs, open the calling script and adjust exit_criteria so that more simulations are run. The following runs two simulations per simulation worker:

# Instruct libEnsemble to exit after this many simulations
ensemble.exit_criteria = ExitCriteria(sim_max=nsim_workers*2)
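To check the arithmetic behind this exit criterion, here is a small sketch that computes sim_max from the total worker count. It assumes, as this example does, that one worker runs the persistent generator and every other worker runs simulations. The helper name sim_max_for is hypothetical and is not part of libEnsemble.

```python
# Hypothetical helper: how sim_max scales with the number of workers.
# Assumption: one worker runs the persistent generator and the rest run
# simulations, so nsim_workers = nworkers - 1.


def sim_max_for(nworkers: int, sims_per_worker: int = 2, gen_workers: int = 1) -> int:
    """Return the sim_max that gives each simulation worker
    `sims_per_worker` simulations."""
    nsim_workers = nworkers - gen_workers
    if nsim_workers < 1:
        raise ValueError("need at least one simulation worker")
    return nsim_workers * sims_per_worker


if __name__ == "__main__":
    # 13 workers on two nodes: 12 simulation workers + 1 generator worker
    print(sim_max_for(13))  # 24
```

With 13 workers this gives 24 simulations, which is two rounds across all twelve GPUs.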

Now start an interactive session on two nodes (or use the batch script in ../submission_scripts/):

qsub -A <myproject> -l select=2 -l walltime=15:00 -lfilesystems=home -q EarlyAppAccess -I

Once in the interactive session, you may need to reload the frameworks module:

module use /soft/modulefiles
module load frameworks

Then in the session run:

python --comms local --nworkers 13

This provides twelve workers for running simulations (one for each GPU across two nodes). An extra worker is added to run the persistent generator. The GPU settings for each worker simulation are printed.
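The worker count follows from the node layout described at the top of this page: six GPUs per node, each with two tiles. The sketch below computes a --nworkers value for any number of nodes, with or without treating tiles as GPUs. It assumes exactly one extra worker for the persistent generator. The function nworkers_for is a hypothetical illustration and not a libEnsemble API.

```python
# Hypothetical helper for choosing --nworkers for forces_gpu on Aurora.
# Node layout from the text: 6 GPUs per node, 2 tiles per GPU.
# Assumption: one extra worker runs the persistent generator.

GPUS_PER_NODE = 6
TILES_PER_GPU = 2


def nworkers_for(nodes: int, use_tiles_as_gpus: bool = False, gen_workers: int = 1) -> int:
    """One simulation worker per GPU (or per tile), plus generator workers."""
    devices_per_node = GPUS_PER_NODE * (TILES_PER_GPU if use_tiles_as_gpus else 1)
    return nodes * devices_per_node + gen_workers


if __name__ == "__main__":
    print(nworkers_for(2))                          # 13
    print(nworkers_for(2, use_tiles_as_gpus=True))  # 25
```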

The file libE_stats.txt provides a summary of the runs.

Using tiles as GPUs

To treat each tile as its own GPU, add the libE_specs option use_tiles_as_gpus=True, so that the libE_specs block in the calling script becomes:

ensemble.libE_specs = LibeSpecs(
    # keep the options already set in this block, and add:
    use_tiles_as_gpus=True,
)

Now run again, but with twice as many simulation workers (each uses one GPU tile):

python --comms local --nworkers 25
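When tiles are treated as GPUs, each node offers twelve slots (six GPUs with two tiles each), so each of the twelve simulation workers on a node can receive one tile. The illustration below shows one plausible ordering, written as Level Zero ZE_AFFINITY_MASK-style "gpu.tile" strings. This is a sketch under assumptions. libEnsemble performs its own assignment and prints the setting each worker actually uses, so treat tile_mask as a hypothetical helper rather than libEnsemble's internal logic.

```python
# Illustration only: mapping simulation workers on one node to GPU tiles.
# Assumptions: tiles are addressed as "<gpu>.<tile>" (ZE_AFFINITY_MASK style)
# and workers are assigned tiles in order. libEnsemble does its own assignment.

GPUS_PER_NODE = 6
TILES_PER_GPU = 2


def tile_mask(local_worker: int) -> str:
    """Return a "<gpu>.<tile>" string for a 0-based simulation worker index on a node."""
    slots = GPUS_PER_NODE * TILES_PER_GPU
    if not 0 <= local_worker < slots:
        raise ValueError(f"worker index must be in [0, {slots})")
    gpu, tile = divmod(local_worker, TILES_PER_GPU)
    return f"{gpu}.{tile}"


if __name__ == "__main__":
    # Twelve simulation workers per node, one tile each
    print([tile_mask(i) for i in range(12)])
```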

Note that the forces example will automatically use the GPUs available to each worker (with one MPI rank per GPU), so if fewer workers are provided, more than one GPU will be used per simulation.
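As a rough model of this behavior, the sketch below divides the GPUs evenly among simulation workers. Since forces uses one MPI rank per GPU, the rank count per simulation grows as the worker count shrinks. It assumes an even split. libEnsemble's real resource management is more general, and gpus_per_sim is a hypothetical name.

```python
# Simplified model: fewer simulation workers means more GPUs (and, for
# forces, more MPI ranks, one per GPU) per simulation.
# Assumption: GPUs divide evenly among simulation workers.


def gpus_per_sim(total_gpus: int, nsim_workers: int) -> int:
    """Return the GPUs each simulation receives under an even split."""
    if nsim_workers < 1 or total_gpus % nsim_workers:
        raise ValueError("this sketch assumes GPUs divide evenly among workers")
    return total_gpus // nsim_workers


if __name__ == "__main__":
    total = 2 * 6  # two nodes, six GPUs each
    for nsim in (12, 6, 4):
        n = gpus_per_sim(total, nsim)
        print(f"{nsim} sim workers -> {n} GPU(s) and {n} MPI rank(s) per simulation")
```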

See also the forces_gpu_var_resources and forces_multi_app examples, which use varying processor and GPU counts per simulation.


A video demonstration of the forces_gpu example on Frontier is also available. The workflow on Aurora is identical, except for the compiler options and the number of workers (because the number of GPUs per node differs).