Aurora
Aurora is an Intel/HPE EX supercomputer located at the ALCF at Argonne National Laboratory. Each compute node contains two Intel Xeon (Sapphire Rapids) CPUs and six Intel Xe (Ponte Vecchio) GPUs, each with two tiles.
The PBS scheduler is used to submit jobs from login nodes to run on compute nodes.
Configuring Python and Installation
To obtain Python use:
module use /soft/modulefiles
module load frameworks
To obtain libEnsemble:
pip install libensemble
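To quickly check the installation, the following can be used (this assumes the frameworks module's Python is active and that the installed package exposes a __version__ attribute, as recent libEnsemble releases do):
python -c "import libensemble; print(libensemble.__version__)"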
See the libEnsemble documentation for more information on advanced installation options, including using Spack.
Example
This example shows how to run the forces_gpu tutorial on Aurora.
To obtain the example, you can git clone libEnsemble, although only the forces sub-directory is needed:
git clone https://github.com/Libensemble/libensemble
cd libensemble/libensemble/tests/scaling_tests/forces/forces_app
To compile forces (a C application with OpenMP target offload):
mpicc -DGPU -O3 -fiopenmp -fopenmp-targets=spir64 -o forces.x forces.c
Now go to the forces_gpu directory:
cd ../forces_gpu
To make use of all available GPUs, open run_libe_forces.py and adjust the exit_criteria to do more simulations. The following will do two simulations for each worker:
# Instruct libEnsemble to exit after this many simulations
ensemble.exit_criteria = ExitCriteria(sim_max=nsim_workers*2)
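In the tutorial script, nsim_workers is derived from the total number of workers, with one worker reserved for the persistent generator; a minimal sketch of that logic (variable names taken from the tutorial script, so treat this as illustrative) is:
# One worker runs the persistent generator; the remainder run simulations
nsim_workers = ensemble.nworkers - 1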
Now grab an interactive session on two nodes (or use the batch script at ../submission_scripts/submit_pbs_aurora.sh):
qsub -A <myproject> -l select=2 -l walltime=15:00 -l filesystems=home -q EarlyAppAccess -I
Once in the interactive session, you may need to reload the frameworks module:
cd $PBS_O_WORKDIR
module use /soft/modulefiles
module load frameworks
Then in the session run:
python run_libe_forces.py --comms local --nworkers 13
This provides twelve workers for running simulations (one for each GPU across two nodes). An extra worker is added to run the persistent generator. The GPU settings for each worker simulation are printed.
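The GPU assignment is handled by libEnsemble's executor rather than by the batch environment. The following is a minimal, hedged sketch of a simulation function in the style of forces_gpu (the repository's forces_simf.py may differ), assuming the auto_assign_gpus and match_procs_to_gpus submit options available in recent libEnsemble releases:
import numpy as np
from libensemble.executors.executor import Executor
from libensemble.message_numbers import TASK_FAILED, WORKER_DONE

def run_forces(H, persis_info, sim_specs, libE_info):
    """Sketch: launch the compiled forces app on this worker's GPUs."""
    particles = str(int(H["x"][0][0]))  # number of particles from the generator
    exctr = Executor.executor  # executor registered in the calling script
    task = exctr.submit(
        app_name="forces",
        app_args=particles,
        auto_assign_gpus=True,      # use the GPUs assigned to this worker
        match_procs_to_gpus=True,   # one MPI rank per assigned GPU
    )
    task.wait()
    calc_status = WORKER_DONE if task.state == "FINISHED" else TASK_FAILED
    output = np.zeros(1, dtype=sim_specs["out"])  # reading of forces output omitted here
    return output, persis_info, calc_status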
Looking at libE_stats.txt will provide a summary of the runs.
Using tiles as GPUs
If you wish to treat each tile as its own GPU, then add the libE_specs option use_tiles_as_gpus=True, so that the libE_specs block of run_libe_forces.py becomes:
ensemble.libE_specs = LibeSpecs(
    num_resource_sets=nsim_workers,
    sim_dirs_make=True,
    use_tiles_as_gpus=True,
)
Now you can run again, but with twice as many workers for running simulations (each will use one GPU tile):
python run_libe_forces.py --comms local --nworkers 25
Note that the forces example will automatically use the GPUs available to each worker (with one MPI rank per GPU), so if fewer workers are provided, more than one GPU will be used per simulation.
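For example, with whole GPUs (the default setting) on the same two nodes, the following run gives six simulation workers, so each simulation would use two GPUs (two MPI ranks):
python run_libe_forces.py --comms local --nworkers 7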
Also see the forces_gpu_var_resources and forces_multi_app examples for cases that use varying processor/GPU counts per simulation.
Demonstration
Note that a video demonstration of the forces_gpu example on Frontier is also available. The workflow is identical when running on Aurora, except for different compiler options and numbers of workers (because the number of GPUs per node differs).