Aurora
Aurora is an Intel/HPE EX supercomputer located at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. Each compute node contains two Intel Xeon (Sapphire Rapids) CPUs and six Intel Xe GPUs (Ponte Vecchio), each with two tiles.
The PBS scheduler is used to submit jobs from login nodes to run on compute nodes.
Configuring Python and Installation
To obtain Python and create a virtual environment:
module load frameworks
python -m venv /path/to-venv --system-site-packages
. /path/to-venv/bin/activate
where /path/to-venv can be anywhere you have write access. For future sessions, just load the frameworks module and run the activate line.
To obtain libEnsemble:
pip install libensemble
See the libEnsemble documentation for more information on advanced installation options, including using Spack.
Example
This example shows how to run the forces_gpu tutorial on Aurora.
To obtain the example you can git clone libEnsemble, although only the forces sub-directory is strictly needed:
git clone https://github.com/Libensemble/libensemble
cd libensemble/libensemble/tests/scaling_tests/forces/forces_app
To compile forces (a C application with OpenMP target offload):
mpicc -DGPU -O3 -fiopenmp -fopenmp-targets=spir64 -o forces.x forces.c
Now go to the forces_gpu directory:
cd ../forces_gpu
To make use of all available GPUs, open run_libe_forces.py and adjust the exit_criteria to perform more simulations. The following will run two simulations for each worker:
# Instruct libEnsemble to exit after this many simulations
ensemble.exit_criteria = ExitCriteria(sim_max=nsim_workers*2)
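As a sanity check on the arithmetic, the following sketch (a hypothetical helper, not part of libEnsemble) computes a sim_max that gives each simulation worker two runs, assuming one worker is reserved for the persistent generator:

```python
# Hypothetical helper (not libEnsemble API) illustrating the sim_max
# arithmetic: one worker runs the persistent generator, so "-n 13"
# leaves 12 simulation workers.
def sim_max_for(nworkers, gen_workers=1, sims_per_worker=2):
    """Return a sim_max giving each simulation worker sims_per_worker runs."""
    nsim_workers = nworkers - gen_workers
    return nsim_workers * sims_per_worker

print(sim_max_for(13))  # prints 24 (12 simulation workers x 2 runs each)
```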
Now grab an interactive session on two nodes (or use the batch script at ../submission_scripts/submit_pbs_aurora.sh):
qsub -A <myproject> -l select=2 -l walltime=15:00 -lfilesystems=home:flare -q debug -I
Once in the interactive session, you may need to reload the frameworks module and reactivate your virtual environment:
module load frameworks
cd $PBS_O_WORKDIR
. /path/to-venv/bin/activate
Then in the session run:
python run_libe_forces.py -n 13
This provides twelve workers for running simulations (one for each GPU across two nodes). An extra worker is added to run the persistent generator. The GPU settings for each worker simulation are printed.
Looking at libE_stats.txt will provide a summary of the runs.
Now try running:
./cleanup.sh
python run_libe_forces.py -n 7
You will see that two cores and two GPUs are used per worker. The forces example automatically uses the GPUs available to each worker.
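The per-worker GPU counts above follow from an even split of the twelve GPUs (two nodes, six GPUs each) across the simulation workers. A minimal sketch of this arithmetic, assuming one generator worker and an even division (the function name is illustrative, not libEnsemble API):

```python
# Illustrative resource arithmetic for two Aurora nodes (6 GPUs each).
TOTAL_GPUS = 2 * 6

def gpus_per_worker(nworkers, gen_workers=1):
    """GPUs each simulation worker receives under an even split."""
    nsim_workers = nworkers - gen_workers
    return TOTAL_GPUS // nsim_workers

print(gpus_per_worker(13))  # prints 1: 12 sim workers, one GPU each
print(gpus_per_worker(7))   # prints 2: 6 sim workers, two GPUs each
```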
Live viewing GPU usage
To see GPU usage, SSH into one of your allocated compute nodes in another terminal window and run:
module load xpu-smi
watch -n 0.1 xpu-smi dump -d -1 -m 0 -n 1
Using tiles as GPUs
To treat each tile as its own GPU, add the use_tiles_as_gpus=True option to the libE_specs block in run_libe_forces.py:
ensemble.libE_specs = LibeSpecs(
    num_resource_sets=nsim_workers,
    sim_dirs_make=True,
    use_tiles_as_gpus=True,
)
Now you can run again but with twice the workers for running simulations (each will use one GPU tile):
python run_libe_forces.py -n 25
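The worker count follows from the tile arithmetic: two nodes with six GPUs of two tiles each give 24 tiles, plus one worker for the persistent generator (the snippet below is illustrative only):

```python
# Tile arithmetic behind "-n 25" (illustrative).
NODES, GPUS_PER_NODE, TILES_PER_GPU = 2, 6, 2
total_tiles = NODES * GPUS_PER_NODE * TILES_PER_GPU
print(total_tiles)      # prints 24: one simulation worker per tile
print(total_tiles + 1)  # prints 25: plus the persistent generator worker
```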
Running generator on the manager
An alternative is to run the generator on a thread on the manager. The number of workers can then be set to the number of simulation workers.
Change the libE_specs in run_libe_forces.py as follows:
nsim_workers = ensemble.nworkers
# Persistent gen does not need resources
ensemble.libE_specs = LibeSpecs(
    gen_on_manager=True,
)
Then we can run with 12 (instead of 13) workers:
python run_libe_forces.py -n 12
Dynamic resource assignment
In the forces directory you will also find:
- forces_gpu_var_resources: uses varying processor/GPU counts per simulation.
- forces_multi_app: uses varying processor/GPU counts per simulation and also uses two different user executables, one which is CPU-only and one which uses GPUs. This allows highly efficient use of nodes for multi-application ensembles.
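The idea behind these variable-resource examples can be sketched in plain Python: each generated simulation carries its own resource request, which a scheduler then packs onto available nodes. The dictionary fields below are illustrative, not the exact libEnsemble data structures:

```python
# Illustrative sketch (not libEnsemble API) of per-simulation GPU requests.
import random

random.seed(0)  # reproducible illustration
sims = [{"sim_id": i, "num_gpus": random.randint(1, 4)} for i in range(8)]

# A scheduler would pack these requests onto nodes; here we just total them.
total_requested = sum(s["num_gpus"] for s in sims)
print(f"{len(sims)} sims requesting {total_requested} GPUs in total")
```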
Demonstration
Note that a video demonstration of the forces_gpu example on Frontier is also available. The workflow is identical when running on Aurora, apart from different compiler options and numbers of workers (because the number of GPUs per node differs).