Perlmutter is an HPE Cray “Shasta” system located at NERSC. Its GPU compute nodes are each equipped with four NVIDIA A100 GPUs.
It uses the Slurm scheduler to submit jobs from login nodes to run on the compute nodes.
Configuring Python and Installation
Begin by loading the python module:
module load python
Create a conda environment
You can create a conda environment in which to install libEnsemble and all dependencies. For example:
conda create -n libe-pm python=3.9 -y
As Perlmutter has a shared HOME filesystem with other clusters, using a
-pm suffix (for Perlmutter) in the environment name is good practice.
Activate your virtual environment with:
export PYTHONNOUSERSITE=1
conda activate libe-pm
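A quick way to confirm that the activation took effect is to ask the interpreter itself. The following is a minimal sketch using only the standard library; the helper name describe_python_env is hypothetical, and it assumes conda sets CONDA_DEFAULT_ENV when an environment is activated.

```python
import os
import sys


def describe_python_env():
    """Return a few facts about the running interpreter (hypothetical helper)."""
    return {
        # Name of the active conda environment, or None if none is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Installation prefix of the interpreter actually running this code.
        "prefix": sys.prefix,
        # True when the user site-packages directory is being ignored,
        # e.g. because PYTHONNOUSERSITE was set before Python started.
        "user_site_disabled": bool(sys.flags.no_user_site),
    }


if __name__ == "__main__":
    for key, value in describe_python_env().items():
        print(f"{key}: {value}")
```

With the environment above activated, one would expect conda_env to report libe-pm and user_site_disabled to be True.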
Installing libEnsemble and dependencies
With the Python module loaded and your environment activated, libEnsemble can be installed in one of the following ways.
Install via pip into the environment.
(libe-pm) user@perlmutter07:~$ pip install libensemble
Install via conda:
(libe-pm) user@perlmutter07:~$ conda config --add channels conda-forge
(libe-pm) user@perlmutter07:~$ conda install -c conda-forge libensemble
See advanced installation for other installation options.
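To check which version ended up in the active environment, a short standard-library sketch such as the one below may help. The function name installed_version is hypothetical; the only assumption is that the package was installed under the distribution name libensemble, as in the commands above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("libensemble")
    if version is None:
        print("libensemble is not installed in this environment")
    else:
        print(f"libensemble {version} is installed")
```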
Perlmutter uses Slurm for job submission and management. The two most common
commands for initiating jobs are
salloc and sbatch, for running
in interactive and batch modes, respectively. libEnsemble runs on the compute nodes
on Perlmutter using either
multi-processing (recommended) or mpi4py.
The following steps run the forces_gpu tutorial on Perlmutter.
To obtain the example, you can git clone libEnsemble, although only the forces sub-directory is needed:
git clone https://github.com/Libensemble/libensemble
cd libensemble/libensemble/tests/scaling_tests/forces/forces_app
To compile forces:
module load PrgEnv-nvidia cudatoolkit craype-accel-nvidia80
cc -DGPU -O3 -fopenmp -mp=gpu -target-accel=nvidia80 -o forces.x forces.c
Now go to the forces_gpu directory, which sits alongside forces_app:
cd ../forces_gpu
Now grab an interactive session on one node:
salloc -N 1 -t 20 -C gpu -q interactive -A <project_id>
Then in the session run:
python run_libe_forces.py --comms local --nworkers 4
To see GPU usage, open another terminal window, ssh into the compute node you were allocated, and run:
watch -n 0.1 nvidia-smi
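If you would rather collect the same information from Python, for instance to log it alongside a run, nvidia-smi can be queried in CSV form. The sketch below is illustrative only: the function names are hypothetical, the three queried fields are one possible choice, and it assumes every field is reported as a plain integer (a field reported as not available would need extra handling).

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are reported.
QUERY_FIELDS = ("index", "utilization.gpu", "memory.used")


def parse_gpu_usage(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "util_percent": int(util), "mem_mib": int(mem)}
        )
    return rows


def query_gpu_usage():
    """Run nvidia-smi once and return the parsed per-GPU usage."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_usage(result.stdout)
```

As with the watch command, query_gpu_usage() needs to be called on the compute node where the GPUs are in use.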
Video demonstration
There is a video demonstration of the forces example on Perlmutter.
The video uses libEnsemble version 0.9.3, which required some adjustments to the scripts to run on Perlmutter. These are no longer necessary: libEnsemble now correctly detects the MPI runner and GPU settings on Perlmutter, and the GPU code runs with many more particles than the CPU version (forces_simple).
Example submission scripts are also given in the examples.
Running libEnsemble with mpi4py
Running libEnsemble with local comms is usually sufficient on Perlmutter. However, if you need
mpi4py, you should install and run as follows:
module load PrgEnv-gnu cudatoolkit
MPICC="cc -target-accel=nvidia80 -shared" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py
This line will build
mpi4py on top of a CUDA-aware Cray MPICH.
To run using four workers and one manager (five MPI ranks in total):
export SLURM_EXACT=1
srun -n 5 python my_script.py
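The rank count passed to srun is the number of workers plus one, because one MPI rank acts as the manager. A minimal sketch of that arithmetic follows; the helper name srun_command and its extra_srun_args parameter are hypothetical and not part of libEnsemble.

```python
import shlex


def srun_command(nworkers, script, extra_srun_args=()):
    """Build an srun command line for libEnsemble with mpi4py comms.

    One MPI rank acts as the manager, so nworkers + 1 ranks are requested.
    """
    if nworkers < 1:
        raise ValueError("at least one worker is required")
    nranks = nworkers + 1  # workers plus one manager rank
    cmd = ["srun", "-n", str(nranks), *extra_srun_args, "python", script]
    return shlex.join(cmd)


if __name__ == "__main__":
    # Four workers plus one manager gives five ranks.
    print(srun_command(4, "my_script.py"))
```

For four workers this reproduces the srun line shown above.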
More information on using Python and
mpi4py on Perlmutter can be found
in the Python on Perlmutter documentation.
Some FAQs specific to Perlmutter. See more on the FAQ page.
srun: Job ****** step creation temporarily disabled, retrying (Requested nodes are busy)
You may also see:
srun: Job ****** step creation still disabled, retrying (Requested nodes are busy)
This error has been encountered on Perlmutter. It is recommended to add these lines to submission scripts:
export SLURM_EXACT=1
export SLURM_MEM_PER_NODE=0
and to avoid using
#SBATCH commands that may limit resources to srun job steps, such as:
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
Instead, provide these to sub-tasks via the
extra_args option to the MPIExecutor submit function.
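A submission script can be checked for such directives before it is submitted. The sketch below is one hypothetical way to do that; the function name is invented for illustration, and the list of flagged options contains only the two examples named above, so it is not exhaustive.

```python
import re

# Options the text above names as examples of directives to avoid.
# This list is illustrative, not exhaustive.
DISCOURAGED_OPTIONS = ("--ntasks-per-node", "--gpus-per-task")


def find_limiting_directives(script_text):
    """Return (line_number, line) pairs for discouraged #SBATCH directives."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        # Only #SBATCH directive lines are of interest.
        if not re.match(r"#SBATCH\b", stripped):
            continue
        if any(option in stripped for option in DISCOURAGED_OPTIONS):
            hits.append((lineno, stripped))
    return hits
```

Calling find_limiting_directives(open("submit.sh").read()), where submit.sh is a placeholder file name, returns an empty list when none of the listed options appear in #SBATCH lines.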
GTL_DEBUG:  cudaHostRegister: no CUDA-capable device is detected
If using the environment variable MPICH_GPU_SUPPORT_ENABLED, then
srun commands, at the
time of writing, expect an option for allocating GPUs (e.g., --gpus-per-task=1 would
allocate one GPU to each MPI task of the MPI run). It is recommended that tasks submitted
via the MPIExecutor specify this in the extra_args
option to the
submit function (rather than using an
#SBATCH command). This is needed
even when setting
CUDA_VISIBLE_DEVICES or other options.
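As an illustration of passing the GPU option per task, the sketch below builds the option string in Python. The helper slurm_gpu_extra_args is hypothetical; the commented submit call assumes an MPIExecutor instance named exctr and an application registered under the placeholder name "forces".

```python
def slurm_gpu_extra_args(gpus_per_task=1, other_args=()):
    """Return an srun option string suitable for passing as extra_args."""
    if gpus_per_task < 1:
        raise ValueError("gpus_per_task must be at least 1")
    parts = [f"--gpus-per-task={gpus_per_task}", *other_args]
    return " ".join(parts)


if __name__ == "__main__":
    extra_args = slurm_gpu_extra_args(gpus_per_task=1)
    print(extra_args)
    # Inside a simulation function, with an MPIExecutor `exctr` and an
    # application registered under the placeholder name "forces", the
    # string would be passed along these lines:
    #   task = exctr.submit(app_name="forces", num_procs=4, extra_args=extra_args)
```

Keeping the option in the submit call rather than in the batch script means it applies only to that job step, which is the behaviour the recommendation above is aiming for.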
If running the libEnsemble user calling script with
srun, then it is recommended that
MPICH_GPU_SUPPORT_ENABLED is set in the user
sim_f or gen_f function from which
GPU runs will be submitted, instead of in the batch script. For example:
os.environ["MPICH_GPU_SUPPORT_ENABLED"] = "1"
warning: /tmp/pgcudafatYDO6wtSva6K2.o: missing .note.GNU-stack section implies executable stack
This warning has recently been encountered when compiling the forces example on Perlmutter. It does not
affect the run, but can be suppressed by adding
-Wl,-znoexecstack to the build line.
See the NERSC Perlmutter docs for more information about Perlmutter.