Summit

Summit is an IBM AC922 system located at the Oak Ridge Leadership Computing Facility (OLCF). Each of the approximately 4,600 compute nodes on Summit contains two IBM POWER9 processors and six NVIDIA Volta V100 accelerators.

Summit features three tiers of nodes: login, launch, and compute nodes.

Users on login nodes submit batch runs to the launch nodes. Batch scripts and interactive sessions run on the launch nodes. Only the launch nodes can submit MPI runs to the compute nodes via jsrun.

Configuring Python

Begin by loading the Python 3 Anaconda module:

$ module load python

You can now create and activate your own custom conda environment:

conda create --name myenv python=3.7
export PYTHONNOUSERSITE=1 # Make sure get python from conda env
. activate myenv

If you are installing any packages with extensions, ensure that the correct compiler module is loaded. If using mpi4py, this must be installed from source, referencing the compiler. Currently, mpi4py must be built with gcc:

module load gcc

With your environment activated, run

CC=mpicc MPICC=mpicc pip install mpi4py --no-binary mpi4py

Installing libEnsemble

Obtaining libEnsemble is now as simple as pip install libensemble. Your prompt should be similar to the following line:

(my_env) user@login5:~$ pip install libensemble

Note

If you encounter pip errors, run python -m pip install --upgrade pip first

Or, you can install via conda:

(my_env) user@login5:~$ conda config --add channels conda-forge
(my_env) user@login5:~$ conda install -c conda-forge libensemble

See here for more information on advanced options for installing libEnsemble.

Special note on resource sets and Executor submit options

When using the portable MPI run configuration options (e.g., num_nodes) to the MPIExecutor submit function, it is important to note that, due to the resource sets used on Summit, the options refer to resource sets as follows:

  • num_procs (int, optional) – The total number resource sets for this run.

  • num_nodes (int, optional) – The number of nodes on which to submit the run.

  • ranks_per_node (int, optional) – The number of resource sets per node.

It is recommended that the user defines a resource set as the minimal configuration of CPU cores/processes and GPUs. These can be added to the extra_args option of the submit function. Alternatively, the portable options can be ignored and everything expressed in extra_args.

For example, the following jsrun line would run three resource sets, each having one core (with one process), and one GPU, along with some extra options:

jsrun -n 3 -a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"

To express this line in the submit function may look something like the following:

exctr = Executor.executor
task = exctr.submit(app_name='mycode',
                    num_procs=3,
                    extra_args='-a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"'
                    app_args="-i input")

This would be equivalent to:

exctr = Executor.executor
task = exctr.submit(app_name='mycode',
                    extra_args='-n 3 -a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"'
                    app_args="-i input")

The auto-resources in the Executor works out the resources available to each worker, but unlike some other systems, jsrun on Summit dynamically schedules runs to available slots across and within nodes. It can also queue tasks. This allows variable size runs to easily be handled on Summit. If these runs over-use the auto-resource allocations, auto_resources can be turned off in the Executor setup. E.g: In the calling script:

from libensemble.executors.mpi_executor import MPIExecutor
exctr = MPIExecutor(central_mode=True, auto_resources=False)

In the above example, the task being submitted used three GPUs, which is half those available on a Summit node, and thus two such tasks may be allocated to each node (from different workers), if they were running at the same time.

Job Submission

Summit uses LSF for job management and submission. For libEnsemble, the most important command is bsub for submitting batch scripts from the login nodes to execute on the launch nodes.

It is recommended to run libEnsemble on the launch nodes (assuming workers are submitting MPI applications) using the local communications mode (multiprocessing). In the future, Balsam may be used to run libEnsemble on compute nodes.

Interactive Runs

You can run interactively with bsub by specifying the -Is flag, similarly to the following:

$ bsub -W 30 -P [project] -nnodes 8 -Is

This will place you on a launch node.

Note

You will need to reactivate your conda virtual environment.

Batch Runs

Batch scripts specify run settings using #BSUB statements. The following simple example depicts configuring and launching libEnsemble to a launch node with multiprocessing. This script also assumes the user is using the parse_args() convenience function from libEnsemble’s tools module.

#!/bin/bash -x
#BSUB -P <project code>
#BSUB -J libe_mproc
#BSUB -W 60
#BSUB -nnodes 128
#BSUB -alloc_flags "smt1"

# --- Prepare Python ---

# Load conda module and gcc.
module load python
module load gcc

# Name of conda environment
export CONDA_ENV_NAME=my_env

# Activate conda environment
export PYTHONNOUSERSITE=1
source activate $CONDA_ENV_NAME

# --- Prepare libEnsemble ---

# Name of calling script
export EXE=calling_script.py

# Communication Method
export COMMS='--comms local'

# Number of workers.
export NWORKERS='--nworkers 128'

hash -r # Check no commands hashed (pip/python...)

# Launch libE
python $EXE $COMMS $NWORKERS > out.txt 2>&1

With this saved as myscript.sh, allocating, configuring, and queueing libEnsemble on Summit is achieved by running

$ bsub myscript.sh

Example submission scripts are also given in the examples.

Launching User Applications from libEnsemble Workers

Only the launch nodes can submit MPI runs to the compute nodes via jsrun. This can be accomplished in user sim_f functions directly. However, it is highly recommended that the Executor interface be used inside the sim_f or gen_f, because this provides a portable interface with many advantages including automatic resource detection, portability, launch failure resilience, and ease of use.

Additional Information

See the OLCF guides for more information about Summit.