Bebop
Bebop is a Cray CS400 cluster with Intel Broadwell and Knights Landing compute nodes available in the Laboratory Computing Resources Center (LCRC) at Argonne National Laboratory.
Configuring Python
Begin by loading the Python 3 Anaconda module:
module load anaconda3
Create a conda virtual environment in which to install libEnsemble and all dependencies:
conda config --add channels intel
conda create --name my_env intelpython3_core python=3
source activate my_env
Installing libEnsemble and Dependencies
You should have an indication that the virtual environment is activated. Start by installing mpi4py in this environment, making sure to reference the preinstalled Intel MPI compiler. Your command should resemble the following:
CC=mpiicc MPICC=mpiicc pip install mpi4py --no-binary mpi4py
libEnsemble can then be installed via pip or conda. To install via pip:

pip install libensemble

To install via conda:

conda config --add channels conda-forge
conda install -c conda-forge libensemble
See the libEnsemble documentation for more information on advanced installation options.
Job Submission
Bebop uses Slurm for job submission and management. The two commands you’ll likely use the most to run jobs are srun and sbatch, for running interactively and in batch, respectively.
libEnsemble node-worker affinity is especially flexible on Bebop. By adjusting srun runtime options, users may assign multiple libEnsemble workers to each allocated node (oversubscription) or assign multiple nodes per worker.
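For example, on a four-node allocation, the following oversubscribed run places a manager and seven workers across the nodes, two processes per node (an illustrative sketch; the script name and process counts are placeholders):

srun -N 4 --ntasks 8 python calling.py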
Interactive Runs
You can allocate four Knights Landing nodes for thirty minutes through the following:
salloc -N 4 -p knl -A [username OR project] -t 00:30:00
With your nodes allocated, queue your job to start with four MPI ranks:
srun -n 4 python calling.py
mpirun should also work. This line launches libEnsemble with a manager and three workers to one allocated compute node, with three nodes available for the workers to launch calculations with the Executor or a launch command. This is an example of running in centralized mode; if using the Executor, libEnsemble should be initiated with libE_specs["dedicated_mode"] = True.
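A minimal calling script with this option set might resemble the following sketch. The toy simulator function, the built-in uniform_random_sample generator, and the batch sizes are illustrative assumptions, not prescribed by Bebop:

import numpy as np

from libensemble.libE import libE
from libensemble.gen_funcs.sampling import uniform_random_sample
from libensemble.tools import parse_args, add_unique_random_streams

def sim_f(H, persis_info, sim_specs, _):
    # Toy simulation (illustrative): score each sampled point by its squared norm
    out = np.zeros(len(H["x"]), dtype=sim_specs["out"])
    out["f"] = np.sum(H["x"] ** 2, axis=1)
    return out, persis_info

nworkers, is_manager, libE_specs, _ = parse_args()
libE_specs["dedicated_mode"] = True  # keep Executor-launched tasks off libEnsemble's own nodes

sim_specs = {"sim_f": sim_f, "in": ["x"], "out": [("f", float)]}
gen_specs = {
    "gen_f": uniform_random_sample,
    "out": [("x", float, (2,))],
    "user": {"gen_batch_size": 20, "lb": np.zeros(2), "ub": np.ones(2)},
}
persis_info = add_unique_random_streams({}, nworkers + 1)
exit_criteria = {"sim_max": 40}

H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, persis_info, libE_specs=libE_specs)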
Note
When performing a distributed MPI libEnsemble run and not oversubscribing, specify one more MPI process than the number of allocated nodes. The manager and first worker run together on a node.
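For instance, a non-oversubscribed distributed run on the four-node allocation above would use five MPI processes (mirroring the batch example below; the script name is a placeholder):

srun -N 4 --ntasks 5 python calling.py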
If you would like to interact directly with the compute nodes via a shell, the following starts a bash session on a Knights Landing node for thirty minutes:
srun --pty -A [username OR project] -p knl -t 00:30:00 /bin/bash
Note
You will need to reactivate your conda virtual environment and reload your modules! Configuring this routine to occur automatically is recommended.
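One way to automate this, assuming the environment created earlier, is to append the module load and activation commands to your shell startup file (e.g., ~/.bashrc):

module load anaconda3
source activate my_env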
Batch Runs
Batch scripts specify run settings using #SBATCH statements. A simple example for a libEnsemble use case running in distributed MPI mode on Broadwell nodes resembles the following:
#!/bin/bash
#SBATCH -J myjob
#SBATCH -N 4
#SBATCH -p bdwall
#SBATCH -A myproject
#SBATCH -o myjob.out
#SBATCH -e myjob.error
#SBATCH -t 00:15:00

# These four lines construct a machinefile for the Executor and Slurm
srun hostname | sort -u > node_list
head -n 1 node_list > machinefile.$SLURM_JOBID
cat node_list >> machinefile.$SLURM_JOBID
export SLURM_HOSTFILE=machinefile.$SLURM_JOBID

srun --ntasks 5 python calling_script.py
With this saved as myscript.sh, allocating, configuring, and running libEnsemble on Bebop is achieved by running:

sbatch myscript.sh
Example submission scripts for running on Bebop in distributed and centralized mode are also given in the examples.
Debugging Strategies
View the status of your submitted jobs with squeue, and cancel jobs with scancel <Job ID>.
Additional Information
See the LCRC Bebop documentation for more information about Bebop.