Theta is a Cray XC40 system based on the second-generation Intel Xeon Phi processor, available in the ALCF at Argonne National Laboratory.
Theta features three tiers of nodes: login, MOM,
and compute nodes. Users on login nodes submit batch jobs to the MOM nodes.
MOM nodes execute user batch scripts to run on the compute nodes via aprun.
Theta will not schedule more than one MPI application per compute node.
Begin by loading the Python 3 Miniconda module:
$ module load miniconda-3/latest
Create a conda virtual environment. We recommend cloning the base environment. This environment will contain mpi4py and many other packages that are configured correctly for Theta:
$ conda create --name my_env --clone $CONDA_PREFIX
The “executing transaction” step of creating your new environment may take a while!
After the environment is created, the prompt will suggest activating it immediately. If this produces a conda error, follow the on-screen instructions to configure your shell for conda.
Activate your virtual environment with
$ export PYTHONNOUSERSITE=1
$ conda activate my_env
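Before installing anything, it can help to confirm that the environment is active and that user site-packages are disabled. The sketch below is a hypothetical helper (`check_env` is not part of conda or libEnsemble); it assumes conda sets `CONDA_DEFAULT_ENV` to the name of the active environment, which it does for named environments such as `my_env`.

```bash
#!/bin/bash
# Hypothetical helper (not part of conda or libEnsemble): verify that the named
# conda environment is active and that PYTHONNOUSERSITE=1 is set, so packages
# in ~/.local are not picked up by accident.
check_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "conda environment '$expected' is not active" >&2
    return 1
  fi
  if [ "${PYTHONNOUSERSITE:-}" != "1" ]; then
    echo "PYTHONNOUSERSITE is not set to 1" >&2
    return 1
  fi
  echo "environment '$expected' OK"
}
```

Usage: `check_env my_env` prints a confirmation, or explains what is missing and returns a nonzero status.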
If you would rather create your own environment than clone the Miniconda base environment, and you are using mpi4py, make sure the install picks up Cray’s compiler drivers. For example:
$ conda create --name my_env python=3.8
$ export PYTHONNOUSERSITE=1
$ conda activate my_env
$ CC=cc MPICC=cc pip install mpi4py --no-binary mpi4py
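One way to make sure the mpi4py build uses Cray’s compiler drivers is to refuse to build at all when the `cc` driver is not on `PATH` (for example, when the programming-environment modules are not loaded). The wrapper below is a hypothetical sketch around the install command above; the function name `install_mpi4py_cray` is an assumption, not an existing tool.

```bash
#!/bin/bash
# Hypothetical wrapper around the mpi4py install command shown above.
# It stops early unless the Cray compiler driver `cc` is on PATH, then builds
# mpi4py from source (--no-binary) with cc as both CC and MPICC.
install_mpi4py_cray() {
  if ! command -v cc >/dev/null 2>&1; then
    echo "Cray compiler driver 'cc' not found on PATH; not building mpi4py" >&2
    return 1
  fi
  CC=cc MPICC=cc pip install mpi4py --no-binary mpi4py
}
```

Run it from the activated environment on a login node, in place of the last command above.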
More information on using conda on Theta is also available.
Installing libEnsemble and Balsam
You should get an indication that your virtual environment is activated.
Obtaining libEnsemble is now as simple as
pip install libensemble.
Your prompt should be similar to the following line:
(my_env) user@thetalogin6:~$ pip install libensemble
If you encounter pip errors, run
python -m pip install --upgrade pip first.
Alternatively, you can install libEnsemble via conda (the conda package comes with some common dependencies):
(my_env) user@thetalogin6:~$ conda config --add channels conda-forge
(my_env) user@thetalogin6:~$ conda install -c conda-forge libensemble
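Whichever installer you use, a quick import test confirms that libEnsemble (and mpi4py, if you need it) is visible to the active environment’s interpreter. The helper below is a hypothetical sketch; it assumes the interpreter is called `python3` unless the `PYTHON` variable names another one.

```bash
#!/bin/bash
# Hypothetical helper: report whether a Python module can be imported by the
# active environment's interpreter. PYTHON defaults to python3 here; set
# PYTHON=python if that is the name your environment uses.
check_python_module() {
  local module="$1"
  if "${PYTHON:-python3}" -c "import ${module}" >/dev/null 2>&1; then
    echo "${module}: importable"
  else
    echo "${module}: NOT importable" >&2
    return 1
  fi
}
```

Usage: `check_python_module libensemble` and `check_python_module mpi4py`.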
See libEnsemble’s advanced installation documentation for more information on other installation options.
Balsam allows libEnsemble to be run on compute nodes, and still submit tasks from workers (see Job Submission below). The Balsam Executor can submit tasks to the Balsam Service, which will submit these tasks dynamically to a corresponding Balsam Site.
See the Balsam Executor docs for more information.
On Theta, libEnsemble can be launched to two locations:
1. A MOM Node: All of libEnsemble’s manager and worker processes run centrally on a front-end MOM node. libEnsemble’s MPI Executor takes responsibility for direct user-application submission to allocated compute nodes. libEnsemble must be configured to run with multiprocessing communications, since mpi4py isn’t configured for use on the MOM nodes.
2. The Compute Nodes: libEnsemble is submitted to Balsam, and all manager and worker processes are tasked to a back-end compute node and run centrally. libEnsemble’s Balsam Executor interfaces with the Balsam service for dynamic user-application submission to the compute nodes.
When deciding on which nodes to run libEnsemble, consider whether your
gen_f user functions (not applications) execute computationally expensive
code, or code built specifically for the compute node architecture. Recall also
that only the MOM nodes can launch MPI applications.
Although libEnsemble workers on the MOM nodes can technically submit
user applications to the compute nodes directly via
aprun within user functions, it
is highly recommended that the executor interface described above be used instead.
The libEnsemble Executor offers advantages such as
automatic resource detection, portability, launch failure resilience, and ease of use.
Theta features one default production queue,
default, and two debug queues,
debug-cache-quad and debug-flat-quad.
For the default queue, the minimum number of nodes to allocate at once is 128.
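Given the limits stated on this page (at least 128 nodes on the default queue, and at most eight nodes per user on a debug queue), a small helper can pick a queue name from a node count. This is a hypothetical sketch: `choose_queue` is not a Cobalt or ALCF tool, and it does not check other limits such as wall time.

```bash
#!/bin/bash
# Hypothetical helper: choose a Theta queue name from the requested node count.
# Usage: choose_queue NODES [cache|flat]
choose_queue() {
  local nodes="$1" mode="${2:-cache}"
  if [ "$mode" != "cache" ] && [ "$mode" != "flat" ]; then
    echo "mode must be 'cache' or 'flat'" >&2
    return 2
  fi
  if [ "$nodes" -ge 1 ] && [ "$nodes" -le 8 ]; then
    echo "debug-${mode}-quad"   # debug queues: up to 8 nodes per user
  elif [ "$nodes" -ge 128 ]; then
    echo "default"              # default queue: at least 128 nodes
  else
    echo "no queue accepts ${nodes} nodes" >&2
    return 1
  fi
}
```

For example, `qsub -A [project] -n 4 -q "$(choose_queue 4 flat)" -t 60 -I` requests the flat-mode debug queue.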
Module and environment variables
To ensure that libEnsemble functions properly, including its ability to kill running tasks, set the following environment variable:
export PMI_NO_FORK=1
It is also recommended that the following environment modules be unloaded, if present:
module unload trackdeps
module unload darshan
module unload xalt
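These settings (the `PMI_NO_FORK=1` export, also used in the batch script example on this page, plus the three module unloads) can be bundled into one function for batch scripts or interactive sessions. The function name `prepare_theta_env` is hypothetical; the `module` check is an assumption added so the function also runs harmlessly on machines without environment modules.

```bash
#!/bin/bash
# Hypothetical helper collecting the Theta settings for libEnsemble.
prepare_theta_env() {
  # Required so that libEnsemble workers can kill running tasks.
  export PMI_NO_FORK=1
  # Unload modules that may interfere with task monitoring/kills, but only
  # when a `module` command exists in this shell.
  if command -v module >/dev/null 2>&1; then
    local m
    for m in trackdeps darshan xalt; do
      module unload "$m" 2>/dev/null || true
    done
  fi
}
```

Call `prepare_theta_env` on the MOM node before launching your calling script.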
You can run interactively with
qsub by specifying the
-I flag, similarly
to the following:
$ qsub -A [project] -n 8 -q debug-cache-quad -t 60 -I
This will place you on a MOM node. Then, to launch jobs to the compute nodes, use
aprun where you would otherwise use mpirun.
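As a rough illustration, the hypothetical helpers below only print an aprun command line (they do not run it) and count the compute nodes a launch will occupy. They use aprun’s `-n` (total processing elements) and `-N` (processing elements per node) options; the application name and rank counts are placeholders. Because Theta will not schedule more than one MPI application per compute node, the nodes counted by `nodes_needed` are unavailable to any other application for the duration of that launch.

```bash
#!/bin/bash
# Hypothetical helper: print (do not run) an aprun command that starts
# TOTAL MPI ranks with PER_NODE ranks on each compute node.
# Usage: aprun_cmd TOTAL PER_NODE APP [ARGS...]
aprun_cmd() {
  local total="$1" per_node="$2"
  shift 2
  echo "aprun -n ${total} -N ${per_node} $*"
}

# Hypothetical helper: number of whole compute nodes occupied by such a
# launch (ceiling of TOTAL / PER_NODE).
nodes_needed() {
  local total="$1" per_node="$2"
  echo $(( (total + per_node - 1) / per_node ))
}
```

For example, `aprun_cmd 128 64 ./my_app` prints `aprun -n 128 -N 64 ./my_app`, a launch that occupies two nodes.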
You will need to reactivate your conda virtual environment. Configuring this routine to occur automatically is recommended.
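One way to automate the reactivation is to define a small function in your shell startup file (for example `~/.bashrc`) that mirrors the activation steps of the batch script on this page. The function name and the default environment name `my_env` are assumptions; the default `CONDA_DIR` is the path used in the batch script example.

```bash
#!/bin/bash
# Hypothetical function for ~/.bashrc: activate a conda environment the same
# way the batch script example does (PYTHONNOUSERSITE=1, then source activate).
activate_libe_env() {
  local env_name="${1:-my_env}"
  local conda_dir="${CONDA_DIR:-/soft/datascience/conda/miniconda3/latest/bin}"
  export PYTHONNOUSERSITE=1
  if [ ! -f "${conda_dir}/activate" ]; then
    echo "no activate script in ${conda_dir}" >&2
    return 1
  fi
  # The environment name is passed to the activate script as its argument.
  source "${conda_dir}/activate" "$env_name"
}
```

After starting an interactive job, run `activate_libe_env` (or `activate_libe_env other_env`) on the MOM node.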
Batch scripts specify run settings using
#COBALT statements. The following
simple example depicts configuring and launching libEnsemble to a MOM node with
multiprocessing. This script also assumes the user is using the
convenience function from libEnsemble’s tools module.
#!/bin/bash -x
#COBALT -t 02:00:00
#COBALT -n 128
#COBALT -q default
#COBALT -A [project]
#COBALT -O libE-project
# --- Prepare Python ---
# Obtain Conda PATH from miniconda-3/latest module
CONDA_DIR=/soft/datascience/conda/miniconda3/latest/bin
# Name of conda environment
export CONDA_ENV_NAME=my_env
# Activate conda environment
export PYTHONNOUSERSITE=1
source $CONDA_DIR/activate $CONDA_ENV_NAME
# --- Prepare libEnsemble ---
# Name of calling script
export EXE=calling_script.py
# Communication Method
export COMMS="--comms local"
# Number of workers.
export NWORKERS="--nworkers 128"
# Required for killing tasks from workers on Theta
export PMI_NO_FORK=1
# Unload Theta modules that may interfere with task monitoring/kills
module unload trackdeps
module unload darshan
module unload xalt
python $EXE $COMMS $NWORKERS > out.txt 2>&1
With this saved as
myscript.sh, allocating, configuring, and queueing
libEnsemble on Theta is achieved by running
$ qsub --mode script myscript.sh
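Before queueing, a quick local check can catch a malformed script, for example one whose lines were accidentally joined so that everything after the first `#` became a comment. The hypothetical helper below runs `bash -n` as a syntax check and confirms that the `#COBALT` directives for time, node count, queue, and project are present at the start of a line; it does not validate their values.

```bash
#!/bin/bash
# Hypothetical pre-submission check for a Cobalt batch script.
# Usage: check_cobalt_script myscript.sh && qsub --mode script myscript.sh
check_cobalt_script() {
  local script="$1" opt
  if ! bash -n "$script"; then
    echo "${script}: bash syntax error" >&2
    return 1
  fi
  # Each directive must start its own line, as in the example script.
  for opt in -t -n -q -A; do
    if ! grep -q -- "^#COBALT ${opt} " "$script"; then
      echo "${script}: missing '#COBALT ${opt}' directive" >&2
      return 1
    fi
  done
  echo "${script}: OK"
}
```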
View the status of your submitted jobs with
qstat -fu [user].
Theta features two debug queues, each with sixteen nodes. Each user can allocate up to eight nodes at once for a maximum of one hour. To allocate nodes on a debug queue interactively, use
$ qsub -A [project] -n 4 -q debug-flat-quad -t 60 -I
See the ALCF Support Center for more information about Theta.
Read the Balsam documentation for more information about Balsam.