Theta
Theta is a Cray XC40 system based on the second-generation Intel Xeon Phi processor, available at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory.
Theta features three tiers of nodes: login, MOM, and compute nodes. Users on login nodes submit batch jobs to the MOM nodes, which execute user batch scripts and launch applications onto the compute nodes via aprun.
Theta will not schedule more than one MPI application per compute node.
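For illustration, launching an MPI application onto the compute nodes from a MOM node looks like the following sketch (the executable name and rank counts are placeholders):
$ aprun -n 256 -N 64 ./my_mpi_app
Here -n gives the total number of MPI ranks and -N the ranks per node, so this example spans four compute nodes.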
Configuring Python
Begin by loading the Python 3 Miniconda module:
$ module load miniconda-3/latest
Create a conda virtual environment. We recommend cloning the base environment. This environment will contain mpi4py and many other packages that are configured correctly for Theta:
$ conda create --name my_env --clone $CONDA_PREFIX
Note
The “executing transaction” step of creating your new environment may take a while!
Following a successful environment creation, the prompt will suggest activating the new environment immediately. Doing so may produce a conda error; if it does, follow the on-screen instructions to configure your shell for conda.
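Typically this means running conda's shell setup and re-sourcing your shell configuration, along the lines of the sketch below; the exact commands printed by conda take precedence:
$ conda init bash
$ source ~/.bashrc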
Activate your virtual environment with
$ export PYTHONNOUSERSITE=1
$ conda activate my_env
Alternative
If you do not wish to clone the miniconda environment and instead create your own, and you are using mpi4py, make sure the installation picks up Cray's compiler drivers, e.g.:
$ conda create --name my_env python=3.7
$ export PYTHONNOUSERSITE=1
$ conda activate my_env
$ CC=cc MPICC=cc pip install mpi4py --no-binary mpi4py
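As an optional sanity check that mpi4py was built against the Cray MPI stack, you can launch a couple of ranks from a MOM node within an interactive allocation (see Interactive Runs below); the two-rank count here is arbitrary:
$ aprun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"
Each rank should print its rank number (0 and 1).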
More information on using conda on Theta is also available.
Installing libEnsemble and Balsam
libEnsemble
You should get an indication that your virtual environment is activated.
Obtaining libEnsemble is now as simple as pip install libensemble.
Your prompt should be similar to the following line:
(my_env) user@thetalogin6:~$ pip install libensemble
Note
If you encounter pip errors, run python -m pip install --upgrade pip
first.
Or, you can install via conda
(which comes with some common dependencies):
(my_env) user@thetalogin6:~$ conda config --add channels conda-forge
(my_env) user@thetalogin6:~$ conda install -c conda-forge libensemble
See here for more information on advanced options for installing libEnsemble.
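To confirm that libEnsemble is visible in your activated environment, a quick check such as the following can be used:
(my_env) user@thetalogin6:~$ python -c "import libensemble; print(libensemble.__version__)"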
Balsam (Optional)
Balsam allows libEnsemble itself to run on the compute nodes while still submitting tasks from workers (see Job Submission below). The Balsam Executor stages tasks into a database hosted on a MOM node, which submits these tasks dynamically to the compute nodes.
Balsam can be installed with:
pip install balsam-flow
Initialize a Balsam database at a location of your choice, e.g.:
balsam init ~/myWorkflow
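The database can then be activated and later deactivated from a shell or batch script (assuming the database was initialized at ~/myWorkflow as above):
$ source balsamactivate myWorkflow
$ source balsamdeactivate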
Further notes on using Balsam:
- Call balsamactivate in the batch script (see below). Make sure no active postgres databases are running on either login or MOM nodes before calling qsub. You can check with the script ps_nodes.
- Balsam requires PostgreSQL version 9.6.4 or later, but problems may be encountered when using the default pg_ctl and the PostgreSQL 10.12 installation in /usr/bin. This may be resolved by loading the postgresql/9.6.12 module within submission scripts that use Balsam.
- By default there is a maximum of 128 concurrent database connections. Each worker uses a connection, and a few extra are needed. Increase the number of connections by appending a new max_connections= line to balsamdb/postgresql.conf in the database directory, e.g. max_connections=1024 (see the example command after this list).
- There is a Balsam module available (balsam/0.3.8), but the module's Python installation supersedes others when loaded. In practice, libEnsemble or other Python packages installed into another environment become inaccessible. Installing Balsam into a separate Python virtual environment is recommended instead.
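For example, to raise the connection limit for a database initialized at ~/myWorkflow (the path is a placeholder for your own database location):
$ echo "max_connections=1024" >> ~/myWorkflow/balsamdb/postgresql.conf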
Read Balsam’s documentation here.
Note
Balsam creates run-specific directories inside data/my_workflow in the database directory, for example $HOME/my_balsam_db/data/libe_workflow/job_run_libe_forces_b7073fa9/. From here, files can be staged out (see the example batch script below).
Job Submission
On Theta, libEnsemble can be launched to two locations:
1. A MOM Node: All of libEnsemble’s manager and worker processes run centrally on a front-end MOM node. libEnsemble’s MPI Executor takes responsibility for direct user-application submission to allocated compute nodes. libEnsemble must be configured to run with multiprocessing communications, since mpi4py isn’t configured for use on the MOM nodes.
2. The Compute Nodes: libEnsemble is submitted to Balsam, and all manager and worker processes are tasked to a back-end compute node and run centrally. libEnsemble’s Balsam Executor interfaces with Balsam running on a MOM node for dynamic user-application submission to the compute nodes.
When deciding on which nodes to run libEnsemble, consider whether your sim_f
or gen_f
user functions (not applications) execute computationally expensive
code, or code built specifically for the compute node architecture. Recall also
that only the MOM nodes can launch MPI applications.
Although libEnsemble workers on the MOM nodes can technically submit
user applications to the compute nodes directly via aprun
within user functions, it
is highly recommended that the aforementioned executor
interface be used instead. The libEnsemble Executor features advantages such as
automatic resource detection, portability, launch failure resilience, and ease of use.
Theta features one default production queue, default, and two debug queues, debug-cache-quad and debug-flat-quad.
Note
For the default queue, the minimum number of nodes to allocate at once is 128.
Module and Environment Variables
In order to ensure proper functioning of libEnsemble, including the ability to kill running tasks, the following environment variable should be set:
export PMI_NO_FORK=1
It is also recommended that the following environment modules be unloaded, if present:
module unload trackdeps
module unload darshan
module unload xalt
Interactive Runs
You can run interactively with qsub by specifying the -I flag, similar to the following:
$ qsub -A [project] -n 8 -q debug-cache-quad -t 60 -I
This will place you on a MOM node. Then, to launch jobs to the compute nodes, use aprun where you would use mpirun.
Note
You will need to reactivate your conda virtual environment, reactivate your Balsam database (if using Balsam), and reload your modules. Configuring this routine to occur automatically is recommended.
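For example, once the environment has been reactivated on the MOM node, a libEnsemble run using multiprocessing (local) communications can be started directly; the script name and worker count below are placeholders and assume the parse_args() convention used in the batch example that follows:
$ python calling_script.py --comms local --nworkers 4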
Batch Runs
Batch scripts specify run settings using #COBALT
statements. The following
simple example depicts configuring and launching libEnsemble to a MOM node with
multiprocessing. This script also assumes the user is using the parse_args()
convenience function from libEnsemble’s tools module.
#!/bin/bash -x
#COBALT -t 02:00:00
#COBALT -n 128
#COBALT -q default
#COBALT -A [project]
#COBALT -O libE-project
# --- Prepare Python ---
# Obtain Conda PATH from miniconda-3/latest module
CONDA_DIR=/soft/datascience/conda/miniconda3/latest/bin
# Name of conda environment
export CONDA_ENV_NAME=my_env
# Activate conda environment
export PYTHONNOUSERSITE=1
source $CONDA_DIR/activate $CONDA_ENV_NAME
# --- Prepare libEnsemble ---
# Name of calling script
export EXE=calling_script.py
# Communication Method
export COMMS='--comms local'
# Number of workers.
export NWORKERS='--nworkers 128'
# Required for killing tasks from workers on Theta
export PMI_NO_FORK=1
# Unload Theta modules that may interfere with task monitoring/kills
module unload trackdeps
module unload darshan
module unload xalt
python $EXE $COMMS $NWORKERS > out.txt 2>&1
With this saved as myscript.sh
, allocating, configuring, and queueing
libEnsemble on Theta is achieved by running
$ qsub --mode script myscript.sh
Balsam Runs
Here is an example Balsam submission script. It requires a pre-initialized (but not activated) PostgreSQL database. Note that the example runs libEnsemble over two dedicated nodes, reserving the other 127 nodes for launched applications. libEnsemble is run with MPI on 128 processors (one manager and 127 workers):
#!/bin/bash -x
#COBALT -t 60
#COBALT -O libE_test
#COBALT -n 129
#COBALT -q default
#COBALT -A [project]
# Name of calling script
export EXE=calling_script.py
# Number of workers.
export NUM_WORKERS=127
# Number of nodes to run libE
export LIBE_NODES=2
# Wall-clock for entire libE run (supplied to Balsam)
export LIBE_WALLCLOCK=45
# Name of working directory where Balsam places running jobs/output
export WORKFLOW_NAME=libe_workflow
# If user script takes ``wallclock_max`` argument.
# export SCRIPT_ARGS=$(($LIBE_WALLCLOCK-3))
export SCRIPT_ARGS=""
# Name of conda environment
export CONDA_ENV_NAME=my_env
export BALSAM_DB_NAME=myWorkflow
# Required for killing tasks from workers on Theta
export PMI_NO_FORK=1
# Unload Theta modules that may interfere with task monitoring/kills
module unload trackdeps
module unload darshan
module unload xalt
# Obtain Conda PATH from miniconda-3/latest module
CONDA_DIR=/soft/datascience/conda/miniconda3/latest/bin
# Ensure environment is isolated
export PYTHONNOUSERSITE=1
# Activate conda environment
source $CONDA_DIR/activate $CONDA_ENV_NAME
# Activate Balsam database
source balsamactivate $BALSAM_DB_NAME
# Currently need at least one DB connection per worker (for postgres).
if [[ $NUM_WORKERS -gt 100 ]]
then
# Add a margin
export BALSAM_DB_PATH=~/$BALSAM_DB_NAME # Path to the database (adjust if not in your home directory)
echo -e "max_connections=$(($NUM_WORKERS+20)) # Appended by submission script" \
>> $BALSAM_DB_PATH/balsamdb/postgresql.conf
fi
wait
# Make sure no existing apps/jobs
balsam rm apps --all --force
balsam rm jobs --all --force
wait
sleep 3
# Add calling script to Balsam database as app and job.
export THIS_DIR=$PWD
export SCRIPT_BASENAME=${EXE%.*}
export LIBE_PROCS=$((NUM_WORKERS+1)) # Manager and workers
export PROCS_PER_NODE=$((LIBE_PROCS/LIBE_NODES)) # Must divide evenly
balsam app --name $SCRIPT_BASENAME.app --exec $EXE --desc "Run $SCRIPT_BASENAME"
balsam job --name job_$SCRIPT_BASENAME --workflow $WORKFLOW_NAME \
--application $SCRIPT_BASENAME.app --args $SCRIPT_ARGS \
--wall-time-minutes $LIBE_WALLCLOCK \
--num-nodes $LIBE_NODES --ranks-per-node $PROCS_PER_NODE \
--url-out="local:/$THIS_DIR" --stage-out-files="*.out *.txt *.log" \
--url-in="local:/$THIS_DIR/*" --yes
# Run job
balsam launcher --consume-all --job-mode=mpi --num-transition-threads=1
wait
source balsamdeactivate
Further examples of Balsam submission scripts can be found in the examples.
Debugging Strategies
View the status of your submitted jobs with qstat -fu [user].
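To remove a queued or running job, Cobalt's qdel command can be used with the job ID reported by qstat:
$ qdel [jobid]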
Theta features two debug queues, each with sixteen nodes. Each user can allocate up to eight nodes at once for a maximum of one hour. To allocate nodes on a debug queue interactively, use
$ qsub -A [project] -n 4 -q debug-flat-quad -t 60 -I
Additional Information
See the ALCF Support Center for more information about Theta.
Read the documentation for Balsam here.