Running libEnsemble
Introduction
libEnsemble runs with one manager and multiple workers. Each worker may run either a generator or simulator function (both are Python functions). Generators determine the parameters/inputs for simulations. Simulator functions run and manage simulations, which often involve running a user application (see Executor).
Note
As of version 1.3.0, the generator can be run as a thread on the manager, using the libE_specs option gen_on_manager. When using this option, set the number of workers desired for running simulations. See Running generator on the manager for more details.
To use libEnsemble, you will need a calling script, which in turn will specify generator and simulator functions. Many examples are available.
There are currently three communication options for libEnsemble (determining how the Manager and Workers communicate): local, mpi, and tcp. The default is local if nworkers is specified, otherwise mpi.
Note that local
comms can be used on multi-node systems, where
the MPI executor is used to distribute MPI applications
across the nodes. Indeed, this is the most commonly used option, even on large
supercomputers.
Note
You do not need the mpi
communication mode to use the
MPI Executor. The communication modes described
here only refer to how the libEnsemble manager and workers communicate.
Local mode

Uses Python's built-in multiprocessing module. The comms type local and the number of workers nworkers may be provided in libE_specs.
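For example, a minimal calling script might look like the following sketch, which uses the uniform_random_sample generator and six_hump_camel simulator bundled with libEnsemble (substitute your own user functions and options as needed):

import numpy as np

from libensemble.libE import libE
from libensemble.gen_funcs.sampling import uniform_random_sample
from libensemble.sim_funcs.six_hump_camel import six_hump_camel
from libensemble.tools import add_unique_random_streams

nworkers = 4
libE_specs = {"comms": "local", "nworkers": nworkers}

# The generator produces random sample points; the simulator evaluates them.
gen_specs = {
    "gen_f": uniform_random_sample,
    "out": [("x", float, (2,))],
    "user": {"gen_batch_size": 50, "lb": np.array([-3, -2]), "ub": np.array([3, 2])},
}
sim_specs = {"sim_f": six_hump_camel, "in": ["x"], "out": [("f", float)]}

exit_criteria = {"sim_max": 100}
persis_info = add_unique_random_streams({}, nworkers + 1)

H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, persis_info, libE_specs=libE_specs)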
Then run:
python myscript.py
Or, if the script uses the parse_args function or an Ensemble object with Ensemble(parse_args=True), you can specify these on the command line:
python myscript.py --comms local --nworkers N
This will launch one manager and N
workers.
The following abbreviated line is equivalent to the above:
python myscript.py -n N
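As a sketch, the same setup written with the Ensemble class, which reads --comms and --nworkers (or -n) from the command line when parse_args=True, might look like:

import numpy as np

from libensemble import Ensemble
from libensemble.specs import ExitCriteria, GenSpecs, SimSpecs
from libensemble.gen_funcs.sampling import uniform_random_sample
from libensemble.sim_funcs.six_hump_camel import six_hump_camel

ensemble = Ensemble(parse_args=True)   # comms and nworkers taken from the command line
ensemble.sim_specs = SimSpecs(sim_f=six_hump_camel, inputs=["x"], outputs=[("f", float)])
ensemble.gen_specs = GenSpecs(
    gen_f=uniform_random_sample,
    outputs=[("x", float, (2,))],
    user={"gen_batch_size": 50, "lb": np.array([-3, -2]), "ub": np.array([3, 2])},
)
ensemble.exit_criteria = ExitCriteria(sim_max=100)
ensemble.add_random_streams()          # seeds the persis_info random streams
H, persis_info, flag = ensemble.run()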
libEnsemble will run on one node in this scenario. To disallow app launches on this node (if running libEnsemble on a compute node), set libE_specs["dedicated_mode"] = True.
This mode can also be used to run on a launch node of a three-tier
system (e.g., Summit), ensuring the whole compute-node allocation is available for
launching apps. Make sure there are no imports of mpi4py
in your Python scripts.
Note that on macOS (since Python 3.8) and Windows, the default multiprocessing method is "spawn" instead of "fork"; to resolve many related issues, we recommend placing calling script code in an if __name__ == "__main__": block.
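A minimal sketch of this structure (the specs setup would be as in the earlier example):

# Sketch: structure the calling script so that processes started with the
# "spawn" method (the default on macOS and Windows) do not re-run the
# calling code when they import this module.
from libensemble.libE import libE


def main():
    # Define sim_specs, gen_specs, exit_criteria, persis_info, and libE_specs
    # here (as in the earlier example), then run the ensemble:
    # H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria,
    #                             persis_info, libE_specs=libE_specs)
    pass


if __name__ == "__main__":
    main()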
Limitations of local mode
- Workers cannot be distributed across nodes.
- In some scenarios, any import of mpi4py will cause this to break.
- Does not have the potential scaling of MPI mode, but is sufficient for most users.
MPI mode

This option uses mpi4py for the Manager/Worker communication. It is used automatically if you run your libEnsemble calling script with an MPI runner such as:
mpirun -np N python myscript.py
where N is the number of processes. This will launch one manager and N-1 workers.
This option requires mpi4py
to be installed to interface with the MPI on your system.
It works on a standalone system, and with both
central and distributed modes of running libEnsemble on
multi-node systems.
It also potentially scales the best when running with many workers on HPC systems.
Limitations of MPI mode
- If launching MPI applications from workers, then MPI is nested. This is not supported with Open MPI, but can be overcome by using a proxy launcher (see Balsam). This nesting does work with MPICH and its derivative MPI implementations.
- It is unsuitable to use this mode when running on the launch nodes of three-tier systems (e.g., Summit). In that case, local mode is recommended.
TCP mode

Run the Manager on one system and launch workers to remote systems or nodes over TCP. Configure through libE_specs, or on the command line if using an Ensemble object with Ensemble(parse_args=True).
Reverse-ssh interface
Set comms to ssh to launch workers on remote ssh-accessible systems. This co-locates workers, functions, and any applications. User functions can also be persistent, unlike when launching remote functions via Globus Compute.
The remote working directory and Python need to be specified. This may resemble:
python myscript.py --comms ssh --workers machine1 machine2 --worker_pwd /home/workers --worker_python /home/.conda/.../python
Limitations of TCP mode
There cannot be two calls to libE() or Ensemble.run() in the same script.
Further Command Line Options
See the parse_args
function in Convenience Tools for
further command line options.
Persistent Workers
In a regular (non-persistent) worker, the user’s generator or simulation function is called whenever the worker receives work. A persistent worker is one that continues to run the generator or simulation function between work units, maintaining the local data environment.
A common use-case consists of a persistent generator (such as persistent_aposmm) that maintains optimization data while generating new simulation inputs. The persistent generator runs on a dedicated worker while in persistent mode. This requires an appropriate allocation function that will run the generator as persistent.
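For example, a sketch of a calling script that runs a persistent generator on a dedicated worker, using the only_persistent_gens allocation function and the simple persistent_uniform sampling generator bundled with libEnsemble (persistent_aposmm would be configured similarly, with its own gen_specs["user"] options):

import numpy as np

from libensemble.libE import libE
from libensemble.gen_funcs.persistent_sampling import persistent_uniform
from libensemble.sim_funcs.six_hump_camel import six_hump_camel
from libensemble.alloc_funcs.start_only_persistent import only_persistent_gens
from libensemble.tools import parse_args, add_unique_random_streams

nworkers, is_manager, libE_specs, _ = parse_args()

sim_specs = {"sim_f": six_hump_camel, "in": ["x"], "out": [("f", float)]}

gen_specs = {
    "gen_f": persistent_uniform,
    "persis_in": ["f", "x", "sim_id"],   # fields sent back to the persistent generator
    "out": [("x", float, (2,))],
    "user": {"initial_batch_size": nworkers - 1, "lb": np.array([-3, -2]), "ub": np.array([3, 2])},
}

# This allocation function starts the generator in persistent mode on one worker.
alloc_specs = {"alloc_f": only_persistent_gens}

persis_info = add_unique_random_streams({}, nworkers + 1)
exit_criteria = {"sim_max": 100}

H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, persis_info,
                            alloc_specs=alloc_specs, libE_specs=libE_specs)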
When running with a persistent generator, it is important to remember that a worker will be dedicated to the generator and cannot run simulations. For example, the following run:
mpirun -np 3 python my_script.py
starts one manager, one worker with a persistent generator, and one worker for running simulations.
If this example were run as:
mpirun -np 2 python my_script.py
no simulations would be able to run, since the only worker would be dedicated to the persistent generator.
Running generator on the manager
The majority of libEnsemble use cases run a single generator. The libE_specs option gen_on_manager will cause the generator function to run on a thread on the manager. This can run persistent user functions, sharing data structures with the manager, and avoids additional communication to a generator running on a worker. When using this option, the number of workers specified should be the (maximum) number of concurrent simulations.
If modifying a workflow to use gen_on_manager, consider the following:
- Set nworkers to the number of workers desired for running simulations.
- If using add_unique_random_streams() to seed random streams, the default generator seed will be zero.
- If you have a line like libE_specs["nresource_sets"] = nworkers - 1, this line should be removed.
- If the generator does use resources, nresource_sets can be increased as needed so that the generator and all simulations are resourced.
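As a sketch, the relevant libE_specs settings for a workflow with four concurrent simulations might be:

# Sketch: run the generator as a thread on the manager.
# nworkers now counts only the workers used for simulations.
libE_specs = {
    "comms": "local",
    "nworkers": 4,             # maximum number of concurrent simulations
    "gen_on_manager": True,    # generator runs on a thread on the manager
}
# Note there is no need to reserve a worker (or subtract one from the
# resource-set count) for the generator in this mode.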
Environment Variables
Environment variables required in your run environment can be set in your Python sim or gen function. For example:
os.environ["OMP_NUM_THREADS"] = "4"
set in your simulation script before the Executor submit command will export the setting
to your run. For running a bash script in a sub environment when using the Executor, see
the env_script
option to the MPI Executor.
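For example, a simulator function might set the variable just before submitting a task, as in the following sketch (it assumes an application was registered with the MPI Executor under the name "my_app" in the calling script):

import os

import numpy as np

from libensemble.executors.executor import Executor


def my_sim(H, persis_info, sim_specs, libE_info):
    # Exported to the environment of the submitted run.
    os.environ["OMP_NUM_THREADS"] = "4"

    exctr = Executor.executor                # executor instance created in the calling script
    task = exctr.submit(app_name="my_app", num_procs=4, app_args="input.txt")
    task.wait()                              # wait for the launched application to finish

    H_out = np.zeros(1, dtype=sim_specs["out"])
    # ... read the application output and fill H_out["f"] here ...
    return H_out, persis_info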
Further Run Information
For running on multi-node platforms and supercomputers, there are alternative ways to configure how libEnsemble maps to the available resources. See the Running on HPC Systems guide for more information, including some examples for specific systems.