Introduction to libEnsemble
libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations.
libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems and expands the class of problems that can benefit from increased parallelism.
libEnsemble aims for:
Extreme scaling: Run on or across laptops, clusters, and leadership-class machines.
Dynamic Ensembles: Generate new tasks on-the-fly based on previous computations.
Dynamic Resource Management: Reassign resource partitions of any size for tasks.
Monitoring/killing of applications: Ensemble members can poll or kill running apps.
Resilience/fault tolerance: libEnsemble can restart incomplete tasks or entire ensembles.
Portability and flexibility: Run identical libEnsemble scripts on different machines.
Exploitation of persistent data/control flow: libEnsemble can pass data between ensemble members.
Low start-up cost: Default single-machine deployments don’t require additional services.
libEnsemble’s users select or supply generator and simulator Python functions; these respectively produce candidate parameters and perform/monitor computations that use those parameters. Generator functions can train models, perform optimizations, and test candidate solutions in a batch or streaming fashion based on simulation results. Simulator functions can themselves use parallel resources and involve libraries or executables that are not written in Python.
With basic familiarity with Python and NumPy, users can easily incorporate any other mathematics, machine-learning, or resource-management libraries into libEnsemble workflows.
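To illustrate the interface, here is a minimal simulator-function sketch; the 2-norm objective and the name norm_sim_f are purely illustrative. It receives a NumPy structured array of generated points and returns an output array whose fields match those declared in sim_specs:

import numpy as np

def norm_sim_f(H, persis_info, sim_specs, libE_info):
    """Illustrative simulator: evaluate the 2-norm of each generated point."""
    batch = len(H["x"])                              # points received from the generator
    H_out = np.zeros(batch, dtype=sim_specs["out"])  # output fields declared in sim_specs["out"]
    H_out["f"] = np.linalg.norm(H["x"], axis=1)      # hypothetical objective value
    return H_out, persis_info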
libEnsemble employs a manager/worker scheme that communicates via MPI, multiprocessing, or TCP. Workers control and monitor any level of work using the aforementioned generator and simulator functions, from small subnode tasks to huge many-node computations.
libEnsemble includes an Executor interface so application-launching functions are portable, resilient, and flexible; it also automatically detects available nodes and cores, and can dynamically assign resources to workers.
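As a sketch of this pattern (the application path, name, and placeholder result below are hypothetical, not part of libEnsemble), the calling script registers a compiled application with an MPIExecutor, and a simulator function submits it as a task that can be polled or killed while it runs:

import time
import numpy as np
from libensemble.executors.mpi_executor import MPIExecutor

# In the calling script: create an Executor and register a compiled application.
exctr = MPIExecutor()
exctr.register_app(full_path="/path/to/my_app", app_name="my_app")  # hypothetical application

# In a simulator function: submit the registered application and monitor the task.
def run_app_sim_f(H, persis_info, sim_specs, libE_info):
    from libensemble.executors.executor import Executor
    exctr = Executor.executor  # the Executor instantiated in the calling script
    task = exctr.submit(app_name="my_app", num_procs=4, app_args="input.txt")
    while not task.finished:   # poll the running application; task.kill() would stop it
        time.sleep(1)
        task.poll()
    H_out = np.zeros(len(H["x"]), dtype=sim_specs["out"])
    H_out["f"] = 0.0 if task.state == "FINISHED" else np.nan  # placeholder result
    return H_out, persis_info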
libEnsemble performs best on Unix-like systems such as Linux and macOS. See the FAQ for more information.
Dependencies
Required dependencies:
Python 3.7 or above
When using mpi4py for libEnsemble communications:
A functional MPI 1.x/2.x/3.x implementation, such as MPICH, built with shared/dynamic libraries
mpi4py v2.0.0 or above
Optional dependencies:
As of v0.9.0, libEnsemble features an updated Balsam Executor with which workers can schedule and launch applications anywhere a Balsam site is running, including on remote machines.
libEnsemble is typically configured and parameterized via Python dictionaries. libEnsemble can also be parameterized via YAML.
As of v0.9.0, libEnsemble features a cross-system capability powered by funcX, a function-as-a-service platform to which workers can submit remote generator or simulator function instances. This feature can help distribute an ensemble across systems and heterogeneous resources.
As of v0.9.3, libEnsemble features a set of command-line utilities for submitting libEnsemble jobs to almost any system or scheduler via the PSI/J Python interface; these utilities also require tqdm.
The example simulation and generation functions and tests require the following:
PETSc/TAO - Can optionally be installed by pip along with petsc4py
PETSc and NLopt must be built with shared libraries enabled and be present in sys.path (e.g., via setting the PYTHONPATH environment variable). NLopt should produce a file nlopt.py if Python is found on the system. See the NLopt documentation for information about building NLopt with shared libraries. NLopt may also require SWIG to be installed on certain systems.
Installation
libEnsemble can be installed or accessed from a variety of sources.
Install libEnsemble and its dependencies from PyPI using pip:
pip install libensemble
Install libEnsemble with Conda from the conda-forge channel:
conda config --add channels conda-forge
conda install -c conda-forge libensemble
Install libEnsemble using the Spack distribution:
spack install py-libensemble
libEnsemble is included in the xSDK Extreme-scale Scientific Software Development Kit from xSDK version 0.5.0 onward. Install the xSDK and load the environment with:
spack install xsdk
spack load -r xsdk
The codebase, tests and examples can be accessed in the GitHub repository. If necessary, you may install all optional dependencies (listed above) at once with:
pip install libensemble[extras]
A tarball of the most recent release is also available.
Testing
The provided test suite includes both unit and regression tests and is run regularly via continuous integration.
The test suite requires the mock, pytest, pytest-cov, and pytest-timeout packages to be installed and can be run from the libensemble/tests directory of the source distribution by running:
./run-tests.sh
Further options are available. To see a complete list of options, run:
./run-tests.sh -h
The regression tests also serve as good example libEnsemble scripts and can be run directly in libensemble/tests/regression_tests. For example:
cd libensemble/tests/regression_tests
python test_uniform_sampling.py --comms local --nworkers 3
The libensemble/tests/scaling_tests directory includes example scripts that use the Executor to run compiled applications. These are tested regularly on HPC systems.
If you have the libEnsemble source code, you can download (but not install) the testing prerequisites and run the tests with:
python setup.py test
in the top-level directory containing the setup script.
Coverage reports are produced separately for unit tests and regression tests under the relevant directories. For parallel tests, the coverage union over all processes is taken. Furthermore, a combined coverage report is created at the top level, which can be viewed at libensemble/tests/cov_merge/index.html after run-tests.sh is completed. The coverage results are available online at Coveralls.
Basic Usage
The default manager/worker communications mode is MPI. The user script is launched as:
mpiexec -np N python myscript.py
where N is the number of processors. This will launch one manager and N-1 workers.
If running in local mode, which uses Python’s multiprocessing module, the local comms option and the number of workers must be specified, either in libE_specs or via the command line using the parse_args() function. The script can then be run as a regular Python script:
python myscript.py --comms local --nworkers N
This will launch one manager and N workers.
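For reference, a minimal myscript.py of this form might look like the following sketch, which uses the uniform_random_sample generator and six_hump_camel simulator shipped with libEnsemble; the batch size, bounds, and stopping criterion are illustrative choices:

import numpy as np
from libensemble.libE import libE
from libensemble.tools import parse_args, add_unique_random_streams
from libensemble.gen_funcs.sampling import uniform_random_sample
from libensemble.sim_funcs.six_hump_camel import six_hump_camel

nworkers, is_manager, libE_specs, _ = parse_args()  # reads --comms and --nworkers

sim_specs = {"sim_f": six_hump_camel, "in": ["x"], "out": [("f", float)]}
gen_specs = {
    "gen_f": uniform_random_sample,
    "out": [("x", float, (2,))],
    "user": {"gen_batch_size": 50, "lb": np.array([-3.0, -2.0]), "ub": np.array([3.0, 2.0])},
}
persis_info = add_unique_random_streams({}, nworkers + 1)
exit_criteria = {"sim_max": 100}  # stop after 100 simulation evaluations

H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria,
                            persis_info, libE_specs=libE_specs)

if is_manager:
    print(H[["x", "f"]][:5])  # inspect the first few results on the manager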
See the user guide for more information.
Resources
Support:
Email questions or request libEnsemble Slack page access from libEnsemble@lists.mcs.anl.gov.
Open issues on GitHub.
Join the libEnsemble mailing list for updates about new releases.
Further Information:
Documentation is provided by ReadtheDocs.
Contributions to libEnsemble are welcome.
An overview of libEnsemble’s structure and capabilities is given in this manuscript and poster.
Examples of production user functions and complete workflows can be viewed, downloaded, and contributed to in the libEnsemble Community Examples repository.
Citation:
Please use the following to cite libEnsemble:
@techreport{libEnsemble,
title = {{libEnsemble} Users Manual},
author = {Stephen Hudson and Jeffrey Larson and Stefan M. Wild and
David Bindel and John-Luke Navarro},
institution = {Argonne National Laboratory},
number = {Revision 0.9.3},
year = {2022},
url = {https://buildmedia.readthedocs.org/media/pdf/libensemble/latest/libensemble.pdf}
}
@article{Hudson2022,
title = {{libEnsemble}: A Library to Coordinate the Concurrent
Evaluation of Dynamic Ensembles of Calculations},
author = {Stephen Hudson and Jeffrey Larson and John-Luke Navarro and Stefan Wild},
journal = {{IEEE} Transactions on Parallel and Distributed Systems},
volume = {33},
number = {4},
pages = {977--988},
year = {2022},
doi = {10.1109/tpds.2021.3082815}
}
Example Compatible Packages
libEnsemble and the Community Examples repository include example generator functions for the following libraries:
APOSMM Asynchronously parallel optimization solver for finding multiple minima. Supported local optimization routines include:
DFO-LS Derivative-free solver for (bound constrained) nonlinear least-squares minimization
NLopt Library for nonlinear optimization, providing a common interface for various methods
scipy.optimize Open-source solvers for nonlinear problems, linear programming, constrained and nonlinear least-squares, root finding, and curve fitting.
PETSc/TAO Routines for the scalable (parallel) solution of scientific applications
DEAP Distributed evolutionary algorithms
Distributed optimization methods for minimizing sums of convex functions. Methods include:
Primal-dual sliding (https://arxiv.org/pdf/2101.00143).
Distributed gradient descent with gradient tracking (https://arxiv.org/abs/1908.11444).
Proximal sliding (https://arxiv.org/abs/1406.0919).
ECNoise Estimating Computational Noise in Numerical Simulations
Surmise Modular Bayesian calibration/inference framework
Tasmanian Toolkit for Adaptive Stochastic Modeling and Non-Intrusive ApproximatioN
VTMOP Fortran package for large-scale multiobjective multidisciplinary design optimization
libEnsemble has also been used to coordinate many computationally expensive simulations. Select examples include:
OPAL Object Oriented Parallel Accelerator Library. (See this IPAC manuscript.)
WarpX Advanced electromagnetic particle-in-cell code. (See example WarpX + libE scripts.)
See a complete list of example user scripts.