Introduction to libEnsemble

libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations.

libEnsemble can help users take advantage of massively parallel resources to solve design, decision, and inference problems and expand the class of problems that can benefit from increased parallelism.

libEnsemble aims for:

  • Extreme scaling: Run on or across laptops, clusters, and leadership-class machines.

  • Dynamic Ensembles: Generate new tasks on the fly based on previous computations.

  • Dynamic Resource Management: Reassign resource partitions of any size for tasks.

  • Monitoring/killing of applications: Ensemble members can poll or kill running apps.

  • Resilience/fault tolerance: libEnsemble can restart incomplete tasks or entire ensembles.

  • Portability and flexibility: Run identical libEnsemble scripts on different machines.

  • Exploitation of persistent data/control flow: libEnsemble can pass data between ensemble members.

  • Low start-up cost: Default single-machine deployments don’t require additional services.

libEnsemble’s users select or supply generator and simulator Python functions; these respectively produce candidate parameters and perform/monitor computations that use those parameters. Generator functions can train models, perform optimizations, and test candidate solutions in a batch or streaming fashion based on simulation results. Simulator functions can themselves use parallel resources and involve libraries or executables that are not written in Python.

Users with a basic familiarity with Python and NumPy can easily incorporate any other mathematics, machine-learning, or resource-management libraries into libEnsemble workflows.
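As a concrete illustration, a simulator function receives a NumPy structured array of candidate parameters and returns a structured array of results. The sketch below follows the documented (H, persis_info, sim_specs, libE_info) calling convention; the quadratic objective and the field names "x" and "f" are illustrative assumptions, not a prescribed interface:

```python
import numpy as np

def sim_f(H, persis_info, sim_specs, libE_info):
    """Toy simulator: evaluate a quadratic at each input point."""
    H_out = np.zeros(len(H), dtype=sim_specs["out"])
    H_out["f"] = np.sum(H["x"] ** 2, axis=1)  # illustrative objective
    return H_out, persis_info

# Stand-alone check with hand-built inputs (normally libEnsemble constructs these):
sim_specs = {"out": [("f", float)]}
H = np.zeros(2, dtype=[("x", float, (2,))])
H["x"] = [[1.0, 2.0], [3.0, 4.0]]
H_out, _ = sim_f(H, {}, sim_specs, {})
print(H_out["f"])  # [ 5. 25.]
```

In a real workflow the simulator would typically launch or monitor an external application rather than compute the result inline.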

libEnsemble employs a manager/worker scheme that runs on MPI, multiprocessing, or TCP. Workers control and monitor any level of work using the aforementioned generator and simulator functions, from small subnode tasks to huge many-node computations.

libEnsemble includes an Executor interface so application-launching functions are portable, resilient, and flexible; it also automatically detects available nodes and cores, and can dynamically assign resources to workers.
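In a calling script, the generator, simulator, and stopping rule are typically wired together through specification dictionaries that are passed to libEnsemble's main routine. The sketch below shows only the shape of these dictionaries; the placeholder functions, batch size, and sim_max value are assumptions for illustration:

```python
import numpy as np

def gen_f(H, persis_info, gen_specs, libE_info):
    """Placeholder generator: sample uniformly in [0, 1)^2."""
    batch = gen_specs["user"]["batch_size"]
    H_out = np.zeros(batch, dtype=gen_specs["out"])
    H_out["x"] = persis_info["rng"].uniform(0, 1, (batch, 2))
    return H_out, persis_info

def sim_f(H, persis_info, sim_specs, libE_info):
    """Placeholder simulator: sum of squares."""
    H_out = np.zeros(len(H), dtype=sim_specs["out"])
    H_out["f"] = np.sum(H["x"] ** 2, axis=1)
    return H_out, persis_info

gen_specs = {
    "gen_f": gen_f,               # generator function
    "out": [("x", float, (2,))],  # fields the generator produces
    "user": {"batch_size": 4},    # user-defined options (assumed)
}
sim_specs = {
    "sim_f": sim_f,               # simulator function
    "in": ["x"],                  # fields the simulator reads
    "out": [("f", float)],        # fields the simulator produces
}
exit_criteria = {"sim_max": 100}  # stop after 100 simulations (assumed)

# These dictionaries would normally be passed to libEnsemble's main routine;
# here we just exercise the two functions directly.
persis_info = {"rng": np.random.default_rng(0)}
H_gen, persis_info = gen_f(None, persis_info, gen_specs, {})
H_sim, _ = sim_f(H_gen, persis_info, sim_specs, {})
print(H_sim["f"].shape)  # (4,)
```

In an actual run, the manager repeatedly routes generator output to workers running the simulator until an exit criterion is met.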

Dependencies

Required dependencies:

When using mpi4py for libEnsemble communications:

  • A functional MPI 1.x/2.x/3.x implementation, such as MPICH, built with shared/dynamic libraries

  • mpi4py v2.0.0 or above

Optional dependencies:

As of v0.9.0, libEnsemble features an updated Balsam Executor with which workers can schedule and launch applications anywhere a Balsam site is running, including on remote machines.

libEnsemble is typically configured and parameterized via Python dictionaries. As of v0.8.0, libEnsemble can also be parameterized via YAML.

As of v0.9.0, libEnsemble features a cross-system capability powered by funcX, a function-as-a-service platform to which workers can submit remote generator or simulator function instances. This feature can help distribute an ensemble across systems and heterogeneous resources.

The example simulation and generation functions and tests require the following:

PETSc and NLopt must be built with shared libraries enabled and be present in sys.path (e.g., by setting the PYTHONPATH environment variable). NLopt should produce a file nlopt.py if Python is found on the system. See the NLopt documentation for information about building NLopt with shared libraries. NLopt may also require SWIG to be installed on certain systems.
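For example, if NLopt's Python bindings were installed under a non-default prefix, the path can be added before running the tests. The prefix and Python version below are placeholders; adjust them for your system:

```shell
# Hypothetical install prefix and Python version -- adjust for your system.
NLOPT_PREFIX="$HOME/install/nlopt"
export PYTHONPATH="$NLOPT_PREFIX/lib/python3.9/site-packages:$PYTHONPATH"
```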

Installation

libEnsemble can be installed or accessed from a variety of sources.

Install libEnsemble and its dependencies from PyPI using pip:

pip install libensemble

Install libEnsemble with Conda from the conda-forge channel:

conda config --add channels conda-forge
conda install -c conda-forge libensemble

Install libEnsemble using the Spack distribution:

spack install py-libensemble

libEnsemble is included in the xSDK Extreme-scale Scientific Software Development Kit from xSDK version 0.5.0 onward. Install the xSDK and load the environment with:

spack install xsdk
spack load -r xsdk

The codebase, tests, and examples can be accessed in the GitHub repository. If necessary, you may install all optional dependencies (listed above) at once with:

pip install libensemble[extras]

A tarball of the most recent release is also available.

Testing

The provided test suite includes both unit and regression tests and is run regularly on supported platforms.

The test suite requires the mock, pytest, pytest-cov, and pytest-timeout packages to be installed and can be run from the libensemble/tests directory of the source distribution by running:

./run-tests.sh

Further options are available. To see a complete list of options, run:

./run-tests.sh -h

The regression tests also serve as good examples of libEnsemble scripts and can be run directly in libensemble/tests/regression_tests. For example:

cd libensemble/tests/regression_tests
python test_uniform_sampling.py --comms local --nworkers 3

The libensemble/tests/scaling_tests directory includes example scripts that use the executor to run compiled applications. These are tested regularly on HPC systems.

If you have the libEnsemble source code, you can download (but not install) the testing prerequisites and run the tests with:

python setup.py test

in the top-level directory containing the setup script.

Coverage reports are produced separately for unit tests and regression tests under the relevant directories. For parallel tests, coverage is combined as the union across all processes. Furthermore, a combined coverage report is created at the top level, which can be viewed at libensemble/tests/cov_merge/index.html after run-tests.sh completes. The coverage results are available online at Coveralls.

Basic Usage

The default manager/worker communications mode is MPI. The user script is launched as:

mpiexec -np N python myscript.py

where N is the number of processors. This will launch one manager and N-1 workers.

If running in local mode, which uses Python’s multiprocessing module, the local comms option and the number of workers must be specified, either in libE_specs or on the command line via the parse_args() function. The script can then be run as a regular Python script:

python myscript.py --comms local --nworkers N

This will launch one manager and N workers.

See the user guide for more information.

Resources

Support:

Further Information:

  • Documentation is provided by Read the Docs.

  • An overview of libEnsemble’s structure and capabilities is given in this manuscript and poster.

  • Examples of production user functions and complete workflows can be viewed, downloaded, and contributed to in the libEnsemble Community Examples repository.

Citation:

  • Please use the following to cite libEnsemble:

@techreport{libEnsemble,
  title   = {{libEnsemble} Users Manual},
  author  = {Stephen Hudson and Jeffrey Larson and Stefan M. Wild and
             David Bindel and John-Luke Navarro},
  institution = {Argonne National Laboratory},
  number  = {Revision 0.9.1+dev},
  year    = {2022},
  url     = {https://buildmedia.readthedocs.org/media/pdf/libensemble/latest/libensemble.pdf}
}

@article{Hudson2022,
  title   = {{libEnsemble}: A Library to Coordinate the Concurrent
             Evaluation of Dynamic Ensembles of Calculations},
  author  = {Stephen Hudson and Jeffrey Larson and John-Luke Navarro and Stefan Wild},
  journal = {{IEEE} Transactions on Parallel and Distributed Systems},
  volume  = {33},
  number  = {4},
  pages   = {977--988},
  year    = {2022},
  doi     = {10.1109/tpds.2021.3082815}
}

Example Compatible Packages

libEnsemble and the Community Examples repository include example generator functions for the following libraries:

  • APOSMM Asynchronously parallel optimization solver for finding multiple minima. Supported local optimization routines include:

    • DFO-LS Derivative-free solver for (bound constrained) nonlinear least-squares minimization

    • NLopt Library for nonlinear optimization, providing a common interface for various methods

    • scipy.optimize Open-source solvers for nonlinear problems, linear programming, constrained and nonlinear least-squares, root finding, and curve fitting.

    • PETSc/TAO Routines for the scalable (parallel) solution of scientific applications

  • DEAP Distributed evolutionary algorithms

  • Distributed optimization methods for minimizing sums of convex functions

  • ECNoise Estimating Computational Noise in Numerical Simulations

  • Surmise Modular Bayesian calibration/inference framework

  • Tasmanian Toolkit for Adaptive Stochastic Modeling and Non-Intrusive ApproximatioN

  • VTMOP Fortran package for large-scale multiobjective multidisciplinary design optimization

libEnsemble has also been used to coordinate many computationally expensive simulations; see a complete list of example user scripts.