Introduction to libEnsemble

libEnsemble is a Python library to coordinate the concurrent evaluation of dynamic ensembles of calculations. The library is developed to use massively parallel resources to accelerate the solution of design, decision, and inference problems and to expand the class of problems that can benefit from increased concurrency levels.

libEnsemble aims for the following:

  • Extreme scaling

  • Resilience/fault tolerance

  • Monitoring/killing of tasks (and recovering resources)

  • Portability and flexibility

  • Exploitation of persistent data/control flow

The user selects or supplies a generator function that produces input parameters for a simulator function that performs and monitors simulations. For example, the generator function may contain an optimization routine to generate new simulation parameters on-the-fly based on the results of previous simulations. Examples and templates of such functions are included in the library.
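The generator/simulator pairing described above can be illustrated with a minimal, library-free sketch. Everything in it is hypothetical: the names sim_f, gen_f, and run_ensemble, the quadratic objective, and the search interval are assumptions made for illustration, and the call signatures are not those of libEnsemble (see the user guide for the real ones). It only shows a generator that proposes new inputs based on the results of earlier simulations.

```python
import random


def sim_f(x):
    """Hypothetical simulator: evaluate a cheap stand-in objective at x."""
    return (x - 2.0) ** 2


def gen_f(history, rng, batch_size=4, step=0.5):
    """Hypothetical generator: propose new inputs based on past results."""
    if not history:
        # No results yet: sample uniformly from an assumed search interval.
        return [rng.uniform(-5.0, 5.0) for _ in range(batch_size)]
    # Otherwise perturb the best input seen so far.
    best_x, _ = min(history, key=lambda pair: pair[1])
    return [best_x + rng.gauss(0.0, step) for _ in range(batch_size)]


def run_ensemble(n_rounds=20, seed=0):
    """Alternate between generating inputs and simulating them."""
    rng = random.Random(seed)
    history = []  # list of (input, output) pairs
    for _ in range(n_rounds):
        for x in gen_f(history, rng):
            history.append((x, sim_f(x)))
    return history


history = run_ensemble()
best_x, best_f = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, f(x) = {best_f:.5f}")
```

In this sketch the simulations run one after another; the point of libEnsemble is to run such evaluations concurrently.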

libEnsemble employs a manager/worker scheme that can run on various communication media (including MPI, multiprocessing, and TCP); interfacing with user-provided executables is also supported. Each worker can control and monitor any level of work, from small subnode tasks to huge many-node simulations. An executor interface is provided to ensure that scripts are portable, resilient, and flexible; it also enables automatic detection of the nodes and cores available to the user, and can dynamically assign resources to workers.
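As a rough illustration of the manager/worker idea, the sketch below keeps a fixed number of evaluations in flight and collects results as workers finish. It is not libEnsemble's implementation: a standard-library thread pool stands in for the MPI, multiprocessing, and TCP communication options, and the names simulate and manager are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulate(x):
    """Hypothetical unit of work handed to a worker."""
    return x, x * x


def manager(inputs, nworkers=3):
    """Keep up to nworkers evaluations in flight and gather the results."""
    pending = list(inputs)
    results = {}
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        active = set()
        while pending or active:
            # Hand out work until every worker is busy or no work remains.
            while pending and len(active) < nworkers:
                active.add(pool.submit(simulate, pending.pop(0)))
            # Block until at least one worker reports back.
            done, active = wait(active, return_when=FIRST_COMPLETED)
            for future in done:
                x, fx = future.result()
                results[x] = fx
    return results


results = manager(range(8))
print(sorted(results.items()))
```

Handing out new work as soon as any worker returns, rather than waiting for a whole batch, is what keeps workers busy when evaluation times vary.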

Dependencies

Required dependencies:

For running libEnsemble with mpi4py parallelism:

  • A functional MPI 1.x/2.x/3.x implementation, such as MPICH, built with shared/dynamic libraries

  • mpi4py v2.0.0 or above

Optional dependencies:

From v0.2.0, libEnsemble has the option of using the Balsam job manager. Balsam is required in order to run libEnsemble on the compute nodes of some supercomputing platforms that do not support launching tasks from compute nodes. As of v0.5.0, libEnsemble can also be run on launch nodes using multiprocessing.

As of v0.8.0, an alternative interface is available. An Ensemble object is created and can be parameterized by a YAML file.

The example simulation and generation functions and tests require the following:

PETSc and NLopt must be built with shared libraries enabled and present in sys.path (e.g., via setting the PYTHONPATH environment variable). NLopt should produce a file nlopt.py if Python is found on the system. See the NLopt documentation for information about building NLopt with shared libraries. NLopt may also require SWIG to be installed on certain systems.
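One way to confirm that these packages are visible to Python is to ask the import system whether it can locate them. The sketch below uses only the standard library. The module names are assumptions: nlopt matches the nlopt.py file mentioned above, and petsc4py is the usual Python binding for PETSc. The helper find_missing is a hypothetical name.

```python
import importlib.util
import sys


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Assumed module names; adjust them to match your installation.
missing = find_missing(["nlopt", "petsc4py"])
if missing:
    print("Not importable:", ", ".join(missing))
    print("Number of sys.path entries searched:", len(sys.path))
else:
    print("All listed modules were found.")
```

If a module is reported as not importable, adding the directory that contains it to PYTHONPATH before starting Python is the fix suggested above.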

Installation

libEnsemble can be installed or accessed from a variety of sources.

Install libEnsemble and its dependencies from PyPI using pip:

pip install libensemble

Install libEnsemble with Conda from the conda-forge channel:

conda config --add channels conda-forge
conda install -c conda-forge libensemble

Install libEnsemble using the Spack distribution:

spack install py-libensemble

libEnsemble is included in the xSDK (Extreme-scale Scientific Software Development Kit) from xSDK version 0.5.0 onward. Install the xSDK and load the environment with

spack install xsdk
spack load -r xsdk

The codebase, tests, and examples can be accessed in the GitHub repository. If necessary, you may install all optional dependencies (listed above) at once with

pip install libensemble[extras]

A tarball of the most recent release is also available.

Testing

The provided test suite includes both unit and regression tests and is run regularly on:

The test suite requires the mock, pytest, pytest-cov, and pytest-timeout packages to be installed and can be run from the libensemble/tests directory of the source distribution by running

./run-tests.sh

Further options are available. To see a complete list of options, run

./run-tests.sh -h

The regression tests also serve as example libEnsemble scripts and can be run directly in libensemble/tests/regression_tests. For example:

cd libensemble/tests/regression_tests
python test_uniform_sampling.py --comms local --nworkers 3

The libensemble/tests/scaling_tests directory includes some examples that make use of the executor to run compiled applications. These are tested regularly on HPC systems.

If you have the source distribution, you can download (but not install) the testing prerequisites and run the tests with

python setup.py test

in the top-level directory containing the setup script.

Coverage reports are produced separately for unit tests and regression tests under the relevant directories. For parallel tests, the union of the coverage from all processors is taken. Furthermore, a combined coverage report is created at the top level, which can be viewed at libensemble/tests/cov_merge/index.html after run-tests.sh is completed. The coverage results are available online at Coveralls.

Note

The executor tests can be run by using the direct-launch or Balsam executors. Balsam integration with libEnsemble is now tested via test_balsam_hworld.py.

Basic Usage

The examples directory contains example libEnsemble calling scripts, simulation functions, generation functions, allocation functions, and libEnsemble submission scripts.

The default manager/worker communications mode is MPI. The user script is launched as

mpiexec -np N python myscript.py

where N is the number of processors. This will launch one manager and N-1 workers.

If running in local mode, which uses Python’s multiprocessing module, the local comms option and the number of workers must be specified. The script can then be run as a regular Python script:

python myscript.py

These options may be specified via the command line by using the parse_args() convenience function within libEnsemble’s tools module.
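To show what such command-line handling might look like, here is a standard-library sketch that accepts the --comms and --nworkers flags used in the regression-test example above. It is an illustrative stand-in and not libEnsemble's parse_args(): the function name parse_comms_args, the accepted --comms values other than local, and the default of mpi are assumptions based on the communication modes described in this document.

```python
import argparse


def parse_comms_args(argv=None):
    """Illustrative stand-in; this is not libEnsemble's parse_args()."""
    parser = argparse.ArgumentParser(description="Choose a communications mode")
    # Assumed values; MPI is treated as the default, as described above.
    parser.add_argument("--comms", choices=["mpi", "local", "tcp"], default="mpi")
    parser.add_argument("--nworkers", type=int, default=None)
    args = parser.parse_args(argv)
    # Local mode needs an explicit worker count.
    if args.comms == "local" and args.nworkers is None:
        parser.error("--nworkers is required when --comms is local")
    return args


args = parse_comms_args(["--comms", "local", "--nworkers", "3"])
print(args.comms, args.nworkers)  # prints "local 3"
```

Passing argv=None makes argparse read the real command line, so the same function works when the script is started as python myscript.py --comms local --nworkers 3.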

See the user guide for more information.

Resources

Support:

  • The best way to receive support is to email questions to libEnsemble@lists.mcs.anl.gov.

  • Communicate (and establish a private channel, if desired) at the libEnsemble Slack page.

  • Join the libEnsemble mailing list for updates about new releases.

Further Information:

  • Documentation is provided by Read the Docs.

  • An overview of libEnsemble’s structure and capabilities is given in this manuscript and poster.

Citation:

  • Please use the following to cite libEnsemble:

@techreport{libEnsemble,
  title   = {{libEnsemble} Users Manual},
  author  = {Stephen Hudson and Jeffrey Larson and Stefan M. Wild and
             David Bindel and John-Luke Navarro},
  institution = {Argonne National Laboratory},
  number  = {Revision 0.8.0+dev},
  year    = {2021},
  url     = {https://buildmedia.readthedocs.org/media/pdf/libensemble/latest/libensemble.pdf}
}

@article{Hudson2022,
  title   = {{libEnsemble}: A Library to Coordinate the Concurrent
             Evaluation of Dynamic Ensembles of Calculations},
  author  = {Stephen Hudson and Jeffrey Larson and John-Luke Navarro and Stefan Wild},
  journal = {{IEEE} Transactions on Parallel and Distributed Systems},
  volume  = {33},
  number  = {4},
  pages   = {977--988},
  year    = {2022},
  doi     = {10.1109/tpds.2021.3082815}
}

Capabilities:

libEnsemble generation capabilities include:

  • APOSMM: Asynchronously parallel optimization solver for finding multiple minima. Supported local optimization routines include:

    • DFO-LS: Derivative-free solver for (bound constrained) nonlinear least-squares minimization

    • NLopt: Library for nonlinear optimization, providing a common interface for various methods

    • scipy.optimize: Open-source solvers for nonlinear problems, linear programming, constrained and nonlinear least-squares, root finding, and curve fitting

    • PETSc/TAO: Routines for the scalable (parallel) solution of scientific applications

  • DEAP: Distributed evolutionary algorithms

  • Distributed optimization methods for minimizing sums of convex functions. Methods include:

  • ECNoise: Estimating Computational Noise in Numerical Simulations

  • Surmise: Modular Bayesian calibration/inference framework

  • Tasmanian: Toolkit for Adaptive Stochastic Modeling and Non-Intrusive ApproximatioN

  • VTMOP: Fortran package for large-scale multiobjective multidisciplinary design optimization

libEnsemble has also been used to coordinate many computationally expensive simulations. Select examples include:

See a complete list of example user scripts.