Executor Overview

Most computationally expensive libEnsemble workflows involve launching applications from a sim_f or gen_f running on a worker to the compute nodes of a supercomputer, cluster, or other compute resource.

The Executor provides a portable interface for running applications on any system.

Basic usage

In calling script

To set up an MPI executor, register an MPI application, and add to the ensemble object.

from libensemble import Ensemble
from libensemble.executors import MPIExecutor

exctr = MPIExecutor()
exctr.register_app(full_path="/path/to/my/exe", app_name="sim1")
ensemble = Ensemble(executor=exctr)

If using the libE() call, the Executor in the calling script does not have to be passed to the libE() function. It is transferred via the Executor.executor class variable.

In user simulation function:

def sim_func(H, persis_info, sim_specs, libE_info):

    input_param = str(int(H["x"][0][0]))
    exctr = libE_info["executor"]

    task = exctr.submit(
        app_name="sim1",
        num_procs=8,
        app_args=input_param,
        stdout="out.txt",
        stderr="err.txt",
    )

    # Wait for task to complete
    task.wait()

Example use-cases:

Electrostatic Forces example: Launches the forces.x MPI application.
Forces example with GPUs: Auto-assigns GPUs via executor.

See the Executor or MPIExecutor interface for the complete API.

See Running on HPC Systems for illustrations of how common options such as libE_specs["dedicated_mode"] affect the run configuration on clusters and supercomputers.

Advanced Features

Example of polling output and killing application:

In simulation function (sim_f).

import time


def sim_func(H, persis_info, sim_specs, libE_info):
    input_param = str(int(H["x"][0][0]))
    exctr = libE_info["executor"]

    task = exctr.submit(
        app_name="sim1",
        num_procs=8,
        app_args=input_param,
        stdout="out.txt",
        stderr="err.txt",
    )

    timeout_sec = 600
    poll_delay_sec = 1

    while not task.finished:
        # Has manager sent a finish signal
        if exctr.manager_kill_received():
            task.kill()
            my_cleanup()

        # Check output file for error and kill task
        elif task.stdout_exists():
            if "Error" in task.read_stdout():
                task.kill()

        elif task.runtime > timeout_sec:
            task.kill()  # Timeout

        else:
            time.sleep(poll_delay_sec)
            task.poll()

    print(task.state)  # state may be finished/failed/killed

Users who wish to poll only for manager kill signals and timeouts don’t necessarily need to construct a polling loop like above, but can instead use the Executor built-in polling_loop() method. An alternative to the above simulation function may resemble:

def sim_func(H, persis_info, sim_specs, libE_info):
    input_param = str(int(H["x"][0][0]))
    exctr = libE_info["executor"]

    task = exctr.submit(
        app_name="sim1",
        num_procs=8,
        app_args=input_param,
        stdout="out.txt",
        stderr="err.txt",
    )

    timeout_sec = 600
    poll_delay_sec = 1

    exctr.polling_loop(task, timeout=timeout_sec, delay=poll_delay_sec)

    print(task.state)  # state may be finished/failed/killed

Note

Applications or tasks submitted via the Balsam Executor are referred to as “jobs” within Balsam, including within Balsam’s database and when describing the state of a completed submission.

The MPIExecutor autodetects system criteria such as the appropriate MPI launcher and mechanisms to poll and kill tasks. It also has access to the resource manager, which partitions resources among workers, ensuring that runs utilize different resources (e.g., nodes). Furthermore, the MPIExecutor offers resilience via the feature of re-launching tasks that fail to start because of system factors.

Various back-end mechanisms may be used by the Executor to best interact with each system, including proxy launchers or task management systems such as Balsam. Currently, these Executors launch at the application level within an existing resource pool. However, submissions to a batch scheduler may be supported in future Executors.