MPI Executor

This module launches and controls the running of MPI applications.

To create an MPI executor, the calling script should contain:

exctr = MPIExecutor()

See the executor API below for optional arguments.

class mpi_executor.MPIExecutor(custom_info={})

Bases: libensemble.executors.executor.Executor

The MPI executor can create, poll, and kill runnable MPI tasks.

__init__(custom_info={})

Instantiate a new MPIExecutor instance.

A new MPIExecutor is created with an application registry and configuration attributes.

This is typically created in the user calling script. The MPIExecutor will use system resource information supplied by the libEnsemble resource manager when submitting tasks.

Parameters

custom_info (dict, optional) – Provide custom overrides to selected variables that are usually auto-detected. See below.

The MPIExecutor automatically detects MPI runners and launch mechanisms. However, the detected information can be overridden with the custom_info argument, which takes a dictionary of values.

The allowable fields are:

'mpi_runner' [string]:
    Select runner: 'mpich', 'openmpi', 'aprun', 'srun', 'jsrun', 'custom'
    All except 'custom' relate to runner classes in libEnsemble.
    'custom' allows users to define their own run lines, but without parsing
    arguments or making use of auto-resources.
'runner_name' [string]:
    Runner name: Replaces run command if present. All runners have a default
    except for 'custom'.
'subgroup_launch' [Boolean]:
    Whether MPI runs should be initiated in a new process group. This needs
    to be correct for kills to work correctly. Use the standalone test at
    libensemble/tests/standalone_tests/kill_test to determine the correct
    value for a system.

For example:

customizer = {'mpi_runner': 'mpich',
              'runner_name': 'wrapper -x mpich'}

from libensemble.executors.mpi_executor import MPIExecutor
exctr = MPIExecutor(custom_info=customizer)
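
The field rules listed above can be checked before a dictionary is passed as custom_info. The following is a minimal sketch only: check_custom_info is a hypothetical helper, not part of libEnsemble, and the rule that 'custom' needs a 'runner_name' is an assumption drawn from the note that 'custom' has no default runner name.

```python
# Hypothetical helper, not part of libEnsemble: checks a custom_info dict
# against the fields and runner names listed in this section.
ALLOWED_FIELDS = {"mpi_runner": str, "runner_name": str, "subgroup_launch": bool}
KNOWN_RUNNERS = {"mpich", "openmpi", "aprun", "srun", "jsrun", "custom"}


def check_custom_info(custom_info):
    """Return a list of problems found in custom_info (empty if none)."""
    problems = []
    for key, value in custom_info.items():
        if key not in ALLOWED_FIELDS:
            problems.append(f"unknown field: {key!r}")
        elif not isinstance(value, ALLOWED_FIELDS[key]):
            problems.append(f"{key!r} should be {ALLOWED_FIELDS[key].__name__}")
    runner = custom_info.get("mpi_runner")
    if isinstance(runner, str) and runner not in KNOWN_RUNNERS:
        problems.append(f"unrecognized mpi_runner: {runner!r}")
    # Assumption: 'custom' has no default run command, so one must be given.
    if runner == "custom" and "runner_name" not in custom_info:
        problems.append("'custom' runner requires 'runner_name'")
    return problems


customizer = {"mpi_runner": "mpich", "runner_name": "wrapper -x mpich"}
print(check_custom_info(customizer))  # prints []
```

A check of this kind catches misspelled keys early; how the executor itself treats unknown keys is not stated in this section.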

default_app(calc_type)

Gets the default app for a given calc type

property gen_default_app

Returns the default generator app

get_app(app_name)

Gets the app for a given app_name or raises an exception

get_task(taskid)

Returns the task object for the supplied task ID

kill(task)

Kills a task

manager_poll()

Polls for a manager signal

The executor manager_signal attribute will be updated.

poll(task)

Polls a task

polling_loop(task, timeout=None, delay=0.1, poll_manager=False)

Optional, blocking, generic task status polling loop. Operates until the task finishes, times out, or is optionally killed via a manager signal. On completion, returns a presumptive calc_status integer. Potentially useful for running an application via the Executor until it stops without monitoring its intermediate output.

Parameters
  • task (object) – a Task object returned by the executor on submission

  • timeout (int, optional) – Maximum number of seconds for the polling loop to run. Tasks that run longer than this limit are killed. Default: No timeout

  • delay (int or float, optional) – Sleep duration in seconds between polling loop iterations. Default: 0.1 seconds

  • poll_manager (bool, optional) – Whether to also poll the manager for ‘finish’ or ‘kill’ signals. If detected, the task is killed. Default: False.

Returns

calc_status – presumptive integer attribute describing the final status of a launched task

Return type

int
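
The behavior described for polling_loop (poll at a fixed delay, kill the task on timeout, return a status) can be sketched with a stand-in task. This is not libEnsemble's implementation: StandInTask, polling_loop_sketch, and the three status constants are hypothetical names used for illustration, the real calc_status integers are not reproduced here, and manager polling is omitted.

```python
import time

# Placeholder status codes for this sketch only; libEnsemble defines its own
# calc_status integers, which are not reproduced here.
DONE, FAILED, TIMED_OUT = 0, 1, 2


class StandInTask:
    """Hypothetical stand-in for a Task: finishes after a set number of polls."""

    def __init__(self, polls_until_done, succeed=True):
        self.polls_left = polls_until_done
        self.succeed = succeed
        self.finished = False
        self.killed = False

    def poll(self):
        self.polls_left -= 1
        if self.polls_left <= 0:
            self.finished = True

    def kill(self):
        self.killed = True
        self.finished = True


def polling_loop_sketch(task, timeout=None, delay=0.1):
    """Poll until the task finishes or the timeout elapses; return a status."""
    start = time.monotonic()
    while not task.finished:
        if timeout is not None and time.monotonic() - start > timeout:
            task.kill()  # tasks that outlive the limit are killed
            return TIMED_OUT
        time.sleep(delay)
        task.poll()
    return DONE if task.succeed else FAILED


status = polling_loop_sketch(StandInTask(polls_until_done=3), delay=0.01)
print(status == DONE)  # prints True
```

The sketch blocks the caller until the task stops, which is why this kind of loop suits applications whose intermediate output does not need monitoring.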

register_app(full_path, app_name=None, calc_type=None, desc=None)

Registers a user application to libEnsemble.

The full_path of the application must be supplied. Either app_name or calc_type can be used to identify the application in user scripts (in the submit function). app_name is recommended.

Parameters
  • full_path (String) – The full path of the user application to be registered

  • app_name (String, optional) – Name to identify this application.

  • calc_type (String, optional) – Calculation type: Set this application as the default ‘sim’ or ‘gen’ function.

  • desc (String, optional) – Description of this application
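
The relationship between app_name and calc_type can be pictured as two lookup tables. The sketch below is a simplified model and not libEnsemble's registry: AppRegistrySketch, its resolve method, and the path /path/to/sim.x are all hypothetical.

```python
class AppRegistrySketch:
    """Hypothetical registry illustrating lookup by app_name or calc_type."""

    def __init__(self):
        self.apps = {}          # app_name -> full_path
        self.default_apps = {}  # calc_type ('sim' or 'gen') -> full_path

    def register_app(self, full_path, app_name=None, calc_type=None):
        if calc_type is not None:
            if calc_type not in ("sim", "gen"):
                raise ValueError(f"unknown calc_type: {calc_type!r}")
            # Registering with a calc_type makes this the default sim/gen app.
            self.default_apps[calc_type] = full_path
        if app_name is not None:
            self.apps[app_name] = full_path

    def resolve(self, app_name=None, calc_type=None):
        # calc_type is consulted only when app_name is not supplied.
        if app_name is not None:
            return self.apps[app_name]  # KeyError if not registered
        return self.default_apps[calc_type]


registry = AppRegistrySketch()
registry.register_app("/path/to/sim.x", app_name="my_sim", calc_type="sim")
print(registry.resolve(app_name="my_sim"))  # prints /path/to/sim.x
print(registry.resolve(calc_type="sim"))    # prints /path/to/sim.x
```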

serial_setup()

Set up to be called by only one process

set_workerID(workerid)

Sets the worker ID for this executor

set_worker_info(comm, workerid=None)

Sets info for this executor

property sim_default_app

Returns the default simulation app

submit(calc_type=None, app_name=None, num_procs=None, num_nodes=None, procs_per_node=None, machinefile=None, app_args=None, stdout=None, stderr=None, stage_inout=None, hyperthreads=False, dry_run=False, wait_on_start=False, extra_args=None)

Creates a new task, and either executes or schedules execution.

The created task object is returned.

Parameters
  • calc_type (String, optional) – The calculation type: ‘sim’ or ‘gen’. Only used if app_name is not supplied. Uses default sim or gen application.

  • app_name (String, optional) – The application name. Must be supplied if calc_type is not.

  • num_procs (int, optional) – The total number of MPI processes on which to run the task

  • num_nodes (int, optional) – The number of nodes on which to submit the task

  • procs_per_node (int, optional) – The processes per node for this task

  • machinefile (string, optional) – Name of a machinefile for this task to use

  • app_args (string, optional) – A string of the application arguments to be added to task submit command line

  • stdout (string, optional) – A standard output filename

  • stderr (string, optional) – A standard error filename

  • stage_inout (string, optional) – A directory to copy files from; default will take from current directory

  • hyperthreads (boolean, optional) – Whether to submit MPI tasks to hyperthreads

  • dry_run (boolean, optional) – Whether this is a dry run: no task is launched; instead, the run line is printed to the logger (at INFO level)

  • wait_on_start (boolean, optional) – Whether to wait for task to be polled as RUNNING (or other active/end state) before continuing

  • extra_args (String, optional) – Additional command line arguments to supply to MPI runner. If arguments are recognised as MPI resource configuration (num_procs, num_nodes, procs_per_node) they will be used in resources determination unless also supplied in the direct options.

Returns

task – The launched task object

Return type

obj: Task

Note that if some combination of num_procs, num_nodes, and procs_per_node is provided, these will be honored if possible. If resource detection is on and these are omitted, then the available resources will be divided among workers.
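
As a rough illustration of how a partial combination of these three values can determine the rest, the sketch below assumes a uniform layout in which num_procs equals num_nodes times procs_per_node. That relationship and the helper complete_mpi_spec are assumptions made for illustration; libEnsemble's actual resource logic also draws on detected resources and is not reproduced here.

```python
def complete_mpi_spec(num_procs=None, num_nodes=None, procs_per_node=None):
    """Fill in one missing value, assuming num_procs = num_nodes * procs_per_node."""
    given = sum(v is not None for v in (num_procs, num_nodes, procs_per_node))
    if given < 2:
        raise ValueError("need at least two of the three values")
    if num_procs is None:
        num_procs = num_nodes * procs_per_node
    elif num_nodes is None:
        if num_procs % procs_per_node:
            raise ValueError("num_procs is not a multiple of procs_per_node")
        num_nodes = num_procs // procs_per_node
    elif procs_per_node is None:
        if num_procs % num_nodes:
            raise ValueError("num_procs is not a multiple of num_nodes")
        procs_per_node = num_procs // num_nodes
    elif num_procs != num_nodes * procs_per_node:
        # All three were given but do not agree under the uniform assumption.
        raise ValueError("inconsistent resource specification")
    return num_procs, num_nodes, procs_per_node


print(complete_mpi_spec(num_nodes=2, procs_per_node=4))  # prints (8, 2, 4)
```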

Class-specific Attributes

Class-specific attributes can be set directly to alter the behavior of the MPI Executor. However, they should be used with caution, because they may not be implemented in other executors.

max_launch_attempts

(int) Maximum number of launch attempts for a given task. Default: 5.

fail_time

(int or float) Used only if wait_on_start is set. A task that fails within this many seconds of starting is relaunched. Default: 2.

retry_delay_incr

(int or float) Delay increment between launch attempts in seconds. Default: 5. (For example, the first retry occurs after 5 seconds, the next after 10 seconds, then 15, etc.)

Example. To increase resilience against submission failures:

taskctrl = MPIExecutor()
taskctrl.max_launch_attempts = 8
taskctrl.fail_time = 5
taskctrl.retry_delay_incr = 10
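
The incremental delay described for retry_delay_incr gives a simple arithmetic schedule. The sketch below computes it under the assumption that the first attempt is not delayed, so there is one fewer delay than attempts; retry_delays and its parameter max_attempts are hypothetical names, not part of libEnsemble.

```python
def retry_delays(max_attempts=5, retry_delay_incr=5):
    """Delays in seconds before each retry, growing by retry_delay_incr."""
    # One fewer delay than attempts: the first attempt is assumed undelayed.
    return [retry_delay_incr * n for n in range(1, max_attempts)]


print(retry_delays())       # prints [5, 10, 15, 20]
print(retry_delays(8, 10))  # prints [10, 20, 30, 40, 50, 60, 70]
```

With the values from the example above (8 attempts, increment of 10), the total time spent waiting across all retries under this assumption would be 280 seconds.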