General Specs

libEnsemble is primarily customized by setting options in a libE_specs dictionary or via the LibeSpecs class. When the class is used, options are validated immediately on instantiation.

from mpi4py import MPI

libE_specs = {
    "mpi_comm": MPI.COMM_WORLD,
    "comms": "mpi",
    "save_every_k_gens": 1000,
    "sim_dirs_make": True,
    "ensemble_dir_path": "/scratch/ensemble",
    "profile": False,
}
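
Equivalently, via the LibeSpecs class, which validates options on instantiation (a minimal sketch with illustrative values):

from libensemble.specs import LibeSpecs

# Invalid options raise a validation error as soon as the class is instantiated
specs = LibeSpecs(
    comms="local",
    nworkers=4,
    save_every_k_gens=1000,
    sim_dirs_make=True,
)
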
“comms” [str] = "mpi":

Manager/Worker communications mode. Options are "mpi", "local", and "tcp"

“nworkers” [int]:

Number of worker processes to spawn (only in local/tcp modes)

“mpi_comm” [MPI communicator] = MPI.COMM_WORLD:

libEnsemble communicator if MPI comms are being used

“dry_run” [bool] = False:

Whether libEnsemble should immediately exit after validating all inputs

“abort_on_exception” [bool] = True:

In MPI mode, whether to call MPI_ABORT on an exception. If False, an exception will be raised by the manager.

“save_every_k_sims” [int]:

Save history array to file after every k simulated points.

“save_every_k_gens” [int]:

Save history array to file after every k generated points.

“save_H_and_persis_on_abort” [bool] = True:

Whether libEnsemble should save the states of H and persis_info on aborting after an error.

“worker_timeout” [int] = 1:

When libEnsemble concludes and attempts to close down workers, the number of seconds until workers are considered timed out. Worker processes are then terminated.

“kill_canceled_sims” [bool] = True:

Try to kill sims with "cancel_requested" set True. If False, the manager avoids this moderate overhead.

“disable_log_files” [bool] = False:

Disable the creation of "ensemble.log" and "libE_stats.txt".

“use_workflow_dir” [bool] = False:

Whether to place all log files, dumped arrays, and default ensemble-directories in a separate workflow directory. Each run is suffixed with a hash. If copying back an ensemble directory from another location, the copy is placed here.

“workflow_dir_path” [str]:

Optional path to the workflow directory. Autogenerated in the current directory if use_workflow_dir is specified.
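
For example (the directory name below is hypothetical):

libE_specs["use_workflow_dir"] = True
libE_specs["workflow_dir_path"] = "./my_workflow"  # hypothetical; autogenerated if omitted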

“ensemble_dir_path” [str] = "./ensemble":

Path to main ensemble directory. Can serve as single working directory for workers, or contain calculation directories.

libE_specs["ensemble_dir_path"] = "/scratch/my_ensemble"

“ensemble_copy_back” [bool] = False:

Whether to copy directories within ensemble_dir_path back to the launch location. Useful if ensemble_dir_path is located on node-local storage.
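
For instance, to run in node-local scratch space and retrieve results afterward (paths are illustrative):

libE_specs["ensemble_dir_path"] = "/tmp/ensemble"  # hypothetical node-local scratch
libE_specs["ensemble_copy_back"] = True            # copy results back to the launch directory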

“use_worker_dirs” [bool] = False:

Whether to organize calculation directories under worker-specific directories (a configuration sketch follows the listings below).

With use_worker_dirs=False (the default):

- /ensemble_dir
    - /sim0-worker1
    - /gen1-worker1
    - /sim1-worker2
    ...

With use_worker_dirs=True:

- /ensemble_dir
    - /worker1
        - /sim0
        - /gen1
        - /sim4
        ...
    - /worker2
    ...
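
A configuration that produces the worker-organized layout might look like this (a sketch; all three options are documented in this section):

libE_specs["use_worker_dirs"] = True
libE_specs["sim_dirs_make"] = True  # per-simulation dirs under each worker dir
libE_specs["gen_dirs_make"] = True  # per-generator dirs under each worker dir
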
“sim_dirs_make” [bool] = False:

Whether to make a simulation-function-call specific working directory.

“sim_dir_copy_files” [list]:

Paths to files or directories to copy into each sim directory, or into the ensemble directory if sim dirs are not used.

“sim_dir_symlink_files” [list]:

Paths to files or directories to symlink into each sim directory, or into the ensemble directory if sim dirs are not used.

“sim_input_dir” [str]:

Copy this directory and its contents for each simulation-specific directory. If not using calculation directories, contents are copied to the ensemble directory.
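
For example, to stage inputs into every simulation directory (file and directory names are hypothetical):

libE_specs["sim_dirs_make"] = True
libE_specs["sim_input_dir"] = "./sim_template"         # hypothetical template directory
libE_specs["sim_dir_copy_files"] = ["params.json"]     # hypothetical file to copy
libE_specs["sim_dir_symlink_files"] = ["big_data.h5"]  # hypothetical file to symlink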

“gen_dirs_make” [bool] = False:

Whether to make a generator-function-call-specific working directory. Each persistent generator creates a single directory.

“gen_dir_copy_files” [list]:

Paths to files or directories to copy into each gen directory, or into the ensemble directory if gen dirs are not used.

“gen_dir_symlink_files” [list]:

Paths to files or directories to symlink into each gen directory.

“gen_input_dir” [str]:

Copy this directory and its contents for each generator-instance specific directory. If not using calculation directories, contents are copied to the ensemble directory.

“profile” [bool] = False:

Profile manager and worker logic using cProfile.

“safe_mode” [bool] = True:

Prevents user functions from overwriting internal fields, but requires moderate overhead.

“stats_fmt” [dict]:

A dictionary of options for formatting "libE_stats.txt". See “Formatting Options for libE_stats File” for more options.

“workers” [list]:

TCP Only: A list of worker hostnames.

“ip” [str]:

TCP Only: IP address for Manager’s system

“port” [int]:

TCP Only: Port number for Manager’s system

“authkey” [str]:

TCP Only: Authkey for Manager’s system

“workerID” [int]:

TCP Only: Worker ID number assigned to the new process.

“worker_cmd” [list]:

TCP Only: Split string corresponding to worker/client Python process invocation. Contains a local Python path, calling script, and manager/server format-fields for manager_ip, manager_port, authkey, and workerID. nworkers is specified normally.
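
A hedged sketch of a TCP-mode setup, assuming the format-field names listed above (hostnames, paths, and the script name are hypothetical):

libE_specs["comms"] = "tcp"
libE_specs["workers"] = ["node-1", "node-2"]  # hypothetical worker hostnames
libE_specs["worker_cmd"] = [
    "/path/to/python",    # Python interpreter local to the worker host
    "calling_script.py",  # hypothetical calling script
    "{manager_ip}",
    "{manager_port}",
    "{authkey}",
    "{workerID}",
]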

“use_persis_return_gen” [bool] = False:

Adds the persistent generator function's H return to the manager's history array.

“use_persis_return_sim” [bool] = False:

Adds the persistent simulator function's H return to the manager's history array.

“final_fields” [list] = []:

List of fields in H that the manager will return to persistent workers along with the PERSIS_STOP tag at the end of the run.
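
For example (the field names are hypothetical entries in H):

libE_specs["final_fields"] = ["x", "f", "sim_id"]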

“disable_resource_manager” [bool] = False:

Disable the built-in resource manager, including automatic resource detection and/or assignment of resources to workers. "resource_info" will be ignored.

“platform” [str]:

Name of a known platform, e.g., libE_specs["platform"] = "perlmutter_g". Alternatively, specify by setting the LIBE_PLATFORM environment variable.

“platform_specs” [Platform|dict]:

A Platform object (or dictionary) specifying settings for a platform. Fields not provided will be auto-detected. Can be set to a known platform object.

“num_resource_sets” [int]:

The total number of resource sets into which resources will be divided. By default resources will be divided by workers (excluding zero_resource_workers).

“enforce_worker_core_bounds” [bool] = False:

If False (the default), the Executor will permit submission of tasks with a higher processor count than the CPUs available to the worker. Larger node counts are not allowed. Ignored when disable_resource_manager is set.

“dedicated_mode” [bool] = False:

Disallow any resources running libEnsemble processes (manager and workers) from being valid targets for app submissions.

“zero_resource_workers” [list of ints]:

List of workers (by ID) that require no resources. Use when a fixed mapping of workers to resources is required; otherwise, use "num_resource_sets". For use with supported allocation functions.
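
For example, if worker 1 runs a persistent generator that needs no resources (a sketch):

libE_specs["zero_resource_workers"] = [1]  # worker 1 receives no resource set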

“resource_info” [dict]:

Provide resource information that will override automatically detected resources. The allowable fields are given below in “Overriding Auto-detection”. Ignored if "disable_resource_manager" is set.

“scheduler_opts” [dict]:

Options for the resource scheduler. See “Scheduler Options” for more options.
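
For example (assuming "match_slots" is among the supported options; see “Scheduler Options”):

libE_specs["scheduler_opts"] = {"match_slots": False}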

Complete Class API
pydantic model libensemble.specs.LibeSpecs

Specifications for configuring libEnsemble’s runtime behavior. Equivalent to a libE_specs dictionary.

field abort_on_exception: bool | None = True

In MPI mode, whether to call MPI_ABORT on an exception. If False, an exception will be raised by the manager

field authkey: str | None = 'libE_auth_1015'

TCP Only: Authkey for Manager’s system

field comms: str | None = 'mpi'

Manager/Worker communications mode. 'mpi', 'local', or 'tcp'

field dedicated_mode: bool | None = False

Instructs libEnsemble to not run applications on resources where libEnsemble processes (manager and workers) are running

field disable_log_files: bool | None = False

Disable the creation of ensemble.log and libE_stats.txt log files

field disable_resource_manager: bool | None = False

Disable the built-in resource manager. If True, automatic resource detection and/or assignment of resources to workers is disabled. resource_info will also be ignored

field dry_run: bool | None = False

Whether libEnsemble should immediately exit after validating all inputs

field enforce_worker_core_bounds: bool | None = False

If False, the Executor will permit submission of tasks with a higher processor count than the CPUs available to the worker as detected by the resource manager. Larger node counts are not allowed. When "disable_resource_manager" is True, this argument is ignored

field ensemble_copy_back: bool | None = False

Whether to copy directories within ensemble_dir_path back to the launch location. Useful if the ensemble directory is placed on node-local storage

field ensemble_dir_path: str | Path | None = PosixPath('ensemble')

Path to main ensemble directory containing calculation directories. Can serve as single working directory for workers, or contain calculation directories

field final_fields: List[str] | None = []

List of fields in H that the manager will return to persistent workers along with the PERSIS_STOP tag at the end of a run

field gen_dir_copy_files: List[str | Path] | None = []

Paths to files or directories to copy into each generator or ensemble directory. List of strings or pathlib.Path objects

field gen_dir_symlink_files: List[str | Path] | None = []

Paths to files or directories to symlink into each generator directory. List of strings or pathlib.Path objects

field gen_dirs_make: bool | None = False

Whether to make generator-specific calculation directories for each generator function call. By default all workers operate within the top-level ensemble directory

field gen_input_dir: str | Path | None = None

Copy this directory and its contents for each generator-instance-specific directory. If not using calculation directories, contents are copied to the ensemble directory

field ip: str | None = None

TCP Only: IP address for Manager’s system

field kill_canceled_sims: bool | None = True

Instructs libEnsemble to send kill signals to sims with their cancel_requested field set. If False, the manager avoids this moderate overhead

field mpi_comm: MPI_Communicator | None = None

libEnsemble communicator. Default: MPI.COMM_WORLD

field num_resource_sets: int | None = None

Total number of resource sets. Resources will be divided into this number. If not set, resources will be divided evenly (excluding zero_resource_workers).

field nworkers: int | None = None

Number of worker processes to spawn (only in local/tcp modes)

field platform: str | None = ''

Name of a known platform defined in the platforms module.

See Known Platforms List

Example:

libE_specs["platform"] = "perlmutter_g"

Note: the environment variable LIBE_PLATFORM is an alternative way of setting the platform.

E.g., on command line or batch submission script:

export LIBE_PLATFORM="perlmutter_g"

See also option platform_specs.

field platform_specs: Platform | dict | None = {}

A Platform object (or dictionary) specifying settings for a platform.

Example usage in a calling script.

To use existing platform:

from libensemble.resources.platforms import PerlmutterGPU

libE_specs["platform_specs"] = PerlmutterGPU()

See Known Platforms List

Or define a platform:

from libensemble.resources.platforms import Platform

libE_specs["platform_specs"] = Platform(
    mpi_runner="srun",
    cores_per_node=64,
    logical_cores_per_node=128,
    gpus_per_node=8,
    gpu_setting_type="runner_default",
    scheduler_match_slots=False,
)

For list of Platform fields see Platform Fields

Any fields not given will be auto-detected by libEnsemble.

See also option platform.

field port: int | None = 0

TCP Only: Port number for Manager’s system

field profile: bool | None = False

Profile manager and worker logic using cProfile

field resource_info: dict | None = {}

Resource information to override automatically detected resources. Allowed fields are given below in ‘Overriding Auto-detection’. Note that if disable_resource_manager is set, this option is ignored

field safe_mode: bool | None = True

Prevents user functions from overwriting protected History fields, but requires moderate overhead

field save_H_and_persis_on_abort: bool | None = True

Save states of H and persis_info on aborting after an exception

field save_every_k_gens: int | None = 0

Save history array to file after every k generated points

field save_every_k_sims: int | None = 0

Save history array to file after every k evaluated points

field scheduler_opts: dict | None = {}

Options for the resource scheduler. See ‘Scheduler Options’ for more info

field sim_dir_copy_files: List[str | Path] | None = []

Paths to files or directories to copy into each simulation or ensemble directory. List of strings or pathlib.Path objects

field sim_dir_symlink_files: List[str | Path] | None = []

Paths to files or directories to symlink into each simulation directory. List of strings or pathlib.Path objects

field sim_dirs_make: bool | None = False

Whether to make simulation-specific calculation directories for each simulation function call. By default all workers operate within the top-level ensemble directory

field sim_input_dir: str | Path | None = None

Copy this directory and its contents for each simulation-specific directory. If not using calculation directories, contents are copied to the ensemble directory

field stats_fmt: dict | None = {}

Options for formatting ‘libE_stats.txt’. See ‘Formatting Options for libE_stats File’ for more info

field use_persis_return_gen: bool | None = False

Adds persistent generator output fields to the History array on return

field use_persis_return_sim: bool | None = False

Adds persistent simulator output fields to the History array on return

field use_worker_dirs: bool | None = False

Whether to organize calculation directories under worker-specific directories

field use_workflow_dir: bool | None = False

Whether to place all log files, dumped arrays, and default ensemble-directories in a separate workflow directory. New runs and their workflow directories will be automatically differentiated. If copying back an ensemble directory from a scratch space, the copy is placed in the workflow directory.

field workerID: int | None = None

TCP Only: Worker ID number assigned to the new process

field worker_cmd: List[str] | None = None

TCP Only: Split string corresponding to worker/client Python process invocation. Contains a local Python path, calling script, and manager/server format-fields for manager_ip, manager_port, authkey, and workerID. nworkers is specified normally

field worker_timeout: int | None = 1

On libEnsemble shutdown, the number of seconds after which workers are considered timed out and then terminated

field workers: List[str] | None = None

TCP Only: A list of worker hostnames

field workflow_dir_path: str | Path | None = '.'

Optional path to the workflow directory. Autogenerated in the current directory if use_workflow_dir is specified.

field zero_resource_workers: List[int] | None = []

List of workers that require no resources. For when a fixed mapping of workers to resources is required. Otherwise, use num_resource_sets. For use with supported allocation functions

Known Platforms List

Known_platforms
pydantic model libensemble.resources.platforms.Known_platforms

A list of platforms with known configurations.

There are three ways to specify a known system. Either via platform_specs:

from libensemble.resources.platforms import PerlmutterGPU

libE_specs["platform_specs"] = PerlmutterGPU()

or via the platform option:

libE_specs["platform"] = "perlmutter_g"

or via the LIBE_PLATFORM environment variable, on the command line or in a batch submission script:

export LIBE_PLATFORM="perlmutter_g"

If the platform is not specified, libEnsemble will attempt to detect known platforms (detection is not guaranteed).

Note: libEnsemble should work on any platform and detects most system configurations correctly. These options are helpful for optimization and for cases where auto-detection encounters ambiguity or an unknown feature.

field generic_rocm: GenericROCm
field crusher: Crusher
field frontier: Frontier
field perlmutter_c: PerlmutterCPU
field perlmutter_g: PerlmutterGPU
field polaris: Polaris
field spock: Spock
field summit: Summit
field sunspot: Sunspot

Platform Fields

pydantic model libensemble.resources.platforms.Platform

Class to define attributes of a target platform.

All are optional, and any not defined will be determined by libEnsemble’s auto-detection.

field mpi_runner: str | None

MPI runner: One of "mpich", "openmpi", "aprun", "srun", "jsrun", "msmpi", "custom"

field runner_name: str | None

Literal string of MPI runner command. Only needed if different to the default

Note that "mpich" and "openmpi" runners have the default command "mpirun"

field cores_per_node: int | None

Number of physical CPU cores on a compute node of the platform

field logical_cores_per_node: int | None

Number of logical CPU cores on a compute node of the platform

field gpus_per_node: int | None

Number of GPU devices on a compute node of the platform

field gpu_setting_type: str | None

How GPUs will be assigned.

Must take one of the following string options.

  • "runner_default": Use default setting for MPI runner (same as if not set).

  • "env": Use an environment variable (comma separated list of slots)

  • "option_gpus_per_node": Expresses GPUs per node on MPI runner command line.

  • "option_gpus_per_task": Expresses GPUs per task on MPI runner command line.

With the exception of “runner_default”, the gpu_setting_name attribute is also required when this attribute is set.

If “gpu_setting_type” is not provided (same as runner_default) and the MPI runner does not have a default GPU setting in libEnsemble, and no other information is present, then the environment variable CUDA_VISIBLE_DEVICES is used.

Examples:

Use the environment variable ROCR_VISIBLE_DEVICES to assign GPUs:

"gpu_setting_type" = "env"
"gpu_setting_name" = "ROCR_VISIBLE_DEVICES"

Use the command-line option --gpus-per-node:

"gpu_setting_type" = "option_gpus_per_node"
"gpu_setting_name" = "--gpus-per-node"

field gpu_setting_name: str | None

Name of GPU setting

See gpu_setting_type for more details.

field scheduler_match_slots: bool | None

Whether the libEnsemble resource scheduler should only assign matching slots when there are multiple (partial) nodes assigned to a sim function.

Defaults to True, within libEnsemble.

Useful if setting an environment variable such as CUDA_VISIBLE_DEVICES, where the value should match on each node of an MPI run (choose True).

When using command-line options such as --gpus-per-node, which allow the system's application-level scheduler to manage GPUs, match_slots can be False (allowing more efficient scheduling when MPI runs cross nodes).
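
For instance, combining an environment-variable GPU setting with matching slots (a sketch based on the examples above):

from libensemble.resources.platforms import Platform

libE_specs["platform_specs"] = Platform(
    gpu_setting_type="env",
    gpu_setting_name="ROCR_VISIBLE_DEVICES",  # value should match across nodes of a run
    scheduler_match_slots=True,
)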

Scheduler Options

See options for built-in scheduler.

Overriding Resource Auto-Detection

Note that "cores_on_node" and "gpus_on_node" are supported for backward compatibility, but use of platform_specs is recommended for these settings.

Resource Info Fields

The allowable libE_specs["resource_info"] fields are:

"cores_on_node" [tuple (int, int)]:
    Tuple (physical cores, logical cores) on nodes.

"gpus_on_node" [int]:
    Number of GPUs on each node.

"node_file" [str]:
    Name of file containing a node-list. Default is "node_list".

"nodelist_env_slurm" [str]:
    The environment variable giving a node list in Slurm format
    (Default: Uses ``SLURM_NODELIST``).  Queried only if
    a ``node_list`` file is not provided and the resource manager is
    enabled.

"nodelist_env_cobalt" [str]:
    The environment variable giving a node list in Cobalt format
    (Default: Uses ``COBALT_PARTNAME``) Queried only
    if a ``node_list`` file is not provided and the resource manager
    is enabled.

"nodelist_env_lsf" [str]:
    The environment variable giving a node list in LSF format
    (Default: Uses ``LSB_HOSTS``) Queried only
    if a ``node_list`` file is not provided and the resource manager
    is enabled.

"nodelist_env_lsf_shortform" [str]:
    The environment variable giving a node list in LSF short-form
    format (Default: Uses ``LSB_MCPU_HOSTS``) Queried only
    if a ``node_list`` file is not provided and the resource manager is
    enabled.

For example:

customizer = {"cores_on_node": (16, 64),
              "node_file": "libe_nodes"}

libE_specs["resource_info"] = customizer

Formatting libE_stats.txt

The allowable libE_specs["stats_fmt"] fields are:

"task_timing" [bool] = ``False``:
    Outputs elapsed time for each task launched by the executor.

"task_datetime" [bool] = ``False``:
    Outputs the elapsed time and start and end time for each task launched by the executor.
    Can be used with the ``plot_libe_tasks_util_v_time.py`` script to produce task utilization plots.

"show_resource_sets" [bool] = ``False``:
    Shows the resource set IDs assigned to each worker for each call of the user function.
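
For example:

libE_specs["stats_fmt"] = {"task_timing": True, "show_resource_sets": True}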