General Specs
libEnsemble is primarily customized by setting options within a libE_specs dictionary or using the LibeSpecs class. When provided as a Python class, options are validated immediately on instantiation.
libE_specs = {
"comm": MPI.COMM_WORLD,
"comms": "mpi",
"save_every_k_gens": 1000,
"sim_dirs_make": True,
"ensemble_dir_path": "/scratch/ensemble",
"profile_worker": False,
}
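Equivalently, a minimal sketch using the class interface (options are validated when the object is created; the local-mode values here are illustrative):
from libensemble.specs import LibeSpecs

specs = LibeSpecs(comms="local", nworkers=4)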
- “comms” [str] = "mpi":
Manager/Worker communications mode. Options are "mpi", "local", "tcp".
- “nworkers” [int]:
Number of worker processes to spawn (only in local/tcp modes)
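For example, a minimal local-mode configuration:
libE_specs = {"comms": "local", "nworkers": 4}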
- “mpi_comm” [MPI communicator] = MPI.COMM_WORLD:
libEnsemble communicator if MPI comms are being used.
- “dry_run” [bool] = False:
Whether libEnsemble should immediately exit after validating all inputs.
- “abort_on_exception” [bool] = True:
In MPI mode, whether to call MPI_ABORT on an exception. If False, an exception will be raised by the manager.
- “save_every_k_sims” [int]:
Save history array to file after every k simulated points.
- “save_every_k_gens” [int]:
Save history array to file after every k generated points.
- “save_H_and_persis_on_abort” [bool] = True:
Whether libEnsemble should save the states of H and persis_info on aborting after an error.
- “worker_timeout” [int] = 1:
When libEnsemble concludes and attempts to close down workers, the number of seconds until workers are considered timed out. Worker processes are then terminated.
- “kill_canceled_sims” [bool] = True:
Try to kill sims with "cancel_requested" set True. If False, the manager avoids this moderate overhead.
- “disable_log_files” [bool] = False:
Disable the creation of "ensemble.log" and "libE_stats.txt".
- “use_workflow_dir” [bool] = False:
Whether to place all log files, dumped arrays, and default ensemble directories in a separate workflow directory. Each run is suffixed with a hash. If copying back an ensemble directory from another location, the copy is placed here.
- “workflow_dir_path” [str]:
Optional path to the workflow directory. Autogenerated in the current directory if use_workflow_dir is specified.
- “ensemble_dir_path” [str] = "./ensemble":
Path to the main ensemble directory. Can serve as a single working directory for workers, or contain calculation directories.
libE_specs["ensemble_dir_path"] = "/scratch/my_ensemble"
- “ensemble_copy_back” [bool] = False:
Whether to copy back directories within ensemble_dir_path to the launch location. Useful if ensemble_dir_path is located on node-local storage.
- “use_worker_dirs” [bool] = False:
Whether to organize calculation directories under worker-specific directories. The two layouts are shown below:

- /ensemble_dir
    - /sim0-worker1
    - /gen1-worker1
    - /sim1-worker2
    ...

- /ensemble_dir
    - /worker1
        - /sim0
        - /gen1
        - /sim4
        ...
    - /worker2
    ...
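For example, the following settings (with calculation directories enabled via sim_dirs_make, described below) produce the second, worker-organized layout:
libE_specs["use_worker_dirs"] = True
libE_specs["sim_dirs_make"] = True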
- “sim_dirs_make” [bool] = False:
Whether to make a simulation-function-call specific working directory.
- “sim_dir_copy_files” [list]:
Paths to files or directories to copy into each sim directory, or ensemble directory.
- “sim_dir_symlink_files” [list]:
Paths to files or directories to symlink into each sim directory, or ensemble directory.
- “sim_input_dir” [str]:
Copy this directory and its contents for each simulation-specific directory. If not using calculation directories, contents are copied to the ensemble directory.
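For example, a sketch using hypothetical paths, where a template directory is copied into each sim directory, one file is copied in, and a large dataset is symlinked:
libE_specs["sim_input_dir"] = "./sim_template"
libE_specs["sim_dir_copy_files"] = ["params.json"]
libE_specs["sim_dir_symlink_files"] = ["/scratch/big_dataset"]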
- “gen_dirs_make” [bool] = False:
Whether to make a generator-function-call specific working directory. Each persistent generator creates a single directory.
- “gen_dir_copy_files” [list]:
Paths to files or directories to copy into each gen directory, or ensemble directory.
- “gen_dir_symlink_files” [list]:
Paths to files or directories to symlink into each gen directory.
- “gen_input_dir” [str]:
Copy this directory and its contents for each generator-instance specific directory. If not using calculation directories, contents are copied to the ensemble directory.
- “profile” [bool] = False:
Profile manager and worker logic using cProfile.
- “safe_mode” [bool] = True:
Prevents user functions from overwriting internal fields, but requires moderate overhead.
- “stats_fmt” [dict]:
A dictionary of options for formatting "libE_stats.txt". See “Formatting Options for libE_stats File” for more options.
- “workers” [list]:
TCP Only: A list of worker hostnames.
- “ip” [str]:
TCP Only: IP address for Manager’s system
- “port” [int]:
TCP Only: Port number for Manager’s system
- “authkey” [str]:
TCP Only: Authkey for Manager’s system
- “workerID” [int]:
TCP Only: Worker ID number assigned to the new process.
- “worker_cmd” [list]:
TCP Only: Split string corresponding to worker/client Python process invocation. Contains a local Python path, calling script, and manager/server format-fields for manager_ip, manager_port, authkey, and workerID. nworkers is specified normally.
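A minimal sketch of a TCP-mode setup (the hostnames, IP, port, and authkey values are illustrative only):
libE_specs = {
    "comms": "tcp",
    "nworkers": 2,
    "workers": ["node001", "node002"],
    "ip": "10.0.0.1",
    "port": 9999,
    "authkey": "my_authkey",
}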
- “use_persis_return_gen” [bool] = False:
Adds persistent generator function H return to the manager’s history array.
- “use_persis_return_sim” [bool] = False:
Adds persistent simulator function H return to the manager’s history array.
- “final_fields” [list] = []:
List of fields in H that the manager will return to persistent workers along with the PERSIS_STOP tag at the end of the run.
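For example, assuming the history contains the conventional fields "x" and "f" (field names depend on your sim/gen specs):
libE_specs["final_fields"] = ["x", "f", "sim_id"]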
- “disable_resource_manager” [bool] = False:
Disable the built-in resource manager, including automatic resource detection and/or assignment of resources to workers. "resource_info" will be ignored.
- “platform” [str]:
Name of a known platform, e.g.,
libE_specs["platform"] = "perlmutter_g"
Alternatively, specify by setting the LIBE_PLATFORM environment variable.
- “platform_specs” [Platform|dict]:
A Platform object (or dictionary) specifying settings for a platform. Fields not provided will be auto-detected. Can be set to a known platform object.
- “num_resource_sets” [int]:
The total number of resource sets into which resources will be divided. By default, resources will be divided by workers (excluding zero_resource_workers).
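For example, to divide resources into eight resource sets (illustrative count):
libE_specs["num_resource_sets"] = 8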
- “enforce_worker_core_bounds” [bool] = False:
If False (default), permit submission of tasks with a higher processor count than the CPUs available to the worker. Larger node counts are not allowed. Ignored when disable_resource_manager is set.
- “dedicated_mode” [bool] = False:
Disallow any resources running libEnsemble processes (manager and workers) from being valid targets for app submissions.
- “zero_resource_workers” [list of ints]:
List of workers (by IDs) that require no resources. For when a fixed mapping of workers to resources is required. Otherwise, use "num_resource_sets". For use with supported allocation functions.
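For example, to mark worker 1 as requiring no resources (a common choice when it hosts a persistent generator):
libE_specs["zero_resource_workers"] = [1]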
- “resource_info” [dict]:
Provide resource information that will override automatically detected resources. The allowable fields are given below in “Overriding Auto-detection”. Ignored if "disable_resource_manager" is set.
- “scheduler_opts” [dict]:
Options for the resource scheduler. See “Scheduler Options” for more options.
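For example, a sketch assuming the match_slots option (see “Scheduler Options” and the related scheduler_match_slots platform field below):
libE_specs["scheduler_opts"] = {"match_slots": False}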
Complete Class API
- pydantic model libensemble.specs.LibeSpecs
Specifications for configuring libEnsemble’s runtime behavior. Equivalent to a libE_specs dictionary.
- field abort_on_exception: bool | None = True
In MPI mode, whether to call MPI_ABORT on an exception. If False, an exception will be raised by the manager
- field authkey: str | None = 'libE_auth_1015'
TCP Only: Authkey for Manager’s system
- field comms: str | None = 'mpi'
Manager/Worker communications mode. 'mpi', 'local', or 'tcp'
- field dedicated_mode: bool | None = False
Instructs libEnsemble to not run applications on resources where libEnsemble processes (manager and workers) are running
- field disable_log_files: bool | None = False
Disable the creation of ensemble.log and libE_stats.txt log files
- field disable_resource_manager: bool | None = False
Disable the built-in resource manager. If True, automatic resource detection and/or assignment of resources to workers is disabled. resource_info will also be ignored
- field dry_run: bool | None = False
Whether libEnsemble should immediately exit after validating all inputs
- field enforce_worker_core_bounds: bool | None = False
If False, the Executor will permit submission of tasks with a higher processor count than the CPUs available to the worker as detected by the resource manager. Larger node counts are not allowed. When "disable_resource_manager" is True, this argument is ignored
- field ensemble_copy_back: bool | None = False
Whether to copy back directories within ensemble_dir_path to the launch location. Useful if the ensemble directory is placed on node-local storage
- field ensemble_dir_path: str | Path | None = PosixPath('ensemble')
Path to main ensemble directory containing calculation directories. Can serve as single working directory for workers, or contain calculation directories
- field final_fields: List[str] | None = []
List of fields in H that the manager will return to persistent workers along with the PERSIS_STOP tag at the end of a run
- field gen_dir_copy_files: List[str | Path] | None = []
Paths to files or directories to copy into each generator or ensemble directory. List of strings or pathlib.Path objects
- field gen_dir_symlink_files: List[str | Path] | None = []
Paths to files or directories to symlink into each generator directory. List of strings or pathlib.Path objects
- field gen_dirs_make: bool | None = False
Whether to make generator-specific calculation directories for each generator function call. By default all workers operate within the top-level ensemble directory
- field gen_input_dir: str | Path | None = None
Copy this directory and its contents for each generator-instance-specific directory. If not using calculation directories, contents are copied to the ensemble directory
- field ip: str | None = None
TCP Only: IP address for Manager’s system
- field kill_canceled_sims: bool | None = True
Instructs libEnsemble to send kill signals to sims with their cancel_requested field set. If False, the manager avoids this moderate overhead
- field mpi_comm: MPI_Communicator | None = None
libEnsemble communicator. Default: MPI.COMM_WORLD
- field num_resource_sets: int | None = None
Total number of resource sets. Resources will be divided into this number. If not set, resources will be divided evenly (excluding zero_resource_workers).
- field nworkers: int | None = None
Number of worker processes to spawn (only in local/tcp modes)
- field platform: str | None = ''
Name of a known platform defined in the platforms module.
Example:
libE_specs["platform"] = "perlmutter_g"
Note: the environment variable LIBE_PLATFORM is an alternative way of setting the platform.
E.g., on command line or batch submission script:
export LIBE_PLATFORM="perlmutter_g"
See also option platform_specs.
- field platform_specs: Platform | dict | None = {}
A Platform object (or dictionary) specifying settings for a platform.
Example usage in a calling script.
To use an existing platform:
from libensemble.resources.platforms import PerlmutterGPU

libE_specs["platform_specs"] = PerlmutterGPU()
Or define a platform:
from libensemble.resources.platforms import Platform

libE_specs["platform_specs"] = Platform(
    mpi_runner="srun",
    cores_per_node=64,
    logical_cores_per_node=128,
    gpus_per_node=8,
    gpu_setting_type="runner_default",
    scheduler_match_slots=False,
)
For a list of Platform fields see Platform Fields. Any fields not given will be auto-detected by libEnsemble.
See also option platform.
- field port: int | None = 0
TCP Only: Port number for Manager’s system
- field profile: bool | None = False
Profile manager and worker logic using cProfile
- field resource_info: dict | None = {}
Resource information to override automatically detected resources. Allowed fields are given below in ‘Overriding Auto-detection’. Note that if disable_resource_manager is set then this option is ignored
- field safe_mode: bool | None = True
Prevents user functions from overwriting protected History fields, but requires moderate overhead
- field save_H_and_persis_on_abort: bool | None = True
Save states of H and persis_info on aborting after an exception
- field save_every_k_gens: int | None = 0
Save history array to file after every k generated points
- field save_every_k_sims: int | None = 0
Save history array to file after every k evaluated points
- field scheduler_opts: dict | None = {}
Options for the resource scheduler. See ‘Scheduler Options’ for more info
- field sim_dir_copy_files: List[str | Path] | None = []
Paths to files or directories to copy into each simulation or ensemble directory. List of strings or pathlib.Path objects
- field sim_dir_symlink_files: List[str | Path] | None = []
Paths to files or directories to symlink into each simulation directory. List of strings or pathlib.Path objects
- field sim_dirs_make: bool | None = False
Whether to make simulation-specific calculation directories for each simulation function call. By default all workers operate within the top-level ensemble directory
- field sim_input_dir: str | Path | None = None
Copy this directory and its contents for each simulation-specific directory. If not using calculation directories, contents are copied to the ensemble directory
- field stats_fmt: dict | None = {}
Options for formatting ‘libE_stats.txt’. See ‘Formatting Options for libE_stats File’ for more info
- field use_persis_return_gen: bool | None = False
Adds persistent generator output fields to the History array on return
- field use_persis_return_sim: bool | None = False
Adds persistent simulator output fields to the History array on return
- field use_worker_dirs: bool | None = False
Whether to organize calculation directories under worker-specific directories
- field use_workflow_dir: bool | None = False
Whether to place all log files, dumped arrays, and default ensemble-directories in a separate workflow directory. New runs and their workflow directories will be automatically differentiated. If copying back an ensemble directory from a scratch space, the copy is placed in the workflow directory.
- field workerID: int | None = None
TCP Only: Worker ID number assigned to the new process
- field worker_cmd: List[str] | None = None
TCP Only: Split string corresponding to worker/client Python process invocation. Contains a local Python path, calling script, and manager/server format-fields for manager_ip, manager_port, authkey, and workerID. nworkers is specified normally
- field worker_timeout: int | None = 1
On libEnsemble shutdown, number of seconds after which workers considered timed out, then terminated
- field workers: List[str] | None = None
TCP Only: A list of worker hostnames
- field workflow_dir_path: str | Path | None = '.'
Optional path to the workflow directory. Autogenerated in the current directory if use_workflow_dir is specified.
- field zero_resource_workers: List[int] | None = []
List of workers that require no resources. For when a fixed mapping of workers to resources is required. Otherwise, use num_resource_sets. For use with supported allocation functions
Known Platforms List
- pydantic model libensemble.resources.platforms.Known_platforms
A list of platforms with known configurations.
There are three ways to specify a known system:
1. Set a platform object:
from libensemble.resources.platforms import PerlmutterGPU

libE_specs["platform_specs"] = PerlmutterGPU()
2. Set the platform name:
libE_specs["platform"] = "perlmutter_g"
3. On the command line or in a batch submission script:
export LIBE_PLATFORM="perlmutter_g"
If the platform is not specified, libEnsemble will attempt to detect known platforms (this is not guaranteed).
Note: libEnsemble should work on any platform, and detects most system configurations correctly. These options are helpful for optimization and where auto-detection encounters ambiguity or an unknown feature.
- field generic_rocm: GenericROCm
- field crusher: Crusher
- field frontier: Frontier
- field perlmutter_c: PerlmutterCPU
- field perlmutter_g: PerlmutterGPU
- field polaris: Polaris
- field spock: Spock
- field summit: Summit
- field sunspot: Sunspot
Platform Fields
- pydantic model libensemble.resources.platforms.Platform
Class to define attributes of a target platform.
All are optional, and any not defined will be determined by libEnsemble’s auto-detection.
- field mpi_runner: str | None
MPI runner: One of "mpich", "openmpi", "aprun", "srun", "jsrun", "msmpi", "custom"
- field runner_name: str | None
Literal string of MPI runner command. Only needed if different to the default.
Note that "mpich" and "openmpi" runners have the default command "mpirun"
- field cores_per_node: int | None
Number of physical CPU cores on a compute node of the platform
- field logical_cores_per_node: int | None
Number of logical CPU cores on a compute node of the platform
- field gpus_per_node: int | None
Number of GPU devices on a compute node of the platform
- field gpu_setting_type: str | None
How GPUs will be assigned.
Must take one of the following string options:
- "runner_default": Use the default setting for the MPI runner (same as if not set).
- "env": Use an environment variable (comma-separated list of slots).
- "option_gpus_per_node": Expresses GPUs per node on the MPI runner command line.
- "option_gpus_per_task": Expresses GPUs per task on the MPI runner command line.
With the exception of "runner_default", the gpu_setting_name attribute is also required when this attribute is set.
If "gpu_setting_type" is not provided (same as "runner_default") and the MPI runner does not have a default GPU setting in libEnsemble, and no other information is present, then the environment variable CUDA_VISIBLE_DEVICES is used.
Examples:
Use the environment variable ROCR_VISIBLE_DEVICES to assign GPUs:
"gpu_setting_type" = "env"
"gpu_setting_name" = "ROCR_VISIBLE_DEVICES"
Use the command line option --gpus-per-node:
"gpu_setting_type" = "option_gpus_per_node"
"gpu_setting_name" = "--gpus-per-node"
- field gpu_setting_name: str | None
Name of GPU setting. See gpu_setting_type for more details.
- field scheduler_match_slots: bool | None
Whether the libEnsemble resource scheduler should only assign matching slots when there are multiple (partial) nodes assigned to a sim function.
Defaults to True within libEnsemble.
Useful if setting an environment variable such as CUDA_VISIBLE_DEVICES, where the value should match on each node of an MPI run (choose True).
When using command-line options such as --gpus-per-node, which allow the system’s application-level scheduler to manage GPUs, then match_slots can be False (allowing for more efficient scheduling when MPI runs cross nodes).
Scheduler Options
See options for the built-in scheduler.
Overriding Resource Auto-Detection
Note that "cores_on_node"
and "gpus_on_node"
are supported for backward
compatibility, but use of platform_specs
is recommended for these settings.
Resource Info Fields
The allowable libE_specs["resource_info"]
fields are:
"cores_on_node" [tuple (int, int)]:
Tuple (physical cores, logical cores) on nodes.
"gpus_on_node" [int]:
Number of GPUs on each node.
"node_file" [str]:
Name of file containing a node-list. Default is "node_list".
"nodelist_env_slurm" [str]:
The environment variable giving a node list in Slurm format
(Default: Uses ``SLURM_NODELIST``). Queried only if
a ``node_list`` file is not provided and the resource manager is
enabled.
"nodelist_env_cobalt" [str]:
The environment variable giving a node list in Cobalt format
(Default: Uses ``COBALT_PARTNAME``) Queried only
if a ``node_list`` file is not provided and the resource manager
is enabled.
"nodelist_env_lsf" [str]:
The environment variable giving a node list in LSF format
(Default: Uses ``LSB_HOSTS``) Queried only
if a ``node_list`` file is not provided and the resource manager
is enabled.
"nodelist_env_lsf_shortform" [str]:
The environment variable giving a node list in LSF short-form
format (Default: Uses ``LSB_MCPU_HOSTS``) Queried only
if a ``node_list`` file is not provided and the resource manager is
enabled.
For example:
customizer = {"cores_on_node": (16, 64),
              "node_file": "libe_nodes"}
libE_specs["resource_info"] = customizer
Formatting libE_stats.txt
The allowable libE_specs["stats_fmt"]
fields are:
"task_timing" [bool] = ``False``:
Outputs elapsed time for each task launched by the executor.
"task_datetime" [bool] = ``False``:
Outputs the elapsed time and start and end time for each task launched by the executor.
Can be used with the ``plot_libe_tasks_util_v_time.py`` script to give task utilization plots.
"show_resource_sets" [bool] = ``False``:
Shows the resource set IDs assigned to each worker for each call of the user function.
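For example, to enable per-task timing and resource-set reporting using the fields above:
libE_specs["stats_fmt"] = {"task_timing": True, "show_resource_sets": True}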