Resources Module

This module detects and returns system resources.

class resources.resources.Resources(libE_specs, platform_info={}, top_level_dir=None)

Provides system resources to libEnsemble and executor.

A resources instance is always initialized unless libE_specs["disable_resource_manager"] is True.

Class Attributes:

Variables:

  • resources (Resources) – The resources object is stored here and can be retrieved in user functions.

Parameters:
  • libE_specs (dict) –

  • platform_info (dict) –

  • top_level_dir (str) –

Object Attributes:

These are set on initialization.

Variables:
  • top_level_dir (string) – Directory in which to search for the node_list file.

  • glob_resources (GlobalResources) – Maintains resources available to libEnsemble.

The following are set up after manager/worker fork.

The resource manager is set up only on the manager, while the worker resources object is set up on workers.

Variables:
  • resource_manager (ResourceManager) – An object that manages resource set assignment to workers.

  • worker_resources (WorkerResources) – An object that contains worker-specific resources.

__init__(libE_specs, platform_info={}, top_level_dir=None)

Initialize a new resources object.

Parameters:
  • libE_specs (dict) –

  • platform_info (dict) –

  • top_level_dir (str | None) –

Return type:

None

classmethod init_resources(libE_specs, platform_info={})

Initiate resource management

Parameters:
  • libE_specs (dict) –

  • platform_info (dict) –

Return type:

None
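
The pattern described above (a classmethod that creates the resources object and stores it in a class attribute, unless libE_specs["disable_resource_manager"] is True) can be sketched as follows. This is a minimal illustration, not libEnsemble's implementation: the class name SketchResources and its body are hypothetical, and no resource detection is performed.

```python
class SketchResources:
    """Hypothetical stand-in for the Resources class (not libEnsemble code)."""

    # Class attribute: holds the shared instance, or None if disabled.
    resources = None

    def __init__(self, libE_specs, platform_info=None, top_level_dir=None):
        self.libE_specs = libE_specs
        self.platform_info = platform_info or {}
        self.top_level_dir = top_level_dir

    @classmethod
    def init_resources(cls, libE_specs, platform_info=None):
        """Store a shared instance unless resource management is disabled."""
        if libE_specs.get("disable_resource_manager", False):
            cls.resources = None
        else:
            cls.resources = cls(libE_specs, platform_info)


# With the resource manager enabled, the class attribute is populated.
SketchResources.init_resources({"disable_resource_manager": False})
enabled = SketchResources.resources is not None

# With it disabled, no instance is stored.
SketchResources.init_resources({"disable_resource_manager": True})
disabled = SketchResources.resources is None
```

Storing the instance on the class lets any code that can import the class reach the shared object without passing it through every call.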

set_worker_resources(num_workers, workerid)

Initiate the worker resources component of resources

Parameters:
  • num_workers (int) –

  • workerid (int) –

Return type:

None

set_resource_manager(num_workers)

Initiate the resource manager component of resources

Parameters:
  • num_workers (int) –

Return type:

None

add_comm_info(libE_nodes)

Adds comms-specific information to resources

Removes libEnsemble nodes from nodelist if in dedicated_mode.

Return type:

None

class resources.resources.GlobalResources(libE_specs, platform_info={}, top_level_dir=None)

Object Attributes:

These are set on initialization.

Variables:
  • top_level_dir (str) – Directory in which to search for the node_list file.

  • env_resources (EnvResources) – Object storing environment variables used by resources.

  • global_nodelist (list) – List of all nodes available for running user applications.

  • logical_cores_avail_per_node (int) – Logical cores (including SMT threads) available on a node.

  • physical_cores_avail_per_node (int) – Physical cores available on a node.

  • zero_resource_workers (list) – List of workerIDs to have no resources.

  • dedicated_mode (bool) – Whether to remove libE nodes from global nodelist.

  • num_resource_sets (int) – Number of resource sets, if supplied by the user.

Parameters:
  • libE_specs (dict) –

  • platform_info (dict) –

  • top_level_dir (str) –

__init__(libE_specs, platform_info={}, top_level_dir=None)

Initializes a new GlobalResources instance.

Determines the compute resources available for current allocation, including node list and cores/hardware threads available within nodes.

The following parameters may be extracted from libE_specs:

Parameters:
  • top_level_dir (str, Optional) – Directory libEnsemble runs in (default is current working directory)

  • dedicated_mode (bool, Optional) – If True, then dedicate nodes to running libEnsemble. Dedicated mode means that any nodes running libE processes (manager and workers) will not be available to worker-launched tasks (user applications). They will be removed from the nodelist (if present) before dividing into resource sets.

  • zero_resource_workers (List[int], Optional) – List of workers that require no resources.

  • num_resource_sets (int, Optional) – The total number of resource sets. Resources will be divided into this number. Default: None. If None, resources will be divided by workers (excluding zero_resource_workers).

  • cores_on_node (tuple (int, int), Optional) – If supplied, gives (physical cores, logical cores) for the nodes. If not supplied, this will be auto-detected.

  • gpus_on_node (int, Optional) – If supplied, gives the number of GPUs for the nodes. If not supplied, this will be auto-detected.

  • enforce_worker_core_bounds (bool, Optional) – If True, libEnsemble’s executor will raise an exception if it detects that a worker has been instructed to launch a task that requests more processes than the cores allocated to that worker can support, or too few processes to use the allocated cores.

  • node_file (str, Optional) – If supplied, gives the name of a file in the run directory to use as a node list by libEnsemble. Defaults to a file named “node_list”. If the file does not exist, then the node list will be auto-detected.

  • nodelist_env_slurm (str, Optional) – The environment variable giving a node list in Slurm format (Default: uses SLURM_NODELIST). Note: This is queried only if a node_list file is not provided.

  • nodelist_env_cobalt (str, Optional) – The environment variable giving a node list in Cobalt format (Default: uses COBALT_PARTNAME). Note: This is queried only if a node_list file is not provided.

  • nodelist_env_lsf (str, Optional) – The environment variable giving a node list in LSF format (Default: uses LSB_HOSTS). Note: This is queried only if a node_list file is not provided.

  • nodelist_env_lsf_shortform (str, Optional) – The environment variable giving a node list in LSF short-form format (Default: uses LSB_MCPU_HOSTS). Note: This is queried only if a node_list file is not provided.

  • libE_specs (dict) –

  • platform_info (dict) –

Return type:

None
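
The num_resource_sets rule in the parameter list above (use the supplied value, otherwise divide by workers excluding zero_resource_workers) can be illustrated with a short sketch. The function name count_resource_sets is hypothetical, and the sketch assumes every ID in zero_resource_workers refers to an existing worker; it is not taken from libEnsemble.

```python
def count_resource_sets(num_workers, zero_resource_workers=(), num_resource_sets=None):
    """Return how many resource sets the resources would be divided into.

    Hypothetical helper: follows the rule stated in the parameter list,
    assuming each zero-resource worker ID is a valid worker.
    """
    if num_resource_sets is not None:
        # An explicit user-supplied value takes precedence.
        return num_resource_sets
    # Otherwise, one resource set per worker that needs resources.
    return num_workers - len(set(zero_resource_workers))


# Four workers, one of which (worker 1) needs no resources.
default_sets = count_resource_sets(4, zero_resource_workers=[1])

# The same setup with an explicit number of resource sets.
explicit_sets = count_resource_sets(4, zero_resource_workers=[1], num_resource_sets=8)
```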

add_comm_info(libE_nodes)

Adds comms-specific information to resources

Removes libEnsemble nodes from nodelist if in dedicated_mode.

update_scheduler_opts(scheduler_opts)

Add scheduler options from platform_info, if not present

static is_nodelist_shortnames(nodelist)

Returns False if any entry contains a ‘.’, else True
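
The check described above is small enough to sketch directly. The function below is an illustrative reimplementation based only on the one-line description, not the library's source; the name shortnames_sketch and the node names are made up.

```python
def shortnames_sketch(nodelist):
    """Return False if any entry contains a '.', else True."""
    return not any("." in node for node in nodelist)


short = shortnames_sketch(["node-1", "node-2"])
mixed = shortnames_sketch(["node-1", "node-2.cluster.example.org"])
```

Here short is True and mixed is False, since one entry in the second list is a dotted (fully qualified) name.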

static remove_nodes(global_nodelist_in, remove_list)

Removes any nodes in remove_list from the global nodelist
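
A minimal sketch of this kind of filtering is shown below, for example to drop the nodes running libE processes in dedicated mode. It assumes exact string matches between the two lists; any reconciliation of short and fully qualified host names is outside this sketch. The name remove_nodes_sketch is hypothetical.

```python
def remove_nodes_sketch(global_nodelist_in, remove_list):
    """Return the node list without any node named in remove_list.

    Assumes names match exactly; order of the remaining nodes is kept.
    """
    to_remove = set(remove_list)
    return [node for node in global_nodelist_in if node not in to_remove]


all_nodes = ["node-1", "node-2", "node-3", "node-4"]
libE_nodes = ["node-1"]  # e.g., a node running the manager and workers
app_nodes = remove_nodes_sketch(all_nodes, libE_nodes)
```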

static get_global_nodelist(node_file='node_list', rundir=None, env_resources=None)

Returns the list of nodes available to all libEnsemble workers.

If a node_file exists, it is used; otherwise the environment is interrogated for a node list. If a dedicated manager node is used, then a node_file is recommended.

In dedicated mode, any node with a libE worker is removed from the list.
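
The file-first, environment-second lookup described above can be sketched as follows. Several details are assumptions made for illustration: the node file is taken to hold one node name per line, the environment variable SKETCH_NODELIST is hypothetical and holds whitespace-separated names, and no scheduler-specific formats (Slurm, Cobalt, LSF) are parsed.

```python
import os
import tempfile


def get_nodelist_sketch(node_file="node_list", rundir=None, env_var="SKETCH_NODELIST"):
    """Return node names from a file if it exists, else from the environment."""
    rundir = rundir or os.getcwd()
    path = os.path.join(rundir, node_file)
    if os.path.isfile(path):
        # Assumed file format: one node name per line, blank lines ignored.
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]
    # Fallback: hypothetical variable with whitespace-separated node names.
    return os.environ.get(env_var, "").split()


with tempfile.TemporaryDirectory() as rundir:
    os.environ["SKETCH_NODELIST"] = "node-3 node-4"

    # No node file yet, so the environment is used.
    from_env = get_nodelist_sketch(rundir=rundir)

    # Once a node file exists, it takes precedence over the environment.
    with open(os.path.join(rundir, "node_list"), "w") as f:
        f.write("node-1\nnode-2\n")
    from_file = get_nodelist_sketch(rundir=rundir)
```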