Resources Module

This module detects and returns system resources.

class resources.resources.Resources(top_level_dir=None, central_mode=False, zero_resource_workers=[], allow_oversubscribe=False, launcher=None, cores_on_node=None, node_file=None, nodelist_env_slurm=None, nodelist_env_cobalt=None, nodelist_env_lsf=None, nodelist_env_lsf_shortform=None)

Provides system resources to libEnsemble and executor.

This is initialized when the executor is created with auto_resources set to True.

Object Attributes:

These are set on initialization.

Variables
  • top_level_dir (string) – Directory that is searched for the node_list file

  • central_mode (boolean) – If true, then running in central mode; otherwise distributed

  • env_resources (EnvResources) – An object storing environment variables used by resources

  • global_nodelist (list) – A list of all nodes available for running user applications

  • logical_cores_avail_per_node (int) – Logical cores (including SMT threads) available on a node

  • physical_cores_avail_per_node (int) – Physical cores available on a node

  • worker_resources (WorkerResources) – An object that can contain worker specific resources

__init__(top_level_dir=None, central_mode=False, zero_resource_workers=[], allow_oversubscribe=False, launcher=None, cores_on_node=None, node_file=None, nodelist_env_slurm=None, nodelist_env_cobalt=None, nodelist_env_lsf=None, nodelist_env_lsf_shortform=None)

Initializes a new Resources instance

Determines the compute resources available for current allocation, including node list and cores/hardware threads available within nodes.

Parameters
  • top_level_dir (string, optional) – Directory libEnsemble runs in (default is current working directory)

  • central_mode (boolean, optional) – If true, then running in central mode; otherwise distributed. Central mode means libE processes (manager and workers) are grouped together and do not share nodes with applications. Distributed mode means workers share nodes with applications.

  • zero_resource_workers (list of ints, optional) – List of workers that require no resources.

  • allow_oversubscribe (boolean, optional) – If false, then resources will raise an error if task process counts exceed the CPUs available to the worker, as detected by auto_resources. Larger node counts will always raise an error. When auto_resources is off, this argument is ignored.

  • launcher (String, optional) – The name of the job launcher, such as mpirun or aprun. This may be used to obtain intranode information by launching a probing job onto the compute nodes. If not present, the local node will be used to obtain this information.

  • cores_on_node (tuple (int,int), optional) – If supplied, gives (physical cores, logical cores) for the nodes. If not supplied, this will be auto-detected.

  • node_file (String, optional) – If supplied, gives the name of a file in the run directory to use as a node list for libEnsemble. Defaults to a file named ‘node_list’. If the file does not exist, then the node list will be auto-detected.

  • nodelist_env_slurm (String, optional) – The environment variable giving a node list in Slurm format (Default: uses SLURM_NODELIST). Note: This is queried only if a node_list file is not provided and auto_resources=True.

  • nodelist_env_cobalt (String, optional) – The environment variable giving a node list in Cobalt format (Default: uses COBALT_PARTNAME). Note: This is queried only if a node_list file is not provided and auto_resources=True.

  • nodelist_env_lsf (String, optional) – The environment variable giving a node list in LSF format (Default: uses LSB_HOSTS). Note: This is queried only if a node_list file is not provided and auto_resources=True.

  • nodelist_env_lsf_shortform (String, optional) – The environment variable giving a node list in LSF short-form format (Default: uses LSB_MCPU_HOSTS). Note: This is queried only if a node_list file is not provided and auto_resources=True.

add_comm_info(libE_nodes)

Adds comms-specific information to resources

Removes libEnsemble nodes from nodelist if in central_mode.

static get_MPI_variant()

Returns MPI base implementation

Returns

mpi_variant – MPI variant ‘aprun’ or ‘jsrun’ or ‘mpich’ or ‘openmpi’

Return type

string

static is_nodelist_shortnames(nodelist)

Returns True if any entry contains a ‘.’, else False

static remove_nodes(global_nodelist_in, remove_list)

Removes any nodes in remove_list from the global nodelist
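The filtering described above can be illustrated with a minimal sketch. The function name remove_nodes_sketch is hypothetical and this is not the library's implementation; in particular, the real method may also normalize host names before comparing, which this sketch does not attempt.

```python
def remove_nodes_sketch(global_nodelist_in, remove_list):
    """Return the nodes that are not in remove_list, preserving their order.

    Hypothetical illustration only: names are compared exactly as given.
    """
    to_remove = set(remove_list)
    return [node for node in global_nodelist_in if node not in to_remove]


nodes = ["nid0001", "nid0002", "nid0003"]
print(remove_nodes_sketch(nodes, ["nid0002"]))  # ['nid0001', 'nid0003']
```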

static best_split(a, n)

Creates the most even split of list a into n parts and returns a list of lists.
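One common way to produce such a split is with divmod, so that part lengths differ by at most one. The sketch below assumes contiguous parts and that the longer parts come first; best_split_sketch is a hypothetical name, and the library's own ordering or return type may differ.

```python
def best_split_sketch(a, n):
    """Split list a into n contiguous parts whose lengths differ by at most one."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k, m = divmod(len(a), n)
    # The first m parts receive k + 1 items; the remaining n - m parts receive k.
    return [a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


print(best_split_sketch(["n1", "n2", "n3", "n4", "n5"], 2))
# [['n1', 'n2', 'n3'], ['n4', 'n5']]
```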

static get_global_nodelist(node_file='node_list', rundir=None, env_resources=None)

Returns the list of nodes available to all libEnsemble workers.

If a node_file exists, it is used; otherwise the environment is interrogated for a node list. If a dedicated manager node is used, then a node_file is recommended.

In central mode, any node with a libE worker is removed from the list.
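The file-first, environment-second lookup can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the file is assumed to hold one host name per line, and EXAMPLE_NODELIST is a hypothetical environment variable holding comma-separated host names. Real schedulers such as Slurm use richer, compressed formats that need dedicated parsing.

```python
import os


def get_nodelist_sketch(node_file="node_list", rundir=None, env_var="EXAMPLE_NODELIST"):
    """Prefer a node-list file in rundir; otherwise fall back to the environment."""
    rundir = rundir or os.getcwd()
    path = os.path.join(rundir, node_file)
    if os.path.isfile(path):
        with open(path) as f:
            # Assumed layout: one host name per line; blank lines are skipped.
            return [line.strip() for line in f if line.strip()]
    # Assumed format: comma-separated host names in a hypothetical variable.
    raw = os.environ.get(env_var, "")
    return [name.strip() for name in raw.split(",") if name.strip()]
```

Calling get_nodelist_sketch() in a directory without a node_list file returns whatever the environment variable provides, or an empty list if it is unset.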

class resources.resources.WorkerResources(workerID, comm, resources)

Provide system resources per worker to libEnsemble and executor.

Object Attributes:

These are set on initialization.

Variables
  • num_workers (int) – Total number of workers

  • workerID (int) – workerID

  • local_nodelist (list) – A list of all nodes assigned to this worker

  • local_node_count (int) – The number of nodes available to this worker (rounded up to whole number)

  • workers_per_node (int) – The number of workers per node (if using subnode workers)

__init__(workerID, comm, resources)

Initializes a new WorkerResources instance

Determines the compute resources available for current worker, including node list and cores/hardware threads available within nodes.

Parameters
  • workerID (int) – workerID of current process

  • comm (Comm) – The Comm object for manager/worker communications

  • resources (Resources) – A Resources object containing global nodelist and intranode information

static map_workerid_to_index(num_workers, workerID, zero_resource_list)

Map WorkerID to index into a nodelist
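A plausible reading of this mapping is sketched below. It assumes that worker IDs start at 1 and that zero-resource workers take no slot in the node list, so each one with a lower ID shifts the index down by one. The name map_workerid_to_index_sketch and the error handling are hypothetical.

```python
def map_workerid_to_index_sketch(num_workers, workerID, zero_resource_list):
    """Map a 1-based workerID to a 0-based index that skips zero-resource workers."""
    if not 1 <= workerID <= num_workers:
        raise ValueError("workerID must be between 1 and num_workers")
    if workerID in zero_resource_list:
        raise ValueError("a zero-resource worker has no index into the node list")
    # Each zero-resource worker with a smaller ID frees one earlier slot.
    skipped = sum(1 for w in zero_resource_list if w < workerID)
    return workerID - 1 - skipped


# With worker 1 holding no resources, worker 3 maps to index 1.
print(map_workerid_to_index_sketch(4, 3, [1]))  # 1
```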

static get_workers2assign2(num_workers, resources)

Returns workers to assign resources to

static even_assignment(nnodes, nworkers)

Returns True if workers are evenly distributed to nodes, else False
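One way to express such a check is a pair of divisibility tests: either every worker receives the same whole number of nodes, or every node hosts the same whole number of workers. The sketch below assumes that definition of "even"; even_assignment_sketch is a hypothetical name.

```python
def even_assignment_sketch(nnodes, nworkers):
    """Return True if nodes divide evenly among workers, or workers among nodes."""
    if nnodes <= 0 or nworkers <= 0:
        raise ValueError("nnodes and nworkers must be positive")
    return nnodes % nworkers == 0 or nworkers % nnodes == 0


print(even_assignment_sketch(4, 2))  # True: two nodes per worker
print(even_assignment_sketch(2, 4))  # True: two workers per node
print(even_assignment_sketch(3, 2))  # False: no even division either way
```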

static expand_list(nnodes, nworkers, nodelist)

Duplicates each element of nodelist to best map workers to nodes.

Returns node list with duplicates, and a list of local (on-node) worker counts, both indexed by worker.
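The duplication can be sketched as below, assuming there are at least as many workers as nodes and that any remainder workers go to the first nodes in the list. The name expand_list_sketch is hypothetical, and the library may place the remainder differently.

```python
def expand_list_sketch(nnodes, nworkers, nodelist):
    """Repeat each node once per worker it hosts.

    Returns (expanded_nodelist, local_worker_counts), both indexed by worker.
    """
    if nnodes <= 0 or len(nodelist) != nnodes:
        raise ValueError("nodelist must contain exactly nnodes entries")
    k, m = divmod(nworkers, nnodes)
    expanded, local_counts = [], []
    for i, node in enumerate(nodelist):
        # The first m nodes host one extra worker (assumed placement).
        count = k + 1 if i < m else k
        expanded.extend([node] * count)
        local_counts.extend([count] * count)
    return expanded, local_counts


print(expand_list_sketch(2, 5, ["n1", "n2"]))
# (['n1', 'n1', 'n1', 'n2', 'n2'], [3, 3, 3, 2, 2])
```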

static get_local_nodelist(num_workers, workerID, resources)

Returns the list of nodes available to the current worker

Assumes that self.global_nodelist has already been calculated (in __init__) and that non-application nodes have already been removed from it.