Resource Detection
The resource manager can detect system resources and partition them among workers. The MPI Executor accesses the resources available to the current worker when launching tasks.
Node lists are detected from an environment variable on the following systems:
Scheduler | Nodelist Env. variable |
---|---|
SLURM | SLURM_NODELIST |
COBALT | COBALT_PARTNAME |
LSF | LSB_HOSTS/LSB_MCPU_HOSTS |
PBS | PBS_NODEFILE |
These environment variable names can be modified via the resource_info libE_specs option.
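As a minimal sketch, such an override might look like the following. The key name `nodelist_env_slurm` is an assumption about how resource_info names its fields, and `MY_SLURM_NODES` is a hypothetical variable name; check both against the libE_specs reference for your installed version.

```python
# Sketch only: the key name is assumed, the variable name is hypothetical.
libE_specs = {}

libE_specs["resource_info"] = {
    # Read the SLURM node list from a custom variable instead of SLURM_NODELIST.
    "nodelist_env_slurm": "MY_SLURM_NODES",
}
```

Only the scheduler entries you list would be changed under this assumption; the others keep their default variable names.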
On other systems you may have to supply a node list in a file called node_list in your run directory. For example, on the ALCF system Cooley, the session node list can be obtained as follows:
cat $COBALT_NODEFILE > node_list
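The same step can be done from a Python setup script. This is a rough sketch, not part of libEnsemble: `write_node_list` is a hypothetical helper, and unlike `cat` it drops blank lines and surrounding whitespace.

```python
import os
from pathlib import Path


def write_node_list(env_var="COBALT_NODEFILE", run_dir=".", filename="node_list"):
    """Copy the scheduler's node file into run_dir/filename, one host per line."""
    source = os.environ.get(env_var)
    if not source:
        raise RuntimeError(f"{env_var} is not set; run this inside a scheduler session")
    # Keep non-empty lines only, preserving order and any repeated hosts.
    lines = Path(source).read_text().splitlines()
    hosts = [line.strip() for line in lines if line.strip()]
    (Path(run_dir) / filename).write_text("\n".join(hosts) + "\n")
    return hosts
```

Calling `write_node_list()` from the run directory before starting the ensemble would leave a node_list file in place for the resource manager to read.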
Resource detection can be disabled by setting libE_specs["disable_resource_manager"] = True, in which case users supply run configuration options directly on the Executor submit line. This usually works well on systems that have application-level scheduling and queuing (e.g., jsrun on Summit).
However, on many cluster and multi-node systems, if the built-in resource
manager is disabled, then runs without a hostlist or machinefile supplied may be
undesirably scheduled to the same nodes.
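One way to avoid such overlap when the resource manager is disabled is to give each worker its own machinefile. The sketch below is illustrative only: `partition_hosts` and `write_machinefile` are hypothetical helpers, not libEnsemble functions, and the host names in the usage note are made up.

```python
from pathlib import Path

# Turn off built-in resource detection (the option described above).
libE_specs = {"disable_resource_manager": True}


def partition_hosts(hosts, num_workers):
    """Split hosts into num_workers contiguous, non-overlapping groups."""
    if num_workers <= 0 or len(hosts) % num_workers != 0:
        raise ValueError("hosts must divide evenly among a positive number of workers")
    size = len(hosts) // num_workers
    return [hosts[i * size:(i + 1) * size] for i in range(num_workers)]


def write_machinefile(hosts, path):
    """Write one host per line and return the path as a string."""
    Path(path).write_text("\n".join(hosts) + "\n")
    return str(path)
```

For example, `partition_hosts(["n1", "n2", "n3", "n4"], 2)` gives two workers two nodes each. Each worker could then write its group to a file and pass that file to the Executor submit call; the exact argument name for a machinefile is an assumption to confirm in the Executor documentation.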
System detection for resources can be overridden using the resource_info libE_specs option.
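A hedged illustration of overriding detected values follows. The keys `cores_on_node` and `node_file`, and the meaning of the tuple, are assumptions about the resource_info fields, and the numbers and file name are placeholders; confirm them against the libE_specs reference before use.

```python
# Sketch only: key names are assumed and the values are placeholders.
libE_specs = {}

libE_specs["resource_info"] = {
    # Assumed form: (physical cores, logical cores) on each node.
    "cores_on_node": (16, 64),
    # Assumed: read the node list from this file rather than node_list.
    "node_file": "libe_nodes",
}
```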