Allocation Functions
Although the included allocation functions, or alloc_f
’s are sufficient for
most users, those who want to fine-tune how data or resources are allocated to their gen_f
and sim_f
can write their own. The alloc_f
is unique since it is called
by libEnsemble’s manager instead of a worker.
Most alloc_f
function definitions written by users resemble:
def my_allocator(W, H, sim_specs, gen_specs, alloc_specs, persis_info, libE_info):
Where W is an array containing information
about each worker’s state, H is the trimmed History array,
containing rows initialized by the generator, and libE_info
is a set of selected
statistics for allocation functions to better determine the progress of work or
if various exit conditions have been reached.
Inside an alloc_f
, most users first check that it’s appropriate to allocate work,
since if all workers are busy or the maximum amount of work has been evaluated, proceeding
with the allocation function is usually not necessary:
if libE_info["sim_max_given"] or not libE_info["any_idle_workers"]:
return {}, persis_info
If allocation is to continue, a support class is instantiated (see below), and a Work dictionary is initialized:
manage_resources = "resource_sets" in H.dtype.names or libE_info["use_resource_sets"]
support = AllocSupport(W, manage_resources, persis_info, libE_info)
Work = {}
This Work dictionary is populated with integer keys wid
for each worker and
dictionary values to give to those workers. An example Work dictionary from a run of
the test_1d_sampling.py
regression test resembles:
{
1: {
"H_fields": ["x"],
"persis_info": {"rand_stream": RandomState(...) at ..., "worker_num": 1},
"tag": 1,
"libE_info": {"H_rows": array([368])}
},
2: {
"H_fields": ["x"],
"persis_info": {"rand_stream": RandomState(...) at ..., "worker_num": 2},
"tag": 1,
"libE_info": {"H_rows": array([369])}
},
3: {
"H_fields": ["x"],
"persis_info": {"rand_stream": RandomState(...) at ..., "worker_num": 3},
"tag": 1,
"libE_info": {"H_rows": array([370])}
},
...
}
Based on information from the API reference above, this Work dictionary
describes instructions for each of the workers to call the sim_f
(tag: 1
)
with data from the "x"
field and a given "H_row"
from the
History array. A worker specific persis_info
is also given.
Constructing these arrays and determining which workers are available
for receiving data is simplified by use of the AllocSupport
class
available within the libensemble.tools.alloc_support
module:
- class libensemble.tools.alloc_support.AllocSupport(W, manage_resources=False, persis_info={}, libE_info={}, user_resources=None, user_scheduler=None)
A helper class to assist with writing allocation functions.
This class contains methods for common operations like populating work units, determining which workers are available, evaluating what values need to be distributed to workers, and others.
Note that since the
alloc_f
is called periodically by the Manager, this class instance (if used) will be recreated/destroyed on each loop.- __init__(W, manage_resources=False, persis_info={}, libE_info={}, user_resources=None, user_scheduler=None)
Instantiate a new AllocSupport instance
W
is. They are referenced by the various methods, but are never modified.By default, an
AllocSupport
instance uses any initiated libEnsemble resource module and the built-in libEnsemble scheduler.- Parameters:
W – A Worker array
manage_resources – Optional, boolean for if to assign resource sets when creating work units
persis_info – Optional, A dictionary of persistent information.
scheduler_opts – Optional, A dictionary of options to pass to the resource scheduler.
user_resources – Optional, A user supplied
resources
object.user_scheduler – Optional, A user supplied
user_scheduler
object.
- assign_resources(rsets_req, use_gpus=None, user_params=[])
Schedule resource sets to a work record if possible.
For default scheduler, if more than one group (node) is required, will try to find even split, otherwise allocates whole nodes.
Raises
InsufficientFreeResources
if the required resources are not currently available, orInsufficientResourcesError
if the required resources do not exist.- Parameters:
rsets_req – Int. Number of resource sets to request.
use_gpus – Bool. Whether to use GPU resource sets.
user_params – list of Integers. User parameters num_procs, num_gpus
- Returns:
List of Integers. Resource set indices assigned.
- avail_worker_ids(persistent=None, active_recv=False, zero_resource_workers=None)
Returns available workers as a list of IDs, filtered by the given options.
- Parameters:
persistent – Optional int. Only return workers with given
persis_state
(1=sim, 2=gen).active_recv – Optional boolean. Only return workers with given active_recv state.
zero_resource_workers – Optional boolean. Only return workers that require no resources
- Returns:
List of worker IDs
If there are no zero resource workers defined, then the
zero_resource_workers
argument will be ignored.
- count_gens()
Returns the number of active generators.
- test_any_gen()
Returns
True
if a generator worker is active.
- count_persis_gens()
Return the number of active persistent generators.
- sim_work(wid, H, H_fields, H_rows, persis_info, **libE_info)
Add sim work record to given
Work
dictionary.Includes evaluation of required resources if the worker is not in a persistent state.
- Parameters:
wid – Int. Worker ID.
H – History array. For parsing out requested resource sets.
H_fields – Which fields from H to send
H_rows – Which rows of
H
to send.persis_info – Worker specific persis_info dictionary
- Returns:
a Work entry
Additional passed parameters are inserted into
libE_info
in the resulting work record.If
rset_team
is passed as an additional parameter, it will be honored, assuming that any resource checking has already been done.
- gen_work(wid, H_fields, H_rows, persis_info, **libE_info)
Add gen work record to given
Work
dictionary.Includes evaluation of required resources if the worker is not in a persistent state.
- Parameters:
Work – Work dictionary
wid – Worker ID.
H_fields – Which fields from H to send
H_rows – Which rows of
H
to send.persis_info – Worker specific persis_info dictionary
- Returns:
A Work entry
Additional passed parameters are inserted into
libE_info
in the resulting work record.If
rset_team
is passed as an additional parameter, it will be honored, and assume that any resource checking has already been done. For example, passingrset_team=[]
, would ensure that no resources are assigned.
- all_sim_started(H, pt_filter=None, low_bound=None)
Returns
True
if all expected points have started their simExcludes cancelled points.
- Parameters:
pt_filter – Optional boolean array filtering expected returned points in
H
.low_bound – Optional lower bound for testing all returned.
- Returns:
True if all expected points have started their sim
- all_sim_ended(H, pt_filter=None, low_bound=None)
Returns
True
if all expected points have had their sim_endExcludes cancelled points that were not already sim_started.
- Parameters:
pt_filter – Optional boolean array filtering expected returned points in
H
.low_bound – Optional lower bound for testing all returned.
- Returns:
True if all expected points have had their sim_end
- all_gen_informed(H, pt_filter=None, low_bound=None)
Returns
True
if gen has been informed of all expected pointsExcludes cancelled points that were not already given out.
- Parameters:
pt_filter – Optional boolean array filtering expected sim_end points in
H
.low_bound – Optional lower bound for testing all returned.
- Returns:
True if gen have been informed of all expected points
- points_by_priority(H, points_avail, batch=False)
Returns indices of points to give by priority
- Parameters:
points_avail – Indices of points that are available to give
batch – Optional boolean. Should batches of points with the same priority be given simultaneously.
- Returns:
An array of point indices to give.
The Work dictionary is returned to the manager alongside persis_info
. If 1
is returned as third value, this instructs the ensemble to stop.
For allocation functions, as with the other user functions, the level of complexity can vary widely. Various scheduling and work distribution features are available in the existing allocation functions, including prioritization of simulations, returning evaluation outputs to the generator immediately or in batch, assigning varying resource sets to evaluations, and other methods of fine-tuned control over the data available to other user functions.
Information from the manager describing the progress of the current libEnsemble
routine can be found in libE_info
, passed into the allocation function:
libE_info = {"exit_criteria": dict, # Criteria for ending routine
"elapsed_time": float, # Time elapsed since start of routine
"manager_kill_canceled_sims": bool, # True if manager is to send kills to cancelled simulations
"sim_started_count": int, # Total number of points given for simulation function evaluation
"sim_ended_count": int, # Total number of points returned from simulation function evaluations
"gen_informed_count": int, # Total number of evaluated points given back to a generator function
"sim_max_given": bool, # True if `sim_max` simulations have been given out to workers
"use_resource_sets": bool} # True if num_resource_sets has been explicitly set.
In most supplied examples, the allocation function will just return once sim_max_given
is True
, but the user could choose to do something different, such as cancel points or
keep returning completed points to the generator. Generators that construct models based
on all evaluated points, for example, may need simulation work units at the end
of an ensemble to be returned to the generator anyway.
Alternatively, users can use elapsed_time
to track the ensembles runtime inside their
allocation function and detect impending timeouts, then pack up cleanup work requests,
or mark points for cancellation.
The remaining values above are useful for efficient filtering of H values
(e.g., sim_ended_count
), saves a filtering an entire column of H.
Note
An error occurs when the alloc_f
returns nothing while
all workers are idle
Descriptions of included allocation functions can be found here.
The default allocation function used by libEnsemble if one isn’t specified is
give_sim_work_first
. During its worker ID loop, it checks if there’s unallocated
work and assigns simulations for that work. Otherwise, it initializes
generators for up to "num_active_gens"
instances. Other settings like
batch_mode
is also supported. See
here for more information about give_sim_work_first
.
For a shorter, simpler example, here is the fast_alloc
allocation function:
from libensemble.tools.alloc_support import AllocSupport, InsufficientFreeResources
def give_sim_work_first(W, H, sim_specs, gen_specs, alloc_specs, persis_info, libE_info):
"""
This allocation function gives (in order) entries in ``H`` to idle workers
to evaluate in the simulation function. The fields in ``sim_specs["in"]``
are given. If all entries in `H` have been given a be evaluated, a worker
is told to call the generator function, provided this wouldn't result in
more than ``alloc_specs["user"]["num_active_gen"]`` active generators.
This fast_alloc variation of give_sim_work_first is useful for cases that
simply iterate through H, issuing evaluations in order and, in particular,
is likely to be faster if there will be many short simulation evaluations,
given that this function contains fewer column length operations.
tags: alloc, simple, fast
.. seealso::
`test_fast_alloc.py <https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/regression_tests/test_fast_alloc.py>`_ # noqa
"""
if libE_info["sim_max_given"] or not libE_info["any_idle_workers"]:
return {}, persis_info
user = alloc_specs.get("user", {})
manage_resources = libE_info["use_resource_sets"]
support = AllocSupport(W, manage_resources, persis_info, libE_info)
gen_count = support.count_gens()
Work = {}
gen_in = gen_specs.get("in", [])
for wid in support.avail_worker_ids():
# Skip any cancelled points
while persis_info["next_to_give"] < len(H) and H[persis_info["next_to_give"]]["cancel_requested"]:
persis_info["next_to_give"] += 1
# Give sim work if possible
if persis_info["next_to_give"] < len(H):
try:
Work[wid] = support.sim_work(wid, H, sim_specs["in"], [persis_info["next_to_give"]], [])
except InsufficientFreeResources:
break
persis_info["next_to_give"] += 1
elif gen_count < user.get("num_active_gens", gen_count + 1):
# Give gen work
return_rows = range(len(H)) if gen_in else []
try:
Work[wid] = support.gen_work(wid, gen_in, return_rows, persis_info.get(wid))
except InsufficientFreeResources:
break
gen_count += 1
persis_info["total_gen_calls"] += 1
return Work, persis_info