Convenience Tools and Functions
Calling Script Function Support
- class tools.ForkablePdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False, readrc=True)
A Pdb subclass that may be used from a forked multiprocessing child.
Usage:
from libensemble.tools import ForkablePdb
ForkablePdb().set_trace()
- tools.add_unique_random_streams(persis_info, nstreams, seed='')
Creates nstreams random number streams; pass nstreams = num_workers + 1 to give one stream to the libE manager and one to each worker. By default, stream i is seeded with i. Otherwise, the streams can be initialized with a provided seed.
The entries are appended to the provided persis_info dictionary.
persis_info = add_unique_random_streams(old_persis_info, nworkers + 1)
- Parameters
persis_info (dict) – Persistent information dictionary (example)
nstreams (int) – Number of independent random number streams to produce
seed (int) – (Optional) Seed for identical random number streams for each worker. If explicitly set to None, random number streams are unique and seeded via other pseudorandom mechanisms.
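The seeding rule above can be sketched with NumPy's Generator API. This is a hypothetical stand-in (make_streams), not libEnsemble's implementation; the rand_stream and worker_num keys are assumptions, and the empty-string default is read as "seed stream i with i".

```python
import numpy as np


def make_streams(persis_info, nstreams, seed=""):
    """Hypothetical sketch of add_unique_random_streams' seeding rule."""
    for i in range(nstreams):
        # Default (seed == ""): stream i is seeded with i.
        # An int gives every stream the same seed; None lets NumPy draw fresh OS entropy.
        stream_seed = i if seed == "" else seed
        entry = persis_info.setdefault(i, {})  # keep any existing entries
        entry["rand_stream"] = np.random.default_rng(stream_seed)  # assumed key name
        entry["worker_num"] = i  # assumed bookkeeping key
    return persis_info


nworkers = 4
persis_info = make_streams({}, nworkers + 1)  # index 0 for the manager, 1..4 for workers
print(sorted(persis_info))  # [0, 1, 2, 3, 4]
```

Because the default seeds are the stream indices, two runs with the same number of workers draw identical sequences on each worker, which keeps test runs reproducible.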
- tools.check_inputs(libE_specs=None, alloc_specs=None, sim_specs=None, gen_specs=None, exit_criteria=None, H0=None, serial_check=False)
Checks whether the libEnsemble arguments are of the correct data type and contain sufficient information to perform a run. There is no return value. An exception is raised if any of the checks fail.
from libensemble.tools import check_inputs
check_inputs(sim_specs=my_sim_specs, gen_specs=my_gen_specs, exit_criteria=ec)
- Parameters
libE_specs (dict, optional) – libEnsemble data structures
alloc_specs (dict, optional) – libEnsemble data structures
sim_specs (dict, optional) – libEnsemble data structures
gen_specs (dict, optional) – libEnsemble data structures
exit_criteria (dict, optional) – libEnsemble data structures
H0 (numpy structured array, optional) – A previous libEnsemble history to be prepended to the history in the current libEnsemble run (example)
serial_check (boolean) – If true, assumes running a serial check. This means, for example, that the details of the current MPI communicator are not checked (the check can be run with libE_specs{‘mpi_comm’: ‘mpi’} without running through mpiexec).
- tools.eprint(*args, **kwargs)
Prints a user message to standard error
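A function with this behavior takes a single line. The version below is a sketch that assumes eprint simply forwards its arguments to print() with file=sys.stderr.

```python
import sys


def eprint(*args, **kwargs):
    """Sketch: print() redirected to standard error."""
    print(*args, file=sys.stderr, **kwargs)


eprint("Warning: worker", 3, "returned no output")
```

Writing to standard error keeps diagnostic messages out of any standard output that a calling script may be piping or parsing.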
- tools.parse_args()
Parses command-line arguments. Intended for use in a calling script.
from libensemble.tools import parse_args
nworkers, is_manager, libE_specs, misc_args = parse_args()
From the shell:
$ python calling_script --comms local --nworkers 4
Usage:
usage: test_... [-h] [--comms [{local, tcp, ssh, client, mpi}]] [--nworkers [NWORKERS]]
        [--workers WORKERS [WORKERS ...]] [--nsim_workers [NSIM_WORKERS]]
        [--nresource_sets [NRESOURCE_SETS]] [--workerID [WORKERID]]
        [--server SERVER SERVER SERVER] [--pwd [PWD]] [--worker_pwd [WORKER_PWD]]
        [--worker_python [WORKER_PYTHON]] [--tester_args [TESTER_ARGS [TESTER_ARGS ...]]]
Note that running via an MPI runner uses the default 'mpi' comms, and '--nworkers' will be ignored. The number of processes is supplied via the MPI run line: one is the manager, and the rest are workers.
--comms: Communications medium for manager and workers. Default is 'mpi'.
--nworkers: (For 'local' or 'tcp' comms) Set number of workers.
--nsim_workers: (For 'local' or 'mpi' comms) A convenience option for common cases. If used with no other criteria, will generate one additional zero-resource worker for use as a generator. If the number of workers has also been specified, will generate enough zero-resource workers to match the other criteria.
--nresource_sets: Explicitly set the number of resource sets. This sets libE_specs['num_resource_sets']. By default, resources will be divided by workers (excluding zero_resource_workers).
Example command lines:
Run with 'local' comms and 4 workers:
$ python calling_script --comms local --nworkers 4
Run with 'local' comms and 5 workers: one gen (no resources) and 4 sims.
$ python calling_script --comms local --nsim_workers 4
Run with 'local' comms with 4 workers and 8 resource sets. The extra resource sets will be used for larger simulations (using variable resource assignment).
$ python calling_script --comms local --nresource_sets 8
Previous example with 'mpi' comms:
$ mpirun -np 5 python calling_script --nresource_sets 8
- Returns
nworkers (int) – Number of workers libEnsemble will initiate
is_manager (boolean) – Indicates whether the current process is the manager process
libE_specs (dict) – Settings and specifications for libEnsemble (example)
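For readers curious how such a parser is assembled, here is a minimal argparse sketch covering only --comms, --nworkers, and --nsim_workers. The function name sketch_parse_args, the 'comms' key it places in libE_specs, and the simplified return tuple are assumptions for illustration; the real parse_args handles many more options, returns is_manager, and treats the MPI case specially.

```python
import argparse


def sketch_parse_args(argv=None):
    """Hypothetical subset of a libEnsemble-style command-line parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--comms", nargs="?", default="mpi",
                        choices=["local", "tcp", "ssh", "client", "mpi"])
    parser.add_argument("--nworkers", type=int, nargs="?")
    parser.add_argument("--nsim_workers", type=int, nargs="?")
    args, misc = parser.parse_known_args(argv)  # unknown options are passed through

    libE_specs = {"comms": args.comms}  # assumed key name
    nworkers = args.nworkers
    if args.nsim_workers is not None and nworkers is None:
        # One extra zero-resource worker is added to act as the generator.
        nworkers = args.nsim_workers + 1
    return nworkers, libE_specs, misc


print(sketch_parse_args(["--comms", "local", "--nsim_workers", "4"]))
# (5, {'comms': 'local'}, [])
```

Using parse_known_args rather than parse_args lets a calling script accept its own extra options alongside the libEnsemble ones.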
- tools.save_libE_output(H, persis_info, calling_file, nworkers, mess='Run completed')
Writes out history array and persis_info to files.
Format: <calling_script>_results_History_length=<length>_evals=<Completed evals>_ranks=<nworkers>
save_libE_output(H, persis_info, __file__, nworkers)
- Parameters
H (NumPy structured array) – History array storing rows for each point. (example)
persis_info (dict) – Persistent information dictionary (example)
calling_file (string) – Name of user-calling script (or user-chosen name) to prefix output files. The convention is to send __file__ from the user calling script.
nworkers (int) – The number of workers in this ensemble. Added to output file names.
mess (String) – A message to print/log when saving the file.
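The file-name format above can be reproduced as follows. This sketch uses a hypothetical helper, history_basename, and assumes that "completed evals" means the number of True entries in a boolean sim_ended field of H; both the field name and the counting rule are assumptions.

```python
import os

import numpy as np


def history_basename(H, calling_file, nworkers):
    """Hypothetical: build <script>_results_History_length=..._evals=..._ranks=..."""
    script = os.path.splitext(os.path.basename(calling_file))[0]
    completed = int(np.sum(H["sim_ended"]))  # assumed field marking finished evals
    return (f"{script}_results_History_length={len(H)}"
            f"_evals={completed}_ranks={nworkers}")


H = np.zeros(10, dtype=[("x", float), ("sim_ended", bool)])
H["sim_ended"][:7] = True
print(history_basename(H, "/home/user/run_opt.py", 4))
# run_opt_results_History_length=10_evals=7_ranks=4
```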
Persistent Function Support
These routines are commonly used within persistent generator functions, like persistent_aposmm in libensemble/gen_funcs/, for intermediate communication with the manager. Persistent simulator functions are also supported.
- class persistent_support.PersistentSupport(libE_info, calc_type)
A helper class to assist with writing persistent user functions.
- recv(blocking=True)
Receive message to worker from manager.
- Returns
message tag, Work dictionary, calc_in array
- send(output, calc_status=0)
Send message from worker to manager.
- Parameters
output – Output array to be sent to manager
calc_status – Optional. Provides a task status
- Returns
None
- send_recv(output, calc_status=0)
Send message from worker to manager and receive response.
- Parameters
output – Output array to be sent to manager
calc_status – Optional. Provides a task status
- Returns
message tag, Work dictionary, calc_in array
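A persistent generator typically loops: send a batch of points with send_recv, then block on the manager's reply until a stop tag arrives. Running that for real requires a libEnsemble manager, so the sketch below substitutes a hypothetical ScriptedSupport class that only mimics the (tag, Work, calc_in) return shape of send_recv. STOP_TAG and EVAL_TAG here are placeholder values, not libEnsemble's constants.

```python
import numpy as np

STOP_TAG, EVAL_TAG = 0, 1  # placeholder tag values for this sketch only


class ScriptedSupport:
    """Hypothetical stand-in for PersistentSupport with canned manager replies."""

    def __init__(self, n_rounds):
        self.n_rounds = n_rounds
        self.sent = []

    def send_recv(self, output, calc_status=0):
        self.sent.append(output.copy())
        if len(self.sent) >= self.n_rounds:
            return STOP_TAG, {}, None
        # Pretend the manager evaluated f = x**2 for each sent point.
        calc_in = np.zeros(len(output), dtype=[("f", float)])
        calc_in["f"] = output["x"] ** 2
        return EVAL_TAG, {}, calc_in


def persistent_gen(ps, batch=3, seed=0):
    rng = np.random.default_rng(seed)
    tag = None
    while tag != STOP_TAG:
        H_o = np.zeros(batch, dtype=[("x", float)])
        H_o["x"] = rng.uniform(-1, 1, batch)
        tag, Work, calc_in = ps.send_recv(H_o)  # block until the "manager" replies
    return len(ps.sent)


print(persistent_gen(ScriptedSupport(n_rounds=4)))  # 4
```

In a real generator, calc_in would carry the evaluated results used to choose the next batch; the sketch ignores it to keep the loop structure visible.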
Allocation Function Support
These routines are used within custom allocation functions to help prepare Work structures for workers. See the routines within libensemble/alloc_funcs/ for examples.
- exception alloc_support.AllocException
Raised for any exception in the alloc support
- class alloc_support.AllocSupport(W, manage_resources=False, persis_info={}, libE_info={}, user_resources=None, user_scheduler=None)
A helper class to assist with writing allocation functions.
This class contains methods for common operations like populating work units, determining which workers are available, evaluating what values need to be distributed to workers, and others.
Note that since the alloc_f is called periodically by the Manager, this class instance (if used) will be recreated/destroyed on each loop.
- all_gen_informed(H, pt_filter=None, low_bound=None)
Returns True if the gen has been informed of all expected points.
Excludes cancelled points that were not already given out.
- Parameters
pt_filter – Optional boolean array filtering expected sim_end points in H.
low_bound – Optional lower bound for testing all returned.
- Returns
True if the gen has been informed of all expected points
- all_sim_ended(H, pt_filter=None, low_bound=None)
Returns True if all expected points have had their sim_end.
Excludes cancelled points that were not already sim_started.
- Parameters
pt_filter – Optional boolean array filtering expected returned points in H.
low_bound – Optional lower bound for testing all returned.
- Returns
True if all expected points have had their sim_end
- all_sim_started(H, pt_filter=None, low_bound=None)
Returns True if all expected points have started their sim.
Excludes cancelled points.
- Parameters
pt_filter – Optional boolean array filtering expected returned points in H.
low_bound – Optional lower bound for testing all returned.
- Returns
True if all expected points have started their sim
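The three all_* checks share one idea: select the expected rows, drop the excluded cancelled ones, and test a boolean field. The sketch below uses a hypothetical helper, check_all, and assumes boolean H fields named sim_started, sim_ended, and cancel_requested; applying low_bound by ignoring earlier rows is also an assumption, not libEnsemble's source.

```python
import numpy as np


def check_all(H, field, excluded, pt_filter=None, low_bound=None):
    """Hypothetical: True if H[field] holds for every expected, non-excluded row."""
    keep = ~excluded  # new array, so the caller's mask is untouched
    if pt_filter is not None:
        keep &= pt_filter
    if low_bound is not None:
        keep[:low_bound] = False  # assumed: only rows at or after low_bound are tested
    return bool(np.all(H[field][keep]))


dtype = [("sim_started", bool), ("sim_ended", bool), ("cancel_requested", bool)]
H = np.zeros(4, dtype=dtype)
H["sim_started"] = [True, True, True, False]
H["sim_ended"] = [True, True, False, False]
H["cancel_requested"] = [False, False, False, True]

# all_sim_started: excludes cancelled points.
print(check_all(H, "sim_started", H["cancel_requested"]))  # True
# all_sim_ended: excludes cancelled points that were never started.
print(check_all(H, "sim_ended", H["cancel_requested"] & ~H["sim_started"]))  # False
```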
- assign_resources(rsets_req)
Schedules resource sets to a work record if possible.
For the default scheduler, if more than one group (node) is required, it will try to find an even split; otherwise, it allocates whole nodes.
Raises InsufficientFreeResources if the required resources are not currently available, or InsufficientResourcesError if the required resources do not exist.
- Parameters
rsets_req – Int. Number of resource sets to request.
- Returns
List of Integers. Resource set indices assigned.
- avail_worker_ids(persistent=None, active_recv=False, zero_resource_workers=None)
Returns available workers as a list of IDs, filtered by the given options.
- Parameters
persistent – Optional int. Only return workers with given persis_state (1=sim, 2=gen).
active_recv – Optional boolean. Only return workers with given active_recv state.
zero_resource_workers – Optional boolean. Only return workers that require no resources.
- Returns
List of worker IDs
If there are no zero resource workers defined, then the zero_resource_workers argument will be ignored.
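The filtering can be sketched over a NumPy structured array of worker states. The helper avail_ids is hypothetical; the field names (worker_id, active, persis_state, zero_resource_worker), treating idle workers (active == 0) as available, and reading None as "do not filter" are all assumptions. The active_recv option is omitted for brevity.

```python
import numpy as np


def avail_ids(W, persistent=None, zero_resource_workers=None):
    """Hypothetical filter over worker records; None means 'do not filter'."""
    mask = W["active"] == 0  # assumed: idle workers are available
    if persistent is not None:
        mask &= W["persis_state"] == persistent
    # Mirror the documented rule: ignore the option if no zero-resource workers exist.
    if zero_resource_workers is not None and W["zero_resource_worker"].any():
        mask &= W["zero_resource_worker"] == zero_resource_workers
    return W["worker_id"][mask].tolist()


W = np.zeros(4, dtype=[("worker_id", int), ("active", int), ("persis_state", int),
                       ("zero_resource_worker", bool)])
W["worker_id"] = [1, 2, 3, 4]
W["active"] = [0, 1, 0, 0]
W["persis_state"] = [2, 0, 0, 0]  # worker 1 hosts a persistent gen
W["zero_resource_worker"] = [True, False, False, False]

print(avail_ids(W))                               # [1, 3, 4]
print(avail_ids(W, persistent=2))                 # [1]
print(avail_ids(W, zero_resource_workers=False))  # [3, 4]
```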
- count_gens()
Returns the number of active generators.
- count_persis_gens()
Return the number of active persistent generators.
- gen_work(wid, H_fields, H_rows, persis_info, **libE_info)
Adds a gen work record to the given Work dictionary.
Includes evaluation of required resources if the worker is not in a persistent state.
- Parameters
Work – Work dictionary
wid – Worker ID.
H_fields – Which fields from H to send
H_rows – Which rows of H to send.
persis_info – Worker specific persis_info dictionary
- Returns
A Work entry
Additional passed parameters are inserted into libE_info in the resulting work record.
If rset_team is passed as an additional parameter, it will be honored, and any resource checking is assumed to have already been done. For example, passing rset_team=[] would ensure that no resources are assigned.
- points_by_priority(H, points_avail, batch=False)
Returns indices of points to give by priority
- Parameters
points_avail – Indices of points that are available to give
batch – Optional boolean. Whether batches of points with the same priority should be given simultaneously.
- Returns
An array of point indices to give.
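One way to realize this selection is sketched below. The helper prioritized_indices is hypothetical, and it assumes points_avail is a boolean mask over H, that an optional priority field ranks points (higher first), and that the first available point is taken when no priority field exists.

```python
import numpy as np


def prioritized_indices(H, points_avail, batch=False):
    """Hypothetical: indices (into H) of the point(s) to give next."""
    avail = np.flatnonzero(points_avail)
    if "priority" not in H.dtype.names:
        return avail[:1]  # assumed fallback: take the first available point
    pr = H["priority"][avail]
    if batch:
        return avail[pr == pr.max()]  # every point tied for top priority
    return avail[[np.argmax(pr)]]  # a single highest-priority point


H = np.zeros(5, dtype=[("x", float), ("priority", float)])
H["priority"] = [1.0, 3.0, 3.0, 2.0, 5.0]
avail = np.array([True, True, True, True, False])

print(prioritized_indices(H, avail))              # [1]
print(prioritized_indices(H, avail, batch=True))  # [1 2]
```

Point 4 has the highest priority but is excluded because it is not available, which is why the mask is applied before ranking.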
- sim_work(wid, H, H_fields, H_rows, persis_info, **libE_info)
Adds a sim work record to the given Work dictionary.
Includes evaluation of required resources if the worker is not in a persistent state.
- Parameters
wid – Int. Worker ID.
H – History array. For parsing out requested resource sets.
H_fields – Which fields from H to send
H_rows – Which rows of H to send.
persis_info – Worker specific persis_info dictionary
- Returns
A Work entry
Additional passed parameters are inserted into libE_info in the resulting work record.
If rset_team is passed as an additional parameter, it will be honored, and any resource checking is assumed to have already been done.
- test_any_gen()
Returns True if a generator worker is active.
Consensus Subroutines
This module contains common subroutines used in distributed optimization libraries, including collecting the sum of the {f_i}’s, collecting the gradients, and conducting the consensus step (i.e., taking a linear combination of your neighbors’ $x$ values).
- consensus_subroutines.get_consensus_gradient(x, gen_specs, libE_info)
Sends local gen data (@x), retrieves neighbors’ local data, and takes the sum of the neighbors’ x’s, which is equivalent to taking the gradient of the consensus term for this particular node/agent.
This function is equivalent to the @get_neighbor_vals function but is less general, i.e., it is used when we need only take a sum rather than a linear combination of our neighbors’ values.
- Parameters
x (np.ndarray) – local input variable
- Returns
Returns this node’s corresponding gradient of consensus
- Return type
np.ndarray
- consensus_subroutines.get_doubly_stochastic(A)
Generates a doubly stochastic matrix S where (i) $S_{ii} > 0$ for all $i$, and (ii) $S_{ij} > 0$ if and only if $(i, j) \in E$, the edge set of the graph with adjacency matrix A.
- Parameters
A (np.ndarray) – adjacency matrix
- Returns
x
- Return type
scipy.sparse.csr_matrix
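One standard construction that satisfies both conditions is Metropolis-Hastings weighting. It is shown here as an illustration and is not necessarily the method get_doubly_stochastic uses. The hypothetical metropolis_weights takes a dense, symmetric 0/1 adjacency matrix without self-loops (an assumption) and returns a dense array rather than a scipy.sparse.csr_matrix.

```python
import numpy as np


def metropolis_weights(A):
    """Illustrative Metropolis-Hastings weights (not necessarily the library's method)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    n = A.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and A[i, j] != 0:
                S[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    # Each off-diagonal row sum is at most deg_i / (1 + deg_i) < 1, so S_ii > 0.
    S[np.diag_indices(n)] = 1.0 - S.sum(axis=1)
    return S


A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # path graph 0 - 1 - 2
S = metropolis_weights(A)
print(S.sum(axis=0), S.sum(axis=1))  # rows and columns each sum to 1
```

Since S is symmetric and each row sums to 1, each column does too, which is exactly the doubly stochastic property.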
- consensus_subroutines.get_func_or_grad(x, f_i_idxs, gen_specs, libE_info, get_grad)
This function is called by a gen to retrieve the function or gradient of the sum of {f_i}’s via the sim.
- Parameters
x – Input solution vector
f_i_idxs – Which {f_i}’s this calling gen is responsible for
gen_specs – Used to communicate
libE_info – Used to communicate
get_grad – True if we want the gradient; otherwise, returns a function evaluation
- consensus_subroutines.get_grad_locally(x, f_i_idxs, df)
This function is called by a gen to locally compute gradients of the sum of {f_i}’s. Unlike get_grad, this function does not use the sim, but instead evaluates the gradient using the input @df.
- Parameters
x – Input solution vector
f_i_idxs – Which {f_i}’s this calling gen is responsible for
df – Function that returns a gradient. It must take as parameters the input @x and an index @i (i.e., which f_i to take the gradient of)
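The local gradient described above is a sum of per-component gradients. A sketch, assuming df(x, i) returns the gradient of f_i at x as a NumPy array; sum_local_grads is a hypothetical name.

```python
import numpy as np


def sum_local_grads(x, f_i_idxs, df):
    """Hypothetical: gradient of sum_{i in f_i_idxs} f_i at x."""
    grad = np.zeros(len(x), dtype=float)
    for i in f_i_idxs:
        grad += df(x, i)
    return grad


def df(x, i):
    # Example components f_i(x) = ||x - i||^2, so grad f_i(x) = 2 (x - i).
    return 2.0 * (x - i)


print(sum_local_grads(np.array([1.0, 2.0]), [0, 1], df))  # [2. 6.]
```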
- consensus_subroutines.get_k_reach_chain_matrix(n, k)
Constructs the adjacency matrix for a chain graph in which the ith vertex can reach vertices at most @k away (without wrapping around), where distance is the absolute difference between the vertices’ indices.
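A dense sketch of this adjacency matrix follows. The helper chain_adjacency is hypothetical; it assumes 0/1 entries and no self-loops (diagonal left at zero), and the library returns its own matrix type, which may differ.

```python
import numpy as np


def chain_adjacency(n, k):
    """Hypothetical dense sketch: A[i, j] = 1 iff 0 < |i - j| <= k."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])  # pairwise index distances
    return ((dist > 0) & (dist <= k)).astype(int)


print(chain_adjacency(4, 1))
# [[0 1 0 0]
#  [1 0 1 0]
#  [0 1 0 1]
#  [0 0 1 0]]
```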
- consensus_subroutines.get_neighbor_vals(x, local_gen_id, A_gen_ids_no_local, gen_specs, libE_info)
Sends local gen data (@x) and retrieves neighbors’ local data.
Sorts the data so the gen IDs are in increasing order.
- Parameters
x (np.ndarray) – local input variable
local_gen_id (int) – this gen’s gen_id
A_gen_ids_no_local (int) – expected neighbors’ gen IDs, not including the local gen ID
gen_specs – objects to communicate and construct a mini History array
libE_info – objects to communicate and construct a mini History array
- Returns
X – 2D array of neighbors’ and local x values sorted by gen_ids
- Return type
np.ndarray
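The ordering step can be sketched locally, leaving out the communication: given the local x, the neighbors’ x values, and their gen IDs, stack them into one 2D array sorted by gen ID. The helper stack_by_gen_id and its argument layout are hypothetical.

```python
import numpy as np


def stack_by_gen_id(x_local, local_gen_id, neighbor_xs, neighbor_ids):
    """Hypothetical: rows of x values ordered by increasing gen ID."""
    ids = np.append(neighbor_ids, local_gen_id)
    X = np.vstack([np.atleast_2d(neighbor_xs), np.atleast_2d(x_local)])
    return X[np.argsort(ids)]  # reorder rows to match sorted gen IDs


X = stack_by_gen_id(np.array([0.0, 0.0]), 2,
                    np.array([[3.0, 3.0], [1.0, 1.0]]), [3, 1])
print(X)
# [[1. 1.]
#  [0. 0.]
#  [3. 3.]]
```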
- consensus_subroutines.gm_opt(b, m)
Computes the optimal geometric median score.
- Parameters
b – 1D array concatenating @m vectors of size @n, i.e., [x_1, x_2,…, x_m]
m – number of vectors
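The geometric median of $x_1, \dots, x_m$ minimizes $\sum_{i=1}^m \|z - x_i\|_2$ over $z$, and the score is that minimum value. The sketch below computes it with Weiszfeld's fixed-point iteration in plain NumPy. This is a swapped-in method, not necessarily what gm_opt uses; geometric_median_score is a hypothetical name, and the iteration count and eps guard are arbitrary choices.

```python
import numpy as np


def geometric_median_score(b, m, iters=500, eps=1e-12):
    """Swapped-in Weiszfeld iteration; b concatenates m vectors of equal length."""
    X = np.asarray(b, dtype=float).reshape(m, -1)
    z = X.mean(axis=0)  # start from the centroid
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(X - z, axis=1), eps)  # guard against zero distance
        w = 1.0 / d
        z = (w[:, None] * X).sum(axis=0) / w.sum()  # distance-weighted average
    return np.linalg.norm(X - z, axis=1).sum()


# Corners of the unit square: the median is (0.5, 0.5), so the score is 2*sqrt(2).
print(geometric_median_score([0, 0, 1, 0, 0, 1, 1, 1], m=4))
```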
- consensus_subroutines.log_opt(X, y, c, reg=None)
Computes optimal regularized logistic regression. See, for reference, https://www.cvxpy.org/examples/machine_learning/logistic_regression.html
- Parameters
X – 2D matrix defining the logistic regression problem
y (np.ndarray) – 1D array defining the logistic regression problem
c – Scalar term for regularization
reg – Denotes which regularization to use. Either ‘l1’, ‘l2’, or None
- consensus_subroutines.print_final_score(x, f_i_idxs, gen_specs, libE_info)
This function is called by a gen so that the alloc will collect all the {f_i}’s and print their sum.
- Parameters
x – Input solution vector
f_i_idxs – Which {f_i}’s this calling gen is responsible for
gen_specs – Used to communicate
libE_info – Used to communicate
- consensus_subroutines.readin_csv(fname)
Parses the breast-cancer dataset (http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29) for SVM.
- Parameters
fname – file name containing data
- Returns
labels (np.ndarray, shape (m,)) – 1D array with the label of each vector
datas (np.ndarray (2D), shape (m, n)) – 2D array (matrix) containing the dataset
- consensus_subroutines.regls_opt(X, y, c, reg=None)
Computes optimal regularized linear (least-squares) regression
- Parameters
X – 2D matrix; we solve optimally for theta so that $y \approx X\theta$
y (np.ndarray) – 1D array (right-hand side) of the same problem
c – Scalar term for regularization
reg – Denotes which regularization to use. Either ‘l1’, ‘l2’, or None
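For the l2 case there is a closed form. Assuming the objective $\tfrac{1}{2}\|X\theta - y\|_2^2 + \tfrac{c}{2}\|\theta\|_2^2$ (the scaling of c inside regls_opt may differ), the minimizer solves $(X^\top X + cI)\theta = X^\top y$. A NumPy sketch, with ridge_solve as a hypothetical name:

```python
import numpy as np


def ridge_solve(X, y, c):
    """Hypothetical: minimize 0.5*||X theta - y||^2 + 0.5*c*||theta||^2."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n = X.shape[1]
    # Normal equations of the regularized problem: (X^T X + c I) theta = X^T y.
    return np.linalg.solve(X.T @ X + c * np.eye(n), X.T @ np.asarray(y, dtype=float))


print(ridge_solve(np.eye(2), [2.0, 4.0], c=1.0))  # [1. 2.]
```

Solving the linear system directly, rather than forming an explicit inverse, is the numerically preferred way to apply the normal equations. The l1 case has no closed form and needs an iterative solver.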
- consensus_subroutines.svm_opt(X, b, c, reg='l1')
Computes optimal support vector machine (SVM) with l1 regularization.
- Parameters
X – 2D matrix defining the SVM problem
b (np.ndarray) – 1D array defining the SVM problem
c – Scalar term for regularization
reg – Denotes which regularization to use. Either ‘l1’, ‘l2’, or None