Convenience Tools and Functions

Calling Script Function Support

class tools.ForkablePdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False, readrc=True)

A Pdb subclass that may be used from a forked multiprocessing child

Usage:

from libensemble.tools import ForkablePdb
ForkablePdb().set_trace()

tools.add_unique_random_streams(persis_info, nstreams, seed='')

Creates nstreams random number streams for the libE manager and workers, typically with nstreams equal to nworkers + 1 (one stream for the manager and one per worker). By default, stream i is initialized with seed i; alternatively, the streams can be initialized with a provided seed.

The entries are appended to the provided persis_info dictionary.

persis_info = add_unique_random_streams(old_persis_info, nworkers + 1)
Parameters
  • persis_info (dict) – Persistent information dictionary (example)

  • nstreams (int) – Number of independent random number streams to produce

  • seed (int) – (Optional) Seed for identical random number streams for each worker. If explicitly set to None, the random number streams are unique and are seeded via other pseudorandom mechanisms.
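The seeding rule above can be illustrated with NumPy's random Generator API. This is a minimal sketch, not libEnsemble's implementation: the helper name add_streams_sketch is hypothetical, and it assumes each stream is stored under persis_info[i]['rand_stream'] and created with numpy.random.default_rng.

```python
import numpy as np

def add_streams_sketch(persis_info, nstreams, seed=""):
    """Hypothetical re-implementation of the documented seeding rule."""
    for i in range(nstreams):
        if seed is None or isinstance(seed, int):
            stream_seed = seed  # None -> OS entropy; int -> same seed for every stream
        else:
            stream_seed = i  # default: stream i is seeded with i
        entry = persis_info.setdefault(i, {})  # extend entry i, creating it if needed
        entry["rand_stream"] = np.random.default_rng(stream_seed)  # assumed key name
    return persis_info

# Usage: one stream for the manager (entry 0) plus one per worker.
nworkers = 3
persis_info = add_streams_sketch({}, nworkers + 1)
```

With the default seed, rerunning a script reproduces every worker's stream; passing seed=None makes each run draw different numbers.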

tools.check_inputs(libE_specs=None, alloc_specs=None, sim_specs=None, gen_specs=None, exit_criteria=None, H0=None, serial_check=False)

Checks whether the libEnsemble arguments are of the correct data type and contain sufficient information to perform a run. There is no return value. An exception is raised if any of the checks fail.

from libensemble.tools import check_inputs
check_inputs(sim_specs=my_sim_specs, gen_specs=my_gen_specs, exit_criteria=ec)
Parameters
  • libE_specs (dict, optional) – libEnsemble data structures

  • alloc_specs (dict, optional) – libEnsemble data structures

  • sim_specs (dict, optional) – libEnsemble data structures

  • gen_specs (dict, optional) – libEnsemble data structures

  • exit_criteria (dict, optional) – libEnsemble data structures

  • H0 (numpy structured array, optional) – A previous libEnsemble history to be prepended to the history in the current libEnsemble run (example)

  • serial_check (boolean) – If true, assumes a serial check is being run. This means, for example, that the details of the current MPI communicator are not checked (so the check can be run with libE_specs{‘mpi_comm’: ‘mpi’} without running through mpiexec).

tools.eprint(*args, **kwargs)

Prints a user message to standard error.
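A helper like this is typically a thin wrapper around print() that targets sys.stderr, so it accepts the same arguments as print(). A minimal sketch, which may differ in detail from the library's own function:

```python
import sys

def eprint(*args, **kwargs):
    """Print to standard error; sep, end, etc. are passed through to print()."""
    print(*args, file=sys.stderr, **kwargs)

eprint("Warning: only", 2, "workers available")
```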

tools.parse_args()

Parses command-line arguments.

from libensemble.tools import parse_args
nworkers, is_manager, libE_specs, misc_args = parse_args()

From the shell:

$ python calling_script --comms local --nworkers 4

Usage:

usage: test_... [-h] [--comms [{local,tcp,ssh,client,mpi}]]
                [--nworkers [NWORKERS]] [--workers WORKERS [WORKERS ...]]
                [--nsim_workers [NSIM_WORKERS]]
                [--nresource_sets [NRESOURCE_SETS]]
                [--workerID [WORKERID]] [--server SERVER SERVER SERVER]
                [--pwd [PWD]] [--worker_pwd [WORKER_PWD]]
                [--worker_python [WORKER_PYTHON]]
                [--tester_args [TESTER_ARGS [TESTER_ARGS ...]]]
Returns

  • nworkers (int) – Number of workers libEnsemble will initiate

  • is_manager (boolean) – Indicates whether the current process is the manager process

  • libE_specs (dict) – Settings and specifications for libEnsemble (example)

tools.save_libE_output(H, persis_info, calling_file, nworkers, mess='Run completed')

Writes out history array and persis_info to files.

Format: <calling_script>_results_History_length=<length>_evals=<Completed evals>_ranks=<nworkers>

save_libE_output(H, persis_info, __file__, nworkers)
Parameters
  • H (NumPy structured array) – History array storing rows for each point. (example)

  • persis_info (dict) – Persistent information dictionary (example)

  • calling_file (string) – Name of the user calling script (or a user-chosen name) used to prefix output files. By convention, pass __file__ from the user calling script.

  • nworkers (int) – The number of workers in this ensemble. Added to output file names.

  • mess (string) – A message to print/log when saving the file.
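The file-name format above can be assembled as follows. This is an illustrative sketch only: the helper output_prefix_sketch is hypothetical, it assumes completed evaluations are flagged by a boolean History field named 'returned', and it assumes <calling_script> is the script name without its directory or .py extension.

```python
import os
import numpy as np

def output_prefix_sketch(H, calling_file, nworkers, done_field="returned"):
    """Hypothetical helper that builds the documented output-file prefix."""
    script = os.path.splitext(os.path.basename(calling_file))[0]
    evals = int(np.count_nonzero(H[done_field]))  # completed evaluations (assumed field)
    return f"{script}_results_History_length={len(H)}_evals={evals}_ranks={nworkers}"

# Example history: 5 rows, of which 3 have been evaluated.
H = np.zeros(5, dtype=[("x", float), ("returned", bool)])
H["returned"][:3] = True
prefix = output_prefix_sketch(H, "/home/user/run_sim.py", 4)
```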

Persistent Function Support

These routines are commonly used within persistent generator functions like persistent_aposmm in libensemble/gen_funcs/ for intermediate communication with the manager. Persistent simulator functions are also supported.

class persistent_support.PersistentSupport(libE_info, calc_type)

A helper class to assist with writing persistent user functions.

recv()

Receive message to worker from manager.

Returns

message tag, Work dictionary, calc_in array

send(output, calc_status=0)

Send message from worker to manager.

Parameters
  • output – Output array to be sent to manager

  • calc_status – Optional. Provides a task status.

Returns

None

send_recv(output, calc_status=0)

Send message from worker to manager and receive response.

Parameters
  • output – Output array to be sent to manager

  • calc_status – Optional. Provides a task status.

Returns

message tag, Work dictionary, calc_in array
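A persistent generator usually wraps send_recv in a loop: send a batch of points, wait for results, and exit when the manager signals a stop. The sketch below shows that pattern without a running manager. FakeSupport is a hypothetical stand-in for PersistentSupport, and the tag constants are placeholder values; real code would import the tag constants (such as STOP_TAG and PERSIS_STOP) from libensemble.message_numbers.

```python
import numpy as np

# Placeholder tag values for this sketch only (not the library's values).
STOP_TAG, PERSIS_STOP, NEW_RESULTS = 100, 101, 102

class FakeSupport:
    """Hypothetical stand-in for PersistentSupport (no manager involved)."""

    def __init__(self, n_replies):
        self.n_replies = n_replies  # batches answered before a stop is sent
        self.sent = []

    def send_recv(self, output, calc_status=0):
        self.sent.append(output)
        if len(self.sent) > self.n_replies:
            return STOP_TAG, {}, None
        calc_in = np.zeros(len(output), dtype=[("f", float)])
        calc_in["f"] = output["x"] ** 2  # pretend the sims evaluated f(x) = x**2
        return NEW_RESULTS, {}, calc_in

def persistent_gen_loop(ps, batch_size, rng):
    """Send a batch, wait for results, and repeat until told to stop."""
    tag = None
    while tag not in (STOP_TAG, PERSIS_STOP):
        H_o = np.zeros(batch_size, dtype=[("x", float)])
        H_o["x"] = rng.uniform(-1.0, 1.0, batch_size)
        tag, Work, calc_in = ps.send_recv(H_o)
        # A real generator would use calc_in here to choose the next batch.
    return len(ps.sent)

ps = FakeSupport(n_replies=3)
n_batches = persistent_gen_loop(ps, batch_size=4, rng=np.random.default_rng(0))
```

Here three batches receive results and the fourth send is answered with a stop tag, so the loop sends four batches in total.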

Allocation Function Support

These routines are used within custom allocation functions to help prepare Work structures for workers. See the routines within libensemble/alloc_funcs/ for examples.

exception alloc_support.AllocException

Raised for any exception in the alloc support

class alloc_support.AllocSupport(W, manage_resources=False, persis_info={}, scheduler_opts={}, user_resources=None, user_scheduler=None)

A helper class to assist with writing allocation functions.

This class contains methods for common operations like populating work units, determining which workers are available, evaluating what values need to be distributed to workers, and others.

Note that since the alloc_f is called periodically by the Manager, this class instance (if used) will be recreated/destroyed on each loop.

all_given(H, pt_filter=None, low_bound=None)

Returns True if all expected points have been given to sim.

Excludes cancelled points.

Parameters
  • pt_filter – Optional boolean array filtering expected points in H.

  • low_bound – Optional lower bound on the indices of H that are tested.

Returns

True if all expected points have been given to sim

all_given_back(H, pt_filter=None, low_bound=None)

Returns True if all expected points have been given back to gen.

Excludes cancelled points that were not already given out.

Parameters
  • pt_filter – Optional boolean array filtering expected points in H.

  • low_bound – Optional lower bound on the indices of H that are tested.

Returns

True if all expected points have been given back to gen

all_returned(H, pt_filter=None, low_bound=None)

Returns True if all expected points have returned from sim.

Excludes cancelled points that were not already given out.

Parameters
  • pt_filter – Optional boolean array filtering expected returned points in H.

  • low_bound – Optional lower bound on the indices of H that are tested.

Returns

True if all expected points have been returned
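The three checks above share one idea: restrict H to the expected points and test a boolean status field. A sketch of the all_returned variant, assuming H has boolean fields named 'given', 'returned', and 'cancel_requested' (field names are assumptions for this illustration), that pt_filter is a boolean mask the same length as H, and that the helper name is hypothetical:

```python
import numpy as np

def all_returned_sketch(H, pt_filter=None, low_bound=None):
    """Hypothetical version of the all_returned check on a NumPy history array."""
    lb = 0 if low_bound is None else low_bound
    keep = np.ones(len(H), dtype=bool) if pt_filter is None else np.asarray(pt_filter, dtype=bool)
    # Cancelled points that were never given out are not expected back.
    excluded = H["cancel_requested"] & ~H["given"]
    expected = keep[lb:] & ~excluded[lb:]
    return bool(np.all(H["returned"][lb:][expected]))

H = np.zeros(4, dtype=[("given", bool), ("returned", bool), ("cancel_requested", bool)])
H["given"][:3] = True
H["returned"][:3] = True
H["cancel_requested"][3] = True  # point 3 was cancelled before being given out
```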

assign_resources(rsets_req)

Schedule resource sets to a work record if possible.

For the default scheduler, if more than one group (node) is required, it tries to find an even split across groups; otherwise it allocates whole nodes.

Raises InsufficientFreeResources if the required resources are not currently available, or InsufficientResourcesError if the required resources do not exist.

Parameters

rsets_req – Int. Number of resource sets to request.

Returns

List of integers. Resource set indices assigned.
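One simple reading of the default split policy, written as a hypothetical helper that only computes how many resource sets to take on each node. It assumes every node holds the same number of resource sets and ignores which sets are currently free, so it is an illustration rather than a substitute for the real scheduler:

```python
import math

def split_rsets_sketch(rsets_req, rsets_per_node):
    """Hypothetical illustration: number of resource sets to take on each node used."""
    if rsets_req <= rsets_per_node:
        return [rsets_req]  # fits within a single node (group)
    nnodes = math.ceil(rsets_req / rsets_per_node)
    if rsets_req % nnodes == 0:
        return [rsets_req // nnodes] * nnodes  # even split across nodes
    return [rsets_per_node] * nnodes  # no even split: take whole nodes

# With 4 resource sets per node, a request for 6 splits evenly as 3 + 3.
plan = split_rsets_sketch(6, 4)
```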

avail_worker_ids(persistent=None, active_recv=False, zero_resource_workers=None)

Returns available workers as a list of IDs, filtered by the given options.

Parameters
  • persistent – Optional int. Only return workers with given persis_state (1=sim, 2=gen).

  • active_recv – Optional Boolean. Only return workers with given active_recv state.

  • zero_resource_workers – Optional Boolean. Only return workers that require no resources.

Returns

List of worker IDs

If there are no zero resource workers defined, then the zero_resource_workers argument will be ignored.
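A simplified sketch of this filtering, assuming a NumPy structured worker array W with fields 'worker_id', 'active', 'persis_state', 'active_recv', and 'zero_resource_worker'. The array layout is an assumption and the helper name is hypothetical:

```python
import numpy as np

def avail_worker_ids_sketch(W, persistent=None, active_recv=False, zero_resource_workers=None):
    """Hypothetical simplification of avail_worker_ids over a worker array W."""
    # Idle workers, or (if active_recv) workers currently in an active receive state.
    mask = W["active_recv"].astype(bool) if active_recv else W["active"] == 0
    if persistent is not None:
        mask &= W["persis_state"] == persistent
    # The zero-resource filter is ignored when no such workers exist.
    if zero_resource_workers is not None and W["zero_resource_worker"].any():
        mask &= W["zero_resource_worker"] == zero_resource_workers
    return W["worker_id"][mask].tolist()

W = np.zeros(4, dtype=[("worker_id", int), ("active", int), ("persis_state", int),
                       ("active_recv", bool), ("zero_resource_worker", bool)])
W["worker_id"] = [1, 2, 3, 4]
W["active"] = [0, 1, 0, 0]          # worker 2 is busy
W["persis_state"] = [0, 0, 2, 0]    # worker 3 is a persistent generator (2=gen)
W["zero_resource_worker"] = [False, False, True, False]
```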

count_gens()

Returns the number of active generators.

count_persis_gens()

Returns the number of active persistent generators.

gen_work(wid, H_fields, H_rows, persis_info, **libE_info)

Add gen work record to given Work dictionary.

Includes evaluation of required resources if the worker is not in a persistent state.

Parameters
  • Work – Work dictionary

  • wid – Worker ID.

  • H_fields – Which fields from H to send

  • H_rows – Which rows of H to send.

  • persis_info – Worker-specific persis_info dictionary

Returns

A Work entry

Additional passed parameters are inserted into libE_info in the resulting work record.

If rset_team is passed as an additional parameter, it will be honored, on the assumption that any resource checking has already been done. For example, passing rset_team=[] ensures that no resources are assigned.

points_by_priority(H, points_avail, batch=False)

Returns indices of points to give, selected by priority.

Parameters
  • points_avail – Indices of points that are available to give

  • batch – Optional Boolean. Whether batches of points with the same priority should be given simultaneously.

Returns

An array of point indices to give.
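A sketch of priority-based selection, assuming the priorities live in an H field named 'priority' (an assumption for this illustration) and that points_avail holds indices, as described above. The helper name and its fallback when no priority field exists are hypothetical:

```python
import numpy as np

def points_by_priority_sketch(H, points_avail, batch=False):
    """Hypothetical priority selection over the available point indices."""
    avail_idx = np.asarray(points_avail, dtype=int)
    if avail_idx.size == 0 or "priority" not in H.dtype.names:
        return avail_idx[:1]  # assumed fallback: first available point
    prio = H["priority"][avail_idx]
    if batch:
        return avail_idx[prio == prio.max()]  # every point tied for the top priority
    return avail_idx[[int(np.argmax(prio))]]  # a single highest-priority point

H = np.zeros(5, dtype=[("x", float), ("priority", float)])
H["priority"] = [1, 5, 3, 5, 0]
```

Batching lets an allocation function hand out all equally ranked points at once instead of one per call.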

sim_work(wid, H, H_fields, H_rows, persis_info, **libE_info)

Add sim work record to given Work dictionary.

Includes evaluation of required resources if the worker is not in a persistent state.

Parameters
  • wid – Int. Worker ID.

  • H – History array. For parsing out requested resource sets.

  • H_fields – Which fields from H to send

  • H_rows – Which rows of H to send.

  • persis_info – Worker-specific persis_info dictionary

Returns

A Work entry

Additional passed parameters are inserted into libE_info in the resulting work record.

If rset_team is passed as an additional parameter, it will be honored, assuming that any resource checking has already been done.
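In an allocation function, each record returned by gen_work or sim_work is typically stored under the worker's ID, as in Work[wid] = support.sim_work(wid, H, H_fields, H_rows, persis_info). The sketch below shows the shape such a record might take, including how extra keyword arguments end up in libE_info. The helper, its key layout, and the tag values are assumptions for illustration:

```python
# Placeholder tag values for this sketch only.
EVAL_SIM_TAG, EVAL_GEN_TAG = 1, 2

def work_entry_sketch(tag, H_fields, H_rows, persis_info, **libE_info):
    """Hypothetical builder for one Work entry; extra keywords go into libE_info."""
    info = {"H_rows": H_rows}
    info.update(libE_info)  # e.g. rset_team=[] to request no resources
    return {"H_fields": H_fields, "persis_info": persis_info, "tag": tag, "libE_info": info}

Work = {}
Work[3] = work_entry_sketch(EVAL_SIM_TAG, ["x"], [0, 1], {}, rset_team=[])
```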

test_any_gen()

Returns True if a generator worker is active.

Consensus Subroutines

This file contains common subroutines used in distributed optimization libraries, including collecting the sum of the {f_i}’s, collecting the gradients, and conducting the consensus step (i.e., taking a linear combination of the neighbors’ x values).

consensus_subroutines.get_consensus_gradient(x, gen_specs, libE_info)

Sends local gen data (@x), retrieves the neighbors’ local data, and takes the sum of the neighbors’ x’s, which is equivalent to taking the gradient of the consensus term for this particular node/agent.

This function is similar to @get_neighbor_vals but less general: it applies when we only need a sum, rather than a linear combination, of our neighbors’ values.

Parameters

x (np.ndarray) – local input variable

Returns

This node’s corresponding gradient of the consensus term

Return type

np.ndarray
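To see why neighbor sums give the consensus gradient, suppose (an assumption; the library's scaling may differ) that the consensus term is the graph-Laplacian penalty (1/2) Σ_{(i,j)∈E} ||x_i − x_j||². Its gradient with respect to x_i is Σ_{j∈N(i)} (x_i − x_j) = deg(i)·x_i − Σ_{j∈N(i)} x_j, which is row i of L·X for the Laplacian L = D − A. A NumPy sketch with a hypothetical helper name:

```python
import numpy as np

def consensus_gradient_sketch(x_local, neighbor_xs):
    """Gradient of 0.5 * sum over incident edges of ||x_i - x_j||^2 with respect to x_i."""
    neighbor_xs = np.asarray(neighbor_xs, dtype=float)
    return len(neighbor_xs) * np.asarray(x_local, dtype=float) - neighbor_xs.sum(axis=0)

# Path graph 0 - 1 - 2; row k of X is agent k's local x.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
X = np.array([[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0]])
L = np.diag(A.sum(axis=1)) - A  # graph Laplacian D - A
g1 = consensus_gradient_sketch(X[1], X[[0, 2]])
```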

consensus_subroutines.get_doubly_stochastic(A)

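Generates a doubly stochastic matrix S such that (i) S_ii > 0 for all i, and (ii) for i ≠ j, S_ij > 0 if and only if (i, j) ∈ E, where E is the edge set described by the adjacency matrix A.

Parameters

A (np.ndarray) – adjacency matrix

Returns

S, the doubly stochastic matrix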

Return type

scipy.sparse.csr_matrix
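One standard construction that satisfies (i) and (ii) for an undirected graph is the Metropolis-Hastings weighting: S_ij = 1 / (1 + max(d_i, d_j)) for each edge, with the diagonal absorbing the remaining mass in each row. The result is symmetric, so rows and columns both sum to 1. This is a sketch of that technique, not necessarily the rule get_doubly_stochastic uses; it returns a dense array, which could be wrapped in scipy.sparse.csr_matrix to match the documented return type.

```python
import numpy as np

def metropolis_weights(A):
    """Doubly stochastic matrix from a symmetric adjacency matrix (Metropolis-Hastings rule)."""
    A = (np.asarray(A) != 0).astype(float)
    np.fill_diagonal(A, 0.0)  # the diagonal of S is set separately below
    deg = A.sum(axis=1)
    S = np.zeros_like(A)
    i, j = np.nonzero(A)
    S[i, j] = 1.0 / (1.0 + np.maximum(deg[i], deg[j]))
    # Each off-diagonal row sum is at most d/(1+d) < 1, so the diagonal stays positive.
    S[np.diag_indices_from(S)] = 1.0 - S.sum(axis=1)
    return S

# Star graph: node 1 connected to nodes 0, 2, and 3.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]])
S = metropolis_weights(A)
```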

consensus_subroutines.get_func_or_grad(x, f_i_idxs, gen_specs, libE_info, get_grad)

This function is called by a gen to retrieve the function value or gradient of the sum of {f_i}’s via the sim.

Parameters
  • x – Input solution vector

  • f_i_idxs – Which {f_i}’s this calling gen is responsible for

  • gen_specs – Used to communicate

  • libE_info – Used to communicate

  • get_grad – If True, returns the gradient; otherwise returns the function evaluation

consensus_subroutines.get_grad_locally(x, f_i_idxs, df)

This function is called by a gen to locally compute gradients of the sum of {f_i}’s. Unlike get_func_or_grad, this function does not use the sim, but instead evaluates the gradient using the input @df.

Parameters
  • x – Input solution vector

  • f_i_idxs – Which {f_i}’s this calling gen is responsible for

  • df – Function that returns the gradient. Must take as parameters the input @x and an index @i (i.e., which f_i to take the gradient of)
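Because the gradient of a sum is the sum of the gradients, the local computation reduces to accumulating df(x, i) over the indices this gen owns. A minimal sketch, assuming df(x, i) returns an array shaped like x; the helper name and the quadratic example functions are hypothetical:

```python
import numpy as np

def grad_locally_sketch(x, f_i_idxs, df):
    """Sum of the gradients of the f_i this gen owns, evaluated locally via df(x, i)."""
    g = np.zeros_like(x, dtype=float)
    for i in f_i_idxs:
        g += df(x, i)
    return g

# Example: f_i(x) = 0.5 * ||x - c_i||^2, whose gradient is x - c_i.
c = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])

def df(x, i):
    return x - c[i]

g = grad_locally_sketch(np.zeros(2), [0, 2], df)
```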

consensus_subroutines.get_k_reach_chain_matrix(n, k)

Constructs the adjacency matrix of a chain graph in which the ith vertex can reach every vertex at most @k positions away (without wrapping around), where distance is the absolute difference between the vertices’ indices.
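The construction above amounts to connecting i and j whenever 0 < |i − j| ≤ k. A vectorized sketch, assuming an n × n 0/1 matrix with no self-loops (the helper name is hypothetical):

```python
import numpy as np

def k_reach_chain_sketch(n, k):
    """Adjacency matrix with an edge between i and j whenever 0 < |i - j| <= k (no wraparound)."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])  # pairwise index distances
    return ((dist > 0) & (dist <= k)).astype(int)

A = k_reach_chain_sketch(4, 1)  # a plain path graph: tridiagonal pattern
```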

consensus_subroutines.get_neighbor_vals(x, local_gen_id, A_gen_ids_no_local, gen_specs, libE_info)

Sends local gen data (@x) and retrieves the neighbors’ local data, sorting the data so that the gen ids are in increasing order.

Parameters
  • x (np.ndarray) – local input variable

  • local_gen_id (int) – this gen’s gen_id

  • A_gen_ids_no_local (int) – expected neighbors’ gen ids, not including the local gen id

  • gen_specs – objects to communicate and construct a mini History array

  • libE_info – objects to communicate and construct a mini History array

Returns

X – 2D array of the neighbors’ and local x values, sorted by gen_ids

Return type

np.ndarray
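The sorting step can be sketched by stacking the local and neighbor vectors and reordering rows by gen id. Here neighbor_vals is a hypothetical {gen_id: x} mapping standing in for the values received from the manager, and the helper name is likewise hypothetical:

```python
import numpy as np

def stack_by_gen_id(x_local, local_gen_id, neighbor_vals):
    """Stack local and neighbor vectors into rows ordered by increasing gen id."""
    ids = [local_gen_id] + list(neighbor_vals)
    rows = [np.asarray(x_local)] + [np.asarray(v) for v in neighbor_vals.values()]
    order = np.argsort(ids)  # row permutation that sorts the gen ids
    return np.vstack(rows)[order]

X = stack_by_gen_id(np.array([0.0, 0.0]), 2, {3: np.array([3.0, 3.0]), 1: np.array([1.0, 1.0])})
```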

consensus_subroutines.gm_opt(b, m)

Computes the optimal geometric median score.

Parameters
  • b – 1D array concatenating @m vectors of size @n, i.e., [x_1, x_2, …, x_m]

  • m – number of vectors
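A common way to compute a geometric median is Weiszfeld's iteratively reweighted averaging. The sketch below uses that technique, which may differ from how gm_opt actually solves the problem, and assumes the "score" is the optimal objective value Σ_i ||z − x_i|| at the median z. The helper name is hypothetical.

```python
import numpy as np

def gm_opt_sketch(b, m, max_iter=1000, tol=1e-10):
    """Weiszfeld iteration; returns the sum of distances from the median to the m points."""
    pts = np.asarray(b, dtype=float).reshape(m, -1)  # row i is x_i
    z = pts.mean(axis=0)  # start from the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(pts - z, axis=1)
        w = 1.0 / np.maximum(d, 1e-12)  # guard against division by zero at a data point
        z_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        step = np.linalg.norm(z_new - z)
        z = z_new
        if step < tol:
            break
    return np.linalg.norm(pts - z, axis=1).sum()

# Corners of the unit square: the median is the center, so the score is 4 * sqrt(0.5).
score = gm_opt_sketch([0, 0, 1, 0, 0, 1, 1, 1], 4)
```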

consensus_subroutines.log_opt(X, y, c, reg=None)

Computes optimal logistic regression with optional regularization (see @reg). See, for reference, https://www.cvxpy.org/examples/machine_learning/logistic_regression.html

Parameters
  • X – 2D matrix defining the logistic regression problem

  • y (np.ndarray) – 1D array defining the logistic regression problem

  • c – Scalar term for regularization

  • reg – Denotes which regularization to use. Either ‘l1’, ‘l2’, or None

consensus_subroutines.print_final_score(x, f_i_idxs, gen_specs, libE_info)

This function is called by a gen so that the alloc will collect all the {f_i}’s and print their sum.

Parameters
  • x – Input solution vector

  • f_i_idxs – Which {f_i}’s this calling gen is responsible for

  • gen_specs – Used to communicate

  • libE_info – Used to communicate

consensus_subroutines.readin_csv(fname)

Parses the breast-cancer dataset (http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29) for SVM.

Parameters

fname – file name containing data

Returns

  • labels (np.ndarray, (m,)) – 1D array with the label of each vector

  • datas (np.ndarray (2D), (m, n)) – 2D array (matrix) containing the dataset

consensus_subroutines.regls_opt(X, y, c, reg=None)

Computes optimal linear regression with optional l1 or l2 regularization (see @reg).

Parameters
  • X – 2D matrix, where we want to solve optimally for theta so that y ≈ X.dot(theta)

  • y (np.ndarray) – 1D array, where we want to solve optimally for theta so that y ≈ X.dot(theta)

  • c – Scalar term for regularization

  • reg – Denotes which regularization to use. Either ‘l1’, ‘l2’, or None
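For the ‘l2’ case the optimum has a closed form. Assuming the objective is 0.5·||X·theta − y||² + 0.5·c·||theta||² (the exact scaling used by regls_opt is an assumption), setting the gradient to zero gives theta = (XᵀX + c·I)⁻¹ Xᵀy. A sketch of that ridge-regression solution, with a hypothetical helper name; the ‘l1’ case has no closed form and is not covered:

```python
import numpy as np

def ridge_sketch(X, y, c):
    """Minimizer of 0.5*||X @ theta - y||^2 + 0.5*c*||theta||^2 (assumed scaling)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[1]
    # Solve the normal equations instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + c * np.eye(n), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.normal(size=6)
theta = ridge_sketch(X, y, 0.5)
```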

consensus_subroutines.svm_opt(X, b, c, reg='l1')

Computes optimal support vector machine (SVM) with l1 regularization.

Parameters
  • X – 2D matrix defining the SVM problem

  • b (np.ndarray) – 1D array defining the SVM problem

  • c – Scalar term for regularization

  • reg – Denotes which regularization to use. Either ‘l1’, ‘l2’, or None