elasticslice.managers package

Submodules

elasticslice.managers.core module

class elasticslice.managers.core.ElasticSliceClientEndpoint[source]

Bases: elasticslice.rpc.protogeni.ProtoGeniClientServerEndpoint

Creates a stub elasticslice client-server API endpoint that simply returns SUCCESS (and prints out the RPC arguments on invocation if in debug mode).

GetVersion(params=None)[source]
NotifyDeleteComplete(params)[source]

The server calls this method to tell us that it has finished invoking the CM v2.0 DeleteNodes() method on your behalf; from Cloudlab's perspective, those nodes have now been removed from your experiment. @params["nodelist"] is the list of physical resources that were revoked. Each list item must have at least the following key/value pairs:

"node_id" : "<NODE_ID>" (the physical node_id)
"client_id" : "<CLIENT_ID>" (the client_id from the rspec, if this
                             client has this component_id allocated)
"max_wait_time" : <seconds>

The only valid response from this method is SUCCESS; the response is ignored in any case.

NotifyDeletePending(params)[source]

The server calls this method to tell us it is going to revoke some of our resources. @params[“nodelist”] is a list of physical resources it is going to revoke. Each list item must have at least the following key/value pairs:

"node_id" : "<NODE_ID>" (the physical node_id)
"client_id" : "<CLIENT_ID>" (the client_id from the rspec, if this
                             client has this component_id allocated)
"max_wait_time" : <seconds>

The server may call this method repeatedly as a sort of “countdown” timer (by default it gets called every minute), and it will count down the max_wait_time dict fields on each call. Thus, you have multiple opportunities to respond when your delete jobs have finished.

The server may specify max_wait_time of 0 if it cannot wait gracefully for the client to clean off the node (in which case the node may have been revoked by the time the client receives this message); otherwise it should specify how long it is willing to wait for the client to release the node before it forcibly takes the node back.

The client may respond with simply SUCCESS, or it can reply with a list of dicts that specify nodes that may be immediately stolen. This list should be identical to the input list, but you tell the server that it can stop waiting for a particular node by setting that node's max_wait_time to 0.
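
For illustration, a NotifyDeletePending exchange might look like the following sketch; all field values here are hypothetical:

    # Hypothetical @params payload sent by the server:
    params = {
        "nodelist": [
            {"node_id": "pc100", "client_id": "node-1", "max_wait_time": 300},
            {"node_id": "pc101", "client_id": "node-2", "max_wait_time": 300},
        ]
    }

    # A client that has already finished cleaning off pc101 could reply with
    # the same list, setting that entry's max_wait_time to 0 to signal that
    # the server may take the node back immediately:
    reply = [
        {"node_id": "pc100", "client_id": "node-1", "max_wait_time": 300},
        {"node_id": "pc101", "client_id": "node-2", "max_wait_time": 0},
    ]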

SetResourceValues(params)[source]

@params[“nodelist”] is a list of physical node resources. Each list item must have at least the following key/value pairs:

"node_id" : "<NODE_ID>"
"value" : float(0,1.0)

Each list item may also have the following values, if the server sends them:

"client_id" : "<CLIENT_ID>" (the client_id from the rspec, if this
                             client has this component_id allocated)
"component_urn" : "<COMPONENT_URN>"
__init__()[source]
class elasticslice.managers.core.ElasticSliceHelper(server=None, config=None)[source]

Bases: object

An abstract class that can be subclassed to specialize the functionality of the ElasticSliceManager (below) without actually changing that class. The idea is that this class handles the semantics of dealing with a dynamic experiment (i.e., the rspec, startup commands, adding new nodes, handling deletions), but does not have to manage any of the dynamic experiment's state or lifecycle operations.

__init__(server=None, config=None)[source]
create_rspec()[source]

Returns an rspec XML string that will be used to create the sliver for a dynamic experiment; this method is only called if the sliver does not already exist. If the sliver does exist, the manager will only call get_add_args() (below) when it decides to add nodes dynamically. At that time, the helper should grab the current sliver manifest and initialize any state necessary to generate new nodes (for instance, nodes that do not overlap with existing ones).

get_add_args(count=1)[source]

Returns a dict of nodes that conforms to the CMv2 AddNodes() API call.

get_delete_commands(nodelist)[source]

Returns a list of sh commands that are executed serially in a forked child.

handle_added_node(node, status)[source]

This method should be called by a manager when it sees that an added node has become “ready” or “failed”. @node is a dict like

{ 'client_id': 'node-1', 'node_id': 'pc100', ... }

and @status is either 'ready' or 'failed'. The return value is not checked by the caller.

handle_deleted_node(node)[source]

This method should be called by a manager when it sees that a node has been deleted. @node is a dict like

{ 'client_id': 'node-1', 'node_id': 'pc100', ... }

The return value is not checked by the caller.
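
A minimal sketch of a concrete helper that hooks these callbacks might look like the following; the class name, log file, and shell commands are purely illustrative:

    from elasticslice.managers.core import ElasticSliceHelper

    class LoggingHelper(ElasticSliceHelper):
        def get_delete_commands(self, nodelist):
            # sh commands run serially in a forked child before deletion.
            return ["echo draining %s >> /tmp/drain.log" % n.get("client_id")
                    for n in nodelist]

        def handle_added_node(self, node, status):
            # status is 'ready' or 'failed'; the return value is not checked.
            print("added %s (%s): %s"
                  % (node.get("client_id"), node.get("node_id"), status))

        def handle_deleted_node(self, node):
            print("deleted %s (%s)"
                  % (node.get("client_id"), node.get("node_id")))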

class elasticslice.managers.core.ElasticSliceManager(server, config=None)[source]

Bases: elasticslice.managers.core.ElasticSliceHelper

A manager of a dynamic slice that creates a slice (and its sliver); adds nodes when possible (i.e., when usage is under some specified threshold); optionally deletes nodes when the free threshold is breached; and renews the slice and sliver.

A manager is also a helper by default, although the default helper interface (ElasticSliceHelper) methods either raise NotImplementedError or return None. Managers that do not provide a Helper implementation natively should subclass the PluginElasticSliceHelper (which takes an external helper object and calls it), and the library user should call manager.set_helper(helper_obj).
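
For example, assuming you already have a manager instance that mixes in PluginElasticSliceHelper and an external helper instance, the wiring is just:

    # `manager` is a PluginElasticSliceHelper-based manager; `helper` is any
    # ElasticSliceHelper implementation (e.g., the LoggingHelper sketch above
    # or SimpleElasticSliceHelper).
    manager.set_helper(helper)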

__init__(server, config=None)[source]
add_nodes(count=0, cm=None)[source]

This function manages the addition of nodes. This manager will call it to add nodes to this slice, as necessary.

Parameters:
  • count – If a nonzero, positive integer, this method checks (via should_add_nodes()) whether the manager should add exactly count nodes, and returns accordingly.
  • cm – the CM (component manager) at which to add nodes.
create_slice()[source]

Create the slice from within the manager, using information from the ElasticSliceHelper associated with this ElasticSliceManager.

create_sliver(cm=None)[source]

Create the sliver from within the manager, using information from the ElasticSliceHelper associated with this ElasticSliceManager.

Parameters:cm – the CM (component manager) at which to create the sliver.
delete_nodes(cm=None)[source]

This function handles node deletion. The manager calls it to delete nodes from this slice, as necessary.

ensure_slice()[source]

Check to see if the slice exists; if not, create it.

ensure_sliver(cm=None)[source]

Check to see if the sliver exists; if not, create it using information from the ElasticSliceHelper associated with this ElasticSliceManager.

Parameters:cm – the CM (component manager) at which to create the sliver.
get_system_state()[source]

Allows this client to pull in state from a system that is using or interacting with the dynamic resources this client has obtained for it. If you are creating a dynamic experiment for yourself, this may well be a noop. However, if you are allowing the resources to be used by something else (like another cluster, or cluster management system), you might want to query it to get its state so that state can be taken into account in add_nodes or delete_nodes.

For instance, if you are using this client to manage adding nodes to or removing nodes from Slurm, you will likely want to grab Slurm's job queue state for each node.
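
A sketch of such an override for a Slurm-backed experiment, using the standard squeue command; the subclass name and the attribute used to stash the queue are assumptions, not part of the library:

    import subprocess

    from elasticslice.managers.core import SimpleElasticSliceManager

    class SlurmAwareManager(SimpleElasticSliceManager):
        def get_system_state(self):
            # Grab the Slurm job queue (job id, state, node list) so that
            # should_add_nodes/should_delete_nodes can consult it later.
            out = subprocess.check_output(
                ["squeue", "--noheader", "-o", "%i %T %N"]).decode()
            self.slurm_queue = [line.split(None, 2)
                                for line in out.splitlines() if line.strip()]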

manage()[source]

Runs manage_once in an infinite loop, aborting only on a fatal error or on user interrupt.

manage_once(force=False)[source]

Performs one management cycle. A management cycle is all the logic needed to choose, for this cycle, whether or not to add or delete any nodes from the slice. If @force is specified, it performs each step in its algorithm regardless of whether the specified time interval between core operations has been reached. Otherwise, the core operations are only performed at the intervals specified for each.
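
Hypothetical usage, assuming `manager` is an already-constructed instance of a class exposing this ElasticSliceManager interface:

    manager.manage_once(force=True)  # run every step now, ignoring intervals
    manager.manage()                 # then loop, honoring the configured intervals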

set_system_state()[source]

Allows this client to push its state (i.e., resource values Cloudlab has pushed to it) to the "system" it is integrating with (e.g., Slurm).

should_add_nodes(count=0, cm=None)[source]

By default, add_nodes calls this method to find out if it should add any nodes. If this function returns an integer, add_nodes interprets that value as the number of new nodes to add, and it calls get_add_args() to get the arguments to pass to _add_nodes. If the function returns a dict, it passes that directly to _add_nodes. If the function returns None or False, nothing is added. This should give subclasses enough flexibility. If you pass a nonzero, positive integer via count, this method must check whether it should add exactly count nodes, and return accordingly.

Parameters:
  • count – If a nonzero, positive integer, this method checks whether the manager should add exactly count nodes, and returns accordingly.
  • cm – the CM (component manager) at which to add nodes.
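
A sketch of a subclass exercising these return conventions; the _have_capacity_for helper is hypothetical, standing in for whatever site policy you implement:

    from elasticslice.managers.core import SimpleElasticSliceManager

    class CautiousManager(SimpleElasticSliceManager):
        def should_add_nodes(self, count=0, cm=None):
            if count:
                # Caller asked about exactly `count` nodes: yes or no.
                return count if self._have_capacity_for(count, cm) else None
            if self._have_capacity_for(1, cm):
                return 1    # integer: add_nodes() fetches args via get_add_args(1)
            return None     # None/False: add nothing this cycle

        def _have_capacity_for(self, n, cm):
            # Hypothetical policy hook; always says yes in this sketch.
            return True
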
should_delete_nodes(cm=None)[source]

Should we delete a node? If so, this function must return a list of nodes to delete. This list is exactly the arguments that CM::DeleteNodes expects. If there is nothing to do, it can return None or False.

Parameters:cm – the CM (component manager) at which to delete nodes.
update_all()[source]

This function updates the details of all nodes at the CMs this manager has been told to use, even those nodes that are not currently available.

update_available()[source]

This function updates the details of currently available nodes at the CMs this manager has been told to use. It is a cheap way to obtain the free node list for heavily-utilized clusters.

update_sliver_status()[source]

This function updates sliver status.

class elasticslice.managers.core.PluginElasticSliceHelper(server=None, config=None, helper=None)[source]

Bases: elasticslice.managers.core.ElasticSliceHelper

Managers that don't implement any semantics themselves (i.e., how to add or delete a node, or create an rspec) should use this class as a mixin.

__init__(server=None, config=None, helper=None)[source]
create_rspec()[source]
get_add_args(count=1)[source]
get_delete_commands(nodelist)[source]
handle_added_node(node, status)[source]
handle_deleted_node(node)[source]
set_helper(helper)[source]

This class is designed to be used only as a mixin to a Manager. The core driver program looks for this function as a method of the Manager and, if it exists, calls it. We provide it here so that this plugin helper can call through to an external helper.

class elasticslice.managers.core.SimpleElasticSliceClientEndpoint(helper=None, delete_in_group=True, rlock=None)[source]

Bases: elasticslice.managers.core.ElasticSliceClientEndpoint

SimpleElasticSliceClientEndpoint is a simple class that provides the elasticslice client endpoint API, but also tracks and saves notification info from the server, tracks node delete operations, and kicks off commands (via a Manager) to run when a node should be deleted. An experiment creator creates a ProtoGeniClientServer with one of these endpoints, and it handles the elasticslice RPC invocations from the ProtoGeni server.

If delete_in_group is true, the start_delete_operation function will be called with all the nodes in the list sent to NotifyDeletePending; this suits experiments that want to delete all the nodes the server sent us at once, and avoids potential wasted work (for instance, if the experiment is an OpenStack cluster and we need to migrate VMs off the hypervisor/compute nodes being reclaimed, we don't want to migrate them onto another compute node that is itself about to be deleted). If you must delete nodes one at a time instead, set delete_in_group=False.
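
A minimal construction sketch; `my_helper` is assumed to be an ElasticSliceHelper implementation, and the ProtoGeniClientServer wiring is elided because its exact constructor arguments are not shown here:

    from elasticslice.managers.core import SimpleElasticSliceClientEndpoint

    endpoint = SimpleElasticSliceClientEndpoint(
        helper=my_helper,        # handles get_delete_commands(), etc.
        delete_in_group=True)    # delete all notified nodes as one operation
    # ... then hand `endpoint` to a ProtoGeniClientServer so it can service
    # the server's RPC invocations.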

NotifyDeleteComplete(params)[source]

The server calls this method to tell us that it has finished invoking the CM v2.0 DeleteNodes() method on your behalf; from Cloudlab's perspective, those nodes have now been removed from your experiment. @params["nodelist"] is the list of physical resources that were revoked. Each list item must have at least the following key/value pairs:

"node_id" : "<NODE_ID>" (the physical node_id)
"client_id" : "<CLIENT_ID>" (the client_id from the rspec, if this
                             client has this component_id allocated)
"max_wait_time" : <seconds>

The only valid response from this method is SUCCESS; the response is ignored in any case.

NotifyDeletePending(params)[source]

The server calls this method to tell us it is going to revoke some of our resources. @params[“nodelist”] is a list of physical resources it is going to revoke. Each list item must have at least the following key/value pairs:

"node_id" : "<NODE_ID>" (the physical node_id)
"client_id" : "<CLIENT_ID>" (the client_id from the rspec, if this
                             client has this component_id allocated)
"max_wait_time" : <seconds>

The server may call this method repeatedly as a sort of “countdown” timer (by default it gets called every minute), and it will count down the max_wait_time dict fields on each call. Thus, you have multiple opportunities to respond when your delete jobs have finished.

The server may specify max_wait_time of 0 if it cannot wait gracefully for the client to clean off the node (in which case the node may have been revoked by the time the client receives this message); otherwise it should specify how long it is willing to wait for the client to release the node before it forcibly takes the node back.

The client may respond with simply SUCCESS, or it can reply with a list of dicts that specify nodes that may be immediately stolen. This list should be identical to the input list, but you tell the server that it can stop waiting for a particular node by setting that node's max_wait_time to 0.

SetResourceValues(params)[source]

@params[“nodelist”] is a list of physical node resources. Each list item must have at least the following key/value pairs:

"node_id" : "<NODE_ID>"
"value" : float(0,1.0)

Each list item may also have the following values, if the server sends them:

"client_id" : "<CLIENT_ID>" (the client_id from the rspec, if this
                             client has this component_id allocated)
"component_urn" : "<COMPONENT_URN>"

__init__(helper=None, delete_in_group=True, rlock=None)[source]
end_delete(nodelist)[source]
has_pending_deletes()[source]
start_delete(nodelist)[source]
class elasticslice.managers.core.SimpleElasticSliceHelper(server=None, config=None, num_pcs=1, image_urn=None, num_lans=0, multiplex_lans=False, tarballs=[], startup_command=None, nodetype=None, node_prefix='node', lan_prefix='lan')[source]

Bases: elasticslice.managers.core.ElasticSliceHelper

A default, simple elasticslice helper. The user can specify the number of PCs, an image URN, the number of LANs (and whether or not they are multiplexed), tarballs, a startup command, a node type, and some naming conventions.

__init__(server=None, config=None, num_pcs=1, image_urn=None, num_lans=0, multiplex_lans=False, tarballs=[], startup_command=None, nodetype=None, node_prefix='node', lan_prefix='lan')[source]
create_rspec()[source]
get_add_args(count=1)[source]
get_delete_commands(nodelist)[source]
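
For instance, a helper for a small two-node experiment might be constructed like the following sketch; the image URN and node type are placeholders you would replace with real values:

    from elasticslice.managers.core import SimpleElasticSliceHelper

    helper = SimpleElasticSliceHelper(
        num_pcs=2,
        image_urn="urn:publicid:IDN+example.net+image+myproj//MYIMAGE",
        num_lans=1, multiplex_lans=False,
        startup_command="/bin/true",
        nodetype="pcX",               # placeholder hardware type
        node_prefix="node", lan_prefix="lan")
    # Typically handed to a manager via manager.set_helper(helper).
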
class elasticslice.managers.core.SimpleElasticSliceManager(server, config=None, minthreshold=1, maxthreshold=10, percent_available_minimum=0.5, manage_order=['ensure_slice', 'ensure_sliver', 'update_sliver_status', 'renew', 'update_all', 'update_available', 'get_system_state', 'add_nodes', 'delete_nodes', 'set_system_state'], manage_intervals={'update_available': 600, 'ensure_sliver': 60, 'delete_nodes': 300, 'update_sliver_status': 60, 'update_all': 86400, 'renew': 3600, 'add_nodes': 300, 'ensure_slice': 60}, method_reset_triggers={'add_nodes': ['update_sliver_status', 'update_available', 'get_system_state', 'set_system_state'], 'delete_nodes': ['update_sliver_status', 'update_available', 'get_system_state', 'set_system_state']}, retry_interval=30, nodetype=None, cmlist=None, enable_delete=False, email=True)[source]

Bases: elasticslice.managers.core.ElasticSliceManager, elasticslice.managers.core.PluginElasticSliceHelper, elasticslice.managers.core.SimpleElasticSliceClientEndpoint

A manager of a dynamic slice that creates a slice (and its sliver); adds nodes when possible (i.e., when usage is under some specified threshold); optionally deletes nodes when the free threshold is breached; and renews the slice and sliver. The user really should provide a Helper to manage the semantics of the experiment.

ElasticSliceManager locks using a per-instance threading.RLock; thus, once a thread has acquired the lock, it can safely re-enter the locked methods.

If you subclass SimpleElasticSliceManager and want to replace its helper functions, it is very important that you first subclass whatever helper class you're extending, then SimpleElasticSliceManager. For instance,

class FooManager(SimpleElasticSliceHelper,SimpleElasticSliceManager):

Why? Because that makes the method resolution order inside FooManager be FooManager, SimpleElasticSliceHelper, SimpleElasticSliceManager, ... . If you instead tried the reverse:

class FooManager(SimpleElasticSliceManager,SimpleElasticSliceHelper):

the method resolution order would be FooManager, SimpleElasticSliceManager, SimpleElasticSliceHelper. Because SimpleElasticSliceManager provides an implementation of each method in the ElasticSliceHelper interface, those methods would be called, and those from SimpleElasticSliceHelper would never be called (which defeats the point of reusing the helper methods from SimpleElasticSliceHelper). So consider method resolution order! Helpers are not mixins, because an ElasticSliceManager is also an ElasticSliceHelper; this is an OO multiple-inheritance style, not a mixin style.

CM_METHODS = ['ensure_sliver', 'update_sliver_status', 'update_available', 'update_all', 'add_nodes', 'delete_nodes']
DEF_INTERVAL = 60
DEF_INTERVALS = {'update_available': 600, 'ensure_sliver': 60, 'delete_nodes': 300, 'update_sliver_status': 60, 'update_all': 86400, 'renew': 3600, 'add_nodes': 300, 'ensure_slice': 60}
DEF_MAX_THRESHOLD = 10
DEF_MIN_THRESHOLD = 1
DEF_ORDER = ['ensure_slice', 'ensure_sliver', 'update_sliver_status', 'renew', 'update_all', 'update_available', 'get_system_state', 'add_nodes', 'delete_nodes', 'set_system_state']
METHOD_RESET_TRIGGERS = {'add_nodes': ['update_sliver_status', 'update_available', 'get_system_state', 'set_system_state'], 'delete_nodes': ['update_sliver_status', 'update_available', 'get_system_state', 'set_system_state']}
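
A quick way to check the method resolution order discussed in the class description above (output abbreviated; only the class names relevant here are shown):

    from elasticslice.managers.core import (
        SimpleElasticSliceHelper, SimpleElasticSliceManager)

    class FooManager(SimpleElasticSliceHelper, SimpleElasticSliceManager):
        pass

    print([c.__name__ for c in FooManager.__mro__])
    # ['FooManager', 'SimpleElasticSliceHelper', 'SimpleElasticSliceManager', ...]
    # The helper's create_rspec/get_add_args/etc. now shadow the manager's
    # plugin-helper versions, which is the point of this base-class ordering.
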
__init__(server, config=None, minthreshold=1, maxthreshold=10, percent_available_minimum=0.5, manage_order=['ensure_slice', 'ensure_sliver', 'update_sliver_status', 'renew', 'update_all', 'update_available', 'get_system_state', 'add_nodes', 'delete_nodes', 'set_system_state'], manage_intervals={'update_available': 600, 'ensure_sliver': 60, 'delete_nodes': 300, 'update_sliver_status': 60, 'update_all': 86400, 'renew': 3600, 'add_nodes': 300, 'ensure_slice': 60}, method_reset_triggers={'add_nodes': ['update_sliver_status', 'update_available', 'get_system_state', 'set_system_state'], 'delete_nodes': ['update_sliver_status', 'update_available', 'get_system_state', 'set_system_state']}, retry_interval=30, nodetype=None, cmlist=None, enable_delete=False, email=True)[source]
add_nodes(count=0, cm=None)[source]
delete_nodes(cm=None)[source]
get_email()[source]
get_node_status(node, cm=None, key=None)[source]

Get either the status dictionary for the given virtual node name @node, or if @key is specified, retrieve that key.
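
Hypothetical usage, assuming a virtual node named node-1 exists in the sliver and that the per-node status dict contains a 'status' key (an assumption, not a documented guarantee):

    full = manager.get_node_status("node-1")                 # whole status dict
    state = manager.get_node_status("node-1", key="status")  # one field only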

is_adding(node=None, cm=None)[source]
is_deleting(node=None, cm=None)[source]
manage()[source]
manage_once()[source]

Runs all the manage methods we were configured to run, in order. XXX: Currently, we catch exceptions from any individual function and keep going; we probably need something better.

renew()[source]
should_add_nodes(count=0, cm=None)[source]
should_delete_nodes(cm=None)[source]
start()[source]
stop()[source]
update_all(cm=None, force=False, lastFetchBefore=None)[source]
update_available(cm=None)[source]
update_sliver_status(cm=None, force=False)[source]

Grabs our sliver status to see the nodes we have at @cm and what their status is.

Module contents