@chrisruffalo
Last active June 4, 2018 18:38
A proposal for a mechanism for creating locks around critical sections in CFME.
###############################################################################################################
# Author: Chris Ruffalo <cruffalo@redhat.com>
#
# -------------------------------------------------------------------------------------------------------------
# Description:
# -------------------------------------------------------------------------------------------------------------
# In some situations in CFME automation it is necessary to access a resource that does not
# support shared access. This code was initially developed to work with a web API that could
# return the same result to multiple workflows and end up with virtual machines having identical
# properties where they should've been unique.
#
# To achieve this goal a locking scheme was created so that a workflow could execute an automation step
# that would acquire a lock using VMDB properties for the VMDB object that would be serving as the root
# of the lock.
#
# The overall intent of this implementation is to do the best job possible of protecting limited critical
# sections from simultaneous execution. While this implementation tries to follow the rough semantics of
# various types of locks, it does not possess the power to provide the same guarantees.
#
# -------------------------------------------------------------------------------------------------------------
# Operation:
# -------------------------------------------------------------------------------------------------------------
# When speaking of locks in CFME it is important to realize that these act more like a pseudo-lock and
# do not follow the exact semantics of an actual semaphore as usually seen in many programming languages.
# Instead this implementation uses the VMDB as a simple provider for locks by setting a property on the
# objects that will serve as the root for the operation.
#
# When the lock is first acquired it sets a property (configured by the 'key' input) to a given value
# (configured by the input 'value'). On some object types this may be through `set_option` and on
# others it may be through `custom_set`. All of this depends on the type of object.
#
# After setting the lock this method queries for all active records of the same type that have a property
# 'key' whose value matches 'value'. There is some other logic to allow for the lock to be expired without
# being manually released as well.
#
# When a lock cannot be acquired the property is set back to nil and a random backoff occurs. The random
# component makes it likely that if two methods set the lock at the same time and both fail to acquire it,
# they back off for different amounts of time and do not collide again on retry.
#
# This operation results in a slow but moderately safe locking mechanism.
#
# -------------------------------------------------------------------------------------------------------------
# Configuration Inputs:
# -------------------------------------------------------------------------------------------------------------
# action - Action to perform. Accepted values are 'acquire' and 'release'. The default value is 'acquire'.
#
# object - Key on the root object to use to look up the object that will be the target of the lock. Defaults
# to the value of $evm.root['vmdb_object_type']. This can be used to create a lock on a type
# or value that is not the target of the current operation. (Ex: 'vm' or 'miq_provision') Supported
# types are 'vm', 'miq_request', 'service', and 'miq_provision'. Other types will throw an error.
#
# key - Key to use to set the lock value on the target object. A good example might be 'network_name',
# 'enclave', 'group' or whatever the unique description of the locked resource should be.
#
# value - Value to use that describes any subgroup that is being locked. This allows this locking code
# to operate on a smaller subsection of the provided resource. For example, if using 'network_name'
# the value might be something like 'production' or 'sandbox'.
#
# expires - One of 'true' to use the default expire time (300 seconds), 'false' to have the lock never
# expire automatically, or an integer number of seconds until expiration (ex: 1000). The default
# value is 'false'. If an expiration value is given the lock will be automatically removed or
# ignored after the time elapses, which gives some safety for processes that fail after
# acquiring a lock.
#
# backoff - Time, in seconds, to wait (at minimum) before attempting to reacquire the lock. The
# default value is 15. Prevents the process from tripping over itself and creating churn
# as the process is retried.
#
# random - Maximum random number of seconds to add to the backoff time. The default value is 45. This keeps
# processes that are in the backoff state from colliding by spreading out the time when they will restart.
#
# allowed - The number of simultaneous locks allowed. The default value is 1 which makes this behave like
# a mutex. Setting a number greater than 1 makes the lock behave like a counting semaphore.
#
###############################################################################################################
# constants
ACTION_ACQUIRE = 'acquire'
ACTION_RELEASE = 'release'
DEFAULT_ACTION = ACTION_ACQUIRE
DEFAULT_OBJECT_KEY = 'vmdb_object_type'
DEFAULT_VALUE = 'default_lock'
DEFAULT_EXPIRES = 'false'
DEFAULT_EXP_TIME = 300
DEFAULT_BACKOFF = 15
DEFAULT_RANDOM = 45
DEFAULT_ALLOWED = 1
VALUE_PROPERTY = :value
EXPIRES_PROPERTY = :expires_after
# instance scoped variables
@method = $evm.current_method
@org = $evm.root['tenant'].name
@debug = true #$evm.root['debug'] || false
########################### Utility ###########################
def log(level, msg)
$evm.log(level, "#{@org} - #{@method} :: #{msg}")
end
############################ Query ############################
# get active vms that don't match the given id
def get_active_vms(vm_id)
return $evm.vmdb(:vm).where("id != ?", vm_id).select { |vm|
vm.archived == false && vm.orphaned == false
}
end
# get active provisioning requests that don't match the given id
def get_active_provs(prov_id)
return $evm.vmdb(:miq_provision).where("id != ? and state in ('active','queued','pending')", prov_id)
end
# get active requests that don't match the given id
def get_active_requests(req_id)
return $evm.vmdb(:miq_request).where("id != ? and request_state in ('active','queued','pending')", req_id)
end
# get active (non-retired) services that don't match the given id
def get_active_services(srv_id)
return $evm.vmdb(:service).where("id != ? and retired = false", srv_id)
end
# get locked objects and select valid locked objects based on the key and value of the lock as well
# as the current time compared to the expiration time if present
def get_locked(object, type, key, value)
# default to empty list
locked = []
# get values from vmdb
log(:info, "Selecting (active) objects according to type=#{type}") if @debug
case type
when 'vm'
locked = get_active_vms(object.id)
when 'service'
locked = get_active_services(object.id)
when 'miq_provision'
locked = get_active_provs(object.id)
when 'miq_request'
locked = get_active_requests(object.id)
else
raise "Other object types are not supported by this method."
end
log(:info, "Selected #{locked.size} objects of type=#{type}") if @debug
# select only the values that are locked (object[key] == value) and
# that have a non-expired lock
selected = locked.select { |item|
# get the value hash and return false if nil or empty
value_hash = prop_get(item, type, key)
next false if value_hash.nil? || value_hash.empty?
# check expiration if it exists
if value_hash.key?(EXPIRES_PROPERTY)
# if the expiration happened before now
if value_hash[EXPIRES_PROPERTY] < Time.now
# force unlock on that item because it is expired
unlock(item, type, key)
# do not select because it is expired
next false
end
end
# check value if other checks have passed and use it to determine if the lock is the same
value_hash.key?(VALUE_PROPERTY) && value_hash[VALUE_PROPERTY] == value
}
# log
log(:info, "Found #{selected.size} locked items from #{locked.size} items") if @debug
return selected
end
############################ Prop #############################
# get lock property for any object
def prop_get(object, type, key)
case type
when 'vm', 'service'
return object.custom_get(key)
when 'miq_provision', 'miq_request'
return object.get_option(key)
else
raise "Other object types are not supported by this method."
end
end
# set lock property for any object
def prop_set(object, type, key, value)
case type
when 'vm', 'service'
return object.custom_set(key, value)
when 'miq_provision', 'miq_request'
return object.set_option(key, value)
else
raise "Other object types are not supported by this method."
end
end
# unset lock property for any object
def prop_unset(object, type, key)
prop_set(object, type, key, nil)
end
############################ Lock #############################
# locks the object by setting the key and value on the target
# object as well as (optionally) the expiration time
def lock(object, type, key, value, expires)
# lock value and log message
value_hash = {VALUE_PROPERTY => value}
log_msg = "Setting lock #{type}.#{key} => '#{value}'"
expire_time = Time.now
# parse out expires value based on expiration
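# normalize to a string so that integer inputs can be compared safely
expires = expires.to_s
if expires.casecmp('false') == 0
expire_time = nil
elsif expires.casecmp('true') == 0
expire_time = expire_time + DEFAULT_EXP_TIME
else
# any other value is read as a number of seconds (non-numeric input becomes 0)
expire_time = expire_time + expires.to_i
end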
# if expire time is available add the value
# to the value hash before setting
unless expire_time.nil?
value_hash[EXPIRES_PROPERTY] = expire_time
log_msg = "#{log_msg}, expires at #{expire_time}"
end
# first set property
prop_set(object, type, key, value_hash)
log(:info, log_msg)
end
# unlocks the object (sets the lock property on the object back to nil)
def unlock(object, type, key)
unless key.present?
return
end
# remove property from object
prop_unset(object, type, key)
# log
log(:info, "Removed lock #{type || object.type}.#{key}")
end
# performs abort/retry action
def do_backoff(object, key, value, backoff, random, allowed, current_acquired)
# get random time
# coerce to integers in case the inputs arrive as strings
random_time = rand(random.to_i) + backoff.to_i
# set message
msg = "backing off #{object.type}.#{key} => '#{value}' for #{random_time} seconds because lock count is at #{current_acquired} of #{allowed}"
log(:info, msg)
# set retry for process
$evm.root['ae_result'] = 'retry'
$evm.root['ae_reason'] = msg
$evm.root['ae_retry_interval'] = "#{random_time}.seconds"
# exit from here
exit MIQ_OK
end
# acquires the lock, checks to see if we are under the allowed
# count of acquired locks, backs off if it cannot acquire the
# lock
def acquire(object, type, key, value, expires, backoff, random, allowed)
# get lock
lock(object, type, key, value, expires)
# get other locked items
locked = get_locked(object, type, key, value)
# this is the same, or nearly the same, as locked.size > allowed - 1 which
# is because we know that we have at least one lock that won't show up in
# the query at this point
if locked.size >= allowed
# immediately unlock
unlock(object, type, key)
# do backoff and attempt to reacquire lock later
do_backoff(object, key, value, backoff, random, allowed, locked.size)
end
# otherwise proceed in locked state
end
############################ Body ############################
@object = nil
@input_hash = {}
begin
# inputs/configuration
['action','object','key','value','expires','backoff','random','allowed'].each do |k|
@input_hash[k] = $evm.object[k]
end
# fall back to defaults for any inputs that were not provided
@input_hash['action'] ||= DEFAULT_ACTION
@input_hash['object'] ||= $evm.root[DEFAULT_OBJECT_KEY]
@input_hash['value'] ||= DEFAULT_VALUE
@input_hash['expires'] ||= DEFAULT_EXPIRES
@input_hash['backoff'] ||= DEFAULT_BACKOFF
@input_hash['random'] ||= DEFAULT_RANDOM
@input_hash['allowed'] = @input_hash['allowed'].to_i > 0 ? @input_hash['allowed'].to_i : DEFAULT_ALLOWED
#log configuration values
@input_hash.each { |key, value|
log(:info, "Config: #{key}=>#{value}") if @debug
}
type_target = @input_hash['object']
# ensure that there is a target object key
raise "No value found for input object's key, aborting" unless type_target.present?
# ensure that target object is supported
# (a trailing || keeps the condition on one logical line; a leading || is a syntax error)
unless 'vm'.casecmp(type_target) == 0 ||
'miq_provision'.casecmp(type_target) == 0 ||
'miq_request'.casecmp(type_target) == 0 ||
'service'.casecmp(type_target) == 0
raise "Unexpected value '#{type_target}' given for the target object type. Expected 'vm', 'service', 'miq_request', or 'miq_provision'."
end
# ensure that we have an object
@object = $evm.root[type_target]
raise "Could not get object from $evm.root[#{@input_hash['object']}] to use as lock target, aborting" if @object.nil?
# ensure that we have a target key
raise "No input key found, aborting" unless @input_hash['key'].present?
# decide what action to take
case @input_hash['action']
when ACTION_ACQUIRE
acquire(@object, type_target, @input_hash['key'], @input_hash['value'], @input_hash['expires'], @input_hash['backoff'], @input_hash['random'], @input_hash['allowed'])
when ACTION_RELEASE
unlock(@object, type_target, @input_hash['key'])
else
raise "The action #{@input_hash['action']} cannot be performed. Expected 'acquire' or 'release'."
end
# confirm exit OK
exit MIQ_OK
rescue => err
log(:error, "could not obtain lock, exiting with error => #{err}")
log(:error, "stack trace: #{err.backtrace.join("\n")}")
# make sure to always unlock during an error
unless @object.nil?
unlock(@object, @input_hash['object'], @input_hash['key'])
end
# nothing else to be done if an error happens while getting the lock
exit MIQ_ABORT
end
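
The set-then-check-then-back-off sequence described in the Operation notes can be tried outside of CFME with a small in-memory stand-in. The sketch below is only an illustration under stated assumptions: `FakeStore` and `try_acquire` are hypothetical names that do not exist in the script or in CFME, a plain hash replaces the VMDB, and expired locks are merely ignored rather than force-unlocked as the script does.

```ruby
# Hypothetical in-memory stand-in for the VMDB: object id => { key => lock hash }.
class FakeStore
  def initialize
    @props = {}
  end

  def set(id, key, lock)
    (@props[id] ||= {})[key] = lock
  end

  def unset(id, key)
    (@props[id] ||= {}).delete(key)
  end

  # ids of all *other* objects holding an unexpired lock for key/value
  def others_locked(id, key, value, now)
    @props.select do |other_id, props|
      lock = props[key]
      next false if other_id == id || lock.nil?
      next false if lock[:expires_after] && lock[:expires_after] < now
      lock[:value] == value
    end.keys
  end
end

# Set our own lock first, then count competing holders. If the allowed
# count is already reached, undo the lock and report a randomized delay.
def try_acquire(store, id, key, value, allowed: 1, expires: nil,
                backoff: 15, random: 45, now: Time.now)
  store.set(id, key, { value: value, expires_after: expires && now + expires })
  holders = store.others_locked(id, key, value, now)
  return { acquired: true } if holders.size < allowed

  store.unset(id, key)
  { acquired: false, retry_in: backoff + rand(random) }
end

store  = FakeStore.new
first  = try_acquire(store, 1, 'network_name', 'production')
second = try_acquire(store, 2, 'network_name', 'production')
puts "first acquired:  #{first[:acquired]}"
puts "second acquired: #{second[:acquired]}, retry in #{second[:retry_in].inspect} seconds"
```

Because each caller writes its own lock before checking for others, two callers that race can both see each other and both back off; the random component added to `retry_in` is what makes them unlikely to collide again on the next attempt.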